Generalized context-dependent graph-theoretic model of folklore and literary texts
https://doi.org/10.15514/ISPRAS-2022-34(1)-6
Abstract
One of the problems of automatic text processing is text attribution, that is, establishing the attributes of a text (its authorship, time of creation, place of recording, etc.). The article presents a generalized context-dependent graph-theoretic model designed for the analysis of folklore and literary texts. The minimal structural unit of the model (the primitive) is a word. Sets of words are combined into vertices, and the same word can belong to several different vertices. Edges and graph substructures reflect the lexical, syntactic and semantic links in the text. The distinctive features of the model are fuzziness, hierarchy and temporality. Three examples are considered: a hierarchical graph-theoretic model of components (using literary works by A. S. Pushkin), a temporal graph-theoretic model of a fairy tale plot (using the Russian fairy tales collected by A. N. Afanasyev) and a fuzzy graph-theoretic model of «strong» connections between grammatical classes (using anonymous articles from the pre-revolutionary magazines «Time» and «Epoch» and the weekly «Citizen», edited by F. M. Dostoevsky). The model is built so that it can be further explored with artificial intelligence methods (for example, decision trees or neural networks). To this end, a format for storing such data was implemented in the information system «Folklore», together with procedures for entering, editing and analyzing texts and their graph-theoretic models.
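The abstract describes the ingredients of the model only at a high level. The Python sketch below shows one possible in-memory representation of those ingredients; it is not the storage format implemented in the «Folklore» system. All names (`ContextGraph`, `Edge`, `alpha_cut`, `snapshot`, etc.) are hypothetical, and the encodings are assumptions: fuzziness as a membership degree in [0, 1] on each edge, temporality as an optional discrete time step, and hierarchy as a child-to-parent map between vertices. The `alpha_cut` helper follows the standard fuzzy-graph notion of keeping only edges whose membership degree is at least α.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str                   # hypothetical labels: "lexical", "syntactic", "semantic"
    weight: float = 1.0         # assumed fuzzy membership degree in [0, 1]
    time: Optional[int] = None  # assumed discrete time step; None means "always present"


@dataclass
class ContextGraph:
    # vertex id -> set of words (primitives); one word may belong to several vertices
    vertices: dict[str, frozenset[str]] = field(default_factory=dict)
    # child vertex id -> parent vertex id (assumed encoding of the hierarchy)
    parent: dict[str, str] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_vertex(self, vid: str, words: Iterable[str], parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.vertices:
            raise KeyError(f"unknown parent vertex {parent!r}")
        self.vertices[vid] = frozenset(words)
        if parent is not None:
            self.parent[vid] = parent

    def add_edge(self, edge: Edge) -> None:
        if not 0.0 <= edge.weight <= 1.0:
            raise ValueError("fuzzy weight must lie in [0, 1]")
        for v in (edge.source, edge.target):
            if v not in self.vertices:
                raise KeyError(f"unknown vertex {v!r}")
        self.edges.append(edge)

    def vertices_with_word(self, word: str) -> set[str]:
        # every vertex whose word set contains the given primitive
        return {vid for vid, words in self.vertices.items() if word in words}

    def alpha_cut(self, alpha: float) -> list[Edge]:
        # crisp edge set: keep edges whose membership degree is at least alpha
        return [e for e in self.edges if e.weight >= alpha]

    def snapshot(self, t: int) -> list[Edge]:
        # edges present at time t; timeless edges are present at every step
        return [e for e in self.edges if e.time is None or e.time == t]

    def children(self, vid: str) -> set[str]:
        # direct sub-vertices one level down in the hierarchy
        return {c for c, p in self.parent.items() if p == vid}


if __name__ == "__main__":
    # illustrative data only, not taken from the article
    g = ContextGraph()
    g.add_vertex("tale", {"tsar", "son", "wolf"})
    g.add_vertex("ep1", {"tsar", "son"}, parent="tale")
    g.add_vertex("ep2", {"son", "wolf"}, parent="tale")
    g.add_edge(Edge("ep1", "ep2", "semantic", weight=0.8, time=1))
    g.add_edge(Edge("ep1", "ep2", "lexical", weight=0.3))
    print(sorted(g.vertices_with_word("son")))  # ['ep1', 'ep2', 'tale']
    print(len(g.alpha_cut(0.5)))                 # 1
    print(len(g.snapshot(2)))                    # 1 (only the timeless edge)
    print(sorted(g.children("tale")))            # ['ep1', 'ep2']
```

The α-cut and time-snapshot helpers reduce fuzzy and temporal edges to ordinary crisp edge sets, which is one plausible way such a structure could be passed to conventional graph algorithms or learning methods; the abstract does not say how the authors do this.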
About the Authors
Nikolai Dmitrievich MOSKIN
Russian Federation
Candidate of Technical Sciences, Associate Professor, Associate Professor of the Department of Probability Theory and Data Analysis
Aleksandr Aleksandrovich ROGOV
Russian Federation
Doctor of Technical Sciences, Professor, Head of the Department of Probability Theory and Data Analysis
Roman Vladimirovich VORONOV
Russian Federation
Doctor of Technical Sciences, Professor of the Department of Applied Mathematics and Cybernetics
References
1. Afanasyev A.N. Folk Russian fairy tales by A. N. Afanasyev: in 3 volumes. Moscow, State Publishing House of Fiction (Goslitizdat), 1957. (in Russian)
2. Bershtein L.S., Bozhenyuk A.V. The use of temporal graphs as models of complex systems. Izvestiya SFedU. Engineering Sciences, vol. 4 (105), 2010, pp. 198–203. (in Russian)
3. Bershtein L.S., Bozhenyuk A.V. Fuzzy graphs and hypergraphs. Moscow, Scientific world, 2005, 256 p. (in Russian)
4. Gaaze-Rapoport M.G. Search for variants in the composition of fairy tales. In: Zaripov R.H. Machine search for variants in modeling the creative process. Moscow, Nauka, 1983, pp. 213–223. (in Russian)
5. Gladky A.V. Syntactic structures of natural language. Moscow, LKI, 2007, 152 p. (in Russian)
6. Zubov A.V., Zubova I.I. Fundamentals of artificial intelligence for linguists. Moscow, University book; Logos, 2007, 320 p. (in Russian)
7. Ilvovsky D.A., Chernyak E.L. Systems of automatic processing of texts. Open systems. DBMS, Moscow, Open Systems, vol. 1, 2014, pp. 51–53. (in Russian)
8. Kasyanov V.N., Evstigneev V.A. Graphs in programming: processing, visualization and application. St. Petersburg, BHV-Petersburg, 2003, 1104 p. (in Russian)
9. Moskin N.D. Graph-theoretic models of folklore texts and methods of their analysis. Petrozavodsk, PetrGU Publishing House, 2013, 148 p. (in Russian)
10. Milov L.V., Borodkin L.I., Ivanova T.V., Neberekutina E.V., Polyanskaya I.V., Romankova N.V., Sarkisova G.I. From Nestor to Fonvizin: New methods for determining authorship. Moscow, Progress, 1994, 445 p. (in Russian)
11. Rogov A.A., Abramov R.V., Buchneva D.D., Zakharova O.V., Kulakov K.A., Lebedev A.A., Moskin N.D., Otlivanchik A.V., Savinov E.D., Sidorov Y.V. The problem of attribution in the magazines «Time», «Epoch» and the weekly «Citizen». Petrozavodsk, Publishing house «Islands», 2021, 391 p. (in Russian)
12. Sokolov I.A. Theory and practice of application of artificial intelligence methods. Bulletin of the Russian Academy of Sciences, vol. 89, issue 4, 2019, pp. 365–370. (in Russian)
13. Hozyainov S.A. Attribution of publicistic works attributed to A. S. Pushkin: texts of 1830–1836. St. Petersburg, 2008, 24 p. (in Russian)
14. Shchegoleva L.V., Lebedev A.A., Moskin N.D. Methods of data analysis in the problem of distinguishing between folklore and author's texts. Questions of linguistics, Moscow, Russian Academy of Sciences, 2020, vol. 2, pp. 61–74. (in Russian)
15. Calle-Martin J., Miranda-Garcia A. Stylometry and Authorship Attribution: Introduction to the Special Issue. English Studies, vol. 93(3), 2012, pp. 251–258.
16. Stamatatos E. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, vol. 60(3), 2009, pp. 538–556.
17. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
18. Zečević A. N-gram based text classification according to authorship. Proceedings of the Second Student Research Workshop associated with RANLP 2011, 2011, pp. 145–149.
19. Zhou J., Cui G., Hu S., Zhang Z., Yang C., Liu Z., Wang L., Li C., Sun M. Graph neural networks: A review of methods and applications. AI Open, 2020, pp. 57–81.
For citations:
MOSKIN N.D., ROGOV A.A., VORONOV R.V. Generalized context-dependent graph-theoretic model of folklore and literary texts. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(1):73-86. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(1)-6