Clarifying Knowledge about Early Contacts of Native Speakers of the Proto-Finno-Volgaic Language Using Neural Networks
https://doi.org/10.15514/ISPRAS-2025-37(6)-42
Abstract
The article explores the potential of artificial intelligence for discovering new etymologies. It consists of two parts: the first describes the structure of the neural network, while the second provides examples of new types of etymologies, including Erzya additions to existing well-known etymologies, separate Finnic-Erzya parallels, and new hypotheses regarding borrowings from Baltic and Germanic languages. The purpose is to demonstrate the kinds of new etymologies that can be proposed within a relatively short time frame for languages with an established etymological tradition through the use of a neural network. The study utilizes a Finnish-Russian dictionary containing 17,212 lexemes and an Erzya-Russian dictionary comprising 8,512 lexemes, both hosted on the LingvoDoc platform. A neural network capable of proposing new etymologies for dictionaries on the lingvodoc.ispras.ru platform has been developed. Using this tool, Finnish and Erzya dictionaries were processed, resulting in the identification of over 100 new etymologies. Among these, 16 etymologies are discussed in the article, pertaining both to native Finno-Ugric vocabulary and borrowings.
About the Authors
Yulia Viktorovna NORMANSKAYA
Russian Federation
Dr. Sci. (Philology), Chief Researcher, Head of the Laboratory “Linguistic Platforms” at the Ivannikov Institute for System Programming of the Russian Academy of Sciences; Leading Researcher of the Department of Ural-Altaic Languages at the Institute of Linguistics of the Russian Academy of Sciences.
Oxana Vladimirovna GONCHAROVA
Russian Federation
Cand. Sci. (Philology), Senior Researcher of the Laboratory “Linguistic Platforms” at the Ivannikov Institute for System Programming of the Russian Academy of Sciences.
For citations:
NORMANSKAYA Yu.V., GONCHAROVA O.V. Clarifying Knowledge about Early Contacts of Native Speakers of the Proto-Finno-Volgaic Language Using Neural Networks. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):149-162. https://doi.org/10.15514/ISPRAS-2025-37(6)-42






