Character N-gram-Based Word Embeddings for Morphological Analysis of Texts
https://doi.org/10.15514/ISPRAS-2020-32(2)-1
Abstract
About the Author
Tsolak Gukasovich GHUKASYAN (Armenia)
PhD student, Department of System Programming
For citation:
GHUKASYAN Ts.G. Character N-gram-Based Word Embeddings for Morphological Analysis of Texts. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(2):7-14. (In Russ.) https://doi.org/10.15514/ISPRAS-2020-32(2)-1