
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Ontology-based syntactic analysis of domain-specific texts

https://doi.org/10.15514/ISPRAS-2021-33(4)-8

Abstract

The paper compares three methods for parsing patients' chief complaints extracted from electronic medical records. We propose two ontology-based methods: the first uses the ontology to correct mistakes made by a parser, and the second constructs syntactic dependencies from the ontology and a limited set of syntactic government rules. As a baseline, we use existing natural-language parsing libraries. The paper demonstrates that such a simple approach can achieve high accuracy, comparable to that of modern parsers.

About the Authors

Boris Israelevich GELTSER
Far Eastern Federal University
Russian Federation

Doctor of Medicine, professor, corresponding member of RAS, head of the Department of Clinical Medicine, School of Biomedicine, FEFU



Tatiana Aleksandrovna GORBACH
Institute of Automation and Control Processes, Far Eastern Branch of RAS
Russian Federation

PhD in Medicine, researcher at IACP FEB RAS



Valeriya Victorovna GRIBOVA
Institute of Automation and Control Processes, Far Eastern Branch of RAS
Russian Federation

Doctor of Technical Sciences, Deputy Director for Research at IACP FEB RAS



Olesya Vladimirovna KARPIK
Keldysh Institute of Applied Mathematics
Russian Federation

Junior researcher 



Eduard Stanislavovich KLYSHINSKIY
HSE University
Russian Federation

PhD in Computer Science, associate professor at the School of Linguistics, NRU HSE



Natalia Aleksandrovna KOCHETKOVA
HSE University
Russian Federation

PhD student 



Dmitry Borisovich OKUN
Institute of Automation and Control Processes, Far Eastern Branch of RAS
Russian Federation

PhD in Medicine, researcher 



Margaret Vyacheslavovna PETRYAEVA
Institute of Automation and Control Processes, Far Eastern Branch of RAS
Russian Federation

PhD in Medicine, researcher 



Carina Iosifovna SHAKHGELDYAN
Vladivostok State University of Economics and Service
Russian Federation

Doctor of Technical Sciences, professor, director of the Institute of Information Technologies at VVSU




For citations:


GELTSER B.I., GORBACH T.A., GRIBOVA V.V., KARPIK O.V., KLYSHINSKIY E.S., KOCHETKOVA N.A., OKUN D.B., PETRYAEVA M.V., SHAKHGELDYAN C.I. Ontology-based syntactic analysis of domain-specific texts. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(4):99-116. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(4)-8



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)