Natural Language Processing Algorithms for Understanding the Semantics of Text
https://doi.org/10.15514/ISPRAS-2022-34(1)-10
Abstract
Vector representations of words are used in many natural language processing tasks. Several methods exist for building such representations, including the neural network methods Word2Vec and GloVe, as well as the classical method of latent semantic analysis (LSA). The purpose of this paper is to investigate the effectiveness of these vector representation methods when combined with an LSTM neural network for the classification of Russian and English texts. The paper describes the characteristics of the vector word representation methods (LSA, Word2Vec, GloVe), presents the architecture of an LSTM-based neural network classifier, compares the vector representation methods, and discusses the experimental results and the computational tools used. Word2Vec proved to be the best model for vector word representation, given its training speed, the smaller word corpus it requires for training, and the higher accuracy and faster training of the resulting neural network classifier.
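To illustrate the classical LSA approach mentioned in the abstract, the following minimal sketch (assuming only NumPy; the toy vocabulary and counts are invented for illustration and are not from the paper) builds low-dimensional word embeddings by truncated SVD of a term-document count matrix and compares words by cosine similarity:

```python
import numpy as np

# Toy term-document count matrix (rows = words, columns = documents).
# The vocabulary and counts are illustrative, not taken from the paper.
vocab = ["cat", "dog", "pet", "stock", "market"]
X = np.array([
    [2, 1, 0, 0],   # cat
    [1, 2, 0, 0],   # dog
    [1, 1, 0, 0],   # pet
    [0, 0, 2, 1],   # stock
    [0, 0, 1, 2],   # market
], dtype=float)

# LSA: truncated SVD keeps only the k largest singular components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]   # k-dimensional word embeddings

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

idx = {w: j for j, w in enumerate(vocab)}
# Words occurring in the same documents end up close in the LSA space.
print(cosine(word_vecs[idx["cat"]], word_vecs[idx["dog"]]))    # high
print(cosine(word_vecs[idx["cat"]], word_vecs[idx["stock"]]))  # near 0
```

Word2Vec and GloVe learn embeddings with a similar geometric interpretation, but from local context windows and global co-occurrence statistics respectively, rather than from an explicit term-document matrix.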
About the Authors
Darkhan Orakbayevich ZHAXYBAYEV
Kazakhstan
Master of Pedagogical Sciences, Lecturer at the Department of Information Systems
Gulbarshyn Nurlanovna MIZAMOVA
Kazakhstan
Master of Technical Sciences, Lecturer at the Department of Information
References
1. Chilakapati A. Word Bags vs Word Sequences for Text Classification. URL: https://towardsdatascience.com/word-bags-vs-word-sequences-for-text-classification-e0222c21d2ec, accessed 01.02.2022.
2. Brownlee J. How to One Hot Encode Sequence Data in Python. URL: https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python, accessed 05.02.2022.
3. Le Q., Mikolov T. Distributed Representations of Sentences and Documents. In Proc. of the 31st International Conference on Machine Learning, 2014, pp. 1188-1196.
4. Mikolov T., Chen K. et al. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, 2013, 12 p.
5. Pennington J., Socher R., Manning C. GloVe: Global Vectors for Word Representation. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543.
6. Landauer T.K., Foltz P.W., Laham D. Introduction to Latent Semantic Analysis. Discourse Processes, vol. 25, issue 2-3, 1998, pp. 259-284.
7. Altszyler E., Sigman M., Slezak D.F. Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. arXiv preprint arXiv:1610.01520, 2016, 14 p.
For citations:
ZHAXYBAYEV D.O., MIZAMOVA G.N. Natural Language Processing Algorithms for Understanding the Semantics of Text. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(1):141-150. (In Russ.) https://doi.org/10.15514/ISPRAS-2022-34(1)-10