Weakly Supervised Word Sense Disambiguation Using Automatically Labelled Collections
https://doi.org/10.15514/ISPRAS-2021-33(6)-13
Abstract
State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such data. To cope with the knowledge acquisition bottleneck in Russian, we first apply a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on this synthetic data. Our work builds upon the bootstrapping approach: starting from this seed of tagged instances, an ensemble of classifiers labels samples drawn from unannotated corpora. In addition, several techniques are exploited to augment the new training examples. We show that even a simple bootstrapping approach based on an ensemble of weakly supervised models already produces an improvement over the initial word sense disambiguation models.
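The bootstrapping loop outlined in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the features, the two off-the-shelf classifiers, the confidence threshold, and the synthetic data are all assumptions standing in for the paper's automatically labelled collection and weakly supervised models.

```python
# Sketch of ensemble bootstrapping: train on a seed of labelled
# contexts, let the ensemble label unannotated samples, and keep only
# confident, agreed-upon predictions for the next round of training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical seed collection: feature vectors for contexts of an
# ambiguous word, each tagged with one of two senses (0 or 1).
X_seed = rng.normal(size=(40, 5)) + np.repeat([[0.0], [2.0]], 20, axis=0)
y_seed = np.repeat([0, 1], 20)

# Unannotated corpus samples to be labelled by the ensemble.
X_pool = rng.normal(size=(100, 5)) + rng.choice([0.0, 2.0], size=(100, 1))

X_train, y_train = X_seed.copy(), y_seed.copy()
for _ in range(3):  # a few bootstrapping rounds
    if len(X_pool) == 0:
        break
    ensemble = [
        LogisticRegression(max_iter=1000).fit(X_train, y_train),
        DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train),
    ]
    preds = np.array([clf.predict(X_pool) for clf in ensemble])
    probs = np.mean([clf.predict_proba(X_pool)[:, 1] for clf in ensemble], axis=0)
    # Accept a sample only when both classifiers agree and the averaged
    # sense probability is far from the decision boundary.
    keep = (preds[0] == preds[1]) & (np.abs(probs - 0.5) > 0.4)
    X_train = np.vstack([X_train, X_pool[keep]])
    y_train = np.concatenate([y_train, preds[0][keep]])
    X_pool = X_pool[~keep]  # remove newly labelled samples from the pool

print(f"seed: {len(y_seed)}, after bootstrapping: {len(y_train)}")
```

Requiring agreement plus high averaged confidence is one simple way to filter the ensemble's pseudo-labels; the paper additionally augments the new training examples, which is omitted here.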
About the Authors
Angelina Sergeevna BOLSHINA
Russian Federation
PhD student
Natalia Valentinovna LOUKACHEVITCH
Russian Federation
Doctor of Technical Sciences, Leading Researcher
For citations:
BOLSHINA A.S., LOUKACHEVITCH N.V. Weakly Supervised Word Sense Disambiguation Using Automatically Labelled Collections. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(6):193-204. https://doi.org/10.15514/ISPRAS-2021-33(6)-13