Data Mining Methods to Compare Englishes
https://doi.org/10.15514/ISPRAS-2022-34(5)-10
Abstract
The paper presents the results of corpus-based research on noun cryptotypes in 20 varieties of English (Englishes). The data, collected from Mark Davies’ corpora GloWbE and NOW, enabled us to focus on variation in the covert classification of nouns in modern Englishes. A noun cryptotype, a notion introduced by Whorf, is approached as ‘a covert type of classification of nouns, marked by lexical selection in a syntactical classifier rather than a morphological tag’. The purpose of the study was to compare and contrast the covert classification of 23 basic emotions in 20 Englishes (64,702 tokens). The 20 Englishes were clustered using Data Mining methods (such as k-means clustering and a self-organizing Kohonen map). Six clusters emerged, each corresponding to a geographic area: American cluster (American and Canadian Englishes); Australian cluster (Australian and New Zealand Englishes); European cluster (British and Irish Englishes); Asian cluster (Indian, Pakistani, Singapore, Hong Kong, Malaysian, Bangladeshi, Sri Lankan, and Philippine Englishes); African cluster (Kenyan, South African, Nigerian, Ghanaian, and Tanzanian Englishes); Caribbean cluster (Jamaican English). The correlation coefficients among Englishes in the Asian and African clusters (the Outer Circle in Braj B. Kachru’s World Englishes paradigm) range from 0.74 to 0.8, which we attribute to limited contact among the varieties within these clusters. The correlation coefficients among Englishes in the American, Australian and European clusters (Kachru’s Inner Circle) range from 0.92 to 0.933, which indicates high consistency among these varieties owing to long-standing linguistic contact.
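The clustering and correlation steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes each variety is represented by a vector of relative frequencies over cryptotype classes, assumes Pearson's coefficient (the abstract does not say which coefficient was used), and runs on invented toy profiles for four hypothetical varieties. The Kohonen map is not shown.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented toy data (NOT from the paper): share of emotion-noun tokens
# falling into three hypothetical cryptotype classes, per variety.
profiles = {
    "variety_A": [0.60, 0.30, 0.10],
    "variety_B": [0.58, 0.31, 0.11],
    "variety_C": [0.20, 0.35, 0.45],
    "variety_D": [0.22, 0.33, 0.45],
}

names = list(profiles)
labels = kmeans([profiles[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)

print("r(A, B) =", round(pearson(profiles["variety_A"], profiles["variety_B"]), 3))
```

On real data the rows would be the 20 Englishes and the columns the observed cryptotype distributions of the 23 emotion nouns; the number of clusters and the feature encoding are choices the abstract does not specify.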
About the Author
Olga Valer’evna DONINA
Russian Federation
Candidate of Philological Sciences, Associate Professor of the Department of Theoretical and Applied Linguistics
References
1. Abrosimova L.S. Word formation in the linguistic categorization of the world. Rostov-on-Don, SFU Publishing House, 2015, 328 p. (in Russian) / Абросимова Л.С. Словообразование в языковой категоризации мира. Ростов на Дону, Изд-во ЮФУ, 2015 г., 328 стр.
2. Boriskina O.O. Linguistic categorization of the elements. In Proc. of the 2nd International Conference on Philology and Culture, 1999, pp. 149-156 (in Russian) / Борискина О.О. Языковая категоризация стихий. Материалы 2-й международной конференции «Филология и культура», 1999 г., стр. 149-156.
3. Boriskina O.O. Cryptoclasses of primary elements as an element of the ontognostic description of language. In Problems of linguistic prognostics, issue 1, Voronezh, Central Black Earth Book Publishing House, 2000, pp. 121-126 (in Russian) / Борискина О.О. Криптоклассы первостихий как элемент онтогностического описания языка. В сборнике статей «Проблемы Лингвистической Прогностики», вып. 1, Воронеж, Центрально-Черноземное книжное издательство, 2000 г., стр. 121-126.
4. Boriskina O.O. National-specific linguistic consciousness and borrowed word. In Intercultural communication and problems of national identity, Voronezh, Voronezh State University Press, 2002, pp. 406-410 (in Russian) / Борискина О.О. Национально-специфическое языковое сознание и заимствованное слово. В сборнике статей «Межкультурная коммуникация и проблемы национальной идентичности», Воронеж, издательство Воронежского государственного университета, 2002 г., стр. 406-410.
5. Boriskina O.O. The dynamic noun combinatory profile. Issues of cognitive linguistics, 2008, issue 3, pp. 64-69 (in Russian) / Борискина О.О. Моделирование синтагматической динамики слова. Вопросы когнитивной лингвистики, вып. 3, 2008 г., стр. 64-69.
6. Boriskina O.O. Cryptotype projection of abstract entities: applications of cryptotype approach to noun combinations study. Proceedings of Voronezh State University. Series: Linguistics and intercultural communication, issue 1, 2009, pp. 32-37 (in Russian) / Борискина О.О. Криптоклассные проекции мира непредметных сущностей: опыт криптоклассного анализа словосочетаемости. Вестник ВГУ. Серия: Лингвистика и межкультурная коммуникация, вып. 1, 2009 г., стр. 32-37.
7. Boriskina O.O. Explanation of the unexplained or about the motivation of the unmotivated. Vestnik of Saint Petersburg University. Series 9. Philology. Oriental studies. Journalism, issue 1, 2010, pp. 95-100 (in Russian) / Борискина О.О. Объяснение необъяснимого или мотивация немотивированного. Вестник Санкт-Петербургского университета. Серия 9. Филология. Востоковедение. Журналистика, вып. 1, 2010 г., стр. 95-100.
8. Boriskina O.O., Marchenko T. An Algorithm for Analysis of Distribution of Abstract Nouns in Cryptotypes. In Proc. of the 2010 International Conference on Artificial Intelligence, 2010, pp. 907-913.
9. NOW Corpus (News on the Web). Available at: https://www.english-corpora.org/now/.
10. Donina O.V. The study of cryptotypes: visualization of research results. Proceedings of Voronezh State University. Series: Linguistics and intercultural communication, issue 3, 2015, pp. 105-112 (in Russian) / Донина О.В. Способы визуализации результатов криптоклассного исследования. Вестник ВГУ. Серия: Лингвистика и межкультурная коммуникация, вып. 3, 2015 г., стр. 105-112.
11. Donina O.V., Boriskina O.O. Emotive lexemes from the perspective of areal variativity. Proceedings of Voronezh State University. Series: Linguistics and intercultural communication, issue 4, 2016, pp. 41-45 (in Russian) / Донина О.В., Борискина О.О. Эмотивная лексика в аспекте ареальной вариативности. Вестник ВГУ. Серия: Лингвистика и межкультурная коммуникация, вып. 4, 2016 г., стр. 41-45.
12. Kretov A.A., Boriskina O.O., Vasilyeva N.E. «Flight of thought» and methods of cryptoclass investigation. Proceedings of Voronezh State University. Series: Linguistics and intercultural communication, issue 1, 2004, pp. 61-65 (in Russian) / Кретов А.А., Борискина О.О., Васильева Н.Е. «Полёт мысли» и методика исследования криптоклассов. Вестник ВГУ. Серия: Лингвистика и межкультурная коммуникация, вып. 1, 2004 г., стр. 61-65.
13. Polyakov V.N., Yaroslavtseva E.I. The Quantitative Parameters of Typological Shift. Scientific notes of Kazan State University. Series: Humanitarian sciences, vol. 150, issue 2, 2008, pp. 97-118 (in Russian) / В.Н. Поляков, Е.И. Ярославцева. Квантитативные закономерности типологического сдвига в языках Евразии (на материале БД «Языки мира» ИЯ РАН). Ученые записки Казанского университета. Серия Гуманитарные науки, том 150, вып. 2, 2008 г., стр. 97-118.
14. Kachru B. Models for Non-native Englishes. In The Other Tongue: English across Cultures. Urbana, University of Illinois Press, 1992, 416 p.
15. Whorf B.L. Language, Thought and Reality. Selected Writings of Benjamin Lee Whorf. The MIT Press, 1964, 290 p.
16. Boriskina O.O. The Main Criteria for the Exploration of Noun Cryptotypes. In Proc. of the VI International Scientific Conference on Language, Culture, Society, 2011. Available at: http://www.mosinyaz.com/conferences/mnk6_s3_12/.
Review
For citations:
DONINA O.V. Data Mining Methods to Compare Englishes. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2022;34(5):163-170. https://doi.org/10.15514/ISPRAS-2022-34(5)-10