Predicting grammatical gender in Nakh languages: Three methods compared




grammatical gender, Nakh languages, computational classifiers, gender assignment


The Nakh languages Chechen and Tsova-Tush each have a five-valued gender system: masculine, feminine, and three “neuter” genders named after their singular agreement forms (B, D and J). Gender assignment is generally analysed as depending on both form and semantics (e.g. Corbett, 1991), with semantics typically prevailing over form (e.g. Bellamy & Wichers Schreur, 2021; Allassonnière-Tang et al., 2021). Most previous studies have considered only binary or tripartite gender systems with masculine, feminine, and neuter values. The five-valued system of Nakh thus offers a novel and informative case study for analysing gender assignment. In this paper we build on the existing qualitative linguistic analyses of gender assignment in Tsova-Tush (Wichers Schreur, 2021) and apply three machine-learning methods to investigate the relative weight of form and semantics in predicting grammatical gender in Chechen and Tsova-Tush. The results show that while both form and semantics help to predict grammatical gender in Nakh, semantics is dominant, which supports findings in the existing literature (Allassonnière-Tang, Brown & Fedden, 2021). However, the results also show that the coded semantic information could be made more fine-grained to improve the accuracy of the predictions (see also Plaster et al., 2013). In addition, we discuss the implications of the results for our understanding of language-internal and family-internal processes of language change, including how loanwords are integrated from Russian, a three-gender language.


Allassonnière-Tang, Marc & Dunstan Brown & Sebastian Fedden. 2021. Testing semantic dominance in Mian gender: Three machine learning models. Oceanic Linguistics 60(2). 302–334.

Balam, Osmer. 2016. Semantic categories and gender assignment in Contact Spanish: Type of code-switching and its relevance to linguistic outcomes. Journal of Language Contact 9(3). 405–435.

Basirat, Ali & Marc Allassonnière-Tang & Aleksandrs Berdicevskis. 2021. An empirical study on the contribution of formal and semantic features to the grammatical gender of nouns. Linguistics Vanguard 7(1). 20200048.

Bellamy, Kate & M. Carmen Parafita Couto. 2022. Gender assignment in mixed noun phrases: State of the art. In Dalila Ayoun (ed.), The acquisition of gender: Crosslinguistic perspectives, 14–48. Amsterdam / Philadelphia: John Benjamins.

Bellamy, Kate & Jesse Wichers Schreur. 2022. When semantics and phonology collide: Gender assignment in mixed Tsova-Tush-Georgian nominal constructions. The International Journal of Bilingualism 26(3). 257–285.

Breiman, Leo & Jerome H. Friedman & Richard A. Olshen & Charles J. Stone. 1984. Classification and regression trees. Boca Raton: Routledge.

Brown, Dunstan. 1998. Defining ‘sub-gender’: Virile and devirilized nouns in Polish. Lingua 104(3–4). 187–233.

Carling, Gerd & Kate Bellamy & Jesse Wichers Schreur. 2021. Gender stability in Nakh-Daghestanian. Paper presented at Languages, Dialects and Isoglosses of Anatolia, the Caucasus and Iran, March 2021, Paris.

Contini-Morava, Ellen & Marcin Kilarski. 2013. Functions of nominal classification. Language Sciences 40. 263–299.

Corbett, Greville. 1982. Gender in Russian: An account of gender specification and its relationship to declension. Russian Linguistics 6(2). 197–232.

Corbett, Greville. 1991. Gender. Cambridge: Cambridge University Press.

Corbett, Greville. 2013. Systems of gender assignment. In Matthew S. Dryer & Martin Haspelmath (eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Accessed on 2022-02-02)

Corbett, Greville G. & Norman M. Fraser. 1993. Network Morphology: A DATR account of Russian nominal inflection. Journal of Linguistics 29(1). 113–142.

Corbett, Greville & Norman M. Fraser. 2000. Default genders. In Barbara Unterbeck & Matti Rissanen (eds.), Gender in grammar and cognition I: Approaches to gender, 55–98. Berlin: Mouton de Gruyter.

Cruz, Abel. 2021. A syntactic approach to gender assignment in Spanish–English bilingual speech. Glossa: a journal of general linguistics 6(1). 1–40.

Desheriev, Y. D. [Дешериев]. 1953. Bacbijskij jazyk: fonetika, morfologija, sintaksis, leksika [The Tsova-Tush language: phonetics, morphology, syntax, lexicon]. Moscow: Izdatel’stvo AN SSSR.

Demsar, Janez & Blaz Zupan & Gregor Leban & Tomaž Curk. 2004. Orange: From experimental machine learning to interactive data mining, white paper. European Conference of Machine Learning: 2004; Pisa, Italy 3202. 537–539.

Dryer, Matthew S. 2013. Prefixing vs. suffixing in inflectional morphology. In Matthew S. Dryer & Martin Haspelmath (eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Accessed on 2022-02-02)

Evans, Nicholas & Dunstan Brown & Greville Corbett. 2002. The semantics of gender in Mayali: Partially parallel systems and formal implementation. Language 78(1). 109–153.

Evans, Roger & Gerald Gazdar. 1989a. Inference in DATR. Proceedings of the fourth conference of the European Chapter of the Association for Computational Linguistics, Manchester, England. 66–71.

Evans, Roger & Gerald Gazdar. 1989b. The semantics of DATR. In A. G. Cohn (ed.), Proceedings of the seventh conference of the Society for the Study of Artificial Intelligence and Simulation of Behaviour, 79–87. London: Pitman/Morgan Kaufmann.

Evans, Roger & Gerald Gazdar. 1996. DATR: A language for lexical knowledge representation. Computational Linguistics 22(2). 167–216.

Fedden, Sebastian. 2011. A Grammar of Mian. Berlin / Boston: Mouton de Gruyter.

Fraser, Norman M. & Greville G. Corbett. 1995. Gender, animacy, and declensional class assignment: A unified account for Russian. In Geert Booij & Jaap van Maarle (eds.), Yearbook of Morphology 1994, 123–150. Amsterdam: Kluwer Academic Publishers.

Fraser, Norman M. & Greville G. Corbett. 1997. Defaults in Arapesh. Lingua 103(1). 25–57.

Gagliardi, Annie & Jeffrey Lidz. 2014. Statistical insensitivity in the acquisition of Tsez noun classes. Language 90(1). 58–89.

Haykin, S. 1998. Neural networks: A comprehensive foundation. Englewood Cliffs: Prentice-Hall.

Her, One-Soon & Marc Tang. 2020. A statistical explanation of the distribution of sortal classifiers in languages of the world via computational classifiers. Journal of Quantitative Linguistics 27(2). 93–113.

Hockett, Charles F. 1958. A course in modern linguistics. New York: MacMillan.

Jamalkhanov, Z. D. [Джамалханов] & I. Y. Aliroev [Алироев]. 1991. Slovar’ pravopisanija literaturnogo čečenskogo jazyka [Orthographical dictionary of literary Chechen]. Grozny: Kniga.

Kadagidze, E. [ქადაგიძე]. 2009. C’ova-tušuri t’ekst’ebi [Tsova-Tush texts]. Tbilisi: TSU gamomcemloba.

Kadagidze, D. & N. Kadagidze [ქადაგიძე]. 1984. C’ova-tušur-kartul-rusul leksik’oni [Tsova-Tush-Georgian-Russian dictionary]. Tbilisi: Mecniereba.

Karmiloff-Smith, Annette. 1979. A functional approach to child language. New York / London: Cambridge University Press.

Khalilov, M. S. [Халилов]. 1999. Cezsko-russkij slovar’ [Tsez-Russian dictionary]. Moscow: Academia.

Lemus-Serrano, Magdalena & Marc Allassonnière-Tang & Dan Dediu. 2021. What conditions tone paradigms in Yukuna: Phonological and machine learning approaches. Glossa: a journal of general linguistics 6(1). 60.

List, Johann-Mattis & Michael Cysouw & Robert Forkel. 2016. Concepticon: A resource for the linking of concept lists. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). 2393–2400.

Matsiev, A. G. [Мациев]. 1961. Slovar’ čečenskogo jazyka [Chechen dictionary]. Moscow: Gosudarstvennoe izdatel’stvo inostrannyx i nacional’nyx slovarej.

Nichols, Johanna. 1989. The Nakh evidence for the history of gender in Nakh-Daghestanian. In Howard I. Aronson (ed.), The non-Slavic languages of the USSR: linguistic studies, 158–175. Chicago: Chicago Linguistic Society, University of Chicago.

Nichols, Johanna. 1994. Chechen. In Riks Smeets (ed.), The North East Caucasian languages, part 2, 1–78. Delmar: Caravan.

Nichols, Johanna. 2003. The Nakh-Daghestanian consonant correspondences. In Dee Ann Holisky & Kevin Tuite (eds.), Current trends in Caucasian, East European and Inner Asian linguistics: Papers in honor of Howard I. Aronson, 207–264. Amsterdam: John Benjamins.

Nichols, Johanna. 2007. Chechen morphology with notes on Ingush. In Alan S. Kaye (ed.), Morphologies of Africa and Asia, 1188–1207. State College: Penn State University Press.

Nichols, Johanna. 2011. Ingush grammar. Berkeley, Los Angeles: University of California Press.

Parks, Randolph & Daniel S. Levine & Debra L. Long (eds.). 1998. Fundamentals of neural network modeling: Neuropsychology and cognitive neuroscience. Boston: MIT Press.

Plaster, Keith & Maria Polinsky & Boris Harizanov. 2013. Noun classes grow on trees: Noun classification in the North-East Caucasus. In Balthazar Bickel, Lore A. Grenoble, David A. Peterson & Alan Timberlake (eds.), Language typology and historical contingency: In honor of Johanna Nichols, 153–170. Amsterdam: John Benjamins.

Polinsky, Maria & Ezra Van Everbroeck. 2003. Development of gender classifications: Modeling the historical change from Latin to French. Language 79(2). 356–390.

Quinlan, J. Ross. 1993. C4.5: Programs for Machine Learning. Burlington: Morgan Kaufmann Publishers.

Quinlan, J. Ross. 1996. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4. 77–90.

Rajabov, Ramazan. Undated. Tsez Dictionary. Unpublished MS (Los Angeles: University of Southern California).

Senft, Gunter (ed.). 2000. Systems of nominal classification. Cambridge: Cambridge University Press.

Sokolik, M. E. & Michael E. Smith. 1992. Assignment of gender to French nouns in primary and secondary language: A connectionist model. Second Language Research 8(1). 39–58.

Tagliamonte, Sali A. & R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135–178.

Ting, Kai Ming. 2010. Precision and recall. In Claude Sammut & Geoffrey I. Webb (eds.), Encyclopedia of Machine Learning, 781. Boston: Springer.

Ulrich, Natalja & Marc Allassonnière-Tang & François Pellegrino & Dan Dediu. 2021. Identifying the Russian voiceless non-palatalized fricatives /f/, /s/ and /ʃ/ from acoustic cues using Machine Learning. Journal of the Acoustical Society of America 150(3). 1806–1820.

Wichers Schreur, Jesse. 2021. Nominal borrowings in Tsova-Tush (Nakh-Daghestanian, Georgia) and their gender assignment. In Diana Forker & Lore A. Grenoble (eds.), Language contact in the territory of the former Soviet Union, 15–33. Amsterdam: John Benjamins.

Wurm, S. A. & I. Heyward & Unesco. 2001. Atlas of the world's languages in danger of disappearing. Paris: Unesco Pub. Website consulted on 7-12-2021.




How to Cite

Wichers Schreur J, Allassonnière-Tang M, Bellamy K, Rochant N. Predicting grammatical gender in Nakh languages: Three methods compared. LTC [Internet]. 2022 Jan. 1 [cited 2023 Dec. 4];2(2):93-126. Available from:



Research articles