A morphosyntactic lexicon for Serbian based on Wiktionary
Description
sh-wiktionary is a morphosyntactic lexicon for Serbian that can be used for POS tagging, parsing and lemmatisation. It was extracted mainly from the Serbo-Croatian edition of Wiktionary (sh.wiktionary.org). The lexicon contains 1,222,486 distinct wordforms corresponding to 117,445 distinct lemmas and to 3,061,616 unique (wordform, lemma, morphosyntactic description) triples. Each morphosyntactic description contains a POS label, a subcategory and a set of relevant morphosyntactic features: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; degree of comparison for adjectives and adverbs. More detailed information is available in the PDF documentation of the lexicon. This resource was developed as part of the ParCoLab project by Aleksandra Miletic (UMR 5263 CLLE-ERSS, CNRS & Université Toulouse - Jean Jaurès, France).
Person in charge
Aleksandra Miletic
Contact:
Licence
Some rights are reserved. sh-wiktionary is distributed under a Creative Commons BY-SA 3.0 licence.
Download
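Once the lexicon has been downloaded, the combinations of wordform, lemma and morphosyntactic description mentioned in the Description can serve as a simple lookup table for lemmatisation. The Python sketch below illustrates the idea only. It assumes a tab-separated file with one wordform, lemma and morphosyntactic description per line; the actual file format is specified in the PDF documentation and may differ. The function names `load_lexicon` and `lemmas_for`, the sample entries and their tags are hypothetical and are not taken from the lexicon itself.

```python
import io
from collections import defaultdict


def load_lexicon(lines):
    """Index (lemma, msd) pairs by wordform.

    ASSUMPTION: each line holds wordform<TAB>lemma<TAB>msd.
    The real sh-wiktionary layout may differ; check the PDF documentation.
    """
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed layout
        wordform, lemma, msd = fields
        index[wordform].add((lemma, msd))
    return index


def lemmas_for(index, wordform):
    """Return the sorted candidate lemmas for a wordform (empty if unknown)."""
    return sorted({lemma for lemma, _ in index.get(wordform, ())})


# Hypothetical sample entries, invented for illustration only.
SAMPLE = (
    "kuće\tkuća\tNcfsg\n"
    "kuće\tkuća\tNcfpn\n"
    "kući\tkuća\tNcfsd\n"
)

index = load_lexicon(io.StringIO(SAMPLE))
print(lemmas_for(index, "kuće"))
print(sorted(index["kuće"]))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, for example `open(path, encoding="utf-8")`. A wordform can map to several descriptions, which is why each entry stores a set of (lemma, description) pairs rather than a single value.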
References
Miletic, Aleksandra (2017). Building a morphosyntactic lexicon for Serbian from Wiktionary. Actes de la 6e édition des Journées d'étude toulousaines (JéTou2017). Toulouse, France. Accepted paper.
Acknowledgements
Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS & Université de Toulouse - Jean Jaurès) for sharing his experience of working with Wiktionary.