REDAC
REsources Developed At CLLE (CLLE: Cognition, Langues, Langage, Ergonomie)







SH-WIKTIONARY
a morphosyntactic lexicon for Serbian based on Wiktionary
Description

sh-wiktionary is a morphosyntactic lexicon for Serbian that can be used for POS tagging, parsing and lemmatisation. It was extracted mainly from the Serbo-Croatian edition of Wiktionary (sh.wiktionary.org).

The lexicon contains 1,222,486 distinct wordforms corresponding to 117,445 distinct lemmas, for a total of 3,061,616 unique (wordform, lemma, morphosyntactic description) triples. Each morphosyntactic description contains a POS tag, a subcategory and a set of relevant morphosyntactic features: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; degree of comparison for adjectives and adverbs. More detailed information is available in the PDF documentation of the lexicon.
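
To illustrate how such triples might be used for lemmatisation, here is a minimal Python sketch. It assumes a hypothetical layout of one entry per line with three tab-separated fields (wordform, lemma, morphosyntactic description); the actual file format and tagset are specified in the PDF documentation and may differ. The function names and the placeholder tags TAG_A, TAG_B and TAG_C are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (wordform, lemma, morphosyntactic description)
Entry = Tuple[str, str, str]


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse lines in the ASSUMED 'wordform<TAB>lemma<TAB>description' layout."""
    entries: List[Entry] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(parts)}")
        entries.append((parts[0], parts[1], parts[2]))
    return entries


def build_index(entries: Iterable[Entry]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each wordform to the set of its (lemma, description) analyses."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for wordform, lemma, description in entries:
        index[wordform].add((lemma, description))
    return index


def lemmas_for(index: Dict[str, Set[Tuple[str, str]]], wordform: str) -> List[str]:
    """Return the sorted candidate lemmas for a wordform (empty if unknown)."""
    return sorted({lemma for lemma, _ in index.get(wordform, ())})


# Hypothetical sample: the tags are placeholders, not the lexicon's real tagset.
SAMPLE = [
    "kuće\tkuća\tTAG_A",
    "kuće\tkuća\tTAG_B",
    "",
    "kuća\tkuća\tTAG_C",
]

entries = parse_entries(SAMPLE)
index = build_index(entries)
candidates = lemmas_for(index, "kuće")
```

The index maps each wordform to a set of analyses rather than a single one because the figures above imply ambiguity: 3,061,616 triples over 1,222,486 wordforms is roughly 2.5 analyses per wordform on average.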

This resource was developed as part of the ParCoLab project by Aleksandra Miletic (UMR 5263 CLLE-ERSS, CNRS & Université Toulouse - Jean Jaurès, France).

Person in charge
Aleksandra Miletic
Contact:

Licence

Some rights are reserved. sh-wiktionary is distributed under a Creative Commons BY-SA 3.0 licence.

Download
References

Miletic, Aleksandra. (2017). Building a morphosyntactic lexicon for Serbian from Wiktionary. Actes de la 6e édition des Journées d'étude toulousaines (JéTou2017). Toulouse, France. Accepted paper.

Acknowledgements

Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS & Université de Toulouse - Jean Jaurès) for sharing his experience of working with Wiktionary.