REDAC
REsources Developed At CLLE (CLLE: Cognition, Langues, Langage, Ergonomie)







WIKIMORPH-SR
A morphosyntactic lexicon for Serbian based on Wiktionary
Description

wikimorph-sr is a morphosyntactic lexicon for Serbian that can be used for POS tagging, parsing and lemmatisation. Most of its content was extracted from the Serbo-Croatian edition of Wiktionary (sh.wiktionary.org).

The lexicon contains 1,226,638 distinct wordforms, corresponding to 117,445 distinct lemmas and to 3,066,214 unique (wordform, lemma, morphosyntactic description) triples. Each morphosyntactic description consists of a POS tag, a subcategory and a set of relevant morphosyntactic features: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; degree of comparison for adjectives and adverbs. More detailed information is available in the PDF documentation of the lexicon.
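To illustrate how such triples can serve lemmatisation and tagging, here is a minimal Python sketch. It rests on assumptions that are not taken from the lexicon's documentation: the file layout (one tab-separated triple per line), the tag notation in the sample (`N|gen|sg|f` and similar), and the names `Analysis`, `load_lexicon` and `SAMPLE` are all hypothetical. Check the PDF documentation for the actual format before adapting it.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Analysis(NamedTuple):
    """One reading of a wordform: its lemma and its morphosyntactic description."""
    lemma: str
    msd: str


def load_lexicon(lines: Iterable[str]) -> Dict[str, Set[Analysis]]:
    """Index (wordform, lemma, msd) triples by wordform.

    ASSUMPTION: each line is 'wordform<TAB>lemma<TAB>msd'. The real
    wikimorph-sr file layout may differ.
    """
    lexicon: Dict[str, Set[Analysis]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        wordform, lemma, msd = line.split("\t")
        lexicon.setdefault(wordform, set()).add(Analysis(lemma, msd))
    return lexicon


# Hypothetical sample entries; the tag strings are illustrative placeholders,
# not the lexicon's actual tagset.
SAMPLE = [
    "kuće\tkuća\tN|gen|sg|f",
    "kuće\tkuća\tN|nom|pl|f",
    "kuća\tkuća\tN|nom|sg|f",
]

lexicon = load_lexicon(SAMPLE)

# All candidate readings of an ambiguous wordform.
readings = lexicon.get("kuće", set())

# Lemmatisation: the set of lemmas compatible with the wordform.
lemmas = {analysis.lemma for analysis in readings}
```

A wordform such as "kuće" maps to several morphosyntactic descriptions, which is why the number of triples in the lexicon is larger than the number of wordforms. A tagger or parser would typically use the returned set as its list of candidate analyses and disambiguate in context.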

This resource was developed as part of the ParCoLab project by Aleksandra Miletic (UMR 5263 CLLE-ERSS, CNRS & Université Toulouse - Jean Jaurès, France).

Person in charge
Aleksandra Miletic
Contact:

Licence

Some rights are reserved. wikimorph-sr is distributed under a Creative Commons BY-SA 3.0 licence.

Download
References

Miletic, Aleksandra. (2017). Building a morphosyntactic lexicon for Serbian from Wiktionary. Actes de la 6e édition des Journées d'étude toulousaines (JéTou2017). Toulouse, France.

Acknowledgements

Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS & Université de Toulouse - Jean Jaurès) for sharing his experience of working with Wiktionary.