A morphosyntactic lexicon for Serbian based on Wiktionary

wikimorph-sr is a morphosyntactic lexicon for Serbian that can be used for POS tagging, parsing and lemmatisation. It was mainly extracted from the Serbo-Croatian edition of Wiktionary.
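As an illustration of the lemmatisation use case, the sketch below indexes lexicon entries by wordform and returns the candidate lemmas for a token. It assumes a hypothetical plain-text layout of one entry per line, `wordform<TAB>lemma<TAB>msd`; the actual distribution format and tagset are described in the PDF documentation, and the MSD strings used here are placeholders, not the lexicon's real tags. The function names `build_index` and `lemmatise` are also hypothetical.

```python
from collections import defaultdict


def build_index(lines):
    """Map each wordform to the set of (lemma, msd) analyses it admits.

    Assumes a hypothetical layout of one entry per line:
    wordform<TAB>lemma<TAB>msd
    """
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        wordform, lemma, msd = line.split("\t")
        index[wordform].add((lemma, msd))
    return index


def lemmatise(index, token):
    """Return the sorted candidate lemmas for a token (empty list if unknown)."""
    # .get() avoids inserting unknown tokens into the defaultdict
    return sorted({lemma for lemma, _ in index.get(token, ())})


# Illustrative entries; the MSD strings are placeholders, not real lexicon tags.
sample = [
    "kuća\tkuća\tNcfsn",
    "kuće\tkuća\tNcfsg",
    "kuće\tkuća\tNcfpn",
]
index = build_index(sample)
print(lemmatise(index, "kuće"))  # ['kuća']
print(sorted(index["kuće"]))     # [('kuća', 'Ncfpn'), ('kuća', 'Ncfsg')]
```

Keeping every (lemma, MSD) pair per wordform preserves ambiguity (here, "kuće" as genitive singular or nominative plural), which a downstream tagger or parser can then resolve in context.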

The lexicon contains 1,226,638 distinct wordforms, corresponding to 117,445 distinct lemmas and to 3,066,214 unique (wordform, lemma, morphosyntactic description) triples. Each morphosyntactic description contains a POS indication, a subcategory and a set of relevant morphosyntactic features: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; and degree of comparison for adjectives and adverbs. More detailed information is available in the PDF documentation of the lexicon.
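The three figures above count different things, and a small sketch makes the distinction concrete: a wordform can map to several lemmas, and a (wordform, lemma) pair can carry several morphosyntactic descriptions. The function `lexicon_stats` below is hypothetical and operates on in-memory 3-tuples; it counts lemmas as plain strings, so the official lemma count may differ if homographic lemmas are distinguished by POS. The MSD strings are placeholders.

```python
def lexicon_stats(entries):
    """Count distinct wordforms, lemmas and (wordform, lemma, msd) triples.

    `entries` is any iterable of 3-tuples; duplicates are collapsed.
    Lemmas are counted as plain strings, so homographic lemmas with
    different POS are merged here (an assumption of this sketch).
    """
    triples = set(entries)
    wordforms = {wordform for wordform, _, _ in triples}
    lemmas = {lemma for _, lemma, _ in triples}
    return len(wordforms), len(lemmas), len(triples)


# Toy data with placeholder MSD strings; note the duplicated last entry.
entries = [
    ("kuća", "kuća", "Ncfsn"),
    ("kuće", "kuća", "Ncfsg"),
    ("kuće", "kuća", "Ncfpn"),
    ("kuće", "kuća", "Ncfpn"),
]
print(lexicon_stats(entries))  # (2, 1, 3)
```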

This resource was developed as part of the ParCoLab project by Aleksandra Miletic (UMR 5263 CLLE-ERSS, CNRS & Université Toulouse - Jean Jaurès, France).

Aleksandra Miletic


Some rights are reserved. wikimorph-sr is distributed under a Creative Commons BY-SA 3.0 licence.


Miletic, Aleksandra (2017). Building a morphosyntactic lexicon for Serbian from Wiktionary. Actes de la 6e édition des Journées d'étude toulousaines (JéTou2017). Toulouse, France.


Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS & Université de Toulouse - Jean Jaurès) for sharing his experience in working with Wiktionary.