A morphosyntactic lexicon for Serbian based on the Wiktionary
wikimorph-sr is a morphosyntactic lexicon for Serbian that can be used for POS tagging, parsing and lemmatisation. It was extracted mainly from the Serbo-Croatian edition of the Wiktionary (sh.wiktionary.org).
The lexicon contains 1,226,638 distinct wordforms, corresponding to 117,445 distinct lemmas and to 3,066,214 unique (wordform, lemma, morphosyntactic description) triples. Each morphosyntactic description (MSD) contains a part-of-speech (POS) indication, a subcategory and a set of relevant morphosyntactic features: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; and degree of comparison for adjectives and adverbs. More detailed information is available in the PDF documentation of the lexicon.
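To make the structure of the entries concrete, here is a minimal Python sketch that treats the lexicon as a set of (wordform, lemma, MSD) triples, computes the three kinds of counts reported above, and indexes wordforms for lexicon-based lemmatisation. It assumes a hypothetical tab-separated file with one triple per line; the file layout, the function names (`read_entries`, `stats`, `build_index`, `lemmatise`) and the tag strings in `SAMPLE` are illustrative assumptions, not the actual distribution format or tagset of wikimorph-sr, which are described in the PDF documentation.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Entry(NamedTuple):
    wordform: str
    lemma: str
    msd: str  # morphosyntactic description; tag format is an assumption


def read_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse hypothetical TSV lines of the form wordform<TAB>lemma<TAB>msd."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank or malformed rows (anything without exactly three fields).
    return [Entry(*row) for row in reader if len(row) == 3]


def stats(entries: Iterable[Entry]) -> Tuple[int, int, int]:
    """Return (distinct wordforms, distinct lemmas, unique triples)."""
    triples = set(entries)  # duplicate lines collapse into one triple
    wordforms = {e.wordform for e in triples}
    lemmas = {e.lemma for e in triples}
    return len(wordforms), len(lemmas), len(triples)


def build_index(entries: Iterable[Entry]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each wordform to the set of its (lemma, msd) analyses."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for e in entries:
        index[e.wordform].add((e.lemma, e.msd))
    return index


def lemmatise(index: Dict[str, Set[Tuple[str, str]]], token: str) -> List[str]:
    """Return the candidate lemmas for a token (empty list if unknown)."""
    # .get() does not insert a key into the defaultdict for unknown tokens.
    return sorted({lemma for lemma, _ in index.get(token, set())})


# Toy sample: the tag strings are invented for illustration only.
SAMPLE = (
    "kuća\tkuća\tNcfsn\n"
    "kuće\tkuća\tNcfsg\n"
    "kuće\tkuća\tNcfpn\n"
    "kuće\tkuća\tNcfpa\n"
    "kuće\tkuća\tNcfsg\n"  # duplicate line, counted once as a triple
    "radi\traditi\tVmr3s\n"
)

if __name__ == "__main__":
    entries = read_entries(io.StringIO(SAMPLE))
    print(stats(entries))
    index = build_index(entries)
    print(lemmatise(index, "kuće"), sorted(index["kuće"]))
```

Because one wordform can carry several MSDs (in the toy sample, *kuće* is analysed as genitive singular and as nominative and accusative plural), the index maps each wordform to a set of analyses rather than to a single tag; choosing among them is left to a tagger or parser.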
Person in charge: Aleksandra Miletic
Some rights are reserved. wikimorph-sr is distributed under a Creative Commons BY-SA 3.0 licence.
Miletic, Aleksandra. (2017). Building a morphosyntactic lexicon for Serbian from Wiktionary. Actes de la 6e édition des Journées d'étude toulousaines (JéTou2017). Toulouse, France. Accepted paper.
Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS & Université de Toulouse - Jean Jaurès) for sharing his experience of working with the Wiktionary.