POS tagging and lemmatisation of Serbian
Description
ParCoTrain is a training and test corpus for POS tagging and lemmatisation of Serbian. The lemmatised section of the corpus contains 95585 tokens, whereas the POS-tagged section contains 153625 tokens (95585 of which were annotated manually, with the remaining 57977 annotated automatically and validated manually). The source texts are contemporary Serbian novels from the second half of the 20th century. Each POS tag gives the main POS and its subcategory, and also indicates the degree of comparison for adjectives and adverbs. A detailed description of the tagset used in the corpus can be found in the PDF documentation downloadable from this page. This resource was developed as part of the ParCoLab project by Aleksandra Miletic (CLLE-ERSS, Université Toulouse - Jean Jaurès), Antonio Balvet (STL, Université Lille 3) and Dejan Stosic (CLLE-ERSS, Université Toulouse - Jean Jaurès).
Person in charge
Aleksandra Miletic
Contact:
Licence
Some rights reserved. ParCoTrain is distributed under a Creative Commons BY-NC-SA 3.0 licence.
Download
References