REsources Developed At CLLE-ERSS CLLE-ERSS research unit

ParCoLab is a 3-million-word parallel corpus containing original and translated texts in three European languages: Serbian, French, and English. Each of the languages functions both as a source and as a target language.

The texts in the corpus, which are mainly literary, are aligned at both the paragraph and the sentence level. The alignments have been manually validated, which guarantees their quality. ParCoLab also follows current standards for corpus creation and distribution: it is stored in a TEI-compliant XML format.
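To make the idea of sentence alignment in TEI-style XML concrete, here is a minimal sketch in Python. The XML fragment is hypothetical and is not taken from ParCoLab, whose actual schema is not described here. It assumes sentences are marked with the TEI `<s>` element and that alignments are recorded as `<link>` elements inside a `<linkGrp>`. The function name `aligned_pairs` and the sample sentences are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment (assumption: NOT ParCoLab's actual schema).
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div xml:lang="fr">
        <p><s xml:id="fr.1">Bonjour.</s> <s xml:id="fr.2">Comment allez-vous ?</s></p>
      </div>
      <div xml:lang="en">
        <p><s xml:id="en.1">Hello.</s> <s xml:id="en.2">How are you?</s></p>
      </div>
      <linkGrp type="alignment">
        <link target="#fr.1 #en.1"/>
        <link target="#fr.2 #en.2"/>
      </linkGrp>
    </body>
  </text>
</TEI>"""

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def aligned_pairs(tei_xml):
    """Return one tuple of sentence texts per <link> element."""
    root = ET.fromstring(tei_xml)
    # Index every sentence by its xml:id.
    sentences = {
        s.get(XML_ID): "".join(s.itertext()).strip()
        for s in root.iter(TEI_NS + "s")
    }
    pairs = []
    for link in root.iter(TEI_NS + "link"):
        # Each target is a space-separated list of "#id" pointers.
        ids = [t.lstrip("#") for t in link.get("target", "").split()]
        pairs.append(tuple(sentences[i] for i in ids))
    return pairs


pairs = aligned_pairs(TEI_SAMPLE)
for source, target in pairs:
    print(source, "<->", target)
```

Keeping the alignment links separate from the sentence text is one common way to let the same sentences take part in several language pairings. Whether ParCoLab encodes its alignments this way is an assumption.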

The ParCoLab parallel corpus can be queried online for free. A search engine allows users to formulate queries and extract sentences containing the target expression, as well as the corresponding sentences in one or both other languages.
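The kind of query described above can be sketched in a few lines of Python. This is only an illustration of the principle and not the ParCoLab search engine: the `ALIGNED` data, the `search` function, and its parameters are all hypothetical, and the sentences are invented rather than drawn from the corpus.

```python
# Hypothetical in-memory data: each entry maps a language code to one
# aligned sentence. Invented examples, not corpus content.
ALIGNED = [
    {"sr": "Dobar dan.", "fr": "Bonjour.", "en": "Good afternoon."},
    {"sr": "Hvala lepo.", "fr": "Merci beaucoup.", "en": "Thank you very much."},
]


def search(units, expression, lang, targets):
    """Find units whose `lang` sentence contains `expression`.

    Returns, for each hit, the matching sentence together with the
    corresponding sentences in the requested target languages.
    """
    needle = expression.casefold()  # case-insensitive substring match
    hits = []
    for unit in units:
        if needle in unit[lang].casefold():
            hits.append({code: unit[code] for code in [lang, *targets]})
    return hits


hits = search(ALIGNED, "hvala", "sr", ["fr", "en"])
for hit in hits:
    print(hit)
```

Passing a single code in `targets` (for example `["fr"]`) would return the match with its counterpart in only one other language, mirroring the "one or both other languages" option.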

The corpus is a work in progress: it is continuously being improved in quality, expanded in size, and developed technically.

Person in charge
Dejan Stosic

The ParCoLab corpus can be used online for free after the creation of a user account. The texts cannot be downloaded.