Gold standard for the TALN corpus


Description

The file below is the result of the manual annotation carried out to compare distributional models, as described in Tanguy et al. (2015).

The dataset is based on the version of the TALN corpus covering the years 2007-2013.

Four annotators judged the relevance of the neighbors computed by the distributional models for the following 30 words:

Adjectives: complexe, computationnel, correct, empirique, important, précis, sémantique, significatif, spécialisé, temporel
Nouns: élément, contrainte, dépendant, fréquence, graphe, méthode, performance, sémantique, signification, trait
Verbs: annoter, apparier, évaluer, calculer, caractériser, conduire, décrire, extraire, indexer, valider

The list of evaluated candidates was generated with the pooling method described in the article referenced below.

The notion of neighborhood is not restricted to a specific semantic relation: the neighbors may be synonyms, hypernyms, hyponyms, antonyms or words that are otherwise semantically related.

The dataset is a tabulated file (UTF-8 encoded) whose lines contain:

the target word
the POS of the target word (ADJ for adjective, NC for common noun, V for verb)
a neighbor
the number of annotators (ranging from 1 to 4) who have judged this neighbor relevant.
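
The four-column layout above can be read with a few lines of Python. This is only a minimal sketch under assumptions that the page does not confirm: that "tabulated" means tab-separated, that the columns appear in the order listed, and that lines whose fourth field is not a number (such as a possible header) can be skipped. The names Judgment, read_gold and relevant_neighbors are hypothetical helpers introduced for illustration.

```python
import csv
from collections import defaultdict
from typing import Iterable, NamedTuple


class Judgment(NamedTuple):
    target: str    # target word
    pos: str       # POS of the target word: ADJ, NC or V
    neighbor: str  # candidate neighbor
    votes: int     # number of annotators who judged the neighbor relevant


def read_gold(lines: Iterable[str], delimiter: str = "\t") -> list[Judgment]:
    """Parse gold-standard lines (assumption: tab-separated, four columns)."""
    rows = []
    for record in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        # Skip blank lines, malformed lines and any non-numeric header line.
        if len(record) != 4 or not record[3].strip().isdigit():
            continue
        target, pos, neighbor, votes = record
        rows.append(Judgment(target, pos, neighbor, int(votes)))
    return rows


def relevant_neighbors(rows: Iterable[Judgment], min_votes: int = 1) -> dict:
    """Group neighbors judged relevant by at least `min_votes` annotators.

    Keys are (target, POS) pairs because "sémantique" occurs in the word
    list both as an adjective and as a noun.
    """
    gold = defaultdict(list)
    for row in rows:
        if row.votes >= min_votes:
            gold[(row.target, row.pos)].append(row.neighbor)
    return dict(gold)


# Usage with the downloaded file (UTF-8 encoded, as stated above):
# with open("gold-semdis-corpusTALN.csv", encoding="utf-8", newline="") as f:
#     gold = relevant_neighbors(read_gold(f), min_votes=2)
```

The min_votes threshold is a design choice left to the user: a value of 1 keeps every neighbor that at least one annotator accepted, while 4 keeps only unanimous judgments.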

Design

Person in charge

Franck Sajous

Licence

Some rights are reserved. The gold standard file is available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Download

gold-semdis-corpusTALN.csv

Reference

L. Tanguy, F. Sajous and N. Hathout (2015). Évaluation sur mesure de modèles distributionnels sur un corpus spécialisé : comparaison des approches par contextes syntaxiques et par fenêtres graphiques. TAL, 56(2), pp. 103-127.