REDAC
REsources Developed At CLLE (CLLE research unit)






CanEn Test set
Test set for semantic shift detection
Description

A test set was developed in order to facilitate the use of the CanEn corpus for the detection of contact-induced semantic shifts in Quebec English. More specifically, it allows for the evaluation of semantic change detection systems, where semantic change detection is formulated as a binary classification task (stable vs. changing words).

A total of 80 items are included in the test set: 40 correspond to semantic shifts in Quebec English, described in the sociolinguistic literature and attested in the CanEn corpus; the remaining 40 are control items which are unlikely to be affected by contact-related semantic influence and do not present regional variation in the corpus. The construction of the test set and its use in an evaluation of semantic change detection systems are presented in more detail by Miletic et al. (2021).
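As a rough illustration of the binary classification setup, the sketch below scores a system's predictions against gold labels (1 = semantic shift, 0 = control item). The function name `evaluate`, the choice of metrics, and the toy item names are assumptions made for this example; they are not taken from the test set or from the evaluation protocol of Miletic et al. (2021).

```python
def evaluate(gold, predicted):
    """Score binary predictions against gold labels.

    Both arguments map an item identifier to 1 (semantic shift)
    or 0 (control item). This is a hypothetical helper, not the
    evaluation script used by the authors.
    """
    tp = fp = tn = fn = 0
    for item, gold_label in gold.items():
        pred_label = predicted[item]
        if gold_label == 1 and pred_label == 1:
            tp += 1
        elif gold_label == 0 and pred_label == 1:
            fp += 1
        elif gold_label == 0 and pred_label == 0:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data with placeholder item names (not real test set entries).
toy_gold = {"shift_a": 1, "shift_b": 1, "control_a": 0, "control_b": 0}
toy_pred = {"shift_a": 1, "shift_b": 0, "control_a": 0, "control_b": 1}
toy_scores = evaluate(toy_gold, toy_pred)
```

Because the test set is balanced (40 shifts, 40 controls), a system that always predicts the same class would reach an accuracy of 0.5, which is a useful baseline to keep in mind when reading such scores.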

Each line in the file contains a lexical item, its POS tag, and its semantic change label (separated by tabs). The label is "1" if the lexical item is a semantic shift, and "0" if it is a control item.
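The file format described above can be read with a few lines of Python. This is a minimal sketch: the names `TestItem` and `parse_test_set`, the sample rows, and the POS tag values shown are placeholders chosen for illustration, and the actual tagset and entries should be checked against the downloaded file.

```python
from typing import Iterable, List, NamedTuple


class TestItem(NamedTuple):
    lemma: str   # the lexical item
    pos: str     # its POS tag (tagset assumed, check the real file)
    label: int   # 1 = semantic shift, 0 = control item


def parse_test_set(lines: Iterable[str]) -> List[TestItem]:
    """Parse tab-separated lines of the form: item, POS tag, label."""
    items = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        lemma, pos, label = line.split("\t")
        if label not in ("0", "1"):
            raise ValueError(f"unexpected label {label!r} for {lemma!r}")
        items.append(TestItem(lemma, pos, int(label)))
    return items


# Placeholder rows in the described format (not real test set entries).
sample_lines = ["word_a\tN\t1\n", "word_b\tV\t0\n", "\n"]
sample_items = parse_test_set(sample_lines)
shifts = [item for item in sample_items if item.label == 1]
controls = [item for item in sample_items if item.label == 0]
```

With the real file, the same function could be called on an open file handle, for example `parse_test_set(open(path, encoding="utf-8"))`, where `path` is wherever the download was saved.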


Contact person
Filip Miletic
Contact:

Licence

The test set is released under the Creative Commons BY-NC-SA 4.0 licence.

Download

References
  • Miletic, F., Przewozny-Desriaux, A. and Tanguy, L. (2021). Detecting contact-induced semantic shifts: What can embedding-based methods do in practice? Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 10852-10865.