REDAC
REsources Developed At CLLE (CLLE: Cognition, Langues, Langage, Ergonomie)






GLAWI

Documentation
This documentation is based on Sajous and Hathout (2015) and Hathout and Sajous (2016).

Linguistic labels

Description

Linguistic labels (in French: marques lexicographiques) are indicators found in definitions that signal a particular usage of a word: period, geographic area, specialized domain or subculture slang, etc.
In Wiktionnaire, such labels appear in heterogeneous formats in the wikicode. We detected and normalized labels in order to:

  • remove labels from the textual content of glosses and examples;
  • encode them formally in dedicated child elements of the gloss and example elements (see an illustration in the definitions page).

In order to normalize labels, we inventoried more than 6,000 different labels and aliases. Aliases are the different ways of encoding the same information in Wiktionnaire: for example, the œnologie ‘enology’ domain label appears in five forms in the wikicode: {{œnologie}}, {{oenologie}}, {{œnol}}, {{oenol}} and (œnologie).
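
The alias normalization described above can be sketched in Python as follows. This is only an illustration, not GLAWI's actual implementation: the `ALIASES` table layout and the `extract_labels` function are hypothetical names introduced here, and only the five œnologie forms are taken from this documentation.

```python
import re

# Hypothetical alias table: each surface form found in the wikicode maps to
# a (category, canonical value) pair. Only the five œnologie forms come from
# the documentation; the table layout itself is an assumption.
ALIASES = {
    "{{œnologie}}": ("domain", "œnologie"),
    "{{oenologie}}": ("domain", "œnologie"),
    "{{œnol}}": ("domain", "œnologie"),
    "{{oenol}}": ("domain", "œnologie"),
    "(œnologie)": ("domain", "œnologie"),
}

# One alternation over all aliases, longest first so that a longer alias
# is never shadowed by a shorter one sharing the same prefix.
_ALIAS_PATTERN = re.compile(
    "|".join(re.escape(alias) for alias in sorted(ALIASES, key=len, reverse=True))
)


def extract_labels(gloss):
    """Return (clean_text, labels): the gloss with label aliases removed,
    and the list of normalized (category, value) pairs that were found."""
    labels = []

    def _collect(match):
        label = ALIASES[match.group(0)]
        if label not in labels:  # several aliases may encode the same label
            labels.append(label)
        return ""  # drop the alias from the textual content

    text = _ALIAS_PATTERN.sub(_collect, gloss)
    text = " ".join(text.split())  # tidy the whitespace left behind
    return text, labels
```

Calling `extract_labels("{{oenol}} Science du vin.")` separates the gloss text from a single normalized `("domain", "œnologie")` label, whichever of the five aliases was used in the wikicode.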

We grouped the linguistic labels into categories (diatopic, diachronic, attitudinal, etc.) that are not encoded in Wiktionnaire. Examples are given below.

XML structure of labels

<!ELEMENT labels (label)*>
<!ELEMENT label EMPTY>
<!ATTLIST label
  type (attitudinal|diafrequential|diachronic|diatopic|domain|gram|loan|other|sem|usage|register) "other"
  value CDATA #REQUIRED>
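
As a rough illustration of how such a structure might be read, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `SAMPLE` fragment and the `read_labels` function are hypothetical and written only to conform to the DTD above; they are not taken from GLAWI itself. Because ElementTree does not apply DTD attribute defaults, the sketch falls back to "other" explicitly when `type` is absent.

```python
import xml.etree.ElementTree as ET

# Enumerated values of the "type" attribute, copied from the DTD.
LABEL_TYPES = {
    "attitudinal", "diafrequential", "diachronic", "diatopic", "domain",
    "gram", "loan", "other", "sem", "usage", "register",
}

# Hypothetical fragment written to match the DTD (not real GLAWI data).
SAMPLE = (
    "<labels>"
    '<label type="domain" value="œnologie"/>'
    '<label type="diachronic" value="vieilli"/>'
    '<label value="exemple"/>'
    "</labels>"
)


def read_labels(xml_text):
    """Return the (type, value) pairs of a <labels> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for label in root.iter("label"):
        # The DTD declares "other" as the default type; ElementTree does
        # not read DTD defaults, so the fallback is applied by hand.
        label_type = label.get("type", "other")
        if label_type not in LABEL_TYPES:
            raise ValueError("unknown label type: " + label_type)
        value = label.get("value")
        if value is None:  # value is #REQUIRED in the DTD
            raise ValueError("label without a value attribute")
        pairs.append((label_type, value))
    return pairs
```

With the sample above, `read_labels(SAMPLE)` yields three pairs, the last of which receives the default type "other".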

Main label types and values

The figures given in the following tables are those found in the version of GLAWI extracted from Wiktionnaire's 2/10/2015 dump.

Diafrequential (total: 6,166)

French label          English gloss           Count
rare                  rare                    4,215
extrêmement rare      extremely rare          1,016
très rare             very rare                 301
plus courant          more common               190
courant               common                    186
plus rare             more rare                 176
moins courant         less common                62
peu usité             rarely used                20

Diatopic (total: 8,726)

French label          English gloss           Count
Québec                Quebec                  1,717
France                France                  1,138
Canada                Canada                    971
Suisse                Switzerland               962
Belgique              Belgium                   637
Lorraine              Lorraine                  299
Occitanie             Occitanie                 246
Normandie             Normandy                  134
Provence              Provence                  123
Acadie                Acadia                    122
Louisiane             Louisiana                  90
Réunion               Réunion                    89
Afrique               Africa                     64
Congo-Kinshasa        Congo-Kinshasa             47
Ardennes              Ardennes                   46
Languedoc-Roussillon  Languedoc-Roussillon       44
Bretagne              Brittany                   40
362 other areas                               1,957

Diachronic (total: 24,450)

French label          English gloss           Count
vieilli               old                     9,431
désuet                dated                   6,043
avant 1835            before 1835             1,654
néologisme            neologism                 820
archaïque             archaic                   661
1986                                             73
1990                                             72
766 other years                               5,841

Loanwords (total: 1,493)

French label          English gloss           Count
anglicisme            Anglicism               1,446
indo-européen commun  Proto-Indo-European        22
hispanisme            Hispanism                  11
germanisme            Germanism                   7
gaulois               Gallic                      4
catalan               Catalan                     3

Domains (total: 155,532)

French label          English gloss           Count
localités             locality               49,060
géographie            geography              11,935
botanique             botany                  6,461
zoologie              zoology                 5,460
médecine              medicine                5,258
chimie                chemistry               3,358
histoire              history                 2,804
marine                sailing                 2,644
religion              religion                2,559
linguistique          linguistics             2,177
agriculture           agriculture             2,071
anatomie              anatomy                 2,005
informatique          computer science        1,718
droit                 law                     1,698
physique              physics                 1,579
militaire             military                1,572
musique               music                   1,570
minéralogie           mineralogy              1,531
biologie              biology                 1,515
antiquité             antiquity               1,327
cuisine               cooking                 1,284
367 other domains                            45,946

Semantics (total: 23,860)

French label          English gloss           Count
figuré                figurative             10,859
par extension         by extension            6,666
en particulier        in particular           2,574
analogie              analogy                 1,213
métonymie             metonymy                  886
ellipse               ellipsis                  793
spécialement          especially                704
métaphore             metaphor                   75
hyperbole             hyperbole                  30
apocope               apocope                    24
généralement          generally                  19
litote                litotes                    10
figure                rhetorical figure           7

Attitudinal (total: 17,340)

French label          English gloss           Count
familier              familiar                8,333
argot                 slang                   2,166
populaire             popular                 1,870
péjoratif             pejorative              1,587
vulgaire              vulgar                    770
littéraire            literary                  513
ironique              ironic                    450
plaisanterie          humor                     345
injurieux             offensive                 258
exagération           exaggeration              254
soutenu               formal                    247
poétique              poetic                    183
verlan                backslang                 128
enfantin              childish                   81
euphémisme            euphemism                  66
très familier         very familiar              65
informel              informal                   13
dérision              derision                   13
mélioratif            meliorative                 1

