GLAWI
Documentation
This documentation is based on Sajous and Hathout (2015)
and Hathout and Sajous (2016).
Linguistic labels
Description
Linguistic labels (in French: marques lexicographiques) are indicators found in definitions
that signal a particular usage of a word: period, geographic area, specialized domain or subculture slang, etc.
In Wiktionnaire, such labels appear in heterogeneous formats in the wikicode.
We detected and normalized labels in order to:
- remove them from the textual content of glosses and examples;
- encode them formally in dedicated child elements of the gloss and example elements
(see the illustration on the definitions page).
In order to normalize labels, we inventoried more than 6,000 different labels and aliases.
Aliases are the different ways of encoding the same information in Wiktionnaire: for example, the œnologie ‘enology’ domain label
appears under five forms in the wikicode: {{œnologie}}, {{oenologie}}, {{œnol}}, {{oenol}} and (œnologie).
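The alias handling described above can be sketched in Python as follows. This is only an illustration, not the code used to build GLAWI: the names `ALIASES`, `LABEL_PATTERN` and `extract_labels` are hypothetical, the alias table holds only the œnologie forms cited above, and the regular expression assumes labels appear either as a bare template or as a parenthesized word.

```python
import re

# Hypothetical alias table: maps each raw form to one canonical label.
# It contains only the oenology forms cited in the text above.
ALIASES = {
    "œnologie": "œnologie",
    "oenologie": "œnologie",
    "œnol": "œnologie",
    "oenol": "œnologie",
}

# Matches a bare template such as {{œnol}} or a parenthesized mark such as (œnologie).
LABEL_PATTERN = re.compile(r"\{\{([^{}|]+)\}\}|\(([^()]+)\)")


def extract_labels(gloss):
    """Return (cleaned_gloss, labels): known labels are removed from the
    text and reported under their canonical form."""
    labels = []

    def replace(match):
        raw = (match.group(1) or match.group(2)).strip().lower()
        canonical = ALIASES.get(raw)
        if canonical is None:
            return match.group(0)  # not a known label: leave the text untouched
        labels.append(canonical)
        return ""

    cleaned = LABEL_PATTERN.sub(replace, gloss)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, labels


print(extract_labels("{{oenol}} Goût du vin."))
```

Parenthesized text that is not a known alias, such as an ordinary aside in a gloss, is left in place, so the lookup table rather than the pattern decides what counts as a label.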
We grouped the linguistic labels into categories (diatopic, diachronic, attitudinal, etc.);
this categorization is not encoded in Wiktionnaire. Examples are given below.
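As a rough illustration of this grouping, the Python sketch below assigns a few normalized labels to the three categories named above. The mapping `LABEL_CATEGORY` and the function `group_by_category` are hypothetical, and the label-to-category assignments are assumptions based on the usual meaning of the terms (diatopic for geographic areas, diachronic for periods, attitudinal for register), not a copy of GLAWI's inventory.

```python
from collections import defaultdict

# Illustrative subset only; these assignments are assumptions.
LABEL_CATEGORY = {
    "Québec": "diatopic",
    "Suisse": "diatopic",
    "vieilli": "diachronic",
    "néologisme": "diachronic",
    "familier": "attitudinal",
    "péjoratif": "attitudinal",
}


def group_by_category(labels):
    """Group normalized labels by category; unknown labels are kept
    under 'uncategorized' rather than dropped."""
    grouped = defaultdict(list)
    for label in labels:
        grouped[LABEL_CATEGORY.get(label, "uncategorized")].append(label)
    return dict(grouped)


print(group_by_category(["familier", "Québec", "vieilli"]))
```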
XML structure of labels
Main label types and values
The figures given in the following tables are those found in the version of GLAWI extracted
from Wiktionnaire's 2/10/2015 dump.
rare | rare | 4,215 |
extrêmement rare | extremely rare | 1,016 |
très rare | very rare | 301 |
plus courant | more common | 190 |
courant | common | 186 |
plus rare | rarer | 176 |
moins courant | less common | 62 |
peu usité | rarely used | 20 |

vieilli | old | 9,431 |
désuet | dated | 6,043 |
avant 1835 | before 1835 | 1,654 |
néologisme | neologism | 820 |
archaïque | archaic | 661 |
1986 | | 73 |
1990 | | 72 |
766 other years | | 5,841 |

Québec | Quebec | 1,717 |
France | France | 1,138 |
Canada | Canada | 971 |
Suisse | Switzerland | 962 |
Belgique | Belgium | 637 |
Lorraine | Lorraine | 299 |
Occitanie | Occitanie | 246 |
Normandie | Normandy | 134 |
Provence | Provence | 123 |
Acadie | Acadia | 122 |
Louisiane | Louisiana | 90 |
Réunion | Réunion | 89 |
Afrique | Africa | 64 |
Congo-Kinshasa | Congo-Kinshasa | 47 |
Ardennes | Ardennes | 46 |
Languedoc-Roussillon | Languedoc-Roussillon | 44 |
Bretagne | Brittany | 40 |
362 other areas | | 1,957 |

anglicisme | Anglicism | 1,446 |
indo-européen commun | Proto-Indo-European | 22 |
hispanisme | Hispanism | 11 |
germanisme | Germanism | 7 |
gaulois | Gaulish | 4 |
catalan | Catalan | 3 |

figuré | figurative | 10,859 |
par extension | by extension | 6,666 |
en particulier | in particular | 2,574 |
analogie | analogy | 1,213 |
métonymie | metonymy | 886 |
ellipse | ellipsis | 793 |
spécialement | especially | 704 |
métaphore | metaphor | 75 |
hyperbole | hyperbole | 30 |
apocope | apocope | 24 |
généralement | generally | 19 |
litote | litotes | 10 |
figure | rhetorical figure | 7 |

localités | localities | 49,060 |
géographie | geography | 11,935 |
botanique | botany | 6,461 |
zoologie | zoology | 5,460 |
médecine | medicine | 5,258 |
chimie | chemistry | 3,358 |
histoire | history | 2,804 |
marine | sailing | 2,644 |
religion | religion | 2,559 |
linguistique | linguistics | 2,177 |
agriculture | agriculture | 2,071 |
anatomie | anatomy | 2,005 |
informatique | computer science | 1,718 |
droit | law | 1,698 |
physique | physics | 1,579 |
militaire | military | 1,572 |
musique | music | 1,570 |
minéralogie | mineralogy | 1,531 |
biologie | biology | 1,515 |
antiquité | antiquity | 1,327 |
cuisine | cooking | 1,284 |
367 other domains | | 45,946 |

familier | familiar | 8,333 |
argot | slang | 2,166 |
populaire | popular | 1,870 |
péjoratif | pejorative | 1,587 |
vulgaire | vulgar | 770 |
littéraire | literary | 513 |
ironique | ironic | 450 |
plaisanterie | humor | 345 |
injurieux | offensive | 258 |
exagération | exaggeration | 254 |
soutenu | formal | 247 |
poétique | poetic | 183 |
verlan | backslang | 128 |
enfantin | childish | 81 |
euphémisme | euphemism | 66 |
très familier | very familiar | 65 |
informel | informal | 13 |
dérision | derision | 13 |
mélioratif | meliorative | 1 |