Glawinette is a derivational lexicon of French built from the GLAWI machine-readable dictionary.
The entries of Glawinette are pairs of morphologically related lexemes like accomplir_V:accomplissement_N.
Glawinette provides the word family (morphological family) of each of its entries
and a characterization of the derivational relation that holds between the two lexemes of the pair. Relations are described by means of:
a broad alternation pattern (BAP), consisting of two regular expressions that describe the most general form relation between the two words, such as ^(.+)r:^(.+)ssement for accomplir_V:accomplissement_N,
where the sequence (.+) represents the string accompli;
a fine-grained alternation pattern (FAP), consisting of two regular expressions that describe a form relation between the two words in terms of linguistically motivated derivational exponents, such as ^(.+)ir:^(.+)issement for
accomplir_V:accomplissement_N, where the sequence (.+) represents the string accompl.
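The way a BAP or FAP ties the two words together through the shared (.+) sequence can be sketched in Python as follows. This is only an illustration, not the procedure used to build Glawinette: the function name shared_stem is hypothetical, and the sketch assumes that each pattern must match its whole word (the patterns above carry no explicit end anchor).

```python
import re


def shared_stem(pattern1, pattern2, word1, word2):
    """Return the string captured by (.+) if both patterns match their
    word with the same captured string, and None otherwise.

    Assumption: each pattern must cover the whole word, hence fullmatch.
    """
    m1 = re.fullmatch(pattern1, word1)
    m2 = re.fullmatch(pattern2, word2)
    if m1 is None or m2 is None:
        return None
    if m1.group(1) != m2.group(1):
        return None
    return m1.group(1)


# BAP and FAP of accomplir_V:accomplissement_N, as given in the text above
bap_stem = shared_stem("^(.+)r", "^(.+)ssement", "accomplir", "accomplissement")
fap_stem = shared_stem("^(.+)ir", "^(.+)issement", "accomplir", "accomplissement")
```

With the patterns quoted above, bap_stem is accompli and fap_stem is accompl: the FAP moves the i of the stem into the exponents -ir and -issement.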
Glawinette contains 156,090 lexeme pairs which fall into 15,843 word families and 5,384 derivational series.
Glawinette is available in two forms: tsv and json.
Format of the tsv table:
lemma1 = lemma of word1
lemma2 = lemma of word2
cat1 = grammatical category of word1
cat2 = grammatical category of word2
familyId = ID of the word family of word1:word2
morphOri = True if the pair comes from a morphological section and False otherwise
defOri = True if the pair comes from a definition and False otherwise
BAP1 = regular expression that matches word1 in the BAP
BAP2 = regular expression that matches word2 in the BAP
FAP1 = regular expression that matches word1 in the FAP
FAP2 = regular expression that matches word2 in the FAP
FAP_matches = number of pairs that share the same FAP in Glawinette
FAP_stem = stem that matches the sequence (.+) in FAP1 and FAP2
FAP_pref = True if FAP1 or FAP2 contains a prefix and False otherwise
FAP_suff = True if FAP1 or FAP2 contains a suffix and False otherwise
The following 4 fields are only filled in if the pair comes from a definition (i.e. if defOri is True).
defEntry = GLAWI entry whose definition was used to identify the pair. defEntry is either lemma1 or lemma2
defCat = category of the GLAWI entry whose definition was used to identify the pair. defCat is either cat1 or cat2
defTxt = text of the definition from which word1:word2 was extracted
defLem = lemmatized form of the definition from which word1:word2 was extracted
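As a rough illustration of how the table could be consumed, the sketch below reads tab-separated rows with Python's csv module and groups the pairs by familyId. Several points are assumptions rather than facts about the distributed file: that it starts with a header row using the field names listed above, that categories are written V and N, and that booleans are written True/False (the value 1 is also accepted). The sample row covers only a subset of the columns, and its familyId and defOri values are placeholders, not real Glawinette data.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: subset of columns, placeholder familyId and defOri.
SAMPLE_TSV = (
    "lemma1\tlemma2\tcat1\tcat2\tfamilyId\tdefOri\tFAP1\tFAP2\tFAP_stem\n"
    "accomplir\taccomplissement\tV\tN\tF0\tFalse\t^(.+)ir\t^(.+)issement\taccompl\n"
)


def read_pairs(handle):
    """Read tab-separated rows into dictionaries keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Assumption: booleans are serialized as "True"/"False" or "1"/"0".
        row["defOri"] = row["defOri"] in ("True", "1")
        rows.append(row)
    return rows


def group_by_family(rows):
    """Map each familyId to its pairs, written lemma1_cat1:lemma2_cat2."""
    families = defaultdict(list)
    for row in rows:
        pair = f'{row["lemma1"]}_{row["cat1"]}:{row["lemma2"]}_{row["cat2"]}'
        families[row["familyId"]].append(pair)
    return dict(families)


pairs = read_pairs(io.StringIO(SAMPLE_TSV))
families = group_by_family(pairs)
```

To use the sketch on the real table, replace io.StringIO(SAMPLE_TSV) with an open file handle, for example open(path, encoding="utf-8", newline=""), where path points to the downloaded tsv file.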
The json archive contains two json files:
glawinette-families.json contains the list of Glawinette word families. The families are represented as lists of word pairs. The word pairs are dictionaries as illustrated in the following excerpt:
Hathout, N., Sajous, F., Calderone, B., Namer, F. (2020).
Glawinette: a linguistically motivated derivational description of French acquired from GLAWI.
In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020),
pp. 3870-3878, Marseille, 2020.