REDAC
REsources Developed At CLLE CLLE research unit






Glawinette
French derivational lexicon
Description

Glawinette is a derivational lexicon of French built from the GLAWI machine-readable dictionary. Its entries are pairs of morphologically related lexemes such as accomplir_V:accomplissement_N. For each entry, Glawinette provides the word family (morphological family) and a characterization of the derivational relation that holds between the two lexemes. Relations are described by means of:

  • a broad alternation pattern (BAP) consisting of two regular expressions that describe the most general form relation between the two words, such as ^(.+)r:^(.+)ssement for accomplir_V:accomplissement_N, where the sequence (.+) represents the string accompli;
  • a fine-grained alternation pattern (FAP) consisting of two regular expressions that describe a form relation between the two words in terms of linguistically motivated derivational exponents, such as ^(.+)ir:^(.+)issement for accomplir_V:accomplissement_N, where the sequence (.+) represents the string accompl.
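
The difference between the two patterns can be checked directly with a regular expression engine. The following is a minimal sketch in Python, not part of the Glawinette distribution: the helper name shared_stem is hypothetical, and it assumes that each pattern contains exactly one capturing group (.+) standing for the shared string.

```python
import re

def shared_stem(pattern1, pattern2, lemma1, lemma2):
    """Return the string captured by (.+) when both patterns match their
    lemma and capture the same string; return None otherwise.
    Hypothetical helper: assumes one capturing group per pattern."""
    m1 = re.fullmatch(pattern1, lemma1)
    m2 = re.fullmatch(pattern2, lemma2)
    if m1 is None or m2 is None:
        return None
    stem1, stem2 = m1.group(1), m2.group(1)
    return stem1 if stem1 == stem2 else None

# BAP: the most general form relation between the two words
bap_stem = shared_stem("^(.+)r", "^(.+)ssement", "accomplir", "accomplissement")
# FAP: the relation stated with the exponents -ir and -issement
fap_stem = shared_stem("^(.+)ir", "^(.+)issement", "accomplir", "accomplissement")

print(bap_stem)  # accompli
print(fap_stem)  # accompl
```

re.fullmatch is used so that the patterns behave the same whether or not they carry a final $ anchor.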

Glawinette contains 156,090 lexeme pairs which fall into 15,843 word families and 5,384 derivational series.

Glawinette is available in two forms: tsv and json.

Format of the tsv table:

  • lemma1 = lemma of word1
  • lemma2 = lemma of word2
  • cat1 = grammatical category of word1
  • cat2 = grammatical category of word2
  • familyId = ID of the word family of word1:word2
  • morphOri = True if the pair comes from a morphological section and False otherwise
  • defOri = True if the pair comes from a definition and False otherwise
  • BAP1 = regular expression that matches word1 in the BAP
  • BAP2 = regular expression that matches word2 in the BAP
  • FAP1 = regular expression that matches word1 in the FAP
  • FAP2 = regular expression that matches word2 in the FAP
  • FAP_matches = number of pairs that share the same FAP in Glawinette
  • FAP_stem = stem that matches the sequence (.+) in FAP1 and FAP2
  • FAP_pref = True if FAP1 or FAP2 contains a prefix and False otherwise
  • FAP_suff = True if FAP1 or FAP2 contains a suffix and False otherwise

The following four fields are only filled in if the pair comes from a definition (i.e. if defOri is True).

  • defEntry = GLAWI entry whose definition was used to identify the pair. defEntry is either lemma1 or lemma2
  • defCat = category of the GLAWI entry whose definition was used to identify the pair. defCat is either cat1 or cat2
  • defTxt = text of the definition from which word1:word2 comes
  • defLem = lemmatized form of the definition from which word1:word2 comes
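
As an illustration of how the table might be read, here is a minimal Python sketch using only the standard library. Several points are assumptions rather than documented facts: that the file starts with a header row using the field names above in the order listed, and that boolean fields are written as True/False (the parser also accepts 1/0). The sample row is built in memory from the accomplir_V:accomplissement_N example, and its familyId value "F0" is a made-up placeholder.

```python
import csv
import io

# Field names as listed above; their order in the file is an assumption.
COLUMNS = ["lemma1", "lemma2", "cat1", "cat2", "familyId", "morphOri", "defOri",
           "BAP1", "BAP2", "FAP1", "FAP2", "FAP_matches", "FAP_stem",
           "FAP_pref", "FAP_suff", "defEntry", "defCat", "defTxt", "defLem"]

# Illustrative row only; "F0" is a hypothetical family identifier.
SAMPLE_ROW = ["accomplir", "accomplissement", "V", "N", "F0", "False", "True",
              "^(.+)r$", "^(.+)ssement$", "^(.+)ir$", "^(.+)issement$", "177",
              "accompl", "False", "True", "accomplissement", "N",
              "Action d'accomplir ou résultat de cette action.",
              "action de accomplir ou résultat de ce action ."]
SAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(SAMPLE_ROW) + "\n"

BOOLEAN_FIELDS = ("morphOri", "defOri", "FAP_pref", "FAP_suff")

def parse_bool(value):
    """Accept True/False as well as 1/0, since the encoding is assumed."""
    return value.strip().lower() in {"true", "1"}

def read_pairs(handle):
    """Yield one dict per lexeme pair with typed boolean and integer fields."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for field in BOOLEAN_FIELDS:
            row[field] = parse_bool(row[field])
        row["FAP_matches"] = int(row["FAP_matches"])
        yield row

pairs = list(read_pairs(io.StringIO(SAMPLE_TSV)))
# Keep only the pairs identified from a definition.
from_definitions = [pair for pair in pairs if pair["defOri"]]
print(from_definitions[0]["defEntry"])  # accomplissement
```

To read the actual table, the io.StringIO object would be replaced by a file opened with encoding="utf-8" and newline="".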

The json archive contains two json files:

  1. glawinette-families.json contains the list of Glawinette word families. The families are represented as lists of word pairs. The word pairs are dictionaries as illustrated in the following excerpt:

          
    [[{"word1": {"lemma": "autoformation", "cat": "N"}, "word2": {"lemma": "formation", "cat": "N"}},
    {"word1": {"lemma": "autoformer","cat": "V"},"word2": {"lemma": "former","cat": "V"}}, ...] ...]
        
  2. glawinette-series.json contains the list of Glawinette word pairs. Each pair is described by a dictionary that provides:

    • the lemma and category of word1
    • the lemma and category of word2
    • the origin of the word1:word2 pair
    • the BAP
    • the FAP
    • the definition from which the pair comes, if any.

    The following dictionary illustrates the description of the accomplir_V:accomplissement_N pair:

    
    {"word1": {"lemma": "accomplir", "cat": "V"},
    "word2": {"lemma": "accomplissement", "cat": "N"},
    "relation": {"origin": {"morpho": false, "def": true},
    "BAP": {"BAP1": "^(.+)r$", "BAP2": "^(.+)ssement$"},
    "FAP": {"FAP1": "^(.+)ir$", "FAP2": "^(.+)issement$", "stem": "accompl", "pref": false, "suff": true, "matches": 177}},
    "definition": {"entry": "accomplissement", "txt": "Action d'accomplir ou résultat de cette action.",
                   "lemmatized": "action de accomplir ou résultat de ce action ."}}
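
The two structures can be explored with Python's json module. The sketch below is illustrative only: it parses in-memory copies of the excerpts shown above rather than the real files, assumes that glawinette-series.json is a top-level list of such dictionaries, and uses two hypothetical helpers (family_index and fap_is_consistent).

```python
import json
import re

# In-memory copies of the excerpts above (the real files are much larger).
FAMILIES_JSON = '''[[
  {"word1": {"lemma": "autoformation", "cat": "N"}, "word2": {"lemma": "formation", "cat": "N"}},
  {"word1": {"lemma": "autoformer", "cat": "V"}, "word2": {"lemma": "former", "cat": "V"}}
]]'''

SERIES_JSON = '''[
  {"word1": {"lemma": "accomplir", "cat": "V"},
   "word2": {"lemma": "accomplissement", "cat": "N"},
   "relation": {"origin": {"morpho": false, "def": true},
                "BAP": {"BAP1": "^(.+)r$", "BAP2": "^(.+)ssement$"},
                "FAP": {"FAP1": "^(.+)ir$", "FAP2": "^(.+)issement$", "stem": "accompl",
                        "pref": false, "suff": true, "matches": 177}},
   "definition": {"entry": "accomplissement"}}
]'''

def family_index(families):
    """Map each (lemma, cat) to the position of its family in the list."""
    index = {}
    for family_no, family in enumerate(families):
        for pair in family:
            for key in ("word1", "word2"):
                word = pair[key]
                index[(word["lemma"], word["cat"])] = family_no
    return index

def fap_is_consistent(entry):
    """Check that FAP1 and FAP2 match the two lemmas and capture the stem."""
    fap = entry["relation"]["FAP"]
    m1 = re.fullmatch(fap["FAP1"], entry["word1"]["lemma"])
    m2 = re.fullmatch(fap["FAP2"], entry["word2"]["lemma"])
    return bool(m1 and m2 and m1.group(1) == m2.group(1) == fap["stem"])

families = json.loads(FAMILIES_JSON)
series = json.loads(SERIES_JSON)
index = family_index(families)

# autoformer and formation belong to the same family in the excerpt
print(index[("autoformer", "V")] == index[("formation", "N")])  # True
print(fap_is_consistent(series[0]))  # True
```

With the downloaded archive, json.loads on the strings would be replaced by json.load on each file opened with encoding="utf-8".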
        

Developers

Nabil Hathout, Franck Sajous, Basilio Calderone, Fiammetta Namer

Person in charge

Nabil Hathout

Contact :

License/Credit

Glawinette is available under a Creative Commons By-SA 3.0 license.

Financial support
This work benefited from the support of the project DEMONEXT (ANR-17-CE23-0005) of the French National Research Agency.

Download

References

Glawinette is described in the following article:

  • Hathout, N., Sajous, F., Calderone, B., Namer, F. (2020). Glawinette: a linguistically motivated derivational description of French acquired from GLAWI. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), pp. 3870-3878, Marseille.