REsources Developed At CLLE CLLE: Cognition, Langues, Langage, Ergonomie


GLAWI
GLÀFF and WiktionaryX
GLAWI is a French Machine-Readable Dictionary encoded in XML format. It is a structured and normalized version of Wiktionnaire (the French language edition of Wiktionary).
This dictionary includes:
  • simple words, compounds and multiword expressions
  • inflected forms and lemmas
  • etymologies
  • pronunciations in IPA
  • definitions (glosses and examples)
  • translations
  • semantic relations
  • morphological relations
  • spelling variations
A description of the resource's structure and information about the conversion process can be found in (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).

Franck Sajous, Nabil Hathout and Basilio Calderone

Person in charge
Franck Sajous
Contact:

GLAWI is available under a Creative Commons BY-SA 3.0 license (the same license as Wiktionary, from which it was extracted).
GLAWI's logo is designed by Darwin.

A description and examples of GLAWI's structure can be found in the online documentation.
More information can be found in the articles mentioned in the References section below.

G-PeTo (GLAWI Perl Tools) is a set of scripts that we provide to manipulate GLAWI and to extract specific information. The scripts can be used as is or adapted to fit your needs.

Several versions of GLAWI are available: the "work" version is probably the one that most people need, while the "dev" version includes information related to the extraction process. Both versions are available with or without syntactic parsing of etymologies and definitions. The sizes given below are those of the compressed file that you can download and of the same file once uncompressed.

The four versions, released on 18/05/2016, are extracted from Wiktionnaire's 26/12/2015 dump.

Without syntactic parsing: GLAWI_FR_work_D2015-12-26_R2016-05-18.xml.bz2
With syntactic parsing: GLAWI_FR_workParsed_D2015-12-26_R2016-05-18.xml.bz2
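
Since the files are distributed as bz2-compressed XML, they can be read as a stream without uncompressing them to disk first. The following is only a minimal sketch in Python (not part of G-PeTo, which is written in Perl): the function name count_tags and its limit parameter are hypothetical, and the code deliberately assumes nothing about GLAWI's element names, which are described in the online documentation. It simply counts how often each XML tag occurs, which can be a convenient first look at the structure.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(path, limit=None):
    """Stream a bz2-compressed XML file and count occurrences of each tag.

    path  -- path to a .xml.bz2 file (for example a downloaded GLAWI archive)
    limit -- optional maximum number of closed elements to read (hypothetical
             convenience parameter, useful for sampling a very large file)
    """
    counts = Counter()
    # bz2.open decompresses on the fly, so the archive is never fully expanded.
    with bz2.open(path, "rb") as stream:
        events = ET.iterparse(stream, events=("end",))
        for n, (_event, elem) in enumerate(events, 1):
            counts[elem.tag] += 1
            # Drop the element's text and children once it has been counted,
            # to keep memory usage low on large files.
            elem.clear()
            if limit is not None and n >= limit:
                break
    return counts
```

For instance, count_tags("GLAWI_FR_work_D2015-12-26_R2016-05-18.xml.bz2", limit=100000).most_common(10) would list the ten most frequent tags among the first 100,000 closed elements, assuming the file has been downloaded to the current directory.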