REDAC: REsources Developed At CLLE-ERSS (CLLE-ERSS research unit)

GLAWI

Description
GLAWI is a French Machine-Readable Dictionary encoded in XML format. It is a structured and normalized version of Wiktionnaire (the French language edition of Wiktionary).
This dictionary includes:
  • simple words, compounds and multiword expressions
  • inflected forms and lemmas
  • etymologies
  • pronunciations in IPA
  • definitions (glosses and examples)
  • translations
  • semantic relations
  • morphological relations
  • spelling variations
The resource's structure and the conversion process are described in (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).


Developers
Franck Sajous, Nabil Hathout and Basilio Calderone

Person in charge
Franck Sajous
Contact :

License/Credit
GLAWI is available under a Creative Commons BY-SA 3.0 license (the same license as Wiktionary, from which it was extracted).
GLAWI's logo was designed by Darwin.

Documentation
A description and examples of GLAWI's structure can be found in the online documentation.
More information can be found in the articles mentioned in the References section below.

Download
Two versions of GLAWI are available: a "work" version, which is probably the one most people need, and a "dev" version, which also includes information related to the extraction process. Both versions are available with or without syntactic parsing of etymologies and definitions. Each pair of sizes given below is the size of the compressed file that you can download, followed by its size once uncompressed.

The four versions, released on 18 May 2016, were extracted from Wiktionnaire's dump of 26 December 2015.
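
The download file names on this page appear to encode these properties: the variant (work or dev), an optional "Parsed" suffix, and two dates prefixed with D and R. The sketch below decodes a file name under the assumption that D is the dump date and R the release date, which matches the dates stated above but is not documented here. The function name `describe_glawi_file` and the returned keys are hypothetical, chosen for illustration.

```python
import re
from datetime import date

# Pattern inferred from the four file names listed on this page.
# Assumption: "D" marks the Wiktionnaire dump date, "R" the GLAWI release date.
_GLAWI_NAME = re.compile(
    r"^GLAWI_FR_(?P<variant>work|dev)(?P<parsed>Parsed)?"
    r"_D(?P<dump>\d{4}-\d{2}-\d{2})"
    r"_R(?P<release>\d{4}-\d{2}-\d{2})\.xml\.bz2$"
)


def describe_glawi_file(name):
    """Split a GLAWI download file name into its components."""
    match = _GLAWI_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised GLAWI file name: {name!r}")
    return {
        "variant": match.group("variant"),            # "work" or "dev"
        "parsed": match.group("parsed") is not None,  # syntactic parsing included?
        "dump_date": date.fromisoformat(match.group("dump")),
        "release_date": date.fromisoformat(match.group("release")),
    }
```

A helper like this can be handy for picking the right file in a script, but the naming scheme may differ in other releases.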

DTD
  • WORK: DTD_GLAWI_work.dtd
  • DEV: DTD_GLAWI_dev.dtd
Without syntactic parsing
  • WORK: GLAWI_FR_work_D2015-12-26_R2016-05-18.xml.bz2 (81 MB / 1.7 GB)
  • DEV: GLAWI_FR_dev_D2015-12-26_R2016-05-18.xml.bz2 (118 MB / 2.2 GB)
With syntactic parsing
  • WORK: GLAWI_FR_workParsed_D2015-12-26_R2016-05-18.xml.bz2 (214 MB / 3.4 GB)
  • DEV: GLAWI_FR_devParsed_D2015-12-26_R2016-05-18.xml.bz2 (251 MB / 3.9 GB)
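
Since the files expand to several gigabytes, it may be preferable to read them without decompressing to disk or loading the whole tree into memory. The following is a minimal sketch using only the Python standard library. The element name `article` is an assumption about GLAWI's structure (check the DTD and the online documentation for the actual names), and `iter_elements` is a hypothetical helper name. The sketch also assumes the elements of interest are direct children of the root element.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_elements(path, tag="article"):
    """Yield each <tag> element from a bz2-compressed XML file, one at a time.

    `tag` defaults to "article", which is an assumed element name.
    """
    with bz2.open(path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == tag:
                yield elem
                # Drop the children parsed so far so memory use stays bounded
                # (valid when `tag` elements are direct children of the root).
                root.clear()
```

For example, `sum(1 for _ in iter_elements("GLAWI_FR_work_D2015-12-26_R2016-05-18.xml.bz2"))` would count the matching elements in the "work" file. Each yielded element must be used before asking for the next one, because it is cleared when iteration resumes.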


References
  • Nabil Hathout and Franck Sajous. (2016). Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 1369-1376, Portorož, Slovenia.
  • Franck Sajous and Nabil Hathout. (2015). GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary. Proceedings of the eLex 2015 conference, pp. 405-426, Herstmonceux, England.
  • Nabil Hathout, Franck Sajous and Basilio Calderone. (2014). Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary. Proceedings of the COLING Workshop on Lexical and Grammatical Resources for Language Processing, pp. 65-74, Dublin, Ireland.