REDAC
REsources Developed At CLLE (CLLE research unit)






ENGLAWI
an ENglish Great Lexicon for Accessing Wiktionary Information
Description
ENGLAWI is an English Machine-Readable Dictionary encoded in XML format. It is a structured and normalized version of Wiktionary.
The dictionary includes:
  • simple words, compounds and multiword expressions
  • inflected forms and lemmas
  • etymologies
  • pronunciations in IPA
  • definitions (glosses and examples)
  • translations
  • semantic relations
  • morphological relations
  • spelling variations
ENGLAWI is supplied with G-PeTo, a series of Perl scripts intended to help extract information from the large XML file. Ready-to-use lexicons extracted from ENGLAWI are also provided (see the download section below).
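
For readers who would rather not use Perl, a file of this size can also be processed in a streaming fashion instead of being loaded into memory at once. The following is a minimal sketch in Python, not part of G-PeTo. The element names `article` and `title` are assumptions made for illustration only; the actual element names should be taken from the online documentation.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_headwords(source, entry_tag: str = "article",
                   title_tag: str = "title") -> Iterator[str]:
    """Yield the text of the `title_tag` child of every `entry_tag` element.

    `source` is a file name or a binary file object. The default tag names
    are hypothetical placeholders, not the documented ENGLAWI schema.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == entry_tag:
            title = elem.findtext(title_tag)
            if title is not None:
                yield title
            # Discard entries that were already processed so that memory
            # use does not grow with the size of the file.
            elem.clear()
            root.clear()


# Tiny in-memory sample that mimics the assumed layout.
sample = b"""<dictionary>
  <article><title>cat</title></article>
  <article><title>dog</title></article>
</dictionary>"""

headwords = list(iter_headwords(io.BytesIO(sample)))
print(headwords)
```

On the real file, `iter_headwords` would be called with the path to the XML file and with the tag names given in the documentation.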


Developers
Franck Sajous, Basilio Calderone and Nabil Hathout

Person in charge
Franck Sajous
Contact:

License/Credit
ENGLAWI is available under a Creative Commons BY-SA 3.0 license (the same license as Wiktionary, from which it has been extracted).
The GLAWI logo was designed by Darwin.

Documentation
A description of the ENGLAWI structure, along with examples of the dictionary content, can be found in the online documentation.


Download
The current version is extracted from Wiktionary's 01/06/2017 dump.
Resources derived from ENGLAWI:
  • DIVAE: DIatopic VAriation of English
  • WIND: Wiktionary INclusion Dates
  • ENGLAFF: an Inflectional Lexicon extracted from ENGLAWI

References
  • Franck Sajous, Basilio Calderone and Nabil Hathout (2020). ENGLAWI: From Human- to Machine-Readable Wiktionary. Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, pp. 3016-3026.