ENGLAWI: an ENglish Great Lexicon for Accessing Wiktionary Information
Description
ENGLAWI is an English Machine-Readable Dictionary encoded in XML format.
It is a structured and normalized version of Wiktionary.
The dictionary includes:
simple words, compounds and multiword expressions
inflected forms and lemmas
etymologies
pronunciations in IPA
definitions (glosses and examples)
translations
semantic relations
morphological relations
spelling variations
ENGLAWI is supplied with G-PeTo, a series of Perl scripts intended to help extract information from the large XML file. Ready-to-use lexicons extracted from ENGLAWI are also provided (see the download section below).
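G-PeTo itself is written in Perl. As a language-neutral illustration of why a file this large is usually read incrementally rather than loaded whole, here is a minimal Python sketch that streams entries with the standard library's `xml.etree.ElementTree.iterparse`. The element names `article`, `title` and `gloss` and the helper `iter_glosses` are hypothetical, chosen for illustration only; ENGLAWI's actual tag names are given in the online documentation and may differ.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

# Hypothetical tag names: check the ENGLAWI documentation for the real schema.
ENTRY_TAG = "article"
TITLE_TAG = "title"
GLOSS_TAG = "gloss"


def iter_glosses(source) -> Iterator[Tuple[str, List[str]]]:
    """Stream (headword, glosses) pairs from a large XML dictionary file.

    `source` is a filename or a binary file object. Entries are yielded
    one at a time as soon as their closing tag has been parsed.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ENTRY_TAG:
            continue
        title = elem.findtext(TITLE_TAG, default="").strip()
        # Collect the text of every gloss element nested anywhere in the entry.
        glosses = ["".join(g.itertext()).strip() for g in elem.iter(GLOSS_TAG)]
        yield title, glosses
        # Drop the processed entry's children and text so memory use stays small.
        elem.clear()


if __name__ == "__main__":
    sample = b"""<dictionary>
      <article><title>cat</title><gloss>A small domesticated feline.</gloss></article>
      <article><title>dog</title><gloss>A domesticated canid.</gloss></article>
    </dictionary>"""
    for word, glosses in iter_glosses(io.BytesIO(sample)):
        print(word, glosses)
```

With a real dump, `iter_glosses("path/to/englawi.xml")` would be used instead of the in-memory sample; only the tag constants should need adjusting to the documented structure.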
ENGLAWI is available under a Creative Commons BY-SA 3.0 license (the same license as Wiktionary, from which it has been extracted). The GLAWI logo was designed by Darwin.
Documentation
A description of the ENGLAWI structure, as well as examples of the dictionary content, can be found in the online documentation.
Download
The current version is extracted from Wiktionary's 01/06/2017 dump.
ENGLAFF: an Inflectional Lexicon extracted from ENGLAWI
References
Franck Sajous, Basilio Calderone and Nabil Hathout (2020). ENGLAWI: From Human- to Machine-Readable Wiktionary.
Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020),
Marseille, France, pp. 3016-3026.