This documentation is based on Sajous and Hathout (2015)
and Hathout and Sajous (2016).
GLAWI is a free French Machine-Readable Dictionary encoded in XML format.
It is a structured and normalized version of Wiktionnaire (the French language edition of Wiktionary).
This dictionary includes simple words, compounds and multiword expressions;
inflected forms and lemmas;
pronunciations in IPA;
definitions (glosses and examples).
This page describes how the information encoded in GLAWI is structured.
- Root element: glawi
glawi is the root element.
It has three attributes:
- lang is the Wiktionary language edition on which the resource is based.
Here, fr denotes Wiktionary's French language edition (a.k.a. Wiktionnaire).
- dateDump is the version of the Wiktionary dump used to build GLAWI.
Here, 2015-12-26 refers to the Wiktionnaire dump released on 26 December 2015.
- endParsingDate indicates when this version of GLAWI was produced
(this attribute may be used as a version identifier).
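As a minimal sketch of how these three attributes could be read, the Python snippet below parses a hypothetical, empty glawi root with the standard library. The sample string and the endParsingDate value are invented for illustration; only the attribute names and the fr and 2015-12-26 values come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, empty GLAWI document: only the root element and its three
# attributes are shown. The endParsingDate value is invented for illustration.
SAMPLE = '<glawi lang="fr" dateDump="2015-12-26" endParsingDate="2016-01-15"/>'


def read_root_attributes(xml_text):
    """Return the three root attributes described above as a dict."""
    root = ET.fromstring(xml_text)
    if root.tag != "glawi":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {name: root.get(name) for name in ("lang", "dateDump", "endParsingDate")}


root_attributes = read_root_attributes(SAMPLE)
print(root_attributes["lang"], root_attributes["dateDump"])
```

If the actual file is large, a streaming parser such as `xml.etree.ElementTree.iterparse` may be a better fit than loading the whole document at once.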
The root element contains the articles of the dictionary.
Each article corresponds to a page (URL) of Wiktionnaire.
The basic unit of Wiktionnaire's articles is the written form (or grapheme).
A given article may contain several entries having distinct or identical parts of speech (POSs).
A POS section may correspond to a canonical form (i.e. a lemma) or an inflection.
Each article provides:
- A page identifier (an integer, as found in the dump).
- The article's entry (written form), which corresponds to the associated Wiktionnaire web page.
- Metadata, which may be a mix of various optional elements:
Wiktionnaire was initially bootstrapped through automatic imports from dictionary editions that have fallen into the public domain:
mostly the 8th edition (1932-1935) of the Dictionnaire de l'Académie française (DAF8)
and the 2nd edition (1872-1877) of the Littré.
The import element records such an import.
Reference to another resource: contributors use this field to indicate that they consulted a given resource when editing an article.
Such resources may be online or printed dictionaries, specialized websites, etc.
Just as in Wikipedia, categories are manually assigned to pages in Wiktionary.
GLAWI's category elements indicate the categories an article belongs to (if any).
This element indicates that a written form is a spelling variant of another one (e.g. nénuphar/nenufar 'water lily').
See example of spelling variations.
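To make the metadata elements more concrete, here is a small Python sketch that collects the import and category values from a meta fragment. The fragment is invented and is not taken from GLAWI: the nesting of import and category directly under meta, their text content, and the category name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented meta fragment, NOT copied from GLAWI: the nesting, the text
# content of each element and the category name are assumptions.
SAMPLE_META = """
<meta>
  <import>DAF8</import>
  <category>hypothetical category</category>
</meta>
""".strip()


def summarize_meta(xml_text):
    """Collect the text of the import and category children of a meta element."""
    meta = ET.fromstring(xml_text)
    return {
        "imports": [el.text for el in meta.findall("import")],
        "categories": [el.text for el in meta.findall("category")],
    }


meta_summary = summarize_meta(SAMPLE_META)
print(meta_summary)
```

Because the metadata elements are optional, both lists may be empty for a given article, which `findall` handles without special cases.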
Example of a meta section for the article nénuphar:
The text element may include pronunciation elements,
one or several pos (part of speech) elements,
and various subsection elements.
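The following Python sketch shows one way to inspect which kinds of children a text element holds. The fragment is invented: it uses empty pos and subsection elements only, and real entries may carry attributes and content that are not shown here.

```python
import xml.etree.ElementTree as ET

# Invented text fragment: two empty pos children and one empty subsection.
# Real GLAWI elements may carry attributes and content not shown here.
SAMPLE_TEXT = "<text><pos/><pos/><subsection/></text>"


def count_children(xml_text):
    """Count the direct children of a text element by tag name."""
    counts = {}
    for child in ET.fromstring(xml_text):
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts


child_counts = count_children(SAMPLE_TEXT)
print(child_counts)
```

A count of more than one pos child would correspond to an article whose written form has several entries, as described for articles above.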
Summary of the child elements of text:
Elements without a link are described within their parent element's description.