This documentation is based on (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).
GLAWI is a free French Machine-Readable Dictionary encoded in XML format.
It is a structured and normalized version of Wiktionnaire (the French language edition of Wiktionary).
This dictionary includes:
- simple words, compounds and multiword expressions;
- inflected forms and lemmas;
- etymologies;
- pronunciations in IPA;
- definitions (glosses and examples);
- translations;
- semantic relations;
- morphological relations;
- spelling variations.
This page describes how the information encoded in GLAWI is structured.
- Root element: glawi
glawi is the root element.
It has three attributes:
- lang is the Wiktionary language edition on which the resource is based.
Here, fr denotes Wiktionary's French language edition (a.k.a. Wiktionnaire).
- dateDump is the version of Wiktionary's dump used to build GLAWI.
Here, 2015-12-26 refers to Wiktionnaire's dump released on the 26th December 2015.
- endParsingDate indicates when this version of GLAWI was produced
(this attribute may be used as a version identifier).
Example:
The root element contains the articles of the dictionary.
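The three root attributes can be read without loading the whole file, by stopping at the first start tag. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The miniature document, its endParsingDate value and the helper name read_glawi_header are illustrative assumptions, not taken from the GLAWI distribution.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document: lang and dateDump use the values quoted
# above; the endParsingDate value is invented for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<glawi lang="fr" dateDump="2015-12-26" endParsingDate="2016-01-01">
</glawi>
"""

def read_glawi_header(source):
    """Return the attributes of the root element as a dict.

    `source` is a filename or a binary file object. Parsing stops at the
    first start tag, so a large file is not read in full.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag != "glawi":
            raise ValueError(f"unexpected root element: {elem.tag}")
        return dict(elem.attrib)
    raise ValueError("empty document")

header = read_glawi_header(io.BytesIO(SAMPLE))
print(header["lang"], header["dateDump"], header["endParsingDate"])
```

With a real file, the filename would be passed to read_glawi_header in place of the in-memory sample.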
- article
This element corresponds to a page (URL) of Wiktionnaire.
The basic unit of Wiktionnaire's articles is the written form (or grapheme).
A given article may contain several entries having distinct or identical parts of speech (POSs).
A POS section may correspond to a canonical form (i.e. a lemma) or an inflected form.
- pageId
A page identifier (an integer, as found in the dump).
- title
The article's headword (its written form), which is also the title of the corresponding Wiktionnaire web page.
- meta
Metadata, which may be a mix of various optional elements:
- import
Wiktionnaire was initially bootstrapped by automatic imports from dictionary editions that have fallen into the public domain:
mostly the 8th edition (1932-1935) of the Dictionnaire de l'Académie française (DAF8)
and the 2nd edition (1872-1877) of the Littré.
The import element records that an article originates from such an import.
- reference
Reference to another resource: contributors use this field to indicate that they consulted a given resource when editing an article.
Such resources may be online or printed dictionaries, specialized websites, etc.
- category
Just as in Wikipedia, categories are manually assigned to pages in Wiktionary.
GLAWI's category elements indicate the categories an article belongs to (if any).
- spellingVariation
This element indicates that a written form is a spelling variant of another one (e.g. nénuphar/nenufar 'water lily').
See example of spelling variations.
Example of a meta section for the article nénuphar:
- text
Article's content.
It may include pronunciation elements,
an etymology,
one or several pos (part-of-speech) elements
and various subsection elements.
Summary of the child elements of text:
Elements without a link are described in the parent element's description.
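The article layout described above can be traversed with a streaming parser, which suits a resource of this size. The sketch below assumes that pageId and title are child elements of article carrying text content, and that meta and text are container elements. The sample values (page identifier, category label) and the function name iter_articles are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow the description above, but the
# pageId value, the category label and the empty children are invented.
SAMPLE = """<glawi lang="fr" dateDump="2015-12-26" endParsingDate="2016-01-01">
  <article>
    <pageId>1</pageId>
    <title>nénuphar</title>
    <meta>
      <category>example category</category>
      <spellingVariation/>
    </meta>
    <text>
      <pos/>
      <pos/>
    </text>
  </article>
</glawi>
""".encode("utf-8")

def iter_articles(source):
    """Yield one summary dict per article element found in `source`."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "article":
            continue
        meta = elem.find("meta")
        text = elem.find("text")
        yield {
            "pageId": int(elem.findtext("pageId")),
            "title": elem.findtext("title"),
            # Tag names only: how each meta element stores its value is
            # not assumed here.
            "meta": [child.tag for child in meta] if meta is not None else [],
            "pos_count": len(text.findall("pos")) if text is not None else 0,
        }
        elem.clear()  # release the article's subtree once it has been used

for article in iter_articles(io.BytesIO(SAMPLE)):
    print(article)
```

Counting pos children per article reflects the point made above that one page may hold several entries, each with its own part of speech.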