ENGLAWI
Documentation
ENGLAWI is a free English Machine-Readable Dictionary encoded in XML format.
It is a structured and normalized version of Wiktionary.
This dictionary includes:
- simple words, compounds and multiword expressions;
- inflected forms and lemmas;
- etymologies;
- pronunciations in IPA;
- definitions (glosses and examples);
- translations;
- semantic relations;
- morphological relations;
- spelling variations.
This page describes how the information encoded in ENGLAWI is structured.
- Root element: glawi
glawi is the root element.
It has three attributes:
- lang is the Wiktionary language edition on which the resource is based.
Here, en refers to the English-language edition of Wiktionary.
- dateDump identifies the version of the Wiktionary dump used to build ENGLAWI.
Here, 2017-06-01 refers to the Wiktionary dump released on 1 June 2017.
- endParsingDate indicates when this version of ENGLAWI was produced
(this attribute may be used as a version identifier).
Example:
The root element contains the dictionary's articles.
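As an illustration of the three root attributes, here is a minimal Python sketch that reads them without loading the whole dictionary. The sample string is a hypothetical stand-in for the start of an ENGLAWI file: the attribute names come from the description above, but the endParsingDate value and the function name are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an ENGLAWI file. Attribute names follow the
# description above; the endParsingDate value is invented.
SAMPLE = '<glawi lang="en" dateDump="2017-06-01" endParsingDate="2017-07-01"></glawi>'


def read_root_attributes(source):
    """Return the attributes of the root element of an open XML source."""
    # With "start" events, the root element is reported as soon as its
    # opening tag has been parsed, before any article is read.
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag != "glawi":
            raise ValueError(f"unexpected root element: {elem.tag}")
        return dict(elem.attrib)
    raise ValueError("empty document")


attrs = read_root_attributes(io.BytesIO(SAMPLE.encode("utf-8")))
print(attrs["lang"], attrs["dateDump"], attrs["endParsingDate"])
```

Stopping at the first start event matters for a resource of this size, since the root's closing tag only appears at the very end of the file.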
- article
This element corresponds to a page (URL) of Wiktionary.
The basic unit of Wiktionary's articles is the written form (or grapheme).
A given article may contain several entries having distinct or identical parts of speech (POSs).
A POS section may correspond to a canonical form (i.e. a lemma) or an inflection.
- pageId
A page identifier (an integer, as found in the dump).
- title
The article's entry/written form, which corresponds to Wiktionary's associated web page.
- meta
Metadata, which may be a mix of two optional elements:
- reference
Reference to another resource: contributors use this field to indicate that they consulted a given resource when editing an article.
Such resources may be online or printed dictionaries (e.g. the 5th edition of the OED or the 1976 edition of the Merriam-Webster), specialized websites, etc.
- category
Just as in Wikipedia, categories are manually assigned to pages in Wiktionary.
ENGLAWI's category elements indicate the categories an article belongs to (if any).
Categories may correspond to domains (Golf, Anatomy), specific categories of words (English words suffixed with -ism, English basic words), etc.
Categories may occur in regular articles or in Wiktionary's thesaurus (Wikisaurus). A category found in Wikisaurus is signaled by the attribute wikisaurus.
For example, the article LSD belongs to Wiktionary's category "Recreational drugs" and is assigned to the "Intoxication and intoxicants" category in Wikisaurus.
The resulting XML is the following:
Example of a meta section for the article abbey:
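Separately from ENGLAWI's own XML examples, the following Python sketch shows one way the category elements of a meta section might be read. The fragment is a hypothetical reconstruction based on the LSD description above, not actual ENGLAWI output: it assumes that the category name is the element's text content and that the wikisaurus attribute carries the value "true". Only the presence of the attribute is tested, so the code does not depend on that value.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the LSD description above.
# The element and attribute names follow the text; storing the category
# name as text content and the value "true" are assumptions.
SAMPLE = """
<meta>
  <category>Recreational drugs</category>
  <category wikisaurus="true">Intoxication and intoxicants</category>
</meta>
"""


def split_categories(meta):
    """Separate regular categories from those found in Wikisaurus."""
    regular, wikisaurus = [], []
    for cat in meta.findall("category"):
        name = (cat.text or "").strip()
        # Test only for the attribute's presence, not its value.
        if "wikisaurus" in cat.attrib:
            wikisaurus.append(name)
        else:
            regular.append(name)
    return regular, wikisaurus


regular, wikisaurus = split_categories(ET.fromstring(SAMPLE))
print(regular)
print(wikisaurus)
```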
- text
The article's content.
It may include pronunciation elements, etymologies, one or more pos (part of speech) sections and a section that lists alternative forms.
Summary of the child elements of text:
Elements without a link are described in their parent element's description.
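The article structure described on this page can be explored with a short streaming script. The sketch below is written under several assumptions: pageId and title are taken to be child elements of article (they could equally be attributes), and every element name inside text other than pos is a placeholder, as are all the values in the sample. The function simply counts whichever child tags it finds under text, so it does not rely on those placeholder names.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature file. The names glawi, article, pageId, title, meta,
# text and pos come from the description above; the other element names and
# all values are placeholders.
SAMPLE = """<glawi lang="en" dateDump="2017-06-01" endParsingDate="2017-07-01">
  <article>
    <pageId>1</pageId>
    <title>abbey</title>
    <meta/>
    <text>
      <pronunciations/>
      <etymology/>
      <pos/>
      <pos/>
    </text>
  </article>
</glawi>"""


def iter_articles(source):
    """Yield (page_id, title, child-tag counts of text) for each article."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "article":
            continue
        text = elem.find("text")
        tags = Counter(child.tag for child in text) if text is not None else Counter()
        yield int(elem.findtext("pageId")), elem.findtext("title"), tags
        # Drop the article's subtree once the caller has processed it.
        elem.clear()


for page_id, title, tags in iter_articles(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(page_id, title, dict(tags))
```

Processing one article at a time with iterparse, rather than parsing the whole file into a tree, is a reasonable choice for a dictionary-sized XML file.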