REsources Developed At CLLE-ERSS CLLE-ERSS research unit


This documentation is based on (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).



In Wiktionnaire, 85% of the pages dedicated to lemmas include an etymology section. This section may provide information such as:

  • an attestation date (year or century);
  • a source language;
  • a morphemic decomposition.

Like definition glosses and examples, etymologies are available in four versions:

  • the original wikicode;
  • a plain text version;
  • an XML version that formally encodes specific information;
  • a syntactic parsing of the text.

The figure below depicts the etymology for monoxyle ‘dugout’. This example illustrates that etymology sections may mention an attestation date (1759), a source language (Ancient Greek) and a morphemic decomposition (mono-|-xyle).

XML structure

etymology

<!ELEMENT etymology (etym)*>

The etymology element corresponds to Wiktionnaire's etymology section, which appears at the beginning of an article. Etymology sections are not specific to a given POS section. When an article describes several homonyms or contains several POS sections, the etymology section may give several distinct etymologies. Each of them is encoded by an etym element, described hereafter.
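As a rough illustration of how this structure might be consumed, the following Python sketch reads an etymology element, lists the plain text of each etym child, and checks the children against the content model (labels?, wiki?, xml, txt, parsed?) declared in this section. The sample fragment is hypothetical: only the element names and their order come from the DTD, while the text content and the helper names check_etym and plain_texts are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and order follow the DTD,
# but the text content is invented for illustration only.
SAMPLE = """
<etymology>
  <etym>
    <wiki>wikicode of the first etymology</wiki>
    <xml>XML-encoded version</xml>
    <txt>Plain text of the first etymology.</txt>
    <parsed>syntactic parse</parsed>
  </etym>
  <etym>
    <xml>XML-encoded version</xml>
    <txt>Plain text of the second etymology.</txt>
  </etym>
</etymology>
""".strip()

# Content model of etym: (labels?, wiki?, xml, txt, parsed?)
ORDER = ["labels", "wiki", "xml", "txt", "parsed"]
REQUIRED = {"xml", "txt"}


def check_etym(etym):
    """Return True if the children of etym match the content model."""
    tags = [child.tag for child in etym]
    if not set(tags) <= set(ORDER):
        return False  # unknown child element
    positions = [ORDER.index(tag) for tag in tags]
    # Strictly increasing positions: correct order, no repeated element.
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    return in_order and REQUIRED <= set(tags)


def plain_texts(etymology):
    """Return the content of txt for every etym child, in document order."""
    return [etym.findtext("txt") for etym in etymology.findall("etym")]


root = ET.fromstring(SAMPLE)
for number, etym in enumerate(root.findall("etym"), start=1):
    print(number, check_etym(etym), etym.findtext("txt"))
```

Because an article may carry several etymologies, the sketch iterates over all etym children rather than assuming a single one. The order check is written by hand because ElementTree does not validate against a DTD.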


etym

<!ELEMENT etym (labels?, wiki?, xml, txt, parsed?)>

Child elements are described in the following pages: