This documentation is based on (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).



Inside pos tags, the definitions (plural) element may contain several definition (singular) children, each describing one word sense. A definition contains a gloss and, optionally, one or more usage examples. Glosses and examples may carry labels that give attitudinal, diatopic, diachronic or diafrequential information, or that indicate that the word belongs to a specialized language.

Each gloss and example is available in four versions:

  • the original wikicode;
  • a plain text version;
  • an XML version that formally encodes specific information: markup elements encode wiki typesetting (boldface, italics, etc.), dates, foreign words, mathematical and chemical formulae, and external and internal links (described further on a separate page);
  • a syntactic parsing of the text in CoNLL format produced by the Talismane parser.

XML structure

<!ELEMENT definitions (definition)*>
<!ELEMENT definition (gloss?, example*)>
<!ELEMENT gloss (labels?, wiki?, xml, txt, parsed?)>
<!ELEMENT example (labels?, wiki?, xml, txt, parsed?)>
<!ELEMENT labels (label)*>
<!ELEMENT label EMPTY>
<!ATTLIST label
  type (attitudinal|diachronic|diafrequential|diatopic|domain|gram|loan|other|sem|usage) "other"
  value CDATA #REQUIRED>

The wiki, xml, txt and parsed elements are described on a separate page. Labels are also described on a separate page.
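
As an illustration, the structure above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not part of the resource: the sample fragment, its text content and the helper names read_block and read_definitions are invented for the example. Because ElementTree does not validate against the DTD, the default value "other" of the type attribute is applied by hand.

```python
import xml.etree.ElementTree as ET

# Invented fragment that follows the DTD above (not taken from the resource).
SAMPLE = """
<definitions>
  <definition>
    <gloss>
      <labels>
        <label type="domain" value="zoology"/>
      </labels>
      <xml>Small domestic mammal.</xml>
      <txt>Small domestic mammal.</txt>
    </gloss>
    <example>
      <xml>The cat is asleep.</xml>
      <txt>The cat is asleep.</txt>
    </example>
  </definition>
  <definition>
    <gloss>
      <xml>Second sense.</xml>
      <txt>Second sense.</txt>
    </gloss>
  </definition>
</definitions>
"""


def read_block(elem):
    """Return (labels, plain text) for a gloss or example element."""
    labels = [
        # ElementTree does not read DTD defaults, so "other" is supplied here.
        (label.get("type", "other"), label.get("value"))
        for label in elem.findall("labels/label")
    ]
    text = elem.findtext("txt", default="")
    return labels, text


def read_definitions(root):
    """Collect one dict per definition (word sense) under a definitions element."""
    senses = []
    for definition in root.findall("definition"):
        gloss = definition.find("gloss")  # gloss is optional in the DTD
        if gloss is not None:
            gloss_labels, gloss_text = read_block(gloss)
        else:
            gloss_labels, gloss_text = [], ""
        examples = [read_block(ex)[1] for ex in definition.findall("example")]
        senses.append(
            {"gloss": gloss_text, "labels": gloss_labels, "examples": examples}
        )
    return senses


senses = read_definitions(ET.fromstring(SAMPLE))
for sense in senses:
    print(sense["gloss"], sense["labels"], sense["examples"])
```

Only the txt version is read here; the wiki, xml and parsed versions can be reached in the same way with find or findtext on the gloss or example element.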

