GLAWI - documentation of POS element
REDAC
REsources Developed At CLLE-ERSS CLLE-ERSS research unit






GLAWI
Documentation
This documentation is based on (Sajous and Hathout, 2015) and (Hathout and Sajous, 2016).

POS sections

Description

The basic unit of Wiktionnaire's articles is the written form (grapheme). A given article may contain several entries having identical or distinct parts of speech (POSs). A POS section may correspond to a canonical form (lemma) or to an inflection.

The figure below (XML version of the mousse article) shows that the structures of GLAWI's and Wiktionnaire's articles are very close. The written form corresponds to a feminine noun (lemma), two masculine nouns (lemmas), an adjective (lemma) and several inflected verbal forms.

The attributes and children of the pos elements are described below.



XML structure

pos
<!ELEMENT pos (pronunciations?, inflectionInfos?, paradigm?, definitions?, subsection*, translations?)>
<!ATTLIST pos
    type                  CDATA                                      #IMPLIED
    homoNb                CDATA                                      #IMPLIED
    lemma                 (0|1)                                      "1"
    locution              (0|1)                                      "0"
    gender                (m|f|e)                                    #IMPLIED
    number                (s|p|sp|singulareTantum|pluraleTantum)     #IMPLIED
    equivMasc             CDATA                                      #IMPLIED
    equivFem              CDATA                                      #IMPLIED
    demonym               (0|1)                                      "0" #IMPLIED
    inconsistentNumber    CDATA                                      #IMPLIED
    inconsistentGender    CDATA                                      #IMPLIED
    genderFromRefLexicon  CDATA                                      #IMPLIED
    numberFromRefLexicon  CDATA                                      #IMPLIED>

Attributes:
  • type: main syntactic category (e.g. nom, verbe, adjectif, adverbe, pronom, etc.)
  • homoNb: homograph number, present when several POS sections correspond to the same syntactic category (e.g. three noun POS sections in the article mousse), absent otherwise.
  • lemma: 0 when the pos section corresponds to one or several inflected form(s), 1 otherwise
  • locution: 1 when the entry is a locution, 0 otherwise. Until recently, Wiktionnaire's wikicode provided contributors with specific templates to encode locutions, e.g. {{-loc-nom-|fr}}, {{-loc-adj-|fr}}, etc. Today, an entry is marked as a locution when its written form contains a blank character. See, e.g., the wikicode for nouvelle cuisine in 2012 and in 2014. Since 2014, the same wikicode {{S|nom|fr}} produces the heading Locution nominale `noun phrase' in the article nouvelle cuisine, while it produces the heading `noun' in the article cuisine.
  • gender: gender of nouns and adjectives. Values are m (masculine), f (feminine) or e (epicene) when the feminine and the masculine forms are the same (e.g. the noun journaliste `journalist').
  • number: number of nouns and adjectives. Values are s (singular), p (plural), sp when the singular and the plural inflections have the same written form (e.g. the noun encas `snack'), singulareTantum when only the singular form exists (e.g. the nouns internet, heur `luck', chemical elements like nitroglycérol `ethylene glycol dinitrate' and languages/dialects like l'angevin, spoken in the Angers area), or conversely pluraleTantum when only the plural form exists (e.g. the nouns obsèques `funeral', échecs `chess' or arrhes `down payment').
  • inconsistentGender: used to indicate that two different genders are mentioned in Wiktionnaire (with no epicene mention)
  • inconsistentNumber: used to indicate that two different numbers are mentioned in Wiktionnaire (with no sp mention)
  • equivMasc: masculine equivalent of nouns (e.g. développeur ‘male software developer’ is the masculine equivalent of développeuse ‘female software developer’)
  • equivFem: feminine equivalent of nouns (e.g. tante ‘aunt’ is the feminine equivalent of oncle ‘uncle’)
  • demonym: present when the mention gentilé ‘demonym’ is found in Wiktionnaire
  • genderFromRefLexicon, numberFromRefLexicon: present when missing numbers or genders have been found in reference lexicons (12 cases only, all adjectives).
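
To make the attribute semantics concrete, here is a minimal Python sketch that reads pos elements and applies the lemma and locution defaults declared in the ATTLIST. It is an illustration under stated assumptions, not GLAWI's own tooling: the <article> wrapper and the homoNb values 1, 2, 3 are hypothetical, and the sample only mirrors the mousse description given earlier (one feminine noun, two masculine nouns, one adjective, one section of inflected verbal forms). The demonym default is left out because the declaration above does not make it clear.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <article> wrapper and the homoNb values are
# assumptions made for this sketch; only the attribute names and their
# allowed values come from the ATTLIST above.
SAMPLE = """
<article>
  <pos type="nom" homoNb="1" gender="f"/>
  <pos type="nom" homoNb="2" gender="m"/>
  <pos type="nom" homoNb="3" gender="m"/>
  <pos type="adjectif"/>
  <pos type="verbe" lemma="0"/>
</article>
"""

# Defaults declared in the ATTLIST. ElementTree does not read the DTD,
# so the defaults are applied by hand here.
DEFAULTS = {"lemma": "1", "locution": "0"}


def pos_attributes(pos):
    """Return the attributes of a <pos> element with DTD defaults filled in."""
    attrs = dict(DEFAULTS)
    attrs.update(pos.attrib)
    return attrs


def summarize(xml_text):
    """Parse an XML fragment and return one attribute dict per <pos> section."""
    root = ET.fromstring(xml_text)
    return [pos_attributes(p) for p in root.iter("pos")]


sections = summarize(SAMPLE)
lemmas = [s for s in sections if s["lemma"] == "1"]
inflected = [s for s in sections if s["lemma"] == "0"]

for s in sections:
    print(s["type"], s.get("homoNb", "-"), s.get("gender", "-"), s["lemma"])
```

Optional attributes such as homoNb and gender are read with a fallback because they are #IMPLIED and may be absent.
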
Child elements:
  • pronunciations: top-level pronunciations may occur out of POS sections (at the bottom of the page) and are encoded as children of text nodes. However, pronunciations may also be found in agreement templates and in the ligne de forme (in POS sections, header starting with the entry form). For example, the ligne de forme of the first noun POS section for mousse is:
         mousse \mus\ féminin
    where \mus\ is the pronunciation of mousse.
    Such pronunciations are reported in pron children of pronunciations elements. Unlike for top-level pronunciations, no area is mentioned.
  • inflectionInfos: when a POS section describes one or several inflected forms, the inflectionInfos element enumerates the morphosyntactic features of the forms and their lemmas. See the corresponding page.
  • paradigm: when a POS section describes a lemma, the inflectional paradigm element and its inflection children give all the inflected forms of the paradigm (when they are present in Wiktionnaire). See the corresponding page.
  • definitions: see the corresponding page
  • subsection (morphology and lexical semantics) and translations elements are also described in a dedicated page.
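
The child elements listed above can be inspected with a short Python sketch like the following. It rests on assumptions that this page does not confirm: that a pron element holds the transcription as plain text, and that an empty definitions element is acceptable as a placeholder. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <pos> section. That <pron> carries the transcription as plain
# text, and the empty <definitions/> placeholder, are assumptions made for
# this sketch.
SAMPLE_POS = """
<pos type="nom" homoNb="1" gender="f">
  <pronunciations>
    <pron>mus</pron>
  </pronunciations>
  <definitions/>
</pos>
"""

# Child elements in the order given by the <!ELEMENT pos ...> declaration.
CHILD_ORDER = [
    "pronunciations",
    "inflectionInfos",
    "paradigm",
    "definitions",
    "subsection",
    "translations",
]


def pos_pronunciations(pos):
    """Return the text of every <pron> found under <pronunciations>."""
    container = pos.find("pronunciations")
    if container is None:
        return []
    return [p.text.strip() for p in container.findall("pron") if p.text]


def present_children(pos):
    """List which of the declared child elements occur in this <pos>."""
    found = {child.tag for child in pos}
    return [name for name in CHILD_ORDER if name in found]


pos = ET.fromstring(SAMPLE_POS)
print(pos_pronunciations(pos))
print(present_children(pos))
```

Every child except subsection is optional in the declaration, so both helpers tolerate missing elements instead of raising.
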

