ENGLAWI - documentation of POS element
REDAC
REsources Developed At CLLE (CLLE: Cognition, Langues, Langage, Ergonomie)







POS sections

Description

The basic unit of Wiktionary's articles is the written form. A given article may contain several entries having identical or distinct parts of speech (POS). A POS section may correspond to a canonical form (lemma) or to an inflection.

The figure below (XML version of the home article) shows that the structures of ENGLAWI's and Wiktionary's articles are very close. The written form home has four POS sections (a noun, a verb, an adjective and an adverb), each of which corresponds to a lemma.
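
As a rough illustration of the article layout described above, the sketch below lists the POS types of the lemma sections in a heavily abridged stand-in for the home article. Only the pos element and its type and lemma attributes are taken from this documentation; the article wrapper element, its form attribute and the helper name lemma_types are assumptions made for the example, not the actual ENGLAWI markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abridged stand-in for the "home" article.
# Only <pos> with its type/lemma attributes comes from the documentation;
# the <article> wrapper and its "form" attribute are assumptions.
ARTICLE = """
<article form="home">
  <pos type="noun" lemma="1"/>
  <pos type="verb" lemma="1"/>
  <pos type="adjective" lemma="1"/>
  <pos type="adverb" lemma="1"/>
</article>
"""

def lemma_types(article):
    """List the POS types of the sections that describe a lemma (lemma="1")."""
    return [pos.get("type") for pos in article.iter("pos") if pos.get("lemma") == "1"]

article = ET.fromstring(ARTICLE)
types = lemma_types(article)
print(types)
```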

The attributes and children of the pos elements are described below.



XML structure

pos

<!ELEMENT pos (inflectionInfos?, paradigm?, definitions?, usageNotes?, section*, translations?)>
<!ATTLIST pos
    type (abbreviation|acronym|adjective|adverb|affix|article|conjunction|contraction|determiner|idiom
          |infix|initialism|interjection|letter|noun|number|numeral|participle|particle|phrase
          |postposition|prefix|preposition|prepositionalPhrase|pronoun|properNoun|proverb
          |punctuationMark|suffix|symbol|verb) #REQUIRED
    etymNb                CDATA #IMPLIED
    lemma                 (0|1) #REQUIRED
    misspelling           (0|1) #IMPLIED
    uncountable           (0|1) #IMPLIED
    countable_uncountable (0|1) #IMPLIED
    generally_uncountable (0|1) #IMPLIED
    obsoleteOrArchaic     (0|1) #IMPLIED
    reg                   (0|1) #IMPLIED
    nonComparable         (0|1) #IMPLIED
    pluralNotAttested     (0|1) #IMPLIED
    pluralOnly            (0|1) #IMPLIED
    generallyPlural       (0|1) #IMPLIED
    labels                CDATA #IMPLIED>

Attributes:
  • All:
    • type: main syntactic category (e.g. noun, verb, adjective, adverb, pronoun, etc.)
    • etymNb: when an article has several etymologies, each etymology tag is numbered and the POS section refers to the index number of its etymology.
    • lemma: 0 when the pos section corresponds to one or several inflected form(s), 1 otherwise
    • misspelling: 1 when the pos corresponds to a common misspelling of a word (e.g. accomodate instead of accommodate, de rigeur instead of de rigueur), 0 otherwise
    • obsoleteOrArchaic: 1 when the pos section corresponds to an archaic term (e.g. abalienation) or form (e.g. asleepe for asleep), 0 otherwise
    • labels: doc missing
  • Nouns:
    • uncountable: 1 when the noun is not countable (e.g. news, feedback, etc.), 0 otherwise. Note: countable nouns are unmarked.
    • generally_uncountable: 1 when the noun is generally not countable, but the plural form may be used in some circumstances (e.g. nonsense, negligence, information, etc.), 0 otherwise.
    • countable_uncountable: 1 when a polysemous noun may be countable and uncountable (e.g. beer), 0 otherwise.
    • pluralOnly: 1 indicates pluralia tantum (e.g. alluvials). Absent otherwise.
    • pluralNotAttested: 1 indicates that the plural form of a noun is not attested (e.g. nomadship). Absent otherwise.
    • generallyPlural: 1 indicates nouns that are mostly used in the plural form (e.g. French fries). Absent otherwise.
  • Adjectives:
    • nonComparable (for lemmas): 1 if the adjective is not comparable (e.g. aforewritten), 0 otherwise.
  • Verbs:
    • reg (for lemmas of verbs): 1 if the verb is regular, 0 otherwise.
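
The attribute conventions above can be read programmatically along the following lines. This is a minimal sketch using Python's standard xml.etree.ElementTree: the attribute names come from the DTD, but the entries wrapper element, the sample values and the helper names flag and countability are made up for illustration. Following the note that countable nouns are unmarked, a noun with none of the three countability flags set is reported as countable.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: attribute names come from the DTD,
# but the <entries> wrapper and the values are invented for illustration.
SAMPLE = """
<entries>
  <pos type="noun" lemma="1" uncountable="1"/>
  <pos type="noun" lemma="1" countable_uncountable="1"/>
  <pos type="noun" lemma="1"/>
  <pos type="verb" lemma="1" reg="0"/>
  <pos type="verb" lemma="0"/>
</entries>
"""

def flag(pos, name):
    """Return True when an optional 0/1 attribute is present and set to 1."""
    return pos.get(name) == "1"

def countability(pos):
    """Summarize the noun countability flags; unmarked nouns count as countable."""
    if pos.get("type") != "noun":
        return None
    if flag(pos, "uncountable"):
        return "uncountable"
    if flag(pos, "generally_uncountable"):
        return "generally uncountable"
    if flag(pos, "countable_uncountable"):
        return "countable and uncountable"
    return "countable"

root = ET.fromstring(SAMPLE)
# One (type, is_lemma, countability) triple per pos section.
summary = [(p.get("type"), p.get("lemma") == "1", countability(p))
           for p in root.iter("pos")]
for row in summary:
    print(row)
```

Because most of these attributes are #IMPLIED, the helper treats a missing attribute the same way as an explicit 0.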
Child elements:
  • inflectionInfos: when a POS section describes an inflected form, the inflectionInfos element provides information about the type of inflection and its lemma. See the corresponding page.
  • paradigm: when a POS section describes a lemma, the paradigm element and its inflection children give all the inflected forms of the paradigm (when they are present in Wiktionary). See the corresponding page.
  • definitions: see the corresponding page
  • usageNotes: see the corresponding page
  • section: morphology and lexical semantics sections are described in a dedicated page
  • translations: described in a dedicated page
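
The order and cardinality of these child elements are fixed by the content model in the ELEMENT declaration. The sketch below checks a pos element against that content model with a regular expression built from the declaration; the function name children_match_model and the two sample elements are hypothetical, and the check covers child order only, not attributes or nested content.

```python
import re
import xml.etree.ElementTree as ET

# Content model copied from the DTD: each child name with its occurrence marker.
CONTENT_MODEL = [
    ("inflectionInfos", "?"),
    ("paradigm", "?"),
    ("definitions", "?"),
    ("usageNotes", "?"),
    ("section", "*"),
    ("translations", "?"),
]

def children_match_model(pos):
    """Check that the child tags of a pos element follow the DTD content model."""
    pattern = "".join(f"(?:{name} ){marker}" for name, marker in CONTENT_MODEL)
    tags = "".join(f"{child.tag} " for child in pos)
    return re.fullmatch(pattern, tags) is not None

# Hypothetical samples: one in DTD order, one with children out of order.
ok = ET.fromstring(
    '<pos type="noun" lemma="1"><definitions/><section/><section/><translations/></pos>'
)
bad = ET.fromstring(
    '<pos type="noun" lemma="1"><translations/><definitions/></pos>'
)
print(children_match_model(ok), children_match_model(bad))
```

A validating parser run against the actual DTD would be the more thorough option; this lightweight check is only meant to make the ordering rule concrete.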

