ANNODIS_me.zip contains the annotated Glozz files (.aa + .ac + .aam + .as) after normalisation and cleaning. Download: ANNODIS_me.zip
To download the ANNODIS_me corpus without annotations (XML and Glozz files), see the Records of ANNODIS_me section below.
The annotated structures
The following files allow direct access to the annotated structures, which can be selected on the basis of specific properties, and viewed in context:
ANNODIS_SE.xml contains all SE-type structures (Enumerative Structures) with descriptive properties and annotated cues (readmeSE.xml describes the annotations contained in this file);
ANNODIS_CT.xml contains all CT-type structures (Topical Chains) with descriptive properties and annotated cues (readmeCT.xml describes the annotations contained in this file).
To explore these files, click on the following link: ANNODIS_me browser.
The exploration and visualisation of the annotated structures rely on files contained in the ANNODIS_me browser archive.
Records of ANNODIS_me
Corpus constitution and pre-processing
1_ANNODIS_me_Original.zip: the original documents (WIK1 and WIK2: HTML format; GEOP and LING: PDF format).
2_ANNODIS_me_XML.zip: the documents in XML format conforming to TEI-P5 (TEIP5.dtd included), with the transcriptTEI.xsl stylesheet, designed to display each document with a layout that respects the typographical contrasts and arrangement of the original.
3_ANNODIS_me_GlozzFiles.zip: the texts prepared for annotation with the Glozz annotation tool (text files = filename.ac; standoff annotations, including layout and pre-marked features = filename.aa), plus the Glozz annotation model (ANNODIS_me.aam) and the Glozz stylesheet (ANNODIS_me.as) required for annotating and highlighting pre-marked and annotated features and structures (see the annotation manual).
6_ANNODIS_me_AfterAutoClean.zip: Glozz files after three automatic procedures: 1) deletion of unattached units; 2) normalisation of feature structures for cue annotation; 3) flagging of unknown annotations for manual categorisation.
7_ANNODIS_me_Gold.zip: a Gold standard version for the subset of texts that were multi-annotated.
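
The Glozz archives listed above pair each text file (filename.ac) with a standoff annotation file (filename.aa), so an annotation only makes sense once its character offsets are resolved against the text. The following is a minimal Python sketch of that resolution, not the project's own tooling. It assumes that the .aa file is XML in which each `unit` element carries a `characterisation/type` and a `positioning` block with `start` and `end` `singlePosition` indices. These element names, the helper `read_units`, and the toy `item` type are assumptions made for illustration and should be checked against the actual files and the annotation manual.

```python
import xml.etree.ElementTree as ET


def read_units(ac_text, aa_xml):
    """Resolve standoff units against the text they annotate.

    Returns a list of (type, start, end, span_text) tuples.
    Element names are an assumed layout, not a documented guarantee.
    """
    root = ET.fromstring(aa_xml)
    units = []
    for unit in root.iter("unit"):
        unit_type = unit.findtext("characterisation/type", default="")
        start = unit.find("positioning/start/singlePosition")
        end = unit.find("positioning/end/singlePosition")
        if start is None or end is None:
            continue  # skip units without usable offsets
        s, e = int(start.get("index")), int(end.get("index"))
        units.append((unit_type, s, e, ac_text[s:e]))
    return units


# Toy data standing in for a filename.ac / filename.aa pair (hypothetical).
SAMPLE_AC = "First item. Second item."
SAMPLE_AA = """<annotations>
  <unit id="u1">
    <characterisation><type>item</type><featureSet/></characterisation>
    <positioning>
      <start><singlePosition index="0"/></start>
      <end><singlePosition index="11"/></end>
    </positioning>
  </unit>
</annotations>"""

for row in read_units(SAMPLE_AC, SAMPLE_AA):
    print(row)
```

With real files, the same function can be given the contents of a .ac file and its matching .aa file read from disk. The character encoding to use when reading them is not stated here and would need to be verified.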