ANNODIS_me browser for exploring and visualising the annotated structures
What are "multi-level structures"?
Structures which may appear at different granularity levels,
including very high levels, and which are therefore of interest as building
blocks in the construction of text. The structures annotated in the
ANNODIS resource form segments ranging from two sentences to
several sub-sections.
What multi-level structures have been annotated?
Enumerative structures + cues signalling them
Definition:
Enumerative structures (ESs for short) are
segments resulting from a text organisation strategy whereby text
elements are presented as having equal status with regard to a specific
interpretation criterion (the co-enumerability criterion). They are
characterised by an internal structure involving the following
sub-segments:
a trigger (optional): a segment which introduces the enumeration;
several items: the segments which make up the enumeration (at least two items must be identified for a structure to be annotated);
a closure (optional): a segment which summarises or closes the enumeration.
The annotated objects are:
the trigger (if there is one)
the items (at least two)
the closure (if there is one)
the enumeraTheme (if there is one), i.e. the expression specifying the co-enumerability criterion
the cues associated with the four preceding objects
finally the enumerative structure itself, i.e. the segment which contains these objects.
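As an illustration only, the annotated objects listed above could be modelled as follows. This is a minimal sketch, not the ANNODIS data format: the class names, the field names and the representation of segments as character offsets are all assumptions made for the example. It encodes the two constraints stated in the definition, namely that at least two items are required and that the enumerative structure is the segment containing the other objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A text span as character offsets (hypothetical representation)."""
    start: int
    end: int

    def contains(self, other: "Segment") -> bool:
        # True when `other` lies entirely inside this span.
        return self.start <= other.start and other.end <= self.end


@dataclass
class EnumerativeStructure:
    """Hypothetical container for one annotated ES."""
    span: Segment                             # the ES itself
    items: List[Segment]                      # at least two are required
    trigger: Optional[Segment] = None         # optional introducing segment
    closure: Optional[Segment] = None         # optional closing segment
    enumera_theme: Optional[Segment] = None   # co-enumerability criterion, if expressed
    cues: List[Segment] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.items) < 2:
            raise ValueError("an enumerative structure needs at least two items")
        parts = list(self.items) + list(self.cues)
        optional = (self.trigger, self.closure, self.enumera_theme)
        parts += [p for p in optional if p is not None]
        for part in parts:
            if not self.span.contains(part):
                raise ValueError(f"{part} lies outside the structure span")


# Usage: a structure with a trigger and two items, no closure.
es = EnumerativeStructure(
    span=Segment(0, 300),
    trigger=Segment(0, 40),
    items=[Segment(41, 150), Segment(151, 300)],
)
print(len(es.items))  # 2
```

Rejecting a structure with fewer than two items at construction time mirrors the annotation rule, so an invalid structure can never be built by accident.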
Topical chains + cues signalling them
Definition:
Topical chains (TCs for short) are a specific form of cohesive chain, i.e. topically homogeneous segments
composed of sentences containing topical co-referential expressions.
The annotated objects are:
a segment called "segment"
the associated topical continuity cues.
NB: TCs may contain sentences which are not topically connected (e.g.
comments, illustrations, etc.) if they occur between connected units.
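The NB above can be read as a simple boundary condition, sketched here under stated assumptions. The function name and the boolean encoding of sentences are hypothetical, and the minimum of two sentences is borrowed from the general statement that annotated structures span at least two sentences. Under this reading, unconnected sentences are tolerated inside a chain, but a chain can neither start nor end with one.

```python
from typing import List


def is_well_formed_chain(connected: List[bool]) -> bool:
    """Check a candidate topical chain.

    connected[i] is True when sentence i contains a topical
    co-referential expression, False for an unconnected sentence
    (comment, illustration, etc.).
    """
    # Assumption: a chain spans at least two sentences, and unconnected
    # sentences may only occur between connected ones.
    return len(connected) >= 2 and connected[0] and connected[-1]


print(is_well_formed_chain([True, False, True]))   # True
print(is_well_formed_chain([True, True, False]))   # False
```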
Overview of the corpus annotated with multi-level structures
Corpus                                          Nb ESs  ES cues  Nb TCs  TC cues
WIKI (Wikipedia, 30 articles, 231,000 words)       401    2,210     266    1,853
LING (CMLF08, 25 articles, 169,000 words)          297    1,230      88      478
GEOP (IFRI, 32 articles, 266,000 words)            293    1,209     234    1,125
Total                                              991    4,649     588    3,456
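The last four figures of the overview are the sums over the three sub-corpora. A short sketch re-deriving them from the per-corpus counts is given below. The dictionary layout and variable names are illustrative assumptions, and the article and word totals are simply derived from the figures quoted for each sub-corpus.

```python
# Figures copied from the overview above:
# (articles, words, ESs, ES cues, TCs, TC cues)
counts = {
    "WIKI": (30, 231_000, 401, 2_210, 266, 1_853),
    "LING": (25, 169_000, 297, 1_230, 88, 478),
    "GEOP": (32, 266_000, 293, 1_209, 234, 1_125),
}

# Sum each column across the three sub-corpora.
totals = tuple(sum(column) for column in zip(*counts.values()))

print(totals[2:])  # (991, 4649, 588, 3456)
print(totals[:2])  # (87, 666000)
```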
Publications
Colléter M., Fabre C., Ho-Dac L.-M., Péry-Woodley M.-P., Rebeyrolle J. & Tanguy L. (2012). La ressource ANNODIS multi-échelle : guide d'annotation et bonus, Carnets de grammaires 20, CLLE-ERSS.
Ho-Dac L.-M., Fabre C., Péry-Woodley M.-P., Rebeyrolle J. & Tanguy L. (2012).
An empirical approach to the signalling of enumerative structures, Discours 10.
Ho-Dac L.-M., Péry-Woodley M.-P., Tanguy L. (2010).
Anatomie des structures énumératives, TALN 2010, ATALA, Université de Montréal, Montréal, July 2010.
Ho-Dac L.-M., Fabre C., Péry-Woodley M.-P. & Rebeyrolle J. (2010).
On the signalling of multi-level discourse structures, MAD 2010: Multidisciplinary Perspectives on Signalling Text Organisation,
Moissac (France), 17-20 March 2010, pp. 94-105.
Conferences
Ho-Dac L.-M., Fabre C., Péry-Woodley M.-P., Rebeyrolle J. & Tanguy L. (2011).
High-level discourse structures: Topical Chains and Enumerative Structures in a diversified annotated corpus,
Corpus Linguistics, Birmingham, 2011.
Ho-Dac L.-M., Fabre C., Péry-Woodley M.-P. & Rebeyrolle J. (2009).
Des indices aux marqueurs : méthodes de découverte de marqueurs discursifs complexes,
Linguistic and Psycholinguistic Approaches to Text Structuring, Paris, 21-23 September 2009.
Ho-Dac L.-M., Fabre C., Péry-Woodley M.-P. & Rebeyrolle J. (2009).
Corpus annotation of macro discourse structures,
1st International Conference on Corpus Linguistics, CILC-09, Murcia, 2009.