PsychoGLÀFF: a large-scale inflectional French lexicon designed for psycholinguistics
Description
PsychoGLÀFF is a large-scale French inflectional lexicon based on Wiktionnaire,
the French language edition of Wiktionary.
It is derived from GLÀFF and is especially oriented towards psycholinguistics.
PsychoGLÀFF has the following structure:
Each line describes an entry.
An entry consists of fields separated by the | character:
the wordform
the morphosyntactic tag in GRACE format
the lemma
the phonological transcription encoded in IPA
the phonological transcription encoded in SAMPA
the absolute frequency of the categorized form in Frantext 20e corpus
the relative frequency (per million words) of the categorized form in Frantext 20e corpus
the absolute frequency of the categorized lemma in Frantext 20e corpus
the relative frequency (per million words) of the categorized lemma in Frantext 20e corpus
the absolute frequency of the categorized form in LM10 corpus
the relative frequency (per million words) of the categorized form in LM10 corpus
the absolute frequency of the categorized lemma in LM10 corpus
the relative frequency (per million words) of the categorized lemma in LM10 corpus
the absolute frequency of the categorized form in FrWac corpus
the relative frequency (per million words) of the categorized form in FrWac corpus
the absolute frequency of the categorized lemma in FrWac corpus
the relative frequency (per million words) of the categorized lemma in FrWac corpus
the length of the wordform (number of characters)
the length of the phonological transcription (number of phonemes)
the syllabification and the CV structure of the word
the number of syllables
the ratio between the number of syllables and the number of consonants in the phonological form
the geometric mean of the conditional character probabilities of bigrams (calculated on the wordform)
the geometric mean of the conditional character probabilities of trigrams (calculated on the wordform)
the geometric mean of the conditional character probabilities of 4-grams (calculated on the wordform)
the geometric mean of the conditional phoneme probabilities of bigrams (calculated on the phonological transcription)
the geometric mean of the conditional phoneme probabilities of trigrams (calculated on the phonological transcription)
the geometric mean of the conditional phoneme probabilities of 4-grams (calculated on the phonological transcription)
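As an illustration of the line format described above, here is a minimal Python sketch that splits one entry on the | character. It names only the first five fields (wordform, tag, lemma, IPA, SAMPA) and keeps the remaining fields as an ordered tuple. The names Entry and parse_entry are hypothetical, the sample line is made up and truncated (it is not taken from the lexicon), and the sketch assumes that no field value itself contains a | character.

```python
from typing import NamedTuple

SEPARATOR = "|"


class Entry(NamedTuple):
    """One lexicon line; only the first five fields are named here."""
    wordform: str
    tag: str      # morphosyntactic tag in GRACE format
    lemma: str
    ipa: str      # phonological transcription in IPA
    sampa: str    # phonological transcription in SAMPA
    rest: tuple   # remaining fields, kept as strings in file order


def parse_entry(line: str) -> Entry:
    """Split one line on '|' (assumes no field contains the separator)."""
    fields = line.rstrip("\n").split(SEPARATOR)
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    return Entry(*fields[:5], rest=tuple(fields[5:]))


# Made-up, truncated sample line used only to exercise the parser.
sample = "chats|Ncmp|chat|ʃa|Sa|120|1.5"
entry = parse_entry(sample)
print(entry.wordform, entry.tag, entry.lemma, entry.sampa)
```

Keeping the trailing fields as an untyped tuple avoids hard-coding column positions for the frequency and n-gram fields; these can be converted to numbers once their order has been checked against the actual file.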
Quantitative analyses of the orthographic and phonological neighborhood of the wordform are currently being implemented
and will be released in the next version.
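To make the n-gram fields in the list above more concrete, the following Python sketch computes the geometric mean of conditional bigram probabilities for a word. It is only one possible reading of that description: the training data, the handling of word boundaries, and any smoothing used for the lexicon are not stated here, so the toy word list, the absence of boundary symbols, and the function names train_bigram_model and geometric_mean_bigram_probability are all assumptions.

```python
import math
from collections import Counter


def train_bigram_model(words):
    """Count bigrams and their left contexts over a list of words."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            pair_counts[(first, second)] += 1
            context_counts[first] += 1
    return pair_counts, context_counts


def geometric_mean_bigram_probability(word, pair_counts, context_counts):
    """Geometric mean of P(second | first) over the bigrams of `word`."""
    probabilities = []
    for first, second in zip(word, word[1:]):
        count = pair_counts[(first, second)]
        if count == 0:
            return 0.0  # an unseen bigram makes the whole product zero
        probabilities.append(count / context_counts[first])
    if not probabilities:
        raise ValueError("word must contain at least two symbols")
    # exp(mean(log p)) is the geometric mean, computed in log space
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(log_sum / len(probabilities))


# Toy training list (an assumption; not the data used for the lexicon).
pair_counts, context_counts = train_bigram_model(["chat", "chien", "char"])
# For "chat": P(h|c) = 3/3, P(a|h) = 2/3, P(t|a) = 1/2,
# so the result equals (1/3) ** (1/3).
print(geometric_mean_bigram_probability("chat", pair_counts, context_counts))
```

The same functions work on a phonological transcription if it is passed as a sequence of phonemes (for example a list of symbols) rather than a string of characters.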
Basilio Calderone, Nabil Hathout and Franck Sajous. (2014).
From GLÀFF to PsychoGLÀFF: a large psycholinguistics-oriented French lexical resource.
Proceedings of the 16th EURALEX Conference. Bolzano, Italy.