Some rights are reserved.
Each entry gives the program, its description, and the Perl modules it requires.
• splitGLAWI.pl
splits the large GLAWI file into several smaller files
Command line: perl splitGLAWI.pl GLAWI.xml size(MB) dstDir filePrefix
Example: perl splitGLAWI.pl GLAWI.xml 100 /tmp/SPLITS/ glawiSplit
→ produces files of 100 MB each, located in directory /tmp/SPLITS/ and named glawiSplit-1.xml, glawiSplit-2.xml ... glawiSplit-N.xml
Requires: none
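The size-bounded splitting can be sketched in Python (the script itself is Perl, and this is not its code). The sketch assumes the input has already been cut into complete serialized articles, so no article is split across two files; it leaves out any XML header or root element an output file may need, and it takes the limit in bytes because the description does not say how the size argument is converted. The function names are hypothetical.

```python
# Sketch of size-bounded splitting (illustrative; not the code of splitGLAWI.pl).
# Assumption: input is a list of complete serialized articles, so chunk
# boundaries always fall between two articles.
from typing import List


def split_by_size(articles: List[str], max_bytes: int) -> List[List[str]]:
    """Group articles into chunks whose UTF-8 size stays within max_bytes.

    An article larger than max_bytes is placed alone in its own chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for art in articles:
        n = len(art.encode("utf-8"))
        # Start a new chunk if adding this article would exceed the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(art)
        size += n
    if current:
        chunks.append(current)
    return chunks


def chunk_names(prefix: str, count: int) -> List[str]:
    """Name chunks prefix-1.xml ... prefix-N.xml."""
    return [f"{prefix}-{i}.xml" for i in range(1, count + 1)]


if __name__ == "__main__":
    arts = ["<article>a</article>"] * 5  # 20 bytes each
    chunks = split_by_size(arts, 50)
    print([len(c) for c in chunks])  # [2, 2, 1]
    print(chunk_names("glawiSplit", len(chunks)))
```

Cutting only between articles keeps every article intact, at the cost of chunks that are at most, rather than exactly, the requested size.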
• extractArticle.pl
extracts the single article whose title exactly matches the given title
Command line: perl extractArticle.pl GLAWI.xml title [outFile]
Example: perl extractArticle.pl GLAWI.xml dictionnaire dict.xml
→ extracts the article "dictionnaire"
Requires: none
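A minimal Python sketch of the exact-title lookup, assuming articles have already been parsed into a hypothetical `Article` record holding the title and the serialized XML. "Exact match" is read here as a case-sensitive string comparison, so "dictionnaire" and "Dictionnaire" are different titles; this is an interpretation, not something the description spells out.

```python
# Sketch of an exact-title lookup (illustrative; not extractArticle.pl itself).
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Article:
    """Hypothetical in-memory stand-in for one GLAWI article."""
    title: str
    xml: str  # serialized article


def find_article(articles: Iterable[Article], title: str) -> Optional[Article]:
    """Return the first article whose title equals `title` exactly, or None."""
    for art in articles:
        if art.title == title:  # exact, case-sensitive comparison
            return art
    return None


if __name__ == "__main__":
    arts = [Article("Dictionnaire", "<article>1</article>"),
            Article("dictionnaire", "<article>2</article>")]
    print(find_article(arts, "dictionnaire").xml)  # <article>2</article>
```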
• extractArticles.pl
extracts articles whose titles match the specified regexp.
Command line: perl extractArticles.pl GLAWI.xml regexp [outFile]
Example: perl extractArticles.pl GLAWI.xml "^anti" anti.xml
→ extracts all articles whose titles start with "anti" (e.g. entries with the anti- prefix)
Requires: none
• extractTitles.pl
Same as extractArticles.pl, but extracts titles only instead of whole articles.
Requires: none
• extractArticlesWithLabelValue.pl
extracts articles having a definition including a label whose value matches the specified one (whatever its type).
The label value is to be matched against a case-insensitive regexp.
Command line: perl extractArticlesWithLabelValue.pl GLAWI.xml labelValueRegexp [outFile]
Examples: perl extractArticlesWithLabelValue.pl GLAWI.xml "^vieilli\$" dated.xml
→ extracts articles with at least one gloss including a vieilli (dated) label value.
perl extractArticlesWithLabelValue.pl GLAWI.xml "^chimie\$" chemistry.xml
→ extracts articles with at least one gloss related to the chimie (chemistry) domain.
Requires: none
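A sketch of the label-value test, assuming a hypothetical in-memory model in which each gloss carries a list of (category, value) label pairs. Only the value is tested, whatever the category, and the match is case-insensitive as the description states. The same test underlies extractTitlesWithLabelValue.pl, which outputs titles instead of articles.

```python
# Sketch of label-value matching (illustrative; not the scripts' code).
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Gloss:
    text: str
    # Hypothetical (category, value) pairs for the labels attached to this gloss.
    labels: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Article:
    title: str
    glosses: List[Gloss]


def has_label_value(article: Article, pattern: str) -> bool:
    """True if at least one gloss has a label whose value matches `pattern`
    case-insensitively, whatever the label's category."""
    rx = re.compile(pattern, re.IGNORECASE)
    return any(rx.search(value)
               for gloss in article.glosses
               for _category, value in gloss.labels)


if __name__ == "__main__":
    a = Article("exemple", [Gloss("Première définition.", [("usage", "Vieilli")])])
    print(has_label_value(a, "^vieilli$"))  # True
```

On the command line, `\$` inside double quotes reaches the script as a plain `$`; in a POSIX shell, single quotes ('^vieilli$') give the same result.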
• extractTitlesWithLabelValue.pl
extracts titles of articles having a definition including a label whose value matches the specified one (whatever its type).
The label value is to be matched against a case-insensitive regexp.
Command line: perl extractTitlesWithLabelValue.pl GLAWI.xml labelValueRegexp [outFile]
Example: perl extractTitlesWithLabelValue.pl GLAWI.xml "^vieilli\$" dated.xml
→ extracts the titles of articles with at least one gloss including a vieilli (dated) label value.
Requires: none
• extractTitlesWithLabelValueAllSenses.pl
same as the previous script (extractTitlesWithLabelValue.pl), but the label has to be found in every gloss: either in the single gloss of a monosemic entry, or in all the glosses of a polysemic entry (for a given POS).
The label value is to be matched against a case-insensitive regexp.
Command line: perl extractTitlesWithLabelValueAllSenses.pl GLAWI.xml labelValueRegexp [outFile]
Example: perl extractTitlesWithLabelValueAllSenses.pl GLAWI.xml "^vieilli\$" dated.xml
→ extracts the titles of articles whose glosses all include a vieilli (dated) label value.
Requires: none
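The "every gloss" condition can be sketched per POS section. The data model is hypothetical, and one point is an assumption: the sketch accepts a title when at least one of its POS sections has all of its glosses marked, which covers the monosemic case (a single gloss) and the polysemic case alike.

```python
# Sketch of the "label in every gloss" test (illustrative; not the script's code).
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gloss:
    text: str
    label_values: List[str] = field(default_factory=list)  # hypothetical


@dataclass
class PosSection:
    pos: str
    glosses: List[Gloss]


@dataclass
class Article:
    title: str
    sections: List[PosSection]


def labelled_in_all_senses(article: Article, pattern: str) -> bool:
    """True if some POS section has at least one gloss and every one of its
    glosses carries a label value matching `pattern` (case-insensitive).
    Accepting the title when any single section qualifies is an assumption."""
    rx = re.compile(pattern, re.IGNORECASE)

    def marked(gloss: Gloss) -> bool:
        return any(rx.search(v) for v in gloss.label_values)

    return any(bool(s.glosses) and all(marked(g) for g in s.glosses)
               for s in article.sections)


if __name__ == "__main__":
    poly = Article("a", [PosSection("nom", [Gloss("x", ["vieilli"]),
                                            Gloss("y", [])])])
    print(labelled_in_all_senses(poly, "^vieilli$"))  # False: one sense is unmarked
```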
• extractGlossWithLabelValue.pl
extracts glosses including a label whose value matches the specified one (whatever its type).
The label value is to be matched against a case-insensitive regexp.
Command line: perl extractGlossWithLabelValue.pl GLAWI.xml labelValueRegexp [outFile]
Example: perl extractGlossWithLabelValue.pl GLAWI.xml "^péjoratif\$" pej.xml
→ extracts glosses with a péjoratif (pejorative) label value.
Output format: article's title TAB gloss
Example output (title and gloss separated by a TAB):
moscoutaire	Communiste qui ne jure que par l'Union soviétique.
cartelliste	Relatif à un cartel, une entente.
encagoulé	Terroriste qui porte une cagoule.
Requires: none
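A sketch of the title TAB gloss output, assuming glosses have been flattened into hypothetical (title, gloss text, label values) triples. Each selected gloss yields one line, so an article with several matching glosses appears on several lines.

```python
# Sketch of the "title TAB gloss" output (illustrative; not the script's code).
import re
from typing import Iterable, List, Tuple

# Hypothetical flattened input: (article title, gloss text, label values).
GlossEntry = Tuple[str, str, List[str]]


def gloss_lines(entries: Iterable[GlossEntry], pattern: str) -> List[str]:
    """One 'title<TAB>gloss' line per gloss having a label value that matches
    `pattern` case-insensitively."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [f"{title}\t{gloss}"
            for title, gloss, values in entries
            if any(rx.search(v) for v in values)]


if __name__ == "__main__":
    entries = [
        ("encagoulé", "Terroriste qui porte une cagoule.", ["péjoratif"]),
        ("exemple", "Gloss without labels.", []),
    ]
    print("\n".join(gloss_lines(entries, "^péjoratif$")))
```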
• extractGlossMatchingCriteria.pl
extracts glosses including a given word (to be matched against a case-insensitive regexp).
Command line: perl extractGlossMatchingCriteria.pl [-H] [-l] [-w word] [-f wordsFile] [-p POS] GLAWI.xml [outFile]
Options:
-H : HTML-formatted output
-l : lemmas' glosses only
-p POS : selects only glosses within a given POS section type (regexp match)
-w word : glosses matching against word (regexp match) are selected
-f wordsFile : file including a list of words to be found in glosses (UTF-8 text file, one value per line, regexp allowed).
Example: perl extractGlossMatchingCriteria.pl -H -l -p "nom" -w "anti.*" GLAWI.xml anti.html
→ extracts nouns' glosses (lemmas only) including a word starting with anti and outputs the HTML-formatted result into file anti.html.
Requires: Getopt::Std
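A sketch of how the selection options might combine, using a hypothetical flattened `GlossRecord` whose `is_lemma` flag stands in for whatever lemma information -l relies on. Every active filter must hold, and the word patterns (from -w and/or the -f file) are tried case-insensitively, keeping a gloss if any of them matches; that combination rule is an assumption. HTML output (-H) is left out.

```python
# Sketch of combining gloss-selection criteria (illustrative; not the script's code).
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class GlossRecord:
    title: str
    pos: str        # POS section type, e.g. "nom"
    is_lemma: bool  # hypothetical flag standing in for the lemma information used by -l
    text: str


def read_patterns(path: str) -> List[str]:
    """Read a -f style file: UTF-8, one pattern per line, blank lines ignored."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def select_glosses(records: Iterable[GlossRecord], words: List[str],
                   pos: Optional[str] = None,
                   lemmas_only: bool = False) -> List[GlossRecord]:
    """Keep glosses that pass every active filter and match at least one word pattern."""
    word_rxs = [re.compile(w, re.IGNORECASE) for w in words]
    pos_rx = re.compile(pos) if pos else None
    selected = []
    for r in records:
        if lemmas_only and not r.is_lemma:             # -l
            continue
        if pos_rx and not pos_rx.search(r.pos):        # -p (regexp match)
            continue
        if any(rx.search(r.text) for rx in word_rxs):  # -w / -f
            selected.append(r)
    return selected
```

Under these assumptions, `select_glosses(records, ["anti.*"], pos="nom", lemmas_only=True)` mirrors the -p "nom" -w "anti.*" -l example.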
• extractArticlesMatchingCriteria.pl
extracts articles (or articles' titles) matching a set of criteria.
Command line: perl extractArticlesMatchingCriteria.pl [OPTIONS] GLAWI.xml [outFile]
Options:
-e : outputs only entries' titles instead of the whole articles
-t regexp : selects only articles whose titles match against the specified regexp (case-insensitive)
-p POS : selects only articles having a given syntactic category (equal to POS)
-c labelCategory : selects only articles having a gloss definition including a label whose category equals labelCategory (whatever the label's value)
-v labelValue : selects only articles having a gloss definition including a label whose value equals labelValue (whatever the label's category)
-f labelValuesFile : selects only articles having a gloss definition with a label whose value is included in the specified file (UTF-8 text file, one value per line, regexp allowed). An example file is available.
When both -c and -v options are used, they apply to the same label.
When -c and/or -v are used together with the -p option, the targeted label is to be found in a POS section of the specified type.
-v and -f options are mutually exclusive.
Examples:
- perl extractArticlesMatchingCriteria.pl -v 'rare' -p verbe GLAWI.xml
→ extracts articles having, in a verb section, at least one gloss with the label value rare.
- perl extractArticlesMatchingCriteria.pl -e -p verbe -f computerScienceLabelValues.txt GLAWI.xml csVerbs.txt
→ extracts titles of verb entries having a gloss definition with a label whose value is included in the computerScienceLabelValues.txt file.
Output is written to csVerbs.txt.
Requires: Getopt::Std, XML::DOM
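The rule that -c and -v apply to the same label, and that -p restricts where that label is looked for, can be sketched as below. The model is hypothetical (labels are flattened per POS section rather than kept per gloss), and the label categories used in the test are made up for illustration. -p and -v are compared for equality and -t is a case-insensitive search, as the option descriptions state; searching -f patterns with default case sensitivity is an assumption.

```python
# Sketch of the criteria logic (illustrative; not the code of the script).
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    pos: str
    # Hypothetical: (category, value) pairs of all labels found in this
    # section's glosses, flattened for brevity.
    labels: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Article:
    title: str
    sections: List[Section]


def matches(article: Article, title_rx: Optional[str] = None,
            pos: Optional[str] = None, category: Optional[str] = None,
            value: Optional[str] = None,
            value_patterns: Optional[List[str]] = None) -> bool:
    """Sketch of the -t, -p, -c, -v and -f selection rules."""
    if value is not None and value_patterns is not None:
        raise ValueError("-v and -f are mutually exclusive")
    # -t: case-insensitive regexp on the title.
    if title_rx is not None and not re.search(title_rx, article.title, re.IGNORECASE):
        return False
    # -p: only sections whose POS equals the requested one are considered.
    sections = [s for s in article.sections if pos is None or s.pos == pos]
    if not sections:
        return False
    if category is None and value is None and value_patterns is None:
        return True
    # -c, -v and -f must all be satisfied by one and the same label.
    for s in sections:
        for cat, val in s.labels:
            if category is not None and cat != category:
                continue
            if value is not None and val != value:
                continue
            if value_patterns is not None and not any(
                    re.search(p, val) for p in value_patterns):
                continue
            return True
    return False
```

Checking all label criteria inside one loop iteration is what keeps -c and -v tied to a single label: an article with category X on one label and value Y on another does not match -c X -v Y.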