The Tools


The tools we used for this research are now part of the Computational Linguistics Toolset.

The tools are free software, licensed under the GNU General Public License. The entire tool package, containing the newest version of all Computational Linguistics Toolset tools and the library, is available as a single download.

The tools we made and used for this research are:

Corpus

CorpusGoall
run parts of the entire corpus task in one go

CorpusSplitter
split a single-file corpus into parts

CorpusStripper
strip unwanted elements from a corpus

CorpusRefiner
further refine a corpus (sentence splitting, whitespace removal)

CorpusMetaCollector
collect metadata from a corpus (required for dividing)

CorpusDivider
divide a corpus into groups, for example by age group (requires metadata)

Corpus2Wordrow
convert a corpus to the wordrow format (required by the Examine tools)

CorpusSampler
take a random sample from a corpus
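The toolset's own code is not shown on this page. Purely as an illustration, here is a minimal Python sketch of three of the corpus steps: refining, dividing and sampling. The function names (refine, divide, sample), the naive punctuation-based sentence splitter, the document layout (a dict with "meta" and "text" keys) and the age-group values are all assumptions made for this example, not the toolset's actual interfaces or formats.

```python
import random
import re
from collections import defaultdict


def refine(text):
    """Hypothetical CorpusRefiner-style step: collapse runs of whitespace,
    then split naively after '.', '!' or '?' followed by a space."""
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    return re.split(r"(?<=[.!?]) ", text)


def divide(documents, field):
    """Hypothetical CorpusDivider-style step: group document texts into
    subcorpora by one metadata field, such as an age group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["meta"][field]].append(doc["text"])
    return dict(groups)


def sample(sentences, k, seed=None):
    """Hypothetical CorpusSampler-style step: draw k sentences at random
    without replacement (all of them if there are fewer than k)."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))


# Illustrative documents with assumed metadata values.
docs = [
    {"meta": {"age": "10-12"}, "text": "The cat  sat.\nThe dog barked!   Did it?"},
    {"meta": {"age": "13-15"}, "text": "Hello there."},
]
subcorpora = divide(docs, "age")
sentences = refine(subcorpora["10-12"][0])
print(sentences)  # ['The cat sat.', 'The dog barked!', 'Did it?']
print(sample(sentences, 2, seed=1))
```

A real sentence splitter would also have to deal with abbreviations, numbers and quotation marks; the regular expression here only shows where that step sits in the pipeline.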

Tagging

TaggingGoall
run parts of the entire tagging task in one go

Corpus2TnT
convert a refined corpus to the TnT format

UniteCorpus
unite a TnT-ready corpus to prepare it for training

TrainTnT
train TnT on a tagged corpus

TagTnT
tag a corpus using TnT

TnT2TagStat
convert a tagged corpus to the statistics format
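Training and tagging are done by TnT itself, so only the format conversions around it are sketched here, again in Python and under assumptions: the input is one refined sentence per string with space-separated tokens, the TnT-side layout is one token per line with the tag after the token and an empty line between sentences, and the "tag row" output (one line of tags per sentence) is only a guess at what the statistics format might hold. The function names are hypothetical; check the TnT documentation for the file conventions it actually expects.

```python
def to_tnt(sentences):
    """Hypothetical Corpus2TnT-style step: write one token per line, with an
    empty line after each sentence (an assumed layout)."""
    lines = []
    for sentence in sentences:
        lines.extend(sentence.split())
        lines.append("")
    return "\n".join(lines)


def parse_tagged(text):
    """Read 'token<whitespace>tag' lines into sentences of (token, tag) pairs.
    Empty lines end a sentence; lines starting with '%%' are skipped
    (assumed comment marker)."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("%%"):
            continue
        parts = line.split()
        if not parts:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def tag_rows(tagged_sentences):
    """Hypothetical TnT2TagStat-style step: keep only each sentence's tag
    sequence, one space-separated row per sentence."""
    return [" ".join(tag for _, tag in sentence) for sentence in tagged_sentences]


print(to_tnt(["The cat sat ."]))
tagged = "%% example\nThe\tDET\ncat\tNOUN\nsat\tVERB\n.\tPUNCT\n\n"
print(tag_rows(parse_tagged(tagged)))  # ['DET NOUN VERB PUNCT']
```

Splitting on any whitespace rather than a single tab keeps the reader tolerant of output that pads the tag column with several tabs.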

Statistics

PermStatGoall
run parts of the entire statistics task in one go

NgramPermutator
permutes the n-grams

PermutationStatter
applies normalizations and measures to the data

PermutationStatTabler
rewrites the results in tab-delimited format

PermStatResultSelector
selects the significant results and returns the top ones, sorted
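The page does not say what NgramPermutator permutes or which measures PermutationStatter applies, so those are not reconstructed here. The Python sketch below only illustrates the surrounding steps under explicit assumptions: n-gram counting with relative frequency as one example normalization, a selection step that keeps results below a significance threshold alpha and returns the k highest-scoring ones, and tab-delimited output. The result records with "ngram", "score" and "p" fields, the function names and the defaults alpha=0.05 and k=10 are all hypothetical.

```python
import csv
import io
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(rows, n):
    """One possible normalization: n-gram counts over space-separated rows,
    divided by the total number of n-grams."""
    counts = Counter()
    for row in rows:
        counts.update(ngrams(row.split(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def select_top(results, alpha=0.05, k=10):
    """Hypothetical PermStatResultSelector-style step: keep results with
    p below alpha, sort them by score (highest first), keep the top k."""
    significant = [r for r in results if r["p"] < alpha]
    significant.sort(key=lambda r: r["score"], reverse=True)
    return significant[:k]


def to_tsv(results):
    """Hypothetical PermutationStatTabler-style step: tab-delimited output."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ngram", "score", "p"])
    for r in results:
        writer.writerow([" ".join(r["ngram"]), r["score"], r["p"]])
    return buf.getvalue()


freqs = relative_frequencies(["DET NOUN VERB", "DET NOUN"], 2)
print(freqs)
# Illustrative records; real scores and p-values would come from the statistics step.
results = [
    {"ngram": ("DET", "NOUN"), "score": 2.5, "p": 0.01},
    {"ngram": ("NOUN", "VERB"), "score": 3.1, "p": 0.20},
]
print(to_tsv(select_top(results)), end="")
```

Filtering on significance before sorting means a high score with a large p-value (the second record above) never reaches the top list.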

Examine

CorpusExaminer
examines a corpus

CorpusDebugger
debugs a stripped corpus

RowStatter
computes statistics on a collection of wordrow or tagrow files

TagSampleFinder
finds samples of POS-tag n-grams

TagSameStatter
tests the performance of an automatic tagger (TnT)
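Two of the examination tasks are easy to illustrate. The Python sketch below, under assumed names and data layouts, measures tagger performance as per-token accuracy against hand-tagged text (one common measure; the page does not say which figures TagSameStatter reports) and finds the words behind a given POS-tag n-gram. Sentences are lists of (token, tag) pairs; the function names, the limit parameter and the tag names are illustrative only.

```python
def tagging_accuracy(gold, predicted):
    """Hypothetical TagSameStatter-style measure: the share of tokens whose
    automatically assigned tag equals the hand-assigned (gold) tag.
    Both arguments are lists of (token, tag) pairs over the same tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    same = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return same / len(gold)


def find_tag_samples(tagged_sentences, pattern, limit=5):
    """Hypothetical TagSampleFinder-style search: return up to `limit`
    word sequences whose tags match the POS-tag n-gram `pattern`."""
    n = len(pattern)
    samples = []
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for i in range(len(tags) - n + 1):
            if tags[i:i + n] == list(pattern):
                samples.append(" ".join(token for token, _ in sentence[i:i + n]))
                if len(samples) == limit:
                    return samples
    return samples


gold = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB")]
predicted = [("The", "DET"), ("cat", "VERB"), ("sat", "VERB")]
print(tagging_accuracy(gold, predicted))  # 0.6666666666666666
print(find_tag_samples([gold], ("DET", "NOUN")))  # ['The cat']
```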
This is an old version for archival purposes, see www.LogiLogi.org for the current version.
Document last modified Sun, 21 May 2006 22:21:12
All content is available under the GNU Free Documentation License. The LogiLogi-system is under the GPL