Computational Linguistics Toolset v1.1.5


A set of tools for doing Permutation Statistics on corpora, and for other computational linguistics tasks (like corpus cleaning, examination, and sensing using WordNet).

The tools are free (licensed under the General Public License). You get the entire tool package (containing the newest version of all fiauimenre tools and the library) in one download. Each tool is documented, and a general readme file is included. You can always e-mail me if something is still unclear (or if you find a bug).

A large number of these tools have been used for the Finnish Australian Immigrants Research. The Goall scripts are still configured for use in that research.

The core sensing tools, which disambiguate using the WordNet Similarity tools by Ted Pedersen, have been completed. They were built for speed (about 10 times faster). Some of the optimizations I made for this have since been included in the WordNet Similarity package (v 0.16). One part that these tools use is not yet included there, however: the dbreader measure, which you can get here.

Corpus

CorpusGoall
run parts of the entire corpus task in one go

CorpusSplitter
split a one-file corpus into bits

UniteCorpus
unites a corpus (needed because TnT requires it)

CorpusStripper
strip unwanted elements from a corpus

CorpusRefiner
further refine (sentence-split, remove whitespace)

CorpusMetaCollector
collect meta-data from a corpus (dividing requires it)

CorpusDivider
divide into age-groups, for example (requires meta-data)

Corpus2Wordrow
turn into the wordrow format (the examine tools require it)

Corpus2Tagrow
turn into the tagrow format (the sensing tools require it)

CorpusCompoundify
compoundify a corpus based on a list of compounds

CorpusLeciconReducer
reduce to only words in a list or remove stopwords

CorpusTagsetReducer
reduce the tagset of a corpus

CorpusSampler
take a random sample from a corpus

CorpusRewriteTagrow
rewrites to another tagset (the sensing tools require it)
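
Two of the corpus steps listed above, reducing a corpus to a word list (or removing stopwords) and taking a random sample, can be illustrated with a minimal Python sketch. This is not the code of the tools themselves: the function names, the lower-casing of tokens, and the representation of a corpus as a list of token lists are all assumptions made for the example.

```python
import random


def reduce_lexicon(tokens, keep=None, stopwords=None):
    """Filter a token list (hypothetical helper, not the tool's own code).

    If `keep` is given, only tokens found in it survive.
    If `stopwords` is given, tokens found in it are dropped.
    Matching is done on the lower-cased token (an assumption).
    """
    result = []
    for tok in tokens:
        low = tok.lower()
        if keep is not None and low not in keep:
            continue
        if stopwords is not None and low in stopwords:
            continue
        result.append(tok)
    return result


def sample_sentences(sentences, k, seed=0):
    """Draw up to k sentences at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(sentences, min(k, len(sentences)))


if __name__ == "__main__":
    corpus = [["The", "cat", "sat"], ["A", "dog", "ran"], ["The", "end"]]
    print([reduce_lexicon(s, stopwords={"the", "a"}) for s in corpus])
    print(sample_sentences(corpus, 2))
```

A fixed seed is used so that a sample can be reproduced; whether the original sampler does this is not stated here.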

Tagging

TaggingGoall
run parts of the entire tagging task in one go

Corpus2TnT
move a refined corpus to the TnT-format

TrainTnT
train TnT on a tagged corpus

TagTnT
tag a corpus using TnT

TnT2TagStat
move a tagged corpus to the statistics-format

Permutation Statistics

PermstatGoall
run parts of the entire stat task in one go

NgramPermutator
permutes the n-grams

PermutationStatter
applies normalizations and measures to the data

PermutationStatTabler
rewrites the results to tab-delimited format

PermStatResultSelector
selects significant and sorted top results
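
For readers unfamiliar with the general idea, the following Python sketch shows a generic permutation test on n-gram frequencies. It is an illustration only and makes several assumptions that are not taken from the tools: sentences are lists of tags, the measure is the absolute difference in relative frequency of one n-gram between two groups, and whole sentences are shuffled between the groups. The normalizations and measures actually applied by PermutationStatter may differ.

```python
import random
from collections import Counter


def ngram_counts(sentences, n):
    """Count all n-grams (as tuples) in a list of token or tag lists."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts


def rel_freq(sentences, ngram):
    """Relative frequency of `ngram` among all n-grams of the same length."""
    counts = ngram_counts(sentences, len(ngram))
    total = sum(counts.values())
    return counts[ngram] / total if total else 0.0


def permutation_p_value(group_a, group_b, ngram, rounds=1000, seed=0):
    """Estimate how often a random regrouping of the sentences gives a
    frequency difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(rel_freq(group_a, ngram) - rel_freq(group_b, ngram))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(rel_freq(a, ngram) - rel_freq(b, ngram)) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    young = [["DT", "NN", "VB"], ["DT", "NN"], ["DT", "NN", "VB"]]
    old = [["NN", "VB"], ["VB", "NN"], ["NN", "VB", "NN"]]
    print(permutation_p_value(young, old, ("DT", "NN"), rounds=200))
```

A small p-value suggests that the n-gram is distributed differently in the two groups; selecting and sorting such results corresponds roughly to what PermStatResultSelector is described as doing.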

Sensing

SensingGoall
runs an entire sensing research, contains examples

WordCombinationFinder
finds all combinations of words within a window

ListSenser
assigns similarities using the lesk measure

SentenceSenser
disambiguates a sentence

SemanticGravitor
calculates "semantic gravity" for POS-tags or words
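
Finding all combinations of words within a window can be sketched in a few lines of Python. The function name and the exact window definition (the window counts the left-hand word itself, so a window of 3 pairs each word with the next two) are assumptions made for this example and are not taken from WordCombinationFinder.

```python
def word_pairs_in_window(tokens, window):
    """Return every ordered pair (left, right) where `right` follows
    `left` and both fall inside a span of `window` consecutive tokens."""
    pairs = []
    for i, left in enumerate(tokens):
        # slice holds the tokens after `left` that are still in the window
        for right in tokens[i + 1:i + window]:
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    for pair in word_pairs_in_window(["the", "old", "bank", "closed"], 3):
        print(pair)
```

Pairs produced this way could then be handed to a similarity measure, which is the kind of input ListSenser is described as scoring.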

Examine

ExamineGoall
runs an entire examine research, contains examples

CorpusExaminer
examines a corpus

CorpusDebugger
debugs a stripped corpus

RowStatter
stats a collection of wordrow or tagrow files

RowChecker
checks that 2 row-files have the same number of elements

TagSampleFinder
finds samples of POS-tag ngrams

TagSameStatter
tests the performance of an automatic tagger (TnT)

TableScaler
scales the contents of a tab-delimited file

TableTurner
swaps the X and Y axis of a tab-delimited table
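
Swapping the X and Y axes of a tab-delimited table amounts to a transpose. The Python sketch below shows one way to do it; the function name and the padding of short rows with empty cells are assumptions for the example, not a description of how TableTurner works.

```python
def turn_table(text):
    """Transpose a tab-delimited table given as a string."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    if not rows:
        return ""
    # pad short rows so that every row has the same number of cells
    width = max(len(r) for r in rows)
    padded = [r + [""] * (width - len(r)) for r in rows]
    # zip(*rows) turns the list of rows into a sequence of columns
    return "\n".join("\t".join(col) for col in zip(*padded))


if __name__ == "__main__":
    table = "ngram\tyoung\told\nDT NN\t12\t7\nNN VB\t3\t9\n"
    print(turn_table(table))
```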
This is an old version for archival purposes, see www.LogiLogi.org for the current version.
Document last modified Sun, 22 Apr 2007 13:45:52
All content is available under the GNU Free Documentation License. The LogiLogi-system is under the GPL