Word Combination Finder
- Purpose
Finds all word-combinations that occur within a given window-width in the corpus. These word-combinations are used to calculate similarities in advance using the WordNet::Similarity measures. For example, applied to the data: a b c d it returns a-b a-c b-c b-d c-d if the window-size is set to 3. Note that only odd (uneven) window-sizes are supported.
The configfile is configwordcombinationfinder.pl
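The windowing described above can be illustrated with a minimal Python sketch. This is not the tool's Perl source. The function name window_pairs is hypothetical, and the sketch assumes that a pair is unordered (a-b and b-a count as one combination) and that a word is never paired with itself. Neither assumption is confirmed by this page.

```python
def window_pairs(tokens, window_size):
    """Return unique unordered word pairs that share a window of window_size words.

    Hypothetical illustration only; not taken from wordcombinationfinder.pl.
    """
    # The documentation states that only odd (uneven) window sizes are supported.
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")

    seen = set()
    pairs = []
    for i, left in enumerate(tokens):
        # Every word up to window_size - 1 positions further on shares a window with `left`.
        for right in tokens[i + 1 : i + window_size]:
            if left == right:
                continue  # assumption: a word is not combined with itself
            pair = tuple(sorted((left, right)))  # assumption: pairs are unordered
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


print(" ".join(f"{a}-{b}" for a, b in window_pairs("a b c d".split(), 3)))
# prints: a-b a-c b-c b-d c-d
```

With a window-size of 3 the pair a-d is left out, because a and d never fall inside the same three-word window.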
- Synopsis
./wordcombinationfinder.pl
- [-c <corpus>] [-f <fromsubdir>] | [-fd <fromdir>] [-tc <tocorpus>] [-t <tosubdir>] | [-td <targetdir>] [-tf <combinedwordsfile>] [-w <window-size>] [-aw] [-ps <partsubdir>] [-p <part>] | [-pd <partdir>] [-pf <divprtfile> [...]] [-dr] [-? = -h = -help = --help] [-v [<verboselvl>]]
- -c
- the corpus that needs to be reduced
- -f
- subdir below corpus where the compoundified & reduced version of the corpus can be found (defaults to compoundified if none is specified and if it's not changed in the configfile)
- -fd
- full path to the compoundified & reduced corpus
- -tc
- target-corpus. If none is set the source corpus is used
- -t
- the subdir relative to the corpusdir in which the list of word-combinations should be stored. Note that this option is only possible if the corpus is given with the -c or -tc option (not the full path with the -td option)
- -td
- full path to the place where the list of word-combinations should be stored
- -tf
- the name of the word-combinations file
- -w
- the size of the window. The window is the span of words within which every word is combined with every other word to get the list of unique word-combinations. Note that only odd (uneven) window-sizes are supported
- -aw
- uses an absolute window: the '^' markers (placeholders for words removed in reducing) are counted towards the window-size
- -ps
- subdir in which the divisions are
- -p
- the name of a division to use as a part. The file names listed in all the divprt files in the division-dir are combined and used as the list of files to process, unless you specify one or more divprt files explicitly using the -pf option
- -pd
- full path to the division-dir to use as a part. This option causes the -p option to be ignored
- -pf
- one or more divprt files to use as the part instead of all files in the division
- -dr
- dry-run. Nothing is written or deleted, only reading and reporting is done
- -v
- the level of verbosity, default verboselevel = 2, available levels: 0,1,2,3
- -?
- (and equivalents) prints help: the purpose and the synopsis
NOTE:
- If no from- and to-dirs are given, the defaults in the config file are used
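The difference between the default window and the absolute window of the -aw option can be sketched in Python as follows. This is an interpretation of the option description, not the tool's actual code. It assumes that without -aw the '^' markers are dropped before windowing, that with -aw they keep their positions but are never part of a combination, and that pairs are unordered. The function name pairs_with_placeholders is hypothetical.

```python
PLACEHOLDER = "^"  # marker described as standing in for a word removed in reducing


def pairs_with_placeholders(tokens, window_size, absolute=False):
    """Hypothetical sketch of relative versus absolute windowing."""
    if not absolute:
        # Assumed default behaviour: placeholders do not take up window positions.
        tokens = [t for t in tokens if t != PLACEHOLDER]

    pairs = set()
    for i, left in enumerate(tokens):
        if left == PLACEHOLDER:
            continue  # a placeholder is never part of a combination
        for right in tokens[i + 1 : i + window_size]:
            if right != PLACEHOLDER and right != left:
                pairs.add(tuple(sorted((left, right))))  # assumption: unordered pairs
    return sorted(pairs)


reduced = "a ^ b c".split()
print(pairs_with_placeholders(reduced, 3))                 # default (relative) window
print(pairs_with_placeholders(reduced, 3, absolute=True))  # absolute window, as with -aw
```

Under these assumptions the default window yields a-b, a-c and b-c, while the absolute window loses a-c, because the '^' takes up a position between a and b and pushes c out of the window around a.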
You can download (or look at the sources of) WordCombinationFinder [here]. To run it you will also need [the config file] and the [fiauimenre library]. You can also get the entire tool-package (containing the newest version of all fiauimenre tools and the library) [in one download]