Sentence Senser
- Purpose
Applies the WordNet SenseRelate AllWords disambiguator to a corpus or part thereof. The WordNet Similarity dbreader measure is used. This measure reads pre-stored relatedness values very efficiently from a djbdb file. This file must contain the relatedness values for all word combinations that will be encountered within the sense-relate window (these combinations can be found using wordcombinationfinder.pl, and they can be tagged using listsenser.pl).
Sense-relating with this set-up is much faster than computing a relatedness measure on the fly. Note, however, that this script was written especially for a research project in which the position of the words in the original sentence was important, so this information is kept in the form of '^' signs at places where no sensing was possible.
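The position-preserving idea described above can be illustrated with a minimal sketch. It is not taken from sentencesenser.pl: the function names `lookup_sense` and `sense_row`, the example words, the sense tags, and the space-separated output layout are all assumptions made for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: prints a sense tag for words that could be
# disambiguated and nothing for words that could not.
# The words and tags are made up for this example.
lookup_sense() {
  case "$1" in
    dog)  echo "dog#n#1" ;;
    bark) echo "bark#v#1" ;;
    *)    echo "" ;;
  esac
}

# Emit one token per input word; words without a sense become '^',
# so every token keeps the position of its original word.
sense_row() {
  local out=() word sense
  for word in "$@"; do
    sense=$(lookup_sense "$word")
    out+=("${sense:-^}")
  done
  echo "${out[*]}"
}

sense_row the dog will bark
# prints: ^ dog#n#1 ^ bark#v#1
```

Because the output has as many tokens as the input, the n-th token can always be matched back to the n-th word of the original sentence.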
The configfile is configsentencesenser.pl
- Synopsis
./sentencesenser.pl
- [-c <corpus>] [-f <fromsubdir>] | [-fd <fromdir>] [-a <tagrowsubdir>] | [-ad <tagrowdir>] [-sf <senseddbfile> [-s <sdbsubdir>]] | [-sd <sdbdirfile>] [-tc <tocorpus>] [-t <tosubdir>] | [-td <targetdir>] [-o <tosubdir>] | [-od <targetwordrowdir>] [-w <windowsize>] [-ps <partsubdir>] [-p <part>] | [-pd <partdir>] [-pf <divprtfile> [...]] [-dr] [-? = -h = -help = --help] [-v [<verboselvl>]]
- -c
- the corpus whose wordrow files are to be used
- -f
- subdir below corpus where the wordrow files can be found (defaults to reduced if none is specified and if it's not changed in the configfile)
- -fd
- full path to the wordrows dir
- -a
- the tagrowsubdir. Defaults to rewrtagrow
- -ad
- full path to the tagrow dir
- -sf
- the cdb file containing the precalculated relatedness values to be read by the dbreader measure
- -s
- subdir containing the cdb file
- -sd
- full path to the cdb file (including the filename)
- -tc
- target-corpus. If none is set the source corpus is used
- -t
- the subdir relative to the corpusdir in which the sensed tagrows should be stored. Note that this option is only possible if the corpus is given with the -c or -tc option (not the full path with the -td option)
- -td
- full path to the place where the list of sensed tagrows should be stored
- -o
- the subdir relative to the corpusdir in which the sensed wordrows should be stored (storing the sensed words as well is necessary because WordNet doesn't accept sensed words in a non-canonical form). Note that this option is only possible if the corpus is given with the -c or -tc option (not the full path with the -td option)
- -od
- full path to the place where the list of sensed wordrows should be stored
- -w
- the window size for disambiguating (note: if you want all possible disambiguation to be done, use at most the window size that wordcombinationfinder.pl used when it made the list for the db)
- -ps
- subdir in which the divisions are
- -p
- the name of a division to use as a part. The file names in all the divprt files in the division-dir are added and used as the list of files to use, unless you specify one or more explicitly using the -pf option
- -pd
- full path to the division-dir to use as a part. This option causes the -p option to be ignored
- -pf
- one or more divprt files to use as the part instead of all files in the division
- -dr
- dry-run. Nothing is written or deleted, only reading and reporting is done
- -v
- the level of verbosity, default verboselevel = 2, available levels: 0,1,2,3
- -?
- (and equivalents) prints help: the purpose and the synopsis
NOTE:
- If no from and to-dirs are given the defaults in the config file are used
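As a hedged illustration of how the options above combine, the sketch below assembles a dry-run invocation. The corpus name, the cdb file name, and the window size are placeholders rather than values shipped with the tool, and the command is only printed so that it can be inspected before being run.

```bash
#!/usr/bin/env bash
# All values below are placeholders chosen for this example.
corpus="mycorpus"          # hypothetical corpus name
dbfile="relatedness.cdb"   # hypothetical cdb file with precalculated values
window=3                   # at most the window size used by wordcombinationfinder.pl

# Build the command as an array so arguments containing spaces stay intact.
# -dr requests a dry run (nothing is written or deleted), -v 3 is the most verbose level.
cmd=(./sentencesenser.pl -c "$corpus" -sf "$dbfile" -w "$window" -dr -v 3)

# Print the command for inspection; uncomment the last line to actually run it.
printf '%s\n' "${cmd[*]}"
# "${cmd[@]}"
```

Starting with -dr is a cautious choice: the script then only reads and reports, which makes it easy to check the resolved directories before any sensed rows are written.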
You can download (or look at the sources of) SentenceSenser [here]. To run it you will also need [the config file] and the [fiauimenre library]. You can also get the entire tool-package (containing the newest version of all fiauimenre tools and the library) [in one download].