Permutation DRR In Permutation Normalized

A PDF with a full description of the normalization and the formulas can be found [here].

The PermutationDRRInPermutationNormalized reader reads the data in a normalized fashion. The normalization proceeds as follows.

First, all ngram-values within a group are summed and then divided by the number of different ngrams. This gives the average per-ngram-value within the group (one average for all ngrams). Every ngram-value in the group is then divided by this average, so that the ngram-values are normalized within each group and are 1 on average.

Next, these in-group-normalized ngram-values are summed across the groups, with a separate sum for each ngram. The fraction of this sum that one group's ngram-value takes up is the fraction of the raw frequency (of the ngram within the corpus) that the group is assigned as its ngram-value.

Finally, each ngram-value is divided by the average per-ngram-value within the entire permutation (that is, the average ngram-value over all groups), so that the values are again 1 on average.

Note that this normalization does not span different permutations. It is repeated for each permutation, so it applies only to the groups within one permutation.

This normalization uses temp-data. It is slow when run for the first time (and it uses a few GB of space, depending on the size of your data-set). After that it becomes much faster, because it just reads in what it calculated before.

This normalization is useful for preventing false positives (reports of significance) that arise when structural differences in sentence lengths exist in the base case but are evened out in the permutations. It also makes the results neater and easier to interpret.
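The three normalization steps described above can be illustrated with a minimal Python sketch. This is not the reader's actual implementation. The function name `normalize_permutation`, the list-of-lists layout (one row per group, one column per ngram, with every ngram present in every row) and the fallback that takes an ngram's raw corpus frequency to be the sum of its raw values over the groups are all assumptions made for illustration. The PDF with the formulas remains the authoritative description.

```python
def normalize_permutation(values, corpus_freq=None):
    """Normalize ngram-values for the groups of ONE permutation.

    values:      list of groups; each group is a list of raw ngram-values,
                 with the same ngram order in every group (assumed layout).
    corpus_freq: raw corpus frequency per ngram. If omitted, it is ASSUMED
                 to be the sum of the raw values over all groups.
    """
    n_groups = len(values)
    n_ngrams = len(values[0])
    if corpus_freq is None:
        corpus_freq = [sum(g[j] for g in values) for j in range(n_ngrams)]

    # Step 1: divide every value by its group's average per-ngram-value,
    # so that the values within each group are 1 on average.
    in_group = []
    for group in values:
        group_avg = sum(group) / n_ngrams
        if group_avg == 0:
            raise ValueError("a group with only zero values cannot be normalized")
        in_group.append([v / group_avg for v in group])

    # Step 2: per ngram, sum the in-group-normalized values across groups;
    # each group receives its share of that sum, applied to the raw frequency.
    sums = [sum(g[j] for g in in_group) for j in range(n_ngrams)]
    assigned = [
        [(g[j] / sums[j]) * corpus_freq[j] if sums[j] else 0.0
         for j in range(n_ngrams)]
        for g in in_group
    ]

    # Step 3: divide by the average value over the whole permutation,
    # so that the final values are again 1 on average.
    overall_avg = sum(sum(g) for g in assigned) / (n_groups * n_ngrams)
    return [[v / overall_avg for v in g] for g in assigned]


# Two groups with the same ngram profile; the second is five times larger,
# for example because its sentences are longer.
result = normalize_permutation([[2.0, 6.0], [10.0, 30.0]])
```

In this toy input the two groups differ only in overall size, so after normalization both rows come out identical and the values average 1. This is the kind of structural length difference the normalization is meant to remove.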
