http://www.wordandphrase.info/h_dispersion.asp
Why doesn't the frequency ranking follow the absolute frequency of a word?
DISPERSION AND RANKING (1-60,000)
As you browse through the frequency listing, you may notice that some words have a higher ranking (1-60,000) than nearby words with a higher frequency. This is because the ranking is based on the product of two numbers: [frequency x dispersion]. Dispersion is a score (0.00-1.00) that measures how "evenly" the word is spread across the entire corpus (with 1.00 being the most even). The idea is that if a word is concentrated in just one or two genres (or worse, in just a few sub-genres or texts within a genre), then the word is more specialized and shouldn't be ranked as high in the overall list 1-60,000.
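To make the [frequency x dispersion] idea concrete, here is a minimal sketch in Python. The words, frequencies, and dispersion scores are invented for illustration, and the function name is hypothetical; the sketch also assumes the ranking is a plain sort on the product, which may not match every detail of how the actual list was built.

```python
# Hypothetical entries: (word, raw frequency, dispersion score 0.00-1.00).
# All numbers are made up for illustration, not taken from the real list.
entries = [
    ("alpha", 1000, 0.95),  # frequent and evenly spread
    ("beta", 1500, 0.50),   # most frequent, but concentrated in few genres
    ("gamma", 900, 0.90),   # least frequent, but evenly spread
]


def rank_by_frequency_times_dispersion(entries):
    """Return the words ordered by frequency * dispersion, highest first."""
    scored = [(word, freq * disp) for word, freq, disp in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored]


ranking = rank_by_frequency_times_dispersion(entries)
print(ranking)
```

In this made-up data, "beta" has the highest raw frequency but the lowest product (1500 x 0.50 = 750), so it ends up below "gamma" (900 x 0.90 = 810), which is the effect described above.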
Most people won't need to see the dispersion score. If you do, you might consider downloading the data that contains this information. A sample (every seventh word, 1-60,000) shows the dispersion score in the right column.
Also, please be aware that there are still some isolated issues with the frequency list, especially with words that occur mainly as proper nouns or as parts of proper nouns (e.g. cook, ray, frost, savage). In most cases, these are already marked in the frequency list with parentheses, to let you know that there might be problems. But even with these issues, we believe that the frequency list here is more accurate than any other large frequency listing of English.
https://en.wikipedia.org/wiki/Statistical_dispersion