Confused about (password) entropy

There seem to be many different 'kinds' of entropy. I've come across two different concepts:

A) The XKCD example of `correcthorsebatterystaple`. It has 44 bits of entropy because four words randomly chosen from a list of 2048 words give 4 * log2(2048) = 44 bits of entropy. This I understand.

B) The Shannon entropy of the actual string, i.e. the entropy calculated from the frequencies of the letters/symbols. Applying the Shannon formula to `correcthorsebatterystaple`, the result is 3.36 bits of entropy per character:

```python
# from http://stackoverflow.com/a/2979208
import math

def entropy(string):
    """Calculates the Shannon entropy of a string."""
    # get the probability of each distinct character in the string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]
    # calculate the entropy
    return -sum(p * math.log(p) / math.log(2.0) for p in prob)

print(entropy('correcthorsebatterystaple'))
# => 3.36385618977
```

Wikipedia only adds to my confusion:

> It is important to realize the difference between the entropy of a set of possible outcomes, and the entropy of a particular outcome. A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely "predictable". -- Wikipedia: Entropy (information theory)

I don't quite understand the distinction between the entropy of the toss (generation) and the entropy of the result (the string).

* When is B used, and for what purpose?
* Which concept accurately reflects the entropy of the password?
* Is there terminology to differentiate between the two?
* True randomness could give us `correctcorrectcorrectcorrect`. Using A we still have 44 bits; using B the entropy would be the same as that of `correct`. When is the difference between the two important?
* If a requirement specifies that a string needs to have 20 bits of entropy, do I use A or B to determine the entropy?
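To make the two concepts concrete, here is a small sketch (my own illustration, with the XKCD parameters assumed: 4 words, a 2048-word list) computing A, the entropy of the generation process, and B, the per-character Shannon entropy, including the repeated-word case from the last bullet:

```python
import math

def shannon_entropy(s):
    """Per-character Shannon entropy of a string (concept B)."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

# Concept A: entropy of the generation process --
# 4 words drawn uniformly at random from a 2048-word list.
process_bits = 4 * math.log2(2048)
print(process_bits)  # 44.0

# Concept B is unchanged by repetition, because repeating a word
# leaves the relative character frequencies exactly the same.
print(shannon_entropy('correct'))
print(shannon_entropy('correct' * 4))  # same value as above
```

Note that A depends only on how the password was generated, while B depends only on the characters of the resulting string.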
