How to calculate the entropy of a file? (Or let's just say a bunch of bytes)
I have an idea, but I'm not sure that it's mathematically correct.
My idea is the following:
- Create an array of 256 integers (all zeros).
- Traverse through the file and for each of its bytes,
increment the corresponding position in the array.
- At the end: Calculate the "average" value for the array.
- Initialize a counter with zero,
and for each of the array's entries:
add the entry's difference
to "average" to the counter.
Well, now I'm stuck. How to "project" the counter result in such a way
that all results would lie between 0.0 and 1.0? But I'm sure,
the idea is inconsistent anyway...
I hope someone has better and simpler solutions?
Note: I need the whole thing to make assumptions on the file's contents:
(plaintext, markup, compressed or some binary, ...)
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…