What this page is

This page has a number of tables with the results of a test on several well-known hash functions. It accompanies this article.
There's a summary at the bottom if you're only interested in seeing which function scores ‘best’.

Longer explanation

In a set of values that are uniformly distributed over the whole output space, one half of the values should have 1 as their first bit. One half of those should also have 1 as their second bit, one half of those again should have 1 as their third bit, and so on.
A hash function should, insofar as possible, map any set of inputs to a set of outputs that is uniformly distributed over its output space. If we count, for each n, how many outputs start with at least n '1' bits, each count should follow a clear binomial distribution whose expected value halves with every additional bit.
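In symbols, as a sketch that assumes every output bit behaves like an independent fair coin flip (an idealisation that no real hash is guaranteed to meet): for a set of N distinct inputs, let K_n be the number of outputs whose first n bits are all 1.

```latex
K_n \sim \operatorname{Binomial}\!\left(N,\; 2^{-n}\right), \qquad
\mathbb{E}[K_n] = \frac{N}{2^{n}}, \qquad
\operatorname{Var}[K_n] = N\,2^{-n}\left(1 - 2^{-n}\right)
```

For Dataset 5 below, N = 65536, so E[K_1] = 32768, E[K_16] = 1 and E[K_17] = 1/2. For the roughly four million addresses of Dataset 1, E[K_22] is about 0.95, which is why the rightmost columns hold only a handful of hits.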

In theory, there is no way a hash function can provide a uniformly distributed set of outputs for any set of inputs. No hash can be perfect. But how do they fare on large, everyday data sets?
This page tests the uniformity of the outputs of several common hash functions, applied to files containing a large set of ‘things’.
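The test program itself is not included on this page. The sketch below shows one way the counting step could work, under two assumptions the page does not state: each line of a file is hashed on its own (newline stripped), and the most significant bit of the digest counts as its ‘first’ bit (an `msb_first` flag switches to least-significant-first). The `HASHES` table is a hypothetical selection built from Python's `zlib` and `hashlib`; Whirlpool, SHA224, MD4 and ECDSA are left out.

```python
import hashlib
import zlib
from typing import Callable, Dict, Iterable, List, Tuple

# Each "hash" returns (value, width_in_bits). Hypothetical selection for illustration.
HashFn = Callable[[bytes], Tuple[int, int]]

HASHES: Dict[str, HashFn] = {
    "Passthrough": lambda d: (int.from_bytes(d, "big"), 8 * len(d)),
    "Adler32": lambda d: (zlib.adler32(d), 32),
    "CRC32": lambda d: (zlib.crc32(d), 32),
    "MD5": lambda d: (int.from_bytes(hashlib.md5(d).digest(), "big"), 128),
    "SHA1": lambda d: (int.from_bytes(hashlib.sha1(d).digest(), "big"), 160),
    "SHA512": lambda d: (int.from_bytes(hashlib.sha512(d).digest(), "big"), 512),
}


def leading_ones(value: int, width: int, msb_first: bool = True) -> int:
    """Number of consecutive 1 bits at the start of a `width`-bit value."""
    if msb_first:
        # Invert the bits: the highest remaining 1 marks the first 0 of the original.
        return width - ((~value) & ((1 << width) - 1)).bit_length()
    # Counting from the least significant bit: x ^ (x + 1) consists of (t + 1)
    # low 1 bits exactly when x ends in t ones.
    return (value ^ (value + 1)).bit_length() - 1


def count_leading_ones(outputs: Iterable[Tuple[int, int]], max_n: int,
                       msb_first: bool = True) -> List[int]:
    """counts[n - 1] = number of outputs that start with at least n '1' bits."""
    exact = [0] * (max_n + 1)  # exact[k]: exactly k leading ones (capped at max_n)
    for value, width in outputs:
        exact[min(leading_ones(value, width, msb_first), max_n)] += 1
    counts, running = [], 0
    for k in range(max_n, 0, -1):  # "at least n" is a running sum from the right
        running += exact[k]
        counts.append(running)
    return counts[::-1]


def tally_file(path: str, hash_fn: HashFn, max_n: int = 22) -> List[int]:
    """Hash every line of a file (line ending stripped) and tally leading ones."""
    with open(path, "rb") as f:
        lines = (line.rstrip(b"\r\n") for line in f)
        return count_leading_ones((hash_fn(line) for line in lines), max_n)
```

For example, `tally_file("words.txt", HASHES["MD5"], 18)` would produce one row of counts like those in the tables below (the file name is hypothetical).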

This property matters for Flajolet-Martin sketches (PDF): as explained here, it can be used to make a set of inputs ‘countable’.
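A minimal single-bitmap Flajolet-Martin estimator might look like the sketch below. Several assumptions apply: SHA-1 from `hashlib` stands in for the hash; the ‘rank’ of a hash is its number of leading 1 bits (the original algorithm records the position of the least significant 1 bit, which has the same distribution for a uniform hash); and 0.77351 is Flajolet and Martin's correction constant. One bitmap gives only a rough, high-variance estimate, which is why practical versions average many bitmaps.

```python
import hashlib
from typing import Iterable

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def leading_ones(digest: bytes) -> int:
    """Consecutive 1 bits at the start of `digest`, reading the first byte's MSB first."""
    width = 8 * len(digest)
    value = int.from_bytes(digest, "big")
    return width - ((~value) & ((1 << width) - 1)).bit_length()


def fm_estimate(items: Iterable[bytes]) -> float:
    """Rough distinct-count estimate from a single Flajolet-Martin bitmap."""
    bitmap = 0
    for item in items:
        # Duplicates hash identically and set the same bit, so they are not double-counted.
        bitmap |= 1 << leading_ones(hashlib.sha1(item).digest())
    r = 0
    while (bitmap >> r) & 1:  # r = index of the lowest bit that is still 0
        r += 1
    return 2 ** r / PHI


if __name__ == "__main__":
    inputs = [f"host-{i}".encode() for i in range(100_000)]  # hypothetical input
    print(f"{fm_estimate(inputs):.0f} (true distinct count: {len(inputs)})")
```

The estimate is only as good as the hash's leading bits: a function that, like some in the tables below, rarely or never produces long runs of leading 1s will systematically underestimate large sets.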

How to read these tables

Column n gives, for each hash function, the number of output values that start with at least n ‘1’ bits. The number in parentheses is the p-value resulting from a binomial test of that count.
Interpretation of this last value is up to you, but for convenience, p < 0.20 is highlighted in pink, and p < 0.05 is highlighted in red.
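The page does not say how the p-values were computed. One reasonable reading, assumed here, is the exact two-sided binomial test as implemented by R's `binom.test` and SciPy's `scipy.stats.binomtest`: add up the probabilities of every count that is no more likely than the observed one, with success probability 2^-n for column n. The standard-library sketch below follows that definition. As a sanity check, it gives 0.264 for CRC32's 2 hits at n = 16 in Dataset 5 and 0.997 for Adler32's 32767 hits at n = 1 in Dataset 4 (65536 inputs each), matching the tables.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), via lgamma so large n stays cheap."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test p-value.

    Sums P(X = j) over every j that is no more likely than the observed k
    (with a 1e-7 relative tolerance for ties). The binomial pmf is unimodal,
    so the outcomes that are *more* likely than k form one contiguous run
    around the mode; we add up that run and subtract it from 1. The final
    subtraction loses precision for tiny p-values, which is harmless when
    results are shown to three decimals.
    """
    if not (0 < p < 1 and 0 <= k <= n):
        raise ValueError("need 0 < p < 1 and 0 <= k <= n")
    threshold = binom_log_pmf(k, n, p) + math.log1p(1e-7)
    mode = min(n, math.floor((n + 1) * p))

    more_likely_mass = 0.0
    j = mode
    while j >= 0 and binom_log_pmf(j, n, p) > threshold:  # walk left from the mode
        more_likely_mass += math.exp(binom_log_pmf(j, n, p))
        j -= 1
    j = mode + 1
    while j <= n and binom_log_pmf(j, n, p) > threshold:  # walk right from the mode
        more_likely_mass += math.exp(binom_log_pmf(j, n, p))
        j += 1
    return max(0.0, min(1.0, 1.0 - more_likely_mass))


if __name__ == "__main__":
    # CRC32, Dataset 5, n = 16: 2 outputs out of 65536 start with sixteen 1 bits.
    print(f"{binom_test_two_sided(2, 65536, 2 ** -16):.3f}")
```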

Remember that none of these hashes are perfect. Some of them are just even more imperfect than others.

Caution

Even though most of the hashes used in this test are commonly used in cryptography, this test uses a property of hash functions that is only tangentially related to security issues. Don't draw any conclusion about the security or insecurity of these functions based on these tables.

Dataset 1: the four million address blacklist

This list was collected by Cor Gest, who blacklists people for looking at his mail server in a funny way. It contains about four million unique IP and email addresses in ASCII format.

Hash 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Passthrough 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.001) 0 (0.038) 0 (0.275) 0 (1.000)
Adler32 2009900 (0.264) 1003713 (0.039) 502641 (0.864) 251309 (0.889) 125701 (0.971) 47157 (0.000) 19945 (0.000) 7653 (0.000) 4 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.001) 0 (0.038) 0 (0.275) 0 (1.000)
CRC32 2011349 (0.743) 1004816 (0.425) 501638 (0.092) 250318 (0.029) 125254 (0.214) 62635 (0.402) 31289 (0.453) 15791 (0.522) 7821 (0.701) 3961 (0.593) 1998 (0.436) 981 (1.000) 520 (0.191) 252 (0.678) 127 (0.684) 60 (0.949) 30 (1.000) 13 (0.700) 5 (0.467) 1 (0.199) 1 (1.000) 1 (0.617)
MD5 2010200 (0.414) 1003929 (0.069) 502061 (0.296) 250454 (0.057) 125164 (0.133) 62490 (0.155) 31251 (0.334) 15506 (0.101) 7720 (0.127) 3836 (0.144) 1955 (0.857) 977 (0.898) 456 (0.119) 237 (0.610) 115 (0.527) 62 (0.898) 26 (0.469) 13 (0.700) 4 (0.273) 3 (1.000) 1 (1.000) 0 (1.000)
SHA224 2012108 (0.278) 1007283 (0.041) 503842 (0.101) 252319 (0.053) 126287 (0.087) 63311 (0.061) 31652 (0.193) 15821 (0.379) 7882 (0.765) 3915 (0.848) 1937 (0.557) 930 (0.100) 455 (0.109) 213 (0.038) 120 (0.857) 55 (0.482) 25 (0.365) 12 (0.521) 3 (0.102) 2 (0.602) 2 (0.718) 1 (0.617)
SHA512 2008519 (0.013) 1004287 (0.159) 501354 (0.035) 251028 (0.472) 125413 (0.431) 62535 (0.214) 31237 (0.296) 15508 (0.105) 7713 (0.109) 3877 (0.425) 1937 (0.557) 1014 (0.307) 515 (0.279) 257 (0.463) 127 (0.684) 71 (0.225) 31 (0.928) 14 (0.898) 7 (1.000) 2 (0.602) 2 (0.718) 2 (0.249)
Whirlpool 2011387 (0.714) 1005639 (0.882) 503111 (0.591) 251730 (0.468) 125905 (0.535) 62890 (0.853) 31328 (0.596) 15885 (0.164) 7874 (0.834) 3993 (0.296) 2000 (0.410) 1006 (0.444) 500 (0.685) 251 (0.725) 120 (0.857) 62 (0.898) 30 (1.000) 16 (0.798) 7 (1.000) 2 (0.602) 1 (1.000) 0 (1.000)
ECDSA+SHA1 2011376 (0.723) 1004509 (0.249) 502106 (0.328) 250887 (0.313) 125841 (0.662) 63215 (0.136) 31666 (0.168) 15857 (0.243) 7889 (0.705) 3925 (0.975) 1947 (0.718) 978 (0.924) 462 (0.198) 224 (0.180) 127 (0.684) 67 (0.444) 37 (0.240) 23 (0.055) 8 (0.855) 4 (0.797) 1 (1.000) 0 (1.000)
MD4 2011580 (0.577) 1005899 (0.654) 503254 (0.452) 251404 (0.956) 126127 (0.209) 63305 (0.064) 31736 (0.076) 15774 (0.615) 7892 (0.680) 3937 (0.879) 1995 (0.477) 988 (0.836) 472 (0.404) 237 (0.610) 113 (0.416) 59 (0.848) 28 (0.718) 15 (1.000) 7 (1.000) 2 (0.602) 2 (0.718) 2 (0.249)
SHA1 2011376 (0.723) 1004509 (0.249) 502106 (0.328) 250887 (0.313) 125841 (0.662) 63215 (0.136) 31666 (0.168) 15857 (0.243) 7889 (0.705) 3925 (0.975) 1947 (0.718) 978 (0.924) 462 (0.198) 224 (0.180) 127 (0.684) 67 (0.444) 37 (0.240) 23 (0.055) 8 (0.855) 4 (0.797) 1 (1.000) 0 (1.000)

Dataset 2: cracklib

About 50,000 common English words and proper names, in ASCII.

Hash 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Passthrough 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.003) 0 (0.086) 0 (0.419) 0 (1.000) 0 (1.000) 0 (1.000)
Adler32 26297 (0.271) 13115 (0.332) 6566 (0.603) 3261 (0.456) 1630 (0.600) 844 (0.516) 465 (0.011) 235 (0.051) 28 (0.000) 19 (0.000) 0 (0.000) 0 (0.000) 0 (0.003) 0 (0.086) 0 (0.419) 0 (1.000) 0 (1.000) 0 (1.000)
CRC32 26262 (0.160) 13221 (0.928) 6623 (0.823) 3342 (0.483) 1618 (0.409) 803 (0.440) 386 (0.190) 181 (0.081) 86 (0.094) 43 (0.264) 24 (0.844) 14 (0.676) 6 (1.000) 3 (1.000) 2 (0.679) 0 (1.000) 0 (1.000) 0 (1.000)
MD5 26536 (0.332) 13402 (0.057) 6777 (0.025) 3466 (0.004) 1703 (0.198) 877 (0.074) 441 (0.166) 224 (0.222) 123 (0.055) 56 (0.530) 31 (0.279) 14 (0.676) 6 (1.000) 3 (1.000) 1 (1.000) 1 (0.554) 1 (0.332) 1 (0.183)
SHA224 26445 (0.858) 13141 (0.479) 6583 (0.767) 3278 (0.660) 1638 (0.745) 795 (0.293) 401 (0.587) 205 (0.972) 103 (1.000) 53 (0.834) 25 (1.000) 8 (0.209) 4 (0.432) 1 (0.392) 1 (1.000) 0 (1.000) 0 (1.000) 0 (1.000)
SHA512 26567 (0.215) 13321 (0.274) 6656 (0.511) 3303 (1.000) 1630 (0.600) 824 (0.972) 431 (0.374) 213 (0.625) 109 (0.554) 54 (0.727) 24 (0.844) 14 (0.676) 5 (0.842) 1 (0.392) 1 (1.000) 0 (1.000) 0 (1.000) 0 (1.000)
Whirlpool 26499 (0.517) 13215 (0.976) 6627 (0.782) 3308 (0.928) 1647 (0.920) 831 (0.847) 424 (0.570) 208 (0.889) 98 (0.657) 44 (0.329) 24 (0.844) 10 (0.575) 6 (1.000) 2 (0.778) 1 (1.000) 1 (0.554) 0 (1.000) 0 (1.000)
ECDSA+SHA1 26296 (0.267) 13152 (0.550) 6552 (0.482) 3257 (0.414) 1609 (0.294) 806 (0.505) 376 (0.071) 189 (0.236) 79 (0.016) 40 (0.109) 19 (0.200) 13 (0.889) 8 (0.549) 3 (1.000) 1 (1.000) 0 (1.000) 0 (1.000) 0 (1.000)
MD4 26566 (0.218) 13390 (0.075) 6675 (0.364) 3415 (0.045) 1705 (0.181) 878 (0.068) 443 (0.138) 223 (0.250) 117 (0.183) 61 (0.186) 35 (0.076) 19 (0.093) 6 (1.000) 3 (1.000) 2 (0.679) 1 (0.554) 0 (1.000) 0 (1.000)
SHA1 26296 (0.267) 13152 (0.550) 6552 (0.482) 3257 (0.414) 1609 (0.294) 806 (0.505) 376 (0.071) 189 (0.236) 79 (0.016) 40 (0.109) 19 (0.200) 13 (0.889) 8 (0.549) 3 (1.000) 1 (1.000) 0 (1.000) 0 (1.000) 0 (1.000)

Dataset 3: A big Apache log file

This is an Apache error log file, run through sort(1) and uniq(1) to keep only unique lines. That still left about 50 megabytes of log data. Since most lines are very similar (they differ only in the logged address or timestamp), this is probably the most stringent test on this page.

Hash 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Passthrough 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.006) 0 (0.129) 0 (0.413)
Adler32 190619 (0.021) 95269 (0.236) 47554 (0.704) 23666 (0.632) 11845 (0.827) 5647 (0.000) 2738 (0.000) 1180 (0.000) 712 (0.278) 346 (0.203) 276 (0.000) 205 (0.000) 193 (0.000) 5 (0.000) 0 (0.000) 0 (0.006) 0 (0.129) 0 (0.413)
CRC32 189662 (0.430) 94960 (0.979) 47463 (0.949) 23688 (0.740) 11859 (0.929) 5870 (0.402) 2925 (0.444) 1488 (0.907) 772 (0.270) 378 (0.697) 189 (0.769) 91 (0.917) 45 (0.941) 23 (1.000) 10 (0.769) 6 (0.834) 4 (0.545) 2 (0.660)
MD5 189158 (0.015) 94875 (0.772) 47284 (0.346) 23673 (0.665) 11968 (0.356) 6015 (0.292) 3063 (0.078) 1515 (0.413) 759 (0.520) 387 (0.406) 199 (0.321) 100 (0.436) 63 (0.018) 29 (0.213) 20 (0.026) 10 (0.091) 4 (0.545) 4 (0.059)
SHA224 189971 (0.833) 94800 (0.568) 47259 (0.287) 23651 (0.562) 11707 (0.132) 5776 (0.038) 2881 (0.113) 1433 (0.193) 712 (0.278) 359 (0.568) 176 (0.509) 83 (0.349) 43 (0.713) 22 (0.917) 12 (0.882) 3 (0.399) 3 (0.768) 1 (1.000)
SHA512 189887 (0.953) 94949 (0.991) 47322 (0.450) 23538 (0.180) 11757 (0.298) 5878 (0.464) 2972 (0.927) 1460 (0.550) 753 (0.673) 370 (1.000) 189 (0.769) 96 (0.716) 47 (0.883) 19 (0.466) 7 (0.236) 5 (1.000) 3 (0.768) 1 (1.000)
Whirlpool 189189 (0.020) 94544 (0.126) 47158 (0.119) 23577 (0.282) 11731 (0.200) 5810 (0.105) 2953 (0.804) 1455 (0.466) 745 (0.898) 375 (0.815) 181 (0.797) 101 (0.377) 55 (0.211) 26 (0.532) 16 (0.185) 7 (0.531) 2 (1.000) 2 (0.660)
ECDSA+SHA1 190455 (0.075) 95201 (0.353) 47517 (0.842) 23748 (0.947) 11944 (0.484) 6086 (0.048) 3064 (0.075) 1539 (0.149) 754 (0.646) 367 (0.876) 184 (0.971) 90 (0.835) 45 (0.941) 20 (0.603) 10 (0.769) 3 (0.399) 0 (0.129) 0 (0.413)
MD4 190187 (0.362) 94852 (0.706) 47496 (0.924) 23866 (0.391) 11929 (0.576) 5846 (0.250) 2988 (0.699) 1451 (0.405) 715 (0.339) 346 (0.203) 160 (0.061) 82 (0.298) 37 (0.186) 17 (0.251) 7 (0.236) 3 (0.399) 1 (0.383) 1 (1.000)
SHA1 190455 (0.075) 95201 (0.353) 47517 (0.842) 23748 (0.947) 11944 (0.484) 6086 (0.048) 3064 (0.075) 1539 (0.149) 754 (0.646) 367 (0.876) 184 (0.971) 90 (0.835) 45 (0.941) 20 (0.603) 10 (0.769) 3 (0.399) 0 (0.129) 0 (0.413)

Dataset 4: A sequence of numbers, ASCII

This set contains numbers ranging from 00001 to 65536, written as plain ASCII.

Hash 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Passthrough 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.001) 0 (0.040) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
Adler32 32767 (0.997) 16383 (0.996) 8195 (0.972) 4447 (0.000) 200 (0.000) 70 (0.000) 70 (0.000) 70 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.001) 0 (0.040) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
CRC32 32768 (1.000) 16384 (1.000) 8192 (1.000) 4073 (0.717) 2034 (0.762) 1017 (0.838) 509 (0.912) 253 (0.876) 125 (0.825) 57 (0.416) 28 (0.536) 15 (0.901) 9 (0.721) 6 (0.306) 6 (0.017) 4 (0.019) 2 (0.090) 0 (1.000)
MD5 32787 (0.885) 16362 (0.846) 8203 (0.897) 4072 (0.705) 2063 (0.736) 1023 (0.987) 500 (0.610) 259 (0.851) 127 (0.965) 61 (0.755) 33 (0.859) 13 (0.532) 6 (0.597) 2 (0.453) 1 (0.729) 0 (0.632) 0 (1.000) 0 (1.000)
SHA224 32593 (0.173) 16383 (0.996) 8168 (0.781) 4128 (0.606) 2061 (0.770) 1063 (0.219) 554 (0.066) 280 (0.133) 143 (0.184) 75 (0.169) 37 (0.375) 13 (0.532) 3 (0.077) 2 (0.453) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
SHA512 32864 (0.456) 16395 (0.921) 8212 (0.813) 4163 (0.280) 2081 (0.459) 1032 (0.801) 521 (0.690) 250 (0.731) 119 (0.452) 30 (0.000) 11 (0.000) 9 (0.080) 5 (0.375) 3 (0.805) 1 (0.729) 0 (0.632) 0 (1.000) 0 (1.000)
Whirlpool 32906 (0.283) 16430 (0.678) 8152 (0.641) 4080 (0.802) 1979 (0.124) 996 (0.386) 480 (0.162) 251 (0.778) 132 (0.723) 63 (0.950) 24 (0.184) 10 (0.167) 6 (0.597) 1 (0.202) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
ECDSA+SHA1 32894 (0.327) 16477 (0.401) 8338 (0.085) 4193 (0.118) 2108 (0.178) 1100 (0.017) 555 (0.059) 277 (0.188) 155 (0.019) 74 (0.211) 30 (0.791) 19 (0.451) 6 (0.597) 2 (0.453) 2 (1.000) 1 (1.000) 1 (0.393) 1 (0.221)
MD4 32897 (0.315) 16440 (0.613) 8184 (0.929) 4152 (0.366) 2036 (0.796) 1004 (0.539) 505 (0.773) 245 (0.511) 120 (0.507) 59 (0.574) 33 (0.859) 16 (1.000) 7 (0.860) 4 (1.000) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
SHA1 32894 (0.327) 16477 (0.401) 8338 (0.085) 4193 (0.118) 2108 (0.178) 1100 (0.017) 555 (0.059) 277 (0.188) 155 (0.019) 74 (0.211) 30 (0.791) 19 (0.451) 6 (0.597) 2 (0.453) 2 (1.000) 1 (1.000) 1 (0.393) 1 (0.221)

Dataset 5: A sequence of numbers, binary

This set contains all values ranging from 0 to 65535, encoded in two bytes. This input set is special: it already has exactly the property we expect of the output set. That is why the Passthrough ‘test’ hash, which simply passes the input string through unaltered, gets a perfect score.
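To see why, the sketch below builds the Dataset 5 inputs and applies the identity ‘hash’. It assumes a big-endian two-byte encoding and reads the most significant bit first; the page specifies neither, but because the inputs are every possible 16-bit pattern, any byte order or bit order gives the same counts.

```python
from typing import List


def passthrough_counts(max_n: int = 18) -> List[int]:
    """For n = 1..max_n, count two-byte inputs 0..65535 that start with >= n '1' bits."""
    counts = [0] * max_n
    for i in range(65536):
        data = i.to_bytes(2, "big")                     # Dataset 5: each value in two bytes
        value = int.from_bytes(data, "big")             # Passthrough: output = input
        ones = 16 - ((~value) & 0xFFFF).bit_length()    # leading 1 bits, MSB first
        for n in range(1, min(ones, max_n) + 1):
            counts[n - 1] += 1
    return counts


if __name__ == "__main__":
    print(passthrough_counts())
```

Every n-bit prefix occurs exactly 2^(16-n) times among all 16-bit values, so each count equals its expectation (32768, 16384, ..., 1, then 0 for n = 17 and 18) and every binomial p-value is 1.000, as in the Passthrough row of the table.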

Interestingly, CRC32 also scores perfectly for the first 15 bits (its only deviation, at 16 bits, has p = 0.264), and even Adler32, notable for its under-performance in the other tests, is perfect for the first 8 bits before collapsing.

Hash 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Passthrough 32768 (1.000) 16384 (1.000) 8192 (1.000) 4096 (1.000) 2048 (1.000) 1024 (1.000) 512 (1.000) 256 (1.000) 128 (1.000) 64 (1.000) 32 (1.000) 16 (1.000) 8 (1.000) 4 (1.000) 2 (1.000) 1 (1.000) 0 (1.000) 0 (1.000)
Adler32 32768 (1.000) 16384 (1.000) 8192 (1.000) 4096 (1.000) 2048 (1.000) 1024 (1.000) 512 (1.000) 256 (1.000) 1 (0.000) 0 (0.000) 0 (0.000) 0 (0.000) 0 (0.001) 0 (0.040) 0 (0.278) 0 (0.632) 0 (1.000) 0 (1.000)
CRC32 32768 (1.000) 16384 (1.000) 8192 (1.000) 4096 (1.000) 2048 (1.000) 1024 (1.000) 512 (1.000) 256 (1.000) 128 (1.000) 64 (1.000) 32 (1.000) 16 (1.000) 8 (1.000) 4 (1.000) 2 (1.000) 2 (0.264) 0 (1.000) 0 (1.000)
MD5 32732 (0.782) 16472 (0.427) 8116 (0.373) 4017 (0.205) 2044 (0.937) 1008 (0.625) 496 (0.492) 240 (0.332) 125 (0.825) 64 (1.000) 34 (0.723) 14 (0.708) 5 (0.375) 4 (1.000) 2 (1.000) 1 (1.000) 0 (1.000) 0 (1.000)
SHA224 32853 (0.509) 16279 (0.346) 8271 (0.351) 4238 (0.022) 2143 (0.034) 1056 (0.313) 511 (0.982) 256 (1.000) 121 (0.565) 63 (0.950) 29 (0.659) 12 (0.381) 6 (0.597) 4 (1.000) 2 (1.000) 0 (0.632) 0 (1.000) 0 (1.000)
SHA512 32693 (0.561) 16353 (0.783) 8243 (0.547) 4096 (1.000) 2037 (0.814) 1050 (0.413) 499 (0.579) 231 (0.125) 109 (0.101) 62 (0.851) 30 (0.791) 14 (0.708) 8 (1.000) 2 (0.453) 1 (0.729) 0 (0.632) 0 (1.000) 0 (1.000)
Whirlpool 32604 (0.201) 16348 (0.749) 8242 (0.555) 4206 (0.077) 2110 (0.164) 1047 (0.469) 510 (0.947) 252 (0.827) 114 (0.232) 56 (0.348) 25 (0.249) 10 (0.167) 6 (0.597) 3 (0.805) 1 (0.729) 1 (1.000) 1 (0.393) 1 (0.221)
ECDSA+SHA1 32684 (0.514) 16353 (0.783) 8153 (0.649) 4088 (0.904) 2077 (0.515) 1033 (0.777) 508 (0.877) 245 (0.511) 124 (0.757) 69 (0.531) 30 (0.791) 17 (0.802) 10 (0.475) 6 (0.306) 3 (0.459) 1 (1.000) 0 (1.000) 0 (1.000)
MD4 32777 (0.947) 16271 (0.310) 8122 (0.412) 4029 (0.283) 2053 (0.911) 1012 (0.717) 497 (0.520) 255 (0.975) 130 (0.859) 62 (0.851) 35 (0.595) 17 (0.802) 12 (0.154) 6 (0.306) 3 (0.459) 1 (1.000) 1 (0.393) 0 (1.000)
SHA1 32684 (0.514) 16353 (0.783) 8153 (0.649) 4088 (0.904) 2077 (0.515) 1033 (0.777) 508 (0.877) 245 (0.511) 124 (0.757) 69 (0.531) 30 (0.791) 17 (0.802) 10 (0.475) 6 (0.306) 3 (0.459) 1 (1.000) 0 (1.000) 0 (1.000)

Summary: distribution of p-values

As a quick summary, perhaps it's interesting to present graphs of the computed p-values themselves. Bars to the left of these graphs represent low p-values. Bars nearer to the right edge represent high p-values.
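The graphs themselves are images. A text-mode approximation might bin one function's p-values into equal-width buckets over [0, 1], as sketched below; the 10 buckets and the ‘#’ bars are arbitrary choices, and the sample input is simply the first five MD4 p-values from Dataset 5. Because the binomial test is discrete, p-values are not exactly uniform even for an ideal hash: they pile up at 1.000 when expected counts are small (as in the rightmost table columns), so a tall bar at the high end is expected.

```python
from typing import List, Sequence


def p_value_histogram(p_values: Sequence[float], bins: int = 10) -> List[int]:
    """Count p-values per equal-width bin over [0, 1]; p = 1.0 goes into the last bin."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def render(counts: List[int]) -> str:
    """One text bar per bin, lowest p-values first (the left edge of the graphs)."""
    bins = len(counts)
    return "\n".join(
        f"{i / bins:.1f}-{(i + 1) / bins:.1f} {'#' * c}" for i, c in enumerate(counts)
    )


if __name__ == "__main__":
    md4_dataset5 = [0.947, 0.310, 0.412, 0.283, 0.911]
    print(render(p_value_histogram(md4_dataset5)))
```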

CRC32 seems to score really well, but its graph is skewed by the results of Dataset 5 (binary numbers), which may or may not be too synthetic to be considered a fair benchmark. But even if you subtract the results of that test, it does not fare significantly worse than the cryptographic hash functions.

Graphs of the distributions of the p-values themselves, one per hash function: Adler32, CRC32, MD5, SHA224, SHA512, Whirlpool, ECDSA+SHA1, MD4 and SHA1.