Recent Intel processors have a specialized instruction for calculating CRC-32C: the CRC32 instruction, part of SSE 4.2. A fast software fallback is used on all other hardware, i.e. AMD processors and some older Intel processors.
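As a rough illustration of this hardware/software split, the C sketch below checks at run time whether the CPU supports SSE 4.2. If it does, the sketch uses the CRC32 instruction through the `_mm_crc32_u64` and `_mm_crc32_u8` intrinsics. Otherwise it falls back to a byte-at-a-time table lookup. It assumes GCC or Clang on x86-64 (for `__builtin_cpu_supports` and the `target` attribute). The function names are hypothetical, and the table-driven fallback is much simpler and slower than the library's own software path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>            /* SSE 4.2 intrinsics, including CRC32 */
#define HAVE_X86_CRC32 1
#endif

#define CRC32C_POLY_REFLECTED 0x82F63B78u  /* Castagnoli polynomial, LSB-first */

static uint32_t crc32c_table[256];
static int crc32c_table_ready = 0;          /* lazy init; not thread-safe */

/* Build the byte-wise lookup table for the reflected CRC-32C polynomial. */
static void crc32c_init_table(void) {
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
        crc32c_table[i] = c;
    }
    crc32c_table_ready = 1;
}

/* Software fallback: one table lookup per input byte. */
static uint32_t crc32c_sw(uint32_t crc, const uint8_t *p, size_t n) {
    while (n--)
        crc = crc32c_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return crc;
}

#ifdef HAVE_X86_CRC32
/* Hardware path: 8 bytes per CRC32 instruction, then the leftover bytes. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(uint32_t crc, const uint8_t *p, size_t n) {
    uint64_t c = crc;
    while (n >= 8) {
        uint64_t word;
        memcpy(&word, p, sizeof word);      /* alignment-safe load */
        c = _mm_crc32_u64(c, word);
        p += 8;
        n -= 8;
    }
    uint32_t c32 = (uint32_t)c;
    while (n--)
        c32 = _mm_crc32_u8(c32, *p++);
    return c32;
}
#endif

/* Standard CRC-32C: initial value 0xFFFFFFFF, final bitwise NOT. */
uint32_t crc32c(const void *data, size_t n) {
    const uint8_t *p = (const uint8_t *)data;
#ifdef HAVE_X86_CRC32
    if (__builtin_cpu_supports("sse4.2"))
        return ~crc32c_hw(0xFFFFFFFFu, p, n);
#endif
    if (!crc32c_table_ready)
        crc32c_init_table();
    return ~crc32c_sw(0xFFFFFFFFu, p, n);
}
```

The 8-byte form of the CRC32 instruction is only available in 64-bit mode, so 32-bit code can consume at most 4 bytes per instruction. That may be part of why the hardware figures below differ between the two modes.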
Hardware performance (Core i7 3.4GHz):
- 64-bit mode: 21 GB/s
- 32-bit mode: 10 GB/s
Software performance (Core i7 3.4GHz with hardware acceleration disabled):
- 64-bit mode: 2 GB/s
- 32-bit mode: 1.8 GB/s
The library is optimized for larger buffers of several dozen kilobytes. The results above were measured on buffers ranging from 0 KB to 64 KB, with a 32 KB average.
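A hypothetical harness in the spirit of this methodology is sketched below. Each iteration hashes a buffer whose length is drawn uniformly from 0 to 64 KB, so the mean length is about 32 KB. Throughput is the total number of bytes divided by the elapsed CPU time. The bitwise `crc32c_bitwise` is only a slow stand-in for the implementation under test. The xorshift generator and all names are assumptions, not the code behind the figures above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MAX_BUF (64 * 1024)  /* largest buffer: 64 KB */

/* Slow bitwise CRC-32C used as a stand-in for the code being benchmarked. */
static uint32_t crc32c_bitwise(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Small deterministic PRNG (xorshift32) so runs are repeatable. */
static uint32_t rng_state = 2463534242u;
static uint32_t xorshift32(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Buffer length drawn uniformly from [0, MAX_BUF]; the mean is MAX_BUF / 2. */
static size_t random_length(void) {
    return (size_t)(xorshift32() % (MAX_BUF + 1u));
}

/* Throughput in GB/s (1e9 bytes per second) over `iterations` random-length buffers. */
static double measure_gbps(int iterations, uint32_t *sink) {
    static uint8_t buf[MAX_BUF];
    for (size_t i = 0; i < MAX_BUF; i++)
        buf[i] = (uint8_t)(i * 31u);   /* arbitrary contents */
    uint64_t total = 0;
    uint32_t acc = 0;
    clock_t start = clock();           /* CPU time, adequate for one thread */
    for (int i = 0; i < iterations; i++) {
        size_t len = random_length();
        acc ^= crc32c_bitwise(buf, len);
        total += len;
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    *sink = acc;                       /* keep results live so the loop is not optimized away */
    return seconds > 0.0 ? (double)total / seconds / 1e9 : 0.0;
}
```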
Recent AMD processors contain the PCLMULQDQ instruction, which could be used to accelerate CRC-32C, among other things. This library doesn't use the instruction, because at the time there was no code that could be easily incorporated.
I have nevertheless run benchmarks, and PCLMULQDQ-based code for CRC-32-IEEE ran at 2.8 GB/s, 40% faster than my 64-bit software implementation of CRC-32C.
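PCLMULQDQ performs carry-less multiplication. Two 64-bit values are multiplied as polynomials over GF(2), so partial products are combined with XOR and no carries propagate. The portable C sketch below models a single such 64-bit by 64-bit product with a 128-bit result. The real instruction, exposed as `_mm_clmulepi64_si128` in `<wmmintrin.h>`, also takes an immediate operand that selects which 64-bit halves of its two 128-bit inputs to multiply. The `u128` type and the `clmul64` name are illustrative.

```c
#include <stdint.h>

/* 128-bit result as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less (GF(2)[x]) multiplication of two 64-bit polynomials:
   the product PCLMULQDQ computes for one pair of 64-bit lanes.
   Partial products are combined with XOR instead of addition. */
static u128 clmul64(uint64_t a, uint64_t b) {
    u128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            r.hi ^= i ? a >> (64 - i) : 0;  /* bits shifted past bit 63 */
        }
    }
    return r;
}
```

For example, `clmul64(3, 3)` yields 5 rather than 9, because (x + 1)^2 = x^2 + 1 over GF(2).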
Intel has submitted PCLMULQDQ CRC code to zlib under a permissive BSD license. Unfortunately, Intel's code is full of strange-looking constants that were generated from the CRC-32-IEEE polynomial. To use Intel's code in my project, I would have to regenerate all of these constants for the CRC-32C polynomial.
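To illustrate what regenerating such constants involves: in Intel's published description of PCLMULQDQ-based CRC folding, the fold constants are remainders of the form x^k mod P(x) for fixed distances k, plus a Barrett constant floor(x^64 / P(x)). Computing them for another polynomial only requires GF(2) polynomial arithmetic with a different P(x). The sketch below works in normal (MSB-first) bit order. zlib's CRC-32 is bit-reflected, so its tables presumably hold bit-reversed (and possibly shifted) versions of such values and will not match these numbers directly. Which exponents k a given implementation uses is an assumption to verify against its source.

```c
#include <stdint.h>

/* Generator polynomials in normal (MSB-first) form; the x^32 term is implicit. */
#define CRC32_IEEE_POLY 0x04C11DB7u
#define CRC32C_POLY     0x1EDC6F41u

/* a(x) * x mod P(x): shift left and reduce when an x^32 term would appear. */
static uint32_t mul_x(uint32_t a, uint32_t poly) {
    return (a & 0x80000000u) ? (a << 1) ^ poly : a << 1;
}

/* x^k mod P(x), by k repeated multiplications by x. */
static uint32_t xpow_mod(uint32_t k, uint32_t poly) {
    uint32_t r = 1;                     /* the polynomial 1 = x^0 */
    while (k--)
        r = mul_x(r, poly);
    return r;
}

/* a(x) * b(x) mod P(x), Horner's rule over the bits of b from the top. */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t poly) {
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = mul_x(r, poly);
        if ((b >> i) & 1u)
            r ^= a;
    }
    return r;
}

/* floor(x^64 / P(x)): the 33-bit quotient used for Barrett reduction. */
static uint64_t barrett_mu(uint32_t poly) {
    const uint64_t p = (1ull << 32) | poly;   /* P(x) including x^32 */
    uint64_t rem = 0, q = 0;
    for (int i = 64; i >= 0; i--) {
        rem = (rem << 1) | (i == 64);         /* bring down the next bit of x^64 */
        if (rem >> 32) {                      /* degree reached 32: subtract P(x) */
            rem ^= p;
            q |= 1ull << i;
        }
    }
    return q;
}
```

As a sanity check, for the CRC-32-IEEE polynomial this gives x^64 mod P(x) = 0x490D678D and floor(x^64 / P(x)) = 0x104D101DF. Passing `CRC32C_POLY` instead yields the corresponding CRC-32C values.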
I managed to do that for all constants but one. The constant-generating code is in the “constants” project in the Hg repository. The one elusive constant is 0x9db42487. Jim Kukunas of Intel, who submitted the code to zlib, has described it as a “magic” constant that produces FFFFFF00000…0000 when folded 4 times. My attempts to simulate this folding process yielded all kinds of strange constants, but never FFFFFF00000…0000.
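The exact folding step zlib performs is not spelled out here, so the following is only one possible model, offered as an assumption. It treats a fold over a distance of k bits as multiplication by x^k modulo P(x). Under that model, a starting value c that folds into a target T satisfies c·x^k ≡ T (mod P(x)). Because P(x) has a nonzero constant term, x is invertible modulo P(x), so c can be found by "unfolding" T one bit at a time. This matches only remainders modulo P(x), not the exact FFFFFF00000…0000 register contents, and it does not attempt to reproduce 0x9db42487. The fold distance and the names `fold_forward` and `fold_preimage` are hypothetical.

```c
#include <stdint.h>

#define CRC32C_POLY 0x1EDC6F41u   /* normal form, implicit x^32 term */

/* a(x) * x mod P(x). */
static uint32_t mul_x(uint32_t a, uint32_t poly) {
    return (a & 0x80000000u) ? (a << 1) ^ poly : a << 1;
}

/* a(x) / x mod P(x). P(x) has a constant term, so x is invertible mod P(x):
   if a(x) has a constant term, add P(x) first, then divide by x.
   The x^32 term of P(x) becomes x^31 after the division. */
static uint32_t div_x(uint32_t a, uint32_t poly) {
    return (a & 1u) ? ((a ^ poly) >> 1) | 0x80000000u : a >> 1;
}

/* Hypothetical model of folding forward by k bits: c(x) * x^k mod P(x). */
static uint32_t fold_forward(uint32_t c, uint32_t k, uint32_t poly) {
    while (k--)
        c = mul_x(c, poly);
    return c;
}

/* Inverse of the model: the c(x) for which fold_forward(c, k) == target. */
static uint32_t fold_preimage(uint32_t target, uint32_t k, uint32_t poly) {
    while (k--)
        target = div_x(target, poly);
    return target;
}
```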
Even if I managed to generate this last constant, the work would not be over. First, it might turn out that the “magic” constant cannot be generated for the CRC-32C polynomial. Second, CRC has fancy bit-flipped variants, and I am not sure whether the zlib code is one of them. Removing the flipping would be tricky without taking the time to fully understand Intel's algorithm. Maybe I will get back to it someday.
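The bit-flipped variants are most likely what are usually called reflected CRCs. A reflected CRC processes each byte least-significant bit first and uses the bit-reversed polynomial. The sketch below is an illustration, not the library's code. It computes CRC-32C both ways and shows how the two relate: the reflected CRC of a message equals the bit-reversal of the MSB-first CRC computed over the same message with each byte's bits reversed. Both variants use the initial value 0xFFFFFFFF and a final bitwise NOT, and bit reversal leaves both unchanged. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_NORMAL    0x1EDC6F41u  /* MSB-first form */
#define CRC32C_POLY_REFLECTED 0x82F63B78u  /* bit-reversed form, LSB-first */

static uint32_t reverse32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) { r = (r << 1) | (v & 1u); v >>= 1; }
    return r;
}

static uint8_t reverse8(uint8_t v) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) { r = (uint8_t)((r << 1) | (v & 1u)); v >>= 1; }
    return r;
}

/* Reflected (LSB-first) CRC-32C, the variant usually meant by "CRC-32C". */
static uint32_t crc32c_reflected(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32C_POLY_REFLECTED : crc >> 1;
    }
    return ~crc;
}

/* MSB-first CRC with the same polynomial, initial value and final NOT.
   If reverse_bytes is nonzero, each input byte is bit-reversed first. */
static uint32_t crc32c_normal(const uint8_t *p, size_t n, int reverse_bytes) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        uint8_t b = *p++;
        if (reverse_bytes)
            b = reverse8(b);
        crc ^= (uint32_t)b << 24;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC32C_POLY_NORMAL : crc << 1;
    }
    return ~crc;
}
```

For any message, `crc32c_reflected(msg, n)` equals `reverse32(crc32c_normal(msg, n, 1))`. Converting code written for one bit order to the other therefore reverses both the data bits and the register, and the precomputed constants presumably have to be converted along with it.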