CRC-32C (Castagnoli) for C++ and .NET


Recent Intel processors have a specialized instruction for calculating CRC-32C: the CRC32 instruction from SSE4.2. A fast software fallback is used on all other hardware, i.e., AMD processors and some older Intel processors.
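The hardware/software split can be sketched as a runtime dispatch. This sketch assumes GCC or Clang on x86-64. It uses the SSE4.2 intrinsics `_mm_crc32_u64` and `_mm_crc32_u8` when `__builtin_cpu_supports("sse4.2")` reports support, and otherwise falls back to a plain bitwise loop. The function names are hypothetical. This is not the library's code: it feeds a single stream of 8-byte words to the instruction, and its bitwise fallback is much slower than a sliced table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bitwise CRC-32C on the raw register (reflected polynomial 0x82F63B78).
// No initial/final inversion is applied here.
static uint32_t crc32c_update_sw(uint32_t crc, const uint8_t* p, size_t n) {
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
// Same raw register update, computed by the SSE4.2 CRC32 instruction.
__attribute__((target("sse4.2")))
static uint32_t crc32c_update_hw(uint32_t crc, const uint8_t* p, size_t n) {
    uint64_t c = crc;
    while (n >= 8) {
        uint64_t word;
        std::memcpy(&word, p, 8);        // unaligned-safe 8-byte load
        c = _mm_crc32_u64(c, word);      // upper 32 bits of c stay zero
        p += 8;
        n -= 8;
    }
    uint32_t c32 = static_cast<uint32_t>(c);
    while (n--) c32 = _mm_crc32_u8(c32, *p++);
    return c32;
}
static bool have_sse42() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2");
}
#else
// Non-x86-64 or other compilers: no hardware path in this sketch.
static uint32_t crc32c_update_hw(uint32_t crc, const uint8_t* p, size_t n) {
    return crc32c_update_sw(crc, p, n);
}
static bool have_sse42() { return false; }
#endif

// Standard CRC-32C (init 0xFFFFFFFF, final XOR 0xFFFFFFFF).
// The CPU check runs once, on first use.
uint32_t crc32c(const void* data, size_t n) {
    static const bool hw = have_sse42();
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint32_t raw = hw ? crc32c_update_hw(0xFFFFFFFFu, p, n)
                      : crc32c_update_sw(0xFFFFFFFFu, p, n);
    return ~raw;
}
```

Both paths compute the same function, so `crc32c("123456789", 9)` should return the standard CRC-32C check value 0xE3069283 on any machine.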

Hardware performance (Core i7 3.4GHz):

Software performance (Core i7 3.4GHz with hardware acceleration disabled):

The library is optimized for larger buffers of several dozen kilobytes. The results above were measured on buffers ranging from 0 KB to 64 KB, with an average of 32 KB.
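A measurement of this kind can be set up as follows. This sketch assumes buffer sizes drawn uniformly from 0 to 64 KiB, which gives the 32 KB average. `crc32c_bitwise` and `bench_crc32c` are hypothetical names. The bitwise CRC is only a stand-in for whichever implementation is being timed, and the author's own benchmark harness may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the implementation under test.
static uint32_t crc32c_bitwise(const uint8_t* p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return ~crc;
}

struct BenchResult {
    double gb_per_s;  // bytes processed / elapsed seconds / 1e9
    double avg_kib;   // mean buffer size in KiB
    uint32_t sink;    // XOR of all CRCs, keeps the work observable
};

// Times the CRC over `count` buffers with sizes uniform in [0, 64 KiB].
BenchResult bench_crc32c(size_t count, unsigned seed = 42) {
    constexpr size_t kMax = 64 * 1024;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<size_t> size_dist(0, kMax);
    std::uniform_int_distribution<int> byte_dist(0, 255);

    std::vector<uint8_t> data(kMax);
    for (auto& b : data) b = static_cast<uint8_t>(byte_dist(rng));
    std::vector<size_t> sizes(count);
    size_t total = 0;
    for (auto& s : sizes) { s = size_dist(rng); total += s; }

    uint32_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (size_t s : sizes) sink ^= crc32c_bitwise(data.data(), s);
    const auto stop = std::chrono::steady_clock::now();

    const double secs = std::chrono::duration<double>(stop - start).count();
    return {secs > 0 ? total / secs / 1e9 : 0.0,
            count ? total / 1024.0 / count : 0.0,
            sink};
}
```

The size list is generated before timing starts, so the random number generator does not contribute to the measured time.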

PCLMULQDQ instruction

Recent AMD processors contain the PCLMULQDQ instruction, which could be used to accelerate CRC-32C, among other things. This library doesn't use it, because at the time there was no code that could be easily incorporated.
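PCLMULQDQ performs carry-less multiplication: polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition. Compilers expose it as the `_mm_clmulepi64_si128` intrinsic. The portable emulation of one 64-bit by 64-bit lane below is for illustration only, and `clmul64` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Carry-less 64x64 -> 128-bit multiply, returned as {high word, low word}.
std::pair<uint64_t, uint64_t> clmul64(uint64_t a, uint64_t b) {
    uint64_t hi = 0, lo = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            lo ^= a << i;                       // partial product, low part
            if (i != 0) hi ^= a >> (64 - i);    // bits shifted past bit 63
        }
    }
    return {hi, lo};
}
```

For example, `clmul64(3, 3)` yields 5 rather than 9, because (x + 1)² = x² + 1 over GF(2).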

I have nevertheless run benchmarks: PCLMULQDQ-based code for CRC-32-IEEE ran at 2.8 GB/s, 40% faster than my software implementation of CRC-32C.

Intel has submitted PCLMULQDQ-based CRC code to zlib under a permissive BSD license. Unfortunately, Intel's code is full of strange-looking constants that were generated from the CRC-32-IEEE polynomial. To use Intel's code in my project, I would have to regenerate all of these constants for the CRC-32C polynomial.
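Most constants in PCLMULQDQ-based CRC code are typically remainders of the form x^k mod P(x). The Barrett-reduction constant is the exception, since it is a quotient. The sketch below computes such remainders for CRC-32C using the normal (MSB-first) polynomial 0x1EDC6F41. `xpow_mod` is a hypothetical helper and is not taken from the “constants” project. It does not attempt to reproduce the specific exponents, bit reflection, or shifts that Intel's code expects.

```cpp
#include <cassert>
#include <cstdint>

// CRC-32C generator in normal form, without the implicit x^32 term.
constexpr uint32_t kPoly = 0x1EDC6F41u;

// Returns x^k mod P(x) over GF(2); bit i holds the coefficient of x^i.
uint32_t xpow_mod(uint64_t k) {
    uint32_t r = 1;  // x^0
    for (uint64_t i = 0; i < k; ++i) {
        bool carry = (r & 0x80000000u) != 0;  // x^31 term becomes x^32
        r <<= 1;
        if (carry) r ^= kPoly;               // replace x^32 by its remainder
    }
    return r;
}
```

Sanity checks: x^31 mod P is just 0x80000000, and x^32 mod P is the polynomial's low 32 bits, 0x1EDC6F41.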

I managed to do that for all constants but one. The constant-generating code is in the “constants” project in the Hg repository. The one elusive constant is 0x9db42487. Jim Kukunas of Intel, who submitted the code to zlib, described it as a “magic” constant that produces FFFFFF00000…0000 when folded 4 times. My attempts to simulate this folding process yielded all kinds of strange constants, but never FFFFFF00000…0000.
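Folding rests on a simple identity. If V(x) = A(x)·x^k + B(x), then V mod P = (A·(x^k mod P) + B) mod P. A high chunk can therefore be multiplied by a precomputed constant and XORed onto the lower chunk without changing the remainder. The toy-sized check below uses k = 32 and 32-bit chunks, whereas PCLMULQDQ code applies the same idea to 128-bit chunks with several constants. The helper names are hypothetical, and this does not reconstruct the “magic” constant.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kP33 = 0x11EDC6F41ull;  // P(x) including the x^32 term

// Reduces a polynomial of degree < 64 modulo P(x) by GF(2) long division.
uint32_t reduce64(uint64_t v) {
    for (int i = 63; i >= 32; --i)
        if ((v >> i) & 1) v ^= kP33 << (i - 32);  // cancel the x^i term
    return static_cast<uint32_t>(v);
}

// Carry-less 32x32 -> 64-bit multiply.
uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 32; ++i)
        if ((b >> i) & 1) r ^= uint64_t{a} << i;
    return r;
}

// Folds chunk A of V = A*x^32 + B onto B via the constant x^32 mod P,
// then reduces the shorter result.
uint32_t fold_then_reduce(uint32_t a, uint32_t b) {
    static const uint32_t k = reduce64(uint64_t{1} << 32);  // x^32 mod P
    return reduce64(clmul32(a, k) ^ b);
}
```

For any `a` and `b`, `fold_then_reduce(a, b)` should equal `reduce64((uint64_t{a} << 32) | b)`.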

Even if I managed to generate this last constant, the job still wouldn't be done. First, it might turn out that the “magic” constant cannot be generated for the CRC-32C polynomial at all. Second, CRC has bit-reflected variants, and I am not sure whether the zlib code is one of them. Removing the reflection would be tricky without taking the time to fully understand Intel's algorithm. Maybe I will get back to it someday.
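The two polynomial values listed for this project, 0x1EDC6F41 and 0x82F63B78, are bit reversals of each other. An LSB-first (reflected) CRC gives the same result as an MSB-first (normal) CRC that is run over bit-reversed bytes and then has its result bit-reversed. The sketch below demonstrates that equivalence, and all of its function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverses the order of the 32 bits in v.
uint32_t reflect32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) { r = (r << 1) | (v & 1); v >>= 1; }
    return r;
}

// Reverses the order of the 8 bits in v.
uint8_t reflect8(uint8_t v) {
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) { r = static_cast<uint8_t>((r << 1) | (v & 1)); v >>= 1; }
    return r;
}

// MSB-first ("normal") register update with polynomial 0x1EDC6F41.
uint32_t crc_normal(uint32_t crc, const uint8_t* p, size_t n) {
    while (n--) {
        crc ^= uint32_t{*p++} << 24;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x1EDC6F41u : crc << 1;
    }
    return crc;
}

// LSB-first ("reflected") update with polynomial 0x82F63B78.
uint32_t crc_reflected(uint32_t crc, const uint8_t* p, size_t n) {
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}
```

Because 0xFFFFFFFF is its own bit reversal, the equivalence also holds with the usual all-ones initial value.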

Project: CRC-32C for C++ and .NET
Version: 1.0.5
CRC type: CRC-32C (Castagnoli)
Polynomial: 0x1EDC6F41 (normal) / 0x82F63B78 (reflected)
CRC features: configurable initial value, chainable, no pre/post-processing, continuous bit order
Optimizations: Intel CRC32 instruction (x3) in hardware; sliced table-driven software fallback
Performance: 20 GB/s in hardware; 2 GB/s in software
NuGet: Crc32C (C++), Crc32C.NET (.NET)
Download: crc32c-hw- (C++ and .NET)
Source code: crc32c-hw and Crc32C.NET on Bitbucket
License: BSD license, zlib license
Credits: Robert Važan, Mark Adler
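The software fallback is described as sliced and table-driven. The sketch below is a minimal slicing-by-8 version that consumes eight bytes per step using eight 256-entry tables. It assumes a zlib-style calling convention: pass 0 to start, and pass the previous result back in to chain, with the inversion handled internally. `crc32c_extend` is a hypothetical name, not the library's API, and the library's own slice count and layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// t[k][n] = register contribution of byte n followed by k zero bytes
// (reflected CRC-32C polynomial 0x82F63B78).
struct Crc32cTables {
    uint32_t t[8][256];
    Crc32cTables() {
        for (uint32_t n = 0; n < 256; ++n) {
            uint32_t c = n;
            for (int k = 0; k < 8; ++k)
                c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
            t[0][n] = c;
        }
        for (uint32_t n = 0; n < 256; ++n) {
            uint32_t c = t[0][n];
            for (int k = 1; k < 8; ++k) {
                c = t[0][c & 0xFF] ^ (c >> 8);  // append one zero byte
                t[k][n] = c;
            }
        }
    }
};

// Extends a previous CRC-32C result (0 for a fresh start) over p[0..n).
uint32_t crc32c_extend(uint32_t crc, const uint8_t* p, size_t n) {
    static const Crc32cTables tables;
    const auto& t = tables.t;
    crc = ~crc;
    while (n >= 8) {
        // Assemble little-endian words byte by byte, so host endianness is irrelevant.
        uint32_t lo = uint32_t{p[0]} | uint32_t{p[1]} << 8 |
                      uint32_t{p[2]} << 16 | uint32_t{p[3]} << 24;
        uint32_t hi = uint32_t{p[4]} | uint32_t{p[5]} << 8 |
                      uint32_t{p[6]} << 16 | uint32_t{p[7]} << 24;
        crc ^= lo;
        crc = t[7][crc & 0xFF] ^ t[6][(crc >> 8) & 0xFF] ^
              t[5][(crc >> 16) & 0xFF] ^ t[4][crc >> 24] ^
              t[3][hi & 0xFF] ^ t[2][(hi >> 8) & 0xFF] ^
              t[1][(hi >> 16) & 0xFF] ^ t[0][hi >> 24];
        p += 8;
        n -= 8;
    }
    while (n--) crc = t[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);  // tail bytes
    return ~crc;
}
```

Under this convention, chaining means `crc32c_extend(crc32c_extend(0, a, na), b, nb)` equals the CRC of `a` followed by `b`.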