joriszwart.nl

Minimal ZIP file – part IV

This article is part of a series.

  1. Creating a valid ZIP-file
  2. Use .NET’s DeflateStream
  3. No dependencies (coding the DEFLATE algorithm)
  4. Calculating the CRC-32 👈

Introduction

From the start, this series has depended on the System.IO.Hashing package to calculate the CRC-32/ISO-HDLC checksum. This part replaces that dependency with hand-written code.

First, a step back; the self-written ‘de flater’ (Dutch pun intended) from Part III is being written off. We want dealer-quality software and return to the version from Part II.

Source code

The source code follows the standard CRC algorithm. The implementation is naive, processing one bit at a time, and can be sped up by using lookup tables (still true in 2026 AD?) or a hardware implementation.

To make this work, the System.IO.Hashing package is removed.

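public static class Crc32
{
    // Bitwise CRC-32/ISO-HDLC, one input bit per inner-loop iteration.
    public static uint HashToUInt32(byte[] source)
    {
        const uint polynomial = 0xEDB88320; // ISO-HDLC, bit-reflected form of 0x04C11DB7

        // Initial value: all ones. The literal does not fit an int, so crc is a uint.
        var crc = 0b11111111_11111111_11111111_11111111;

        foreach (var b in source)
        {
            crc ^= b; // fold the next byte into the low 8 bits

            for (var bit = 0; bit < 8; bit++)
            {
                var lsb = crc & 1;           // the bit about to be shifted out
                var mask = lsb * polynomial; // the polynomial if that bit is set, otherwise 0
                crc = (crc >> 1) ^ mask;
            }
        }

        // Final XOR: invert all bits.
        return crc ^ 0b11111111_11111111_11111111_11111111;
    }
}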
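
As an aside, the lookup-table speed-up mentioned above could look roughly like the sketch below. It is not part of the series' code: the class name Crc32Table is hypothetical, and the only assumption is that it should return the same values as the bitwise version. The table stores, for each possible byte value, the result of the eight shift/XOR steps, so the main loop handles a whole byte per iteration.

```csharp
// Hypothetical table-driven variant; the name Crc32Table is an assumption.
public static class Crc32Table
{
    private const uint Polynomial = 0xEDB88320; // reflected CRC-32/ISO-HDLC polynomial

    // One entry per possible byte value, built once when the type is first used.
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];

        for (uint i = 0; i < 256; i++)
        {
            var crc = i;

            // The same eight shift/XOR steps as the bitwise loop, done ahead of time.
            for (var bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ Polynomial : crc >> 1;
            }

            table[i] = crc;
        }

        return table;
    }

    public static uint HashToUInt32(byte[] source)
    {
        var crc = 0xFFFFFFFFu;

        foreach (var b in source)
        {
            // The low byte of (crc ^ b) selects the precomputed result of eight bit steps.
            crc = (crc >> 8) ^ Table[(crc ^ b) & 0xFF];
        }

        return crc ^ 0xFFFFFFFFu;
    }
}
```

This trades 1 KiB of table for roughly one lookup per byte instead of eight loop iterations. Whether that is actually faster on a given machine would need measuring. The rest of this article keeps using the bitwise version.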

Again, Info-ZIP’s zip agrees:

dotnet run
zip -T minimal.zip
test of minimal.zip OK

For extra verification, note the hash of ‘BANANABANANABANANABANANABANANA’ is 0x3dab2823.
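
A check like this can also be scripted. The sketch below is a minimal, self-contained helper with the hypothetical name Crc32Check; it carries a condensed copy of the bitwise routine so it compiles on its own, and it assumes the input is plain ASCII.

```csharp
using System.Text;

// Hypothetical helper, not part of the series' code.
public static class Crc32Check
{
    // Condensed copy of the bitwise routine so this sketch stands alone.
    public static uint HashToUInt32(byte[] source)
    {
        var crc = 0xFFFFFFFFu;

        foreach (var b in source)
        {
            crc ^= b;

            for (var bit = 0; bit < 8; bit++)
            {
                crc = (crc >> 1) ^ ((crc & 1) * 0xEDB88320u);
            }
        }

        return crc ^ 0xFFFFFFFFu;
    }

    // Formats the checksum of an ASCII string as eight lowercase hex digits.
    public static string HexOf(string text) =>
        $"0x{HashToUInt32(Encoding.ASCII.GetBytes(text)):x8}";
}
```

Crc32Check.HexOf("123456789") should return 0xcbf43926, the published check value for CRC-32/ISO-HDLC. The BANANA string above can be checked the same way.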

Conclusion

Depending on the definition of ‘done’, you could say that writing ZIP-files is not that difficult. But the devil is in the details!

Now go make your own. Some ideas: