joriszwart.nl

Data compression

Minimal ZIP file – part I

This article is part of a series.

  1. Creating a valid ZIP-file 👈
  2. Use .NET’s DeflateStream
  3. No dependencies (coding the DEFLATE algorithm)
  4. Calculating the CRC-32

Introduction

How difficult is it to create a ZIP-file[1] with a bare minimum[2] of code, while keeping that code readable?

To take it slowly, we do this in a few steps, each in its own article. The first step is to create a valid, but not very optimal, ZIP-file: the data is stored uncompressed.

The code is in C# but should be easy to follow for people with knowledge of other languages. No exotic syntax or tricks are used. If something is unclear or can be done better, let me know.

File format

For a little background, these are the parts of a ZIP-file with two files:

[Figure: ZIP file layout]

The individual parts contain things like file names, file sizes, checksums and compressed data.
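Every part of the layout except the file data starts with a 4-byte signature defined in PKWARE's APPNOTE.TXT specification. As a small illustration, the sketch below lists the three signatures (the class and method names are hypothetical, made up for this example). Because ZIP stores integers little-endian, each signature appears in the file as the bytes "PK" followed by two small numbers, which is why a ZIP-file usually starts with "PK".

```csharp
using System.Buffers.Binary;

// Hypothetical helper: record signatures from PKWARE's APPNOTE.TXT.
public static class ZipSignatures
{
    public const uint LocalFileHeader = 0x04034b50;        // file bytes 50 4B 03 04: "PK\x03\x04"
    public const uint CentralDirectoryHeader = 0x02014b50; // file bytes 50 4B 01 02: "PK\x01\x02"
    public const uint EndOfCentralDirectory = 0x06054b50;  // file bytes 50 4B 05 06: "PK\x05\x06"

    // Returns the 4 bytes of a signature in file order (ZIP integers are little-endian).
    public static byte[] ToFileBytes(uint signature)
    {
        var bytes = new byte[4];
        BinaryPrimitives.WriteUInt32LittleEndian(bytes, signature);
        return bytes;
    }
}
```

For example, `ToFileBytes(ZipSignatures.LocalFileHeader)` yields the bytes 50 4B 03 04.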

Source code

The main code is only a few lines long and follows the file layout described above.

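```csharp
using System.IO.Hashing; // Crc32 lives in the System.IO.Hashing NuGet package

// Zip data
var filename = "bananas.txt"u8.ToArray();
var data = "BANANABANANABANANABANANABANANA"u8.ToArray();
...

// Create ZIP
using var zip = File.Create("minimal.zip");
...

// Calculate CRC-32/ISO-HDLC over the uncompressed data
var crc32 = Crc32.HashToUInt32(data);

// Write file: the local file header, followed by the stored (uncompressed) data
var offset = WriteLocalFileHeader();
WriteData();

// Write directory: the central directory header refers back to the local header's offset,
// and the end record stores where the central directory starts and how large it is
var info = WriteCentralDirectoryHeader(offset);
WriteEndOfCentralDirectoryRecord(info.Position, info.Size);
```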

In fact, that is most of it. The called functions do nothing more than write numbers, strings and data to the file.
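To make that concrete, here is a hypothetical sketch of such writing code. It is not the author's Zip1.cs: it writes all three records for a single stored file in one method, using a `BinaryWriter` (which always writes little-endian, as ZIP requires) on a `MemoryStream`. It assumes no extra fields, no comments, no data descriptor, a single disk, and a fixed timestamp of 1 January 1980, the earliest date a DOS timestamp can express; the CRC-32 is supplied by the caller.

```csharp
using System.IO;

// Hypothetical sketch (not the author's Zip1.cs): one file, stored uncompressed (method 0).
// Assumes: no extra fields, no comments, no data descriptor, single disk, fixed timestamp.
public static class StoredZipSketch
{
    const ushort VersionNeeded = 10;      // version 1.0 is enough for stored entries
    const ushort DosTime = 0;             // 00:00:00
    const ushort DosDate = (1 << 5) | 1;  // year 1980 (bits 9-15 = 0), month 1 (bits 5-8), day 1 (bits 0-4)

    public static byte[] Create(byte[] filename, byte[] data, uint crc32)
    {
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream); // BinaryWriter always writes little-endian

        // Local file header, followed by the file data itself
        var localHeaderOffset = (uint)stream.Position;
        writer.Write(0x04034b50u);              // local file header signature
        writer.Write(VersionNeeded);            // version needed to extract
        writer.Write((ushort)0);                // general purpose bit flag
        writer.Write((ushort)0);                // compression method: 0 = stored
        writer.Write(DosTime);
        writer.Write(DosDate);
        writer.Write(crc32);
        writer.Write((uint)data.Length);        // compressed size (same as uncompressed)
        writer.Write((uint)data.Length);        // uncompressed size
        writer.Write((ushort)filename.Length);
        writer.Write((ushort)0);                // extra field length
        writer.Write(filename);
        writer.Write(data);

        // Central directory header: mostly repeats the local header
        var centralDirectoryOffset = (uint)stream.Position;
        writer.Write(0x02014b50u);              // central directory header signature
        writer.Write(VersionNeeded);            // version made by (upper byte 0 = MS-DOS)
        writer.Write(VersionNeeded);            // version needed to extract
        writer.Write((ushort)0);                // general purpose bit flag
        writer.Write((ushort)0);                // compression method: 0 = stored
        writer.Write(DosTime);
        writer.Write(DosDate);
        writer.Write(crc32);
        writer.Write((uint)data.Length);        // compressed size
        writer.Write((uint)data.Length);        // uncompressed size
        writer.Write((ushort)filename.Length);
        writer.Write((ushort)0);                // extra field length
        writer.Write((ushort)0);                // file comment length
        writer.Write((ushort)0);                // disk number start
        writer.Write((ushort)0);                // internal file attributes
        writer.Write(0u);                       // external file attributes
        writer.Write(localHeaderOffset);        // where the local file header starts
        writer.Write(filename);
        var centralDirectorySize = (uint)stream.Position - centralDirectoryOffset;

        // End of central directory record
        writer.Write(0x06054b50u);              // end of central directory signature
        writer.Write((ushort)0);                // number of this disk
        writer.Write((ushort)0);                // disk where the central directory starts
        writer.Write((ushort)1);                // central directory records on this disk
        writer.Write((ushort)1);                // total central directory records
        writer.Write(centralDirectorySize);
        writer.Write(centralDirectoryOffset);
        writer.Write((ushort)0);                // ZIP file comment length

        writer.Flush();
        return stream.ToArray();
    }
}
```

Splitting this into separate functions, as the main code does, changes nothing about the bytes; `File.WriteAllBytes` can then save the returned array as minimal.zip.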

The entire source code is 100 lines. Readability is the main goal.
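The only dependency in the main code is `Crc32.HashToUInt32` from the System.IO.Hashing NuGet package. For comparison, a bit-by-bit sketch of the same CRC-32/ISO-HDLC variant fits in a few lines (the class name is hypothetical; part 4 of this series covers the CRC-32 in its own right). This version favors brevity over speed: it processes one bit at a time, whereas practical implementations use a lookup table.

```csharp
using System;

// Hypothetical sketch: bit-by-bit CRC-32/ISO-HDLC, the CRC variant ZIP uses.
// Reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
public static class BitwiseCrc32
{
    public static uint Compute(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFF;
        foreach (var b in data)
        {
            crc ^= b;
            for (var bit = 0; bit < 8; bit++)
            {
                // Shift right; if the bit shifted out was 1, XOR in the polynomial.
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
            }
        }
        return ~crc; // final XOR with 0xFFFFFFFF
    }
}
```

`BitwiseCrc32.Compute("123456789"u8)` returns 0xCBF43926, the standard check value for this CRC variant, so it could stand in for `Crc32.HashToUInt32(data)`.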

As mentioned, this is a valid ZIP-file without compression, so because of the headers the archive is actually larger than the data it contains.
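A quick way to see that overhead is to add up the fixed header sizes from the ZIP specification: 30 bytes for a local file header, 46 for a central directory header and 22 for the end of central directory record, plus the file name in both headers. The sketch below assumes no extra fields, comments or data descriptors (the names are hypothetical), so it may differ from the output of Zip1.cs if that writes anything extra.

```csharp
// Hypothetical sketch: size of a single-file ZIP archive with stored (uncompressed) data.
// Assumes no extra fields, no comments and no data descriptor.
public static class StoredZipSize
{
    const int LocalFileHeaderSize = 30;        // fixed part, before the file name
    const int CentralDirectoryHeaderSize = 46; // fixed part, before the file name
    const int EndOfCentralDirectorySize = 22;  // without a ZIP file comment

    public static int Compute(int filenameLength, int dataLength) =>
        LocalFileHeaderSize + filenameLength + dataLength
        + CentralDirectoryHeaderSize + filenameLength
        + EndOfCentralDirectorySize;
}
```

For bananas.txt (11 bytes) and the 30-byte BANANA string, `StoredZipSize.Compute(11, 30)` gives 150 bytes: 120 bytes of headers for 30 bytes of data.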

The full code can be found in Zip1.cs.

The proof is in the pudding

PKZIP 2.04g (Jan ’93) confirms that it is a valid ZIP-file.

pkunzip 2.04g doing its work

MS-DOS™ screenshot from DOSBox

Info-ZIP’s zip agrees:

```shell
dotnet run
zip -T minimal.zip
test of minimal.zip OK
```

Info-ZIP’s zipinfo is also a great tool to inspect ZIP-files.

```shell
zipinfo -v minimal.zip
```

Next up

In the next part we’ll cheat a little and use .NET’s DeflateStream to create a more useful ZIP-file.


  1. Wikipedia: ZIP (file format)

  2. The title of this article refers to the size of the code, not to the size of the created ZIP-file. Good find, Bertrik Sikken.
