published on in blaugust draft

How much text can we fit into a QR code?

Many years ago, Mikko Hyppönen posted a thread on Twitter [xcancel.com] about machine-readable codes such as QR codes.

It was interesting and I went and made this one. I dare you to scan it. If you haven’t figured out what it is, try singing it. You can find the music for it here.

Either way. A while later, while reading the chapter on machine-readable codes in If It’s Smart, It’s Vulnerable by the same Mikko, I went down the machine-readable code rabbit hole again.

First, QR codes have an encoding called ALPHANUMERIC. It allows up to 4296 characters from a limited character set.

So I was curious: would the whole chapter on codes in the book fit into a QR code?

The answer is no: the chapter is ~4700 characters, ~400 too many. Also <alphanumeric>ALPHANUMERIC IS/NT VERY READABLE. NO NEWLINES AND ONLY UPPERCASE AND $%*+-./: ALLOWED</alphanumeric>
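To make the limits concrete, here is a minimal sketch of a check for whether a text could go into a single ALPHANUMERIC QR code. The 45-character set and the 4296-character maximum (largest symbol, lowest error correction) come from the QR specification; the function name `fits_alphanumeric` is just made up for this example.

```python
# The 45 characters allowed in QR alphanumeric mode: digits, uppercase
# letters, space, and the symbols $ % * + - . / :
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Maximum alphanumeric characters in the largest symbol (version 40)
# at the lowest error-correction level (L).
MAX_ALPHANUMERIC_CHARS = 4296


def fits_alphanumeric(text: str) -> tuple[bool, list[str]]:
    """Return (fits, offending_chars) for a candidate payload.

    Hypothetical helper for illustration only.
    """
    offending = sorted(set(text) - QR_ALPHANUMERIC)
    fits = not offending and len(text) <= MAX_ALPHANUMERIC_CHARS
    return fits, offending


if __name__ == "__main__":
    # Uppercase, spaces, "/" and "." are all fine.
    print(fits_alphanumeric("ALPHANUMERIC IS/NT VERY READABLE."))
    # Lowercase letters, apostrophes and newlines are reported as offending.
    print(fits_alphanumeric("isn't\n"))
```

A ~4700 character chapter fails this check on length alone, even after uppercasing it and stripping the newlines.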

But wait

What about compression?

Yes! Even DEFLATE can do it, and there is a BASE45 encoding made specifically for ALPHANUMERIC QR codes.

Now the whole chapter fits in ~3500 chars (or 3400 with bzip2).
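Roughly, the pipeline looks like this: compress the text, then re-encode the binary result using only the 45 characters ALPHANUMERIC mode allows. The sketch below uses DEFLATE via Python's `zlib` and a hand-written encoder following the Base45 scheme from RFC 9285 (two bytes become three characters, a trailing single byte becomes two). The function names are my own, and this is not necessarily the exact tooling used for the numbers above.

```python
import zlib

# Same 45 characters as QR alphanumeric mode, in RFC 9285 order.
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, a trailing byte -> 2 chars."""
    out = []
    # Process complete byte pairs; the values are emitted least significant first.
    for i in range(0, len(data) - 1, 2):
        n = (data[i] << 8) | data[i + 1]
        n, c = divmod(n, 45)
        e, d = divmod(n, 45)
        out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d], BASE45_ALPHABET[e]]
    # An odd-length input leaves one byte over.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d]]
    return "".join(out)


def deflate_to_alphanumeric(text: str) -> str:
    """Compress text with DEFLATE (zlib container) and Base45-encode it."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base45_encode(compressed)


if __name__ == "__main__":
    sample = "How much text can we fit into a QR code? " * 50
    encoded = deflate_to_alphanumeric(sample)
    print(len(sample), "chars in,", len(encoded), "alphanumeric chars out")
```

Base45 costs three characters for every two bytes, so the compressed data has to be well under 2/3 of the 4296-character budget to fit.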

And actually… BASE45 is unnecessary. We can store binary directly in a QR code. A whopping 23648 bytes (~23 KiB) if we use the lowest error correction.

So I wonder if we could compress the whole book into one code?

Spoiler: The answer is no, and if you’re an expert on QR codes you know why and were already writing me an angry email to correct me. It’s actually not 23648 bytes, it’s bits. So a binary QR code can fit ~3 KiB, and the text content of the book compresses to 111 KiB, so it will not fit. But if I had known that, I wouldn’t have continued down the rabbit hole, so let’s just continue and see how much compressed text we can fit into a QR code.
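The bits-versus-bytes arithmetic can be sketched like this. It assumes the largest QR symbol (version 40) at error-correction level L, where byte mode spends 4 bits on a mode indicator and 16 bits on a length field before the payload starts. The constant and function names are just for illustration.

```python
DATA_BITS_40L = 23648      # data bits in a version 40 symbol at level L
MODE_INDICATOR_BITS = 4    # marks the segment as byte mode
CHAR_COUNT_BITS = 16       # byte-mode length field for versions 10 to 40


def byte_mode_capacity(data_bits: int = DATA_BITS_40L) -> int:
    """Whole bytes of payload left after the byte-mode header."""
    payload_bits = data_bits - MODE_INDICATOR_BITS - CHAR_COUNT_BITS
    return payload_bits // 8


if __name__ == "__main__":
    capacity = byte_mode_capacity()
    print(capacity, "bytes =", round(capacity / 1024, 2), "KiB")
    # prints: 2953 bytes = 2.88 KiB
```

So the real number is a bit under 3 KiB, not 23 KiB.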

The plaintext of “If It’s Smart, It’s Vulnerable” compresses to roughly 111 KiB with bzip2, which is ~̶𝟻̶𝚡̶ too much (compared to what I believed a QR code could store).

How about more modern ones? Let’s try zstd and brotli. No… they actually turned out bigger! 123 KiB and 129 KiB respectively. Is there anything else out there?
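If you want to run this kind of comparison yourself, a small sketch with the compressors that ship in Python's standard library could look like the following. zstd and brotli typically need third-party packages, so they are left out here; you would add them to the dictionary the same way. The function name and the sample data are made up, and your numbers will differ from mine depending on the input.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the size in bytes of `data` under a few stdlib compressors."""
    return {
        "raw": len(data),
        "deflate (zlib, level 9)": len(zlib.compress(data, 9)),
        "bzip2 (level 9)": len(bz2.compress(data, 9)),
        "xz (lzma, preset 9)": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    # Stand-in for the book's plaintext; swap in your own file contents.
    sample = ("Machine-readable codes are everywhere. " * 2000).encode("utf-8")
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>24}: {size / 1024:8.1f} KiB")
```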

Turns out yes, there are at least two long-running competitions for compressing pure text as much as possible, with little regard for speed or resource usage.

mattmahoney.net/dc/text.html and https://prize.hutter1.net

So let’s try the second-best one from Matt Mahoney’s competition, cmix.

Ok, wow… that was slow: it took 10 minutes (vs <1s for bzip2 --best). But it got us down to 88 KiB!

That’s nice but not enough. We’d need ~30 QR codes (with 2.9 KiB per QR). Which is actually not that bad. A whole book in 30 QR codes.
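As a quick sanity check on the count, here is a sketch that splits a compressed blob into QR-sized pieces. It assumes 2953 usable bytes per code (version 40, error-correction level L, byte mode) and ignores any per-code index you would need to put the pieces back in order. The helper names are hypothetical.

```python
QR_BYTE_CAPACITY = 2953  # assumed bytes per code: version 40, level L, byte mode


def codes_needed(size_bytes: int, capacity: int = QR_BYTE_CAPACITY) -> int:
    """Number of codes needed for size_bytes, rounding up."""
    return -(-size_bytes // capacity)


def split_for_qr(blob: bytes, capacity: int = QR_BYTE_CAPACITY) -> list[bytes]:
    """Cut a blob into consecutive chunks of at most `capacity` bytes."""
    return [blob[i:i + capacity] for i in range(0, len(blob), capacity)]


if __name__ == "__main__":
    book = bytes(88 * 1024)  # placeholder with the same size as the cmix output
    chunks = split_for_qr(book)
    print(codes_needed(len(book)), "codes, last one holds", len(chunks[-1]), "bytes")
    # prints: 31 codes, last one holds 1522 bytes
```

With the exact capacity the count rounds up to 31, which is close enough to the ~30 above.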

Thinking outside the box

So what if we set the limitation of QR codes aside and look for any machine-readable code format that we can print and later scan back into data?

Then we will have no problem getting one book onto an A4 page.

Martin Monperrus had a great overview at monperrus.net/martin/store-data-paper (the link is dead but I linked to the archive).

We could use OPTAR, which can apparently store ~200 KiB of data per A4 page, so the whole compressed “If It’s Smart, It’s Vulnerable” (~88 to 120 KiB) would fit just fine on one page.

Or JAB Code, whose capacity I’m not sure about: it seems to be 4.6 KiB per “symbol” (square), but you can have more than one symbol. JAB Code seems interesting. It seems it was developed by the Fraunhofer Institute for Secure Information Technology and is nowadays an ISO standard, ISO/IEC 23634:2022.

If you want to read the details without paying, the BSI doesn’t paywall their standards.

Blaugust note

This is day three of Blaugust. Again, while this is mostly based on an old Twitter thread of mine from 2022, little to no spell-checking has been done, so I’m marking it as #draft.