Entropy-Perfect Encoding
Using Mathematics to Compress Tightly-Coupled Web Components
Edited Jul 25, 2025 | Posted Jun 7, 2025 | ~10 minute read
Background
For the past three years I've been trying to build an operating system that can do everything you expect an operating system to do while also encoding its entire state as a single URL. That way, we get a permalink to every state and achieve truly comprehensive deep linking.
For a long time, I struggled with finding a solid approach to achieving this goal. A user-friendly operating system involves many independent, dependent and otherwise tightly coupled components - especially when you want it to be a platform that supports third-party app development. I tried a variety of different methods to store that data in a URL without any redundancy.
Some of my attempts involved query parameters, some base64-encoded JSON objects, some domain-specific languages, and some my own custom syntax. They all had the same problem - they failed to capitalize on the actual space available in URLs.
An incredible combinatorial explosion appears when you try to compute the number of unique URLs that can exist. Query params, JSON objects, domain-specific languages and any other syntax will always suffer from the same problem: it encodes duplicate information and requires delimiters that take up space in the URL.
The URL Space: Bigger Than the Universe
According to some very lazy research I just did, the estimated number of particles in the observable universe is . That's a 97-digit number. Have a look:
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Even though that's an unfathomably large number (imagine having to count to it), it's not that long – it barely fills two lines of text on my screen.
If you could assign a unique natural number (or hash) to every particle in the known universe you'd never need more than 97 digits to do it.
If we change the base from 10 to 64, items is fewer than . That means you never need more than 54 characters to assign a unique base 64 hash to every particle in the known universe.
That's an incredibly short string of text for such a big feat. Here's a sample I made by mashing my keyboard 54 times:
09m84COSM84wf897hos_t83740FW3M-8tdocs8374tyvsc9nc3b1nt
In theory, this could be the URL of a random particle in the universe. Hopefully, it's one close by.
This kind of assignment, where each element in a set has an integer value and there are no gaps or collisions, is called a minimal perfect hash function (MPHF).
Of course, URLs have a more structured syntax than base64 integers. For example, the domain name takes up a little bit of space in the address bar and you don't normally encode numerical data in a domain name.
Yet, even if we add the longest possible domain name (255 characters)…
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com/9m84COSM84wf897hos_t83740FW3M-8tdocs8374tyvsc9nc3b1nt
… we still use only 317 characters, and this is the worst-case domain name. We know that URLs can be far longer than this. For example, here is the URL I get when I search Google for pictures for cats:
https://www.google.com/search?sca_esv=c99a67a28e4a9644&q=cats&udm=2&fbs=AIIjpHxU7SXXniUZfeShr2fp4giZ1Y6MJ25_tmWITc7uy4KIeioyp3OhN11EY0n5qfq-zENwnGygERInUV_0g0XKeHGJtesk9Adz8V_3dCFxRHd-4rVc28Hvas3fJjxYa4l0bNk99rKkfyreU5lHIQnHuafulMAL-Uqa-w7l2q-jrp2K_DA_pRuYwavY1buYjWGJpBAMTtrbI6TDhEB-GyDqzxOrfIfwCQ&sa=X&ved=2ahUKEwiQhOe9hd6NAxVRRjABHUCvJ6oQtKgLegQIGRAB&biw=1223&bih=754&dpr=2
Back when Internet Explorer was popular, URLs were allowed to have thousands of characters. Today, they can have millions.
TL; DR:
Based on current estimates, there are far more unique URLs than there are particles in the known universe. This had me thinking: how can a web app capitalize on all of that storage space?
How many possible URLs are there?
A lot.
To determine the actual cardinality of that set you'd need to make a few assumptions. For example, are consecutive slashes allowed or will they be normalized away? Do we limit it to two thousand characters like Internet Explorer or two million like Chrome?
Let's go factor-by-factor and, at each step, we'll assume the worst possible case.
Worst-Case Length
First, we start with the length. Let's use a hard limit of 2000 characters. Since Chrome supports 2MB, we just removed 99.9% of the URLs from our set. Even so, there is still a lot of nuance to computing the cardinality of that remaining 0.1%.
No Protocol Mutability
For the scheme/protocol, forget about any cardinality imparted by that. I only want to think about secure browsing contexts ("https://"). We subtract those 8 characters, leaving us with 1992 characters.
Worst Possible Host
What about the host name? Well, what's the worst case we could possibly run in to?
Many projects and apps are meant to exist on only one origin, so let's not even allow any mutability in the host.
The maximum length of a host name is 255-characters. So let's bring back that behemoth of a domain that I showed earlier:
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com
Just to be clear, we removed every URL from our set that doesn't have this domain name which, as you know, was pretty much all of them. Yet, we're still left with 1737 characters we can store data in.
No Optional Features
There's still a lot of unpredictability in the size of our set, however. For example, the user credentials, port, query/search parameters, and hash/fragment all add a ton of cardinality to our set but they also add an incredible amount of complexity to the cardinality equation.
Luckily, they're all optional. We'll just remove all of the URLs that have these features (which, if you didn't know, were almost all of the URLs we had left).
No Variable-length Pathnames
At this point, we're really just counting the set of all valid pathnames up to 1737 characters long. All pathnames require at least the initial slash and the empty pathname without an initial slash will often resolve to one with it or vice versa. So, that first pathname character is effectively useless for encoding data.
Furthermore, how do we account for the variable-length of pathnames? We'd have to add the number of 2-character pathnames to the number of 3-character pathnames to the number of 4-character pathnames and so on.
To compute that, we require a geometric series. That's too rich for my blood, so let's just remove every URL whose pathname is not exactly 1737 characters long. Again, we just removed almost every URL from our already stripped-down set.
No Special Characters
Modern browsers support 85+ characters in the URL pathname (once you include the percent symbol, comma, parenthesis, square brackets, period, exclamation point, etc.), suggesting it might be possible to approach base 85.
There's just one problem: a lot of these special symbol combinations will get stripped away/normalized down to others during URL resolution. Some will be treated inconsistently in platforms that let you embed links. Others will cause an error if used incorrectly. It isn't trivial to compute the cardinality of that.
The solution? You guessed it. We'll remove most of the URLs from our set! Let's only consider URLs that use the following 64 character alphabet to spell out their pathname segments:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_
But we can't just string together 1737 of these bad boys and call it a day…
No Variable-length Segments
A valid URL pathname can be broken up into segments, each separated by a forward slash. Some systems throw an error if even one of those segments is longer than 250 characters. It wouldn't be terribly hard to account for these arrangements but it would mean writing out a geometric series.
So, let's only count URLs that have this exact arrangement:
Seven fixed-length segments: six 250-character segments followed by one 230 character segment.
This requires seven slashes to split it all up; 1737 minus 7 leaves us with an even 1730 segment characters.
Not to sound like a broken record, but almost none of the URLs in our set had this exact arrangement. We just removed almost everything we had left, yet again.
Just for fun, here is a "random" sample from our set:
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com/ld9370LC9uzp83Mklv90dujd9v7774n0V7p7em5078B7S790dkb094977au9D07B098yAV76ouyf79J7uft7d5f77g90jki98765dc7hkji7865d7dgy77yuhjkIk8Kgf567hj9ny7GVNjn7677V9jbvr567bj099efbn49efg776ft5x4rxi7T8Yoih0n7hv6r4d657d8g9u8b779ub8yf7t7d75d674s77743s2a1a7772zqzdtcuhbi/007opkolpp7m777ko7n777ou77h8yvtcr7tds5es5241awzsx7dfchvj7jkhboi78g8765c6e4x421s576y57t6fv8iub9ub87ytvc64dc768fv897b9ub9yv7tc7tv8ybhvutF64d52as2zfxcghvbhininIonoi0ih9ubgiI0P1MS07vmllp0097f74d2wa1asswextyvunimlljo00987643212wzqz1a12aswzz23sxerc34dfcrvr/5fvtby65g67buh7nu98m09k0,00k9jubyuhubvtctvubuvtdcrd4dxw23314152375786e6d976f70867f97743d8742a13s7278rf97vtiyfcdtd7xtrwSte7wrxztewzyrTUCtivuyvbOUyv77utrs462s67546rv9vbyv8rtc6X4s13A2ezwrexrcYdexwa2WESDfgtyu789iujhnmko9iujhbVFdre321qasxcfvghuiuytr432was/zxcvfghyui9o876543wsaxcdfghyuI9Oijnjiuygtftdrezwertrytfugyhiu8978gtf7r6d53s64512zwxetCRYtfughioYG87f564dsetrcyvubhij9i8u7t6f5r7743217aqwezsxdcrft7gyh77ujikJHgt5432177777qwertyhj7kJHgre321qaszxcvghjio987654321QWErfcv7bhuyTRfdfgtyui987654321qwerfv77798/bnhjHGv7fghjio09oiujhn7b7VCxsaq12we7rty7u8Ikj7hgFD777seRTyu876YT7r4ewsDFghJiuy6543721Qas7zxcVBnjio9876574321qasDFGhbnJUytrfdsERtyuJHgbvfdeRthBVGCFdsw34543212qaszxcVbghjkio0987654321qasZXcvbnhJKo0987654ERD7sw21qas7ZXcfghyuI9O0okmjki87uyhgbvgt54edsxzsa/Q12qwasXCfrt56tygBnhjio909OIKjmnko09iujhY76tfgcder321q7aszxCvgbhjui7uY6T5r4e321234567879i8UytrfdsXcvbghjIuyTRewqA7SzxcvbgHJik7oKjmjI7UyhgfrEwszx7sAWaszxcdftYUijkmjNHBGfrtYuiuYTredFCghyui89876543eDfgyuytr43edfgyuhbghJUI987654321qaszxCVghui9o876543wErt/fGHYBUtcu63s312asydu6IYUTYVbokuvhtExt42at4wrexcutyvbo8ygoiyrc53yr3254Sdu6rfvoyboiybiutyvyuCtycurc6u5c865777ctycuyut7ityd7i6757DE7U7674s5u74s54sd7xry7txc7jyrfxd7j7yrDujdI7775edu77564st427as13szr7ezUT7rxiYGCkyuC77FJut7ykm79OM0p7978t
I mashed my keyboard as fast as I could and didn't bother adding any underscores or dashes, and it still took a surprisingly long time to type this out.
Final Cardinality of our Set
Our set is now a fraction of a fraction of a fraction of a fraction of a fraction of a fraction of a fraction of its former self. Every URL in our set is a highly unlikely 264-character origin followed by a just-as-unlikely arrangement of 1730 base-64 digits. Ultimately, what we've got here isn't even a drop in the bucket compared to the set of all possible URLs.
But the good news is we finally have a set that can be represented by a trivial cardinality expression! Here it is:
Remember how there are fewer than particles in the known universe? Contrast that with . The ratio is basically 0:1. But what does our final number look like?
First, recall this number:
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
You can hash every particle in the universe with this.
Now look at the number of URLs in our "tiny" worst-case set:
49130930808458578125726405944552267433522803882142885516112811523788026785247859392846571767356697524778325967483200069078779474567637462287039820734625250499484361851684556510293931285640090262847320625396236411697500945066199598105166159939755919883442591852056298637720529199720593614665368213198995251597105758524900013562528183914971627692406189778877642149164166096666344310948740383560007413387592604328612531345115046683029331104462385525968171774813136492661946688877582952419371606080350664628677161116012681284900630454342749044599225617460538414933002793524748242310551996204900729123914340847332998493284864702610102439307491216408396209310348370320754399598907600619635495790019370265074821420291117076832510034388136166750175664483098298227108288715448223965251691403281034083780954570303916571362770718114444005072106144764849088939314984894581474809543833192044410395714190504825893726938138808786090562830615071943734645166593502644409746065788403640661934980709315726553694898120393882753330274852067296608154605846048350579895683801542722129840564975973695792574841807503188443209805038542236087627591190050313931573054215490244259112910601708234964741899698246055425592081845013246361046328159137214118378184713651023860908738930831971118247941416724345631434014423798305497398506349015730376483008110926005644593572237269397514131887446582215094604019679547368523447625105463951586910365216023840045751551948134564505146576477286926376308492310938786531574335304262217571479998838543109756673336971087025554889723823983924836021091907239786400390676746118864056643193205208982882660920507349524831913201182159186699144808760747505873640789216340957248824164601830737224297040770920458197407697184648022371257191918470172361380986595622828733829413281500185801880456613038104242643213822575569128878763680705643247394470467893917356297826947055879379566532902944043402022244021075932899192372571062439168957178821897773439784799153514401590031281325655093535268608362216209580429528935699708423114866509895772049714403629267974279109387816328548424345736717256948117015825744199080052425083293799967061771719865084133230904096721868612800747277569863102967441274245770550134896869196181886526475029674372524872538258468524762201586321646559863923236843242370292251062221529264770493701602901661815354058944472808349405810473671013491176811312485003537600785430597702027500137187545757793793903699970939444681886308707596870841054350202539644315500656737966010773075989351158870179723002367751683561878721860274497306669607424568652625472586573342936967070726864546132506261184714500488245623426076538010987701708241991726967647765104168030353744906806687895709021815045224257436600078097509507924968458836098125864106257805097776865485964770796555656543643860800587923743280926493394369557925897760397469072974966876141221188598603450443972539793912611469980053930814065772811818839300960781935055752799228266748050545969251447362299129060916916664278672204155214991658109636131883622892313180240750974195280013175415626930825404039290811759144500594703907940131918638449619062937500747268988899138994176
Imagine what you could hash with this!
In the case of my project, the "things" I need to hash are application states.
Do I really need this many application states to make an operating system?
My guess was no way (especially if I embrace a minimalist approach to UI interaction design).
However, to take full advantage of all of that storage space, I would need to create a MPHF that could integrate naturally into today's web app technology stack.
MVC + MPHF
A breakthrough came when I realized that the structure I was building to support the bijection had a surprising amount in common with MVC architecture. In fact, the two ended up laying right on top of each other and being the same thing.
Think of an MVC controller object as a piece of the piecewise-defined hash function. Whether you assign a hash (which acts as the data model) to a controller or manipulate the controller directly (for example, in response to user interaction), the application state and the model are always be kept in sync.
Each controller is a reusable component. It has its own hash range from 0 to k-1, where k is the cardinality of the component's model.
These components assemble together like LEGO® bricks, and no matter what assembly you make with them, the result is still a minimal perfect hash function over the assembly's state domain.
Leveraging the JavaScript prototype chain, I was able to add object-oriented concepts like inheritance into the equation.
A Real-World Demonstration
This isn't just a theoretical exercise. I put these methods into practice to create this website and the surrounding platform. Within the platform, this notebook is represented by a controller called "ejaugust.com"
. This is how many states it has:
60000
Right now, this is the state it's in, as an integer:
15000
In base64, that gives us the following hash:
3Go
When you interact with this app, you're changing its state, even if you're just scrolling the page. However, that's not the hash you see in your address bar. That's because this website is embedded into a larger platform.
Beyond the Notebook
Try opening the menu in the bottom left and visiting another app. Don't worry - everything on this app will remain just as you left it, including your scroll position. Likewise, changes you make in another one of my apps will be retained when you come back to this one.
This is not accomplished using any storage API in javascript and you certainly aren't uploading your activity to any server. It's all stored in your address bar, in the URL itself.
Kireji is a platform that unites a collection of apps into a single minimal perfect hash function. The cardinality of all of those apps together is about . As an integer, it looks like this:
61661095870794301440000000000000000000
Here's the state you've assigned to the platform:
15053978504126339534034370560939843
Here's how it looks as a hash in base 64:
KoTW_xDghhRI53_jBt3
Now, we add the domain of the app you're in along with a semantic version number to obtain the exact URL you currently have in your address bar:
https://www.ejaugust.com
It's quite short, all things considered.
It gives you a robust link back to this exact state - and any other on the platform - so that you can bookmark, share and/or analyze it.
I'm still adding features to the platform like movable windows, the ability to have arbitrarily many windows open, and even a full read-write file system that behaves in the familiar way. These features will lengthen the URL, but we've got plenty of room to grow.
Conclusion
Thanks for reading! Hopefully, this shows you just how powerful a minimal perfect hash function is when employed for the purposes of data compression. The way that it overlays with MVC architecture is also very satisfying.
I look forward to writing more about the project soon!