From AI to Infinity and Back

A Journey Beyond the Universe - and Language

Every book that will ever be written already exists in a finite AI-library. So does every biography of every person who has lived or ever will live. The mathematics that proves it starts with how large language models codify what can be said — and ends where Borges, Wittgenstein, and the theology of omniscience converge on a single number.

When we interact with modern artificial intelligence systems like ChatGPT or Claude, we're often told about "tokens" – mysterious units that determine both how the AI processes our text and how much we pay for the service. A token, in the world of large language models, is the smallest piece of text the model can see. It might be a whole word like "Babel," a fragment like "ing," a punctuation mark, or even a space.

Consider these examples:

"biography" might become: ["bio", "graphy"] = 2 tokens
"unimaginable" might become: ["un", "imagin", "able"] = 3 tokens
"Library of Babel" might become: ["Library", " of", " Babel"] = 3 tokens
"infinite" stays: ["infinite"] = 1 token

Modern tokenizers — algorithms like Byte Pair Encoding — carve human language into these token-pieces, mapping each one to a number the model can process. The rough arithmetic is simple enough. In English, one token corresponds to about four characters, or about three-quarters of a word. A short sentence needs perhaps ten tokens. A page of prose, around 350. These are the units by which AI companies meter their services — the kilowatt-hours of artificial thought. A heavy AI user like the famous AI journalist and researcher Azeem Azhar uses 1 million tokens per day. The average Gemini or ChatGPT user will consume 50,000 to 100,000 tokens.
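The example splits above can be reproduced with a very small sketch. It is not Byte Pair Encoding: it is a plain greedy longest-match splitter over a hypothetical hand-picked vocabulary (`TOY_VOCAB`, `toy_tokenize` and `TOKEN_IDS` are names invented here for illustration), and a real tokenizer with a learned vocabulary may well cut the same words differently.

```python
# Hypothetical toy vocabulary, chosen only so the examples in the text come out as shown.
# Real tokenizers learn their vocabulary from data; this one is an assumption.
TOY_VOCAB = {"bio", "graphy", "un", "imagin", "able",
             "Library", " of", " Babel", "infinite"}

# Map every token piece to a number, as a tokenizer does before the model sees it.
TOKEN_IDS = {piece: i for i, piece in enumerate(sorted(TOY_VOCAB))}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split (not BPE); unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary starts here: emit one character on its own.
            tokens.append(text[i])
            i += 1
    return tokens


for sample in ["biography", "unimaginable", "Library of Babel", "infinite"]:
    pieces = toy_tokenize(sample)
    print(sample, pieces, len(pieces), "tokens")
```

The fallback to single characters is what keeps any text expressible even when no larger piece matches, at the cost of more tokens.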

Tokens are, in fact, a more efficient way to encode language than individual letters. To get a better feel for writing with tokens, consider a standard English book of 500 pages. A typical page holds around 275 words, so the whole book runs to roughly 137,500 words. At the standard conversion rate of approximately 1.33 tokens per word, that book contains around 183,000 tokens; call it 185,000 for a round number.
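The estimate can be checked in a few lines. This is only a sketch of the arithmetic in the paragraph above; the variable names are illustrative, and the 1.33 tokens-per-word figure is the rule of thumb from the text, not a measured value.

```python
# Rule-of-thumb inputs taken from the text (not measured values).
pages = 500
words_per_page = 275
tokens_per_word = 1.33

words = pages * words_per_page    # total words in the book
tokens = words * tokens_per_word  # estimated tokens in the book

print(f"{words:,} words")                 # 137,500 words
print(f"about {round(tokens):,} tokens")  # about 182,875 tokens
```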

This is already a meaningful figure. It exceeds the context window of many AI models, which means such a model cannot hold an entire 500-page book in its mind at once. For it, the book must be fed in chapters, in fragments, much as a human reader turns pages, except that the machine forgets the earlier ones.

But the truly interesting question isn't how many tokens fit in a book. It's how many books can be built from tokens.

Modern language models typically work with vocabularies of 30,000 to 100,000 tokens — carefully selected subword units that efficiently represent language. Early models like BERT used around 30,000; OpenAI's GPT-2 established a benchmark at roughly 50,000; current models like GPT-4 use about 100,000. For our calculation, we'll use 100,000 to stay consistent with today's AI.

These units, strung together in sequences of 185,000 tokens, can encode any 500-page book. Every novel, every scientific treatise, every love letter and shopping list — all expressible as a sequence drawn from this modest set. The question that follows is irresistible: how many such books are possible?

The Number of All Possible Books

The calculation is straightforward. We have 185,000 token positions in our book, and each position can be filled by any of the 100,000 available tokens.

The total number of possible books is therefore: 100,000¹⁸⁵'⁰⁰⁰ (100,000 available tokens to the power of 185,000 possible token positions). To grasp the magnitude of this number, we can convert it to a power of ten. Since log₁₀(100,000) = 5, we get:

100,000¹⁸⁵'⁰⁰⁰ = (10⁵)¹⁸⁵'⁰⁰⁰ = 10⁹²⁵'⁰⁰⁰ is the total number of volumes in the library.

That is a 1 followed by 925,000 zeros. For comparison, the number of atoms in the observable universe (i.e., all matter in all visible stars, planets, interstellar dust, galaxies) is estimated at roughly 10⁸². The number of possible books exceeds that by a factor of 10⁹²⁴'⁹¹⁸. It is, to borrow a phrase from mathematics, incomprehensibly larger than anything physical reality contains.
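Numbers of this size are easiest to handle through their exponents. The sketch below works in powers of ten and then confirms the identity with exact integer arithmetic; the constant names are illustrative, and the atom count is the rough estimate quoted above.

```python
import math

VOCAB_SIZE = 100_000   # tokens available at each position
BOOK_LENGTH = 185_000  # token positions per book
ATOMS_EXPONENT = 82    # rough estimate from the text: about 10**82 atoms

# log10(V**N) = N * log10(V), so we never need to print the giant number itself.
books_exponent = BOOK_LENGTH * math.log10(VOCAB_SIZE)
print(f"possible books  ~ 10^{round(books_exponent):,}")

# Dividing powers of ten means subtracting exponents.
factor_exponent = books_exponent - ATOMS_EXPONENT
print(f"books per atom  ~ 10^{round(factor_exponent):,}")

# Python integers are arbitrary precision, so the identity can be checked exactly.
print(VOCAB_SIZE ** BOOK_LENGTH == 10 ** 925_000)
```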

And yet — and this is the crucial point — it is finite.

The Library of Babel

The Argentine writer Jorge Luis Borges imagined something similar in his 1941 story The Library of Babel: a universal library containing every possible book of a fixed length, written using 25 distinct characters (22 letters, the period, the comma, and the space). His library held every book that could ever be written and, therefore, an overwhelming ocean of gibberish surrounding infinitesimal islands of meaning.

Borges was deeply influenced by Edward Kasner and James Newman's 1940 book Mathematics and the Imagination, which popularized mathematical concepts for general readers — including the famous "googol" (10¹⁰⁰) and, crucially, Georg Cantor's work on transfinite numbers and different sizes of infinity. Cantor's insight — that infinity is not a single thing but comes in hierarchies, that some infinities are larger than others — clearly electrified Borges and gave him the mathematical scaffolding to imagine a library that was vast beyond comprehension yet still, in some sense, bounded.

Where our calculation counts tokens, Borges worked at the level of individual letters and characters. With his 25 symbols and 410 densely printed pages (1,312,000 character positions per book), his library is vastly larger than ours:

25¹'³¹²'⁰⁰⁰ ≈ 10¹'⁸³⁴'⁰⁰⁰ possible books in Borges's library.

While our token library contains 10⁹²⁵'⁰⁰⁰ possible books, Borges's contains roughly 10⁹⁰⁹'⁰⁰⁰ times more. The main difference is gibberish: Borges's extra books are overwhelmingly meaningless — wild letter combinations that carry no information. Our token library contains gibberish too, of course — random sequences of tokens that make no sense — but because tokens are meaningful subword units rather than individual characters, they produce far less nonsense. By choosing the right units of language representation — 100,000 carefully selected tokens rather than 25 individual characters — we collapse the space of possibilities by an almost unimaginable degree while preserving the ability to express everything meaningful. Tokenization, it turns out, is not merely a technical convenience. It is a profound act of compression, and it is exactly what modern AI does. The choice of tokenization scheme is not an afterthought — it is one of the most consequential design decisions in building a language model.
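The comparison between the two libraries is again a matter of exponents. A minimal sketch, using only the figures given in the text (25 symbols, 1,312,000 character positions, and 10^925,000 token books); the names are illustrative.

```python
import math

BORGES_SYMBOLS = 25
BORGES_POSITIONS = 1_312_000  # character positions per book, as given in the text
TOKEN_BOOKS_EXPONENT = 925_000  # our token library holds 10**925,000 books

# log10(25**1,312,000) = 1,312,000 * log10(25)
borges_exponent = BORGES_POSITIONS * math.log10(BORGES_SYMBOLS)
print(f"Borges's library ~ 10^{borges_exponent:,.0f}")

# How many times larger Borges's library is, as a power of ten.
ratio_exponent = borges_exponent - TOKEN_BOOKS_EXPONENT
print(f"size ratio       ~ 10^{ratio_exponent:,.0f}")
```

The exponents come out near 1,834,097 and 909,097, which the text rounds to 1,834,000 and 909,000.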

The Physical Library

Let us indulge in one more act of arithmetic and give our library a body. Each of the 10⁹²⁵'⁰⁰⁰ books has 500 pages. Assume each page adds about 30 µm (0.03 mm) of thickness, so a single book measures 15 mm, or 1.5 centimeters. Line the books up side by side on a single shelf:

Total length = 10⁹²⁵'⁰⁰⁰ × 0.015 m ≈ 10⁹²⁴'⁹⁹⁸ meters

One light-year is approximately 9.46 × 10¹⁵ meters. Converting:

Library length ≈ 10⁹²⁴'⁹⁸² light-years.

The observable universe is about 93 billion light-years across, or roughly 10¹¹ light-years. Our library exceeds the diameter of the observable universe by a factor of 10⁹²⁴'⁹⁷¹. If you set out at the speed of light the moment the universe began, you would not have crossed a fraction of a fraction of a fraction of this library by now. You would not have made any progress at all, in any meaningful sense.
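The shelf-length figures follow from three subtractions and additions of exponents. The sketch below reproduces them under the text's own assumptions (15 mm per book, 9.46 × 10^15 meters per light-year, a 93-billion-light-year universe); the constant names are illustrative.

```python
import math

BOOKS_EXPONENT = 925_000        # number of books is 10**925,000
BOOK_THICKNESS_M = 0.015        # 15 mm per book, as assumed in the text
METERS_PER_LIGHT_YEAR = 9.46e15
UNIVERSE_DIAMETER_LY = 93e9     # about 93 billion light-years

# Multiplying by a quantity adds its log10; dividing subtracts it.
length_m_exp = BOOKS_EXPONENT + math.log10(BOOK_THICKNESS_M)
length_ly_exp = length_m_exp - math.log10(METERS_PER_LIGHT_YEAR)
universes_exp = length_ly_exp - math.log10(UNIVERSE_DIAMETER_LY)

print(f"shelf length ~ 10^{round(length_m_exp):,} meters")
print(f"shelf length ~ 10^{round(length_ly_exp):,} light-years")
print(f"shelf length ~ 10^{round(universes_exp):,} universe diameters")
```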

And yet — the library is finite. It has a last book. It has an end.

Every Biography of Every Person in Every Possible Future Already Written

Here is where the mathematics tips into something else entirely. Somewhere in that finite library is a book that describes your life. Not metaphorically — a 500-page biography, as detailed as any ever written. The choices that shaped you, the encounters that changed your direction, the words you said and the ones you swallowed. It won't capture every breath or every forgotten dream — no 500-page book could — but it tells your story with the intimacy and completeness of the finest biography ever composed. And not just yours: every life that has been lived, is being lived, or could ever be lived has its volume on these shelves.

And in the library there are also the thousand-page versions, split across two volumes, that do capture the finer grain. The library doesn't just contain your biography — it contains every possible resolution of your biography, from a brief sketch to an exhaustive chronicle spread across multiple books. Taken together, they contain your entire life and all the ways it could have unfolded differently, in complete, exhaustive detail: every morning you have woken up, every conversation you have or could have had, every decision you have made and every consequence that followed. The words you spoke to your mother last Tuesday. The dream you forgot on Wednesday. The sentence you are reading at this very moment.

This is not mysticism. It is combinatorics. If every possible sequence of tokens exists in the library, then every possible narrative exists. Every biography, true or fictional, of every human life that was, is, or might yet be.
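The combinatorial claim can be watched at a scale small enough to enumerate. The sketch below builds a complete toy library from a hypothetical five-token vocabulary and four-token books (both invented here for illustration) and checks that a given "story" is necessarily on the shelves.

```python
from itertools import product

# Hypothetical miniature vocabulary and book length, chosen so the library is tiny.
MINI_VOCAB = ["I", " was", " born", " left", "."]
MINI_BOOK_LENGTH = 4

# Every possible sequence of MINI_BOOK_LENGTH tokens: the complete toy library.
mini_library = set(product(MINI_VOCAB, repeat=MINI_BOOK_LENGTH))

# The size is vocabulary_size ** book_length, exactly as in the full calculation.
print(len(mini_library), len(MINI_VOCAB) ** MINI_BOOK_LENGTH)

# Any story expressible in these tokens at this length is already in the library...
story = ("I", " was", " born", ".")
print("".join(story), story in mini_library)

# ...and so is every piece of gibberish of the same length.
gibberish = (".", ".", " left", "I")
print(gibberish in mini_library)
```

The full library works the same way; only the two numbers change, which is why it can be counted but never enumerated in practice.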

The same holds for every scientific paper that will ever be published, every poem that will ever be composed, every lie that will ever be told. The cure for every disease, and the false promise of cures that do not work. But here is the philosophical knife-edge: nobody knows which book is theirs.

Your life is fully written, but even if you could find the book, you would not know that it is yours. The library contains billions of biographies nearly identical to yours until today — differing by a single conversation, a single decision, a single afternoon. And, what is more, you do not know if the described future is yours or someone else's or purely fictional. You could hold your own life story in your hands and never be certain it was not the version where you turned left instead of right on a Tuesday in March. Whether you will win that Olympic bronze medal or lose it by a hundredth of a second, or if you will survive the avalanche next Sunday or not — these are open questions until you have lived them. You must live forward through time, experiencing each moment as if you are its author, even though the text already exists on a shelf in our finite library. You experience choice — the agonizing, exhilarating sensation of free will — while the record of your choices sits quietly in a finite library, already complete.

This seems an elegant resolution to the ancient debate between determinism and free will: both are true simultaneously. The book is written, but it is unknowable. Determinism holds in principle; freedom holds in practice. You are both the reader and the character, and you can never be sure which.

The Theologian's Library

One more step remains. The numbers we have calculated are astronomical, but they are not infinite. They are vast beyond comprehension, but they terminate. And infinity, true infinity, swallows them without effort.

An infinite being, given infinite time, could write every book in our finite library. That being need not even be good, just systematic and indefatigable. It could write them all, read them all, and then do so again, infinitely many times. But if that being is God with a perfect memory, every possible human life would not merely be recorded but known, intimately and exhaustively. And when all possible lives described in all possible books have been lived, every additional life would have to be a repetition of one already written. The only differences between two lives captured in the same book are the things that cannot be expressed in words: the precise quality of a sunset, the texture of grief, the taste of bread broken with a friend. As Wittgenstein wrote: "Whereof one cannot speak, thereof one must be silent." The library's boundary is language itself.

This is, essentially, the classical theological concept of divine omniscience rendered in the language of information theory. But if any systematic, indefatigable being — or indeed an AI — could in principle write every book in the library, then the library is not a divine task. It is a mechanical one. God is not the librarian. God is the custodian of everything beyond language — the silence that Wittgenstein pointed to, where the library's shelves end and something else begins.
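What "systematic" might mean can be made concrete. One possible scheme, offered here as an assumption rather than as the method the text has in mind, is to number the books and read each number as a string of digits in base V, where V is the vocabulary size. The functions `book_at` and `index_of` are hypothetical names for this sketch.

```python
def book_at(index, vocab, length):
    """Return the book at position `index` when all books are listed in vocabulary order."""
    if not 0 <= index < len(vocab) ** length:
        raise IndexError("index outside the library")
    digits = []
    for _ in range(length):
        # Peel off base-V digits, least significant first.
        index, digit = divmod(index, len(vocab))
        digits.append(digit)
    return [vocab[d] for d in reversed(digits)]


def index_of(book, vocab):
    """Inverse of book_at: the shelf position of a given book."""
    position = {token: i for i, token in enumerate(vocab)}
    index = 0
    for token in book:
        index = index * len(vocab) + position[token]
    return index


# A three-token vocabulary and two-token books give 3**2 = 9 volumes.
vocab = ["a", "b", "c"]
for k in range(len(vocab) ** 2):
    print(k, "".join(book_at(k, vocab, 2)))
```

Counting upward from zero visits every book exactly once, which is all the indefatigable scribe needs; no judgment about meaning is involved at any step.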

The Fastest Reader in an Infinite Aisle

There is one final irony buried in this journey, and it concerns the very tool that made it possible.

An artificial intelligence like the one that helped write this article can process text at astonishing speed. It could, in theory, read every book a human will encounter in a lifetime before that human finishes a single cup of coffee. Turned loose in our library of 10⁹²⁵'⁰⁰⁰ volumes, it would be the fastest reader the universe has ever seen.

And it would be completely, utterly lost. Not because the library is too large — though it is incomprehensibly so — but because speed without direction is just turbulence. The AI can read any book in the library. It cannot know which book to read. It has no curiosity, no restlessness, no sudden intuition at 9:51 in the morning that the number of possible tokens might connect to the number of possible lives. It does not wake at night with a half-formed question. It does not feel the itch of an idea not yet articulated.

That itch is what started this article. A human asked a simple, practical question — how is a token calculated? — and then refused to stop at the practical answer. What if we fixed the token length? How many books could we write? How long would the shelf be? And then, the leap no algorithm would make on its own: every life is already written in that library, and nobody knows which book is theirs.

The AI, for its part, could follow every leap. It could run the calculations in moments, draw the connections to Borges, convert light-years without reaching for a calculator. It could write the article in the time it takes to read the first paragraph. But it could not have begun. The article existed in the library long before our conversation, and the AI could read the library far faster than any human — yet without the human, it would never have found the right shelf.

This is the paradox of the library made practical: the book that describes its own discovery is already written, but it can only be discovered through the collaboration between a mind that knows which questions to ask and a mind that can pursue the answers at the speed of thought. The human walks into the library with a lantern. The AI reads every title the light touches. Neither is sufficient alone.

Perhaps this is the most hopeful conclusion our finite library offers. In a collection that already contains every possible text (every masterpiece, every discovery, every story), the act of creation is not writing. It is finding. It is the collaborative navigation through an unimaginable space toward the one book, among 10⁹²⁵'⁰⁰⁰, that matters right now. The human provides the compass. The machine provides the speed. And together, they do something that neither the library's completeness nor its finitude could ever render pointless: they choose which life to live and which book to read or rewrite, which is essentially the same thing.

That, in the end, may be what distinguishes us from the library that already contains us. We are not its authors. We are its navigators. And navigation, unlike authorship, requires two things the library itself can never supply: a question and the will to ask it.

This article emerged from a single conversation that began with the question "How is a token calculated?" and ended, unexpectedly, at the intersection of mathematics, philosophy, and theology. It was written by a human and an AI — one providing the compass, the other the speed. The calculations are real. The library is finite. The conversation that found it is already on one of its shelves.