LLM Basics · 6 min read · May 20, 2026

What is Tokenization?

Learn how text is broken down into tokens, mapped to vocabulary indices, and projected into embedding space for LLM input pipelines.

Tokenization is the first step in modern Natural Language Processing (NLP) and Large Language Model (LLM) pipelines. A neural network cannot operate on raw text, so the text must first be converted into numbers. This is done by splitting the text into units called 'tokens' (whole words, subwords, characters, or bytes) and mapping each token to an integer ID in a fixed vocabulary.
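As a rough illustration of the token-to-ID mapping, here is a minimal sketch of a word-level tokenizer. The whitespace splitting, the `<unk>` entry, and the function names `build_vocab` and `encode` are assumptions made for this example, not how any particular LLM tokenizer works.

```python
# Toy word-level tokenizer: a hypothetical illustration, not a production scheme.

def build_vocab(corpus):
    """Assign an integer ID to each distinct whitespace-separated token."""
    vocab = {"<unk>": 0}  # reserved ID for tokens never seen while building the vocab
    for text in corpus:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)  # next unused ID
    return vocab


def encode(text, vocab):
    """Map text to a list of integer IDs, falling back to <unk> for unknown tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.lower().split()]


vocab = build_vocab(["the cat sat", "the dog sat down"])
ids = encode("the cat sat down", vocab)   # every word is in the vocabulary
unknown = encode("the bird sat", vocab)   # "bird" was never seen
print(ids)      # [1, 2, 3, 5]
print(unknown)  # [1, 0, 3]
```

The word "bird" collapses to the catch-all ID 0, so its identity is lost. Subword schemes such as BPE exist largely to avoid this.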

Byte Pair Encoding (BPE)

Many state-of-the-art LLMs, including GPT-4, Llama, and Mistral, use variants of Byte Pair Encoding (BPE). Byte-level BPE starts from individual bytes and iteratively merges the most frequent adjacent pair of tokens in a training corpus into a new token, gradually building a vocabulary of subwords. Because the base vocabulary covers all 256 possible byte values, any input can be encoded: a word never seen during training falls back to smaller subwords or, at worst, individual bytes, which avoids the 'Out-Of-Vocabulary' (OOV) problem.

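def get_stats(ids):
    """Count how often each adjacent pair of token IDs occurs in ids."""
    counts = {}
    for pair in zip(ids, ids[1:]):  # consecutive pairs: (ids[0], ids[1]), (ids[1], ids[2]), ...
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, idx):
    """Return a copy of ids with each non-overlapping occurrence of pair replaced by idx."""
    newids = []
    i = 0
    while i < len(ids):
        # Match only when a right-hand neighbor exists and both IDs equal the pair.
        if i < len(ids) - 1 and ids[i] == pair[0] and ids[i+1] == pair[1]:
            newids.append(idx)
            i += 2  # skip both merged tokens
        else:
            newids.append(ids[i])
            i += 1
    return newids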
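The two helpers above are the building blocks of a training loop. The following is a minimal sketch of how they might be combined, assuming byte-level BPE over UTF-8 input; `train_bpe`, `bpe_encode`, `bpe_decode`, and the toy training string are hypothetical names and data chosen for illustration. The helpers are repeated in compact form so the block runs on its own.

```python
def get_stats(ids):
    """Count adjacent pairs of token IDs."""
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def merge(ids, pair, idx):
    """Replace each non-overlapping occurrence of pair with idx."""
    newids, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            newids.append(idx)
            i += 2
        else:
            newids.append(ids[i])
            i += 1
    return newids


def train_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from raw UTF-8 bytes (IDs 0-255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, stored in the order learned
    for step in range(num_merges):
        stats = get_stats(ids)
        if not stats:
            break  # fewer than two tokens left, nothing to merge
        pair = max(stats, key=stats.get)  # most frequent pair; ties go to the first seen
        new_id = 256 + step               # new IDs start right after the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def bpe_encode(text, merges):
    """Encode text by replaying the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def bpe_decode(ids, merges):
    """Expand each ID back to its byte string and decode the result as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_bpe("aaabdaaabac", num_merges=3)
print(bpe_encode("aaabdaaabac", merges))  # [258, 100, 258, 97, 99]
print(bpe_encode("zebra", merges))        # [122, 101, 98, 114, 97]
```

The string "zebra" shares no learned pair with the training text, yet it still encodes cleanly as five raw byte IDs. That fallback is what keeps byte-level BPE free of OOV failures. Production tokenizers typically add steps this sketch omits, such as pre-splitting text with a regular expression and reserving special tokens.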

The Pipeline from Text to Tensor

Once tokenized, the sequence of integer IDs passes through several stages before it reaches the Transformer layers:

  • Token IDs: A 1D tensor representing indices (e.g., [464, 2068, 318]).
  • Embedding Lookup: Each token ID fetches a high-dimensional vector from an embedding matrix (W_e) of size (Vocab Size x Hidden Dimension).
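  • Positional Encoding: Vectors encoding each token's position in the sequence are added to the embedding vectors, because self-attention on its own has no notion of token order. Some architectures inject position inside the attention computation instead.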
  • Hidden States: These combined vectors are fed into the Transformer's self-attention blocks.
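The stages above can be sketched in plain Python. This is an illustration under stated assumptions, not a real model: `VOCAB_SIZE` and `HIDDEN_DIM` are hypothetical sizes, `W_e` is filled with random numbers rather than learned weights, and the sinusoidal scheme from the original Transformer stands in for whatever positional method a given model uses.

```python
import math
import random

VOCAB_SIZE = 4096  # hypothetical; real vocabularies are usually much larger
HIDDEN_DIM = 8     # hypothetical; real models use hundreds to thousands of dimensions

rng = random.Random(0)
# Embedding matrix W_e with shape (VOCAB_SIZE, HIDDEN_DIM).
# Random here for illustration; in a real model these weights are learned.
W_e = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)]


def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    vec = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def embed(token_ids):
    """Look up each ID's row in W_e and add the encoding for its position."""
    hidden = []
    for pos, tok in enumerate(token_ids):
        pe = positional_encoding(pos, HIDDEN_DIM)
        hidden.append([e + p for e, p in zip(W_e[tok], pe)])
    return hidden


token_ids = [464, 2068, 318]       # the example IDs from the list above
hidden_states = embed(token_ids)   # one HIDDEN_DIM-sized vector per token
print(len(hidden_states), len(hidden_states[0]))  # 3 8
```

The embedding lookup is plain row indexing into `W_e`, which is why token IDs must stay below the vocabulary size. In a deep learning framework the same steps run as batched tensor operations rather than Python loops.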
