What Is a Token in AI? How AI Coding Tools Are Priced
▶ Watch on YouTube & subscribe to The Stack Underflow
Every limit, every bill, every discount in AI tooling is measured in the same unit: the token. But most cost calculators and explainers skip past the definition and throw numbers at you before you have any intuition for what you’re counting. This tutorial builds that intuition — because without it, the rest of the pricing math is just noise.
This is episode 1 of the “Hidden Cost of AI Coding” series. We start with the unit before we talk about the bills.
The one-sentence version: A token is a small chunk of text (roughly 4 characters or ¾ of a word in English), and the direction tokens travel — in or out — determines most of what you pay.
What a token actually is
A token is not a word. A token is not a character. A token is whatever chunk of text the model’s tokenizer decided to group together when it learned the language. That decision was made during training, not at inference time, so you don’t control it.
Some useful rules of thumb for English:
| Unit | Approximate token count |
|---|---|
| 1 character | ~0.25 tokens |
| 1 word | ~0.75 tokens |
| 4 characters | ~1 token |
| 750 words | ~1,000 tokens |
| 1 short page of prose | ~1,000 tokens |
These are approximations. The actual count depends on vocabulary frequency — common English words are often a single token, while rare words, names, and technical terms get split into multiple tokens.
Code tokenizes differently than prose
This is the part that catches developers off guard. Code is not prose. The same information expressed as code often costs twice as many tokens as the same information in plain English, because:
- Indentation uses tokens (every leading space or tab is counted)
- Punctuation — braces, semicolons, angle brackets — costs tokens
- Whitespace between tokens still costs tokens
# This innocent-looking block costs more tokens than it looks:
def calculate_total(
items: list[dict],
discount: float = 0.0,
) -> float:
return sum(item["price"] for item in items) * (1 - discount)
Every indent, every bracket, every type annotation is getting tokenized. When you’re sending thousands of lines of code as context to an AI coding assistant, the token count climbs fast.
The model lives in tokens, you live in words
Here’s the mental model worth internalizing:
Your message (words) → tokenizer → tokens → model → tokens → detokenizer → response (words)
Every input you send gets broken into tokens before the model sees anything. Every word the model writes back is generated as tokens, one at a time. The model has no concept of “word” — it sees only a sequence of integer IDs, each representing a token.
This is why a model’s context window is measured in tokens, not words or characters. When a model says it has a 200,000-token context window, that’s roughly 150,000 words — but the exact equivalent in your codebase depends on how your code tokenizes.
Not all tokens cost the same
This is where most people’s mental model breaks down. Tokens are priced by direction:
- Input tokens — what you send to the model (your prompt, your code, your conversation history)
- Output tokens — what the model generates back (the response, the generated code)
Output tokens typically cost 4 to 5 times more than input tokens. Sometimes up to 8 times more.
Input: $3 per million tokens (example)
Output: $15 per million tokens (example — 5× more)
This asymmetry exists because generating tokens is computationally more expensive than reading them. The model runs a full forward pass for every single output token it produces.
The shape of your task decides where your money goes
Because of this pricing asymmetry, the type of task you’re running matters enormously:
| Task type | Dominated by | Cost profile |
|---|---|---|
| Classification | Input | Cheap output |
| Extraction | Input | Cheap output |
| Code generation | Output | Expensive — model writes a lot |
| Long answers / docs | Output | Expensive — model writes a lot |
| RAG summarization | Input (large context) | Depends on response length |
A task with a large input and a short answer — like “classify this error as a type of bug” — pays mostly for input. A task that generates a lot of text — like “implement this feature from scratch” — is dominated by output cost, which is the expensive side.
Why a $400 AI coding bill isn’t what you think
When you see a large AI bill, it isn’t a single number. It’s a mix:
- Some fraction is input tokens (your prompts, your pasted code, your conversation history)
- A larger fraction is output tokens (the code the model wrote)
- In agentic workflows, most of it is context re-sent on every step — but that’s the next episode
The point: you can’t optimize a bill you don’t understand. And you can’t understand the bill without knowing that input and output tokens are priced differently.
ASCII diagram: token flow in a code generation request
You type:
"Write a function to parse JSON from a URL"
+ 200 lines of existing code for context
│
▼
[ Tokenizer ]
~600 tokens input (your message + context)
│
▼
[ Model inference ]
│
▼
[ Detokenizer ]
│
▼
Model writes:
~80 lines of generated code
~400 tokens output
Cost: (600 × input_rate) + (400 × output_rate)
where output_rate ≈ 5× input_rate
The output is smaller in raw volume, but it carries most of the cost weight.
Common misconceptions
-
“Tokens are just words.” They are not. A single word can be multiple tokens (especially compound words, names, and technical identifiers), and common short words may share a token. The 0.75 words-per-token rule is a starting heuristic, not a law.
-
“More context always means better results at the same cost.” More context means more input tokens, which costs money. And in agentic workflows where context is re-sent on every turn, that cost compounds. Bigger is not always better — it’s always more expensive.
-
“Code and prose cost the same per line.” They don’t. Code is more token-dense than prose because of indentation, punctuation, and syntax. A 100-line Python file can easily cost twice the tokens of a 100-line short story.
-
“Output tokens are the cheap part — the model is just typing.” This is backwards. Output tokens are the most expensive part of any API call, often 4–8× more per token than input. Tasks that generate a lot of text (code generation, documentation) are significantly more expensive than tasks that read a lot and respond briefly.
Frequently asked questions
How do I find out exactly how many tokens my prompt will use?
Most providers expose a tokenizer tool or a /tokenize endpoint. Anthropic’s Claude tokenizer is based on byte-pair encoding (BPE). You can use the anthropic Python SDK’s count_tokens method, or Anthropic’s Tokenizer tool in the Console, to get an exact count before you send a request.
Does the token count include the system prompt? Yes. Every token sent to the model — system prompt, conversation history, your current message, any retrieved context — counts as input tokens and is billed accordingly. This is why large system prompts have a real cost at scale.
Why does code cost more tokens than prose? Because code has high punctuation density, mandatory whitespace (indentation), and many rare identifiers that don’t compress into single tokens the way common English words do. The tokenizer was trained on a mix of text and code, but prose still tends to tokenize more efficiently.
If output tokens are 5× more expensive, should I tell the model to give shorter answers? Yes, deliberately. Instructing the model to be concise, to skip boilerplate, or to return only the changed lines instead of the full file can substantially reduce output token usage — and therefore cost. This is a real optimization technique, not just stylistic preference.
Where this fits in the series
This tutorial is episode 1 of How Claude Actually Works — a course that peels back the abstractions so you can reason clearly about what AI coding tools are actually doing and what they cost. Understanding tokens is the prerequisite for everything else: context windows, agentic cost explosions, caching, and pricing strategies.
The next episode covers the recent context problem — why AI coding bills can spike even when you don’t think you’re doing much, and what’s actually filling up your token budget.
Browse all tutorials in the series.
Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.
Subscribe on YouTube →