Local-only · GPT · Claude · Gemini · Llama

Know the token count before you hit send.

Context overflow is the most expensive silent failure in AI work. The Token Estimator counts tokens for GPT, Claude, Gemini, and Llama families in your browser, previews the API cost, and shows the percentage of the model's context you would consume. The text never leaves this tab.

Paste prose or code. Get a count in under a second.

  • 0 prompts sent to count tokens
  • 4 model families
  • 1 local tokenizer

estimate - GPT-4o preview
// input
12,847 tokens
~51,388 chars · 9,140 words
0 network requests fired

// model
GPT-4o · 200K context window
6.4% of context budget used

// cost (input only)
~$0.03 estimated at $2.50 / 1M tokens
output cost depends on response length

// comparison across families
GPT-4o          12,847 tokens
Claude Sonnet   13,102 tokens
Gemini 2.0      12,621 tokens
Llama 3.3       13,488 tokens

same text, four model tokenizers, side by side

All in the browser. The tokenizer runs in this tab. Open the network panel: zero requests fire during counting.
No prompt sent to count. Counting tokens with a hosted API would mean sending the prompt. The whole point of this tool is to skip that.
Approximation method documented. BPE-style approximation per model family. The math is open. The accuracy bounds are stated.

Why estimates matter before you paste

Knowing the count is cheaper than hitting the limit.

Every modern model has a context window and a per-token price. Three things go wrong when you don't measure first: prompts get truncated silently, API bills surprise you at the end of the month, and you can't compare two model choices on the same input. A 30-second count fixes all three.

budget - same prompt, four models · input only · cost in USD
// the prompt
~50 kB of source code + ~10 kB of docs
plus a 400-word instruction at the top

// what the count looks like
GPT-4o         12,847 tok  ~$0.03 in
Claude Sonnet  13,102 tok  ~$0.04 in
Gemini 2.0     12,621 tok  ~$0.02 in
Llama 3.3      13,488 tok  local · free

// what you would not have known
- the bundle is 6.4% of a 200K window
- the remaining window is 93.6% (~187K tokens)
- per-call input cost lands between ~$0.02 and ~$0.04

  1. Context windows truncate without warning. Many chat apps and client libraries silently drop the oldest tokens when you exceed the limit. Your prompt looks fine in the chat. The model just stopped seeing the top of it.
  2. API cost is a function of tokens, not characters. A wall of code and a wall of prose with the same character count can differ by 40 percent in token count. Cost estimates from "characters / 4" miss this badly.
  3. Budget planning starts with a number. If a single call is 12K tokens, a 100-call batch is 1.2M tokens. The estimator gives you the per-call number so the batch math is honest.
  4. Model comparison needs equal footing. "Is this prompt cheaper on Claude or GPT?" only has an answer if both counts come from the same input. The estimator counts all four in one pass.
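
The batch arithmetic in point 3 can be sketched in a few lines. This is a minimal illustration, not the estimator's own code: the function names are hypothetical, and the $2.50 per 1M input tokens rate is the GPT-4o figure quoted in the preview panel above.

```python
def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input-side cost of one call: token count times the per-token price."""
    return tokens * usd_per_million / 1_000_000


def batch_tokens(tokens_per_call: int, calls: int) -> int:
    """Total input tokens for a batch of same-size calls."""
    return tokens_per_call * calls


if __name__ == "__main__":
    per_call = 12_847   # GPT-4o count for the sample prompt
    rate = 2.50         # USD per 1M input tokens, as quoted in the panel

    print(f"per call: ${input_cost_usd(per_call, rate):.3f}")

    total = batch_tokens(12_000, 100)
    print(f"100-call batch: {total:,} tokens, ${input_cost_usd(total, rate):.2f}")
```

Output cost is left out on purpose, since it depends on response length and cannot be known before the call.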

The families you actually call. Counted locally.

Each family uses a slightly different tokenization scheme. GPT uses tiktoken-style BPE. Claude uses Anthropic's own BPE variant. Gemini uses a SentencePiece-style approach. Llama 3 also uses a tiktoken-style BPE, where earlier Llama releases used SentencePiece. The estimator carries a BPE-style approximation per family that lands within a few percent of the official count for prose.

Code and non-Latin scripts are the cases where approximations drift. The estimator is conservative on those: it rounds up, so a count that fits the budget here should also fit the model's real budget. The accuracy bound is shown in a footer next to each result.
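
What "rounds up" buys you can be shown with a deliberately simple stand-in. The sketch below is not the estimator's algorithm: it uses the crude characters-per-token ratio, and the ratio, the safety margin, and the function names are all assumptions chosen for illustration. It only shows why padding an estimate and rounding up makes a "fits" verdict safer to trust.

```python
import math


def conservative_estimate(text: str, chars_per_token: float = 4.0,
                          margin: float = 0.10) -> int:
    """Pad a rough character-based estimate by `margin`, then round up.

    Both defaults are illustrative assumptions, not calibrated values.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    raw = len(text) / chars_per_token
    return math.ceil(raw * (1 + margin))


def fits(estimated_tokens: int, context_window: int,
         reserve_for_output: int = 0) -> bool:
    """True when the estimate plus any reserved output fits the window."""
    return estimated_tokens + reserve_for_output <= context_window


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    estimate = conservative_estimate(snippet)
    print(estimate, fits(estimate, context_window=128_000))
```

Because the estimate errs high, a prompt that passes `fits` here has some headroom against the real tokenizer; a prompt that fails may still fit, which is the cheaper mistake to make.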

  • GPT-4o: 200K context · OpenAI tiktoken-style approximation
  • GPT-4 Turbo: 128K context · same tokenizer family as GPT-4o
  • GPT-3.5: 16K context · cl100k_base approximation
  • Claude Sonnet (current): 200K context · Anthropic BPE approximation
  • Claude Opus (current): 1M context · same Anthropic BPE family
  • Gemini 1.5 / 2.x: up to 2M context · SentencePiece-style approximation
  • Llama 3.x: 128K context · tiktoken-style BPE approximation for the Llama 3 family
  • Note: tokenization is approximated locally; exact counts come from the official APIs

One input. Four counts you'd otherwise pay APIs to get.

The same 50 kB sample - mixed prose and code, the kind of bundle you'd actually paste - run through every tokenizer the estimator ships. Counts come from local approximations calibrated against each family's official tokenizer.


Family          Tokens (50 kB sample)   Context window   % used per call   Input cost (USD)
GPT-4o          12,847                  200K             6.4%              ~$0.032
Claude Sonnet   13,102                  200K             6.6%              ~$0.039
Gemini 2.0      12,621                  2M               0.6%              ~$0.016
Llama 3.3       13,488                  128K             10.5%             local · free
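
The "% used" column follows directly from the token counts and window sizes in the table. A short sketch, reading "K" as 1,000 and "M" as 1,000,000 (an assumption about the table's units); the `SAMPLE` dictionary and the function name are hypothetical:

```python
# (tokens, context window) per family, copied from the table above.
SAMPLE = {
    "GPT-4o": (12_847, 200_000),
    "Claude Sonnet": (13_102, 200_000),
    "Gemini 2.0": (12_621, 2_000_000),
    "Llama 3.3": (13_488, 128_000),
}


def percent_used(tokens: int, window: int) -> float:
    """Share of the context window consumed by the input, in percent."""
    return 100 * tokens / window


for family, (tokens, window) in SAMPLE.items():
    print(f"{family:<14} {percent_used(tokens, window):5.1f}%")
```

The same input costs a very different share of the window on each family, which is why the comparison only means something when every count comes from the same text.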

Know the number before the API does.

The Token Estimator is built into Prompt Organizer. Free, local, no account. Paste prose or code. The text stays in this browser until you choose to send it somewhere yourself.