Stop Wasting Tokens: Write Smarter JSON for LLMs
Shubham
Developer

As a programmer, you probably pass JSON to LLMs, and that’s fine. LLMs understand JSON well: it’s structured, predictable, and machine-friendly.
But recently, I started noticing something odd… the same data in JSON form often consumes way more tokens than it should.
Let’s understand why.
The Hidden Cost of JSON
When we work with JSON, it feels neat and efficient.
But for the model, JSON is full of “extra” characters: quotes, braces, and commas that each tend to become tokens of their own.
Here’s a simple example:
```json
{ "name": "Shubham", "role": "Developer" }
```

It looks clean to us, but a tokenizer splits it into pieces like: `{`, `"`, `name`, `"`, `:`, `"`, `Shubham`, `"`, `,`, `"`, `role`, `"`, `:`, `"`, `Developer`, `"`, `}`
That’s already a lot of tokens for just two fields.
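To see the overhead concretely, here’s a small Python sketch. Exact token counts depend on the model’s tokenizer, so this only measures the structural characters (quotes, braces, commas) that tend to become extra tokens:

```python
import json

record = {"name": "Shubham", "role": "Developer"}

as_json = json.dumps(record)                                  # {"name": "Shubham", "role": "Developer"}
as_plain = "\n".join(f"{k}: {v}" for k, v in record.items())  # name: Shubham / role: Developer

# Characters that carry structure, not meaning.
overhead = sum(as_json.count(c) for c in '{}",')

print(len(as_json), len(as_plain), overhead)  # 40 29 11
```

Over a quarter of the JSON string is pure syntax, and a real tokenizer pays for every piece of it.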
Now imagine a bigger dataset or a nested structure... the count goes up like crazy.
Why Token Efficiency Matters
Every LLM works on tokens, not characters or words.
Each token costs you money and model bandwidth.
So, if you’re sending large JSON data in prompts or context windows, you’re wasting tokens on quotes and braces that carry no actual meaning.
The less noise you send, the more room the model has for actual reasoning.
Make It Simple for the Model
At some point, I asked myself: why not just write it in a simpler way?
We don’t need to make our data beautiful for humans… we just need to make it clear for the model.
Look at this:
```
name: Shubham
about: Problem Solver
website: shubhamwhocodes.com
```

This says the same thing a JSON version would…
But it uses fewer tokens and is easier for the LLM to parse.
No curly braces, no double quotes, just meaning.
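In code, that conversion is a one-liner. A minimal sketch, assuming a flat dict of simple values (nested objects would need flattening first):

```python
def to_plain(data: dict) -> str:
    """Render a flat dict as `key: value` lines — no quotes, no braces."""
    return "\n".join(f"{key}: {value}" for key, value in data.items())

profile = {"name": "Shubham", "role": "Developer"}
print(to_plain(profile))
# name: Shubham
# role: Developer
```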
For Repetitive Data, Use CSV
Now, what if you have a big array of similar objects?
Something like this:
```json
[
  { "name": "Alice", "age": 25, "role": "admin" },
  { "name": "Bob", "age": 30, "role": "user" },
  { "name": "Charlie", "age": 22, "role": "moderator" }
]
```

All those repeating `"name"`, `"age"`, `"role"` keys waste tokens.
A much cleaner approach is this:
```csv
name,age,role
Alice,25,admin
Bob,30,user
Charlie,22,moderator
```

It’s compact, readable, and the model understands it without any problem.
CSV is basically free token compression for repetitive data.
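Python’s standard library can do the conversion for you. A sketch using `csv.DictWriter`, assuming every object shares the same keys (ragged objects would need a merged header):

```python
import csv
import io

def to_csv(rows: list[dict]) -> str:
    """Serialize a uniform list of dicts as CSV: header once, then data rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

users = [
    {"name": "Alice", "age": 25, "role": "admin"},
    {"name": "Bob", "age": 30, "role": "user"},
    {"name": "Charlie", "age": 22, "role": "moderator"},
]
print(to_csv(users))
# name,age,role
# Alice,25,admin
# Bob,30,user
# Charlie,22,moderator
```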
When I Found TOON
While exploring this topic, I came across a package called TOON (Token-Oriented Object Notation).
It’s a compact format made specifically for LLMs to reduce token count.
Example:
```
users[2]{name,age,role}:
Alice,25,admin
Bob,30,user
```

It’s clever: it defines headers once and keeps the data compact. You can check it out here: github.com/johannschopplich/toon
But there’s a catch… TOON doesn’t handle nested JSON well. You’d need to flatten your data before using it, which adds extra work.
So, while TOON is an interesting alternative, it’s not a universal solution.
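For flat, uniform data you don’t even need the library to get the idea across. This is a hand-rolled Python sketch of the TOON-style tabular layout shown above — NOT the real TOON encoder (that’s an npm package and handles more cases) — and it assumes flat objects that all share the same keys:

```python
def to_toon_table(name: str, rows: list[dict]) -> str:
    """Emit a TOON-style table: `name[count]{keys}:` header once, then CSV rows.

    Illustration only, not the real TOON package. Assumes flat,
    uniform objects; nested values are not supported here.
    """
    keys = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    lines = [",".join(str(row[k]) for k in keys) for row in rows]
    return "\n".join([header, *lines])

users = [
    {"name": "Alice", "age": 25, "role": "admin"},
    {"name": "Bob", "age": 30, "role": "user"},
]
print(to_toon_table("users", users))
# users[2]{name,age,role}:
# Alice,25,admin
# Bob,30,user
```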
My Take on It
For me, TOON is not the star of this idea; the mindset is.
We don’t need to complicate things for ourselves as programmers. We just need to simplify data for the tokenizer.
Here’s how I look at it now:
| Type of Data | Best Format | Why |
|---|---|---|
| Simple object | Plain text key:value | Saves tokens, easy to read |
| Large array of objects | CSV | Removes repeated keys |
| Deeply nested data | Flatten + JSON or TOON | Keeps structure with fewer tokens |
| Complex irregular data | JSON | Safe fallback |
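For the “flatten” row above, here’s a minimal recursive sketch. It assumes dicts nested inside dicts; lists are kept as plain values for simplicity:

```python
def flatten(data: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in data.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat

nested = {"user": {"name": "Alice", "address": {"city": "Pune"}}, "active": True}
print(flatten(nested))
# {'user.name': 'Alice', 'user.address.city': 'Pune', 'active': True}
```

Once flattened, the result is a plain dict you can feed to any of the formats above.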
The goal isn’t to switch formats for the sake of it. It’s to make your prompt cleaner and cheaper.
The Real Lesson
We usually think about optimizing our code… but in LLM work, we should also think about optimizing our prompts.
The fewer tokens the model has to read before it understands your data, the better it performs and the less it costs.
So, next time before pasting that big JSON blob into your prompt, ask yourself…
Can I just write this as plain text or CSV?
You might be surprised how much you save by simply removing quotes.
Closing Thoughts
Token efficiency isn’t about being fancy, it’s about being thoughtful. It’s not about replacing JSON or adopting a new syntax. It’s just about writing data that’s simpler for the model to understand. Because sometimes, the smartest optimization is just… removing what’s unnecessary.