Nov 02, 2025

Stop Wasting Tokens: Write Smarter JSON for LLMs


Shubham

Developer


As a programmer, you probably pass JSON to an LLM, and that’s totally fine. LLMs understand JSON well. It’s structured, predictable, and machine-friendly.

But recently, I started noticing something odd… the same data in JSON form often consumes way more tokens than it should.

Let’s understand why.

The Hidden Cost of JSON

When we work with JSON, it feels neat and efficient.
But for the model, JSON is full of “extra” characters. Quotes, braces, and commas all add to the token count without carrying any data.

Here’s a simple example:

{ "name": "Shubham", "role": "Developer" }

It looks clean to us, but a tokenizer breaks it into pieces roughly like this (the exact split varies by model):
{, ", name, ", :, ", Shubham, ", ,, ", role, ", :, ", Developer, ", }

That’s already a lot of tokens for just two fields.
Now imagine a bigger dataset or a nested structure... the count goes up like crazy.
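You can get a feel for the overhead yourself. Real token counts depend on the model’s tokenizer, so as a rough stand-in this sketch just counts the structural characters in the JSON string from above:

```python
import json

# The same two-field record, once as JSON and once as plain key:value lines.
data = {"name": "Shubham", "role": "Developer"}

as_json = json.dumps(data)
as_plain = "\n".join(f"{k}: {v}" for k, v in data.items())

# Count the characters that are pure syntax: braces, quotes, colons, commas.
# (A rough proxy only; actual tokenization differs per model.)
structural = sum(as_json.count(c) for c in '{}":,')

print(as_json)     # {"name": "Shubham", "role": "Developer"}
print(structural)  # 13 characters of pure syntax for just two fields
print(as_plain)
```

Thirteen of the characters in that tiny JSON string carry no meaning at all, and the gap only widens as the data grows.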

Why Token Efficiency Matters

Every LLM works on tokens, not characters or words.
Each token costs you money and eats into the context window.

So, if you’re sending large JSON data in prompts or context windows, you’re wasting tokens on quotes and braces that carry no actual meaning.

The less noise you send, the more room the model has for actual reasoning.

Make It Simple for the Model

At some point, I asked myself: why not just write it in a simpler way?
We don’t need to make our data beautiful for humans… we just need to make it clear for the model.

Look at this:

name: Shubham 
about: Problem Solver 
website: shubhamwhocodes.com

This carries the same kind of information as the JSON version…
But it uses fewer tokens and gives the LLM less syntax to wade through.

No curly braces, no double quotes, just meaning.
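Converting a flat object into this form is a one-liner. Here’s a minimal sketch; `to_plain` is a hypothetical helper name, and it only makes sense for flat objects with scalar values:

```python
def to_plain(obj: dict) -> str:
    """Render a flat dict as `key: value` lines, one per field."""
    return "\n".join(f"{key}: {value}" for key, value in obj.items())

profile = {
    "name": "Shubham",
    "about": "Problem Solver",
    "website": "shubhamwhocodes.com",
}
print(to_plain(profile))
# name: Shubham
# about: Problem Solver
# website: shubhamwhocodes.com
```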

For Repetitive Data, Use CSV

Now, what if you have a big array of similar objects?
Something like this:

[ 
  { "name": "Alice", "age": 25, "role": "admin" }, 
  { "name": "Bob", "age": 30, "role": "user" }, 
  { "name": "Charlie", "age": 22, "role": "moderator" } 
]

All those repeating "name", "age", "role" keys waste tokens.

A much cleaner approach is this:

name,age,role 
Alice,25,admin 
Bob,30,user 
Charlie,22,moderator

It’s compact, readable, and the model understands it without any problem.

CSV is basically free token compression for repetitive data.
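The JSON-to-CSV conversion is easy to automate with Python’s standard `csv` module. A small sketch, assuming a flat array where every object has the same keys:

```python
import csv
import io
import json

# The array of uniform objects from above, as a JSON string.
records = json.loads("""[
  {"name": "Alice", "age": 25, "role": "admin"},
  {"name": "Bob", "age": 30, "role": "user"},
  {"name": "Charlie", "age": 22, "role": "moderator"}
]""")

# Write the keys once as a header, then only the values per row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()

print(as_csv)
print(len(json.dumps(records)), "chars as JSON vs", len(as_csv), "as CSV")
```

The repeated `"name"`, `"age"`, `"role"` keys are paid for once in the header instead of once per row, which is where the savings come from.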

When I Found TOON

While exploring this topic, I came across a package called TOON (Token-Oriented Object Notation).
It’s a compact format made specifically for LLMs to reduce token count.

Example:

users[2]{name,age,role}:
  Alice,25,admin
  Bob,30,user

It’s clever. It defines headers once and keeps data compact. You can check it out here: github.com/johannschopplich/toon

But there’s a catch… TOON’s compact tabular form works best for flat, uniform arrays. Deeply nested JSON doesn’t compress as well, so you’d often need to flatten your data first, which adds extra work.

So, while TOON is an interesting alternative, it’s not a universal solution.
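To make the idea concrete, here is a tiny sketch that emits the tabular layout shown above for a flat array of uniform objects. To be clear: `toon_table` is a hypothetical helper written for this post, not the real `toon` package API, and it skips the format’s quoting and escaping rules:

```python
def toon_table(name: str, rows: list[dict]) -> str:
    """Emit a TOON-style table: header with field names once, then
    comma-separated values per row. Assumes every row has the same keys."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

users = [
    {"name": "Alice", "age": 25, "role": "admin"},
    {"name": "Bob", "age": 30, "role": "user"},
]
print(toon_table("users", users))
# users[2]{name,age,role}:
#   Alice,25,admin
#   Bob,30,user
```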

My Take on It

For me, TOON isn’t the star of this idea; the mindset is.

We don’t need to complicate things for ourselves as programmers. We just need to simplify data for the tokenizer.

Here’s how I look at it now:

| Type of data | Best format | Why |
| --- | --- | --- |
| Simple object | Plain text `key: value` | Saves tokens, easy to read |
| Large array of objects | CSV | Removes repeated keys |
| Deeply nested data | Flatten + JSON or TOON | Keeps structure with fewer tokens |
| Complex irregular data | JSON | Safe fallback |
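The “flatten” step in that table can be as simple as joining nested keys with dots. A minimal sketch (dotted paths are one common convention; lists and escaping are left out):

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys,
    e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value
    return flat

nested = {"user": {"name": "Alice", "address": {"city": "Pune"}}}
print(flatten(nested))
# {'user.name': 'Alice', 'user.address.city': 'Pune'}
```

Once the data is flat, it can go straight into the `key: value`, CSV, or TOON-style forms from earlier.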

The goal isn’t to switch formats for the sake of it. It’s to make your prompt cleaner and cheaper.

The Real Lesson

We usually think about optimizing our code… but in LLM work, we should also think about optimizing our prompts.

The fewer tokens the model has to read before it understands your data, the better it performs and the less it costs.

So, next time before pasting that big JSON blob into your prompt, ask yourself…
Can I just write this as plain text or CSV?

You might be surprised how much you save by simply removing quotes.

Closing Thoughts

Token efficiency isn’t about being fancy, it’s about being thoughtful. It’s not about replacing JSON or adopting a new syntax. It’s just about writing data that’s simpler for the model to understand. Because sometimes, the smartest optimization is just… removing what’s unnecessary.

Tags

LLMs, Prompt Engineering, JSON, Developer Productivity, Token Efficiency
