Not Diamond Review: Cut Your LLM Costs By Up To 10x?
Alright, let’s have a real chat. If you’re a developer, a startup founder, or just an enthusiast building cool stuff with AI, you’ve probably felt that little pang of dread when the monthly OpenAI or Anthropic bill lands. It starts small, but as your app gets more users, that bill grows. And grows. We’ve all been there, defaulting to the big, powerful models like GPT-4 for everything because, well, they’re the best, right? It’s like using a sledgehammer to hang a picture frame. It works, but it’s overkill and boy is it expensive.
For weeks I’ve been wrestling with this exact problem. How do you maintain quality without your cloud spend spiraling into the stratosphere? Then I stumbled upon a tool with a curious name: Not Diamond. My first thought was, what a weird name. My second thought was, an AI model router? Tell me more.
So I did what any self-respecting tech geek would do. I rolled up my sleeves, brewed a strong coffee, and spent some quality time with it. And I’ve gotta say, what I found is something I think a lot of us in this space have been waiting for.
So, What Exactly is Not Diamond?
Let’s break it down. At its heart, Not Diamond is an intelligent traffic cop for your AI model requests. Instead of sending every single query from your users to one, single, expensive model, it looks at the query and decides which model is the best fit for that specific job. Think of it like a master chef with a full rack of knives. You wouldn’t use a giant meat cleaver to finely mince garlic, would you? You’d use a small, precise paring knife. The chef instinctively knows which tool is right for the task.

That’s what Not Diamond does for your AI. It’s a multi-model framework that lets you connect all your models—GPT-4, Claude 3 Opus, Llama 3, a fine-tuned open-source model you’re hosting yourself, you name it—into one system. Then, based on logic you can help it build, it routes the simple questions to the fast, cheap models and saves the heavy-duty, expensive ones for the truly complex problems. The result? They claim up to a 25% boost in accuracy and a cost reduction of up to 10x. A bold claim, but one that actually starts to make sense when you see how it works.
The Features That Actually Matter
A lot of tools come with a laundry list of features that sound impressive but don’t add much value. I was pleasantly surprised to find Not Diamond’s feature set is tight, focused, and genuinely useful.
Intelligent Routing: The Brains of the Operation
This isn’t just a simple, rules-based switch. This is the core magic of the platform. Not Diamond can take your own evaluation data—a set of inputs and your desired outputs—and use it to train an optimal routing algorithm specifically for your application. This is huge. It means the router’s logic isn’t based on some generic benchmark, but on what works for your users and your data. It learns that for a query about summarizing an email, maybe Claude Haiku is perfectly fine and costs a fraction of the price. But for a query that requires complex, multi-step reasoning, it knows to call in the big guns like GPT-4 Turbo. This data-driven approach is how it achieves better overall accuracy than any single model could on its own.
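To make the idea concrete, here’s a toy sketch of what learning a routing policy from evaluation data could look like. To be clear, this is my own simplification and not Not Diamond’s actual algorithm: the model names, the scores, the costs, and the `train_router` function are all made up for illustration, and a real router would score individual queries rather than coarse task labels.

```python
from collections import defaultdict

# Hypothetical evaluation records: (task_type, model, was_correct).
# All names and numbers here are invented for illustration.
EVAL = (
    [("summarize", "small-model", True)] * 9 + [("summarize", "small-model", False)] * 1
    + [("summarize", "large-model", True)] * 9 + [("summarize", "large-model", False)] * 1
    + [("reasoning", "small-model", True)] * 4 + [("reasoning", "small-model", False)] * 6
    + [("reasoning", "large-model", True)] * 9 + [("reasoning", "large-model", False)] * 1
)

# Hypothetical price per million tokens for each model.
COST = {"small-model": 0.50, "large-model": 10.00}


def train_router(records, cost, tolerance=0.05):
    """For each task type, pick the cheapest model whose accuracy
    is within `tolerance` of the best model for that task."""
    counts = defaultdict(lambda: [0, 0])  # (task, model) -> [correct, total]
    for task, model, ok in records:
        counts[(task, model)][0] += int(ok)
        counts[(task, model)][1] += 1

    accuracy = defaultdict(dict)  # task -> {model: accuracy}
    for (task, model), (correct, total) in counts.items():
        accuracy[task][model] = correct / total

    table = {}
    for task, scores in accuracy.items():
        best = max(scores.values())
        good_enough = [m for m, a in scores.items() if a >= best - tolerance]
        table[task] = min(good_enough, key=lambda m: cost[m])
    return table


router = train_router(EVAL, COST)
print(router)  # {'summarize': 'small-model', 'reasoning': 'large-model'}
```

With this made-up data, summaries go to the cheap model because it matches the big one on accuracy, while reasoning queries still go to the large model.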
Automatic Prompt Adaptation: Less Tinkering, More Building
If you’ve ever tried to get the same result from three different models, you know the pain of prompt engineering. The way you ask a question to GPT-4 can be wildly different from how you need to phrase it for Llama. It’s a massive time sink. Not Diamond has this incredible feature called automatic prompt adaptation. It basically acts as a universal translator for your prompts. You write your prompt once, and it tweaks it under the hood to fit the specific syntax and style that each destination model prefers. I can’t overstate how much of a quality-of-life improvement this is. It’s one of those things you don’t realize you need until you have it, and then you can’t imagine going back.
Privacy by Design (No, Really)
Okay, so the moment you hear “third-party router,” the privacy alarm bells start ringing. “Is this company reading all my users’ prompts?” It’s a valid concern. I was skeptical too. But Not Diamond seems to have built their system with this in mind. They state that requests are handled client-side and they use a technique called fuzzy hashing. Without getting too technical, this means they can identify patterns and route requests without ever needing to see or store your raw, sensitive data. For any application dealing with user information, this isn’t just a feature; it’s a requirement. It’s a level of thoughtful engineering that builds a lot of trust.
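If you’re wondering what a fuzzy hash even looks like, here’s a minimal sketch using SimHash, one well-known locality-sensitive hashing scheme. I don’t know which technique Not Diamond actually uses, so treat this purely as an illustration of the general idea: a fingerprint is computed locally, and similar inputs tend to produce fingerprints that differ in only a few bits.

```python
import hashlib

BITS = 64  # fingerprint width


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint over lowercase word tokens.
    Illustrative only; not Not Diamond's actual scheme."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1

    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


a = simhash("Summarize this email thread for me")
b = simhash("Summarize this email thread for me please")
print(hamming(a, b))  # a small distance suggests the prompts are similar
```

The point of a scheme like this is that only the fingerprint would need to leave your machine, not the prompt text itself.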
The Big Question: Does It Actually Save You Money?
Let’s get down to brass tacks. The promise of a 10x cost reduction sounds like marketing fluff. But is it? In my experience, it’s not just possible; it’s probable for many applications. Most apps have a power-law distribution of query complexity. A small percentage of queries are really hard, but the vast majority are relatively simple. Let’s imagine a scenario.
Without Not Diamond, you send 100% of your 1 million monthly queries to a high-end model at, say, $10 per million tokens (a conservative price for a top-tier model). Your bill is hefty.
With Not Diamond, it learns that 90% of those queries can be handled perfectly by a small, fast model that costs just $0.50 per million tokens. Only the remaining 10% need the big, expensive model. The math on that is pretty compelling, isn’t it? Assuming queries are roughly the same size, your blended price drops from $10 to about $1.45 per million tokens, close to a 7x reduction. Push the cheap share to around 95% and that 10x figure starts to look very realistic.
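Here’s that scenario worked through in a few lines, assuming every query uses about the same number of tokens so that request share equals token share. The function names are mine, not part of any Not Diamond API. With the prices above, a 90/10 split comes out to roughly a 7x reduction, and a full 10x needs a cheap share of about 95%.

```python
def blended_price(cheap_share, cheap_price, premium_price):
    """Average price per million tokens when `cheap_share` of traffic
    goes to the cheap model and the rest to the premium model."""
    return cheap_share * cheap_price + (1 - cheap_share) * premium_price


def share_needed(target_reduction, cheap_price, premium_price):
    """Cheap-model share required to cut the blended price by
    `target_reduction` (e.g. 10 for a 10x reduction)."""
    target_price = premium_price / target_reduction
    return (premium_price - target_price) / (premium_price - cheap_price)


PREMIUM = 10.00  # $ per million tokens, from the scenario above
CHEAP = 0.50     # $ per million tokens, from the scenario above

routed = blended_price(0.90, CHEAP, PREMIUM)
print(f"blended price: ${routed:.2f} per million tokens")  # $1.45
print(f"reduction: {PREMIUM / routed:.1f}x")               # 6.9x
print(f"cheap share needed for 10x: {share_needed(10, CHEAP, PREMIUM):.1%}")  # 94.7%
```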
Let’s Talk Pricing: Is Not Diamond Worth the Investment?
So what’s this going to set you back? I checked out their pricing page, and it seems pretty well-structured for teams of all sizes. It’s a classic SaaS model that makes a lot of sense.
There are three main tiers:
- Discovery (Free): You get up to 100,000 API routing requests per month for free. Let me repeat that. 100k requests for free. This is incredibly generous and makes it a complete no-brainer for small projects, solo devs, or just for testing it out thoroughly before committing.
- Possibility ($100/month): This is the growth-stage tier. You still get your first 100k requests free, and after that, it’s $0.001 per additional request. This is for when your app starts taking off and you need the scale to back it up.
- Necessity (Custom): This is the enterprise plan. You get the white-glove treatment with custom pricing, support, and all the bells and whistles for large-scale deployments.
Honestly, the pricing feels more than fair. The value proposition is that the tool should easily pay for itself in LLM cost savings, and with a free tier like that, there’s very little risk in trying it out.
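If you want to estimate what the Possibility tier would cost at your volume, here’s a quick calculator. One assumption to flag: I’m reading the tier as a $100 base fee plus $0.001 for every request beyond the first 100,000, so check the pricing page before relying on these numbers. The function name is my own.

```python
FREE_REQUESTS = 100_000       # requests included each month
BASE_FEE = 100.00             # assumed flat monthly fee for the tier
OVERAGE_PER_REQUEST = 0.001   # price per request beyond the free allowance


def possibility_monthly_cost(requests):
    """Estimated monthly routing cost on the Possibility tier,
    assuming the base fee is charged on top of any overage."""
    billable = max(0, requests - FREE_REQUESTS)
    return BASE_FEE + billable * OVERAGE_PER_REQUEST


for n in (50_000, 100_000, 1_000_000):
    print(f"{n:,} requests -> ${possibility_monthly_cost(n):,.2f}")
# Under these assumptions, 1,000,000 requests come to $1,000.00.
```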
The Downsides: No Tool is Perfect
I always get suspicious when a review is nothing but glowing praise. Nothing is perfect, and Not Diamond is no exception. There are a couple of trade-offs to be aware of.
First, there’s a tiny bit of latency. The company claims it’s under 100ms. For 99% of applications—like content generation, email sorting, or most RAG systems—this is completely unnoticeable. A tenth of a second is nothing. However, if you’re building a hyper-responsive, real-time conversational AI where every millisecond counts, it’s something you’ll want to benchmark and test for yourself.
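If latency matters for your app, it’s easy to measure rather than guess. Here’s a small timing harness with a `fake_routing_step` stand-in that just sleeps for 10 ms; you’d swap in whatever routing call you actually make. Nothing here is Not Diamond’s API.

```python
import statistics
import time


def measure_latency_ms(fn, runs=20):
    """Call `fn` repeatedly and report p50, p95, and max latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p95_index = int(0.95 * (len(samples) - 1))
    return {
        "p50": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def fake_routing_step():
    """Hypothetical stand-in for a real routing call."""
    time.sleep(0.010)


stats = measure_latency_ms(fake_routing_step)
print({name: round(value, 1) for name, value in stats.items()})
```

Run it once against your direct model call and once with the routing step in front, then compare the p95 values, since tail latency is what users tend to notice.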
Second, and this is the more important one, the best custom routers require evaluation data. You can’t just plug it in and expect it to magically know the nuances of your app from day one. It needs data to learn. If you’re a brand new project with zero data and no evaluation pipeline, you’ll have to start with their pre-built routers and gather that data over time. This isn’t really a ‘con’ so much as a prerequisite for unlocking its full power. It’s a data-driven tool, so it needs data. Makes sense.
Who is Not Diamond Actually For?
After playing around with it, I have a clear picture of who gets the most out of this. Not Diamond is a perfect fit for:
- Startups and Dev Teams who are building LLM-powered features and are starting to watch their API costs with a nervous eye.
- Companies with diverse AI needs, where some tasks are simple classification and others are complex creative generation.
- Engineers who want to experiment with a mix of proprietary and open-source models without the headache of building and maintaining a complex web of integrations and logic themselves.
- Anyone who believes in the principle of using the right tool for the job.
Your Questions, Answered
How easy is it to integrate Not Diamond?
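From what I’ve seen in the docs, it seems designed for easy integration. It’s built to work with existing evaluation pipelines and doesn’t require a massive architectural overhaul. Since requests are handled client-side, it behaves less like a proxy sitting in front of your traffic and more like a recommendation step: as I understand it, you ask which model to use, then make the call yourself with your existing provider setup.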
Does Not Diamond see my private data?
No. They emphasize a ‘privacy by design’ approach using client-side processing and fuzzy hashing, which means your raw prompt data remains yours and isn’t read or stored by them.
Can I use open-source models with Not Diamond?
Absolutely. It’s a multi-model framework. The whole idea is to bring all your models, whether they’re from a major provider like OpenAI or an open-source model like Llama 3 that you’re hosting yourself, under one roof.
What if I don’t have evaluation data to start with?
You can start with their pre-built, generic routers. As your application runs, you can collect data on inputs and model performance to eventually train your own custom, hyper-optimized router.
Is the 10x cost saving claim realistic?
For applications with a high volume of simple queries, yes. It’s achieved by intelligently shifting the bulk of your workload from expensive, powerful models to cheaper, faster alternatives that are ‘good enough’ for those specific tasks.
How does Not Diamond improve accuracy?
By using a specialist for every task. A single model might be great at creative writing but mediocre at code generation. A router can send the writing task to the creative model and the coding task to a model that excels at code, leading to a higher quality output overall.
My Final Take
I came in skeptical and I’m walking away impressed. Not Diamond is one of those brilliantly simple ideas that solves a complex and expensive problem. The world of AI is moving beyond the ‘one model to rule them all’ mindset. The future is heterogeneous, a mix-and-match of the best tools for the job. A platform like this isn’t just a utility; it’s a strategic layer.
In an ecosystem where AI costs are a genuine barrier to scale, being smarter about your model usage isn’t just a nice-to-have—it’s a critical competitive advantage. Not Diamond provides the intelligence to make those smart choices automatically. For any team serious about building sustainable, scalable, and high-quality AI products, this is a tool you should be looking at. Seriously, go check out their free plan.