Everyone talks about building AI. Almost nobody talks about running AI. Here’s what the demo doesn’t show you.
The Visible Costs
These are the costs on the pricing page:
- API calls / inference costs. Per-token pricing for LLMs, per-request for hosted models.
- Compute for training/fine-tuning. GPU hours aren’t cheap.
- Vector database hosting. If you’re doing RAG, you’re paying for embeddings storage and retrieval.
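To make the per-token line item concrete, here is a back-of-the-envelope monthly estimate. The prices and traffic figures below are illustrative assumptions, not quotes from any provider:

```python
# Rough monthly API cost estimate. All figures are illustrative assumptions --
# substitute your provider's actual per-token prices and your own traffic.
INPUT_PRICE_PER_1K = 0.003   # USD per 1K input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.015  # USD per 1K output tokens (assumed)

requests_per_day = 20_000    # assumed traffic
avg_input_tokens = 1_200     # prompt + retrieved context
avg_output_tokens = 300      # model response

daily_cost = requests_per_day * (
    avg_input_tokens / 1_000 * INPUT_PRICE_PER_1K
    + avg_output_tokens / 1_000 * OUTPUT_PRICE_PER_1K
)
print(f"~${daily_cost:,.0f}/day, ~${daily_cost * 30:,.0f}/month")
# With these assumed numbers: ~$162/day, ~$4,860/month
```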
Most teams budget for these. What they miss is everything below.
The Hidden Costs
Evaluation Infrastructure
You can’t improve what you can’t measure. AI systems need:
- Test datasets — curated, maintained, and regularly updated
- Evaluation pipelines — automated scoring, human review workflows
- A/B testing infrastructure — for comparing model versions
- Regression testing — ensuring new deployments don’t break existing behavior
Building this properly takes 2-4 weeks of engineering time. Skipping it means flying blind.
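As a sketch of what the smallest useful version looks like: a versioned golden dataset plus an automated scoring pass that fails the build when quality regresses. The file path, scorer, and threshold here are hypothetical placeholders; real systems usually use rubric-based or model-graded scoring rather than exact match.

```python
# Minimal regression-eval sketch (assumed file layout and scoring rule).
import json

def exact_match_score(expected: str, actual: str) -> float:
    """Toy scorer: 1.0 on exact match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def run_eval(call_model, dataset_path="evals/golden_set.jsonl", threshold=0.90):
    """call_model: your function that takes a prompt string and returns text."""
    scores = []
    with open(dataset_path) as f:
        for line in f:
            case = json.loads(line)  # {"prompt": ..., "expected": ...}
            scores.append(exact_match_score(case["expected"], call_model(case["prompt"])))
    mean = sum(scores) / len(scores)
    assert mean >= threshold, f"Regression: eval score {mean:.2f} < {threshold}"
    return mean
```

Run this in CI before every deployment and the "regression testing" bullet above stops being aspirational.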
Monitoring and Observability
Traditional software either works or throws an error. AI systems can be confidently wrong. You need:
- Output quality monitoring — tracking drift in response quality over time
- Latency monitoring — LLM response times vary wildly
- Cost monitoring — a single prompt engineering change can 10x your API bill
- Usage pattern analysis — understanding how users actually interact with the system
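A minimal sketch of the cost and latency side: wrap every model call so each request records tokens, dollars, and wall-clock time. The price constant and the logging destination are assumptions; plug in whatever metrics backend you already run.

```python
# Thin instrumentation wrapper around an LLM call (names are illustrative).
import time
import logging

logger = logging.getLogger("llm_metrics")
PRICE_PER_1K_TOKENS = 0.01  # assumed blended rate, USD

def call_with_metrics(call_model, prompt: str) -> str:
    start = time.monotonic()
    response_text, tokens_used = call_model(prompt)  # assumes your client returns both
    latency_s = time.monotonic() - start
    cost = tokens_used / 1_000 * PRICE_PER_1K_TOKENS
    # Ship these to your existing dashboards; alert on drift, spikes, and totals.
    logger.info("llm_call latency=%.2fs tokens=%d cost_usd=%.4f",
                latency_s, tokens_used, cost)
    return response_text
```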
Prompt Engineering and Maintenance
Prompts are code. They need:
- Version control
- Testing
- Review processes
- Regular updates as models change
When OpenAI or Anthropic releases a new model version, your carefully tuned prompts might behave differently. Someone needs to test and adjust.
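One way to treat prompts as code, sketched with hypothetical names: keep each prompt as a versioned template in the repository and pin the model version alongside it, so a model upgrade becomes an explicit, reviewable diff that has to pass the evals before it ships.

```python
# Prompts live in version control next to the code that uses them
# (the fields and example values here are illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptSpec:
    name: str
    version: str
    model: str       # pin the model; upgrades go through review + evals
    template: str

SUMMARIZER_V3 = PromptSpec(
    name="ticket_summarizer",
    version="3.1.0",
    model="gpt-4o-2024-08-06",  # example pin; use whatever you actually run
    template="Summarize the support ticket below in three bullet points:\n\n{ticket}",
)

def render(spec: PromptSpec, **kwargs) -> str:
    return spec.template.format(**kwargs)
```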
Edge Case Handling
The demo handles the happy path. Production handles everything else:
- What happens when the model returns garbage?
- What happens when latency spikes to 30 seconds?
- What happens when the API is down?
- What happens when a user intentionally tries to break it?
Each of these needs a fallback strategy, and each fallback needs its own testing.
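Here is a sketch of what those fallbacks can look like in code. The timeout, retry count, validity check, and canned response are all assumed values; the point is that the happy path is one line and everything around it is the production work.

```python
# Defensive wrapper: timeout, retry, output sanity check, safe fallback.
# All thresholds and the fallback message are illustrative assumptions.
import concurrent.futures

FALLBACK_MESSAGE = "Sorry, I can't answer that right now. A human will follow up."
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)  # shared worker pool

def looks_valid(text: str) -> bool:
    """Cheap sanity check; real validators are task-specific."""
    return bool(text) and len(text) < 4_000 and not text.isspace()

def answer(call_model, prompt: str, timeout_s: float = 10.0, retries: int = 2) -> str:
    for _ in range(retries + 1):
        try:
            text = _pool.submit(call_model, prompt).result(timeout=timeout_s)
            if looks_valid(text):
                return text          # happy path
        except Exception:
            pass                     # latency spike, provider outage, network error
    return FALLBACK_MESSAGE          # degrade gracefully instead of erroring
```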
A Realistic Budget
For a production AI feature (not a toy, not a demo), expect:
| Category | % of year-one total cost |
|---|---|
| Initial development | 25-30% |
| Evaluation & testing infrastructure | 15-20% |
| Ongoing API/compute costs (annual) | 20-30% |
| Monitoring & maintenance | 15-20% |
| Prompt engineering & iteration | 10-15% |
The initial build is less than a third of the real cost. If your budget only covers development, you don’t have a budget for AI in production.
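To see what the table implies in dollars: if all you know is your development estimate, you can back out a rough year-one total by dividing by development's share. The figures below are illustrative.

```python
# Back-of-the-envelope year-one TCO from a development estimate
# (shares mirror the table above; the $60k figure is illustrative).
dev_estimate = 60_000    # what most budgets actually cover
dev_share = 0.275        # midpoint of the 25-30% row

total_year_one = dev_estimate / dev_share
print(f"Year-one total cost of ownership: ~${total_year_one:,.0f}")
# => roughly $218,000 -- the build itself is just over a quarter of it
```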
When AI Isn’t Worth It
Sometimes, after doing this math, the answer is: use a rule-based system. Or a spreadsheet. Or a human.
That’s not a failure. That’s good engineering. The goal is to solve the problem, not to use AI.
Our Approach
At Medhaksha Labs, we start every AI project with a cost model. Before writing a single line of code, we estimate:
- Total cost of ownership for year one
- Ongoing operational costs
- The break-even point vs. non-AI alternatives
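The break-even comparison is simple arithmetic once the cost model exists. A sketch with made-up numbers:

```python
# Break-even check against a non-AI alternative (all numbers illustrative).
ai_build_cost = 130_000           # one-time development + eval infrastructure
ai_annual_run_cost = 90_000       # ongoing API, monitoring, prompt upkeep
baseline_annual_cost = 140_000    # e.g. rules engine plus part-time analyst

annual_savings = baseline_annual_cost - ai_annual_run_cost
if annual_savings <= 0:
    print("AI never breaks even -- use the simpler system")
else:
    years_to_break_even = ai_build_cost / annual_savings
    print(f"Breaks even after ~{years_to_break_even:.1f} years")
# With these assumed numbers: breaks even after ~2.6 years
```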
If AI doesn’t make economic sense, we’ll tell you. We’d rather build you something that works within your budget than something impressive that you can’t afford to run.
The best AI system is the one that delivers more value than it costs. Everything else is a demo.