There is no single price. The build is usually a few weeks of senior engineering; the surprise is the running cost per request, which is where most of the long-term money goes. Scope the smallest feature that proves value, instrument cost from day one, and expand from there.
This is the first question almost every founder asks us, and the honest answer is that it splits into two very different numbers: what it costs to build the feature, and what it costs to run it once real users hit it. Teams plan for the first and get surprised by the second.
What actually drives the cost
The headline price of an AI feature is mostly senior engineering time, and that scales with a handful of things:
- Scope: a single well-defined feature (a copilot, a search box, a classifier) takes a few weeks. A platform of features takes a quarter or more.
- Data access: connecting the model to your private data safely, with permission-aware retrieval, is usually the real work, not the prompt.
- Quality bar: a demo is cheap. A feature that holds up across edge cases needs evals, and evals take time to build.
- Latency and cost targets: hitting a price-per-request and a response-time budget at scale is engineering, not configuration.
The running cost is the part people miss
Once a feature ships, every interaction calls a model, and those calls have a unit cost. At a few hundred users it is a rounding error. At scale it can become one of your largest line items if nobody is watching it. We have seen features that worked perfectly in a demo become uneconomic in production simply because no one measured cost per request before launch.
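To make the unit-cost arithmetic concrete, here is a minimal sketch of a cost-per-request calculation. Every number in it is an assumption for illustration: the per-token prices, the token counts, and the ten requests per user per day are hypothetical, and the names `ModelPrice`, `cost_per_request`, and `monthly_cost` are ours, not any vendor's API. Substitute your provider's published prices and your own measured token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price in US dollars per one million tokens (hypothetical figures)."""
    input_per_million: float
    output_per_million: float


def cost_per_request(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Unit cost of a single model call, in dollars."""
    total = input_tokens * price.input_per_million + output_tokens * price.output_per_million
    return total / 1_000_000


def monthly_cost(unit_cost: float, users: int, requests_per_user_per_day: float, days: int = 30) -> float:
    """Projected monthly spend for a given number of active users."""
    return unit_cost * users * requests_per_user_per_day * days


# Assumed values for illustration only, not real vendor prices.
price = ModelPrice(input_per_million=3.0, output_per_million=15.0)
unit = cost_per_request(price, input_tokens=4_000, output_tokens=500)

for users in (300, 30_000, 300_000):
    print(f"{users:>7} users: ${monthly_cost(unit, users, 10):,.2f} per month")
```

Under these assumed inputs a request costs just under two cents. That is about $1,755 a month at 300 users and about $175,500 a month at 30,000, which is the same feature with the same unit cost at a different scale.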
The good news is that this is controllable. Caching, model routing (using a cheaper model where it is good enough), prompt compression, and strict token budgets routinely cut running cost by large margins without hurting quality. But these techniques only pay off if you instrument cost from the start.
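Three of these controls (caching, model routing, and a token budget) can be sketched in a few lines. This is an illustration under stated assumptions, not a production client: `CostAwareClient`, `estimate_tokens`, and `MAX_PROMPT_TOKENS` are hypothetical names, the two models are stand-in functions rather than real API calls, the four-characters-per-token estimate is a rough heuristic, and the `is_hard` routing rule is a placeholder for whatever your evals say separates easy traffic from hard traffic.

```python
import hashlib
from typing import Callable, Dict

# A model is represented here as any function from prompt to reply (assumption).
ModelFn = Callable[[str], str]

MAX_PROMPT_TOKENS = 2_000  # assumed per-request budget


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about four characters per token; use a real tokenizer in practice."""
    return max(1, len(text) // 4)


class CostAwareClient:
    def __init__(self, cheap: ModelFn, premium: ModelFn, is_hard: Callable[[str], bool]):
        self.cheap = cheap
        self.premium = premium
        self.is_hard = is_hard
        self.cache: Dict[str, str] = {}
        self.calls = {"cache": 0, "cheap": 0, "premium": 0}

    def ask(self, prompt: str) -> str:
        # Token budget: reject oversized prompts before paying for them.
        if estimate_tokens(prompt) > MAX_PROMPT_TOKENS:
            raise ValueError("prompt exceeds token budget")

        # Caching: an identical prompt is answered without a model call.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]

        # Routing: only prompts flagged as hard go to the premium model.
        route = "premium" if self.is_hard(prompt) else "cheap"
        model = self.premium if route == "premium" else self.cheap
        reply = model(prompt)
        self.calls[route] += 1
        self.cache[key] = reply
        return reply


# Stand-in models and a placeholder routing rule, for illustration only.
client = CostAwareClient(
    cheap=lambda p: "cheap: " + p,
    premium=lambda p: "premium: " + p,
    is_hard=lambda p: len(p) > 200,
)
client.ask("What are your opening hours?")
client.ask("What are your opening hours?")  # served from the cache
print(client.calls)
```

The `calls` counter is the important part of the design: it tells you what share of traffic each path handles, which is the number you need before deciding whether a routing rule or a cache is worth keeping.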
Where teams overspend
- Building a broad platform before proving that one feature changes user behaviour.
- Reaching for fine-tuning when retrieval would have been cheaper and easier to maintain.
- Shipping without cost instrumentation, then discovering the bill after launch.
- Paying for premium models on every call when a cheaper model handles most of the traffic.
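The last two items are the easiest to catch early, because cost instrumentation can start as a small ledger that records every model call against the feature that made it. The sketch below is one possible shape, assuming your provider reports input and output token counts per call. The model names, the prices in `PRICES`, and the `CostLedger` class are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed prices in dollars per million tokens, as (input, output). Not real vendor prices.
PRICES: Dict[str, Tuple[float, float]] = {
    "cheap-model": (0.5, 1.5),
    "premium-model": (5.0, 15.0),
}


@dataclass
class CostLedger:
    """Records the cost of each model call, grouped by product feature."""
    costs: Dict[str, List[float]] = field(default_factory=lambda: defaultdict(list))

    def record(self, feature: str, model: str, input_tokens: int, output_tokens: int) -> float:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.costs[feature].append(cost)
        return cost

    def average(self, feature: str) -> float:
        """Mean cost per request for a feature, or 0.0 if nothing was recorded."""
        entries = self.costs[feature]
        return sum(entries) / len(entries) if entries else 0.0


ledger = CostLedger()
ledger.record("search", "cheap-model", input_tokens=1_000, output_tokens=200)
ledger.record("search", "premium-model", input_tokens=1_000, output_tokens=200)
print(f"search: ${ledger.average('search'):.4f} per request")
```

With these assumed prices the same request costs ten times more on the premium model than on the cheap one, a gap that stays invisible until each call is recorded against a feature.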
How we scope it
We start with a short paid sprint: a technical spec, a cost model with a measured price per request, and a working prototype against your real data. From there you know the build number and the running number before committing to the full feature. Most first versions ship in under six weeks; the cost model is what keeps them shippable at scale.