AI Engineering
AI-Native Product Architecture: Beyond the ChatGPT Wrapper
A framework for building products where AI is the core, not a feature bolted on. LLM routing, fallback chains, observability, and cost control at scale.
What "AI-Native" Actually Means
Most products today are AI-augmented: a traditional application with an LLM bolted on to summarize, classify, or generate. AI-native means AI is the core execution engine — it routes decisions, generates user-facing content, and drives product behaviour end-to-end. The architecture required is fundamentally different.
LLM Routing: The Load Balancer for Intelligence
Not all tasks require GPT-4 or Claude Opus. A routing layer directs each request to the cheapest model that can handle it acceptably. Simple classification and short-form generation go to a fast, cheap model (Haiku, GPT-4o-mini). Complex reasoning and long-form generation go to a frontier model. We reduced LLM spend by 55% on one product by implementing this pattern. The key metric is quality-at-cost, not quality alone.
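A routing layer can start as a simple heuristic before graduating to a trained classifier. The sketch below is a minimal illustration of the pattern; the model names, cost figures, and complexity markers are all assumptions, not recommendations.

```python
# Minimal sketch of an LLM routing layer: classify each request,
# then send it to the cheapest model that handles that class well.
# Model names and per-token costs below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Route:
    model: str
    cost_per_1k_tokens: float  # USD, illustrative


ROUTES = {
    "simple": Route("claude-haiku", 0.00025),   # classification, short-form
    "complex": Route("claude-opus", 0.015),     # reasoning, long-form
}


def classify_complexity(prompt: str) -> str:
    """Toy heuristic: long prompts or reasoning keywords go to the
    frontier model. A production router would use a trained classifier
    or a cheap LLM call to make this decision."""
    reasoning_markers = ("why", "analyze", "compare", "plan", "prove")
    if len(prompt.split()) > 200 or any(m in prompt.lower() for m in reasoning_markers):
        return "complex"
    return "simple"


def route(prompt: str) -> Route:
    return ROUTES[classify_complexity(prompt)]
```

The interesting work is in `classify_complexity`: mis-routing a hard query to a cheap model costs quality, while mis-routing an easy query to a frontier model only costs money, so thresholds should be tuned against the quality-at-cost metric, not accuracy alone.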
Fallback Chains
Design explicit fallback chains: primary model → secondary provider → cached response → graceful degradation. Cache aggressively — a semantic cache (embed the query, check cosine similarity against a response cache) can serve 20–40% of traffic without a model call. GPTCache is a production-ready open-source implementation of this pattern, and frameworks such as LangChain ship semantic-cache integrations.
Cost Control as a Product Feature
AI products have an unusual cost structure: the marginal cost per query is non-zero. Build cost control into the product layer — per-user token budgets, tiered rate limits, and cost-per-feature dashboards — so you can price the product correctly and catch runaway usage before it becomes a finance problem.
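A per-user token budget is the simplest of these controls. Below is a minimal in-memory sketch assuming a token-metered API; the tier names, daily limits, and rolling-window reset are illustrative assumptions (a production version would persist usage and handle concurrency).

```python
# Sketch of a per-user daily token budget with tiered limits.
import time


class TokenBudget:
    """Tracks tokens spent per user within a rolling daily window."""

    TIER_LIMITS = {"free": 50_000, "pro": 2_000_000}  # tokens/day, illustrative

    def __init__(self):
        # user -> (window_start_timestamp, tokens_spent_in_window)
        self.usage: dict[str, tuple[float, int]] = {}

    def check_and_record(self, user: str, tier: str, tokens: int) -> bool:
        now = time.time()
        start, spent = self.usage.get(user, (now, 0))
        if now - start > 86_400:           # day elapsed: reset the window
            start, spent = now, 0
        if spent + tokens > self.TIER_LIMITS[tier]:
            return False                   # over budget: reject, queue, or downgrade
        self.usage[user] = (start, spent + tokens)
        return True
```

The boolean return is the policy hook: a rejection can mean a hard error on the free tier but a silent downgrade to a cheaper model on paid tiers, which ties this control back to the routing layer.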
Avoiding the ChatGPT Wrapper Trap
The ChatGPT wrapper is a product that is essentially a UI for a model — no proprietary data, no workflow integration, no switching cost. AI-native products are defensible through a data flywheel (user interactions improve the model), workflow integration (AI embedded in the user's actual work), and proprietary retrieval (a corpus competitors cannot replicate). Build toward those properties from the first sprint.
Deepak Kushwaha