Blog
Long-form thinking on AI engineering, system design, and building products people actually use.
LLM Observability: Building Eval Pipelines That Actually Catch Problems
Logging prompts and responses is not observability. Here is how to build eval pipelines that surface hallucinations, semantic drift, and cost spikes before your users do.
Django vs FastAPI for AI Backends: A Decision Framework
After shipping AI products with both, here is the honest breakdown — when Django's batteries-included approach wins, when FastAPI's async-first design is the right call, and how to hybridize.
Vector Database Showdown: Pinecone vs Weaviate vs Chroma in 2025
Benchmarked all three in production RAG workloads. The winner depends entirely on your query patterns, budget, and ops maturity — not the benchmark charts.
Multi-Agent Systems in Production: LangGraph Patterns That Actually Work
State machines for LLMs are powerful but surprisingly tricky to operationalize. Graph patterns, error-recovery designs, and human-in-the-loop integrations that held up under real load.
Building RAG Pipelines at Scale: Lessons from Production
What nobody tells you about retrieval-augmented generation when you move from prototype to production: chunking strategies, re-ranking, eval loops, and the surprising cost of naive embeddings.
AI-Native Product Architecture: Beyond the ChatGPT Wrapper
A framework for building products where AI is the core, not a feature bolted on. LLM routing, fallback chains, observability, and cost control at scale.
Engineering Leadership in Remote-First Indian Startups
Nine years of hard-won lessons on building high-performing distributed teams — hiring for ownership, async-first culture, and why velocity is a lagging indicator.
LangChain vs LlamaIndex in 2025: A Pragmatic Comparison
After building production systems with both, here is where each framework genuinely shines, and where it will slow you down.
Kubernetes for ML Workloads: A Practical Playbook
GPU node pools, spot instance strategies, model serving with vLLM, and the autoscaling configuration that cut our inference costs by 65%.
The System Design Interview: What Interviewers Actually Want
After conducting 200+ mock interviews, I have noticed the same patterns. This is what separates strong candidates from exceptional ones.