Caching

Prompt, prefix, semantic, and KV caching — the techniques that cut repeated LLM work and the cost that comes with it.

llm, cost optimization, anthropic

Anthropic Prompt Caching: How It Works + When to Use It

How Anthropic prompt caching works, what it costs to write and read the cache, and the conditions under which it cuts input token spend by up to 90%.

Mohammed Kafeel
9 min read
llm, caching, semantic caching

Category-Aware Semantic Caching for LLM Workloads

How to partition your semantic cache by query category so similar-but-different intents don't collide, and why heterogeneous workloads break naive semantic caching.

Mohammed Kafeel
14 min read
llm, vllm, inference

vLLM KV Cache Reuse: A Guide to Cutting Inference Costs

How to configure and verify KV cache reuse in vLLM to cut repeated-prefix inference costs, with concrete steps and the metrics to watch.

Mohammed Kafeel
14 min read
llm, caching, architecture

Multi-Tier LLM Cache: Semantic, Prefix & Inference Layers

How to stack semantic, prefix, and inference-layer caches into a single pipeline that maximises hit rate while controlling cost and staleness.

Mohammed Kafeel
15 min read
llm, caching, cost optimization

LLM Cache Pre-Warming for Off-Peak Customer Service Bots

A case study on warming LLM caches with predictable queries overnight so support bots hit cache on the first message of the day instead of paying full inference cost.

Mohammed Kafeel
13 min read
llm, prompt caching, openai

OpenAI vs Anthropic Prompt Caching: Key Differences

A side-by-side comparison of how OpenAI and Anthropic implement prompt caching — automatic vs manual, TTLs, pricing, and which fits which workload.

Mohammed Kafeel
12 min read
llm, caching, prefix caching

Prefix Caching vs Semantic Caching: Which Fits Your App?

The practical difference between prefix caching (exact-match on token sequences) and semantic caching (embedding similarity), and how to pick the right one for your use case.

Mohammed Kafeel
12 min read
llm, prompt caching, cost optimization

Prompt Caching Break-Even: How Many Reads to Save Money?

The exact formula for calculating your prompt caching break-even point — factoring in write premium, read discount, TTL, and request volume — so you know whether caching is worth it before you turn it on.

Mohammed Kafeel
12 min read