Features
- Automatic Caching: Caching is passive; the service automatically identifies repeated context content without any change to how you call the API. (The mode that requires explicitly setting parameters in the Anthropic API is called “Explicit Prompt Caching”; see Explicit Prompt Caching (Anthropic API).)
- Cost Reduction: Input tokens that hit the cache are billed at a discounted rate, which can substantially reduce costs
- Speed Improvement: Repeated content does not need to be reprocessed, so responses come back faster
Caching pays off whenever a request shares a long prefix with earlier requests. Typical scenarios:
- System prompt reuse: in multi-turn conversations, the system prompt usually stays unchanged from turn to turn
- Fixed tool lists: the tools used for one category of task are usually identical across requests
- Multi-turn conversation history: in long conversations, earlier messages are resent with every new turn, so most of the input is repeated
Code Examples
- Anthropic SDK Example
- OpenAI SDK Example
Both SDK examples follow the same flow:
1. Install the SDK
2. Set up the environment variables
3. First request - establish the cache
4. Second request - reuse the cache
The response includes context cache token usage information.
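A minimal sketch of this flow with the Anthropic SDK, assuming an Anthropic-compatible MiniMax endpoint; the environment-variable names, base URL, model id, and usage field names below are placeholders, not confirmed values, so substitute them with the ones from your MiniMax console:

```python
import os

import anthropic

# Placeholder env var names and endpoint -- replace with your own values.
client = anthropic.Anthropic(
    api_key=os.environ["MINIMAX_API_KEY"],
    base_url=os.environ.get("MINIMAX_ANTHROPIC_BASE_URL"),
)

# A long, static system prompt (>= 512 tokens) forms the cacheable prefix.
SYSTEM_PROMPT = "You are a meticulous code-review assistant. " * 100

# First request: the prefix is processed normally and becomes eligible for caching.
first = client.messages.create(
    model="MiniMax-M2.1",  # placeholder model id
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Review: def add(a, b): return a - b"}],
)
print(first.usage)

# Second request: identical prefix, only the trailing user message changes,
# so the repeated part can be served from the cache automatically.
second = client.messages.create(
    model="MiniMax-M2.1",
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Review: def mul(a, b): return a + b"}],
)
print(second.usage)  # cache-related token counts show up here on a hit
```

The same flow with the OpenAI SDK, under the same assumptions:

```python
import os

from openai import OpenAI

# Placeholder env var names and endpoint -- replace with your own values.
client = OpenAI(
    api_key=os.environ["MINIMAX_API_KEY"],
    base_url=os.environ.get("MINIMAX_OPENAI_BASE_URL"),
)

# Static prefix shared by both requests.
messages_prefix = [
    {"role": "system", "content": "You are a meticulous code-review assistant. " * 100},
]

# First request - establish the cache.
first = client.chat.completions.create(
    model="MiniMax-M2.1",  # placeholder model id
    messages=messages_prefix + [{"role": "user", "content": "Review: def add(a, b): return a - b"}],
)
print(first.usage)

# Second request - reuse the cache (same prefix, new trailing user message).
second = client.chat.completions.create(
    model="MiniMax-M2.1",
    messages=messages_prefix + [{"role": "user", "content": "Review: def mul(a, b): return a + b"}],
)
print(second.usage)  # cached-token counts appear in the usage object on a hit
```

Because caching is automatic, neither request sets anything cache-specific; the second call simply repeats the first call's prefix.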
Important Notes
- Caching applies to API calls with 512 or more input tokens
- Caching uses prefix matching; the prefix is built in the order “tool list → system prompts → user messages”. Changing the content of any earlier segment can break the match from that point onward (see the sketch below)
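A minimal sketch of keeping that prefix stable across requests, using OpenAI-style request fields; the tool definition, system prompt, helper name, and model id here are illustrative assumptions, not values from this documentation:

```python
# The tool list and system prompt never change between requests, so the matched
# prefix is "tools -> system prompt -> earlier messages" and only the newest
# user message varies.
STABLE_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",  # illustrative tool
            "description": "Look up an order by its id.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]
STABLE_SYSTEM = {"role": "system", "content": "You are a customer-support agent. ..."}

def build_request(history: list[dict], new_user_message: str) -> dict:
    """Keyword arguments for chat.completions.create, with static parts first."""
    return {
        "model": "MiniMax-M2.1",  # placeholder model id
        "tools": STABLE_TOOLS,
        "messages": [STABLE_SYSTEM, *history, {"role": "user", "content": new_user_message}],
    }

# client.chat.completions.create(**build_request(history, "Where is order 42?"))
```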
Best Practices
- Place static or repeated content (tool list, system prompts, repeated conversation history) at the beginning of the request and put dynamic, user-specific information at the end, so that as much of the prefix as possible can hit the cache
- Monitor cache performance through the usage tokens returned by the API, and analyze them regularly to optimize your usage strategy (a minimal monitoring sketch follows)
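A minimal monitoring sketch, assuming OpenAI-style usage fields (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); check your own responses for the exact cache-related counter names the API returns:

```python
def log_cache_ratio(usage) -> None:
    """Print input tokens, cached tokens, and the cache-hit ratio for one response."""
    total_input = getattr(usage, "prompt_tokens", 0) or 0
    details = getattr(usage, "prompt_tokens_details", None)
    cached = (getattr(details, "cached_tokens", 0) or 0) if details is not None else 0
    ratio = cached / total_input if total_input else 0.0
    print(f"input={total_input} cached={cached} hit_ratio={ratio:.1%}")

# Call it after each request, e.g. log_cache_ratio(second.usage), and aggregate the
# ratios over time to see whether prompt-layout changes help or hurt cache hits.
```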
Pricing
Prompt caching uses differentiated pricing:
- Cache hit tokens: Billed at discounted price
- New input tokens: Billed at standard input price
- Output tokens: Billed at standard output price
See the Pricing page for details.
Pricing example:
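A small worked example, using placeholder per-million-token rates rather than the real ones (take the actual rates from the Pricing page):

```python
# Placeholder rates purely for illustration -- substitute the real prices.
PRICE_NEW_INPUT = 1.00   # $ per 1M new input tokens
PRICE_CACHE_HIT = 0.10   # $ per 1M cache-hit input tokens
PRICE_OUTPUT = 4.00      # $ per 1M output tokens

def request_cost(new_input: int, cached_input: int, output: int) -> float:
    """Cost of one request given its token breakdown."""
    return (
        new_input * PRICE_NEW_INPUT
        + cached_input * PRICE_CACHE_HIT
        + output * PRICE_OUTPUT
    ) / 1_000_000

# 2,000 new input tokens + 8,000 cache-hit tokens + 500 output tokens:
print(f"${request_cost(2_000, 8_000, 500):.4f}")  # 0.0048 with the placeholder rates
```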
Further Reading
Cache Comparison
| | Prompt Caching (Passive) | Explicit Prompt Caching (Anthropic API) |
|---|---|---|
| Usage | Automatically identifies and caches repeated content | Explicitly set cache_control in the API |
| Billing | Cache-hit tokens billed at a discounted price; no additional charge for cache writes | Cache-hit tokens billed at a discounted price; first-time cache writes incur an additional charge |
| Expiration | Expiration time adjusted automatically based on system load | 5-minute expiration, automatically renewed with continued use |
| Supported Models | MiniMax-M2.1 series | MiniMax-M2.1 series, MiniMax-M2 series |
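For contrast with the automatic mode above, a minimal sketch of the explicit mode using the Anthropic SDK's cache_control block; the endpoint, environment-variable names, and model id are placeholders, and the exact request shape should be checked against the Explicit Prompt Caching (Anthropic API) page:

```python
import os

import anthropic

# Placeholder env var names and endpoint -- replace with your own values.
client = anthropic.Anthropic(
    api_key=os.environ["MINIMAX_API_KEY"],
    base_url=os.environ.get("MINIMAX_ANTHROPIC_BASE_URL"),
)

# Explicit mode: mark the cache breakpoint with cache_control instead of relying
# on automatic prefix detection. The first request pays the cache-write surcharge;
# later requests that repeat the same prefix read it back at the discounted rate.
response = client.messages.create(
    model="MiniMax-M2.1",  # placeholder model id
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": "Long, stable system prompt goes here ...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "First question about the cached context."}],
)
print(response.usage)
```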