Caching
Caching can reduce latency and cost, but it must be implemented carefully to avoid leaking sensitive content or serving stale results. This guide outlines safe, practical caching patterns for LLM security workflows.
What to Cache (and What Not To)
Good candidates:
- deterministic scan results for identical inputs
- allowlist and blocklist lookups
- policy configurations that change infrequently
Avoid caching:
- raw prompts that contain personal data or PHI
- full model responses for public or shared systems
- anything tied to a specific user session unless scoped carefully
Cache Key Strategy
Use a stable, privacy-safe key:
- hash input content instead of storing raw content
- include policy version and sensitivity in the key
- include account or environment identifiers
Example key format:
scan:{account}:{policyVersion}:{hash(content)}
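The key strategy above can be sketched in Python. This is a minimal illustration (the `build_scan_key` helper name is hypothetical); the content is run through SHA-256 so the raw input never appears in the key, and the policy version is part of the key so a policy bump naturally misses old entries:

```python
import hashlib

def build_scan_key(account: str, policy_version: str, content: str) -> str:
    """Build a privacy-safe cache key: content is hashed, never stored."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"scan:{account}:{policy_version}:{digest}"
```

Identical inputs produce identical keys (so deterministic scan results can be reused), while any change to content or policy version yields a different key.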
TTL and Invalidation
- use short TTLs for user-generated content
- invalidate caches when policies change
- avoid long-lived caches for security decisions
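These rules can be captured in a small in-memory sketch (all names hypothetical; the clock is injectable so expiry is testable without real waiting). Entries expire after a short TTL, and a policy change clears everything, since security decisions must not outlive the policy that produced them:

```python
import time

class DecisionCache:
    """In-memory cache with per-entry TTLs and whole-cache invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def set(self, key: str, value, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def invalidate_all(self) -> None:
        """Call whenever the policy changes."""
        self._store.clear()
```

A distributed cache replaces this class, but the same contract applies: short TTLs for user-generated content, and a flush on every policy change.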
Redis for Distributed Caching
If you use Redis, keep it private and secured. Configure it as part of your infrastructure:
redis:
  enabled: true
  url: "redis://localhost:6379/0"
Use network restrictions and TLS where supported by your Redis deployment.
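With the redis-py client, the patterns above map directly onto Redis primitives: `SETEX` sets the value and TTL atomically, and the cached value holds only the decision, never the scanned input. This is a sketch under those assumptions (the `store_decision`/`load_decision` helper names are illustrative, and the import is kept lazy so the helpers work against any client with the same `get`/`setex` interface):

```python
import json

def connect(url: str = "redis://localhost:6379/0"):
    """Create a redis-py client (lazy import; requires `pip install redis`)."""
    import redis  # third-party client
    return redis.Redis.from_url(url, decode_responses=True)

def store_decision(client, key: str, decision: dict, ttl_seconds: int = 300) -> None:
    # SETEX writes the value and its expiry in one atomic command.
    client.setex(key, ttl_seconds, json.dumps(decision))

def load_decision(client, key: str):
    raw = client.get(key)
    return json.loads(raw) if raw is not None else None
```

Keeping values as small JSON decisions (rather than full prompts or responses) limits what an attacker could learn from a compromised cache.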
Safety Guidelines
- cache only the minimum data needed for the decision
- never cache secrets or credentials
- do not store PHI or PII in cache values
- monitor cache hit rates and eviction behavior
Testing and Monitoring
- validate that cached decisions match live results
- track cache hit ratio and latency impact
- log cache errors without exposing sensitive input
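Hit-ratio tracking can be as simple as a counting wrapper around any cache that exposes `get` and `set` (a sketch; the `InstrumentedCache` name is hypothetical). The wrapper only ever sees keys, which are already privacy-safe hashes under the key strategy above:

```python
class InstrumentedCache:
    """Wrap a cache exposing get/set and count hits and misses."""

    def __init__(self, inner):
        self._inner = inner
        self.hits = 0
        self.misses = 0

    def get(self, key):
        value = self._inner.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key, value):
        self._inner.set(key, value)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exporting `hits`, `misses`, and `hit_ratio` to your metrics pipeline covers the monitoring items above without logging any cached content.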