We’re excited to announce the Public Preview for the Azure OpenAI Semantic Caching policy in Azure API Management! This innovative feature empowers customers to optimize token usage by leveraging semantic caching, which intelligently stores completions for prompts with similar meanings.
With this policy, customers can easily configure semantic caching for their Azure OpenAI endpoints. This caching mechanism utilizes Azure Redis Enterprise or any other external cache that has been onboarded to APIM, providing flexibility in caching solutions.
By leveraging the Azure OpenAI Embeddings model to calculate vectors for prompts, the semantic caching policy intelligently identifies semantically similar prompts and stores respective completions in the cache. This allows for efficient completions reuse, reducing token consumption and improving overall performance.
Customers can configure semantic caching in a centralized manner for multiple API consumers, streamlining management and ensuring consistent caching behavior across their API ecosystem. This capability enables customers to maximize the benefits of caching and optimize token usage, enhancing the scalability and efficiency of their Azure OpenAI integration.
Click here to learn more.
 



