We’re excited to announce the General Availability of the Azure OpenAI Token Limit policy in Azure API Management! This policy empowers customers to enforce limits on API consumers based on their Azure OpenAI token usage.
Limits are expressed in tokens per minute (TPM), giving customers precise control over token consumption and ensuring fair and efficient utilization of OpenAI resources.
Customers have the flexibility to assign token-based limits on any counter key, such as subscription key or IP address, tailoring enforcement to their specific use cases.
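For example, a minimal policy sketch (the limit value and counter key below are illustrative; consult the policy reference for the full schema) that caps each API Management subscription at 500 TPM:

```xml
<policies>
    <inbound>
        <base />
        <!-- Cap each APIM subscription at 500 tokens per minute.
             counter-key accepts any policy expression; for per-caller
             limits by IP, use @(context.Request.IpAddress) instead. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="true" />
    </inbound>
</policies>
```

With estimate-prompt-tokens enabled, prompt tokens are estimated on the gateway before the request is forwarded, so calls that would exceed the limit can be rejected without reaching the backend.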
Because the policy relies on token usage metrics returned from the OpenAI endpoint, customers can accurately monitor and enforce limits in real time. The policy can also pre-calculate prompt tokens on the API Management side, avoiding unnecessary requests to the OpenAI backend when the limit has already been exceeded.
Furthermore, customers can surface token counts through response headers and policy variables, such as tokens-consumed and remaining-tokens, for enhanced control and customization within their policies.
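As a sketch of that customization, the policy exposes optional attributes for naming the headers that report token counts back to callers; the header names below are example values a customer might choose, not fixed defaults:

```xml
<!-- Return token usage to the caller in custom response headers.
     "consumed-tokens" and "remaining-tokens" are example header
     names chosen here for illustration. -->
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="500"
    estimate-prompt-tokens="false"
    tokens-consumed-header-name="consumed-tokens"
    remaining-tokens-header-name="remaining-tokens" />
```

API consumers can then read these headers to throttle themselves client-side before hitting the limit.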
With the introduction of the Azure OpenAI Token Limit policy, customers can now centrally manage limits across multiple API consumers and OpenAI endpoints, in both streaming and non-streaming scenarios, streamlining management and improving resource utilization.
Click here to learn more.