The AI toolchain operator add-on for AKS, based on the open-source Kubernetes AI Toolchain Operator (KAITO), is now available in preview. It lets you run specialized machine learning workloads, such as large language models (LLMs), on AKS more cost-effectively and with less manual configuration.
The add-on streamlines LLM deployment to a few steps: it automatically selects infrastructure that is optimally sized for the model from the CPU and GPU resources available in your cluster.
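Enabling the add-on is a cluster-level operation. A minimal sketch using the Azure CLI is shown below; the feature name, flag, and resource names are based on the public preview documentation and should be verified against the current `az aks` reference before use:

```shell
# Install or refresh the aks-preview CLI extension (the add-on is in preview)
az extension add --name aks-preview --upgrade

# Register the preview feature flag, then refresh the provider registration
az feature register --namespace Microsoft.ContainerService \
  --name AIToolchainOperatorPreview
az provider register --namespace Microsoft.ContainerService

# Enable the add-on on an existing cluster (cluster/group names are placeholders)
az aks update --name myAKSCluster --resource-group myResourceGroup \
  --enable-ai-toolchain-operator
```

Once enabled, the operator runs in the cluster and watches for KAITO workspace resources.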
The add-on also makes it easy to split inferencing across multiple lower GPU-count VMs, which increases the number of Azure regions where your workload can run, eliminates wait times for higher GPU-count VMs, and lowers overall cost.
You can also choose from preset models with images hosted by AKS, significantly reducing overall inference service setup time on your cluster.
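With the add-on enabled, deploying a preset model comes down to applying a KAITO `Workspace` custom resource. The following is a minimal sketch modeled on the open-source KAITO examples; the preset name, VM size, and labels here are illustrative assumptions, not a definitive manifest:

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU VM size the operator provisions (assumed)
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # preset model with an image hosted for AKS
```

Applied with `kubectl apply -f workspace.yaml`, a workspace like this asks the operator to provision the GPU nodes and stand up the inference service for the chosen preset, rather than you building and tuning the serving stack yourself.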