IBM has unveiled a new processor, codenamed Telum, which it says will accelerate artificial intelligence (AI) processing on its Z-series mainframes. Developed over the last three years by the IBM Research AI Hardware Center, the chip contains eight processor cores with a deep superscalar, out-of-order instruction pipeline, running at a clock frequency of more than 5GHz. IBM said Telum is optimised for the demands of heterogeneous enterprise-class workloads.
Telum uses a redesigned cache and chip-interconnection infrastructure, which now provides 32MB of cache per core and can scale up to 32 Telum chips. The dual-chip module design contains 22 billion transistors and 19 miles of wire on 17 metal layers.
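Taking the figures above at face value (eight cores per chip, 32MB of cache per core, up to 32 chips), the aggregate cache capacity works out as follows. This is plain arithmetic under the assumption that every core on every chip contributes its full 32MB. It says nothing about how IBM organises or shares that cache between cores and chips.

```latex
8~\text{cores} \times 32\,\text{MB} = 256\,\text{MB per chip}, \qquad
32~\text{chips} \times 256\,\text{MB} = 8{,}192\,\text{MB} = 8\,\text{GB}
```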
Because of latency constraints, complex fraud detection often cannot be completed in real time, which means a bad actor could already have purchased goods with a stolen credit card before the retailer is aware that fraud has taken place. Telum is IBM’s first processor with on-chip acceleration for AI inferencing, allowing a model to score a transaction while it is taking place.
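The difference between checking for fraud after the fact and scoring inside the transaction can be sketched as two approval paths. This is a minimal illustration, not IBM's design: `Transaction`, `approve_in_transaction`, `approve_then_review`, `toy_score` and the 0.9 threshold are all hypothetical names and values, and `toy_score` stands in for a real fraud model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transaction:
    card_id: str
    amount: float
    merchant: str


# Hypothetical fraud model: any callable mapping a transaction to a risk score in [0, 1].
ScoreFn = Callable[[Transaction], float]


def approve_in_transaction(txn: Transaction, score: ScoreFn, threshold: float = 0.9) -> bool:
    """Score the transaction before approving it; decline if the risk is too high."""
    return score(txn) < threshold


def approve_then_review(txn: Transaction, review_queue: List[Transaction]) -> bool:
    """Approve immediately and queue the transaction for fraud checks later."""
    review_queue.append(txn)
    return True  # the goods may already be gone by the time the review runs


def toy_score(txn: Transaction) -> float:
    # Stand-in for a real model: treat larger amounts as riskier.
    return min(txn.amount / 10_000, 1.0)


queue: List[Transaction] = []
suspicious = Transaction("card-123", 9_500.0, "electronics-store")
print(approve_then_review(suspicious, queue))          # approved, checked later
print(approve_in_transaction(suspicious, toy_score))   # declined on the spot
```

The in-transaction path only works if the model answers within the transaction's latency budget, which is the gap on-chip acceleration is meant to close.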
Christian Jacobi, IBM’s chief architect for Z processors, said IBM wanted to provide its banking, finance and insurance customers with the ability to run real-time AI at a transaction volume of 10,000 to 50,000 transactions per second. “It is built for in-transaction inference and designed using an AI core from the IBM AI research centre,” he said. “We worked with the Z team to make it accessible to deal with high transaction [volumes].”
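Jacobi's target range implies a tight time budget per transaction. The sketch below works through the arithmetic using Little's law (average number in flight = arrival rate × time each item spends in the system). The 1ms inference latency is a hypothetical figure chosen for illustration, not an IBM number, and the function names are made up for this example.

```python
def interarrival_us(tps: float) -> float:
    """Average gap between consecutive transactions, in microseconds."""
    return 1_000_000 / tps


def inferences_in_flight(tps: float, latency_ms: float) -> float:
    """Little's law, L = lambda * W, with W converted from milliseconds to seconds."""
    return tps * latency_ms / 1000


HYPOTHETICAL_LATENCY_MS = 1.0  # illustrative only, not a measured Telum figure

for tps in (10_000, 50_000):
    gap = interarrival_us(tps)
    busy = inferences_in_flight(tps, HYPOTHETICAL_LATENCY_MS)
    print(f"{tps} tps: a transaction every {gap:.0f} us, "
          f"about {busy:.0f} inferences in flight")
```

At 50,000 transactions per second a new transaction arrives on average every 20 microseconds, so a model that took 1ms per inference would need roughly 50 inferences in progress at once. Halving the latency halves that figure, which is why shaving off data movement matters.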
The acceleration is invoked through a new instruction that runs directly on the Z processor core, said Jacobi. “There is no operating system intervention.”
Unlike GPU-based AI acceleration, he said, “there is no need to send data across a PCI bus, which adds to latency”.
According to Jacobi, the on-chip AI accelerator is optimised to provide direct access to the memory where data is stored. When it is not being used for AI processing, Telum can switch to running normal processing functions, he said.
IBM said that at the socket level, the new chips will offer a 40% increase in performance compared with the z15 system. Jacobi added that IBM plans further optimisation of the software stack.
“There are layers of code involved in delivering the entire solution,” he said. “It starts with the silicon and the firmware that runs on the processor cores and the AI accelerator. This firmware implements various operations, like ‘Matrix Multiplication’. On top of that runs the operating system and AI framework software, exploiting the new Neural Network Processing Assist instruction that is the software-level view onto the on-chip accelerator.
“With this approach, clients can build AI models anywhere – on IBM Z, IBM Power or other systems of their choice – then export those models into the Open Neural Network Exchange [ONNX] format. Then the IBM Deep Learning compiler will compile and optimise the ONNX models for deployment on IBM Z. The compiled models will then run on Telum, directly exploiting Telum’s AI accelerator through that hardware/firmware/software stack.”