Microsoft has announced the launch of Maia 200, an AI accelerator chip designed to speed up inference workloads and integrate with Microsoft Azure.
AI inference is the process of running a trained AI model to make predictions on new, unseen data.
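As a minimal illustration of the concept (not specific to Maia 200), inference in a framework such as PyTorch amounts to a single forward pass through a trained model with gradient tracking switched off; the model and input below are placeholder stand-ins, not real production artefacts:

```python
import torch

# Stand-in for a trained model; a real deployment would load trained weights.
model = torch.nn.Linear(4, 2)
model.eval()                 # switch to inference mode (disables dropout etc.)

x = torch.randn(1, 4)        # new, unseen input
with torch.no_grad():        # no gradients needed just to make a prediction
    prediction = model(x)
print(prediction)
```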
Microsoft said the new chips will enable its global cloud infrastructure to meet the requirements of the “next generation” of AI workloads.
Microsoft said that due to its compute power, a single Maia 200 chip will be able to “effortlessly” run today’s largest AI models while still leaving ample headroom for even larger models in the future.
The company claims that Maia 200 delivers more than 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), built on a cutting-edge 3-nanometre process.
The chip can also move data at a rate of 7TB per second and provides 272MB of on-chip memory.
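For a sense of scale, the quoted figures hang together on a back-of-envelope basis. The arithmetic below is illustrative only, and the trillion-parameter model size is a hypothetical chosen for round numbers, not a Microsoft figure:

```python
# Back-of-envelope check of the quoted specifications (illustrative only).
fp4_pflops = 10      # >10 petaFLOPS claimed at 4-bit precision
fp8_pflops = 5       # >5 petaFLOPS claimed at 8-bit precision

# Halving the precision roughly doubles throughput, so the two
# headline numbers are mutually consistent:
print(fp4_pflops / fp8_pflops)    # -> 2.0

# At 4 bits (0.5 bytes) per weight, a hypothetical one-trillion-parameter
# model occupies about 500GB, which 7TB/s of bandwidth could stream
# in well under a tenth of a second per pass:
weights_bytes = 1e12 * 0.5        # ~500GB of FP4 weights
bandwidth = 7e12                  # 7TB per second
print(weights_bytes / bandwidth)  # -> ~0.07 seconds
```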
Microsoft claims Maia 200 is the most efficient system it has built, offering around a 30 per cent improvement in cost performance over its current systems.
Maia 200 will run AI models from the Microsoft Superintelligence team and underpin products such as Azure AI Foundry, Microsoft’s integrated and interoperable AI platform for developing AI applications, as well as Microsoft 365 Copilot.
Scott Guthrie, Microsoft’s executive vice-president for cloud and AI, described Maia 200 as an AI inference “powerhouse”, claiming three times the FP4 performance of Amazon’s third-generation Trainium and higher FP8 performance than Google’s seventh-generation TPU.
“We also designed Maia 200 for fast, seamless availability in the data centre from the beginning, building out early validation of some of the most complex system elements, including the backend network and our second-generation, closed-loop, liquid-cooling Heat Exchanger Unit,” he added. “Native integration with the Azure control plane delivers security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximising reliability and uptime for production-critical AI workloads.”