AI Inference Is King; Do You Know Which Chip Is Best?
Nvidia's inference performance remains the strongest in the field, with AMD and Intel making progress but still trailing. Nvidia's software advantage remains a significant barrier for competitors working to close the gap.

Everyone isn't just talking about AI inference processing; they are doing it. Analyst firm Gartner released a new report this week forecasting that global generative AI spending will hit $644 billion in 2025, growing 76.4% year-over-year. Meanwhile, MarketsandMarkets projects that the AI inference market will grow from $106.15 billion in 2025 to $254.98 billion by 2030. However, buyers still need to know which AI processor to buy, especially as inference has gone from a simple one-shot pass through a model to agentic and reasoning models that can increase computational requirements by some 100-fold. Performance continues to skyrocket, driving down the price per token.
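To see how higher throughput translates directly into a lower price per token, here is a minimal back-of-the-envelope sketch; the server cost and throughput figures below are illustrative assumptions, not numbers from either report:

```python
# Illustrative arithmetic only: all numbers are assumed for the example,
# not drawn from Gartner, MarketsandMarkets, or MLCommons data.

server_cost_per_hour = 40.0      # assumed fully loaded $/hour for a GPU server
tokens_per_sec_old = 10_000      # assumed throughput, previous generation
tokens_per_sec_new = 30_000      # assumed 3x throughput, new generation

def cost_per_million_tokens(cost_per_hour: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at a given throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000

print(f"Old: ${cost_per_million_tokens(server_cost_per_hour, tokens_per_sec_old):.2f} per 1M tokens")
print(f"New: ${cost_per_million_tokens(server_cost_per_hour, tokens_per_sec_new):.2f} per 1M tokens")
```

At the same hourly server cost, tripling throughput cuts the cost per million tokens to a third, which is why benchmark gains translate so directly into pricing.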
For seven years, the not-for-profit group MLCommons has been helping AI buyers and vendors by publishing peer-reviewed quarterly AI benchmarks. It has just released its MLPerf Inference 5.0 results, featuring new chips, servers, and models.
New benchmarks were added for the larger Llama 3.1 405B model, for Llama 2 70B with latency constraints for interactive work, and for graph models (the new “R-GAT” test). Only Nvidia submitted results for all the models. A new edge-inference benchmark, the Automotive PointPainting test for 3D object detection, was also added. MLCommons now manages 11 AI benchmarks in all.
AI Benchmark Results
AI is built on silicon, and MLCommons received submissions for six new chips this round: the AMD Instinct MI325X, the Intel Xeon 6980P “Granite Rapids” CPU, Google's TPU Trillium (TPU v6e), the Nvidia B200 (Blackwell), the Nvidia Jetson AGX Thor 128, and the Nvidia GB200.
As usual, Nvidia won all benchmarks by a significant margin. The Blackwell-based B200 roughly tripled the performance of the Hopper-based H200 platform across a range of models.
Nvidia Inference Performance
The GB200 NVL72 rack exceeded performance expectations, running the new Llama 3.1 405B benchmark thirty times faster than an 8-GPU H200 system; even accounting for its nine-fold GPU count, that still works out to roughly a 3.3x per-GPU advantage, as the sketch below shows. Nvidia's new open-source Dynamo “AI Factory OS” further optimizes AI at the data center level, potentially doubling throughput for AI factory operations.
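To put the thirty-fold figure in per-GPU terms, here is a minimal sketch that simply normalizes the system-level speedup by the GPU-count ratio, using only the numbers cited above:

```python
# Normalizing the reported GB200 NVL72 result to a per-GPU figure.
# Inputs are the figures cited in this article, not an official
# MLCommons calculation.

h200_system_gpus = 8      # baseline: 8-GPU H200 server
nvl72_gpus = 72           # GB200 NVL72 rack: 9x more GPUs
system_speedup = 30.0     # NVL72 vs. 8-GPU H200 on Llama 3.1 405B

# Per-GPU speedup = system-level speedup / GPU-count ratio
per_gpu_speedup = system_speedup / (nvl72_gpus / h200_system_gpus)
print(f"Per-GPU speedup: {per_gpu_speedup:.1f}x")  # ~3.3x
```

The remaining gap between 3.3x per GPU and the 30x system figure is what rack-scale integration (NVLink connectivity across all 72 GPUs) buys at this model size.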
AMD also showed competitive performance with its MI325X, which held up well against Nvidia's previous-generation Hopper GPUs. AMD's ROCm software, an alternative to CUDA, is gaining traction among users looking for cost-effective AI solutions.
Competition and Future Expectations
While Nvidia remains the leader in AI applications, AMD and Intel are making strides toward closing the performance gap. AMD expects its upcoming MI350 to narrow the gap further, while Nvidia continues to invest in software and data-center-scale solutions to maintain its lead.
Overall, the AI industry is seeing fierce competition among Nvidia, AMD, and Intel, giving AI practitioners a widening range of options that balance performance and cost.
Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. The author's firm, Cambrian-AI Research, has various semiconductor firms as clients but holds no investment positions in the companies mentioned in this article.