Clarifai and Vultr Lead the Charge in AI Inference Performance at NVIDIA GTC 2025
Clarifai and Vultr Unveil Record-Breaking AI Inference Performance
In a remarkable showcase of advanced technology, Clarifai and Vultr teamed up to present unprecedented AI inference performance at the NVIDIA GTC 2025 conference, held in Washington, D.C. from October 28 to 30. Their latest benchmark results show that the Clarifai Reasoning Engine, running on Vultr's dedicated GPU infrastructure, achieves inference speeds and cost efficiencies that outpace competing platforms in the market.
The Clarifai Reasoning Engine is optimized for agentic AI inference, making it a vital tool for organizations looking to leverage AI’s full potential. Recent independent evaluations by Artificial Analysis showed the system operating at 544 tokens per second, with a time to first token of just 0.36 seconds. Most impressively, it delivers an industry-leading cost of merely $0.16 per million tokens on the GPT-OSS-120B model. These metrics indicate superiority not only in speed but also in resource efficiency, a significant advance over other GPU-based platforms.
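To put those published figures in context, here is a back-of-the-envelope sketch of what they imply for a single sequential generation workload. The numbers come straight from the Artificial Analysis benchmarks quoted above; the helper function and workload size are illustrative assumptions, not part of any Clarifai API.

```python
# Rough estimates from the published benchmark figures for GPT-OSS-120B
# on the Clarifai Reasoning Engine: 544 tokens/sec throughput,
# 0.36 s time to first token, $0.16 per 1M tokens.
TOKENS_PER_SEC = 544
TIME_TO_FIRST_TOKEN_S = 0.36
COST_PER_MILLION_TOKENS_USD = 0.16

def estimate(tokens: int) -> tuple[float, float]:
    """Return (approx. end-to-end latency in seconds, cost in USD)
    to generate `tokens` output tokens in one sequential stream."""
    latency_s = TIME_TO_FIRST_TOKEN_S + tokens / TOKENS_PER_SEC
    cost_usd = tokens / 1_000_000 * COST_PER_MILLION_TOKENS_USD
    return latency_s, cost_usd

# Hypothetical workload: 100,000 generated tokens.
latency_s, cost_usd = estimate(100_000)
print(f"~{latency_s:.1f} s, ${cost_usd:.4f}")  # ~184.2 s, $0.0160
```

At these rates, even a six-figure token workload costs under two cents, which is the practical meaning of the "$0.16 per million tokens" headline number.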
Kevin Cochrane, CMO of Vultr, emphasized the significance of this performance, stating, "Clarifai's benchmark-topping performance is a testament to what's possible when software innovation meets cloud engineering excellence. Our GPU clusters deliver extraordinary inference speed and efficiency while keeping costs under control. Together with Clarifai, we enable organizations to develop competitive, high-performance AI solutions faster than ever before."
Clarifai’s co-founder and CEO, Matthew Zeiler, also spoke to the synergistic relationship between Clarifai and Vultr. He highlighted how Vultr’s GPU infrastructure is crucial in maximizing the capabilities of the Clarifai Reasoning Engine. As enterprises engage in building AI workloads and systems, the balance of performance and cost efficiency becomes critical. Vultr’s infrastructure is tailored to meet these demands while ensuring the quality of AI reasoning isn’t compromised.
The recent benchmarks are part of Clarifai's 11.9 release, which introduces new functionality for advanced AI systems. Among the enhancements are cloud instances powered by the NVIDIA HGX B200 and NVIDIA GH200 Grace Hopper™ Superchips. The platform also expands its toolkit compatibility, letting users manage their AI models through vLLM, LM Studio, and Hugging Face integrations. Additionally, fresh model releases such as Qwen3-Next-80B-A3B-Thinking and Qwen3-30B-A3B-Instruct enrich Clarifai's model-agnostic ecosystem.
Clarifai's Reasoning Engine is built for enterprise-scale workloads, continuously tuning kernels, batching, and memory usage based on observed workload behavior. This allows performance to improve over time without sacrificing accuracy, giving organizations flexibility as they adapt existing models or introduce new ones.
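Clarifai has not published the internals of this adaptation loop, so the following is purely an illustrative sketch of the general pattern the paragraph describes: a serving parameter (here, batch size) adjusted from observed latency, with the class name, thresholds, and bounds all invented for the example.

```python
# Illustrative only -- not Clarifai's implementation. This sketches one
# generic form of workload-adaptive tuning: adjusting the serving batch
# size based on the per-request latency actually observed in production.
class AdaptiveBatcher:
    def __init__(self, batch_size: int = 8, target_latency_ms: float = 50.0):
        self.batch_size = batch_size
        self.target_latency_ms = target_latency_ms

    def record(self, observed_latency_ms: float) -> None:
        """Grow the batch when latency has headroom; shrink it when over target."""
        if observed_latency_ms < 0.8 * self.target_latency_ms:
            self.batch_size = min(self.batch_size * 2, 256)  # exploit headroom
        elif observed_latency_ms > self.target_latency_ms:
            self.batch_size = max(self.batch_size // 2, 1)   # back off

batcher = AdaptiveBatcher()
batcher.record(20.0)  # well under the 50 ms target -> batch doubles to 16
batcher.record(90.0)  # over target -> batch halves back to 8
print(batcher.batch_size)  # 8
```

The same feedback-loop shape applies to the other knobs the article mentions, such as kernel selection and memory allocation: measure, compare against a target, adjust within safe bounds.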
As Clarifai and Vultr continue to collaborate, they set an exceptional benchmark in the artificial intelligence landscape, emphasizing performance, scalability, and cost control for developers and enterprises alike. The partnership accelerates innovation in reasoning systems, agentic applications, and generative AI, shaping the future of AI deployment across industries.
To learn more about these advancements and how they could benefit your business, visit Clarifai and Vultr at booth #453 during the conference.