Supermicro Unveils NVIDIA HGX™ B200 Systems Reinforcing AI Performance Leadership in MLPerf® Inference v5.0

Supermicro's Groundbreaking Performance in AI Technology

In a significant advancement for artificial intelligence technology, Super Micro Computer, Inc. (SMCI) has launched its new line of NVIDIA HGX™ B200 systems, setting a high bar in AI performance as confirmed by the latest MLPerf® Inference v5.0 benchmarks.

The newly released systems, notable for their advanced liquid- and air-cooling technologies, delivered exceptional performance, generating three times as many tokens per second as previous-generation systems. This leap represents a major milestone for AI/ML, high-performance computing (HPC), cloud, and storage workloads.

A New Era in AI Capabilities

According to the benchmarks, the Supermicro NVIDIA HGX™ B200 systems, when deployed with eight GPUs, achieved unprecedented performance levels. The 4U liquid-cooled and 10U air-cooled configurations set new records on several benchmarks, underscoring Supermicro's commitment to innovation in the AI sector. Charles Liang, the President and CEO of Supermicro, stated, "Supermicro continues to lead the AI industry, as demonstrated by the first new benchmarks released by MLCommons in 2025."

The results were achieved during the MLCommons benchmark window, reiterating Supermicro's dedication to developing optimized solutions that meet industry standards. The liquid-cooled systems delivered over three times the tokens per second of previous-generation NVIDIA H200-based systems on both the Llama2-70B and Llama3.1-405B benchmarks.

Innovative System Design

What sets Supermicro apart is its unique building block architecture, allowing for a diverse range of optimized systems tailored to various AI workloads. This versatility enables the company to rapidly adapt and refine solutions in collaboration with NVIDIA, ensuring clients receive cutting-edge technology that meets their specific AI needs.

Both air-cooled and liquid-cooled versions of the NVIDIA HGX™ B200 8-GPU systems were in active use even before the MLCommons benchmark initiation date. Supermicro's technicians worked diligently to enhance both hardware and software configurations to maximize performance outcomes, adhering strictly to the benchmarking rules provided by MLCommons. The dedication to optimization has resulted in air and liquid-cooled variants exhibiting equivalent performance results, reinforcing the availability of high-quality systems for clients.

Impressive Benchmark Results

Technically, Supermicro's SYS-421GE-NBRT-LCC (liquid-cooled, featuring 8x NVIDIA B200-SXM-180GB GPUs) and SYS-A21GE-NBRT (air-cooled) models demonstrated superior output on the Mixtral 8x7B (Mixture of Experts) inference benchmark, achieving 129,000 tokens per second. Notably, results for the large Llama3.1-405B model exceeded 1,000 tokens per second. Previous-generation GPU systems could not approach these performance figures, marking a significant leap in the technology's capability.

The benchmarks also included impressive performances on smaller inference tasks. Among Tier 1 system vendors, the Supermicro system equipped with NVIDIA B200-SXM-180GB GPUs achieved peak performance across a variety of scenarios, as evidenced by the Llama2-70B benchmark.

Pioneering Cooling Solutions

The NVIDIA HGX B200 8-GPU systems are equipped with next-generation cooling technologies. Newly designed cold plates and an enhanced 250kW coolant distribution unit (CDU) double the cooling capacity of previous designs while fitting into the familiar 4U form factor. This innovative design conserves rack space and maximizes performance efficiency, allowing multiple systems to operate simultaneously. Notably, each 42U, 48U, or 52U configuration accommodates numerous systems and GPUs, reflecting Supermicro's commitment to effective rack solutions.
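As a rough illustration of the rack math (the article gives no exact per-rack counts), the sketch below estimates how many 4U systems and GPUs fit in each rack size. The number of rack units reserved for the CDU, networking, and power (`overhead_u`) is a hypothetical assumption, not a Supermicro specification:

```python
# Hypothetical rack-density estimate. overhead_u is an illustrative
# assumption for CDU/networking/power space, not a vendor figure.
def estimate_rack_density(rack_u: int, system_u: int = 4,
                          overhead_u: int = 4,
                          gpus_per_system: int = 8) -> tuple[int, int]:
    """Return (systems, GPUs) that fit in a rack of rack_u units."""
    systems = (rack_u - overhead_u) // system_u
    return systems, systems * gpus_per_system

for rack_u in (42, 48, 52):
    systems, gpus = estimate_rack_density(rack_u)
    print(f"{rack_u}U rack: ~{systems} systems, ~{gpus} GPUs")
```

Actual deployments depend on power delivery and manifold placement, so these figures are only an upper-bound sketch.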

The new 10U air-cooled system also boasts advanced design capabilities, offering space for up to eight 1000W TDP Blackwell GPUs while maintaining the density and performance efficiency that enterprises require.

Conclusion

Supermicro stands at the forefront of the IT solutions sector, with an extensive portfolio of over 100 GPU-optimized systems across various cooling methods and configurations. Headquartered in San Jose, California, Supermicro epitomizes innovation in AI infrastructure delivery, ensuring clients are equipped with state-of-the-art technology designed for maximum efficacy and minimal environmental impact. Its unwavering dedication to product excellence cements Supermicro's position as a critical player in the fast-evolving landscape of AI and high-performance computing.

For further exploration of the new MLPerf v5.0 Inference benchmarks and Supermicro's innovative offerings, visit MLCommons or check their product lineup on the Supermicro website.
