iFLYTEK Achieves Recognition for AI Infrastructure with Volcano at CNCF

iFLYTEK Wins CNCF End User Case Study Contest



The Cloud Native Computing Foundation (CNCF) has named iFLYTEK the winner of its End User Case Study Contest for the company’s implementation of the Volcano batch system. The announcement was made on June 9, 2025, just ahead of KubeCon + CloudNativeCon China, scheduled for June 10-11 in Hong Kong.

About iFLYTEK



iFLYTEK is a leading Chinese technology firm specializing in artificial intelligence, particularly speech and language processing. As the company scaled up its operations, it ran into significant challenges with its AI workloads. A primary issue was inefficient resource scheduling, which left Graphics Processing Units (GPUs) underutilized. Teams often competed for limited resources, which slowed workflows and lengthened project timelines.

The Challenge



Before adopting Volcano, iFLYTEK's infrastructure was plagued by coordination issues. Teams frequently struggled with resource bottlenecks, job failures, and complex debugging in training pipelines. This inefficiency slowed the company’s progress and placed a heavy burden on the teams responsible for AI training and development.

To address these challenges, iFLYTEK needed a solution that enabled better resource allocation, simplified workflows, and minimized job disruptions. The need for fair access to resources across teams became apparent, laying the groundwork for the adoption of Volcano.

The Solution: Volcano



Volcano is a cloud-native batch system that runs on Kubernetes and is built for high-performance workloads such as AI/ML training, big data processing, and scientific computing. Its advanced scheduling capabilities, which range from job orchestration to resource fairness and queue management, are designed for handling large-scale distributed tasks.
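
To make the idea of resource fairness across queues more concrete, here is a minimal Python sketch of weighted proportional sharing. It is an illustration only and not Volcano's actual implementation: the function name `fair_share`, the queue names, and the weights are all hypothetical.

```python
def fair_share(total_gpus: int, queue_weights: dict[str, int]) -> dict[str, int]:
    """Split total_gpus across queues in proportion to their weights.

    Hypothetical helper for illustration; not part of Volcano.
    """
    total_weight = sum(queue_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one queue needs a positive weight")

    # Start with the integer (floor) share of each queue.
    shares = {
        name: total_gpus * weight // total_weight
        for name, weight in queue_weights.items()
    }

    # Hand leftover GPUs to the queues with the largest fractional remainder.
    leftover = total_gpus - sum(shares.values())
    by_remainder = sorted(
        queue_weights,
        key=lambda name: (total_gpus * queue_weights[name]) % total_weight,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical teams sharing an 8-GPU pool with weights 2:1:1.
print(fair_share(8, {"speech": 2, "nlp": 1, "vision": 1}))
```

A queue with twice the weight receives roughly twice the GPUs, so no single team can starve the others even when the pool is fully subscribed.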

Adopting Volcano has proved transformative for iFLYTEK. With it, the company implemented elastic scheduling and Directed Acyclic Graph (DAG)-based workflows. These changes led to higher GPU utilization, lower infrastructure costs, and faster recovery from job failures.
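
For readers unfamiliar with DAG-based workflows, the following Python sketch shows the general idea: each step declares the steps it depends on, and a scheduler runs a step only after its dependencies finish. The pipeline and step names are hypothetical and do not describe iFLYTEK's actual setup; the sketch uses Python's standard `graphlib` module rather than any Volcano API.

```python
from graphlib import TopologicalSorter

# Hypothetical training pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "export": {"train"},
    "report": {"evaluate", "export"},
}


def execution_waves(dag):
    """Group steps into waves; steps within one wave have no unmet
    dependencies and could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(pipeline), start=1):
    print(number, wave)
```

Here "evaluate" and "export" land in the same wave because both depend only on "train", which is the kind of parallelism a DAG description exposes to a scheduler.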

Results and Improvements



The integration of Volcano has yielded substantial outcomes for iFLYTEK:
  • 40% Increase in GPU Utilization: By optimizing GPU resources, iFLYTEK has cut idle compute and reduced operational costs.
  • 70% Faster Job Recovery: With improved stability, training resumes sooner after failures, keeping projects moving.
  • 50% Acceleration in Hyperparameter Search: This speed-up has enabled quicker iteration in AI model development.

As DongJiang, a senior platform architect at iFLYTEK, put it: “Before Volcano, coordinating training under large-scale GPU clusters across teams meant constant firefighting. Volcano gave us the flexibility and control to scale AI training reliably and efficiently.”

Looking Ahead



“iFLYTEK's case study demonstrates how open source can solve complex, high-stakes challenges at scale,” said Chris Aniszczyk, CTO of CNCF. iFLYTEK will share its experience and the insights gained from implementing Volcano in an upcoming presentation at KubeCon + CloudNativeCon China titled “Scaling Large Model Training in Kubernetes Clusters with Volcano.”

This event presents an excellent opportunity for the tech community to learn from iFLYTEK’s journey toward scaling AI models effectively. For further details regarding the event schedule, please visit CNCF’s official event page.

Conclusion



As demand for AI grows, iFLYTEK's successful adoption of cloud-native tools like Volcano marks a significant step in managing increasingly complex workloads. Its experience offers a useful case study for organizations looking to improve workflow efficiency and resource management in AI training environments.
