Moreh Breaks Ground with LLM Inference on Tenstorrent Galaxy: A Game Changer for AI Infrastructure
Moreh's Landmark Achievement in LLM Inference
In a significant advance for artificial intelligence infrastructure, Moreh, a company specializing in AI infrastructure software, has validated its LLM (Large Language Model) inference performance on the Tenstorrent Galaxy Wormhole system. CEO Gangwon Jo announced the results at the TT-Deploy launch event in San Francisco on May 1, 2026.
Performance Validation
Moreh's performance tests used several leading Mixture-of-Experts (MoE) models, including GPT-OSS, Qwen, GLM, and DeepSeek. The validations showed that LLM inference on the Tenstorrent Galaxy system matched, and in some cases surpassed, NVIDIA DGX A100-class systems, demonstrating a credible alternative to the GPU-centric architectures that have dominated the market.
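For context on what validating inference performance typically involves, here is a generic throughput-measurement sketch in Python. The `generate` callable and the measurement loop are generic stand-ins; the article does not describe Moreh's actual benchmark methodology.

```python
# A minimal, generic tokens-per-second measurement sketch. The `generate`
# callable stands in for any serving endpoint; this is NOT Moreh's harness.
import time
from typing import Callable, List

def tokens_per_second(generate: Callable[[str, int], List[int]],
                      prompt: str, new_tokens: int, runs: int = 5) -> float:
    """Average end-to-end generation throughput over several runs."""
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt, new_tokens)
        elapsed += time.perf_counter() - start
    return (new_tokens * runs) / elapsed

if __name__ == "__main__":
    # Stub generator standing in for a real MoE model endpoint.
    fake_generate = lambda prompt, n: [0] * n
    print(f"{tokens_per_second(fake_generate, 'hello', 256):.1f} tok/s")
```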
Cost Efficiency Improvements
Beyond raw performance, Moreh's architecture targets cost efficiency. The company introduced a disaggregated serving architecture that pairs GPUs with Tenstorrent Wormhole chips, using the Tenstorrent processors as dedicated prefill accelerators. This reduces reliance on high-cost HBM (High Bandwidth Memory) and lowers overall infrastructure spend.
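To make the disaggregated pattern concrete, the following is a minimal Python sketch of prefill/decode disaggregation, the general technique the paragraph describes. The class names, device pool labels, and hand-off logic are illustrative assumptions, not Moreh's implementation.

```python
# Hypothetical sketch of disaggregated prefill/decode serving.
# Pool names and the scheduling policy are illustrative assumptions;
# they do not reflect Moreh's actual system.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Request:
    prompt_tokens: List[int]   # long prompt -> compute-heavy prefill
    max_new_tokens: int        # decode is memory-bandwidth-heavy

@dataclass
class DevicePool:
    name: str
    queue: List[Request] = field(default_factory=list)

    def submit(self, req: Request) -> None:
        self.queue.append(req)

class DisaggregatedScheduler:
    """Route the prefill phase to dedicated accelerators and the
    decode phase to HBM-backed GPUs."""

    def __init__(self) -> None:
        self.prefill_pool = DevicePool("tenstorrent-wormhole")  # compute-bound prefill
        self.decode_pool = DevicePool("gpu-hbm")                # bandwidth-bound decode

    def handle(self, req: Request) -> None:
        # Phase 1: prefill runs on the dedicated accelerator pool,
        # producing the KV cache for the prompt.
        self.prefill_pool.submit(req)
        # Phase 2: the KV cache is handed off and token-by-token
        # decoding continues on the GPU pool.
        self.decode_pool.submit(req)

if __name__ == "__main__":
    sched = DisaggregatedScheduler()
    sched.handle(Request(prompt_tokens=list(range(2048)), max_new_tokens=256))
    print(len(sched.prefill_pool.queue), len(sched.decode_pool.queue))
```

The general intuition behind such a split: prefill is compute-bound and can run well on cheaper dedicated accelerators, while decode is memory-bandwidth-bound and is where expensive HBM-backed GPUs earn their cost.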
The results and methodology were first showcased live at the TT-Deploy event, further establishing Moreh's reputation as a leader in AI infrastructure solutions.
Live Demonstrations and Technical Insights
During the launch, Moreh demonstrated live inference using its proprietary MoAI Inference Framework. The framework operates heterogeneous processors (NVIDIA, AMD, and Tenstorrent) as a single unified cluster, giving companies the flexibility to build AI infrastructure without locking into a specific vendor and improving scalability and adaptability.
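As an illustration only, here is a minimal Python sketch of what vendor-agnostic dispatch across a heterogeneous device pool could look like. The `Backend` and `HeterogeneousCluster` names, the round-robin policy, and the device counts are all hypothetical; the article does not describe MoAI's actual API.

```python
# Hypothetical sketch of unified dispatch across a heterogeneous cluster.
# Names and placement policy are assumptions, not MoAI's real interface.
from typing import List

class Backend:
    def __init__(self, vendor: str, devices: int) -> None:
        self.vendor = vendor
        self.devices = devices

    def run(self, model: str, prompt: str) -> str:
        # A real backend would compile and dispatch kernels for its hardware.
        return f"[{self.vendor}] {model}: output for {prompt!r}"

class HeterogeneousCluster:
    """Present NVIDIA, AMD, and Tenstorrent devices as one pool."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self._next = 0

    def generate(self, model: str, prompt: str) -> str:
        # Trivial round-robin placement; a production scheduler would
        # weigh memory, bandwidth, and kernel availability per vendor.
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend.run(model, prompt)

if __name__ == "__main__":
    cluster = HeterogeneousCluster([
        Backend("nvidia", devices=8),
        Backend("amd", devices=8),
        Backend("tenstorrent", devices=32),
    ])
    print(cluster.generate("qwen", "Hello"))
```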
CEO Gangwon Jo remarked on the significance of the result: “Achieving production-grade LLM inference performance and stability on Tenstorrent-based systems marks a significant milestone.” He added that the company will keep improving performance through further optimization across architectures and deeper integration with Tenstorrent NPUs.
Building a Global Market Presence
In parallel, Moreh is developing its own core AI engine through its foundation-model subsidiary, Motif Technologies. The aim is end-to-end capability spanning both infrastructure and models, pursued in the global market through partnerships with industry leaders such as AMD and Tenstorrent.
This progress underscores how collaboration across the industry pushes the boundaries of what is achievable with AI and machine learning. Moreh's approach is set not only to reshape cost structures in AI infrastructure but also to change how businesses approach AI software.
Conclusion
The achievement of LLM inference on the Tenstorrent Galaxy represents a pivotal development in AI technology, blending high performance with cost-effectiveness and flexibility. As Moreh continues to optimize its systems and expand its market reach, it paves the way for further innovations that can reshape the landscape of AI infrastructure.