Introducing ELYZA-LLM-Diffusion: The Future of Japanese Text Generation

ELYZA-LLM-Diffusion: A New Era in Japanese Text Generation

In a significant step for Japanese text generation, ELYZA, a company that researches and develops large language models (LLMs), has unveiled its latest innovation: ELYZA-LLM-Diffusion. This diffusion large language model (dLLM) was built using KDDI's GPU infrastructure to strengthen knowledge comprehension and instruction following in Japanese, and it is now officially available for commercial use.

Understanding Diffusion Large Language Models (dLLMs)

Diffusion models, originally used in image generation AI, are now being adapted to language generation in the form of dLLMs. Unlike traditional autoregressive (AR) models, which generate text sequentially from left to right, dLLMs generate text by progressively removing noise from data that has been deliberately corrupted. The model is trained on a forward diffusion process, which adds noise, and a reverse diffusion process, which removes it, so that it learns to recover clean data from noisy inputs. Because several tokens can be resolved in a single step, this approach can require fewer processing cycles, which may lead to faster generation and lower power consumption.
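The contrast between the two decoding styles can be illustrated with a toy sketch. It assumes that "noise" means replacing tokens with a [MASK] placeholder, as in masked-diffusion text models, and it uses a hypothetical toy_denoiser that looks up the answer and attaches a random confidence in place of a trained network. The names diffusion_decode, autoregressive_decode, and tokens_per_step are illustrative only and are not taken from ELYZA's or Dream's code.

```python
import random

MASK = "[MASK]"


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a trained dLLM.

    For every masked position it proposes a token (here simply the known
    answer) together with a made-up confidence score.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def diffusion_decode(target, tokens_per_step=2, seed=0):
    """Start from a fully masked sequence and unmask a few tokens per step."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)      # fully "noised" starting point
    history = [list(tokens)]
    while MASK in tokens:
        proposals = toy_denoiser(tokens, target, rng)
        # Commit only the most confident proposals; the rest stay masked.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:tokens_per_step]:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


def autoregressive_decode(target):
    """Emit exactly one token per step, strictly left to right."""
    tokens, history = [], []
    for tok in target:
        tokens.append(tok)
        history.append(list(tokens))
    return tokens, history


target = ["拡散", "モデル", "で", "文", "を", "生成", "する", "。"]
dllm_out, dllm_history = diffusion_decode(target, tokens_per_step=2)
ar_out, ar_history = autoregressive_decode(target)

dllm_steps = len(dllm_history) - 1     # first entry is the all-mask start
ar_steps = len(ar_history)
print(f"diffusion-style steps: {dllm_steps}, autoregressive steps: {ar_steps}")
```

With eight tokens and two tokens committed per step, the diffusion-style loop finishes in four passes while the autoregressive loop takes eight. Each pass of a real dLLM processes the whole sequence, so fewer passes do not automatically mean proportionally less compute.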

However, the technology faces challenges, including high training costs and an immature ecosystem of inference platforms. Practical applications are still limited, but foundational research is advancing and the technology holds promise for future use. While several open models have emerged, most have been trained primarily on English data.

The ELYZA-LLM-Diffusion Series

ELYZA built on an existing dLLM, Dream-org/Dream-v0-Instruct-7B, developed by the HKU NLP Group. Through additional pre-training and instruction tuning on Japanese data, ELYZA created a series of models with improved Japanese knowledge and instruction-following capabilities.

The following models are now publicly available:

- ELYZA-Diffusion-Base-1.0-Dream-7B: A base model that has undergone additional pre-training on Japanese data.
- ELYZA-Diffusion-Instruct-1.0-Dream-7B: The base model above with instruction tuning applied.

For a clear understanding of the differences in the generation processes between dLLMs and AR models, you can view a comparative video provided by the company.

Additionally, a chat UI demo featuring this model is available on the Hugging Face hub, where users can experiment with the model live. Please note that high demand may result in waiting times for requests to be processed.

Performance Overview

To assess how well the model performs on Japanese tasks, ELYZA evaluated it against existing open dLLMs and AR models. The results showed that ELYZA-Diffusion-Instruct-1.0-Dream-7B delivers equal or superior performance on tasks requiring general Japanese language proficiency.

Addressing the Power Consumption Challenge

As demand for AI continues to surge, the international community faces growing concern over the power consumed by data centers that run AI workloads. Addressing this requires more efficient inference and training for models such as LLMs. Because dLLMs can generate text in fewer processing cycles, they could reduce the time and energy required compared with traditional AR models.

Further research in this area is expected to accelerate the development of efficient, high-performance Japanese LLMs. ELYZA will continue its LLM-centered research and development and share its findings widely, to support the adoption of LLM technologies across Japan and contribute to the advancement of natural language processing.

Company Overview

Founded in September 2018, ELYZA is dedicated to creating breakthroughs in language model technology under the ethos of 'creating the norm in uncharted territories.' Focusing on Japanese large language models, ELYZA conducts collaborative research with businesses and develops cloud services that foster corporate growth. The company also promotes the adoption and implementation of large language models through cutting-edge research and consulting. ELYZA's headquarters is located in Bunkyo-ku, Tokyo, Japan.