Visual Bank Launches New Japanese Audio Dataset for AI Development

Visual Bank Inc., headquartered in Minato-ku, Tokyo, has unveiled an innovative resource for AI researchers and developers: the "Japanese Two-Speaker Leisure-Themed Conversational Speech Corpus with Transcripts." Administered through their subsidiary, Amanai Images Inc., this new dataset expands the Qlean Dataset line, specifically tailored for cutting-edge AI development in voice recognition, natural language processing, and large language models.

Overview of the Dataset

The dataset features over 400 hours of conversational audio recorded between male and female speakers aged 20 to 50. Each audio interaction has been meticulously paired with transcripts, allowing for a comprehensive analysis of natural Japanese dialogue. The conversations revolve around leisure activities including discussions about TV dramas, anime, gadget reviews, and personal travel experiences. This diverse range of topics is designed to mimic everyday conversations, making the dataset perfect for training and validating AI models, particularly in the fields of automatic speech recognition (ASR) and natural language processing (NLP).

The recordings were crafted without scripts, ensuring that the dialogue remains genuine and fluid. This approach not only facilitates a natural exchange of thoughts and opinions but also allows developers to create models that better understand and process real-life conversational dynamics. By capturing the nuances of free-flowing discussions, the dataset aims to enhance AI systems’ responsiveness and contextual understanding in practical applications.

Data Specifications

- Data Types: Audio (in mp3 and wav formats) and Text (in txt format)
- Recording Time: Approximately 400 hours, with individual audio clips lasting between 5 to 60 minutes
- Audio Quality: Recorded at a sampling rate of 44.1 kHz
- Scenarios: Conversations feature dynamic exchanges on leisure themes, such as critical thoughts on television series or movies, gaming experiences, and personal anecdotes related to travel and outings.

Use Cases

The newly released dataset provides various use case scenarios for both research and industrial purposes:

Research Applications:

1. ASR Model Evaluation: Researchers can utilize this dataset to assess the accuracy of ASR models that must handle multi-speaker inputs and speaker transitions.
2. Language Model Research: By analyzing conversational text that incorporates topic shifts and inter-speaker references, researchers can evaluate the capabilities of dialogue models in maintaining context during conversations.

Industrial Applications:

1. Validation of Voice UI: Developers of voice assistants and conversational interfaces can utilize this dataset in proof-of-concept validations to refine input processing and dialogue management techniques.
2. Evaluation of Japanese LLMs: The dataset is also excellent for developing and refining Japanese large language models (LLMs), helping to assess and improve the naturalness of responses and the continuity of dialogue.

About Qlean Dataset

The Qlean Dataset is developed and maintained by Amana Images Inc., a subsidiary of Visual Bank, and is considered to be a commercially viable AI training data solution. It encompasses a broad range of data types including images, videos, audio, 3D assets, and text. The Qlean Dataset seeks to establish a legally sound environment for both research and commercial application by providing clearly defined usage conditions and ensuring all necessary rights are cleared.

Through collaborations with significant data partners in various industries, such as the Chiba Lotte Marines and Toyo Keizai, the Qlean Dataset actively enriches its offerings to stay aligned with current trends and industry demands.

Conclusion

The launch of the Japanese Two-Speaker Leisure-Themed Conversational Speech Corpus with Transcripts represents a significant step forward in enhancing AI development in Japan. With its focus on authentic dialogue and relevant use cases for both research and commercial applications, this dataset is poised to be an invaluable resource for AI practitioners aiming to create more intuitive and responsive conversational agents.

For more information, visit the Qlean Dataset website.

Contact Information
Visual Bank Inc.
CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo
Corporate Site: Visual Bank
Amana Images: Amana Images