New Qlean Dataset Offers Japanese Horror Story Read-Aloud Audio for AI Development

Visual Bank Inc., based in Minato-ku, Tokyo, has added a new collection to its Qlean Dataset for AI developers and researchers: a Japanese single-speaker corpus of horror stories read aloud, complete with transcripts. The dataset is designed for work on Automatic Speech Recognition (ASR), speech understanding, and the development of Large Language Models (LLMs).

Overview of the Dataset

The new dataset features recordings of a native Japanese speaker narrating traditional horror and ghost stories, accompanied by carefully prepared transcripts that accurately reflect the spoken words. The readings convey tension and unease, which makes the corpus suitable not only for analyzing structured read speech but also for recognizing emotional delivery.

The dataset's significance lies in its emphasis on prosody, pauses, and tonal shifts tied to narrative context. As a result, it supports more than sentence-level speech recognition; it is also structured for long-context speech understanding and language model training. Because the recordings use a single speaker, no speaker separation is needed, so models can be evaluated and speech patterns analyzed under fixed conditions.

Intended Uses and Applications

The Qlean Dataset serves both research and commercial needs. For researchers, it supports the evaluation of ASR and speech understanding models on long-form audio, in this case narrated horror tales. The continuous narrative format is well suited to testing recognition accuracy on long utterances and to identifying the recognition errors that arise in extended, context-dependent speech.
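As an illustration of the kind of evaluation described above, the following is a minimal sketch of a character error rate (CER) calculation, a metric often preferred over word error rate for Japanese because the written language does not separate words with spaces. It is not part of the Qlean Dataset or any Visual Bank tooling; the function names and the normalization choices (NFKC folding, dropping whitespace and punctuation) are assumptions made for this example.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold width variants (NFKC) and drop whitespace and punctuation.

    Assumption for this sketch: formatting differences between the
    reference transcript and the ASR output should not count as errors.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters, keeping only two rows in memory."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # character missing from the hypothesis
                    curr[j - 1] + 1,     # extra character in the hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, not taken from the dataset.
    print(cer("夜の森で声がした。", "夜の林で声がした"))
```

The pure-Python distance computation grows with the product of the two transcript lengths, so for recordings at the long end of the corpus it would be more practical to score segment by segment or to use an optimized implementation.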

From an industrial perspective, the dataset can serve as validation data for conversational AI and voice generation systems. The emotionally nuanced speech in the corpus lets developers check how well such systems understand their inputs and verify the quality of their outputs. In addition, pre-deployment testing of call center and voice UI models can use this emotionally rich continuous speech to assess recognition stability and identify operational risks.

Features of the Qlean Dataset

The Qlean Dataset is built under stringent legal frameworks so that all collected data is ready for commercial use without legal complications. The dataset contains audio and text: audio is provided in mp3, and transcripts in txt, json, and csv. Recordings range from 30 seconds to 90 minutes, covering a wide range of story lengths.

Key features include:

- Rapid Delivery: Datasets can be delivered within one business day, making them suitable for tight project timelines.
- Custom Solutions: Qlean Dataset offers custom data collection and recording services, allowing for tailored datasets that meet specific project requirements.

For further inquiries or to explore the dataset, you can visit the Qlean Dataset website.

About Visual Bank Inc.

Visual Bank Inc. is a startup committed to building next-generation data infrastructure that enhances AI development capabilities. Under its mission of 'Unlocking Data Accessibility,' the company provides tools such as THE PEN, an AI-assisted platform that supports manga artists in their creative processes. Through initiatives like Qlean Dataset, Visual Bank aims to continue expanding its contributions to the AI landscape.