Groundbreaking Survey Reveals Insights on LLM Readability Across Major Japanese Media Platforms

A study conducted by Todo-O-Nada Inc. has revealed how inaccessible major Japanese media outlets are to large language models (LLMs), the technology behind generative AI. Using the LLM-friendly check feature built into its PR effectiveness measurement service, Qlipper, the firm assessed 3,166 prominent media platforms across Japan and found that only 10.0% of them were likely to pass through an LLM pre-training data pipeline.

The survey classified the outlets by their likelihood of being accepted into LLM training data sets: only 317 outlets (10.0%) were rated 'likely to pass,' and a further 1,063 outlets (33.6%) fell into the 'conditional pass' category. By these counts, the remaining 1,786 outlets (about 56.4%) were effectively excluded from AI training material for various reasons. For the majority of these platforms, site structure and content quality create significant barriers to AI accessibility.
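The category shares can be recomputed from the reported counts. The following is a minimal sketch: the total and the two category counts come from the survey figures above, while `remaining` is derived here by subtraction and is an assumption about how the rest of the outlets are grouped, not a number taken from the survey itself.

```python
# Counts reported in the survey.
TOTAL_OUTLETS = 3166
LIKELY_PASS = 317
CONDITIONAL_PASS = 1063

# Derived by subtraction (an assumption, not a reported figure).
remaining = TOTAL_OUTLETS - LIKELY_PASS - CONDITIONAL_PASS


def share(count, total=TOTAL_OUTLETS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {
    "likely to pass": share(LIKELY_PASS),
    "conditional pass": share(CONDITIONAL_PASS),
    "remaining": share(remaining),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Recomputing the shares this way is a quick check that the three categories add up to the full sample.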

Key Insights from the Survey

1. Content Quality as a Primary Barrier
Contrary to the popular belief that outlets drop out mainly because they block LLM crawlers, the research indicates that content quality and structural issues account for most rejections. Approximately 64.2% of the outlets that did not pass failed to meet quality standards during the data-cleansing phase. Only 7.8% were blocked by robots.txt restrictions, which shows that the quality and format of the content itself are the foremost obstacles.
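For readers unfamiliar with robots.txt restrictions, the sketch below shows how such a block can be checked with Python's standard `urllib.robotparser` module. The robots.txt content, the `GPTBot` and `ExampleBrowser` user-agent strings, and the example.com URL are illustrative assumptions; the survey does not state which crawlers or rules Qlipper's check examines, and this is not its implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


ARTICLE_URL = "https://example.com/news/1"  # placeholder URL
print("GPTBot blocked:", is_blocked(SAMPLE_ROBOTS_TXT, "GPTBot", ARTICLE_URL))
print("ExampleBrowser blocked:", is_blocked(SAMPLE_ROBOTS_TXT, "ExampleBrowser", ARTICLE_URL))
```

A check like this only covers the crawler-blocking case; per the survey, most failures come from content quality rather than robots.txt rules, which a check of this kind would not detect.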

2. Traditional Media's Withdrawal
A notable finding is the stark contrast in LLM accessibility between traditional media outlets, such as national and local newspapers, and digital platforms. Every national newspaper surveyed was entirely closed to LLM crawlers, meaning that a press release published there is unlikely to be reflected in AI responses.

3. The Portal Dilemma
Portals, which are traditionally viewed as successful PR channels, had a pass rate of only 5.9% against the LLM pre-training criteria. These platforms face a combination of quality and structural issues, which creates a critical blind spot in the media landscape.

4. Quality of Specialist Sites
Specialist sites made up about 40% of the platforms analyzed and showed a bimodal score distribution. This indicates a split between sites that meet the LLM criteria and those that are discarded immediately, which presents a notable challenge for PR professionals.

Transforming PR Strategies
Following this survey, the PR industry faces a shift in how it approaches media engagement in the era of generative AI. Traditional success metrics based on publication visibility need to be reassessed, with a focus on whether content can actually reach LLMs. PR professionals must now ask new questions: Which platforms can relay their content to LLMs? How can they ensure their own sites meet the criteria of LLM pipelines?

Todo-O-Nada's Response

Todo-O-Nada is committed to turning these insights into actionable strategies for the PR domain. The LLM-friendly check feature in Qlipper serves as a diagnostic tool that PR specialists can use to check whether their content is ready for AI access. In addition, tools such as Digidigi aim to establish a standard framework for recognition measurement in the LLM age.

For more details or inquiries about the survey, visit Qlipper's official site.