Introduction
Archaic, a company based in Shibuya, Tokyo, has developed a RAG (Retrieval-Augmented Generation) AI system designed specifically for Japanese business documents. In an evaluation on a publicly available benchmark dataset, the system achieved the top score in both the manufacturing category and the overall average.
The Challenge of Complex Documents
A notable challenge many businesses face is the complexity of their documents, which often consist of not just text, but also charts, tables, and images. Traditional AI systems struggle with these multifaceted documents, often leading to incomplete responses or misinterpretation due to a lack of contextual understanding.
In response to these issues, Archaic has created a system that analyzes the structure of a document and preserves the relationships between its pieces of information when passing them to generative AI, thereby improving accuracy and usefulness in a business context.
Core Technologies of the Archaic RAG System
Central to the Archaic RAG system are two unique technologies that facilitate its performance:
1. Parser - Document Analysis Engine
The Parser automatically extracts text, figures, tables, and images from documents, breaking them down into meaningful units based on their structure and content. This allows the RAG system to process documents that are traditionally treated as unstructured, turning them into a format that generative AI can work with.
2. Tree-Data Structure - Information Hierarchy Technology
Extracted information is organized into a hierarchical Tree structure that preserves context and relationships. This allows for precise and coherent outputs, taking into account the relationships between main text, figures, tables, and annotations.
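As an illustration only, the idea of organizing parsed units into a hierarchy can be sketched in Python. The `Node` class, the `find` helper, and the sample manual below are hypothetical and are not Archaic's implementation; the sketch simply shows how keeping parent links lets a retrieved annotation be returned together with the table and section it belongs to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One parsed unit (hypothetical): section, paragraph, table, annotation, etc."""
    kind: str
    text: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child unit and remember its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def context_path(self) -> List[str]:
        """Texts of all ancestors, from the document root down to this node."""
        path = []
        node: Optional["Node"] = self
        while node is not None:
            path.append(node.text)
            node = node.parent
        return list(reversed(path))


def find(root: Node, keyword: str) -> List[Node]:
    """Depth-first search for nodes whose text contains the keyword."""
    hits = [root] if keyword in root.text else []
    for child in root.children:
        hits.extend(find(child, keyword))
    return hits


# Sample document (invented for this sketch).
doc = Node("document", "Pump maintenance manual")
sec = doc.add(Node("section", "3. Inspection intervals"))
sec.add(Node("paragraph", "Inspect seals according to Table 3."))
table = sec.add(Node("table", "Table 3: seal inspection schedule"))
table.add(Node("annotation", "Note: halve the interval in dusty environments."))

# Retrieve the annotation together with the units that give it meaning.
hit = find(doc, "dusty")[0]
print(" > ".join(hit.context_path()))
```

Passing the full path to the language model, rather than the isolated note, is one way the relationship between main text, tables, and annotations could be preserved during retrieval.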
Differential Advantages Over Traditional RAG
Traditional RAG systems primarily process text and often falter when faced with visual elements like charts and tables, which can lead to significant information loss and contextual gaps. In contrast, Archaic's system is designed to manage these elements more holistically, greatly enhancing the accuracy and comprehensiveness of responses.
Comparison Table: General RAG vs. Archaic RAG System
| Features | General RAG | Archaic RAG System AI |
|---|---|---|
| Document Compatibility | Text-centric | Comprehensive, including charts, tables, images, and annotations |
| Context Understanding | Prone to fragmentation | Maintains meaning in hierarchical structure (Tree-Data) |
| Business Use Cases | FAQs and knowledge bases | Complex documents such as manuals, specifications, and meeting notes |
| Dependence on LLM | Relies on LLM capabilities | Supplements LLM weaknesses through preprocessing and structuring |
Performance Evaluation Methodology
To substantiate the effectiveness of its RAG system, Archaic used a publicly available benchmark dataset designed for evaluating Japanese RAG performance. The benchmark contains 300 question-answer pairs across five industries (finance, information and communication, manufacturing, public sector, and retail), 60 per industry, and systems are compared on the accuracy of their generated answers.
The large language model used for verification was Claude 3.5 Sonnet; holding the model fixed allows differences in performance between RAG implementations to be assessed reliably.
Evaluation Results by Industry Category
The following summarizes the accuracy rates achieved:
| Industry Category | Accuracy Rate | Correct Answers (Correct / Total) | Remarks |
|---|---|---|---|
| Finance | 83.3% | 50 / 60 | - |
| Information and Communication | 85.0% | 51 / 60 | - |
| Manufacturing | 91.7% | 55 / 60 | ★ Highest Rating |
| Public Sector | 93.3% | 56 / 60 | - |
| Retail | 93.3% | 56 / 60 | - |
| Overall Average | 89.3% | 268 / 300 | ★ Highest Rating |
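The ★ marks the rows where the system achieved the top benchmark score, namely the manufacturing category and the overall average; it does not mark the highest values within this table. These results indicate that the Archaic RAG system's structural comprehension and preprocessing accuracy contribute significantly to the accuracy of its generated answers.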
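The reported rates can be checked with a few lines of Python. The correct-answer counts are taken from the table above; the variable and function names are illustrative only.

```python
# Correct answers per category, as reported in the results table.
correct = {
    "Finance": 50,
    "Information and Communication": 51,
    "Manufacturing": 55,
    "Public Sector": 56,
    "Retail": 56,
}
QUESTIONS_PER_CATEGORY = 60  # 300 questions split evenly across five industries


def accuracy(n_correct: int, n_total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * n_correct / n_total, 1)


per_category = {
    name: accuracy(n, QUESTIONS_PER_CATEGORY) for name, n in correct.items()
}
total_correct = sum(correct.values())
total_questions = QUESTIONS_PER_CATEGORY * len(correct)
overall = accuracy(total_correct, total_questions)

for name, rate in per_category.items():
    print(f"{name}: {rate}%")
print(f"Overall: {total_correct} / {total_questions} = {overall}%")
```

Because every category has the same number of questions, the overall rate equals both the pooled rate (268 / 300) and the simple mean of the five category rates.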
Developer Commentary
Zhaoxu Wang, CTO of Archaic, commented, "I believe RAG is transitioning beyond simple 'search + generate' methods into an era defined by 'structure understanding + search + generate.' The Archaic RAG employs a Tree-Data structure that faithfully represents the semantic relationships within complex documents, integrating graphical data alongside textual content. The core of our technology lies in how effectively we can convey meaningful context. I expect this advancement will facilitate the democratization of business knowledge and repurpose document knowledge effectively."
Future Prospects
Looking ahead, Archaic plans to expand its practical applications along three main axes:
1. Development of industry-specific optimization templates (for manufacturing, finance, municipalities, etc.).
2. Provision of AI solutions for manual generation, meeting-note summarization, and knowledge search designed for unstructured documents.
3. Collaborative enhancement of the dataset to establish guiding principles for RAG metrics.
Through this innovative RAG system that emphasizes understanding structure and connecting meanings, Archaic aims to pioneer the future of knowledge utilization.
About Archaic
Founded on November 15, 2017, and located in Shibuya, Tokyo, Archaic focuses on creating a world where AI is as commonplace as electricity and water, enabling everyone to utilize it seamlessly. They boast profound expertise in deep learning and AI systems development, nurturing a specialized team committed to building cutting-edge, custom AI for industry leaders.
The CEO, Jun Yokoyama, believes that making AI accessible in familiar environments will lower barriers and ultimately boost various sectors of Japan's economy.
For more information, visit Archaic's official website.
Contact
For inquiries regarding this announcement, please contact Archaic's public relations team at [email protected].