Nexdata Expands Physical AI Data Collection Factory with 2.5 Billion Yen Investment




In recent years, advancements in AI technology have shifted from the era of large language models (LLMs) focused on generating information in digital spaces to the emergence of "Physical AI." This new phase enables systems to interact directly with the physical world and operate autonomously, fulfilling various real-world tasks. In Japan, the growing need for automation in manufacturing and services, paired with labor shortages due to an aging population, is accelerating market expansion. Unlike traditional generative AI, which primarily processes text and 2D images, Physical AI integrates environmental awareness through sensors and the physical actions of robots, establishing itself as a next-generation infrastructure essential for solving real-world challenges.

The Importance of Large-Scale Data for Physical AI Development



There is a growing consensus in the industry that, as with LLMs, progress in Physical AI follows a scaling law: performance improves as the volume of training data increases. Improving a model's versatility and its control precision in real-world environments therefore requires large amounts of high-quality, real-world data covering diverse physical phenomena and movement patterns. Collecting such data in real environments, however, is hampered by the high cost of setting up environments, the complexity of synchronizing multiple sensors, and the burden of annotation, all of which create bottlenecks in development.

To address this challenge, Nexdata has invested over 2.5 billion yen to establish a dedicated data collection factory spanning more than 8,000 square meters. The facility provides comprehensive data solutions aimed at accelerating Physical AI development, including first-person (Ego-centric) data collection, annotation, and ready-to-use datasets that support environmental awareness, decision making, and motion control. With the cost advantages of large-scale production and immediately usable data assets, Nexdata aims to shorten lead times and improve accuracy in the development of Physical AI and VLA (vision-language-action) models.

A Large-Scale, Low-Cost Data Supply Realized by Nexdata



Nexdata has invested over 2.5 billion yen in data infrastructure built specifically for Physical AI. The company currently operates two large data collection factories, each exceeding 8,000 square meters, where more than 400 robotic platforms operate simultaneously, including humanoid robots, quadruped robots, industrial robotic arms, and multi-fingered manipulators.

The facilities include scenarios that faithfully replicate operational environments such as homes, pharmacies, manufacturing lines, and logistics warehouses, with more than 600 operators and management staff on site. This setup allows Nexdata to efficiently produce high-quality Physical AI data for every development phase, from pre-training large foundation models to task-specific fine-tuning and imitation learning from human demonstrations.

Comprehensive Data Collection Facilities



In addition to general data collection, Nexdata runs a dedicated data collection facility for robotic hands. Physical AI development uses a three-layer architecture consisting of an environmental recognition layer, a decision-making layer, and an action execution layer, which together support environmental understanding, behavioral planning, and precise control. With its large specialized data collection centers and dedicated staff, Nexdata can rapidly produce high-quality datasets. The company provides cost-efficient datasets including:

  • Environmental Database (Environmental Recognition Layer): Over 288 million sets of high-precision 3D models and real-world scene data, covering various lighting conditions, object placements, and background patterns to enhance robots' spatial recognition capabilities and object detection accuracy.
  • Brain Data Set (Decision-Making Layer): 4,000 hours of Ego-centric perspective multi-task execution video footage. It encompasses visual input and action sequences associated with everyday tasks (e.g., cooking, tidying, shelving goods), making it ideal for learning decision-making models that rely on long-term dependencies.
  • Cerebellum & Body Data Set (Control Layer): This dataset includes over 10,000 sets of high-fidelity trajectory data, joint angle time series, and force feedback information, serving as foundational data for learning low-level control policies in imitation and reinforcement learning.

Nexdata also offers more than 150,000 sets of robotic hand control data, including grasping, manipulation, and haptic feedback information.
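To make the control-layer description concrete, the following is a minimal sketch of what a single demonstration record combining trajectory timestamps, joint angle time series, and force feedback might look like. The class name, field names, and units are illustrative assumptions and do not describe Nexdata's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlSample:
    """Hypothetical record for one control-layer demonstration (not Nexdata's schema)."""

    timestamps_s: List[float]            # sample times in seconds
    joint_angles_rad: List[List[float]]  # one list of joint angles per timestamp
    forces_n: List[List[float]]          # one force-feedback reading per timestamp

    def validate(self) -> None:
        """Raise ValueError if the time series are misaligned or out of order."""
        n = len(self.timestamps_s)
        if len(self.joint_angles_rad) != n or len(self.forces_n) != n:
            raise ValueError("every channel needs one entry per timestamp")
        pairs = zip(self.timestamps_s, self.timestamps_s[1:])
        if any(later <= earlier for earlier, later in pairs):
            raise ValueError("timestamps must be strictly increasing")
        if len({len(q) for q in self.joint_angles_rad}) > 1:
            raise ValueError("joint count must be constant across the trajectory")

    def duration_s(self) -> float:
        """Length of the trajectory in seconds (0.0 for fewer than two samples)."""
        if len(self.timestamps_s) < 2:
            return 0.0
        return self.timestamps_s[-1] - self.timestamps_s[0]
```

Checks of this kind are typically run before a demonstration is admitted into an imitation learning or reinforcement learning dataset, since a misaligned channel silently corrupts the learned control policy.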

Bridging the Sim2Real Gap with Ego-centric Data Collection



One service in high demand among developers is Ego-centric data collection: data recorded from the first-person viewpoint of a person performing a task, captured with wearable cameras and motion sensors. Using intuitive human actions as training data lets robots learn the full pipeline, from environmental recognition through action execution, in a consistent way. Because the data is captured in the real world, it also reflects physical factors that are hard to reproduce in simulation, which helps address the "Sim2Real gap."
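Pairing wearable-camera frames with motion-sensor samples requires aligning two streams recorded at different rates. The following is a minimal sketch of nearest-timestamp matching, assuming both streams share a common clock and that sensor timestamps are sorted. The function names and the 20 ms tolerance are illustrative assumptions, not a description of Nexdata's pipeline.

```python
from bisect import bisect_left
from typing import List, Optional


def nearest_index(sorted_times: List[float], t: float) -> int:
    """Index of the entry in sorted_times closest to t (list must be non-empty)."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    if t - sorted_times[i - 1] <= sorted_times[i] - t:
        return i - 1
    return i


def align_frames_to_sensor(
    frame_times: List[float],
    sensor_times: List[float],
    max_gap_s: float = 0.02,  # assumed tolerance; tune per sensor setup
) -> List[Optional[int]]:
    """For each frame, return the index of the closest sensor sample,
    or None when no sample lies within max_gap_s seconds."""
    matches: List[Optional[int]] = []
    for t in frame_times:
        if not sensor_times:
            matches.append(None)
            continue
        j = nearest_index(sensor_times, t)
        matches.append(j if abs(sensor_times[j] - t) <= max_gap_s else None)
    return matches
```

Frames that come back as None have no sensor reading close enough in time and would normally be dropped or flagged for review rather than paired with stale motion data.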

Nexdata provides Ego-centric data collection services in real physical spaces using two collection methods: one based on intuitive control using VR, and the other using UMI-compliant grippers. Both methods combine optical motion capture systems with VR headsets to enable immersive, accurate data collection in real-world settings. The resulting action policies can be deployed directly to a range of robotic platforms without further adaptation, so clients can build on their existing infrastructure.

Future Prospects



Through collaboration with global corporations and research institutions in the Physical AI field, Nexdata has accumulated specialized knowledge in Ego-centric data collection, robotic control data, and Sim2Real adaptation. Drawing on this experience, the company can recommend data strategies suited to each client's development phase. Visits to the data collection factory and technical consultations are available on request through the company's official website or contact points.

About Datatang Inc.


  • Company Name: Datatang Inc.
  • Brand Name: Nexdata
  • Location: Kanda Awajicho 2-105, WATERRAS ANNEX 6F, Chiyoda-ku, Tokyo
  • Establishment: February 2020
  • Capital: 500 million yen
  • Business Overview: AI training data services (proprietary and custom datasets), data collection, annotation, and platform provision.
  • URL: https://nexdata.jp/
