New Developments in Material Research: Discovering Scaling Laws Connecting Computational Databases and Experimental Data
Researchers from the Institute of Statistical Mathematics and Mitsubishi Chemical Corporation have identified a new phenomenon in materials research, which they call the "Scaling Law of Sim2Real Transfer Learning." This research was conducted in collaboration with a group specializing in materials research and has been published in the international academic journal "npj Computational Materials."
A central challenge in data-driven materials research is the shortage of experimental data, which limits the predictive performance of AI models. To address this, large computational databases of physical properties, generated through physical simulations, are being developed. Models pre-trained on such computational databases can achieve strong predictive performance after additional training on a limited amount of experimental data, a process known as Sim2Real transfer learning.
The research has shown that as the size of the computational property database increases, the performance of transfer models in predicting experimental properties improves monotonically, following a power law. This systematic demonstration of a scaling law in Sim2Real transfer learning for materials is unprecedented.
Background of the Research
In data-driven research, data is undeniably the most crucial resource. Despite this, the availability of data in materials research is exceptionally limited compared to fields like natural language processing, computer vision, biology, and medicine. To overcome this barrier, researchers in materials science have used first-principles calculations and molecular dynamics simulations to construct large-scale computational property databases. Initiatives such as the Materials Project have paved the way for comprehensive databases covering the entire periodic table, with other datasets such as AFLOW, OQMD, and GNoME following suit in the inorganic materials sector.
In polymer materials research, the Institute of Statistical Mathematics has initiated the RadonPy project to fully automate computational experiments for polymers, joining forces with two national research institutes, eight universities, and 37 companies to create a world-class polymer property database. The research group has also established a fully automated system for quantum chemical calculations, which comprehensively evaluates the compatibility between polymer materials and solvent molecules.
Research Outcomes and Findings
In this study, the researchers demonstrated that a scaling law for Sim2Real transfer learning holds across diverse tasks in materials research. Scaling laws in transfer learning had previously been predicted theoretically and validated empirically in computer vision. The results indicate that as the size of the computational database increases, the prediction performance of fine-tuned models improves monotonically, following a power law.
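As an illustration of what such a power law can look like, the prediction error E of the fine-tuned model can be written as a function of the computational database size n. This is a commonly used functional form, assumed here for illustration; the symbols D, \alpha, and C are notation introduced for this sketch and are not taken from the text above.

```latex
\begin{align}
E(n) &\approx D\, n^{-\alpha} + C, \qquad D > 0,\ \alpha > 0,\ C \ge 0 \\
\log\bigl(E(n) - C\bigr) &\approx \log D - \alpha \log n
\end{align}
```

Under this assumed form, \alpha sets how quickly the error falls as n grows, and C is the error that remains however large the database becomes. The second line shows why such a law appears as a straight line with slope -\alpha on a log-log plot.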
The scaling intensity, that is, how quickly performance improves as the database grows, serves as a quantifiable metric for assessing the future value of a database. Furthermore, analyzing the scaling behavior helps estimate the amount of data required to reach a target performance, as well as the attainable performance limit. This insight is expected to improve strategic planning for data platform development and to streamline data production protocols in materials development projects.
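The estimate described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' analysis code. It assumes the error follows D * n**(-alpha) + C, treats the attainable limit C as already known (in practice it would be estimated together with D and alpha), and uses made-up data. The function names fit_power_law and required_size are hypothetical.

```python
import math

def fit_power_law(sizes, errors, floor=0.0):
    """Fit error ~ D * n**(-alpha) + floor by linear least squares in log space.

    `floor` is the assumed attainable limit C; it is taken as given here.
    Returns the pair (D, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e - floor) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def required_size(target_error, D, alpha, floor=0.0):
    """Database size n at which D * n**(-alpha) + floor equals target_error."""
    if target_error <= floor:
        raise ValueError("target error is at or below the attainable limit")
    return (D / (target_error - floor)) ** (1.0 / alpha)

# Hypothetical measurements: test error of a fine-tuned model for four
# computational database sizes, generated from D=2.0, alpha=0.5, C=0.1.
sizes = [100, 1000, 10000, 100000]
errors = [0.1 + 2.0 * n ** -0.5 for n in sizes]

D, alpha = fit_power_law(sizes, errors, floor=0.1)
n_needed = required_size(0.11, D, alpha, floor=0.1)
print(f"D={D:.3f}, alpha={alpha:.3f}, samples needed for error 0.11: {n_needed:.0f}")
```

A larger fitted alpha means each additional computational sample buys more accuracy, which is one way to read the "future value" of a database. Asking for a target error at or below the assumed limit raises an error, reflecting that no amount of additional simulated data would reach it under this model.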
Future Developments
A pivotal milestone in data-driven materials research is the establishment of scalable data production protocols and analytical workflows for transfer learning. Many areas of materials development struggle to accumulate enough data for data-driven research, and the problem becomes more pronounced as research moves closer to the frontier. It is therefore essential to choose sources that can produce data at scale, such as computational experiments, and to use machine learning to bridge the gap between the source and target domains.
In conclusion, the discoveries from this research pave the way for the RadonPy project's ongoing data production and improvements in the predictive capabilities of transfer models for downstream tasks. Continuous development of computational databases and optimization of resource distribution in experiments will ensure the sustained evolution of data-driven materials research.