Basecamp Research Unveils Trillion Gene Atlas for AI-Driven Therapies
Austin, Texas, and San Jose, California – March 18, 2026 – Basecamp Research, a leading AI lab focused on biological design, has announced the launch of the Trillion Gene Atlas, a groundbreaking initiative aimed at generating and modeling biological data on an unprecedented scale of one trillion genes. This ambitious project will scale the known evolutionary genetic diversity by a staggering 100-fold, gathering genomic data from over 100 million species across thousands of locations worldwide.
The Trillion Gene Atlas is made possible through partnerships with organizations including Anthropic, Ultima Genomics, and PacBio, and is supported by NVIDIA’s advanced AI infrastructure. By leveraging over two decades of biological data collection and analysis, Basecamp intends to reduce this timeline to less than two years, revolutionizing the landscape of drug development.
Glen Gowers, co-founder and CEO of Basecamp Research, highlighted the significance of this initiative at the South by Southwest (SXSW) conference. He stated, “Today’s biological AI models are trained on a narrow slice of life on Earth. The Trillion Gene Atlas will vastly expand the known genetic universe beyond existing public databases. Training models on this scale will establish a new paradigm for programmable therapeutic design.”
Addressing the Biological Data Bottleneck
With the rapid growth in model size and computational power, diverse datasets have become critical drivers in the progress of AI-driven drug development and real-world applications. Currently, most sequence-based foundation models are trained on variants of the same public repositories, with about 80% of these models relying on a public database containing fewer than 250 million sequences.
Basecamp's newly unveiled EDEN foundation models sidestep the industry's evolutionary “data wall” by training entirely on BaseData™, a proprietary genomic database that is now more than ten times larger than all public resources combined. By learning from 10 billion novel genes drawn from one million newly discovered species, EDEN has uncovered unique scaling laws for AI in biology.
This massive expansion in data diversity allows EDEN to go beyond mere prediction and become the first model capable of designing diverse therapeutics directly from disease prompts. In laboratory validation, EDEN displayed zero-shot activity in primary human T-cells without requiring human or clinical data. It has already generated hits across several frontier modalities, including AI programmable gene insertion (aiPGI) and targeted antimicrobial peptides, the latter with a 97% hit rate against priority pathogens.
The Trillion Gene Atlas builds upon this methodology, significantly broadening both the breadth and the contextual depth of AI-ready genomic data within the so-called “Internet of Biology.” Phil Lorenz, the Technical Director of Basecamp Research, adds, “Larger models alone aren’t sufficient. EDEN has demonstrated that the performance of biological AI follows significantly steeper scaling trends with higher-quality and fully contextualized data.”
Global Biodiversity Partnerships
Over the past six years, Basecamp Research has established a network of scientific collaborators across 31 countries, creating a scalable pipeline for evolutionary genomics targeted at AI training. By integrating new regulatory and economic frameworks with fully decentralized DNA sequencing technologies, the firm harvests high-quality genomic data from ecosystems that are otherwise beyond the reach of traditional laboratories.
These collaborations are founded upon knowledge exchange, local capacity building, and equitable access-and-benefit-sharing agreements in compliance with emerging regulations concerning digital sequence information. This framework enables responsible, large-scale, and high-quality genomic data collection while investing in scientific infrastructure and training in partner regions. As part of the Atlas launch, Basecamp announced new partnerships in Chile and Argentina, as well as an expanded collaboration in Antarctica, further enhancing its global biodiversity network.
Scaling Data Generation and Computational Power
The Trillion Gene Atlas is made feasible through advancements in ultra-high-throughput short-read and long-read sequencing, in addition to accelerated computing. Basecamp is collaborating with Ultima Genomics and PacBio to deliver large-scale sequencing solutions that include high-accuracy, data-rich long reads.
Ultima is developing ultra-high-throughput systems for next-generation sequencing (NGS). Its latest sequencing system, the UG200 Series, builds on the company’s unique wafer-based sequencing architecture, enabling whole-genome and multiomics sequencing at industrial scale and low cost, thereby supporting initiatives like the Trillion Gene Atlas.
Gilad Almogy, founder and CEO of Ultima Genomics, asserts, “Compared to other fields like language or computer vision, biology has historically been data-poor, as researchers have lacked the tools to generate large-scale data. We firmly believe that AI will profoundly shape our understanding of biology and human health, and the UG200 Series was designed from the ground up to provide the enormous datasets that BioAI requires to fulfill this promise.”
Creating an Agent-Based End-to-End Therapeutic Design Workflow
Anthropic, in its broader effort to connect Claude with new capabilities for life sciences, is integrating more scientific platforms. Together with the Claude for Life Sciences team, the Trillion Gene Atlas aims to make Claude a more productive research partner for scientists and clinicians, and to support organizations dedicated to making new scientific advances publicly accessible.
By combining advanced reasoning capabilities from Claude, the therapeutic design capabilities of EDEN, and NVIDIA's CUDA-X libraries for processing unstructured data, this initiative aims to create an integrated workflow that allows complex clinical data to be interpreted and directly translated into therapeutic designs.
The Trillion Gene Atlas is based on three pillars: large-scale DNA sequencing, global partnerships for data provision, and advanced computational power. Together with AI systems capable of drawing conclusions from complex data, these foundations can help convert extensive datasets into therapeutic discoveries. By markedly increasing the evolutionary data available for AI, Basecamp Research aims to accelerate and systematize drug development, building on EDEN's previous advances in areas such as gene therapy and combating antibiotic-resistant bacteria.