Basecamp Research Unveils the Trillion Gene Atlas to Revolutionize AI-Driven Therapeutics

Basecamp Research Launches the Trillion Gene Atlas



Basecamp Research has announced plans to build the Trillion Gene Atlas, a project intended to change how drugs are developed with artificial intelligence (AI). The project aims to expand known genetic diversity one hundredfold by collecting genomic data from more than 100 million new species at thousands of locations worldwide.

Working with Anthropic, Ultima Genomics, and PacBio, and using AI infrastructure provided by NVIDIA, Basecamp Research aims to compress more than two decades of data collection and analysis into two years. The initiative was presented at SXSW in Austin, Texas, and at the NVIDIA GTC conference in San José, California.

Expanding the Genetic Database


The Trillion Gene Atlas is comparable in significance to the Human Genome Project, focusing on generating and modeling biological data at an unprecedented scale. Glen Gowers, co-founder and CEO of Basecamp Research, emphasized that current AI biological models train on a limited sample of Earth’s biodiversity. He stated, "The Trillion Gene Atlas greatly expands the known genetic universe, enabling a new paradigm for programmable therapeutic design."

The initiative aims to address the shortage of biological data, a bottleneck that slows progress in AI-driven drug development. Currently, most foundation models rely on variants of a few public repositories, and around 80% are trained on databases containing fewer than 250 million genetic sequences. Basecamp's EDEN foundation models, launched in January this year, break through this evolutionary data barrier by using BaseData™, an in-house genomic database that is more than ten times larger than all existing public resources combined.

The Power of Large Datasets


The EDEN model learned from 10 billion new genes across 1 million newly discovered species, revealing new scaling laws for AI in biology. As a result, the model can design diverse therapies directly from a description of the disease. In laboratory validation, it proved effective without requiring human or clinical data, with a 97% success rate against priority pathogens through AI-programmable gene insertion.

Phil Lorenz, CTO of Basecamp Research, pointed out that simply having larger models isn’t enough. He explained, "EDEN demonstrated that biological AI performance follows much steeper scaling trajectories with higher quality, fully contextualized data. The Trillion Gene Atlas extends this principle by one hundredfold."

Global Collaborations for Biodiversity


Over the past six years, Basecamp Research has built a network of scientific collaborators spanning 31 countries, creating a scalable platform designed specifically for training AI on genomic evolution. The company combines new regulatory frameworks with autonomous DNA sequencing technologies to gather high-quality genomic data from ecosystems that are typically inaccessible to traditional labs.

These collaborations focus on knowledge sharing, local capacity development, and equitable access agreements, aligning with new regulations for digital sequence data. This framework supports responsible, large-scale, high-quality genomic data collection while investing in scientific infrastructure and training in partner regions.

The launch of the Atlas includes new collaborations in Chile and Argentina, along with an expanded effort in Antarctica, further extending Basecamp’s biodiversity network.

Harnessing Advanced Sequencing Technologies


The Trillion Gene Atlas depends on advances in ultra-high-throughput short- and long-read sequencing, made available through partnerships with Ultima Genomics and PacBio. Ultima’s latest UG200 sequencing system is designed for high-performance whole-genome sequencing, supporting projects like the Trillion Gene Atlas affordably at scale.

Gilad Almogy, founder and CEO of Ultima Genomics, noted, "The biology sector has faced a fundamental data scarcity compared to other fields like language or computer vision, mainly due to researchers lacking the necessary tools for large-scale data generation." He expressed excitement about how Ultima's technology can help Basecamp realize its vision.

Meanwhile, PacBio’s HiFi sequencing provides long, accurate reads that maintain complete genomic context, allowing for precise resolution at a subspecies level in complex samples. Christian Henry, the president and CEO of PacBio, mentioned, "HiFi data lays the reliable foundation needed for biological AI models to learn from nature on a grand scale."

Accelerated Data Processing


At the core of this effort is NVIDIA's accelerated computing infrastructure, which can process genetic data at petabase scale. Basecamp plans to use NVIDIA Parabricks to significantly accelerate metagenomic assembly. The goal is to compress a task that would previously have taken more than 20 years into less than two, improving the performance and scope of foundation models for therapeutic development.

By processing data in parallel, automating annotation, and training models at large scale, Basecamp aims to speed up the whole pipeline from sequence assembly to model training and so extend the possibilities of drug development.

Integrating Therapeutic Design with Anthropic


Basecamp Research is also collaborating with Anthropic on life sciences capabilities, with the aim of integrating Anthropic's Claude AI with more scientific platforms. The goal is to make Claude a more productive tool for scientists and clinicians by connecting the Trillion Gene Atlas with advanced therapeutic design mechanisms.

In summary, the Trillion Gene Atlas is intended to change drug design and development and to deepen understanding of biological data through collaboration among sequencing, computing, and AI partners.
