Revolutionizing Breast Cancer Detection: Open-Source 3D Mammography Dataset Launch
In a groundbreaking initiative to combat breast cancer, iMerit, in collaboration with Segmed and Advocate Health, has announced the release of the largest open-source, annotated breast tomography dataset available to date. This initiative aims to propel advancements in artificial intelligence (AI) research, particularly in the domain of early breast cancer detection.
The Dataset
The newly released dataset features imaging studies drawn from 558 female patients and is curated for relevance and accuracy. Its key characteristics are:
- Volume of Data: The release comprises a collection of imaging studies, with additional longitudinal data accessible through Segmed.
- Gold-Standard Modality: All images were produced using Digital Breast Tomosynthesis (DBT), commonly referred to as 3D mammography, which enhances diagnostic accuracy.
- Biopsy-Confirmed Diagnoses: The curated dataset provides balanced representation with ground-truth outcomes: 271 malignant cases (48.5%) and 287 benign cases (51.5%). This curation makes it a more reliable resource for medical professionals.
- Focus on Early Detection: The dataset is tailored for training AI models to identify subtle, early-stage breast cancer findings. The average tumor size is 1.34 cm, and approximately 85% of lesions measure less than 2 cm.
- Demographic Representation: The average patient age is 62 years. The racial composition is 96% White, 1% Black or African American, 1% Asian, and 1% Mixed-race.
- Technical Specifications: Imaging volumes are provided in DICOM format, with accompanying annotations in JSON format that include classifications and lesion coordinates. All data are fully de-identified, adhering to both HIPAA and GDPR standards.
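For researchers planning to work with the JSON annotations, the following is a minimal sketch of how classifications and lesion coordinates might be read and summarized. The announcement does not publish the annotation schema, so every field name here (`study_id`, `classification`, `lesion`, `x`, `y`, `slice`) and the sample records themselves are assumptions for illustration only. The actual files should be checked before adapting it.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass

# Hypothetical annotation layout: these field names and values are
# assumptions, not the dataset's published schema.
SAMPLE_ANNOTATIONS = """
[
  {"study_id": "study-001", "classification": "malignant",
   "lesion": {"x": 512, "y": 840, "slice": 23}},
  {"study_id": "study-002", "classification": "benign",
   "lesion": {"x": 301, "y": 455, "slice": 17}},
  {"study_id": "study-003", "classification": "benign",
   "lesion": {"x": 777, "y": 120, "slice": 41}}
]
"""


@dataclass(frozen=True)
class LesionAnnotation:
    study_id: str
    classification: str  # e.g. "malignant" or "benign"
    x: int               # assumed in-plane pixel coordinate
    y: int               # assumed in-plane pixel coordinate
    slice_index: int     # assumed index of the tomosynthesis slice


def parse_annotations(text: str) -> list[LesionAnnotation]:
    """Convert a JSON array of annotation records into typed objects."""
    return [
        LesionAnnotation(
            study_id=rec["study_id"],
            classification=rec["classification"],
            x=rec["lesion"]["x"],
            y=rec["lesion"]["y"],
            slice_index=rec["lesion"]["slice"],
        )
        for rec in json.loads(text)
    ]


def class_balance(annotations: list[LesionAnnotation]) -> dict[str, float]:
    """Return the fraction of annotations carrying each classification."""
    counts = Counter(a.classification for a in annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    annotations = parse_annotations(SAMPLE_ANNOTATIONS)
    for label, share in sorted(class_balance(annotations).items()):
        print(f"{label}: {share:.1%}")
```

Checking the class balance before training is a quick way to confirm that a local copy matches the malignant and benign split described above. The DICOM volumes themselves would need a separate reader, such as the pydicom library, which this sketch does not cover.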
Importance of Early Detection
Breast cancer continues to be a significant health concern globally, with statistics indicating that 1 in 8 women will receive a diagnosis during their lifetime. In 2026 alone, approximately 321,910 American women are expected to be diagnosed with invasive breast cancer, which underscores the importance of early detection. When breast cancer is diagnosed at an early stage, the five-year survival rate can exceed 99%.
Dr. Sina Bari, VP of Healthcare and Life Sciences AI at iMerit, expressed the organization’s commitment, stating, "We believe that high-quality, responsibly annotated data is the cornerstone for substantial advancements in healthcare AI. Releasing this dataset openly is our way of empowering researchers across the globe to devise tools that support radiologists, improve patient outcomes, and save lives."
Collaboration for Enhanced Women’s Health
The collaborative effort among iMerit, Segmed, and Advocate Health underscores a shared mission to advance women's health by promoting accessible research within the medical community. By making this dataset freely available, the entities aim to remove barriers faced by academic researchers, startups, and established institutions in their pursuit of innovation in breast cancer detection technology.
Accessing the Dataset
The dataset is now available as a free download for registered users: iMerit 3D Mammogram Dataset.
About iMerit
iMerit is a prominent AI data company specializing in the development of advanced machine learning and AI models through its vast network of domain specialists and its innovative software, Ango Hub.
About Segmed
Segmed, Inc. facilitates easy access to a broad array of diverse and high-quality medical imaging studies, catering to biopharmaceutical R&D and AI development.
About Advocate Health
Advocate Health represents one of the largest non-profit integrated health systems in the United States, dedicated to reaching new heights in clinical excellence and pioneering research.