Collecting detailed genomics and clinical data is crucial for developing precision diagnostics, therapeutics, and protocols for cancer treatment. Credit: Nicolas/E+/Getty Images

Clinical and scientific research institutions across India are collaborating through the Indian Cancer Genome Atlas (ICGA) to map the country's diverse population of cancer patients, starting with breast cancer.

Recognizing cancer as a genetic disease and understanding its unique characteristics in the Indian context is crucial for developing precision diagnostics, therapeutics, and protocols. This requires collecting detailed genomic and clinical data. The atlas has established a national consortium that collects and generates detailed genomic data linked to clinical data, forming a foundation for a comprehensive understanding of cancers in India.

Cancer is a growing public health concern in India, with one in nine people facing the risk of a cancer diagnosis, leading to 1.5 million new cases and 800,000 deaths each year. This burden is expected to increase by 12.8% by 2025. Challenges such as limited cancer awareness, delayed diagnosis, and the reduced effectiveness of standard Western treatment protocols add to the complexity.

The atlas collects biological and clinical data ethically from consenting patients. Advanced next-generation multi-omics technologies are used to comprehensively characterize the genomic, transcriptomic, epigenetic, and proteomic features of the disease. The data is then correlated with the patients’ clinical histories, including diagnosis, treatment, and response to therapy. To safeguard patient privacy, the data is de-identified and securely stored. Authorized users can then access the data for research purposes. The scale and complexity of the data necessitate collaboration among cross-disciplinary teams of clinicians, bioinformaticians, data scientists, and engineers to derive meaning and facilitate discoveries.
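As a rough illustration of the de-identification step described above, the sketch below drops direct identifiers from a record and replaces the patient ID with a keyed hash. It is not the ICGA's actual pipeline: the field names, the secret key, and the functions `pseudonymize_id` and `deidentify` are all hypothetical, and keyed hashing (HMAC-SHA256) is only one common pseudonymization technique.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be held securely by a data custodian.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymize_id(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of the patient identifier."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize_id(record["patient_id"], key)
    return cleaned


# Made-up example record; none of these values come from the atlas.
record = {
    "patient_id": "HOSP-0001",
    "name": "Example Patient",
    "phone": "0000000000",
    "diagnosis": "breast cancer",
    "age_at_diagnosis": 38,
}
deidentified = deidentify(record)
```

Because the hash is keyed and deterministic, the same patient always maps to the same pseudonym, so genomic and clinical records can still be linked without exposing the original identifier.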

The initial focus of the ICGA is on breast cancer, the most prevalent cancer among Indian women. Notably, Indian women present with aggressive and late-stage disease at younger ages (in their thirties or forties), resulting in lower survival rates (< 60%) compared to their counterparts in Western countries (> 80%).

So far, the ICGA has collected and sequenced genomic data from more than 200 women, and it plans to scale that number up to 1,000. Integrating genetic and clinical data with family history, environmental factors, and socio-cultural elements is the first step in understanding breast cancer, its treatment, and its progression in India.

The atlas will extend its scope to more cancers that affect Indian populations, including pediatric cancers.

High-quality big data for research

India’s 1.3 billion people belong to diverse ethnicities, cultures, and regions, which results in significant genomic variation among these groups. Clinical research has barely explored this variation, as Indian genomes account for a mere 0.2% of global genetic databases. The data generated by the ICGA will help address the under-representation of Indians and, by enabling research, influence clinical practice.

The first cohort of 1,000 breast cancer patients is expected to generate 5 petabytes of data, underscoring the importance of creating high-quality data to enable impactful research. The ethically collected data goes through a standardized workflow for biobanking and sample sequencing. The ICGA uses standard data models for clinical data harmonization and bioinformatics analyses. As it expands to other cancers, the atlas will have to handle increasing volumes of complex data. Efforts are underway to construct a robust, scalable data and compute infrastructure to support this growth.
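To put the figures above in perspective, a back-of-the-envelope calculation gives the average data volume per patient and a naive projection for larger cohorts. The decimal unit convention (1 PB = 1,000 TB), the assumption that volume scales linearly with patient count, and the 10,000-patient cohort are illustrative assumptions, not ICGA estimates.

```python
PB_IN_TB = 1000  # assumption: decimal units, 1 PB = 1,000 TB

cohort_size = 1000     # patients in the first cohort (from the article)
cohort_volume_pb = 5   # expected data volume in petabytes (from the article)

# Average volume per patient, in terabytes.
per_patient_tb = cohort_volume_pb * PB_IN_TB / cohort_size


def projected_volume_pb(patients: int, tb_per_patient: float = per_patient_tb) -> float:
    """Naive linear projection of total volume (PB) for a cohort of a given size."""
    return patients * tb_per_patient / PB_IN_TB


print(f"Per patient: {per_patient_tb} TB")
# Hypothetical cohort size, chosen only for illustration.
print(f"10,000 patients: {projected_volume_pb(10_000)} PB")
```

On these assumptions, each patient accounts for roughly 5 TB of data, so every tenfold increase in cohort size implies a tenfold increase in storage.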

Building this national data consortium comes with challenges. It will need champions in government and within health systems who commit to this effort as a valuable investment in the future of cancer care in India. It will also need the support of patients and their families.

The success of this atlas stands to impact 20% of all humanity.