Introduction

Brain cancer is a deadly disease with a 5-year survival rate of only about 30% (www.seer.cancer.gov). According to the Global Cancer Observatory (https://gco.iarc.fr/), there were 308,102 cases of cancers of the brain and the central nervous system (CNS) worldwide as of 2020, of which 139,756 occurred in women and 168,346 in men1. According to the National Brain Tumor Society (https://braintumor.org/brain-tumor-information/understanding-brain-tumors/tumor-types/), there are more than 120 identified types of brain tumors, which are extremely heterogeneous in nature, making this a complex disease to understand and interpret. In spite of the progress made in the treatment of other cancers over the last 20 years, there are still only 5 approved drugs to treat brain tumors, and no prognostic advancements for glioblastoma (GBM) patients have been observed2 (https://braintumor.org/brain-tumor-information/brain-tumor-facts/).

Medical imaging technologies, including magnetic resonance imaging (MRI) and computed tomography (CT) scans, are among the newer technologies increasingly used in translational imaging research3. Due to its complex nature, the brain tissue environment offers a rich opportunity for translational research. MRI can provide a comprehensive view of the abnormal regions in the brain4; therefore, its application in translational brain cancer research is considered essential for the diagnosis, monitoring, and management of the disease3.

In recent years, scientists have been able to integrate the data gleaned from medical images with genomics, and this burgeoning field is called radiogenomics5,6,7. The imaging data are first converted into a quantitative, summarized format through extracted measurements (also known as radiomics) that capture characteristics both visible and sub-visual to the naked eye8. These radiomic features allow further extraction of imaging phenotypes, which can be integrated with genomics data using machine learning (ML) and artificial intelligence (AI) based algorithms. While many clinical trials of new brain cancer treatments are ongoing, there are also many opportunities for the development of novel treatment hypotheses using radiogenomics approaches9.

There are several large-scale national collaborations that utilize either brain cancer data or medical imaging related technologies for translational research, including the Brain Science Foundation (https://www.brainsciencefoundation.org/), the EndBrainCancer Initiative (EBC) (https://endbraincancer.org/end-brain-cancer/), the Children's Brain Tumor Tissue Consortium (CBTTC) (https://www.chop.edu/clinical-trial/cbttc-collection-protocol), the Children's Brain Tumor Network (https://cbtn.org/about-us), The Cancer Imaging Archive (TCIA)10, and more. However, only a handful of national brain cancer projects include both multi-omics data and medical imaging data. These include The Cancer Genome Atlas (TCGA), a large collection of multi-omics data from 22 cancer types including lower-grade gliomas (LGG)11,12 and glioblastomas (GBM)12,13. The imaging data from the TCGA collection, along with imaging data from other studies, are housed in the publicly accessible TCIA imaging data repository (https://www.cancerimagingarchive.net/). The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) provides a cloud-based ecosystem for the access, visualization, and analysis of multi-modal imaging data through its public portal, and allows researchers to connect imaging data to the corresponding genomics and proteomics data within the CRDC collections (https://portal.imaging.datacommons.cancer.gov/).

Another initiative that included both omics data and medical images was the REMBRANDT project (REpository for Molecular BRAin Neoplasia DaTa), a joint initiative of the NCI and the National Institute of Neurological Disorders and Stroke (NINDS). This project comprised a large brain cancer patient-derived dataset containing clinically annotated data generated through the Glioma Molecular Diagnostic Initiative (GMDI) from 874 glioma specimens, comprising 566 gene expression arrays, 834 copy number arrays, and 13,472 clinical phenotype data points. In 2015, the molecular data, including microarray gene expression, copy number, and clinical data, were migrated to the Georgetown Database of Cancer (G-DOC)14,15. This project was managed by our team at Georgetown University; the dataset was made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE10847617. For 130 of the patients in the REMBRANDT collection, pre-surgical multi-sequence magnetic resonance (MR) images were obtained and are hosted at TCIA18 (https://wiki.cancerimagingarchive.net/display/Public/REMBRANDT).

In this paper, we obtained the raw MRI scans from the publicly available REMBRANDT collection and processed them through a well-known image processing pipeline specialized for brain cancer MRI scans. The workflow included automated volumetric segmentation of the MRIs to identify various subregions of the brain, including the necrotic core, edema, non-enhancing tumor (NET), enhancing tumor (ET), gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). A board-certified radiologist then verified and refined the segmented labels, from which radiomic features were extracted. This allowed the MRI scans to be represented in a quantitative format, with the intention of enabling further biomedical and integrative data analyses.

This dataset is being made public in the NeuroImaging Tools & Resources Collaboratory (NITRC) repository (https://www.nitrc.org/projects/rembrandt_brain/)19 to allow researchers to perform radiogenomics-based analyses, integrate the imaging data with gene expression and copy number data, and enable new discoveries and hypotheses. Table 1 shows a summary of the REMBRANDT brain cancer collection.

Table 1 Details of the REMBRANDT brain cancer collection.

Materials and Methods

Data download

We first downloaded the pre-operative raw MRI scans from the TCIA imaging archive10,20 for all 130 patients, including multiple series for each patient, in DICOM file format21. The board-certified radiologist labeled the MRI scans across all modalities in the dataset, which included T1-weighted, T2-weighted, post-contrast T1-weighted (T1-C), and T2 Fluid-Attenuated Inversion Recovery (FLAIR) volumes22.

Data formatting

Some scans had mixed PD and T2 modalities and had to be separated based on the metadata in the DICOM files. Only patients that had MRI data available for all four modalities (T1, T2, T1-C, and FLAIR) were selected for the next step, which resulted in a set of 72 patients. Figure 1 shows an example of the four modalities from the same brain cancer patient.

Fig. 1
figure 1

An example of four modalities (T1-weighted, T2-weighted, post-contrast T1-weighted (T1-C), and FLAIR) from the same brain cancer patient (patient# HF1702).
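To illustrate the kind of metadata-driven separation described above, a minimal Python sketch using the pydicom package is shown below. The directory layout and the reliance on the SeriesInstanceUID, SeriesDescription, and EchoTime tags are assumptions made for this example; they are not a record of the exact script used for the curation.

```python
import pydicom
from pathlib import Path
from collections import defaultdict

def group_slices_by_series(dicom_dir):
    """Group DICOM slices by series so mixed PD/T2 acquisitions can be separated.

    Assumes one directory of .dcm files per patient; the tags used here are
    only illustrative discriminators for modality separation.
    """
    series = defaultdict(list)
    for path in Path(dicom_dir).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        key = (
            str(ds.get("SeriesInstanceUID", "unknown")),
            str(ds.get("SeriesDescription", "")).upper(),
            float(ds.get("EchoTime", 0) or 0),
        )
        series[key].append(path)
    return series

if __name__ == "__main__":
    # "HF1702" is a placeholder patient directory for illustration only.
    for (uid, description, echo_time), slices in group_slices_by_series("HF1702").items():
        # A long echo time typically indicates T2-weighting; a short one PD.
        print(description, echo_time, len(slices), "slices")
```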

We then applied two different pipelines for the processing of these scans, built around two popular brain cancer segmentation tools: (a) the first pipeline used the BraTumIA23 tool (Fig. 2A), and (b) the second pipeline used the GLISTRboost24,25 tool (Fig. 2B). Notably, the GLISTRboost-based pipeline was the top-ranked method in the International Multimodal Brain Tumor Segmentation challenge 2015 (BraTS'15)26; it uses an Expectation-Maximization (EM)27 framework to automatically map the various sub-regions of the brain scans while accounting for brain deformations caused by the tumor through biophysical growth modelling28. The runner-up in this challenge was the BraTumIA tool, which uses a machine learning algorithm23.

Fig. 2
figure 2

(A) Segmentation pipeline using the BraTumIA segmentation tool. (B) Segmentation pipeline using the GLISTRboost segmentation tool.

Brain tumor segmentation using BraTumIA

After the raw data were downloaded and formatted, we had MRI scans from 72 patients with four modalities: T1-weighted, T2-weighted, T1-C, and FLAIR. These images were used as input to the BraTumIA23 tool, which internally performed all processing steps. Skull stripping was performed using the Insight Toolkit (ITK)29 as a first step to generate a brain mask; in the second step, the images were registered, i.e., spatially transformed using ITK, so that the voxels of the various images correspond to one another. The images were then segmented into tumor and healthy tissue using a joint classification-regularization based algorithm. The segmented output labels were saved in the MetaImage (.mha) file format (Fig. 2A).
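For readers who want to work with these label maps programmatically, the short Python sketch below shows one way to load a MetaImage file with SimpleITK, inspect its labels, and convert it to NIfTI. The file names are placeholders, and the snippet is an illustrative sketch rather than part of the BraTumIA pipeline itself.

```python
import SimpleITK as sitk
import numpy as np

# Load a BraTumIA-style MetaImage label map (file name is a placeholder).
labels = sitk.ReadImage("HF1708_segmentation.mha")

# Inspect which integer labels are present (e.g. tumor sub-regions vs. healthy tissue).
label_array = sitk.GetArrayFromImage(labels)
print("labels present:", np.unique(label_array))

# Convert to NIfTI for use with tools that expect .nii.gz input.
sitk.WriteImage(labels, "HF1708_segmentation.nii.gz")
```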

The board-certified radiologist performed verification of the predicted segmented labels. Example segmented labels for a brain cancer patient obtained using the BraTumIA pipeline are shown in Fig. 3.

Fig. 3
figure 3

Segmented labels for a brain cancer patient (patient# HF1708) obtained using the BraTumIA pipeline, shown across all four modalities.

Brain tumor segmentation using GLISTRboost

The raw data were downloaded and cleaned in the same manner as in the previous pipeline to obtain MRI scans from 72 patients with four modalities: T1-weighted, T2-weighted, T1-C, and FLAIR. Then, several pre-processing steps were applied. The MRI scans were first re-oriented so that all the images were transformed into the same Left-Posterior-Superior (LPS) coordinate system (https://www.slicer.org/wiki/Coordinate_systems), a necessary step to be able to compare or integrate data obtained from different modalities. The images were then co-registered to the same T1 anatomic template using "Greedy" (github.com/pyushkevich/greedy)30, a CPU-based C++ implementation of the greedy diffeomorphic registration algorithm31. Greedy is integrated into the ITK-SNAP (itksnap.org) segmentation software32,33, as well as the Cancer Imaging Phenomics Toolkit (CaPTk - www.cbica.upenn.edu/captk)34,35,36,37. After the co-registration, brain extraction (also known as skull-stripping) was performed using the Brain Mask Generator (BrainMaGe)38,39, which is based on a deep learning segmentation architecture (namely U-Net40) and uses a novel framework that introduces the brain's shape as a prior, allowing it to be agnostic to the input MRI sequence. BrainMaGe38,39 was used to remove non-cerebral tissues such as the skull, scalp, and dura from the brain images.
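As a rough illustration of the reorientation step (not the exact commands used in this pipeline), SimpleITK can reorient a volume into the LPS convention in a few lines; the input file name below is a placeholder.

```python
import SimpleITK as sitk

# Read one modality volume (placeholder file name).
image = sitk.ReadImage("HF1538_T1.nii.gz")

# Reorient the volume into the Left-Posterior-Superior (LPS) convention so that
# all modalities share the same anatomical coordinate system before registration.
image_lps = sitk.DICOMOrient(image, "LPS")

sitk.WriteImage(image_lps, "HF1538_T1_LPS.nii.gz")
```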

A step called seeding was then performed by the radiologist. Seeding involved manual tagging of the sub-regions of the brain MRI, including the tumor regions (ET, NET, and edema (ED)) and the healthy regions (white matter, gray matter, CSF, vessels, and cerebellum). The seed points included the center and radius of the tumor, as well as sample seed points in each sub-region of the brain image. This seeding step enabled the segmentation algorithm to accurately model the intensity distribution (mean and variance) for each tissue class, allowing the segmentation tool to perform with higher accuracy than other segmentation tools. This step was performed using the Cancer Imaging Phenomics Toolkit (CaPTk) software platform34,35,36,37. The output of this step included two text files: one with information about the tumor, and another with the sample points in each sub-region. These two files were used as input to the next step in the pipeline.

After these steps were completed, automated volumetric segmentation and registration were performed using GLISTRboost24,25. During the segmentation process, MRI scans from 8 patients had to be filtered out for several reasons, including low quality, very limited coverage, or unreliable results due to irregularities in the input MRI scans. At the end of this pipeline (Fig. 2B), complete segmentation results were successfully obtained for 64 patients. Table 2 shows a summary of the original 130 patients in the REMBRANDT cohort before the start of analysis, and of the 64-patient cohort after completion of the segmentation step.

Table 2 Summary of the patient cohort in the REMBRANDT brain cancer collection.

The output files from this pipeline were in the NIfTI file format (https://nifti.nimh.nih.gov). Figure 4 shows the segmented labels for a brain cancer patient obtained using the GLISTRboost pipeline.

Fig. 4
figure 4

Segmented labels for a brain cancer patient (patient# HF1538) obtained using the GLISTRboost pipeline.
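To give a sense of how these NIfTI label maps can be consumed, the sketch below computes the volume of each segmented sub-region with nibabel. The file name and the assumption that sub-regions are encoded as integer labels are illustrative; the meaning of each label should be taken from the released documentation.

```python
import nibabel as nib
import numpy as np

# Load a GLISTRboost-style NIfTI label map (placeholder file name).
seg = nib.load("HF1538_GLISTRboost_labels.nii.gz")
data = seg.get_fdata()

# Physical volume of one voxel in mm^3, derived from the header spacing.
voxel_volume_mm3 = float(np.prod(seg.header.get_zooms()[:3]))

# Report the volume of every non-background label; which integer corresponds to
# which sub-region (e.g. edema, enhancing tumor) is documented with the release.
for label in np.unique(data):
    if label == 0:
        continue
    n_voxels = int(np.sum(data == label))
    print(f"label {int(label)}: {n_voxels * voxel_volume_mm3 / 1000.0:.1f} mL")
```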

Radiomics analysis

Our board-certified radiologist found that the BraTumIA algorithm was only effective for the segmentation of one tumor type, i.e., GBM, whereas the GLISTRboost pipeline produced more accurate segmented labels for all the brain cancer sub-types in this data collection. For this reason, we chose the segmented labels from the GLISTRboost pipeline for the radiomics analysis.

PyRadiomics41, an open-source Python package, was used to extract radiomic features from the segmented labels of the MRI brain scans. A total of 120 features were extracted, describing various properties related to the medical image pixels, including two- and three-dimensional shape, texture, energy and entropy, size and co-occurrence, gray tone differences, and more41. Table 3 shows a summary of the different classes of features characterized by PyRadiomics42. Supplementary File 1 contains the radiomic features extracted from the REMBRANDT segmented labels produced by the GLISTRboost pipeline.

Table 3 Summary of the types of features represented in the pyradiomics numerical output.
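A minimal sketch of this extraction step with PyRadiomics is shown below; the file names are placeholders, and the default extractor settings and label value shown here may differ from the exact parameters used to produce Supplementary File 1.

```python
import pandas as pd
from radiomics import featureextractor

# Configure a default PyRadiomics extractor; enabling all feature classes here is
# an illustrative choice, not necessarily the configuration used for the release.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()

# Image and segmentation label map in NIfTI format (placeholder file names and label).
features = extractor.execute("HF1538_T1C.nii.gz", "HF1538_GLISTRboost_labels.nii.gz", label=1)

# Keep only the numerical feature values (diagnostic metadata keys are dropped).
values = {k: v for k, v in features.items() if not k.startswith("diagnostics_")}
pd.Series(values).to_csv("HF1538_radiomics.csv")
```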

Applications

Applications for multi-omics analysis

The gene expression and copy number data from this same dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE10847617. The medical imaging data, in the form of segmented labels along with the numerical radiomics output, are now being made public through this publication. This allows researchers to integrate gene expression, copy number, and medical imaging data from the same set of patients. Such multi-omics-based radiogenomics analyses would allow for the research and development of novel biomarkers and treatment hypotheses for precision medicine.

Applications for meta-analysis of brain cancer imaging studies

The GLISTRboost segmentation pipeline used in this paper has previously been applied to the MRI scans from TCGA brain cancer (TCGA-GBM and TCGA-LGG) patients, as demonstrated in the Bakas et al.12 publication. Since the same GLISTRboost segmentation pipeline was applied to both the REMBRANDT and the TCGA brain cancer (TCGA-GBM and TCGA-LGG) collections, they can now be used together for meta-analyses. For instance, the open-source PyRadiomics tool can be run on both datasets to obtain quantitative radiomics output, meaning that the two data collections could be combined in a meta-analysis approach to provide a larger sample size for machine learning and AI applications. We believe this is very valuable and enables further biomedical and integrative data analysis. The PyRadiomics output for the REMBRANDT collection and for the TCGA-GBM and TCGA-LGG collections has been made available through this publication as Supplementary File 1 and Supplementary File 2, respectively.
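As a sketch of how the two supplementary radiomics tables might be pooled for such a meta-analysis, the snippet below uses pandas; the file names and the assumption that both tables share feature columns are illustrative, not guarantees about the released spreadsheets.

```python
import pandas as pd

# Load the released radiomics tables (placeholder file names standing in for
# Supplementary Files 1 and 2; adjust to the actual released names/formats).
rembrandt = pd.read_csv("rembrandt_radiomics.csv")
tcga = pd.read_csv("tcga_gbm_lgg_radiomics.csv")

# Track the cohort of origin so downstream analyses can stratify or test by source.
rembrandt["cohort"] = "REMBRANDT"
tcga["cohort"] = "TCGA"

# Keep only the columns common to both tables before pooling the cohorts.
shared_columns = [c for c in rembrandt.columns if c in tcga.columns]
pooled = pd.concat([rembrandt[shared_columns], tcga[shared_columns]], ignore_index=True)
print(pooled.groupby("cohort").size())
```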

Applications for federated learning approaches in brain cancer imaging studies

Another application is the Federated Tumor Segmentation (FeTS) platform43, which allows training machine learning models by leveraging information gathered from brain cancer datasets residing at collaborating sites without ever exchanging the data. The segmented labels from our REMBRANDT MRI scans are part of this world-wide federation (https://www.fets.ai/) and have enabled very large multi-site machine learning models in an effort to accelerate discovery.
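The snippet below is a generic sketch of the federated-averaging idea behind such platforms, in which each site trains locally and only model parameters are shared; it is not FeTS code and does not reflect that platform's actual API.

```python
import numpy as np

def federated_average(site_weights, site_sample_counts):
    """Combine model parameters from collaborating sites without sharing images.

    site_weights: list of per-site parameter vectors (same shape at every site).
    site_sample_counts: number of local training cases at each site, used to
    weight the average. This is a toy illustration of federated averaging,
    not the FeTS implementation.
    """
    total = float(sum(site_sample_counts))
    stacked = np.stack(site_weights)
    weights = np.array(site_sample_counts, dtype=float) / total
    return np.tensordot(weights, stacked, axes=1)

# Toy example: three sites with different cohort sizes contribute parameters.
global_params = federated_average(
    [np.random.rand(10), np.random.rand(10), np.random.rand(10)],
    site_sample_counts=[64, 135, 90],
)
print(global_params.shape)
```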

Summary

In this publication, we took the raw MRI scans from the publicly available REMBRANDT data collection and performed volumetric segmentation to identify various subregions of the brain. Radiomic features were then extracted to represent the MRI scans in a numerical format. The gene expression and copy number data from the same REMBRANDT dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE10847617. This dataset now enables researchers to advance translational research using not only the medical imaging data, but also the genomics and clinical data in conjunction with it.

We believe that making this dataset available via a public repository provides a unique opportunity to the biomedical and data science research communities. Such combined datasets provide researchers with a unique opportunity to conduct integrative analyses of quantitative data from medical images, gene expression, and copy number changes, alongside clinical outcomes (overall survival), in this large brain cancer study.

Technical Validation - Radiologist Manual Verification

Our board-certified radiologist confirmed that the BraTumIA algorithm was only effective for the segmentation of one tumor type, GBM. This limitation is mentioned in the BraTumIA manual (https://www.nitrc.org/projects/bratumia) and is due to the fact that tumor morphology differs considerably across cancer subtypes; hence the tool worked well only for GBM patients.

The radiologist found that the GLISTRboost algorithm was more effective for the segmentation of the various sub-types of brain cancers in this dataset: astrocytoma, oligodendroglioma, and GBM. Manual verification and correction were performed on the segmented label output files. By using an additional manual seeding step, which provided sample sub-regions as a reference for the algorithm, the GLISTRboost pipeline was able to overcome the morphological and other differences among the various brain cancer sub-types in this dataset.

This verification and these corrections were performed using the MITK44 MRI viewer software (https://www.mitk.org/). Figure 5 shows an example of how the manual verification was performed.

Fig. 5
figure 5

Illustration of how the radiologist performed manual verification, using patient# HF1538 as an example.

Data Records

We first downloaded the pre-operative raw MRI scans from the TCIA imaging archive for 130 patients. After cleaning, MRI scans from 72 patients with complete data for the four modalities were chosen for further processing. Two well-known brain cancer segmentation pipelines were applied to the cleaned dataset: BraTumIA23 and GLISTRboost24. The GLISTRboost24 algorithm was the top-ranked method in the International Multimodal Brain Tumor Segmentation challenge 2015 (BraTS'15), and the BraTumIA23 algorithm was the runner-up. After running both pipelines, we found that the BraTumIA23 tool was only effective for the segmentation of one tumor type, GBM, whereas the GLISTRboost24 pipeline was more effective for the segmentation of the various sub-types of brain cancers in this dataset: astrocytoma, oligodendroglioma, and GBM.

The segmented labels from the GLISTRboost24 pipeline, along with the manual corrections performed by the radiologist, have been made publicly available through the NeuroImaging Tools & Resources Collaboratory (NITRC) repository19. The gene expression and copy number data from this same dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE10847617. Table 1 shows a high-level summary of the REMBRANDT brain cancer collection.

Usage Notes

The Madhavan et al.45 publication that originally described the REMBRANDT portal and dataset has enabled numerous analyses and has been cited 366 times so far (as of January 2022). The gene expression and copy number data from the REMBRANDT dataset were made public in 2018 through the publication by Gusev et al.16, and the data were made available via the NCBI Gene Expression Omnibus (GEO) data repository under accession GSE10847617, which has been cited 69 times so far (as of January 2022).

In this publication, we took the raw MRI scans from the REMBRANDT data collection and performed volumetric segmentation to identify various subregions of the brain. Radiomic features were then extracted to represent the MRI scans in a quantitative format. This dataset now enables researchers to integrate gene expression, copy number, and medical imaging data from the same set of patients. Such multi-omics-based radiogenomics analyses would allow for the research and development of novel biomarkers and treatment hypotheses for precision medicine.

The GLISTRboost segmentation pipeline applied in this manuscript was previously applied to the MRI scans from TCGA brain cancer (TCGA-GBM and TCGA-LGG) patients in the Bakas et al.12 publication. Since the imaging data from both the REMBRANDT and the TCGA brain cancer collections were processed with the same segmentation pipeline, the two datasets can now be used in conjunction in a meta-analysis study. For example, the TCGA brain cancer dataset could be used as a training set and the REMBRANDT dataset as an independent test set. As another example, the open-source radiomics tool PyRadiomics can be applied to both datasets to obtain quantitative radiomics output. Such a meta-analysis approach can provide a larger sample size for machine learning and AI applications, which we believe is very valuable and enables further biomedical and integrative data analysis. The PyRadiomics output for the REMBRANDT collection and for the TCGA-GBM and TCGA-LGG collections has been made available through this publication as Supplementary File 1 and Supplementary File 2, respectively.
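A hedged sketch of such a train/test design is shown below; the pre-assembled feature tables, the "grade" label column, the feature-name prefix, and the model choice are all illustrative assumptions rather than a prescribed analysis.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

# Hypothetical pre-assembled tables: radiomic features plus a tumor-grade label.
train = pd.read_csv("tcga_radiomics_with_grade.csv")      # training set (TCGA-GBM/LGG)
test = pd.read_csv("rembrandt_radiomics_with_grade.csv")  # independent test set (REMBRANDT)

# Assume PyRadiomics feature columns share the "original_" prefix.
feature_columns = [c for c in train.columns if c.startswith("original_")]

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(train[feature_columns], train["grade"])

predictions = model.predict(test[feature_columns])
print("balanced accuracy on REMBRANDT:", balanced_accuracy_score(test["grade"], predictions))
```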

Another application is the Federated Tumor Segmentation (FeTS) platform43, which allows training machine learning models by leveraging information gathered from brain cancer datasets residing at collaborating sites without ever exchanging the data. The segmented labels from our REMBRANDT MRI scans are part of this world-wide federation (https://www.fets.ai/). Such a federated approach has enabled very large multi-site machine learning models in an effort to accelerate discovery and build new, advanced models.

In summary, we believe that making this dataset available via a public repository provides a unique opportunity to the biomedical and data science research communities. Such combined datasets provide researchers with a unique opportunity to conduct integrative analyses of numerical data from medical images, gene expression, and copy number changes, alongside clinical outcomes (overall survival), in this large brain cancer study.

Data Privacy

The segmented medical images generated in this manuscript and made public via NITRC are skull stripped and hence do not contain identifiable information.