A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms

Liew, Sook-Lei; Lo, Bethany P.; Donnelly, Miranda R.; Zavaliangos-Petropulu, Artemis; Jeong, Jessica N.; Barisano, Giuseppe; Hutton, Alexandre; Simon, Julia P.; Juliano, Julia M.; Suri, Anisha; Wang, Zhizhuo; Abdullah, Aisha; Kim, Jun; Ard, Tyler; Banaj, Nerisa; Borich, Michael R.; Boyd, Lara A.; Brodtmann, Amy; Buetefisch, Cathrin M.; Cao, Lei; Cassidy, Jessica M.; Ciullo, Valentina; Conforto, Adriana B.; Cramer, Steven C.; Dacosta-Aguayo, Rosalia; de la Rosa, Ezequiel; Domin, Martin; Dula, Adrienne N.; Feng, Wuwei; Franco, Alexandre R.; Geranmayeh, Fatemeh; Gramfort, Alexandre; Gregory, Chris M.; Hanlon, Colleen A.; Hordacre, Brenton G.; Kautz, Steven A.; Khlif, Mohamed Salah; Kim, Hosung; Kirschke, Jan S.; Liu, Jingchun; Lotze, Martin; MacIntosh, Bradley J.; Mataró, Maria; Mohamed, Feroze B.; Nordvik, Jan E.; Park, Gilsoon; Pienta, Amy; Piras, Fabrizio; Redman, Shane M.; Revill, Kate P.; Reyes, Mauricio; Robertson, Andrew D.; Seo, Na Jin; Soekadar, Surjo R.; Spalletta, Gianfranco; Sweet, Alison; Telenczuk, Maria; Thielman, Gregory; Westlye, Lars T.; Winstein, Carolee J.; Wittenberg, George F.; Wong, Kristin A.; Yu, Chunshui

doi:10.1038/s41597-022-01401-7

Download PDF

Data Descriptor
Open access
Published: 16 June 2022

A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms

Sook-Lei Liew ORCID: orcid.org/0000-0001-5935-4215^1,2^na1,
Bethany P. Lo¹^na1,
Miranda R. Donnelly¹,
Artemis Zavaliangos-Petropulu²,
Jessica N. Jeong¹,
Giuseppe Barisano ORCID: orcid.org/0000-0001-5598-1369^3,4,
Alexandre Hutton¹,
Julia P. Simon²,
Julia M. Juliano⁴,
Anisha Suri⁵,
Zhizhuo Wang¹,
Aisha Abdullah¹,
Jun Kim²,
Tyler Ard ORCID: orcid.org/0000-0002-5123-1226²,
Nerisa Banaj⁶,
Michael R. Borich⁷,
Lara A. Boyd ORCID: orcid.org/0000-0002-2828-4549⁸,
Amy Brodtmann⁹,
Cathrin M. Buetefisch ORCID: orcid.org/0000-0001-6473-7800^7,10,
Lei Cao¹¹,
Jessica M. Cassidy ORCID: orcid.org/0000-0003-3469-0399¹²,
Valentina Ciullo⁶,
Adriana B. Conforto^13,14,
Steven C. Cramer¹⁵,
Rosalia Dacosta-Aguayo¹⁶,
Ezequiel de la Rosa^17,18,
Martin Domin ORCID: orcid.org/0000-0002-8620-966X¹⁹,
Adrienne N. Dula²⁰,
Wuwei Feng²¹,
Alexandre R. Franco^11,22,23,
Fatemeh Geranmayeh²⁴,
Alexandre Gramfort ORCID: orcid.org/0000-0001-9791-4404²⁵,
Chris M. Gregory²⁶,
Colleen A. Hanlon ORCID: orcid.org/0000-0002-3151-0279²⁷,
Brenton G. Hordacre²⁸,
Steven A. Kautz ORCID: orcid.org/0000-0003-3151-8235^26,29,
Mohamed Salah Khlif³⁰,
Hosung Kim²,
Jan S. Kirschke ORCID: orcid.org/0000-0002-7557-0003³¹,
Jingchun Liu³²,
Martin Lotze ORCID: orcid.org/0000-0003-4519-4956¹⁹,
Bradley J. MacIntosh^33,34,
Maria Mataró^35,36,
Feroze B. Mohamed³⁷,
Jan E. Nordvik^38,39,
Gilsoon Park³,
Amy Pienta ORCID: orcid.org/0000-0003-1174-6118⁴⁰,
Fabrizio Piras ORCID: orcid.org/0000-0003-3566-5494⁶,
Shane M. Redman⁴⁰,
Kate P. Revill⁴¹,
Mauricio Reyes⁴²,
Andrew D. Robertson ORCID: orcid.org/0000-0002-9095-9877^43,44,
Na Jin Seo^26,29,45,
Surjo R. Soekadar⁴⁶,
Gianfranco Spalletta^6,47,
Alison Sweet⁴⁰,
Maria Telenczuk ORCID: orcid.org/0000-0003-1311-1634²⁵,
Gregory Thielman⁴⁸,
Lars T. Westlye ORCID: orcid.org/0000-0001-8644-956X^49,50,
Carolee J. Winstein^51,52,
George F. Wittenberg ORCID: orcid.org/0000-0001-5603-1197^53,54,
Kristin A. Wong⁵⁵ &
…
Chunshui Yu^32,56

Scientific Data volume 9, Article number: 320 (2022) Cite this article

9720 Accesses
30 Citations
42 Altmetric
Metrics details

Subjects

Abstract

Accurate lesion segmentation is critical in stroke rehabilitation research for the quantification of lesion burden and accurate image processing. Current automated lesion segmentation methods for T1-weighted (T1w) MRIs, commonly used in stroke research, lack accuracy and reliability. Manual segmentation remains the gold standard, but it is time-consuming, subjective, and requires neuroanatomical expertise. We previously released an open-source dataset of stroke T1w MRIs and manually-segmented lesion masks (ATLAS v1.2, N = 304) to encourage the development of better algorithms. However, many methods developed with ATLAS v1.2 report low accuracy, are not publicly accessible or are improperly validated, limiting their utility to the field. Here we present ATLAS v2.0 (N = 1271), a larger dataset of T1w MRIs and manually segmented lesion masks that includes training (n = 655), test (hidden masks, n = 300), and generalizability (hidden MRIs and masks, n = 316) datasets. Algorithm development using this larger sample should lead to more robust solutions; the hidden datasets allow for unbiased performance evaluation via segmentation challenges. We anticipate that ATLAS v2.0 will lead to improved algorithms, facilitating large-scale stroke research.

Measurement(s)	stroke lesion
Technology Type(s)	manual segmentation in ITK-SNAP
Sample Characteristic - Organism	Homo sapiens
Sample Characteristic - Environment	brain

Segment anything in medical images

Article Open access 22 January 2024

Jun Ma, Yuting He, … Bo Wang

brainlife.io: a decentralized and open-source cloud platform to support neuroscience research

Article Open access 11 April 2024

Soichi Hayashi, Bradley A. Caron, … Franco Pestilli

Principal component analysis

Article 22 December 2022

Michael Greenacre, Patrick J. F. Groenen, … Elena Tuzhilina

Background & Summary

Large neuroimaging datasets are increasingly being used to identify novel brain-behavior relationships in stroke rehabilitation research^1,2. Lesion location and lesion overlap with extant brain structures and networks of interest are consistently reported as key predictors of stroke outcomes^3,4,5,6. However, in order to examine these measures in large datasets, accurate automated methods for detecting and delineating stroke lesions are needed. Two critical barriers limiting accurate automated segmentation in rehabilitation research are the variability in post-stroke neuroanatomy across patients and the limited amount of diverse data with which to train and test segmentation algorithms.

In acute stroke, large clinical neuroimaging datasets have led to improvements in segmentation algorithms for clinical MRI protocols (e.g., diffusion weighted imaging, FLAIR, or T2-weighted MRI)^7,8,9. However, MRIs are not routinely collected as part of stroke rehabilitation clinical care, which usually commences at subacute or chronic stages. To obtain neuroimaging data at this stage, rehabilitation researchers often recruit people with stroke to participate in research studies, requiring significant time, funding effort and cost to generate even small datasets. In addition, high-resolution T1-weighted (T1w) MRIs are typically used at this stage to identify and delineate lesioned tissue, as T1w MRI provides excellent spatial resolution and is required for registering other research imaging data, such as functional MRI and diffusion MRI. Although other imaging modalities, such as T2-weighted MRI or FLAIR imaging, would be helpful for identifying additional white matter abnormalities, they are often not routinely collected. This is due to limited scanning time, which is allocated for MR sequences directly related to the researcher’s hypotheses. However, because lesions are often more challenging to identify at this later stage, and T1w single-channel imaging is incompatible with most multispectral tools developed for acute clinical imaging, there are few options for automated lesion segmentation. Of the existing automated lesion segmentation tools for single-channel, T1w MRI data, most are not highly accurate or reliable¹⁰ and require significant manual effort for quality control and correction¹. Due to these challenges, manual lesion segmentation remains the gold standard in stroke rehabilitation research, but it is inefficient, subjective, and limits large-scale stroke rehabilitation research.

Machine learning, and in particular, deep learning algorithms, have been applied to address this problem, but they require large, diverse training datasets to create generalizable models that can perform well on new data. To this end, we previously released a public dataset of 304 stroke T1w MRIs and manually segmented lesion masks called the Anatomical Tracings of Lesions After Stroke (ATLAS) v1.2 dataset¹¹. ATLAS is the largest dataset of its kind and intended to be a resource for the scientific community to develop more accurate lesion segmentation algorithms. It is also meant to be used as a standardized benchmark with which to compare the performance of different segmentation methods¹⁰. The data are derived from diverse, multi-site data from 11 research cohorts worldwide and harmonized by the ENIGMA Stroke Recovery working group¹. ATLAS v1.2 has been accessed and cited widely since its release in 2018, with reports including the improved performance of stroke lesion segmentation algorithms using novel methods, particularly deep learning and convolutional neural networks (e.g.^{12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28}).

The reach of the ATLAS v1.2 dataset has also extended beyond stroke lesion segmentation. It has also been used as a key example of a large, public neuroimaging dataset²⁹, to provide published guidelines on how to perform lesion segmentation³⁰, to evaluate the performance of different hippocampal segmentation methods in stroke³¹, to test other non-stroke automated methods, such as anomaly³² and asymmetry detection³³, and as inspiration for future AI programs and large public datasets³⁴, among other uses. It is a valuable educational resource and has been used as a teaching resource in courses on machine learning and computer vision as well as for student thesis projects. It has been cited by over 75 publications and downloaded over 1800 times from over 30 countries in the past several years since its release, demonstrating its significant global impact on the scientific and academic community.

However, while ATLAS v1.2 spurred the development of many new automated lesion segmentation methods (Table 1), there are still no publicly available automated methods that have reported performance reliable enough to be used for research. Although no published standards exist, in our own research we estimate that a minimum Dice coefficient, or measure of overlap between the true lesion and the predicted lesion mask³⁵, of greater than 0.85 needs to be reached before a method can be declared sufficiently reliable to replace manual segmentation. In 2018, we used the ATLAS v1.2 dataset as a benchmark to evaluate publicly available automated lesion segmentation methods using T1w MRIs, but the best performing method (Lesion Identification with Neighborhood Data Analysis, or LINDA)³⁶ only had an average Dice coefficient of 0.5 on ATLAS v1.2¹⁰. Similarly, all of the more recently published methods that were trained and tested on ATLAS v1.2 report an average Dice coefficient under 0.7 (see Table 1 for details). In addition, because ATLAS v1.2 is a completely public dataset, without a partitioned test dataset, it is possible for researchers to overfit their model, not perform proper validation, or incorrectly calculate the Dice coefficient. This can lead to artificially inflated performance metrics. ATLAS v1.2 did not contain separate test data, which is necessary to reliably evaluate algorithm performance and generalizability to new data. Finally, of the 17 different methods published using ATLAS v1.2, 12 papers did not report publicly available code, limiting their utility to the scientific community.

Table 1 Published Methods for Automated Lesion Segmentation Using ATLAS v1.2.

Full size table

To address the above-mentioned concerns, we created ATLAS v2.0, which expands upon and replaces ATLAS v1.2. ATLAS v2.0 contains 1271 T1w MRIs with manually segmented lesion masks from 44 different research cohorts across 11 countries worldwide (including ATLAS v1.2 data, which are denoted in the accompanying metadata).

ATLAS v2.0 improves on ATLAS v1.2 in several ways. First, it contains more than four times as much data as ATLAS v1.2 and from more diverse cohorts, providing a bigger dataset for training and testing. Second, ATLAS v2.0 provides a single lesion mask file that encompasses all detected lesions, instead of having separate files per lesion, which previous users reported as being cumbersome in ATLAS v1.2. Third, ATLAS v2.0 fixes minor errors and issues with registration and orientation noted in previous ATLAS releases. Finally, and most importantly, ATLAS v2.0 is split into three parts: (1) a training dataset, which is comprised of 655 publicly released T1w MRIs and lesion masks, (2) a test dataset, which is comprised of 300 publicly released T1w MRIs with hidden lesion masks, and (3) a generalizability dataset, which is comprised of 316 completely hidden T1w MRIs and lesion masks from separate cohorts. The hidden data is available only for testing algorithm performance in lesion segmentation challenges and competitions (see Lesion Segmentation Challenges). Notably, the training and test set contain similar distributions of data, such that an algorithm trained on the training set should perform well on the test set. However, the generalizability dataset of 316 cases (T1w MRI and lesion masks) are from completely new cohorts, and none of this data is publicly released in order to evaluate the generalizability of algorithm performance on completely unseen data. In these ways, we aim to reduce the risk of research groups overfitting their data and reporting inflated algorithm performance, with an overall goal of improving the state of the field. We also strongly encourage lesion segmentation challenges to require public sharing of submitted methods to facilitate greater scientific dissemination. In the current paper, we describe the ATLAS v2.0 dataset, along with several lesion segmentation challenge platforms that aim to utilize this dataset.

Methods

Data overview

Similar to our previous ATLAS v1.2 release, the ATLAS v2.0 dataset was aggregated from 44 research cohorts collected for various research purposes, with specific eligibility criteria, and therefore may not be representative of the general population of all patients with stroke. The general purposes of the research cohorts involved were to understand brain-behavior relationships between brain measures, functional outcomes (e.g., sensorimotor impairment, cognitive impairment, mood), and/or response to different therapies after stroke. In the case of intervention or observational studies with longitudinal data, only the first timepoint was included in ATLAS v2.0 (see also Data Records). The data range from acute (within the first 24 hours after stroke) to chronic (more than 180 days after stroke); the time of MRI acquisition in relation to stroke onset is included in the metadata. The data are derived from studies that were approved by their local ethics committee and were conducted in accordance with the 1964 Declaration of Helsinki. Informed consent was obtained from all subjects. The ethics committee at the receiving site (the University of Southern California) approved the receipt and sharing of the de-identified data, which do not contain any personal identifiers.

For each subject file, we first performed quality control of the image. Images were excluded if large motion artifacts or other disruptions made it difficult to identify the lesion. Next, brain lesions were identified, and masks were manually drawn in native space. Our team identified and traced lesions using ITK-SNAP^37,38 (version 3.8.0; Fig. 1; see lesion segmentation details below). After tracing, we reviewed and edited lesion masks as necessary using a standardized quality control protocol. In a subset of the data, lesion masks were received from the originating site and edited and checked for quality by our team. All team members received lesion-tracing training and followed a standard operating protocol for tracing lesions to ensure consistency across tracers¹¹. All lesion masks were checked for quality by two separate trained team members. During the quality control process, we ensured that the boundaries of the lesion segmentation were accurate and that all identifiable lesions in the brain were traced.

ATLAS v2.0 includes all the same subjects as v1.2, with the removal of repeated subjects that had two timepoints (n = 9) so that in ATLAS v2.0, each subject is only represented once. All subject files have undergone a lesion tracing and preprocessing pipeline (Fig. 2) and are named and stored in accordance with the Brain Imaging Data Structure (BIDS) (http://bids.neuroimaging.io/)³⁹. Metadata on scanner information, sample image headers for each cohort, and lesion information for each subject in the training dataset is included in the Supplementary Materials. The metadata also includes time of MRI acquisition in relation to stroke onset in days, where this data was available. However, subject demographic information, such as age, sex, or other clinical outcome measures, is not shared due to privacy concerns and data sharing policies at many of the contributing sites. We acknowledge that this information would greatly enhance the utility of this dataset; we aim to be able to include this information for at least a subset of data, where allowed, in future releases.

Of the 1271 samples, data from 955 samples were randomly split into public training and hidden test datasets across sites, so that the testing set includes a similar multi-site composition as the training set. As mentioned previously, lesion challenges will also have access to 316 samples from new cohorts in order to test the true generalizability of algorithms to completely unseen data. Finally, any previously released data used as part of ATLAS v1.2 was kept as part of the public training dataset to prevent contamination of the test dataset.

Data characteristics

The T1w MRI data were collected on 1.5-Tesla and 3-Tesla MR scanners. All data are high-resolution (e.g., 1 mm³ or higher), with the exception of four cohorts who have at least one dimension with a resolution between 1–2 mm³ (R027, R047, R049, R050). Each cohort was collected on a single scanner using the same parameters except for 2 cohorts (R027, R049). In these cases, the metadata includes an example of each scanning parameter.

The entire dataset (N = 1271) is derived from 44 research cohorts in total. The training (n = 655) and test (n = 300) datasets are derived from the same 33 research cohorts; samples from each cohort are randomly assigned to either training or test datasets so that they will have similar compositions. Thus, algorithms trained on the training dataset should perform well on the test dataset. The generalizability dataset (n = 316) is derived from 11 new cohorts to test the performance of trained algorithms on completely unseen data.

During the review process for each lesion mask, metadata on number of lesions and lesion location (left vs. right hemisphere, cortical vs. subcortical) was manually recorded by a trained team member. This detailed information for each subject can be helpful for sorting the data into subgroups with different lesion characteristics. In the training dataset (n = 655), 61.9% of subjects had only a single lesion, and 38.1% had multiple lesions. Of the total subjects with multiple lesions, 7.2% had multiple lesions contained in either the left or right hemispheres only (noted as “Unilateral”), 18.5% had multiple lesions in both hemispheres (noted as “Bilateral”) and 12.4% had multiple lesions with at least one lesion in either the cerebellum or brainstem (noted as “Other”) (Table 2). Lesions were counted as separate and additional if they were non-contiguous with any other lesion. Lesions were nearly equally distributed between left and right hemispheres, with 57.1% of subjects exhibiting at least one left hemisphere lesion, 58.8% exhibiting one right hemisphere lesion, and 22.9% with one lesion in either the cerebellum or brainstem (noted as “Other”). Lesions were also documented as either subcortical, cortical and white matter, or other. Consistent with the criteria used for ATLAS v1.2, lesions defined as subcortical were contained completely within the white matter and subcortical structures. Lesions defined as cortical and/or white matter indicate that the lesions extend into the cortex; these lesions often also include white matter and/or subcortical structures. Finally, the “Other” category encompasses lesions falling in the brainstem or cerebellum. Among all lesions in the training dataset, 25.5% were cortical and/or in the white matter, 59.7% subcortical, and 14.8% other (Table 3). Corresponding metadata includes this information on lesion number and location for each subject in the training dataset.

Table 2 Lesion number and hemisphere location per subject.

Full size table

Table 3 Lesion location (subcortical vs. cortical).

Full size table

We also provide time of MRI acquisition relative to stroke onset in the metadata in days in a column labeled “days post stroke.” In some cases, the exact number of days between stroke onset and MRI acquisition was not recorded or provided to us. For these participants, a general timeline was included (i.e., MRI was acquired greater than 180 days post-stroke); this was recorded in a column labeled “chronicity” where 180 indicates they are equal to or greater than 180 days post-stroke. Of note, several records did not have this accompanying information, so they have been marked as “NA”. However, we have provided as much data as possible to help inform the evaluation of algorithm performance based on time after stroke.

Metadata information is not provided for individual subjects within the test dataset (n = 300) to avoid biasing algorithms. However, it is presented at a group level. The test dataset is derived from 24 cohorts. Overall, 68.7% of subjects had only a single lesion and 31.3% had multiple lesions. Of the subjects with multiple lesions, 5.3% were marked “Unilateral”, 14.3% were marked “Bilateral”, and 11.7% were marked “Other” (Table 2). Lesions were nearly equally distributed between left and right hemispheres, with 51.7% of subjects exhibiting at least one left hemisphere lesion, 56.3% with at least one right hemisphere lesion, and 22.3% with at least one lesion in either the cerebellum or brainstem (noted as “Other”). Lesions were also documented as either subcortical, cortical, or other (existing in the cerebellum or brainstem). Among all lesions in the testing dataset, 32.0% were cortical and/or in the white matter, 51.7% subcortical, 16.3% other (Table 3). Data characteristics between the training and test datasets were similar.

Finally, metadata is not provided at all for the generalizability dataset (n = 316) to maintain its purpose of evaluating algorithm performance on unseen and unknown data. However, we note that it is comprised of multi-site data collected for research purposes, similar to the training and test datasets.

Training for individuals performing lesion tracing

The research team responsible for the lesion segmentation and quality control followed the same training procedure to the training for the team that created ATLAS v1.2¹¹, with the exception of using ITK-SNAP instead of MRIcron, due to its semi-automated lesion interpolation tool. Training for the lesion identification and tracing process involved study of in-depth neuroanatomy, standardized protocols, instructional videos, and consultations with a neuroradiologist. This protocol includes tracing the same initial set of lesions twice per person, with extensive feedback provided from multiple team members. Our standard operating procedures are freely available online (https://github.com/npnl/ATLAS/). The training manual for ITK-SNAP³⁷ is freely available (http://www.itksnap.org/docs/fullmanual.php) and was also used as part of the lesion tracing process.

Identifying and tracing lesions

For lesion identification, each T1w MRI was opened with ITK-SNAP (Fig. 1) and examined carefully. Tracers also received training in the identification of white matter hyperintensities of presumed vascular origin⁴⁰ and perivascular spaces, which were excluded from the lesion masks as much as possible. Lesions were traced using either a mouse or stylus (i.e., Wacom Intuos Draw). All identified lesions for each subject were contained in a single image file. For lesions spanning a large number of slices (i.e., >50 slices), the “interpolation” tool was used. Upon completion, raw lesion mask files were saved and named according to a BIDS-compliant naming scheme (see also Data Records).

All files were subsequently reviewed for quality control by two additional trained team members. If changes were necessary, edits were conducted by the original tracer. Upon approval, each subject’s raw mask and T1w image were added to the raw/native space dataset, then preprocessed and added to the preprocessed dataset. We recognize that manual tracing is a highly subjective process, even across similarly trained individuals, and we aimed to reduce any amount of tracing differences between tracers through multiple quality control steps. In the current release, we prioritized generating the largest possible dataset for public archiving. However, in a future release, we hope to also release a subset of the data with multiple lesion segmentation masks generated by different tracers. These multiple human ratings for each stroke brain could help establish a baseline for inter-rater variability, given the subjectivity of the task as noted above.

Preprocessing normalization, registration and defacing

In addition to releasing a dataset in native space with no preprocessing (raw; see Data Records below), we also released a preprocessed dataset that is archived with the International Neuroimaging Data-Sharing Initiative (INDI; Fig. 2). Each step in the preprocessing pipeline is identical to ATLAS v1.2, ensuring consistency across ATLAS versions. The pipeline includes intensity normalization and registration to a standardized template. In order to fully de-identify images, we also removed any potentially identifying non-brain data, such as facial images (termed defacing), a common procedure required to fully anonymize an MR brain image. First, we corrected for intensity non-uniformity and performed an intensity standardization step, which was completed with scripts included in the MINC-toolkit (https://github.com/BIC-MNI/minc-toolkit). After this correction, we used MINC tools to linearly register both T1w and lesion segmentation images to an MNI-152 template, which is included in the archive. Finally, we defaced the T1w images using the “mri_deface” tool from FreeSurfer (v1.22) (https://surfer.nmr.mgh.harvard.edu/fswiki/mri_deface). Per BIDS derivatives specifications, the T1w image and corresponding lesion mask are archived with file names of “sub-r***s***_ses-1_space-MNI152NLin2009aSym_T1w.nii.gz” and “sub-r***s***_ses-1_space-MNI152NLin2009aSym_label-L_desc-T1lesion_mask.nii.gz”, respectively (see also Data Records below for more details). Images that were previously excluded from ATLAS v1.2 due to errors in registration¹¹ have now been included after manually correcting and inspecting them. After completion of the preprocessing pipeline, all subject files were visually inspected for quality to ensure correct lesion mask alignment and proper registration to the template (Fig. 3).

Probabilistic spatial mapping of lesion location

To visualize the average distribution of lesions contained in ATLAS v2.0 across the whole brain, we created a probabilistic map of lesions in the public stroke brains from the ATLAS v2.0 training and testing datasets with the MNI template (Fig. 4). This was completed with the mincaverage tool found in the MINC-toolkit (https://github.com/BIC-MNI/minc-toolkit). As noted previously, this may not be representative of all strokes and is only meant to visually demonstrate the voxels identified most commonly as lesioned in our dataset. This map has also been provided in NifTI format and uploaded to NeuroVault.org, where it can be freely accessed (https://neurovault.org/images/706022/).

Data Records

Data are publicly available in preprocessed format (standardized to MNI-152 space) on INDI⁴¹ (fcon_1000.projects.nitrc.org/indi/retro/atlas.html), a free platform for neuroimaging data sharing. Raw data in native space are available on the Archive of Data on Disability to Enable Policy and research⁴² (ADDEP, https://doi.org/10.3886/ICPSR36684.v4), which has a more stringent restricted data use agreement to maintain privacy of the raw data. The metadata denotes whether each subject in the training dataset was previously part of the ATLAS v1.2 release. For the test dataset (n = 300), only the T1w scans, without lesion masks, are released on each platform so that users can test their algorithms on this data and submit their output to lesion segmentation challenges for evaluation. The generalizability dataset (n = 316) is only available for lesion segmentation challenges (see Lesion Segmentation Challenges below). None of the subjects from the previous ATLAS v1.2 release are included in either the test or generalizability datasets.

Data are maintained in BIDS format³⁹. There are 33 cohorts in the training and testing datasets, and within each cohort folder are individual subject folders. We used the following naming convention: sub-r***s*** where r*** represents the research cohort number and s*** represents the subject number. All data are cross-sectional and from a single timepoint, so they all are denoted with “ses-1”. Native space images are labeled as “space-orig” while images normalized to the MNI-152 template are labeled as “space-MNI152Nlin2009aSym”. Finally, the description denotes that the lesion mask was traced from the T1w MRI (versus a different imaging modality, such as FLAIR).

Following BIDS conventions, a lesion mask in native space would be named as such: “sub-r***s***_ses-1_space-orig_label-L_desc-T1lesion_mask.nii.gz” and the corresponding T1w MRI would be named as “sub-r***s***_ses-1_space-orig_T1w.nii.gz.” As noted previously, the T1w MRI and lesion mask in MNI space are noted as: “sub-r***s***_ses-1_space-MNI152NLin2009aSym_T1w.nii.gz” and “sub-r***s***_ses-1_space-MNI152NLin2009aSym_label-L_desc-T1lesion_mask.nii.gz”, respectively.

Technical Validation

The ATLAS v2.0 dataset was developed using similar protocols and methods as the ATLAS v1.2 dataset, which has been successfully utilized to develop numerous lesion segmentation methods for the last several years^{12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28}. For ATLAS v2.0, detailed manual quality control for image quality occurred during the initial lesion segmentation, and all segmentations were examined for quality by two additional researchers. Following preprocessing, lesions were again checked for proper registration to template space. The ATLAS v2.0 dataset has been validated and incorporated into several new lesion segmentation challenges (see Lesion Segmentation Challenges below).

Usage Notes

Data can be accessed under a standard Data Use Agreement, which requires users to agree to use the data only for research or statistical purposes, and not for the investigation of specific research subjects. Users of the ATLAS v2.0 dataset should properly acknowledge the data contributions of the authors and laboratories by citing this article and the specific data repository from which they accessed the data.

As previously noted, manual lesion segmentation can be subjective, and despite our extensive quality control process, errors can still occur. Any issues or feedback can be submitted on the ATLAS Github page under ‘issues’, which will be addressed by our research team (https://github.com/npnl/ATLAS/). Any changes to the data or updates with new data will be released under new ATLAS versions (e.g., v2.1, v2.2), and changes will be posted on Github.

Finally, to accompany ATLAS v2.0, we also have released updated open-source software for analyzing lesions (Pipeline for Analyzing Lesions After Stroke (PALS))²⁸. This software allows users to calculate lesion volume, evaluate lesion overlap with brain regions of interest, and create lesion overlap images (similar to that shown in Fig. 4; see Code Availability).

Lesion segmentation challenges

A key purpose of the ATLAS v2.0 dataset is to provide hidden test data to fairly evaluate the performance of lesion segmentation algorithms. To this end, the ATLAS v2.0 lesion mask test data (n = 300) and generalizability dataset (n = 316 T1w MRIs and lesion masks) are only available for lesion segmentation challenges upon request to the corresponding author. The ideal challenge will provide fast, web-based evaluation, share results on a public leaderboard, and will require public sharing of submitted algorithms with clear usage instructions to advance scientific knowledge within the community and continually improve on the best available algorithms.

Following our ATLAS v1.2 release, we found that a large percentage of users of the ATLAS dataset are students from around the world who used this data to learn how to apply machine learning, deep learning, and/or computer vision methods to this challenging problem. ATLAS v1.2 was used widely for student theses and class projects, as well as for training individuals in algorithm development, and we anticipate that ATLAS v2.0 will be used extensively for these purposes as well. Given the educational interest in ATLAS, a challenge using the ATLAS v2.0 data has been established through a partnership with the Paris-Saclay Center for Data Science using their Rapid Analytics and Model Prototyping (RAMP) project management tool (https://paris-saclay-cds.github.io/ramp-docs/)⁴³. RAMP challenges are open and collaborative web challenges that provide informative starter kits in Python to reduce the barrier of entry for participants⁴³. The starter kits provide background information on the problem as well as basic solution code. The RAMP approach democratizes science by allowing novice data scientists and learners to approach new technical problems by providing the foundational knowledge necessary to get started in the field and giving everyone the same starting point. RAMP challenges consist of a competitive phase, during which participants work individually to solve the problem, and a collaborative phase, during which participants can see each other’s solutions and work together to create the best final solution. Following the competitive phase, participants submit their solutions and code to the RAMP website, where they can see the results of everyone’s submissions. Because code is openly shared in the collaborative phase, participants can learn from one another’s solutions and work together to develop the best combined solution. This collaborative method has been used to successfully address over 20 different scientific challenges and is an excellent educational tool⁴³. More information about the RAMP automated lesion segmentation challenge using ATLAS v2.0 data can be found here: https://ramp.studio/problems/stroke_lesions. This RAMP challenge may also be made available for use by course instructors and can provide a project platform for collaborative learning at events such as Brainhacks, which bring together scientists around the world to work together on challenging brain imaging problems⁴⁴.

ATLAS v2.0 is also part of the Ischemic Stroke Lesion Segmentation (ISLES) Challenge at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2022 (http://www.isles-challenge.org/). The ISLES challenge is one of the best-known stroke lesion segmentation challenges and has attracted hundreds of researchers to the competition over the years to showcase the performance of novel methods. The ISLES challenge series started in 2015 and has taken place at MICCAI for multiple years, incorporating new datasets and clinical and technical challenges each year⁹. ISLES datasets often serve as benchmarks for the field, and teams are invited to submit their algorithms for publication following the challenge^9,45,46. Adding ATLAS v2.0 to the ISLES challenge introduces stroke data across acute to chronic timepoints into the challenge for the first time and presents a unique single-channel (versus multispectral) imaging challenge. The ISLES 2022 challenge utilizes both ATLAS v2.0 test and generalizability datasets for algorithm evaluation via the Grand Challenge platform (https://atlas.grand-challenge.org/). Importantly, this platform will be used to publicly and automatically evaluate algorithm performance both during ISLES 2022 and after, for ongoing public evaluation. http://www.isles-challenge.org/We also include an accompanying sample solution on Github to assist users in getting started (see Code Availability).

ENIGMA Stroke Recovery receives new stroke MRI data on an ongoing basis, and we continually generate lesion segmentations that can be used as additional test data. New cohort data may be added to our generalizability dataset and used only in lesion segmentation challenges (e.g., expanding on our current n = 316 completely hidden test dataset), and we anticipate sharing additional data in future ATLAS releases. In future challenges, data may also be sorted into small, medium and large lesions, as we previously showed that automated methods performed the worst on small, followed by medium, lesions, and perform the best on large lesions¹⁰. This is likely due to the ease of detection of large lesions boundaries, whereas small lesions can often be missed completely or mistaken for other brain pathology¹⁰. Future challenges may focus on accurate identification of small lesions only, or on improving the accuracy of medium and large lesion segmentation boundaries.

In conclusion, ATLAS v2.0 builds on our previous ATLAS v1.2 release and provides a total archive of 1271 images, including 955 public images, separated into 655 public training cases and 300 test cases, and 316 completely unseen images from new cohorts available only for lesion segmentation challenges. Our primary goal in releasing ATLAS v2.0 is to enable the development of more accurate, robust and generalizable lesion segmentation algorithms using single-channel T1-weighted MR images. We anticipate that the larger sample size, hidden test dataset, generalizability dataset, and collaboration with lesion segmentation challenge platforms will lead to the development of improved lesion segmentation algorithms. The ultimate goal of this work is to increase the reproducibility of stroke MRI studies and facilitate large-scale stroke neuroimaging analyses to inform stroke rehabilitation research.

Code availability

The ATLAS v2 lesion segmentations were generated using ITK-SNAP version 3.8.0. Our protocols for lesion segmentation can be found on our Github (https://github.com/npnl/atlas). Code used to preprocess the dataset were adapted from the MINC-toolkit (https://github.com/BIC-MNI/minc-toolkit). T1w images were defaced using the “mri_deface” tool from FreeSurfer (v1.22) (https://surfer.nmr.mgh.harvard.edu/fswiki/mri_deface). PALS, our open-source software to perform lesion analyses, can be accessed at https://github.com/npnl/PALS. Finally, as part of the MICCAI ISLES 2022 challenge, we provide sample code on our Github (https://github.com/npnl/isles_2022/) to assist users in getting started with the lesion segmentation challenge (e.g., code to obtain the data, load it, and save predictions in the format expected by our automatic evaluator).

References

Liew, S.-L. et al. The ENIGMA Stroke Recovery Working Group: Big data neuroimaging to study brain-behavior relationships after stroke. Human brain mapping 43, 129–148, https://doi.org/10.1002/hbm.25015 (2022).
Article PubMed Google Scholar
Liew, S.-L. et al. Smaller spared subcortical nuclei are associated with worse post-stroke sensorimotor outcomes in 28 cohorts worldwide. Brain Communications, https://doi.org/10.1093/braincomms/fcab254 (2021).
Boyd, L. A. et al. Biomarkers of stroke recovery: Consensus-based core recommendations from the Stroke Recovery and Rehabilitation Roundtable. Neurorehabilitation and neural repair 31, 864–876 (2017).
Article Google Scholar
Feng, W. et al. Corticospinal tract lesion load: An imaging biomarker for stroke motor outcomes. Ann Neurol 78, 860–870, https://doi.org/10.1002/ana.24510 (2015).
Article PubMed PubMed Central Google Scholar
Kim, B. & Winstein, C. Can neurological biomarkers of brain impairment be used to predict poststroke motor recovery? A systematic review. Neurorehabilitation and neural repair 31, 3–24 (2017).
Article Google Scholar
Cassidy, J. M., Tran, G., Quinlan, E. B. & Cramer, S. C. Neuroimaging identifies patients most likely to respond to a restorative stroke therapy. Stroke 49, 433–438 (2018).
Article Google Scholar
Chen, L., Bentley, P. & Rueckert, D. Fully automatic acute ischemic lesion segmentation in DWI using convolutional neural networks. NeuroImage: Clinical 15, 633–643 (2017).
Article Google Scholar
Wu, O. et al. Big data approaches to phenotyping acute ischemic stroke using automated lesion segmentation of multi-center magnetic resonance imaging data. Stroke 50, 1734–1741 (2019).
Article Google Scholar
Maier, O. et al. ISLES 2015-A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Medical image analysis 35, 250–269 (2017).
Article Google Scholar
Ito, K. L., Kim, H. & Liew, S. L. A comparison of automated lesion segmentation approaches for chronic stroke T1‐weighted MRI data. Human brain mapping 40, 4669–4685 (2019).
Article Google Scholar
Liew, S.-L. et al. A large, open source dataset of stroke anatomical brain images and manual lesion segmentations. Scientific data 5, 180011 (2018).
Article Google Scholar
Paing, M. P., Tungjitkusolmun, S., Bui, T. H., Visitsattapongse, S. & Pintavirooj, C. Automated Segmentation of Infarct Lesions in T1-Weighted MRI Scans Using Variational Mode Decomposition and Deep Learning. Sensors 21, 1952 (2021).
Article ADS Google Scholar
Xue, Y. et al. A multi-path 2.5 dimensional convolutional neural network system for segmenting stroke lesions in brain MRI images. NeuroImage: Clinical 25, 102118 (2020).
Article Google Scholar
Qi, K. et al. In International conference on medical image computing and computer-assisted intervention. 247–255 (Springer).
Zhou, Y., Huang, W., Dong, P., Xia, Y. & Wang, S. D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation. IEEE/ACM transactions on computational biology and bioinformatics (2019).
Yang, H. et al. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 266–274 (Springer).
Chen, X., You, S., Tezcan, K. C. & Konukoglu, E. Unsupervised lesion detection via image restoration with a normative prior. Medical image analysis 64, 101713 (2020).
Article Google Scholar
Tomita, N., Jiang, S., Maeder, M. E. & Hassanpour, S. Automatic post-stroke lesion segmentation on MR images using 3D residual convolutional neural network. NeuroImage: Clinical 27, 102276 (2020).
Article Google Scholar
Basak, H., Hussain, R. & Rana, A. DFENet: A Novel Dimension Fusion Edge Guided Network for Brain MRI Segmentation. arXiv preprint arXiv:2105.07962 (2021).
Chen, X., Pawlowski, N., Rajchl, M., Glocker, B. & Konukoglu, E. Deep generative models in the real-world: An open challenge from medical imaging. arXiv preprint arXiv:1806.05452 (2018).
Hui, H., Zhang, X., Li, F., Mei, X. & Guo, Y. A partitioning-stacking prediction fusion network based on an improved attention U-Net for stroke lesion segmentation. IEEE Access 8, 47419–47432 (2020).
Article Google Scholar
Kervadec, H., Dolz, J., Wang, S., Granger, E. & Ayed, I. B. in Medical Imaging with Deep Learning. 365–381 (PMLR).
Liu, X. et al. MSDF-Net: Multi-scale deep fusion network for stroke lesion segmentation. IEEE Access 7, 178486–178495 (2019).
Article Google Scholar
Lu, Y., Zhou, J. H. & Guan, C. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). 1059–1062 (IEEE).
Qi, K. et al. Multi-task MR Imaging with Iterative Teacher Forcing and Re-weighted Deep Learning. arXiv preprint arXiv:2011.13614 (2020).
Sahayam, S., Abirami, A. & Jayaraman, U. In 2020 IEEE 4th Conference on Information & Communication Technology (CICT). 1–6 (IEEE).
Wang, S., Chen, Z., Yu, W. & Lei, B. Brain Stroke Lesion Segmentation Using Consistent Perception Generative Adversarial Network. arXiv preprint arXiv:2008.13109 (2020).
Zhang, Y. et al. MI-UNet: multi-inputs UNet incorporating brain parcellation for stroke lesion segmentation from T1-weighted magnetic resonance images. IEEE Journal of Biomedical and Health Informatics 25, 526–535 (2020).
Article Google Scholar
Deng, L. et al. The SUSTech-SYSU dataset for automatically segmenting and classifying corneal ulcers. Scientific data 7, 1–7 (2020).
Article Google Scholar
Boyne, P. et al. Functional magnetic resonance brain imaging of imagined walking to study locomotor function after stroke. Clinical Neurophysiology 132, 167–177 (2021).
Article Google Scholar
Zavaliangos-Petropulu, A. et al. Testing a convolutional neural network-based hippocampal segmentation method in a stroke population. BioRxiv (2020).
Martins, S. B., Falcao, A. X. & Telea, A. C. In BIOIMAGING. 74–81.
Martins, S. B., Ruppert, G., Reis, F., Yasuda, C. L. & Falcão, A. X. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). 882–885 (IEEE).
Yeo, M. et al. Artificial intelligence in clinical decision support and outcome prediction–applications in stroke. Journal of medical imaging and radiation oncology (2021).
Crum, W. R., Camara, O. & Hill, D. L. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE transactions on medical imaging 25, 1451–1461 (2006).
Article Google Scholar
Pustina, D. et al. Automated segmentation of chronic stroke lesions using LINDA: Lesion identification with neighborhood data analysis. Human brain mapping 37, 1405–1421, https://doi.org/10.1002/hbm.23110 (2016).
Article PubMed PubMed Central Google Scholar
Yushkevich, P. A. & Gerig, G. ITK-SNAP: an intractive medical image segmentation tool to meet the need for expert-guided segmentation of complex medical images. IEEE pulse 8, 54–57 (2017).
Article Google Scholar
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage 31, 1116–1128 (2006).
Article Google Scholar
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data 3, 160044 (2016).
Article Google Scholar
Wardlaw, J. M. et al. Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration. The Lancet Neurology 12, 822–838 (2013).
Article Google Scholar
Liew, S.-L. et al. Anatomical Tracings of Lesions After Stroke (ATLAS) R2.0. International Neuroimaging Data-Sharing Initiative fcon_1000.projects.nitrc.org/indi/retro/atlas.html (2021).
Liew, S.-L. et al. The Anatomical Tracings of Lesions after Stroke (ATLAS) Dataset - Release 2.0, 2021 (ICPSR 36684). The Archive of Data on Disability to Enable Policy and research (ADDEP) https://doi.org/10.3886/ICPSR36684.v4 (2021).
Article Google Scholar
Kégl, B. et al. The RAMP framework: from reproducibility to transparency in the design and optimization of scientific workflows. (2018).
Gau, R. et al. Brainhack: Developing a culture of open, inclusive, community-driven neuroscience. Neuron 109, 1769–1775 (2021).
Article CAS Google Scholar
Winzeck, S. et al. ISLES 2016 and 2017-benchmarking ischemic stroke lesion outcome prediction based on multispectral MRI. Frontiers in neurology 9, 679 (2018).
Article Google Scholar
Hakim, A. et al. Predicting Infarct Core From Computed Tomography Perfusion in Acute Ischemia With Machine Learning: Lessons From the ISLES Challenge. Stroke, STROKEAHA. 120.030696 (2021).

Download references

Acknowledgements

S.-L.L. is supported by the NIH (R01NS115845; R25HD105583; K01HD091283; P2CHD06570). L.A.B. is supported by the Canadian Institutes for Health Research. A.G.B. is supported by the NHMRC (GNT1020526). C.M.B. is supported by the NIH (R21HD067906; R01NS090677). J.M.C. is supported by the NIH/NICHD (R00HD091375). A.B.C. is supported by the NIH (R01NS076348-01); Hospital Israelita Albert Einstein (grant 2250-14), CNPq/305568/2016-7. A.N.D. is supported by the Texas Legislature to the Lone Star Stroke Clinical Trial Network. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Government of the United States or the State of Texas. F.G. is supported by the Wellcome Trust (093957). B.G.H. is supported the National Health and Medical Research Council fellowship (1125054). S.A.K. is supported by the NIH (P20 GM109040) and the VA (1IK6RX003075). M.S.K. is supported by the NHMRC. H.K. is supported by a BrightFocus Research Grant (A2019052S). B.J.M. is supported by the Canadian Partnership for Stroke Recovery. M.M. is partially supported by ICREA under the ICREA Academia programme. F.P. is supported by the Italian Ministry of Health; Ricerca Corrente 2020, 2021. K.P.R. is supported by the NIH (R21HD067906, R01NS090677). N.J.S. is supported by the NIH/NICHD (R01HD094731), NIH/NIGMS (P20GM109040), and VA (I01RX003066). S.R.S. is supported by the European Research Council (ERC) under the project NGBMI. G.S. is supported by the Italian Ministry of Health RC 20–21/A. M.T. is supported by the Center for Data Science. G.T. is supported by the Temple University sub-award of NIH R24. L.T.W. is supported by The European Research Council under the European Union’s Horizon 2020 research and Innovation program (ERC StG, Grant 802998). C.J.W. is supported by the NIH (R01HD065438). G.F.W. is supported by the VA RR&D Merit Review Program.

Author information

These authors contributed equally: Sook-Lei Liew, Bethany P. Lo.

Authors and Affiliations

Chan Division of Occupational Science and Occupational Therapy, University of Southern California, Los Angeles, CA, USA
Sook-Lei Liew, Bethany P. Lo, Miranda R. Donnelly, Jessica N. Jeong, Alexandre Hutton, Zhizhuo Wang & Aisha Abdullah
Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Sook-Lei Liew, Artemis Zavaliangos-Petropulu, Julia P. Simon, Jun Kim, Tyler Ard & Hosung Kim
Laboratory of Neuroimaging, Mark and Mary Stevens Neuroimaging and Informatics Institutes, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Giuseppe Barisano & Gilsoon Park
Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA
Giuseppe Barisano & Julia M. Juliano
Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Anisha Suri
Laboratory of Neuropsychiatry, IRCCS Santa Lucia Foundation, Rome, Italy
Nerisa Banaj, Valentina Ciullo, Fabrizio Piras & Gianfranco Spalletta
Department of Rehabilitation Medicine, Emory University School of Medicine, Atlanta, GA, USA
Michael R. Borich & Cathrin M. Buetefisch
Department of Physical Therapy & Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada
Lara A. Boyd
Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne, Victoria, Australia
Amy Brodtmann
Department of Neurology, Emory University, Atlanta, GA, USA
Cathrin M. Buetefisch
Center for the Developing Brain, Child Mind Institute, New York, NY, USA
Lei Cao & Alexandre R. Franco
Department of Allied Health Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jessica M. Cassidy
Hospital das Clínicas, São Paulo University, Sao Paulo, SP, Brazil
Adriana B. Conforto
Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil
Adriana B. Conforto
Department of Neurology, University of California Los Angeles and California Rehabilitation Institute, Los Angeles, CA, USA
Steven C. Cramer
Department of Psychiatry and Clinical Psychobiology, University of Barcelona, Barcelona, Spain
Rosalia Dacosta-Aguayo
icometrix, Leuven, Belgium
Ezequiel de la Rosa
Department of Computer Science, Technical University of Munich, Munich, Germany
Ezequiel de la Rosa
Functional Imaging Unit, Department of Diagnostic Radiology and Neuroradiology, University of Greifswald, Greifswald, Germany
Martin Domin & Martin Lotze
Departments of Neurology and Diagnostic Medicine, Dell Medical School at The University of Texas Austin, Austin, TX, USA
Adrienne N. Dula
Department of Neurology, Duke University School of Medicine, Durham, NC, USA
Wuwei Feng
Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
Alexandre R. Franco
Department of Psychiatry, NYU Grossman School of Medicine, New York, NY, USA
Alexandre R. Franco
Department of Brain Sciences, Imperial College London, London, UK
Fatemeh Geranmayeh
Center for Data Science, Université Paris-Saclay, Inria, Palaiseau, France
Alexandre Gramfort & Maria Telenczuk
Department of Health Sciences & Research, Medical University of South Carolina, Charleston, SC, USA
Chris M. Gregory, Steven A. Kautz & Na Jin Seo
Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA
Colleen A. Hanlon
Innovation, Implementation and Clinical Translation (IIMPACT) in Health, Allied Health and Human Performance, University of South Australia, Adelaide, South Australia, Australia
Brenton G. Hordacre
Ralph H Johnson VA Medical Center, Charleston, SC, USA
Steven A. Kautz & Na Jin Seo
The Florey Institute of Neuroscience and Mental Health, Heidelberg, VIC, Australia
Mohamed Salah Khlif
Neuroradiology, School of Medicine, Technical University Munich, München, Germany
Jan S. Kirschke
Department of Radiology, Tianjin Medical University General Hospital, Tianjin, China
Jingchun Liu & Chunshui Yu
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
Bradley J. MacIntosh
Hurvitz Brain Sciences Program, Toronto, Ontario, Canada
Bradley J. MacIntosh
Department of Clinical Psychology and Psychobiology, Institut de Neurociències, Universitat de Barcelona, Barcelona, Spain
Maria Mataró
Institut de Recerca Sant Joan de Déu, 08950, Esplugues de Llobregat, Spain
Maria Mataró
Jefferson Magnetic Resonance Imaging Center, Philadelphia, PA, USA
Feroze B. Mohamed
CatoSenteret Rehabilitation Center, SON, Norway
Jan E. Nordvik
Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
Jan E. Nordvik
Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, MI, USA
Amy Pienta, Shane M. Redman & Alison Sweet
Facility for Education and Research in Neuroscience, Emory University, Atlanta, GA, USA
Kate P. Revill
ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
Mauricio Reyes
Schlegel-University of Waterloo Research Institute for Aging, University of Waterloo, Waterloo, Ontario, Canada
Andrew D. Robertson
Canadian Partnership for Stroke Recovery, Sunnybrook Research Institute, Toronto, Ontario, Canada
Andrew D. Robertson
Department of Rehabilitation Sciences, Medical University of South Carolina, Charleston, SC, USA
Na Jin Seo
Clinical Neurotechnology Laboratory, Dept. of Psychiatry and Neurosciences (CCM), Charité - Universitätsmedizin Berlin, Berlin, Germany
Surjo R. Soekadar
Menninger Department of Psychiatry and Behavioral Sciences, Division of Neuropsychiatry, Baylor College of Medicine, Houston, TX, USA
Gianfranco Spalletta
Department of Physical Therapy and Neuroscience, Samson College of Health Sciences, St. Joseph’s University, Philadelphia, PA, USA
Gregory Thielman
Department of Psychology, University of Oslo, Oslo, Norway
Lars T. Westlye
NORMENT, Department of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
Lars T. Westlye
Division of Biokinesiology and Physical Therapy of the Herman Ostrow School of Dentistry, University of Southern California, Los Angeles, CA, USA
Carolee J. Winstein
Department of Neurology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Carolee J. Winstein
Geriatrics Research, Education and Clinical Center, HERL, Department of Veterans Affairs, Pittsburgh, PA, USA
George F. Wittenberg
Departments of Neurology, PM&R, RNEL, CNBC, University of Pittsburgh, Pittsburgh, PA, USA
George F. Wittenberg
Department of Physical Medicine & Rehabilitation, Dell Medical School, University of Texas at Austin, Austin, TX, USA
Kristin A. Wong
Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, China
Chunshui Yu

Authors

Sook-Lei Liew
View author publications
You can also search for this author in PubMed Google Scholar
Bethany P. Lo
View author publications
You can also search for this author in PubMed Google Scholar
Miranda R. Donnelly
View author publications
You can also search for this author in PubMed Google Scholar
Artemis Zavaliangos-Petropulu
View author publications
You can also search for this author in PubMed Google Scholar
Jessica N. Jeong
View author publications
You can also search for this author in PubMed Google Scholar
Giuseppe Barisano
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre Hutton
View author publications
You can also search for this author in PubMed Google Scholar
Julia P. Simon
View author publications
You can also search for this author in PubMed Google Scholar
Julia M. Juliano
View author publications
You can also search for this author in PubMed Google Scholar
Anisha Suri
View author publications
You can also search for this author in PubMed Google Scholar
Zhizhuo Wang
View author publications
You can also search for this author in PubMed Google Scholar
Aisha Abdullah
View author publications
You can also search for this author in PubMed Google Scholar
Jun Kim
View author publications
You can also search for this author in PubMed Google Scholar
Tyler Ard
View author publications
You can also search for this author in PubMed Google Scholar
Nerisa Banaj
View author publications
You can also search for this author in PubMed Google Scholar
Michael R. Borich
View author publications
You can also search for this author in PubMed Google Scholar
Lara A. Boyd
View author publications
You can also search for this author in PubMed Google Scholar
Amy Brodtmann
View author publications
You can also search for this author in PubMed Google Scholar
Cathrin M. Buetefisch
View author publications
You can also search for this author in PubMed Google Scholar
Lei Cao
View author publications
You can also search for this author in PubMed Google Scholar
Jessica M. Cassidy
View author publications
You can also search for this author in PubMed Google Scholar
Valentina Ciullo
View author publications
You can also search for this author in PubMed Google Scholar
Adriana B. Conforto
View author publications
You can also search for this author in PubMed Google Scholar
Steven C. Cramer
View author publications
You can also search for this author in PubMed Google Scholar
Rosalia Dacosta-Aguayo
View author publications
You can also search for this author in PubMed Google Scholar
Ezequiel de la Rosa
View author publications
You can also search for this author in PubMed Google Scholar
Martin Domin
View author publications
You can also search for this author in PubMed Google Scholar
Adrienne N. Dula
View author publications
You can also search for this author in PubMed Google Scholar
Wuwei Feng
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre R. Franco
View author publications
You can also search for this author in PubMed Google Scholar
Fatemeh Geranmayeh
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre Gramfort
View author publications
You can also search for this author in PubMed Google Scholar
Chris M. Gregory
View author publications
You can also search for this author in PubMed Google Scholar
Colleen A. Hanlon
View author publications
You can also search for this author in PubMed Google Scholar
Brenton G. Hordacre
View author publications
You can also search for this author in PubMed Google Scholar
Steven A. Kautz
View author publications
You can also search for this author in PubMed Google Scholar
Mohamed Salah Khlif
View author publications
You can also search for this author in PubMed Google Scholar
Hosung Kim
View author publications
You can also search for this author in PubMed Google Scholar
Jan S. Kirschke
View author publications
You can also search for this author in PubMed Google Scholar
Jingchun Liu
View author publications
You can also search for this author in PubMed Google Scholar
Martin Lotze
View author publications
You can also search for this author in PubMed Google Scholar
Bradley J. MacIntosh
View author publications
You can also search for this author in PubMed Google Scholar
Maria Mataró
View author publications
You can also search for this author in PubMed Google Scholar
Feroze B. Mohamed
View author publications
You can also search for this author in PubMed Google Scholar
Jan E. Nordvik
View author publications
You can also search for this author in PubMed Google Scholar
Gilsoon Park
View author publications
You can also search for this author in PubMed Google Scholar
Amy Pienta
View author publications
You can also search for this author in PubMed Google Scholar
Fabrizio Piras
View author publications
You can also search for this author in PubMed Google Scholar
Shane M. Redman
View author publications
You can also search for this author in PubMed Google Scholar
Kate P. Revill
View author publications
You can also search for this author in PubMed Google Scholar
Mauricio Reyes
View author publications
You can also search for this author in PubMed Google Scholar
Andrew D. Robertson
View author publications
You can also search for this author in PubMed Google Scholar
Na Jin Seo
View author publications
You can also search for this author in PubMed Google Scholar
Surjo R. Soekadar
View author publications
You can also search for this author in PubMed Google Scholar
Gianfranco Spalletta
View author publications
You can also search for this author in PubMed Google Scholar
Alison Sweet
View author publications
You can also search for this author in PubMed Google Scholar
Maria Telenczuk
View author publications
You can also search for this author in PubMed Google Scholar
Gregory Thielman
View author publications
You can also search for this author in PubMed Google Scholar
Lars T. Westlye
View author publications
You can also search for this author in PubMed Google Scholar
Carolee J. Winstein
View author publications
You can also search for this author in PubMed Google Scholar
George F. Wittenberg
View author publications
You can also search for this author in PubMed Google Scholar
Kristin A. Wong
View author publications
You can also search for this author in PubMed Google Scholar
Chunshui Yu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors reviewed, edited, and approved the manuscript. S.-L.L. conceptualized the dataset, led the data harmonization effort, oversaw the lesion segmentation and preprocessing steps, initiated the archives and lesion challenges, and wrote and edited the paper. B.L. managed the data harmonization, lesion segmentation and preprocessing steps, organized the data, and wrote and edited the paper. S.-L.L., B.L., M.R.D., A.Z.P., J.N.J., J.P.S., J.M.J., A.S., Z.W., A.A. and J.K. organized the data, performed lesion segmentation, trained lesion tracers, and/or performed quality control on the data. G.B. provided medical expertise for lesion detection and segmentation. A. H., J.P.S. and H.K. developed and implemented the preprocessing pipeline. B.L., G.P. and H.K. reviewed and compiled data on existing automated lesion segmentation methods using ATLAS v1.2. A.H., M.T. and A.G. developed the RAMP challenge. E.d.l.R., M.R. and J.S.K. developed the ISLES challenge. A.P., S.M.R. and A.S. developed and manage the ADDEP archive. A.R.F. and L.C. developed and manage the INDI archive. T.A. created the probabilistic overlay and data visualizations. S.-L.L., N.B., M.R.B, L.A.B., A.B., C.M.B., J.M.C., V.C., A.B.C, S.C.C., R.D.-A., M.D., A.N.D., W.F., A.R.F., F.G., C.M.G., C.A.H., B.G.H., S.A.K., M.S.K., J.L., M.L., B.J.M., M.M, F.B.M., J.A.N., F.P., K.P.R., A.D.R., N.J.S., S.R.S., G.S., G.T., L.T.W., C.J.W., G.F.W., K.A.W. and C.Y. acquired, de-identified, and shared the stroke MRI data used in this dataset.

Corresponding author

Correspondence to Sook-Lei Liew.

Ethics declarations

Competing interests

A.G.B. serves on the Biogen Australia Dementia Scientific Advisory Committee and editorial boards of Neurology and International Journal of Stroke. S.C.C. serves as a consultant for Abbvie, Constant Therapeutics, MicroTransponder, Neurolutions, SanBio, Panaxium, NeuExcell, Elevian, Medtronic, and TRCare. E.d.L.R. is employed by icometrix. C.A.H. serves on the Advisory Board for Welcony Magstim and as a Consultant for Brainsway. B.G.H. has a clinical partnership with Fourier Intelligence. J.S.K. is cofounder of BoneScreen GmbH. G.F.W. serves on the Scientific Advisory Board for Myomo, Inc.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Liew, SL., Lo, B.P., Donnelly, M.R. et al. A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms. Sci Data 9, 320 (2022). https://doi.org/10.1038/s41597-022-01401-7

Download citation

Received: 20 December 2021
Accepted: 19 May 2022
Published: 16 June 2022
DOI: https://doi.org/10.1038/s41597-022-01401-7

This article is cited by

A large public dataset of annotated clinical MRIs and metadata of patients with acute stroke
- Chin-Fu Liu
- Richard Leigh
- Andreia V. Faria
Scientific Data (2023)
On the Analyses of Medical Images Using Traditional Machine Learning Techniques and Convolutional Neural Networks
- Saeed Iqbal
- Adnan N. Qureshi
- Tariq Mahmood
Archives of Computational Methods in Engineering (2023)
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset
- Moritz R. Hernandez Petzsche
- Ezequiel de la Rosa
- Jan S. Kirschke
Scientific Data (2022)