The BrainLat project, a multimodal neuroimaging dataset of neurodegeneration from underrepresented backgrounds

The Latin American Brain Health Institute (BrainLat) has released a unique multimodal neuroimaging dataset of 780 participants from Latin America. The dataset includes 530 patients with neurodegenerative diseases, namely Alzheimer’s disease (AD), behavioral variant frontotemporal dementia (bvFTD), multiple sclerosis (MS), and Parkinson’s disease (PD), as well as 250 healthy controls (HCs). The sample (mean age 62.7 ± 9.5 years, age range 21–89 years) was collected through a multicentric effort across five Latin American countries to address the need for affordable, scalable, and available biomarkers in regions with larger inequities. BrainLat is the first regional collection of clinical and cognitive assessments, anatomical magnetic resonance imaging (MRI), resting-state functional MRI (fMRI), diffusion-weighted MRI (DWI), and high-density resting-state electroencephalography (EEG) in dementia patients. In addition, it includes demographic information about harmonized recruitment and assessment protocols. The dataset is publicly available to encourage further research and development of tools and health applications for neurodegeneration based on multimodal neuroimaging, promoting the assessment of regional variability and inclusion of underrepresented participants in research.


Background & Summary
Dementia and neurodegenerative diseases significantly impact patients, families, the economy, and public health systems worldwide. However, this impact, coupled with prevalence, underdiagnosis, and assessment, is unequal. Latin America is one of the most unequal regions in the world, with a lack of adequate dementia diagnosis and care [1][2][3][4]. The current prevalence of dementia in LACs is estimated at 8.5% and is projected to be 19.33% by 2050, representing an increase of approximately 220%. Such prevalence is higher than in other regions 5,6, including Europe (current 6.9% and projected up to 7.7% by 2050) or North America (current 6.5% and projected up to 12.1% by 2050) 4,5,[7][8][9][10]. Paradoxically, Latino populations are underrepresented in most global research on neurodegeneration 4,8,[11][12][13][14]. Most literature arises predominantly from the US, Europe, and other high-income settings. Despite the pressing need to evaluate regional diversity and provide tailored evidence for underrepresented samples 2,[15][16][17][18], current scientific findings on neurodegeneration in Latin America do not meet this requirement. The situation seems more urgent given the recent evidence that the so-called non-stereotypic populations 15 (participants from underrepresented populations in admixtures, genetics, cultural backgrounds, and demographics) defy the generalization of brain-phenotype models from stereotypical populations [19][20][21][22]. Thus, evaluating diversity in dementia research is an immediate and significant need.
Developing affordable, scalable, and widely available biomarkers is crucial for early diagnosis and intervention, especially in Latin America 4,8,[11][12][13][14]. While several multimodal neuroimaging databases and consortia for neurodegeneration exist (e.g., ADNI, LONI, HCP, UK Biobank, CAMCAN, ABCD, PPMI, ENIGMA), there is a lack of datasets from underrepresented, non-stereotypical samples, and few databases include EEG data. EEG is an advantageous technique for assessing neurodegeneration due to its cost-effectiveness, accessibility, scalability, and applicability to underserved populations. The opportunity to evaluate brain dynamics and networks with combined spatiotemporal methods represents a significant advance for clinical assessment 23,24, as well as multimodal imaging and computational approaches to neuroscience 25-27. However, to our knowledge, no other open datasets of multiple neurodegenerative diseases include resting-state recordings with high spatial (fMRI) and temporal (EEG) resolution.
The BrainLat dataset 28 (Fig. 1) is a pioneering dataset that addresses these gaps by providing data from a diverse group of Latin American patients with various neurodegenerative diseases, including Alzheimer's disease (AD), behavioral variant frontotemporal dementia (bvFTD), Parkinson's disease (PD), and multiple sclerosis (MS), as well as healthy controls. It is a regional effort designed as a multicentric study with harmonized recruitment and neurocognitive assessment, led by the Latin American Brain Health Institute (BrainLat) 29 and the Multi-Partner Consortium to Expand Dementia Research in Latin America (ReDLat) 10,30,31 with the support of various stakeholders. Details for harmonizing per the ReDLat procedures (recruitment and neurocognitive assessment) include a site manual, a checklist, and a tutorial, all available elsewhere 30.
Along with cognitive and sociodemographic information, the BrainLat dataset 28 includes anatomical MRI, resting-state fMRI, and resting-state EEG. Neuroimaging records have not been harmonized, so that dataset users can conduct custom analyses. Nevertheless, different post-recording harmonization procedures (w- and z-scores, confusion matrices, data transformation/normalization, optimizers, and k-fold validation) have been successfully applied to these data 32,33. Thus, the BrainLat dataset 28 has been utilized for understanding neurodegeneration and developing multimodal markers.
By making the BrainLat dataset 28 openly accessible, the project aims to encourage additional analyses and data exploitation. This dataset 28 is the first to be released from a larger multicentric initiative, the Euro-LAD EEG consortium 60, a Global EEG Platform for dementia research inclusive of diverse and underrepresented data. We hope this dataset 28 will allow the future development of normative EEG datasets based on harmonized multicentric data, assessing sociodemographic variability, and promoting the development of tools and health applications for neurodegeneration based on multimodal neuroimaging.
Latin American populations display extensive heterogeneity triggered by the unique combination of genetic and environmental (i.e., socioeconomic) differences 3,9. This open-access dataset 28 fosters collaboration and facilitates the identification of new biomarkers, ultimately contributing to advancements in understanding and treating neurodegenerative diseases. While genetics and socioeconomic status information are not currently included in the BrainLat dataset 28, we anticipate that these will be available upon completing the ReDLat protocol by 2026, when the dataset will be updated.

Participants.
The BrainLat dataset 28 contains neuroimaging and cognitive data from 780 subjects, including patients with AD (N = 278), bvFTD (N = 163), PD (N = 57), and MS (N = 32), and HCs (N = 250). Participants were enrolled in clinical sites from the Multi-Partner Consortium to Expand Dementia Research in Latin America (ReDLat), a regional effort to harmonize participant enrollment and neurocognitive assessment in multicentric studies 10,30. Five ReDLat countries were included (Argentina, Chile, Colombia, Mexico, and Peru, see Table 1). The demographic information of the BrainLat dataset 28 (global information) is presented in Table 2, while the information split by recruitment site is presented in Table 3 and stored in BrainLat_Demographic.csv. There was limited information available on the age of the participants at the onset of the disease. Consequently, the duration of the disease is not reported.
As noted above, the BrainLat dataset 28 includes MS patients, in whom primary mechanisms are considered to have a larger inflammatory component compared to AD, bvFTD, and PD. Nonetheless, incorporating MS in the dataset holds significant relevance. Comparisons between MS and other neurodegenerative diseases are relevant and frequently reported 47,61,62. Although the pathophysiological pathways differ, insightful comparisons between these conditions can be made. By leveraging multivariate data, comprehensive analyses can be performed to delineate shared and unique disease patterns [63][64][65][66]. Moreover, recent insights have emphasized shared mechanisms across different neurodegenerative diseases, including the role of inflammatory pathways [65][66][67][68]. Furthermore, the flexible nature of the dataset design allows for analyses to be conducted with patient groups combined or separated. This offers the opportunity to observe MS alone or in comparison with other conditions, providing a rich perspective in understanding complex neurodegenerative pathways.
Ethics. The institutional ethics boards of each recruitment site provided ethical approval for collecting and sharing data. The ethics approval reference codes for each participating institution (Table 1) are listed below. The ethics approvals were granted in accordance with the ethical regulations and guidelines of the countries where the centers are located, and in compliance with the Declaration of Helsinki.
On their first visit to the recruitment centers, participants were provided with both oral and written explanations about the objectives, risks, and benefits of the study. Afterwards, participants proceeded to sign a written consent form (Fig. 1). Patients were accompanied by a relative or legal representative, who signed the informed consent when necessary. The informed consent provided by the participants included consent for the open publication of the anonymized data. Accordingly, participants were informed about how information would be processed to protect the confidentiality of personally identifiable information. Information about sharing and publication of anonymized data was provided. For anonymization, the participants' names were replaced by a code (section Usage Notes), and MRI images were defaced (section Data Records).

Recruitment, inclusion criteria, clinical and cognitive assessments.
Information about the study was spread through networks of the recruitment centers and social media. The target audience comprised HCs, patients with neurodegenerative diseases, and their families. The inclusion and exclusion criteria of the participants are outlined below. These criteria were reviewed and agreed upon by clinicians of the ReDLat consortium 30.
The inclusion criteria for controls (HCs) were:
• Possessing a Modified Clinical Dementia Rating (CDR) = 0 and a Mini-Mental State Examination (MMSE) score >25.
• Meeting the criteria for fluency in Spanish (judged by the evaluator as sufficient to complete the assessment).
• Having adequate visual and auditory acuity to complete cognitive testing.
• Not having any proven history of substance abuse, or neurological or psychiatric disorders.
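The control criteria listed above can be expressed as a simple screening check. The following is only an illustrative sketch: the class and field names are hypothetical and do not correspond to columns in the released files, and the fluency and acuity judgments are represented as booleans supplied by the evaluator.

```python
from dataclasses import dataclass


@dataclass
class ControlCandidate:
    """Hypothetical record for one candidate healthy control."""
    cdr: float               # Modified Clinical Dementia Rating
    mmse: int                # Mini-Mental State Examination score
    fluent_spanish: bool     # judged sufficient by the evaluator
    adequate_acuity: bool    # visual and auditory acuity for testing
    excluding_history: bool  # substance abuse, neurological or psychiatric disorder


def meets_control_criteria(c: ControlCandidate) -> bool:
    """Return True only if every inclusion criterion for HCs is met."""
    return (
        c.cdr == 0
        and c.mmse > 25          # strictly greater than 25, as stated in the text
        and c.fluent_spanish
        and c.adequate_acuity
        and not c.excluding_history
    )
```

For instance, a candidate with CDR = 0 and MMSE = 25 would not qualify under this reading, because the text requires a score above 25.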
The inclusion criteria for participants with neurodegeneration were:
• Having a clinical diagnosis of mild/moderate AD, bvFTD, PD, or MS. When needed, the diagnosis was supported by neuroimaging assessment (routine MRI or hypoperfusion/hypometabolism SPECT or PET).
• Meeting the criteria for fluency in Spanish (judged by the evaluator as sufficient to complete the assessment).
• Having adequate visual and auditory acuity to complete cognitive testing.
• For patients with dementia (AD and bvFTD): having an informant who maintained frequent contact with the participant (e.g., family member, partner, friend, caregiver). The informant should be familiar with the participant's daily activities and able to provide information on the participant's cognitive and functional status. The duration of acquaintance with the patient should be at least six months.
• Being able to sign the informed consent or be accompanied by an authorized representative who could do so.
The exclusion criteria were:
• Clinically significant ischemic or hemorrhagic cerebrovascular disease, diffuse confluent white matter lesions (Fazekas Grade 3), or intra- or extra-axial masses revealed by MRI that compress brain parenchyma and that may affect cognition and/or behavior or may confound imaging analysis.
• Deficiency of B12 (B12 < normal), hypothyroidism (TSH >150% of normal), HIV infection, renal insufficiency (creatinine >2), liver insufficiency (AST >2x normal), respiratory insufficiency (requiring oxygen), or other significant systemic diseases (as judged by the attending neurologist).
• Basic clinical criteria for other types of dementia or other neurological disorders.
• Inability to communicate in Spanish.
Patients fulfilled either the current criteria of the National Institute of Neurological Disorders and Stroke-Alzheimer Disease and Related Disorders (NINCDS-ADRDA) working group for probable AD 69, the revised criteria for probable bvFTD 70, or the criteria of the United Kingdom Parkinson's Disease Society Brain Bank (PDSBB) for PD 71. Patients with MS were diagnosed by experts, considering standard clinical examination, magnetic resonance imaging, and lumbar puncture when necessary 40.
Patients with AD and bvFTD were functionally impaired, as verified by caregivers. The AD patients were all sporadic, except for those recruited by one of the Colombian sites, who had PSEN1 mutations. The PD and AD groups had typical disease presentations, apart from the AD patients with PSEN1 mutations, who exhibited early-onset symptoms. The BrainLat dataset 28 does not include records of late-onset AD or other atypical disease presentations. Additionally, participants with bvFTD exhibited noticeable changes in personality and social behavior. Participants with PD received levodopa treatment and were evaluated during the 'on' phase. Further details regarding this medication are unavailable.
A comprehensive assessment of the neurological, neuropsychiatric, and neuropsychological domains of the participants was conducted by ReDLat experts using semi-structured interviews and standardized cognitive and functional tests. The evaluation lasted up to three hours and comprised the tests described below. The cognitive outcomes are stored in BrainLat_Cognition.csv.

Frontotemporal lobar degeneration-modified clinical dementia rating (FTLD-CDR).
The FTLD-CDR is a 5-point scale characterizing six cognitive and functional domains: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care 73. Additionally, it is used for assessing behavioral and motor domains in the case of the frontotemporal dementia spectrum. Only bvFTD patients were evaluated with this instrument.

The MDS-UPDRS 74 is a revised and expanded version of the UPDRS 75 that includes twenty questions to be answered by the patient or caregiver. The MDS-UPDRS has four parts, with part III dedicated to the motor examination. The stage of the disease was rated with the Hoehn & Yahr (H&Y) scale 76.

The multiple sclerosis severity score (MSSS).
The MSSS 77 relates scores of the Expanded Disability Status Scale (EDSS) to the distribution of disability in patients with comparable disease durations for detecting rates of disease progression. Only MS patients were evaluated with this instrument.

Cognitive tools. The Montreal cognitive assessment (MoCA).
The MoCA 79 is a cognitive screening tool for tracking mild cognitive impairment. The MoCA evaluates short-term memory, visuospatial abilities, multiple aspects of executive functions, attention, memory and working memory, language abilities, and orientation to time and place. Its maximum score is 30, with higher scores indicating better performance. All participants were evaluated with this tool.

The INECO frontal screening (IFS).
The IFS 80 is a tool for screening executive function in patients with neurodegenerative diseases. The IFS evaluates response inhibition and set shifting, the capacity of abstraction, and working memory. The maximum score on the test is 30, with higher scores indicating better performance. All participants were evaluated with this tool.

Facial emotion recognition (FER).
In this task, participants identify emotional expressions depicted in a series of photos 39 (thirty-five faces selected from the emotion face set 81). Participants are instructed to associate faces with one of six possible emotions (happiness, surprise, sadness, fear, disgust, anger) or a neutral expression. A score (max. 15) is calculated from the percentage of correct responses. All HCs, AD, bvFTD, and PD participants were evaluated with this tool.

Functional ability assessments. Functional activities questionnaire (FAQ).
The FAQ is a 10-item rating scale that measures instrumental activities of daily living (such as preparing meals and personal finance) 82. A score above 9 suggests possible functional and cognitive impairment. All HCs, AD, bvFTD, and PD participants were evaluated with this tool.
Frontotemporal dementia rating scale (FRS). The FRS is a 30-item scale that evaluates severity in people with dementia 83,84. Scores from 1.92 to −2.58 indicate a moderate/severe disease stage, and scores from −2.58 to −6.66 indicate a very severe/profound disease stage. Only bvFTD participants were evaluated with this tool.
All clinical, cognitive, and functionality assessments are provided as raw data. However, these can be normalized and harmonized for comparisons, as performed elsewhere with the current data 31.
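Because the scores are released raw, the cut-offs quoted above for the FAQ and the FRS have to be applied by the user. A minimal sketch follows; the function names are hypothetical, the handling of the shared boundary at −2.58 is an assumption (the text does not say which stage it belongs to), and scores above 1.92 are reported only as falling outside the ranges described here.

```python
def faq_suggests_impairment(faq_total: int) -> bool:
    """FAQ: a total score above 9 suggests possible impairment."""
    return faq_total > 9


def frs_stage(frs_score: float) -> str:
    """Map an FRS score to the stage ranges quoted in the text.

    Assumption: a score of exactly -2.58 is assigned to the
    moderate/severe range; the source does not resolve this tie.
    """
    if frs_score > 1.92:
        return "milder than the ranges described"
    if frs_score >= -2.58:
        return "moderate/severe"
    if frs_score >= -6.66:
        return "very severe/profound"
    raise ValueError("score is below the range given in the text")
```

A call such as `frs_stage(-3.0)` falls in the very severe/profound range under these assumptions.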
Neuroimaging data. EEG and MRI were acquired within 6 months after the neurological evaluation (second and third visits of the participants to each recruitment site), following the ReDLat protocol 10,30. The assessment lasted up to 2 hours for EEG and up to 1 hour for MRI.
Global information about the neuroimaging modalities and the data split by recruitment sites are presented in Tables 4, 5. Notably, as in other available datasets, ours has some missing data. For most of the participants, one (MRI) or two (MRI + EEG) neuroimaging modalities were acquired (Tables 4, 5). Nevertheless, EEG was the only neuroimaging modality acquired in a reduced group of participants (Tables 4, 5). Reasons for missing data include the different objectives of the studies for which data were initially acquired, technological constraints, and the use of varied data storage formats. Detailed information about the neuroimaging modalities acquired from each participant is provided in BrainLat_records.csv, which is deposited on Synapse.
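Since not every participant has every modality, a first step for most users will be to tally which combinations are available. The sketch below shows one way to do this with the standard library. The column names (`participant_id`, `MRI`, `fMRI`, `DWI`, `EEG`) and the 1/0 coding are assumptions made for illustration; the actual headers of BrainLat_records.csv should be taken from the data dictionary on Synapse. The rows in `toy_records` are invented placeholders, not real data.

```python
import csv
import io
from collections import Counter

# Assumed modality columns; check the data dictionary for the real headers.
MODALITIES = ("MRI", "fMRI", "DWI", "EEG")


def tally_modalities(csv_text: str) -> Counter:
    """Count participants per combination of available modalities."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        available = tuple(m for m in MODALITIES if row[m].strip() == "1")
        counts[available] += 1
    return counts


# Invented toy rows, for illustration only.
toy_records = """participant_id,MRI,fMRI,DWI,EEG
sub-A,1,1,0,1
sub-B,1,0,0,0
sub-C,0,0,0,1
sub-D,1,1,0,1
"""

for combo, n in tally_modalities(toy_records).items():
    print("+".join(combo) or "none", n)
```

With a real file, `csv_text` would simply be the contents of the downloaded CSV read as text.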

EEG recordings.
Both EEG acquisition and processing parameters are summarized in Table 6. Participants were seated in a comfortable chair inside a dimly lit, sound-attenuated, and electromagnetically shielded EEG room and instructed to remain still and awake. Ongoing (resting-state), eyes-closed EEG was recorded for ten minutes using the same amplifier across centers, a 128-channel Biosemi Active-two acquisition system (pin-type active, sintered Ag-AgCl electrodes). The reference electrodes were set to linked mastoids. Furthermore, external electrodes were placed in periocular locations to record blinks and eye movements. Analog filters were set at 0.03 and 100 Hz. The EEG was monitored online for detecting drowsiness, and myogenic and sweat artifacts.
The EEG was processed offline using an in-house pipeline built upon pre-existing EEGLab functions 85. Only basic steps were implemented (i.e., re-referencing, filtering, and eliminating bad channels) to allow dataset users to conduct custom analyses. The raw data (*.bdf extension) was imported into EEGLab using the BDFimport plugin and processed in the *.set extension (default EEGLab extension). Recordings were re-referenced to the average of all channels (average reference), and band-pass filtered between 0.5 and 40 Hz using a zero-phase shift Butterworth filter of order = 8. Data were downsampled to 512 Hz, and Independent Component Analysis (ICA) was used to correct EEG artifacts induced by blinking and eye movements. Malfunctioning channels were identified using a semiautomatic detection method and replaced using weighted spherical interpolation.
MRI acquisition. The MRI neuroimages were acquired with 1.5 or 3 Tesla scanners. The list of scanner models and institutions can be found in Table 7. T1-MPRAGE anatomical scans were acquired using a T1-weighted volumetric magnetization-prepared rapid gradient echo sequence. T2-FLAIR and diffusion images were obtained through T2- and diffusion-weighted sequences, respectively. The number of slices depended on the acquisition protocol. For resting-state functional MRI, participants completed eyes-open resting-state multi-echo BOLD functional scans. Participants were instructed to remain still, keeping their eyes open, with normal breathing to reduce motion artifacts. Resting-state data were recorded using a multi-echo EPI sequence. While individual information has not been incorporated within the main body of the text due to its substantial volume, the details of the acquisition parameters for all subjects are available in the *.json files.
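For readers working outside EEGLab, the basic EEG steps described above (average re-referencing, 0.5–40 Hz zero-phase Butterworth band-pass filtering, and downsampling to 512 Hz) can be approximated in Python. This is a sketch under stated assumptions, not the authors' pipeline: it uses NumPy and SciPy rather than EEGLab, the input sampling rate of 2048 Hz is a placeholder (the actual value is given in Table 6), the way "order = 8" maps onto SciPy's `butter` argument is an interpretation, and ICA and channel interpolation are omitted.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess_eeg(data, fs_in=2048, fs_out=512, band=(0.5, 40.0)):
    """Basic preprocessing of an array shaped (n_channels, n_samples).

    fs_in is an assumed acquisition rate; fs_in must be an integer
    multiple of fs_out for this simple decimation to apply.
    """
    data = np.asarray(data, dtype=float)

    # Average reference: subtract the mean across channels at each sample.
    data = data - data.mean(axis=0, keepdims=True)

    # butter(4, ..., btype="bandpass") yields an 8th-order band-pass filter
    # (assumed reading of "order = 8"). Forward-backward filtering with
    # sosfiltfilt gives zero phase shift.
    sos = butter(4, band, btype="bandpass", fs=fs_in, output="sos")
    data = sosfiltfilt(sos, data, axis=1)

    # Polyphase resampling applies its own anti-aliasing filter.
    return resample_poly(data, up=1, down=fs_in // fs_out, axis=1)
```

Second-order sections are used here because high-order Butterworth filters with a very low cut-off (0.5 Hz) can be numerically unstable in transfer-function form.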

Data Records
The neuroimaging data is hosted in the Synapse project "BrainLat-dataset" 28. This is accompanied by the anonymized demographic information, and both cognitive and functional outcomes. Information is presented in *.csv files (plain text, comma-separated values). Additionally, a dictionary containing all column headers from the demographic, cognitive, and neuroimaging csv files has been included in Synapse.
The neuroimaging data is organized according to the Brain Imaging Data Structure (BIDS) specifications 86 to address the heterogeneity of data organization and follow the FAIR principles of findability, accessibility, interoperability, and reusability 87 while protecting personal information. Initially developed to organize MRI data, the BIDS format has been extended to other neuroimaging modalities, including EEG. Accordingly, EEG data was converted into EEG-BIDS 88. Conversion of the original files (i.e., *.dcm for MRI and *.set for EEG) into the BIDS format was made using BIDScoin (for MRI) 89 and the BIDS EEGLAB plugin 88 (for EEG). For cases where MRI and EEG data were available from the same participant, the MRI-BIDS and EEG-BIDS were combined in a single structure. The BIDS structures were validated using BIDS Validator v1.11.0 (https://bids-standard.github.io/bids-validator/). Personal information was removed from the EEG recordings during the EEG-BIDS conversion. The different MRI data were defaced using PyDeface 2.0.0 via Docker v4.12.0 (https://github.com/poldracklab/pydeface).
An example of the directory tree after structuring files according to the BIDS format is presented in Fig. 2. Participants' data from the same group are stored in the same folder. For a given participant, the data of the different neuroimaging modalities are presented separately, in subfolders named "anat", "func", "dwi", and "eeg". The name of the files containing the data begins with the "sub-" index, followed by the letter "P" and two letters referring to the PI responsible for the data acquisition (indicating the recruitment site). The name ends with the number of the subject (e.g., "00035"), followed by a string of characters indicating the neuroimaging modality. In individual folders, the *.json files contain information about the dataset and participants.
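The naming convention just described can be unpacked programmatically, for example to group files by recruitment site. The sketch below is one possible reading of that convention and is not taken from the dataset's tooling: the function name is hypothetical, the two-letter code "XX" in the usage example is a placeholder rather than a real PI code, and the assumption that the modality string is the last underscore-separated token follows general BIDS practice.

```python
import re

# "sub-" + "P" + two letters (PI / site) + subject number, e.g. "00035".
_SUBJECT_LABEL = re.compile(r"sub-P([A-Za-z]{2})(\d+)")


def parse_brainlat_name(filename: str) -> dict:
    """Split a file name into PI code, subject number, suffix, extension."""
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    match = _SUBJECT_LABEL.fullmatch(parts[0])
    if match is None or len(parts) < 2:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {
        "pi_code": match.group(1),
        "subject_number": match.group(2),
        "suffix": parts[-1],       # modality string, e.g. "T1w" or "eeg"
        "extension": extension,
    }
```

Under these assumptions, `parse_brainlat_name("sub-PXX00035_T1w.nii.gz")` would yield the PI code `"XX"`, the subject number `"00035"`, and the suffix `"T1w"`.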

Technical Validation
Quality checks included the implementation of standardized protocols for recruitment and psychophysiological assessment and quality control during the acquisition of neuroimaging data.
range; b) identification of the required control profiles to maintain SD < 2-3 for each match; c) searching for controls to meet the required parameters, such that HCs were matched for age, sex, and education with patients.
Diagnosis and psychological assessment. Multidisciplinary teams made the diagnosis as part of an ongoing multicentric protocol 38,90. The cognitive and functional status were assessed following the standard protocols implemented by ReDLat 30. Evaluators received a clinical certification from board-certified neurologists after completing training and a monitoring process to use standard procedures.

EEG.
Incidents during the EEG acquisition were annotated for further visual inspection. Bad channels were detected using semiautomatic algorithms based on threshold amplitude. Automatic channel rejection and interpolation were implemented. On average, 3.2 ± 1.1 channels were interpolated per recording. Certified experts supervised the quality of the recording.

MRI.
The quality control metrics for the T1w and functional BOLD MRI scans were computed by the MRIQC package 91, which outputs several quality control metrics of different aspects of the data. These quality control metrics are stored in group_T1w.tsv and group_bold.tsv in the derivatives/mriqc folder.
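Users may want to screen these group tables before analysis. The following sketch flags scans whose value on a chosen metric falls well below the sample mean. It assumes the usual MRIQC group-table layout with a `bids_name` column and a metric column such as `cnr`; the threshold is arbitrary and is not a criterion used by the authors, and the rows in `toy_tsv` are invented placeholders.

```python
import csv
import io
import statistics


def flag_low_metric(tsv_text: str, metric: str, n_sd: float = 2.0) -> list:
    """Return bids_name of scans more than n_sd SDs below the mean."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    values = [float(row[metric]) for row in rows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [
        row["bids_name"]
        for row, value in zip(rows, values)
        if value < mean - n_sd * sd
    ]


# Invented toy table, for illustration only.
toy_tsv = (
    "bids_name\tcnr\n"
    "sub-A_T1w\t3.1\n"
    "sub-B_T1w\t3.3\n"
    "sub-C_T1w\t3.0\n"
    "sub-D_T1w\t3.2\n"
    "sub-E_T1w\t3.4\n"
    "sub-F_T1w\t0.4\n"
)

print(flag_low_metric(toy_tsv, "cnr", n_sd=1.5))
```

With small samples a mean/SD rule is easily distorted by the outliers themselves, so a median-based rule may be preferable in practice.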

Fig. 1 The BrainLat multimodal dataset of neurodegenerative diseases. The figure summarizes the entire protocol, encompassing various centers, participant groups, diagnostic criteria, cognitive assessments, and EEG and MRI recordings. The activities carried out by the participants during their three visits to the clinical center are also depicted. For the EEG session, the figure illustrates the key steps in the processing pipeline. Session three summarizes the different MRI recordings (anatomical, functional, and diffusion MRI). The recruitment sites included the INNN: Instituto Nacional de Neurología y Neurocirugía, Ciudad de México, Mexico; INCMN: Geriatrics Department, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico; AI-PUJB: Aging Institute, Pontificia Universidad Javeriana, Bogotá, Colombia; UCIDP-IPN: Unit Cognitive Impairment and Dementia Prevention, Peruvian Institute of Neurosciences, Lima, Peru; CICA: Centro de Investigación Clínica Avanzada (CICA) Hospital Clínico Universidad de Chile, Chile; GERO: Neurology Department, Geroscience Center for Brain Health and Metabolism, Santiago, Chile; CNC-UdeSA: Centro de Neurociencia Cognitiva, Universidad de San Andrés, Argentina. AD: Alzheimer's disease, bvFTD: behavioral variant frontotemporal dementia, PD: Parkinson's disease, MS: Multiple sclerosis, HCs: older healthy controls.

Fig. 2 Illustrative diagram of the BrainLat dataset's directory tree, organized according to the BIDS format. For MRI data, anatomical (anat), diffusion-weighted (dwi), and functional (func) images are stored in specific files. The same applies to the EEG data.


Table 1. List of sites contributing to the BrainLat dataset. Country codes meet the standards of the International Organization for Standardization (ISO). MX: Mexico, CL: Chile, CO: Colombia, PE: Peru, AR: Argentina. PI: principal investigator.

Table 2. Demographic information of the BrainLat dataset. Age and years of formal education are presented as mean (standard deviation). Sex is the ratio between females (F) and males (M). HbP: Handedness by preference (self-referenced handedness). The symbol * indicates the field contains missing information. The symbol - indicates data is not available. AD: Alzheimer's disease, bvFTD: behavioral variant frontotemporal dementia, PD: Parkinson's disease, MS: Multiple sclerosis, HCs: healthy controls.

Table 3. Demographic information of the BrainLat dataset split by recruitment site. Age and years of formal education are presented as mean (standard deviation). Country codes meet the standards of the International Organization for Standardization (ISO). Sex is the ratio between females (F) and males (M). HbP: Handedness by preference (self-referenced handedness). The symbol * indicates the field contains missing information. The symbol - indicates data is not available. AD: Alzheimer's disease, bvFTD: behavioral variant frontotemporal dementia, PD: Parkinson's disease, MS: Multiple sclerosis, HCs: healthy controls, AR: Argentina, CL: Chile, CO: Colombia, CNC-UniSA: Centro de Neurociencias Cognitivas, Universidad de San Andrés, Gero-CMYN: Clínica de Memoria y Neuropsiquiatría, Centro de Gerociencia, Salud Mental y Metabolismo.

Table 4. Global neuroimaging information of the BrainLat dataset. Imaging information is presented for anatomical (MRI), functional (fMRI), and diffusion-weighted (DWI) magnetic resonance imaging. The magnetic field strength of the scanner (scan) is presented. N: number of subjects, AD: Alzheimer's disease, bvFTD: behavioral variant frontotemporal dementia, PD: Parkinson's disease, MS: Multiple sclerosis, HCs: healthy controls.

Table 6. Equipment and technical parameters for EEG acquisition and processing.

Table 7. Equipment used for the MRI acquisition.