Native American ancestry significantly contributes to neuromyelitis optica susceptibility in the admixed Mexican population

Neuromyelitis optica (NMO) is an autoimmune disease with a higher prevalence in non-European populations. Because the Mexican population resulted from the admixture between mainly Native American and European populations, we used genome-wide microarray, HLA high-resolution typing and AQP4 gene sequencing data to analyze genetic ancestry and to seek genetic variants conferring NMO susceptibility in admixed Mexican patients. A total of 164 Mexican NMO patients and 1,208 controls were included. On average, NMO patients had a higher proportion of Native American ancestry than controls (68.1% vs 58.6%; P = 5 × 10⁻⁶). GWAS identified an HLA region associated with NMO, led by rs9272219 (OR = 2.48, P = 8 × 10⁻¹⁰). Class II HLA alleles HLA-DQB1*03:01, -DRB1*08:02, -DRB1*16:02, -DRB1*14:06 and -DQB1*04:02 showed the most significant associations with NMO risk. Local ancestry estimates suggest that all the NMO-associated alleles within the HLA region are of Native American origin. No novel or missense variants in the AQP4 gene were found in Mexican patients with NMO or multiple sclerosis. To our knowledge, this is the first study supporting the notion that Native American ancestry significantly contributes to NMO susceptibility in an admixed population, and is consistent with differences in NMO epidemiology in Mexico and Latin America.

Neuromyelitis optica (NMO) is a chronic autoimmune inflammatory and demyelinating disease of the central nervous system (CNS), which mainly affects the optic nerve and spinal cord. Although NMO was first described in the 19th century, it was considered a clinical variant of multiple sclerosis (MS) for decades 1,2 . In 2004, the discovery of anti-aquaporin-4 antibodies (AQP4-IgG) in the serum of the majority of NMO patients led to significant progress in the clinical characterization of the disease, now acknowledged as a distinct entity with different immunological, clinical and epidemiological features 3,4 .
Although it has been difficult to establish the actual prevalence of NMO, mainly because most reports are not comparable due to differences in study design, methodological approaches, and diagnostic criteria, worldwide NMO prevalence has been estimated between 0.51 and 4.4 cases per 100,000 inhabitants 6,7 . The prevalence and clinical manifestations of NMO vary among different ethnic groups, and several authors have stated that NMO is more frequent in non-European populations 8,9 . Interestingly, the relative frequency of NMO (estimated as the ratio of NMO/(MS + NMO) cases) has been found to decrease gradually in South America from North (Venezuela) to South (Argentina). Because ethnicity also changes gradually from North to South in this region, with the proportion of European individuals being lower in Venezuela and higher in Argentina, the authors suggested that ethnic origin influences NMO frequency in Latin America 10 . To date there are no population-based studies of the prevalence of NMO in Mexico, and there is a single study estimating NMO prevalence at 1.3 per 100,000 inhabitants based on the NMO/(MS + NMO) relative frequency at a referral center in Mexico City 12 .
Like many other autoimmune diseases, NMO is a multifactorial disorder that results from complex interactions between genetic and environmental factors. Recent studies have reported associations of NMO with genetic variation in the Human Leukocyte Antigen (HLA) genome region in chromosome 6, particularly with class II alleles, showing ethnic and geographical differences: the DRB1*03:01 allele has been associated with NMO in European populations 13-16 [...] 13,24 ; and DRB1*04:05 in Southern Brazilians 23 . Moreover, candidate gene studies have reported associations with variation in non-HLA genes such as AQP4 and others involved in immune function (PD-1, IL-17, IL-7R, CD6 and CD58) 24-28 . Of the latter, only AQP4 gene variation has been analyzed in various populations by sequencing promoter and/or coding regions of the gene in an attempt to identify variants involved in the pathogenesis of NMO. However, the association of AQP4 gene variation with NMO remains unclear, with inconsistent findings among populations 29-34 .
Because the Mexican population of today resulted from a complex and ongoing admixture process involving mainly Native American and European genetic components, we hypothesized that Native American ancestry also contributes to NMO susceptibility in Mexican patients. We thus used genome-wide microarray data, HLA high-resolution typing and AQP4 gene sequencing data to explore genetic ancestry and to seek genetic variants conferring susceptibility to NMO in Mexican patients.
Population admixture analysis. Reference Native American (NAT) and continental populations were used to generate a multidimensional scaling (MDS) plot (Fig. 1). The left panel shows that components 1 and 2 distinguish Africans (AFR) and Europeans (EUR) from NAT individuals. The average genome-wide proportion of Native American ancestry was significantly higher in NMO patients than in controls (68.1% vs 58.6%; P = 5 × 10⁻⁶). Conversely, the average genome-wide proportion of European ancestry was lower in cases than in controls (28.3% vs 37.3%; P = 3 × 10⁻⁶), while African ancestry was similar in both groups (3.6% vs 4.1%; P = 0.143) (Fig. 1, right panel).

AQP4 sequencing.
After sequencing all AQP4 exons including exon-intron boundaries, we identified 35 SNPs in samples from Mexican patients with NMO or MS. No novel variants were found. Thirteen of these variants were found in the 3′UTR region, one in the 5′UTR region, and three were synonymous variants. Although some variants were more frequent in NMO than in MS patients, the differences were not statistically significant. Supplementary Table 10 compares the alternative allele frequency of these 35 SNPs in NMO and MS patients, in 1,000 Genomes continental populations and in 12 Native Mexican whole genome sequences 35 .

Discussion
While several studies have stated that the epidemiology of NMO differs from that of MS, being more frequent in non-European populations 9-11 , reliable comparisons among studies are difficult to establish. Epidemiological data suggest that ethnicity influences NMO prevalence, particularly in Latin America 10 . In Mexico, there is only one non-population-based study estimating NMO prevalence, conducted at a referral center located in Mexico City 12 . To our knowledge, this is the first study in the Mexican admixed population where a genome-wide analysis revealed a higher proportion of Native American ancestry in NMO cases as compared to controls. This contrasts with the higher European genetic component previously observed in Mexican patients with MS 36 . The NAT ancestry estimated in our control group recruited in Mexico City (central Mexico) is consistent with previous NAT ancestry estimations in the Mexican Mestizo population (~ 55%), known to gradually decline from South to North throughout the Mexican territory 37,38 . Unfortunately, NMO epidemiological studies in Mexico are scarce, and there is no information on the geographical distribution or ethnicity of NMO patients in Mexico.
In our GWAS analysis, six SNPs within the MHC region were associated with NMO with genome-wide significance. Both linkage disequilibrium and the conditional association analyses suggest that this association is driven by a single signal, led by two SNPs in perfect linkage disequilibrium (rs9272219 and rs9273012). There is only one previous report of a GWAS for NMO in individuals of European ancestry 16 , where two independent signals in the same MHC region were significantly associated with NMO: rs28383224, which is 18.6 kb downstream, and rs1150757, which is 573.1 kb downstream of the lead SNP found in the present study (rs9272219). Although rs28383224 and rs1150757 genotypes were not available in our analysis, both NMO GWAS share data on 3 of the 6 SNPs associated with NMO in the Mexican cohort (rs9368726, rs9405108 and rs9271588). These 3 SNPs were also associated with increased NMO risk in the European cohort, although with slightly lower odds ratio values. It is important to point out that we found no SNPs in proximity of rs1150757 associated with NMO in the Mexican population.
[Figure caption fragment: ... Table 4 inferred as Native American (NAT), European (EUR) or African (AFR). Local ancestry was estimated using RFMix with trio-phased populations as reference.]
Class II HLA alleles (HLA-DQB1*03:01, -DRB1*08:02, -DRB1*16:02, -DRB1*14:06 and -DQB1*04:02) and class II haplotypes (HLA-DRB1*16:02-DQB1*03:01, -DRB1*08:02-DQB1*04:02 and -DRB1*14:06-DQB1*03:01) showed the most significant associations with increased NMO risk in the present Mexican cohort, while HLA-DQB1*03:02 and -DQB1*02:02 alleles were significantly associated with decreased NMO risk. The HLA-DRB1*16:02 allele has also been associated with NMO in Southern Han Chinese and Japanese populations, and more recently in Southern Brazilians 21-23 .
A very recent meta-analysis showed that the HLA-DRB1*16:02 allele was strongly associated with autoimmune diseases predominantly mediated by autoantibodies 5 . The frequency of this allele varies across the world but it is highest in Native populations of America (~ 39%), is also frequent in populations from Oceania (~ 28%) and South-East Asia (~ 28%), but is relatively low in Europe (~ 6%) and Africa (~ 4%) 39 . Furthermore, haplotype HLA-DRB1*16:02-DQB1*03:01 is very frequent in Native American populations from the Southern state of Oaxaca (Mixe, Mixtec and Zapotec) and Xavantes from Central Brazil, but is very rare in other continental populations [39][40][41] . Recently, haplotype HLA-DRB1*16-DQB1*03:01 was also associated with Parry-Romberg syndrome, an autoimmune disease affecting the craniofacial nerve in Mexican patients 42 .
The HLA-DRB1*03:01 allele has been consistently associated with NMO in European populations, and in admixed populations with an important contribution of the European gene pool (Brazilian mulatto, Afro-Caribbean and a small Mexican mestizo cohort) 13-20 . The frequency of HLA-DRB1*03:01 is as high as 20% in European, North African and Western Asian populations, but ranges from only 0 to 2% in Native Mexican populations 39 . Although in the present study the frequency of this allele was almost two-fold higher in cases as compared to controls (about 1.7-fold; 7.04% vs 4.12%, Supplementary Table 4), the difference did not reach statistical significance, probably due to low statistical power derived from the small sample and effect sizes. Inconsistencies are not uncommon in genetic association studies and show the complexity of the genetic ancestry contribution in admixed populations. Interestingly, in the European GWAS, the HLA-DRB1*03:01 allele was imputed and found to be associated with AQP4-IgG-seropositive NMO but not with AQP4-IgG-seronegative NMO, and showed a high correlation with rs1150757 (r² = 0.7) but a poor correlation with rs28383224 (r² = 0.2) 16 . In the present study, the HLA-DRB1*03:01 allele was not associated with NMO, nor was any SNP in proximity of rs1150757.
HLA alleles previously associated with NMO in populations with European and/or Native American ancestry (HLA-DRB1*03:01 13-20 , -DRB1*16:02 21-23 and -DQB1*04:02 13,24 ) were in strong LD with the rs9272219 "A" risk allele. In contrast, two alleles significantly associated with decreased NMO risk in the present study (HLA-DRB1*04:07 and -DQB1*03:02) were in strong LD with the rs9272219 "C" allele. Whether the latter are in fact NMO protective alleles needs to be confirmed in independent cohorts. Moreover, no previously reported NMO risk alleles were found in individuals with the rs9272219 "C" allele.
Notably, local ancestry analyses revealed that all HLA alleles most associated with NMO risk and protection in the present study were predominantly inferred as of Native American ancestry. This is consistent with our finding of a higher proportion of NAT ancestry in NMO cases as compared to controls, and with epidemiological data suggesting that NMO is more prevalent in non-European populations 9-11 . To our knowledge there is only one previous study analyzing local ancestry in demyelinating diseases, where HLA alleles DRB1*16:02 and DRB1*14:02 were inferred as of Native American ancestry in Hispanics 43 , which is also consistent with our local ancestry findings. As expected, the well-known HLA-DRB1*03:01 NMO risk allele was predominantly inferred as of European ancestry. Altogether, our SNP and HLA analyses suggest that a group of HLA alleles predominantly of Native American ancestry are associated with NMO susceptibility in the admixed Mexican population.
A limited number of studies have analyzed the role of AQP4 variants in the pathogenesis of NMO in USA 29 , Chinese 30-32 , Japanese 33 and Spanish 34 populations, with inconclusive results. We sequenced AQP4 coding regions in Mexican patients with NMO and MS; however, no novel or missense variants were identified. Interestingly, four 3′UTR variants (rs7240333, rs14393, rs1058424 and rs3763043) were more frequent in NMO as compared to MS patients, although the differences were not significant. Two of these variants (rs1058424 and rs3763043) showed a weak but significant association with NMO in the Han Chinese population 31 . The highest frequencies of these four 3′UTR polymorphisms have been found in Native Mexicans (29.2%, 79.2%, 50% and 79.2%, respectively) 35 .
Some limitations of the study must be pointed out. Firstly, because no medical information was obtained from the control group (CANDELA project participants from Mexico), misclassification bias could potentially affect the statistical power of the study. Controls lacking medical information have been previously used in other GWAS, as the effect on statistical power is expected to be modest unless the extent of this bias is substantial 44 . In the present study, it is unlikely, albeit possible, that a low number of control participants were affected with NMO or could develop NMO in the future. However, because NMO prevalence is very low, this bias is expected to be small. In addition, because of the possibility of spurious associations, the novel HLA associations identified here should be interpreted with caution and be confirmed in further studies including Mexican and other Latin American populations.
To our knowledge, this is the first study to examine the genetic ancestry of NMO patients, and it supports the notion that Native American ancestry significantly contributes to neuromyelitis optica susceptibility in the admixed Mexican population. This finding is consistent with differences in the prevalence of NMO in populations of Mexico and Latin America, and contrasts with the epidemiology and genetics of multiple sclerosis 36 .
Genotyping. Eighty-three NMO samples were genotyped using the Illumina HumanOmniExpress array (~ 700,000 SNPs) and 36 using the Illumina expanded Multi-Ethnic Genome Array (~ 1,700,000 SNPs), at the INMEGEN. Controls had been previously genotyped on the HumanOmniExpress array as part of the CANDELA Consortium study 48 . One of the CANDELA-Mexico controls was also genotyped at INMEGEN for quality control purposes, and microarray data concordance was 99.8%.
High-resolution typing of the HLA region. HLA class I (A, B and C) and class II (DRB1 and DQB1) genes were typed by direct sequencing (sequence-based typing, SBT 50 ) in a total of 71 NMO samples and 97 controls. Genotypes were called using Applied Biosystems analysis software (Foster City, CA, USA) and the IMGT/ HLA database alignment tool 51 . Ambiguities were resolved using previously validated group-specific sequencing primers (GSSP) 50 .
AQP4 sequencing. AQP4 coding and UTR regions were sequenced on an Illumina MiSeq system. Primers were designed manually to span the regions of interest. Quality control of raw sequences was conducted using FastQC 52 , and the Trimmomatic 53 algorithm was used to remove adapter sequences and trim short and low-quality end-read sequences. Using Bowtie2 54 and SAMtools 55 , the cleaned sequence reads were aligned to the human reference genome (hg19), followed by variant calling and annotation.
Statistical analysis. Genome-wide screening. Quality control (QC) of the genotype data was carried out in PLINK 56 . SNPs and individuals were removed from the analysis based on minor allele frequency < 5%, call rate < 95%, deviation from Hardy-Weinberg equilibrium (HWE) at P < 1 × 10⁻⁵ and genotyping efficiency < 90%. Pairwise identity by descent (IBD) estimates were used to identify related individuals. No discordant sex information was found. After quality control, the data set comprised the genotypes of 119 NMO patients and 1,208 controls for 252,805 SNPs.
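The per-SNP filters described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the function names are hypothetical, the input is a set of genotype counts for a single biallelic SNP, and the HWE check uses a 1-df chi-square approximation rather than the exact test that PLINK applies. Only the thresholds (MAF 5%, call rate 95%, HWE P = 1 × 10⁻⁵) come from the text.

```python
import math


def snp_qc_metrics(n_aa, n_ab, n_bb, n_missing):
    """Return (maf, call_rate, hwe_p) for one biallelic SNP from genotype counts.

    Hypothetical helper for illustration; hwe_p is a chi-square approximation.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        raise ValueError("no called genotypes for this SNP")
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    # Goodness-of-fit of observed genotype counts to Hardy-Weinberg expectations.
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return maf, call_rate, hwe_p


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.05, min_call_rate=0.95, min_hwe_p=1e-5):
    """Apply the three SNP-level thresholds quoted in the text."""
    maf, call_rate, hwe_p = snp_qc_metrics(n_aa, n_ab, n_bb, n_missing)
    return maf >= min_maf and call_rate >= min_call_rate and hwe_p >= min_hwe_p
```

For example, a SNP with genotype counts 360/480/160 and no missing calls sits exactly at Hardy-Weinberg proportions with a minor allele frequency of 0.4 and would be retained, whereas a SNP with counts 990/10/0 would be dropped for low minor allele frequency.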
Ancestry analyses were carried out using ADMIXTURE 57 assuming three parental populations: EUR and AFR from the 1,000 Genomes project, and NAT genotypes of unrelated individuals, i.e. only parents of the NAT trios were included. The total number of autosomal SNPs common to the five populations (NMO patients, CANDELA controls, EUR, AFR and NAT) was 197,323. For each individual, the proportion of European, African and Native American ancestry was estimated at the genomic level. The significance of differences between ancestry proportions in NMO patients as compared to controls was determined using a t-test. Local ancestry was determined through a random forest procedure using RFMix with 5 expectation-maximization (EM) iterations and a minimum of 6 reference haplotypes per tree node 58 . EUR and AFR from the 1,000 Genomes project, along with trio-phased NAT genotypes, were used as reference populations for local ancestry. Haplotype phasing of NAT, NMO and control subjects was performed with Beagle 59 .
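As an illustration of the case-control comparison of ancestry proportions, the following Python sketch computes a two-sample t statistic. The text does not state which variant of the t-test was used, so the unequal-variance (Welch) form is an assumption here, the function name is hypothetical, and the P value is only a normal approximation that is reasonable for samples as large as those in this study.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t statistic, its degrees of freedom and an approximate P value.

    cases and controls are sequences of per-individual ancestry proportions
    (for example, the Native American fraction estimated by ADMIXTURE).
    """
    n1, n2 = len(cases), len(controls)
    v1, v2 = variance(cases), variance(controls)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(cases) - mean(controls)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    # Two-sided P value from the normal approximation (assumes large df).
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, df, p_approx
```

A positive t indicates a higher mean ancestry proportion in cases than in controls; for small samples the P value should instead be taken from the t distribution with the returned degrees of freedom.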
A genome-wide case-control association study was conducted using logistic regression models, adjusting for sex and two principal components, assuming an additive effect using PLINK 56 .
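The single-SNP model described above can be sketched in Python as a logistic regression fitted by Newton-Raphson, with the genotype coded additively (0, 1 or 2 copies of the tested allele) and sex plus two principal components as covariates. This is an illustrative sketch, not PLINK's code: all function names are hypothetical, and `simulate_cohort` generates purely synthetic data with arbitrary parameters so that the fit can be tried out.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression; returns [intercept, beta_1, ..., beta_k]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        step = solve(hess, grad)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate_cohort(n, log_or, seed=0):
    """Synthetic data only: covariate rows [genotype, sex, pc1, pc2] and status."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # additive coding: 0, 1 or 2
        sex = rng.randint(0, 1)
        pc1, pc2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = _sigmoid(-1.0 + log_or * g + 0.2 * pc1)
        rows.append([g, sex, pc1, pc2])
        y.append(1 if rng.random() < p else 0)
    return rows, y
```

Calling `fit_logistic(*simulate_cohort(2000, 0.9))` returns five coefficients; the exponential of the second one is the estimated per-allele odds ratio adjusted for the covariates. In a genome-wide scan the same model would be fitted once per SNP.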
HLA region analysis. Allele and haplotype frequencies were obtained by direct counting, and haplotype blocks were built based on previous reports. Allele frequencies for HLA-A, -B, -C, -DRB1 and -DQB1 were compared between NMO patients and the control group. Maximum-likelihood haplotype frequencies for two-point, three-point and four-point associations were estimated in each group using an EM algorithm implemented in Arlequin v3.1 60 . Linkage disequilibrium (LD; Δ and Δ′) and HWE were also calculated using Arlequin. Class II HLA haplotypes were stratified by SNP rs9272219, and the significance of the differences between NMO patients and CANDELA controls was determined using Fisher's exact test.
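Two of the quantities mentioned above can be sketched in a few lines of Python: a two-sided Fisher's exact test on a 2 × 2 table of allele or haplotype counts, and the LD coefficients Δ and Δ′ for a pair of alleles. This is a hedged illustration rather than the Arlequin implementation; the function names are hypothetical, and the two-sided P value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]].

    For example, rows could be cases and controls, and columns the counts of
    chromosomes carrying or not carrying a given HLA allele.
    """
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))  # small tolerance for float ties
    return min(1.0, total)


def ld_delta(p_ab, p_a, p_b):
    """Return (delta, delta_prime) for alleles A and B.

    p_ab is the frequency of the haplotype carrying both alleles;
    p_a and p_b are the marginal allele frequencies.
    """
    delta = p_ab - p_a * p_b
    if delta >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return delta, (delta / d_max if d_max > 0 else 0.0)
```

A Δ′ of 1 means that at least one of the four possible haplotypes is absent, for instance when every chromosome carrying the rarer allele also carries the other allele.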

Data availability
The dataset for the NMO patients generated and/or analyzed during the current study are available from the corresponding authors on reasonable request. Access to the CANDELA dataset used in this manuscript was