The influence of population stratification on genetic markers associated with type 1 diabetes

Ethnic admixtures may interfere with the definition of type 1 diabetes (T1D) risk determinants. The role of HLA, PTPN22, INS-VNTR, and CTLA4 in T1D predisposition was analyzed in Brazilian T1D patients (n = 915; 81.7% self-reported as white) and 789 controls (65.6% white). The results were corrected for population stratification by genotyping 93 ancestry informative markers (AIMs) (BeadXpress platform). Ancestry composition and structured association were characterized using Structure 2.3 and STRAT. Ethnic diversity resulted in T1D determinants that were partially discordant from those reported in Caucasians and Africans. The greatest contributor to T1D was the HLA-DR3/DR4 genotype (OR = 16.5) in 23.9% of the patients, followed by -DR3/DR3 (OR = 8.9) in 8.7%, -DR4/DR4 (OR = 4.7) in 6.0% and -DR3/DR9 (OR = 4.9) in 2.6%. Correction by ancestry also confirmed that the DRB1*09-DQB1*0202 haplotype conferred susceptibility, whereas the DRB1*07-DQB1*0202 and DRB1*11-DQB1*0602 haplotypes were protective, which is similar to reports in African-American patients. By contrast, the DRB1*07-DQB1*0201 haplotype was protective in our population and in Europeans, despite conferring susceptibility in Africans. The DRB1*10-DQB1*0501 haplotype was protective only in the Brazilian population. Predisposition to T1D conferred by PTPN22 and INS-VNTR and protection against T1D conferred by the DRB1*16 allele were confirmed. Correcting for population structure is important to clarify the particular genetic variants that confer susceptibility/protection for T1D in populations with ethnic admixtures.

Type 1 diabetes mellitus (T1D), which results from the autoimmune destruction of pancreatic β (beta) cells, is a polygenic disease that is influenced by both genetic and environmental contributing factors 1 . More than 60 loci are involved in susceptibility to T1D. The Major Histocompatibility Complex (IDDM1 locus) has been identified as a major determinant for genetic susceptibility to autoimmune diabetes, with the Human Leukocyte Antigens (HLA)-DR and -DQ alleles providing 40%-50% of the risk. In several countries, increased susceptibility is also conferred by differences in the variable number of tandem repeats at the 5′ -end of the insulin gene (INS-VNTR) and polymorphisms in immune response genes, including protein tyrosine phosphatase non-receptor type 22 (PTPN22) and cytotoxic T-lymphocyte-associated protein 4 (CTLA4) 1 .
T1D is more frequent in white populations, but its incidence varies among Caucasians from different countries and within the same country 2 . In Brazil, the estimated frequencies of risk polymorphisms for T1D showed inter-regional differences and were usually lower than those in populations referred to as Caucasians in other countries [3][4][5] , reinforcing the need to expand genetic predisposition studies in the Brazilian population to clarify causal variants and related ancestry. The Brazilian population was formed by a strong admixture of three different ancestral roots, i.e., Amerindians, Europeans, and Africans, which hinders ethnic identification (or the inference of genetic ancestry) based predominantly on skin color 6 . Another obstacle is population stratification, which arises from ethnic admixture. Subgroups that are ancestrally distinct tend to have significant representation of different genetic ancestries, which suggests that a combination of over-represented alleles in patients relative to controls

Statistical Analysis
The variable distributions were verified by the Kolmogorov-Smirnov test. Qualitative variables were compared using the chi-square test or Fisher's exact test, with Woolf corrections when necessary. Numerical variables with parametric and non-parametric distributions were analyzed using Student's t and Mann-Whitney tests, respectively [8][9][10][11] . Hardy-Weinberg equilibrium was tested for all of the genotypes using the chi-square test. Bonferroni correction was applied for multiple tests.
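As an illustration of two of the tests named above, the following minimal Python sketch computes a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic locus and a Bonferroni-adjusted significance threshold. It is not the software used in the study; the function names and the genotype counts are hypothetical and serve only as an example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes at a biallelic locus.
    Assumes both alleles are observed (expected counts are non-zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(25, 50, 25)   # exactly at equilibrium
chi2_dev, p_dev = hwe_chi_square(50, 0, 50)  # no heterozygotes at all
threshold = bonferroni_threshold(0.05, 10)   # 10 tests is an assumed number
```

With ten tests, an overall α of 0.05 gives a per-test threshold of 0.005; the number of tests behind each threshold reported in the tables is not restated here and would need to be taken from the original analysis.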
The ancestral composition of the subjects in our sample was inferred using Structure 2.3 12,13 . Briefly, this program assumed the existence of K parental populations for the tested mixed population (where K was set equal to 3, based on the known European, African, and Amerindian composition of the Brazilian population) and grouped individuals with admixed proximity to each of the parental populations using Bayesian inference. Thus, the fraction of genomic contribution from European, African, and Amerindian populations was obtained for each individual.
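The per-individual ancestry fractions described above can be summarized per group by simple averaging. The sketch below assumes that each individual is represented by three fractions (European, African, Amerindian) that sum to 1; the function name and the example values are hypothetical and do not reproduce the Structure output format or the study data.

```python
from statistics import mean

ANCESTRIES = ("European", "African", "Amerindian")  # K = 3 parental populations


def mean_ancestry(q_rows):
    """Average per-individual ancestry fractions over a group.

    Each row holds one individual's fractions in the order of ANCESTRIES
    and is expected to sum to 1 (within rounding error).
    """
    for row in q_rows:
        if len(row) != len(ANCESTRIES) or abs(sum(row) - 1.0) > 1e-6:
            raise ValueError(f"invalid ancestry fractions: {row}")
    columns = zip(*q_rows)  # one column per parental population
    return {name: mean(col) for name, col in zip(ANCESTRIES, columns)}


# Hypothetical individuals, for illustration only.
example_group = [
    (0.80, 0.15, 0.05),
    (0.70, 0.20, 0.10),
    (0.90, 0.05, 0.05),
]
group_means = mean_ancestry(example_group)
```

Because every row sums to 1, the group means also sum to 1, which is a convenient sanity check on averages of this kind.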
After estimating the ancestral contributions, the single nucleotide polymorphisms (SNPs) contained in the candidate genes were tested by structured association analysis using the Association Test Structured Population (STRAT) program. This program performs association mapping in structured populations by testing for association conditional on the inferred ancestry of each individual. In short, this approach allows a case-control study with correction for the biases introduced by population stratification in association studies 12 . A p-value ≤ 0.05 was considered statistically significant.

Results
Population characteristics. The characteristics of the groups are shown in Table 1. Patients with T1D were younger than the controls (p < 0.0001), with a predominance of females (p < 0.0001) and a lower body mass index (BMI, p < 0.0001). White was the most prevalent self-reported skin color in both groups; however, the diabetes group had a greater frequency of white skin color (81.7% vs. 65.6%, p < 0.0001) and lower frequencies of brown (15.4% vs. 27.8%, p < 0.0001) and black (2.4% vs. 5.9%, p = 0.0003) skin color than did the controls.
The average patient age at diagnosis was 12.3 ± 8.4 years, and the average diabetes duration was 12.4 ± 10.6 years. Fasting glucose and HbA1c levels were higher (p < 0.0001) and C-peptide concentrations were lower (p < 0.0001) in patients than in the control group (Table 1).

Table 1 (column headings only): T1D patients (n = 915), n (%) or mean ± SD; controls (n = 789), n (%) or mean ± SD; P.

Ancestral composition. The contribution of each of the three parental populations (European, African, and Amerindian) was obtained for both groups. T1D patients had, on average, 77% European, 15% African, and 7.3% Amerindian ancestries, whereas the healthy controls had, on average, 71% European, 21% African, and 7.9% Amerindian ancestries. The ancestral contributions are shown in Fig. 1.

Discussion
T1D is a complex autoimmune disease with a strong genetic component 1,2 that is mainly related to HLA genes. HLA genes are the most polymorphic genes in the human genome, which has resulted in a widely variable distribution of HLA alleles and haplotype combinations among populations. Wide variability is also observed in other T1D-predisposing genes, including CTLA4, INS-VNTR, and PTPN22, in several ethnic groups and is likely related to different genetic backgrounds and environmental factors 1,2 .
Previous studies conducted in Brazil with small sample sizes have shown inter-regional differences in the frequencies of the HLA-DR and -DQ alleles and of the other risk polymorphisms for T1D 3 . Furthermore, differences in populations referred to as Caucasian in other countries were also observed 14 .
Considering that these discrepancies may result from the heterogeneity of the highly mixed Brazilian population, which may introduce bias into the evaluation of causal disease markers, we expanded our studies to a larger cohort and included a characterization of genetic ancestry to clarify the causal variants and related ancestry. This clarification is important because previous studies 6 noted a weak correlation between skin color and ancestral genomes in Brazil.
The ancestral composition of our sample was determined using Structure after genotyping 89 AIMs. To determine whether the associations of the alleles, genotypes, and haplotypes with T1D remained even after correcting for the stratification bias in our population, association tests were performed using STRAT, which uses a statistical approach for association mapping of candidate genes in structured populations 12 .
Analysis of the 89 AIMs confirmed the three major ancestral roots of the population in our study: European, African, and Amerindian. This result most likely reflects major milestones in our history, specifically European colonization, the native Amerindian population, and the arrival of enslaved Africans. The highest percentage of European descent in both the T1D group and controls (77% and 71%, respectively) corroborates other reports 6 . The self-reported skin color of the patients in our study reinforces the poor correlation between color and genomic ancestry in Brazil as well as the need for ancestry characterization. The diabetes group had a higher frequency of self-reported whites (81.7% vs. 65.6%, p < 0.0001) and lower frequencies of browns (15.4% vs. 27.8%, p < 0.0001) and blacks (2.4% vs. 5.9%; p = 0.0003) than the controls.
Our results have a few, but important, differences from those obtained for Europeans and African Americans. They suggest that the proportion of women among T1D patients (57.6% vs. 42.4%; p < 0.001) was higher than reported in the literature, which is in accordance with previous reports by our group 5 and by Gomes MB in a Brazilian multicenter study 15 . However, we cannot exclude some bias in the selection of patients, as our study was not designed to estimate the disease prevalence. It is important to remember that there is no difference between genders in most series, except in populations such as those of Sardinia (Italy), Oxford (UK) and Santa Fe de Bogota (Colombia), in which the percentage of men with T1D was higher than that of women 2 .
Because the frequencies of the HLA-DRB1, -DQB1, and INS-VNTR alleles, as well as of the PTPN22 variants, were similar between genders (data not shown), the higher frequency of women with T1D in our cohort may be due to other polymorphisms, such as the T allele in the CD226 rs763361 variant that was suggested by Mattana et al. 5 in the Brazilian cohort, or to environmental factors.
Table 4. HLA-DRB1/DRB1 genotype distribution in patients with type 1 diabetes mellitus and controls. Association with type 1 diabetes before and after correction for population stratification. T1D = type 1 diabetes mellitus; n = number of individuals; OR = odds ratio; CI = confidence interval; p no-structured = level of significance before STRAT analysis; p structured = level of significance after STRAT analysis. P required for statistical significance after Bonferroni correction for multiple tests < 0.005.
Scientific Reports | 7:43513 | DOI: 10.1038/srep43513
The age at diagnosis of the T1D group was similar to that of Caucasians. The low frequency of pancreatic autoantibodies probably stems from the long duration of diabetes.
There was a trend toward protective association with T1D by the DQB1*0301 allele (OR = 0.72; p = 0.003, non-significant). In a meta-analysis, the same allele was causal in Sweden and United Kingdom populations, but protective in Finland, Hungary and Italy 16 . The -DRB1*08 allele also showed a protection trend (OR = 0.58;
Table 5. Distribution of the HLA-DRB1/DQB1 haplotypes in patients with type 1 diabetes mellitus and normal controls before and after correction for population stratification. T1D = type 1 diabetes mellitus; n = number of individuals; OR = odds ratio; CI = confidence interval; p no-structured = level of significance before STRAT analysis; p structured = level of significance after STRAT analysis. Thirty-five haplotypes with a total number in patients plus controls greater than 10 (0.4%) were included. P required for statistical significance after Bonferroni correction for multiple tests < 0.0013. Rare alleles were grouped as "others".
Despite carrying two risk alleles, the DR4/DR9 genotype did not confer susceptibility, probably due to the small sample size, which consisted of only 12 patients and five controls. The absence of the HLA-DR3, -DR4, and -DR9 alleles was associated with a lower risk of T1D (OR = 0.1) and occurred in 58.7% of the control population and only 12.5% of patients.
Similar data were observed for the non-HLA-loci. The INS VNTR I/I genotype was more prevalent in T1D patients (60.7%) than in the control population (32.2%), yielding a relative risk of 3.2 for T1D. A study conducted in São Paulo by Hauache et al. 18 with fewer patients noted a greater frequency of class I alleles, comprising 83.1% of diabetics and 69.3% of controls (OR = 1.98), which is closer to the results observed in Caucasian populations. The frequency of these alleles seems to differ between racial groups. Undlien et al. 19 demonstrated that class I alleles confer susceptibility to T1D in the Caucasian population, but not in black and Japanese populations.
The CT + TT genotypes in PTPN22 1858C/T were more prevalent in T1D patients (19%) than in controls (10.6%; p < 0.0001), conferring susceptibility to the disease (OR = 1.97). This risk was significant only in self-reported white individuals, possibly because the T allele is very rare in African-American and Asian populations 20 . In accordance with this result, risk genotypes were present in 26.8% to 42.1% of Caucasian patients (from the USA and Finland) and 16.5% to 25.3% of controls 20 .
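The odds ratios quoted for the non-HLA loci can be approximated directly from the carrier frequencies given in the text. The sketch below is a simple check under that assumption, using only the rounded percentages reported above rather than the raw counts, so small differences from the published values are expected; the function name is hypothetical.

```python
def odds_ratio(case_freq, control_freq):
    """Odds ratio from the carrier frequency in cases and in controls.

    Frequencies are proportions strictly between 0 and 1.
    """
    case_odds = case_freq / (1 - case_freq)
    control_odds = control_freq / (1 - control_freq)
    return case_odds / control_odds


# INS-VNTR I/I genotype: 60.7% of patients vs. 32.2% of controls.
or_ins = odds_ratio(0.607, 0.322)

# PTPN22 1858C/T CT + TT genotypes: 19% of patients vs. 10.6% of controls.
or_ptpn22 = odds_ratio(0.19, 0.106)

print(f"INS-VNTR I/I: OR ~ {or_ins:.2f}")
print(f"PTPN22 CT + TT: OR ~ {or_ptpn22:.2f}")
```

Both values fall close to the figures reported in the text (3.2 and 1.97, respectively), which suggests that the reported risk estimates are odds ratios computed from the case and control genotype frequencies.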
Some degree of deviation from Hardy-Weinberg equilibrium is expected in structured populations. For this reason, Hardy-Weinberg deviation analyses were performed for all alleles (HLA) and candidate polymorphisms from the entire sample. One variant of the CTLA4 gene (+49A/G) was not in Hardy-Weinberg equilibrium for the controls and was excluded from further analysis. The literature reports a predisposition in patients from Italy, the UK, and the US who carry the T allele but not in patients from Germany 21 . The T allele was present in 54.5% of patients.
In this study, we confirmed that our population is, in fact, the result of three major ancestral roots: Amerindian, European, and African. This result most likely reflects major milestones in Brazilian history, specifically European colonization, the native Indigenous population, and the arrival of enslaved Africans. However, the majority of the patients shared a European genetic background, as previously described in a study that included various regions of the country 6 . In fact, European ancestry prevailed in both patients with T1D and the controls, followed by African and Amerindian ancestry.
The predisposition to T1D conferred by HLA-DR/DQ, PTPN22, and INS-VNTR was also confirmed in our study. The great ethnic diversity resulted in genetic determinants with intermediate frequencies between those of Caucasians and Africans, contributing to risks and protection that were partially discordant from these groups and were likely related to the low/intermediate incidence of T1D in Brazil (8/100,000 per year) 22 .
The correction for population stratification increased the statistical power of our analysis and strengthened the results. Our results highlighted the association of HLA-DRB1*16 and the -DRB1*07-DQB1*0201 haplotype with protection and set DQB1*0501 as neutral.
Our research has a number of limitations. The majority of the population in this study lives in São Paulo. Although these data cannot be directly extrapolated to the whole country, the sample is nevertheless broadly representative of Brazil because São Paulo is a cosmopolitan city to which people have converged from all regions of the country, as well as immigrants from Europe and Asia over the last century. The groups were not homogeneous regarding mean age, skin color, gender (as stated previously) and BMI, which was lower in patients, probably because of their inadequate metabolic control. High-resolution genotyping was performed only for the high-risk DR3 and DR4 alleles, which are fundamental to identifying individuals who may benefit from monitoring and preventive treatments and which encompass most of our population. The subtyping of -DQA1 and other neutral or protective DR alleles is important for identifying alleles that provide very strong protection against T1D and individuals who will not progress to the disease. These results are missing from our study, as is the influence of unknown environmental factors. Not all patients were positive for pancreatic autoantibodies, possibly due to the cross-sectional nature of the study (implying that not all of them underwent autoantibody determinations at diagnosis) and to the fact that a small percentage of T1D patients are autoantibody negative at the time of diagnosis 2 .
As the frequencies of the alleles attributed to the three ethnic groups formed a continuum, without a cut-off point, due to the great admixture of our population, it was not possible to define the ethnic-specific HLA alleles and haplotypes associated with T1D.
Predisposition to T1D conferred by the HLA-DR/DQ, PTPN22, and INS-VNTR loci was confirmed in our study. The great ethnic diversity of the population in our study, with contributions from Europeans, Africans and Amerindians, resulted in genetic determinants for T1D with intermediate frequencies between those of Caucasians and Africans, which contributes to risk and protection that are partially discordant from those of these groups.

Methods
Subjects. The cohort comprised 915 patients with T1D, aged 24.6 ± 13.0 years, and 789 volunteers, aged 28.5 ± 11.5 years, without a family history of diabetes or any other autoimmune disease and with normal blood glucose and glycated hemoglobin levels. T1D diagnosis was based on the clinical symptoms of diabetes (weight loss, polyuria, polydipsia) or ketoacidosis at diagnosis, low C-peptide levels (below the positive cutoff values), and the immediate need for permanent insulin therapy, according to ADA criteria 23 . More recently, the presence of at least one islet autoantibody was also considered. Autoantibody determinations started only a few years ago. For this reason, many patients underwent this analysis many years after diagnosis, which explains the autoantibody-negative patients. None was obese or had a relative with type 2 diabetes or maturity-onset diabetes of the young (MODY).
The vast majority of patients were seen at the Clinical Hospital. A low percentage was referred by endocrinologists participating in the study. The medical records of patients always included the age at diagnosis and their clinical features. Patients with any suspicion of non-autoimmune diabetes were not included in the cohort.
Most of the cohort lived in São Paulo city. This study was conducted in accordance with ethical principles and following the guidelines contained in the Helsinki Declaration. Approval by the Ethics Committee of the Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo and informed consent from all subjects, parents or guardians were obtained before the research procedures were initiated.
Autoantibodies. Serum levels of the autoantibodies against glutamic acid decarboxylase (GAD65A) and tyrosine phosphatase (IA-2A) were determined by radioimmunoassay (RSR limited, UK; CV < 7%). The normal values for 700 healthy controls (considered 3 standard deviations, SD) were < 1.0 IU/mL and < 0.8 IU/mL for GAD65A and IA2A, respectively. The sensitivity of both assays was 0.2 IU/mL. Serum levels of the autoantibodies against Zinc transporter 8 (ZnT8A) were measured by ELISA (KR770-96; Kronus, USA; CV < 7%). This assay detects and quantifies autoantibodies specific to residues R325 and W325 or to non-specific variants of residue 325. The normal value of ZnT8A in 321 healthy controls was defined as ≤ 16 u/mL (considered 3 SD).
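The cutoffs above can be applied as a simple decision rule for the "at least one islet autoantibody" criterion. The sketch below is only an illustration: the function name is hypothetical, and the handling of values exactly at a cutoff is an assumption inferred from the inequality signs in the text (normal is < 1.0 IU/mL for GAD65A, < 0.8 IU/mL for IA-2A, and ≤ 16 u/mL for ZnT8A).

```python
# Upper limits of normal as reported in the text.
GAD65A_LIMIT = 1.0   # IU/mL; normal values are below this limit
IA2A_LIMIT = 0.8     # IU/mL; normal values are below this limit
ZNT8A_LIMIT = 16.0   # u/mL; normal values are at or below this limit


def is_autoantibody_positive(gad65a, ia2a, znt8a):
    """Return True if at least one islet autoantibody is above normal.

    Boundary handling is an assumption based on the reported inequalities.
    """
    return (
        gad65a >= GAD65A_LIMIT
        or ia2a >= IA2A_LIMIT
        or znt8a > ZNT8A_LIMIT
    )


# Hypothetical measurements, for illustration only.
negative_example = is_autoantibody_positive(0.5, 0.3, 10.0)
positive_example = is_autoantibody_positive(1.2, 0.3, 10.0)
```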

Molecular study.
Genomic DNA was isolated from fresh peripheral blood cells using a conventional salting-out method 24 . The HLA-DRB1 and -DQB1 alleles were genotyped using the Micro SSP™ Allele Generic and Specific HLA class II DNA LabType SSO typing system (One Lambda, Inc., USA). Genotypes of the class I and