Brazilian urban population genetic structure reveals a high degree of admixture

Giolo, Suely R; Soler, Júlia M P; Greenway, Steven C; Almeida, Marcio A A; de Andrade, Mariza; Seidman, J G; Seidman, Christine E; Krieger, José E; Pereira, Alexandre C

doi:10.1038/ejhg.2011.144

Published: 24 August 2011


European Journal of Human Genetics volume 20, pages 111–116 (2012)


Abstract

Advances in genotyping technologies have contributed to a better understanding of human population genetic structure and have improved the analysis of association studies. To analyze patterns of human genetic variation in Brazil, we used SNP data from 1129 individuals: 138 from the urban population of Sao Paulo, Brazil, and 991 from 11 populations of the HapMap Project. Principal components analysis was performed on the SNPs common to these populations to characterize their genetic structure and to determine the number of SNPs needed to capture their genetic variation. Both admixture and local ancestry inference were performed for individuals of the Brazilian sample. Individuals from the Brazilian sample fell between Europeans, Mexicans, and Africans. Our results suggest that Brazilians have the highest internal genetic variation of the sampled populations. As expected, the Brazilian individuals analyzed descend from Amerindian, African, and/or European ancestors, but intermarriage between individuals of different ethnic origin played an important role in generating the broad genetic variation observed in the present-day population. The data support the notion that the Brazilian population, owing to its high degree of admixture, can provide a valuable resource for strategies that use admixture as a tool for mapping complex traits in humans.


Introduction

Advances in genotyping technologies have provided important insights into human population structure. Knowledge of the patterns of genetic variation within and among human populations has contributed to a better understanding of the relationship between genetics and ethnicity, and has improved the design and analysis of case–control association studies. Although several studies have investigated the genetic structure of non-Caucasian populations, including individuals of African, African American, Asian, and Native American ancestry, most studies have focused primarily on individuals of European ancestry.^{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} Consequently, coverage of the global human population remains incomplete, and South American populations are underrepresented in databases of human genetic variation. These understudied populations include individuals from Brazil, a country of almost 200 million people that represents approximately 52% of the South American population and 3% of the world's population.

Historically, the Brazilian population has experienced extensive intermarriage between ethnic groups, and Brazilians are known to be heavily admixed with Amerindian, European, and African ancestries. In general, Brazilians trace their origins to the original Amerindian inhabitants and to two main sources of immigration: Africans and Europeans.^{13, 14} Of the five geographical regions of Brazil (North, Northeast, Center–West, Southeast, and South), Northern Brazilians are mostly of Amerindian ancestry, with some African ancestry. Current inhabitants of the Northeast and Center–West are mostly of African origin, although some individuals whose ancestors migrated from Southern Brazil can trace their roots to Europe. Southern and Southeastern Brazilians are mostly of European origin; however, individuals of African and Asian descent are also found in several localities of the Southeast. For decades, new immigrants, as well as migrants from other parts of Brazil, have flocked to Southeast Brazil, where intermarriage between individuals of different ancestry is very common. The goals of the present work are: (i) to identify patterns of population structure in the Southeast Brazilian population, enabling individuals from this region to be included in future studies of genetic variation; (ii) to identify marker panels that effectively capture the variation revealed by dense genotyping of samples from the Southeast Brazilian population and from the 11 populations of HapMap Phase III, which include individuals of Asian, African, European, and Mexican ancestry; and (iii) to assess global and local ancestry inferences for the Southeast Brazilian population.

Materials and methods

Datasets and preprocessing steps

Analysis was performed on samples of the Southeast Brazilian population (BRZ) and on samples from the following 11 populations of HapMap Phase III: African ancestry in Southwest USA (ASW); Utah residents with Northern and Western European ancestry from the CEPH collection (CEU); Han Chinese in Beijing, China (CHB); Chinese in Metropolitan Denver, Colorado (CHD); Gujarati Indians in Houston, Texas, from the western state of Gujarat in India (South Asia) (GIH); Japanese in Tokyo, Japan (JPT); Luhya in Webuye, Kenya (LWK); Mexican ancestry in Los Angeles, California (MEX); Masai in Kinyawa, Kenya (MKK); Tuscans in Italy (TSI); and Yoruba in Ibadan, Nigeria (YRI). International HapMap Project Phase III data are available at http://www.sanger.ac.uk/humgen/hapmap3.

The Southeast Brazilian samples come from a study of trios (mother, father, and son or daughter) in which the child has congenital heart disease and the parents do not. All individuals are from the general urban population of Sao Paulo, the largest metropolitan area in the country. In the present analysis, we used data only from the unrelated individuals (mothers and fathers). These individuals were enrolled at the Heart Institute of the University of Sao Paulo. Genotyping of these samples was performed with the Affymetrix SNP Array 6.0 platform (Affymetrix, Santa Clara, CA, USA). All subjects gave verbal and written consent. The protocol was approved by the IRB of the University of Sao Paulo Medical School (CAPPesq). HapMap samples were genotyped on two platforms, Affymetrix SNP 6.0 and Illumina Human 1M arrays (Illumina, San Diego, CA, USA); further details on the HapMap populations are available from the HapMap Project webpage. Only unrelated individuals, and only SNPs located on the autosomes and successfully genotyped in all populations, were used in this analysis.

SNPs that were not accurately assessed on the Affymetrix 6.0 array were excluded from the final analysis. Specifically, separately for each of the 12 populations, we removed SNPs with more than 5% missing genotypes, SNPs that deviated from Hardy–Weinberg equilibrium (P ≤ 10⁻⁴), and SNPs with a minor allele frequency less than or equal to 0.01. After these steps, 365 116 autosomal SNPs shared by all 12 population data sets remained for 1129 unrelated individuals representing the 11 HapMap populations (n=991) and the Brazilian population (n=138).
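The per-population filters above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the paper does not say which Hardy–Weinberg test was used, so a 1-df chi-square goodness-of-fit test is assumed here (an exact test is a common alternative). Genotypes are coded 0/1/2 with NaN for missing, and SNPs are kept only if they pass in every population.

```python
import math
import numpy as np

def hwe_chisq_pvalue(genos):
    """Assumed 1-df chi-square Hardy-Weinberg test for one SNP (0/1/2, NaN = missing)."""
    g = genos[~np.isnan(genos)]
    n = g.size
    if n == 0:
        return 1.0
    obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
    p = (obs[1] + 2.0 * obs[2]) / (2.0 * n)      # frequency of the allele counted by the code
    q = 1.0 - p
    exp = np.array([q * q, 2.0 * p * q, p * p]) * n
    if np.any(exp == 0):
        return 1.0                               # monomorphic SNP: no test possible
    stat = float(((obs - exp) ** 2 / exp).sum())
    return math.erfc(math.sqrt(stat / 2.0))      # upper tail of chi-square with 1 df

def qc_pass(G, max_missing=0.05, hwe_p=1e-4, min_maf=0.01):
    """G: SNPs x individuals for ONE population. True where a SNP passes all filters."""
    missing = np.isnan(G).mean(axis=1)
    freq = np.nanmean(G, axis=1) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    hwe = np.array([hwe_chisq_pvalue(row) for row in G])
    # Remove: >5% missing, HWE P <= 1e-4, MAF <= 0.01
    return (missing <= max_missing) & (hwe > hwe_p) & (maf > min_maf)

def shared_snps(pops):
    """pops: dict name -> SNPs x individuals array, rows aligned to one SNP list.
    Returns indices of SNPs that pass QC in every population."""
    mask = None
    for G in pops.values():
        m = qc_pass(G)
        mask = m if mask is None else mask & m
    return np.flatnonzero(mask)
```

Filtering each population separately and then intersecting mirrors the text's "separately for each of the 12 populations" and "shared by all 12 population data sets".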

Statistical analysis

We used principal components analysis (PCA), a dimensionality reduction technique,^{1, 2} to analyze the data. For each population k, the data set consists of n_k unrelated subjects, each genotyped at the same m biallelic SNPs common to all populations. Data for all 12 populations were then arranged in an m × n matrix G, with $n = \sum_{k=1}^{12} n_k$; each entry takes the value 0, 1, or 2, or is empty (missing), encoding the genotype at the corresponding SNP.² After each row i of G was mean-centered and normalized, n eigenvalues and the n corresponding eigenvectors (axes of variation) were computed from the n × n covariance matrix of individuals, $\psi = G^{\top} G$. Plots of the eigenvectors associated with the largest eigenvalues were then used to investigate the structure of the populations under analysis. PCA was run without removing outliers and without pruning SNPs in linkage disequilibrium.
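A minimal sketch of this PCA, assuming the EIGENSTRAT-style row normalization as we understand it from Patterson et al.:² each SNP row is centered on its mean and divided by $\sqrt{p_i(1-p_i)}$ with $p_i = (1 + \sum_j g_{ij})/(2 + 2n_i)$. The function name is hypothetical and this is not the EIGENSTRAT code itself; missing genotypes are set to zero after centering, and the 1/m scaling of ψ does not change the eigenvectors.

```python
import numpy as np

def pca_individuals(G, n_components=3):
    """PCA of an m x n genotype matrix G (rows = SNPs, columns = individuals).

    Entries are 0, 1, 2 or NaN (missing). Returns the largest eigenvalues and
    the corresponding n-dimensional eigenvectors (axes of variation).
    """
    G = np.asarray(G, dtype=float)
    m = G.shape[0]
    n_obs = np.sum(~np.isnan(G), axis=1)           # non-missing genotypes per SNP
    mu = np.nanmean(G, axis=1)                      # row means for centering
    # Assumed EIGENSTRAT-style allele frequency estimate used for scaling
    p = (1.0 + np.nansum(G, axis=1)) / (2.0 + 2.0 * n_obs)
    X = (G - mu[:, None]) / np.sqrt(p * (1.0 - p))[:, None]
    X = np.nan_to_num(X, nan=0.0)                   # missing entries contribute 0
    psi = X.T @ X / m                               # n x n matrix of individuals (psi = G'G up to 1/m)
    evals, evecs = np.linalg.eigh(psi)              # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]  # keep the largest ones
    return evals[order], evecs[:, order]
```

Plotting the columns of the returned eigenvectors against each other gives PC1/PC2-style scatter plots of individuals.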

To investigate whether a smaller number of SNPs could effectively capture the variation revealed by the 365 116 common SNPs, we built three marker panels. SNPs were ranked by the absolute value of their loading scores on each axis of variation. The first panel consists of the top 50 SNPs from each of the top five axes of variation (250 SNPs). The second and third panels were obtained by retaining the top-ranked 100 and 150 SNPs, respectively, from each of the same five axes. Because no SNP was selected from more than one axis, the panels contained 250, 500, and 750 SNPs, respectively. The relationships among the populations were also investigated by calculating, for each pair of populations, the Fst statistic, a measure of the effect of population subdivision,^{15, 16} using the SNPs in each of the three panels as well as all 365 116 common SNPs. Fst is often interpreted as the proportion of genetic diversity due to allele frequency differences among populations: a value of zero implies that the two populations are interbreeding freely, and a value of one implies that they are completely separate.
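The panel construction can be sketched as follows, with hypothetical function names. SNP loadings are assumed here to be the projection of the normalized SNP matrix onto the individual eigenvectors; the top `per_axis` SNPs by absolute loading are taken from each axis and merged, so overlapping picks would shrink the panel below 5 × `per_axis`.

```python
import numpy as np

def snp_loadings(X, evecs):
    """Project the normalized m x n SNP matrix X onto n x A individual eigenvectors.
    Column a of the m x A result scores each SNP on axis a (assumed loading definition)."""
    return np.asarray(X, dtype=float) @ np.asarray(evecs, dtype=float)

def build_panel(loadings, per_axis):
    """Return sorted SNP indices: the top `per_axis` SNPs by |loading| on each axis, merged."""
    L = np.abs(np.asarray(loadings, dtype=float))
    picks = [np.argsort(-L[:, a], kind="stable")[:per_axis] for a in range(L.shape[1])]
    return np.unique(np.concatenate(picks))   # duplicates across axes collapse here
```

For example, `build_panel(loadings[:, :5], 50)` would give a panel of at most 250 SNPs; exactly 250 only when no SNP ranks highly on two axes, as reported above.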

For global ancestry analysis, we applied the model-based program STRUCTURE¹⁷ to estimate admixture proportions for the BRZ samples. STRUCTURE was applied to two different pooled data sets of four populations each: CEU, YRI, MEX, and BRZ (model 1), and TSI, ASW, MEX, and BRZ (model 2), without informing the program which samples were the reference samples. Model 2 was chosen because its European and African samples (TSI and ASW) showed the smallest Fst values with BRZ among the HapMap Phase III samples of European and African origin. As seen in Figure 3, the first two PCs behave similarly in the two pooled data sets. Running in this unsupervised mode, the program inferred the underlying ancestral populations as well as the ancestral proportions of each subject. The number of ancestral populations K was fixed at 3, 4, 5, and 7. For each K, we ran STRUCTURE 10 times with different random seeds (10 000 burn-in iterations followed by 10 000 Markov chain iterations) and recorded L(K), the log likelihood of the data given K, from each run. The optimal K was chosen as the one with the largest value of the ΔK metric.¹⁸ The inferred number of ancestral populations for the pooled data was 3.
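As an illustration of the ΔK selection criterion, the sketch below uses the usual definition of ΔK: the absolute second-order change of L(K) across neighbouring K values, divided by the standard deviation of L(K) over the replicate runs. Computing the second difference from run-averaged L(K) and using the sample standard deviation (ddof=1) are assumptions, as is the function name; the paper does not give these details.

```python
import numpy as np

def delta_k(loglik):
    """loglik: dict K -> list of L(K) values, one per independent STRUCTURE run.
    Returns {K: delta_K} for each K whose neighbours K-1 and K+1 were also run."""
    mean_l = {k: float(np.mean(v)) for k, v in loglik.items()}
    out = {}
    for k, runs in loglik.items():
        if k - 1 in mean_l and k + 1 in mean_l and len(runs) > 1:
            sd = float(np.std(runs, ddof=1))        # spread of L(K) across runs
            if sd > 0:
                # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| on run-averaged values
                second_diff = abs(mean_l[k + 1] - 2.0 * mean_l[k] + mean_l[k - 1])
                out[k] = second_diff / sd
    return out
```

Under this definition ΔK is only available for K values whose neighbours K − 1 and K + 1 were also run, so the set of K values explored determines where it can be evaluated.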

The analyses described above were carried out with the publicly available STRUCTURE¹⁷ and EIGENSTRAT^{2, 7} software packages.

Results

Principal components analysis

PCA of the 12 populations revealed pronounced patterns of genetic variation within and among populations. To visualize these patterns, we consider the top three axes of variation, chosen on the basis of their eigenvalues (Figure 1).

The two and three most informative axes of variation, PC1 and PC2 (Figure 2a) and PC1, PC2, and PC3 (Figure 2b), resolve the 11 HapMap populations. Despite some overlap, individuals from the 11 HapMap populations were clearly separated according to their ancestry of origin (African, Asian, European, and Mexican). Asian populations were tightly clustered and distinct from the African and European populations. The Southeast Brazilian population formed a continuum between Europeans and Africans, partly overlapping the Mexican population. This continuum of genotypes is consistent with a high degree of intermarriage between individuals of European and African descent.

Fst statistic results

The Fst statistic was calculated for all population pairs using the 365 116 common SNPs (Table 1). Small Fst values (0.001 to 0.008) were found for each pair of Asian populations (CHB, CHD, and JPT), indicating little genetic differentiation among them. Similarly, each pair of African populations (ASW, LWK, MKK, and YRI) is separated by low Fst values. Greater Fst distances (0.128 to 0.168) were observed between Asian and African populations. The populations of European ancestry (CEU and TSI) are also separated by a small Fst value (0.003). Fst values thus distinguish three clusters of ancestral populations (Asian, African, and European). The MEX and GIH populations are closer to the European cluster than to the African cluster as measured by Fst. Fst values confirm that the Southeast Brazilian population is close to the European, African, and Mexican populations.
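The paper does not state which Fst estimator was used. Purely as an illustration, the sketch below computes a simple Wright/Nei-style Fst, (H_T − H_S)/H_T, from per-population allele frequencies, summing numerator and denominator over SNPs before dividing, and fills a symmetric pairwise matrix like Table 1. It applies no sample-size correction, so its values would differ from those of bias-corrected estimators; the function names are hypothetical.

```python
import itertools
import numpy as np

def fst_pair(p1, p2):
    """Multi-SNP Fst = sum(H_T - H_S) / sum(H_T) from allele frequencies of two populations."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    pbar = (p1 + p2) / 2.0
    h_t = 2.0 * pbar * (1.0 - pbar)                                # pooled expected heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0    # mean within-population heterozygosity
    return float((h_t - h_s).sum() / h_t.sum())

def fst_matrix(freqs):
    """freqs: dict population -> allele frequencies over the same SNPs.
    Returns (names, symmetric k x k matrix with zeros on the diagonal)."""
    names = list(freqs)
    k = len(names)
    F = np.zeros((k, k))
    for i, j in itertools.combinations(range(k), 2):
        F[i, j] = F[j, i] = fst_pair(freqs[names[i]], freqs[names[j]])
    return names, F
```

This estimator gives 0 for identical allele frequencies and 1 for populations fixed for different alleles, matching the interpretation given in the Methods.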

Table 1 Fst statistics calculated between each pair of populations using all 365 116 common SNPs


Ancestry informative markers

Small sets of ancestry informative markers (AIMs) that provide substantial information on substructure have been the focus of several studies.^{19, 20, 21} AIM sets of 200 or fewer markers can map ancestral origin to Africa, Europe, or Asia. We considered three marker panels, whose SNPs were selected on the basis of their loading scores from a PCA performed on the covariance matrix of the SNPs. The first panel has 250 SNPs, consisting of the 50 SNPs with the highest loading scores (in absolute value) on each of the top five axes of variation. The second and third panels retained 100 and 150 SNPs, respectively, from each of the top five axes of variation, and contain 500 and 750 markers, respectively. Plots of the first two axes of variation (PC1 and PC2) were obtained by performing PCA on each of the three panels (data not shown). The 250 SNP set reproduced the stratification observed with the entire 365 116 SNP set (Figure 2), and the 500 and 750 SNP sets produced results indistinguishable from those of the 250 SNP set. The chromosomal distribution of the 500 SNP set was uniform. Although the magnitude of the Fst values varied, the same pattern was observed for all three panels (Table 2). All three panels thus captured the variation revealed by the entire 365 116 SNP set: the pairwise Spearman correlation coefficients between the four Fst matrices all exceeded 0.964.
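One way to compute the Spearman correlation between two Fst matrices is to compare their upper triangles (each population pair counted once) after converting values to ranks. The sketch below, with hypothetical function names, assigns average ranks to ties and takes the Pearson correlation of the ranks, which is the standard Spearman coefficient; the paper does not say how ties or the matrix entries were handled.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values receiving the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def fst_matrix_spearman(F1, F2):
    """Correlate the off-diagonal upper triangles of two symmetric Fst matrices."""
    iu = np.triu_indices_from(F1, k=1)
    return spearman(np.asarray(F1)[iu], np.asarray(F2)[iu])
```

Because Spearman's coefficient depends only on ranks, it measures whether the panels preserve the ordering of population distances even when, as noted above, the magnitude of Fst changes.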

Table 2 Fst statistics calculated between each pair of populations using Panel 1 (A), Panel 2 (B), and Panel 3 (C)


Global ancestry inference of the Brazilian population

Global ancestry inference estimated mean Amerindian, African, and European ancestry proportions for the studied samples. To do so, we first recalculated EIGENSTRAT principal components using two different subsets of HapMap samples as ‘ancestral’ populations. In the first model, we used the CEU, YRI, and MEX samples to represent Caucasian, African, and Amerindian ancestral populations, respectively. In the second model, we used the TSI, ASW, and MEX samples to represent these populations. The first model was chosen because these samples have commonly been used as ancestral populations in earlier reports; for the second model, we used the populations with the smallest pairwise Fst differences with the BRZ sample. No significant differences between the two models were observed (Figure 3). STRUCTURE analysis of these two models, using the 100 most important SNPs from PC1 and PC2, is presented in Figure 4. In our sample of individuals from the Brazilian Southeast region, the mean Amerindian, African, and European ancestry proportions were 0.15, 0.24, and 0.61, respectively, with Model I markers, and 0.17, 0.27, and 0.56, respectively, with Model II markers (Figure 4).
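In unsupervised mode STRUCTURE returns, for each individual, membership proportions in K unlabeled clusters. A minimal sketch of how mean ancestry proportions like those above can be summarized, assuming each cluster is labeled by the reference population (for example CEU, YRI, or MEX) with the highest mean membership in it; the function names and the labeling rule are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def label_clusters(Q, pop, reference):
    """Q: n x K membership matrix; pop: length-n array of population labels;
    reference: dict ancestry name -> reference population (e.g. "European": "CEU").
    Returns dict ancestry name -> cluster column index."""
    labels = {}
    for name, ref in reference.items():
        labels[name] = int(np.argmax(Q[pop == ref].mean(axis=0)))
    if len(set(labels.values())) != len(labels):
        raise ValueError("two reference populations map to the same cluster")
    return labels

def mean_ancestry(Q, pop, target, labels):
    """Average membership of the `target` population in each labeled cluster."""
    sub = Q[pop == target]
    return {name: float(sub[:, col].mean()) for name, col in labels.items()}
```

With toy input (not study data), `label_clusters(Q, pop, {"European": "CEU", "African": "YRI", "Amerindian": "MEX"})` followed by `mean_ancestry(Q, pop, "BRZ", labels)` returns one mean proportion per ancestry, and these sum to 1 when K = 3.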

Discussion

We have compared the genotypic variation at 365 116 SNPs among 1129 unrelated individuals from five continents (Asia, Europe, Africa, and North and South America), including individuals from Southeast Brazil. We show that this population is highly admixed and quite distinct from the HapMap populations. Principal component analyses demonstrate extensive intermarriage between individuals of African and European descent. This intermarriage has taken place from 1500 to the present day, spanning about 20 generations. Thus, the genomes of Brazilian individuals consist of chromosomal segments of distinct ancestry, with substantial European- and African-related admixture. These findings have important implications for the design and analytical planning of studies exploring complex traits in this population. We expect that the large degree of admixture observed in the Southeast Brazilian population can be exploited for mapping important disease loci.
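As a rough check of the generation count, assuming a mean human generation time of about 25 years and taking the present day as roughly 2010 (both assumptions; the paper states neither):

```latex
g \approx \frac{T_{\text{present}} - T_{\text{start}}}{t_{\text{gen}}}
  = \frac{2010 - 1500}{25} \approx 20.4 \approx 20 \ \text{generations}
```

A shorter assumed generation time (for example 20 years) would raise the estimate to about 25 generations, so "about 20" should be read as an order-of-magnitude figure.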

The study cohort was collected in Sao Paulo state, in Southeast Brazil. Individuals of African, Amerindian, and perhaps Asian ancestry may be underrepresented in this study, because individuals of European ancestry form the majority in this region. Thus, additional analyses using larger random samples covering all five Brazilian regions might show an even more pronounced degree of genetic variation than that suggested by our analysis. Whether the same degree of intermarriage is observed in other parts of Brazil or elsewhere in Latin America will be addressed in future studies.

New dense genotyping data from other forthcoming Brazilian studies will determine whether the same pattern of extensive genetic admixture exists in other parts of Brazil.

References

1. Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet 2006; 2: e190.
2. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904–909.
3. Seldin MF, Shigeta R, Villoslada P et al: European population substructure: clustering of northern and southern populations. PLoS Genet 2006; 2: e143.
4. Paschou P, Ziv E, Burchard EG et al: PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genet 2007; 3: 1672–1686.
5. Heath SC, Gut IG, Brennan P et al: Investigation of the fine structure of European populations with applications to disease association studies. Eur J Hum Genet 2008; 16: 1413–1429.
6. Paschou P, Drineas P, Lewis J et al: Tracing sub-structure in the European American population with PCA-informative markers. PLoS Genet 2008; 4: e1000114.
7. Price AL, Butler J, Patterson N et al: Discerning the ancestry of European Americans in genetic association studies. PLoS Genet 2008; 4: e236.
8. Biswas S, Scheinfeldt LB, Akey JM: Genome-wide insights into the patterns and determinants of fine-scale population structure in humans. Am J Hum Genet 2009; 84: 641–650.
9. Xing J, Watkins WS, Witherspoon DJ et al: Fine-scaled human genetic structure revealed by SNP microarrays. Genome Res 2009; 19: 815–825.
10. McEvoy BP, Montgomery GW, McRae AF et al: Geographical structure and differential natural selection among North European populations. Genome Res 2009; 19: 804–814.
11. Auton A, Bryc K, Boyko AR et al: Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res 2009; 19: 795–803.
12. Adeyemo A, Gerry N, Chen G et al: A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet 2009; 5: e1000564.
13. Goncalves VF, Carvalho CM, Bortolini MC, Bydlowski SP, Pena SD: The phylogeography of African Brazilians. Hum Hered 2008; 65: 23–32.
14. Suarez-Kurtz G: Pharmacogenomics in Admixed Populations. Landes Bioscience: Austin, 2007.
15. Wright S: Genetical structure of populations. Nature 1950; 166: 247–249.
16. Duan S, Zhang W, Cox NJ, Dolan ME: FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3. Bioinformation 2008; 3: 139–141.
17. Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003; 164: 1567–1587.
18. Wang Z, Hildesheim A, Wang SS et al: Genetic admixture and population substructure in Guanacaste Costa Rica. PLoS One 2010; 5: e13336.
19. Yang N, Li H, Criswell LA et al: Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine. Hum Genet 2005; 118: 382–392.
20. Kosoy R, Nassir R, Tian C et al: Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America. Hum Mutat 2009; 30: 69–78.
21. Enoch MA, Shen PH, Xu K, Hodgkinson C, Goldman D: Using ancestry-informative markers to define populations and detect population stratification. J Psychopharmacol 2006; 20: 19–26.


Acknowledgements

We thank the CNPq (Brazil, Grant 150653/2008–5) for partial financial support (SRG). This work was supported by FAPESP (Grant 2007/58150-7), and Hospital Samaritano, Sao Paulo.

Author information

Authors and Affiliations

Laboratory of Genetics and Molecular Cardiology, Heart Institute, Medical School of University of Sao Paulo, Sao Paulo, Brazil
Suely R Giolo, Marcio A A Almeida, José E Krieger & Alexandre C Pereira
Department of Statistics, Federal University of Parana, Curitiba, Brazil
Suely R Giolo
Department of Statistics, University of Sao Paulo, Sao Paulo, Brazil
Júlia M P Soler
Department of Genetics, Harvard Medical School, Boston, MA, USA
Steven C Greenway, J G Seidman & Christine E Seidman
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Mariza de Andrade


Corresponding authors

Correspondence to Suely R Giolo or Alexandre C Pereira.

Ethics declarations

Competing interests

The authors declare no conflict of interest.


About this article

Cite this article

Giolo, S., Soler, J., Greenway, S. et al. Brazilian urban population genetic structure reveals a high degree of admixture. Eur J Hum Genet 20, 111–116 (2012). https://doi.org/10.1038/ejhg.2011.144


Received: 28 September 2010
Revised: 27 April 2011
Accepted: 24 May 2011
Published: 24 August 2011
Issue Date: January 2012
DOI: https://doi.org/10.1038/ejhg.2011.144
