Article

European Journal of Human Genetics (2008) 16, 1142–1150; doi:10.1038/ejhg.2008.77; published online 9 April 2008

Evaluation of HapMap data in six populations of European descent

Per E Lundmark¹, Ulrika Liljedahl¹, Dorret I Boomsma², Heikki Mannila³, Nicholas G Martin⁴, Aarno Palotie⁵, Leena Peltonen⁶,⁷, Markus Perola⁶,⁷, Tim D Spector⁸ and Ann-Christine Syvänen¹

  1. Molecular Medicine, Department of Medical Sciences, Uppsala University Hospital, Uppsala University, Uppsala, Sweden
  2. Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands
  3. HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Helsinki, Finland
  4. Genetic Epidemiology Unit, Queensland Institute of Medical Research, Brisbane, Australia
  5. Finnish Genome Center, University of Helsinki, Helsinki, Finland
  6. Department of Molecular Medicine, National Public Health Institute, Helsinki, Finland
  7. Department of Medical Genetics, University of Helsinki, Helsinki, Finland
  8. Twin Research and Genetic Epidemiology Unit, St Thomas’ Hospital Campus, Kings College London School of Medicine, London, UK

Correspondence: Professor A-C Syvänen, Molecular Medicine, Department of Medical Sciences, Uppsala University Hospital, Entrance 70, 3rd floor, Research Department 2, Uppsala S-751 85, Sweden. Tel: +46 18 611 29 59; Fax: +46 18 55 36 01; E-mail: ann-christine.syvanen@medsci.uu.se

Received 2 October 2007; Revised 6 March 2008; Accepted 6 March 2008; Published online 9 April 2008.

Abstract

We studied how well the European CEU samples used in the Haplotype Mapping Project (HapMap) represent five European populations by analyzing nuclear family samples from the Swedish, Finnish, Dutch, British and Australian (European ancestry) populations. The number of samples from each population (about 30 parent-offspring trios) was similar to that in the HapMap sample sets. A panel of 186 single nucleotide polymorphisms (SNPs) distributed over the 1.5 Mb region of the GRID2 gene on chromosome 4 was genotyped. The genotype data were compared pairwise between the HapMap sample and each of the other population samples. Principal component analysis (PCA) was used to cluster the data from the different populations with respect to allele frequencies and to identify the markers responsible for the observed variance. The only sample with detectable differences in allele frequencies was that from Kuusamo, Finland. This sample also separated from the others, including the other Finnish sample, in the PCA. A set of tagSNPs was defined based on the HapMap data and applied to the samples. At r² > 0.8, these tagSNPs captured between 95% (Kuusamo sample) and 87% (Australian sample) of the genetic variation in the analyzed region. To capture the maximal genetic variation in the region, the Kuusamo, HapMap and Australian samples required 58, 63 and 73 native tagSNPs, respectively. The HapMap CEU sample represents the European samples well for tagSNP selection, with some caution regarding estimation of allele frequencies in the Finnish Kuusamo sample, and a slight reduction in tagging efficiency in the Australian sample.
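To make the tagging criterion concrete, the following minimal Python sketch shows one way the "fraction of variation captured at r² > 0.8" could be computed. It is an illustration only, not the authors' pipeline: it assumes phased haplotypes coded 0/1 per SNP, uses a four-haplotype toy data set, and the names `r_squared` and `fraction_captured` are hypothetical.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumption of this sketch).
    """
    n = len(a)
    pa = sum(a) / n                                  # allele frequency at SNP a
    pb = sum(b) / n                                  # allele frequency at SNP b
    pab = sum(x * y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treat as not captured
    d = pab - pa * pb                                # linkage disequilibrium coefficient D
    return d * d / denom


def fraction_captured(haplotypes, tag_idx, threshold=0.8):
    """Fraction of SNPs that are tags or have r^2 > threshold to some tag."""
    n_snps = len(haplotypes[0])
    # Transpose so that cols[j] holds the alleles of SNP j across haplotypes.
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    captured = 0
    for j in range(n_snps):
        if any(j == t or r_squared(cols[j], cols[t]) > threshold for t in tag_idx):
            captured += 1
    return captured / n_snps


# Toy data (hypothetical): 4 haplotypes x 4 SNPs.
# SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3; the two pairs are independent.
haplotypes = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]

print(fraction_captured(haplotypes, [0]))     # one tag covers only the first pair
print(fraction_captured(haplotypes, [0, 2]))  # one tag per pair covers all SNPs
```

In this sketch, tagSNPs chosen in one sample (for example HapMap CEU) would be passed as `tag_idx` and evaluated against the haplotypes of another sample, which mirrors the kind of cross-sample comparison the abstract describes.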

Keywords: linkage disequilibrium, tagSNPs, haplotype structure, GRID2, principal component analysis