Article

European Journal of Human Genetics (2006) 14, 450–458. doi:10.1038/sj.ejhg.5201565; published online 25 January 2006

Identification of probable genotyping errors by consideration of haplotypes

Tim Becker1, Ruta Valentonyte2,3, Peter J P Croucher4,5, Konstantin Strauch6, Stefan Schreiber2,3, Jochen Hampe2,3 and Michael Knapp1

  1. Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn, Bonn, Germany
  2. Institute for Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany
  3. University Hospital Schleswig-Holstein Campus Kiel, Schittenhelmstr. 12, Kiel, Germany
  4. Institute for Medical Informatics and Statistics, Christian-Albrechts-University, Kiel, Germany
  5. University Hospital Schleswig-Holstein Campus Kiel, Brunswikerstr. 10, Kiel, Germany
  6. Institute for Medical Biometry and Epidemiology, Philipps-University Marburg, Marburg, Germany

Correspondence: Dr M Knapp, Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn, Sigmund-Freud-Str. 25, D-53105 Bonn, Germany. Tel: +49 228 287 5810; Fax: +49 228 287 5854; E-mail: knapp@uni-bonn.de

Received 7 July 2005; Revised 6 October 2005; Accepted 24 November 2005; Published online 25 January 2006.

Abstract

Undetected genotyping errors pose a problem in genetic epidemiological studies, as they may invalidate statistical analysis or reduce its power. Haplotype analysis requires a higher standard of data quality, because a haplotype can be inferred correctly only if the genotypes of all its markers are correct. Here, we present a method that identifies probable genotyping errors in trio samples with the help of the estimated haplotype frequency distribution of the sample. If the likelihood of the most likely haplotype explanation depends strongly on just one genotype, in the sense that setting the genotype to be missing leads to a much more likely haplotype explanation, this genotype is considered a potential genotyping error. We describe a method that systematically searches the whole data set for such potential errors. Based on the haplotype distribution of a real data set, we carry out a simulation study to estimate the sensitivity and specificity of the method. In addition, we apply our approach to the real data set itself. Potentially erroneous genotypes are re-determined via sequencing. The results of both the simulation study and the application to the real data set show that a considerable proportion of true genotyping errors is detected and that the number of false-positive signals is acceptable. We conclude that it is indeed possible to identify probable genotyping errors by considering haplotypes. The method described here will be part of the next release of our FAMHAP software.
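
The core idea of the abstract (mask one genotype, then check whether the best haplotype explanation becomes much more likely) can be illustrated with a minimal Python sketch. This is not the FAMHAP implementation and it makes several simplifying assumptions: it treats a single unrelated individual rather than a trio, it takes the haplotype frequencies as already estimated, and it uses an arbitrary likelihood-ratio threshold. All function names, the toy frequencies, and the threshold value are hypothetical.

```python
from itertools import combinations_with_replacement

def compatible(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) can produce the genotype.
    A marker whose genotype is None is treated as missing and matches anything."""
    for a, b, g in zip(h1, h2, genotype):
        if g is not None and sorted((a, b)) != sorted(g):
            return False
    return True

def best_explanation(genotype, hap_freqs):
    """Likelihood of the most likely compatible haplotype pair
    (assumes random pairing of haplotypes; 0.0 if no pair is compatible)."""
    best = 0.0
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        if compatible(h1, h2, genotype):
            p = hap_freqs[h1] * hap_freqs[h2]
            if h1 != h2:
                p *= 2  # two possible orderings of a heterozygous pair
            best = max(best, p)
    return best

def flag_suspect_genotypes(genotype, hap_freqs, ratio_threshold=100.0):
    """Return (marker_index, ratio) for each marker whose removal raises the
    best-explanation likelihood by at least ratio_threshold (assumed cutoff)."""
    full = best_explanation(genotype, hap_freqs)
    suspects = []
    for i, g in enumerate(genotype):
        if g is None:
            continue
        masked = list(genotype)
        masked[i] = None  # set this single genotype to missing
        without = best_explanation(masked, hap_freqs)
        if full == 0.0:
            ratio = float("inf") if without > 0.0 else 1.0
        else:
            ratio = without / full
        if ratio >= ratio_threshold:
            suspects.append((i, ratio))
    return suspects

# Toy haplotype distribution over three biallelic markers (invented numbers).
hap_freqs = {(1, 1, 1): 0.60, (2, 2, 2): 0.38, (1, 1, 2): 0.01, (2, 2, 1): 0.01}

# The third marker forces a rare haplotype, so it is the suspicious one.
odd_genotype = [(1, 1), (1, 1), (1, 2)]
# This genotype is explained well by the two common haplotypes.
plain_genotype = [(1, 2), (1, 2), (1, 2)]

suspects = flag_suspect_genotypes(odd_genotype, hap_freqs, ratio_threshold=10.0)
clean = flag_suspect_genotypes(plain_genotype, hap_freqs, ratio_threshold=10.0)
```

With these toy numbers, masking the third marker of `odd_genotype` raises the best-explanation likelihood by a factor of about 30, so only that marker is flagged, while `plain_genotype` yields no flags. A method for trios would additionally have to respect Mendelian transmission when enumerating haplotype explanations, which this sketch omits.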

Keywords: genotype error, haplotype, frequency estimation
