Introduction

DNA profiling has provided key information in forensic criminal identifications of sexual assault and murder, parentage and sibling relationship testing, and war casualty investigations. There is also an increasing demand for DNA typing in clinical medicine, including cell line monitoring, laboratory contamination control, prenatal diagnosis, and donor and recipient matching in organ and bone marrow transplantations, and in compensation medicine involving victim identification in car, train, and aircraft accidents (Jack 1997).

The first national DNA database was introduced in the UK in 1995 and has since assisted in solving a large number of crimes by linking offenders with crime scenes, even in cases in which no suspects had been previously identified (Werrett 1997). The subsequent creation of the US Federal Bureau of Investigation (FBI) Combined DNA Index System (CODIS) in 1998 has allowed national searches for criminals in all 50 US states (Hoyle 1998). By July 2001, there were 707,867 DNA profiles of offenders in the USA, and 28,711 stored profiles from unsolved crime scenes. Some 1,211 offenders have been identified in 30 states, and 667 unidentified crime scenes have been linked with other crimes. The remains of over 100 missing persons have also been identified following the attack on the World Trade Centre in 2001 (Wertz 2002).

In addition to the UK and USA, other developed countries, including Australia, Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Japan, the Netherlands, Norway, Portugal, Russia, Sweden, and Switzerland, have created or are in the process of compiling national DNA databases (Lincoln 1997; Schneider 1997; Werrett 1997). The European DNA Profiling Group was established in 1989 and has grown rapidly from the eight initial members to 20 members representing all states of the European Community and associated western European countries (Schneider 1997; Martin et al. 2001; Schneider and Martin 2001).

The People's Republic of China (PR China), with a current population of 1,260 million, faces the major challenge of creating a national DNA database to meet the rapidly growing demand for forensic DNA profiling. In 1999, we started the construction of a preliminary Chinese DNA Database by genotyping 13 short tandem repeats (STRs), plus the sex-specific locus Amelogenin, with the AmpFL STR Profiler Plus Kit and AmpFL STR Cofiler Kit (Applied Biosystems). Here, we present a summary of the resultant genotyping data, which will provide baseline information for the ongoing task of constructing a national DNA index system in PR China.

Subjects and methods

Subjects

Blood samples were collected from 2,211 Han Chinese (1,111 males and 1,100 females) in northeast PR China, all of whom voluntarily took part in the project. Ethical clearance for the sampling was obtained from the Bureau of Public Hygiene and Health, Liaoyang City Government, P.R. China.

Finger-prick blood samples from each individual were collected on two Whatman 3MM cards. One card was bar-coded and despatched for DNA analysis, and the other was stored at −80°C.

Genotyping

DNA was extracted by the Chelex extraction method (Walsh et al. 1991). The amplification of 13 autosomal STR loci, viz., TPOX (2p13), D3S1358 (3p), FGA (4q28), D5S818 (5qter), CSF1PO (5q33.3), D7S820 (7pter), D8S1179 (8pter), TH01 (11p15.5), vWA (12p13.3), D13S317 (13q22), D16S539 (16q23.1), D18S51 (18q21.33), and D21S11 (21q21), together with the Amelogenin locus (Xp22.31–22.1 and Yp12.1) for sex testing, was performed on a GeneAmp PCR System 9600 (Applied Biosystems) by using the AmpFL STR Profiler Plus Kit and AmpFL STR Cofiler Kit (Applied Biosystems) according to the manufacturer's recommendations (Perkin-Elmer 1998). Separation of the resulting PCR products was undertaken by capillary electrophoresis on an ABI Prism 310 DNA Analyzer (Applied Biosystems). The Genotyper program (Genotyper 2.1) was used to detect and analyze PCR products by reference to allelic ladders (Applied Biosystems). The nomenclature system for allele designation was based on the number of repeat units contained in each allele, according to the recommendations of the DNA Commission of ISFH (1994) regarding PCR-based polymorphism in STR systems.

Statistical analysis

Basic statistical computations including allele frequency, observed heterozygosity, pairwise independence of genotypic frequencies for each combination of loci, and Hardy-Weinberg equilibrium (HWE) tests were performed by using the GDA program (Lewis and Zaykin 2000). An exact test was used to assess the significance of deviation from HWE (Guo and Thompson 1992). The power of discrimination (DP), average power of paternity exclusion (PE) per locus and for the combined loci, and polymorphism information content (PIC), were computed according to established methods (Botstein et al. 1980; Odelberg and White 1990; Edwards et al. 1992; Weir 1996).
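The per-locus quantities named above can be illustrated with a minimal Python sketch. It is not the GDA program and is not claimed to reproduce the authors' exact procedures: it assumes the commonly used definitions, i.e. PIC following Botstein et al. (1980) and DP taken as one minus the sum of squared genotype frequencies. The function name `locus_statistics` and the four example genotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def locus_statistics(genotypes):
    """Summary statistics for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Formulas are the commonly used definitions (an assumption here).
    """
    n = len(genotypes)
    # Genotypes are unordered, so (9, 7) and (7, 9) are counted together.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)

    # Each individual contributes two chromosomes.
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    genotype_freq = {g: c / n for g, c in genotype_counts.items()}

    observed_het = sum(
        c for (a, b), c in genotype_counts.items() if a != b
    ) / n
    sum_p2 = sum(p * p for p in allele_freq.values())
    expected_het = 1.0 - sum_p2

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = expected_het - sum(
        2 * (p * q) ** 2 for p, q in combinations(allele_freq.values(), 2)
    )
    # DP = 1 - sum of squared genotype frequencies
    dp = 1.0 - sum(f * f for f in genotype_freq.values())

    return {
        "allele_freq": allele_freq,
        "observed_het": observed_het,
        "expected_het": expected_het,
        "pic": pic,
        "dp": dp,
    }


# Hypothetical genotypes for illustration only.
example = [(6, 9), (9, 9), (6, 7), (7, 9)]
stats = locus_statistics(example)
print(stats)
```

The exact test for HWE and the paternity exclusion calculations are deliberately left out of this sketch, because the text does not specify them beyond the cited references.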

Results

Allelic distributions

As shown in Table 1, a total of 161 alleles were detected at the 13 autosomal loci, ranging from seven alleles at TH01 to 26 alleles at FGA. An average of 12.4 alleles per locus was recorded among the 4,422 chromosomes genotyped. The shortest allele was allele-11 (110 bp) at D3S1358, and the longest allele was allele-15 (317 bp) at the CSF1PO locus. Rare alleles were found at FGA (alleles 15 and 16), D5S818 (allele 17), D8S1179 (allele 18), D18S51 (alleles 6, 24, 25, 26, 27), and D21S11 (alleles 23.2, 27.2, 28.2). With the exception of loci vWA, TH01, D13S317, and D16S539 (P<0.05; Fig. 1), the majority of loci showed comparable allelic distributions to African-American and US Caucasian data (P>0.05; Perkin-Elmer 1998).

Table 1. Allelic frequencies of the 13 autosomal STR loci in 4,422 chromosomes
Fig. 1. Comparative distribution of TH01, vWA, D13S317, and D16S539 by ethnic studies

Tests of HWE and genotypic disequilibria

Of the 13 loci investigated, only locus D8S1179 deviated significantly from HWE (0.01<P<0.05) in the Han Chinese (n=2,211). The results of the genotypic equilibrium tests are shown in Table 2. Significant dependence in pairwise genotypic frequencies was found at nine pairs of loci in the Han Chinese: TPOX-D7S820, D8S1179-D3S1358, D8S1179-FGA, D8S1179-D5S818, vWA-FGA, D16S539-D7S820, D18S51-FGA, D21S11-FGA, and D21S11-D8S1179.

Table 2. Loci displaying genotypic disequilibrium in Han Chinese

Calculation of forensic statistics

The forensic statistics are summarized in Table 3. Of the 13 autosomal loci, locus D18S51 showed the highest values of DP (0.96), PIC (0.84), and PE (0.72). The accumulated DP and PE were 0.999999999 and 0.9999888, respectively (Fig. 2), giving a match probability for the population of 5.5×10⁻¹⁵.
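Accumulated values of this kind are conventionally obtained with the product rule, under which the combined power after n loci is one minus the product of the per-locus complements. The Python sketch below assumes that rule and assumes independence between loci; the text does not state the formula that was actually used. The function names and all example values other than 0.72 (the PE reported for D18S51) are hypothetical.

```python
def cumulative_power(per_locus_values):
    """Running combined power after each locus: 1 - prod(1 - x_i).

    Assumes the loci are independent (product rule).
    """
    remaining = 1.0  # probability that no locus so far discriminates/excludes
    curve = []
    for x in per_locus_values:
        if not 0.0 <= x <= 1.0:
            raise ValueError("per-locus power must lie between 0 and 1")
        remaining *= 1.0 - x
        curve.append(1.0 - remaining)
    return curve


def loci_needed(per_locus_values, threshold):
    """Number of loci (in the given order) needed to exceed the threshold.

    Returns None if the threshold is never exceeded.
    """
    for count, combined in enumerate(cumulative_power(per_locus_values), start=1):
        if combined > threshold:
            return count
    return None


# Hypothetical per-locus PE values; only 0.72 is taken from the text.
pe_values = [0.72, 0.60, 0.55, 0.50, 0.45, 0.40]
print(cumulative_power(pe_values))
print(loci_needed(pe_values, 0.99))
```

The order in which the loci are supplied determines the shape of the cumulative curve but not its final value.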

Table 3. Forensic statistics for the Chinese study population across the 13 STR loci (DP power of discrimination, PE average power of paternity exclusion per locus, PIC polymorphism information content)
Fig. 2. Plot of the cumulative power of exclusion (PE) and discrimination power (DP) for the Han Chinese population, by number of loci tested

The X-Y homologous gene Amelogenin for sex determination was not included in any of the calculations, because its inheritance pattern differs from that of the autosomes. Amelogenin itself gives 50% exclusion power regarding questions concerning the gender of the suspected criminal.

Discussion

As the frequency of specific STR alleles can vary according to the ethnic origin of the individuals sampled, separate DNA databases may need to be constructed from the results obtained from different major populations (Fig. 1). Studies have also indicated the potentially confounding effects of intra-community and consanguineous marriage on STR profiling in various ethnic groups (Weir 1994; Balding and Nichols 1995; Wang et al. 2000, 2003; Black et al. 2001; Zhivotovsky et al. 2001; http://www.consang.net/). To meet such a challenge in PR China, frequency profiling of the forensic loci from each of the 56 officially recognized ethnic populations (Chinese Family Planning Commission 1998) is required. In addition, the possible effect of geographic differentiation needs to be investigated, particularly in the majority Han Chinese community, which accounts for over 92% of the total population of 1,260 million, and which is dispersed across an area of 9.6 million square kilometres. In the present context, it is possible that the genotypic disequilibrium observed at some of the loci on various chromosomes (Table 2) may have resulted from admixture of genetically substructured Han (Black et al. 2001; Wang et al. 2003).

The allele distributions at each autosomal locus were compared with data from African-Americans and US Caucasians and differed only at TH01, vWA, D13S317, and D16S539 (Fig. 1). A series of previously unreported alleles and rare alleles, which serve to increase the range of genetic variation, were identified in the Han Chinese population (Table 1, Fig. 1). The inclusion of a number of these rare alleles into the ladder marker system and the establishment of a specific Chinese DNA database at such loci could significantly improve the match probabilities in forensic examinations in PR China.

In practice, genetic markers used in paternity testing are required to provide a cumulative power of exclusion of >99%. Figure 2 summarizes the cumulative powers of discrimination and exclusion for the 13 forensic loci. After genotyping approximately six markers, the forensic kits reached a power of exclusion greater than the required 99% minimum limit. The 13 loci thus provide a powerful battery of DNA markers that are appropriate for use in paternity testing and individual identification in Chinese populations.
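A rough plausibility check on the figure of approximately six markers can be made by assuming that the loci are independent and combined by the product rule (an assumption, as the formula is not stated in the text), and by taking the highest per-locus PE reported here, 0.72 for D18S51, as an upper bound for every locus:

```latex
\[
\mathrm{PE}_{\mathrm{cum}}(n)
  = 1 - \prod_{i=1}^{n} \bigl(1 - \mathrm{PE}_i\bigr)
  \le 1 - (1 - 0.72)^{n}
  = 1 - 0.28^{n}
\]
\[
n = 3:\; 1 - 0.28^{3} \approx 0.978 < 0.99,
\qquad
n = 4:\; 1 - 0.28^{4} \approx 0.994 > 0.99
\]
```

Under these assumptions, at least four loci would be needed even if every locus were as informative as D18S51, so a requirement of about six loci is plausible when less informative loci are included.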