RETRACTED ARTICLE: Uncertainty in estimating the number of contributors from simulated DNA mixture profiles, with and without allele dropout, from Chinese, Malay, Indian, and Caucasian ethnic populations

Determining the number of contributors (NOC) accurately in a forensic DNA mixture profile can be challenging. To address this issue, various studies have examined the uncertainty in estimating the NOC in a DNA mixture profile. However, these studies focus primarily on dominant populations residing within Europe and North America, so Asian populations have limited representation. Further, the effects of allele dropout on NOC estimation have not been explored. As such, this study assesses the uncertainty of NOC in simulated DNA mixture profiles of Chinese, Malay, and Indian populations, which are the predominant ethnic populations in Asia. The Caucasian ethnic population was also included to provide a basis of comparison with other similar studies. Our results showed that, without considering allele dropout, the NOC from DNA mixture profiles derived from up to four contributors of the same ethnic population could be estimated with confidence in the Chinese, Malay, Indian and Caucasian populations. The same results were observed in DNA mixture profiles originating from a combination of differing ethnic populations. The inclusion of an overall 30% allele dropout rate increased the probability (risk) of underestimating the NOC in a DNA mixture profile; even a 3-person DNA mixture profile has a > 99% risk of underestimating the NOC as two or fewer contributors. However, such risks could be mitigated when the highly polymorphic SE33 locus was included in the dataset. Lastly, there was a negligible level of risk in misinterpreting the NOC in a mixture profile as deriving from a single source profile. In summary, our study presents novel results representative of the Chinese, Malay, and Indian ethnic populations when examining the uncertainty in NOC estimation in a DNA mixture profile. Our results would be useful in the estimation of NOC in a DNA mixture profile in the Asian context.


www.nature.com/scientificreports/

The forensic DNA laboratory in Singapore routinely processes 'touch DNA samples' which would give rise to 'low-level' incomplete (also known as partial) DNA mixture profiles. As Singapore is a cosmopolitan city in Asia, this study seeks to evaluate the uncertainties in estimating the number of contributors in DNA mixtures which can arise from individuals of different Asian ethnic origins, in particular the Chinese, Malay and Indian populations. An additional novel element of this study involved taking into consideration allele dropout and its impact on estimation of NOC.
The process of interpreting a DNA mixture profile usually requires an analyst to ascertain the number of contributors (NOC) upfront 2,3 . However, this can be complicated by various factors that affect the composition of alleles that may be present or absent in a mixed DNA profile. Firstly, the alleles in a mixed DNA profile may be shared by different individuals, a phenomenon known as stacking. Secondly, some alleles from contributors may be absent or "drop out" when DNA is degraded or present in low amounts. Lastly, alleles from low amounts of exogenous sources of DNA may also be present in the sample, resulting in a "drop-in" of alleles. This problem is exacerbated by the increasing sensitivity of PCR amplification kits and detection methods, which increases the risk of allele drop-in. Moreover, as the number of contributors in a DNA profile increases, so does the uncertainty in estimating the NOC in a mixture profile 2,4 .
While previous studies have explored the uncertainty in estimating the NOC, these studies focused primarily on Caucasian populations 2,4-6 . Simulated DNA mixture profiles were generated based on the allelic frequencies of several hundred individuals of a population group 2,6-8 . The uncertainty in the NOC estimation in Asians was examined as a single generic population 2 , notwithstanding that Asians are made up of distinctly different ethnic populations, such as Chinese, Malay and Indian. For example, 97 individuals were used to estimate the uncertainty in NOC estimation for the entire Asian population 2 . The use of a limited number of individuals to represent the diverse Asian ethnic populations may limit the accuracy of such studies when addressing Asian populations. This inaccuracy would impact the match statistic (likelihood ratio) calculated using probabilistic genotyping methods when there is a match, as these methods require the NOC to be determined 9,10 . In this respect, this study sought to determine the uncertainty in NOC estimation from simulated DNA mixture profiles from the Chinese, Malay and Indian ethnic populations. Additionally, we investigated the effect of a mixture of ethnicities on uncertainty in NOC estimation.
The previous studies on uncertainties in NOC estimation had also not taken into consideration allele dropout and its impact on estimation of NOC 2,4 . With laboratories increasingly processing 'touch DNA samples' which would give rise to 'low-level complex mixture evidence' 11 , a greater occurrence of DNA mixture profiles with allele dropout can be expected. Hence, this study also evaluated the increased risk of inaccurately estimating the NOC in DNA mixture profiles that experience allele dropout.

Methods
The crime reference blood samples used in the study are from previous forensic cases with their identification information anonymized except for self-reported ethnic population. These samples were obtained with consent as per the statutes of our country, specifically the Registration of Criminals Act (RCA). Allele frequencies for the Chinese, Malay, and Indian ethnic populations used in this study were generated from previous crime reference blood samples (Supplemental Table S1) on FTA cards by direct amplification using the AmpFℓSTR Identifiler Direct PCR Amplification kit (Thermo Fisher Scientific), Powerplex ESX 17 System (Promega), and GlobalFiler Express PCR Amplification kit (Thermo Fisher Scientific). The Identifiler Direct and ESX17 PCR products were analysed using the 3100 genetic analyser, while the GlobalFiler Express PCR products were analysed using the 3500xl genetic analyser.
The allele frequencies for the Caucasian ethnic population were based on previous studies 8,12 . Population substructures within the ethnic populations were not considered in this study. Mixtures were made from profiles within the same population, unless otherwise stated.

Premise of simulation model used (without consideration for allele dropout). A locus with a set of alleles is denoted by $\{a_1, a_2, \ldots, a_n\}$, where $a_n$ is the allele with the $n$th number of repeats in the locus. The probabilities of observing the respective alleles in this locus are $\{P(a_1), P(a_2), \ldots, P(a_n)\}$, where $P(a_n)$ denotes the probability of the allele $a_n$.

Premise of simulation model used (with consideration for allele dropout). A 'dropout' allele $a_d$ has a probability of dropout $P(a_d)$. The sum of probabilities of all outcomes is 1, i.e. $1 - P(a_d) = P(\bar{a}_d)$. Therefore, $P(\bar{a}_d)$ is the probability of not observing an allelic dropout. Given that allele dropout is not observed, the conditional probability $P_C$ of observing an allele $a_n$ can be calculated. $P_C(a_n)$ is the product of the original probability and the probability of not observing an allele dropout (refer to Supplemental Fig. S2):

$$P_C(a_n) = P(a_n) \times P(\bar{a}_d)$$

where $P_C(a_n)$ and $P(a_n)$ are the conditional and original allele probabilities, respectively.

Hence, for a set of alleles in a given locus $\{a_1, a_2, \ldots, a_n, a_d\}$, the probabilities of these alleles are $\{P_C(a_1), P_C(a_2), \ldots, P_C(a_n), P(a_d)\}$, where $P_C(a_1)$ to $P_C(a_n)$ are the respective conditional probabilities of observing alleles $a_1$ to $a_n$, given that no allele dropout is observed.

Derivation of simulated DNA mixture profiles in silico. Simulated DNA mixture profiles were derived in silico by selecting alleles independently based on the allele frequencies of a given population. With a sample size of 30 simulated mixture profiles per iteration, and over 10,000 iterations, a sizable representation of rare reported alleles is produced. For example, 1.2 million allele counts would be obtained from 10,000 iterations with a sample size of 30 simulated 2-person mixtures per iteration. In this regard, a rare allele with a probability of 0.0001 can still be expected to be observed 120 times, allowing for its representation when counting distinct alleles seen in a DNA mixture. The code for these simulations was written in the R language and executed in the RStudio software version 1.2.1335, with the R packages 'dplyr' version 0.8.1 and 'ggplot2' version 3.1.1.
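The simulation premise described above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' code: the allele frequencies, the `DROPOUT` label, and all function names are assumptions introduced for illustration. It scales each allele probability by the probability of no dropout, draws two alleles per contributor independently, and tallies distinct allele counts.

```python
import random
from collections import Counter

DROPOUT = "dropout"  # illustrative label for the extra 'dropout' outcome a_d

# Hypothetical allele frequencies for a single locus (not taken from the study).
FREQS = {"10": 0.10, "11": 0.20, "12": 0.30, "13": 0.25, "14": 0.15}


def with_dropout(freqs, p_dropout):
    """Return P_C(a_n) = P(a_n) * (1 - P(a_d)) for each allele, plus P(a_d) itself."""
    adjusted = {allele: p * (1.0 - p_dropout) for allele, p in freqs.items()}
    adjusted[DROPOUT] = p_dropout
    return adjusted


def distinct_allele_count(freqs, n_contributors, rng):
    """Draw two alleles per contributor independently; count distinct observed alleles."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    drawn = rng.choices(alleles, weights=weights, k=2 * n_contributors)
    return len({a for a in drawn if a != DROPOUT})


def simulate(freqs, n_contributors, n_profiles, rng):
    """Estimate the distribution of distinct allele counts over simulated profiles."""
    counts = Counter(
        distinct_allele_count(freqs, n_contributors, rng) for _ in range(n_profiles)
    )
    return {z: c / n_profiles for z, c in sorted(counts.items())}


rng = random.Random(1)  # fixed seed so the sketch is repeatable
no_dropout = simulate(FREQS, n_contributors=3, n_profiles=5000, rng=rng)
dropout_30 = simulate(with_dropout(FREQS, 0.3), n_contributors=3, n_profiles=5000, rng=rng)
print(no_dropout)
print(dropout_30)

# Allele draws in the 2-person example: iterations x profiles x contributors x 2 alleles.
total_draws = 10_000 * 30 * 2 * 2
print(total_draws, round(total_draws * 0.0001))
```

The last two lines reproduce the arithmetic in the text: 1.2 million allele draws, of which an allele with probability 0.0001 is expected about 120 times.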

The output of the simulations was represented by a probability density function (p.d.f.) of the distinct allele counts obtained from the 10,000 iterations. The probability of observing z distinct allele(s), denoted by $P(X)_{obs=z}$, was determined from the area under the p.d.f. for the corresponding allele count.

Probability of inaccurately estimating the NOC. The number of distinct alleles that can theoretically be observed at a locus for N contributors ranges from 1 to 2N, where N denotes the NOC. In order to calculate the cumulative probability of observing k contributors or fewer in a DNA mixture profile derived from N contributors, the probabilities of observing 1 to 2k alleles were first summed for each autosomal locus, before multiplying the summed probabilities across all the loci 5 , i.e.

$$P(\mathrm{NOC} \le k \mid N) = \prod_{l} \sum_{z=1}^{2k} P(X_l)_{obs=z}$$

where $l$ indexes the autosomal loci and $P(X_l)_{obs=z}$ is the probability of observing z distinct alleles at locus $l$.
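The sum-then-multiply calculation described above can be sketched as follows. This is an illustrative Python sketch, not the study's R code; the two locus names and their distinct-allele-count distributions are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import prod


def risk_of_k_or_fewer(per_locus_dists, k):
    """Cumulative probability of a profile appearing to have k or fewer contributors.

    per_locus_dists maps each locus to {distinct allele count: probability}.
    For each locus, the probabilities of observing 1 to 2k distinct alleles are
    summed; the per-locus sums are then multiplied across all loci.
    """
    return prod(
        sum(p for z, p in dist.items() if 1 <= z <= 2 * k)
        for dist in per_locus_dists.values()
    )


# Hypothetical distributions for a 3-person mixture at two loci (illustration only).
THREE_PERSON = {
    "locus_A": {2: 0.05, 3: 0.25, 4: 0.40, 5: 0.25, 6: 0.05},
    "locus_B": {3: 0.10, 4: 0.30, 5: 0.40, 6: 0.20},
}

for k in (1, 2, 3):
    print(k, risk_of_k_or_fewer(THREE_PERSON, k))
```

Because the per-locus sums are multiplied, a single locus that almost always shows more than 2k alleles drives the overall risk towards zero, which is consistent with the role the text later attributes to a highly polymorphic locus such as SE33.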
Use of experimental animals, and human participants. The work described herein did not involve the use of any experimental animals and human participants.

Number of distinct alleles from a DNA mixture profile without allele dropout.
To determine the number of distinct alleles expected of a DNA mixture profile derived from N contributors, with no allele dropout, we calculated the probabilities of observing 1 to 2N distinct allele(s) in the profiles (Fig. 1). As expected of a 2-person DNA mixture profile, three and/or four distinct alleles were observed in all 21 autosomal loci. For a 3-person DNA mixture profile, 19 out of 21 autosomal loci yielded four and/or five distinct alleles. It is theoretically possible to obtain an upper bound of eight and ten alleles for 4- and 5-person DNA mixture profiles, respectively. There were, however, generally no more than six distinct alleles observed across the different ethnic populations in a 4-person profile, except at SE33. Similarly, in a 5-person profile, the loci with more than six distinct alleles observed were: D18S51, FGA, SE33, and D2S1338 (Chinese ethnic population); FGA, D1S1656, SE33, and D2S1338 (Malay and Indian ethnic populations); and D18S51, D1S1656, D12S391, SE33, and D2S1338 (Caucasian ethnic population).
In addition, SE33 was observed to have the highest distinct allele count for all ethnic populations, regardless of the number of contributors in the DNA mixture profile. The typical distinct allele counts observed at SE33 were six, seven, and eight alleles for a 3-, 4-, and 5-person mixture profile, respectively.
Overall, these results indicate that the number of distinct alleles observed was generally lower than the theoretical upper bound, especially for DNA mixture profiles from four to five contributors.
Impact of allele dropout on distinct allele counts in a DNA mixture profile. A probability of dropout, P(a d ) = 0.3, was applied to all loci in our simulations to assess the impact of allele dropout on estimating the NOC. The probabilities of observing 1 to 2N distinct allele(s) were calculated based on these simulated DNA mixture profiles (Fig. 2). We observed an overall decrease of at least one distinct allele in DNA mixture profiles that were derived from two to five contributors, across all four ethnic populations. This observation suggests that, under scenarios where allele dropout can be expected, there is an increased risk of underestimating the NOC of the profile.
Risk of underestimating the NOC in a DNA mixture profile. The theoretical upper bounds of allele counts for 2-, 3-, 4-, and 5-person DNA mixture profiles are four, six, eight, and ten alleles, respectively. A smaller-than-expected allele count can lead to an underestimate of the NOC present in a DNA mixture profile. Figure 1 shows that no more than six distinct alleles were generally observed in a 5-person DNA mixture. Assuming no quantitative assessment of the alleles (i.e., peak heights), a 5-person DNA mixture profile may, prima facie, be reasonably assumed to originate from three persons.
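The reasoning above rests on a simple counting rule: each contributor shows at most two alleles per locus, so the smallest NOC consistent with a profile is the largest per-locus distinct allele count divided by two, rounded up. A minimal Python sketch of that rule follows; the locus names and counts are hypothetical and the function name is an assumption for illustration.

```python
from math import ceil


def min_contributors(distinct_counts_by_locus):
    """Smallest NOC consistent with the observed distinct allele counts.

    Each contributor carries at most two alleles per locus, so the locus with
    the most distinct alleles sets the lower bound ceil(max_count / 2).
    """
    max_count = max(distinct_counts_by_locus.values())
    return ceil(max_count / 2)


# Hypothetical 5-person mixture in which no locus shows more than six alleles.
observed = {"locus_1": 5, "locus_2": 6, "locus_3": 4}
print(min_contributors(observed))
```

With at most six distinct alleles at any locus, the rule returns three contributors, which matches the underestimate described in the text when peak heights are not considered.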
In this respect, we assessed the risk of underestimating NOC by calculating the cumulative probability of observing k number of contributors and fewer, in a DNA mixture profile derived from N number of contributors (Table 1). Our results showed that the risk of interpreting a DNA mixture as originating from a single source was negligible across all the different DNA mixture profiles, regardless of ethnic populations and even after adopting an overall allele dropout rate of 30%. For a 3-person DNA mixture profile, and with a 30% allele dropout rate, there was greater than a 76% risk that the profiles would be estimated as derived from two contributors.
Using the same 30% allele dropout rate (without consideration of peak height data), there is a definite (100%) risk of a 4-person DNA mixture profile being underestimated as originating from three or two (3 ≥ NOC > 1) contributors. For a 5-person DNA mixture profile, there is a 100% and 46% risk of underestimating the profile as originating from either (4 ≥ NOC > 1) or (3 ≥ NOC > 1), respectively.
The implications of allele dropout are considerable as, in its absence, there is a negligible risk (< 0.5%) of underestimating the NOC for 3- and 4-person DNA mixture profiles. With respect to 5-person mixtures, the risk of underestimating such a profile as arising from (4 ≥ NOC > 1) contributors ranged from 29% (Indian population) to 96% (Malay population).
Taken together, the present study demonstrated that as the known NOC in a DNA mixture profile increased, there was a greater risk of underestimating the NOC. This problem was exacerbated when there was allele dropout. In the absence of allele dropout, DNA mixture profiles of up to four contributors could be estimated with confidence. In contrast, after factoring in allele dropout, only a 2-person DNA mixture profile could be deduced without risk of underestimating the NOC.
Mixture DNA profiles originating from a combination of different ethnicities. All the mixture DNA profiles simulated thus far were generated from individuals of the same ethnic population, i.e. a 3-person mixture DNA profile consists entirely of three Chinese, three Malay, or three Indian contributors. In actual crime casework, it is possible that a mixture DNA profile originates from a combination of individuals from different ethnic populations and/or proportions, e.g. a 3-person mixture DNA profile can be made up of two Chinese contributors and one Malay contributor. Three different combinations of mixture DNA profiles were created in silico: (1) one Chinese, one Malay, and one Indian in a 3-person mixture DNA profile, hereinafter referred to as 'CMI'; (2) two Chinese and two Malay in a 4-person mixture DNA profile, hereinafter referred to as 'CCMM'; and (3) two Chinese, one Malay, and one Indian in a 4-person mixture DNA profile, hereinafter referred to as 'CCMI'. The number of distinct alleles obtained from such mixture DNA profiles was determined (Fig. 3). The differences in the number of distinct alleles obtained from these combined-ethnicity mixture DNA profiles and profiles of entirely the same ethnic population are shown in Fig. 4.
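A combined-ethnicity mixture can be simulated by drawing each contributor's two alleles from that contributor's own population frequencies. The Python sketch below illustrates the idea for the CMI, CCMM, and CCMI compositions; it is not the authors' R code, and the single-locus frequencies, the one-letter population codes used as dictionary keys, and the function name are assumptions for illustration.

```python
import random

# Hypothetical single-locus allele frequencies per population (illustration only).
POP_FREQS = {
    "C": {"10": 0.5, "11": 0.3, "12": 0.2},
    "M": {"11": 0.4, "12": 0.4, "13": 0.2},
    "I": {"12": 0.3, "13": 0.3, "14": 0.4},
}


def mixed_distinct_count(pop_freqs, composition, rng):
    """Count distinct alleles at one locus for a mixture such as 'CMI' or 'CCMI'.

    Each character in `composition` names the population of one contributor,
    whose two alleles are drawn independently from that population's frequencies.
    """
    observed = set()
    for code in composition:
        freqs = pop_freqs[code]
        observed.update(rng.choices(list(freqs), weights=list(freqs.values()), k=2))
    return len(observed)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
for composition in ("CMI", "CCMM", "CCMI"):
    counts = [mixed_distinct_count(POP_FREQS, composition, rng) for _ in range(1000)]
    print(composition, sum(counts) / len(counts))
```

Running the same function with a single-population composition such as 'CCC' gives the baseline against which a mixed profile can be compared.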
A common trend among the CMI, CCMM, and CCMI profiles is a one-allele gain or loss in the distinct allele count obtained, when compared to the pure Chinese, Malay, or Indian mixture DNA profiles. Hence, in terms of the distinct allele count in a locus, a mixture DNA profile with contributors originating from a combination of differing ethnicities differs by at most one allele from profiles originating from entirely the same ethnic population. Additionally, our results showed a greater proportion of loci gaining one distinct allele in these profiles as compared to those from entirely the same ethnic population; overall, 55 loci gained one distinct allele, whereas 30 loci lost one.
Despite changes in the distinct allele count observed, there remains a negligible risk (< 0.05%) in underestimating the NOC of these mixture DNA profiles containing different ethnic combinations (Table 2).

Discussion
Previous literature has reported on the uncertainty in determining the NOC in a DNA mixture profile. Those studies were, however, based on allele frequencies in Caucasian populations with only limited data from major ethnic populations in Asia 2,4,5 . Additionally, the effects of allele dropout on the uncertainty among these Asian populations have not been investigated. By determining the number of distinct alleles obtained from simulated DNA mixture profiles, the present study evaluated the uncertainty in estimating the NOC from the Chinese, Malay and Indian ethnic populations in comparison to that reported for the Caucasian population. Using Caucasian allele frequencies, the approach adopted in our study yielded global trends similar to those reported by Coble et al. 2 . First, the risk of NOC underestimation increases with an increasing number of contributors in a DNA mixture profile. Second, it is extremely unlikely for a DNA mixture to be underestimated as being derived from a single person. Minor differences in probabilities were observed between the Coble et al. 1 study and this study. The Coble et al. 1 study reported a 16.5% risk of underestimating a 4-person DNA mixture profile as derived from three contributors and fewer. In our study, there was no risk of underestimation for a 4-person DNA mixture profile in the absence of allele dropout. This difference could be due to a combination of two factors: (i) our study used a bootstrapping simulation while Coble et al. 1 used a Monte Carlo approach; and (ii) the allele frequencies used for modelling were different, with the present study using the more recently published Caucasian allele frequencies 8,12 .
The trend of underestimating the NOC was also observed in the present simulation using Chinese, Malay and Indian ethnic allele frequencies, consistent with that of published literature on other populations and different PCR amplification kits 2,4,5 . This observation highlights the inherent uncertainty in estimating the NOC in a DNA mixture profile, regardless of ethnic population or the array of loci used to generate a profile.
An important element in the present study is the consideration of allele dropout, which is frequently encountered during PCR amplification of low template and/or degraded DNA samples. As this phenomenon was not addressed in previous mixture simulation studies 2,4,5 , an allele dropout rate was introduced in our simulation study. Since our laboratory uses the GlobalFiler PCR amplification kit, the allele dropout rate reported from the developmental validation of the kit was used as a benchmark. Ludeman et al. 12 reported approximately a 30% overall allele dropout rate when 30 pg of template DNA were used for PCR amplification with the GlobalFiler PCR amplification kit 13 . However, the rate of allele dropout is dependent on PCR amplification parameters and detection threshold used, as reported for older generations of PCR amplification kits [14][15][16][17][18] . We, therefore, relied on the empirical data obtained from our internal validation study using the GlobalFiler PCR amplification kit to determine our laboratory's allele dropout rate. Similar to the benchmark, we observed an overall 30% allele dropout rate after PCR amplification with 30 pg of template DNA (Supplemental Fig. S3). As such, an overall 30% allele dropout rate appeared to be a reasonable benchmark for GlobalFiler PCR amplification kit, at least within our laboratory.
In concordance with a previous study 19 , our results showed a greater underestimation of NOC with a 30% allele dropout rate than with no allele dropout. Since the SE33 locus 20,21 was able to reduce the NOC underestimation risk in a no-allele dropout scenario 2 , we investigated whether the SE33 locus can similarly reduce NOC underestimation risk in a mixture profile with 30% allele dropout. The risk of underestimation was reduced by up to 54% when the SE33 locus was factored into NOC estimation (Table 1). We, therefore, opine that the SE33 locus is useful for accurate estimation of NOC in a DNA mixture profile, especially in scenarios with allele dropout. Taken together, our results highlight the importance of using the SE33 locus as a NOC-determining indicator in a DNA mixture profile. This is, of course, only possible with SE33-containing PCR amplification kits.
Our study also recognises that mixture DNA profiles can consist of a combination of contributors from different ethnicities. This is especially so in cosmopolitan cities and countries such as Singapore. As such, we looked at a combination of Chinese, Malay and Indian contributors as a 3-person mixture DNA profile (CMI). As Chinese is the major ethnic population, followed by Malay and Indian, two 4-person mixture DNA profiles consisting of (1) two Chinese and two Malay (CCMM), and (2) two Chinese, one Malay, and one Indian (CCMI) were examined.
We expected less allele sharing in the CMI, CCMM, and CCMI mixture DNA profiles as compared to those from entirely the same ethnic population; our results validated our expectation. Despite the overall slight increase in distinct allele count, there is generally no large (≥ 1%) increase in the risk of underestimating the NOC in these mixture DNA profiles. These findings add to the previous study on mixture DNA profiles 2 , in which a combination of differing ethnic populations in a mixture DNA profile was never investigated. Our results can be cautiously extrapolated to the previous study 2 , i.e. a mixture DNA profile derived from a combination of different ethnic populations would only deviate slightly from one derived entirely from the same ethnic population.

Figure 4. Heatmap of distinct allele counts, based on the differences between the probability obtained from a mixture DNA profile of mixed ethnic population (i.e. CMI, CCMM, CCMI) and that of an entirely Chinese, Malay, or Indian (y-axis on the right) ethnic population. The difference in probability is calculated as mixed minus entirely same ethnic population mixture DNA profile. The combination of the ethnic populations for CMI, CCMM, and CCMI mixture DNA profiles is identical to that in Fig. 3.

Table 2. Cumulative probabilities (risk) of observing k number of contributors and fewer, in a CMI, CCMM, and CCMI DNA mixture profile, where k = 4, . . . , 1. The combination of the ethnic populations for CMI, CCMM, and CCMI mixture DNA profiles is identical to that in Fig. 3.

Finally, like other simulation models 2,5 , the present study did not take into consideration allele peak heights and peak height ratios. Hence, by relying solely on distinct allele counts, this study presents forensic DNA analysts with an upper bound on the possible risk in assigning NOC to a mixture profile 2,4 .
Lastly, the effects of population substructure on NOC have been addressed previously 5 , and were not taken into consideration in the present study.

Conclusion
The present study, using allelic frequencies derived from a substantial number of distinct Chinese, Malay and Indian ethnic individuals, has provided novel insight into the uncertainty in NOC estimation for DNA mixture profiles originating from Asian individuals. Further, we quantified the risks of underestimating the NOC in a DNA mixture profile composed entirely of the same ethnic population, or of a combination of differing ethnic populations. The risk of underestimating the NOC is exacerbated in the presence of allele dropout. Since accurate estimation of NOC is a critical first step in mixture DNA profile interpretation, be it via manual means or probabilistic genotyping expert systems 2,3 , these insights would be particularly relevant to Asian laboratories performing match likelihood calculations on DNA mixtures.