To the editor

The European Union recently introduced legislation [1] stipulating the mandatory labeling of food products with a GMO content greater than 1%. (The legislation does not specify whether this is weight per weight or some other unit.) Thus far, most discussions of sampling methods have focused on sampling requirements outside the laboratory; for example, how to procure GMO seeds from a grain shipment. Scant attention has been paid to sampling problems that lie further down the analytical chain, namely variability in the proportions of GMO to non-GMO DNA in replicate “homogenized” laboratory samples. We believe this has serious implications for the practicability of GMO detection in foods.

The first problem in any DNA sampling protocol is defining the limits of detection. The mass of DNA in one unreplicated haploid genome (i.e., the 1C value) is useful for relating genome copy number to the amount of sample taken. For example, given the 1C value of 2.725 picograms [3] for Zea mays (which we use for all examples below), a typical 100 ng DNA analytical sample contains up to 36,697 copies of the haploid genome. It follows that a single copy of the haploid Z. mays genome in a 100 ng DNA sample is present at a level of 0.0027% (wt/wt). Levels of DNA below this threshold simply cannot be detected reliably in samples of this size.
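The arithmetic behind these figures can be sketched as follows. This is a minimal illustration rather than part of our analysis; the function name `haploid_copies` is hypothetical, and the only inputs are the 1C value and sample mass quoted above.

```python
# 1C value of Zea mays in picograms, as quoted in the text.
ONE_C_PG = 2.725

def haploid_copies(sample_ng, one_c_pg=ONE_C_PG):
    """Return the number of haploid genome copies in a DNA sample.

    sample_ng is the DNA mass in nanograms; 1 ng = 1000 pg.
    (Hypothetical helper, introduced only for this illustration.)
    """
    return sample_ng * 1000.0 / one_c_pg

# Copies in a 100 ng analytical sample.
copies = haploid_copies(100.0)

# One copy expressed as a percentage of the whole sample (wt/wt).
single_copy_pct = 100.0 / copies

print(int(copies), round(single_copy_pct, 4))
```

Changing `sample_ng` shows directly how the smallest representable GMO fraction scales inversely with the mass of DNA analyzed.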

A second problem is sampling error. This occurs even in a perfectly homogeneous preparation, and even if a large amount (say, 50 μg) of DNA is extracted from a laboratory sample and simple random sampling procedures [4] are adopted. As the amount of DNA extracted from the sample decreases, sampling error becomes proportionally larger. Thus, replicate 100 ng DNA samples containing GMO material at a level of 0.1% (wt/wt) would produce GMO DNA estimates accurate to no better than about ±30% of the mean value, 95% of the time. This is a poor level of accuracy, even if we ignore other types of error inherent in a real analytical system.

To illustrate this, we use the cumulative distribution function for the binomial distribution [5] to calculate the probable range of GMO genome copies that would be “sampled” in a single-step procedure, that is, from a (large) laboratory sample of “known” low content (0.1% GMO) into a series of 100 ng analytical samples. Although on average the analytical samples should contain 36.7 GMO genome copies, in fact the number of GMO copies ranges from 25 to 48, with a 94.3% probability. Thus, the actual DNA content that would be observed in a single sample, with approximately 95% probability, would range from 0.068% to 0.131%; the probability of sampling exactly 36 GMO copies (i.e., 0.1% content) in a single analytical sample is only 0.066.
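A calculation of this kind can be sketched with the standard library alone. The helper names `binom_pmf` and `prob_between` are hypothetical, and the interval is treated as inclusive at both ends, which is an assumption; a different endpoint convention shifts the coverage figure slightly.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials.

    Computed in log space so that large n does not overflow.
    (Hypothetical helper, introduced only for this illustration.)
    """
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_between(lo, hi, n, p):
    """Probability that the count lies in [lo, hi], both ends inclusive (assumed)."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

N_COPIES = 36697   # haploid genome copies in a 100 ng sample (from the text)
P_GMO = 0.001      # 0.1% GMO content (from the text)

# Probability that a single analytical sample holds 25 to 48 GMO copies.
coverage = prob_between(25, 48, N_COPIES, P_GMO)

# Probability of drawing exactly 36 GMO copies.
p_exact_36 = binom_pmf(36, N_COPIES, P_GMO)

# The same copy-number bounds expressed as percentage GMO content.
low_pct = 100.0 * 25 / N_COPIES
high_pct = 100.0 * 48 / N_COPIES

print(coverage, p_exact_36, low_pct, high_pct)
```

Setting `P_GMO = 0.0001` repeats the calculation for a 0.01% laboratory sample, where the expected count falls below four copies.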

With lower levels of GMO DNA, the problem is even more critical. For a laboratory sample containing GMO DNA at a level of 0.01%, the GMO content of the 100 ng analytical sample would vary between 0.0027% and 0.0191% nearly 95% of the time. These calculations obviously refer to a “best possible” result, as they assume a single sampling step and a perfect analytical system.

When undertaking a dilution series, the assumption of simple random sampling may no longer be valid, as the number of copies available becomes strictly finite. Indeed, the number of copies used to prepare subsequent dilutions heavily influences the sampling error associated with the series. Consequently, the preparation of any dilution series must be undertaken in such a way as to minimize this bias; ideally, dilutions should be made from the primary laboratory sample. Unfortunately, we note that some equipment manuals actually encourage the construction of the series without recognizing this problem.

The classical solution to the issue of sampling error is to undertake repetitions and/or use appropriately sized (i.e., larger) analytical samples. We recommend that in the construction of a dilution series (for example, for determination of the “limit of detection” of a method, or for the generation of standard curves) the nominal number of GMO copies in the weakest dilution of analytical sample should be set to 20, thus providing a high probability that all repetitions contain relevant DNA (Table 1).
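The reasoning behind a floor on nominal copy number can be illustrated with a short calculation. This sketch assumes a Poisson approximation to the sampling distribution (reasonable when the GMO fraction is small), and the function names and the choice of 10 replicates are hypothetical.

```python
import math

def prob_zero_copies(nominal_copies):
    """Chance that one analytical sample contains no GMO copy at all.

    Assumes Poisson sampling with mean equal to the nominal copy number.
    """
    return math.exp(-nominal_copies)

def prob_all_replicates_positive(nominal_copies, replicates):
    """Chance that every one of several independent replicates holds at least one copy."""
    return (1.0 - prob_zero_copies(nominal_copies)) ** replicates

# Compare a few nominal copy numbers; 10 replicates is an arbitrary illustration.
for m in (1, 5, 10, 20):
    print(m, prob_zero_copies(m), prob_all_replicates_positive(m, 10))
```

At a nominal single copy, more than a third of samples would contain no target under this assumption, whereas at 20 copies an empty sample is vanishingly rare.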

Table 1. Expected probability of GMO copies in a 1000 ng DNA sample of Zea mays (a)

However, we are aware of important studies that seem to draw conclusions without such safeguards, despite explicitly working with copy numbers. Several international standards for PCR analysis of GMO in foodstuffs, currently under development, draw attention to sample sizes in the procurement of material for the laboratory sample, but in general do not address the issues of sampling associated with the analytical sample. We believe there is insufficient acknowledgment that repeated analytical samples drawn from a “homogenized” laboratory sample would not have identical proportions of GMO/non-GMO copies.