Introduction

Next-generation sequencing (NGS) techniques use random (‘shotgun’) sequencing of the template DNA in order to cover all ‘targets’ with a sufficient number of sequencing reads, that is, to reach a sufficient ‘coverage’. Accordingly, NGS always involves at least the fluctuation of a Poisson process. The distribution of the coverage thus cannot be entirely even but must have a variance that is at least as large as the mean (as in case of a Poisson distribution). On top of that lower bound, real NGS distributions show overdispersion and have variances that are substantially larger than their means. Overdispersion is due to various factors, including copy number variability of the template DNA or pre-NGS manipulations such as selective capturing of template DNA.

To assess the degree of inhomogeneity of NGS coverage quantitatively, Mokry et al.1 elaborated on a consideration of Gnirke et al.2 and introduced the ‘evenness score’ E. This score has found its way into the NGS field. Very recently, for instance, Lelieveld et al.3 applied E in their comparison of exome sequencing and whole-genome sequencing. Here I derive a computationally more efficient formula for the calculation of E. Then I use that formula in a general analysis, producing simple but close approximations. The latter allow for a comparison of the evenness score with conventional descriptors of the relative width of a distribution, such as the coefficient of variation.

Material & Methods and Results

The evenness score E

Mokry et al.1 developed the evenness score as a tool to describe the dispersion of the coverage around the average coverage Cave. Their idea is intuitive and proved useful for their study, but it has not been extensively characterized mathematically. Here I show that the explanation and derivation of the evenness score can be simplified significantly, leading to a more efficient computation of the score and to general insights into its relationship with more traditional statistical measures. In order to simplify the explanation, I first reproduce it in the following paragraph. Readers asking for an immediate intuitive understanding of the evenness score are referred to Figure 1 and may then proceed directly to equation (3).

Figure 1

Explanation of the evenness score E (shaded area) according to Mokry et al.1 (also see their Figure 2b). If the coverage is homogeneous, its pdf (probability density function) is narrow and close to the mean of the normalized coverage at x=1. Then the shaded area approximates a size of 1 or 100%. E is calculated as E = ∫₀¹ (1−F(x)) dx = 1 − ∫₀¹ (1−x)·f(x) dx (see derivation of equation (4)), where f(x) is the pdf and F(x) is the cumulative distribution function (cdf).

Mokry et al.1 stated

E = (Σi=1..Cave Mi/NTP)/Cave × 100%      (1)

where Mi ‘is defined as number of targeted positions with at least coverage Ci, Cave is defined as the average coverage through all targeted positions and NTP is defined as the total number of targeted positions.’ The introduction of the term Ci in this definition is an unnecessary complication as the summation instruction in equation (1) obviously implies that Ci=i. Thus, as Mokry et al.1 stated, E equals 1 (=100%) in case of completely uniform coverage of all targeted positions at a level of Cave because in this case Mi/NTP=1 for all i ≤ Cave, yielding Σi=1..Cave Mi/NTP = Cave. (Mokry et al.1 used the letter Pi instead of Mi in equation (1), which is avoided here as P usually indicates a probability or relative frequency. Only after dividing Mi by NTP does a probability result, that is, P(coverage ≥ i) = Mi/NTP.)
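For a concrete illustration of equation (1), the following Python sketch computes E directly from a vector of per-target coverages; the toy data and variable names are hypothetical and not taken from Mokry et al.:

```python
# Toy data (hypothetical): coverage of 10 targeted positions
cov = [8, 10, 10, 11, 9, 12, 10, 10, 7, 13]

NTP = len(cov)                  # total number of targeted positions
Cave = round(sum(cov) / NTP)    # average coverage, rounded to an integer

# M_i = number of targeted positions with coverage of at least i;
# equation (1): E = (sum of M_i/NTP for i = 1..Cave) / Cave
E = sum(sum(1 for c in cov if c >= i) for i in range(1, Cave + 1)) / (NTP * Cave)
print(E)  # → 0.94
```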

Mokry et al.1 also provided a version of equation (1) for the continuous case, that is, E = ∫₀¹ F(i) di × 100%, ‘…where F(i) is the fraction of positions with normalized coverage of at least C(i)/Cave’, with ‘normalization’ meaning division by the mean. Again, the definition is a little complicated as the reader needs to figure out that C(i)/Cave=i. Moreover, the use of the letters i and F in this formula is unfavorable as i has been applied in equation (1) already, although with a different meaning (!), and F usually relates to the left-sided cumulative distribution function (from −∞ to x, see https://en.wikipedia.org/wiki/Cumulative_distribution_function). Therefore, I prefer to write

E = ∫₀¹ G(x) dx      (2)

with x ≈ i/Cave and G(x) ≈ Mi/NTP, where i is defined as in equation (1), and G(x) is the fraction of positions with normalized coverage of at least x. In equation (2), I omitted the factor ‘100%’ as it equals 1 anyway. With increasing Cave, the residual difference between the discrete and the continuous version of E declines. Mokry et al.1 used the continuous version for a visual explanation of the evenness score (see their ‘Figure 2’ and Figure 1 of the present paper). This explanation implies that G(x) (that is, ‘F(i)’ in terms of Mokry et al.1) is the complement of the cumulative distribution function (cdf) of the normalized coverage. Hence G(x)=1−F(x), because 1−G(x) equals the fraction of positions with normalized coverage of at most x, that is, the cdf, for which I use the common descriptor F(x) here. In case of a very even NGS result, almost all target positions have a coverage close to the mean, so that the probability density function (pdf) is restricted to the vicinity of the mean, the cdf is close to 0 for x<1, and the evenness score E approximates 1, or 100%. Conversely, a coverage that is uneven with F(x)>0 for x<1 results in E<1 (that is, <100%).

Figure 2

Evenness scores (E) of some symmetrical distributions as functions of the coefficient of variation, that is, of the standard deviation (σ) after normalization by division by the mean. Note that E is well predicted by the average (dashed line) of the upper (1−σ²/2) and lower limits (1−σ/2) as given by inequality(7). Because of the normalization, the base length of the triangular and the rectangular distribution cannot be > 2, so that the maximal σ is 1/√6 ≈ 0.41 and 1/√3 ≈ 0.58, respectively. The Gaussian normal distribution necessarily is truncated at 0, which inflicts increasing skewness with increasing σ. Interestingly, the normalized left-truncated Gaussian distribution also has a maximal σ, which equals √(π/2−1) ≈ 0.76 (see Supplementary Material F). As the figure shows, the E score of the distribution even then is well predicted by inequality(7).

Thus, except for the expression in percentage, the evenness score E of Mokry et al.1 is given by

E = ∫₀¹ (1−F(x)) dx      (3)

where F(x) is the cdf of the normalized coverage x=coverage/mean coverage.

With f(x) being the related pdf, where F(x) = ∫₋∞^x f(t) dt, that is, F(x) = ∫₀^x f(t) dt, as there is no negative coverage (f(t)=0 for t<0), a rather convenient expression can be derived from equation (3) using integration by parts: As ∫₀¹ F(x) dx = F(1) − ∫₀¹ x·f(x) dx, equation (3) can be written as E = 1 − F(1) + ∫₀¹ x·f(x) dx = 1 − ∫₀¹ (1−x)·f(x) dx. Hence,

E = 1 − ∫₀¹ (1−x)·f(x) dx      (4)

For the discrete case with x ≈ i/Cave and f(x)dx ≈ ni/NTP, the analogous formula is

E = 1 − Σi=0..Cave (Cave−i)·ni/(NTP·Cave)      (5a)

where NTP and Cave are defined as in equation (1) as the total number of targeted positions and the average coverage, respectively, while ni is the number of targets that are covered with exactly i reads (that is, i is the non-normalized coverage). Equation (5a) can be transformed to

E = 1 − Σ{1≤j≤NTP, C(j)≤Cave} (Cave−C(j))/(NTP·Cave)      (5b)

where the condition ‘1≤j≤NTP, C(j)≤Cave’ guarantees that the index of the summation runs through all target positions j whose coverage C(j) is not larger than the average coverage. As each of these positions occurs exactly once, equation (5b) does not have a weighting factor comparable to ni in equation (5a), where i denotes the coverage level instead of the position. For a direct derivation of equations (5a and 5b) from equation (1), see Supplementary Material A and B and Supplementary Figure S1. As Cave is usually not an integer, there might be a small deviation between the values calculated by equation (1) and equations (5a and 5b). The deviation is small for Cave>10 but may be considerable if Cave is of order 1. The difference vanishes if Cave is rounded to the next integer. With equations (5a and 5b), the computation time to calculate E is a linear function of the number of NGS reads as each read is addressed only once in the summation over (Cave−i)·ni, whereas equation (1) requires computational time that increases as a quadratic function of the number of reads as each Mi represents a summation itself. For theoretical considerations (see below), equations (4, 5a and 5b) are also more useful than equations (1, 2, 3) because for various distributions, including the Gaussian normal distribution, F(x) cannot be provided in closed form. Concerning computational efficiency, the situation is then similar to the discrete version, as equation (4) involves only one numerical integration, while the calculation according to equation (3) implies the numerical integration of numerical integrations.
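The computational shortcut can be illustrated with a Python sketch (toy data hypothetical): equation (5b) reproduces the result of the nested summation of equation (1) in a single pass whenever Cave is an integer:

```python
cov = [8, 10, 10, 11, 9, 12, 10, 10, 7, 13]   # hypothetical per-target coverages
NTP = len(cov)
Cave = round(sum(cov) / NTP)

# equation (1): each M_i is itself a summation (quadratic effort)
E1 = sum(sum(1 for c in cov if c >= i) for i in range(1, Cave + 1)) / (NTP * Cave)

# equation (5b): a single pass over the positions with coverage <= Cave
E5b = 1 - sum(Cave - c for c in cov if c <= Cave) / (NTP * Cave)

print(E1, E5b)  # → 0.94 0.94
```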

Commands to calculate E according to equations (4, 5a and 5b) on the R command line or as parts of R programs are as follows (see Supplementary Material B for a detailed explanation): In case of empirical (that is, discrete) data, let D be a vector that contains the data as a sequence of numbers representing the coverage of each of the targets. If this sequence is the column k of a table T, use the command ‘D=T[,k]’ to produce D. Then implementation of equation (5b) in R yields E by the command line script

Cave=round(mean(D)); E=1-sum(Cave-D[D<=Cave])/(Cave*length(D))

where Cave is rounded to the next integer (which only has a substantial effect for data whose non-normalized Cave is very small; that is, of order 1). This command also works after normalization of the data. For operations with a theoretical distribution f(x) of a continuous normalized random variable, equation (4) can be implemented in R as

E=1-integrate(function(x) (1-x)*f(x), 0, 1)$value

where ‘f(x)’ has to be replaced by the specific pdf.
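For readers working outside R, equation (4) can be sketched analogously in Python; the midpoint rule and the example pdf (a Gaussian of the normalized coverage with σ=0.1) are illustrative choices, not part of the original commands:

```python
import math

# Equation (4): E = 1 - integral from 0 to 1 of (1 - x) f(x) dx,
# evaluated here by the midpoint rule.
def evenness(pdf, steps=100000):
    h = 1.0 / steps
    return 1 - h * sum((1 - (k + 0.5) * h) * pdf((k + 0.5) * h) for k in range(steps))

# Example: Gaussian pdf of the normalized coverage with mean 1 and sigma = 0.1
sigma = 0.1
f = lambda x: math.exp(-(x - 1) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
print(round(evenness(f), 2))  # → 0.96
```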

In the following, I derive approximations of the evenness score E, especially in terms of the distribution parameter σ. I also consider alternative scores such as e^(−σ), which is restricted to the interval between 0 and 1 by definition and thus qualifies for scoring in percentage.

E and σ in case of a symmetrical pdf

As coverage always is positive, with f(x)=0 for x<0, and because normalization implies μ=1, the variance is given by σ² = ∫₀^∞ (x−1)²·f(x) dx. With both f(x) ≥ 0 and (x−1)² ≥ 0, the variance can be written as σ² = k2·∫₀¹ (x−1)²·f(x) dx, where k2>1. As (1−x)² ≤ 1−x for 0 ≤ x ≤ 1, equation (4) implies σ²/k2 ≤ 1−E, which yields an upper limit of E. In analogy to k2, a constant k0 can be defined with 1 = k0·∫₀¹ f(x) dx, where k0>1 since ∫₀¹ f(x) dx < ∫₀^∞ f(x) dx = 1 because f(x) is a pdf. To derive the lower limit of E, apply Jensen’s inequality for convex functions such as (x−1)² (see Supplementary Material C), yielding (k0·∫₀¹ (1−x)·f(x) dx)² ≤ k0·∫₀¹ (1−x)²·f(x) dx. With equation (4), this is equivalent to k0σ²/k2 ≥ (k0(1−E))². Hence,

1 − σ²/k2 ≥ E ≥ 1 − σ/√(k0k2)      (6)

The constants k0 and k2 depend on the form of the distribution; k0 is associated with the relation of median m and mean μ=1. If 1>m, then ∫₀¹ f(x) dx > 1/2, and k0<2. In case of symmetrical pdfs, m=μ=1 and k0=2. Moreover, symmetry implies ∫₀¹ (x−1)²·f(x) dx = σ²/2, and therefore, k2=2. Hence,

1 − σ²/2 ≥ E ≥ 1 − σ/2      (7)

(see Figure 2 for some examples). Inequality(7) makes sense only if the normalized standard deviation σ, which equals √(2·∫₀¹ (1−x)²·f(x) dx) for symmetrical pdfs, ranges between 0 and 1. Indeed, this can be shown using the extreme types of symmetrical pdfs: If f(x)→0 for x≠1, we get σ→0, whereas if f(x) has a U-form, with f(x)→0 for x(x−2)≠0, thus maximizing the distance of the random variable from the mean, we have σ→1 as (1−x)² equals either (1−0)² or (1−2)². For these two extremes, E is precisely determined by inequality(7), being 1 and 0.5, respectively. The relative error in estimating E by inequality(7), that is, by the mean of the limits (1−σ/2) and (1−σ²/2), must be smaller than half of their difference divided by the lower limit, 0.5(σ/2−σ²/2)/(1−σ/2). The maximum of the latter term is found at σ = 2−√2 ≈ 0.59 and is only 0.086. Analyzing realistic distributions (see below) yields relative errors even much smaller than that.

Among the pdfs that are symmetrical and unimodal (for example, bell-shaped), the pdf with the maximal σ is realized by an approximate rectangular distribution over the interval [0, 2] with f(x)=0.5 for 0 ≤ x ≤ 2, and f(x)=0 otherwise. A simple calculation yields σ = 1/√3 ≈ 0.58, 1−σ/2=0.71, 1−σ²/2=0.83, E=0.75, a relative error in estimating E by inequality(7) of ((1−σ/2+1−σ²/2)/2−E)/E=0.03, and e^(−σ)=0.56. More so than the rectangular, the triangular distribution might serve as a semi-realistic but still analytically treatable model of a symmetrical and unimodal pdf. For a triangular pdf with its base on the interval [1−b, 1+b], b ≤ 1, and, consequently, peak height of 1/b, one gets σ = b/√6 and E = 1 − b/6. Again, E can be predicted rather well by inequality(7) with a relative error of <0.028. Of note, the range of E, that is, the interval [0.83, 1], is only half as large as the ranges of σ or e^(−σ). Even more realistic, of course, than a triangular pdf is the assumption of a Gaussian normal distribution. The latter is reasonably symmetrical as long as the standard deviation is small with σ ≪ μ (see Figure 2 and Supplementary Material F for the effect of truncation at x=0). If the coverage results from a random production of reads as in a Poisson process, its distribution is approximately Gaussian with a variance before normalization that is as large as the mean coverage. Assuming a mean coverage of 100 before normalization, the standard deviation after normalization then is √100/100 = 0.1. Numerical integration using R (see Supplementary Material B) yields E(σ) as 0.96, 0.92 and 0.84 for σ being 0.1, 0.2, and 0.41, respectively, which is almost the same as in case of the triangular distribution (Figure 2).
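The closed forms for the triangular pdf (σ = b/√6, E = 1 − b/6) make inequality(7) easy to probe numerically; the following sketch (illustrative values of b) checks the bounds and the quoted relative error of <0.028:

```python
import math

# Check inequality (7), 1 - sigma^2/2 >= E >= 1 - sigma/2, for symmetric
# triangular pdfs on [1-b, 1+b], using the closed forms from the text.
for b in (0.25, 0.5, 1.0):
    sigma = b / math.sqrt(6)
    E = 1 - b / 6
    assert 1 - sigma**2 / 2 >= E >= 1 - sigma / 2
    estimate = 1 - sigma / 4 - sigma**2 / 4   # mean of the two limits
    print(b, round(abs(estimate - E) / E, 4))
```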

One might think that E ≥ 1−σ/2 (see inequality(7)) also applies to all positively skewed normalized distributions, that is, normalized pdfs with positive third moment. However, this may not necessarily be the case. Defining constants kn with ∫₀^∞ |1−x|^n·f(x) dx = kn·∫₀¹ (1−x)^n·f(x) dx for n ∈ {0,1,2,3,…}, we get k0 and k2 according to their definitions in the derivation of inequality(6), k1=2 owing to the definition of the mean μ, which equals 1 after normalization, and k3>2 in case of positive skewness. For E ≥ 1−σ/2 to be true, the product k0k2 needs to be >4 (see inequality(6)). Proofs in that matter are not trivial. For ‘positively slanted’ distributions4 (that is, pdfs for which f(μ+x)−f(μ−x) is not identically zero and changes sign in x>0 at most once and from negative to positive, which include the Pearson family and the log-normal distribution) it can be derived, using the reasoning of MacGillivray,5 that k2>2 and 2>k0>1 (not shown). However, this is not very helpful. For the log-normal distribution, better approximations are derived in the following section.

The evenness score of the log-normal distribution

Measurements on biological entities usually are positive with a maximum at x>0 and a tail towards higher values. As such, their distributions resemble a log-normal distribution (see Limpert et al.6 for a review). This type of distribution has been found in a great variety of cases, including gene expression,7 telomere length,8 neuronal activity,9 fecundity10 or time-to-event duration (for example, incubation time) of infectious and other diseases,11, 12 for instance, although log-normal genesis (multiplicative interaction of many random effects) cannot always be demonstrated perfectly. The pdf of the coverage in NGS may also have log-normal appearance (Figure 4): The rolling circle technique of Complete Genomics or the use of selective capturing of targets as in exome sequencing produce such distributions, whereas whole-genome sequencing with the Illumina technique results in rather symmetrical distributions.13, 14 Therefore, I examined the evenness score E(σ) of the log-normal distribution (see Figure 3).

Figure 4

Fitting log-normal distributions to exome data. (a) Variant coverage distribution of an individual exome (‘son’) that can be downloaded from Glusman et al.17 (b) Average coverage of each variant position on chromosome 18 that is covered above threshold (>7 ×) in all 4300 European American samples of the Exome Variant Server (http://evs.gs.washington.edu/EVS, Jan 2016). For the moments and scores of the normalized coverage as discussed in the present article, distribution (a) yields σ=1.006, e^(−σ)=0.366, E=0.657, 1−σ/2=0.497, 1−σ²/2=0.494, e^(−σ*/2)=0.658 and 1−F(e^(−1))=0.724, and (b) yields σ=0.456, e^(−σ)=0.634, E=0.827, 1−σ/2=0.772, 1−σ²/2=0.896, e^(−σ*/2)=0.804 and 1−F(e^(−1))=0.965. Thus the evenness score E of realistic data is well approximated by e^(−σ*/2) as in equation (10), while 1−F(e^(−1)) as in equation (11) is not sufficient yet. For (b), where the deviation from symmetry is relatively small, even inequality(7) yields a good approximation with 0.5(1−σ/2+1−σ²/2)=0.834. Moreover, panels (a) and (b) show that, for the characterization of realistic data, the score e^(−σ) exploits a much larger part of its range than E. (See Supplementary Figure S2 for exome data that fit the log-normal distribution less perfectly while the relations of the moments and scores are still quite similar to here.)

Figure 3

(a) Assumed log-normal probability density functions (pdfs) of the normalized coverage x for different values of the standard deviation σ. (See the main text for the relation between σ and the form parameter σ* of the log-normal pdf.) As normalization means division by the mean here, the mean μ of x is always 1. Therefore, the coefficient of variation (=σ/μ) equals σ. Note that for σ→0, the pdf approximates the form of a Gaussian normal distribution while with increasing σ the skewness also increases. (b) Evenness score E(σ) and alternative scores (e^(−σ), e^(−σ*), e^(−σ*/2) and 1−F(x,σ)) for normalized log-normal distributions with varying σ. The cumulative distribution function F(0.2,σ) quantifies the fraction of targets with a coverage of <0.2 (that is, <20 × if the mean of the non-normalized random variable is 100 ×) depending on σ. Note that E(σ) is well approximated by e^(−σ*/2) up to large σ (≤3) and by 1−F(e^(−1),σ) for very large σ (≥2.5).

The log-normal distribution15 is the density of a variable whose logarithm ln(x) has a Gaussian normal distribution No(μ*, σ*). Being the first derivative of the cdf, with ∂ln(x)/∂x = x^(−1), the log-normal pdf thus is

f(x) = 1/(xσ*√(2π)) · exp(−(ln(x)−μ*)²/(2σ*²)) for x>0      (8)

where μ* and σ* now are form parameters only that relate to mean and variance of x as μ = e^(μ*+σ*²/2) and σ² = (e^(σ*²)−1)·e^(2μ*+σ*²), respectively.16 Normalization (division by the mean) conserves the log-normal form of a distribution, since ln(x/μ)=ln(x)−ln(μ) implies that ln(x/μ) has the Gaussian normal distribution No(μ*−ln(μ), σ*) if the distribution of ln(x) is No(μ*, σ*). For normalized coverage with μ=1, which implies μ* = −σ*²/2, the relation of σ* and σ simplifies to

σ*² = ln(σ²+1)      (9)

The log-normal distribution is increasingly skewed with increasing σ (see Figure 3a), whereas it approximates a Gaussian normal distribution No(μ, σ) if σ→0 (see Supplementary Material D for a proof of the latter tendency). In case of small σ, the evenness score of a normalized log-normal distribution thus can be estimated by inequality(7) (see Figures 3b and 4b). First-order approximation of equation (9) in the vicinity of ln(1) yields σ*² = ln(1+σ²) ≈ σ², that is, σ ≈ σ*, so that inequality(7) translates to 1−σ*²/2 ≥ E ≥ 1−σ*/2 for σ ≈ σ*→0. With first-order approximation in the vicinity of e⁰ as e^(0+Δt) ≈ 1+Δt, this results in e^(−σ*²/2) ≥ E ≥ e^(−σ*/2). Figures 3b and 4 and Supplementary Figure S2 show that E ≈ e^(−σ*/2) also holds beyond the region of small σ. At σ=1.3 where σ*=1, the values of E=0.62 and e^(−1/2)=0.61 still are almost identical. Indeed, the approximation e^(−σ*/2) is valid up to σ=3 (that is, σ*≈1.5), with a maximal absolute error of 0.02,

E ≈ e^(−σ*/2)      (10)
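As a numerical cross-check of this approximation (a sketch; the step count and the test value σ=1.3 are illustrative), the evenness score of the normalized log-normal pdf of equations (8) and (9) can be integrated and compared with e^(−σ*/2):

```python
import math

# E for a normalized log-normal pdf (equations (8) and (9)) by the midpoint
# rule, compared with the approximation E ≈ e^(-sigma*/2) of equation (10).
def lognormal_E(sigma, steps=200000):
    s2 = math.log(1 + sigma**2)      # sigma*^2 = ln(sigma^2 + 1), equation (9)
    mu_star = -s2 / 2                # mu* = -sigma*^2/2 ensures mean 1
    def pdf(x):
        return math.exp(-(math.log(x) - mu_star)**2 / (2 * s2)) / (x * math.sqrt(2 * math.pi * s2))
    h = 1.0 / steps
    return 1 - h * sum((1 - (k + 0.5) * h) * pdf((k + 0.5) * h) for k in range(steps))

sigma = 1.3                          # then sigma* ≈ 1
E = lognormal_E(sigma)
approx = math.exp(-math.sqrt(math.log(1 + sigma**2)) / 2)
print(round(E, 2), round(approx, 2))  # → 0.62 0.61
```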

To derive an approximation for even larger σ, use y=ln(x), which, by definition, has a Gaussian normal distribution No(μ*,σ*). Substituting x by y, that is, FLogNo(x) by FNo(y) in equation (3), yields E = 1 − ∫₋∞^0 FNo(y)·e^y dy, taking into consideration that dx = (∂x/∂y)dy = e^y dy. The value of ∫₋∞^0 FNo(y)·e^y dy is determined by the region close to the origin as the factor e^y is approaching 0 for negative values of y beyond that region. For large σ (and, therefore, large σ*), FNo(y) approximates its maximum (=1) in that region so that its graph becomes flat and rather linear, because its mean μ* moves away from the origin with the square of the standard deviation, μ* = −σ*²/2, according to equation (9). Hence, for large σ, FNo(y) can be replaced by a low-grade Taylor series approximation. The Taylor series can be expressed as FNo(y) ≈ FNo(0) + Σn≥0 fNo⁽ⁿ⁾(0)·y^(n+1)/(n+1)!, considering that fNo is the first derivative of FNo. With ∫₋∞^0 (y^(n+1)/(n+1)!)·e^y dy = (−1)^(n+1), one gets E ≈ 1 − FNo(0) − Σn≥0 fNo⁽ⁿ⁾(0)·(−1)^(n+1) (see Supplementary Material E for a detailed derivation). Stopping that series at n=0 yields E ≈ 1−(FNo(0)+fNo(0)·(−1)). The term in the brackets amounts to a first-order Taylor approximation of FNo(y) for y=−1. Returning to the log-normal distribution of x with x=e^y then results in E ≈ 1−F(e^(−1)). As shown in Figure 3b, this approximation is rather good for very large values of σ. For σ ≥ 2.5, its maximal absolute error is at most 0.02. Hence,

E ≈ 1 − F(e^(−1))      (11)

As F(e^(−1)) = F(0.37) is the fraction of targets with a normalized coverage of at most 0.37, E here indicates the fraction of targets with a normalized coverage of at least 0.37. Figure 3b shows that 1−F(e^(−1)) is quite parallel to 1−F(0.2); in case of a 100 × average coverage, which is frequently aimed for in NGS projects, F(0.2) indicates the limit of 20 × that is usually considered as the minimal coverage necessary for reliable mutation detection. As such, for NGS projects with very large variance of the coverage, E may serve as a useful and more or less direct indicator of the fraction of sufficiently covered targets.
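This large-σ behavior can be probed in the same spirit (a sketch; σ=3 is an illustrative choice), with E obtained by integrating the log-normal cdf as in equation (3) and compared with 1−F(e^(−1)):

```python
import math

def lognorm_cdf(x, sigma):
    # cdf F(x) of the normalized log-normal distribution (mean 1)
    s2 = math.log(1 + sigma**2)          # sigma*^2 = ln(sigma^2 + 1)
    mu_star = -s2 / 2
    return 0.5 * (1 + math.erf((math.log(x) - mu_star) / math.sqrt(2 * s2)))

def evenness(sigma, steps=200000):
    # equation (3): E = 1 - integral from 0 to 1 of F(x) dx (midpoint rule)
    h = 1.0 / steps
    return 1 - h * sum(lognorm_cdf((k + 0.5) * h, sigma) for k in range(steps))

sigma = 3.0
E = evenness(sigma)
approx = 1 - lognorm_cdf(math.exp(-1), sigma)   # equation (11)
print(round(E, 2), round(approx, 2))            # → 0.45 0.46
```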

Discussion

The evenness score is used in NGS to quantify the homogeneity of target coverage with sequencing reads.1, 3 As such, it is a measure of the relative width of a distribution, with the coverage being the distributed variable. Its use can be recommended only if it has advantages compared with the coefficient of variation, which is the parameter conventionally applied for this purpose. Here I have performed that comparison. To do so, I used the evenness score in its continuous version, which assumes a normalized random variable (that is, having a mean μ of 1). Therefore, the evenness score E was compared with the standard deviation σ, as σ equals the coefficient of variation if μ equals 1.

At first, I clarified the mathematical definition of E and derived a computationally more efficient version (see equation (4)), which then was also translated to the non-normalized, discrete case of empirical coverage data (see equations (5a and 5b)). Using this version, the calculation of E avoids double summations, making it about as fast as the calculation of σ. As most software applications still do not contain a built-in routine for the calculation of E, I have provided short R commands that will be easily translatable to analogous commands in other programming languages.

Besides the unconventionality of E, its definition might appear to imply another disadvantage: Since the integration in its calculation runs only up to the mean (=1 owing to normalization), E might appear to be insensitive to the variable’s distribution above the mean. However, this is not true, because equation (4) shows that 1−E = ∫₀¹ (1−x)·f(x) dx with x being the coverage normalized by the mean. Hence, by influencing the location of the mean (before normalization), the upper part of the distribution influences the upper end of the lower part and, thereby, the result of the integration of the normalized lower part.
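A toy demonstration of this coupling (hypothetical numbers): raising the coverage of a single target that lies above the mean changes Cave and thereby E, although the summation runs only up to the mean:

```python
# Modifying coverage only *above* the mean still changes E, because it
# shifts Cave and thus rescales the lower part of the distribution.
def evenness(cov):
    NTP = len(cov)
    Cave = round(sum(cov) / NTP)
    return 1 - sum(Cave - c for c in cov if c <= Cave) / (NTP * Cave)

cov_a = [5, 10, 10, 15]   # mean coverage 10
cov_b = [5, 10, 10, 35]   # only the best-covered target changed; mean now 15

print(evenness(cov_a))             # → 0.875
print(round(evenness(cov_b), 3))   # → 0.667
```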

More important is the outcome of the general analysis of E performed in the present paper. For any symmetrical distribution, including the Gaussian normal distribution, I showed that E can be predicted with little error from σ, that is, by the mean of the limits 1−σ/2 and 1−σ²/2 (see inequality(7) and Figure 2). Moreover, as some NGS methods entail positively skewed coverage data (see Figure 4, Ernani et al.13 and Lam et al.14), I examined the evenness score of the log-normal distribution, which is the typical distribution of positively skewed results of biological measurements:6 For a rather wide range of σ (≤3), E was found to be predictable by e^(−σ*/2) with σ*² = ln(σ²+1) (see equation (10) and Figure 3b). In these cases, E also does not seem to provide much information that is not easily derivable from σ. An advantage of E was revealed only for cases with very large coefficient of variation (that is, σ of normalized data ≥2.5), as it then satisfyingly and directly predicts the fraction of targets with sufficiently high coverage (see equation (11) and Figure 3b), whereas this fraction cannot be easily estimated directly from σ.

Some might argue that the evenness score has the advantage of being a score between 0 and 1 (0% and 100%). However, a simple score with that quality can also be devised using σ, namely e^(−σ), which is 1 (that is, 100%) for absolutely homogeneous coverage and approaches 0 for inhomogeneous coverage. The major difference between E and e^(−σ) is given by the rate of approaching 0, as can be seen in Figure 3b. There, E still indicates an evenness of 0.37=37% if F(0.2)=0.5, with 50% of the targets having a coverage of at most 0.2 (that is, of at most 20 × if the mean is 100 ×), while e^(−σ) is already down to a level of e^(−5)=0.007=0.7%. If such NGS outputs were unacceptable due to insufficient coverage of too many targets, E would not exploit its full range (0–1) for the evaluation of the acceptable NGS outputs. Indeed, the minimal E values of published NGS outputs as calculated in Mokry et al.,1 Lelieveld et al.3 and the present paper are still as large as 0.62, 0.68 and 0.66, respectively, while e^(−σ) goes down to 0.37 (see Figure 4). On the other hand, if outputs with 50% of the targets having a coverage of at most 20% of the mean coverage were acceptable, E would have the advantage of preserving some of its range for their quantitative evaluation.

Dealing with log-normal distributions, it might also be worth considering the analog of the standard deviation of a Gaussian normal distribution, that is, the ‘multiplicative standard deviation’ σ* as recommended by Limpert et al.6 (note that the naming of the variables is different in Limpert et al.6). It is one of the two form parameters in equations (8 and 9). In case of empirical data, it can be calculated as the standard deviation of the natural logarithm of the random variable. In Figure 3b, the score e^(−σ*) is presented as a possible tool for the quantitative evaluation of the homogeneity of NGS outputs. It may provide a compromise between e^(−σ) and E. However, the ‘multiplicative standard deviation’ does not yet seem to be in common use and the NGS community may therefore hesitate to take it into consideration.

In summary, the general evaluation presented in this paper reveals that in most circumstances the evenness score E of an NGS output can be predicted quite well by the standard deviation σ of the normalized data (that is, by the coefficient of variation σ/μ in case of non-normalized data). Only if σ is very large (≥2.5μ) does E have the advantage of directly reflecting the fraction of sufficiently covered targets. The general relation between E and σ set out here should also apply to other scientific fields that develop a parameter equivalent to E for their statistics.