## Practical guides

- Nature Reviews Genetics | Reviews
### Statistical power and significance testing in large-scale genetic studies

Pak C. Sham, Shaun M. Purcell

#### Article Citation

*Nature Reviews Genetics* **15**, 335-346 (2014)

doi: 10.1038/nrg3706

Posterior probability of H_{0} given the critical significance level and the statistical power of a study, for different prior probabilities of H_{0}.

- Nature Reviews Neuroscience | Research
### Power failure: why small sample size undermines the reliability of neuroscience

Katherine S. Button, John P. A. Ioannidis, Claire Mokrysz, Brian A. Nosek, Jonathan Flint *et al.*

#### Article Citation

*Nature Reviews Neuroscience* **14**, 365-376 (2013)

doi: 10.1038/nrn3475

- Nature Protocols | Protocols
### Basic statistical analysis in genetic case-control studies

Geraldine M Clarke, Carl A Anderson, Fredrik H Pettersson, Lon R Cardon, Andrew P Morris *et al.*

#### Article Citation

*Nature Protocols* **6**, 121-133 (2011)

doi: 10.1038/nprot.2010.182

- Nature Neuroscience | Reviews
### Erroneous analyses of interactions in neuroscience: a problem of significance

Sander Nieuwenhuis, Birte U Forstmann, Eric-Jan Wagenmakers

#### Article Citation

*Nature Neuroscience* **14**, 1105-1107 (2011)

doi: 10.1038/nn.2886

Graphs illustrating the various types of situations in which the error of comparing significance levels occurs.

- Nature Biotechnology | Research
### Analyzing 'omics data using hierarchical models

Hongkai Ji, X Shirley Liu

#### Article Citation

*Nature Biotechnology* **28**, 337-340 (2010)

doi: 10.1038/nbt.1619

- Nature Genetics | Reviews
### Advantages and pitfalls in the application of mixed-model association methods

Jian Yang, Noah A Zaitlen, Michael E Goddard, Peter M Visscher, Alkes L Price

#### Article Citation

*Nature Genetics* **46**, 100-106 (2014)

doi: 10.1038/ng.2876

MLMe increases power and MLMi decreases power compared to linear regression.

Effectiveness of mixed linear models using random or top associated markers in correcting for stratification.

Effectiveness of mixed linear models using top associated markers in increasing study power.

- Nature Protocols | Protocols
### Quality control and conduct of genome-wide association meta-analyses

Thomas W Winkler, Felix R Day, Damien C Croteau-Chonka, Andrew R Wood, Adam E Locke *et al.*

#### Article Citation

*Nature Protocols* **9**, 1192-1212 (2014)

doi: 10.1038/nprot.2014.071

P-Z plot to reveal analytical issues with beta, standard error and P values.

- Nature Neuroscience | Reviews
### Circular analysis in systems neuroscience: the dangers of double dipping

Nikolaus Kriegeskorte, W Kyle Simmons, Patrick S F Bellgowan, Chris I Baker

#### Article Citation

*Nature Neuroscience* **12**, 535-540 (2009)

doi: 10.1038/nn.2303

- Nature Neuroscience | Reviews
### A solution to dependency: using multilevel analysis to accommodate nested data

Emmeke Aarts, Matthijs Verhage, Jesse V Veenvliet, Conor V Dolan, Sophie van der Sluis

#### Article Citation

*Nature Neuroscience* **17**, 491-496 (2014)

doi: 10.1038/nn.3648

Use of conventional t test on nested data inflates the type I error rate, whereas cluster-based summary statistics decreases statistical power.

Graphical representations of conventional t test and multilevel analysis.

#### Article Citation

*Nature Biotechnology* **27**, 1135-1137 (2009)

doi: 10.1038/nbt1209-1135

Associating confidence measures with CTCF binding motifs scanned along human chromosome 21.

#### Article Citation

*Nature Biotechnology* **22**, 1177-1178 (2004)

doi: 10.1038/nbt0904-1177

#### Article Citation

*Nature Biotechnology* **22**, 1315-1316 (2004)

doi: 10.1038/nbt1004-1315

- Nature Methods | News
### Points of significance: Importance of being uncertain

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **10**, 809-810 (2013)

doi: 10.1038/nmeth.2613

The mean and s.d. are commonly used to characterize the location and spread of a distribution.

The distribution of sample means from most distributions will be approximately normally distributed.

#### Article Citation

*Nature Methods* **10**, 921-922 (2013)

doi: 10.1038/nmeth.2659

Error bar width and interpretation of spacing depends on the error bar type.
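
To make the difference between error bar types concrete, here is a minimal Python sketch that computes three common choices (s.d., s.e.m. and an approximate 95% confidence interval half-width) for one sample. The function name `error_bars`, the example data and the use of the normal critical value 1.96 in place of a t quantile are assumptions made for illustration, not taken from the article.

```python
import math
import statistics


def error_bars(sample, z=1.96):
    """Return (s.d., s.e.m., approximate 95% CI half-width) for a sample.

    Assumption: the CI uses the normal critical value z = 1.96, which is
    only a large-sample approximation; small samples call for a t quantile.
    """
    n = len(sample)
    sd = statistics.stdev(sample)   # sample s.d. with n - 1 denominator
    sem = sd / math.sqrt(n)         # standard error of the mean
    ci_half_width = z * sem         # half-width of the approximate 95% CI
    return sd, sem, ci_half_width


# Hypothetical measurements, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem, ci = error_bars(sample)
print(f"s.d. = {sd:.3f}, s.e.m. = {sem:.3f}, 95% CI half-width = {ci:.3f}")
```

The three bars differ in width for the same data, so a figure should state which one is drawn.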

- Nature Methods | News
### Points of significance: Significance, P values and t-tests

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **10**, 1041-1042 (2013)

doi: 10.1038/nmeth.2698

Repeated independent observations are used to estimate the s.d. of the null distribution and derive a more robust P value.

#### Article Citation

*Nature Methods* **10**, 1139-1140 (2013)

doi: 10.1038/nmeth.2738

When unlikely hypotheses are tested, most positive results of underpowered studies can be wrong.
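
The arithmetic behind this statement can be sketched in a few lines of Python. The function name `false_positive_fraction` and the numbers used (10% of tested hypotheses true, power 0.2 or 0.8, significance level 0.05) are illustrative assumptions, not values from the article.

```python
def false_positive_fraction(prior_true, power, alpha):
    """Fraction of significant results that are false positives.

    prior_true: assumed fraction of tested hypotheses with a real effect
    power:      probability of detecting a real effect
    alpha:      significance level (type I error rate)
    """
    true_positives = power * prior_true
    false_positives = alpha * (1.0 - prior_true)
    return false_positives / (true_positives + false_positives)


# Hypothetical scenario: only 10% of tested hypotheses are true.
underpowered = false_positive_fraction(prior_true=0.1, power=0.2, alpha=0.05)
well_powered = false_positive_fraction(prior_true=0.1, power=0.8, alpha=0.05)
print(f"power 0.2: {underpowered:.2f} of positives are false")
print(f"power 0.8: {well_powered:.2f} of positives are false")
```

Under these assumed inputs, more than half of the positive results are false at power 0.2, and raising the power to 0.8 lowers that fraction substantially.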

- Nature Methods | News
### Points of Significance: Visualizing samples with box plots

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 119-120 (2014)

doi: 10.1038/nmeth.2813

Box plots reflect sample variability and should be avoided for very small samples (n < 5), with notches shown only when they appear within the IQR.

Quartiles are more intuitive than the mean and s.d. for samples from skewed distributions.

- Nature Methods | News
### Points of significance: Comparing samples—part I

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 215-216 (2014)

doi: 10.1038/nmeth.2858

The uncertainty in a sum or difference of random variables is the sum of the variables' individual uncertainties, as measured by the variance.

In the two-sample test, both samples contribute to the uncertainty in the difference of means.
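
As a minimal sketch of how both samples contribute to the uncertainty in a difference of means, the Python below adds the variances of the two sample means and takes the square root. It assumes the two samples are independent; the function name `se_of_difference` and the data are hypothetical.

```python
import math
import statistics


def se_of_difference(x, y):
    """Standard error of mean(x) - mean(y) for two independent samples."""
    var_mean_x = statistics.variance(x) / len(x)  # variance of the mean of x
    var_mean_y = statistics.variance(y) / len(y)  # variance of the mean of y
    # Variances add for independent quantities, for sums and differences alike.
    return math.sqrt(var_mean_x + var_mean_y)


# Hypothetical samples, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(f"s.e. of difference in means = {se_of_difference(x, y):.3f}")
```

Note that it is the variances that add, not the standard deviations, which is why the result is smaller than the sum of the two individual standard errors.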

- Nature Methods | News
### Points of significance: Comparing samples—part II

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 355-356 (2014)

doi: 10.1038/nmeth.2900

Family-wise error rate (FWER) methods such as Bonferroni's negatively affect statistical power in comparisons across many tests.

The shape of the distribution of unadjusted P values can be used to infer the fraction of hypotheses that are null and the false discovery rate (FDR).
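
The contrast between FWER and FDR control can be illustrated with a short Python sketch. It compares Bonferroni's correction with the Benjamini-Hochberg step-up procedure, a widely used FDR method chosen here as an assumption (the listing does not say which FDR method the article uses). The P values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k.
    rejected = [False] * m
    for i in order[:largest_k]:
        rejected[i] = True
    return rejected


# Hypothetical P values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_values)))
```

On this example the FDR procedure rejects more hypotheses than Bonferroni's, which reflects the power cost of controlling the FWER across many tests.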

#### Article Citation

*Nature Methods* **11**, 467-468 (2014)

doi: 10.1038/nmeth.2937

A sample can be easily tested against a reference value using the sign test without any assumptions about the population distribution.

The Wilcoxon rank-sum test can outperform the t-test in the presence of discrete sampling or skew.
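
A minimal sketch of the sign test mentioned above, written in Python: it counts how many observations fall above and below the reference value and computes an exact two-sided binomial P value. The function name `sign_test`, the convention of dropping ties and the data are assumptions for illustration.

```python
from math import comb


def sign_test(sample, reference):
    """Exact two-sided sign test of the sample median against a reference."""
    above = sum(1 for v in sample if v > reference)
    below = sum(1 for v in sample if v < reference)
    n = above + below  # observations equal to the reference are dropped
    if n == 0:
        return 1.0
    k = min(above, below)
    # Under H0, each sign is + or - with probability 1/2 (binomial tail).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical measurements compared with a reference value of 5.0.
sample = [5.1, 5.4, 4.9, 5.6, 5.8, 5.3, 5.7, 5.2]
print(f"P = {sign_test(sample, 5.0):.4f}")
```

Only the signs of the differences are used, so the magnitudes of the observations do not affect the result.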

- Nature Methods | News
### Points of significance: Designing comparative experiments

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 597-598 (2014)

doi: 10.1038/nmeth.2974

Design and reporting of a single-factor experiment with three levels using a two-sample t-test.

Sources of variability, conceptualized as circles with measurements (x_{i}, y_{i}) from different aliquots (x,y) randomly sampled within them.

- Nature Methods | News
### Points of significance: Analysis of variance and blocking

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 699-700 (2014)

doi: 10.1038/nmeth.3005

ANOVA is used to determine significance using the ratio of variance estimates from sample means and sample values.

Blocking improves sensitivity by isolating variation in samples that is independent from treatment effects.
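
The ratio of variance estimates used by ANOVA can be sketched as follows for the one-way case without blocking. The function name `one_way_anova_f` and the data are hypothetical; the sketch returns only the F statistic and leaves the P value (which requires the F distribution) out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: between-group variance estimate / within-group estimate."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical measurements for three treatment levels.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(f"F = {one_way_anova_f(groups):.2f}")
```

A large F indicates that the group means vary more than would be expected from the spread of values within groups.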

- Nature Methods | News
### Points of Significance: Replication

Paul Blainey, Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **11**, 879-880 (2014)

doi: 10.1038/nmeth.3091

Replicates do not contribute equally and independently to the measured variability, which can often underestimate the total variability in the system.

The number of replicates affects FDR and power of inferences on the difference in variances and means.

- Nature Methods | News
### Points of Significance: Nested designs

Martin Krzywinski, Naomi Altman, Paul Blainey

#### Article Citation

*Nature Methods* **11**, 977-978 (2014)

doi: 10.1038/nmeth.3137

Inferences about fixed factors are different than those about random factors, as shown by box-plots of n = 10 samples across three independent experiments.

#### Article Citation

*Nature Methods* **11**, 1187-1188 (2014)

doi: 10.1038/nmeth.3180

When studying multiple factors, main and interaction effects can be observed, shown here for two factors (A, blue; B, red) with two levels each.

In two-factor experiments, variance is partitioned between each factor and all combinations of interactions of the factors.

#### Article Citation

*Nature Methods* **12**, 5-6 (2014)

doi: 10.1038/nmeth.3224

Internal and external validity relate respectively to how precise and representative the results are of the population of interest.

In the presence of variability, the precision in sample mean can be improved by increasing the sample size, or the number of replicates in a nested design.

#### Article Citation

*Nature Methods* **12**, 165-166 (2015)

doi: 10.1038/nmeth.3293

In biological experiments using split plot designs, whole plot experimental units can be individual animals or groups.

The split plot design with CRD is commonly applied to a repeated measures time course design.

- Nature Methods | News
### Points of Significance: Bayes' theorem

Jorge López Puga, Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **12**, 277-278 (2015)

doi: 10.1038/nmeth.3335

Marginal, joint and conditional probabilities for independent and dependent events.

Graphical interpretation of Bayes' theorem and its application to iterative estimation of probabilities.
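
The iterative use of Bayes' theorem can be sketched in Python by feeding each posterior back in as the next prior. The function name `bayes_update` and all probabilities below are hypothetical, and the sketch assumes the successive observations are conditionally independent given the hypothesis.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | evidence) from Bayes' theorem."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical values: prior 0.01, P(E | H) = 0.9, P(E | not H) = 0.05.
posterior = 0.01
for observation in range(1, 3):
    # The posterior after one observation becomes the prior for the next.
    posterior = bayes_update(posterior, 0.9, 0.05)
    print(f"after observation {observation}: P(H | evidence) = {posterior:.3f}")
```

With these assumed inputs, a single positive observation leaves the hypothesis improbable, while a second one makes it more likely than not.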

- Nature Methods | News
### Points of significance: Bayesian statistics

Jorge López Puga, Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **12**, 377-378 (2015)

doi: 10.1038/nmeth.3368

Prior probability distributions represent knowledge about the coin before it is tossed.

- Nature Methods | News
### Points of Significance: Sampling distributions and the bootstrap

Anthony Kulesa, Martin Krzywinski, Paul Blainey, Naomi Altman

#### Article Citation

*Nature Methods* **12**, 477-478 (2015)

doi: 10.1038/nmeth.3414

Sampling distributions of estimators can be used to predict the precision and accuracy of estimates of population characteristics.

The Luria-Delbrück experiment studied the mechanism by which bacteria acquired mutations that conferred resistance to a virus.

The sampling distribution of complex quantities such as the variance-to-mean ratio (VMR) can be generated from observed data using the bootstrap.
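
A minimal sketch of the bootstrap for the VMR, in Python: the observed counts are resampled with replacement and the statistic is recomputed on each resample. The counts below are invented and are not the Luria-Delbrück data; the function names, the number of resamples and the percentile interval are likewise assumptions for illustration.

```python
import random
import statistics


def vmr(counts):
    """Variance-to-mean ratio of a sample of counts."""
    return statistics.variance(counts) / statistics.mean(counts)


def bootstrap_vmr(counts, n_resamples=2000, seed=0):
    """Bootstrap estimates of the VMR from resamples of the observed counts."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(counts)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(counts, k=n)  # sample n values with replacement
        if sum(resample) > 0:  # the VMR is undefined when the mean is zero
            estimates.append(vmr(resample))
    return estimates


# Hypothetical colony counts, for illustration only.
counts = [0, 1, 0, 3, 0, 0, 107, 0, 1, 5, 0, 2, 0, 0, 64, 1]
estimates = sorted(bootstrap_vmr(counts))
lower = estimates[int(0.025 * len(estimates))]
upper = estimates[int(0.975 * len(estimates)) - 1]
print(f"observed VMR = {vmr(counts):.1f}")
print(f"approximate 95% percentile interval: {lower:.1f} to {upper:.1f}")
```

The spread of the bootstrap estimates approximates the sampling distribution of the VMR without requiring a formula for its standard error.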

- Nature Methods | News
### Points of Significance: Bayesian networks

Jorge López Puga, Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **12**, 799-800 (2015)

doi: 10.1038/nmeth.3550

Observing the state of a node can change the estimate of the states of other nodes.

Observation about child nodes creates conditional dependencies and independencies between its parent nodes on different paths.

- Nature Methods | News
### Points of Significance: Association, correlation and causation

Naomi Altman, Martin Krzywinski

#### Article Citation

*Nature Methods* **12**, 899-900 (2015)

doi: 10.1038/nmeth.3587

Correlation is a type of association and measures increasing or decreasing trends quantified using correlation coefficients.

Correlation coefficients fluctuate in random data, and spurious correlations can arise.

- Nature Methods | News
### Points of Significance: Simple linear regression

Naomi Altman, Martin Krzywinski

#### Article Citation

*Nature Methods* **12**, 999-1000 (2015)

doi: 10.1038/nmeth.3627

A variable Y has a regression on variable X if the mean of Y (black line) E(Y|X) varies with X.

In a linear regression relationship, the response variable has a distribution for each value of the independent variable.

Regression models associate error to response which tends to pull predictions closer to the mean of the data (regression to the mean).

- Nature Methods | News
### Points of Significance: Multiple linear regression

Martin Krzywinski, Naomi Altman

#### Article Citation

*Nature Methods* **12**, 1103-1104 (2015)

doi: 10.1038/nmeth.3665

The results of multiple linear regression depend on the correlation of the predictors, as measured here by the Pearson correlation coefficient r (ref. 2).

Results and interpretation of multiple regression change with the sample correlation of the predictors.

- Nature Methods | News
### Points of Significance: Analyzing outliers: influential or nuisance?

Naomi Altman, Martin Krzywinski

#### Article Citation

*Nature Methods* **13**, 281-282 (2016)

doi: 10.1038/nmeth.3812

Observations near the mean have less influence on the regression estimates and fitted values.

The leverage, residual and Cook's distance of an observation are used to assess the robustness of the fit.

A plot of residuals as a function of leverage identifies influential observations that are not modeled well by the regression.
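
The three diagnostics named above can be sketched in plain Python for simple linear regression with one predictor. The function name `regression_diagnostics` and the data are hypothetical. The formulas are the standard ones for an ordinary least-squares fit with an intercept (p = 2 parameters), and the sketch assumes the fit is not perfect, so the mean squared error is nonzero.

```python
def regression_diagnostics(x, y):
    """Leverage, residual and Cook's distance for each observation."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage grows with the distance of x from the mean of x.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Cook's distance combines the residual size with the leverage.
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, residuals, cooks


# Hypothetical data: the last point lies far from the other x values.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 6.0]
leverage, residuals, cooks = regression_diagnostics(x, y)
for h, e, d in zip(leverage, residuals, cooks):
    print(f"leverage = {h:.2f}, residual = {e:+.3f}, Cook's distance = {d:.2f}")
```

In this example the point far from the mean of x has the highest leverage and the largest Cook's distance even though its residual is not the largest.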