Points of View: Binning high-resolution data

Journal name: Nature Methods
Volume: 13
Page: 463
DOI: 10.1038/nmeth.3873

Print resolution and visual acuity limit how densely data can be drawn and how much detail can be shown.
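Because these limits are stated in typographic points throughout the figure captions, it can help to translate them into physical and device units. The following is a minimal Python sketch, not the author's method: `pt_to_mm`, `pt_to_dots`, `marker_ok` and `bins_that_fit` are hypothetical helper names, and the 225-pt track width used in the note below is inferred from Figure 2 (900 bins of 1/4 pt each), not stated in the source.

```python
# Sketch: convert typographic points to physical size and device dots, and
# apply the Figure 1c rule of thumb. Helper names are hypothetical.
POINTS_PER_INCH = 72.0  # 1 pt = 1/72 inch (PostScript / desktop-publishing point)
MM_PER_INCH = 25.4


def pt_to_mm(pt):
    """Physical length in millimetres of a length given in points."""
    return pt * MM_PER_INCH / POINTS_PER_INCH


def pt_to_dots(pt, dpi):
    """Number of device dots (pixels) a length in points spans at `dpi`."""
    return pt * dpi / POINTS_PER_INCH


def marker_ok(marker_pt, line_pt):
    """Figure 1c: a data point should be at least 3x the width of its line."""
    return marker_pt >= 3 * line_pt


def bins_that_fit(track_width_pt, bin_width_pt):
    """Number of whole bins of a given width that fit across a track."""
    return int(track_width_pt // bin_width_pt)
```

For example, `pt_to_mm(0.5)` is about 0.18 mm and `pt_to_dots(0.5, 300)` is about 2.08, so at 300 dpi a 1/2-pt gap spans only about two printer dots. Under the inferred 225-pt track, `bins_that_fit(225, 0.25)` gives 900 and `bins_that_fit(225, 2)` gives 112, matching the smallest and largest bin counts in Figure 2.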


Figures

    Figure 1: Visual-acuity limits impose a minimum size on elements.

    (a) Lines thinner than 1/2 pt cannot be comfortably resolved if less than 1/2 pt apart. (b) Differences in tone, length and color are difficult to judge for elements smaller than 1 pt. (c) Data points should be at least three times the width of their line. White circles are shown with a 1/4-pt outline.

    Figure 2: Each set of tracks shows the same simulated coverage from a sixfold sequencing process; the sets differ only in how the data are binned (1, 2, 3, 4, 6 and 8 values into 900, 450, 300, 225, 150 and 112 bins, respectively).

    Bin sizes range from 1/4 to 2 pt. The average coverage in each bin is shown as a histogram (a) and as a heat map sampled from the nine-color gray sequential Brewer palette (b). Coverage relative to the average coverage is shown as a heat map with a red–blue diverging Brewer palette (c). Bins with values at least as extreme as the 5th, 10th, 90th or 95th percentile (P5, P10, P90 and P95, respectively) of the full data set are marked in shades of red and blue according to the key at the bottom (d).

    Figure 3: Aggregate statistics about central tendency, extrema and variation can be quantitatively encoded using multiple overlapping traces, shown here for the data in Figure 2 using 1-, 2- and 3-pt bins.

    (a,b) Bin minimum (red), average (black) and maximum (blue). (c) As in a and b, except that the minimum and maximum bin values are drawn extending from the average, which is shown as a 3/8-pt line. (d) As in c, but with extreme values that fall within the 10th–90th percentiles of the global data shown in gray. (e) Bin s.d. (gray) with individual values (1-pt circles); values in the bottom or top 5th percentile are black. (f) As in b, but with all individual values shown: those within the 10th–90th percentiles of the global data in gray, and those in the top 10th percentile colored as in d. (g) Individual values, colored by z-score with a nine-color red–blue diverging Brewer palette.
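The binning and encodings described in the Figure 2 and Figure 3 captions can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's implementation: all function names are hypothetical; an incomplete trailing bin is dropped (which reproduces the 112 bins for 8 values per bin); percentiles use the defaults of the standard library's `statistics.quantiles`; and the mapping of scores to the nine palette classes (one class per `scale` units of z-score, or of log2 coverage ratio for Figure 2c) is our choice rather than one given in the source. Palette classes are returned as indices 0 to 8 rather than actual Brewer colors.

```python
import math
import statistics

# Assumption: each raw value is drawn 1/4 pt wide, so a bin of k values is
# k/4 pt wide (1 value -> 1/4 pt, 8 values -> 2 pt, as in Figure 2).
VALUE_WIDTH_PT = 0.25


def bins(values, k):
    """Consecutive runs of k values; an incomplete trailing run is dropped."""
    return [values[i * k:(i + 1) * k] for i in range(len(values) // k)]


def bin_summary(values, k):
    """Per-bin (min, mean, max, sample s.d.) traces, as in Figure 3a-e."""
    out = []
    for b in bins(values, k):
        sd = statistics.stdev(b) if len(b) > 1 else 0.0
        out.append((min(b), statistics.fmean(b), max(b), sd))
    return out


def global_cuts(values):
    """P5, P10, P90 and P95 of the full (unbinned) data set."""
    q = statistics.quantiles(values, n=20)  # 19 cut points: P5, P10, ..., P95
    return q[0], q[1], q[17], q[18]


def percentile_flag(x, cuts):
    """Most extreme global percentile that x reaches, or None (Figure 2d)."""
    p5, p10, p90, p95 = cuts
    if x <= p5:
        return "P5"
    if x <= p10:
        return "P10"
    if x >= p95:
        return "P95"
    if x >= p90:
        return "P90"
    return None


def sequential_class(x, lo, hi, n_colors=9):
    """Index of x in an n-class sequential palette over [lo, hi] (0 = lightest)."""
    if hi == lo:
        return 0
    frac = (x - lo) / (hi - lo)
    return min(n_colors - 1, max(0, int(frac * n_colors)))


def log2_ratio(x, reference):
    """Signed score for Figure 2c: log2 of x relative to the average (assumed scale)."""
    return math.log2(x / reference) if x > 0 else -math.inf


def diverging_class(score, n_colors=9, scale=1.0):
    """Index in an n-class diverging palette; the middle class is score 0.

    Assumption: one class per `scale` units of score, saturating at both ends.
    """
    half = n_colors // 2
    s = max(-half, min(half, score / scale))
    return half + round(s)


def z_scores(values):
    """z-scores of individual values against the global mean and population s.d."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```

With 900 values, `len(bins(values, k))` for k = 1, 2, 3, 4, 6 and 8 gives 900, 450, 300, 225, 150 and 112, the bin counts in Figure 2. Values for which `percentile_flag` returns `None` lie between the global P10 and P90 and correspond to the gray elements in Figure 3d,f. Because the cut points come from the full data set, as the captions specify, the same thresholds apply at every bin size.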


Author information

Affiliations

  1. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Competing financial interests

The author declares no competing financial interests.
