Limitations in print resolution and visual acuity impose limits on data density and detail.
The sizes of features in genomic data sets span many orders of magnitude, and it is a challenge to draw elements in a figure small enough to preserve detail but large enough to be visible. A previous column¹ identified strategies for presenting genomic data in context²,³. This month we look at methods to bin high-density information and provide guidelines for the minimum size of elements in a figure.
Visual acuity imposes stricter limits than output resolution. A common unit of length in print is the point (pt; 1 pt = 1/72 inch). The resolving power of the eye is about 1/4 pt at a distance of 30 cm, and many journals impose a 1/4-pt or 1/2-pt minimum line width for figures. Although it is possible to discern 1/4-pt lines that are 1/4 pt apart (Fig. 1a), such fine detail can overwhelm the eye. We suggest lines at least 1/2 pt in width that are no closer together than 3/4 pt (Fig. 1b).
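To make the comparison between acuity and output resolution concrete, the sketch below converts a length in points to device dots and to millimeters. Only the definition 1 pt = 1/72 inch comes from the text; the function names and the example resolutions (96, 300 and 1,200 dots per inch) are assumptions chosen for illustration.

```python
POINTS_PER_INCH = 72   # 1 pt = 1/72 inch, as defined in the text
MM_PER_INCH = 25.4


def pt_to_dots(size_pt, dpi):
    """Convert a length in points to device dots at the given resolution."""
    return size_pt / POINTS_PER_INCH * dpi


def pt_to_mm(size_pt):
    """Convert a length in points to millimeters."""
    return size_pt / POINTS_PER_INCH * MM_PER_INCH


if __name__ == "__main__":
    # Example resolutions are assumptions, not values from the column.
    for dpi in (96, 300, 1200):
        print(f"1/4 pt at {dpi} dpi = {pt_to_dots(0.25, dpi):.2f} dots")
    print(f"1/4 pt = {pt_to_mm(0.25):.3f} mm")
```

At the assumed 1,200 dpi a 1/4-pt line spans about four device dots, so under that assumption the printer can render detail at the limit of what the eye resolves at 30 cm.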
A size of at least 1 pt is needed to resolve the color of small elements, and to comfortably assess differences in adjacent heights (Figs. 1 and 2). When 1/2-pt line widths are used for axes and grids, a 1-pt line thickness for data traces is suggested, and symbols in line plots should be no smaller than 3 pt (Fig. 1). In any context, data traces should use symbols no finer than 1.5 pt on a 1/2-pt line. For scatter plots of high density, when large points can occlude each other, or if outliers are shown in a distinct visual channel, data points can be as small as 1 pt.
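One way to apply these suggestions is to collect them in a lookup table and check a figure's element sizes against it. The sketch below is only an illustration: the element names and the `undersized` helper are hypothetical, and the values are transcribed from the suggestions above for a figure that uses 1/2-pt axes and grid lines.

```python
# Suggested minimum sizes in points, transcribed from the text.
# The key names are hypothetical labels chosen for this sketch.
MIN_SIZE_PT = {
    "axis_line_width": 0.5,      # axes and grid lines
    "line_gap": 0.75,            # spacing between adjacent lines
    "data_trace_width": 1.0,     # data traces when axes are 1/2 pt
    "line_plot_symbol": 3.0,     # symbols in line plots
    "dense_scatter_point": 1.0,  # points in high-density scatter plots
}


def undersized(figure_sizes_pt):
    """Return the sorted names of elements below the suggested minimum."""
    return sorted(
        name
        for name, size in figure_sizes_pt.items()
        if name in MIN_SIZE_PT and size < MIN_SIZE_PT[name]
    )


if __name__ == "__main__":
    draft = {
        "axis_line_width": 0.25,
        "data_trace_width": 0.5,
        "line_plot_symbol": 3.0,
    }
    print(undersized(draft))
```

Elements that the table does not list are ignored rather than flagged, so the check stays silent about anything the guidelines do not cover.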
These requirements inform the extent of binning required for dense data tracks. Figure 2 demonstrates the visibility of binned data for bins of 1/4 to 2 pt. Finding local maxima is relatively easy even with 1/4-pt bins, but judging the average, assessing variability and discerning minima are difficult with bins smaller than 1 pt. Histograms are preferred over heat maps except where space is an issue: heat maps can be more compact and effective for sparse data (track d, Fig. 2). We suggest binning data into no more than ∼250 intervals for one-column figures (3.5 inches wide) or ∼500 intervals for two-column figures (7.2 inches wide). Each bin is then roughly 1 pt wide in print, 4 pixels on a high-resolution screen or 2 pixels on a typical LCD projector. This limit on bin size reduces detail and smooths out variation; for example, a full-page figure of human chromosome 1 requires bins of 500 kb (∼50 times the average gene size). One can mitigate this by encoding central tendency (median, average), extrema (minimum, maximum) and spread (s.d., interquartile range) (Fig. 3), or by highlighting global extrema or outliers (track d in Fig. 2 and tracks e–g in Fig. 3).
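The arithmetic behind these bin counts, and the per-bin summaries suggested as mitigation, can be sketched as follows. This is a minimal illustration rather than the method used to draw the figures: the function names are hypothetical, bins are taken as contiguous runs of nearly equal length, the spread is reported as a population standard deviation, and the chromosome 1 length of about 249 Mb is an assumed round figure.

```python
import math
import statistics

POINTS_PER_INCH = 72  # 1 pt = 1/72 inch


def max_bins(figure_width_in, min_bin_pt=1.0):
    """Largest number of bins that keeps each bin at least min_bin_pt wide."""
    return math.floor(figure_width_in * POINTS_PER_INCH / min_bin_pt)


def bases_per_bin(sequence_length, n_bins):
    """Bin size in bases (rounded up) needed to cover a sequence with n_bins."""
    return -(-sequence_length // n_bins)  # ceiling division on integers


def bin_summary(values, n_bins):
    """Split values into n_bins contiguous bins and summarize each one."""
    n = len(values)
    if n_bins < 1 or n_bins > n:
        raise ValueError("n_bins must be between 1 and len(values)")
    summaries = []
    for i in range(n_bins):
        chunk = values[i * n // n_bins:(i + 1) * n // n_bins]
        summaries.append({
            "min": min(chunk),                     # extrema
            "max": max(chunk),
            "median": statistics.median(chunk),    # central tendency
            "mean": statistics.mean(chunk),
            "sd": statistics.pstdev(chunk),        # spread (population s.d.)
        })
    return summaries


if __name__ == "__main__":
    print(max_bins(3.5), max_bins(7.2))        # one- and two-column widths
    print(bases_per_bin(249_000_000, 500))     # assumed chromosome 1 length
    print(bin_summary([1, 2, 3, 4, 5, 6], 2))
```

With a 1-pt minimum, the two column widths give 252 and 518 bins, in line with the suggested limits of ∼250 and ∼500, and 500 bins across the assumed 249 Mb give bins of about 498 kb. Keeping the minimum and maximum alongside the median means that extreme values survive the binning even though the detail within each bin is lost.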
1. Nielsen, C. & Wong, B. Nat. Methods 9, 423 (2012).
2. Anders, S. Bioinformatics 25, 1231–1235 (2009).
3. Nielsen, C. et al. Genome Res. 22, 2262–2269 (2012).
The author declares no competing financial interests.
Krzywinski, M. Binning high-resolution data. Nat Methods 13, 463 (2016). https://doi.org/10.1038/nmeth.3873