Disentangling rock record bias and common-cause from redundancy in the British fossil record

The fossil record documents the history of life, but the reliability of that record has often been questioned. Spatiotemporal variability in sedimentary rock volume, sampling and research effort especially frustrates global-scale diversity reconstructions. Various proposals have been made to rectify palaeodiversity estimates using proxy measures for the availability and sampling of the rock record, but the validity of these approaches remains controversial. Targeting the rich fossil record of Great Britain as a highly detailed regional exemplar, our statistical analysis shows that marine outcrop area contains a signal useful for predicting changes in diversity, collections and formations, whereas terrestrial outcrop area contains a signal useful for predicting formations. In contrast, collection and formation counts are information redundant with fossil richness, characterized by symmetric, bidirectional information flow. If this is true, the widespread use of collection and formation counts as sampling proxies to correct the raw palaeodiversity data may be unwarranted.


Supplementary Methods
The IT approach is described in more detail in Schreiber 5 , Verdes 6 , and Hannisdal 7,8 . Here we illustrate the IT analysis and its interpretation with a commonly used example from human physiology ( Supplementary Fig. 1). Normally, breathing causes variations in the heart rate: when we inhale, the heart rate begins to increase, and when we exhale, it decreases. In addition, the heart rate responds to the partial pressure of oxygen in the arteries. However, the data in Supplementary   Fig. 1a come from a patient suffering from sleep apnea 1 , where breathing is halted during sleep, causing blood oxygen levels to fall, which eventually alerts the brain to resume breathing. Sleep apnea may thus disturb the usual patterns of interaction and feedback among the heart rate (H), respiration (R), and blood oxygen concentration (O). Intuitively, the anatomically obstructed breathing would be considered an important driver of the system in this case, but one that also responds to the oxygen "alarm", and the relationship between R and O will be highlighted here.
All three time series are sub-sampled at randomly spaced time steps (e.g. Supplementary   Fig. 1b), in analogy to sparse geological records with only relative age control. In line with common practice, we report simple correlations between time series in the form of Spearman rank-order correlation after aggressive detrending by first differencing. The IT, on the other hand, involves binning the observed amplitudes into histograms of the distribution of possible state transitions, and does not use differencing. Certain precautions therefore have to be made when testing for significance, including data pre-processing (see below), and the use of amplitude-adjusted Fourier transform (AAFT) surrogate data (e.g. Supplementary Fig. 1c). AAFT surrogates are designed to preserve both the frequencies (i.e. autocorrelations and power spectra) and amplitudes (i.e.
correlations and/or noise) of the original data, but to break any causal coupling by randomizing the phases of the frequency components 9 Supplementary Fig. 2c, f)? Note that the IT varies as a function of the bin size used for gridding the data, but for the significance test, we integrate across bin sizes, using the area under the curve as an informal measure of the total IT 6 . In this example, we find that IT is significant in both directions, but R→O is significantly greater than O→R (denoted R>O), representing a significantly asymmetric, yet bidirectional information flow. This result agrees with the intuitive expectation that the obstructed breathing pattern drives the oxygen concentration, but that the breathing intermittently responds to low oxygen levels via feedback mechanisms. However, the results described above (Supplementary Fig. 2) represent a single, random sampling of the system, and we would like to know how robust this result is to the sampling. A  Fig. 3). Note that because the values are frequencies, the third significance test is plotted in both directions (R>O and O>R, although these are mutually exclusive in a single analysis. As sampling approaches 100 time steps, IT between R and O becomes invariably significant, in both directions ( Supplementary Fig. 3), suggesting a two-way interaction. However, significant R>O is detected with increasing frequency, indicating a dominant directionality of information flow, consistent with the interpretation given above. Spearman rank-order correlation on first differences is rarely significant (Supplementary Fig. 3 Fig. 4).
Because of the non-symmetric nature of the IT, the pairwise IT results may be required to interpret the CIT: If there is some asymmetry (not necessarily significant) favouring Y→X over X→Y, then, even if pairwise Z→X and Z→Y are equal, Y←Z | X will be greater than X←Z | Y (i.e. Y will be a stronger conditioning variable than X). We refer to this effect when describing CIT results in the main paper (see Results section).
This example is instructive for several reasons: (i) cardiorespiratory physiology is a complex system near to one's heart, with multiple interacting components, non-linearity, and feedbacks; (ii) IT has the potential to quantify relative strength and directionality of coupling without recourse to mechanistic model assumptions; (iii) standard correlations are of little use or potentially misleading in this case; (iv) IT requires only relative age control, as is often the case with stratigraphic data; (v) IT can detect non-trivial interactions with relatively sparse data.
The use of AAFT surrogates helps avoid false positive results that may stem from frequency bias in the IT 7,10 . However, when analyzing a pair of time series, the IT may still be sensitive to large differences in the degree of non-stationary in the two time series. To evaluate whether or not differences in non-stationarity could be a contributing factor to the directional asymmetry, we used the KPSS test 11 to calculate a test score (1 if the null hypothesis of stationarity is rejected, 0 if not) over all possible lags K. A bias index was defined as the sum of the absolute values of the difference between the two KPSS test score vectors, divided by K (a similar index was proposed in Hannisdal et al. 12 ). Thus, a maximum bias value of 1 means that one time series is stationary at all lags, whereas the other time series is non-stationary at all lags, representing a maximum possible bias. Conversely, a bias value of 0 means that both series are either fully stationary, or nonstationary at the exact same lags, representing a minimum possible bias.
To reduce non-stationarity, all time series are pre-processed by detrending (subtracting a best-fit polynomial or spline), and power transformation (log, or Box-Cox). The data are also normalized (mean = 0, standard deviation = 1), which enables quantitative comparison of IT in both directions, and allows the data gridding bin size to be conveniently expressed in units of standard deviation ( Supplementary Fig. 2a-c). The effectiveness of data pre-processing can be illustrated with random walks (red noise, or AR(1) processes), which are typically highly non-stationary ( Supplementary Fig. 5). Even after pre-processing, some differences remain (e.g. red stippled line in Supplementary Fig. 5b), and these are typically present also in the surrogates, but a bias value of ~0.1 or lower has not been found to induce spurious asymmetry in the IT.