Marine viruses are critical drivers of ocean biogeochemistry, and their abundances vary spatiotemporally in the global oceans, with upper estimates exceeding 108 per ml. Over many years, a consensus has emerged that virus abundances are typically tenfold higher than microbial cell abundances. However, the true explanatory power of a linear relationship and its robustness across diverse ocean environments is unclear. Here, we compile 5,671 microbial cell and virus abundance estimates from 25 distinct marine surveys and find substantial variation in the virus-to-microbial cell ratio, in which a 10:1 model has either limited or no explanatory power. Instead, virus abundances are better described as nonlinear, power-law functions of microbial cell abundances. The fitted scaling exponents are typically less than 1, implying that the virus-to-microbial cell ratio decreases with microbial cell density, rather than remaining fixed. The observed scaling also implies that viral effect sizes derived from ‘representative’ abundances require substantial refinement to be extrapolated to regional or global scales.
Viruses of microbes have been linked to central processes across the global oceans, including biogeochemical cycling1,
Virus ecology underwent a transformation in the late 1980s with the recognition that virus abundances, as estimated using culture-independent methods, were orders of magnitude higher than estimates via culture-based methods11. Soon thereafter, researchers began to report the ‘virus to bacterium ratio’ (VBR) as a statistical proxy for the strength of the relationship between viruses and their potential hosts in both freshwater and marine systems12. This ratio is more appropriately termed the ‘virus-to-microbial cell ratio’ (VMR), a convention that we use here (Supplementary Section 1).
Observations accumulated over the past 25 years have noted a wide variation in the VMR, yet there is a consensus that a suitable first approximation is that the VMR is equal to 10 (Supplementary Table 1). This ratio also reflects a consensus that typical microbial abundances are approximately 106 per ml and typical virus abundances are approximately 107 per ml13,14. Yet, the use of a fixed ratio carries with it another assumption—that of linearity—that is, if microbial abundance were to double, then viruses are expected to double as well. An alternative is that the relationship between virus and microbial abundances is better described in terms of a nonlinear relationship, for example, a power law.
In practice, efforts to predict the regional or global-scale effects of viruses on marine microbial mortality, turnover and even biogeochemical cycles depend critically on the predictability of the relative density of viruses and microbial cells. The expected community-scale contact rate, as inferred from the product of virus and microbial abundances, is a key factor for inferring virus-induced cell lysis rates at a site or sites (see, for example, ref. 15), which also depend on diversity16, latent infections17 and virus–microbe infection networks18. Here, we directly query the nature of the relationship between virus and microbial densities via a large-scale compilation and re-analysis of abundance data across marine environments.
VMR exhibits substantial variation in the global oceans
In the compiled marine survey data (Fig. 1, Table 1 and the ‘Materials and methods’), 95% of microbial abundances range from 5.0 × 103 to 4.1 × 106 per ml, and 95% of virus abundances range from roughly 3.7 × 105 to 6.4 × 107 per ml (Fig. 2a). Both microbial and virus concentrations generally decrease with depth, as reported previously (for example, see ref. 19). We separated the samples according to depth using an operational definition of the near-surface and sub-surface, corresponding to samples taken at depths ≤100 m and >100 m, respectively. The cutoff of 100 m was chosen as a typical depth scale for the euphotic zone in systems with low to moderate chlorophyll20. The precise depth varies spatiotemporally. Our intent was to distinguish zones strongly shaped by active planktonic foodweb dynamics in well-lit waters (the ‘near-surface’) from dark mesopelagic waters shaped primarily by decaying particle fluxes with greater depth (the ‘sub-surface’). The median VMR for the near-surface samples (≤100 m) is 10.5, and the median VMR for the sub-surface samples (>100 m) is 16.0. In that sense, the consensus 10:1 ratio does accurately represent the median VMR for the surface data. We also observe substantial variation in the VMR, as has been noted in earlier surveys and reviews (Supplementary Table 1). Figure 2b shows that 95% of the variation in VMR in the near-surface ocean lies between 1.4 and 160, and between 3.9 and 74 in the sub-surface ocean. For the near-surface ocean, 44% of the VMR values are between 5 and 15, 16% are less than 5 and 40% exceed 15. This wide distribution, both for the near-surface and the sub-surface, demonstrates the potential limitations in using the 10:1 VMR, or any fixed ratio, as the basis for a predictive model of virus abundance derived from estimates of microbial abundance.
Virus abundance does not vary linearly with microbial abundance
Figure 3 shows two alternative predictive models of the relationship between logarithmically scaled virus and microbial abundances for water column samples. The models correspond to a fixed-ratio model and a power-law model. To clarify the interpretation of fitting in log–log space, consider a fixed-ratio model with a 12:1 ratio between virus and microbial abundance, V = 12 × B. Then, in log–log space the relationship is which we interpret as a line with a y intercept of log1012 = 1.08 and a slope (change in log10V for a one-unit change in log10B) of 1. By the same logic, any fixed-ratio model will result in a line with slope 1 in the log–log plot, and the y intercept will vary logarithmically with VMR. The alternative predictive model is that of a power law: V = cBα1. In log–log space, the relationship is The slope α1 of a fitted line on log-transformed data denotes the power-law exponent that best describes the relationship between the variables. The intercept α0 of a fitted line on log-transformed data denotes the logarithmically transformed pre-factor.
The 10:1 line has residual squared errors of −16% and −25% in the surface and deep samples, respectively (Supplementary Table 3). In both cases, this result means that a 10:1 line explains less of the variation in virus abundance compared with a model in which virus abundance is predicted by its mean value across the data. To evaluate the generality of this result, we considered an ensemble of fixed-ratio models, each with a different VMR. In the near-surface samples, we find that all fixed-ratio models explain less of the variation (that is, have negative values of R2) than a ‘model’ in which virus abundance is predicted to be the global mean in the data set (Supplementary Fig. 1). This reflects the failure of constant-ratio (that is, linear) models to capture the cluster of high VMRs at low microbial density apparent in the density contours of Fig. 2a and the shoulder of elevated high VMR frequency in Fig. 2b. The largest contributor to this cluster of points is the ARCTICSBI study (Table 1). In the sub-surface samples, fixed-ratio models in which the VMR varies between 12 and 22 do have positive explanatory power, but all perform worse than the power-law model (Supplementary Fig. 1). In contrast, the best-fitting power-law model explains 15 and 64% of the variation in the data for near- and sub-surface samples, respectively (Supplementary Table 3). The best-fit power-law scaling exponents are 0.42 (with 95% confidence intervals (CIs) of (0.39,0.46)) for near-surface samples and 0.53 (with 95% CIs of (0.52,0.55)) for sub-surface samples.
The difference between a linear and a power-law model can be understood, in part, by comparing predictions of viral abundances as a function of variation in microbial abundances. For example, doubling the microbial abundance along either regression line is not expected to lead to a doubling in virus abundance, but rather a 20.42 = 1.3- and 20.53 = 1.4-fold increase, respectively. The difference between models becomes more apparent with scale, for example, 10- and 100-fold increases in near-surface microbial abundances are predicted to be associated with 100.53 = 3.4- and 1000.53 = 11-fold increases in viral abundances, respectively, given a power-law model. The power-law model is an improvement over the fixed-ratio model in both the near- and sub-surface, even when accounting for the increase in parameters (Supplementary Table 3). In the near-surface, refitting the surface data without outliers improves the explanatory power to approximately R2 = 0.3 from an R2 = 0.65 for the sub-surface (Methods and Supplementary Fig. 2). The power-law exponents in the near- and sub-surface are qualitatively robust to variation in the choice of depth threshold, for example, as explored over the range between 50 and 150 m (Supplementary Fig. 3). In summary, the predictive value of a power-law model is much stronger in the sub-surface than in the near-surface, where confidence in the interpretation of power-law exponents is limited.
Study-to-study measurement variation is unlikely to explain the intrinsic variability of virus abundances in the surface ocean
Next, we explored the possibility that the variation in methodologies affected the baseline offset of virus abundance measurements and thereby decreased the explanatory power of predicting virus abundances based on microbial abundances. That is, if V* is the true and unknown abundance of viruses, then it is possible that two studies would estimate and , where |ε1| and |ε2| denote the relative magnitude of study-specific shifts. We constrain the relative variation in measurement, such that the measurement uncertainty is 50% or less (see ‘Materials and methods’). The constrained regression model improves the explanatory power of the model (Supplementary Table 3), but, in doing so, the model forces 18 of the 25 studies to the maximum level of measurement variation permitted (Supplementary Fig. 4). We do not expect differences in measurement protocols to explain the nearly two orders of magnitude variation in estimating virus abundance, given the same true virus abundance in a sample. Note that when sub-surface samples were analysed through the constrained power-law model, there was only a marginal increase of 2% in R2 and, moreover, 9 of the 12 studies were fit given the maximum level of measurement variation permitted (Supplementary Fig. 4). The constrained intercept model results suggest that the observed variation in virus abundance in the surface oceans is not well explained strictly by the variation in measurement protocol between studies.
VMR decreases with increasing microbial abundance, a hallmark of power-law relationships
We next evaluate an ensemble of power-law models, Vi = ciNαi, where index i denotes the use of distinct intercepts and power-law exponents for each survey. The interpretation of this model is that the nonlinear nature of the virus-to-microbial relationship may differ in distinct oceanic realms or due to underlying differences in sites or systems, rather than due to measurement differences. Figure 4 presents the results of fitting using the study-specific power-law model in the surface ocean samples. Study-specific power-law fits are significant in 18 of 25 cases in the surface ocean. The median power-law exponent for studies in the surface ocean is 0.50. Furthermore, of those significant power-law fits, the 95% distribution of the power-law exponent excludes a slope of one and is entirely less than one in 11 of 18 cases (Fig. 5). This model, in which the power-law exponent varies with study, is a significant improvement in terms of R2 (Supplementary Table 3). For sub-surface samples, study-specific power-law fits are significant in 10 of 12 cases in the sub-surface (Supplementary Fig. 5). The median power-law exponent for studies in the sub-surface is 0.67. Of those significant power-law fits, the central 95% distribution of the power-law exponent is less than one in 6 of 10 cases (Supplementary Fig. 6). A power-law exponent of less than one means that the virus abundance increases less than proportionately given increases in microbial abundance. This study-specific analysis extends the findings that nonlinear, rather than linear, models are more suited to describing the relationship between virus and microbial abundances. We find that the dominant trend in both near-surface and sub-surface samples is that the VMR decreases as microbial abundance increases. The increased explanatory power by study is stronger for near-surface than for sub-surface samples. This increase in R2 comes with a caveat: study-specific models do not enable a priori predictions of virus abundance given a new environment or sample site, without further effort to disentangle the biotic and abiotic factors underlying the different scaling relationships.
Viruses are increasingly considered in efforts to describe the factors controlling marine microbial mortality, productivity and biogeochemical cycles3,4,33,
It could also be the case that variation in endogenous factors determines total abundances. Endogenous factors can include the life history traits of viruses and microbes that determine which hosts are infected by which viruses18, as well as the quantitative rates of growth, defence and infection. For example, relative strain abundances are predicted to depend on niche differences according to the ‘kill-the-winner’ theory, which presupposes tradeoffs between growth and defence1,39. Similarly, the recent hypothesis of a complementary ‘king-of-the-mountain’ mechanism suggests that relative abundance relationships may depend on life history trait differences, even when tradeoffs are not strict40. In both examples, total abundances may nonetheless depend on other factors, including the strength of grazing.
The analysis of abundance relationships also requires a consideration of variation in time. As is well known, virus–microbe interactions can lead to intrinsic oscillatory dynamics. Indeed, previous observations of a declining relationship between VMR and microbial abundance have been attributed to changing ratios across phytoplankton bloom events, including possible virus-induced termination of blooms13. Similar arguments have been proposed in the analysis of tidal sediments41. Alternatively, observations of declining VMR with microbial density have been attributed to a variation in underlying diversity42. Another factor potentially complicating abundance predictions is that episodic events, including the induction of lysogenic populations, influence total microbial and viral counts. Varying degrees of lysogenic and co-infection relationships have been measured in marine virus–host systems14,17,43, the consequences of which may differ from those given interactions with lytic viruses, as is commonly the focus of model- and empirical-based studies. Whatever the mechanism(s), it is striking that virus abundances in some surveys can be strongly predicted via alternative power-law functions of microbial abundances. Mechanistic models are needed to further elucidate these emergent macroecological patterns and relationships, akin to recent efforts to explain emergent power laws between terrestrial predators and prey44.
The present analysis first separated the abundance data according to depth and then according to survey as a means to identify different relationships between virus and microbial abundances in the global oceans. The predictive value of total microbial abundance is strong when considering sub-surface samples. In contrast, microbial abundance is not a strong predictor of virus abundance in near-surface samples, when using linear or nonlinear models. The predictive power of nonlinear models improved substantially in the near-surface when evaluating each marine survey separately. The minimal predictive value of microbial cell abundances for inferring viral abundances in the near-surface when aggregating across all surveys is problematic given that virus–microbe interactions have significant roles in driving microbial mortality and ecosystem functioning3,5,33. Indeed the aggregation of abundance measurements in terms of total microbial abundances may represent part of the problem.
At a given site and time of sampling, each microbial cell in the community is potentially targeted by a subset of the total viral pool. In moving forward, understanding the variation in virus abundance and its relationship to microbial abundance requires a critical examination of correlations at functionally relevant temporal and spatial scales, that is, at the scale of interacting pairs of viruses and microbes. These scales will help inform comparisons of virus–microbe contact rates with viral-induced lysis rates, thereby linking abundance and process measurements. We encourage the research community to prioritize examination of these scales of interaction as part of efforts to understand the mechanisms underlying nonlinear virus–microbe abundance relationships in the global oceans.
Marine virus abundance data were aggregated from 25 studies (Table 1). A total of 5,671 data points were aggregated. The data collection dates ranged from 1996 to 2012. Data were primarily collected from coastal waters in the northern hemisphere, predominately during the summer months, with the notable exceptions of long-term coastal monthly monitoring sites, that is, the studies USC MO, BATS and MOVE.
Analyses of the data were performed using R version 3.1.1. Scripts and original data are provided at https://github.com/WeitzGroup/Virus_Microbe_Abundance.
A power-law regression model used the log10 of the predictor variable, microbial abundance per ml N and the log10 of the outcome variable, virus abundance per ml V. The power-law regression was calculated using the equation log10V = α0 + α1log10N. The α0 and α1 parameters were fit via ordinary least squares (OLS) regression to minimize the sum of square error.
Constrained variable-intercept model
The constrained model is a ‘mixed-effects’ regression model using the same predictor and outcome variables, log10 of microbial abundance per ml and the log10 virus abundance per ml, respectively. This model includes study-specific intercepts, which were constrained such that the values for any of the intercepts were restricted to one standard error above or below the intercept value taken from the power-law model. The standard error value for this model came from the power-law model. The equation for this model is , where is the study-specific intercept and α1 is the slope common to all studies, N is the predictor variable and V is the outcome variable.
Variable slope and variable intercept model
A power-law model where the exponent and intercept varied with each study was evaluated using the same predictor variable, log10 microbial abundance per ml, and the same outcome variable, log10 virus abundance per ml. In this model, there was a study-specific α0 and α1 and an OLS regression calculated using the equation .
Bootstrapping model CIs
Bootstrap analyses of the power-law model and mixed-effects models were conducted to derive 95% CIs surrounding the parameters estimated by the models. For all models the original data set was sampled with replacement, by study, to arrive at a bootstrap sample data set; this process was repeated 10,000 times. Distributions for all parameters were generated and the 2.5, 50 and 97.5% points were identified from among the 10,000 parameter estimates.
Outliers in the data were identified by calculating the top and bottom 2% of estimated VMRs amongst the entire 5,671 samples. The outliers corresponded to ratios below 1.81 and above 128. Those samples with VMRs that fell outside these bounds were considered outliers. There were 218 outlier samples taken at depths of ≤100 m and 10 outlier samples taken at depths of >100 m.
Depth cutoff robustness
The cutoff point for which data were partitioned into either the near-surface or the sub-surface was varied from 50 m to 150 m in 1 m increments. For each step, a power-law model was evaluated for both the near-surface and the sub-surface.
This work was supported by National Science Foundation (NSF) grants OCE-1233760 (to J.S.W.) and OCE-1061352 (to A.B. and S.W.W.), a Career Award at the Scientific Interface from the Burroughs Wellcome Fund (to J.S.W.) and a Simons Foundation SCOPE grant (to J.S.W.). This work was conducted as part of the Ocean Viral Dynamics Working Group at the National Institute for Mathematical and Biological Synthesis, sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville.