The link between star formation and gas in nearby galaxies

Observations of the interstellar medium are key to deciphering the physical processes regulating star formation in galaxies. However, observational uncertainties and detection limits can bias the interpretation unless carefully modeled. Here I re-analyze star formation rates and gas masses of a representative sample of nearby galaxies with the help of multi-dimensional Bayesian modeling. Typical star forming galaxies are found to lie in a ‘star forming plane’ largely independent of their stellar mass. Their star formation activity is tightly correlated with the molecular and total gas content, while variations of the molecular-gas-to-star conversion efficiency are shown to be significantly smaller than previously reported. These data-driven findings suggest that physical processes that modify the overall galactic gas content, such as gas accretion and outflows, regulate the star formation activity in typical nearby galaxies, while a change in efficiency triggered by, e.g., galaxy mergers or gas instabilities, may boost the activity of starbursts. How galaxies form their stars has been extensively studied but the role of the galactic gas content and the efficiency of its conversion into stars remain to be fully understood. Here the author presents a data-driven statistical analysis that reduces potential biases related to non-detections to quantify the link between star formation, molecular, and neutral gas in nearby galaxies.

U nderstanding how galaxies form their stars remains one of the major goals of galaxy theory 1 . Empirical relations that link star formation to galaxy properties have provided many clues to this cosmic puzzle. The discovery of a relatively tight relation between star formation rate (SFR) and stellar mass of galaxies 2,3 showed that star formation proceeds in a similar fashion in most star-forming galaxies but with a highly redshift dependent normalization. While the physical origin of this starforming sequence (SFS) is not yet fully understood, it is likely linked to the accretion of gas onto galaxies and the growth of their parent dark matter halos [4][5][6] .
A more direct way of studying galactic star formation is by analyzing the interstellar medium (ISM) of galaxies [7][8][9][10] . Observationally, the surface density of star formation is well correlated with the surface density of molecular gas 11 . The physical interpretation of this empirical correlation is that both star formation and molecular hydrogen formation require low gas temperatures and high densities and thus occur in co-spatial locations of the ISM 12 .
Constraining gas masses of galaxies is observationally challenging and subject to various biases and selection effects. Fortunately, recent observations of carbon-monoxide (CO) and 21 cm line emission make it now possible to study the molecular and neutral gas content of representative samples of nearby galaxies 13,14 thus enabling a more comprehensive analysis of galactic star formation, the ISM composition, and the link to gas accretion.
A major conclusion reached by these studies was that star formation in galaxies does not simply scale with the mass of the molecular reservoir as suggested by previous analyses of the molecular Kennicutt-Schmidt relation but that the efficiency of converting molecular gas into stars varies with the offset from the SFS 10,[15][16][17] . However, selection effects pose a main challenge for this interpretation given that a large number of galaxies in these samples have line emission below the detection limit. Bayesian modeling offers a way to mitigate biases arising from such detection limits and other observational limitations [18][19][20] .
The present study employs a Bayesian approach to model the multi-dimensional distribution of SFRs, molecular gas, and neutral gas masses in a representative sample of nearby galaxies 13,14 while accounting for detection limits and observational uncertainties. The efficiency of star formation in typical star-forming galaxies is found to be largely constant both along and across the SFS. In contrast, the star formation activity of starbursts may be boosted by a high efficiency. Overall, the SFRs and total gas masses of galaxies are shown to be strongly correlated, suggesting that galactic star formation is regulated by physical processes involving gas accretion and galactic outflows. Valuable information about the gas accretion histories of galaxies may thus be gleaned from accurately constraining the slopes of the SFS and the corresponding neutral and molecular gas sequences (NGS, MGS).

Results
The star formation, neutral gas, and molecular gas sequences. Two different samples are used in the present analysis. First, a 'representative sample' of 1012 galaxies with stellar masses 9 ≤ lgM star ≤ 11 selected from the extended GALEX Arecibo SDSS Survey 14 (xGASS). Second, an extension of the representative sample ('extended sample') that includes 54 additional galaxies with molecular gas measurements from the CO Legacy Database for GASS 13 (xCOLD GASS) that are not in xGASS. Importantly, all galaxies within a given stellar mass range are included in the analysis, i.e., there is no ad hoc selection of galaxies according to their star formation activity.
The joint distribution of SFRs, neutral gas, and molecular gas masses at fixed M star is modeled as a non-Gaussian multivariate distribution with parameters that vary with M star (see the "Methods" section). This multi-dimensional distribution consists of a continuous component and a zero-component. The latter corresponds to galaxies with vanishing SFRs and gas masses while the former includes all other galaxies. The one-dimensional (marginal) distributions of SFRs and gas masses of the continuous component are modeled as a mixture of two gamma distributions. The first gamma distribution corresponds to SFRs or gas masses of ordinary star-forming galaxies. A gamma distribution is adopted as it provides a better approximation to the distribution of SFRs at fixed stellar mass than a log-normal distribution 21,22 . The second, sub-dominant gamma distribution accounts for outliers with high SFRs (i.e., starbursts) or gas masses 23 .
The present study employs the Likelihood Estimation for Observational data with Python (LEO-Py) method 20 to compute the likelihood of the various distribution parameters taking into account the detection limits, missing entries, outliers, and correlations of the observational data (for either the representative or the extended sample). Starting from a weakly informative prior, the probability distribution of the distribution parameters is explored via a Markov Chain Monte Carlo (MCMC) method with the help of an affine-invariant ensemble sampler 24 . The mean parameter values obtained from the MCMC chain based on the representative (extended) sample define the fiducial (extended) model.
SFRs and gas masses of galaxies in the representative sample are shown in Fig. 1. Also shown is the peak position of the SFS (Fig. 1a), the NGS (Fig. 1b), and the MGS (Fig. 1c) as well as their scatter according to the fiducial model. These sequences refer to intrinsic galaxy properties because observational artifacts such as detection thresholds, missing values, and observational errors are accounted for in the multi-dimensional Bayesian modeling. A number of physical processes such as environmental effects 25,26 , fluctuations in the SFRs, or varying gas accretion rates 27-30 may be responsible for setting the normalization, slope, and scatter of these sequences. The parameters of the fiducial and extended models as well as the slopes and scatters of the SFS, NGS, and MGS are listed in Supplementary Tables 1-4 (see Supplementary  Note 1).
The peak position of the SFS for a given stellar mass is defined as the mode of the lgSFR distribution of typical galaxies (i.e., those belonging to the main gamma component of the model) 20,31 . For gamma-distributed SFRs, the peak position also corresponds to the average SFR. Analogous definitions are adopted for the NGS and MGS.
The SFS scales sub-linearly with a slope of 0.54 in qualitative agreement with previous results obtained with different approaches 14,32 . The upward (downward) scatter for M star~1 0 10 M ⊙ galaxies is 0.38 dex (0.53 dex). The NGS has a much shallower slope (0.33) but a similar upward and downward scatter compared with the SFS. Among the three sequences, the MGS shows the steepest slope (0.69) and the lowest scatter (0.31 dex).
The lower scatter and steeper slope of the MGS compared with the SFS may suggest that the latter may be a consequence of the former. In this scenario, the SFS is a secondary relation created by the relatively tight correlation between M H 2 and M star on one hand, and between SFR and M H 2 (the galaxy-integrated form of the molecular Kennicutt-Schmidt relation) on the other. correlated. To study the latter, Fig. 2  The surfaces shown in Fig. 2 are isosurfaces of probability density. They are calculated from a random sampling of the probability distribution of the fiducial model (with stellar masses drawn randomly from the representative xGASS/xCOLD GASS sample) via the marching cubes algorithm 33 . The volumes enclosed by the isosurfaces contain 10, 50, and 90% (from the innermost to the outermost isosurface) of the probability of the continuous component of the fiducial model. The isosurfaces are highly flattened in one direction. Figure 2a, b shows this 'starforming plane' (SFP) in a face-on and edge-on view. The orientation of the SFP is calculated via a principal component analysis of all sample points within the 50% isosurface.
The orientation of the SFP could in principle depend on stellar mass. However, the present analysis suggests that such a dependence cannot be very strong. Figure 2 shows that the observed galaxies fall onto the SFP for all considered stellar masses. Furthermore, the orientation of the SFP as predicted by the fiducial model is also almost independent of stellar mass (see Supplementary Note 2). Hence, SFRs, M H I , and M H 2 , when measured relative to the peak position of their respective sequences, form an approximately two-dimensional surface (the SFP) that is largely independent of stellar mass suggesting it is an approximately universal characteristic of (at least) nearby galaxies. Figure 3 shows a projection of the three-dimensional SFR, M H I , and M H 2 space along the neutral gas direction for galaxies with M star~1 0 10 M ⊙ . Given the narrow range of stellar masses, absolute SFRs and gas masses can be easily converted into quantities relative to their respective sequences and, hence, Figure 3 is a projection of the SFP onto the SFR -M H 2 pair of axes. The SFR -M H 2 diagram is close to an edge-on projection of the SFP, given its orientation shown in Fig. 2. Hence, this The star-forming plane refers to the largely two-dimensional distribution of star formation rates (SFRs), neutral and molecular gas masses relative to the peak position of the star forming, neutral gas, and molecular gas sequence for a given stellar mass. Markers indicate the measured SFRs and gas masses of xGASS/xCOLD GASS observations with marker shapes and colors corresponding to different stellar masses (see legend). Regions bounded by the green, blue, and yellow isosurfaces include 10, 50, and 90% of galaxies (without the zero component) according to the fiducial model. Solid lines mark the intersections of the star-forming plane with the coordinate axes. The orientation of the star-forming plane is calculated via a principal component analysis based on the probability density within the 50% isosurface. The orientation of the star-forming plane is only weakly dependent on stellar mass. a b c Fig. 1 Scaling relations of nearby galaxies. Slope, normalization, and scatter of the star-forming sequence (a), neutral gas sequence (b), and molecular gas sequence (c). Points show the representative sample based on the xGASS / xCOLD GASS data sets 13,14 . Specifically, detected SFRs and gas masses are shown as blue circles with error bars indicating measurement uncertainties (one standard deviation). A large fraction of the observational data are either undetected/censored (cyan arrows) or missing (purple dots) necessitating careful modeling to avoid systematic biases. Peak position and scatter of each sequence, as determined by this study, are shown by solid and dashed lines. The peak position is defined as the mode of the conditional probability density of lg SFR, lg M H I , and lg M H 2 given M star . The predicted scaling of the peak position with stellar mass as well as the upward (Δ + ) and downward (Δ − ) scatter of each sequence for M star = 10 10 M ⊙ galaxies are listed in the legend of each panel.
projection of the SFP corresponds to a tight relation between molecular gas mass and SFR, i.e., it is a galaxy-integrated version of the molecular Kennicutt-Schmidt relation 11 . Figure 3 highlights two important points. First, the probability distributions of the observational data are well reproduced by the fiducial model after selection effects and observational uncertainties are taken into account. This shows that the underlying model provides a good description of the observational data. Secondly, there is a clear difference between the apparent ("mock") and the actual ("true") distribution of the model data thus highlighting the importance of properly modeling measurement uncertainties and data censoring in observational data. Here, data censoring refers to measurements that have been carried out but return values below a detection limit. The apparent relation between SFR and M H 2 is steeper than the actual relation as galaxies with low molecular masses are more likely to be censored than those with low SFRs.
Gas depletion times. The slope of the SFR -M H 2 relation is directly linked to the (molecular, neutral, total) gas depletion time, t dep , which is defined as the ratio between (molecular, neutral, total) gas mass and SFR. The total gas mass refers to the sum of molecular and neutral gas masses, and t dep corresponds to the time it would take to convert the present gas reservoir into stars at the current SFR. The gas depletion time is a major parameter in galaxy models and its dependence on galaxy properties is an active area of observational and theoretical research 8,10,34,35 . Previous observational analyses 7,10,15,17 and numerical simulations 27 have suggested that the molecular depletion time increases with a decreasing offset from the SFS, lgt dep~− 0.5 × ΔlgSFR, for a broad range of offsets, stellar masses, and redshifts. If true, this result would suggest that star formation is not only regulated by the amount of molecular gas present but also by the molecular-gas-to-star conversion efficiency. The latter could arise from a variety of physical processes operating in the ISM such as supersonic turbulence 1 . However, as pointed out above, the actual SFR -M H 2 relation may differ from the apparent relation due to observational uncertainties and detection limits.
Therefore, Fig. 4 analyzes the molecular and total gas depletion times and their scaling with the offset from the SFS. Specifically, Fig. 4a shows the depletion times derived directly from the observational data as well as the depletion times in a mock sample based on the extended model (see "Introduction") after adding observational uncertainties and detection limits. The excellent agreement between observational and mock results suggests that the extended model well describes the observational data. The The scaling of the actual depletion times, i.e., if measured without observational errors and detection limits, for the galaxies in the mock sample. Galaxies with zero SFRs are excluded from the analysis. For typical offsets from the star-forming sequence, the molecular gas depletion time shows only a mild dependence (∝SFR −0.24 ) on SFR. c, d Same as a, b but showing the average change in gas masses relative to the peak position of the corresponding gas sequence with offset from the star-forming sequence. The peak position of the total gas sequence is given by the sum of the peak positions of the molecular and neutral gas sequences including Helium. Changes in star formation activity of typical starforming galaxies are tightly linked to changes in their molecular and total gas masses.
previously reported slope of~−0.5 based on galaxies with detected molecular gas masses is also recovered. However, the estimates of the depletion times and the calculated scalings are potentially biased as they do not properly account for missing and censored data. Instead, Fig. 4b reports the actual scaling of the depletion times as predicted by the model. The scaling is significantly shallower (−0.24 for the molecular gas depletion time, −0.32 for the total gas depletion time) for typical offsets (−0.5 to 0.5) from the SFS. Hence, the gas depletion times are almost constant in normal star-forming galaxies both along the SFS (given that the SFS and MGS have similar slopes 9 , see Fig. 1) as well as across it (see also ref. 36 ). The scaling becomes steeper in galaxies (starbursts) that lie a factor of ≳3-5 above the SFS indicating that the gas-to-star conversion efficiency is elevated in such systems as expected from studies of local ultra-luminous infrared galaxies 37 .
Molecular and total gas masses vary with the offset from the SFS in a manner consistent with the results above, see Fig. 4c, d. In particular, the change of gas mass with the offset from the SFS becomes closer to linear once non-detected galaxies are included in the analysis. Again, this finding is consistent with a picture in which variations in the gas content, and not in the molecular-gasto-star conversion efficiency, drive the star formation activity of typical (non-starbursting) nearby galaxies. Figure 5 quantifies the modeling uncertainty of the actual molecular gas depletion time, t dep;H 2 , in galaxies with M star = 10 10 M ⊙ . The molecular gas depletion time is fit with a broken linear function between ΔlgSFR = −0.5 and 1 for a large number of random draws of the model parameters from the MCMC chain. Specifically, hlg t dep; À Á Δ is used as the fitting function, where x = ΔlgSFR, α 1 and α 2 are the slopes for low and high values of x, x b , and Δ are the break point and the smoothness of the transition from one slope to another, and A is the value of hlg t dep;H 2 =yri at x = x b . The median value of the slope of the molecular gas depletion time for non-starbursting galaxies (α 1 ) with M star~1 0 10 M ⊙ is −0.25 and the 2.5th, 16th, 84th, and 97.5th percentiles are −0.34, −0.30, −0.21, and −0.13. Hence, the slope of the molecular gas depletion time (for non-starbursting galaxies) differs from zero (at the 2σ level) but it is also significantly shallower than a −0.5 slope. Finally, the slope (α 2 ) in the starbursting regime (ΔlgSFR ≳ 0.6-0.7) is steeper than α 1 with α 2 ¼ À0:68 þ0:11 À0:10 . The analysis above implies that molecular gas depletion times of typical star-forming galaxies depend only weakly on stellar mass and SFR. Specifically, combining the molecular depletion time scaling of galaxies that lie on the SFS and MGS (see Fig. 1a It is instructive to compare Eq. (1) with the result of a combined analysis of data sets spanning z = 0-4 10 . This latter study finds t dep;H 2 / M 0:09 star ðSFR=SFR z SFS Þ À0:44 ð1 þ zÞ À0:62 , i.e., a steeper scaling with SFR and a dependence on redshift.
Interestingly, the scaling t dep;H 2 / ð1 þ zÞ À0:62 may be consistent with a molecular gas depletion time that has no explicit redshift dependence. This perhaps surprising result may be understood as follows. The normalization of the SFS of galaxies increases quickly with redshift, SFR z SFS ¼ ð1 þ zÞ 2À3 SFR z¼0 The suggestion above is reminiscent of the fundamental metallicity relation 40,41 which similarly explains the redshift evolution of the mass-metallicity relation 42,43 with an underlying redshift-invariant dependence of the metallicity on both SFRs and M star . It is also similar to a proposed relation linking total gas mass fraction, stellar mass, and SFRs in a redshift independent manner 44 . Finally, given the (1 + z) 2−3 scaling of the SFS, the redshift independence of Eq. (2) is only in agreement with the scaling t dep;H 2 / ð1 þ zÞ À0:62 if α 1 is between −0.31 and −0.21.

Discussion
The near constancy of t dep;H 2 in typical star-forming galaxies suggests that their SFRs are largely driven by their molecular gas masses. The regulatory influence of physical processes that determine how efficiently molecular gas is converted into stars is thus limited, at least on global, galaxy-integrated scales in such galaxies. In contrast, a higher conversion efficiency appears to be the main driver of the excessively high star formation activity in starbursts.
However, while galaxies near the SFS have on average similar molecular gas depletion times, the ratio between M H 2 and SFR in any given galaxy can differ significantly from this average value as the model predicts a probability distribution, not a deterministic mapping, between gas mass and SFR. In particular, the scatter of SFRs at fixed molecular gas mass (and vice versa), see Fig. 3, may explain observations of galaxies with low SFR and, yet, significant amounts of molecular gas 45 . Figure 4 demonstrates that the molecular depletion time and the total gas depletion time have a similar scaling behavior with ΔlgSFR. This implies that the molecular-to-neutral gas ratio, and thus the molecular fraction f H 2 ¼ M H 2 =M gas , is approximately constant across the SFS, i.e., for galaxies of a given M star , even including starbursts (see also Supplementary Note 3). The molecular-to-neutral ratio increases with increasing M star , however, as evidenced by the steeper slope of the MGS compared with the NGS. The molecular gas mass in nearby galaxies is thus primarily a function of M star (via its effect on f H2 ) and the total gas mass.
These considerations suggest an evolutionary model (see "Methods" for more details) in which the average star formation activity and stellar mass growth of star-forming galaxies is determined by the time evolution of the total gas mass. In the following discussion, f H 2 and t dep;H 2 , and thus t dep ¼ t dep;H 2 =f H 2 , are calculated from the empirically derived SFS, NGS, and MGS (see Fig. 1), with the additional scaling t dep;H 2 $ SFR À0:24 introduced in the previous section. An alternative version of ansatz (4) based on the reciprocal molecular gas depletion time is discussed in "Methods". Two specific choices for M gas (t) are analyzed in more detail below. The case of an approximately constant gas mass, as predicted by a class of equilibrium galaxy formation models 46,47 , provides a first example. In this case, the SFS is linear at all redshifts, while galaxies evolve along much more gradual trajectories (SFRc onstant) in M star − SFR space, see Fig. 6a. The predicted slopes of the MGS and NGS are slightly steeper (less steep) than linear, see Fig. 6c, e. In either case, the predictions of this analytic model are in disagreement with the strongly sub-linear slopes of the SFS, NGS, and MGS shown in Fig. 1.
Perhaps surprisingly, the slope of the SFS will still be linear, even if the gas masses evolve with time, as long as the ratio of gas masses between galaxies is time-independent and t dep is a powerlaw function of M star (see Supplementary Discussion). The empirical finding of a strongly sub-linear slope of the SFS thus suggests that gas mass histories of different galaxies are not scaled versions of each other. A second analytic model illustrates this result, see Fig. 6b, d, f. In this model, the gas mass follows the typical growth histories of dark matter halos but is multiplied by additional factors that result in a downsizing effect of the gas mass, i.e., the gas mass reaches its maximum value at higher redshifts in more massive galaxies and then declines faster 44 . Not only does this second model reproduce the sub-linear slopes of the SFS, NGS, and MGS, it also results in a mass-dependent suppression of star formation at late times (quenching) and, furthermore, it predicts a steepening in the slopes of the scaling relations at higher redshift in qualitative agreement with observations 38,48,49 . More generally, the predicted slopes of the SFS, NGS, and MGS approach the corresponding predictions of the first ('equilibrium') model as the redshift increases.
The model described by Eq. (4) links the gas mass of galaxies to their SFR and stellar masses. The discussion above thus points to a picture in which physical processes affecting M gas via gas inflows and outflows, such as cosmological gas accretion, hot gas cooling, a galactic fountain, and feedback from stars and black holes regulate the star formation activity and mass growth of typical, nearby galaxies [50][51][52][53][54][55] . In contrast, the higher SFRs of today's starbursts appear to result from a higher efficiency of converting molecular gas into stars 56 and are thus likely related to changes in the physical state of the ISM on molecular clouds scales triggered by, e.g., galaxy mergers 57 or gas instabilities 58 .
The quantitative results of this study are potentially subject to modeling choices and systematics inherent in the observational data sets. For instance, adopting lognormal instead of gamma distributions when modeling the SFRs and gas masses of galaxies increases the scaling coefficient α 1 of the molecular gas depletion time from −0.25 to −0.22. In addition, the predicted slopes of the SFS, NGS, and MGS change by up to~0.1. However, the results of this paper are not qualitatively affected by these changes. For example, in either case, the MGS (NGS) is predicted to be the sequence with the steepest (shallowest) slope and the smallest (largest) scatter. Secondly, to enable a fair comparison with the literature, the present analysis uses the xGASS and xCOLD GASS data as is. Hence, the accuracy of the model predictions may suffer from limitations related to observational systematics, such as those arising from the adopted conversion factors, flux aperture corrections, beam-size matching, and SFR calibrations.
Finally, the results presented here are based on measurements of nearby galaxies. Observations with the Atacama Large Millimeter/submillimeter Array, and other observatories, have begun to probe the ISM of high redshift galaxies in CO, CII, and continuum dust emission 10,36,59,60 . Furthermore, observational challenges, such as the uncertain mapping of observables to physical properties 61,62 and the large selection bias of most high-z samples, can often be mitigated, e.g., by studying galaxy properties via multiple techniques and by surveying representative samples of high redshift galaxies 59 . In addition, complementary observations at radio wavelengths will soon constrain both obscured and unobscured SFRs down to a few M ⊙ y −1 up to z = 2 63 and probe the H I content of galaxies out to similar redshifts 64 .
Given the prospect of large representative samples of high redshift galaxies in the near future, it will be especially important to continue the development of methods to combine observations from multiple redshifts, observatories, and physical sources in a robust and reliable manner while accounting for detection limits, observational uncertainties, missing data, and data correlations. Indeed, these techniques will likely be critical to accurately quantify the link between gas properties, SFR, and stellar mass of galaxies across cosmic history, thus highlighting the increasing importance of statistics and data science in the study of galaxies.

Methods
Observational data set. The observational data are drawn from two related galaxy catalogs. The first is the 'representative sample' of the xGASS survey 14 (see http:// xgass.icrar.org) which lists stellar masses, SFRs, and H I masses (among other properties) of 1179 nearby galaxies (0.01 < z < 0.05) with a wide range of stellar masses (M star = 10 9 − 10 11.5 M ⊙ ). The second catalog is the xCOLD GASS survey 13 (see http://www.star.ucl.ac.uk/xCOLDGASS) which includes stellar masses, SFRs, and H 2 masses of 532 galaxies with the same redshift and stellar mass distribution. The CO luminosity to H 2 mass conversion factor adopted by xCOLD GASS is derived from a radiative transfer analysis of multiphase ISM simulations coupled with empirical relations between CII and CO line emission, gas-phase metallicity, and offset from the SFS 65 . The two catalogs were merged with an outer join based on the provided GASS catalog identifiers resulting in a combined data set of 1234 nearby galaxies. The overlap between the two catalogs is very high (only 55 of the galaxies in the xCOLD GASS sample are not part of xGASS) which makes the combined xGASS/xCOLD GASS catalog an excellent data set to study the correlations between SFRs, H I , and H 2 masses of nearby galaxies. The stellar masses in the joint catalog were replaced with updated SDSS Data Release 7 (DR7) median mass estimates 66 available at https://home.strw.leidenuniv.nl/jarle/SDSS. The original and the updated stellar masses agree to better than 1% for all but a dozen of galaxies. The updated SDSS DR7 data also provide stellar mass measurement uncertainties (which are~0.08-0.1 dex for over 80% of galaxies). The joint catalog is available as Supplementary Data, see Supplementary Note 4.
The analysis in this paper makes use of two subsamples generated from the joint catalog. First, all 1012 galaxies with M star = 10 9 − 10 11 M ⊙ from the representative sample are selected from the joint catalog to form the 'representative xGASS/ xCOLD GASS sample'. Secondly, all 1066 galaxies with M star = 10 9 − 10 11 M ⊙ from the joint catalog form the 'extended xGASS/xCOLD GASS sample'. Both samples are similar, but the latter includes 43 additional galaxies with measured H 2 masses and 11 additional galaxies undetected in H 2 . The average SFR of these 54 additional sources is~7.1M ⊙ yr −1 which is almost a factor 5 higher than the average SFR in the representative sample, while the average stellar masses (lgM star /M ⊙~1 0.2) are virtually identical. This shows that starbursts make up a large fraction of these additional sources. Hence, the extended sample allows to better constrain the properties of starbursting galaxies at the cost of biasing the proportions between starbursting and non-starbursting galaxies.
About 30% of the galaxies in the combined data set have SFR estimates but lack a quantification of their uncertainties. Two options were considered. A first possibility is to mark the SFRs of all such galaxies as missing which results in a large fraction of the available SFR estimates being excluded from the analysis. An alternative approach consists of imputing SFR uncertainties based on a regression analysis. Specifically, the SFR uncertainty can be fit as function of SFR and stellar mass for those galaxies with provided SFR uncertainties. The analysis as presented in the paper follows the second approach but no substantive differences were found when the first option is chosen instead. All SFR measurements are censored if the SFR is lower than its measurement uncertainty. Measurement uncertainties of undetected H I (CO) sources are set to 1/5 the 5-σ (1/3 the 3-σ) detection limit given in the xGASS (xCOLD GASS) catalog.
Multi-dimensional model of star formation and gas content. The joint distribution of actual SFRs, molecular gas, and neutral gas masses at fixed stellar mass M star is modeled as a multivariate distribution consisting of a continuous component and a discrete 'zero-component'. The zero-component accounts for galaxies with vanishing SFRs and gas masses, while the continuous component models all other galaxies including regular star-forming galaxies and outliers with high SFRs and/or gas masses 20,21,67 . Hence, the probability density is pðSFR; M H I ; M H 2 jθ; π 0 Þ ¼ π 0 δðSFRÞδðM H I ÞδðM H 2 Þ þ ð1 À π 0 Þp cont ðSFR; M H I ; M H 2 jθÞ; where θ is the set of parameters describing the continuous component, while π 0 is the probability of a galaxy to belong to the zero component and δ is the Dirac delta function. Both θ and π 0 are functions of M star . In addition to this two-component model, an eight-component model was explored. In the latter, galaxies can belong (or not belong) to a zero component for each of SFR, M H I ; M H 2 , i.e., they can have vanishing SFRs but not vanishing gas masses and vice versa. Thus, in the eightcomponent model there are seven (partial) zero components and one fully continuous component. However, a Bayesian analysis showed that only two of the eight components contribute significantly to the total probability. These two components are the zero-component and the continuous component in the twocomponent model. Consequently, the two-component model was adopted as the default choice.
The continuous component of the joint distribution is modeled with the help of a Gaussian copula. This approach generalizes multivariate normal distributions to allow for arbitrary continuous marginal distributions. The correlation structure is fully captured by the 3 off-diagonal coefficients of a 3 × 3 correlation matrix R, while the marginal (one-dimensional) distributions are modeled as a mixture of two gamma distributions. The first gamma component corresponds to SFRs or gas masses of ordinary star-forming galaxies. It is parametrized by a shape (a SF ) and scale (b SF ) parameter. The second, sub-dominant gamma component accounts for outliers with high SFRs (i.e., starbursts) or gas masses 23 . Its parameters are a SF,out , b SF,out . Here, the scale b SF,out is measured relative to the peak of the SFS. The peak position of the SFS is naturally defined 20  The slope and scale parameters of the primary gamma component are modeled as linear functions of lgM star with slopes m and intercepts n for each parameter, see Supplementary Note 1 for details. The slope angles (ϕ ¼ arctanðmÞ) and perpendicular distances to the origin (d ¼ n cosðϕÞ) are used as the actual model parameters 68 instead of m and n. Given the relatively small number of galaxies with extreme SFRs or gas masses in the observational sample, no attempt is made in modeling the stellar mass dependence of a SF,out , b SF,out , and f SF,out . In contrast, a significant fraction of galaxies belongs to the zero component according to the predictions of the fiducial model. This fraction should depend on M star given the increase in the quiescent fraction of galaxies with stellar mass 69 . Hence, the logit of π 0 , defined as logit π 0 ¼ ln ðπ 0 =ð1 À π 0 ÞÞ, is modeled as a linear function of lgM star , with slope angle (ϕ 0 ) and perpendicular distance to the origin (d 0 ) as the main parameters.
The total number of parameters of the model is 26. There are 7 × 3 parameters that specify the slope and intercept of the stellar mass-dependent parameters of the gamma-mixture for SFRs, neutral, and molecular gas masses, three correlation coefficients, and two parameters that define the stellar mass dependence of the zero-component. Estimates for all model parameters are provided in Supplementary Note 1.
Bayesian analysis. The likelihood of the model parameters given the observational data are computed with LEO-Py 20 , available at https://github.com/rfeldmann/ leopy. The likelihood estimate accounts for the observational uncertainties and detection limits of SFR and gas mass measurements. Measurement errors are assumed to be normally distributed with zero mean and a standard deviation given by the measurement uncertainty. Missing SFRs, H I , or H 2 masses are assumed to be missing at random (MAR), i.e., the probability that a given entry is missing may depend on other galaxy properties (e.g., on the stellar mass) but not on the missing value itself. Very weak priors are adopted for all model parameters. Uniform, bounded priors are used for each slope angle ϕ and perpendicular distance d. The prior for the 3-vector of the correlation coefficients is modeled as uniform over the sub-volume of (−1, 1) 3 for which the correlation matrix is positive semi-definite and zero otherwise. The probability of model parameters given the observational data are given (modulo a constant of proportionality) by the product of the likelihood and the prior. However, since all adopted priors are uniform within the parameter bounds, this probability equals the likelihood (modulo a constant of proportionality) whenever all parameters are within their bounds, and 0 otherwise, thus simplifying the analysis. The posterior probability distribution of the model parameters was sampled with the MCMC ensemble sampler emcee 24 . Emcee was run for a total of 15000 steps using 1720 walkers and with a proposal scale parameter of 1.5. The first 4000 steps were considered burn-ins and discarded from the analysis. To reduce the wall-clock time, measurement uncertainties of stellar masses (~0.09 dex) were ignored. However, this simplification does not affect the presented results in a significant way, see Supplementary Tables 1-4. Furthermore, all MCMC calculations were run in parallel with MPI on 864 cores at the Swiss National Supercomputing Centre.
Mock observations. The present work uses mock data to confirm that the model provides a reasonable description of the observations and to construct the probability distribution of both actual and apparent galaxy properties for a given set of model parameters. The procedure below produces a mock catalog of specified size (N mock ). First, N mock stellar masses are drawn from the actual mass distribution of the xGASS/xCOLD GASS data set. Secondly, a given mock object is randomly assigned to either the zero component or the continuous component of the joint distribution with probability π 0 that depends on stellar mass. Mock objects in the zero component are assigned zero actual SFRs and gas masses. For and analyzes some of its predictions. In the equation above, t is the cosmic time, t dep ¼ t dep;H 2 =f H 2 is the total gas depletion time, f H 2 ¼ M H 2 =M gas is the molecular gas fraction, M gas (t, s) is a family of known gas mass histories, and s is a onedimensional parameter indicating a given evolutionary track. The SFR is the time derivate of the stellar mass, i.e., SFR(t, s) = ∂M star (t, s)/∂t, as long as stellar mass loss and mass accretion via galaxy mergers are ignored. The former can be partially accounted for by adopting the instantaneous recycling approximation 70,71 , while the latter is a reasonable assumption given that star-forming galaxies acquire most of their stellar mass via in situ star formation 72 . As presented in "Gas depletion times", the molecular gas depletion for typical star-forming galaxies is a power-law function of M star and SFR and potentially independent of z, i.e., t dep;H 2 ðM star ; SFRÞ / M β star SFR α . Furthermore, as discussed in "Discussion" and shown in Supplementary Fig. 6, the molecular gas fraction depends on M star (and potentially t) but not significantly on SFR. Hence, Eq. (5) can also be written as Equation (6) together with M star (0, s) = 0 is an initial value problem for any given fixed s. It can be solved numerically, e.g., with the solve_ivp function from the Python scipy.integrate module, to obtain M star (t, s) for all t. Subsequently, SFRs can be obtained from equation (6), molecular gas masses via M H 2 ¼ t dep;H 2 SFR, and neutral gas masses (including Helium) via M H I ¼ M gas À M H 2 . As the evolutionary model uses the functional forms of the SFS, NGS, and MGS only indirectly, via t dep;H 2 and f H 2 , it may not necessarily predict scaling relations in agreement with those shown in Fig. 1. For instance, the slope of their SFS will be exactly linear if galaxies evolve according to (6) with constant gas masses and f H 2 / M γ star (see Supplementary Discussion). Comparing model predictions and observational measurements of the SFS, MGS, and NGS, thus allows to put constraints on the gas growth history of galaxies.
Equation (1) is a power-law approximation for t dep;H 2 as a function of SFR and M star . While this is the conventional choice, an alternative approach is to fit the reciprocal molecular depletion time t  6) and thus can be solved in the same way. In fact, both models are identical if β 0 ¼ β=ð1 þ αÞ, α 0 ¼ α=ð1 þ αÞ, and a 0 ¼ a.

Data availability
The xCOLD GASS 13 and xGASS 14 catalogs are publicly available at http://www.star.ucl. ac.uk/xCOLDGASSand http://xgass.icrar.org. The combined xGASS / xCOLD GASS data set used in the present analysis is available as Supplementary Data, see Supplementary Note 4.