A radius valley between migrated steam worlds and evaporated rocky cores

The radius valley (or gap) in the observed distribution of exoplanet radii, which separates smaller super-Earths from larger sub-Neptunes, is a key feature that theoretical models must explain. Conventionally, it is interpreted as the result of the loss of primordial hydrogen and helium (H/He) envelopes atop rocky cores. However, planet formation models predict that water-rich planets migrate from cold regions outside the snowline towards the star. Assuming water to be in the form of solid ice in their interior, many of these planets would be located in the radius gap contradicting observations. Here we use an advanced coupled formation and evolution model that describes the planets’ growth and evolution starting from solid, moon-sized bodies in the protoplanetary disk to mature Gyr-old planetary systems. Employing new equations of state and interior structure models to treat water as vapour mixed with H/He, we naturally reproduce the valley at the observed location. The model results demonstrate that the observed radius valley can be interpreted as the separation of less massive, rocky super-Earths formed in situ from more massive, ex situ, water-rich sub-Neptunes. Furthermore, the occurrence drop at larger radii, the so-called radius cliff, is matched by planets with water-dominated envelopes. Our statistical approach shows that the synthetic distribution of radii quantitatively agrees with observations for close-in planets, but only if low-mass planets initially containing H/He lose their atmosphere due to photoevaporation, which populates the super-Earth peak with evaporated rocky cores. Therefore, we provide a hybrid theoretical explanation of the radius gap and cliff caused by both planet formation (orbital migration) as well as evolution (atmospheric escape).

The distribution of planetary radii observed mainly by the Kepler spacecraft shows a prominent feature called the "radius valley" 1,2,3,13 .It separates smaller "super-Earths" with radii R ≲ 1.7 R ⊕ from larger "sub-Neptunes" and its origin has multiple interpretations.One is a planetary envelope mass-loss process, i.e. atmospheric escape during planet evolution.The driving process could be photoevaporation because of stellar X-ray and Ultraviolet irradiation 4,5,6,7 or core-powered mass loss 14,15,16 of a hydrogen-helium (H/He) envelope on top of a core consisting of silicates and iron.In the following, we call this mixture of solid materials "rocky".With this limited compositional inventory, the observed sharp drop-off of the sub-Neptune occurrence at radii greater than ∼3 R ⊕ -the "radius cliff" -likely requires an additional mechanism, such as H 2 sequestration in the magma ocean 12 .Another possible explanation for the radius distribution is that the sub-Neptunes are water-rich with water mass fractions of several tens of percent 17,18,19 .Such large water contents are a consistent prediction of planet formation models which include the effect of planet-disk interactions leading to migration of icerich planets from outside the water condensation line towards the star 20,21,22,23,24,10 .In this scenario, the commonly assumed H/He dominated envelope of the sub-Neptunes is not required anymore.This hypothesis has recently gained supporting observational evidence based on planets around M stars 25 , whose bulk density distribution can be well reproduced with silicates and iron for super-Earths and about equal fractions of rock and water for sub-Neptunes.The tentative evidence for this scenario lies in the small scatter of sub-Neptune densities 25 , which is not expected for rocky cores with a H/He envelope.
For more massive stars, the smaller radial velocity amplitude often hinders a precise determination of the planetary mass.In that case, a clear picture in density does not emerge with presentday data and the different theoretical models can not easily be falsified 26 unless the planetary masses are constrained using theoretical arguments, such as the output of a planet formation model 6,11,10,27 .
In the case of water-dominated outer layers of the planet, the phase of water determines the precise radius.One key assumption, which is well motivated at the high temperatures where typical sub-Neptunes are observed, is that water is not condensed out into solid ice and instead forms a massive, hot, to a large degree supercritical, vaporized hydrosphere, here called "steam envelope" 18 .This significantly increases the radius relative to the condensed case 28 .Due to model limitations, water was not consistently included in this lower-density phase in the first works cou-pling a planetary mass distribution from planet formation models to photoevaporation models 6,11 and/or comparing the observed valley locus with the theoretically predicted one 29,11 .Therefore, these authors excluded that the super-Earths contain significant amounts of water and similarly favored water-poor sub-Neptune compositions because for solid ice, such planets fill the radius gap.Later, a combined formation and evolution model with the correct water phases to show that the the radius valley can emerge as a separator between dry and wet planets (i.e, formed within vs. beyond the ice-line) was presented 10 .In that work, the main process driving such a dichotomy are different efficiencies for pebble accretion depending on the pebble composition, which produces smaller rocky and larger icy cores.Recently, another study 27 came to the same conclusion with a similar model but also modeling N-body interactions between planets while keeping the water in condensed form.In ref. 10 , the authors found that the location of the valley is better reproduced if any water is mixed with H/He as steam and both undergo atmospheric escape.A detailed quantitative comparison to observations and the simulation of N-body interactions was outside the scope of their work motivating further investigation.
Here, we used a planetesimal-based coupled formation and evolution model to synthesize a population of planets.For the nominal case, we assumed that any accreted water mixes with H/He and can form a steam envelope.Furthermore, we model photoevaporative mass loss of both water and H/He leading to a more complete picture of all processes at play.The results are obtained in a statistically robust way by modeling a full population of 1000 systems with initial conditions informed from observations of disks 30,31,32 .This enables us to extract the key mechanism shaping the radius valley and statistically quantifying their ability to reproduce observations.We use the results of our planet formation modeling 8,9 created for statistical studies.The planetary systems form around 1 M ⊙ stars, each of which starts with different disk properties and initially 100, randomly distributed, 0.01 M ⊕ planetary seed particles per disk.During the formation stage we model the evolution of the gas disk, planetary growth by accretion of planetesimals and gas, gas-driven planetary migration, and dynamical interactions between the planets by means of an N-body algorithm (scattering, giant impacts, and -combined with orbital migration -capture into resonances).The formation model is outlined in the Methods section.
In the subsequent evolution stage, the planets are evolved individually, taking into account in the calculation of their interior structure the effects of cooling and contraction, atmospheric photoevaporation, bloating, and stellar tides.This leads to a population of planets at an age of 5 Gyr.
For the results shown here, we have extended the formation phase to 100 Myr compared to 20 Myr in the original work 9 before evolving the individual planets.For the nominal model, we mix the accreted water with any present H/He 33 and, for water, use a new equation of state 34 , which covers all possible phases.Under this assumption, water is also present in the upper layers of the gaseous envelope and can even be the only volatile constituent.Therefore, we include the photoevaporative mass loss of water 35 and add it mass weighted to the photoevaporative loss of H/He 36 .The evolution processes and their implementation are The light blue line shows the observed distribution without correction for the observational bias 2 and the gray line the synthetic one using the updated, nominal evolution model with the bias of the Kepler survey applied.We restrict the sample to planets with orbital periods lower than 100 days.Bin counts are normalized by the total number of planets in the different samples.also detailed in the Methods section.
As a final step, in order to compare synthetic results to transit observations, we apply the detection biases from the Kepler mission using KOBE 37 .The procedure is described in the Methods and allows for a statistical comparison to the California Kepler Survey 38 supplemented with Gaia data 2 .

Results
Radius distribution We contrast our theoretical radius distribution with applied observational bias to observations 2 in Fig. 1.We find an excellent match for the locus of the sub-Neptune peak, the radius valley, and the super-Earth peak.The synthetic radius cliff is only marginally steeper while the relative number of super-Earths in the model is higher than observed.Statistical quantification of the differences (Extended Data Fig. 6) reveals that the radius distribution of super-Earths within 30 d is well matched but at larger distances, we synthesize and theoretically "detect" more rocky planets than observed.The depth of the radius valley is deeper in the synthetic population, which is in this comparison not affected by a smoothing effect of measurement uncertainties as for the observed distribution 13 .We note that the match to the observed distribution of planetary radii was obtained naturally from consistently including more realistic physics, especially the different phases of water, compared to previous works 11 .The result is obtained from following the self-consistently formation and evolution of initially 100 lunar-mass planets per disk.We Radius histograms of synthetic planets with orbital periods shorter than 100 days for different model assumptions without any bias applied.For each panel, the full distribution is shown in gray and colored histograms are showing different compositional subsets (pure rocky in green, water-rich without any H/He in blue, with some H/He in red).The left panel shows the nominal model, in the middle panel water was assumed to be condensed out in a solid ice layer resistant to evaporation below H/He, and the right panel uses the nominal treatment of water but excludes evaporation.The dotted histogram shows the full distribution from the nominal simulations.As in Fig. 1, bin counts are normalized by the total number of planets in the different samples.
did not adjust any parameters neither in the underlying formation phase nor in the evolutionary model.
In Fig. 2, we show radius distributions of different modeling runs without an observational bias applied, splitting the population according to bulk composition.From the left panel in Fig. 2, it can be inferred that the valley separates smaller, dry planets from lower-density, wet planets with a vapor envelope.There is only a small overlap of the two distributions.The planets whose envelopes contain some H/He make up only 13% to 16% (depending weakly on age) of the sub-Neptunes.Furthermore, most of the planetary envelopes of sub-Neptunes with some H/He, contain less than ten percent of H/He by mass.
The radius cliff located at ∼3 R ⊕ is in our simulations a second compositional transition from planets containing ≳90 % water to those which contain several tens of percent of H/He in mass.It is accompanied with a corresponding drop in the occurrence of planets at ∼20 M ⊕ .Thus it is a feature which emerges due to more gas accretion for the more more massive planets.
We further varied model assumptions to understand how robust the results are.We found the results to be insensitive to any bloating mechanism (Supplementary Information).In contrast, under the assumption that water is buried below the H/He envelope and is forced to remain in condensed form we do not reproduce the radius valley (middle panel of Fig. 2).Instead, water-rich cores populate the radius range where the observed valley is located.This echoes previous works 29,11 excluding water-rich cores and shows that the cause of recovering a radius valley with water-rich planets can be attributed to the (correct) phase of water and its distribution within the envelope 18 .
The rightmost panel of Fig. 2 shows the case of excluding photoevaporation while keeping the water mixed into the H/He envelope.There, we get a distribution which is neither in agreement with the presence of a radius valley.Instead, we obtain many (low-mass) large planets with a rocky core and a H/He envelope and also a distribution of water-rich planets which smoothly extend to low radii.In reality, both of these kinds of planets would be unable to retain their H/He or (to a smaller extent) water envelopes.For the same reason, too few rocky planets exist.This highlights the need of atmospheric escape shaping the distribution of planetary radii even for water-rich compositions.It is necessary to populate the super-Earth peak with rocky planets by stripping their H/He envelopes.We conclude that the valley is a hybrid consequence of both formation (migration leading to the sub-Neptune peak) and evolution (evaporation leading to the super-Earth peak).
Dependency on orbital period In addition to radii, orbital periods of exoplanets can be determined precisely.Fig. 3 shows the period-radius distribution of observed 2 (left) and modeled (middle and right) exoplanets.Qualitatively, the observed distribution matches well the synthetic distribution with applied observational bias.We note that we apply the bias of the full Kepler survey without taking into account that not all planets are included in the California Kepler Survey catalog 2 .Therefore, a larger number of points is shown in the middle panel.In both the observed and the synthetic distribution, the radius valley can be made out at comparable locations.
The right panel shows the synthetic data without observational bias applied.Additionally, the composition is color-coded.Rocky Earths and super-Earths (green) are found at small radii below the valley.This distribution is truncated at around 300 d where the equilibrium temperature is close to 300 K.
The planets with water but without H/He (blue) are filling the parameter space at radii greater than the rocky planets above the valley and at larger distances.Planets with H/He (red) populate the even larger radii and are also more common at lower radii further away from the star.
As revealed by the selected formation tracks in Extended Data Fig. 7, this pattern is shaped by migration and collisions during the formation stage as well as photoevaporation.Rocky planets generally form at short distances inside of the water snowline.They grow first by planetesimal accretion and then (and most importantly) by giant impacts with other rocky protoplanets.Giant impacts are visible as vertical parts in the formation tracks.However, their mass growth is limited due to the limited amount of building blocks inside of the ice-line.In absolute terms, only little orbital migration inside of the ice-line happens.
At larger distances to the star, more solids are available for accretion for the assumed MMSN-like profile of the planetesimal disk.In addition, even more material is available for solid accretion thanks to the condensed icy material outside of the water ice-line.This leads to promoted planetary growth up to a several Earth masses where the timescale of type I migration 39 reduces and planets migrate efficiently into the inner region.While migrating, the more massive, water-and H/He-rich planets interact and often collide with rocky planets in the inner system, especially after the systems are destabilized after the gas disk dissipates due to a lack of eccentricity damping 40 .Collisions with other bodies -or at the closest distances the radiation flux of the host star -can lead to the expansion and Roche lobe overflow of the gaseous envelope and thus remove the H/He content.However, under our assumptions, this is not the case for the heavier water.Only after switching to the evolution stage, we assume that the constituents perfectly mix.These processes therefore give rise to a population of planets with pure water envelopes.More massive, larger planets or planets at larger distances share a similar formation history but are not stripped completely of H/He (red planets and tracks in Fig. 3 and Extended Data Fig. 7).
During the evolution stage, planets cool and are subject to photoevaporation.Thus, they move vertically downwards in Fig. 3.The most frequent evolutionary pathway is the loss of a (pure) H/He atmosphere of a rocky core.In a smaller fraction of the cases, also a mixed or water-dominated envelope can be lost completely to result in a bare rocky core.This happened for 17 % of the (eventually) rocky super-Earths in the biased, synthetic population which had at least ten percent of water by mass after the formation stage.This rare outcome occurs for the lowest mass planets which were able to migrate.
Although there is an overlap in mass space between volatilerich and rocky planets (see Extended Data Fig. 8), the overall formation pathway occurs along the following lines: rocky cores are lower mass planets which formed almost in-situ by a giant impact stage while volatile-rich planets are more massive and for that reason migrated substantially to their present-day location.Low-mass volatile-rich in contrast do not migrate close to the host star in significant numbers (which would fill the valley), because type I migration is slower the lower the planet mass is 41 .

Discussion
Mass distribution and mass-radius diagram A fundamental property of planets with regards to their formation is their mass.
Here, we started at planetary embryos of 0.01 M ⊕ and modeled their growth by solid and gas accretion as well as the long-term evolution of the bodies which results in a distribution of planetary masses shown in Extended Data Fig. 8.This approach is very different from mass distributions derived from fitting the observed properties of the close-in low-mass population found by the Kepler satellite via a inference analysis 43,26 .As mentioned above, there is a clear trend in planetary mass from low-mass, rocky planets over more massive, migrated, steamy water worlds to the H/He-rich (sub)-giants.However, the mass distributions are broad enough to lead to a significant overlap which results in a unimodal total mass distribution.
The characteristic masses are few Earth masses for rocky planets and about ten Earth masses for water worlds, but with broad distributions.These mass scales can be understood analytically by recalling that most of the rocky planets went through a giant impact phase after a destabilization of the systems.The appropriate mass scale is then given by the amount of solid building blocks in an dynamically enhanced feeding zone (Goldreich mass) 44,45,46 .For planets which migrate, thus the water-rich planets, the equality of migration against growth or -if present -the saturation of co-rotation torques set the mass scale (Equality or Saturation mass) 45,46 .
While the mass of the rocky planets depends on the solid accretion mechanism, the mass of migrated steam-worlds is less sensitive to it.Independently from our work, the study 10 using a global model with pebble accretion instead of planetesimal accretion resulted in a mass distribution with more distinct separation of rocky and water-rich planets in mass.An overall similar mass distribution to ours was retrieved in a Bayesian framework by 26 shown in Fig. 8 (dashed).A strong difference is however that a very low probability of planets containing water was inferred which we attribute to their work assuming water to be in the icy/condensed phase at all temperatures.
In Extended Data Fig. 9, we also show the synthetic massradius diagram for the nominal model at 5 Gyr.We find that our model leads to planets covering the area occupied by precisely characterized, observed planets around solar-type stars.A tentative over-density of both observed and simulated planets is located close to the lowest-density planets without H/He (uppermost blue points in the diagram).This would be in agreement with recent observational findings 25 for M stars but is for Solar-type stars without statistical significance.So far, no unbiased, statistical sample of characterized planets with precise mass and radius measurements exist.This is expected to change with PLATO 47 which includes ground-based follow-up.By then, it will be possible to statistically assess trends on the mass-radius diagram 48 .
Potential reasons from discrepancy of close-in ice rich planets.While the synthetic and observed radii match well, we find a difference in the orbital period distribution of the close-in sub-Neptunes.In Fig. 3 by comparing the observed (left) with the synthetic, biased (middle) population, it becomes apparent that a group of sub-Neptunes is theoretically predicted to exist at orbital periods of 1 d-5 d which is absent in the observed sample.While they are only straddling the border of the observed paucity of Sat-   42 .The synthetic data is biased (middle panel) using KOBE (see Methods).The rightmost panel shows the unbiased synthetic distribution color coded by their composition (as in Fig. 2).urnian mass planets (the sub-Jovian desert 49 ) and the individual planets' radii are not in disagreement with observations, they are more numerous than observed.Instead, a similar number of planets is missing at longer orbital periods.It is thought 50 that even at high irradiation, steam envelopes can be kept for sufficient core masses.However, with the exception of 55 Cancri e 51 , there are no observed planets with low bulk densities in this regime.Thus, the very close-in sub-Neptune population that we obtain is disfavored by observations.A possible explanation is that they might stop their migration in the type I regime 52,39 at larger orbital distances than what the theoretical model currently predicts.At the moment, the inner edge of the gas disks acts as migration trap and is set to the corotation radius derived from observed rotation periods of young T Tauri stars e.g. 53,54,55.However, the stopping distance could be further out.Since sub-Neptunes form early in the gaseous disks (to be able to migrate significantly), and given a sufficient source of disk turbulence in the inner disk, viscous heating would be efficient.Therefore, the location of ionization, that is, the inner edge of the dead zone to the magneto-rotational instability 56,57 , can lie further away from the star.This introduces a migration trap 58,59 and causes the distribution of migrated planets to peak at larger orbits.In the future, the formation model needs to include the ionized region and a transition in the strength of magnetorotational turbulence.

Summary and conclusion
By coupling a global end-to-end planet formation model to different evolution pathways, we identify scenarios which lead to a distribution of planetary radii consistent with the observed location of the radius valley 1 .These scenarios are shown in Figure 4. Key mechanisms in the the formation model are orbital migration and N-body interactions.The evolution part was calculated with and without updated models of photoevaporation and bloating.Furthermore, we assume in our nominal case that the accreted water is not condensed into solid ice on top of the rocky core as some previous works but is in the correct phase, which is typically supercritical vapor which can be mixed with H/He.
In this way, we provide theoretical support of the scenario of formation, or, more specifically, gas-driven orbital migration, shaping the distribution of mostly water-rich, steam-envelope planets populating the sub-Neptune peak (scenario IV in the schematic figure 4).Only at larger planetary radii H/He becomes the dominant gaseous constituent.This is in agreement with an earlier study 10 , albeit the formation mechanism assumed in that study was pebble accretion.The bi-modal mass distribution (below 20 M ⊕ ) shaped by pebble accretion and its isolation mass is not required to match observed radii.Instead a uni-modal mass distribution can also reproduce the observations.At the same time, we also give evidence for the necessity of an evolutionary mass-loss mechanism.Atmospheric escape is necessary to populate the super-Earth peak with evaporated rocky cores.Without evaporation, rocky planets with small H/He atmospheres which form during the gas disk stage inside of the ice-line would lead to low-mass planets with large radii resulting in a radius distribution inconsistent with observations (Scenario III in Fig. 4).
Our results are not sensitive to modifying the photoevaporation model or the presence of a bloating mechanism.With both orbital migration and atmospheric escape causing the radius valley, one can speak of a hybrid origin of the radius valley caused by both formation and as evolution.
Assuming the mass distribution of our model is approximately correct, we can also falsify the scenario (II in Fig. 4) of a condensed-out ice layer.In this scenario, the icy planets' radii fall into the valley and a lack of planets is found at the sub-Neptune peak.This echoes earlier works assuming condensed ice 29,11 .However, at the planets' equilibrium temperatures, water is not in the solid ice form 28,60,61 .
Finally, based on planet evolution only we can not rule out • mixed steam • with evaporation  We qualitatively found a negative slope of the distributions in all scenarios.We identified in this work that the radius valley is either shaped by a mass-loss mechanism without any water on the planets or by photoevaporation combined with water in supercritical steam envelopes which is in better agreement with the predictions of formation theory, in particular that planets should be undergoing orbital migration.Neither the scenario with condensed water (Scenario II) nor without photoevaporation (Scenario III) reproduce the observed valley.
the classical picture of photoevaporation of H/He envelopes on top of exclusively rocky cores 7 for a review only shaping the radius distribution.However, as a key result, many different planet formation models consistently predict the migration of ∼ 10M ⊕ water-rich planets to regions close to the host star, a prediction for example already made in the first generation of population syntheses 21 (their "horizontal branch" planets).This leads to the presence of a population of close-in, water-rich planets for which we showed that it will not lead to a disagreement with the observed valley.If the prediction of the presence of water-rich sub-Neptunes should turn out to be incorrect, it would call for a revision of fundamental aspects of formation theory.These aspects could for example be protoplanetary disk structures leading to less orbital migration 62,63 or a high efficiency of volatile loss during planet assembly 64 .Today, direct observational constraints on the bulk composition of sub-Neptunes are still inconclusive but with the help of the James Webb Space Telescope and the future ARIEL mission 65 , we expect to find more evidence for either the water-rich or the H/He dominated composition and advocate the investigation of sub-Neptune atmospheres to resolve the mysteries of the radius valley.

Methods
Formation model The formation part of the Bern model gathers the evolution of a viscous-accreting gas disk, the dynami-cal state of the solids in the disk, the concurrent accretion of solids and gas by the protoplanets, planet-disk interactions leading to gas-driven migration, and dynamical interactions between the protoplanets.Both the gas and solid disks are described by 1D axisymmetric, vertically-integrated profiles.The model solves an advection-diffusion equation to track the evolution of the gas disk 66,67 , with additional sink terms for the accretion by the protoplanets and external photoevaporation.The standard α parametrization 68 is used to compute the viscosity, while a radiative balance is used to determine the vertical structure 69 .Solids are assumed to be in planetesimals, whose dynamical state (eccentricity and inclination) is evolved due to damping by the gas disk, self-stirring, and stirring by the protoplanets 70 following.
Planet formation follows the core accretion paradigm 71,72,73 with the concurrent accretion of planetesimals and gas 74 .At time zero, 100 embryos of 10 −2 M ⊕ (nearly lunar-mass) are randomly placed in the disk with a uniform probability in the logarithm of the distance between the inner edge and 40 au.Planetesimal accretion is assumed to occur in the oligarchic regime 75,76,77 and their radius is set to 300 m.The enhancement of the capture cross-section by the presence of an envelope is included 78 .The model solves the 1D spherically symmetric internal structure equations 79 to obtain either the mass or the radius of the envelope.During the early stages, gas accretion is limited by the cooling of the planet 74,80 and the envelope mass is retrieved from the structure.Cooling efficiency improves as the planet grows, leading to a point where the gas accretion rate exceeds the supply from the disk, which is set according to the Bondi rate.Once this point is reached, the gas accretion is known and the internal structure equations are used to track the contraction of the envelope 81 , yielding the planet radius and luminosity.
The model prescribes gas-driven migration 39 plus a reduction of the co-rotation torques due to planet eccentricity and inclination 82 .Gas-induced eccentricity and inclination damping are also included 40 .Dynamical planet-planet interactions are tracked using the mercury N-body code 83 .
Improved evolution model In the following, we describe in detail the changes in the evolution model compared to the previous version which was described in full detail 8 .The contraction and cooling of the interior structure as well as the tidal evolution remained unchanged.
The first major change concerns the treatment of water in the interior structure model.Previously, we assumed that it is always in condensed form and resides in a pure water/ice shell between the inner iron-silicate core and a possible outer gas envelope which consists of pure H/He.In the New Generation Planetary Population Synthesis (NGPPS) series 8,9 , this approximation was required as our interior structure model was not updated for mixed H/He + H 2 O envelopes.Instead, we now allow for all phases of water -including vapor and supercritical -to exist.The orbital distances at which the valley is observed lies closer to the star than the habitable zone 84 , meaning the temperatures at the top of the atmospheres exceed the threshold where water could condense out.Additionally, a strong runaway greenhouse occurs in vapor atmospheres at these distances 28,61 .Given the fact that H 2 O is also miscible in H/He on a molecular level 33,61 .This means that in reality, water does not form a separate condensed layer, but is mixed into a H/He + H 2 O vapor envelope (or in a pure vapor envelope if the planet contains no H/He).One of the latest updates to our interior structure and planet evolution model was to include such mixed compositions 85 .Specifically, we mix water described with the AQUA equation of state (EoS, see also the description below) 34 with H/He 86 . 1 This implies that the fraction of heavy elements Z is radially constant in the envelope.We note that, this averaged Z is used to calculate molecular opacities as a function of temperature but assuming solar-like elemental abundances and equilibrium chemistry 88 .
Second, and related to the previous step, we improved the prescription for X-ray and Extreme Ultra Violet (XEUV)-driven atmospheric escape of the planetary envelopes.XEUV radiation drives a mass loss rate of where F XEUV is the flux received in either X-ray (dominating early) or EUV wavelengths, R base is the base of the ionization layer, R τ=2/3 is the radius of the τ = 2/3 layer, G the gravitational constant, M tot the mass of the planet, K(ξ) = 1 − 3 2ξ −1 2ξ 3 , ξ = R Roche /R τ=2/3 the ratio of the planets Roche limit to its radius, and ϵ a escape efficiency factor.R base is located in the planetary structure by equating the partial pressure to the pressure where an optical depth of one is reached for ultraviolet pho-tons 89 .If this critical pressure lies exterior to the resolved envelope structure, we extrapolate using the scale height determined at 1 bar.As improvements to the NGPPS escape model 11,8 , the time evolution of F XEUV is updated from a simple log-constant slope 90 to an tabulated model 91 based on more recent observational data 92,93 .Furthermore, we calculate escape rates for both water and H/He separately.For water, we use Eq. ( 1) with a variable ϵ given such that the mass loss values found for a pure water vapor atmosphere 35 are reproduced.In that work, the author makes use of a model 94 which evolves hydrodynamically and chemically a 1D atmosphere from first principles.For the escape of H/He, we use extended tables 36 based on a hydrodynamic model accounting for various heating (including XEUV) and cooling mechanisms described in 95 .The two rates from water and H/He escape are summed by weight Ṁesc,KuFo,Jo = Ṁesc,H 2 O,Jo Z + Ṁesc,H/He,KuFo (1 − Z), where Z is the mass fraction of H 2 O in the envelope.When removing mass from the envelope, we leave Z unchanged, that is, we assume no fractionation which is motivated by classical theory for large escape rates 96 .
The third and final relevant variation is in the mechanism for so called 'bloating' where we use an observationally-derived, empirical relation 97 in the nominal model.Bloating is the term used to describe an empirically found increase of planetary radii of mostly hot Jupiters over the expected theoretical values 98 .To reproduce observations, an additional heat source in the deeper interior of the planetary structure is required.We model this as an additional luminosity added to the energy equation at the coreenvelope boundary.In addition to the empirical model 97 , we also explore cases without bloating and with a fit (see the next paragraph) to the results of a physical model of potential temperature advection 99 .
Bloating model using a fit to the potential temperature advection model To include potential temperature advection as a bloating mechanism 99 , we start with a fit of the temperature at 100 bar given a stellar irradiation flux F 99 their Fig. 5.We further subtract 200 K motivated by the results from 3D simulations 100 .We first calculate the entropy, and then using our interior structure models the corresponding bloating luminosity L bloat .
To take into account the mass fraction of heavy molecules (i.e.water) in the envelope Z, we use a Z dependent fit of the form where F is the irradiation flux.To obtain results for continuous values of Z we interpolate linearly between the fits and we further found a linear dependency of the luminosity on planetary mass to obtain the desired entropy.The resulting parameters for equation (2) are listed in Table 1.(f) high-temperature gas including ionization and dissociation of H 2 O 107,108 ; and (g) supercritical fluid at very high pressures including super-ionic phases 109 .The location of the seven regions from individual EoS are shown in Fig. 5. Since there are region boundaries which do not follow a physical phase transition, e.g. between 106 and 109 , AQUA provides interpolated values, to assure a smooth transition in all provided state parameters.The state parameters which AQUA provides for a given pressure and temperature, are density ρ, adiabatic gradient ∆ Ad = ∂ ln T ∂ ln P S , entropy s, internal energy u, bulk speed of sound w, mean molecular weight µ, ionization fraction x ion , dissociation fraction x d , and a phase identifier to identify the corresponding phase.KOBE Kepler Observes Bern Exoplanets (KOBE) 37 is a program that simulates transit surveys of exoplanets2 .If Kepler (or TESS, PLATO, etc.) was to, hypothetically, observe a synthetic population of planets then KOBE identifies those synthetic planets that would have been detected by such a survey.As a result, KOBE allows theoretical models of planet formation (such as the Bern Model used here) to be compared with transit observations.KOBE is organized in three sequential modules.The first module, KOBE-Shadow, finds transiting planets from a synthetic population of planets given their orbital elements.A planet can only transit when its orbit is aligned with respect to a hypothetical observer's line-of-sight.KOBE-Transits, the next module, calculates transit parameters.It applies the detection biases coming from physical limitations; large planets in tight orbits around a quiet star are strongly favored.For comparison to Kepler, transiting planets which transit at least three times and have a signal to noise ratio ≥ 7.1 are potentially detectable.KOBE-Vetter, the final module, applies the completeness and reliability of the Kepler The blue shaded areas show where neighboring regions have to be interpolated to achieve smooth transitions in the state parameters.The time evolution of a sub-Neptune water envelope at 3 days orbital period with an initial mass of 2.7 M ⊕ on top of a rocky 6.3 M ⊕ core is shown in gray.Its inner layers are supercritical at high pressure 109 and the upper layers mostly in the high temperature gas regime 107,108 .

AQUA equation of state
pipeline by emulating Kepler's Robovetter 110 .Transiting planets that are identified as planetary candidates by KOBE-Vetter make up the KOBE-biased catalog which we use for comparison to the Kepler-based CKS results 42 .The planets in KOBE catalog are comparable to the exoplanet population discovered by Kepler.
Th.H. and C.M. supervised the project.7: Orbital period against masses of synthetic planets after 5 Gyr of evolution.Planets observable and non-observable with the Kepler space telescope are shown (KOBE bias applied) as opaque or transparent points respectively.The points are colored as in Fig. 3 and we show in addition the formation tracks of a selection of planets from the different categories as lines.Stationary jumps in mass are caused by giant impacts.Since there is no effect of the lower density material on the planetary mass, the radius valley is not visible in mass space.

B Supplementary Information B.1 Dependency on model assumptions
We investigate the effect of several key model assumptions on the final planet properties.We use the same planetary population after the formation stage but change the used description for the effects of photoevaporation and bloating of the planetary envelopes.

B.1.1 Impact of the evaporation model
As a first test, we see that removing the effect of photoevaporation results in a distribution of radii without a valley.Photoevaporation is a necessary process to populate the envelope-stripped, super-Earth peak (see Fig. 2 in the main text) with planets located close to or within the observed radius valley otherwise.Furthermore, without photoevaporation, planets in the super-Earth regime would commonly contain large amounts of water, which is not realistic at these planetary masses and distances to the star 50 .
To model photoevaporation, we nominally used the tabulated evaporation rates based on recent, detailed radiationhydrodynamic models for H/He 36 and water 35 dominated envelopes.To contrast them to a more simple model, we show in Supp.Fig. 10 in the upper, middle panel the resulting distribution of planetary radii using energy-and radiation-recombinationlimited escape rates 6,11 and further a variant thereof which accounts for an increase in mean molecular weight as a function of the envelope metallicity Z when calculating the base layer of the evaporative flow and a scaling of the efficiency factor ϵ with Z in Eq. 1 in Methods taken from ref. 50 .While the former gives results similar to the nominal evaporation model, the Z-dependent version results in a less pronounced super-Earth peak as less planets lost their primordial envelope.
For reference, we also show the resulting planetary masses in Supp.Fig. 11.A shifted overall peak at higher masses is found in mass space when removing the effect of photoevaporation but no significant differences result based on the different evaporation model.

B.1.2 Impact of bloating
In Supp.Fig. 10, we show panels with the nominal, empirical bloating model 97 (top left), with radius inflation caused by the potential temperature advection mechanism 99 (bottom, left), and without any bloating mechanism at all (bottom, middle).If included, we assume that the mechanism is acting on all planetary masses and we see a small impact of the bloating mechanism on the sub-Neptune population (blue histogram).It is expected that bloating by ohmic dissipation is efficient in sub-Neptunes 111 .Without a bloating mechanism the valley would be more populated by smaller water-rich planets therefore making it less deep and visible.The effect is however too small to be constrained by the current observational data.Both compared bloating models give very similar results.Overall, we find that the presence of a radius valley is robust against uncertain extrapolations of the radius inflation mechanism to lower planetary masses and different compositions.

Figure 1 :
Figure 1: Radius histograms of observed and synthetic planets.The light blue line shows the observed distribution without correction for the observational bias 2 and the gray line the synthetic one using the updated, nominal evolution model with the bias of the Kepler survey applied.We restrict the sample to planets with orbital periods lower than 100 days.Bin counts are normalized by the total number of planets in the different samples.

Figure 2 :
Figure2: Radius histograms of synthetic planets with orbital periods shorter than 100 days for different model assumptions without any bias applied.For each panel, the full distribution is shown in gray and colored histograms are showing different compositional subsets (pure rocky in green, water-rich without any H/He in blue, with some H/He in red).The left panel shows the nominal model, in the middle panel water was assumed to be condensed out in a solid ice layer resistant to evaporation below H/He, and the right panel uses the nominal treatment of water but excludes evaporation.The dotted histogram shows the full distribution from the nominal simulations.As in Fig.1, bin counts are normalized by the total number of planets in the different samples.

Figure 3 :
Figure3: Transit radii as a function of orbital period of observed (left) and the nominal, synthetic population.The data for the observational scatter plot is from the California Kepler Survey42 .The synthetic data is biased (middle panel) using KOBE (see Methods).The rightmost panel shows the unbiased synthetic distribution color coded by their composition (as in Fig.2).

Figure 4 :
Figure 4: Schematic visualization of the four considered scenarios.The location of the observed radius valley is represented as black line while the planetary population of pure silicate and iron bodies (green), water-rich planets (blue), and H/He gas giants (red) are shown as blurred rectangles.Example interior structure schematics are shown with indicators for mixing and escape.We qualitatively found a negative slope of the distributions in all scenarios.We identified in this work that the radius valley is either shaped by a mass-loss mechanism without any water on the planets or by photoevaporation combined with water in supercritical steam envelopes which is in better agreement with the predictions of formation theory, in particular that planets should be undergoing orbital migration.Neither the scenario with condensed water (Scenario II) nor without photoevaporation (Scenario III) reproduce the observed valley.

GPa 101 ,
102,103 .A single EoS which incorporates the many phases of H 2 O and covers the necessary large range in pressure and temperature has not yet been published.All commonly used H 2 O-EoS which cover a large range in pressure and temperature, make significant simplifications in terms of the number of treated phases and the location of the phase transitions.AQUA thus combines seven state of the art EoS into a single tabulated EoS which covers a domain from 0.1 Pa to 400 TPa in pressure and 150 K to 10 5 K in temperature.Each EoS is used in a distinct region in P-T space and the included phases are (a) ice-Ih 102 ; (b) ice-II, ice-III, ice-V, and ice-VI 104 ; (c) ice-VII, ice-VII*, and ice-X 105 ; (d) low temperature gas, low-pressure liquid and low-pressure supercritical fluid 101 ; (e) higher-pressure supercritical fluid 106 ;

Figure 5 :
Figure 5: Phase diagram of H 2 O split into the seven regions, separated by the solid blue lines.In each region a different EoS is used.Most region boundaries follow phase transition curves of H 2 O.The dashed lines are phase transitions that are not region boundaries, meaning the same EoS is used along the phase transition.The blue shaded areas show where neighboring regions have to be interpolated to achieve smooth transitions in the state parameters.The time evolution of a sub-Neptune water envelope at 3 days orbital period with an initial mass of 2.7 M ⊕ on top of a rocky 6.3 M ⊕ core is shown in gray.Its inner layers are supercritical at high pressure109 and the upper layers mostly in the high temperature gas regime107,108 .

Figure
Figure7: Orbital period against masses of synthetic planets after 5 Gyr of evolution.Planets observable and non-observable with the Kepler space telescope are shown (KOBE bias applied) as opaque or transparent points respectively.The points are colored as in Fig.3and we show in addition the formation tracks of a selection of planets from the different categories as lines.Stationary jumps in mass are caused by giant impacts.Since there is no effect of the lower density material on the planetary mass, the radius valley is not visible in mass space.

Figure 8 :
Figure 8: Mass histogram of the nominal simulation with applied Kepler bias.Lines show different groups: all planets (grey), rocky planets (only iron core and silicate mantle, green), planets with water but no H/He (blue), and planets with H/He (red).Dashed lines mark logarithmic mean values of each category and are labeled with mean and standard deviations.The distribution of core masses retrieved by Rogers & Owen 26 (R&O21) in a pure photoevaporation scenario is shown for comparison.

Figure 9 :
Figure 9: Total planetary masses against planetary radii of observed and synthetic planets.The observational data was taken from the NASA Exoplanet Archive (composite data, accessed on Dec 9, 2022) filtering for relative mass, stellar mass, and radius standard errors smaller than 20 % and stellar masses ranging from 0.75 M ⊙ -1.25 M ⊙ .Errorbars show the standard error.We caution that this sample of planets does not come from a homogeneous survey.

Figure 10 :
Figure 10: Histogram of planetary radii for different assumptions in the evolution model.All planets with orbital periods less than 100 d are included.The colored histogram are the same as in Fig. 2.

Figure 11 :
Figure 11: Histogram of planetary masses for different assumptions in the evolution model.The selection of planets and colors are identical to Supp.Fig. 10.

Table 1
34e AQUA equation of state is a collection of seven individual equations of state of H 2 O, which together cover a large domain in pressure and temperature useful to model planetary interiors34.H 2 O is a very peculiar molecule which shows a large number of distinct solid phases and a complex phase diagram.Given its significance in industrial applications, H 2 O is well described by EoS at low pressures, i.e.P < 1