Statistical clumped isotope signatures

High-precision measurements of molecules containing more than one heavy isotope may provide novel constraints on element cycles in nature. These so-called clumped isotope signatures are reported relative to the random (stochastic) distribution of heavy isotopes over all available isotopocules of a molecule, which is the conventional reference. When multiple indistinguishable atoms of the same element are present in a molecule, this reference is calculated from the bulk (≈average) isotopic composition of the involved atoms. We show here that this referencing convention leads to apparent negative clumped isotope anomalies (anti-clumping) when the indistinguishable atoms originate from isotopically different populations. Such statistical clumped isotope anomalies must occur in any system where two or more indistinguishable atoms of the same element, but with different isotopic composition, combine in a molecule. The size of the anti-clumping signal is closely related to the difference between the initial isotope ratios of the indistinguishable atoms that have combined. Therefore, a measured statistical clumped isotope anomaly, relative to an expected (e.g. thermodynamic) clumped isotope composition, may allow assessment of the heterogeneity of the isotopic pools of atoms that are the substrate for the formation of molecules.

Analysis of the isotopic composition of molecules is one of the key tools for studying element cycles on Earth. For the light elements H, C, N and O, which have relatively small heavy-to-light isotope ratios at natural abundance, standard analytical instruments have largely limited isotope analysis to singly substituted isotopocules (isotopically substituted molecules). Studies of multiply substituted isotopocules, referred to as clumped isotopes, were carried out only occasionally, often using isotopically enriched substrates or labeling experiments [1][2][3][4][5]. Recent analytical advances in isotope ratio mass spectrometry [6][7][8] and laser spectroscopy [9] have enabled high-precision measurements of clumped isotopes in several molecules such as CO₂, CH₄, O₂ and N₂O [8,10-15], and the field is rapidly expanding.
Since multiply substituted isotopocules are thermodynamically more stable than singly substituted ones, classical isotope theory predicts small but measurable positive clumped isotope anomalies for most molecules under natural conditions [16][17][18][19]. These clumped isotope signatures depend on temperature, which is the basis of the new field of clumped isotope thermometry [20].
Yeung et al. [14] and Wang et al. [13] reported negative heavy isotope clumping in photosynthetic O₂ formation and in biogenic CH₄, respectively. Yeung et al. [14] attributed the negative Δ values (see Eq. 3 for the definition) in photosynthetic O₂ to different isotopic compositions of the two O atoms, which originate from different sites in the oxygen-evolving complex of photosystem II. Prompted by this observation, we investigated the effect further and show here that negative clumping anomalies are necessarily expected whenever two or more indistinguishable atoms of the same element but with different isotopic composition combine in a molecule. The atoms do not need to share a common bond but can be at distant positions in a molecule.
Yeung [21] recently presented an analysis of such apparent statistical clumped isotope effects in combination with other isotope effects. In this paper, we restrict the analysis to statistical clumped isotope effects and phrase the calculations exclusively in terms of isotope ratios in order to elucidate the general nature of these apparent isotope signatures. We first present the fundamental origin of the apparent statistical clumped isotope effect and visualize it geometrically. We then provide a general mathematical formalism for apparent statistical clumped isotope signatures in any multiple isotope system. Finally, we demonstrate quantitatively how a measured statistical anti-clumping signal can be used to determine the isotopic heterogeneity of indistinguishable atoms in a molecule.

Origin of the statistical negative clumped isotope signatures
We describe and calculate the statistical clumped isotope effects in terms of heavy-to-light isotope ratios ⁱR of individual atoms and molecules, where the index i indicates the mass of the atom or molecule. The same letter R is used for both atomic isotope ratios and molecular isotopocule ratios. In particular, for molecules with multiple heavy isotopes (clumped isotopes), the heavy-to-light isotopocule ratio is defined as

$$^{i}R_{cl} = \frac{\text{amount of multiply substituted isotopocule with mass } i}{\text{amount of light isotopocule}} \qquad (1)$$

When two atoms with atomic heavy-to-light isotope ratios R₁ and R₂ (which can be, e.g., ²H/¹H, ¹³C/¹²C, ¹⁸O/¹⁶O, etc.) combine in a molecule in a purely random manner, i.e., without any isotope effect, the ratio of molecules that include the heavy isotopes of both atoms relative to molecules including only light isotopes is simply the product of the atomic isotope ratios of the two atoms:

$$^{i}R_{cl,random} = R_1 \cdot R_2 \qquad (2)$$

Clumped isotope signatures Δᵢ are then by convention calculated as the relative difference between a certain (measured) clumped isotopocule ratio and the random clumped isotope ratio, usually reported in per mill (‰):

$$\Delta_i = \frac{^{i}R_{cl}}{^{i}R_{cl,random}} - 1 \qquad (3)$$
The apparent negative statistical clumped isotope signatures that we describe in this paper are fundamentally related to this referencing convention, in particular to the choice of the reference ratio ⁱR_cl,random that is required to calculate Δᵢ in Eq. 3. When the isotope ratios R₁ and R₂ of the involved atoms are individually known, ⁱR_cl,random = R₁·R₂ can be calculated exactly. This is always the case when heavy isotopes of different elements clump together (e.g. ¹³C and ¹⁸O in CO or in CO₂). It also holds when atoms of the same element whose individual isotope ratios are known clump together (e.g. ¹⁵Nα and ¹⁵Nβ in N₂O, where α and β indicate the central and terminal positions of the N atoms in the linear NNO molecule, which can be determined independently [22,23]). This is illustrated graphically in Fig. 1, where R₁ and R₂ are plotted on the x- and y-axis and their product, ⁱR_cl,random, is shown as the blue area. When a molecule contains indistinguishable atoms of the same element, it is impossible to determine the individual atomic isotope ratios of these atoms. Nevertheless, the atoms may originate from isotopically distinct populations with isotope ratios R₁ and R₂. To calculate the correct value of ⁱR_cl,random we would therefore need to know R₁ and R₂ individually. Since these isotope ratios cannot be retrieved for indistinguishable atoms, it is common (and reasonable) to assign the bulk (≈ average, see below) isotopic composition to each of the atoms. Through this choice, the real stochastic clumped isotope ratio R₁·R₂ (blue area in Fig. 1) is replaced by the approximation R_av·R_av (red area in Fig. 1). Both areas have the same perimeter, 2·(R₁ + R₂) = 2·(R_av + R_av), but the red square has a larger area than the blue rectangle whenever R₁ ≠ R₂, i.e., R_av·R_av > R₁·R₂.
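The inequality behind Fig. 1 takes one line of algebra, assuming R_av is the arithmetic mean (R₁ + R₂)/2 of the two atomic ratios:

```latex
R_{av}^{2} - R_1 R_2
  = \left(\frac{R_1 + R_2}{2}\right)^{2} - R_1 R_2
  = \frac{R_1^{2} - 2R_1R_2 + R_2^{2}}{4}
  = \left(\frac{R_1 - R_2}{2}\right)^{2} \;\ge\; 0
```

The excess of the red area over the blue area is itself the area of a small square with side (R₁ − R₂)/2, so it vanishes only for R₁ = R₂ and grows quadratically with the difference between the two ratios.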
Replacing the area of the blue rectangle by that of the red square in the denominator of Eq. 3 causes a systematic negative artifact, which produces the apparent negative statistical clumped isotope effect. The more the individual isotope ratios of the indistinguishable atoms differ, the larger the anti-clumping signal. In the following sections we derive the general mathematical formalism to calculate these apparent statistical clumped isotope effects in any multi-isotope system. We also show that measuring statistical anti-clumping in principle allows the heterogeneity of the isotopic composition of indistinguishable atoms in a molecule to be quantified, i.e., the blue area to be reconstructed from the red area in Fig. 1.

Figure 1. The random clumped isotope ratio ⁱR_cl,random (Eq. 3) is the product of the individual isotope ratios R₁ and R₂, indicated by the blue lines and blue area. However, for indistinguishable atoms in a molecule the individual ratios R₁ and R₂ cannot be determined independently, and both are assigned the average ratio R_av (red lines). This leads to the red area for ⁱR_cl,random. The systematic error associated with using the red area instead of the blue area to calculate Δᵢ causes the apparent negative clumped isotope signatures described in this paper.
We emphasize that the apparent anti-clumping signature is not related to a physical isotope effect, but is a mathematical artifact that originates from the referencing convention. It never occurs when the contributing atoms are distinguishable (and thus never for atoms of different elements, e.g. for ¹³C-¹⁸O clumping in CO₂), but it always occurs when two indistinguishable atoms of the same element with different isotopic compositions combine in a molecule (e.g. for ¹⁸O-¹⁸O clumping in CO₂). Table 1 lists a selection of common atmospheric molecules and specific clumping signatures for which statistical anti-clumping does or does not occur.

Clumping of indistinguishable atoms in one molecule.
Molecules with two atoms of the same element (e.g. N₂, O₂, H₂). As mentioned above, two indistinguishable atoms of the same element in one molecule may originate from different reservoirs or be subject to different fractionation effects, such that their isotope ratios R₁ and R₂ represented two distinct pools when the molecule formed. Since the two atoms are now indistinguishable, we cannot measure R₁ and R₂ independently. In fact, conventional isotope ratio measurements of singly substituted isotopocules determine the arithmetic average of the two ratios (e.g. ²⁹R = 2·¹⁵R_av for ¹⁵N measurements in N₂). For rare heavy isotopes (R₁, R₂ ≪ 1), this average ratio R_av is generally very similar to the bulk isotope ratio R_bulk of the sample (see Supplementary Information). In this case, it is common and reasonable to assign R_bulk ≈ R_av to each of the indistinguishable atoms for further calculations. For the remainder of this paper, we use R_bulk = R_av, which considerably simplifies the formulas and removes the dependence of the apparent clumped isotope signal on the absolute isotope ratio. The differences between using R_bulk and R_av are discussed in detail in the Supplementary Information.
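As a quick plausibility check of the statement that R_av and R_bulk are nearly equal for rare isotopes, the following Python sketch compares the two for a pair of ¹⁵N positions. It assumes that the bulk ratio is total heavy atoms divided by total light atoms, with each atom's heavy-isotope fraction x = R/(1 + R); the function names `r_av` and `r_bulk` and the input ratios are illustrative, not taken from the paper or its Supplementary Information.

```python
def r_av(ratios):
    """Arithmetic mean of the atomic heavy-to-light isotope ratios R_i."""
    return sum(ratios) / len(ratios)


def r_bulk(ratios):
    """Bulk ratio of the sample: total heavy atoms / total light atoms.

    Assumption: every position holds exactly one atom, whose heavy-isotope
    mole fraction is R / (1 + R) and light-isotope mole fraction is 1 / (1 + R).
    """
    heavy = sum(r / (1.0 + r) for r in ratios)
    light = sum(1.0 / (1.0 + r) for r in ratios)
    return heavy / light


if __name__ == "__main__":
    # Hypothetical example: two N positions whose 15N/14N ratios differ by 10 %,
    # close to natural abundance (15N/14N is roughly 0.0037).
    ratios = [0.0037, 0.0037 * 1.10]
    print(f"R_av   = {r_av(ratios):.8f}")
    print(f"R_bulk = {r_bulk(ratios):.8f}")
    print(f"R_bulk / R_av - 1 = {r_bulk(ratios) / r_av(ratios) - 1:.1e}")
```

With these inputs the relative difference between R_bulk and R_av is of order 10⁻⁵, far below the per mill-level clumping signals discussed here, and the two coincide exactly when the ratios are identical.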
The stochastically expected (random) ratio of isotopocules with two heavy atoms relative to the light isotopocules, from a population of atoms with average heavy isotope ratio R_av = (R₁ + R₂)/2, is

$$R_{cl,random} = R_{av} \cdot R_{av} = \left(\frac{R_1 + R_2}{2}\right)^{2} \qquad (4)$$

However, when the atoms represent different isotopic pools with possibly different isotope ratios, the real clumped isotope ratio, R_cl, of doubly substituted isotopocules relative to the light isotopocules is

$$R_{cl} = R_1 \cdot R_2 \qquad (5)$$

The apparent statistical clumped isotope signature Δ is the relative difference between the real and the stochastically expected clumped isotope ratio:

$$\Delta = \frac{R_{cl}}{R_{cl,random}} - 1 = \frac{R_1 R_2}{R_{av}^{2}} - 1 = -\left(\frac{R_1 - R_2}{R_1 + R_2}\right)^{2} \qquad (6)$$

This clumped isotope signature is always negative, except for the case R₁ = R₂, for which Δ = 0. Thus, when two atoms of the same element with different isotopic composition combine in a molecule, the resulting molecule always has an apparent negative clumping signature. The black curve in Fig. 2a shows the size of this quadratic statistical negative isotope clumping according to Eq. (6). The negative clumping signal Δ does not depend on the absolute values of the underlying isotope ratios, but only on their relative difference. In the following we refer to this effect as the "statistical clumped isotope signature". Eq. 6 was first derived for the formation of molecular O₂ in photosynthesis by Yeung et al. [14], who indeed observed negative clumped isotope signals relative to the thermodynamically expected values for photosynthetic O₂.
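The two-atom result can be checked numerically. The minimal Python sketch below computes Δ directly from the real ratio R₁·R₂ and the conventional reference R_av², compares it with the closed form −((R₁ − R₂)/(R₁ + R₂))², and shows that scaling both ratios by the same factor leaves Δ unchanged. The function names and the ¹⁸O/¹⁶O values are hypothetical.

```python
def delta_two_atoms(r1, r2):
    """Apparent statistical clumping of two indistinguishable atoms.

    Real clumped ratio R1 * R2 divided by the conventional reference
    R_av * R_av, minus 1 (Delta as a fraction; multiply by 1000 for per mill).
    """
    r_av = (r1 + r2) / 2.0
    return (r1 * r2) / (r_av * r_av) - 1.0


def delta_two_atoms_closed_form(r1, r2):
    """Equivalent closed form: -((R1 - R2) / (R1 + R2))**2."""
    return -((r1 - r2) / (r1 + r2)) ** 2


if __name__ == "__main__":
    # Hypothetical 18O/16O ratios of two O atoms that differ by 5 %.
    r1, r2 = 0.0020, 0.0021
    print(f"{1000 * delta_two_atoms(r1, r2):.3f} per mill")
    print(f"{1000 * delta_two_atoms_closed_form(r1, r2):.3f} per mill")
    # Scaling both ratios by the same factor leaves Delta unchanged.
    print(f"{1000 * delta_two_atoms(10 * r1, 10 * r2):.3f} per mill")
```

All three lines print about −0.595‰: a 5% difference between the two atoms already produces an anti-clumping signal comparable to typical thermodynamic clumping anomalies.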
Generalization: Molecules with three or more atoms of the same element, complete substitution.
When a molecule contains three or more atoms of the same element, these atoms can represent populations with different isotope ratios R₁, R₂, … Rₙ. We first consider the case of full heavy-isotope substitution, which is the generalization of the two-atom case presented above. As it is not possible to measure the individual isotope ratios Rᵢ of the indistinguishable atoms independently, the arithmetic average isotope ratio R_av (≈ R_bulk, see Supplementary Information) is assigned to each of the indistinguishable atoms for further calculations:

$$R_{av} = \frac{1}{n}\sum_{i=1}^{n} R_i$$

As the atoms are all assigned the same atomic isotope ratio R_av, the stochastically expected ratio of fully substituted isotopocules relative to non-substituted isotopocules from this population of atoms is the n-th power of R_av:

$$R_{cl,random} = R_{av}^{\,n}$$
This is the n-dimensional equivalent of replacing the blue rectangle in Fig. 1 by the red square. The real (= observed) ratio of fully substituted isotopocules is the product of all isotope ratios involved, which is identical to the n-th power of the geometric mean R_geo of the isotope ratios:

$$R_{cl} = \prod_{i=1}^{n} R_i = R_{geo}^{\,n}, \qquad R_{geo} = \left(\prod_{i=1}^{n} R_i\right)^{1/n}$$

Thus, the statistical clumped isotope signature for fully substituted isotopocules is

$$\Delta = \frac{R_{cl}}{R_{cl,random}} - 1 = \left(\frac{R_{geo}}{R_{av}}\right)^{n} - 1$$

This equation applies to any set of indistinguishable atoms in a molecule. Since the arithmetic mean is always greater than or equal to the geometric mean, the statistical clumped isotope signal is always negative, except when all ratios Rᵢ are identical, in which case the arithmetic and geometric means are equal and Δ = 0. Figure 2 shows the variation of Δ with the relative difference of the isotope ratios in the 2-, 3-, 4- and 10-atom systems. For the cases illustrated in Fig. 2a, only one isotope ratio is varied, and the isotope ratios of all other atoms are kept constant and identical. For the same relative difference of a single atom's isotope ratio from the mean of all ratios, the magnitude of the clumping signal decreases with increasing number of atoms. Although the 10-atom case may not be of much practical use, it is included to emphasize that statistical negative heavy isotope clumping does not require the heavy isotopes to be linked directly by a common chemical bond. For example, statistical negative D-D clumping in ethane (C₂H₆, Table 1) may involve pairs of hydrogen atoms at any of the 6 positions.
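The full-substitution formula Δ = (R_geo/R_av)ⁿ − 1 can be explored with a short Python sketch. It assumes the Fig. 2a parameterization, in which n − 1 ratios are identical and one ratio deviates by a relative amount d from the mean of all n ratios; the helper `one_ratio_varied` and the chosen values are hypothetical. A Fig. 2b-like scan, in which the fixed ratios themselves differ by 5%, shows that Δ then never reaches zero.

```python
import math


def delta_full(ratios):
    """Apparent clumping of the fully substituted isotopocule of n atoms.

    Real ratio: product of all R_i (= geometric mean ** n).
    Reference:  R_av ** n, with R_av the arithmetic mean.
    """
    n = len(ratios)
    r_av = sum(ratios) / n
    return math.prod(ratios) / r_av ** n - 1.0


def one_ratio_varied(n, d, base=1.0):
    """n-1 identical ratios plus one ratio whose relative difference from the
    mean of all n ratios is d (the x-axis quantity of Fig. 2a)."""
    varied = base * (1.0 + d) * (n - 1) / (n - 1 - d)
    return [base] * (n - 1) + [varied]


if __name__ == "__main__":
    d = 0.10  # one ratio 10 % above the mean of all ratios
    for n in (2, 3, 4, 10):
        print(n, f"{1000 * delta_full(one_ratio_varied(n, d)):.2f} per mill")
    # Fig. 2b-like case: the fixed ratios themselves differ by 5 %, so Delta
    # stays below zero for every value of the varied ratio in the scan.
    worst = max(delta_full([1.0, 1.05, r / 100]) for r in range(80, 131))
    print(f"maximum Delta over the scan: {1000 * worst:.3f} per mill")
```

For n = 2 this parameterization gives exactly Δ = −d², i.e. −10‰ for d = 10%, and the magnitude shrinks as n grows, in line with the trend described for Fig. 2a. Only ratios of ratios enter, so the arbitrary base value of 1.0 does not affect Δ.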
The isotope combinations presented in Fig. 2a all include the point where all isotope ratios are equal, which corresponds to Δ = 0‰. In general, however, the isotope ratios of the atoms at different positions are not equal. Figures 2b,c show the clumping signal for the 3- and 4-atom cases when one ratio is again varied and the other ratios are held constant, but at different values for the individual atoms. Now the situation where all isotope ratios are equal cannot occur, and the parabola-shaped curves are shifted towards negative Δ values. The y-axis offset increases with increasing difference between the individual isotope ratios, i.e., with the heterogeneity of the isotopic composition of the individual indistinguishable atoms. The curves in Fig. 2b,c are selected two-dimensional cross-sections of a multi-dimensional space, which illustrate the effect of varying one of the multiple isotope ratios relative to a fixed set of other ratios. In practice, a given combination of isotope ratios among indistinguishable atoms corresponds to one single value of Δ, and we show below that the statistical clumped isotope signal Δ is a measure of the heterogeneity of the isotopic composition of indistinguishable atoms in a molecule.

Figure 2. In all cases one isotope ratio is varied to produce the relative difference of this ratio from the mean of all isotope ratios (x axis scale). In (a) and for the solid lines in (b,c), the other isotope ratios are fixed and identical. This case includes the situation where all isotope ratios are equal and Δ = 0‰. In (b) (3 indistinguishable atoms) and (c) (4 indistinguishable atoms) the solid lines are the same curves as in (a). The dashed lines show the case where one of the fixed isotope ratios is increased by 5% relative to the others, and the dotted lines show the case where one isotope ratio is increased by 10%.

Scientific Reports | 6:31947 | DOI: 10.1038/srep31947

Molecules with three or more atoms of the same element, incomplete substitution.
Clumping of two heavy isotopes in molecules with three indistinguishable atoms (e.g. ¹⁸O-¹⁸O or ¹⁷O-¹⁷O clumping in O₃ or NO₃). As a first example we consider molecules with three indistinguishable atoms and calculate the clumping signature of doubly substituted isotopocules. The three atoms generally represent three isotopically different populations with isotope ratios R₁, R₂ and R₃. However, as the atoms are indistinguishable, the individual ratios cannot be determined, and for further calculations they are assigned the average isotope ratio, which can be determined from measurement of the singly substituted isotopocules:

$$R_{av} = \frac{R_1 + R_2 + R_3}{3}$$

The stochastically expected isotope ratio of isotopocules with exactly two out of three possible heavy isotopes relative to the light isotopocules, from a population of indistinguishable atoms with assigned isotope ratio R_av, is

$$R_{cl,random} = 3\,R_{av}^{2}$$

The factor 3 gives the number of possible permutations of two heavy isotopes over three atom positions. The real probability of finding a molecule with exactly two heavy atoms from the three atoms with isotope ratios R₁, R₂ and R₃ is

$$R_{cl} = R_1R_2 + R_1R_3 + R_2R_3$$

Thus, the statistical clumped isotope signal for clumping of two out of three possible heavy isotopes, Δ_2/3, is

$$\Delta_{2/3} = \frac{R_1R_2 + R_1R_3 + R_2R_3}{3\,R_{av}^{2}} - 1 = -\frac{(R_1 - R_2)^2 + (R_1 - R_3)^2 + (R_2 - R_3)^2}{18\,R_{av}^{2}}$$

Clumping of two heavy isotopes in molecules with four indistinguishable atoms (e.g. D-D clumping in CH₄). We now consider molecules with four indistinguishable atoms and calculate the clumping signature of doubly substituted isotopocules. The four atoms generally represent isotopically distinct pools with isotope ratios R₁, R₂, R₃ and R₄. As the ratios cannot be determined individually, they are assigned the average atomic isotope ratio

$$R_{av} = \frac{R_1 + R_2 + R_3 + R_4}{4}$$

The stochastically expected probability of forming a molecule with exactly two out of four possible heavy isotopes from a population of indistinguishable atoms with assigned isotope ratio R_av is

$$R_{cl,random} = 6\,R_{av}^{2}$$

where the factor 6 is the number of ways to place two heavy isotopes on four atom positions. However, the real probability of forming an isotopocule with exactly two heavy atoms from the four atoms with different isotope ratios R₁, R₂, R₃ and R₄ is

$$R_{cl} = R_1R_2 + R_1R_3 + R_1R_4 + R_2R_3 + R_2R_4 + R_3R_4$$

so that

$$\Delta_{2/4} = \frac{R_{cl}}{6\,R_{av}^{2}} - 1 = -\frac{1}{48\,R_{av}^{2}}\sum_{i<j}(R_i - R_j)^2$$

Since Δ can again be expressed as a negative sum of squares, it is always negative, except for R₁ = R₂ = R₃ = R₄, where Δ_2/4 = 0. Some examples of the statistical clumped isotope effect of two out of four heavy isotopes in a molecule are shown in Fig. 3b. An important example of this case is D-D clumping in methane. The reservoirs that supply the different hydrogen atoms in the formation of methane can vary considerably, and significant apparent statistical anti-clumping is therefore expected.
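The negative sum-of-squares form generalizes to two heavy isotopes among any number n of indistinguishable atoms. The derivation below is a sketch that uses only the definitions of this section (R_av as arithmetic mean, a reference ratio of C(n,2)·R_av², and R_cl as the sum of pairwise products); the final rewrite in terms of the sample standard deviation s of the Rᵢ is an added step, not an equation taken from the paper.

```latex
\begin{aligned}
R_{cl} &= \sum_{i<j} R_i R_j, \qquad
R_{cl,random} = \binom{n}{2} R_{av}^{2} = \frac{n(n-1)}{2}\, R_{av}^{2} \\
n^{2} R_{av}^{2} &= \Big(\sum_i R_i\Big)^{2} = \sum_i R_i^{2} + 2\sum_{i<j} R_i R_j \\
\sum_{i<j} (R_i - R_j)^{2} &= (n-1)\sum_i R_i^{2} - 2\sum_{i<j} R_i R_j
  = (n-1)\, n^{2} R_{av}^{2} - 2n \sum_{i<j} R_i R_j \\
\Rightarrow\quad \Delta_{2/n} &= \frac{R_{cl}}{R_{cl,random}} - 1
  = -\frac{\sum_{i<j} (R_i - R_j)^{2}}{n^{2}(n-1)\, R_{av}^{2}}
  = -\frac{1}{n}\left(\frac{s}{R_{av}}\right)^{2}
\end{aligned}
```

For n = 2 this reduces to −((R₁ − R₂)/(R₁ + R₂))², and for n = 3 and n = 4 the denominators become 18 R_av² and 48 R_av². The last step uses the identity Σ_{i<j}(Rᵢ − Rⱼ)² = n(n − 1)s².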

Figure 3. Statistical heavy isotope clumping of two out of three (a) and two out of four (b) heavy isotopes.
In all cases shown, the isotope ratio of one atom is varied relative to the mean of the other ratios as indicated on the x axis, and the other ratios are fixed. In the base case (solid line), the isotope ratios of the other atoms are identical; this includes the situation in which all isotope ratios are identical (Δ = 0‰). In case a (dashed line) one of the fixed isotope ratios is increased by 5%, in case b (dotted line) one of the fixed isotope ratios is increased by 10%, and in case c (long-dashed line) one of the fixed isotope ratios is increased by 10% and another is decreased by 10%.
For the general case of n indistinguishable atoms that represent isotopic pools with isotope ratios R₁, R₂, … Rₙ, we again assign the arithmetic mean isotope ratio R_av to each of the atoms. The stochastically expected ratio of isotopocules with exactly m out of a possible n heavy atoms relative to the light isotopocules, from a population of indistinguishable atoms with this average heavy isotope ratio R_av, is

$$R_{cl,random} = \binom{n}{m} R_{av}^{\,m}$$

The real ratio of isotopocules with m heavy isotopes relative to non-substituted isotopocules (R_cl) from n atoms with isotope ratios R₁, … Rₙ is the sum of the products of the isotope ratios over all combinations of m of the n atoms:

$$R_{cl} = \sum_{1 \le i_1 < i_2 < \cdots < i_m \le n} R_{i_1} R_{i_2} \cdots R_{i_m}$$

and the statistical clumped isotope signal is

$$\Delta_{m/n} = \frac{R_{cl}}{R_{cl,random}} - 1$$

[…] Table 1). However, it is also possible that the heavy ¹⁷O and ¹⁸O isotopes clump together […]. The stochastically expected (random) ratio of molecules with one ¹⁷O and one ¹⁸O isotope relative to ¹⁶O¹⁶O from this population of O atoms is

$$^{35}R_{cl,random} = 2\,{}^{17}R_{av}\,{}^{18}R_{av}$$

The real probability of forming ¹⁷O¹⁸O is

$$^{35}R_{cl} = {}^{17}R_1\,{}^{18}R_2 + {}^{18}R_1\,{}^{17}R_2$$

so that

$$^{35}\Delta = \frac{^{35}R_{cl}}{^{35}R_{cl,random}} - 1 = -\frac{({}^{17}R_1 - {}^{17}R_2)({}^{18}R_1 - {}^{18}R_2)}{({}^{17}R_1 + {}^{17}R_2)({}^{18}R_1 + {}^{18}R_2)}$$

Therefore, the signature ³⁵Δ is always < 0 when the ¹⁷O and ¹⁸O ratios of the two atoms differ in the same direction (³⁵Δ = 0 if ⁱR₁ and ⁱR₂ are identical). This can also be shown by inserting the mass-dependent fractionation relation as follows: […]

As the atoms do not need to share the same bond to generate statistical isotope clumping, the equation also applies to ¹⁷[…]. For three indistinguishable atoms, the real ratio of isotopocules with one ¹⁷O and one ¹⁸O is

$$R_{cl} = {}^{17}R_1\,{}^{18}R_2 + {}^{17}R_2\,{}^{18}R_1 + {}^{17}R_1\,{}^{18}R_3 + {}^{17}R_3\,{}^{18}R_1 + {}^{17}R_2\,{}^{18}R_3 + {}^{17}R_3\,{}^{18}R_2$$

and thus […]

The corresponding values of Δ are shown in Fig. 5 together with all other possible isotope clumping combinations for ¹⁷O and ¹⁸O […]. The real clumping is calculated by considering explicitly all 3 configurations

$$^{52}R_{cl} = {}^{17}R_1\,{}^{17}R_2\,{}^{18}R_3 + {}^{17}R_1\,{}^{18}R_2\,{}^{17}R_3 + {}^{18}R_1\,{}^{17}R_2\,{}^{17}R_3$$

and thus […] (Table 1). In many cases, however, the isotopic composition of O₃ is determined without position information [30][31][32][33], and in this case the atoms are effectively indistinguishable and need to be treated according to the formalism developed here.

[…] Figures 2 and 3, respectively). In all scenarios two isotope ratios are kept fixed and the third isotope ratio is varied. In (a) the two fixed isotope ratios are identical, which includes the situation in which all isotope ratios are identical and Δ = 0‰; in (b) one of the fixed isotope ratios is 10% higher than the other.

Figure 7. Difference between the analytical curves shown as black lines in Fig. 6 and the calculated value of Δ for the randomly generated sets of isotope ratios (colored points in Fig. 6). For each multi-isotope system, the outer envelope of the points gives the error within which the apparent statistical isotope clumping can be predicted for a certain set of isotope ratios when the relative standard deviation (stdev/average) of the isotope ratios of the indistinguishable atoms is known.
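The m-out-of-n formalism and the mixed ¹⁷O-¹⁸O case lend themselves to a numerical cross-check. In the Python sketch below, R_cl for exactly m heavy isotopes is computed as the elementary symmetric polynomial of the ratios (a standard dynamic-programming recurrence) and verified against brute-force enumeration of all m-atom combinations; the reference is C(n, m)·R_avᵐ. For one ¹⁷O and one ¹⁸O among n atoms, the sketch assumes a reference of n(n − 1)·¹⁷R_av·¹⁸R_av, which reproduces the factor 2 for n = 2 and the six configurations for n = 3. All function names, input ratios, and the mass-dependent exponent 0.52 are illustrative assumptions.

```python
from itertools import combinations, permutations
from math import comb, prod


def elementary_symmetric(ratios, m):
    """Sum of products of the ratios over all m-element subsets (e_m).

    This is the real clumped ratio R_cl for exactly m heavy atoms.
    """
    e = [1.0] + [0.0] * m  # e[k] = e_k of the ratios processed so far
    for r in ratios:
        for k in range(m, 0, -1):  # descending k so each atom is used once
            e[k] += r * e[k - 1]
    return e[m]


def delta_m_of_n(ratios, m):
    """Delta for exactly m heavy isotopes among n indistinguishable atoms."""
    n = len(ratios)
    r_av = sum(ratios) / n
    return elementary_symmetric(ratios, m) / (comb(n, m) * r_av ** m) - 1.0


def delta_m_of_n_bruteforce(ratios, m):
    """Same quantity by explicit enumeration of all m-atom combinations."""
    n = len(ratios)
    r_av = sum(ratios) / n
    r_cl = sum(prod(c) for c in combinations(ratios, m))
    return r_cl / (comb(n, m) * r_av ** m) - 1.0


def delta_one17_one18(r17, r18):
    """One 17O and one 18O among n indistinguishable O atoms.

    Real ratio: sum over ordered pairs (i != j) of 17R_i * 18R_j.
    Reference: n(n-1) * 17R_av * 18R_av (n(n-1) placements of the two isotopes).
    """
    n = len(r17)
    r_cl = sum(r17[i] * r18[j] for i, j in permutations(range(n), 2))
    return r_cl / (n * (n - 1) * (sum(r17) / n) * (sum(r18) / n)) - 1.0


if __name__ == "__main__":
    # Hypothetical 18O/16O ratios of four indistinguishable atoms.
    ratios = [0.00200, 0.00205, 0.00196, 0.00210]
    for m in (2, 3, 4):
        print(m, f"{1000 * delta_m_of_n(ratios, m):.4f} per mill")
    # Mass-dependent 17R ~ 18R**0.52 (exponent assumed, typical for oxygen).
    r18 = [0.00200, 0.00210]
    r17 = [0.00038 * (r / 0.00200) ** 0.52 for r in r18]
    print(f"35-Delta = {1000 * delta_one17_one18(r17, r18):.4f} per mill")
```

The recurrence avoids enumerating all C(n, m) subsets, although for the small n relevant here either approach is fine. With mass-dependent ratios, ¹⁷R and ¹⁸R of each atom deviate in the same direction, and the computed ³⁵Δ is negative.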
Statistical clumped isotope signals and the heterogeneity of the isotope ratios of indistinguishable atoms. The derivations above show that the statistical combination of indistinguishable atoms in a molecule leads to apparent negative clumped isotope signals. The size of the apparent negative clumping in the molecule increases with increasing difference in isotopic composition between the individual atoms, so it may actually contain scientifically relevant information. To investigate this further, we created random (within a certain range) sets of isotope ratios for all multi-isotope systems with up to 5 indistinguishable atoms and calculated the apparent negative multi-isotope clumping signature Δ. Figure 6 shows that, for each multi-isotope system, all these random sets of isotope ratios yield Δ values that fall on distinct curves when Δ is plotted versus the relative standard deviation of the individual isotope ratios. For each of the multi-isotope clumping combinations with m out of n heavy isotopes, the curves can be parameterized as:

[…] (35)

Note that the Δ values in Fig. 6 are plotted in ‰, so the factor 1/2 in Eq. 35 corresponds to 500‰ in the fit curves.
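The random-set experiment can be imitated with a short Python sketch. It draws random sets of isotope ratios, computes Δ exactly for m out of n heavy isotopes, and compares the result with a second-order expansion Δ ≈ −½·m(m − 1)/n·(stdev/average)², where stdev is the sample standard deviation. This expansion is our own derivation from the m-out-of-n formulas and is added as an assumption for illustration. It shares the factor 1/2 and is exact for m = 2, but it is not presented as the paper's Eq. 35. The ±5% spread, the seed and the base ratio are arbitrary choices.

```python
import random
from itertools import combinations
from math import comb, prod
from statistics import mean, stdev


def delta_exact(ratios, m):
    """Delta for exactly m heavy isotopes among n indistinguishable atoms."""
    n = len(ratios)
    r_cl = sum(prod(c) for c in combinations(ratios, m))
    return r_cl / (comb(n, m) * mean(ratios) ** m) - 1.0


def delta_second_order(rel_sd, n, m):
    """Second-order expansion in the relative deviations R_i / R_av - 1.

    Assumption: derived here from the m-out-of-n formulas, using the sample
    standard deviation; exact for m = 2, approximate for m > 2.
    """
    return -0.5 * m * (m - 1) / n * rel_sd ** 2


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    for n, m in [(2, 2), (4, 2), (4, 3), (5, 5)]:
        worst = 0.0
        for _ in range(1000):
            # Random set: ratios within +/-5 % of a base 18O/16O ratio.
            ratios = [0.002 * (1 + rng.uniform(-0.05, 0.05)) for _ in range(n)]
            rel_sd = stdev(ratios) / mean(ratios)
            exact = delta_exact(ratios, m)
            approx = delta_second_order(rel_sd, n, m)
            worst = max(worst, abs(exact / approx - 1.0))
        print(f"n={n} m={m}: max relative deviation {worst:.2e}")
```

For m = 2 the deviation stays at floating-point level, while for m > 2 a small scatter remains. This mirrors the behavior described for the fit curves, because higher-order terms in the relative deviations then contribute.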
In the case of m = 2 (2 heavy isotopes in any multi-isotope system) the fit is perfect, but when more heavy atoms clump in one molecule there is some scatter around these fit lines. This originates from the fact that there is no fixed analytical relation between the arithmetic and geometric means. Figure 7 shows the relative deviation between the explicitly calculated Δ values and the approximation using Eq. 35 for the randomly chosen sets of isotope ratios with known standard deviation and average values. The outer envelopes of the point clouds for each multi-isotope system define the error with which the statistical clumped isotope signal Δ in a molecule can be predicted from Eq. 35 when the standard deviation and the mean of the individual isotope ratios are known.

Figure 8. Relative difference of the heterogeneity (relative standard deviation) of the isotope ratios of indistinguishable atoms calculated using Eq. 38 from the true heterogeneity of the randomly chosen sets of isotope ratios. For each multi-isotope system, the outer envelope of the points gives the error within which the heterogeneity of indistinguishable atoms in a certain multi-isotope system can be calculated from the apparent statistical isotope clumping signal Δ.
Scientifically, the inverse relation is more attractive, since the isotopic variability among indistinguishable atoms in a molecule is usually not known, but Δ may be measurable [14]. This means that measurement of the statistical clumped isotope anomaly Δ could provide a novel tracer to determine the heterogeneity (quantified by the standard deviation) of the isotopic pools of indistinguishable atoms in a molecule. For example, in the 2-isotope system O₂, a Δ value 1.5‰ below the thermodynamically expected value, as measured by Yeung et al. [14], would indicate a relative standard deviation of about 5.5% in the isotope ratios of the two O atoms (Fig. 6, blue curve). Figure 8 shows the relative error that is made when Eq. 36 is used to calculate the relative standard deviation of the isotope ratios from the Δ value for randomly chosen sets of (known) isotope ratios. The outer envelope for each isotope system quantifies the error with which (stdev/average) for a group of indistinguishable atoms can be derived from Δ. For Δ values above −1.5‰ the relative error in (stdev/average) is generally below 1%, and for Δ values above −5‰ it is still only 2%. Thus, although we are not able to measure the isotopic composition of individual indistinguishable atoms, the apparent statistical isotope clumping provides a means to obtain information about the heterogeneity of the isotope ratios with quite good precision.
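A minimal Python sketch of this inversion, assuming the second-order relation Δ ≈ −½·m(m − 1)/n·(stdev/average)² with the sample standard deviation (our own expansion of the m-out-of-n formulas, exact for m = 2, and not necessarily identical to the paper's Eq. 36), reproduces the O₂ example above. The function name `rel_sd_from_delta` is hypothetical.

```python
import math


def rel_sd_from_delta(delta, n, m):
    """Estimate stdev/average of the ratios from a statistical Delta (fraction).

    Inverts Delta = -(1/2) * m * (m - 1) / n * (stdev / average)**2, which is
    exact for m = 2 and a second-order approximation otherwise (assumption).
    """
    if m < 2:
        raise ValueError("clumping needs at least two heavy isotopes (m >= 2)")
    if delta > 0:
        raise ValueError("a purely statistical clumping signal cannot be positive")
    return math.sqrt(-2.0 * n * delta / (m * (m - 1)))


if __name__ == "__main__":
    # O2: n = m = 2, statistical Delta of -1.5 per mill (Yeung et al. [14]).
    print(f"stdev/average = {100 * rel_sd_from_delta(-1.5e-3, n=2, m=2):.1f} %")
```

For O₂ this gives stdev/average = √0.003 ≈ 5.5%, matching the value quoted in the text. In practice, Δ would first have to be corrected for the thermodynamic and any kinetic contributions before being inverted.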

Conclusions
The statistical combination of indistinguishable atoms with different isotope ratios in a molecule always leads to apparent negative clumped isotope signals. We emphasize the term apparent, because this signal does not reflect a physical negative clumping process. The underlying reason is that, in calculating the stochastic reference value for Δ, the actual isotopic composition at each individual atom position is replaced by the average isotopic composition of all indistinguishable atoms (which is similar to the bulk isotopic composition of the molecule for small isotope ratios). Thus the apparent statistical Δ is by nature an artifact originating from our inability to measure the isotope ratios of indistinguishable atoms individually. Using the formalism presented in this paper, this apparent statistical clumping signal can be calculated for any multi-isotope system.
We have calculated here the pure apparent statistical heavy isotope clumping values. In nature these apparent clumping signals will always occur in combination with thermodynamic heavy isotope clumping and possibly other kinetic isotope effects that lead to clumped isotope anomalies (Yeung [21]). Statistical clumped isotope signatures will occur whenever two or more indistinguishable atoms with different isotopic compositions clump together in a molecule. For isotope heterogeneities of a few percent, the effect is of the same order of magnitude as the thermodynamic effects in many molecules. It is therefore important to take these effects into account when interpreting the isotopic clumping of indistinguishable atoms in nature. Furthermore, when the statistical clumping signature can be separated from other contributions, its magnitude provides quantitative information on the heterogeneity of the isotopic composition of the indistinguishable atoms in a molecule.