Use of Fibonacci numbers in lipidomics – Enumerating various classes of fatty acids

In lipid biochemistry, a fundamental question is how the potential number of fatty acids increases with their chain length. Here, we show that it grows according to the famous Fibonacci numbers when cis/trans isomerism is neglected. Since the ratio of two consecutive Fibonacci numbers tends to the Golden section, 1.618, organisms can increase fatty acid variability approximately by that factor per carbon atom invested. Moreover, we show that, under consideration of cis/trans isomerism and/or of modification by hydroxy and/or oxo groups, diversity can be described by generalized Fibonacci numbers (e.g. Pell numbers). For the sake of easy comprehension, we deliberately build the proof on the recursive definitions of these number series. Our results should be of interest for mass spectrometry, combinatorial chemistry, synthetic biology, patent applications, use of fatty acids as biomarkers and the theory of evolution. The recursive definition of Fibonacci numbers paves the way to construct all structural formulas of fatty acids in an automated way.

Fatty acids (FAs) are of crucial importance for all organisms and many viruses. They occur, for example, within triglycerides, which serve as energy and carbon stores, and within phospholipids in biomembranes 1 . Many lipids such as diacylglycerol play an important role in cellular signalling. The importance of FAs is further underlined by differences between healthy and diseased cells, so that FAs are medically relevant biomarkers. Several FAs with conjugated double bonds exert inhibitory effects on cancer cells 2 . Many fatty acids and lipids are involved in fungal development and pathogenicity 3,4 .
A high diversity of FAs is beneficial for regulating various properties of membranes such as fluidity 5 , for optimizing packaging within lipoproteins as well as for their signalling function between plants and plant pathogens 6 . While more than 1000 different FAs occur in living organisms, only around 20 FAs are widely found. Palmitic (16:0), oleic (18:1) and linoleic acids (18:2; Fig. 1a) account for around 80% of commodity oils and fats 7 (numbers in parentheses indicate the numbers of carbon atoms and double bonds). There are striking differences between kingdoms and lineages: while higher plants synthesize more than 300 different FAs, higher animals synthesize a far smaller range 8 . Despite the wealth of known FAs, it is not immediately clear whether all theoretically possible variants of a given chain length are really used in living nature because synthesizing all of them in a coordinated way would require many different enzymes.
Most natural FAs have even-numbered chain lengths up to 22 carbon atoms, while some FAs reach chain lengths of more than 30 (e. g., on plant cuticles) 7 . Even-chain FAs are commonly synthesized by condensing and reducing several two-carbon units from acetyl-coenzyme A molecules 1 . Odd-chain FAs occur in low quantities in many different species of microorganisms, plants and some animals 9 . For example, pentadecanoic acid reaches a level of approximately 1% in cow milk fat and is made by bacteria in the rumen 7 . Other examples are margaric acid (17:0), a common constituent of lipids, pelargonic acid (9:0), occurring as esters in pelargonium oil, and valeric acid (5:0), occurring in valerian 7 . Linoleic acid and α -linolenic acid (18:3) are two examples of polyunsaturated FAs (PUFAs) and are essential constituents of the human diet 1 . Less common dietary PUFAs such as eicosapentaenoic acid (20:5) are health-promoting and can be obtained from fish or algae 10 . Allenic FAs (two adjacent double bonds) and cumulenic FAs (three or more adjacent double bonds) are rare in nature due to their decreased stability 11 . At least three allenic FAs have been found in Phlomis (Lamiaceae) 12 . One of them is phlomic acid (20:2) with double bonds at positions 7 and 8.
Propionate and butyrate are short-chain FAs (SCFAs) produced by the microbiome in the human gut 13 . While unbranched side chains are most common, there are some examples of branched FAs such as phytanic acid, a chlorophyll catabolite 14 . Hydroxylated FAs include ricinoleic acid (Fig. 1b), cutin acids, which are the building blocks of the polymer cutin covering the aerial surfaces of plants 7 , and several dihydroxy-octadecadienoic acids (18:2) produced by the fungus Aspergillus nidulans and having effects on sporulation 3 .
The goal of the present study is to elucidate the combinatorial multitude of unbranched FAs when allowing for double bonds. This covers a large range of FAs. We here derive formulas for enumerating FAs as a function of the number of carbon atoms involved. As the number of double bonds and, thus, the number of hydrogens can vary for any given chain length, the different forms are not necessarily isomers, but can be called congeners 15 . Many enumeration techniques in mathematical chemistry are focussed on isomers 16 , although some of these techniques can also be used for counting congeners 17-19 . In recent years, the study of congeners (which have similar structures and may or may not have different sum formulas) has attracted more and more interest 15,20-22 . For example, the congeners of common persistent organic pollutants with at most p different substituents instead of hydrogens were enumerated by a graph isomorphism algorithm 21 . Gutman 22 calls the different Kekulé structures of aromatic compounds congeners. For the present analysis, we make use of the relatively simple chemical structure of unbranched FAs. In each case, we define precisely which congener classes of fatty acids we analyse.
We derive both recursion and explicit formulas for unmodified and two classes of modified FAs. For each of these, we distinguish two cases depending on whether or not cis/trans isomerism is considered. Our analysis not only answers a fundamental question but may also support applications such as lipidomics, a high-throughput technology used for the simultaneous detection and quantification of a large number of lipid species.

Results
Following a broad definition that includes all chain lengths 7 , we here define FAs as straight-chain (unbranched) aliphatic monocarboxylic acids that contain carbon-carbon single or double bonds (Fig. 1a). Further below, we allow for modified FAs including hydroxy and oxo groups. For the sake of easy comprehension, we deliberately build the proof on the recursive definition of Fibonacci numbers and related series rather than on more sophisticated techniques of chemical combinatorics.
Unmodified fatty acids with cis- and trans-isomers combined. In a first approach, we do not count cis- and trans-isomers of unsaturated FAs separately. Allenic and cumulenic FAs are first neglected as well, but will be considered in one of the classes studied below. Let $x_n$ denote the number of theoretically possible, unmodified FAs involving n carbons. For n = 1, we just have the carboxy group linked to one hydrogen, which makes up formic acid (Fig. 1a). For n = 2, there is only one possibility to attach a methyl group to the carboxy group, giving rise to acetic acid. For n = 3, the saturated FA is propionic acid. However, there is also the possibility to insert a double bond, giving rise to acrylic acid (Fig. 1a). Thus, $x_1 = x_2 = 1$ and $x_3 = 2$. A general enumeration procedure can be derived by standard methods from discrete mathematics 23 .
We can code single bonds by 0 and double bonds by 1. As no two double bonds may be adjacent to each other, we look for the number of all binary strings (consisting of 0 and 1 digits) of a given length without consecutive 1 digits. The length is n − 2 because the carbon atom of the carboxy group cannot engage in a carbon-carbon double bond, and the remaining n − 1 carbons are connected by n − 2 bonds. The corresponding number series can be calculated by the recursion formula
$$x_{n+1} = x_n + x_{n-1}, \qquad (2)$$
which, together with the initial values $x_1 = x_2 = 1$, defines the Fibonacci numbers. For the concrete case of fatty acids, the recursion is explained in Fig. 2. For example, the three FAs for n = 4 are butyric acid (saturated, having its name because of presence in milk and butter), crotonic acid and 3-butenoic acid (both unsaturated) (Fig. 1a). Table 1 shows the Fibonacci numbers for n = 1 to 22. The number series is illustrated in Supplementary Fig. 1.
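The binary-string coding lends itself to a quick numerical check. The following minimal sketch is not part of the original study, and its function names are hypothetical. It counts all 0/1 strings of length n − 2 without consecutive 1 digits by brute force and compares the result with the recursion $x_{n+1} = x_n + x_{n-1}$ started from $x_1 = x_2 = 1$.

```python
from itertools import product


def count_by_enumeration(n):
    """Count 0/1 bond strings of length n - 2 that contain no two adjacent 1s."""
    length = max(n - 2, 0)  # n = 1 and n = 2 have no bond that could be double
    return sum(
        "11" not in "".join(bits) for bits in product("01", repeat=length)
    )


def count_by_recursion(n):
    """Iterate x_{n+1} = x_n + x_{n-1} starting from x_1 = x_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


for n in range(1, 11):
    print(n, count_by_enumeration(n), count_by_recursion(n))
```

For n = 4, both functions return 3, in line with butyric, crotonic and 3-butenoic acid. The brute-force variant visits 2^(n−2) strings, so it is only meant for small n.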
Besides the binary string problem mentioned above, there are a number of equivalent problems in mathematics. In graph theory, a matching in a graph is a set of edges without common vertices 26 . That edge set corresponds to the double bonds in fatty acids because the latter must not be adjacent (see also Supplementary Fig. 2). The total number of matchings is the Hosoya index, which is particularly easy to compute for linear graphs and then leads to the Fibonacci series (cf. Supplementary Information) 27 .
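The matching formulation can be checked in the same spirit. The sketch below is an illustration only, under the assumption that an unmodified FA with n carbons is represented by the path graph on carbons 2 to n (that is, n − 1 vertices); `hosoya_index` and `path_edges` are hypothetical helper names. It counts matchings by testing every subset of edges, which is feasible only for small graphs but works for any edge list, including branched ones.

```python
from itertools import combinations


def hosoya_index(edges):
    """Count all matchings (including the empty one) of a graph given as an edge list."""
    total = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            vertices = [v for edge in subset for v in edge]
            # A matching uses every vertex at most once.
            if len(vertices) == len(set(vertices)):
                total += 1
    return total


def path_edges(num_vertices):
    """Edges of a linear (path) graph on vertices 0 .. num_vertices - 1."""
    return [(i, i + 1) for i in range(num_vertices - 1)]


# Assumed mapping: chain length n corresponds to a path on n - 1 vertices.
print([hosoya_index(path_edges(n - 1)) for n in range(1, 9)])
```

With this mapping, the printed values for n = 1 to 8 are the first eight Fibonacci numbers.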
It is known from number theory 24 that an explicit formula can be derived from the recursion formula (see also Supplementary Information):
$$x_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n} - \left(\frac{1-\sqrt{5}}{2}\right)^{n}\right]. \qquad (4)$$
This allows one to calculate the number $x_n$ directly from n without the necessity to compute previous numbers first. Note that the explicit formula involves irrational numbers although all Fibonacci numbers are integers. In fact, the ratio $(1+\sqrt{5})/2 \approx 1.618$ is the legendary Golden section 24,25 . Eq. (4) implies that the numbers of FAs show an asymptotically exponential growth with the basis of 1.618. It is understandable that the basis lies between 1 and 2 because there are two possibilities for each bond, single or double, yet double bonds cannot be adjacent to each other.
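As a hedged numerical illustration, the explicit formula can be compared with the recursion. The sketch assumes that the explicit formula is the standard closed form for Fibonacci numbers (Binet's formula) with $x_1 = x_2 = 1$; the function names are hypothetical, and floating-point arithmetic limits the approach to moderate n.

```python
from math import sqrt


def count_explicit(n):
    """Evaluate the closed form and round to the nearest integer."""
    sqrt5 = sqrt(5)
    golden = (1 + sqrt5) / 2      # Golden section, about 1.618
    conjugate = (1 - sqrt5) / 2   # about -0.618, its powers vanish for large n
    return round((golden**n - conjugate**n) / sqrt5)


def count_recursive(n):
    """Iterate x_{n+1} = x_n + x_{n-1} starting from x_1 = x_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


for n in (1, 2, 3, 4, 18, 22):
    print(n, count_explicit(n), count_recursive(n))
```

Because the second term shrinks towards zero, the count for large n is essentially the first term alone, which is the exponential growth with basis 1.618 described above.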
Figure 2. [...] Larger solid dots, variable chain length. Assume we know all $x_k$ from k = 1 up to k = n and wish to calculate $x_{n+1}$. We can certainly extend the molecule by linking one carbon to the n-th carbon by a single bond (left-hand side). Moreover, we can add two carbons to the molecule with k = n − 1, such that carbons n − 1 and n are linked by a single bond and carbons n and n + 1, by a double bond (right-hand side). Combining the two procedures (starting at n − 1 and at n), we arrive at Eq. (2). There is no overlap between the molecules thus generated because the $x_n$ molecules generated by starting from length n have a single bond at the methyl end, while the $x_{n-1}$ molecules generated by starting from length n − 1 have a double bond at that end. Moreover, all possibilities of extending the molecules according to the defined rules are covered.

An alternative way of proving that the diversity of unmodified fatty acids follows the Fibonacci series is by using a formula derived by Lucas 24 :
$$x_n = \sum_{k=0}^{m} \binom{n-k-1}{k}, \qquad (5)$$
where m equals the largest integer that is less than or equal to (n − 1)/2. In this sum, each term describes the number of possibilities of inserting k double bonds into a chain of length n in such a way that no double bonds are adjacent to each other or next to the carboxy end. Due to those constraints, the choice of positions is made out of n − k − 1 positions. That alternative way of computation is more cumbersome than Eq. (4), though, because sums of binomial coefficients need to be computed. When setting m to an arbitrary value less than the value mentioned above, Eq. (5) can be used to compute the number of FAs with at most m double bonds. In Table 1, we give the numbers, $q_n$, of FAs with at most six double bonds, by way of example.
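The binomial sum is straightforward to evaluate numerically. The following sketch assumes the sum has the form $\sum_{k=0}^{m} \binom{n-k-1}{k}$ with m the integer part of (n − 1)/2, as described in the text; the function name and the optional `max_double_bonds` parameter are hypothetical. Capping the upper summation limit gives the number of FAs with at most that many double bonds.

```python
from math import comb


def count_with_at_most(n, max_double_bonds=None):
    """Sum the binomial terms for k = 0 .. m double bonds in a chain of n carbons.

    Without a cap, m = (n - 1) // 2 and the sum is the n-th Fibonacci number.
    With a cap, only FAs with at most `max_double_bonds` double bonds are counted.
    """
    m = (n - 1) // 2
    if max_double_bonds is not None:
        m = min(m, max_double_bonds)
    return sum(comb(n - k - 1, k) for k in range(m + 1))


print(count_with_at_most(18))       # all unmodified FAs with 18 carbons
print(count_with_at_most(14, 6))    # cap of six double bonds is not yet limiting
print(count_with_at_most(15, 6))    # one structure (seven double bonds) is excluded
```

For n = 15 the cap removes exactly one structure, the single chain with seven double bonds, so the capped count is one less than the Fibonacci number.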
Two examples of unsaturated FAs with six double bonds are docosahexaenoic acid (22:6), which has cardiovascular-protective properties 8 and is a structural component of several human organs, and nisinic acid (all-cis-6,9,12,15,18,21-tetracosahexaenoic acid, 24:6) in fish. Up to n = 14, the series $q_n$ coincides with the Fibonacci numbers $x_n$ because, under the rules above, an FA with 14 carbons can harbour at most six carbon-carbon double bonds. For higher n, the series $q_n$ grows more slowly, as can also be seen in Supplementary Fig. 1. Any other upper bound m on the number of double bonds can be considered in the calculation as well using Eq. (5).

Now, we admit adjacent double bonds, that is, allenic and cumulenic FAs, and denote the number of FAs in this case by $u_n$. Axial stereoisomers are not counted separately here. Both for n = 1 and n = 2, $u_n$ equals 1 because no double bond can be adjacent to the carboxy group. From n = 3 on, the number doubles for each additional carbon atom because there are two possibilities to extend the chain:
$$u_{n+1} = 2u_n. \qquad (6)$$
The series reads
$$u_n = 2^{n-2}, \qquad (7)$$
where the first term $u_1$ needs to be defined separately as 1. In Table 1 and Supplementary Fig. 1, the series is compared to the one defined by Eq. (4).

Table 1 (column headers only): n, $x_n$, $y_n$, $z_n$, $u_n$, $v_n$, $w_n$, $q_n$.

Unmodified fatty acids with cis- and trans-isomers considered separately. When the FAs involve non-terminal double bonds, cis- and trans-isomers can be distinguished. For example, the cis-isomer of crotonic acid is isocrotonic acid (Fig. 1a). This distinction is particularly useful when cis- and trans-isomers exert different biological functions or different effects on the structure of lipid membranes due to their different molecular shape 15 . Here, we exclude allenic and cumulenic FAs. Special attention needs to be paid to FAs with conjugated double bonds such as in the various isomers of conjugated linoleic acid (18:2) or sorbic acid (6:2). As the corresponding double bonds and the single bonds in between form a π-system, the formal single bonds cannot rotate freely. This gives rise to so-called s-cis and s-trans isomers. However, the π interaction in these bonds is weaker than in the formal double bonds so that the isomers equilibrate quickly 28 . Therefore, we will only consider cis- and trans-isomers with respect to double bonds and neglect s-cis/s-trans isomerism here. For n = 1 and n = 2, no double bond is possible. So, the first two numbers in the series are equal to 1. From n = 2 on, there are two cases if we add the (n + 1)-th carbon: (a) There is a single bond at position n. Then we have two possibilities: adding a carbon by a single bond or a double bond. (b) There is a double bond at position n. Then we have again two possibilities: adding a carbon by a single bond in cis configuration or in trans configuration.
As in both cases, the number doubles for each additional carbon atom, we obtain the same series as given in Eqs. (6) and (7).
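The doubling argument can be verified by a weighted brute-force count. The sketch below is an illustration under stated assumptions rather than the authors' procedure: bonds are coded as 0/1 tuples of length n − 2 ordered from the carboxy end to the methyl end, adjacent double bonds are excluded, and every double bond except one at the methyl end is taken to contribute a factor of 2 (cis or trans). The function name is hypothetical.

```python
from itertools import product


def count_cis_trans_separate(n):
    """Count unmodified FAs of n carbons with cis/trans isomers counted separately."""
    length = max(n - 2, 0)
    total = 0
    for bits in product((0, 1), repeat=length):
        if any(a and b for a, b in zip(bits, bits[1:])):
            continue  # allenic and cumulenic FAs are excluded here
        # bits[:-1] drops the bond at the methyl end, which has no cis/trans forms.
        non_terminal_double_bonds = sum(bits[:-1])
        total += 2 ** non_terminal_double_bonds
    return total


for n in range(1, 11):
    expected = 1 if n == 1 else 2 ** (n - 2)
    print(n, count_cis_trans_separate(n), expected)
```

Under these assumptions, the weighted count equals 2^(n−2) for every n ≥ 2 that was tried, the same series as for FAs with adjacent double bonds admitted.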

Modified fatty acids with cis-and trans-isomers combined.
Let us now consider modified FAs, again excluding allenic FAs. Neglecting cis-trans-isomerism first, we start by allowing oxo groups so that one or several carbons can be linked with oxygen atoms by double bonds. An example is acetoacetic acid (n = 4) (Fig. 1b). Biosynthetically, oxidized FAs can be oxylipins (oxidation products of unsaturated FAs) or polyketides 29,30 .
Denoting the number of modified FAs by $y_n$, we obtain the recursion formula (for derivation, see Supplementary Information)
$$y_{n+1} = 2y_n + y_{n-1}. \qquad (8)$$
In contrast to the Fibonacci numbers, we have
$$y_1 = 1, \quad y_2 = 2$$
because an oxo group occurs already in glyoxylic acid (Fig. 1b). Together with these initial conditions, Eq. (8) leads to the series given in the column for $y_n$ in Table 1 and plotted in Supplementary Fig. 1. In mathematics, they are known as the Pell numbers or 2-Fibonacci numbers 24,25,31 and obey the explicit formula
$$y_n = \frac{(1+\sqrt{2})^{n} - (1-\sqrt{2})^{n}}{2\sqrt{2}}$$
(cf. Supplementary Information), where the basis $1+\sqrt{2} \approx 2.414$ is the Silver ratio. They represent one instance of generalized Fibonacci numbers given by a linear combination (other than the sum) of the two preceding numbers 32 .

The same series and formula apply to the case where hydroxy groups are allowed instead of oxo groups. In that case, hydroxy groups as part of an enol moiety are not counted because a rapid equilibrium generally favours the corresponding form with an oxo group (keto-enol tautomerism 28 ). Similarly, geminal diols are easily converted to the corresponding ketones or aldehydes by loss of one water molecule 28 . Thus, they can be considered equivalent to those molecules. Moreover, we exclude the case where n = 1 and a hydroxy group is linked to the only carbon. The corresponding compound, carbonic acid, is an inorganic compound and not considered a FA. When hydroxy groups are included, different stereoisomers (with R and S stereocenters) can also occur, which are not, however, counted separately here.

Now we further extend this analysis to modified FAs that can contain both oxo and hydroxy groups. We obtain the recursion formula (see Supplementary Information)
$$z_{n+1} = 3z_n + z_{n-1}$$
and the initial conditions
$$z_1 = 1, \quad z_2 = 3.$$
This leads to the series $z_n$ in Table 1 given by the explicit formula
$$z_n = \frac{1}{\sqrt{13}}\left[\left(\frac{3+\sqrt{13}}{2}\right)^{n} - \left(\frac{3-\sqrt{13}}{2}\right)^{n}\right]$$
(Supplementary Information, plot in Supplementary Fig. 1). In mathematics, these numbers are called 3-Fibonacci numbers 24,25 . The recursion formula (14) corresponds to another instance of generalized Fibonacci numbers 32 .
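For the two modified classes with cis- and trans-isomers combined, a brute-force count can be set against the generalized Fibonacci recursion. The derivation itself is in the Supplementary Information, so the model below is an assumption made for illustration: carbons 2 to n are linked by single or non-adjacent double bonds, and every carbon that takes no part in a carbon-carbon double bond can be in one of k states (k = 2: unmodified or oxo; k = 3: unmodified, oxo or hydroxy). The function names are hypothetical.

```python
from itertools import product


def count_modified(n, states_per_carbon):
    """Brute-force count under the assumed model described in the lead-in."""
    carbons = n - 1  # carbons 2 .. n; the carboxy carbon is never modified
    if carbons == 0:
        return 1     # formic acid
    total = 0
    for bonds in product((0, 1), repeat=carbons - 1):
        if any(a and b for a, b in zip(bonds, bonds[1:])):
            continue  # no adjacent double bonds
        # A carbon is free if neither of its neighbouring C-C bonds is double.
        free = sum(
            1
            for i in range(carbons)
            if not (i > 0 and bonds[i - 1]) and not (i < carbons - 1 and bonds[i])
        )
        total += states_per_carbon ** free
    return total


def k_fibonacci(n, k):
    """Iterate f_{n+1} = k * f_n + f_{n-1} with f_1 = 1 and f_2 = k."""
    a, b = 1, k
    for _ in range(n - 1):
        a, b = b, k * b + a
    return a


for n in range(1, 8):
    print(n, count_modified(n, 2), k_fibonacci(n, 2),
          count_modified(n, 3), k_fibonacci(n, 3))
```

Under this model, k = 2 reproduces the Pell numbers and k = 3 the 3-Fibonacci numbers for the chain lengths tested; k = 1 falls back to the ordinary Fibonacci numbers of unmodified FAs.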

Modified fatty acids with cis- and trans-isomers considered separately.
The initial values can be derived from the first chemical structures. Although Eq. (14) has recursion depth two, three initial values are needed here (see Supplementary Information). From n + 1 = 4 on, Eq. (14) generates the number series $v_n$ given in Table 1. This is also obtained by Eq. (15) for all positive integers n.
Now we consider the case where oxo groups and/or hydroxy groups occur and use the symbol $w_n$. In the Supplementary Information, both a recursion formula and an explicit formula are derived. With the relevant initial conditions, we obtain the number series given in Table 1, column for $w_n$. Eq. (19) holds from n + 1 = 4 on.
In the special case of vicinal oxo and hydroxy groups, these groups can swap places due to keto-enol tautomerism involving the corresponding enediol intermediate. This reduces the number of really different FAs. We leave it to future studies to consider this effect in enumerating FAs.

Discussion
As a starting point for the enumeration of lipid congeners, we deliberately restricted our analysis to certain classes of FAs. Our goal was to derive enumeration formulas for the biochemically most relevant classes of FAs. For example, unmodified FAs with cis and trans isomers considered separately but neglecting allenic and cumulenic FAs are the basic group of FAs that are usually considered in biochemistry textbooks. We have first shown that the number of unbranched, unmodified FAs with different numbers of double bonds is given by the Fibonacci series when cis/trans isomerism is neglected. Because the proof builds on the recursive definition, it is very short and easy to comprehend. Recursive relations are also often used in enumerating Kekulé structures in chemistry 33,34 . In the special case of polyphenanthrenes 33 and related nanotubes 34 , they lead to Fibonacci numbers as well.
As described above, part of our results can also be derived from the concept of Hosoya index 27 . Nevertheless, to the best of our knowledge, the use of (usual and generalized) Fibonacci numbers for enumerating FAs has not been published before. Extending the present analysis, the Hosoya index can be used also for enumerating branched fatty acids. As shown above, the congeners of fatty acids modified by functional groups can be enumerated by generalized Fibonacci numbers.
Fibonacci numbers are named after Leonardo Pisano, called Fibonacci, although they had been found more than one millennium earlier by scholars in ancient India when studying Sanskrit poems (see Supplementary  Information) 35 . In his book "Liber abaci" from 1202, Fibonacci derived this series by studying the population dynamics of rabbits. For a sketch of his life in medieval Italy, see ref. 16. The Fibonacci series frequently occurs in biology such as in phyllotaxis and secondary structures of proteins 25 . Interestingly, the Fibonacci sequence is also employed to model X-ray diffraction patterns of films of mixed FA salts 36 .
Although known for a long time, the Fibonacci numbers have lost nothing of their fascination, and more and more fields of application are found. Here, we have shown that fatty acids, which are important molecules in our own body, obey that appealing arithmetic. This finding also applies to analogous classes of terminally monosubstituted hydrocarbons such as aldehydes, alcohols or aliphatic amino acids 37 .
In the case of unmodified FAs, considering cis/trans isomerism leads to a simple exponential series with the basis of 2. As far as we know, that observation has not been published before either. This number series occurs in the well-known legend of wheat grains on the chessboard 38 . The legend says that a Brahmin called Sessa, the inventor of chaturanga, the ancestor of chess, was entitled to request a prize from the king. The man asked that on the first square of the chessboard, he would receive one grain of wheat (in some tellings, rice), two on the second one, and so on, doubling the amount each time.
It is somewhat surprising that considering isomerism makes formula (4) for unmodified FAs even easier. For other classes of molecules, it is the other way round. For example, considering stereoisomerism in aliphatic amino acids makes the enumeration more complicated 39 and so does considering cis/trans isomerism in the enumeration of modified FAs.
The ratio of two consecutive Fibonacci numbers tends to the Golden section (Supplementary Information). Therefore, starting from an (unmodified) FA of a given length and investing one more carbon atom, an organism can increase the variability of the FA approximately by the Golden section factor, 1.618, or by 2 when cis-and trans-isomers are counted separately. Furthermore, the fraction of FAs with a terminal single bond is approximately the inverse Golden ratio, 0.618, or 2/3 if cis-and trans-isomers are counted separately (see Supplementary Information).
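Both limits quoted above can be approximated numerically. The sketch is illustrative only and rests on assumptions: for the combined case, the FAs of length n with a terminal single bond are taken to number $x_{n-1}$, as in the recursive construction; for the separate case, the same weighted bond coding is used as in a brute-force count (0/1 tuples ordered from the carboxy end, non-terminal double bonds weighted by 2). The function names are hypothetical.

```python
from itertools import product


def fibonacci(n):
    """x_1 = x_2 = 1, x_{n+1} = x_n + x_{n-1}."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def terminal_single_fraction_separate(n):
    """Fraction of FAs with a single bond at the methyl end (cis/trans separate, n >= 3)."""
    total = single = 0
    for bits in product((0, 1), repeat=n - 2):
        if any(a and b for a, b in zip(bits, bits[1:])):
            continue  # adjacent double bonds are excluded
        weight = 2 ** sum(bits[:-1])  # cis/trans for each non-terminal double bond
        total += weight
        if bits[-1] == 0:
            single += weight
    return single / total


growth_factor = fibonacci(21) / fibonacci(20)             # tends to the Golden section
terminal_single_combined = fibonacci(20) / fibonacci(21)  # tends to its inverse
terminal_single_separate = terminal_single_fraction_separate(12)

print(growth_factor, terminal_single_combined, terminal_single_separate)
```

Already at chain lengths around 20, the first two values agree with 1.618 and 0.618 to several decimal places, and the third is close to 2/3 at n = 12.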
An interesting question is why most FAs used in living organisms have chain lengths of 16 to 22. A biological constraint arises from the thickness of biomembranes, while a physico-chemical constraint is that melting temperatures increase with increasing chain length 40 so that very long FAs might be too rigid to be used in organisms.
Although interesting, it is beyond the scope of this paper to study in detail which of the theoretically possible FAs are used in living organisms. An impressive number but certainly not all FAs are used in reality. For example, the Fibonacci number $x_{18}$ for unmodified FAs is 2,584 and $u_{18} = 2^{16} = 65,536$ (Table 1). Several FAs with n = 18 play a role in biology: stearic acid (18:0), oleic acid and its trans-isomer elaidic acid (one double bond at position 9), linoleic acid (two double bonds at positions 9 and 12) (Fig. 1a), vaccenic acid (one double bond at position 11) and the various isomers of conjugated linoleic acid (two double bonds, e.g. at positions 9 and 11, or 10 and 12). A few of the latter as well as vaccenic acid occur in cow milk 41 .
The following general observations on naturally occurring FAs are worth noting: I. Practically all chain lengths up to about 35 occur in biological systems 7 . II. Comparing the numbers of naturally occurring FAs with the potential numbers, there appear to be peaks at chain lengths of 16 and 18 1 . However, all shorter lengths occur as well, e.g. capric acid (10:0) occurring in coconut oil and goat milk and inhibiting the yeast-to-hyphae transition of the fungus Candida albicans 4 , or cis-2-decenoic acid (10:1) made by Pseudomonas aeruginosa 42 and the various medium-length FAs mentioned above. III. FAs rarely involve more than six conjugated double bonds. Therefore, coloured FAs can only rarely be observed in living organisms. The yellowish colour of some adipose tissues comes from carotenoids 43 . Longer conjugated systems show lower stability and, thus, are sensitive to light. It is believed that methyl branches and terminal cyclohexenyl groups, as observed in carotenoids, contribute to increased polyene stability 44 . Two examples of rare modified FAs with extensive conjugated systems are laetiporic acid and its derivative 2-dehydro-3-deoxylaetiporic acid produced by the basidiomycete Laetiporus sulphureus, with the former FA being its major orange pigment 45 . They include 10 conjugated carbon-carbon double bonds and one methyl branch each, and the latter even has a further, non-conjugated carbon-carbon double bond. IV. Double bonds often occur at a distance of three carbons and then are called homoconjugated. Examples are provided by nisinic acid (24:6, see above) and adrenic acid (all-cis-7,10,13,16-docosatetraenoic acid, 22:4).
Besides the academic interest 39 , a promising field of application of this analysis is lipidomics, in which the entirety of a cell's lipids is studied under different conditions 46 . Lipids and their constituents are most commonly identified by mass spectrometry (MS), and quantification is typically based on comparison of mass-spectrometric ion intensities between individual lipids and suitable standards 47 . Similar to the more advanced proteomics field, the generation of lipidomics data by MS relies on accurate metabolite databases. In analysing the spectra, it is very helpful to know the maximum number of compounds that can potentially appear. Lipid databases are required for high-throughput identification and can also guide fragmentation experiments 48 . The formal description presented here may help refine lipid databases and thereby facilitate automated lipid identification as well as the screening of fungistatic compounds. Moreover, in patent applications related to chemical compounds, it is crucial to know the number of (existing or potential) compounds for which a patent is filed. As outlined in the Supplementary Information, other applications of the presented results concern several aspects of synthetic biology and the understanding of how chemical complexity arose during evolution.
In future studies, it is of interest to derive formulas for larger classes of FAs, using more sophisticated and complex methods. For example, stereoisomers of hydroxylated FAs and the (rarely occurring) FAs involving triple bonds can be studied (for enumeration of amino acids involving triple bonds, see ref. 37). The number of different FAs allowing branching and only carbon-carbon single bonds can be computed in the same way as that of primary alcohols or of aliphatic amino acids only involving single bonds 39 . One suitable method for doing so is based on Pólya's enumeration theorem 49 . Furthermore, the probability of having double bonds at specific positions (such as ω -3, 6, 9) can be studied.
The recursive definition of Fibonacci and generalized Fibonacci numbers paves the way to list all structures successively in silico. With the above-mentioned coding of FAs as binary strings such as 0001010 and appropriate software, this can be translated into the chemical structures.
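A minimal sketch of such an in silico listing is given below. It is an illustration under assumptions that are not taken from the original text: the first digit of a bond string is taken to code the bond between carbons 2 and 3, the last digit the bond at the methyl end, and the structures are written as SMILES strings starting from the methyl end, without stereo descriptors. The function names are hypothetical.

```python
def bond_strings(n):
    """Yield all 0/1 bond strings of length n - 2 without adjacent 1 digits.

    A string is always extended by '0' and is extended by '1' only if it
    does not already end in '1'.
    """
    length = max(n - 2, 0)

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        yield from extend(prefix + "0")
        if not prefix.endswith("1"):
            yield from extend(prefix + "1")

    yield from extend("")


def to_smiles(n, bits):
    """Translate a bond string into a SMILES string (assumed bit order, see lead-in)."""
    if n == 1:
        return "C(=O)O"  # formic acid: carboxy group only
    chain = "C"          # carbon n at the methyl end
    for bit in reversed(bits):
        chain += ("=" if bit == "1" else "") + "C"
    return chain + "C(=O)O"


for bits in bond_strings(4):
    print(bits, to_smiles(4, bits))
```

For n = 4, the loop lists the three structures corresponding to butyric, 3-butenoic and crotonic acid. Under the same assumptions, the example string 0001010 (n = 9) is written as CC=CC=CCCCC(=O)O.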