Open questions in understanding life’s origins

The chemical space of prebiotic chemistry is extremely large, while extant biochemistry uses only a few thousand interconnected molecules. Here we discuss how the connection between these two regimes can be investigated, and explore major outstanding questions in the origin of life.

species, ecosystems, etc. 9,10). Network development may also lead to the emergence of novel phenomena 9: networks may create new graph rewriting rules, for example by creating new phases that alter the kinetics of some reactions. How networks achieve closure and simultaneously bring about "internal causation," in which species are created by catalytically closed reaction networks, is a complex but addressable question. Previous models have demonstrated how these types of emergent systemic properties may have contributed to the origins of life 11,12,13, but models with more precise chemical predictivity are needed.
There are several fundamental problems in understanding the chemical origins of life which require study in the context of networks and their closure (summarized in Fig. 1).
1. Understanding how individual chemical reactions concatenate to expand reaction networks. To understand the transition from prebiotic chemistry to biochemistry, it is important to first understand how complex chemical networks are generated from some kind of "primordial" feedstock. "Wet" chemists approach this problem by doing experiments and defining rules to predict the outcomes of reactions based on these experiments. This approach is the most realistic, but studying the whole chemical possibility space this way could be time consuming, if it is possible at all. A more time-efficient approach is to formalize chemical rules using "graph grammars" 5,6, and to use those rules to predict "real-world" chemistry in silico. However, this type of approach leaves many problems unresolved; for example, important reaction pathways may be excluded because they are not intuited by human or machine-learned screening. The expansion of networks using formalized grammars creates complex reaction networks, but kinetics affect the abundances of products, which in turn influence downstream network dynamics. Reaction networks primarily grow and then self-limit when they run out of feedstocks, not because they are limited by "possibility space".
2. Exploring the relationship between networks and chemically realistic catalysis. As reaction networks grow, they may create new compounds capable of influencing network development by acting as catalysts, which enable new reactions or enhance one or more reactions relative to the rest of the network. The creation of network-influencing catalysts by reaction networks has been addressed by Kauffman's binary polymer model, which examines how catalysts endowed with randomly assigned kinetic enhancement properties affect network closure 12. Unlike in the Kauffman model, however, real catalysis is an inherently three-dimensional problem in which the catalytic molecule interacts with a reactant or transition state to alter the energetics of the reaction. Because of this three-dimensional nature, it is reasonable to assume that catalysis may be transferable to other reactions with similar shape and electrotopological character. It is presently unknown how this type of three-dimensional biasing affects the behavior of reaction networks. Rule-based reaction network expansion has no inbuilt mechanism for discovering this type of catalysis or for estimating its kinetic effects. Reaction-expansion methods presently predict these kinds of reaction feedbacks poorly, including ones that may steer stereochemistry.
3. Understanding the principles of spontaneous phase separation leading to cellularization. Diversity-generating reaction networks may create products capable of producing phase separation, which provides new playgrounds for network growth, including covalent and non-covalent aggregation and compartmentalization, which forms the basis of phenomena like Polymerization-Induced Self Assembly 14. The types of phases prebiotic chemistry is capable of generating, and the nature of their interactions, are likely more complex than generally appreciated 15, and new phases may in turn create novel network dynamics 6.
4. Understanding the origins of activation and group transfer processes. Biochemistry enables formally thermodynamically impossible reactions to occur by coupling activation to environmental energy sources and novel catalytic processes. Network generation assumes that reactions proceed in thermodynamically favored directions, so networks must be fed. How chemical energy was supplied to primitive environments would have affected how prebiotic reaction networks developed. It is unclear how networks adapted to the changing availability of chemical energy provided by the environment before they were able to produce and take advantage of the multiple benefits created by the emergent phenomena discussed above. There was likely a temporal and qualitative order to the development of energy exploitation processes. Using models employing realistic thermodynamics, England 16 has suggested how external energy inputs can drive systems towards increased local order while increasing entropy generation globally.
5. Understanding the origins of hierarchically structurally decoupled catalytic encoding. The central dogma describes how, in contemporary cells, biochemical information flows essentially unidirectionally from genotype to phenotype, from DNA to RNA to protein 17, providing a connection between genetic inheritance, mutation and natural selection. How this mapping arose is perhaps the largest open question in the emergence of life. This information flow is mediated by sophisticated covalent and non-covalent interactions, and may obscure the possibility that there were alternative, earlier flows during early chemical and biochemical evolution.
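The first two problems above can be made concrete with a toy model. The following is a minimal sketch, not the method of any cited work: it assumes a single ligation rule acting on binary strings (loosely in the spirit of Kauffman's binary polymer model), uses a maximum polymer length as an artificial stand-in for feedstock exhaustion, assigns catalysts at random with probability p, and ignores kinetics and three-dimensional structure entirely. The closure test is a simple fixed-point reduction of the kind used to detect reflexively autocatalytic, food-generated sets; it is brought in here as an illustration. All function names are hypothetical.

```python
import itertools
import random


def expand(food, max_len):
    """Apply one rule, ligation A + B -> AB, until no new species appear.

    max_len is an artificial cap standing in for feedstock exhaustion.
    Returns (species, reactions); a reaction is (reactant_a, reactant_b, product).
    """
    species = set(food)
    reactions = set()
    while True:
        new_species = set()
        for a, b in itertools.product(sorted(species), repeat=2):
            product = a + b
            if len(product) <= max_len:
                reactions.add((a, b, product))
                if product not in species:
                    new_species.add(product)
        if not new_species:
            return species, reactions
        species |= new_species


def assign_catalysts(species, reactions, p, rng):
    """Randomly decide, with probability p, whether each species catalyzes each reaction."""
    ordered = sorted(species)
    return {rxn: {m for m in ordered if rng.random() < p} for rxn in sorted(reactions)}


def closure(food, reactions):
    """Species reachable from the food set using the given reactions (catalysis ignored)."""
    reachable = set(food)
    changed = True
    while changed:
        changed = False
        for a, b, product in reactions:
            if a in reachable and b in reachable and product not in reachable:
                reachable.add(product)
                changed = True
    return reachable


def max_closed_subnetwork(food, reactions, catalysts):
    """Largest subset of reactions whose reactants and at least one catalyst
    are all producible from the food set by that same subset."""
    current = set(reactions)
    while True:
        reachable = closure(food, current)
        kept = {
            rxn for rxn in current
            if rxn[0] in reachable and rxn[1] in reachable
            and catalysts[rxn] & reachable
        }
        if kept == current:
            return kept
        current = kept


if __name__ == "__main__":
    food = {"0", "1"}
    species, reactions = expand(food, max_len=4)
    catalysts = assign_catalysts(species, reactions, p=0.05, rng=random.Random(1))
    closed = max_closed_subnetwork(food, reactions, catalysts)
    print(len(species), "species;", len(reactions), "reactions;",
          len(closed), "reactions in the catalytically closed subnetwork")
```

Because catalysis here is assigned by a random draw, the sketch says nothing about which molecules could actually catalyze which reactions. Replacing `assign_catalysts` with a structure-aware estimate is exactly the step that current rule-based expansion lacks.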
Language is one of the "breakout" phenomena which define humanity 18. A language is a codified system of representation in which objects are represented as symbols. "Grammars" are basic rules which allow "syntaxes" (concatenated rules which make sense within a language's rules) to function at each level of representation. The connection between lower-level, precisely defined rules and more abstract language rules means that low-level rules interactively and stochastically construct "meaning" in languages. Chemically, this allows for indirect coupling of structure-based catalysis with larger reflex arcs. Viewed in this light, the phenomenon of hierarchical emergence is reminiscent of the analogical mapping between the genetic code and human language.
Language involves the mental encoding of a concept, followed by expression via speech, followed by auditory or visual reception, and finally mental decoding into received "meaning." Molecular interactions may not be directly "about" anything; molecules react and interact according to rules. The concept of "something being about something else" depends on concatenated rules and systemic context 19, and is a form of meta-catalysis.

Outlook
Undoubtedly, understanding these interlinked and emergent phenomena will benefit from close collaboration between experimental and computational chemists. For example, discovering small molecule catalysts requires algorithms which can evaluate potential intermolecular interactions and their effects on network kinetics. Literature-based prediction currently predicts the effects of such catalysis only poorly; however, medicinal chemists routinely use docking techniques to rapidly screen non-covalent interactions of drug candidates with enzymes 20. These methods may be adaptable to rapid screening for potential catalytic interactions among reaction network products expanded using graph transformation rules.
At the level of understanding phase separation and self-assembly, the molecular properties (e.g., Kow, LogP, LogD, etc.) that enable non-covalent molecular aggregation can be estimated using chemoinformatics techniques 21. However, these methods are presently imprecise since the formation of micelles or vesicles depends on understanding higher order solvent-interaction effects, requiring the use of still other computational techniques. Such effects require their own parameterization and evaluation, but it should be possible to cull the local regions of network chemistry which can give rise to them to save computation time, and careful in vitro screening of large amphiphile libraries would improve the predictivity of computational methods.
Rapid methods for the discovery of meta-catalysis may provide insight into the observed "jumps" evident in evolution 18. Meta-catalysis is indirect, and may help explain the origins of heritable and mutable information coding, which enable the responsiveness and adaptivity of reaction networks to external stimuli. To discover meta-catalytic phenomena, in silico generative networks need to be analyzed using computationally intensive tests at the level of interactional catalysis, which is not presently simple, and then re-evaluated with every other molecule in a network, to see if new "meaningful" hypergraphs are created. The models in turn need careful vetting using wet chemistry.
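A simple count gives a rough sense of why this re-evaluation is costly. The symbols below are introduced here for illustration and are not from the original text: N is the number of species in a network, R the number of reactions, and E the number of species-reaction pairs to be tested if every species is screened as a potential catalyst of every reaction. If one expansion step adds ΔN species and ΔR reactions, the number of additional tests follows directly:

```latex
% N: species, R: reactions, E: species-reaction pairs to test
% \Delta N, \Delta R: species and reactions added by one expansion step
\begin{align}
E &= N R \\
E' - E &= (N + \Delta N)(R + \Delta R) - N R \\
       &= N\,\Delta R + R\,\Delta N + \Delta N\,\Delta R
\end{align}
```

Under these assumptions, the number of new tests scales with the size of the existing network as well as with the number of newly added species and reactions, and each test is itself an expensive interaction calculation.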
The RNA World concept provides an example of what is lacking in origins models. Chemists have provided ever more "prebiotically plausible" syntheses of RNA 22, and SELEX experiments have shown it is easy to isolate and amplify molecules which bind target molecules 23, but a recent computational exploration of nucleic acid space found more than a million possible alternatives to the deoxyribose/ribose backbones that life as we know it uses 24. None of these approaches can presently address how reaction networks make compounds in thermodynamically and kinetically feasible ways, or predict how the resulting products interact to modify the reaction networks which produce them.

Summary
The transitions from reaction mechanisms predetermined by physical chemistry to catalyzed network-pruning reactions, and then to self-sorting, phase-generating reactions and indirect catalysis, form fundamental questions on the origin of life. Both experimental and computational approaches will help in understanding these transitions. However, numerous problems need to be solved before computation can be applied in meaningful, tractable ways. Borrowing and adapting techniques from other disciplines is likely the most straightforward way of making progress in this area. Refining these approaches will help focus studies in experimentally testable ways.