A robotic prebiotic chemist probes long term reactions of complexifying mixtures

To experimentally test hypotheses about the emergence of living systems from abiotic chemistry, researchers need to be able to run intelligent, automated, and long-term experiments to explore chemical space. Here we report a robotic prebiotic chemist equipped with an automatic sensor system designed for long-term, autonomous chemical experiments exploring unconstrained multicomponent reactions. The system collects mass spectrometry data from over 10 experiments, with 60 to 150 algorithmically controlled cycles per experiment, running continuously for over 4 weeks. We show that the robot can discover the production of high-complexity molecules from simple precursors, as well as deal with the vast amount of data produced by a recursive and unconstrained experiment. This approach represents what we believe to be a necessary step towards the design of new types of Origin of Life experiments that enable testable hypotheses for the emergence of life from prebiotic chemistry.


Hardware
The automated platform system consists of four main elements: the input system, the reactor, the inline sample system and the sample storage. All elements are run by modular Python code written in-house. A schematic of the whole setup is shown in the manuscript as Figure 2, so here we only elaborate on specific elements of the setup.
The input system consists of three Tricontinent syringe pumps, each connected to a Tricontinent rotary 6-way distribution valve. Every valve is connected to six starting material bottles, resulting in a library of 18 input choices. Liquid is transported through 1/8" (3.2 mm) PTFE tubing with PEEK fittings.
The reactor is a 100 mL round-bottom flask with a specifically tailored head that allows the attachment of screw fittings for the tubing connections and a reflux condenser. The reactor is kept under a positive pressure of nitrogen gas, controlled by a flow controller (model 0254 by Brooks Instrument), ensuring a controlled reaction atmosphere. The temperature is controlled via an IKA RET 'control-visc' hotplate with a USB interface.
For the in-line analytical system, shown in Supplementary Figure 1 below, an HPLC-DAD (diode array detector) system is coupled to an Advion L-series benchtop mass spectrometer.

Supplementary Figure 1:
In-line sampling and analytical system with the external sample loop on the left, the HPLC-DAD in the middle followed by a split valve, and the ESI-MS on the right.

The sample is drawn from the reactor by a peristaltic pump (Vapourtec model SF-10) and filtered through two 0.2 µm nylon syringe filters fitted between a pair of Luer (male) and flat-bottom (female) ETFE/polypropylene adapters. It is injected into a 16 cm sample loop on an external, directly controlled, HPLC six-port switching valve. The sampling is triggered through the Python code and actuated through an Arduino Mega 2560/RAMPS combination. Once the loop is loaded with fresh sample, the external sample valve is switched to flush the loaded sample loop with mobile phase from the LC system and deliver the sample directly onto the column. After the column, the sample moves through the diode array detector (DAD) and then into a split valve, which reduces the flow from 0.5 mL/min to 0.2 mL/min (the excess going to waste), ready for direct injection into the electrospray ionisation mass spectrometer (ESI-MS). The MS acquisition is triggered through a contact closure from the HPLC system, ensuring no delay in the parallel run time of both instruments.

The sample storage system (Supplementary Figure 2), which allows the storage of up to 20 samples of 50 mL each in plastic sampling tubes, is built around an in-house designed, 3D-printed wheel. This wheel is based on a Geneva drive mechanism, allowing step-by-step incrementing of positions with high accuracy. The box in which the wheel system is contained is built from custom-cut V-Slot aluminium rails. The column holding the driven wheel, the drive wheel that increments the driven wheel, the stepper motor securing element, the supporting levelling arches and the solution dispensing part are all 3D printed on a Connex 500 printer with the translucent material RGD720. The base plate and the vial plate holding the 50 mL plastic sample tubes are laser cut from acrylic plate. The levelling arches are each fitted with an 8 mm ball bearing, and the NEMA 11 stepper motor that increments the drive is controlled through an Arduino Mega.

Supplementary Figure 2:
Custom-built sample storage system providing space for up to 20 samples of up to 50 mL each, collected in plastic centrifuge tubes. The lids of the tubes (orange) sit on the acrylic plate and hold the vials in place. The 3D-printed parts are shown in blue.

Dilution vs. product persistence
In this study we are interested in the persistence of products, not of input reagents. If the experimental concept were linear and solely based on periodically changing input compounds, every product would become diluted over time. However, our experimental concept is different: with the inclusion of minerals, the reaction system is not linear. Over time, many of the components of the mixture will be diluted to infinitesimally low concentrations, but not all. Binding to mineral particles can lead to product build-up on their surfaces, which may then lead to species amplification, as well as a change of the species concentrations in the reactor over time. Products in our system can be made from multiple reactions and input compositions, including the breakdown components of other product species, which can lead to some specific product species becoming prevalent even under different input conditions.
The critical question is therefore not what happens to the reagents upon serial dilution, but rather which products are formed and whether those products are formed robustly from the reaction mixture or only marginally. Those that are formed robustly (e.g. from multiple sets of reagent combinations) will persist (be amplified) in the face of dilution. When specific products persist through many reaction cycles, we say they have been amplified by the reaction, while all species that do not increase will be diluted out.
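As a minimal numerical illustration of this point (the replaced volume fraction and the per-cycle regeneration amount below are hypothetical), compare a species that is only diluted with one that is regenerated each cycle:

import itertools

# Sketch with hypothetical parameters: a fraction f of the reactor volume
# is replaced each cycle. A passive species only dilutes away, while a
# robustly formed product is regenerated by an amount r every cycle and
# therefore approaches the steady-state value r / f instead of vanishing.
f = 0.5                      # fraction of volume replaced per cycle (assumed)
r = 1.0                      # regeneration per cycle, arbitrary units (assumed)
passive, robust = 1.0, 1.0   # starting concentrations, arbitrary units

for cycle in itertools.islice(itertools.count(1), 10):
    passive *= (1 - f)                 # dilution only: c_n = c_0 * (1 - f)^n -> 0
    robust = robust * (1 - f) + r      # dilution plus regeneration -> r / f
    print(f"cycle {cycle:2d}: passive={passive:.4f}, robust={robust:.4f}")

After a handful of cycles the passive species is effectively gone, while the regenerated one settles at r / f (here 2.0), which is the sense in which robust formation counteracts serial dilution.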

Chemical selection process
The starting material library was chosen largely by purposefully avoiding specific chemical groups, meaning no "common" autocatalytic cycle precursors, sugars or amino acids were selected. The aim was to concentrate solely on small functional building-block molecules with no immediate connection or function. The following table shows the list of selected building blocks and some of their potential properties.

Electrospray-Ionisation Mass Spectrometry (ESI-MS)
The Bruker MAXIS was used for offline analysis; the data are shown in the SI in Figures 11 to 16. The sample was run through a DIONEX UltiMate 3000 series HPLC-DAD setup with an RS (rapid separation) pump. Equation (1) shows the Mass Index calculation: after thresholding (10^6 intensity), the lightest peak above the threshold is subtracted from the heaviest to give the mass range, which is then divided by the number of peaks above the threshold:

Mass Index = (m/z_max − m/z_min) / N_peaks    (1)

Mass Index trends
This table summarizes the expected behavior of the Mass Index in various cases.

General
All data were collected as described above. As the size of the datasets (over 1000 samples were collected) does not allow us to present every single sample, we show one experiment in detail as an example for all experiments discussed in the manuscript. This data analysis is complementary, as the main focus of the project was to analyse the experiment algorithmically and systematically rather than to screen for every single species which could be found in the analysis. The experiment chosen as the example here is referred to as Run G in the manuscript.

Mineral controls
To control for the influence of minerals on the experiment, cycles of run G were manually repeated with and without added minerals.
The experiments were carried out in 24-hour cycles, based on run G. In the first test, the first input set of run G (as described in 4.1) was recursed every 24 hours, with and without minerals. For the second test, the composition of the replenishing input material was changed in every cycle, following the input sets of run G: the first cycle used the first input set (oxalic acid, pyridine and catechol), the second cycle the second input set (catechol, nitric acid and pyruvic acid), the third cycle the next input set of run G, and so on.
This was likewise done with and without minerals. The third test was minerals stirred in water.

Offline HPLC-ESI-MS
In the following section, we show run G measured on a more sensitive mass spectrometer, a Bruker MAXIS. As emphasised before, it is not the goal of these experiments to identify every single species; nevertheless, in the following we show an example approach which could be used for this. However, as this would be too time-consuming in practice, it does not present a realistic option.
In the corresponding figure, the spectra of the cycles are compared to each other; the grey areas will be further discussed below.

Experimentally found values
The following is a list of m/z values found in the Bruker spectra, with proposed molecular formulas based on these values. Differences between the measured m/z value and the monoisotopic mass can be caused by common adducts in positive-mode MS such as hydrogen, sodium or potassium.
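A minimal sketch of such an adduct-aware comparison (the mass tolerance is an assumption; the adduct mass shifts are the standard values for [M+H]+, [M+Na]+ and [M+K]+):

# Sketch: match a measured m/z to a candidate neutral monoisotopic mass,
# allowing for common positive-mode adducts. The tolerance is an assumption.
ADDUCT_MASSES = {            # mass added to the neutral monoisotopic mass M
    "[M+H]+": 1.00728,
    "[M+Na]+": 22.98922,
    "[M+K]+": 38.96316,
}

def match_adducts(mz_measured, monoisotopic_mass, tol=0.01):
    """Return the adducts whose expected m/z lies within tol of the measurement."""
    return [name for name, delta in ADDUCT_MASSES.items()
            if abs(mz_measured - (monoisotopic_mass + delta)) <= tol]

# Example: catechol (C6H6O2, monoisotopic mass 110.03678) observed as [M+H]+.
print(match_adducts(111.0441, 110.03678))   # -> ['[M+H]+']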

Matching experimental and theoretical formulas
The network model described in the main manuscript predicted 2206 possible reactions with the input library used. The theoretical number of resulting products from these reactions is 429 (multiple reactions can lead to the same product). These products have been checked against the experimentally assigned formulas proposed above in the Supplementary Information.
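The comparison itself reduces to a set intersection; a minimal sketch with placeholder formula strings (the formulas below are illustrative, not the real assignments):

# Sketch: overlap between network-predicted product formulas and the
# experimentally assigned formulas. The formula strings are placeholders.
predicted = {"C7H7NO2", "C8H6O4", "C6H6O2"}     # from the 429 predicted products
experimental = {"C7H7NO2", "C6H6O2", "C9H8O3"}  # assigned from measured m/z values

matches = predicted & experimental
print(f"{len(matches)} of {len(experimental)} assigned formulas "
      f"appear among the predicted products: {sorted(matches)}")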

Product persistence over dilution
The product peaks shown above in 4.2.2 were tracked by their intensity over the duration of the run. Supplementary Figure 22 shows that some product species persist throughout the run while others are diluted out after a change in the input composition.
Supplementary Figure 22: Comparison of product peaks of run G. The purple line shows a peak not present in cycle 1 but rising throughout the run from cycle 2 onwards. The green line shows a peak which is high in the initial cycle but does not persist throughout the run. The dark blue line shows a product peak present through the full run, and the pale blue line shows a peak which is high with the initial input composition but decreases throughout the run.
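The tracking behind Supplementary Figure 22 amounts to following the intensity of each selected peak across cycles; a minimal pandas sketch (the peak labels and intensity values are placeholders):

import pandas as pd

# Sketch: per-cycle intensities of two tracked product peaks (placeholder data).
df = pd.DataFrame({
    "cycle":    [1, 2, 3, 4, 5],
    "mz_153.1": [0.0, 0.2, 0.5, 0.9, 1.4],   # rises from cycle 2: persists
    "mz_111.0": [2.1, 1.0, 0.5, 0.2, 0.1],   # falls after the input change
})

for peak in ["mz_153.1", "mz_111.0"]:
    trend = "persists" if df[peak].iloc[-1] > df[peak].iloc[0] else "diluted out"
    print(f"{peak}: {trend}")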

Architecture
Supplementary Figure 23: The general architecture of the software controlling the recursive reactor and responsible for the background work. The platform code (dark blue) controls the overall experiment, initialises the analysis and manages the hardware control. The data source (green) takes the data from the Advion MS and condenses it into a value which the decision maker (pale blue) can interpret and pass back to the platform code for experiment adjustments.
The control software, written in Python 3, is designed to be modular as well as extensible and incorporates data analysis functionality spread over several Python classes (Supplementary Figure 23).
The data source in the form of the DataSource class is responsible for monitoring the directory in which the analytical data is saved for changes, calculating the appropriate measure of complexity and exposing it to the decision-making algorithm. The DataSource class was designed to utilize various data types and formats; in addition to the LC-MS data used in the present work, other inputs such as chromatography and nuclear magnetic resonance spectroscopy could also be employed.
The decision-making itself is the responsibility of the DecisionMaker class, which implements the algorithm described below; based on the outcome, it sends the appropriate command to the main hardware control loop, as well as saving the results of the calculations to a file for any required offline analysis and plotting.
The main hardware control loop is tasked with interfacing with all the physical components of the system: the stirrer/heater, the pumps and the valves. Whenever a run is started, it also initializes the data source, specifying the directory in which to wait for new data, as well as starting the decision maker in a separate thread to prevent blocking.
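A skeletal sketch of this division of labour (method names and the stop event are assumptions; only the class names DataSource and DecisionMaker come from the actual code):

import threading, time

class DataSource:
    """Watches the analysis directory and converts each new spectrum
    into a single complexity value (here, the Mass Index)."""
    def __init__(self, watch_dir):
        self.watch_dir = watch_dir

    def next_value(self):
        # Placeholder: the real class blocks until the instrument software
        # writes a new NetCDF file into watch_dir, then computes the index.
        time.sleep(1.0)
        return 0.0

class DecisionMaker:
    """Consumes complexity values and decides when to switch input sets."""
    def __init__(self, source, stop_event):
        self.source, self.stop_event = source, stop_event

    def run(self):
        while not self.stop_event.is_set():
            value = self.source.next_value()
            # slope calculation and flag setting would happen here

def start_run(watch_dir):
    # Hardware control loop: initialise the data source for this run and
    # start the decision maker in a separate thread so it does not block.
    stop = threading.Event()
    decider = DecisionMaker(DataSource(watch_dir), stop)
    threading.Thread(target=decider.run, daemon=True).start()
    return stop   # set() this event to end the decision thread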

Measure of complexity
As mentioned above, the measure of complexity will be specified differently depending on the data format and the analytical technique used. For the bulk of the present study, we focused on the mass spectrometry data and on the so-called Mass Index, used by our group earlier for monitoring complexity changes in amino acid mixtures.
The Mass Index is a real number carrying information on the number of peaks and the mass range in a given mass spectrum. Since each data point in our case was a total ion current (TIC) chromatogram recorded by the LC-MS equipment, a modified approach was required. Upon successful acquisition, the data was exported on-line to the NetCDF file format. This was read using the Python library netCDF4 to convert it to an in-memory array accessible to Python. The resulting array could be thought of as a stack of mass spectra, each entry corresponding to a certain retention time. For each mass spectrum, the Mass Index was calculated by discarding all peaks whose intensity was less than 10^6, subtracting the lowest m/z value in the spectrum from the highest, and dividing the result by the number of peaks. If there were no peaks left after applying the intensity thresholding, the value 0.0 was returned.
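A minimal sketch of this per-spectrum calculation (the toy spectrum is an assumption for illustration):

import numpy as np

THRESHOLD = 1e6   # intensity cutoff used for the Mass Index

def mass_index(mz, intensity, threshold=THRESHOLD):
    """Mass Index of one spectrum: (max m/z - min m/z) / number of peaks,
    computed over the peaks whose intensity is at least the threshold."""
    mz, intensity = np.asarray(mz, float), np.asarray(intensity, float)
    kept = mz[intensity >= threshold]
    if kept.size == 0:
        return 0.0                        # nothing survives thresholding
    return (kept.max() - kept.min()) / kept.size

# Toy spectrum: the peaks at m/z 100, 180 and 260 survive the threshold.
print(mass_index([100, 180, 210, 260], [2e6, 5e6, 1e3, 3e6]))  # (260-100)/3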
The general visual idea of the process for extracting the Mass Index is presented in the Supplementary Figures. A minimum number of accumulated data points was deemed to be sufficient for meaningful slope calculation: in our case, the slope was taken over the 5 most recent points, as long as at least 10 data points had accumulated since either the start of the run or the last input switch.
Each time a new data point is detected and the corresponding Mass Index calculated, the algorithm first checks whether the condition above is met; if it is not, it waits for the next data point. As soon as the 10th data point arrives, the slope is taken using simple linear regression against the vector [0, 1, 2, 3, 4].
When the slope is above the specified threshold, the least recent data entry is removed from the list and replaced with the one just recorded, and the algorithm waits for the next data point. For slopes below the threshold, the DecisionMaker class sets the relevant boolean flag in the configuration file to True. At the start of the subsequent cycle, when the hardware loop detects that flag, it proceeds to the next entry in its list of randomly generated chemical input sets.
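A minimal sketch of this decision rule (the numerical slope threshold is not fixed in the text above, so the value used here is an assumption):

import numpy as np

WINDOW = 5              # slope taken over the 5 most recent points
MIN_POINTS = 10         # points required since start / last input switch
SLOPE_THRESHOLD = 0.0   # assumed value; the real threshold is set in config

def should_switch(history):
    """Return True when the Mass Index has stopped rising, i.e. the slope
    over the last WINDOW points falls below the threshold."""
    if len(history) < MIN_POINTS:
        return False                     # wait for more data points
    window = history[-WINDOW:]
    # simple linear regression of the window against [0, 1, 2, 3, 4]
    slope = np.polyfit(np.arange(WINDOW), window, 1)[0]
    return slope < SLOPE_THRESHOLD

# Example: a plateauing Mass Index trace triggers the input switch.
trace = [0.1, 0.4, 0.9, 1.5, 2.0, 2.4, 2.5, 2.5, 2.4, 2.5, 2.45]
print(should_switch(trace))   # True once 10+ points exist and the rise stalls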

Alternative algorithms tested
Based on the issues highlighted with the initially used Mass Index algorithm, alternative ways to evaluate the data were developed. As one of the problems of the Mass Index was that there was no relation between the mass of a peak and its intensity, it was interesting to investigate how the data would change if the intensity were multiplied by the mass of the corresponding peak.

Adaptive Mass Index
An adaptive formula was developed to check whether the original Mass Index was sensitive to 1) the hard threshold used, and 2) outliers at the higher and lower end of the spectrum resulting in spurious values for M_max and M_min. We have shown the results of this analysis applied to Run 10 in Figure 27. To address (1), we counted peaks partially, using a sigmoid function centred on the threshold to assign weights to the peaks. This means that peaks with intensities well below the threshold (<1%) are weighted close to 0, and peaks well above the threshold (100x) are weighted effectively as 1. (Dataframes are objects to store tabular data, specific to the Python data analysis toolkit Pandas; this thresholding strategy is used for every piece of code described below.) In Figure 27 we observe a rise around cycle 20, which drops again afterwards. The algorithm would clearly have made different input choices than the experimental algorithm, but whether this algorithm would make more sense than the Mass Index that was in use is questionable.
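A minimal sketch of such a sigmoid weighting that reproduces the behaviour described above (the steepness parameter is an assumption):

import numpy as np

def peak_weights(intensity, threshold=1e6, steepness=2.0):
    """Sigmoid weights replacing the hard intensity cutoff: peaks near the
    threshold count fractionally, peaks far below ~0, far above ~1."""
    x = np.log10(np.asarray(intensity, dtype=float) / threshold)
    return 1.0 / (1.0 + np.exp(-steepness * np.log(10) * x))

w = peak_weights([1e4, 1e6, 1e8])
print(w)          # ~[0.0001, 0.5, 0.9999]
print(w.sum())    # soft peak count, used in place of the hard count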

Weight by intensity index
An evaluation method based on the weight (mass) and the intensity of each peak was developed. In this calculation, the aim is to obtain a number z, which is the sum over each spectrum of all intensities multiplied by their mass values:

z = ∑ I_i · m_i

When this value is high, the sample has a higher amount of larger products with a stronger signal.
As simply multiplying every value would lead to very high numbers, the intensity is normalised before the summation.
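A minimal sketch of this index, under the assumption that each intensity is normalised to the spectrum total before the summation:

import numpy as np

def weight_by_intensity(mz, intensity):
    """z = sum over the spectrum of (normalised intensity * m/z).
    Intensities are normalised to the spectrum total so that z stays
    comparable across spectra of very different absolute signal."""
    mz, intensity = np.asarray(mz, float), np.asarray(intensity, float)
    norm = intensity / intensity.sum()
    return float((norm * mz).sum())

# A spectrum dominated by heavy, intense peaks scores higher:
print(weight_by_intensity([100, 400], [1e6, 9e6]))   # 0.1*100 + 0.9*400 = 370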

Information entropy value
In this approach, code was developed to compare the cycles of a run based on their information entropy.
Entropy is often defined as a measure of the disorder of a system, but in this case it is used to describe the information content of our system. The information entropy of a spectrum is defined as:

S = −∑ p_i · ln(p_i)    (2)

where p_i = I_i / I_tot and I_tot = ∑ I_i, i.e. p_i is the intensity of peak i normalised to the total intensity of the spectrum.
This leads to a value which is lower when the sample has fewer, larger peaks and higher when the sample has many peaks of comparable size. The code starts in a similar manner to the previous ones. Figure 31 shows a moderate course until the index rises rapidly after cycle 60.
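A minimal sketch of this entropy calculation:

import numpy as np

def spectrum_entropy(intensity):
    """Information entropy S = -sum(p_i * ln p_i) with p_i = I_i / sum(I):
    low for a few dominant peaks, high for many peaks of comparable size."""
    p = np.asarray(intensity, dtype=float)
    p = p[p > 0] / p.sum()               # normalise; ignore empty channels
    return float(-(p * np.log(p)).sum())

# Few dominant peaks vs. many comparable peaks:
print(spectrum_entropy([9e6, 1e6]))      # low (~0.33)
print(spectrum_entropy([1e6] * 10))      # high (ln 10, ~2.30)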
If we compare the algorithmically produced data of each run individually, we can compare the modified Mass Index, the weight by intensity and the information entropy, as the same thresholding strategy was used in all three calculations. It turns out that the different algorithms give almost complementary information about the individual runs. While run E has the highest weight by intensity value in comparison with the other three runs, which suggests that this run has many product species of high mass and high abundance, the information entropy of this run shows a drop around cycle 120. This means that in this range of cycles, rather than having a high number of product species, we got a few very large peaks. This is plausible given the weight by intensity value, as this value cannot distinguish between a high number of peaks and a few intense peaks, in contrast to the information entropy value.
Similarly, the Mass Index suggests a number of larger peaks, since if there were just one dominant peak this value would be high too.

For run F, the weight by intensity index is relatively low throughout the run, which suggests that there was a lower number of species and that the overall abundance of the signal was lower too. This explains the high information entropy, as it suggests many peaks of comparable size, which does not contradict the weight by intensity value. The Mass Index of run F is low, which shows that there is no dominant species of high mass, but rather a higher number of peaks of lower abundance.

The interpretation of run G appears more complex. The Mass Index value shows a strong rise at the beginning of the run, suggesting an abundant dominant species at the beginning, which breaks down throughout the run. This is further supported by the weight by intensity value, which shows a rise at the beginning of the run. The information entropy shows a drop in the same region, suggesting a smaller number of peaks, but not as clear a trend as with the other two values.

Run H shows a peak in the region between cycles 10 and 20, which suggests the build-up of a few species of higher mass, which break down into more species afterwards. This rise is not easy to read in the weight by intensity value, which shows a small rise at that point but a more dramatic rise later in the run, where the Mass Index does not show a rise. On the other hand, the information entropy shows a drop, suggesting rather few larger species, as the Mass Index suggested.

It is interesting to see that observations based on the different algorithms can, if handled carefully, build on each other. On the other hand, it is important to state that the algorithms alone are only able to give an idea of the product distribution in a sample and of whether a cycle contains dominant product species. For more information about the exact chemical composition and the reactions which occurred in the system, a more extensive analysis is necessary.