Massive computational acceleration by using neural networks to emulate mechanism-based biological models

For many biological applications, exploration of the massive parametric space of a mechanism-based model can impose a prohibitive computational demand. To overcome this limitation, we present a framework to improve computational efficiency by orders of magnitude. The key concept is to train a neural network using a limited number of simulations generated by a mechanistic model. This number is small enough that the simulations can be completed in a short time frame but large enough to enable reliable training. The trained neural network can then be used to explore a much larger parametric space. We demonstrate this notion by training neural networks to predict pattern formation and stochastic gene expression. We further demonstrate that using an ensemble of neural networks enables the self-contained evaluation of the quality of each prediction. Our work can serve as a platform for fast parametric-space screening of biological models with user-defined objectives.

Mechanistic model of pattern formation

Two fluorescent reporters are used to report the circuit dynamics, since they are co-expressed with T7RNAP and lysozyme, respectively (Supplementary Figure 1).

The gene circuit dynamics can be described by a set of partial differential equations (adapted from Cao et al.2), which capture cell growth, colony expansion, nutrient and AHL diffusion, intracellular circuit dynamics, and signaling and transport. Parameters are described in Supplementary Table 1. The following assumptions were made in deriving these equations:

1. Cells perform an unbiased random walk; their movement is modeled as diffusion3-5. We consider "diffusion" an approximation of the observed colony expansion, so that cell movement can be described by a single lumped parameter. Intracellular components are modeled with passive-tracer equations2.
2. Cell growth is modeled by a logistic term along with a Monod function (a generic form of this growth term is sketched after this list). The Monod function accounts for the contribution of the nutrient to overall colony growth; the nutrient here refers to one or more limiting factors that constrain growth. The logistic term accounts for the limit of cell growth at a particular location. This carrying capacity is unlikely to be set by nutrient availability; instead, it is set by the spatial confinement imposed by our device, in which the colony height is confined to ~20 μm between the coverslip and the agar surface.
3. AHL and the nutrient diffuse fast.
4. Gene expression capacity is modeled as a function of the domain size $d$, where $d$ is defined as the distance between the colony center and the location where the cell density is 95% of the carrying capacity.
5. T, L, and P are assumed to be at equilibrium, because the reversible first-order kinetics by which T7RNAP binds T7 lysozyme to form the T7-lysozyme complex is fast6.
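The full equations and parameters are given in the original model and Supplementary Table 1. Purely as an illustration of assumptions 1 and 2, a growth equation of this class has the generic form below, where $C$ is cell density, $N$ is nutrient, and $D_C$, $\mu$, $K_N$, and $C_m$ are illustrative lumped constants for motility, growth rate, the Monod term, and the carrying capacity (this is a sketch, not the paper's exact notation):

$$\frac{\partial C}{\partial t} = D_C \nabla^2 C + \mu \, \frac{N}{N + K_N} \, C \left(1 - \frac{C}{C_m}\right)$$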

Non-dimensionalization of the model
First, we rescaled the time and space variables, where $\mathcal{L}$ is a length scale to be chosen later. We next rescaled the state variables and defined new parameters for simplicity. With these dimensionless variables, the model equations can be rewritten in dimensionless form by introducing the parameter groups $\lambda_i$ ($i = 1, \dots, 12$), whose definitions can be found in Supplementary Table 2.

To solve the model numerically in MATLAB, we exploit the radial symmetry of the system and reduce it to a PDE in polar coordinates that depends on only one spatial variable, the radius $r \in [0, R]$. We combine the MATLAB built-in Runge-Kutta solver ode45 with a second-order centered finite difference scheme for discretization of the gradients (a Python analogue is sketched below). Owing to the radial symmetry, we use the 1D distribution along the radius as the ground truth for training/testing the neural network (Supplementary Figure 1B) without losing any information. In addition, because T, L, and P are assumed to be at equilibrium, the L-T-P system is updated at each step by projecting it onto the equilibrium manifold defined by the fast binding reaction; with this constraint, the concentrations are updated to their projected equilibrium values.

Stochastic model of the Rb-E2F signaling pathway

The deterministic model of the Rb-E2F signaling pathway is a system of ODEs. To capture the stochastic aspects of the pathway, we adopt the chemical Langevin formulation (CLF)9. We adjust the units of the molecule concentrations and the parameters so that the molecules are expressed in molecule numbers. Twenty-four parameters of the SDE model are generated randomly (Supplementary Table 5); the ranges cover almost all the possible rates that can be found in vivo. For each generated combination of parameters, we run a large number of stochastic simulations and collect the final values of all 10 variables. We split the values into 100 bins to construct a histogram for each variable; because of the large number of simulations, the histograms are almost continuous. We then create a kernel distribution object with the MATLAB function fitdist() and use the MATLAB function pdf() to evaluate the probability density function of the distribution object at the values in each of the discretized intervals (each variable is discretized into 1,000 intervals for this model). A Python sketch of this stochastic-simulation and density-estimation pipeline is also given below.
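As a concrete illustration of the numerical scheme for the pattern-formation model, the following is a minimal Python/SciPy sketch of the same method of lines: second-order centered differences for a radially symmetric Laplacian, integrated with solve_ivp's RK45 method (SciPy's analogue of MATLAB's ode45). The diffusivity, reaction term, grid, and initial condition are illustrative placeholders, not the paper's parameters.

```python
# Method-of-lines sketch for a radially symmetric reaction-diffusion
# equation du/dt = D*(u_rr + u_r/r) + f(u), mirroring the MATLAB
# approach in the text (ode45 + second-order centered differences).
import numpy as np
from scipy.integrate import solve_ivp

D, R, n = 0.1, 10.0, 200             # illustrative diffusivity, radius, grid size
r = np.linspace(0.0, R, n)
dr = r[1] - r[0]

def f(u):                            # placeholder reaction term
    return u * (1.0 - u)

def rhs(t, u):
    lap = np.empty_like(u)
    # interior points: u_rr + u_r/r with second-order centered differences
    lap[1:-1] = ((u[2:] - 2*u[1:-1] + u[:-2]) / dr**2
                 + (u[2:] - u[:-2]) / (2*dr*r[1:-1]))
    lap[0] = 4.0 * (u[1] - u[0]) / dr**2     # symmetry condition at the origin
    lap[-1] = 2.0 * (u[-2] - u[-1]) / dr**2  # zero-flux outer boundary
    return D * lap + f(u)

u0 = np.exp(-r**2)                   # colony seeded at the center
sol = solve_ivp(rhs, (0.0, 50.0), u0, method="RK45")
```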
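For the Rb-E2F pipeline, here is a hedged Python sketch of the same workflow: Euler-Maruyama integration of a chemical Langevin equation for many replicate runs, followed by kernel density estimation of the endpoint distribution on a 1,000-point grid (scipy.stats.gaussian_kde standing in for MATLAB's fitdist()/pdf()). The one-species birth-death model and all rate constants are illustrative only, not the Rb-E2F system itself.

```python
# Chemical Langevin simulation of a birth-death process, then a smooth
# endpoint density evaluated on a fixed grid, as described in the text.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
k_prod, k_deg = 50.0, 0.5            # illustrative rate constants
dt, n_steps, n_runs = 0.01, 2000, 10000

x = np.full(n_runs, 10.0)            # molecule numbers, all runs at once
for _ in range(n_steps):
    a1 = k_prod * np.ones(n_runs)    # production propensity
    a2 = k_deg * x                   # degradation propensity
    drift = (a1 - a2) * dt
    noise = (np.sqrt(a1) * rng.normal(size=n_runs)
             - np.sqrt(a2) * rng.normal(size=n_runs)) * np.sqrt(dt)
    x = np.maximum(x + drift + noise, 0.0)   # clamp to nonnegative counts

# endpoint histogram -> smooth density on 1,000 intervals, as in the text
grid = np.linspace(x.min(), x.max(), 1000)
density = gaussian_kde(x)(grid)      # probability density at each interval
```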

Neural networks
Deep learning through the training of artificial neural networks has made immense contributions to various fields, such as computer vision10-12, speech recognition13-16, and beating the world champion at the game of Go17-19. This progress is a result of fast GPUs, the high availability of data, and advances in the algorithms for training deep neural networks. Over the last decade, deep learning has also become increasingly important for diverse areas of biological research20-27. In these applications, a predictive model is developed based on statistical associations among features of a given dataset. The learned model can then be used to predict desired outputs, such as binary responses (e.g., pathogenic or non-pathogenic, toxic or non-toxic), categorical labels (e.g., bacterial strains, stages of diseases), values (e.g., growth rate, drug doses), or sequences (e.g., time/spatial series, probability density functions).

Several previous studies have demonstrated how to adopt neural networks to facilitate the numerical solution of differential equations28-37. The massive acceleration enables extensive exploration of system dynamics that would be impossible by relying solely on the mechanistic model. In our study, we use an LSTM network, a type of recurrent neural network (RNN), to predict the normalized distribution.

Recurrent neural networks
RNNs are a family of deep neural networks for processing sequential data38-40. Unlike a feedforward neural network, a recurrent neural network has connections pointing backward: it sends its predicted output back to itself. Supplementary Figure 2A shows a recurrent neuron (the simplest RNN, composed of only one neuron receiving inputs, producing outputs, and sending the outputs back to itself). At each sequential step (also called a frame), this recurrent neuron receives the input $x_t$ as well as its own output from the previous sequential step, $y_{t-1}$. By unrolling the network against the sequential inputs, we can see that each member of the output is a function of the previous output and is produced by applying the same update rule $h$ to the previous outputs, which results in the sharing of parameters $\theta$ through a very deep computational graph:

$$y_t = h(y_{t-1}, x_t; \theta) = h(h(y_{t-2}, x_{t-1}; \theta), x_t; \theta) = h(h(h(y_{t-3}, x_{t-2}; \theta), x_{t-1}; \theta), x_t; \theta)$$

Since the output of a recurrent neuron at step $t$ is a function of all the inputs from previous steps, it can be said to have a form of memory (a minimal sketch follows this section). However, an ordinary RNN cannot be used on long-sequence data: the memory of the first inputs gradually fades away because of the transformations the data undergoes when traversing the RNN, and some information is lost after each step. After a while, the RNN state contains virtually no trace of the first inputs41. To solve this problem, various types of cells with long-term memory have been introduced, and the most successful and popular is the long short-term memory (LSTM) network.
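To make the unrolling explicit, here is a minimal NumPy sketch of a single recurrent neuron. The weights $w_x$ and $w_y$, the bias $b$, and the tanh nonlinearity are illustrative stand-ins for the shared parameters $\theta$ and the update rule $h$:

```python
# One recurrent neuron: the same parameters are reused at every step,
# so each output y_t depends on all earlier inputs.
import numpy as np

rng = np.random.default_rng(1)
w_x, w_y, b = rng.normal(size=3)     # shared parameters theta

def step(y_prev, x_t):
    # y_t = h(y_{t-1}, x_t; theta), here with a tanh nonlinearity
    return np.tanh(w_x * x_t + w_y * y_prev + b)

xs = rng.normal(size=20)             # an input sequence
y = 0.0                              # initial state
outputs = []
for x_t in xs:                       # unrolling the computational graph
    y = step(y, x_t)
    outputs.append(y)
```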

LSTM network
The LSTM network was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber42 and was gradually improved over the years by Alex Graves43, Wojciech Zaremba44, and many others. Supplementary Figure 2B shows a typical LSTM unit. In TensorFlow, LSTM cells can be implemented with the tf.contrib.rnn.BasicLSTMCell built-in function without needing to know the cell structure. In short, an LSTM cell manages two state vectors, one responsible for short-term memory and one responsible for long-term memory. At each step, it adds some memories to the long-term memory (controlled by the input gate), drops some memories (controlled by the forget gate), and decides which parts of the long-term memory should be read and output at this step (controlled by the output gate). More details can be found in the referenced books40,76.

Our network consists of an input layer whose inputs are the parameters of the mechanism-based model, a fully connected layer, LSTM arrays, and two output layers: one for predicting the peak value of the distribution and one for predicting the normalized distribution (a sketch of this architecture is given at the end of this note). The peak values of the training data span many orders of magnitude, and the peak value for data with a pattern (one or more rings) is concentrated at the higher end.

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test quantifies the distance between the empirical distribution function of a sample and the cumulative distribution function of a reference distribution, or between the empirical distribution functions of two samples. It is a non-parametric test that compares two cumulative distributions. The Kolmogorov-Smirnov (K-S) distance is the supremum (greatest) distance between the two cumulative distributions, $D = \sup_x |F_1(x) - F_2(x)|$, and its value lies between 0 and 1. A small distance between two distributions corresponds to a high similarity between those distributions; an example computation is given below.
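For instance, the K-S distance between a predicted sample and a ground-truth sample can be computed with SciPy's two-sample test, or directly as the supremum gap between the two empirical CDFs (a sketch with synthetic data):

```python
# K-S distance via scipy, plus an equivalent manual ECDF computation.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
truth = rng.normal(0.0, 1.0, size=5000)   # synthetic ground-truth sample
pred = rng.normal(0.1, 1.0, size=5000)    # synthetic predicted sample

d, p_value = ks_2samp(truth, pred)        # d is the K-S distance in [0, 1]

# manual version: evaluate both empirical CDFs at every sample point
grid = np.sort(np.concatenate([truth, pred]))
cdf_t = np.searchsorted(np.sort(truth), grid, side="right") / truth.size
cdf_p = np.searchsorted(np.sort(pred), grid, side="right") / pred.size
d_manual = np.abs(cdf_t - cdf_p).max()    # supremum gap between the CDFs
```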
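Finally, here is a hedged sketch of the network architecture described above, written with the modern tf.keras API (the text references the TensorFlow 1.x tf.contrib.rnn.BasicLSTMCell). The layer sizes, activations, and the use of RepeatVector to feed the encoded parameter vector to every LSTM step are illustrative choices, not the authors' exact configuration:

```python
# Model parameters in; two heads out: a scalar peak value and a
# sequence giving the normalized distribution over the binned grid.
import tensorflow as tf

n_params, seq_len = 24, 1000          # e.g., 24 SDE parameters, 1,000 bins

inputs = tf.keras.Input(shape=(n_params,))
h = tf.keras.layers.Dense(128, activation="relu")(inputs)  # fully connected layer
peak = tf.keras.layers.Dense(1, name="peak_value")(h)      # scalar peak-value head

# feed the same encoded parameter vector to every sequential step of the LSTM
h_seq = tf.keras.layers.RepeatVector(seq_len)(h)
lstm_out = tf.keras.layers.LSTM(64, return_sequences=True)(h_seq)
dist = tf.keras.layers.Dense(1, activation="softplus",     # nonnegative density
                             name="normalized_distribution")(lstm_out)

model = tf.keras.Model(inputs, [peak, dist])
model.compile(optimizer="adam", loss="mse")
```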