Quantitative characterization of recombinase-based digitizer circuits enables predictable amplification of biological signals

Many synthetic gene circuits are restricted to single-use applications or require iterative refinement for incorporation into complex systems. One example is the recombinase-based digitizer circuit, which has been used to improve weak or leaky biological signals. Here we present a workflow to quantitatively define digitizer performance and predict responses to different input signals. Using a combination of signal-to-noise ratio (SNR), area under a receiver operating characteristic curve (AUC), and fold change (FC), we evaluate three small-molecule inducible digitizer designs demonstrating FC up to 508x and SNR up to 3.77 dB. To study their behavior further and improve modularity, we develop a mixed phenotypic/mechanistic model capable of predicting digitizer configurations that amplify a synNotch cell-to-cell communication signal (Δ SNR up to 2.8 dB). We hope the metrics and modeling approaches here will facilitate incorporation of these digitizers into other systems while providing an improved workflow for gene circuit characterization.


Supplementary Note 1.1: Approximating the Initial Distribution of Relative Plasmid Copy Number
Our testing platform (i.e. polyethylenimine, PEI, based transient transfection) provides a rapid test of DNA-based parts and is a likely first step in biological device development useful for circuit construction and validation. However, to incorporate the influence this testing platform has on our device performance, we considered that while transient transfection comes with the advantage of testing varying levels of circuit copy number in a single experiment, these copy numbers will dilute out of the cell given sufficient time and that there exists an initial delay from the time of transfection to time the transfected DNA can make it to the nucleus for transcription. We evaluated each of these areas to model the effect of working with transient transfection data. As previous work also considered these areas and was able to make useful predictions, we focused our modeling efforts in these areas ( [1,2,3]).
Transient transfection comes with the advantage of testing varying levels and different combinations of plasmid copy numbers in a single experiment. However, to describe the different plasmid combinations present in each cell we need a method to parse the data by approximate circuit copy numbers. There is one transcriptional unit, TU (i.e.promoter/gene/terminator), per plasmid and the relative copy number between the transcriptional units is varied by varying the weight ratio of plasmids used in the transfection. Therefore, we know the global ratio between plasmid used in each experiment, but we must consider that some cells will receive slightly disparate ratios of the plasmids or total plasmid content regardless of the intended ratio or weight. This variation can be approximated as we will describe later and is shown in Fig. S9. We refer to this variation as the initial transient distribution in plasmid copy number.
Our circuits are comprised of upto 6 TUs: CAG-mRuby, CAG-rtTA, TRE-BFP, TRE-Flp, FRT-GFP and (U6-shRNA or U6tetO-shRNA). However, we can simplify the dimensionality of the initial transient distribution space from which our data is derived by taking note of the following relationships with the following assumptions: 1) The distribution including mean and covariance of CAG-mRuby should equal that of all the other TU's including FRT-GFP. This variance within a single transfection will also vary less compared to independent transfections of these plasmids, given the number of plasmids used in the transfection. 2) The variation between the copy number of two plasmids within a single cell is much less than the variation of plasmid copy number across cells. Fig. S9 highlights these relationships 1) and 2) and is a pool of three biological replicates of the experiment each containing three technical replicates each.
Given a relationship between fluorescence expression and plasmid copy number, we chose to use the distribution of constitutive fluorescent protein (CFP), ((e.g.) mRuby expression) to approximate the plasmid distribution of plasmids in the system. Additionally with TRE-BFP fluorescence expression, we can also approximate TRE-Flp copy number.
However, the relationship between BFP and Flp is also affected by the expression of shRNA in the system. Assuming the governing source of variation is between cells and not within a single cell, when no shRNA is present we can derive a relationship between BFP expression and Flp expression given a relationship between fluorescence expression and plasmid copy number. When shRNA is present we will need to further take into account the topology of the circuit as modeled in supplementary note Chemical Reactions. Altogether this then shows promise of giving us a method for which to define inputs to our circuit in terms of fluorescence and outputs of a digital enhancer in terms of fluorescence while considering the variation introduced by the testing platform. * Equally contributing authors.

Supplementary Note 1.2: Modeling the Color Control: relationship between fluorescence expression and plasmid copy number
To define a relationship between fluorescence expression and plasmid copy number we modeled the expression of a color control sample over time. This color control consists of three TUs at a 1:1:1 weight ratio and non-expressing filler DNA.As our transfection is mediated by the formation of PEI DNA complexes, it appears there must be a variation in the size of the many complexes formed during the transfection where the larger complexes have larger plasmid numbers but the same relative ratio of plasmids compared to the smaller complexes [4]. The variation in proposed complex size appears to be the governing source of variation as seen in Fig. S9. Our color control used 31.25ng of CAG-BFP, 31.25ng of CAG-GFP, 31.25ng of CAG-mRuby and 156.25ng of filler DNA to get a 1:1:1 ratio of TUs containing a single fluorescent protein in each. We then measured the fluorescence value of this sample overtime via flow cytometry. We repeated this experiment three times and pooled all the data for creation of the histograms.
The variation between technical and biological replicates is further investigated in fitting the initial delay and plasmid dilution as described in supplementary note Plasmid Dilution. We can see in Fig. S9 for a given time point that there is a linear trend between the TU outputs whose variation is smaller than the range of expression along the axis of a single color. It is important to remember these are measures of fluorescence not measures of plasmid copy number.
To derive a method for approximating relative copy number from fluorescence, We can get a relative measure of plasmid copy number for a constitutively expressed fluorescent protein, such as CAG-mRuby, by solving the discrete birth death process described in equation (1). By noting the repeating behavior in this discrete system as time progresses we can come to equation (2). Therefore given a fluorescence value, time of collection, Molecule to MEFL, birth rate, death rate and plasmid division time we can approximate the plasmid copy number. However we do not know plasmid division time, Molecule to MEFL, birth or death rate. Although we will fit the data for plasmid division time as described in supplementary note Plasmid Division, considering the other constants we can only approximate a number for plasmid copy number that is monotonic, however possibly non-linear, in relation to degradation rate.
We will call this number relative plasmid copy number, R C N . Due to the non-linear relationship between this number and the death rate we choose a biologically relevant number for degradation rate and assumed a production and Molecule to MEFL rate of 1. The chosen biologically relevant death rate was chosen to be 0.02/hr as this is a central approximation seen in a survey of protein decay rates from the literature [5]. As we construct the model we will see that the predictive power of the model is unaffected by the values we choose for Molecule to MEFL, birth and death rate and only affects the absolute value of the fit parameters and the error of approximating these numbers. This is expected due to both the monotonic relationship of the relative plasmid copy number to Molecule to MEFL, birth and death rate as well as linearly proportional relationship relative copy number has will the parameters in the model it is multiplied with.
As a proof of concept and to highlight the need of adding an initial delay and/or plasmid dilution to our model, we simulated the time progression of our color control sample with the ODEs found in the equation set (3), where R C N mRub y , R C N BF P and R C N G F P were estimated from the 96 hour time point. b and d p were set to 1/hr and 0.02/hr respectively. 1000 cells were simulated and we compared the results of these simulations to the experimental data (Fig. S10). As solved for we capture the system behavior at the 96hr time point, however, we are missing the delay in expression at the 6hr time point. This experimentally observed delay is expected as it is thought to take some amount of time for the transfection complexes to reach the nucleus, and further time for transcription and translation to occur. However, as we will see in the next section plasmid dilution does not appear pertinent to being able to further describe the observable trends in the data beyond what the current modeled reactions already give us. One more concept we would like to highlight in this section before moving onto plasmid dilution, is we will need to also add the initial delay and plasmid dilution to our estimate of relative plasmid copy number (4). This formalization is very similar to our previous work [1,2] and described in more detail in the supplementary note Plasmid Dilution.

Supplementary Note 1.3: Adapting This Method to for a Minimal Model of shRNA Regulated Recombinases
To adapt this approach to for our minimal model of shRNA regulated recombinases, we further assumed we can use the fluorescent markers in an experiment to approximate the multi-dimensional distribution of the TU's used in the samples by re-centering the distribution assuming linear proportionality between the ratio of ng used in the sample to those used in the plasmids constitutively producing fluorescent proteins. ( e.g. If the sample TU ratio is 1:10ng between a TU not constitutively expressing a fluorescent protein and a TU that is constitutively producing fluorescent proteins we would divide the R C N of the fluorescent TU by 10 to approximate the distribution of the nonfluorescent TU R C N . With this in mind, to approximate R C N for the two assumed key TUs in our system FRT-GFP and TRE-Flp, we used the corresponding fluorescent values as follow: Constitutive expressed CAG-mRuby was used to approximate the reporter plasmid, FRT-GFP, by taking into account the difference in ng used between these two TUs as described above. To use TRE-BFP to approximate R C N for the TRE-Flp expression is harder as it is also influenced by the shRNA production in the system. For these we chose to derive a map from R C N calculated from our TRE-BFP to TRE-Flp expression while taking into account the topology of the system as described in Supplementary Note Chemical Reactions. Also the other TUs are addressed individually in the Supplementary Note Chemical Reactions when making the map from BFP to Flp. where C F P (t ) represents the constitutive fluorescent value coming from CAG-mRuby at time point t . b and d stand for protein production and decay rates. pcn stands for plasmid copy number. M 2M E F L is then the Molecule to MEFL proportionality constant. λ is cell division time. φ (t ) describes decreasing plasmid copy number due to cell division.
where R C N stands for relative plasmid copy number. F P (t ) represents the florescent value in arbitrary units, A.U., coming from CAG-mRuby and time point t. d stands for the decay rate and is set to 0.02/hr as an average value of protein degradation as reported in this referenced survey [5]. λ is cell division time.
where P stands for proteins. Subscripts "mRuby", "BFP" and "GFP" represent their respective florescent proteins. b and d are protein production and decay rates, respectively.
Considering delay in gene expression, we rewrite R C N as: where t i d is the initial delay time due to the gene expression delay seen in transient transfection data.

| Supplementary Note 2: Testing Platform -Plasmid Dilution and Initial Delay
Supplementary Note 2.1: Plasmid Dilution and the Initial Delay Our previous model of transient transfection takes into account both plasmid dilution due to cell division and an initial delay in the production of fluorescence [1,2,3]. There are many reasons to have this initial delay. Three of which are as follows: 1. Transfection complexes need to make their way from the surface of the cell to the nucleus.
2. The initial shock of the transfection reagents is hard on the cells and they may need sometime to recover before they can jump into transcription of these plasmids. 3. As we are measuring fluorescence there will be a difference in maturation time for the various fluorophores in our system (i.e. mRuby folds on the scale of hours whereas eGFP and eBFP fold on the scale of minutes) [6] [7]. From the experimental time course data in Fig. S11 Panel A, as expected there appears to be a delay in the initial start time for observable fluorescence for each cell that varies with which type of fluorophore that was used (i.e. eBFP and eGFP show appreciable signal at 6hr where mRuby needs more time). The difference between the mRuby and eBFP/eGFP fold times is much less that the size of the initial delay, considering this we assumed this difference insignificant. We can see that all the cells do not start fluorescing at the same time, therefore it is logical to conclude that there is a distribution of delay times found in a particular population. One hypothesis for the cause of this distribution is that these delay times result in plasmids only entering the nucleus upon cell division. To explore this hypothesis we assumed a uniform distribution of start times with range 0 to length of cell division time. We then used our system of ODE's to simulate a population drawing a delay time  we constructed a plasmid almost identical to the eGFP single positive color control (CAG-GFP-ORI+) used in all experiments, but that does not contain the SV40 ORI (CAG-GFP-ORI-). We transiently transfected the SV40 ORI+ and ORI-plasmids into separate wells of HEK293FT cells in triplicate, and collected flow cytometry data over a four day period to examine the differences in max expression of each plasmid. We note that there does not seem to be any significant difference in eGFP expression from cells transfected with either plasmid up to 96 hours post transfection, and there is no significant difference between population level eGFP expression at 48 hours (Fig. S13).Therefore we chose to move forward with no dilution or replication incorporated into the model as plasmid replication and /or dilutions should not be having a significant effect on our system as we have characterized it.
Removing cell division we refit with the initial delay mean being µ ID instead of the ratio of the division time to µ ID .
Results of this fit are µ ID and σ fit to 9.5 and 5.5 respectively giving the behavior depicted in

1: Model Construction
We expect the most informative model will be one that balances complexity with usability; therefore our goal is to first construct the simplest model capable of explaining the data and then explore the effect of adding back in complexity. To start we surveyed the literature as described in the main text. We then worked to describe the key features of the testing platform as described in the Testing Platform supplementary note. Here we described how we used these features to build a multi-layered (phenotypic and mechanistic) model similar to our previously published work [1,2,3]. The mechanistic part of our model is a set of ODEs describing the purposed governing reactions. We chose to use ODEs instead of SDEs because we believe our system to reside in a large molecular number regime. The phenotypic part of our model is incorporated by feeding in the distribution of fluorescence values collected in the experiment into the model as rate constants and initial conditions. It is important to note this also incorporates the variability we see in the data under the assumption that the governing source of this variability comes from differences in plasmid copy number. Details of this assumption can be found in the Testing Platform supplementary note. Using the Law of Mass Action, LMA, to derive a set of ODEs followed by considering rate limiting reactions, separation of slow and fast time scales and the conservation of mass we formed our minimal model. To elucidate if this minimal model was sufficiently complex we built a more complex model taking into account a larger set of proposed states for comparison and found little to no clear benefit of increasing system complexity. After evaluating the performance of our model to explain the data we use this model in the main text to look for predictive trends in behavior useful for circuit design. We then tested those predictions showing that the expected qualitative trend is realizable.

Supplementary Note 3.2: Proposed Mechanistic Governing Bio-Chemical States
As mentioned in the main text we are starting by taking the three states depicted in Fig. S14. The list of chemical reactions describing the transfer between these states in the context of each topology can be found in table (1).
Using LMA, for all these reactions we obtain a set of ODE describing the system behavior (Eqns. 5 -15). Variables and Species are described in 2 and 3. Grouping these equations by slow and fast process we can see we have some equations that contain both fast and slow processes (i.e. mixed equations). Two of these equations can be accounted for taking note of conservation of mass and assuming r tT A i is never rate limiting due to the large amount of ng used for expressing this plasmid in each experiment. The last of these equations we can consider by mapping to a space where Z = P F LP − 4D G F P ,i (i.e. Eqn. 16). Scaling time by a slow rate constant, d p and letting = dp f 1 , we can see as the difference between the slow rate constant and a fast rate constant, f 1 , gets large − −− → ∅. Finally with a bit of algebra and remembering the conservation of mass D G F P ,T = D G F P ,i + D G F P ,a + D G F P ,c ; U 6,T = U 6,i + U 6,a , we can arrive at (Eqns. [17][18][19][20][21][22]. Mapping back to the original space we can see the complexity for P F LP d t is a bit unruly. However, it is evident in equations 23-30 that both part2 and part3 converge to 0 as P F LP gets large and the region where P F LP in small we are already assuming is negligible as we are working with systems that have large molecular numbers. We would like to point out that this is in effect assuming the retroactive effect involving P F LP is negligible. However it is worth noting that a similar retroactive effect will occur with the addition of some of the states we are assuming to be non-governing. We will look at the effect of adding back in sequestration of P F LP simulating a possible retroactive effect to investigate this assumption further later in this supplementary note.

Supplementary Note 3.3: Phenotypic Interactions within the Testing Platform
As described in the supplementary note Testing Platform, we are measuring the inputs and outputs to the digital enhancer systems in terms of several unique fluorescence signals. Thus, we can use the distribution in of the fluores-cent signal strength to estimate RCN for both FRT-GFP and TRE-BFP. (i.e. We will let D G F P ,T = ng G F P ·r · R C N G F P ,T and D (r tT A) F LP = ng F LP · α · R C N F LP ) Subsequently D G F P ,x = ng G F P · r · R C N G F P ,x where R C N G F P ,x is not directly calculated but rather estimated from R C N C F P using the conservation law. R C N F LP can, in a like manner, may be estimated from R C N I F P .
As D F LP is inducible, by using R C N I F P at different r tT A a values we are essentially assuming rtTA a is not rate limiting. Therefore, a first order birth rate from mFlp production coming from the D (r tT A a ) is assumed. Similarly, we can make the same assumption for shRNA by setting U 6,T = ng shR N A · κ · R C N U 6T , where R C N U 6T will also be estimated from R C N C F P . The concentration of r tT A a in the system will vary monotonically with the amount of IFP at a given time point. As the concentration of r tT A i is already chosen to be non-rate limiting and the binding rate of DOX is already identified as fast compared to the slow reactions, we chose to model the relationship between IFP and rtTA as being linearly proportional. We realize that this relationship may not be linear; however, we chose to move forward with a linear assumption until and more complex representation appears to be justified. Considering this relationship between IFP and r tT A a we let r tT A a = ηI F P , where IFP represents a relative r tT A a amount (R r tT A a = I F P ) .
Substituting in these relationships and renaming grouped parameters as found in our list of final parameters (table 4), we arrive at our mixed phenotypic/mechanistic model seen in equations 31 -35.
Supplementary Note 3.4: How does making the model more complex affect the system?
After the model was parameterized, as described in the Parameterization supplementary note, we chose to explore a more complex model incorporating retroactive effects on Flp to determine how this phenomenon affects our system, such as the retroactive effect on F LP production. Does modeling this extra complexity change the behavior of the system? To look into this question we added in an additional state for the system based on 4 P FLP + D GFP,a fr ← − → rr D GFP,c , where f r and r r are forward and reverse binding rates or Flp to left over FRT sites after the excision. Fig. S15 is a representative image of the repressive effect this has on the system's induced on state. Even after refitting the parameters we failed to see a reasonably constrained amount of improvement in the fits. Furthermore, after exploring parameter space, a major difference between this more complex model and our minimal model was the ON-state's non-monotonic trend overtime in the sequestration model. However, this non-monotonic behavior was not evident in our experimental data. Therefore, we chose to move forward with the minimal model of our system.

Slow Reactions
Mixed Reactions Mapping to New Space Reduced System in Z space Assuming No Retroactivity We used subsets of our data to parameterize our models. A summary of this process can be seen 5. As seen in equations 31 -35 there is a subset of parameters that will be the same across all models reflecting the different S U P P L E M E N TA R Y TA B L E 1 Chemical Reactions: Where D and P stand for plasmid DNA and proteins. Subscript "FLP" stands for the flippase, and "GFP" the green fluorescent protein. D GFP, i represents the GFP reporter plasmid with the terminator inhibiting transcription, and D GFP, a GFP reporter plasmid with the the terminator excised out. m and shRNA stand for mRNA and shRNA respectively. r tT A i represents r tT A not bound to DOX; and r tT A a represent r tT A bound to DOX. U 6,i represent and repressed U 6 promoter and U 6,a represents a uninhibited U 6 promoter. is also expected to change given different input levels. Therefore we constructed the following plan for fitting: We chose to fit d p and b 2 to our maximal expression data described below. We chose to refit d p and b 1 to the leaky expression data described below. Although these fits were not completely constrained to a single minima, looking at the error landscapes for these fits we were able to observe a linear relationship between optimal fits in both the leaky and maximal expression landscapes. Using the no shRNA dose response data, we incorporated these linear relationships when fitting the next set of parameters, b 2 (subsequently b 1 and d p through the linear relationships), K d ,  representation of these fits can be found in Fig. S16A and Fig. S17A. However many local minimum gave equivalent fits to these data. Looking at a sample of the error landscape we can see that as is often the case with a birth death process these parameters are linearly related (Fig. S16B,C and Fig. S17B,C). We fit these minima to estimate the linear relationship which we found to be b 1 = 1 * d p − 2 for leaky expression and d p = 1 * b 2 − 2 for maximal expression. For the maximal expression we found one local minima resided in a space where the system d p was low enough to not effect the fit. As we believe d p to be a governing reaction we did not include this point in the line fit for estimating the trend in the data, in effect setting a lower bound on d p to be rate limiting. (Note: the representative fit was chosen using an initial delay that appeared to capture the data the best).To consider the error in the estimated the initial delay parameters, we repeated this process for two additional sets of initial delay parameters estimating the range of behavior seen with different initial delay's as measured in our color control experiments (i.e. µ ID = 13.7, σ ID 3.6

Transcription Degradation Regulation
and µ ID = 8.1, σ = 7.9). We found that the linear fits for one significant figure in both the slope and y-intercept are identical. Therefore we chose to move forward with µ ID = 9.5, σ ID = 5.5 as the initial delay estimates. With this we then proceeded to use the linear relationships between these parameters to reduce the number of parameters we are fitting to the dose response data.

Supplementary Note 4.3: Constraining 5 Parameters to Fit the Dose Data
As expected constraining the remaining parameters presents a noteworthy challenge. To do this we started with the no-shRNA dose response data and first investigated the effect each parameter has on the corresponding fit. As b 2 is now related to d p and b 1 via our fit linear relationships changing b 2 for our dose data does not affect the maximal The reaction involving d c is taking in account that the RISC complexes with shRNA is recyclable [9].
level as much as the width of the states (e.g. a higher b 2 appears to decrease the width slightly). b 3 and K d effect when we see the bimodal behavior appear in the dose response curve. f 2 sets the transition rate from the low to high mode. c for the parameter range we are in does not have a huge impact on the fit but does appear to affect the nonlinearity of transferring from the low to the high state for lower Flp levels. After our initial parameter sweep, we applied our descent algorithm to fitting these 5 parameters. Fitting over each DOX level in our dose response curve simultaneously by summing the loss function across each probability value over each dose. We then inspected the error traces in 1D for each parameter to get an idea of the position of any local minimum (Fig. S18 C). Each parameter is being constrained, in at least one dimension. Considering this we chose to move forward with the fitted values as recorded in table 5. Results of this fit are plotted in Fig. S19 and Fig. S20. Fig. S19 represents the fit to the full dose response data set. Fig. S20 is the result plot from only plotting the top 30 percent of the data using either the CFP or the simulated CFP value for determining the percentage. Next we fixed all parameters except b 4 and b 5 and repeated the process for the Constant and Feedforward dose response data to fit b 4 and b 5 respectively.
In order to be diligent we wanted to further explore relationships that may exist between b f and K d . To do this we fix all the other parameters and fit b f (i.e. b 3 , b 4 and b 5 ) and K d to the corresponding dose response data given several initial conditions. When then plotted the error traces for these fits in the same manner as the leaky and maximal expression fits (Fig. 21). As discussed in a later supplemental note, non-dimensionalization of these equations also makes these linear relationships evident. Considering this we may find the non-dimensionalized space  Non-dimensionalization has been used in many engineering fields to explore the connection between physical parameters that govern key predictable features of the modeled system (e.g. Reynolds number). In our search to describe how tolerant our system is to leaky expression we can non-dimensionalize the minimal model found in equations 31-35. This yields equations 36-40 with the non-dimensional parameters found in table 6. eB represent the ratio between the production rates of cleaved and uncleaved reporter TU driving GFP. eC includes many of the parameters that describe how recombinase molecules are recycled after they cleave their product. eF is the ratio of the formation of the cleaved reporter circuit over the degradation rate of proteins in the system. Finally, eK d is the ratio of the number of Flp molecules needed to cleave half the reporter circuits in the system with the parameters that control the production rate of Flp scaled by the degradation rate of all proteins in the system. As we are looking for a way to describe how tolerant the system is to leaky expression eKd appears to have grouped all the relevant parameters.
Supplementary Note 5.2: What does the tolerance to leaky expression (eKd) tell us about the tolerance to leaky expression?
When looking at the structure of our non-dimensionalized system specifically equation 40. We can see a relationship between R C N G F P ,i and eKD that depend on P F LP , where R C N G F P ,i is the non-dimensionalized relative copy number of uncleaved reporter circuits and P F LP is the non-dimensionalized level of P F LP in the system. We can plot this relationship as seen in Fig. S22A. As P F LP increase the number of R C N G F P ,i goes down. Therefore eKd is the amount of P F LP needed to lose half R C N G F P ,i . Changing parameters that increase the value of eKd, such as adding more shRNA to the system, increases the amount of P F LP needed to convert half the R C N G F P ,i to another form. It then follows that systems with higher eKd values are more tolerant to the leaky expression of P F LP .

Supplementary Note 5.3: What is the relationship between non-dimensional eKd values and the physical ratio between Flp and shRNA?
eKd can also serve as a map for how the ratio between ng of Flp used in the system and ng of shRNA used in the system change this tolerance to leaky expression. Looking more deeply at the numbers grouped in the eKd terms we can see that to calculate eKd we need ng of Flp and shRNA used in the system and an approximation of the dox bound rtTA currently in the system. All of the other parameters in the eKd term are already approximated from fitting the data. ng for Flp and shRNA is pretty straightforward and we have chosen to approximate the dox bound rtTA used in the system by assuming it is linearly proportional to the amount of IFP expressed in the system. Fig.   S22B illustrates how the tolerance changes given different amount of Flp and shRNA for the no-shRNA and constant topology. The apparent non-linear relationship can be seen in this 3D projection of this relationship where eKD is the z-axis (Fig. S22B). Fig. S22C,D demonstrate how changing rtTA bound to dox effects eKD. There are many insights we can begin to postulate from this relationship but to thoroughly explore even one of these insights is outside the scope of this paper. Considering this we will just leave you with some closing thoughts. First, given the non-monotonic trend of this non-linear relationship one should be able to achieve the same level of tolerance given different shRNA to Flp ratios. Second, identical tolerance to leaky expression should be achievable between the no, constant and feedforward topologies if shRNA and Flp are balanced to give the same eKd value. Altogether eKd, in theory, allows us to quantitatively compare tolerance to leaky expression for our digital enhancer topologies and identify trends in the data that should be useful for construction of optimally performing digital enhancers.

| Supplementary Note 6: InitialSynNotchResults
We test the efficacy of both our devices and our ability to predict their behavior by using them to enhance signal separation in a cell-to-cell communication system, synNotch. The synthetic notch system, synNotch, is an engineered version of the endogenous delta notch cell surface receptor, comprised of an external scFv (sensor) and internal effector domain (transcription factor) [10]. synNotch is highly modular and can be modified to respond to multiple membrane bound molecules by altering the external scFv portion of the receptor [10] [11] [12]. Upon binding its intended target, synNotch induces self-cleavage of the internal transcription factor to act upon a downstream target, such as a promoter of interest to induce the expression of a reporter gene. Here we chose to test a few of these systems and move forward with one of the higher performing and more widely used versions sensing CD19 and releasing tTA to activate a downstream target (Supplemental Fig. S27A). Previous characterization highlights the power this system has for bioengineering applications from T cell reprogramming to tissue pattern formation [11] [13]. Little work, however, has been done to understand the composability of the synNotch system with downstream signal processing elements, which would aid in their incorporation into more complex cell circuitry. Here, we sought to compose synNotch together with our digital enhancers to show how cell-cell communication can be tuned in a predictable fashion.
To investigate the behavior of synNotch systems, we first transiently transfected three versions previously published by [10] into HEK293FT cells. We also varied the amount of synNotch receptor transfected into the system to study the concentration dependent behavior of each sensor (Supplemental Fig. S27). When cultured with CD19expressing sender cells, we find that lower levels of transfected synNotch receptor improved performance for all three systems in receiver cells primarily because of the decrease in leaky output from uninduced receiver populations (Sup-  Fig. S27B). However, we expect fold change to both increase and decrease based on how many receiver cells come into contact with sender cells. As such, we will use our digital enhancers to tune this output.
To test our digital enhancers with synNotch we took our stable cell line and transiently transfected in 3 shRNA to Flp ratios for both the constant and feedforward enhancer topology. In addition we tested an enhancer without the presence of shRNA, no-shRNA. We chose the ratios of shRNA to Flp to test the both the extreme behavior of the digitizer and also the models ability to predict this extreme behavior.

Supplementary Note 7.1: Uploading Data to SynBioHub Introduction
We have chosen to upload our circuit data to SynBioHub. SynBioHub is an open-source project which allows scientists to store and share their synthetic biology design information [14]. Multiple instances of the SynBioHub repository have been established including the reference instance ( https://synbiohub.org) and the NSF Living Computing Project (LCP) instance ( https://synbiohub.programmingbiology.org) used by this work. The Syn-BioHub repository stores information using the Synthetic Biology Open Language (SBOL) [15], and it contains two different interfaces for various application scenarios: a web interface and the HTTP API. Anyone with a SynBioHub account can easily upload their DNA sequence and other design files, as well as experimental information to the Syn-BioHub repository. SynBioHub also incorporates a built-in visualization system using the SBOL Visual standard [16].
Once data is uploaded, it is placed in a private repository that can only be viewed by the user who uploaded it, as well as any individual provided a share link to this data (such as the reviewers for this paper). Once the data is ready for publication, it can be moved into a public repository, where it can be readily searched for using either the web interface or HTTP API. For our digital enhancer we have uploaded both SBOL converted GenBank files for all constructs used in this paper, as well as spreadsheets describing how each construct was used. The corresponding FACs data collected for each experiment is also available upon request. The SynBioHub entry can be found on the LCP SynBioHub website ( https://synbiohub.programmingbiology.org/public/DigitizingCommunicatio n/DigitizingCommunication_collection/1). The DNA files contain the annotated DNA sequences of plasmids used in experiments. Each file's name is entitled using a unique ID followed by the annotated transcription unit, such as "BW2139(transcription unit)". The ID is used throughout experiments and recorded in the spreadsheets. The detailed description and the GenBank DNA files could be accessed and downloaded through the SynBioHub repository or the corresponding URLs on the spreadsheets.
The Excel spreadsheets record the details needed for experiment reproduction which have four tables in common: • The "Experiment" table introduce necessary experiment information, such as the description of the experiments, the brief procedures of the experiment, the performer of the experiment and the explanation for the shortened key used in the "Sample" table; • The "Sample" table gives information about the setup of the experiment groups as well as the names of the related data files; • The "Experiment DNA sample" table contains the ID of the DNA sample used in the experimental group and the URLs linked to the corresponding DNA files on the SynBioHub.
• The "Calibration DNA sample" table contains the ID of the DNA used in the control group and the corresponding URLs.

Supplementary Note 7.2: Example Use of Database
The experiment-related files are organized in a way in which other researchers could use these files to reproduce the experiments and analyze the data quickly. For example, to repeat the specific experiment, one should open the corresponding experiment spreadsheet, such as "JHT6-LeakyExpressionData", go to the "Experiment DNA sample" or "Calibration DNA sample", click on the SynBioHub URLs and then downloads the DNA files. Once the researchers has the annotated DNA sequence, each plasmid can be synthesized de novo or modified from existing plasmids, and the DNA mixtures can be prepared with appropriate proportions noted on these two tables. After preparing the DNA mixtures well, one can follow the procedures on the "Experiment" table or the complete instructions of experiments on the "protocol" table. The FACs data referenced here can be converted to new standard, MEFL, to increase it comparability [17]. This can be accomplished with many available programs, we recommend TASBE tools ( https://tasbe.github.io/) and Cytoflow ( https://bpteague.github.io/cytoflow/). Once in MEFL scientist should be able to recreate any graph presented in the paper and even look at the data in new ways if so desired.

Supplementary Note 7.3: Why is this repository useful?
Using SynBioHub for experimental file collection promotes standardization and makes the experimental results more accessible for reproduction. Standardization is one of the most widely used concepts in the field of engineering. For synthetic biology, standardization means a series of consensus arrangements about experiment design, data collection and analysis, and information storage. With standardization, different parties could easily understand the goal of an experiment, the material, and the process needed to repeat the experiment. As an additional benefit, this should also aid in recapitulation of the methods applied to understand the experiments. There have been several trials of standardization in synthetic biology such as the iGEM (international genetically engineered machine) competition ( https://igem.org/Main_Page), in which the contestants are requested to submit DNA sequences with annotations to iGEM repository following the specific standards. We hope our use of this repository will also be seen as a trial of standardization encouraging others to reproduce our results. One reason for the current circumstance is the difficulty of finding a balance between flexibility and standardization: though the research methodology may be similar for synthetic biology, the experimental contexts and experimental procedures may vary a lot according to experiment goals and conditions. So designing the structure which can meet the various needs of researchers and compactly format the experiment information at the same time is challenging. However we believe we have arrived at a good balance as our structure was able to handle experiments from this paper as diverse as cell-cell communication to dose response curves. Another obstacle is lacking the reliable repository for DNA sequences and experiment information. GenBank and the journal's server may be the current option for researchers to upload their DNA and experiment details and is a great start; however we hope that more can be accomplished by also uploading our data to SynBioHub where we have a unified structure in data storage, which should encourage reproducibility.

| Supplementary Note 8: Toxicity
When examining the data for the doxycycline dose response curves, we note that as IFP expression increases with dox addition, CFP decreases (Fig. 26). There is a 2 fold decrease in CFP expression between the 0 nM and 225 nM dox concentrations tested, indicating an overall drop in protein production as more dox is added to the cells.
Previous studies have suggested that dox can inhibit mammalian cell growth starting at concentrations as low as 200 nM, with significant toxicities observed in multiple mammalian cell lines with increasing drug addition [18] [19]. Our results seem to corroborate these findings; however, resource competition as mentioned in [8] may also provide an explanation. While it is not surprising that we observe decreased protein production due to dox, these findings may impact the reported values of fluorescent protein expression by falsely decreasing the overall output of each, and the subsequent model predictions of such outputs. This may provide a partial explanation for the unexpected shoulder in the off state populations when the hypothesized 'ideal' digital enhancer modules were composed with the synNotch sensors. In this configuration, the cells are no longer exposed to dox and thus may be producing a higher level of all proteins in the system.    S U P P L E M E N TA R Y F I G U R E 1 6 Leaky Fit: Panel A is a representative image of the time course fit to the leaky expression data, 6hr, 12hr, 48hr and 96hr respectively. Green distributions are experimental data and black distributions are the simulated data for the CAG terminator GFP sample. Panel B is a sample of the fit error landscape. Blue dots represent points along the error trace sampled during the decent and red points represent local minima arrived given different initial conditions. Panel C is a linear fit indicating the relationship between the local minima given different initial delay parameters.

| Additional Supplementary Tables
S U P P L E M E N TA R Y F I G U R E 1 7 Max Fit: Panel A is a representative image of the time course fit to the leaky expression data, 6hr, 12hr, 48hr and 96hr respectively. Green distributions are experimental data and black distributions are the simulated data for the CAG GFP sample. Panel B is a sample of the fit error landscape. Blue dots represent points along the error trace sampled during the decent and red points represent local minima arrived given different initial conditions. Panel C is a linear fit (excluding minima where b2 has little impact on minima) indicating the relationship between the local minima given different initial delay parameters. S U P P L E M E N TA R Y F I G U R E 2 1 bf versus Kd Fit Landscape: Samples of the fit error landscape. Blue dots represent points along the error trace sampled during the decent and red points represent local minima arrived given different initial conditions. The linear fit indicates the relationship between the local minima given different initial conditions. Panel A is for the no-shRNA bf and Kd values. Panel B is for the constant-shRNA and panel C is for the feedforward-shRNA case.
S U P P L E M E N TA R Y F I G U R E 2 2 eKd: Panel A Is a line plot depicting how the percentage of cleaved promoters changes as P FLP increases. eKd is marked on this plot in red and will increase or decrease the amount of P F LP needed to cleave half the reporter circuit for half of the cells in a population. Panel B represents how eKd changes with respect to how much ng of Flp or shRNA is used in the system for the constant shRNA expression topology. Panels C and D represent how eKd changes for the feedforward topology given different amounts of r tT A a created at different DOX levels. S U P P L E M E N TA R Y F I G U R E 2 7 A representative workflow for processing flow cytometry data collected for all experiments shown in this manuscript. Populations of cells are first gated for live, single cells using a 2D Guassian mixture model and autofluorescnece correction for each fluorescent protein collected on each filter is performed using HEK293FT cells transfected with a null plasmid. All fluorescence data collected is then standardized to a single unit of Molecules of Equivalent Fluorescein (MEFL) using data from SpheroTech RCP-30-5A beads collected on each fluorescence channel. Compensation is performed to determine and correct the level of bleedthrough of each fluorophore into unintended channels, and finally all fluorescence data is standardized to GFP MEFL using a sample containing all fluorescent proteins driven by the same promoter (from different plasmids) transfected in equal amounts.