Sparse estimation of mutual information landscapes quantifies information transmission through cellular biochemical reaction networks

Measuring information transmission from stimulus to response is useful for evaluating the signaling fidelity of biochemical reaction networks (BRNs) in cells. Quantification of information transmission can reveal the optimal input stimuli environment for a BRN and the rate at which the signaling fidelity decreases for non-optimal input probability distributions. Here we present sparse estimation of mutual information landscapes (SEMIL), a method to quantify information transmission through cellular BRNs using commonly available data for single-cell gene expression output, across a design space of possible input distributions. We validate SEMIL and use it to analyze several engineered cellular sensing systems to demonstrate the impact of reaction pathways and rate constants on mutual information landscapes.

: Response functions, mutual information landscapes, and optimal input distribution for replicate A1 of BRNs with deactivated lacY and relative lacI translation rate of 1 (Supplementary Figure 6). The leftmost panel uses one-fourth of the output data, the middle panel uses half of the input data, and the rightmost panel uses the total dataset. Figure 13: Response functions, mutual information landscapes, and optimal input distribution for replicate A1 of BRNs with deactivated lacY and relative lacI translation rate of 10 (Supplementary Figure 7). The leftmost panel uses one-fourth of the output data, the middle panel uses half of the input data, and the rightmost panel uses the total dataset. Figure 14: Response functions, mutual information landscapes, and optimal input distribution for replicate A1 of BRNs with activated lacY and relative lacI translation rate of 0.008 (Supplementary Figure 8). The leftmost panel uses one-fourth of the output data, the middle panel uses half of the input data, and the rightmost panel uses the total dataset. Figure 15: Response functions, mutual information landscapes, and optimal input distribution for replicate A1 of BRNs with activated lacY and relative lacI translation rate of 1 (Supplementary Figure 9). The leftmost panel uses one-fourth of the output data, the middle panel uses half of the input data, and the rightmost panel uses the total dataset. Figure 16: Response functions, mutual information landscapes, and optimal input distribution for replicate A1 of BRNs with activated lacY and relative lacI translation rate of 10 (Supplementary Figure 10). The leftmost panel uses one-fourth of the output data, the middle panel uses half of the input data, and the rightmost panel uses the total dataset. Figure 17: Examples for correction of finite-sampling bias. With a smaller sample size N, the mutual information, I(X;g), is overestimated by a higher amount. The Yintercept of the linear fit for multiple subsample sizes was used to estimate the unbiased mutual information from finite data. We used this method to compute the limiting mutual information for the entire design space for all the engineered BRNs. (a) for BRN with deactivated lacY and relative lacI translation rate constant 1, replicate B1, I = 1.54 + 0.00174(N T /N), and (b) for BRN with lacY and relative lacI translation rate constant 1, replicate A1, I = 1.053 + 0.011(N T /N). Each of the plots shows the mutual information values at the optimal input distribution in the design space for the respective BRNs. Figure 18: Plasmid map of construct used for flow cytometry measurements. For data without feedback (main text Fig. 4 panels b, c, d), three sequential codons in the lacY CDS were mutated to stop codons (positions D37, I38, and N39, indicated with red asterisk above cyan lacY symbol). Additionally, to modify the lacI protein concentration in the cell, the RBS of lacI was changed to modify the translation rate of lacI, as indicated with red asterisk above lacI RBS. Sequences described below.

pYFP-lacY
Supplementary Figure 19: Example of automated gating used for cytometry analysis. a, Side scatter signal vs. forward scatter signal before automated cell gating. b, Side scatter signal vs. forward scatter signal after automated cell gating. c, Side scatter signal vs. side scatter area/height ratio before automated singlet gating. d, Side scatter signal vs. side scatter area/height ratio after automated singlet gating. A blank (buffer only) sample was measured before the E. coli samples, and was used to automatically determine the gate to apply to the side scatter vs. forward scatter data to discriminate cell events from non-cell events (a, b). The Attune cytometer is calibrated so that the scattering area and height parameters are equal for singlet events. A second automated gating algorithm took advantage of this to select singlet cells based on the identification of detection events with side scatter area/height ≅ 1.

STOCHASTIC REDUCED ORDER MODEL OF INPUT DISTRIBUTION
SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [Xmin, X max ]. Let a trial discrete approximation be a set of input values {xk} and associated probability masses P * (X = x k ), such that å k P * (X = x k ) = 1. The cumulative distribution function (CDF) for the continuous input variable is and the CDF for the discrete approximation is where Θ(X -X k ) is the Heaviside step function, i.e. Θ(a) = 1 when a ≥ 0 and Θ(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the differences between the continuous CDFs and moments and their respective discrete approximations: and

STOCHASTIC REDUCED ORDER MODEL OF INPUT DISTRIBUTION
SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [X min , X max ]. Let a trial discrete approximation be a set of {X (k) } and associated probability The cumulative distribution function (CDF) for the true input variable is and the CDF for the discrete approximation is where H(X X (k) ) is the Heaviside step function, i.e. H(a) = 1 when a 0 and H(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the di↵erence in the true and the approximated CDFs and moments. and where m is the total number of moments over which we are quantifying the error ✏ µ . For SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [X min , X max ]. Let a trial discrete approximation be a set of input values {x k } and associated probability masses P ⇤ (X = x k }, such that P k P ⇤ (X = x k ) = 1. The cumulative distribution function (CDF) for the continuous input variable is and the CDF for the discrete approximation is where ⇥(X X k ) is the Heaviside step function, i.e. ⇥(a) = 1 when a 0 and ⇥(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the di↵erences between the continuous CDFs and moments and their respective discrete approximations: SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [X min , X max ]. Let a trial discrete approximation be a set of {X (k) } and associated probability The cumulative distribution function (CDF) for the true input variable is and the CDF for the discrete approximation is where H(X X (k) ) is the Heaviside step function, i.e. H(a) = 1 when a 0 and H(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the di↵erence in the true and the approximated CDFs and moments. and where m is the total number of moments over which we are quantifying the error ✏ µ . For 2

STOCHASTIC REDUCED ORDER MODEL OF INPUT DISTRIBUTION
SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [X min , X max ]. Let a trial discrete approximation be a set of input values {x k } and associated probability masses P ⇤ (X = x k }, such that P k P ⇤ (X = x k ) = 1. The cumulative distribution function (CDF) for the continuous input variable is and the CDF for the discrete approximation is where H(X X k ) is the Heaviside step function, i.e. H(a) = 1 when a 0 and H(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the di↵erence in the true and the approximated CDFs and moments. and 2

STOCHASTIC REDUCED ORDER MODEL OF INPUT DISTRIBUTION
SEMIL is used to estimate the mutual information landscape of a BRN across a design space of possible input probability distributions. This is accomplished by first computing a discrete approximation of the continuous input distribution, p(X), for every point in the design space. This discrete approximation is obtained using stochastic reduced-order modeling as described below.
Consider a stochastic input variable X with probability distribution p(X) over a support [X min , X max ]. Let a trial discrete approximation be a set of input values {x k } and associated probability masses P ⇤ (X = x k }, such that P k P ⇤ (X = x k ) = 1. The cumulative distribution function (CDF) for the continuous input variable is and the CDF for the discrete approximation is where ⇥(X X k ) is the Heaviside step function, i.e. ⇥(a) = 1 when a 0 and ⇥(a) = 0 when a < 0. The n th moment of the continuous input variable is and the same moment for the discrete approximation is We define two error terms due to the di↵erences between the continuous CDFs and moments and their respective discrete approximations: where m is the total number of moments over which we are quantifying the error ϵµ. For all the SEMIL calculations presented in this work we used m = 2. Using the two error functions (5) and (6), we can define the optimal probability masses, {P(X = xk)} as the solution to the minimization problem: where α F and αµ are factors to control the relative importance of the CDF error and moment error in the total optimization error functional. Supplementary Figure 1 shows an example of an optimal discrete approximation of a continuous distribution of the input variable. The set of values for ( α F , αµ ) were chosen by trying out various combinations. Increasing the ratio of α F / αµ improved the accuracy of the stochastic reduced-order model for input distributions with high geometric standard deviation but decreased it for low geometric standard deviation. For all the SEMIL calculations presented in this work we used α F = 100 and αµ = 1 which resulted in a similar magnitude of error for both high and low geometric standard deviation input distributions.
In previous work with stochastic reduced-order models [1], the values of the inputs used in the discrete input set, x k , were allowed to vary during the process of minimizing ϵ F (p,P * ) and ϵµ(p,P * ). Under this condition the solution to Eq. (7) not only yields the best probability masses P(X = x k ), but also the best input values {x} for those masses, and the error from sparse approximation decreases monotonically as the number of inputs is increased [1]. However, to apply this scheme for experimentally measured BRNs would require measurement of the output response data at a new set of input levels (or inducer concentrations) for every input distribution in the design space. Hence, to enable application with experimental data we keep the input values {x k } fixed.

VALIDATION OF SEMIL WITH ANALYTICAL BRN MODEL
To test the validity and quantitative accuracy of SEMIL, we used simulated data from an analytical model of BRN output. For this, we chose a model BRN response function where each value of the input, X, results in a gamma distribution of the output g. and where m is the total number of moments over which we are quantifying the error ✏ µ . For all the SEMIL calculations presented in this work we used m = 5. Using the two error functions (5) and (6), we can define the optimal probability masses, {P (X = x k )} as the solution to the minimization problem.
where ↵ F and ↵ µ are factors to control the relative importance of the CDF error and moment error in the total optimization error functional. Fig. 1 shows an example of an optimal and discrete approximation of a continuous distribution of the input variable. The set of values for (↵ F , ↵ µ ) were chosen by trying out various combinations. Increasing the ratio of ↵ F /↵ µ improved the accuracy of the stochastic reduced-order model for input distributions with high geometric standard deviation, but decreased it for low geometric standard deviation.
For all the SEMIL calculations presented in this work we used ↵ F = 100 and ↵ µ = 1 resulted in a similar magnitude of error both for high and low geometric standard deviation input distributions.
In previous work with stochastic reduced-order models [1], the values of the inputs used in the discrete input set, x k , were allowed to vary during the process of minimizing ✏ F (p, P ⇤ ) and ✏ µ (p, P ⇤ ). Under this condition the solution to Eq. (7) not only yields the best probability masses P ⇤ (X = x k ), but also the best input values {x k } for those masses, and the error from sparse approximation decreases monotonically as the number of inputs is increased [1].
However, to apply this scheme for experimentally measured BRNs would require measurement of the output response data at a new set of input levels (or inducer concentrations) for every input distribution in the design space. Hence, to enable application with experimental data we keep the input values {x k } fixed. 3 and where m is the total number of moments over which we are quantifying the error ✏ µ . For all the SEMIL calculations presented in this work we used m = 5. Using the two error functions (5) and (6), we can define the optimal probability masses, {P (X = x k )} as the solution to the minimization problem.
where ↵ F and ↵ µ are factors to control the relative importance of the CDF error and moment error in the total optimization error functional. Fig. 1 shows an example of an optimal and discrete approximation of a continuous distribution of the input variable. The set of values for (↵ F , ↵ µ ) were chosen by trying out various combination. We noticed that increasing ↵ F while keeping ↵ µ constant improved the accuracy of the stochastic reducedorder model for input distributions with high geometric standard deviation, but decreased it for low geometric standard deviation. Also, increasing ↵ µ while keeping ↵ F fixed improved the accuracy for the low geometric standard deviation but increased it for high geometric standard deviation. For all the SEMIL calculations presented in this work we used ↵ F = 100 and ↵ µ = 1 which seemed to produce similar magnitude of error both for high and low geometric standard deviation input distributions.
In previous work with stochastic reduced-order models [1], the values of the inputs used in the discrete input set, x k , were allowed to vary during the process of minimizing e F (p, P ⇤ ) and e µ (p, P ⇤ ). Under this condition the solution to Eq. (7) not only yields the best probability masses P ⇤ (X = x k ), but also the best input values {x k } for those masses, and the error from sparse approximation decreases monotonically as the number of inputs is increased [1].
However, to apply this scheme for experimentally measured BRNs would require measurement of the output response data at a new set of input levels (or inducer concentrations) for every input distribution in the design space. Hence, to enable application with experimental 3

VALIDATION OF SEMIL WITH ANALYTICAL BRN MODEL
To test the validity and quantitative accuracy of SEMIL, we used simulated data from an analytical model of BRN output. For this, we chose a model BRN response function where each input level X results in a gamma distribution of the output g.
p(g|X) ⌘ f (g; k(X), ✓(X)) = where (·) is the Gamma function, where k is the shape parameter and ✓ is the scale parameter.
Based on fits to real BRN output data, we used the Hill equation as an analytical model for the input dependence of the shape and scale parameters of the output gamma distribution.
where Γ(·) is the Gamma function, k is the shape parameter of the gamma distribution, and θ is the scale parameter of the gamma distribution.
Based on fits to real BRN output data, we used the Hill equation as an analytical model for the input dependence of the shape and scale parameters of the output gamma distribution. and For the set of calculations presented in Fig. 2 We first used SEMIL to determine the input distribution at which the mutual information is maximum, or the optimal input distribution, p c (X), for this model response function. The geometric mean and the geometric standard deviation of p c (X), were E = 50.1 and σ = 2.51, respectively. Then we used SEMIL to estimate mutual information, I(X;g), for two sets of input distributions: (1) with fixed E = 50.1 and σ ∈ [1,10 3 ], and (2) input distributions with fixed σ = 2.51 and E ∈ 2, 5 × 10 2 ].
To obtain the correct mutual information (true solution) for the same sets of input distributions, we numerically integrated the continuous form of the mutual information, using p(g|X) as given in Eq. (11):

COMPARISON OF SEMIL WITH RESULTS FOR THE SMALL NOISE LIMIT
Validating SEMIL with a model BRN response function for which the analytical solution to the optimal input distribution is known is ideal. Such analytical results are rare for biological response p(g|X) ⌘ f (g; k(X), ✓(X)) = g k 1 e g/✓ ✓ k (✓) (8) where (·) is the Gamma function, where k is the shape parameter and ✓ is the scale parameter.
Based on fits to real BRN output data, we used the Hill equation as an analytical model for the input dependence of the shape and scale parameters of the output gamma distribution.
and ✓ = 9 X n X n + IC n 50,2 For the set of calculations presented in Fig. 1(b) from Eq. (9)-(10), and generated mock data by drawing 10000 samples from the resulting gamma distribution, (8).
For each of the concentrations we determined (k, ✓) from Eq. (9)-(10), and obtained output distributions p(g|X) by drawing 10000 samples from (8). First, we used SEMIL to determine the optimal input distribution, p C (X) for this model response function using SEMIL. The geometric mean and the geometric standard deviation of p C (X), were E = 50.1 and = 2.51, respectively. Then we used SEMIL to estimate mutual information, I(X; g), for two sets of input distributions, with fixed E = 50.1 and 2 [1, 10 3 ], and input distributions with fixed = 2.51 and E 2 2, 5 ⇥ 10 2 ].
To obtain the correct mutual information (true solution) for the same sets of input dis-5 analytical model of BRN output. For this, we chose a model BRN response function where each input level X results in a gamma distribution of the output g.
Based on fits to real BRN output data, we used the Hill equation as an analytical model for the input dependence of the shape and scale parameters of the output gamma distribution.
and ✓ = 9 X n X n + IC n 50,2 For the set of calculations presented in Fig. 1 from Eq. (9)-(10), and generated mock data by drawing 10000 samples from the resulting gamma distribution, (8).
For each of the concentrations we determined (k, ✓) from Eq. (9)-(10), and obtained output distributions p(g|X) by drawing 10000 samples from (8). First, we used SEMIL to determine the optimal input distribution, p C (X) for this model response function using SEMIL. The geometric mean and the geometric standard deviation of p C (X), were E = 50.1 and = 2.51, respectively. Then we used SEMIL to estimate mutual information, I(X; g), for two sets of input distributions, with fixed E = 50.1 and 2 [1, 10 3 ], and input distributions with fixed = 2.51 and E 2 2, 5 ⇥ 10 2 ].
To obtain the correct mutual information (true solution) for the same sets of input dis-5 tributions, we numerically integrated the continuous form of the mutual information, using p(g|X) as given in Eq. (11): functions. But, there exists a theoretical result for response functions in the small noise limit [1], where it has been shown that the CDF of the optimal input distribution, pC(X), converges to the response function. To check that estimates of pC(X) from SEMIL satisfies this results, we chose a Hill response function, which is similar used in the original proof [1]. The mean response ⟨g⟩ of this function rises from 0 to 1 with increasing input, X.
where n = 2 and IC 50 = 50. We selected a mean-dependent noise function from [1], where the magnitude of the noise for the response function, Eq. (12), scales with α. The conditional output distribution for an input X is the normal distribution, with ⟨g⟩ from Eq. (12) and σ 2 from Eq. (13). While keeping the mean response function, Eq. (13), the same we used α ∈ {1, 0.1, 0.01} to generate output with the same mean but decreasing levels of noise. We selected 20 input concentrations at each of which we drew 10000 samples from the conditional output distribution Eq. (14) to obtain a set of discretized distributions for P(g|X;α). Then we used SEMIL to compute the optimal input distribution, pC(X), for each noise level (or α).

EVALUATION OF SEMIL WITH MINIMAL BRN STOCHASTIC SIMULATIONS
We further evaluated SEMIL using simulated data for a BRN modeled after the lac operon from E. coli. Using the Gillespie algorithm, we generated simulated output data as described below. We then used the simulation results to compute the mutual information.
To calculate the correct result, we used simulation results for a dense set of input values to numerically integrate the mutual information as: Validating SEMIL with a model BRN response function for which the analytical solution to the optimal input distribution is known is ideal. Such analytical results are rare for biological response functions. But, there exists a theoretical result for response functions in the small noise limit [1], where it has been shown that the CDF of the optimal input distribution, p C (X), converges to the response function. To check that estimates of p C (X) from SEMIL satisfies this results, we chose a Hill response function, which is similar used in the original proof [1]. The mean response hgi of this function rises from 0 to 1 with increasing input, X.
hgi(X) = X n X n + IC n 50 (12) where n = 2 and IC 50 = 50. We selected a mean-dependent noise function from [1], where the magnitude of the noise for the response function, Eq. (12), scales with ↵. The conditional output distribution for an input X is the normal distribution, with hgi from Eq. (12) and 2 from Eq. (13). While keeping the mean response function, Eq.
(13), the same we used ↵ 2 {1, 0.1, 0.01} to generate output with the same mean but decreasing levels of noise. We selected 20 input concentrations at each of which we drew 10000 samples from the conditional output distribution Eq. (??) to obtain a set of discretized distributions for P (g|X; ↵). Then we used SEMIL to compute the optimal input distribution, p C (X), for each noise level (or ↵).

7
Validating SEMIL with a model BRN response function for which the analytical solution to the optimal input distribution is known is ideal. Such analytical results are rare for biological response functions. But, there exists a theoretical result for response functions in the small noise limit [1], where it has been shown that the CDF of the optimal input distribution, p C (X), converges to the response function. To check that estimates of p C (X) from SEMIL satisfies this results, we chose a Hill response function, which is similar used in the original proof [1]. The mean response hgi of this function rises from 0 to 1 with increasing input, X.
hgi(X) = X n X n + IC n 50 (12) where n = 2 and IC 50 = 50. We selected a mean-dependent noise function from [1], where the magnitude of the noise for the response function, Eq. (12), scales with ↵. The conditional output distribution for an input X is the normal distribution, with hgi from Eq. (12) and 2 from Eq. (13). While keeping the mean response function, Eq.
(13), the same we used ↵ 2 {1, 0.1, 0.01} to generate output with the same mean but decreasing levels of noise. We selected 20 input concentrations at each of which we drew 10000 samples from the conditional output distribution Eq. (??) to obtain a set of discretized distributions for P (g|X; ↵). Then we used SEMIL to compute the optimal input distribution, p C (X), for each noise level (or ↵). 7 optimal input distribution is known is ideal. Such analytical results are rare for biological response functions. But, there exists a theoretical result for response functions in the small noise limit [1], where it has been shown that the CDF of the optimal input distribution, p C (X), converges to the response function. To check that estimates of p C (X) from SEMIL satisfies this results, we chose a Hill response function, which is similar used in the original proof [1]. The mean response hgi of this function rises from 0 to 1 with increasing input, X.
hgi(X) = X n X n + IC n 50 (12) where n = 2 and IC 50 = 50. We selected a mean-dependent noise function from [1], where the magnitude of the noise for the response function, Eq. (12), scales with ↵. The conditional output distribution for an input X is the normal distribution, with hgi from Eq. (12) and 2 from Eq. (13). While keeping the mean response function, Eq.
(13), the same we used ↵ 2 {1, 0.1, 0.01} to generate output with the same mean but decreasing levels of noise. We selected 20 input concentrations at each of which we drew 10000 samples from the conditional output distribution Eq. (??) to obtain a set of discretized distributions for P (g|X; ↵). Then we used SEMIL to compute the optimal input distribution, p C (X), for each noise level (or ↵).

7
where P(g = g j |X) is the probability for a cell to contain g j protein molecules given an input value of X, and P(g = gj) = òX p(X)P(g=gj|X)dX is the marginal probability for a cell to contain g j output reporter protein molecules. To estimate P(g = g j |X) and P(g = gj) from the simulation results, we directly used the observed frequencies from the simulated data. Note that the Gillespie simulations give a discrete result for the gene expression output (number of protein molecules) so the integration over g in Eq. (11) is replaced by a summation over g j in Eq. (15). Also, for numerical integration over the domain of X, we divided the domain of log 10 X into 400 equal intervals or 401 values of X for which we obtained estimates of the output distribution from independent Gillespie simulations. The correct mutual information landscape computed using numerical integration of Eq. (15) is shown in Supplementary Figure 3.
To estimate the mutual information using SEMIL, we took sparse subsets of the simulated data at 5, 10, and 20 discrete input values and used the corresponding simulated output data with the SEMIL algorithm. Results comparing the SEMIL estimates and the correct result are shown in Fig. 3 in the main text.
The model BRN used for Gillespie simulations is based on a reaction network model of the lac operon [2], where an input concentration controls the output response of a protein by interacting with a repressor-operator complex. The input to this BRN is the concentration of isopropyl β-D-1-thiogalactopyranoside (IPTG). The stochastic simulations were done using the Gillespie algorithm. Tables 1-4 list the components of the BRN, the reactions, propensity functions and rate constants.
The domain of the input, X = IPTG concentration, was from 10 -1 μmolL -1 to 10 4 μmolL -1 . We divided the interval log10X into 400 equal intervals, resulting in 401 input values. We simulated the BRN (Tables 1-4) for these 401 concentrations of IPTG to get the output data used for evaluation. Each simulation was run for 2600 minutes and the first 100 minutes were neglected to remove any effect of the initial condition. The output reporter protein level in the BRN was sampled after every 0.01 minutes. Examples of discrete distributions of the output reporter protein are shown in Supplementary Figure 2.
For the mutual information landscapes with SEMIL we used the following sets of input values, which are roughly equally spaced in the domain of the input log 10 X.
2. Ten input values ( Fig. 3(b, e)  3. Twenty input values (Fig. 3(c, f)  then used the simulation results to compute the mutual information.
To calculate the correct result, we used simulation results for a dense set of input values to numerically integrate the mutual information as: where P (g = g j |X) is the probability for a cell to contain g j protein molecules given an input value of X, and P (g = g j ) = R X p(X)P (g = g j |X)dX is the marginal probability for a cell to contain g j output reporter protein molecules. To estimate P (g j |X) and P (g j ) from the simulation results, we directly used the observed frequencies from the simulated data. Note that the Gillespie simulations give a discrete result for the gene expression output (number of protein molecules) so the integration over g in Eq. (11) is replaced by a summation over g j in Eq. (15). Also, for numerical integration over the domain of X, we divided the domain of log 10 X into 400 equal intervals or 401 values of X for which we obtained estimates of the output distribution from independent Gillespie simulations. The correct mutual information landscape computed using numerical integration of Eq. (15) is shown in Fig. 3.
To estimate the mutual information using SEMIL, we took sparse subsets of the simulated data at 5, 10, and 20 discrete input values and used the corresponding simulated output data with the SEMIL algorithm. Results comparing the SEMIL estimates and the correct result are shown in  (14) up to N T /N = 20 was applicable. But for BRNs with activated lacY we could use only up to N T /N = 10 for the linear fit. With the experimentally measured BRN output data used for this work, there were typically more than 5,000 single cell observations for each input value. Consequently, the mutual information estimates obtained directly from the full data (N T /N = 1) were close to the extrapolated, unbiased estimate (within 0.01 bits near the optimal input distribution and typically within 0.1 bits near the boundaries of the design space).

Mutual information landscapes of replicate measurements
The mutual information landscapes for the replicate measurements are in Supplementary Figures  5-10. The black or white dots show the coordinates of the optimal input distribution, (E C ,σ C ), and the same colored contours around these points bound the mutual information values that are within 0.05 bits from the maximum mutual information.

Mutual information landscapes using smaller sets of input values
We took one of the replicates for each of the six experimentally-studied BRNs and studied the impact of a smaller set of output data on the mutual information landscape from SEMIL. Supplementary Figures 11-16 show the landscapes by systematically using the output data for a subset of inputs values. The first panel of each of the figures shows the mutual information landscape obtained using one-fourth (5) of the input values, the second panel uses half (9) of the input values, and the third panel uses the total set (18) of input values. The landscapes obtained using the half and the full set of input values are indistinguishable for each of the BRNs. The difference in the mutual information landscapes from 5 and 9 input values is only prominent for BRNs that have a relatively higher maximum mutual information (Supplementary Figure 12-13).

Maximum mutual information from SEMIL and Blahut-Arimoto algorithm
The most common method to determine the maximum mutual information (Imax), or the channel capacity, is the Blahut-Arimoto algorithm [7]. We compare the maximum mutual information obtained using Blahut-Arimoto algorithm and SEMIL in Supplementary Figure 17. SEMIL evaluates mutual information constrained to a chosen design space of input probability distributions, whereas Blahut-Arimoto performs an unconstrained search. Hence the maximum mutual information from SEMIL is consistently lower than the estimate from Blahut-Arimoto algorithm. However, the optimal input distribution from Blahut-Arimoto is spiky and discontinuous and difficult to interpret as a biologically plausible input distribution [3], SEMIL circumvents this problem of interpretation by using a well-defined design space of continuous input distributions.

Details of E. coli culture conditions
Details of the growth conditions used for the E. coli cultures are provided in Supplementary Table  6 and 7, in the format recommended as a minimum information standard for bacterial cell growth [8].
For data shown in the main text Fig. 4, data without feedback (panels b, c, d) was collected by truncating the lacI protein at positions D37, I38, and N39 (highlighted cyan, red text) by substituting three stop codons: TAATAATAG.
For main text Fig. 4, RBS of lacI was changed to modify translation rate of lacI. RBS sequence in magenta text above and changed to sequences in Supplementary Table 5.

Flow Cytometry Gating
After growth, 20 µL from each E. coli sample was diluted into 180 µL of phosphate buffered saline supplemented with 170 µg/mL chloramphenicol to halt protein translation. The resulting diluted samples were measured on an Attune NxT flow cytometer with 488 nm excitation laser and a 530 19 nm ± 15 nm bandpass emission filter. Blank samples were measured with each set of E. coli samples, and the results of the blank measurements were used with an automated gating algorithm to discriminate cell events from non-cell events (Supplementary Figure 19(a,b)). A second automated gating algorithm was used to select singlet cell events and exclude doublet, triplet, and higher-order multiplet cell events. All subsequent analysis was performed using the singlet cell event data (Supplementary Figure 19(a,b)). Between 1,000 and 50,000 singlet cell events were detected and analyzed for each E. coli sample.

CORRECTING FOR FINITE-SAMPLING BIAS IN MUTUAL INFORMATION ESTIMATES
Mutual information estimates obtained directly from data are typically biased. There are two sources of related, but competing bias: First, finite sampling of the data used to estimate probabilities results in a bias of the mutual information estimate toward values that are higher than the true value [3][4][5]. Second, binning of the data can alleviate some of the finite-sampling bias, but if the number of bins used is too small, the estimated mutual information will be less than the true value [3,4].
We can use existing methods [3][4][5] to correct for the finite sampling bias by extrapolating to an infinite amount of data as described below. However, the bias from binning of the data represents an actual loss of information that cannot be recovered. So, the general approach is to choose a number of bins that is sufficiently large to avoid systematic underestimation of the mutual information, but that is also sufficiently small to avoid severe undersampling [6].
We chose the number of bins for the experimentally studied BRNs by estimating the maximum mutual information (without correcting for finite sampling bias) in the design space using different numbers of bins, nb ∈ {100, 250, 500, 1000}. We found that the maximum mutual information was independent of the number of bins (constant to within 0.05 bits) for ≥ 250. We therefore selected nb = 500 to determine the discretized output distributions P(g = gj |X = x i ) for the BRNs in Supplementary Figure 4.
For the model BRN in Supplementary Figure 1b-1d, we used 100 bins for the output distributions. We repeated the SEMIL calculations with 200 bins and did not observe a significant difference in the mutual information values.
For each mutual information estimate, we corrected for finite sampling bias by extrapolating to an infinite amount of data using previously described methods. The biased mutual information, Ibias, is related to the correct mutual information I ∞ by the asymptotic relationship [3,4]: where N is the number of samples used to estimate the mutual information. For sufficiently large sample size, N, the second order term in Eq. (16) is negligible and I bias is close a to linear function of 1/N. Let the total number of available samples be NT and rewrite Eq. (16) as where NT is the total number of samples available and N T /N = 0 corresponds to infinite samples. We chose subsample sizes, N T /N = {1, 2, 5, 10, 20} and for each value of N T /N we drew 5 replicate sub-samplings (with replacement) of the output data and computed the corresponding biased estimates of mutual information. We then fit the (I bias , N T /N) dataset with Eq. (17) using linear value [2,5].
We can use existing methods [1,2,5] to correct for the finite sampling bias by extrapolating to an infinite amount of data as described below. However, the bias from binning of the data represents an actual loss of information that cannot be recovered. So, the general approach is to choose a number of bins that is sufficiently large to avoid systematic underestimation of the mutual information, but that is also sufficiently small to avoid severe undersampling [4].
We chose the number of bins for the experimentally studied BRNs by estimating the maximum mutual information (without correcting for finite sampling bias) in the design space using different numbers of bins, nb 2 {100, 250, 500, 1000}. First, for each of the six experimental BRNs (Fig.   3 main text) we determined the maximum and the minimum value of the output for all the inputs X 2 {x i } as g max and g min , respectively, which we used as the edges for the histograms P (g|X = x i ). Using bin sizes nb we obtained a family of conditional output distributions P nb (g|X) for each X 2 {x i }. Then we determined the corresponding maximum mutual information, C nb using Eq. (2) in Methods for each nb 2 {100, 250, 500, 1000}. We evaluated the difference in the estimated maximum mutual information with increasing number of bins. Since we noticed that |C 250 C 500 | < 0.05 bits and |C 500 C 1000 | < 0.05 bits, we selected number of bins as 500 to determine the discretized output distributions P (g|X = x i ) for the BRNs in Fig. 3.
For the model BRN in Fig. 1b-1d, we used 100 bins for the output distributions. We repeated the SEMIL calculations with 200 bins and did not observe a significant difference in the mutual information values.
For each mutual information estimate, we corrected for finite sampling bias by extrapolating to an infinite amount of data using previously described methods. The biased mutual information, I bias , is related to the correct mutual information I 1 by the asymptotic relationship [2,5]:

13
where N is the number of samples used to estimate the mutual information. For sufficiently large sample size, N , the second order term in Eq. (16) is negligible and I bias is close a to linear function of 1/N . Let the total number of available samples be N T and rewrite Eq. (16) as where N T is the total number of samples available and N T /N = 0 corresponds to infinite samples.
We chose subsample sizes, N T /N = {1, 2, 5, 10, 20} and for each value of N T /N we drew 5 replicate sub-samplings (with replacement) of the output data and computed the corresponding biased estimates of mutual information. We then fit the (I bias , N T /N ) dataset with Eq. (17) using linear least squares to obtain the unbiased mutual information, I 1 . This procedure was followed for every point on the design space.
Two examples of the set of biased mutual information as a function of N T /N are shown in