Maximal information transmission is compatible with ultrasensitive biological pathways

Cells are often considered input-output devices that maximize the transmission of information by converting extracellular stimuli (input) via signaling pathways (communication channel) into cell behavior (output). However, in biological systems outputs might feed back into inputs due to cell motility, and the biological channel itself can change by mutations during evolution. Here, we show that the conventional channel capacity, obtained by optimizing the input distribution for a fixed channel, may not reflect the global optimum. In a new approach, we analytically identify both input distributions and input-output curves that optimally transmit information, given constraints from noise and the dynamic range of the channel. We find a universal optimal input distribution that depends only on the input noise, and we generalize our formalism to multiple outputs (or inputs). Applying our formalism to Escherichia coli chemotaxis, we find that its pathway is compatible with optimal information transmission despite the ultrasensitive rotary motors.

Fig. 1. Connection between environmental cues and chemotactic response. Chemotactic bacteria live in complex microenvironments in which input distributions of chemical concentrations are shaped by the swimming behavior (top left), chemical sources and sinks (top middle), and competition with other bacteria (top right). Inputs are processed by the cell-internal chemotaxis pathway, which can be viewed as an input-output device (bottom). Specifically, input-output curves are measured in experiments as dose-response curves with noise. The resulting behavior feeds back into the environment. Evolution is assumed to select the input-output curves that maximize fitness. The chemotactic pathway is a two-component system and, for modeling purposes, is divided into two information-transmission channels: receptors sense the external concentration of stimuli, and their activity regulates the protein CheY_p (receptor channel, bottom left). CheY_p is the internal representation of the external stimulus and regulates motor switching (clockwise or counterclockwise rotation) and thus bacterial motility (straight swimming via a 'run' or random reorientation via a 'tumble'; motility channel, bottom right). Note that there is additional adaptation both at the receptors 75 and the motors 52.

The mutual information between input X and output Y is

I[x, y] = ∫dx ∫dy p(x) p(y|x) log₂[p(y|x)/p(y)],   (1)

where p(y|x) is the conditional probability of observing Y = y at given X = x, encoding the input-output curve and noise. The quantities p(x) and p(y) represent the input and output distributions, respectively, which are mathematically connected by the conservation of probability p(x)dx = p(y)dy, valid in the small-noise limit. Considering small Gaussian noise with conditional probability

p(y|x) = exp[−(y − ȳ(x))²/(2σ_T²(x))]/√(2πσ_T²(x)),

with mean ȳ(x) and standard deviation σ_T(x), Eq. (1) becomes

I[x, y] = ∫_{x_on}^{x_off} dx p(x) log₂[ȳ′(x)/(√(2πe) σ_T(x) p(x))],   (2)

where x_on and x_off set the sensitive region, i.e. the dynamic range of inputs 14,15 (similar equations appear in 12,13,20). In addition to p(x), Eq. (2) depends on the gain ȳ′(x), i.e. the first derivative of the input-output curve ȳ(x), and the total noise σ_T(x).
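Equation (2) is straightforward to evaluate numerically once p(x), ȳ(x), and σ_T(x) are sampled on a grid. A minimal sketch, with an illustrative linear curve, uniform input distribution, and a hypothetical constant noise level (not parameters from the main text):

```python
import numpy as np

def trapz(f, x):
    """Trapezoidal integration (avoids NumPy-version issues with np.trapz)."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def mutual_info_small_noise(x, p, ybar, sigma_T):
    """Small-noise mutual information, Eq. (2):
    I = ∫ p(x) log2[ ybar'(x) / (sqrt(2 pi e) sigma_T(x) p(x)) ] dx."""
    gain = np.abs(np.gradient(ybar, x))
    integrand = p * np.log2(gain / (np.sqrt(2.0 * np.pi * np.e) * sigma_T * p))
    return trapz(integrand, x)

# Illustration with hypothetical numbers: linear input-output curve on [0, 1],
# uniform input distribution, constant total noise sigma_T = 0.01.
x = np.linspace(0.0, 1.0, 2001)
p = np.ones_like(x)                 # uniform p(x)
ybar = x.copy()                     # linear curve, gain 1
sigma_T = 0.01 * np.ones_like(x)

I = mutual_info_small_noise(x, p, ybar, sigma_T)
print(I)  # analytic value for this case: log2(1/(sqrt(2 pi e)*0.01)) ≈ 4.6 bits
```

For this flat-gain, flat-noise case the integrand is constant, so the numerical value reduces to the analytic expression log₂[1/(√(2πe) σ_T)].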
To understand whether biological systems maximize information flow, we need to maximise the mutual information and either derive general principles or compare with data. To do so, the channel capacity is often considered, i.e. the mutual information maximized with respect to the input distribution 12,13. Alternatively, the mutual information can be maximized with respect to the input-output curve, assuming a fixed input distribution 14,15. The former method is an attempt to deal with the often unknown input distribution, while the latter is based on the idea that the biological channel can be modified by evolution. Additionally, the two approaches have been combined for specific Hill and Hill-like input-output functions [16][17][18]. However, is there a general way to unify the different methods without making assumptions about the functional form of the input-output curve?
Formally, we maximize the mutual information, Eq. (2), with respect to p(x) and ȳ(x) by writing

∂ℒ/∂p = μ,   (3)

d/dx(∂ℒ/∂ȳ′) − ∂ℒ/∂ȳ = 0,   (4)

where Eqs. (3) and (4) are the Euler-Lagrange equations from the calculus of variations, with Lagrangian ℒ(x, p, ȳ, ȳ′) = p log₂[ȳ′/(√(2πe) σ_T p)], i.e. the integrand of Eq. (2), and μ a Lagrange multiplier enforcing the normalization of p(x). Equation (3) represents the channel capacity applied to a Gaussian channel (i.e. Gaussian conditional probability p(y|x); see 12,13 for examples in gene regulation). In contrast, Eq. (4) is used to obtain the optimal input-output curve for a given input distribution (see 14,15 for examples in sensory systems). For completeness, we provide the solutions of the individual maximizations of Eqs. (3) and (4) in S1 Text, Secs. 1.1-1.2, with a discussion of the sensitive region in Sec. 1.8.
When the noise is uniform (σ_T constant), Eqs. (3) and (4) coincide, so that p/ȳ′ = const is conserved (not in time but across input space), in the spirit of Noether's theorem. In this case, maximizing the mutual information leads to a simple matching relationship (ȳ′ ∝ p), so that the input-output curve is the cumulative integral of the input distribution (see S1 Text, Sec. 1.1 and Fig. S1) 30. However, in general, when both input and output noise matter, the noise is a function of the input and the input-output curve, σ_T = σ_T(x, ȳ, ȳ′). Assuming independent cell-external and cell-internal noise, we consider

σ_T² = ȳ′² σ_x² + σ_y²,   (5)

which follows from error propagation. Specifically, σ_x is the input noise, depending on x only and amplified by the gain ȳ′(x), and σ_y is the output noise, depending on ȳ(x) only. In the case of negligible input (σ_x ≈ 0) or output (σ_y ≈ 0) noise, Eq. (3) again converges to Eq. (4), and the system can be solved for any input-output curve ȳ(x) (see S1 Text, Secs. 1.2-1.3). As a result, the predicted input and output distributions from the two optimization approaches become identical (Fig. 2A). However, in general the two equations differ, and the resulting optimal input and output distributions are very different in the two approaches (Fig. 2B). In particular, the output distributions can be uni- or bimodal, with details described in SI Text, Sec. 1.3.
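The uniform-noise matching relationship can be checked directly: the optimal curve is the normalized cumulative integral (CDF) of p(x), and the induced output distribution p(y) = p(x)/ȳ′(x) is then flat. A small sketch, using a hypothetical exponential input distribution:

```python
import numpy as np

# For uniform noise the matching relationship ybar'(x) ∝ p(x) makes the
# optimal curve the normalized cumulative integral (CDF) of p(x).
def optimal_curve_uniform_noise(x, p):
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(x))])
    return cdf / cdf[-1]

# Hypothetical exponential input distribution on [0, 4].
x = np.linspace(0.0, 4.0, 4001)
p = np.exp(-x)
ybar = optimal_curve_uniform_noise(x, p)

# The induced output distribution p(y) = p(x)/ybar'(x) is flat ("histogram
# equalization"), maximizing output entropy over the fixed dynamic range.
p_y = p / np.gradient(ybar, x)
print(p_y[1], p_y[-2])   # ≈ constant (grid edges excluded)
```

This is the information-theoretic analogue of histogram equalization: frequent inputs get the steepest part of the response curve.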
It is worth noting that in Bayesian statistics the Fisher information is linked to the channel capacity 19,20. A key problem in Bayesian statistics is choosing a prior distribution for a given stochastic process (i.e. p(y|x)) 31. The idea of a prior which affects the posterior distribution (i.e. p(x|y)) as little as possible is linked to maximal mutual information, given by the average Kullback-Leibler divergence between the prior and posterior distributions. This prior distribution is called the reference prior 19. As shown in ref. 20, the Fisher information results from maximizing the equivalent of Eq. (2) for a general (not necessarily Gaussian) conditional probability distribution (see S1 Text, Sec. 2). Hence, the channel capacity and the approach based on the Fisher information are equivalent.
Maximizing mutual information: a new approach. The difference between Eqs. (3) and (4) is that Eq.
(3) assumes a fixed input-output curve and a variable input distribution, while Eq. (4) assumes a fixed input distribution and a variable input-output curve. There might be situations in which one approach is more appropriate than the other, but in a general biological context the two are intrinsically connected (Fig. 1). From a mathematical point of view, Eqs. (3) and (4) can be combined and solved together, i.e. I[x, y] maximized with respect to both p(x) and ȳ(x). Similar numerical double optimizations are common in rate-distortion theory using, e.g., the Blahut algorithm 4,32.
In what follows, we provide the analytical solution for p(x) and ȳ(x) by solving Eqs. (3) and (4) together. We assume a fixed dynamic range of inputs, set by x_on and x_off (given by the receptor sensitivity), leading in turn to a fixed dynamic range of outputs from ȳ(x_on) = 1 to ȳ(x_off) = 0. We consider Eq. (2) with the noise given by Eq. (5). Solving Eqs. (3) and (4) simultaneously yields

p(x) = ȳ′(x)/(Z σ_T(x)),   (6a)

ȳ′(x)² σ_x²(x)/σ_T²(x) = Q,   (6b)

where Z and Q are two constants set by normalization and boundary conditions, respectively (see Materials and Methods and S1 Text, Sec. 1.7). Equation (6a) for the input distribution extends the matching relationship found in 30 to nonuniform noise.

Fig. 2. Maximization with respect to the input distribution, i.e. solving the maximization problem Eq. (3) for p(x) using a fixed ȳ(x) and noise (red), and maximization with respect to the input-output curve, i.e. solving the maximization problem Eq. (4) for ȳ(x) using a fixed p(x) and noise (blue). (For specific examples of the solutions of Eqs. (3) and (4) and a discussion of the bimodality of the output distribution, see SI Text, Sec. 1.) In both cases, the noise is provided as a function of the input distribution and input-output curve (top row), with the input-output curve assumed to be a Hill function ȳ(x) = [1 + (x/k_d)^{n_H}]⁻¹ for simplicity, with Hill coefficient n_H and threshold k_d (inset). (A) For small input noise, the two approaches converge, i.e. the red and blue input (middle left) and output (bottom left) distributions match. (B) For large input noise, the two approaches predict different input (middle right) and output (bottom right) distributions. Using Eq. (5) for the noise, the input noise is σ_x² = α₁x, while the output noise σ_y² = α₂ȳ(1−ȳ) + α₃ȳ + α₄ has three different contributions, with ȳ the input-output curve. The parameters are chosen to provide an overall similar level of noise, given by α₁ = 10⁻⁸, α₂ = 2·10⁻⁶, α₃ = 10⁻⁷, α₄ = 10⁻⁸ (panel A) and α₁ = 10⁻⁷, α₂ = 10⁻⁸, … (panel B).
In the latter case, the optimal input distribution weighs certain (low-noise) inputs more heavily than uncertain ones 13,20,33. Equation (6b) determines the input-output curve which maximizes the mutual information given the noise. A solution of the system of equations exists if the transmitted input noise can be expressed in terms of the output noise or vice versa (see S1 Text, Sec. 1.7). While such a solution may seem very specific, it is certainly plausible that, given enough time, evolution eventually finds it.
How may evolution find the solution? To mimic evolution, we envision an adaptive algorithm, allowing the pathway to iteratively reach optimal information transmission. Given an environment and hence a distribution of inputs, p₁(x) (Fig. 3A), evolution selects the optimal internal input-output curve, ȳ₁(x) (Fig. 3B). However, the distribution of inputs is susceptible to changes, which might be caused by a change of the organism's behavior, even in the same environment. The new input distribution, p₂(x), may again lead to an increase in information transmission at fixed input-output curve, ȳ₁(x) (Fig. 3C). Subsequently, evolution will select a new input-output curve, ȳ₂(x), which enhances information transmission at fixed input distribution, p₂(x) (Fig. 3D). This cycle is repeated many times. If the optimal configuration is achievable and information transmission is a proxy for fitness, we expect the solution of Eqs. (6a) and (6b) to emerge naturally in the pathway. This is indeed the case for the examples studied here (see Fig. 3E,F).

Information transmission at E. coli chemoreceptors. To apply our new approach, we use the chemotaxis pathway of E. coli as an explicit example, since it is relatively simple and well characterized in its molecular components 6. Briefly, chemoattractant (ligand) binding turns receptors off, inhibits the kinase CheA, and hence reduces the phospho-transfer from CheA_p to CheY. This leads to 'runs', as only CheY_p can bind the 6-8 motors to induce 'tumbling'. There is also an adaptation mechanism, in which the addition of methyl groups to receptors compensates for increased attractant concentration by increasing the receptor activity and hence the CheY_p level to induce cell tumbling. Removal of methyl groups has the opposite effect 34,35.
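The alternating scheme described above (select the matching p(x) for a fixed curve via Eq. (3), then the optimal curve for that p(x) via Eq. (4)) can be sketched numerically. A minimal version for a pure-output-noise channel, with a hypothetical noise model and parameter values chosen only for illustration:

```python
import numpy as np

def cumtrapz0(f, x):
    """Cumulative trapezoidal integral, starting at 0."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))])

def adapt_step(x, ybar, alpha2=1e-2, alpha4=1e-3):
    """One cycle of the adaptive algorithm for a pure-output-noise channel,
    with a hypothetical noise model sigma_y^2 = alpha2*y*(1-y) + alpha4.
    Step 1 (Eq. 3): given the curve, choose the matching input distribution.
    Step 2 (Eq. 4): given that distribution, choose the optimal curve."""
    sigma_y = lambda y: np.sqrt(alpha2 * y * (1.0 - y) + alpha4)
    # Step 1: p(x) ∝ ybar'(x) / sigma_T(x); here sigma_T = sigma_y(ybar(x)).
    gain = np.abs(np.gradient(ybar, x))
    p = gain / sigma_y(ybar)
    p /= cumtrapz0(p, x)[-1]
    # Step 2: ybar'(x)/sigma_y(ybar) ∝ p(x), i.e. G(ybar(x)) ∝ P(x), with
    # G(y) = ∫ dy/sigma_y(y) and P the CDF of p; invert G numerically.
    ygrid = np.linspace(0.0, 1.0, len(x))
    G = cumtrapz0(1.0 / sigma_y(ygrid), ygrid)
    P = cumtrapz0(p, x)
    ybar_new = np.interp(P / P[-1] * G[-1], G, ygrid)
    return p, ybar_new

# Start from a hypothetical steep Hill-like curve on [0, 1] and iterate.
x = np.linspace(0.0, 1.0, 2001)
ybar = x**4 / (x**4 + 0.5**4)
for _ in range(10):
    p, ybar = adapt_step(x, ybar)
# At the fixed point, one more cycle leaves the curve (and p) unchanged.
```

In this simple setting the alternation reaches its fixed point after a single full cycle; with coupled input and output noise the iteration is less trivial, which is the situation studied in the main text.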
In order to study signaling in fixed adaptational states, the adaptation enzymes can be removed from the chromosome and the receptor expressed with specific, genetically engineered modification levels to mimic receptor methylation (see S1 Text, Sec. …). Specifically, we consider the instantaneous information transmission between the chemoattractant methyl-aspartate (MeAsp) as the input and the response regulator CheY_p as the output. Hence, we consider the information transmitted by the initial (fast) response for a given adaptational state (which changes only slowly). (At a later time this response is removed by adaptation and hence is transient.) Note that, unlike ref. 36, we do not assume small Gaussian inputs but natural stimuli drawn from broad, potentially asymmetric input distributions p(x). When the input distributions of cells simulated in gradients of different strength match the optimal information-theoretical input distribution for the same receptor modification levels, the drift velocity up the gradient is maximized, and hence this leads to optimal chemotaxis 15. As this matching of input distributions occurs anywhere in the gradient, these initial responses describe chemotaxis in the whole gradient and so implicitly include adaptation. Indeed, the predicted input distributions are scale invariant (when normalized by the adapted concentration) and reproduce Weber's law and logarithmic sensing 15. The latter can also be captured by the predictive mutual information 37.
The functional form of the noise is assumed to be known and is derived from microscopic theory as in 38 (see S1 Text, Secs. 3 and 4.2 for noise estimation and sensitivity to noise parameters, respectively). In short, the input variance is taken proportional to the input strength, σ_x² = α₁x, with x in units of the ligand concentration the cell is adapted to. Furthermore, α₁ ∝ (DNτ)⁻¹ is given by the Berg-and-Purcell limit 39, where D is the diffusion constant of the ligand molecules, N is the number of receptors acting cooperatively in a cluster, and τ is the averaging time, assuming a spherical cell 39. The output noise has three contributions: signaling noise, switching noise due to on/off changes of the receptor state, and a constant background noise, leading to σ_y² = α₂ȳ(1−ȳ) + α₃ȳ + α₄, with phosphorylated output ȳ in units of the total CheY level, Y_T, an intrinsic dependence on the (unknown) input-output curve, and α₂₋₄ additional parameters defined in S1 Text, Sec. 3. Note that these effective noise terms are time-averaged due to their dependence on chemical reactions with finite rate constants 38. Despite σ_y² being specific, this noise form should apply to many receptor-signaling pathways, including other two-component pathways 40.
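To get a feel for the magnitude of the Berg-and-Purcell input noise, one can evaluate the sensing limit up to an O(1) prefactor, (δc/c)² ~ 1/(D a c τ). All numbers below are purely illustrative assumptions, not the fitted parameters of the main text:

```python
import numpy as np

# Order-of-magnitude sketch of the Berg-and-Purcell sensing limit: up to an
# O(1) prefactor, the relative variance of a concentration estimate is
# (delta_c / c)^2 ~ 1 / (D * a * c * tau).  All values are assumptions.
D = 500.0          # ligand diffusion constant, um^2/s (assumed)
a = 1.0            # linear size of the cell/receptor cluster, um (assumed)
tau = 0.1          # averaging time, s (assumed)
c_nM = 100.0       # ambient concentration in nM (assumed)
c = c_nM * 0.602   # convert to molecules/um^3 (1 nM ≈ 0.602 um^-3)

rel_var = 1.0 / (D * a * c * tau)
print(rel_var)     # fractional variance; sets the scale of alpha_1
```

With these illustrative numbers the fractional variance is of order 10⁻⁴, i.e. sub-percent relative error per averaging window, which motivates the small-noise expansion used throughout.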
Using this noise, the explicit solution of Eq. (6b) is a sine-type function of the input, Eq. (7), where C and Q are constants set by imposing the fixed boundary conditions ȳ(x_on) = 1 and ȳ(x_off) = 0 (Fig. 4A).
Note that x appears with the prefactor Q/α₁. Thus, α₁ is absorbed by the boundary conditions and can be seen as setting the units of x. We consider two special cases. Simplifying the output noise by setting α₂ = 0, the resulting solution is again independent of α₁ after imposing the boundary conditions. For α₂,₃ = 0, the solution depends on neither α₁ nor α₄ once Q and C″ are set by the boundary conditions. The optimal input distribution, obtained by inserting Eq. (6b) into Eq. (6a), is p(x) ∝ 1/σ_x(x). In particular, for our choice of external noise and fixed sensitivity range, the input distribution converges to p(x) ∝ x^(−1/2), independently of α₁₋₄ (see S1 Text, Sec. 1.4.6 and Fig. S3). Importantly, this is a general result for the Berg-and-Purcell input noise, and hence should be valid for many signaling pathways. This result was previously found numerically 16, and such an input distribution of glutamine was suggested to optimize nitrogen sensing 33.
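The universal input distribution is easy to write down explicitly: normalizing p(x) ∝ x^(−1/2) on a sensitive range [x_on, x_off] gives p(x) = 1/[2(√x_off − √x_on)√x]. A short sketch with a hypothetical sensitive range:

```python
import numpy as np

def trapz(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Universal optimal input distribution for Berg-and-Purcell input noise:
# p(x) ∝ 1/sigma_x(x) = 1/sqrt(alpha1*x), i.e. p(x) ∝ x^(-1/2), normalized
# on a hypothetical sensitive range [x_on, x_off].
x_on, x_off = 0.1, 10.0   # in units of the adapted concentration (assumed)

def p_opt(x):
    return 1.0 / (2.0 * (np.sqrt(x_off) - np.sqrt(x_on)) * np.sqrt(x))

x = np.linspace(x_on, x_off, 10001)
print(trapz(p_opt(x), x))   # ≈ 1: normalization check
# Equivalently, sqrt(x) is uniformly distributed: low concentrations are
# sampled more often, compensating for their larger relative noise.
```

Note that the normalization constant depends only on the endpoints of the sensitive range, consistent with the independence from α₁₋₄ stated above.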
Extracting the sensitive regime of E. coli receptors for a fixed modification level (which resembles the receptor methylation level, see S1 Text, Sec. 4.3) 35,41, we test the convergence of the adaptive algorithm to the solution in Eq. (7). After a few iterative cycles the system indeed converges (Fig. 4A), increasing the mutual information at each step (Fig. 4C, blue line). This convergence to the analytical solution occurs when starting from different initial conditions, showing the robustness of our algorithm. Note that in Fig. 4C the solution ȳ(x) is fitted to Hill functions for convenience of presentation, allowing the mutual information to be plotted as a function of a single parameter (the Hill coefficient n). The optimal curve selects a Hill coefficient compatible with the experimental measurements from FRET data, at least for larger receptor modification levels (Fig. S9) 35. Applying the adaptive algorithm instead to Hill-function-constrained input-output curves produces the same optimal Hill coefficient n, albeit with a smaller mutual information (Fig. 4C, red solid line, with Fig. 4B comparing the corresponding optimal input distributions and optimal input-output curves). Note that for a fixed Hill equation the optimal mutual information is calculated directly using the input distribution from Eq. (6a), resulting in

I[x, y] = log₂[Z/√(2πe)],  with  Z = ∫_{x_on}^{x_off} dx ȳ′(x)/σ_T(x),

as shown in Fig. 4C (red dashed line). However, experimental dose-response curves of CheY_p are thought to be well approximated by Hill functions rather than by the sine-type function in Eq. (7) 34. There are several possible reasons for this discrepancy. For instance, in our model receptors are either fully sensitive or fully insensitive, and the solution given by Eq. (7) is only valid in the sensitive region and constant otherwise.
In reality, receptors have a smooth sensitivity curve spanning the ligand-dissociation constants of the off and on states (such as dF/dlog(x) in 41, where F is the receptor free-energy difference between the on and off states). This may lead to a smooth, Hill-function-like response. One way of imposing smooth input-output curves is to introduce the additional constraint of zero first derivatives at the boundaries. In this case, however, we only obtain sigmoidal input-output curves without internal switching noise (α₃ = 0) (see S1 Text, Sec. 1.4.4 and Fig. S3E). Moreover, very asymmetric (or even bimodal) input distributions might be uncommon in natural environments 15, potentially favoring log-normal input distributions and hence Hill-function-like responses 42. Finally, E. coli needs to account for many other constraints, and the pathway performs other tasks at the same time, such as sensing temperature and pH [43][44][45]. Hence, the E. coli sensory system might be in a suboptimal configuration for transmitting information about chemicals in order to accommodate all these other tasks. In S1 Text, Sec. 1.4.5, we also solve the inverse problem and derive the optimal noise which leads to an exact Hill function (see Fig. S4). In this case, the predicted input and output noises are no longer independent. In summary, the mismatch between the experimental Hill functions and the solution in Eq. (7) is not an artifact of our assumptions on the sources of noise; it emerges when considering independent input and output noise and maximizing the mutual information with respect to both the input distribution and the input-output curve.

Information transmission along the E. coli chemotaxis pathway. Now that we understand the optimization of the mutual information better, we can tackle the second question: does the chemotaxis pathway maximise information transmission?
Previous work suggests that the higher the information transmission at the receptors, the higher the drift velocity in the direction of the gradient 15 . Is this finding compatible with the recent observation of the ultrasensitive response of the motor to changes in internal CheY p 21 , or does such a steep response prevent the cell from high information transmission? To answer this question we extend our analysis to the whole chemotaxis pathway.
We consider a minimal model of two channels: a receptor channel for sensing by the chemoreceptors and a motor channel for the flagellar motors. For the receptor channel, the external chemical concentration x is the input and the internal CheY_p concentration, y, is the output. For the motor channel, y is the input and the motor clockwise (CW) bias z (for tumbling) is the output (Fig. 5A). To simplify the problem and to closely resemble the experimental dose-response curves, we now restrict the curves to Hill functions, with Hill coefficients n and m for the receptors and motors, respectively. The total noise of the receptor and motor channels is given by σ_{T,y}² = G_y²σ_x² + σ_y² and σ_{T,z}² = G_z²σ_{T,y}² + σ_z², respectively, where G_y and G_z are the gains of the receptor and motor channels, and the motor output noise σ_z² = β₂z̄(1−z̄) + β₃z̄ + β₄ is taken in analogy with the receptor channel. Parameters β₂₋₄ represent the noise of the motors and are kept generic due to a lack of characterization, but may reflect analogous biological processes, including adaptation of the motors 21,46,47 (see S1 Text, Sec. 3 for further details and robustness of the results to changes in the β values). Due to the immense gain at the motors 21, the noise is generally higher at the motor than at the receptor (see S1 Text, Sec. 4.5 for a discussion of the two limits). Only n and m are considered adjustable parameters in our model.
The data processing inequality, which characterizes the flow of information in a Markov chain, states that, at any additional processing step, information can only be lost, never gained 48 . For instantaneous information transmission this means that the mutual information between the external concentration and the motor bias cannot be higher than the minimum of the mutual informations of the receptor and the motor channels, i.e.
I[x, z] ≤ min{I[x, y], I[y, z]}. A strategy to increase the overall mutual information is then to maximise the limiting mutual information.
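The data processing inequality can be illustrated with a discrete toy version of the pathway, replacing the two Gaussian channels by binary symmetric channels with hypothetical error rates (this is only a sketch, not the continuous model of the main text):

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits from a joint probability matrix."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())

# Toy Markov chain x -> y -> z: a binary signal through two binary symmetric
# channels standing in for the receptor and motor channels.
px = np.array([0.5, 0.5])
C1 = np.array([[0.9, 0.1], [0.1, 0.9]])   # p(y|x), "receptor" channel
C2 = np.array([[0.8, 0.2], [0.2, 0.8]])   # p(z|y), "motor" channel

pxy = px[:, None] * C1                    # joint p(x, y)
pyz = pxy.sum(axis=0)[:, None] * C2       # joint p(y, z)
pxz = px[:, None] * (C1 @ C2)             # joint p(x, z); p(z|x) = C1·C2

I_xy, I_yz, I_xz = mutual_info(pxy), mutual_info(pyz), mutual_info(pxz)
print(I_xy, I_yz, I_xz)                   # I_xz never exceeds min(I_xy, I_yz)
```

Composing the two channels only degrades the signal, so the end-to-end information I[x, z] sits below both single-channel values, exactly as the inequality states.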
We start by considering a single motor, represented by a single output z. We calculate the maximal mutual information of the receptor (Fig. 5B, left) and motor (Fig. 5B, right) channels, treating the optimization of the two channels separately. The mutual information at the motor, I[y, z], always limits the whole information transmission, for any Hill coefficient n of the receptors and m of the motor. Hence, single-motor cells should optimize the motor rather than the receptor channel. This result is not unexpected for the chemotaxis pathway, since the ultrasensitive motor enhances the downstream noise, which is generally larger than the upstream noise (here, the total CheY_p noise is the input noise for the motor channel). The resulting optimal information transmission corresponds to relatively low n and m (≈6; red area in right panel of Fig. 5B). Experimentally, the Hill coefficient of the receptor channel agrees with our prediction, ranging from 6 to 12 in Tar-only cells 35.

Fig. 5C. While for a single motor the optimal mutual information of the receptor channel (opt rec, blue solid line) is higher than the optimal mutual information at the motor (opt mot, red solid line), increasing the number of motors (see legend) enhances the optimal mutual information of the motor channel (opt 1 mot, dashed red line, for two motors). Arrows point to the predicted m values: the purple arrow indicates the optimal m value for a single motor (purple circle in B), and the green and orange arrows point to the optimal m values for two motors (green and orange circles in B). The mutual information is further increased when the optimal mutual information for two motors is calculated (opt 2 mot, orange dashed line, see S1 Text, Sec. 4). Parameters: α₁₋₃ = 10⁻⁴, α₄ = 10⁻⁵, β₂ = 7·10⁻⁴, β₃ = 7·10⁻⁴, and β₄ = 1·10⁻⁴.
In contrast, the ultrasteep motor response curve with m ≈ 20 is in stark contradiction to our single-motor model. Hence, the single-motor model indicates higher information transmission at the receptors (see Figs. 5, S6, S7, and S1 Text, Secs. 3 and 4.2 for the dependence on noise). However, E. coli has multiple motors, which might affect information transmission.
We now extend our model to multiple (K) motors, allowing the cell to make multiple measurements of the internal CheY_p concentration. There is now a single input, y, and multiple outputs, z₁, ..., z_K, the CW biases. The chain rule for the mutual information allows us to calculate I[y; z₁, ..., z_K] = I[y; z₁] + I[y; z₂|z₁] + ... + I[y; z_K|z₁, ..., z_{K−1}] (see S1 Text, Sec. 4.5). Real motors show evidence of only partial coupling 49, and thus we assume conditionally independent motors, i.e. p(z₁, ..., z_K|y) = ∏ᵢ₌₁ᴷ p(zᵢ|y). This means that all motors depend on the common y level but can independently select their CW bias. For two motors, the mutual information becomes

I[y; z₁, z₂] = I[y; z₁] + H(z₂|z₁) − H(z₂|y).   (11)

Using the small-noise Gaussian approximation for p(zᵢ|y), the conditional entropy is given by H(z₂|y) = ∫dy p(y) log₂[√(2πe) σ_T(y)], and H(z₂|z₁) − H(z₂) ≈ 0, since the dominant motor noise washes out the correlations between the motors induced by the shared input y (Fig. 5C, see also S1 Text, Sec. 4.5). Thus, Eq. (11) becomes

I[y; z₁, z₂] ≈ I[y; z₁] + H(z₂) − ∫dy p(y) log₂[√(2πe) σ_T(y)] = I[y; z₁] + I[y; z₂].   (12)

We numerically tested that the conditional independence of the motors holds for two motors, despite the fact that motors compete for the binding of internal CheY_p molecules, which can introduce negative correlations (see S1 Text, Sec. 4.6 and Fig. S10). Hence, for conditionally independent motors, the mutual information at the motors will eventually overtake the mutual information at the receptors when the number of motors increases. Consequently, the mutual information at the receptors becomes the limiting factor for information transmission (Fig. 5C; see 50 for the case of Gaussian input distributions). In other words, for a small number of motors the cell has high information transmission at the receptors, which is wasted at the motors (cf. red and blue solid lines). In contrast, for a large number of motors the information transmission at the motors exceeds the information transmission at the receptors without overall improvement. In the intermediate case, however, receptors and motors equally limit the transmission of information (cf. red dashed and blue solid lines in Fig. 5C). Since E. coli chemotaxis seems to avoid bottlenecks and to optimally allocate resources, this intermediate case is the most advantageous 51. Hence, the ultrasteep Hill function of the motor (m ≈ 20) can be explained by this matching of the information transmission at the receptors and motors (orange arrow in Fig. 5C). Note that in addition to the high Hill coefficient m of the motors there is also a corresponding low-m solution (green arrow in Fig. 5C). However, the latter is not robust to changes in m, i.e.
a small change in m can lead to a drastic reduction of information transmission, which can emerge from varying the number of FliM molecules of the motor 52. In addition, note that the mutual information shown in Fig. 5C with a red dashed line is calculated assuming that the two motors are optimized separately. However, the mutual information is further increased by optimizing the two motors simultaneously (dashed orange line in Fig. 5C). Our overall result, that a high mutual information can be achieved with a high Hill coefficient of the motors, remains valid (see S1 Text, Sec. 4). Between m ≈ 1 and m ≈ 20, the information transmission of the motor is wasted, as the receptors are information-flow limiting. In conclusion, multiple ultrasensitive motors are only useful when the motors are sufficiently independent. Any residual coupling among motors may be the result of close motor proximity or mechanical coupling of the flagella.
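The benefit of a second, conditionally independent motor can be illustrated with the same kind of binary toy model (the per-motor readout error below is a hypothetical stand-in; the main text treats continuous Gaussian channels):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Two motors reading the same CheY-p level y, conditionally independent:
# p(z1, z2 | y) = p(z1|y) * p(z2|y).  Binary toy model with a hypothetical
# per-motor readout error eps.
eps = 0.2
py = np.array([0.5, 0.5])
pz_y = np.array([[1 - eps, eps], [eps, 1 - eps]])   # p(z_i | y)

# Single motor: I[y; z1] = H(z1) - H(z1|y)
pz1 = py @ pz_y
I_one = entropy(pz1) - sum(py[k] * entropy(pz_y[k]) for k in range(2))

# Two motors: joint p(y, z1, z2), then I[y; z1, z2] = H(z1, z2) - H(z1, z2|y)
p = np.einsum('y,yi,yj->yij', py, pz_y, pz_y)
pz12 = p.sum(axis=0).ravel()
H_cond = sum(py[k] * entropy(np.outer(pz_y[k], pz_y[k]).ravel()) for k in range(2))
I_two = entropy(pz12) - H_cond
print(I_one, I_two)   # the second motor adds information about y
```

The two-motor information exceeds the single-motor value but stays below twice that value: the two readouts are partially redundant because they report the same y.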

Discussion
This study presents a new approach to maximise the mutual information, particularly suitable for evolving biological systems subject to random mutations and selection. Previously, the channel capacity, i.e. the mutual information maximized with respect to the input distribution, was widely used for electronic and biological communication channels 12,13,20,33. However, this method fails to capture possible changes of the internal input-output curve (e.g. by mutations). Furthermore, the mutual information maximized with respect to the input-output curve neglects the biologically relevant feedback of the output on the input 14,15. Here, we reconciled these two approaches by maximizing the mutual information with respect to both the input distribution and the input-output curve for Gaussian channels with small noise. Only when the total noise is uniform, or when the input or the output noise is negligible, are the two approaches identical. Unlike previous joint optimizations [16][17][18], our input-output curves are not restricted to Hill or Hill-like functions. Our adaptive algorithm demonstrates how evolution might implement this optimization iteratively.
Our analytical solution of the joint optimization provides a number of new insights into optimal information transmission. First, the optimal input distribution is universal, depending only on the input noise. For Berg-and-Purcell-type input noise, we specifically obtain p(x) ∝ x^(−1/2). Hence, organisms are optimized for environments in which low-intensity stimuli occur with high frequency. This is sensible, as their high frequency would compensate for their large relative noise levels. Second, the optimal input-output curve is invariant to up- or down-scaling of the input noise (parameter α₁), which sets the units of the input. Hence, only the shape of the input noise (i.e. its functional dependence on the input x) affects the input-output curve. Third, our optimal input-output curve is rather linear (Figs. 3D and 4A). While this does not match the sigmoidal Hill functions suggested by models of E. coli chemotaxis 34,53, a near-linear input-output curve makes best use of a given dynamic range. Furthermore, enforcing either zero slope of the input-output curve at the boundaries of the sensitive region or Hill functions as input-output curves leads to assumptions on the noise which are hard to justify biologically. Hence, Hill functions are incompatible with independent input and output noise (see S1 Text, Sec. 1.7 for details).
How can cells actively influence and optimize their distribution of sensory inputs? Genetic changes in the downstream pathway and motor can clearly change chemotactic behavior and hence the experienced input stimuli. For instance, increases in the motor speed lead to larger changes in stimulus and hence broader distributions of inputs. Similarly, faster adaptation leads to narrower distributions. However, the notion that cells influence their microenvironments is best supported by the important role of niches in stem cell differentiation, cancer development, the gut microbiota, and host-pathogen interactions [54][55][56][57][58]. Once inside the gut, the E. coli-related pathogen C. rodentium in mice (and similarly EPEC/EHEC in humans) injects effector proteins into the epithelial host cells. In response, these cells secrete increased levels of oxygen, in return allowing the pathogen to perform aerobic metabolism 59. Hence, its aerotaxis ability, inherited from E. coli and based on the Aer and Tsr receptors, experiences an increased frequency of oxygen stimuli, which the pathogen itself actively induced. If we take the assumption of maximal information transmission seriously, then cells not only actively influence but also optimize their environment.
To apply our information-theoretical approach, we analytically showed that the entire E. coli chemotaxis pathway can maximize the instantaneous mutual information between chemical concentration and motor bias despite the ultrasteep dose-response curve of the motors. Briefly, ultrasensitive motors do not restrict information transmission, since a collection of motors boosts information transmission, in addition to providing other chemotactic advantages in the soil or animal intestine 60 . In particular, our model identifies the number of motors and their conditional independence as key quantities to transmit large amounts of information in peritrichous bacteria.
What is the additional information at the motors used for if the ultimate behavioral output is just binary runs and tumbles? We speculate that the tumble angle, torque, and filament handedness could be regulated 61,62 . Indeed, real-time imaging of E. coli with fluorescent flagella showed that tumble angles increase with the number of clockwise-turning motors, allowing for differential cell responses 61 . Having non-identical motors with different Hill coefficients and thresholds (e.g. as produced by different numbers of FliM in the motor ring) may further increase information transmission 16 , but this may not be feasible in the bacterial chemotaxis pathway, as the adapted activity sets the operating point of the motors. For instance, different threshold values would lead to some motors always rotating clockwise and others always counter-clockwise. Our model also predicts that, to transmit large amounts of information, chemotactic bacteria with a single motor should prefer either a relatively low Hill coefficient at the motor or multiple response regulators feeding into a motor with a high Hill coefficient. This prediction could be tested in uni-flagellated bacterial species, such as Rhodobacter sphaeroides, Pseudomonas aeruginosa, or monotrichous marine bacteria 60,63 . In support of our theory, R. sphaeroides is known to have multiple CheYs 64,65 .
While applicable to many biological systems, our model makes a number of simplifications (in addition to assumptions on noise and receptor sensitivity). Our results are based on independent maximizations of the receptor and motor channels. However, in SI text, Sec. 4.5, we discuss the general case, providing estimates of the mutual information I[x; z] between ligand input and final motor output. Our analysis suggests, once again, that a high Hill coefficient combined with multiple motors can support high information transmission. In particular, we analytically identify two expected limits: (i) when the receptor noise is much smaller than the motor noise, we obtain I[x; z] ≈ I[y; z], and (ii) when the motor noise is much smaller than the receptor noise, we have I[x; z] ≈ I[x; y]. Our analysis of the chemotactic pathway primarily focuses on the Hill coefficient. However, the dissociation constant k_d^m of the motor response is known to be larger than the adapted CheY_p concentration. In SI text, Sec. 4.5.4, we explicitly study the role of k_d^m and find a relatively weak dependence of the mutual information on it. We also find that, after fixing the Hill coefficient m and using the optimized output distribution of the receptor channel, the k_d^m that maximizes the mutual information at the motor matches the experimentally measured value (which is larger than the adapted CheY_p level; see Fig. S8).
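The two limits can be made concrete with a minimal jointly Gaussian chain (our own toy model, not the pathway model of the SI): x → y → z with unit-variance input, additive receptor noise of standard deviation s1, and additive motor noise of standard deviation s2, for which all three mutual informations are available in closed form:

```python
import numpy as np

def gauss_chain_info(s1, s2):
    """Mutual informations (nats) along a Gaussian chain:
    x ~ N(0, 1); y = x + n1, n1 ~ N(0, s1^2) (receptor channel);
    z = y + n2, n2 ~ N(0, s2^2) (motor channel)."""
    I_xy = 0.5 * np.log(1 + 1 / s1**2)               # receptor channel
    I_yz = 0.5 * np.log(1 + (1 + s1**2) / s2**2)     # motor channel
    I_xz = 0.5 * np.log(1 + 1 / (s1**2 + s2**2))     # full chain
    return I_xy, I_yz, I_xz

# limit (i): receptor noise << motor noise  ->  I[x;z] ≈ I[y;z]
I_xy_a, I_yz_a, I_xz_a = gauss_chain_info(0.01, 0.5)
# limit (ii): motor noise << receptor noise  ->  I[x;z] ≈ I[x;y]
I_xy_b, I_yz_b, I_xz_b = gauss_chain_info(0.5, 0.01)
```

In both limits the end-to-end information is set by the noisier channel, consistent with the data-processing inequality I[x; z] ≤ min(I[x; y], I[y; z]).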
Another simplification is that our model deals with instantaneous information transmission, and hence does not explicitly include any history dependence 36,37,66,67 . As a result, our approach should be well suited to the slow genetic response in quorum sensing 68,69 , where the input-output relation has been measured but input distributions were simply guessed rather than predicted. Another area of application is eukaryotic chemotaxis, where cells move slowly while actively shaping their chemical gradient by ligand secretion 70 and degradation 71 . In all these examples, the input distributions and cell behaviors need to match the input-output relations to allow for optimal information gathering. Nevertheless, our model is valid for information transmission by initial transient chemotactic responses, and since this applies anywhere in the gradient, our model describes chemotaxis even including adaptation 15 . We expect that our model works even in relatively steep gradients, where, in addition to adaptation, long-history effects such as those caused by receptor saturation and rotational diffusion become important 24 . The main assumption in 15 is that gradients can be linearized over the range of input distributions; we do not, however, assume small Gaussian-distributed inputs. A drawback of our model is that it neglects cell-to-cell variability, which can be substantial 72,73 , so that in effect our theory focuses on a certain subpopulation of cells. This cell-to-cell variability may confer advantages in terms of bet-hedging strategies not directly related to information processing 74 .