Bayesian-based decipherment of in-depth information in bacterial chemical sensing beyond pleasant/unpleasant responses

Chemical sensing is vital to the survival of all organisms. Bacterial chemotaxis is conducted by multiple receptors that sense chemicals to regulate a single signalling system controlling the transition between the directions (clockwise vs. counterclockwise) of flagellar rotation. Such an integrated system seems better suited to judging chemicals as either favourable or unfavourable than to identifying them, although differences in their affinities to the receptors may cause differences in response strength. Here, we developed an experimental setup to monitor the behaviours of multiple cells stimulated simultaneously, together with a statistical framework based on Bayesian inference. Although responses of individual cells varied substantially, the ensemble-averaged time courses appeared characteristic of the attractant species, indicating that information on the input chemical species can be extracted from the responses of the bacterium. Furthermore, two similar, but distinct, beverages elicited attractant responses of cells with profiles distinguishable with the Bayesian procedure. These results provide a basis for novel bio-inspired sensors that could be used with other cell types to sense wider ranges of chemicals.

We counted rotational motions of more than 7.5 degrees/frame (ref. 4; see legend of Fig. 2c in main text). The criterion of > 7.5 degrees/frame was set to exclude rotational displacements arising from noise, including thermal fluctuations and image-processing errors; all rotational displacements < 7.5 degrees/frame were therefore rejected. Scale bar is 100 µm.
To measure rotational motion over the whole time range (including the solution-exchange flow), we used a PDMS (polydimethylsiloxane) microchannel device. We designed the microchannel to reduce flow speed (ref. 6), because physical perturbation of the rotational motion of cell bodies by fast flow distorts the output CW bias data. This disturbance decreases the reproducibility of the data (ref. 7) and becomes a source of error in the statistical analysis. We developed the microchannel design by trial and error to allow stable observation of rotational motion throughout solution exchange. With our microchannel devices (Fig. S3), we achieved stable measurements of whole CW bias outputs (over 10 minutes; Fig. 1c, d in main text). Chemotactic responses of cells (output) to a step input (a ramp input at high temporal resolution) were observed in the wide area indicated by yellow arrows, both sides of which are narrowed to reduce flow speed. A schematic drawing of the observation is shown in Fig. 1b in main text. c, Engineering drawing of one microchannel. Both narrow areas, 0.04 mm in width, serve to decrease flow speed.
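The 7.5 degrees/frame criterion can be illustrated with a minimal sketch. The function names, the sign convention (positive step = CCW) and the definition of CW bias as the fraction of accepted steps that are CW are assumptions made for illustration only; they are not taken from the text.

```python
THRESHOLD_DEG = 7.5  # degrees/frame, the rejection criterion stated in the text


def wrap_deg(delta):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def classify_steps(angles_deg, threshold=THRESHOLD_DEG):
    """Label each frame-to-frame step as 'CW', 'CCW', or None (rejected as noise)."""
    labels = []
    for prev, curr in zip(angles_deg, angles_deg[1:]):
        step = wrap_deg(curr - prev)
        if abs(step) <= threshold:
            labels.append(None)   # displacement too small: treated as noise
        elif step > 0:
            labels.append("CCW")  # assumed sign convention
        else:
            labels.append("CW")
    return labels


def cw_bias(labels):
    """Fraction of accepted steps that are CW (assumed definition)."""
    accepted = [x for x in labels if x is not None]
    if not accepted:
        return float("nan")
    return accepted.count("CW") / len(accepted)


# Hypothetical cell-body angles (degrees) in consecutive frames.
angles = [0, 10, 20, 22, 12, 2, 350]
labels = classify_steps(angles)
bias = cw_bias(labels)
```

In this hypothetical trace the 2-degree step is rejected, and the final step from 2 to 350 degrees is wrapped to a 12-degree step in the negative direction rather than being read as a 348-degree jump.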

Feature extraction from output responses (vectorization of outputs).
To handle output responses computationally, it is necessary to extract features from the analog time-course data of CW bias (Fig. 1d and 2a in main text, Fig. S4) and to quantify those features. When we look at the time courses of output responses purely geometrically, without biological knowledge (Fig. 1d, 2a in main text, and Fig. S4), we notice intuitively that they can be roughly fitted by six lines (time domains), as shown in Fig. S4. The lines correspond to (1) the initial flat area, (2) the CW bias ≈ 0 area that begins immediately after the input stimulus (indicated by the black arrow in Fig. S4), (3) the CW bias increase (recovery) area (including an overshoot part), (4) the top flat CW bias area, (5) the overshoot reduction area, and (6) the final flat area, respectively. Although it may be possible to assign a biological meaning to these six lines, here we deliberately treat them purely as a template describing geometrical properties. We determined the fitting lines by adjusting the lines to minimize the residual errors, on the premise of a template consisting of six lines. We then describe each output response numerically by an index set consisting of 15 index values (indices 1 to 15). In our present method the index values need not correspond to any biological reaction process; they are therefore set arbitrarily from the appearance of the output response waveform (Fig. S4). We have hardly optimized the choice of indices here, because the purpose of the present study is to construct a general procedure for applying Bayesian-based decipherment to microorganisms (E. coli).
Optimization of the index set is left for future work. To construct an index set, we use lines (1) to (6). An output response is fitted with four flat lines ((1), (2), (4), (6)) and two sloped lines ((3), (5)). These six lines are defined by nine fitting parameters. Although the nine fitting parameters are sufficient to describe an output response, it is difficult to interpret a response intuitively from them alone. We therefore convert the nine parameters into 15 index values, which make the output response easier to interpret. An index set is defined by the 15 index values, where […]

To compare the proportional expression (s4) among different chemical species, we normalize it, where each normalized coefficient is calculated as […]

[Figure legend fragment] […] purple) and L-asparagine (L-Asn; green). Each graph is coloured according to chemical species, same as […]
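The six-line template described above can be sketched as follows. The text states only that six lines are defined by nine fitting parameters; the parameterization used here (four flat levels plus five breakpoint times, with the two sloped lines joining neighbouring flat levels) is one assumption that yields exactly nine parameters and is not confirmed by the source. All names and numerical values are hypothetical.

```python
def template(t, p):
    """Value of the assumed six-line template at time t.

    p = (a1, a2, a4, a6, t1, t2, t3, t4, t5): four flat levels and five
    breakpoint times with t1 < t2 < t3 < t4 < t5 (assumed parameterization).
    """
    a1, a2, a4, a6, t1, t2, t3, t4, t5 = p
    if t < t1:
        return a1                                       # (1) initial flat area
    if t < t2:
        return a2                                       # (2) CW bias ~ 0 after stimulus
    if t < t3:
        return a2 + (a4 - a2) * (t - t2) / (t3 - t2)    # (3) recovery slope
    if t < t4:
        return a4                                       # (4) top flat area
    if t < t5:
        return a4 + (a6 - a4) * (t - t4) / (t5 - t4)    # (5) overshoot reduction slope
    return a6                                           # (6) final flat area


def rss(times, values, p):
    """Residual sum of squares between data and the template."""
    return sum((v - template(t, p)) ** 2 for t, v in zip(times, values))


# Hypothetical parameters: CW bias levels and breakpoint times in seconds.
p_true = (0.3, 0.0, 0.5, 0.3, 60, 120, 180, 240, 300)
times = list(range(0, 600, 10))
values = [template(t, p_true) for t in times]

p_wrong = (0.3, 0.0, 0.6, 0.3, 60, 120, 180, 240, 300)  # top level perturbed
rss_true = rss(times, values, p_true)
rss_wrong = rss(times, values, p_wrong)
```

Fitting would then amount to searching for the parameter tuple that minimizes `rss` for a measured time course; the choice of optimizer is left open here.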

Calculation of standard deviation of accuracy rates for blind data sets under the condition of random selection.
Here, we consider the probability function (PF) of the number of correct answers under the condition that the total number of blind test samples (BTS) is fixed, expressed in terms of the number of BTS data and the number of correct answers for each individual chemical.
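As a rough illustration of the random-selection baseline, the sketch below assumes that each blind sample is assigned independently and uniformly at random to one of K candidate chemicals, so that the number of correct answers follows a binomial distribution. This model and the function names are assumptions; the original derivation is not reproduced here. The example values (six chemicals, N = 210) are those quoted elsewhere in this document.

```python
import math


def random_selection_stats(n_samples, n_chemicals):
    """Mean and standard deviation of the accuracy rate under uniform random guessing."""
    p = 1.0 / n_chemicals                       # chance of a correct answer per sample
    mean_rate = p
    sd_rate = math.sqrt(p * (1.0 - p) / n_samples)
    return mean_rate, sd_rate


def binomial_pmf(k, n, p):
    """Probability of exactly k correct answers out of n under the binomial model."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


mean_rate, sd_rate = random_selection_stats(n_samples=210, n_chemicals=6)
total_probability = sum(binomial_pmf(k, 210, 1.0 / 6.0) for k in range(211))
```

Under these assumptions the chance-level accuracy for six chemicals is about 16.7 %, with a standard deviation of roughly 2.6 percentage points at N = 210.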
Quantitative measurements of response outputs for 6 standard chemicals.
We prepared six amino acids as standard chemicals in this report. Each line in the graphs shows the linear model function representing the dependence of an index value on concentration. We use the differences in these concentration dependences to identify input blind samples. With these standard index sets, and by using Bayesian inference with machine learning, we succeeded in constructing DeSIRAM. The accuracy rate for deciphering the chemical type of blind samples among these six amino acids with linear model functions is 32 % (N = 210, middle bar in Fig. S8d). A simple linear model function appears to represent the concentration dependence of the index values insufficiently. Indeed, with higher-order functions as model functions, the identification rate among these six amino acids improved to 49 % (Table S3).
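The idea of identifying a blind sample from linear model functions can be sketched with plain Bayes' rule. This is not the DeSIRAM procedure itself, whose details are not given here. The sketch assumes Gaussian scatter of each index value around its linear model, a uniform prior over chemicals, and an unknown concentration marginalized over a small grid; the chemical names, coefficients, noise level and grid are all hypothetical.

```python
import math

# Hypothetical linear models: index_j = intercept_j + slope_j * concentration.
MODELS = {
    "chem_A": [(0.2, 0.10), (1.0, -0.05)],
    "chem_B": [(0.5, 0.02), (0.8, 0.10)],
}
SIGMA = 0.1                        # assumed scatter of each index value
CONC_GRID = [1.0, 2.0, 3.0, 4.0]   # assumed candidate concentrations (arbitrary units)


def log_gauss(x, mu, sigma):
    """Log of the Gaussian probability density."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def posterior(observed):
    """Posterior probability of each chemical given an observed index vector."""
    log_evidence = {}
    for name, lines in MODELS.items():
        # Log-likelihood at each grid concentration (uniform prior over the grid).
        terms = [
            sum(log_gauss(x, a + b * c, SIGMA) for x, (a, b) in zip(observed, lines))
            for c in CONC_GRID
        ]
        m = max(terms)
        log_evidence[name] = m + math.log(sum(math.exp(t - m) for t in terms) / len(terms))
    # Uniform prior over chemicals: normalize the evidences.
    m = max(log_evidence.values())
    weights = {k: math.exp(v - m) for k, v in log_evidence.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# A noise-free observation generated from chem_A at concentration 2.0.
observed = [0.2 + 0.10 * 2.0, 1.0 - 0.05 * 2.0]
post = posterior(observed)
best = max(post, key=post.get)
```

Replacing the linear expressions with higher-order functions of concentration would change only the predicted means inside `posterior`, which mirrors the comparison between linear and higher-order model functions made above.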