Abstract
Noninvasive and label-free spectral microscopy (spectromicroscopy) techniques can provide quantitative biochemical information complementary to genomic sequencing, transcriptomic profiling, and proteomic analyses. However, spectromicroscopy techniques generate high-dimensional data; acquisition of a single spectral image can take from tens of minutes to hours, depending on the desired spatial resolution and the image size. This substantially limits the timescales of observable transient biological processes. To address this challenge and move spectromicroscopy towards efficient real-time spatiochemical imaging, we developed a gridless autonomous adaptive sampling method. Our method substantially decreases image acquisition time while increasing sampling density in regions of steeper physicochemical gradients. When implemented with scanning Fourier transform infrared spectromicroscopy experiments, this gridless adaptive sampling approach outperformed standard uniform grid sampling in a two-component chemical model system and in a complex biological sample, Caenorhabditis elegans. We quantitatively and qualitatively assess the efficiency of data acquisition using performance metrics and multivariate infrared spectral analysis, respectively.
Introduction
Advancements in optical microscopy, especially fluorescence microscopy, have enabled biologists to observe multiplexed living cellular events with ever higher spatial and temporal resolutions^{1}. The use of targeted fluorescent indicators provides spatial and temporal context to omics analyses^{2,3,4}, resulting in discoveries of dynamic spatial architecture in disease pathogenesis^{5}, organogenesis^{6}, and wound healing^{7}. These advances inspired the drive towards high-dimensional image-based profiling^{8}, which requires high-information-content, rapid, robust measurements of as many living cell or tissue phenotypes as possible to capture time-dependent spatial heterogeneities in structure and morphological patterning.
One solution is to introduce another complementary dimension of label-free observation, one that focuses on the spatiochemical mapping of biological systems. This information can be used to guide fluorescence microscopy, whose real-time imaging capabilities are limited to a few features of interest identified a priori, and to improve the interpretation of omics data and information from advanced transmitted light microscopy images. Noninvasive and label-free multiplexed imaging techniques, such as scanning synchrotron radiation-based Fourier transform infrared (SR-FTIR) spectromicroscopy, can identify and monitor spatial heterogeneity in chemical composition that is indistinguishable using the visible region of the electromagnetic spectrum; however, a major challenge in using these techniques for real-time characterization of time-dependent biochemical processes is the substantial image acquisition time, which ranges from minutes to hours. This complication emerges from the high dimensionality of the generated data set, which contains not only spatial but also spectral information, and from the use of uniform grid (UG) sampling as the current standard sampling method, which historically has been favored as objective and computationally inexpensive.
With advancements in the accessibility of computing technology, we find that gridless autonomous adaptive data acquisition (AADA) is a viable and more efficient alternative to UG sampling. AADA maintains a systematic and reproducible approach while capturing spectral and spatial heterogeneity with fewer sampled points and shorter experimental time frames. We discuss the significance of this method for studying time-sensitive living systems and its future development towards monitoring time-dependent phenomena in biological systems, before expanding our discussion towards AADA’s applicability to other fields and workflow processes.
Results
Implementation of AADA
To implement an autonomous adaptive sampling algorithm (Fig. 1a) for data acquisition, we prioritized optimization of spatiotemporal and spatiochemical sampling efficiency while operating under experimental parameters that nonetheless yield comprehensive and informative spectral map data. We assume that less predictable yet detectable phenomena emerging from spatiochemical heterogeneity are primary regions of interest, informational “hot spots” that should be spatiochemically resolved with subsequent sample points after initial detection. To achieve this, our adaptive sampling is driven by leave-one-out cross-validation (LOOCV)^{9} to facilitate rapid and accurate approximations of the experimentally mapped space^{10} for predictive error calculations from which the algorithm can rapidly identify regions for subsequent sample exploitation^{11,12,13}. We build our surrogate models using a hybrid sequential sampling strategy closely related to other established methods^{11,14,15} by combining two-dimensional (2D) barycentric linear interpolation with Voronoi tessellation (LIV). With LIV, the relative importance of a sampled point is determined by its Voronoi-weighted leave-one-out error (ϵ_{LOO})^{12,16}, which is calculated by normalizing the LOO error and the Voronoi predictive error and weighting them equally. Since collected IR spectra often form continuous and multimodal regions in our input space per sampled point^{12}, we introduced an IR spectral preprocessing module upstream of our surrogate model construction and LOOCV to conserve the spectral resolution while the algorithm determines where to subsequently sample.
Simulation-based evaluation of adaptive sampling performance
To assess adaptive LIV sampling algorithm performance prior to our experiments, we performed preliminary simulations using 11 previously collected, spatially high-resolution, broadband SR-FTIR spectral maps of different Caenorhabditis elegans (C. elegans) strains, our final experimental system. We assumed each complete map to be the “ground truth” against which we compared four different simulated subsampling strategies: nonadaptive UG, uniform random (UR), least unexplored region (LUR), and adaptive LIV. We calculated our performance metric of ground truth error (ϵ_{GT}), a value that measures the error between a sampling method’s interpolation and its corresponding complete high-resolution map, to enable quantitative evaluations and comparisons among the sampling strategies. When benchmarking the aforementioned methodologies against UG sampling (Fig. 1b), we find that although UG sampling does perform better than the other nonadaptive sampling methods, it is significantly outperformed by our adaptive LIV sampling: LIV required 66% of the sampled points that UG needed to achieve equal ϵ_{GT}.
AADA for imaging a two-component abiotic system
As our first experimental demonstration, we designed a two-component chemical model system of blue permanent marker and high vacuum grease for spatiochemically resolved characterization using scanning FTIR spectroscopy. This complete sample characterization enabled quantitative evaluations and comparisons between adaptive LIV and widely utilized, nonadaptive UG data acquisition (Fig. 1c). In this visibly featureless case, the mapped domain was selected with minimal experimenter knowledge input to guide the autonomous adaptive data acquisition. Under these experimental conditions, we quantitatively and qualitatively determine data acquisition performance using mathematical and spectral metrics. When using the mean Voronoi-weighted LOO error \({\langle {\epsilon }_{\mathrm{LOO}}\rangle }_{V}\) to quantify modeling accuracy, we found that adaptive LIV data acquisition outperformed the nonadaptive data acquisition methods (Supplementary Fig. 1) in this experimental system. To verify this conclusion, we tuned the spectral on-target ratio (OTR) assessment by selecting the major contributing peak per component using our normalized mean standard spectra (Supplementary Fig. 2) and variance spectra (Supplementary Fig. 3); peak selection guided by normalized spectra emphasizes chemical identification^{17} over concentration in spectral interpretation. For high vacuum grease, we referenced the symmetric stretching mode of ν(Si–O–Si) at 798 cm^{−1} emerging from its fumed amorphous silica^{18} composition. For permanent ink presence, we used the major peak at 1580 cm^{−1} stemming from conjugated carbon–carbon ring ν(C=C) stretching modes^{19} in pigment compounds^{20}, which was further substantiated by the presence of aromatic ν(=CH) vibrations between 3105 and 3000 cm^{−1} ^{19} (Supplementary Fig. 4). All spectra were evaluated for nonadaptive UG and adaptive LIV experiments prior to processing the OTR as the proportion of on-target sampled points to total sampled points.
Using this spectrally based metric for enhanced real-world fidelity^{21}, we confirmed that adaptive LIV data acquisition (OTR = 0.95) outperforms nonadaptive UG (OTR = 0.19) data acquisition in experimental cases where domain knowledge is either limited or unavailable.
To verify that our acquired adaptive LIV data is interpretable through multivariate analysis from an experimenter’s standpoint, we performed principal component analysis (PCA) followed by linear discriminant analysis (LDA)^{22} on the noise-removed IR data to discriminate between the permanent marker and high vacuum grease present in the spatiochemical map (Fig. 1d). We see that the first PC-LDA factor distinguishes between permanent ink-containing spectral regions and pure high vacuum grease, while the second PC-LDA factor separates pure permanent ink from regions containing both permanent ink and high vacuum grease. This conclusion is further supported by the mean spectra plotted per cluster (Fig. 1e); we see the imine ν(C=NH) from 3400 to 3300 cm^{−1} and intermolecular hydrogen-bonded ν(OH) at 3550 and 3230 cm^{−1} contributions^{19} from permanent ink’s pigment compounds and alcohols, respectively. The identified high vacuum grease cluster matches the standard mean spectra expectations with vibrational silence at frequencies >3000 cm^{−1}, while peak broadening and the change in peak ratio between the imine and intermolecular hydrogen bonding regions of the permanent ink and mixed component clusters suggest that permanent ink alcohols experienced inhibited evaporation in the mixed component regions due to the ink’s deposition under the high vacuum grease during sample preparation (Supplementary Fig. 5).
AADA for imaging living multicellular organisms
For our proof-of-principle bioimaging case, we applied scanning broadband SR-FTIR spectromicroscopy to overcome signal-to-noise limitations when characterizing a young L2 C. elegans animal. Caenorhabditis elegans are well characterized in genetics, microscopy, and omics-related fields while also representing a large, whole-organism experimental model containing known compartmentalized chemistry. Relative to the diffraction-limited spatial resolution (2–10 μm) of scanning SR-FTIR spectromicroscopy, their large size of 100 μm to 1 mm in length, when coupled with current mapping region software restrictions, often leads to temporally inefficient spatiochemical mapping of unfixed samples. With our implemented user interface (Supplementary Fig. 6), we were able to apply domain knowledge in spatial and spectral restrictions to better optimize our adaptive data acquisition of C. elegans (Fig. 2a) for comparison against the high-spatial-resolution (step size 1.5 μm) map of the same sample. We found that increased adaptive LIV sampling in the spatial domain (Fig. 2b) identified regions characterized by chemistries consistent with those of known anatomical features. Sampling increased in either transitional or overlapping anatomical regions between pharyngeal, head, neck, and body wall muscle^{23}, regions of the nerve ring^{24}, and the lipid-rich intestine^{25}. Our qualitative post-validation of adaptive data acquisition using multivariate curve resolution^{26} (MCR) and Fourier self-deconvolution^{27} (FSD) SR-FTIR analyses further confirmed these anatomical colocalization results with reliable MCR components^{28} 1 and 4 (Fig. 2c) corresponding to hydrated proteins (amino acid ν(NH) stretching modes)^{19} and hydrated lipid assemblies (NH, OH, methyl, and ν((CH_{2})_{n}) stretching modes)^{19,29}, respectively (Fig. 2d and Supplementary Fig. 7).
With these two components overlapping in the more frequently sampled region, we verify that adaptive LIV data acquisition helps resolve spatiochemical gradients in a complex whole-organism model.
Discussion
We constructed and implemented gridless adaptive LIV data acquisition to address a key challenge in the hyperspectral imaging of time-sensitive systems. Specifically, we decrease image acquisition time while improving sampling density in regions of increased spatiochemical complexity. Using this sampling strategy, we nondestructively explore the chemistry of anatomical features in living C. elegans. We observe that increased sampling density corresponded with known anatomical features, and these results serve as a proof-of-principle for the use of AADA on a complex biological specimen.
In this study, all experimental LIV-based AADA cases were performed on standard hardware found with commercial high-dimensional imaging microscope designs, revealing the accessibility and computational efficiency of the algorithm for broader use in imaging techniques that require a sequential exploration of space, such as scanning probe techniques. We show that LIV-based AADA can operate efficiently and effectively under conditions where the map area is unconstrained, and therefore, when the main goal of a study is to characterize a system through a discovery approach. This performance implies that LIV-based AADA will still benefit an experimentalist who has a detectable, discovery aspect of their research in otherwise well-characterized biological systems that can range from single living cells to, in the case of smaller animal models like C. elegans, whole organisms. We also report that LIV-based adaptive sampling outperforms standard sampling methodologies in complex biological systems in which we apply domain knowledge to restrict mapping regions. Specifically, in terms of the instrument time used to spatiochemically image the young L2 C. elegans experimental case, we were able to map the head region in 45 min with the LIV-based AADA software in comparison to ~4.9 h with the commercially available software. Lastly, we find that LIV-based AADA provides a more comprehensive spatiochemical understanding of the total map domain at any given time interval in comparison to the established and standard UG sampling (Fig. 2e), suggesting that this aspect can be harnessed for further development of AADA to achieve adaptive high-dimensional real-time, noninvasive, label-free imaging through modular additions to the sampling algorithm.
This advance in hyperspectral imaging offers the biological community an orthogonal perspective into the dynamic physicochemical architectures of studied tissues and model organisms. Critically, this information can potentially guide an investigator towards timepoints and regions of interest for follow-up omics characterization, which is important in but not limited to the areas of carcinogenesis and developmental biology. Particularly in cases of discovery-based experimental design, AADA enables unbiased assessment of spatially resolved chemical changes between biological samples that differ by genotype, drug treatment or substance exposure, and physiological state such as age. More broadly, LIV-based AADA can be applied to fields outside of biology, such as hyperspectral remote sensing and space exploration. In these cases, future development towards real-time AADA will enable rapid identification, characterization, and even surveillance of chemical spills, toxic algal bloom formation, and spontaneous solar events.
Methods
Autonomous adaptive sampling
Our adaptive sampling workflow is based upon LOOCV and begins with an initial scan of randomly distributed points. Using PCA for dimensionality reduction, frequency domain restriction, and rubberband baseline correction in our IR preprocessing module, we increased computational and temporal efficiency by calculating and operating over the first five principal components during our proof-of-principle, temporally intensive, high-dimensional data acquisition. A model U_{0} based upon barycentric linear interpolation (LIV) is constructed from this processed data set. We quantify the sensitivity of the surrogate model to the removal of an individual data point through the ϵ_{LOO}. By removing a single point X_{i}, model U_{−i} is rebuilt using the incomplete data set. The ϵ_{LOO} associated with the sample point is the difference between the two models evaluated at the removed point \(\delta \left({U}_{0}({X}_{i}),{U}_{-i}({X}_{i})\right)\) with respect to the L_{2} norm^{30}. After this is iterated for every sampled point in the acquired data set, the region defined by the sampled point with the highest ϵ_{LOO} is sampled next by picking a random point within that region. This procedure is repeated until a set criterion is reached, which in our case was 500 total sampled points.
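The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the beamline software: the names `loo_errors`, `next_sample`, and `adaptive_sample`, the `measure` callback, the nearest-neighbour fallback outside the convex hull, and the rejection sampling used to draw a point inside the selected region are all hypothetical choices. For brevity it uses the plain ϵ_{LOO} without the Voronoi-area weighting.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree


def loo_errors(points, values):
    """L2 distance between the value measured at X_i and the prediction of a
    linear interpolant rebuilt without X_i, for every sampled point."""
    n = len(points)
    errors = np.zeros(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred = griddata(points[keep], values[keep], points[i:i + 1], method="linear")[0]
        if np.isnan(pred).any():
            # X_i lies outside the convex hull of the remaining points:
            # fall back to the nearest remaining sample (an assumption).
            _, j = cKDTree(points[keep]).query(points[i])
            pred = values[keep][j]
        errors[i] = np.linalg.norm(values[i] - pred)
    return errors


def next_sample(points, errors, bounds, rng, n_candidates=2000):
    """Pick a random position whose nearest sampled point has the largest error."""
    worst = int(np.argmax(errors))
    (xmin, xmax), (ymin, ymax) = bounds
    cand = rng.uniform([xmin, ymin], [xmax, ymax], size=(n_candidates, 2))
    _, owner = cKDTree(points).query(cand)
    inside = cand[owner == worst]
    if len(inside):
        return inside[0]
    # Very small region: take the candidate closest to the worst point instead.
    return cand[np.argmin(np.linalg.norm(cand - points[worst], axis=1))]


def adaptive_sample(measure, bounds, n_init=10, n_total=40, seed=0):
    """Random initial scan, then repeated sampling near the largest LOO error."""
    rng = np.random.default_rng(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    points = rng.uniform([xmin, ymin], [xmax, ymax], size=(n_init, 2))
    values = np.array([measure(p) for p in points])  # shape (n, n_components)
    while len(points) < n_total:
        errors = loo_errors(points, values)
        new = next_sample(points, errors, bounds, rng)
        points = np.vstack([points, new])
        values = np.vstack([values, measure(new)])
    return points, values
```

Here `measure(p)` stands in for acquiring and preprocessing one spectrum at position `p` and returning its reduced components as a 1D array; the stopping criterion is simply a total point count, as in the text.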
To assess algorithm sampling performance, we aggregate the ϵ_{LOO} of all sample points in the acquired data set and quantify the self-consistency using established LOOCV^{31}. We take the mean ϵ_{LOO}, \({\langle {\epsilon }_{\mathrm{LOO}}\rangle }_{V}\,\), of all sample points in the data set and use it as an unbiased, quantitative measure of the model accuracy due to theoretical guarantees of \({\langle {\epsilon }_{\mathrm{LOO}}\rangle }_{V}\) convergence to a model’s generalization error^{32}. Since acquired points are often neither regularly nor uniformly distributed in the case of adaptive sampling, we partition the region into a collection of cells {V_{i}} containing positions closest to each point {X_{i}}. The mean is then weighted by the associated Voronoi area of sample point {X_{i}}. Explicitly, we define
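(in a form consistent with the area weighting just described, given here as an assumed reconstruction rather than the authors' exact expression)

\[
{\langle \epsilon_{\mathrm{LOO}} \rangle}_{V} \;=\; \frac{\sum_{i} |V_{i}| \, \epsilon_{\mathrm{LOO}}(X_{i})}{\sum_{i} |V_{i}|},
\]

where |V_{i}| denotes the area of the Voronoi cell V_{i} associated with sample point X_{i}.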
With the LOOCV adaptive sampling procedure, we follow the heuristic for \({\langle {\epsilon }_{\mathrm{LOO}}\rangle }_{V}\) minimization, and thus, effectively achieve minimization of model generalization error by sampling near the point with the largest ϵ_{LOO}.
Surrogate modeling
We use the scipy.interpolate.griddata method from the Python SciPy package to implement 2D barycentric linear interpolation and treat each PCA component independently. Although linear interpolation is computationally efficient, it does not quantify the uncertainty in its error estimate. To address this, we include the Voronoi area associated with each point in our calculated ϵ_{LOO} by treating it as an ad hoc regularizer. For a collection of points \({\bf{X}}=\{{{\bf{x}}}_{i}\in {{\mathbb{R}}}^{d}\}\), the Voronoi cell that we associate with point x_{i} is the region of space containing positions in Euclidean distance closest to x_{i}:
\[ V_{i} = \left\{ {\bf{x}} \in {\mathbb{R}}^{d} \;:\; \| {\bf{x}} - {\bf{x}}_{i} \|_{2} \le \| {\bf{x}} - {\bf{x}}_{j} \|_{2} \ \text{for all } j \ne i \right\}. \]
The Voronoi area is the area of the set, \({{\mathcal{V}}}_{i}=| {V}_{i}| \). This implies that if point x_{i} is spatially isolated from the rest of the data set, then point x_{i} will be associated with a greater Voronoi area. By approximating the error uncertainty using the Voronoi area, we make use of the fact that linear interpolation error tends to increase with larger distances from points used in the interpolation. To achieve this, we first normalize both ϵ_{LOO} and \({{\mathcal{V}}}_{i}\) in order to compare both quantities using a linear scaling to [0, 1]:
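(assuming a standard min-max scaling over the sampled points; the exact expression is an assumption here)

\[
\tilde{\epsilon}_{\mathrm{LOO}}({\bf{x}}_{i}) = \frac{\epsilon_{\mathrm{LOO}}({\bf{x}}_{i}) - \min_{j} \epsilon_{\mathrm{LOO}}({\bf{x}}_{j})}{\max_{j} \epsilon_{\mathrm{LOO}}({\bf{x}}_{j}) - \min_{j} \epsilon_{\mathrm{LOO}}({\bf{x}}_{j})},
\qquad
\tilde{\mathcal{V}}_{i} = \frac{\mathcal{V}_{i} - \min_{j} \mathcal{V}_{j}}{\max_{j} \mathcal{V}_{j} - \min_{j} \mathcal{V}_{j}}.
\]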
Next, we take the regularized LIV ϵ_{LOO} to be
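(writing a tilde for each quantity after rescaling to [0, 1], and assuming the equal weighting described in the Results; the exact expression is an assumption here)

\[
\hat{\epsilon}_{\mathrm{LOO}}({\bf{x}}_{i}) \;=\; \tilde{\epsilon}_{\mathrm{LOO}}({\bf{x}}_{i}) + \tilde{\mathcal{V}}_{i},
\]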
which is used to calculate our adjusted ϵ_{LOO} for our adaptive data acquisition in simulations and experiments^{33}. This technique is inspired by and related to the LOLA-Voronoi and CV-Voronoi surrogate modeling techniques^{11,14}.
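A minimal Python sketch of the Voronoi-area regularization is given below, under explicit assumptions. The cell areas are approximated by assigning the pixels of a dense grid to their nearest sample point (a stand-in for an exact bounded Voronoi computation), the normalization is min-max scaling, and the two normalized terms are simply added with equal weight. The function names `voronoi_areas`, `minmax`, and `regularized_loo` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def voronoi_areas(points, bounds, resolution=200):
    """Approximate the area of each bounded Voronoi cell by counting the pixels
    of a dense grid whose nearest sample point is that cell's generator."""
    (xmin, xmax), (ymin, ymax) = bounds
    xs = np.linspace(xmin, xmax, resolution)
    ys = np.linspace(ymin, ymax, resolution)
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)
    _, owner = cKDTree(points).query(grid)
    counts = np.bincount(owner, minlength=len(points))
    total_area = (xmax - xmin) * (ymax - ymin)
    return counts / counts.sum() * total_area


def minmax(a):
    """Linearly rescale an array to [0, 1]; a constant array maps to zeros."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return np.zeros_like(a) if span == 0 else (a - a.min()) / span


def regularized_loo(loo, areas):
    """Equal-weight combination of normalized LOO error and normalized
    Voronoi area (assumed form of the regularized error)."""
    return minmax(loo) + minmax(areas)
```

A spatially isolated point receives a large normalized area and is therefore favored for follow-up sampling even when its LOO error is modest, which is the behavior the regularizer is meant to provide.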
Simulations
One thousand simulations were conducted on each of the 11 SR-FTIR maps of C. elegans with raster-scanned step resolution ranging from 1 to 5 μm for a total of 11,000 simulations per simulated sampling method (analysis of these datasets beyond the benchmarking here will be described elsewhere; Elizabeth A. Holman et al., in preparation). Sampling was simulated by retrieving subsets of data from the full-resolution maps. Assessed sampling procedures were nonadaptive UG, UR, LUR, and adaptive LIV sampling.
UG sampling takes measurements over a static, predefined rectangular grid. For our subsampling procedure, every kth point was selected along each axis to produce a lower-resolution grid with roughly 1/k^{2} as many points. Simulation results were collected over all k^{2} possible subgrids, a set that arises because the lower-resolution grid changes with the location of the first subsampled point. In the case of UR, LUR, and LIV subsampling, we used spectral data from the spatially closest grid point in the high-resolution SR-FTIR map to reasonably approximate the spectral information at the determined subsampled point. This approximation holds when the number of subsampled points is significantly smaller than the total number of points contained within the SR-FTIR map. UR subsampling drew measurements from k uniformly random positions. At each iteration for every k points, LUR subsampling collected data from the most sparsely sampled region within the defined map boundary. For LIV subsampling, we use the previously described procedure (see “Surrogate modeling” above) and iterate for every k points.
The performance of each sampling procedure was quantitatively evaluated using the ground truth error (ϵ_{GT}) of the interpolation to its corresponding high-resolution map. ϵ_{GT} was calculated by measuring the root-mean-square error (RMSE) of the data subset’s interpolation to the spatially resolved and complete SR-FTIR C. elegans map, which we treated as our “ground truth.” We use linear interpolation for models U_{UG}, U_{LUR}, and U_{UR}, but we construct U_{LIV} from a linear interpolation of sampled points. Every model U is built from a collection \({\{{{\bf{X}}}_{i}\}}_{i = 1,\ldots ,{N}_{\mathrm{s}}}\in {{\mathbb{R}}}^{2}\) of N_{s} sample points with spectra \({\{{{\bf{Y}}}_{i}\}}_{i = 1,\ldots ,{N}_{\mathrm{s}}}\in {{\mathbb{R}}}^{{N}_{\mathrm{f}}}\), where N_{f} is the number of spectral dimensions. We denote the points in the “ground truth” map \({\{{{\bf{X}}}_{i}^{(0)}\}}_{i = 1,\ldots ,{N}_{0}}\), with spectra \(\{{{\bf{Y}}}_{i}^{(0)}\}\), where N_{0} is the number of samples taken. Since we assume that high-spatial-resolution SR-FTIR maps are our “ground truth,” we can assume N_{0} ≫ N_{s} and aim for U to be a good model in that
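As an illustration of the UG subsampling and the closest-grid-point lookup described above, the following Python sketch enumerates all k^{2} subgrids of a full-resolution map and retrieves the spectrum nearest to an arbitrary position. The array layout (rows, columns, frequencies), the uniform step size, and the function names `ug_subgrids` and `nearest_spectrum` are assumptions made for this example.

```python
import numpy as np


def ug_subgrids(full_map, k):
    """Yield every uniform subgrid obtained by keeping each k-th point along
    both axes; the k*k offsets of the first point give k*k distinct subgrids."""
    for row0 in range(k):
        for col0 in range(k):
            yield (row0, col0), full_map[row0::k, col0::k]


def nearest_spectrum(full_map, step, x, y):
    """Return the spectrum of the grid point closest to position (x, y),
    assuming the map origin is at (0, 0) and a uniform step size."""
    row = int(np.clip(round(y / step), 0, full_map.shape[0] - 1))
    col = int(np.clip(round(x / step), 0, full_map.shape[1] - 1))
    return full_map[row, col]
```

Averaging a metric over the subgrids returned by `ug_subgrids` removes the dependence on where the first subsampled point happens to fall.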
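the interpolated spectrum approximates the measured one at every ground-truth position (written here in assumed notation):

\[
U\!\left({\bf{X}}_{i}^{(0)}\right) \approx {\bf{Y}}_{i}^{(0)}, \qquad i = 1, \ldots, N_{0}.
\]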
With this assumption, we define ϵ_{GT} as a metric of merit to be the RMSE of the model when compared to the “ground truth”:
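(assumed form; the text does not state how the sum is normalized over spectral dimensions)

\[
\epsilon_{\mathrm{GT}} = \sqrt{\frac{1}{N_{0}} \sum_{i=1}^{N_{0}} \left\| U\!\left({\bf{X}}_{i}^{(0)}\right) - {\bf{Y}}_{i}^{(0)} \right\|_{2}^{2}}.
\]

The same quantity can be sketched in Python as below. The function name `ground_truth_error` and the nearest-neighbour fallback for ground-truth positions outside the convex hull of the samples are assumptions of this example.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree


def ground_truth_error(sample_xy, sample_spectra, truth_xy, truth_spectra):
    """RMSE between a linear interpolation of the sampled spectra and the
    complete map, evaluated at every ground-truth position."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_spectra = np.asarray(sample_spectra, dtype=float)
    truth_xy = np.asarray(truth_xy, dtype=float)
    pred = griddata(sample_xy, sample_spectra, truth_xy, method="linear")
    outside = np.isnan(pred).any(axis=1)
    if outside.any():
        # Linear interpolation is undefined outside the convex hull of the
        # samples; use the nearest sample there (an assumption).
        _, j = cKDTree(sample_xy).query(truth_xy[outside])
        pred[outside] = sample_spectra[j]
    squared = np.sum((pred - np.asarray(truth_spectra)) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))
```

Both spectra arguments are expected as 2D arrays with one row per position and one column per spectral dimension.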
Sample preparation
All samples were mounted on 0.5-mm-thick ZnSe crystals, which were cleaned with Milli-Q water, 5% acetic acid, acetone, then Milli-Q water sequentially in order to remove organics while minimizing crystal damage. The two-component control sample was prepared with high vacuum grease (20218541993, Dow Corning) that was lightly applied to a 0.5-mm-thick ZnSe crystal (CAS# 1315-09-9, International Crystal Laboratories) in an area identifiable by fiducial markings drawn with a permanent marker (Item #37003, Sanford Ultra-Fine Blue Sharpie Permanent Marker). Spectral standards of both components were acquired independently prior to autonomous adaptive sample acquisition of the abiotic two-component system. Spectral regions of component mixing could be identified by alcohol presence in the mixed spectra.
The first C. elegans (N2; Caenorhabditis Genetics Center) animal used for temporal exploratory LIV experiments was selected at the late L1 stage (based on morphology). The second animal for qualitative LIV assessment via FTIR spectral analysis was selected at the young L2 stage (based on morphology). Each animal was moved from its agar growth plate to 1 μL of 0.25 mM Levamisole (CAS# 16595-80-5, Sigma-Aldrich) on the ZnSe crystal and rinsed three times with Milli-Q water before mounting the sample onto the microscope stage for imaging.
Instrumentation
Scanning benchtop and synchrotron FTIR measurements were performed on a Nicolet NicPlan IR microscope with a ×32, 0.65 numerical aperture objective with a Thermo Scientific Nicolet iS50 FTIR spectrometer using a KBr beamsplitter and MCT (HgCdTe) detector at Beamline 1.4.3 of the Advanced Light Source at Lawrence Berkeley National Laboratory. Adaptive sampling was implemented using a GUI (Supplementary Fig. 6) developed in PyQt and installed on the Beamline 1.4.3 computer (Dell Optiplex 7050: 8 GB RAM, Intel Core i5-7500 CPU @ 3.41 GHz, Windows 10 64-bit). OMNIC 9.8 software by Thermo Fisher Scientific controlled the microscope and FTIR bench, and our software communicated with OMNIC through Dynamic Data Exchange to store the OMNIC background-subtracted spectral output in our software’s dataframe format.
In this study, we used two different infrared sources: an internal globar source and a synchrotron source. Although an internal globar source is readily available in commercial FTIR microscopes, an accelerator-based synchrotron source offers at least 1000 times improvement in brightness (photon counts per unit time per unit area) over the globar source^{34} at the same spatial resolution. As a result, we used different numbers of co-added scans and different spatial resolutions for measurements performed on each instrument, as detailed in the following sections.
Globar FTIR spectromicroscopy and multivariate analysis
Benchtop scanning FTIR measurements using the internal globar source were performed in transmission mode with an aperture-limited spatial resolution of 75 μm × 75 μm. IR spectra between 650 and 4000 cm^{−1} at 4 cm^{−1} spectral resolution were collected with 16 co-added scans at an interferometer mirror velocity of 1.83 cm/s. Rubberband baseline correction and dimensionality reduction via PCA to five components were performed over the entire collected spectral region during adaptive LIV data acquisition. For each experimental assessment of sampling method, the total sampled points were limited to 500 to remain below the 840 points of the full-resolution map.
On-target ratio
We define OTR to be the number of samples that meet the on-target criteria over the total number of sampled points. To determine the criteria by which a spectrum is considered on-target, we use our full-resolution data set and remove spectra close to the detection limitations of the instrument that violate the signal-to-noise filter criteria. Using this noise-removed subset of data, we calculate its mean spectrum. After identifying one major peak per known component standard, we evaluated all acquired spectra per method for the presence of either aforementioned peak above the threshold, which we determined as the noise-removed mean intensity at the defined frequencies, to define OTR as
\[ \mathrm{OTR} = \frac{N_{a\vee b}}{N_{\mathrm{total}}}, \]
where N_{a∨b} is the number of spectra that met either the first mean peak criterion, the second mean peak criterion, or both mean peak criteria, while N_{total} is the total number of spectra acquired using the referenced data acquisition method. Using this definition of the spectral metric, we calculated \({\mathrm{OTR}}_{\mathrm{LIV}}=\frac{474}{500}\) (0.95) and \({\mathrm{OTR}}_{\mathrm{UG}}=\frac{95}{500}\) (0.19).
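The reduction to five principal components can be illustrated with a short NumPy sketch. It is only a generic PCA via singular value decomposition, not the preprocessing module itself; baseline correction is omitted, and the function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(spectra, n_components=5):
    """Project mean-centered spectra (one row per sampled point) onto their
    leading principal components using a singular value decomposition."""
    centered = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]
```

The returned scores (one row of five values per sampled point) are the kind of reduced representation over which the interpolation and error calculations can then operate.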
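A minimal Python sketch of the OTR calculation follows, assuming spectra stored as rows of an array on a shared wavenumber axis. The function name `on_target_ratio` and the choice of reading each peak at the nearest sampled wavenumber are assumptions of this example; the thresholds are supplied by the caller.

```python
import numpy as np


def on_target_ratio(spectra, wavenumbers, peaks, thresholds):
    """Fraction of spectra whose intensity exceeds the threshold at one or
    more of the reference peak positions."""
    spectra = np.asarray(spectra, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    hits = np.zeros(len(spectra), dtype=bool)
    for peak, threshold in zip(peaks, thresholds):
        idx = int(np.argmin(np.abs(wavenumbers - peak)))  # nearest sampled wavenumber
        hits |= spectra[:, idx] > threshold
    return hits.sum() / len(spectra)
```

For the two-component system, `peaks` would be the 798 and 1580 cm^{−1} positions given in the Results, with each threshold set to the noise-removed mean intensity at that frequency.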
FTIR multivariate analysis
The control sample components (high vacuum grease and permanent marker) were evaluated individually as spectral standards. The data were baseline corrected and vector normalized using OMNIC 9.8, and the spectral mean was calculated over eight spectra per standard. Referencing the normalized mean and variance spectra, we used domain knowledge to perform PCA over the frequency domains of 3600 to 2800 cm^{−1} and 1750 to 1450 cm^{−1} simultaneously before applying LDA^{22} to maximize the interclass variance over the intraclass variance of the factors for our baseline-corrected and vector-normalized data in MATLAB R2017a. 2D score plots were generated in which the nearness between classes indicates similarity, whereas distance implies dissimilarity. The final mean spectrum of each cluster is shown for spectral validation of the vibrational modes resulting in segregation of classes.
Synchrotron FTIR spectromicroscopy and multivariate analysis
Scanning SR-FTIR diffraction-limited (2–10 μm) spectra were collected in transmission mode between 650 and 4000 cm^{−1} at 4 cm^{−1} spectral resolution and recorded with eight co-added scans at an interferometer mirror velocity of 6.3 cm/s. We restricted the spectral domain of the adaptive LIV sampling workflow to 900 to 3700 cm^{−1} to avoid signal contamination from detectable synchrotron noise and to decrease sample morphology^{17} baseline effects, respectively, on subsequent dimensionality reduction and error calculation steps. Rubberband baseline correction and dimensionality reduction to five components were performed over the restricted spectral region between 900 and 4000 cm^{−1} during adaptive LIV sampling. Using domain knowledge, we restricted our mapping region to the pharynx, nerve ring, and intestine^{35} of our young L2 C. elegans to reduce off-target sampling with respect to C. elegans for increased temporal efficiency in spatiochemical mapping.
SR-FTIR multivariate analysis
We restricted our analyzed MCR domain to 3500 to 2800 cm^{−1} for reduction of morphological effects on the spectral baseline and for higher diffraction-limited spatial resolution, since the goal of MCR analysis was to qualitatively assess adaptive LIV data acquisition performance. Based upon the cumulative explained variance calculated by OMNIC 9.8 on our experimental data, we performed MCR analysis in OMNIC using five components, which explain 99.82% of the data variance. Guided by well-characterized C. elegans anatomy and chemistry, we identified reliable MCR components^{28} that would strongly correlate with muscle and lipid assembly structures, namely components 1 and 4. For better accuracy in peak identification on our MCR components, we applied FSD to the CH vibrational region. Since our analysis region was restricted, we could only broadly state the presence of protein-related stretching vibrations of ν(NH) from amino acids between 3390 and 3260 cm^{−1}^{19} and polyglycine asymmetric CH_{2} stretching modes at ~2925 cm^{−1}^{19} (Supplementary Fig. 7) in MCR component 1. Similarly for MCR component 4 and in referencing characterized hydrated lipid assemblies, we found broad peak contributions from NH and OH stretching modes between 3400 and 3100 cm^{−1}^{29}, lipid-relevant antisymmetric ν((CH_{2})_{n}) modes at ~2932 cm^{−1}^{29}, and lipid-related methyl antisymmetric and symmetric stretching at 2963 and 2873 cm^{−1}^{29}, respectively.
Statistics and reproducibility
Each sample size, type, and statistical method applied is described in the relevant “Methods” section. For the two-component model system, spectral standards for permanent ink and high vacuum grease were acquired with sample replicates (n = 8).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Infrared spectral data are available through the CaltechDATA repository (https://doi.org/10.22002/D1.1609)^{36}. The 11 high-resolution spectral maps used for calibration simulations are not included in the repository, since they are undergoing spectral analysis and interpretation in a different manuscript. Any remaining data are available from the corresponding author upon reasonable request.
Code availability
This proprietary adaptive sampling code and GUI are specific to the Infrared Beamline 1.4.3 at the Advanced Light Source (https://als.lbl.gov/). They are available to IR beamline users through the DOE-supported Berkeley Synchrotron Infrared Structural Biology (BSISB) Imaging Program (https://bsisb.lbl.gov/wordpress/). Further requests concerning this code can be directed to H.Y.N.H.
References
Liu, T. et al. Observing the cell in its native state: imaging subcellular dynamics in multicellular organisms. Science 360, eaaq1392 (2018).
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W. & Prasher, D. C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).
Giepmans, B. N. G., Adams, S. R., Ellisman, M. H. & Tsien, R. Y. The fluorescent toolbox for assessing protein location and function. Science 312, 217–224 (2006).
Bernd, B. Multiplexed epitope-based tissue imaging for discovery and healthcare application. Cell Syst. 2, 225–238 (2016).
Choi, M., Kwok, S. J. J. & Yun, S. H. In vivo fluorescence microscopy: lessons from observing cell behavior in their native environment. Physiology 30, 40–49 (2015).
Prahst, C. et al. Mouse retinal cell behaviour in space and time using light sheet fluorescence microscopy. eLife 9, e49779 (2020).
Zhao, M. et al. Electrical signals control wound healing through phosphatidylinositol-3-OH kinase-gamma and PTEN. Nature 442, 457–460 (2006).
Rohban, M. H., Abbasi, H. S., Singh, S. & Carpenter, A. E. Capturing single-cell heterogeneity via data fusion improves image-based profiling. Nat. Commun. 10, 2082 (2019).
Asprey, S. P. & Macchietto, S. Designing robust optimal dynamic experiments. J. Process Control 12, 545–556 (2002).
Queipo, N. et al. Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41, 1–28 (2005).
Crombecq, K., De Tommasi, L. D., Gorissen, D. & Dhaene, T. A novel sequential design strategy for global surrogate modeling. In Proc. 2009 Winter Simulation Conference (WSC), 731–742 (2009).
Li, G., Aute, V. & Azarm, S. An accumulative error based adaptive design of experiments for offline metamodeling. Struct. Multidiscip. Optim. 40, 137 (2010).
Wang, C. et al. An evaluation of adaptive surrogate modeling based optimization with two benchmark problems. Environ. Model. Softw. 60, 167–179 (2014).
Xu, S., Liu, H., Wang, X. & Jiang, X. A robust error-pursuing sequential sampling approach for global metamodeling based on Voronoi diagram and cross validation. J. Mech. Des. 136, 071009 (2014).
Singh, P., Deschrijver, D. & Dhaene, T. A balanced sequential design strategy for global surrogate modeling. In Proc. 2013 Winter Simulation Conference (WSC), 2172–2179 (IEEE, 2013).
Elisseeff, A., Evgeniou, T. & Pontil, M. Stability of randomized learning algorithms. J. Mach. Learn. Res. 6, 55–79 (2006).
Baker, M. J. et al. Using Fourier transform IR spectroscopy to analyze biological materials. Nat. Protoc. 9, 1771–1791 (2014).
Lippincott, E. R., Van Valkenburg, A., Weir, C. E. & Bunting, E. N. Infrared studies on polymorphs of silicon dioxide and germanium dioxide. J. Res. Natl Bur. Stand. 61, 61–70 (1958).
Socrates, G. Infrared and Raman Characteristic Group Frequencies (Wiley, 2001).
Awab, H., Jar, A. D. M., Yong, W. K. & Ahmad, U. K. Infrared spectroscopic technique for the forensic discrimination of marker pen inks. Malays. J. Forensic Sci. 2, 1–7 (2011).
Razavi, S., Tolson, B. A. & Burn, D. H. Review of surrogate modeling in water resources. Water Resour. Res. 48, W07401 (2012).
Hu, P. et al. Metabolic phenotyping of the cyanobacterium Synechocystis 6803 engineered for production of alkanes and free fatty acids. Appl. Energy 102, 850–859 (2013).
Altun, Z. F. & Hall, D. H. in WormAtlas https://doi.org/10.3908/wormatlas.1.6 (2009).
Altun, Z. F. & Hall, D. H. in WormAtlas https://doi.org/10.3908/wormatlas.1.1 (2009).
Mak, H. Y. Lipid droplets as fat storage organelles in Caenorhabditis elegans. J. Lipid Res. 53, 28–33 (2012).
Felten, J. et al. Vibrational spectroscopic image analysis of biological material using multivariate curve resolution–alternating least squares (MCR-ALS). Nat. Protoc. 10, 217–240 (2015).
Tooke, P. B. Fourier self-deconvolution in IR spectroscopy. Trends Anal. Chem. 7, 130–136 (1988).
Motegi, H. et al. Identification of reliable components in multivariate curve resolution-alternating least squares (MCR-ALS): a data-driven approach across metabolic processes. Sci. Rep. 5, 15710 (2015).
Mantsch, H. H. & Chapman, D. (eds). Infrared Spectroscopy of Biomolecules (Wiley-Liss, 1995).
Ramirez-Lopez, L. et al. Distance and similarity-search metrics for use with soil vis–NIR spectra. Geoderma 199, 43–53 (2013).
Cawley, G. C. & Talbot, N. L. C. Efficient leave-one-out cross-validation of kernel Fisher discriminant classifiers. Pattern Recogn. 36, 2585–2592 (2003).
Kearns, M. & Ron, D. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11, 1427–1453 (1999).
Bandler, J. W. et al. Space mapping: the state of the art. IEEE Trans. Microw. Theory Tech. 52, 337–361 (2004).
Holman, H. N., Martin, M. C. & McKinney, W. R. Tracking chemical changes in a live cell: biomedical applications of SR-FTIR spectromicroscopy. Spectroscopy 17, 139–159 (2003).
Altun, Z. F. & Hall, D. H. in WormAtlas https://doi.org/10.3908/wormatlas.1.1 (2009).
Holman, E. Dataset for Autonomous Adaptive Data Acquisition (AADA) (Version 1.0). CaltechDATA. https://doi.org/10.22002/D1.1609 (2020).
Acknowledgements
We thank Dr. Hans Bechtel and ALS Beamline 1.4.3 staff for their instrumentation support, Drs. Peter Zwart and Derek R. Holman for discussion, and the reviewers for their constructive comments. This research used resources of the Berkeley Synchrotron Infrared Structural Biology (BSISB) Imaging program, the Molecular Foundry, and the Advanced Light Source, DOE Office of Science User Facilities, under contract no. DE-AC02-05CH11231. E.A.H. was supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1745301 and the Howard Hughes Medical Institute under Grant No. 047101, with which P.W.S. was an investigator. Y.S.F., L.C., and H.Y.N.H. were supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research under contract no. DE-AC02-05CH11231.
Author information
Contributions
H.Y.N.H. conceived the idea. H.Y.N.H. and Y.S.F. designed the adaptive data acquisition. Y.S.F. developed and wrote the algorithm, and designed and performed the simulations. Y.S.F., H.Y.N.H., and E.A.H. implemented the algorithm at ALS Beamline 1.4.3. Y.S.F. and E.A.H. designed and tested the IR processing module. Y.S.F. wrote the IR processing module. E.A.H. designed and performed the proof-of-principle experiments. L.C. and E.A.H. performed IR data processing. E.A.H. performed IR spectral analysis, gathered materials from all authors, and wrote the manuscript. M.D., L.C., and H.Y.N.H. supervised Y.S.F. P.W.S. supervised E.A.H.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Holman, E.A., Fang, YS., Chen, L. et al. Autonomous adaptive data acquisition for scanning hyperspectral imaging. Commun Biol 3, 684 (2020). https://doi.org/10.1038/s42003-020-01385-3