Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization (ZoMBI)

Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only $0.82\%$ of the $146$k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed with the capabilities to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, entitled ZoMBI. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom in the sampling search bounds towards the global optimum "needle" and then prunes the memory of low-performing historical experiments to accelerate compute times by reducing the algorithm time complexity from $O(n^3)$ to $O(\phi^3)$ for $\phi$ forward experiments per activation, which trends to a constant $O(1)$ over several activations. Additionally, ZoMBI implements two custom adaptive acquisition functions to further guide the sampling of new experiments toward the global optimum. We validate the algorithm's optimization performance on three real-world datasets exhibiting Needle-in-a-Haystack regimes and further stress-test the algorithm's performance on an additional 174 analytical datasets. The ZoMBI algorithm demonstrates compute time speed-ups of 400x compared to traditional Bayesian optimization as well as efficient discovery of optima in under 100 experiments that are up to 3x more highly optimized than those discovered by the similar methods MiP-EGO, TuRBO, and HEBO.


Introduction
Current optimization algorithms achieve good results on low-dimensional problems that are smooth and have wide basins of attraction. Examples of smooth manifolds with wide basins of attraction within materials science include process- and recipe-optimization problems such as tuning perovskite manufacturing variables to achieve higher efficiency [1], optimizing microfluidics flow parameters to achieve ideal droplet formation [2], optimizing silver nanoparticle recipes for optical properties [3], and tuning perovskite compositions with physics-based constraints to maximize stability [4]. Optimization techniques like Bayesian optimization (BO) are well-suited to model these simple manifolds using a Gaussian Process (GP) surrogate [5,6,7,8,9]. However, the performance of BO with a GP breaks down as the manifold complexity increases. Material property optimization problems that have high technological significance, such as discovering materials with rare properties or materials with a specific combination of properties, have search space manifolds that more closely resemble a Needle-in-a-Haystack [10], shown in Figure 1(b), rather than a smooth or convex space.

Figure 1 contrasts these two regimes. (a) In process optimization, there often exists a real and continuous path between each condition; this 3D projected manifold is adapted from the 6D perovskite process optimization problem by Liu et al., where $X_1$ is spray flow rate, $X_2$ is plasma voltage, and $f(X)$ is cell efficiency [1]. (b) In materials optimization, however, there are often only discrete combinations of properties that define real materials, resulting in a rough topology with extreme outliers. For example, Li$_2$NbF$_6$ and Li$_2$ZrF$_6$ lie close to each other in space because they have similar density, formation energy, and structure; however, they have vastly different target properties: Li$_2$NbF$_6$ has a Poisson's ratio of $-1.7$ while Li$_2$ZrF$_6$ has a Poisson's ratio of $0.3$ [11]. Extreme outliers, such as Li$_2$NbF$_6$, make up only a small fraction of the manifold hypervolume, giving rise to a Needle-in-a-Haystack regime; this 3D projected manifold is obtained from the 6D Poisson's ratio optimization problem presented in this paper, where $X_1$ is density, $X_2$ is formation energy, and $f(X)$ is negative Poisson's ratio [12].
This Needle-in-a-Haystack (NiaH) problem arises when only a few optimum conditions exist within the entire dataset, resulting in an extreme imbalance. Interpolating the parameter space of an imbalanced dataset with an estimation function, such as a GP, results in smoothing over the optimum or over-predicting the properties of the materials found near the optimum [13,14,15]. Examples of NiaH materials optimization problems include discovering auxetic materials (i.e., materials that have a highly negative Poisson's ratio, $\nu$) for energy-absorptive medical devices or protective armor [16,17,18] and discovering materials that have a combination of high electrical conductivity and low thermal conductivity (i.e., a highly positive thermoelectric figure of merit, $ZT$) used for improving sensor technology and enabling ubiquitous solid-state cooling [19,20,21]. Optimization of these rare material properties illustrates cases where an extreme data imbalance exists in the dataset because only a fraction of the total number of materials exhibit these rare properties [16,12,11,22,23]. This NiaH optimization challenge of extremely imbalanced datasets is broadly applicable beyond materials science, including the fields of ecological resource management [24,25], fraud detection [26,27], and rare diseases [28,27].
Several challenges exist within the current landscape of computational tools that inhibit effective optimization of these complex NiaH problems. Firstly, the "needle" makes up only a small percentage of the total manifold search space, resulting in a weak correlation between the measured input parameters and the target property of interest, inhibiting discovery of the region containing the needle [29,30,13]. This challenge requires the development of an algorithm that can more quickly determine the plausible region of the manifold where the needle exists. The second challenge for algorithms such as BO is the tendency of the acquisition function to pigeonhole sampling into local minima because of the narrowness of the needle's basin of attraction [31,32]. Standard BO acquisition functions, including expected improvement (EI) [33] and lower confidence bound (LCB) [7,14], are static sampling techniques that only adjust sampling based on the output of the surrogate model, which enacts smoothing of the needle [13,5,6]. To overcome this challenge, active learning-based tuning of the acquisition function hyperparameters can be implemented to improve sampling quality and avoid pigeonholing. Lastly, there exists a computing challenge for NiaH problems where, typically, several thousands of samples must be observed to find an optimum when using an algorithm poorly suited to NiaH manifolds [10]. The compute time of BO using a GP surrogate scales with complexity $O(n^3)$, where $n$ is the number of experiments sampled; hence, the compute time of traditional BO blows up as more data are required to find the optimum [34,35,36,5,6,37,38]. To solve this computing challenge, an algorithm must be designed that both efficiently optimizes the space in as few experiments as possible and reduces the effect of compounding compute times over the length of the optimization procedure.
In recent literature, algorithms have been developed to address some of these challenges individually, but not all of them together. The first class of solutions bounds the search space using a trust region approach to sample regions with a higher probability of containing the optimum. Examples include TuRBO [39], which maintains several local trust regions, each modeled by its own GP; TRIKE [40], a trust-region method built around the EI acquisition function; and TREGO [41], which interleaves sampling between global and local search regions, where the local search regions are defined by the single best historical experiment sampled. Although these methods offer solutions to one of the three challenges presented, each method has its downfalls when optimizing NiaH problems. For example, TuRBO requires the computation of several GP model runs, which increases compute time, and also does not guarantee that the needle will be resolved due to interpolation effects; TRIKE is inflexible to the use of other acquisition functions as it locks the user into only using EI, which may pigeonhole into local minima; TREGO uses only the best sampled experiment to define its search regions, which will yield inconsistent or sub-optimal results when the needle consists of a fractional region of the manifold and a single point is unlikely to land in its basin of attraction. The second class of solutions to the challenges presented in this paper is designed to decrease the computing time required to run an optimization procedure. A common method for reducing the compute time of BO with a GP surrogate is to introduce a sparse GP [5,42,37]. A sparse GP uses a small subset of pseudo data, often denoted as $m$, to reduce the GP time complexity from $O(n^3)$ to $O(nm^2)$ [43]. However, the process of selecting a useful subset requires minimizing the Kullback-Leibler divergence between the sparse GP and the true posterior GP, which is often a computationally intensive procedure involving variational inference [44]. In addition to sparse GPs, new algorithms have been developed in the literature to improve the compute time of optimization in various ways. Van Stein et al. 
develop MiP-EGO [45], which parallelizes the function evaluations of efficient global optimization (EGO) to discover optima faster and in fewer experiments using derivative-free computation [46]. Joy et al. [47] use directional derivatives to accelerate hyperparameter tuning by 100x and achieve higher accuracy than the FABOLAS baseline by Klein et al. [48]. Zhang et al. develop FLASH [49] to achieve optimization speed-ups of 50% by using a linear parametric model to guide algorithm search within high-dimensional spaces. Snoek et al. [15] design a neural network-based parametric model that reduces the overall time complexity of BO to $O(n)$ compared to the $O(n^3)$ complexity of standard BO with a GP surrogate model. These existing methods within the class of solutions for accelerating compute time generally introduce external models to perform optimization, such as neural networks, variational inference, or parametric models. While these external models do speed up compute time, they often lack the predictive capability to capture the weak correlation between measured input parameters and the target property of interest in NiaH problems. We illustrate this mechanism later in the paper by comparing the optimization results on two materials science NiaH problems of the fast algorithm MiP-EGO with those of TuRBO, an algorithm better suited to discovering optima within narrow basins of attraction.
Although these methods from existing literature address some of the challenges in optimizing NiaH problems, none of them have been designed specifically to quickly and efficiently discover a needle-like optimum within a haystack of suboptimal points, resulting in all of them falling short of a full solution. Therefore, in this paper, we design an algorithm that addresses all three of the challenges faced when optimizing NiaH problems by (1) zooming in the manifold search bounds iteratively and independently for each dimension based on the $m$ best memory points to quickly converge to the plausible region containing the global optimum needle, (2) relieving compute utilization by pruning the low-performing and redundant memory points not being used to zoom in the search bounds, and (3) avoiding pigeonholing into local minima by using actively learned acquisition function hyperparameters to tune the exploitation-to-exploration sampling ratio. The resulting algorithm, ZoMBI, combines these three contributions into a method that efficiently optimizes NiaH problems quickly. Figure 2 demonstrates the accelerated convergence of the proposed ZoMBI algorithm compared to standard BO. In essence, this process of scanning broadly and then focusing in on points of interest based on memory was inspired by the way we humans solve similar problems, but stands in contrast to the way standard BO methods with static acquisition functions solve problems. We demonstrate the performance of this algorithm on three vastly different NiaH problems in materials science and ecological resource management: (1) discovery of materials with a negative Poisson's ratio, (2) discovery of materials with both high electrical conductivity and low thermal conductivity, and (3) detection of environmental conditions conducive to sustaining wildfires. The performance of the proposed ZoMBI algorithm is compared against standard BO with static acquisition functions as well as against three more algorithms: (1) HEBO, the winning submission of the NeurIPS 2020 Black-Box Optimization Challenge [50], and one algorithm from each of the two classes of partial NiaH solutions, (2) TuRBO (bounded search space) [39] and (3) MiP-EGO (faster compute) [45]. Finally, we stress-test the proposed ZoMBI algorithm across 174 additional datasets, varying the optimum needle width, optimum distance to edges, dimensionality, and initialization conditions.

Methodology
In this paper, we develop two major contributions: (1) the ZoMBI algorithm and (2) adaptive learning acquisition functions. Through the combination of these two contributions, the optimum region of a NiaH manifold can be quickly discovered in fewer experiments without pigeonholing into local minima. Thus, the three challenges of optimizing NiaH problems are addressed: (1) the challenge of finding a hypervolume within the manifold that contains the needle-like optimum [29,30,13], (2) the challenge of the polynomially increasing compute times of BO using a GP surrogate [35,36,5,6,37,38], and (3) the challenge of avoiding pigeonholing into local minima [9,1,31,32]. We demonstrate the implementation of ZoMBI on a 6D analytical Ackley function, a 6D dataset of materials with Poisson's ratios, a 6D dataset of thermoelectric materials, and an 11D dataset for wildfire detection, all of which exhibit an extreme data imbalance and a NiaH regime, and compare the performance to that of MiP-EGO [45], TuRBO [39], and HEBO [50]. For each problem, the objective is to find the target value, $y$, with either the lowest or highest value depending on whether the problem is minimization or maximization. This optimum $y$-value resembles a needle for each problem because it is located within a narrow and steep basin of attraction. Precisely, the needle optimum for each problem has a value of $y = 0$ for the Ackley function (minimization), $y = -1.7$ for the Poisson's ratio dataset (minimization), $y = 1.9$ for the thermoelectric merit dataset (maximization), and $y = -12$ for the wildfire detection dataset (minimization). To extend the applicability of the ZoMBI optimization performance results to a wider array of applications, additional stress tests are conducted on 174 analytical datasets. First, a set of 144 analytical datasets is optimized to assess the failure and success conditions of ZoMBI on problems with extremely narrow optima and few initialization data points. Then, in the Supplemental Information, a set of 30 analytical datasets is optimized to assess the failure and success conditions of ZoMBI on problems with insufficient initialization data and cases where the global optimum is near the edge of the manifold.

Zooming Memory-Based Initialization (ZoMBI) Algorithm
The ZoMBI algorithm has two key features: (1) iterative inward bounding of successive search spaces using the $m$ best-performing memory points within the prior search space and (2) iterative pruning of low-performing historical search space memory. The newly computed search space bounds are unique to each dimension, such that the optimum basin of attraction of complex, non-convex NiaH manifolds can be discovered. The algorithm leverages these two key features to guide the acquisition of new data towards more optimal regions while only fitting the surrogate within the suggested optimum region to resolve more detail of the space of interest, as shown in Figure 3 and Figure 2. This process subsequently reduces the compute time significantly compared to the compute of a GP in a standard BO procedure, as shown in Figure 4.
Algorithm. We define $m$ as the number of retained memory points during an activation of ZoMBI. The $m$ memory points are saved to memory while all other data are erased. These are the historical data points that achieve the $m$ lowest (for minimization) target values, $y$, and they are used to zoom in the search bounds. Using these memory points, the multi-dimensional upper and lower bounds of the zoomed search space are computed for each dimension, $d$. Let $X := \{X_1, X_2, \ldots, X_n\}$ be a set of data points, where $X_j \in \mathbb{R}^d$. Let $f : \mathbb{R}^d \to \mathbb{R}$ be the objective function. We first assume that the points in $X$ are in general position so that $f(X)$ contains unique elements. Then, for each $m \leq n$, define $X^{(m)} = \{X_{\pi(1)}, \ldots, X_{\pi(m)}\}$, where $\pi$ is a permutation on $\{1, \ldots, n\}$ such that $\{f(X_{\pi(j)})\}$ is in ascending order. If $f(X)$ contains repeated elements, we may first remove the points with repeated $f$ values and apply the definition above. Then, for each $d$, the bounds are defined as:

$$B^l_d = \min_{1 \leq j \leq m} X_{\pi(j),d}, \qquad B^u_d = \max_{1 \leq j \leq m} X_{\pi(j),d},$$

where $B^l_d$ and $B^u_d$ are the computed lower and upper bounds for each dimension, $d$, respectively. The bounds $[B^l_d, B^u_d]$ constrain the subsequent acquisition of new data as well as the computation of a GP, such that sampling cannot occur outside the bounded region. This constraining process operates independently for each dimension, such that each dimension has a unique lower and upper bound. To initialize the algorithm with data from the constrained space, $i$ data points are sampled from the bounded region using Latin Hypercube Sampling (LHS). LHS splits a $d$-dimensional space into $i \times d$ equally spaced strata, where $i$ is the number of points to sample uniformly over $d$ dimensions with low variability, unlike random sampling, which has high sampling variability [51]. A GP surrogate model is trained on these $i$ LHS points sampled from the constrained space, and then the surrogate model is retrained for every subsequent experiment sampled from the space, denoted a forward experiment. Thus, the GP is 
only being trained on information within the constrained region, and as the constrained region iteratively zooms inward and decreases in hypervolume, so does the region computed by the GP. This process allows more information to be resolved within regions plausibly containing the global optimum basin of attraction. Up to $\phi$ forward experiments are sampled in serial, where $\{X_i\} \cup \{X_\phi\} \subseteq \{X_n\}$. These forward experiments are sampled by maximizing an acquisition value, $a \in [0, 1]$, computed by a user-selected acquisition function from one of the four functions EI, EI Abrupt, LCB, and LCB Adaptive, introduced in Section 2.2. Once $i + \phi$ experiments are sampled, the bounds are re-constrained using the $m$ best-performing experiments, $i$ new experiments are sampled from the zoomed-in space using LHS, and then the memory is pruned. The process of collecting $\phi$ forward experiments is repeated. A complete constraining-resetting iteration is denoted an activation, $\alpha$. This iterative zooming and pruning process over several $\alpha$ significantly speeds up compute time, discussed further in Section 3.2. Implementation of ZoMBI is shown in Algorithm 1.
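As a concrete illustration, one activation of this loop can be sketched in Python. This is a minimal sketch, not the authors' implementation: the zoomed bounds are taken as the per-dimension min/max of the $m$ memory points, uniform random sampling stands in for LHS, a small hand-rolled RBF-kernel GP stands in for the surrogate, and a static LCB stands in for the user-selected acquisition function; all function names are illustrative.

```python
import numpy as np

def gp_posterior(Xt, yt, Xq, ls=0.3, noise=1e-6):
    """Minimal RBF-kernel GP posterior mean and std (stand-in surrogate)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)
    K = k(Xt, Xt) + noise * np.eye(len(Xt))
    Ks = k(Xq, Xt)
    mu = Ks @ np.linalg.solve(K, yt)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def zombi_activation(f, lower, upper, m=5, i=5, phi=10, seed=0):
    """One ZoMBI activation: sample i initialization points inside the
    current bounds, run phi forward experiments with a GP + LCB
    acquisition, then return zoomed bounds from the m best points."""
    rng = np.random.default_rng(seed)
    d = len(lower)
    X = lower + rng.random((i, d)) * (upper - lower)  # init (uniform stands in for LHS)
    y = np.array([f(x) for x in X])
    for _ in range(phi):                              # phi forward experiments
        cand = lower + rng.random((256, d)) * (upper - lower)
        mu, sd = gp_posterior(X, y, cand)             # GP fit only on in-bounds data
        x_next = cand[np.argmin(mu - 3.0 * sd)]       # minimize the LCB, beta = 3
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    best = np.argsort(y)[:m]                          # m best points kept as memory;
    return X[best].min(0), X[best].max(0)             # everything else is pruned
```

Running successive activations with the returned bounds fed back in reproduces the zoom-in behavior: each activation shrinks the sampled hypervolume around the best-performing region.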

Adaptive Acquisition Functions
Traditional BO acquisition functions, such as EI [52] and LCB [53], use the computed means and variances from a surrogate model to compute an acquisition value; maximizing this acquisition value guides sampling of the manifold [7,33,14]. However, these traditional acquisition functions are static, in that they do not actively use any information about the performance of previously sampled experiments to guide sampling. Hence, we implement an adaptive learning approach into the acquisition functions to develop two novel functions, EI Abrupt and LCB Adaptive, that dynamically adapt their sampling based on the quantity and quality of previously sampled experiments. In contrast to a static acquisition function, these adaptive acquisition functions are initialized with an initial set of hyperparameter values to guide their search but then tune these values as sampling progresses. The developed EI Abrupt and LCB Adaptive functions are used within the ZoMBI framework to further accelerate optimization and avoid pigeonholing; see line 9 of Algorithm 1.
LCB Adaptive builds off of previous work that also tunes sampling based on the number of experiments collected, $n$ [54,55,56]. In this paper, we design LCB Adaptive to tune its hyperparameter to become less explorative as more samples are collected: as $n$ increases, LCB Adaptive decays its $\beta$ hyperparameter value to become less explorative and more exploitative. Specifically, this information feedback received by the function determines the contribution of both $\mu(X)$ and $\sigma(X)$ to the acquisition value, $a$. Similar to EI Abrupt, LCB Adaptive computes an acquisition value, $a \in [0, 1]$, for a given $X$, wherein the $X$ with the highest $a$ is selected by the acquisition function as the next suggested experiment to measure. LCB Adaptive is implemented for a minimization problem as:

$$a_{\text{LCB Adaptive}}(X) = -\mu(X) + \epsilon^n \beta\, \sigma(X),$$

where $n$ is the number of experiments sampled, and $\beta = 3$ and $\epsilon = 0.9$ are hand-tuned initialization hyperparameters selected based on a priori domain knowledge of the function's performance on a variety of different problems. Having a large $\beta$ and an $\epsilon$ close to 1 supports a gradual decay from very explorative to very exploitative, rather than a rapid decay. In the following section (Section 3.3), the dynamic EI Abrupt and LCB Adaptive are shown to both discover optima faster and avoid pigeonholing into local minima better than their static counterparts by actively balancing the ratio of exploitation to exploration using learned information about the quality and quantity of previously sampled experiments.
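A sketch of this decaying-exploration schedule, assuming the decayed weight enters as $\epsilon^n \beta\, \sigma(X)$ and using an illustrative min-max normalization to map the raw values into $[0, 1]$:

```python
import numpy as np

def lcb_adaptive(mu, sigma, n, beta=3.0, eps=0.9):
    """LCB Adaptive acquisition for minimization (sketch): the
    exploration weight decays geometrically as eps**n, shifting
    sampling from explorative to exploitative as n grows."""
    a = -mu + (eps ** n) * beta * sigma  # low mean or (early on) high variance wins
    return (a - a.min()) / (a.max() - a.min() + 1e-12)  # normalize to [0, 1]
```

Early on (small $n$), a high-variance candidate can outrank a low-mean one; after many experiments, the decayed weight makes the function nearly pure exploitation of $-\mu(X)$.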
EI Abrupt is a novel implementation that flips between the exploitative EI [52] and explorative LCB [53] acquisition functions based on the computed finite differences of recently evaluated experiments. For example, if the evaluated experiment $y$-values plateau for three or more experiments in a row, EI Abrupt will abruptly switch from a greedy sampling policy to a more explorative sampling policy. Specifically, this information feedback received by the function determines whether the current round of sampling should exploit the surrogate mean values, $\mu(X)$, or explore the surrogate variances, $\sigma(X)$. EI Abrupt computes an acquisition value, $a \in [0, 1]$, for a given $X$, wherein the $X$ with the highest $a$ is selected by the acquisition function as the next suggested experiment to measure. EI Abrupt is implemented for a minimization problem as:

$$a_{\text{EI Abrupt}}(X) = \begin{cases} a_{\text{EI}}(X), & \text{if } |\nabla y_j| > \eta \text{ for any of the three most recent experiments } j, \\ a_{\text{LCB}}(X), & \text{otherwise,} \end{cases}$$

where $\nabla y_j$ is the finite difference between consecutively evaluated target values and $\eta$ is a plateau-detection threshold.

Zooming in the search bounds on the manifold addresses challenge number one of optimizing NiaH problems: finding the general hypervolume region that contains the needle-like optimum. As more experiments are amassed and committed to memory to run traditional BO by computing the GP surrogate, the compute time increases polynomially, following the $O(n^3)$ time complexity of GP matrix inversion [34,5,6,60,37,38]. This complexity is unfavorable as it leads to compounding compute times as more experiments are run. Therefore, we implement a memory pruning feature into the ZoMBI algorithm that iteratively selects which prior data points to keep and which to prune from memory during each activation, $\alpha$. Memory pruning is demonstrated to remove redundant data during the optimization procedure. Figure 2 illustrates how ZoMBI accelerates the convergence of a GP prediction to the precise location of the true optimum. Only data within the newly computed bounds of ZoMBI are used for prediction of the true target; hence, all data outside this boundary become redundant and are pruned to decrease compute time.
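The plateau-triggered policy switch behind EI Abrupt can be sketched as follows (a minimal sketch: the finite-difference threshold `eta` and the function name are illustrative assumptions, not values from the paper):

```python
import numpy as np

def ei_abrupt_policy(y_history, eta=0.01):
    """Choose the sampling policy for the next experiment (sketch):
    if the three most recent evaluated y-values have plateaued (all
    finite differences below eta), switch from exploitative EI to
    explorative LCB to escape a potential local minimum."""
    if len(y_history) >= 3:
        diffs = np.abs(np.diff(y_history[-3:]))  # finite differences of recent evals
        if np.all(diffs < eta):
            return "LCB"  # plateau detected: explore the surrogate variance
    return "EI"           # otherwise: exploit the surrogate mean
```

With fewer than three evaluations, or while the objective is still improving, the policy stays greedy; only a sustained plateau flips it to exploration.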

Memory Pruning
Through memory pruning, the number of experiments used to train the GP surrogate varies within $[i, i + \phi]$ for every $\alpha$, rather than being proportional to $n$, where the number of initialization samples is fixed at $i = 5$. In this paper, we use $\phi \in [0, 10]$, i.e., once $\phi = 10$, the activation is complete and resets to $\phi = 0$. This is computationally favorable because $\{X_i\} \cup \{X_\phi\} \subseteq \{X_n\}$. Thus, for a single $\alpha$, the time complexity is $O((i + \phi)^3) \approx O(\phi^3)$, since $i$ is fixed. Furthermore, since the range of $\phi$ is capped, a non-increasing sawtooth pattern in compute time is exhibited, illustrated in Figure 4. Therefore, the compute complexity of ZoMBI trends towards $O(1)$ for $\alpha > 1$ as a result of the efficient memory pruning process. After collecting 1000 experiments, the compute time of traditional BO trends towards $>400$ seconds per experiment, whereas ZoMBI maintains a constant trend of approximately 1 second per experiment. Therefore, the memory pruning feature of ZoMBI accelerates the optimization compute time by over 400x at $n = 1000$ and achieves further relative acceleration as $n$ increases.
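The bounded training-set size behind this sawtooth can be illustrated with a toy calculation, using $i = 5$ and $\phi = 10$ as above (the helper name is illustrative):

```python
def gp_training_size(n, i=5, phi=10):
    """Number of points the GP is fit on at overall experiment n under
    ZoMBI's memory pruning: each activation resets to i LHS points and
    grows by one per forward experiment, so the size stays bounded by
    i + phi regardless of the total experiment count n."""
    return i + (n % phi)

# Sawtooth over the first 1000 experiments: the GP never trains on more
# than i + phi points, unlike standard BO, which trains on all n points.
sizes = [gp_training_size(n) for n in range(1000)]
```

Since the cost of each GP fit is cubic in its training-set size, capping that size caps the per-experiment cost, which is exactly the $O(\phi^3) \to O(1)$ trend described above.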

Anti-pigeonholing
Pigeonholing into the local minima of a function occurs when an optimization algorithm has insufficient learned knowledge of the manifold topology to continue exploring potentially profitable regions or when the algorithm's hyperparameters are improperly tuned, leading to overly exploitative tendencies [1,9]. The ZoMBI algorithm's anti-pigeonholing capabilities are two-fold: (1) the zooming search bounds help the acquisition function quickly stop sampling local minima once a better-performing data point is found and (2) actively learned acquisition function hyperparameters use knowledge about the domain to help exit a local minimum. Figure 5 demonstrates the anti-pigeonholing capabilities of ZoMBI on optimizing a 6D Ackley function with both static and dynamic acquisition functions, compared to that of traditional BO.

Figure 6 shows the results of this large-scale optimization experiment of 48 independent trials of ZoMBI across each of the 144 unique permutations of the Ackley function dataset with varying optimum hypervolumes and dimensionality. All points below the grey-shaded region fall within the optimum basin of attraction. The red trace of the Pareto front indicates the narrowest optimum hypervolume and dimensionality conditions of a dataset that result in the best minimum function value being discovered. We show that with an initialization set of $i = 5$, ZoMBI can reliably discover the global minimum region for needles as narrow as $0.05\%$ of the total hypervolume. Moreover, as the optimum becomes narrower than $0.05\%$ of the total hypervolume, the initialization set is no longer sufficient and ZoMBI gets trapped in local minima, as indicated by the greyed-out region. Conversely, as the optimum becomes wider than $5\%$ of the total hypervolume, the manifold becomes flatter, exposing the greedy tendency of ZoMBI to falsely zoom inward to less ideal function values than it would for narrower optimum conditions. This experiment quantifies ZoMBI's Goldilocks zone to be between $0.05\%$ and $5\%$ optimum hypervolume. Therefore, for ideal performance, ZoMBI is best used on datasets with optimum conditions comprising between $0.05\%$ and $5\%$ of the total number of conditions. This optimum hypervolume trade-off of ZoMBI is further assessed relative to other optimization methods in Figure S-3.
In the next section, three real-world datasets are optimized using ZoMBI; each of these datasets has an extreme data imbalance, illustrated in Figure 7, within the specified ideal ranges of ZoMBI performance. The 6D Poisson's Ratio dataset has an imbalance of $0.82\%$ optimum conditions, the 6D Thermoelectric Figure of Merit dataset has an imbalance of roughly $1.32\%$ optimum conditions, and the 11D Wildfire Detection dataset has an imbalance of $4.16\%$ optimum conditions. This range of ideal ZoMBI performance between $0.05\%$ and $5\%$ optimum hypervolume is facilitated by the initialization set. Hence, to improve performance for narrower optima, either the number of initialization samples must be increased or the initialization conditions should be adjusted. Additional initialization-condition experiments for ZoMBI are shown in supplemental Section S-5.

6D Poisson's Ratio
The first experimental dataset is 6-dimensional and consists of 146k materials from the publicly available Materials Project database with different mechanical properties, described by Poisson's Ratio, $\nu$ [12]. Only $0.82\%$ of the total 146k materials have a negative Poisson's Ratio, $\nu < 0$ [16,12,11,17]. Hence, for this experiment, we aim to minimize $\nu$. A positive $\nu > 0$ describes a material that expands when a compressive load is applied in the orthogonal direction [63,64]. Conversely, a negative $\nu < 0$ describes a material that contracts rather than expands when compressed in the orthogonal direction, denoted an auxetic material, a rare phenomenon [16,23]. Auxetic materials with highly negative Poisson's ratios have energy-absorptive properties, making them ideal materials for wearable medical devices and protective armor that must absorb the energy of large impacts to keep bones from shifting or to inhibit penetration of the protective layer [17,18].

Figure 7 shows the noisy, non-convex manifold topology of each dataset, generated by a random forest regression with 500 trees. Each manifold is a projected 3D slice of the higher-dimensional space with the z-axis and colorbar indicating the target property, where (a) $X_1$ is density and $X_2$ is formation energy, (b) $X_1$ is formation energy and $X_2$ is band gap, and (c) $X_1$ is evapotranspiration and $X_2$ is precipitation.

The variance of $\nu$ values for the final experiment across all ensemble runs is illustrated as a KDE plot for each method to highlight the sampling density and general rate of success. HEBO discovers the global minimum after ZoMBI with LCB and LCB Adaptive; however, the spread of runs for ZoMBI is narrower than that of HEBO, indicating that, for this problem, ZoMBI more consistently discovers the minimum, which is 3x lower than those discovered by MiP-EGO and TuRBO. Furthermore, the rate of convergence on Needle 1 is faster for ZoMBI than for HEBO.

6D Thermoelectric Figure of Merit
The second experimental dataset is 6-dimensional and consists of 1k materials with different thermal and electrical properties, described by the Thermoelectric Figure of Merit, $ZT$. Since $ZT$ values are always positive, there is no clear cutoff for what "optimum" conditions are, but with a threshold of $ZT > 0.8$, $1.32\%$ of the total 1k materials are considered optimum. A higher $ZT$ indicates that the material is better able to convert a thermal gradient into an electrical current [65]. Hence, for this experiment, we aim to maximize $ZT$. Unlike Poisson's Ratio, Thermoelectric Merit is determined by a combination of several variables, rather than a single variable [65]:

$$ZT = \frac{S^2 \sigma T}{\kappa}, \tag{4}$$

where $S$ is the Seebeck coefficient, $\sigma$ is electrical conductivity, $T$ is the average temperature, and $\kappa$ is thermal conductivity. The $ZT$ is computed for each material with valid thermal and electrical properties in the Materials Project database using BoltzTraP [61]. $ZT$ is a common figure of merit used to describe the thermal-to-electrical or electrical-to-thermal conversion efficiency of thermoelectric materials [66,67,68,69]. Materials with high $ZT$ values have a range of applications, from solid-state cooling devices to sensors that produce an electrical signal when heated [19,20,21]. Ultimately, this experiment demonstrates that ZoMBI can optimize material objective functions that have a complex combination of variables (Equation 4) with roughly 2x better performance than HEBO.
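Equation 4 can be evaluated directly; for instance, with representative order-of-magnitude values for a good room-temperature thermoelectric (illustrative numbers, not data from the paper's dataset):

```python
def figure_of_merit(S, sigma, T, kappa):
    """Thermoelectric figure of merit, ZT = S^2 * sigma * T / kappa,
    where S is the Seebeck coefficient (V/K), sigma the electrical
    conductivity (S/m), T the average temperature (K), and kappa the
    thermal conductivity (W/(m*K))."""
    return (S ** 2) * sigma * T / kappa

# Illustrative values: S = 200 uV/K, sigma = 1e5 S/m, T = 300 K,
# kappa = 1.5 W/(m*K) gives ZT = 0.8.
zt = figure_of_merit(200e-6, 1e5, 300, 1.5)
```

Because $S$ enters squared while $\sigma$ and $\kappa$ enter linearly and in opposition, maximizing $ZT$ is a genuinely multi-variable trade-off, which is what makes this objective harder than the single-variable Poisson's ratio target.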

11D Wildfire Detection
The third experimental dataset is 11-dimensional and consists of 128k meteorological conditions and an index, ψ, that determines whether a set of conditions has a high likelihood of generating or sustaining a wildfire in the state of California, publicly available from the California Irrigation Management Information System (CIMIS) weather stations [62]. Only 4.16% of the total 128k meteorological conditions have a negative wildfire detection index, ψ < 0.
A highly negative ψ indicates a high risk of wildfires. Hence, for this experiment, we aim to minimize ψ to best detect meteorological conditions at high risk of wildfires. The dataset spans over two years of data collected from 2018 to 2020, during which over 2500 wildfires occurred, burning over 24 million acres of land [70]. In California, temperature and precipitation alone are poor indicators of wildfire outbreaks (see Figure S-1(c)), leading researchers to instead use computer-vision methods or convolutions of many meteorological variables to reliably detect wildfire conditions [25,70]. Thus, there is a high need for algorithmic support to aid humans in early wildfire detection.
Figure 10 demonstrates the optimization performance of ZoMBI on the Wildfire Detection dataset compared to MiP-EGO, TuRBO, and HEBO. In this experiment, LCB Adaptive, EI, and HEBO discover the lowest index value, ψ ≈ −3.5, for detecting wildfires based on a high-dimensional convolution of ten meteorological variables. TuRBO and MiP-EGO also discover a low index value, ψ ≈ −2.5; however, these methods have widely distributed variances, as shown by the KDE plots, indicating inconsistent optimization results given only 100 sampled experiments. Similarly, HEBO has high variance across model runs, while the LCB Adaptive and EI ZoMBI methods have a tight distribution, indicating more reliable optimization results with a higher rate of success. Furthermore, the ZoMBI methods achieve a faster rate of convergence than HEBO onto the Needle 1 optimum, similar to the optimization results on the Poisson's Ratio dataset in Section 4.2.
Figure 7(c) illustrates the distribution of ψ values within the full dataset. The ground-truth "needle" conditions for detecting wildfires are those with the most negative detection index values, ψ. Although ZoMBI with the LCB Adaptive and EI acquisition functions, as well as HEBO, discovers the lowest needle-like ψ conditions after 100 sampled experiments, none of the tested methods is able to find the global ψ_min ≈ −12. These results imply that, even for ZoMBI, with a narrow enough needle-like optimum, the LHS initialization of i = 5 experiments may not be sufficient. In the next section, we stress-test this hypothesis on ZoMBI.

Summary & Conclusions
In this paper, we proposed the [Zo]oming [M]emory-[B]ased [I]nitialization (ZoMBI) algorithm, which builds on the principles of Bayesian optimization to accelerate the optimization of Needle-in-a-Haystack problems in two ways: first, by requiring fewer experiments to reach a better optimum than the existing methods MiP-EGO [45], TuRBO [39], and HEBO [50] on a variety of real-world applications, and second, by pruning the memory of low-performing historical experiments to speed up compute time. The ZoMBI algorithm converges quickly onto narrow and sharp optima in Needle-in-a-Haystack datasets by (1) using the values of the m best-performing previously sampled memory points to iteratively zoom in the search bounds of the manifold independently on each dimension and (2) implementing two custom acquisition functions, LCB Adaptive and EI Abrupt, that adapt their hyperparameters to tune the sampling of new experimental conditions based on information learned from the surrogate model. The main contributions of this algorithm address three fundamental challenges of optimizing non-convex Needle-in-a-Haystack problems: (1) the challenge of locating the hypervolume region of the manifold containing the narrow global optimum basin of attraction [29,30,13] is alleviated by introducing iterative search bounds based on learned knowledge of the manifold; (2) the challenge of polynomially increasing compute times of BO using a GP surrogate [34,35,36,5,6,37,38] is addressed by actively pruning the retained memory of the algorithm after each activation, α, reducing the time complexity from O(n^3) to O(φ^3) for φ forward experiments per activation, which trends to a constant O(1) when α > 1; (3) unwanted pigeonholing into local minima [31,32,5,6] is avoided both by the zooming mechanics of ZoMBI and by the two acquisition functions developed in this paper, LCB Adaptive and EI Abrupt, which tune their hyperparameters through adaptive learning. By developing the ZoMBI algorithm to solve these challenges, it becomes possible to quickly and efficiently find optimal solutions to complex Needle-in-a-Haystack problems in fewer experiments.
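The memory-pruning idea can be sketched in a few lines; the function name and array layout below are assumptions for illustration, not the paper's implementation. Because a GP fit scales cubically with the number of training points, refitting on only the retained points rather than all n evaluated experiments is what reduces the per-activation cost from O(n^3) to O(φ^3).

```python
# Minimal sketch of ZoMBI-style memory pruning: keep only the m
# best-performing points (lowest targets, for minimization) so the GP is
# refit on a small, bounded set instead of the full history.
import numpy as np

def prune_memory(X: np.ndarray, y: np.ndarray, m: int):
    """Return the m points with the lowest target values."""
    keep = np.argsort(y)[:m]
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.random((500, 6))   # 500 evaluated experiments in 6D (toy data)
y = rng.random(500)        # their target values
X_mem, y_mem = prune_memory(X, y, m=5)
print(X_mem.shape)         # (5, 6): only the memory points survive
```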
Solving a Needle-in-a-Haystack problem that arises from extremely imbalanced data is a significant challenge with important implications in science and engineering, especially within the field of materials science [10,29]. In this paper, we use ZoMBI to discover the optimum materials in two real-world materials science Needle-in-a-Haystack datasets where only a small fraction of the entire search space consists of the target optimum conditions. For breadth, we also extend our analysis to a third real-world dataset, in ecological resource management, with the objective of discovering the environmental conditions that have a high likelihood of sustaining wildfires, for early wildfire detection. In the first materials dataset, we discover a material with a highly negative Poisson's ratio, ν [12,11]; in the second materials dataset, we discover a material with a highly positive thermoelectric figure of merit, ZT [61,12], both rare material properties; and in the third dataset, for ecological resource management, we discover a set of environmental conditions with a highly negative wildfire detection index, ψ [62,25,70]. For the first dataset, both the ZoMBI algorithm with the LCB and LCB Adaptive custom acquisition functions and HEBO [50] discover the material with the minimum ν ≈ −1.7; however, the ZoMBI methods converge on this minimum in only 70 experiments while HEBO takes 90 experiments. TuRBO [39] and MiP-EGO [45] only discover materials with ν ≈ −0.55 and ν ≈ −0.20, respectively. For the second dataset, the ZoMBI algorithm with the LCB Adaptive custom acquisition function discovers the material with the maximum ZT ≈ 1.4, while HEBO [50], TuRBO [39], and MiP-EGO [45] only discover ZT ≈ 0.78, ZT ≈ 0.65, and ZT ≈ 0.45, respectively. For the third dataset, the ZoMBI algorithm with all acquisition functions and HEBO [50] discover a minimum ψ ≈ −3, while TuRBO [39] and MiP-EGO [45] both only discover ψ ≈ −2. However, the ZoMBI methods converge on the minimum faster and with less variance. In general, we note that HEBO [50] outperforms the other benchmark methods, TuRBO [39] and MiP-EGO [45]. Thus, for future investigation, we believe the performance of ZoMBI may be further improved by running optimization within the latent space of a variational autoencoder, similar to HEBO [71,72]. Overall, these results demonstrate that the ZoMBI algorithm is better suited to tackle various real-world Needle-in-a-Haystack optimization problems than current methods; however, ZoMBI has performance limitations for extremely narrow optima when instantiated with an insufficient initialization set. Therefore, to assess these limitations, we stress-tested ZoMBI on an additional 174 analytical datasets with varying optimum needle widths, optimum distances to edges, dimensionality, and initialization conditions. These results show that with a fixed initialization set of 5 samples, ZoMBI has ideal performance on datasets with needle-like optima consisting of between 0.05% and 5% of the total hypervolume space. Furthermore, by extending the range of the initialization set, ZoMBI is capable of discovering global minima that lie on the absolute edge of a manifold's limits. Thus, in these certain cases, convergence to a global optimum using ZoMBI is not guaranteed, but with slight modifications based on some a priori domain knowledge of the optimization landscape, ZoMBI produces high-performance and low-variance results.
Ultimately, the significance of developing the ZoMBI algorithm is to quickly and efficiently tackle difficult Needle-in-a-Haystack optimization problems in extremely imbalanced datasets. In this paper, we showcased the ability of the developed algorithm to discover rare materials and conditions with highly optimized properties in a short period of time using few experiments. Discovering rare materials quickly and efficiently enables widespread access to a new range of materials applications, from engineering high-performance medical devices to ubiquitous solid-state cooling systems [17,18,19,20,10,21]. However, the application space for ZoMBI to accelerate the efficient discovery of highly optimized solutions extends past materials science and is generally applicable to many Needle-in-a-Haystack problems, including those found in ecological resource management [24,25], fraud detection [26,27], and rare disease prediction [28,27]. We aim for this contribution to support the elimination of the time and resource barriers previously inhibiting the throughput of optimizing complex and challenging Needle-in-a-Haystack problems across a broad range of application spaces.

S-2 Dataset Variables and Descriptions
In this paper, we provide analyses of three real-world datasets: (1) Poisson's Ratio [12,11], (2) Thermoelectric Merit [61], and (3) Wildfire Detection [62]. The Poisson's Ratio dataset has a sample size of N = 146,232 experimental and simulated materials, publicly available from the Materials Project database [12]. For this paper, we select five real-valued material variables over which to optimize Poisson's ratio; these variables are shown in Table S-1. The feature importance ranking of these variables with the target variable is shown in Figure S-1(a). The target variable, Poisson's ratio, is measured directly or computed using Density Functional Theory (DFT) from the stress, elasticity, and strain tensors, $\sigma_{ij}$, $C_{ijkl}$, and $\epsilon_{kl}$, related through $\sigma_{ij} = C_{ijkl}\epsilon_{kl}$ [11]. The Thermoelectric Merit dataset has a sample size of N = 1,063 experimental and simulated materials, publicly available from the Materials Project database [12]. For this paper, we select five real-valued material variables over which to optimize Thermoelectric Merit; these variables are shown in

S-4 Varying Basin of Attraction Width
The ZoMBI algorithm is designed specifically to tackle NiaH problems where the basin of attraction containing a global minimum is narrow [31,32]. In this experiment, we explore the question: how does the basin of attraction width affect the ability of ZoMBI to find the global minimum? We present a synthetic example of a 6D noisy and non-convex dataset with a global minimum of f_min = −2, whereby applying a Gaussian smoothing function to the dataset synthetically widens the basin of attraction. The original, unmodified 6D manifold is based on the Poisson's Ratio dataset [12,11]. Figure S-3 illustrates the trend lines of optimization performance of ZoMBI with each of its possible acquisition functions, relative to TuRBO [39] and MiP-EGO [45]. The trend lines show that for wider basin of attraction problems, TuRBO and MiP-EGO outperform ZoMBI with its various acquisition functions. However, as the basin of attraction narrows, the problem becomes a NiaH. As the basin width decreases, the performance of ZoMBI improves and the algorithm becomes more capable of finding the global minimum. Conversely, TuRBO and MiP-EGO become less capable of discovering the global minimum as the manifold transitions into a NiaH problem. As the Gaussian smoothing is lessened, all global and local minima basins narrow, resulting in less of the manifold space containing optimal regions, which accelerates the inward zooming of the search bounds by ZoMBI. Additionally, the more explorative acquisition functions are shown to perform better than the exploitative functions, such as EI, because pigeonholing is more readily avoided, as shown previously in Figure 5.
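The widening procedure described above can be sketched in 1D; the needle profile, smoothing scale, and threshold below are synthetic assumptions, not the actual 6D Poisson's ratio manifold.

```python
# Sketch of basin widening by Gaussian smoothing: convolving a needle-like
# minimum with a Gaussian kernel broadens (and shallows) its basin.
import numpy as np
from scipy.ndimage import gaussian_filter1d

x = np.linspace(-1, 1, 401)
f = -np.exp(-(x / 0.01) ** 2)          # narrow needle-like minimum at x = 0
wide = gaussian_filter1d(f, sigma=20)  # smoothing widens the basin

def basin_width(profile, level=-0.05):
    """Count grid points lying below a shallow reference level."""
    return int((profile < level).sum())

# The smoothed basin spans more grid points than the original needle:
print(basin_width(f), basin_width(wide))
```

Note that smoothing also makes the minimum shallower, which mirrors how the widened synthetic datasets become easier for trust-region methods but less needle-like.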

S-5 Optima Near Edges
A dataset with a global minimum near an edge is a challenge for ZoMBI to optimize unless the initialization set is altered.
In this experiment, we evaluate the performance of ZoMBI across fifteen datasets, where each dataset moves the global optimum closer to an edge region of the manifold. All fifteen experiments are repeated but with an initialization set extending beyond the limiting edge of the manifold. Figure S-4(a) illustrates the minimum function values discovered by ZoMBI over 100 experiments for each of the fifteen 6D datasets with varying optimum distances from the center, using an initialization set of i = 5 with bounds [0, 1]^6 for each run. It is shown that once the optima begin to approach the corner (between distance 12-15 from the center), ZoMBI is no longer able to discover the minimum and its performance plateaus across 100 sampled experiments. These results are explained by the intertwined nature of the iterative zooming feature of ZoMBI and the initialization set used to instantiate the optimization procedure. With a small initialization set, an edge-bounded optimum may easily be excluded from the hypervolume spanned by the initialization points. Hence, when ZoMBI computes a GP surrogate and zooms in its bounds, the global minimum is excluded because it was not contained sufficiently within the initialization bounds. To surmount this unfavorable mechanic, one may either increase the number of initialization points or extend the bounds of LHS sampling if it is known a priori that the global minimum of the manifold lies close to a boundary.
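The suggested remedy can be sketched with an off-the-shelf LHS sampler; the γ value below is an illustrative assumption, with a positive γ stretching the upper bound beyond the normalized manifold limit so that an edge-bounded optimum falls inside the initialization hypervolume.

```python
# Sketch of extending the LHS initialization bounds from [0, 1]^d to
# [0, 1 + gamma]^d for an optimum known a priori to sit at the upper limit.
import numpy as np
from scipy.stats import qmc

d, i, gamma = 6, 5, 0.2                 # gamma chosen for illustration
sampler = qmc.LatinHypercube(d=d, seed=0)
unit_samples = sampler.random(n=i)      # i LHS points in [0, 1]^d
init = qmc.scale(unit_samples, [0.0] * d, [1.0 + gamma] * d)
print(init.shape)                       # (5, 6) initialization points
```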

Figure 1 :
Figure 1: Archetypal Manifolds in Materials Science Optimization. (a) In process optimization, there often exists a real and continuous path between each condition. This 3D projected manifold is adapted from the 6D perovskite process optimization problem by Liu et al., where X1 is spray flow rate, X2 is plasma voltage, and f(X) is cell efficiency [1]. (b) However, in materials optimization, there are often only discrete combinations of properties that define real materials, resulting in a rough topology with extreme outliers. For example, Li2NbF6 and Li2ZrF6 lie close to each other in space because they have similar density, formation energy, and structure; however, they have vastly different target properties: Li2NbF6 has a Poisson's ratio of −1.7 while Li2ZrF6 has a Poisson's ratio of 0.3 [11]. Extreme outliers, such as Li2NbF6, make up only a small fraction of the manifold hypervolume, giving rise to a Needle-in-a-Haystack regime. This 3D projected manifold is obtained from the 6D Poisson's ratio optimization problem presented in this paper, where X1 is density, X2 is formation energy, and f(X) is negative Poisson's ratio [12].

Figure 2 :
Figure 2: Accelerated Convergence to True Target using ZoMBI. Using a standard Bayesian optimization procedure, the discovery of a Needle-in-a-Haystack condition does not progress significantly after 10 additional experiments from the initial GP guess. However, using ZoMBI to zoom the bounds inward and prune redundant memory points, the needle-like optimum region is resolved to be accurately aligned with the true target. (a) The true target to optimize, which is a slice from the 6D Poisson's Ratio dataset. (b) The initial guess of the target function using a GP surrogate with 20 randomly sampled experiments. (c) (top) The estimated target resolved by standard BO after 10 additional experiments sampled using a greedy LCB acquisition function (β = 0.1); (bottom) the estimated target resolved by ZoMBI after 10 additional experiments sampled using the same greedy LCB acquisition function. The red memory points do not assist in resolving this target after zooming in the bounds; hence, they are pruned from memory by ZoMBI.

Figure 3 :
Figure 3: Zooming Search Bounds. For every activation of ZoMBI, the search bounds are zoomed inward based on the prior best-performing memory points. A 4D Ackley function manifold is projected in 2D. The bounding regions of each 2D slice are illustrated by the red and orange boxes. The φ forward experiments sampled for each activation, α, are illustrated as black markers. The global optimum is indicated by the red region of the heatmap.
Figure 3 illustrates how the ZoMBI algorithm iteratively zooms in the search bounds based on the number of activations, α. An Ackley function is used as a simulated example due to its non-convexity and needle-like global optimum [57,58]. For each activation, the m prior points that achieved the lowest target values, y, are retained in memory and used to zoom in the search bounds. This zooming occurs independently across each dimension and is based on the minimum and maximum values of the m memory points along each dimension, as shown in Equation 1. The red and orange rectangles illustrate the evolution of the bounds over space and time. Initially, sampling occurs across the entire manifold for φ forward experiments per activation, shown by the black markers. However, by using the best-performing memory points to zoom in the search bounds, pigeonholing into local minima can also be avoided, as the search bounds are pulled away from these trap minima and move closer toward the global minimum basin of attraction. The iterative zooming of ZoMBI does not guarantee convergence on the global optimum, but if a sufficient initialization set is obtained, convergence often gets close to the global optimum, as shown across several examples in Figure 5 and Figures 8, 9, and 10. Furthermore, we comprehensively demonstrate the performance limitations of ZoMBI where initializations miss extreme needle-like optima in Figure 6 and where optima are near the edges of a manifold in Figure S-4.
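The per-dimension zooming rule described above can be sketched as follows; the function name and toy objective are assumptions for illustration, not the paper's exact implementation of Equation 1.

```python
# Sketch of per-dimension bound zooming: the new search bounds are the
# min/max, along each dimension, of the m best-performing memory points.
import numpy as np

def zoom_bounds(X: np.ndarray, y: np.ndarray, m: int):
    """Return (lower, upper) per-dimension bounds from the m best points."""
    best = X[np.argsort(y)[:m]]      # m memory points with the lowest targets
    return best.min(axis=0), best.max(axis=0)

rng = np.random.default_rng(1)
X = rng.random((200, 4))             # sampled experiments on a toy 4D manifold
y = ((X - 0.3) ** 2).sum(axis=1)     # toy target with its minimum near x = 0.3
lower, upper = zoom_bounds(X, y, m=10)
print(lower.round(2), upper.round(2))  # tighter than the original [0, 1] bounds
```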

Figure 4 :
Figure 4: Wall-clock Compute Time. The compute time per experiment is illustrated for traditional BO with a GP surrogate (orange) and for ZoMBI with a GP surrogate (blue), with the y-axis in log-scale. Four independent trials of each method were run to optimize a 6D Ackley function with a narrow basin of attraction using an NVIDIA Tesla Volta V100 GPU [59]. Each trial of standard BO and ZoMBI is run using one of the four acquisition functions: LCB, LCB Adaptive, EI, and EI Abrupt. The averages of the trials are shown as solid orange and blue lines, while the shaded regions indicate the maximum and minimum compute time bounds. The red dashed line indicates the trend of the ZoMBI compute times. The measured compute time includes the time to compute the GP surrogate model and the time to acquire an experiment from the surrogate.

Figure 5 :
Figure 5: Acquisition Function Sampling Density. The colored heatmaps indicate the regions of a 2D slice from a 6D Ackley function where sampling density is high for each respective acquisition function: (a) LCB, (b) LCB Adaptive, (c) EI, and (d) EI Abrupt. The contour lines indicate the manifold topology, with local minima as the circular and pointed regions of the contours. The red "x" indicates the global minimum. For each acquisition function, the left panel shows the sampling density after n = {20, 40, 80} evaluated experiments without the use of ZoMBI, while the right panel shows the sampling density after n = {20, 40, 80} evaluated experiments with the use of ZoMBI.

Figure 6 :
Figure 6: Varying Optimum Hypervolume. (left) Depiction of decreasing optimum basin of attraction hypervolume in 1D. (right) The Pareto-optimal dataset hyperparameters for usage with the ZoMBI algorithm over 144 analytical datasets with 48 independent trials each: 12 trials for each of the four acquisition functions, LCB, LCB Adaptive, EI, and EI Abrupt, for a total of 6912 independent trials. Each analytical dataset is a permutation of the Ackley function with a different optimum basin of attraction width and manifold dimensionality. Hypervolume percent makeup is synthetically decreased both by decreasing the basin of attraction width and by increasing the manifold dimensionality. Each scatter point represents the median final minimum function evaluation after 1000 experiments across the 48 independent trials initialized with a fixed set of i = 5 samples. The colorbar of the scatter points represents the dimensionality of the manifold tested, and the error bars represent the variance across the 48 trials. The possible function values for every dataset vary between [0, 25]; hence, for the Ackley function, as further described in the supplemental Section S-1, trials achieving minimum function values < 10 are considered to have found the optimum basin of attraction, while trials with function values ≥ 10 after 1000 experiments are considered to be trapped in local minima. Both the x- and y-axes are in log-scale.

Figure 7 :
Figure 7: Data Distributions of Real-world Needle-in-a-Haystack Datasets. (top) The histogram distributions of the full real-world datasets with callouts for optimum conditions: (a) Poisson's Ratio, with 146k materials in the dataset and ν_min = {−1.7, −1.2}; (b) Thermoelectric Figure of Merit, with 1k materials in the dataset computed by BoltzTraP [61] and ZT_max = {1.4, 1.9}; (c) Wildfire Detection, with 128k meteorological conditions collected over 33 months from January 2018 to September 2020 from CIMIS [62], with ψ < 0 conditions indicating those with a high likelihood of wildfire outbreaks. (bottom) The noisy, non-convex manifold topologies of each dataset generated by a random forest regression with 500 trees. Each manifold is a projected 3D slice of a higher-dimensional space with the z-axis and colorbar indicating the target property, where (a) X1 is density and X2 is formation energy, (b) X1 is formation energy and X2 is band gap, (c) X1 is evapotranspiration and X2 is precipitation.

Figure 8
Figure 8 demonstrates the optimization performance of ZoMBI on the Poisson's Ratio dataset compared to MiP-EGO, TuRBO, and HEBO. The ZoMBI algorithm is run with each of the four acquisition functions: LCB, LCB Adaptive, EI, and EI Abrupt. In under 100 evaluated experiments, LCB and LCB Adaptive discover the global minimum NiaH material, Li2NbF6 (ν ≈ −1.7). The variance of ν values for the final experiment across all ensemble runs is illustrated as a KDE plot for each method to highlight the sampling density and general rate of success. HEBO discovers the global minimum after ZoMBI with LCB and LCB Adaptive; however, the spread of runs for ZoMBI is narrower than that of HEBO, which indicates that for this problem, ZoMBI more consistently discovers the minimum, which is 3x lower than the minima discovered by MiP-EGO and TuRBO. Furthermore, the rate of convergence on Needle 1 is faster for ZoMBI than for HEBO.

Figure 7 (a):
Figure 7(a) illustrates the distribution of ν values within the full dataset. The ground-truth "needle" materials with the lowest ν values are Li2NbF6 with ν ≈ −1.7 and Na2CO3 with ν ≈ −1.2. ZoMBI with the LCB and LCB Adaptive acquisition functions and HEBO discover Li2NbF6, while ZoMBI with the EI Abrupt acquisition function discovers Na2CO3.

Figure 8 :
Figure 8: Discovery of Rare Negative Poisson's Ratio Materials. The optimization objective is to find the material with the minimum Poisson's ratio in 100 experiments from the dataset presented in Figure 7(a). The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO, respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters as described in Section 2.2. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.

Figure 9 :
Figure 9: Discovery of Rare Positive Thermoelectric Figure of Merit Materials. The optimization objective is to find the material with the maximum Thermoelectric Figure of Merit in 100 experiments from the dataset presented in Figure 7(b). The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO, respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters as described in Section 2.2. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.

Figure 9
Figure 9 demonstrates the optimization performance of ZoMBI on the Thermoelectric Figure of Merit dataset compared to MiP-EGO, TuRBO, and HEBO. In this experiment, although none of the tested methods are able to discover the maximum needle, LCB Adaptive discovers the second-highest needle-in-a-haystack material, Na4Al3Ge3IO12 (ZT ≈ 1.4), in under 100 experiments. Neither HEBO, TuRBO, nor MiP-EGO is capable of discovering any needle-like ZT optima, and MiP-EGO performs worse than random sampling in this experiment. The wide variance across runs for ZoMBI and HEBO, shown in the KDE plots, indicates that both methods behave relatively exploratively to discover maxima in this topology. Ultimately, this experiment demonstrates that ZoMBI can optimize material objective functions that have a complex combination of variables (Equation 4) with roughly 2x better performance than HEBO.

Figure 7 (b):
Figure 7(b) illustrates the distribution of ZT values within the full dataset. The ground-truth "needle" materials with the highest ZT values are Sr4Al6SO12 with ZT ≈ 1.9 and Na4Al3Ge3IO12 with ZT ≈ 1.4. ZoMBI with the LCB Adaptive acquisition function is the only method that discovers one of these needles, Na4Al3Ge3IO12.

Figure 10 :
Figure 10: Detection of Environmental Conditions with Wildfire Risk. The optimization objective is to find the meteorological conditions with the minimum wildfire detection index, ψ, in 100 experiments from the dataset presented in Figure 7(c). Conditions with ψ < 0 have the highest risk of sustaining wildfire. The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO, respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters as described in Section 2.2. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.
The d-dimensional Ackley function takes the standard form $f(\mathbf{X}) = -a\exp\big(-b\sqrt{\tfrac{1}{d}\sum_{i=1}^{d}X_i^2}\big) - \exp\big(\tfrac{1}{d}\sum_{i=1}^{d}\cos(cX_i)\big) + a + e$, where a determines the connectivity of local minima, b determines the width of the global minimum basin of attraction, c determines the periodicity of the local minima, and d is the number of dimensions. For this paper, the Ackley functions used have minima connectivity of a = 20 and high minima periodicity of c = π, with b and d varying for most applications, but generally b = 0.5 and d = 5. The response variable f(X) ∈ R : [0, 25], with the global minimum at f(X) = 0 and the basin of attraction existing between 0 ≤ f(X) < 10.
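Under the standard Ackley parameterization consistent with the description above, a minimal implementation with the paper's a = 20, c = π, and b = 0.5 defaults reads:

```python
import numpy as np

def ackley(X: np.ndarray, a: float = 20.0, b: float = 0.5, c: float = np.pi) -> float:
    """d-dimensional Ackley function; global minimum f = 0 at X = 0."""
    term1 = -a * np.exp(-b * np.sqrt(np.mean(X ** 2)))
    term2 = -np.exp(np.mean(np.cos(c * X)))
    return term1 + term2 + a + np.e

print(ackley(np.zeros(5)))       # ≈ 0.0: the needle-like global minimum
print(ackley(np.full(5, 0.5)))   # well above 0, away from the needle
```

At the origin, both exponentials evaluate to their maxima and exactly cancel the a + e offset, which is why the global minimum sits at f(X) = 0.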

Figure S-1 :
Figure S-1: SHAP Feature Importance Ranking. Each of the training variables for each real-world dataset has its importance ranked based on which variable has the highest impact on the output target variable. Variables with a higher spread of points have a higher impact on the target variable. Red scatter points are training variable values that increase the target variable output, while blue scatter points are training variable values that decrease the target variable output. The SHAP analysis was conducted by training a random forest regressor with an 80/20 training/testing split on the raw data points from each dataset.

Figure S-2 :
Figure S-2: Cumulative Wall-clock Compute Time and Function Evaluations. The median total cumulative compute time to sample 1,000 experiments of a 6D Ackley function is compared for four independent trials of standard BO (solid yellow line) and ZoMBI (solid blue line). Wall-clock compute time is measured on an NVIDIA Tesla Volta V100 GPU [59]. The black dotted line indicates the cumulative sum of a constant 1-second compute time per experiment; the compute time of ZoMBI lies below this line. The yellow and blue dashed lines correspond to the minimum evaluated point of the Ackley function for standard BO and ZoMBI, respectively, where zero is the global minimum. The cumulative compute times are plotted on a log-scale, whereas the function evaluations are plotted on a linear scale.

Figure S-3 :
Figure S-3: Synthetically Varying Basin of Attraction Width. The colored lines indicate the median trend lines of 12 independent runs for the final experimental evaluation of each algorithm after 100 sampled experiments for a given basin width. The spread of the data from the trend line is indicated by the shaded regions. The datasets with different basin widths are obtained by synthetically convolving the original 6D Poisson's ratio dataset topology with Gaussian noise of increasing magnitudes. The objective is to find the minimum function value, f_min = −2. In wide-basin datasets, TuRBO and MiP-EGO perform the best. However, for NiaH problems with narrow basins, the ZoMBI implementations perform better.

Figure S-4
Figure S-4(b) demonstrates that by extending the LHS sampling bounds during only the initialization of ZoMBI, performance degradation for edge-bounded optima is avoided entirely. Extending the initialization bounds allows the GP surrogate to extrapolate edge feature information, on which ZoMBI can more reliably zoom inward than on features at the edge of the bounds. Since it was known a priori that the global minimum of the tested datasets existed at the upper limit of the manifold hypervolume, the upper limit of the LHS initialization set was extended. For an arbitrary and normalized d-dimensional manifold with an optimum existing at the upper limit of the manifold, the LHS bounds are simply extended from [0, 1]^d to [0, 1 + γ]^d, where γ > 0. Although ZoMBI does not work out-of-the-box for every application, simple modifications can be made to a single step within the procedure to significantly improve optimization results for a breadth of dataset topologies without increasing the required number of sampled experiments.

Figure S-4: Varying Optimum Edge Distance and Initialization Conditions. (a) The performance of ZoMBI is shown to degrade if the global optimum needle of a manifold exists near a boundary with a constant initialization set. (b) However, by simply extending the initialization bounds for LHS sampling, this unfavorable degradation in performance is avoided. Each trace represents the median and variance of four independent trials of ZoMBI with the LCB Adaptive acquisition function over 100 evaluated experiments of a 6D Ackley function with hundreds of trap local minima. The colorbar of the traces represents the distance of the optimum from the center of the manifold; a distance of 15 places the optimum at the upper corner of the hypervolume. The final variance across all trials is illustrated as a KDE plot along the y-axis.

X: Set of data points {X_1, X_2, ..., X_n}, where X_j ∈ R^d; y: Set of target values {y_1, y_2, ..., y_n}, where y_j ∈ R
1  Initialize with i LHS data points {X} := {X_1, X_2, ..., X_i}, where X_j ∈ R^d, within bounds [B_d^l, B_d^u], and target values {y} := {y_1, y_2, ..., y_i}, where y_j ∈ R
4  Overwrite memory X ← {X} and y ← {y}
5  for f in range(1, φ) forward experiments do
6      Let n = i + f
7      Retrain surrogate model GP(X) using target values y

Alexander E. Siemenn*,†, Zekun Ren‡, Qianxiao Li¶, Tonio Buonassisi†
† Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
‡ Department of Electrical and Computer Engineering, Singapore-MIT Alliance for Research and Technology, Singapore 138602, Singapore
¶ Department of Mathematics, National University of Singapore, Singapore 138602, Singapore
* Corresponding author: asiemenn@mit.edu

S-1 Ackley Function Description

The Ackley function is an analytical function used in this paper to simulate a Needle-in-a-Haystack problem and can take any dimensionality from 1 to d dimensions. The global minimum, or "the needle," is challenging to discover in few experiments because the fraction of the space occupied by the global basin of attraction decreases exponentially with dimensionality. The basin of attraction is the region containing the global minimum that, once discovered, can simply be descended to reach the global minimum. As the b value of the Ackley function increases, the basin of attraction narrows. In addition to this needle-like global minimum, the Ackley function also has many local minima, generating a non-convex response surface. This
response surface of the Ackley function is quantified through the following equation [57]:

$$f(\mathbf{x}) = -a \exp\left(-b\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2}\right) - \exp\left(\frac{1}{d}\sum_{i=1}^{d}\cos(c\,x_i)\right) + a + \exp(1)$$

$$C = \begin{bmatrix}
C_{11} & C_{12} & C_{13} & C_{14} & C_{15} & C_{16} \\
C_{12} & C_{22} & C_{23} & C_{24} & C_{25} & C_{26} \\
C_{13} & C_{23} & C_{33} & C_{34} & C_{35} & C_{36} \\
C_{14} & C_{24} & C_{34} & C_{44} & C_{45} & C_{46} \\
C_{15} & C_{25} & C_{35} & C_{45} & C_{55} & C_{56} \\
C_{16} & C_{26} & C_{36} & C_{46} & C_{56} & C_{66}
\end{bmatrix}$$
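The Ackley surface described above can be implemented in a few lines; the conventional parameter values a = 20, b = 0.2, c = 2π are assumed here as defaults, with larger b narrowing the basin of attraction:

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """d-dimensional Ackley function; global minimum f(0) = 0."""
    x = np.asarray(x, dtype=float)
    d = x.size
    sphere = -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))   # needle term
    cosine = -np.exp(np.sum(np.cos(c * x)) / d)            # trap local minima
    return sphere + cosine + a + np.e

print(abs(ackley(np.zeros(6))) < 1e-9)   # True: the needle sits at the origin
print(ackley(np.ones(6)) > 0)            # True: away from the needle, f > 0
print(ackley(np.ones(6), b=1.0) > ackley(np.ones(6)))  # True: larger b narrows the basin
```

The last comparison shows why increasing b makes the problem harder: the same off-needle point evaluates further from the global minimum, so the descendable basin occupies a smaller fraction of the hypervolume.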