Using scalable computer vision to automate high-throughput semiconductor characterization

High-throughput materials synthesis methods, crucial for discovering novel functional materials, face a bottleneck in property characterization. High-throughput synthesis tools produce 10⁴ samples per hour using ink-based deposition, while most characterization methods are either slow (conventional rates of 10¹ samples per hour) or rigid (e.g., designed for standard thin films). To address this bottleneck, we propose automated characterization (autocharacterization) tools that leverage adaptive computer vision to achieve an 85x faster throughput than non-automated workflows. Our tools include a generalizable composition mapping tool and two scalable autocharacterization algorithms that (1) autonomously compute the band gaps of 200 compositions in 6 minutes and (2) autonomously compute the environmental stability of 200 compositions in 20 minutes, achieving 98.5% and 96.9% accuracy, respectively, when benchmarked against domain expert manual evaluation. These tools, demonstrated on the formamidinium (FA) and methylammonium (MA) mixed-cation perovskite system FA₁₋ₓMAₓPbI₃, 0 ≤ x ≤ 1, significantly accelerate the characterization process, bringing it closer to the rate of high-throughput synthesis.


Introduction
To discover commercially relevant semiconductor materials, e.g., for solar applications [1–3], vast compositional search spaces must be rapidly synthesized and characterized, e.g., for band gap [4–6] and stability [7,8]. Recently, several new methods have been developed for high-throughput (HT) combinatorial synthesis across a wide range of material domains, including perovskites, nanomaterials, porous media, aerosols, and lithium-ion batteries [9–18]. Although these HT manufacturing methods have made great progress in the rapid, automated screening of large material search spaces, much of the materials characterization process is still hindered by its manual nature [14,19] or by rigid microplate-based form factors [9,10,20–22]. The result is a significant bottleneck in which synthesis can achieve throughputs over 800x faster than characterization (e.g., Supplementary Figure S-1). Rapid and accurate characterization methods are therefore essential for HT materials discovery and optimization, because exhaustively testing every material within a functional material's search space using these conventional tools is intractable [14].
The metal halide perovskite search space is both high-dimensional and vast; as a result, it is intractable to map using conventional synthesis or characterization methods. The common metal halide perovskite subspace explored in the literature consists of an eight-component system, $(\mathrm{FA}_x\mathrm{MA}_y\mathrm{Cs}_{1-x-y})(\mathrm{Pb}_z\mathrm{Sn}_{1-z})(\mathrm{Br}_a\mathrm{Cl}_b\mathrm{I}_{1-a-b})_3$ [7,9,14,23–25]. Exhaustively searching this subspace at 1% compositional steps would require synthesizing and characterizing 7 × 10¹² unique material samples (Supplementary Figure S-14). Thus, given the large discrepancy between search space size and characterization throughput, only small regions of this search space can be explored experimentally with current methods. For example, Escobedo et al. [26] develop automated programs to compute the band gap of semiconductor materials; however, their throughput is insufficient for HT combinatorial synthesis because they compute the band gap of only one sample at a time from pre-collected optical data. Surmiak et al. [21] and Reinhardt et al. [22] expand on these methods by employing HT optical characterization of perovskites for up to 24 unique samples per batch; however, their measurements occur in series and are rigidly hard-coded to measure only specified locations, and thus lack scalability and generalizability. Similarly, Langner et al. [10] and Wang et al. [9] develop significantly higher-throughput drop casting tools capable of synthesizing and characterizing up to 6048 organic photovoltaic films per day, while Du et al. [20] develop an automated robotic synthesis and characterization platform for full organic photovoltaic devices. However, these prior methods all rely on rigid characterization that hard-codes measurement locations, which locks the characterization method to a specific platform architecture. Unlike the aforementioned work, Wu et al. [27] develop an HT and scalable tool, automated end-to-end, to characterize the optical properties of organic molecules. However, their tool is designed to characterize only organic molecules suspended in solution [27], leaving the rapid, automatic characterization of semiconductors deposited onto a substrate with flexible form factors as an open research gap.
Fig. 1: a-c, Overview of the synthesis and characterization of perovskite semiconductors. a, High-throughput combinatorial synthesis of FA₁₋ₓMAₓPbI₃ perovskites attains throughputs of 10⁴ samples/hr. b, Manual characterization of the high-throughput-manufactured materials using UV-Vis spectroscopy, followed by manual determination of band gap and degradation, bottlenecks the pipeline to a throughput of 10¹ samples/hr. c, Autocharacterization, developed in this paper, of the band gaps and degradation of the high-throughput-manufactured materials attains throughputs of 10³ samples/hr using scalable and parallelizable computer vision measurement. Band gap is determined by automatically segmenting and fitting the material reflectance spectra, while degradation is detected by the yellowing of the material, in the case depicted here due to a phase change from α-FAPbI₃ to δ-FAPbI₃. The widths of the gray backgrounds visualize the process throughputs.
Debottlenecking the materials screening pipeline is a matter not only of accelerating the characterization time per sample [28,29] but also of scaling the characterization procedure to many samples in parallel [30,31]. Computer vision methods can scale a measurement to arbitrarily many samples, each with a different form factor geometry, without significantly slowing characterization [32–34]. Materials science applications of computer vision are gaining traction in the literature, particularly for the rapid morphological analysis of large microscopy datasets [34]. In turn, several analytical computer vision tools have been developed to extract and characterize this morphological information, often with a focus on identifying microstructures. For example, Park et al. [35] create a semi-automated image segmentation and analysis algorithm to classify the morphology of nanoparticles from image data. Likewise, Chowdhury et al. [36] use computer vision and machine learning to detect dendritic microstructures in a database of solder alloy micrographs. There is also extensive work on materials recognition in non-micrographic images using computer vision [28,37–39]. To support the growing need to analyze micrographs and other experimental images, advances in object segmentation, denoising, and scalability, such as those made by Li et al. [33], Wang et al. [40], Neshatavar et al. [41], Tung et al. [30], and Jain et al. [31], enable further use cases of computer vision in scientific research. The throughput mismatch between characterization and HT synthesis motivates integrating computer vision into the semiconductor characterization pipeline to parallelize measurements and thereby match or exceed the rate of synthesis, while achieving accuracies comparable to those of domain expert evaluation.
In this paper, we address the unresolved challenge of characterizing deposited materials quickly, automatically, and in parallel by developing a suite of computer vision-based automated characterization (autocharacterization) tools that quantify three key properties within minutes: composition [42,43], optical band gap [4–6], and environmental degradation [7–9]. Our contributions are: (1) a scalable computer vision tool that detects and segments arbitrarily many spatially non-uniform material samples, (2) a tool to map the elemental composition of HT-manufactured material arrays, (3) a scalable autocharacterization algorithm that computes direct band gaps from hyperspectral reflectance data, and (4) a scalable autocharacterization algorithm that quantifies the environmental stability of perovskite samples from optical degradation data. We demonstrate the developed autocharacterization methods on 201 unique HT-manufactured FA₁₋ₓMAₓPbI₃ perovskite semiconductor samples, generating ultra-high compositional resolution trends of band gap and stability, and benchmark them against X-ray diffraction [44,45], X-ray photoelectron spectroscopy [46,47], and domain expert evaluation. By integrating computer vision into the materials characterization workflow, data across many samples can be captured and analyzed in parallel as a fast and scalable process [30–34].
Computer vision can be applied to both standard RGB and hyperspectral image data. Figure 2 illustrates the segmentation of a hyperspectral datacube taken for one batch of HT-manufactured perovskite samples [28]. In this study, we synthesize three batches of samples, amounting to a total of N = 201 unique semiconductor samples along the FA₁₋ₓMAₓPbI₃ compositional series with 0 ≤ x ≤ 1. Using Algorithm 1, we pair each discrete semiconductor sample, with segmented pixel coordinates (X, Y)_n, n = 1, …, N, with its corresponding reflectance spectrum, R(λ), via parallel image segmentation and mapping, as shown in Figure 2b.

Computer Vision Parallelization
The parallel segmentation process uses a sequence of edge-detection filters [48] to first identify each island of material and then uniquely index each island based on its position within a graph connectivity network [49,50]. The pixel coordinates of each segmented sample are then spatially mapped to their corresponding reflectance spectra, generating the segmented datacube Φ = ((X, Y)_n, R(λ)). Parallel segmentation and mapping of these reflectance spectra turn what was a point-by-point measurement process [14,19] into a rapid and scalable process that can match the throughput of HT synthesis (Supplementary Figure S-1). Furthermore, this segmentation process is shown to scale to more than 80 unique samples in parallel (Supplementary Figure S-4). Because Algorithm 1 does not depend on the number of samples, the method has further potential to scale. This segmented reflectance data serves as the starting point for automatically mapping composition and computing the optical band gap and degradation of all N = 201 semiconductor samples.

Composition Mapping
Fig. 3: a, High-throughput combinatorial synthesis of one batch of perovskite samples, illustrated with its corresponding computer vision-segmented composition map. The print head rasters in a serpentine pattern (black connecting lines) to print a gradient of FA₁₋ₓMAₓPbI₃ deposits onto a glass substrate, where purple labels indicate FA-rich deposits and yellow labels indicate MA-rich deposits. Integrating the pump speeds over time determines the proportion of MA, x, in the composition. b, XRD peak traces at the (012) crystallographic plane measured at uniformly spaced compositions in the batch print. The peak shifts gradually toward a higher 2θ angle as the proportion of MA in the composition increases. c, XPS traces of the C=N bond peak (red area under the curve) and C-N bond peak (gray area under the curve) measured at uniformly spaced compositions in the batch print. The C=N peak intensity decreases as the proportion of MA increases. Purple-labeled traces are FA-rich, while yellow-labeled traces are MA-rich.
Semiconductor properties, such as band gap and stability, are largely governed by the chemical composition of the material [7,8,42]. Thus, accurately determining the composition and mapping it to the correct sample is essential for scalable HT setups that deposit variable sample geometries. In this work, we use HT deposition of solution-processed precursors to synthesize perovskite semiconductor samples. The variable rotational velocities of two pumps, ω_FA and ω_MA, determine the output sample composition by combining the liquid-based FAPbI₃ and MAPbI₃ precursors. Figure 3a illustrates this printing process as a function of both space and time. The FA-rich materials are printed first, and the proportion of MA is then increased gradually as the print head rasters in a serpentine pattern. Thus, to determine the composition of each material deposit, ω_FA and ω_MA are spatially and temporally mapped onto the computer vision-segmented samples, (X, Y)_n, and then integrated over time to compute each sample's FA₁₋ₓMAₓPbI₃ composition:

$$x = \frac{\int_{t_a}^{t_b} \omega_{MA}(t)\,dt}{\int_{t_a}^{t_b} \left[\omega_{FA}(t) + \omega_{MA}(t)\right] dt} \tag{1}$$

where x is the proportion of MA, t_a and t_b are the starting and ending timesteps for the deposition of a single sample, and ω_FA(t) and ω_MA(t) are the pump velocities at a given timestep for the FAPbI₃ and MAPbI₃ precursors, respectively. Figure 3b-c illustrates the composition validation results using X-ray diffraction (XRD) and X-ray photoelectron spectroscopy (XPS). XRD is used to validate the crystalline structure [44,45], while XPS is used to validate the elemental composition of the manufactured perovskite deposits [46,47]. The composition mapping is validated by assessing the gradual shifts in both the XRD and XPS peaks. For crystal structure validation, the crystal lattice of MAPbI₃ is smaller than that of FAPbI₃ [43]; thus, we expect the XRD peaks of the MA-rich deposits to shift toward higher angles [51]. In Figure 3b, the (012) crystallographic plane at 2θ ≈ 31.5° [52,53] is shown, for uniformly spaced samples across the batch print, to increase gradually in angle from the FA-rich to the MA-rich compositions of FA₁₋ₓMAₓPbI₃, amounting to a total shift of approximately 0.16° (quantitative shifts shown in Supplementary Figure S-13a). For elemental validation, the A-site MA and FA cations are distinguished by the presence of a carbon-nitrogen double bond (C=N): FA contains a C=N bond, while MA contains only a C-N single bond [51]. In the high-resolution XPS scans shown in Figure 3c, the C=N bond peak appears at approximately 400 eV [54]. This C=N bond peak is shown, for uniformly spaced samples across the batch print, to decrease gradually in intensity from the FA-rich to the MA-rich compositions of FA₁₋ₓMAₓPbI₃ (quantitative intensities shown in Supplementary Figure S-13b). These results validate both the structural and elemental composition gradients synthesized and mapped using computer vision. With the composition map established, we can now automatically compute the band gap and detect degradation across all 201 synthesized samples.
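Equation 1 can be evaluated numerically from a logged pump-speed time series. The sketch below is a minimal illustration, not the authors' implementation: it assumes the pump speeds are sampled as arrays `t`, `omega_fa`, and `omega_ma` (hypothetical names), that pump speed is proportional to volumetric flow, and that both precursors share the same 0.6 M concentration used in this work, so that volume fraction equals molar fraction. The integrals are approximated with the trapezoidal rule.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over abscissae x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def ma_fraction(t, omega_fa, omega_ma, t_a, t_b):
    """Proportion of MA, x, for one deposit printed during [t_a, t_b] (Equation 1).

    Assumes pump speed is proportional to volumetric flow and that both
    precursors have the same molarity, so volume fraction equals molar fraction.
    """
    t = np.asarray(t, dtype=float)
    omega_fa = np.asarray(omega_fa, dtype=float)
    omega_ma = np.asarray(omega_ma, dtype=float)
    inside = (t >= t_a) & (t <= t_b)
    if inside.sum() < 2:
        raise ValueError("need at least two pump-speed samples within [t_a, t_b]")
    vol_fa = _trapezoid(omega_fa[inside], t[inside])
    vol_ma = _trapezoid(omega_ma[inside], t[inside])
    return vol_ma / (vol_fa + vol_ma)


# Usage with a synthetic linear ramp over a 16.5 s print (illustrative data, not measured).
t_demo = np.linspace(0.0, 16.5, 166)
x_demo = ma_fraction(t_demo, 1.0 - t_demo / 16.5, t_demo / 16.5, 0.0, 16.5)
print(f"x = {x_demo:.3f}")
```

Over the whole print the two symmetric ramps deliver equal volumes, so the demo yields x = 0.5; a real deposit uses its own short [t_a, t_b] window.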
Fig. 4: a-c, Automatic band gap computation shown for two unique, computer vision-segmented perovskite deposits. a, The vision-segmented hypercube, Φ. b, The reflectance intensities, R(λ), acquired for each sample from the hypercube in a. c, The Tauc curves computed from the median reflectance spectrum of each deposit, recursively segmented into line segments, and then iteratively fit with linear regression lines. The best-fit regression line, which minimizes the RMSE between the detected Tauc peaks, is shown as the thick red line; its x-intercept determines the band gap, E_g.

Automated Band Gap Extraction
Band gap is essential in defining the light-harvesting potential of a semiconductor material [4,43].However, conventionally computing the band gap of a material is a laborious process, requiring a user to manually curve-fit the Tauc-transformed UV-Vis spectra [19,55].In this paper, we automate and accelerate the computation of band gap across 201 unique semiconductor samples by leveraging hyperspectral imaging and computer vision segmentation, such that the characterization process is parallelized across batches of HT-manufactured semiconductor samples.
Figure 4 illustrates the autocharacterization of band gap: (a) the reflectance spectrum is extracted from each computer vision-segmented sample within the hyperspectral datacube, (b) each reflectance spectrum is transformed into its Tauc curve, and (c) the Tauc curves are recursively segmented into linear segments (R² ≥ 0.990) to find the regression line of best fit between peaks, which determines the band gap. The Tauc curves are obtained from the hyperspectral reflectance datacube using the following transformation [19]:

$$\left(F(R(\lambda))\,h\nu\right)^{1/\gamma} = B\left(h\nu - E_g\right) \tag{2}$$

where F(R(λ)) is the Kubelka-Munk function [56] applied to the reflectance spectrum, R(λ), at each wavelength, λ, h is the Planck constant, ν is the photon frequency, B is a constant, E_g is the band gap, and γ = 1/2 for a direct band gap or γ = 2 for an indirect band gap. To demonstrate the performance of the band gap autocharacterization, the algorithm's output band gaps are compared with those calculated by a domain expert using the manual fitting process described in Makula et al. [19]. The autocharacterization output was withheld from the domain expert. Figure 5 illustrates the performance of the autocharacterization-calculated band gaps relative to the expert-calculated band gaps for N = 201 FA₁₋ₓMAₓPbI₃ compositions across three independent batches of samples. The autocharacterization output achieves a strong linear fit of R² = 0.975 with the expert-calculated results, although a systematic underprediction by the autocharacterization algorithm is noted. Relative to the expert-computed band gaps, the autocharacterization method achieves 98.5% accuracy within a 0.02 eV range on the FA₁₋ₓMAₓPbI₃ system (Supplementary Figure S-6). In addition to agreeing closely with the domain expert, autocharacterization also significantly speeds up band gap determination. The domain expert takes approximately 510 minutes to compute the band gaps of 200 unique samples, while autocharacterization takes 6 minutes for 200 samples; the parallel-processing power of computer vision thus gives the developed tool an 85x faster throughput.
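Equation 2 is straightforward to evaluate per wavelength. The sketch below assumes reflectance is given as a fraction between 0 and 1 and uses hν = hc/λ with hc ≈ 1239.84 eV·nm; the function names are hypothetical, and the clipping floor of 1e-6 is an arbitrary guard against division by zero, not a value from this work. The constant B only rescales the ordinate, so it is omitted.

```python
import numpy as np

HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm


def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R) for reflectance given as a fraction."""
    r = np.clip(np.asarray(reflectance, dtype=float), 1e-6, 1.0)  # guard against R = 0
    return (1.0 - r) ** 2 / (2.0 * r)


def tauc_curve(wavelength_nm, reflectance, gamma=0.5):
    """Return photon energy hν (eV, ascending) and the Tauc ordinate (F(R)·hν)^(1/γ).

    gamma = 1/2 for a direct band gap, 2 for an indirect band gap (Equation 2).
    """
    energy_ev = HC_EV_NM / np.asarray(wavelength_nm, dtype=float)
    tauc = (kubelka_munk(reflectance) * energy_ev) ** (1.0 / gamma)
    order = np.argsort(energy_ev)  # ascending wavelength means descending energy, so re-sort
    return energy_ev[order], tauc[order]


# Usage on three illustrative wavelengths (synthetic reflectances).
E_demo, y_demo = tauc_curve(np.array([400.0, 800.0, 1239.84198]), np.array([0.1, 0.3, 0.5]))
print(E_demo, y_demo)
```

Sorting by energy is a convenience for the fitting step, which expects the abscissa to increase.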
Using the fast and accurate band gap autocharacterization tool developed in this paper, we tractably generate an ultra-high resolution band gap trend for the FA₁₋ₓMAₓPbI₃, 0 ≤ x ≤ 1, series, shown in Figure 5, in which 120 of the 201 compositions are unique, a resolution that, to our knowledge, has not yet been reported in the literature. Prior literature reports band gaps across 0 ≤ x ≤ 1 at compositional resolutions of 9 compositions [57], 7 compositions [58,59], 6 compositions [60], and 5 compositions [7] using conventional characterization methods. Thus, with autocharacterization, we achieve over a 13x increase in the compositional resolution of the FA₁₋ₓMAₓPbI₃ band gap, to our knowledge.

Automated Degradation Detection
Sufficient stability of perovskite semiconductors is required for the material to be used in solar cell applications [1–3,61]. As a lead halide perovskite degrades, it changes color from black to yellow as a result of a phase change and/or decomposition of the structure [7,62,63]. We leverage this RGB-detectable degradation mechanism [8] and use parallelized computer vision segmentation to automate the detection of degradation within perovskites, as shown in Figure 6c. Three independent degradation experiments are conducted across the N = 201 samples by placing each batch of samples within a degradation chamber, shown in Figure 6a, for 2 hours at an illumination of 0.5 suns, a temperature of 34.5 °C ± 0.5 °C, and a relative humidity of 40% ± 1% (Supplementary Figure S-7). We compute the degradation intensity, I_c, of each HT-manufactured perovskite composition by integrating the change in color, R, of each sample over time, t [7]:

$$I_c = \int_0^T \sum_{R \in \{r, g, b\}} \left| R(t) - R(0) \right| \, dt \tag{3}$$

where T is the duration of the degradation experiment and the three reflectance color channels are red, r, green, g, and blue, b. The performance of the degradation autocharacterization is demonstrated by comparing the output I_c to the ground truth degradation, obtained from the deviation between the pre- and post-degradation band gaps [7,64] (Supplementary Figure S-10a). Figure 7a illustrates the output of the autocharacterization, where high computed I_c values correspond strongly to the occurrence of ground truth degradation in the samples (yellow scatter points). The ground truth degradation is determined by a human domain expert, as further described in Supplementary Figure S-10a. The classification performance of the autocharacterization algorithm achieves a precision-recall AUC (area under the curve) of 0.853 (Supplementary Figure S-10c) and a maximal accuracy of 96.9% relative to the ground truth (Supplementary Figure S-10d). Figure 7b shows the yellowing of the FA-rich samples resulting from the phase change from the favorable cubic α-FAPbI₃ phase to the non-perovskite hexagonal δ-FAPbI₃ phase [62] (Supplementary Figure S-11). Furthermore, a full degradation detection computation using autocharacterization takes only 20 minutes per 200 samples, given 48000 total degradation images over the 2-hour degradation experiment. This is a significant speedup over standard microscopy or XRD methods of determining degradation, which can take hours or days to identify the degradation of an equivalent number of samples.
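Equation 3 and the maximal-accuracy figure can be sketched as follows. Assumptions, not taken from this work: each sample's color-calibrated mean color is already available as a (T, 3) array, the integral is approximated with the trapezoidal rule (so I_c carries color units times whatever time unit is logged), and "maximal accuracy" is read as the best accuracy over all decision thresholds on I_c against expert labels. Function names are hypothetical.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over abscissae x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def degradation_intensity(t, rgb):
    """I_c from Equation 3.

    t:   (T,) measurement times.
    rgb: (T, 3) color-calibrated mean r, g, b of one sample at each time.
    """
    t = np.asarray(t, dtype=float)
    rgb = np.asarray(rgb, dtype=float)
    change = np.abs(rgb - rgb[0]).sum(axis=1)  # sum over channels of |R(t) - R(0)|
    return _trapezoid(change, t)


def best_threshold_accuracy(i_c, degraded):
    """Sweep a threshold on I_c; return (threshold, accuracy) with the highest accuracy.

    A sample is predicted as degraded when I_c >= threshold; +inf predicts none degraded.
    """
    i_c = np.asarray(i_c, dtype=float)
    degraded = np.asarray(degraded, dtype=bool)
    best_thr, best_acc = np.inf, -1.0
    for thr in np.append(np.unique(i_c), np.inf):
        acc = float(np.mean((i_c >= thr) == degraded))
        if acc > best_acc:
            best_thr, best_acc = float(thr), acc
    return best_thr, best_acc
```

A stable sample whose color never changes scores I_c = 0, and any monotonic drift toward yellow accumulates linearly with time, which is why the threshold sweep can separate degraded from stable compositions.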
Using the fast and accurate stability autocharacterization tool developed in this paper, we tractably generate an ultra-high resolution stability trend for the FA₁₋ₓMAₓPbI₃ series, shown in Figure 7a; as with band gap, this trend has not yet been reported at such a high resolution in the literature. Prior literature reports stability across 0 ≤ x ≤ 1 at compositional resolutions of 11 compositions [65], 9 compositions [57], and 7 compositions [66] using conventional characterization methods. Moreover, Charles et al. [65] report the stability at x ≈ 0.1 compositional increments from 0 ≤ x ≤ 1 using 6 timesteps, amounting to a total of 66 temporal data points. By comparison, this study reports the stability at x ≈ 0.008 unique compositional increments from 0 ≤ x ≤ 1 using 240 timesteps, amounting to 28800 unique temporal data points (48000 temporal data points in total). Thus, with autocharacterization, we achieve over a 10x increase in compositional resolution and a 40x increase in temporal resolution, for a total 436x increase in the number of unique data points reported for the FA₁₋ₓMAₓPbI₃ stability series, to our knowledge. Furthermore, the high-resolution stability trend in Figure 7a shows the same high-degradation region reported in the literature for the α-FAPbI₃ → δ-FAPbI₃ degradation pathway at 0.0 ≤ x ≤ 0.15, with the optimal low-degradation region occurring at x ≈ 0.40 [65,67]. Generating such ultra-high resolution trends may provide a better understanding of complex semiconductor composition-property relationships and enable the design of higher-performance materials in the future.
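The resolution gains quoted above follow from simple arithmetic on the stated counts, reproduced below. The sampling interval is derived here (2 hours divided by 240 timesteps) under the assumption that images were taken at uniform intervals, which the text does not state explicitly; `resolution_gains` is a hypothetical helper.

```python
def resolution_gains(unique_compositions=120, timesteps=240, ref_compositions=11,
                     ref_timesteps=6, duration_s=2 * 3600):
    """Compare this study's stability resolution with the 11 x 6 grid of Charles et al. [65]."""
    ours = unique_compositions * timesteps
    reference = ref_compositions * ref_timesteps
    return {
        "composition_step": 1 / unique_compositions,          # about 0.008 in x
        "compositional_gain": unique_compositions / ref_compositions,
        "temporal_gain": timesteps / ref_timesteps,
        "unique_data_points": ours,
        "data_point_gain": ours / reference,
        "sampling_interval_s": duration_s / timesteps,         # assumes uniform sampling
    }


gains = resolution_gains()
for name, value in gains.items():
    print(f"{name}: {value:.4g}")
```

The compositional gain evaluates to about 10.9 (hence "over 10x"), the temporal gain to 40, and the combined gain in unique data points to about 436.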

Conclusion
Accelerating the characterization of key material properties relevant to semiconductor engineering, such as band gap and stability, is a necessary step toward the high-throughput discovery and optimization of perovskite materials. Conventional characterization methods bottleneck the materials screening pipeline when high-throughput synthesis is used, limiting the efficiency of high-throughput experimentation. For example, computing the band gaps of 200 unique halide perovskite samples takes a domain expert over 8 hours. In this work, we demonstrate the fast and accurate characterization of band gap and detection of degradation within the FA₁₋ₓMAₓPbI₃, 0 ≤ x ≤ 1, perovskite system using parallelized, scalable computer vision segmentation. From the segmented data, band gaps and degrees of degradation are automatically computed using the developed autocharacterization algorithms, and ultra-high compositional resolution trends are generated. The band gap autocharacterization tool determines the band gaps of 200 unique perovskite samples in 6 minutes at 98.5% accuracy within a 0.02 eV range. The degradation autocharacterization tool determines the degrees of degradation of 200 unique perovskite samples within 20 minutes at 96.9% accuracy. Overall, band gap autocharacterization achieves an 85x faster throughput than conventional domain expert evaluation, contributing to the debottlenecking of high-throughput and autonomous materials discovery and optimization workflows. Through wider application of autocharacterization methods, larger material search spaces can be scanned, enabling the tractable design of higher-performance functional materials.

Methods

Materials
3" × 2" × 1 mm glass slides (C&A Scientific) are cleaned using deionized water (DI, < 1.0 µS/cm, VWR), Hellmanex III (VWR), and isopropyl alcohol (IPA, ≥ 99.5%, VWR) for use as substrates. Lead iodide powder (PbI₂, 99.999% trace metals basis, Sigma-Aldrich), formamidinium iodide powder (FAI, > 99.9%, Greatcell Solar Materials), methylammonium iodide (MAI, > 99.9%, Greatcell Solar Materials), dimethylformamide (DMF, ≥ 99.8%, Sigma-Aldrich), and dimethylsulfoxide (DMSO, ≥ 99.9%, Sigma-Aldrich) are used to prepare the perovskites.

Computer Vision Segmentation of Hyperspectral Datacubes
A hyperspectral datacube of size X × Y × R(λ) = 900 px × 800 px × 300, where 300 is the number of wavelengths, λ, is captured as a raw image, Ω = (X, Y, R(λ)), by the hyperspectral camera (Resonon, Pika L). This datacube is passed through several filters to find the edges and segment each material deposit sample, (X, Y)_n, n = 1, …, N, and then to index the features such that each pixel is mapped to its reflectance spectrum, R(λ). Once segmented, each deposited material contains approximately 1000 px of spatially resolved spectral data. Images or features of different sizes may require tuning the kernel sizes, κ, of the filters.
Algorithm 1 describes this segmentation process, (X, Y, R(λ)) → ((X, Y)_n, R(λ)). First, the input image is cropped and converted to greyscale; it is then passed through several layers of thresholding and smoothing. By thresholding, eroding, blurring, and then thresholding again, we capture the edges of each feature while removing edge effects. The background is indexed as zeros, so all features separated by zeros are assigned a unique index using island-finding graph network methods from the OpenCV Python library [48], LabelFeatures(•) and Watershed(•) [49,50]. Once segmented, the features are smoothed, and any improperly segmented aberrations are pruned using the user-selected variables Θ_min and Θ_max: features of size s < Θ_min or s > Θ_max are removed. Finally, a boolean mask is created for all pixels encoded with non-zero values to output Φ, in which each uniquely indexed material deposit is directly mappable to the R(λ) measured for that deposit.
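The threshold, erode, blur, re-threshold, label, and prune sequence above can be sketched without OpenCV. This is an illustrative stand-in, not the authors' implementation: it replaces OpenCV's labeling and watershed steps with a plain NumPy breadth-first flood fill (which, unlike a watershed, cannot split deposits that touch), uses a box blur, and assumes deposits appear darker than the background in the band-averaged greyscale image. `segment_datacube`, `threshold`, and the default Θ bounds are hypothetical.

```python
from collections import deque

import numpy as np


def _neighbourhoods(img, k):
    """Stack the k x k shifted copies of a 2D array (edge-padded); shape (k*k, H, W)."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)])


def label_islands(mask):
    """4-connected component labelling by breadth-first flood fill (background = 0)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        count += 1
        labels[y, x] = count
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def segment_datacube(cube, threshold, kappa=3, theta_min=200, theta_max=5000):
    """Segment a (rows, cols, bands) reflectance cube into deposits.

    Returns the label image and {label: spatial median spectrum} for deposits
    whose pixel count s satisfies theta_min <= s <= theta_max.
    """
    cube = np.asarray(cube, dtype=float)
    grey = cube.mean(axis=2)                                 # collapse bands to greyscale
    mask = grey < threshold                                  # assumption: deposits darker than background
    mask = _neighbourhoods(mask, kappa).all(axis=0)          # erosion removes specks and thin bridges
    mask = _neighbourhoods(mask.astype(float), kappa).mean(axis=0) > 0.5  # blur, then re-threshold
    labels, count = label_islands(mask)
    spectra = {}
    for lab in range(1, count + 1):
        pixels = labels == lab
        if theta_min <= pixels.sum() <= theta_max:           # prune aberrations by size
            spectra[lab] = np.median(cube[pixels], axis=0)
    return labels, spectra


# Usage on a tiny synthetic cube: two 5x5 deposits and a one-pixel speck on a bright background.
cube_demo = np.ones((20, 20, 5))
cube_demo[2:7, 2:7] = [0.1, 0.2, 0.3, 0.2, 0.1]
cube_demo[10:15, 10:15] = 0.05
cube_demo[17, 3] = 0.05
labels_demo, spectra_demo = segment_datacube(cube_demo, threshold=0.5, theta_min=3, theta_max=100)
print(len(spectra_demo), "deposits found")
```

The erosion discards the single-pixel speck, and each remaining deposit is reduced to its median spectrum, which is robust to the edge pixels that blurring partially removes.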

Perovskite Elemental Composition Mapping onto Computer Vision Segmented Data
The composition of each segmented deposit, (X, Y)_n, is determined by mapping the encoded pump speeds of the high-throughput combinatorial printer to the time interval, Δt = [t_a, t_b], during which each deposit is printed. This is done by mapping the segmented image Φ to the print head raster path, acquired from the G-code controlling the printer motion. Each deposited material thus has its pixel coordinates, (X, Y)_n, mapped to a timestamp along the raster path. This positional timestamp is then mapped to the corresponding pump speed timestamp, t((X, Y)_n) → t(ω_FA, ω_MA). Since the variable pump speeds ω_FA and ω_MA are both monotonic along the time series, they uniquely determine the proportions of FA and MA within the FA₁₋ₓMAₓPbI₃ compositional structure, obtained by integrating the pump speeds over Δt for each material deposit via Equation 1.
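The pixel-to-timestamp mapping can be sketched as a nearest-point lookup on a time-stamped raster path. Assumptions, none of which come from this work: the print head moves at a constant 38 mm/s between G-code waypoints (acceleration is ignored), the camera frame is axis-aligned with the printer so that a scale and an offset convert pixels to millimetres, and each deposit's window [t_a, t_b] is a fixed half-width around the time the head passes its centroid. All names are hypothetical; the resulting t_a and t_b are the integration limits of Equation 1.

```python
import numpy as np


def timed_raster_path(waypoints_mm, speed_mm_s, step_mm=0.1):
    """Densify a polyline of G-code waypoints and timestamp each point at constant speed."""
    pts = np.asarray(waypoints_mm, dtype=float)
    seg = np.diff(pts, axis=0)
    arc = np.concatenate([[0.0], np.cumsum(np.hypot(seg[:, 0], seg[:, 1]))])  # cumulative distance
    s = np.arange(0.0, arc[-1] + 1e-9, step_mm)
    xy = np.column_stack([np.interp(s, arc, pts[:, 0]), np.interp(s, arc, pts[:, 1])])
    return xy, s / speed_mm_s


def deposit_time_window(centroid_px, path_xy_mm, path_t_s, mm_per_px, origin_mm, half_width_s):
    """Map a deposit centroid (pixel coordinates) to [t_a, t_b] around its nearest path point."""
    x = origin_mm[0] + centroid_px[0] * mm_per_px
    y = origin_mm[1] + centroid_px[1] * mm_per_px
    nearest = np.argmin((path_xy_mm[:, 0] - x) ** 2 + (path_xy_mm[:, 1] - y) ** 2)
    t_mid = path_t_s[nearest]
    return t_mid - half_width_s, t_mid + half_width_s


# Usage: one serpentine pass out along y = 0 mm and back along y = 5 mm.
path_xy, path_t = timed_raster_path([(0, 0), (38, 0), (38, 5), (0, 5)], speed_mm_s=38.0)
t_a, t_b = deposit_time_window((19, 5), path_xy, path_t, mm_per_px=1.0,
                               origin_mm=(0.0, 0.0), half_width_s=0.05)
print(f"t_a = {t_a:.3f} s, t_b = {t_b:.3f} s")
```

Because a serpentine path revisits the same x coordinate on every pass, the lookup uses both coordinates: a deposit at (19, 5) mm resolves to the return leg at about 1.63 s rather than the outbound leg at 0.5 s.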

Automating Band Gap Computation using Hyperspectral Reflectance Data
The direct band gap of all N = 201 FA₁₋ₓMAₓPbI₃ perovskite compositions is computed from the vision-segmented reflectance data, Φ = ((X, Y)_n, R(λ)), since all FA₁₋ₓMAₓPbI₃ compositions are direct band gap materials at atmospheric pressure [68,69]. For every segmented sample, (X, Y)_n, the spatial median R(λ) spectrum is used to compute the band gap. R(λ) spans the wavelengths {λ ∈ ℤ : 380 nm ≤ λ ≤ 1020 nm} for hyperspectral imaging and λ = {r, g, b} for the red, green, and blue color channels of RGB imaging. First, the median R(λ) spectra are transformed into Tauc curves using Equation 2 with γ = 1/2 for a direct band gap. Then, the transformed Tauc curves are recursively segmented in half until each segment achieves a fit of R² ≥ 0.990, indicating that each segment is near-linear:

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$

where ŷ_i is the predicted value and ȳ is the average value of the set. Once the recursion is complete, each pair of adjacent line segments is iteratively fit with a linear regression line, generating a set of candidate fit lines for computing the band gap. To select the best candidate line, the root-mean-square error (RMSE) is used rather than the inclination angles of the Tauc curves [26], which improves generalizability across different materials, e.g., FAPbI₃ and MAPbI₃:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

where n is the number of Tauc curve points in the evaluation window. We implement an iterative RMSE minimization routine that automatically identifies the Tauc curve peaks to fit between. The RMSE is then computed between each regression line and the Tauc curve within a window bounded below by the regression line's x-intercept and above by the Tauc peak location minus one-half of the peak width. Restricting the RMSE computation to these bounds was found to improve how accurately the fit follows the Tauc slope. The band gap, E_g, is then extracted from the x-intercept of the regression line that achieves the minimum RMSE.
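The recursive segmentation and RMSE-based line selection can be sketched as follows. This is a simplified reading of the procedure, not the authors' code: the input is a Tauc curve already computed with Equation 2 (energies ascending), recursion stops when a piece reaches R² ≥ 0.990 or would fall below a hypothetical minimum of four points, and the upper RMSE bound is simplified to the energy of the Tauc maximum instead of the detected peak location minus half the peak width. Function names are hypothetical.

```python
import numpy as np


def _r_squared(x, y):
    """Coefficient of determination of a least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 if ss_tot == 0.0 else 1.0 - ss_res / ss_tot  # a flat segment is perfectly linear


def split_linear(x, y, lo, hi, r2_min=0.990, min_pts=4):
    """Recursively halve the index range [lo, hi) until every piece has R^2 >= r2_min."""
    if hi - lo < 2 * min_pts or _r_squared(x[lo:hi], y[lo:hi]) >= r2_min:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split_linear(x, y, lo, mid, r2_min, min_pts) + split_linear(x, y, mid, hi, r2_min, min_pts)


def band_gap(energy_ev, tauc, r2_min=0.990):
    """Band gap from the x-intercept of the candidate line with the lowest windowed RMSE."""
    x = np.asarray(energy_ev, dtype=float)
    y = np.asarray(tauc, dtype=float)
    segments = split_linear(x, y, 0, len(x), r2_min)
    pairs = list(zip(segments[:-1], segments[1:])) or [(segments[0], segments[0])]
    x_upper = x[np.argmax(y)]  # simplification: stands in for "peak location minus half the peak width"
    best_eg, best_rmse = None, np.inf
    for (start, _), (_, stop) in pairs:
        slope, intercept = np.polyfit(x[start:stop], y[start:stop], 1)  # fit adjacent segment pair
        if slope <= 0:
            continue  # flat or falling lines cannot describe an absorption edge
        eg = -intercept / slope
        window = (x >= eg) & (x <= x_upper)
        if not x[0] <= eg < x_upper or window.sum() < 2:
            continue
        rmse = np.sqrt(np.mean((y[window] - (slope * x[window] + intercept)) ** 2))
        if rmse < best_rmse:
            best_eg, best_rmse = float(eg), rmse
    return best_eg


# Usage on an ideal direct-gap edge at 1.5 eV (synthetic data).
E_demo = np.linspace(1.0, 3.0, 201)
y_demo = 10.0 * np.clip(E_demo - 1.5, 0.0, None)
eg_demo = band_gap(E_demo, y_demo)
print(f"E_g = {eg_demo:.3f} eV")
```

On this ideal edge, the segment pair lying entirely on the absorption edge reproduces the line exactly and wins the RMSE comparison, while the pair straddling the kink yields a shallower line with a larger windowed error.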

Detecting Perovskite Degradation from RGB Time Series Data
To use color as a reproducible and repeatable quantitative proxy for degradation, color calibration must be applied, because the illumination conditions in the aging test chamber may distort the true sample color. At the beginning of the degradation study, an image of a reference color chart (X-Rite ColorChecker Passport; 28 reference color patches), I_R, is taken under the same illumination conditions as the perovskite semiconductor samples. Images at each time step, Ω(Δt), are transformed into CIELAB color space and subsequently into a stable reference color space, the CIE 1931 color space with a 2-degree standard observer and standard illuminant D50, by applying a 3D thin-plate spline distortion matrix D [7,70] defined by I_R and the known colors of the reference color chart:

$$D = \begin{bmatrix} V \\ O(4,3) \end{bmatrix}^{\top} \begin{bmatrix} K & P \\ P^{\top} & O(4,4) \end{bmatrix}^{-1}$$

Here, O(n, m) is an n × m zero matrix, V is a matrix of the color checker reference colors in the stable reference color space, P is a matrix of the color checker RGB colors obtained from I_R, and K is a distortion matrix between the color checker colors in the reference space and in I_R. Using the color-calibrated images and the droplet pixel locations given by Φ, a final array, R(t; X, Y), of the average color at time t for each perovskite semiconductor of composition FA₁₋ₓMAₓPbI₃ is created. The color of each droplet is measured to determine the stability metric I_c [7], calculated using Equation 3.
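A 3D thin-plate spline color correction of this kind can be fitted with a single linear solve. The sketch below is an assumption-laden illustration rather than the authors' pipeline: it uses the kernel U(r) = r (the usual thin-plate choice in three dimensions), builds P with a leading column of ones so that it has the four columns implied by O(4,4), and maps colors in whatever three-channel space they are supplied in, skipping the CIELAB and CIE 1931 conversions. `fit_tps` is a hypothetical name; the solved coefficients play the role of the distortion matrix D.

```python
import numpy as np


def fit_tps(measured, reference):
    """Fit a 3D thin-plate spline mapping measured chart colors to reference colors.

    measured:  (n, 3) chart-patch colors read from the reference image I_R.
    reference: (n, 3) known patch colors in the stable reference space.
    Returns a function that maps (m, 3) measured colors into the reference space.
    """
    p_rgb = np.asarray(measured, dtype=float)
    v = np.asarray(reference, dtype=float)
    n = len(p_rgb)
    k = np.linalg.norm(p_rgb[:, None, :] - p_rgb[None, :, :], axis=2)  # kernel U(r) = r
    p = np.hstack([np.ones((n, 1)), p_rgb])                            # (n, 4) affine basis
    system = np.block([[k, p], [p.T, np.zeros((4, 4))]])
    rhs = np.vstack([v, np.zeros((4, 3))])
    coeffs = np.linalg.solve(system, rhs)                               # (n + 4, 3)
    weights, affine = coeffs[:n], coeffs[n:]

    def apply(colors):
        c = np.atleast_2d(np.asarray(colors, dtype=float))
        u = np.linalg.norm(c[:, None, :] - p_rgb[None, :, :], axis=2)
        return u @ weights + np.hstack([np.ones((len(c), 1)), c]) @ affine

    return apply


# Usage on a synthetic 28-patch chart whose "true" colors are an affine distortion.
rng = np.random.default_rng(0)
chart_rgb = rng.uniform(0.0, 1.0, size=(28, 3))
M = np.array([[0.9, 0.1, 0.0], [0.05, 1.1, 0.0], [0.0, 0.1, 0.8]])
chart_ref = chart_rgb @ M + 0.02
correct = fit_tps(chart_rgb, chart_ref)
print(correct([[0.2, 0.4, 0.6]]))
```

A thin-plate spline interpolates its control points and reproduces affine maps exactly, so this synthetic check recovers both the chart colors and an unseen color; real chart data would additionally exercise the nonlinear kernel terms.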

Substrate Preparation
Glass slide substrates are prepared for printing the perovskite samples using a three-step cleaning process: (1) ultrasonication for 5 minutes in DI water with 2 vol.% Hellmanex III solution, (2) ultrasonication for 5 minutes in DI water only, and (3) ultrasonication for 5 minutes in IPA. Once cleaned, the substrates are transferred to an inert nitrogen-environment glovebox with moisture levels < 10 ppm.

Perovskite Preparation
FAPbI3 (formamidinium lead iodide) and MAPbI3 (methylammonium lead iodide) are prepared as 0.6 M liquid precursors for high-throughput printing. For printing, 2 mL of each precursor is prepared in a nitrogen-filled glovebox with moisture levels < 10 ppm. First, 3.2 mL of DMF is mixed with 0.8 mL of DMSO to make 4 mL of 4:1 DMF:DMSO solution. Then, 1.106 g of PbI2 powder is dissolved in the 4 mL of 4:1 DMF:DMSO to make a PbI2 stock. Next, the 4 mL PbI2 stock is split in half by pipetting 2 mL into each of two vials. Lastly, 0.206 g of FAI powder is dissolved in one of the 2 mL PbI2 stock vials and 0.191 g of MAI powder in the other, yielding 0.6 M FAPbI3 and 0.6 M MAPbI3, respectively.
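The solute masses above follow from $m = c \cdot V \cdot M$. The short check below reproduces them; the molar masses are standard values that we supply here (not stated in the text), so treat them as assumptions to double-check against the supplier's certificates.

```python
# Standard molar masses in g/mol (supplied here as assumptions, not taken from the text).
MOLAR_MASS = {"PbI2": 461.01, "FAI": 171.97, "MAI": 158.97}


def solute_mass(compound, molarity_M, volume_mL):
    """Mass (g) of solute for a target molarity and volume: m = c * V * M."""
    return molarity_M * (volume_mL / 1000.0) * MOLAR_MASS[compound]


print(round(solute_mass("PbI2", 0.6, 4.0), 3))  # 1.106  (PbI2 stock, 4 mL)
print(round(solute_mass("FAI", 0.6, 2.0), 3))   # 0.206  (one 2 mL vial)
print(round(solute_mass("MAI", 0.6, 2.0), 3))   # 0.191  (the other 2 mL vial)
```

Because FAI and MAI are added in 1:1 molar ratio to the 0.6 M PbI2 stock, both final precursors are 0.6 M, which is what lets the printer mix them by volume.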

High-throughput Perovskite Synthesis
The liquid FAPbI3 and MAPbI3 precursor solutions are used in the high-throughput combinatorial printer to synthesize N = 201 unique FA1−xMAxPbI3 composition samples. The high-throughput printer is custom-made, and parts of its construction are documented in Siemenn et al. [28]. To begin printing, all printer plumbing lines are first flushed twice with the 4:1 DMF:DMSO solution. Then, the FAPbI3 and MAPbI3 precursors are drawn into syringes by pumps controlled by a microcontroller. These syringes contain prismatic motion plungers that use positive displacement to fill and eject solution. Next, all plumbing lines are primed with the precursor solution. After priming, the precursors are purged at equal rates from both syringes for 50 seconds to remove air bubbles. Finally, encoder-controlled motors pump the precursors out of the syringes at pre-programmed rates, illustrated in Supplementary Figure S-3, into a mixing chamber prior to deposition. A pinch valve breaks up the fluid flow within a 1/32" ID × 3/32" OD silicone tube, actuating at 11 Hz with a 5% duty cycle to deposit each sample as a discrete droplet onto the cleaned glass slide. The print head translates at 38 mm/s over the 3" × 2" glass slide in a serpentine pattern, depositing approximately 70-80 unique composition samples per batch in 16.5 seconds. Approximately 0.15 mL of total precursor volume is consumed per print, including the volume required for purging and priming the plumbing lines. After deposition, the substrate is transferred to a hotplate and annealed for 15 minutes at 150 °C.
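Equation 1 and the pump traces of Figure S-3 are not reproduced here, but the core idea of pump-ratio composition control can be sketched under a simplifying assumption: because both precursors are 0.6 M, if they are fully mixed before deposition, the MA fraction of a droplet equals the MA share of the total volumetric flow. The linear ramp and the function name below are hypothetical; the real mapping also has to account for the mixing-chamber delay and the print head's raster path.

```python
import numpy as np


def ma_fraction(q_fa, q_ma):
    """MA fraction x in FA(1-x)MA(x)PbI3 from the two pump flow rates.

    Assumes equal precursor molarity (0.6 M), so the A-site cation molar ratio
    equals the volumetric flow ratio; mixing delays are ignored.
    """
    q_fa = np.asarray(q_fa, dtype=float)
    q_ma = np.asarray(q_ma, dtype=float)
    return q_ma / (q_fa + q_ma)


# Hypothetical linear ramp over one 16.5 s print: FA pump ramps down, MA pump ramps up.
t = np.linspace(0.0, 16.5, 5)   # s, deposition times of 5 droplets
q_total = 1.0                    # arbitrary units; only the ratio matters
q_ma = q_total * t / t[-1]
q_fa = q_total - q_ma
x = ma_fraction(q_fa, q_ma)
```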

Experiments
Processing Times
In this paper, we aim to achieve closer rate matching between the synthesis and characterization of materials for high-throughput screening. Figure S-1 illustrates the minimum processing time required to go from precursor to data for perovskite samples. The minimum processing times to collect band gap data for 100 perovskite samples are shown and represent the times experimentally recorded during this study. Three scenarios are represented that use combinations of manual and high-throughput methods. The processing time to collect these band gap data is broken down into four steps: (1) precursor preparation, (2) synthesis, (3) annealing, and (4) characterization. These are minimum processing times because they do not account for sample transfer or reloading times, e.g., moving a sample to the hotplate. In Figure S-1b, moving from manual to high-throughput synthesis widens the gap between synthesis and characterization throughputs, which bottlenecks the materials screening loop. However, in Figure S-1c, by using high-throughput autocharacterization, the rate of characterization is closely matched to that of high-throughput synthesis, in turn enabling more efficient materials screening. The bottleneck then becomes precursor preparation, which is outside the scope of this study.
The individual processing time contributions in this study are as follows. Precursor preparation is always conducted manually and takes 90 minutes to make 100 samples' worth of solution in all three scenarios. Synthesis takes 45 seconds per sample for manual spin coating and 20 seconds per 100 samples for high-throughput manufacturing. Annealing takes 10 minutes for manual thin films, where 8 samples fit on a single hotplate at a time, and 15 minutes for the thicker high-throughput samples, where all 100 samples fit on a single hotplate. Computing band gaps takes 255 minutes per 100 samples when done manually by a domain expert and 3 minutes per 100 samples with the band gap extraction autocharacterization algorithm developed in this paper.
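The per-step times above can be combined into the three Figure S-1 scenarios with simple arithmetic. This sketch assumes the manual films are annealed in sequential rounds of 8 (so 13 rounds for 100 films) and, like the text, ignores transfer and reloading times; Figure S-1 may aggregate the steps differently.

```python
import math

N_SAMPLES = 100
PREP_MIN = 90.0  # manual precursor preparation, identical in all scenarios


def total_minutes(synthesis, annealing, characterization):
    """Minimum precursor-to-band-gap time in minutes (no transfer or reloading time)."""
    return PREP_MIN + synthesis + annealing + characterization


manual_synthesis = 45 * N_SAMPLES / 60          # 45 s per spin-coated film
manual_anneal = math.ceil(N_SAMPLES / 8) * 10   # assumption: sequential rounds of 8 films
ht_synthesis = 20 / 60                          # 20 s per 100 printed samples
ht_anneal = 15.0                                # all 100 samples on one hotplate
manual_char, auto_char = 255.0, 3.0             # minutes per 100 samples

scenarios = {
    "a: manual synthesis, manual characterization":
        total_minutes(manual_synthesis, manual_anneal, manual_char),
    "b: HT synthesis, manual characterization":
        total_minutes(ht_synthesis, ht_anneal, manual_char),
    "c: HT synthesis, autocharacterization":
        total_minutes(ht_synthesis, ht_anneal, auto_char),
}
characterization_speedup = manual_char / auto_char

for name, minutes in scenarios.items():
    print(f"{name}: {minutes:.1f} min")
print(f"characterization speedup: {characterization_speedup:.0f}x")
```

The 255/3 ratio is the 85x characterization speedup; once it is applied, the fixed 90-minute precursor preparation dominates scenario (c), consistent with the text.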

Experimental Reproducibility
Variability exists across samples manufactured using our high-throughput combinatorial printer, as the setup is designed for low-cost, scalable, high-throughput screening rather than high-fidelity experiments. Although the purpose of this paper is not to highlight the perovskite manufacturing method but to focus on rate matching of characterization, it is important to understand the sources of experiment-to-experiment variability, as they will also arise for anyone who replicates the proposed approaches. Figure S-2 shows the post-degradation morphology differences between samples of the same composition from two separate batches manufactured under the same printing conditions. Both batches were degraded under the same environmental conditions for 2 hours at 35 °C, 40% relative humidity, and 0.5 suns of AM1.5 illumination (without UV). During crystallization, batch A formed more uniform and compact grain boundaries, whereas batch B formed jagged boundaries that create more pathways for degradation. This crystallization mismatch explains the accelerated degradation observed in batch B relative to batch A.
Samples that are prone to phase changes (e.g., compositions near phase boundaries, such as 0.0 ≤ x ≤ 0.15 for FA1−xMAxPbI3) may exhibit high sample-to-sample variance under any sample preparation approach. Therefore, the effects of these variations must be carefully considered in high-throughput manufacturing, where not every sample can be fully characterized at high fidelity. These morphological variations in the high-throughput manufacturing process must be studied further to gain better control over experimental reproducibility. In this study, three batches of samples with similar morphology were selected to minimize this variation and to keep the emphasis of the paper on the assessment of the proposed automatic characterization methods rather than on the synthesis method.

Computer Vision Segmentation and Composition Mapping
Figures S-4a and S-4b illustrate the process of going from a raw hyperspectral datacube, Ω, to computer vision-segmented data, Φ, which is used as input to the autocharacterization methods developed in this paper. An image, Ω, in either hyperspectral or RGB format, is segmented using Algorithm 1, producing the segmented pixels, $(\hat{X}, \hat{Y})$, and their corresponding reflectance values, $R(\lambda)$. The matched sets of $(\hat{X}, \hat{Y})$ and $R(\lambda)$ are denoted as Φ. The composition of each deposited sample is then mapped onto the segmented Φ using the G-code raster path of the print head, the pump speed traces from Figure S-3, and Equation 1.
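Algorithm 1 is not reproduced in this section, so the following is only a hypothetical stand-in that shows the shape of Φ: it thresholds the band-averaged image (assuming droplets are darker than the glass background), labels connected components with `scipy.ndimage.label`, and takes the spatial median spectrum of each labeled sample, as used for band gap extraction. The threshold, the function name, and the toy datacube are assumptions.

```python
import numpy as np
from scipy import ndimage


def segment_samples(cube, threshold):
    """Label droplets in a (rows, cols, bands) cube; return labels and per-sample median spectra."""
    mean_image = cube.mean(axis=2)
    mask = mean_image < threshold             # assumption: droplets are darker than the glass
    labels, n_samples = ndimage.label(mask)   # connected components (4-connectivity by default)
    medians = np.array([np.median(cube[labels == k], axis=0)   # spatial median R(lambda)
                        for k in range(1, n_samples + 1)])
    return labels, medians


# Hypothetical 20 x 30 pixel cube with 4 bands: bright background and two dark droplets.
cube = np.full((20, 30, 4), 0.8)
cube[2:6, 2:6, :] = 0.1
cube[10:15, 20:25, :] = 0.2
labels, medians = segment_samples(cube, threshold=0.5)
```

Each label's pixel coordinates play the role of $(\hat{X}, \hat{Y})_n$, and its median spectrum the role of $R(\lambda)$; a production segmenter would also need to handle touching droplets and uneven backgrounds.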

Band Gap
The materials within the FA1−xMAxPbI3 compositional series are direct band gap semiconductors [68,69]. To compute their band gaps, the reflectance spectra of all samples are first measured using a hyperspectral camera (Resonon Pika L) over the wavelength range λ ∈ [380 nm, 1020 nm] at 2 nm resolution. Figure S-5 shows the measured reflectance spectra for all N = 201 samples in this paper, gathered using the computer vision segmentation of the raw hyperspectral datacube shown in Figure S-4. The reflectance spectra are then converted to their corresponding absorption spectra using the Kubelka-Munk function [19,56], which applies to sufficiently thick samples (our samples are approximately 300 µm thick, which is considered sufficiently thick for reflectance measurement):

$$F(R) = \frac{(1 - R)^2}{2R},$$

where $R$ is the reflectance over the entire range of $\lambda$ for a given segmented pixel in the reflectance hypercube, $(\hat{X}, \hat{Y})$. Next, the Tauc curves are computed from $F(R)$ [55]:

$$\left(F(R) \cdot h\nu\right)^{1/\gamma} = B\,(h\nu - E_g),$$

where $h\nu$ is the photon energy in eV ($h\nu = 1240/\lambda$, with $\lambda$ in nm), $\gamma = \tfrac{1}{2}$ for a direct band gap and $\gamma = 2$ for an indirect band gap, and $B$ is a constant, so that the band gap, $E_g$, is the x-intercept of a regression line fit to the linear slope of the Tauc curve. Substituting the Kubelka-Munk function into the Tauc relation with $\gamma = \tfrac{1}{2}$ gives an equation for the direct band gap in terms of the initial reflectance spectra:

$$\left(\frac{(1 - R)^2}{2R} \cdot h\nu\right)^{2} = B\,(h\nu - E_g). \tag{S-4}$$

In this paper, we use this optical relation to automatically compute the median band gap across all vision-segmented pixels $(\hat{X}, \hat{Y})$ of a given perovskite sample, as described by the autocharacterization algorithm illustrated in Figure 4.
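The reflectance-to-Tauc transformation of Equation S-4 can be sketched as below. The 380-1020 nm grid at 2 nm matches the text; the step-shaped reflectance spectrum and the function names are hypothetical, and 1240 is the usual approximation of $hc$ in eV·nm.

```python
import numpy as np


def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R); R is a fraction in (0, 1]."""
    r = np.asarray(reflectance, dtype=float)
    return (1.0 - r) ** 2 / (2.0 * r)


def tauc_curve(wavelength_nm, reflectance, gamma=0.5):
    """Photon energy (eV) and Tauc ordinate (F(R) * h*nu)^(1/gamma), sorted by energy."""
    energy_ev = 1240.0 / np.asarray(wavelength_nm, dtype=float)  # h*nu = 1240 / lambda[nm]
    ordinate = (kubelka_munk(reflectance) * energy_ev) ** (1.0 / gamma)
    order = np.argsort(energy_ev)  # increasing wavelength means decreasing energy
    return energy_ev[order], ordinate[order]


# Hyperspectral grid from the text: 380-1020 nm at 2 nm resolution.
wavelengths = np.arange(380, 1021, 2)
# Hypothetical median spectrum: dark (absorbing) below 800 nm, more reflective above.
reflectance = np.where(wavelengths < 800, 0.1, 0.6)
energy, tauc = tauc_curve(wavelengths, reflectance)
```

With `gamma=0.5` the exponent is 2, as in Equation S-4; passing `gamma=2` would produce the indirect-gap form instead.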
Currently, the autocharacterization algorithm is configured to operate only on materials with a single direct band gap. The band gaps computed by the automatic Tauc segmentation and regression RMSE minimization steps of the autocharacterization algorithm are benchmarked against band gaps calculated manually by a domain expert. This comparison between algorithm and expert is used to define an accuracy metric for the algorithm, taking the expert-calculated output as ground truth. The accuracy is the average of a binary 0/1 score over all N = 201 samples, where each score indicates whether the difference between the automatic and expert band gap values falls within a specified energy difference threshold (shown along the x-axis of Figure S-6b).
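The threshold accuracy metric can be written in a few lines. Whether "within" the threshold is inclusive is an assumption here (`<=`), and the band gap values below are illustrative, not taken from Table S-1.

```python
import numpy as np


def threshold_accuracy(auto_eg, expert_eg, tol_ev=0.02):
    """Mean of per-sample 0/1 scores: 1 if |auto - expert| <= tol_ev (expert as ground truth)."""
    auto_eg = np.asarray(auto_eg, dtype=float)
    expert_eg = np.asarray(expert_eg, dtype=float)
    hits = np.abs(auto_eg - expert_eg) <= tol_ev
    return float(hits.mean())


# Illustrative values (eV), not measured data.
auto = [1.52, 1.55, 1.60, 1.58]
expert = [1.53, 1.55, 1.55, 1.575]
curve = {tol: threshold_accuracy(auto, expert, tol) for tol in (0.001, 0.02, 0.1)}
```

Sweeping `tol_ev` traces a curve like Figure S-6b; by construction it can only stay the same or drop as the threshold tightens.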
Figure S-6a shows the band gaps output by the autocharacterization algorithm as a function of composition for the three independent batches. Figure S-6b shows the accuracy of the automatic algorithm as a function of the energy difference threshold. The algorithm achieves 98.5% accuracy within 0.02 eV; as the threshold is tightened, the accuracy can only stay the same or decrease.

Stability
To conduct the degradation experiments in this paper, we place the samples in a degradation chamber for 2 hours while monitoring the environmental conditions and capturing RGB images every 30 seconds (Thorlabs DCC1645C camera with the infrared filter removed to increase sensitivity to dark samples). The construction and operation of the degradation chamber setup are detailed in Keesey & Tiihonen et al. [8]. Figure S-7(a) shows the time series of temperature and humidity over the course of the experiment. Over the 2-hour degradation experiment, the temperature was maintained at 34.5 °C ± 0.5 °C with a relative humidity of 40% ± 1%. An initial jump in temperature, with a corresponding dip in humidity, is noted as the samples are placed into the degradation chamber, before the internal environment equilibrates.
After segmenting each sample across the entire time domain of the experiment, the full matrix of degradation time series is populated and color calibrated automatically, as shown in Figure S-9. In this specific experiment, post-analysis revealed spatial non-uniformity in surface reflection, which does not noticeably affect the instability index calculation but does introduce artificial color differences between samples depending on their location on the substrate. These spatially dependent color differences arise from the physical configuration of the environmental chamber and the RGB camera. To account for them in the final output matrix time series, an additional color correction step was applied that initializes all deposited samples to the same color. This color correction is purely cosmetic and was not used to calculate the degradation intensity, $I_c$. It aids the interpretation of the visualized time series by making color changes due to degradation easier to see while suppressing the unwanted effects of spatially dependent reflectivity aberrations. After the color correction, a fully calibrated matrix of degradation time series is obtained, showing the color change over time of each perovskite sample within its batch (Figure S-9).
The degradation intensity, $I_c$, is computed from the color time series of each sample using Equation 3 [7,8]. The degradation detection algorithm computes $I_c$ automatically for every sample; high values of $I_c$ correspond to high degradation. To benchmark the algorithm, the difference between the domain expert-computed band gaps before and after degradation is used as the ground truth for whether a material has truly degraded. This is possible because perovskite band gaps change during either a phase transition or chemical decomposition [7,64]. Figure S-10(b) illustrates that the magnitude of $I_c$ strongly corresponds with the ground truth determination of degradation using band gap difference as a metric. This correspondence can be quantified using the precision-recall (PR) performance of the autocharacterization algorithm. A PR curve quantifies how well a classifier, in this case $I_c$, predicts a ground truth, in this case degradation:

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \tag{S-5}$$

where $TP$ are the true positives, $FN$ the false negatives, and $FP$ the false positives. We use the PR curve instead of the receiver operating characteristic (ROC) curve because of the large class imbalance between degraded and non-degraded samples (there are significantly more non-degraded than degraded samples).
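Equation 3 is defined in the main text and is not shown in this section. As a hedged illustration only, the sketch below assumes that $I_c$ integrates the absolute change of each mean color channel (r, g, b) relative to $t = 0$ over the experiment and sums the three channels; the paper's definition may differ in normalization or in summing over pixels, so absolute values (and the px·hr decision boundary) are not comparable. The function name and data are hypothetical.

```python
import numpy as np


def degradation_intensity(t_hr, rgb):
    """Hypothetical I_c: time-integrated |R_c(t) - R_c(0)|, summed over r, g, b.

    t_hr: (T,) sample times in hours; rgb: (T, 3) mean color of one droplet per frame.
    """
    t_hr = np.asarray(t_hr, dtype=float)
    rgb = np.asarray(rgb, dtype=float)
    delta = np.abs(rgb - rgb[0])                      # color change relative to the first frame
    dt = np.diff(t_hr)[:, None]
    integral = np.sum((delta[1:] + delta[:-1]) / 2.0 * dt, axis=0)  # trapezoid rule per channel
    return float(integral.sum())


# Frames every 30 s for 2 h (241 frames), as in the degradation experiment.
t = np.linspace(0.0, 2.0, 241)
stable = np.full((t.size, 3), 40.0)                   # dark sample with no color change
yellowing = np.column_stack([50 + 30 * t, 50 + 30 * t, np.full_like(t, 50.0)])
ic_stable = degradation_intensity(t, stable)
ic_yellowing = degradation_intensity(t, yellowing)
```

A sample that keeps its color scores zero, while a yellowing sample (rising red and green) accumulates a large value, which is the qualitative behavior the text attributes to $I_c$.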

Phase and Elemental Analysis
Phase analysis of perovskites, such as X-ray diffraction (XRD), is used to determine the structure and quality of the manufactured samples. In this study, we measure our samples using a Bruker D8 X-ray diffractometer with a cobalt source and a General Area Detector Diffraction System. Figure S-11 shows the pre- and post-degradation XRD traces for equally spaced compositions along the FA1−xMAxPbI3 series. The reference peak locations for both the favorable cubic α-FAPbI3 phase [62] and the favorable tetragonal MAPbI3 phase [63] are shown together as black vertical lines and "*" symbols, since a shift of only Δ2θ ≈ 0.16° is seen from FAPbI3 to MAPbI3 around the 2θ = 31.5° peak. High-resolution scans of this peak shift along FA1−xMAxPbI3 are shown in Figure 3b and can be used as an additional validation tool for the composition shift. During degradation, FAPbI3 transitions from the favorable cubic α-phase to a non-perovskite hexagonal δ-phase [62]. Hence, XRD is a useful tool for corroborating degradation.
Elemental analysis, such as X-ray photoelectron spectroscopy (XPS), is used to determine the shift in the binding energies of bonds within the different A-site cations (FA and MA) along the FA1−xMAxPbI3 series, which in turn corresponds to a composition shift. XPS is a surface-sensitive, quantitative spectroscopic technique that identifies elemental composition and chemical bonding states. In this study, we measure our samples using the PHI 5000 Versa Probe II Focus X-ray Photoelectron Spectrometer, equipped with a monochromated Al Kα X-ray source for excitation at 1486.6 eV and an X-ray beam size of 200 µm. Survey spectra for uniformly spaced compositions are shown in Figure S-12; FAPbI3 and MAPbI3 cannot be distinguished clearly in survey spectra due to the many similarities between the FA and MA molecules [51]. However, their primary distinguishing feature, the carbon-nitrogen double bond (C=N) present in FA, is clearly detectable in the high-resolution XPS scans, as shown in Figure 3c. In the high-resolution C 1s and N 1s scans, calibration is performed on the lowest C 1s energy peak at 284.8 eV. Hence, the C=N peak intensity quantifies the presence of FA relative to MA, in turn determining the composition along the FA1−xMAxPbI3 series.
Figure S-13 shows the quantitative XRD peak shifts and XPS peak intensities from the high-resolution scans that track the phase and elemental changes along the FA1−xMAxPbI3 series. The XRD peak of the (012) crystallographic plane shifts from lower to higher 2θ angles as more MA is added to the composition. Conversely, the intensity of the XPS peak for C=N bonds decreases as more MA is added. Thus, both measurements validate the presence of a compositional gradient across the synthesized batches of samples.
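A peak shift to higher 2θ corresponds to a smaller interplanar spacing by Bragg's law, $n\lambda = 2d\sin\theta$, which is consistent with the smaller MA cation contracting the lattice. The sketch below applies Bragg's law to the ≈0.16° shift near 2θ = 31.5° quoted above; the Co Kα1 wavelength (1.78897 Å) is our assumption for the cobalt source, and first-order diffraction ($n = 1$) is assumed.

```python
import numpy as np

CO_KALPHA1_ANGSTROM = 1.78897  # assumed Co K-alpha1 wavelength for the cobalt source


def d_spacing(two_theta_deg, wavelength=CO_KALPHA1_ANGSTROM):
    """Interplanar spacing (angstrom) from Bragg's law with n = 1: d = lambda / (2 sin theta)."""
    theta = np.radians(np.asarray(two_theta_deg, dtype=float) / 2.0)
    return wavelength / (2.0 * np.sin(theta))


d_fa = d_spacing(31.5)           # peak position near the FA-rich end
d_ma = d_spacing(31.5 + 0.16)    # shifted peak toward the MA-rich end
lattice_contraction = d_fa - d_ma
```

The contraction is on the order of 0.02 Å, small enough that high-resolution scans are needed to resolve it, as the text notes.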

Size of the Perovskite Search Space
A metal halide perovskite search space commonly explored in the literature for photovoltaic applications consists of the eight-component material system (FA, MA, Cs)(Pb, Sn)(Br, Cl, I)3 [7,9,14,23-25]. Figure S-14 shows the discretization of these eight components within the archetypal ABX3 perovskite structure. The number of steps per edge, $n$, determines the compositional resolution of each subspace. As the number of steps increases, the number of potential compositions grows, and the search space becomes vaster. For this eight-component search space, the number of possible compositions is proportional to the product, over the A, B, and X subspaces, of the number of steps raised to the power of the number of components in each subspace (3 components for A: FA, MA, and Cs; 2 components for B: Pb and Sn; and 3 components for X: Br, Cl, and I), giving $n^3 \times n \times n^3$ for the A × B × X subspaces. A caveat is that for binary subspaces, the exponent is 1 instead of 2, since the space is linear, as shown by the B-site subspace in Figure S-14. Hence, for a low-resolution search space of $n = 10$ steps, $1 \times 10^{6}$ total compositions are considered, and for a high-resolution search space of $n = 100$ steps, $7 \times 10^{12}$ compositions are considered.

Full Experimental Results
Table S-1 contains the full readout of the characterization results extracted by the autocharacterization algorithms for all N = 201 samples. We report the calculated composition, the autocharacterization-calculated band gap (Auto $E_g$), the domain expert-calculated band gap (Expert $E_g$), the autocharacterization-calculated degree of degradation ($I_c$), and the ground truth degradation determined by the domain expert. Dashes indicate values that could not be determined for a composition due to missing data.

Fig. 2: a, Raw hyperspectral datacube, Ω, captured using a hyperspectral imager (Resonon, Pika L) of HT-deposited FA1−xMAxPbI3 perovskites. (X, Y) represents the pixel coordinates, and R(λ) represents the reflectance spectrum of each pixel. Each sample is deposited onto the glass substrate with a unique composition, 0 ≤ x ≤ 1, and a flexible form factor geometry. b, Computer vision-segmented datacube, Φ, that pairs each unique sample's pixels, $(\hat{X}, \hat{Y})_{n \in N}$, with its reflectance spectra, R(λ). The gray hatched region indicates the discarded background pixels.

Fig. 5: Performance of the autocharacterization of band gap relative to the domain expert-computed band gap for N = 201 unique perovskite samples across 3 independent trials. The solid black line is the regression fit to the band gap data, and the dashed black line is the y = x line. Histograms of the autocharacterization and domain expert band gaps are shown on the right and top of the plot area, respectively. The color of the scatter points corresponds to the proportion of MA, x, in the composition FA1−xMAxPbI3.

Fig. 6: a-c, Automatic degradation testing and measurement of computer vision-segmented perovskite deposits. a, The samples are placed in the degradation chamber under specified environmental conditions for a total of two hours. b, RGB images of the samples are taken every 30 seconds for two hours to resolve the time-dependent color change of the material. c, Computer vision is used to segment each deposited sample over time, Φ(t), to compute the degradation intensity metric, $I_c$.
g, and blue, b, for each sample, $(\hat{X}, \hat{Y})_{n \in N}$. High $I_c$ indicates high color change, corresponding to high degradation; $I_c$ close to zero indicates low color change and low degradation.

Fig. 7: a, Performance of the autocharacterization of degradation intensity, $I_c$, relative to the ground truth degradation determined by a domain expert (yellow scatter points) on N = 201 unique perovskite samples across 3 independent trials. The black dashed line indicates the split between high and low $I_c$ values, corresponding to high and low degrees of degradation, respectively. b, Images of the three batches of FA1−xMAxPbI3 gradient samples after the 2-hour controlled degradation. The leftmost samples are FA-rich and the rightmost samples are MA-rich. The yellowed FA-rich compounds have undergone a phase transition from α-FAPbI3 to δ-FAPbI3 and are considered "ground truth" degraded samples if their band gap deviates by > 0.02 eV from pre- to post-degradation, as evaluated by a domain expert.

Fig. S-1: Minimum time to process 100 perovskite samples and compute their band gaps. The processing times are shown for three scenarios: (a) manual synthesis and manual characterization, (b) high-throughput synthesis and manual characterization, and (c) high-throughput synthesis and high-throughput autocharacterization.

Fig. S-2: Optical microscopy of deposit morphology for two different batches of FA0.67MA0.33PbI3 after controlled degradation.

Figure S-4c illustrates the complete mapping of all material deposits to their derived compositions within the FA1−xMAxPbI3 series.

Fig. S-5: Hyperspectral reflectance of all N = 201 samples synthesized in this study, color-mapped as a function of composition.

Fig. S-6: (a) Direct band gap values output by the automatic band gap extraction algorithm, split by batch, as a function of FA1−xMAxPbI3 composition. (b) Accuracy of the automatic band gap extraction algorithm as a function of the maximum allowable energy difference between the domain expert-calculated and automatically calculated band gaps.

Fig. S-7: (a) Temporal degradation conditions for each of the 3 batches of samples over the course of 2 hours, measured using a temperature and humidity sensor (Adafruit, AHT10). (b) Spatial uniformity of illumination across the substrate within the degradation chamber, measured using a lux sensor (Adafruit, VEML7700).

Fig. S-9: Final stability time series matrix of all batches of segmented droplets over the course of a 2-hour degradation experiment. All samples begin the experiment dark gray; over the course of degradation, samples with MA proportions between 0% and 20% exhibit yellowing between the 20 min and 40 min marks, indicating degradation.

Fig. S-10: Benchmarking of the automatic degradation detection algorithm. (a) Determination of the ground truth degradation from post-degradation band gap measurements. The break in the graph shows which compositions had no band gap after degradation, reported as 0.0 eV. (b) Values of $I_c$ as a function of composition, split by batch, for a total of N = 201 samples. $I_c$ is used as a classifier for degradation, where the dashed line separates high $I_c$ from low $I_c$. (c) Precision-recall performance of $I_c$ as a classifier for degradation, based on the rates of false negatives and false positives. (d) Accuracy of $I_c$ as a classifier for high versus low degradation. The x-axis indicates the decision boundary value of $I_c$; values above it are considered degraded and values below it are not. The optimal decision boundary is $I_c = 0.92 \times 10^5$ px·hr, shown as a dashed vertical line, where the algorithm's accuracy reaches its maximum of 96.9%.

Figure S-10c shows the PR curve of the automatic degradation detection algorithm based on the degradation decision boundary (horizontal black dashed line in Figure S-10b). The goal is to achieve high precision and high recall simultaneously. The PR-AUC (precision-recall area under the curve) figure of merit condenses the PR curve into a single number that summarizes how well $I_c$ predicts degradation. PR-AUC ranges from 0.0 to 1.0, where 1.0 represents perfect performance. The $I_c$ values computed by the autocharacterization algorithm achieve a PR-AUC of 0.853, indicating that high values of $I_c$ strongly correspond to ground truth degradation. Figure S-10d shows the effect of moving the decision boundary on degradation detection accuracy. Considering recall, precision, and accuracy, $I_c$ performs optimally, with an accuracy of 96.9%, when the decision boundary is set to $0.92 \times 10^5$ px·hr. However, accuracies above 90% are achieved over a wide range of decision boundaries, $0.7 \times 10^5 \le I_c \le 1.6 \times 10^5$ px·hr, indicating that $I_c$ is a robust predictor of degradation.
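The decision-boundary analysis can be sketched with NumPy alone: classify a sample as degraded when $I_c$ exceeds the boundary, then compute precision and recall (Equation S-5) and accuracy, sweeping the boundary as in Figure S-10d. The scores and labels below are hypothetical, and returning 1.0 when a denominator is zero is a convention we chose; libraries such as scikit-learn (`precision_recall_curve`, `average_precision_score`) can be used for the PR-AUC itself.

```python
import numpy as np


def confusion(scores, degraded, boundary):
    """TP, FP, FN for the rule 'degraded if I_c > boundary'."""
    predicted = np.asarray(scores, dtype=float) > boundary
    truth = np.asarray(degraded, dtype=bool)
    tp = int(np.sum(predicted & truth))
    fp = int(np.sum(predicted & ~truth))
    fn = int(np.sum(~predicted & truth))
    return tp, fp, fn


def precision_recall(scores, degraded, boundary):
    tp, fp, fn = confusion(scores, degraded, boundary)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 1.0     # convention when nothing is degraded
    return precision, recall


def accuracy(scores, degraded, boundary):
    predicted = np.asarray(scores, dtype=float) > boundary
    return float(np.mean(predicted == np.asarray(degraded, dtype=bool)))


# Hypothetical I_c values (units of 1e5 px*hr) and expert ground-truth labels.
scores = [0.1, 0.3, 0.5, 0.95, 1.2, 2.0, 0.8, 1.1]
degraded = [0, 0, 0, 1, 1, 1, 0, 0]

precision, recall = precision_recall(scores, degraded, boundary=0.92)
boundaries = np.linspace(0.0, 2.5, 251)
sweep = [accuracy(scores, degraded, b) for b in boundaries]
best_boundary = boundaries[int(np.argmax(sweep))]
```

With heavy class imbalance, accuracy alone can look good even for a poor classifier, which is why precision and recall are reported alongside it.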
Fig. S-11: XRD peak intensities for uniformly spaced compositions along the FA1−xMAxPbI3 series before and after degradation.
Fig. S-12: Full XPS spectra for uniformly spaced compositions along the FA1−xMAxPbI3 series.
Fig. S-13: (a) XRD 2θ values for the (012) crystallographic plane peak shift from FA-rich to MA-rich FA1−xMAxPbI3 compositions. (b) XPS intensity shift for the C=N peak from FA-rich to MA-rich FA1−xMAxPbI3 compositions. The horizontal error bars illustrate the relative widths of the XRD and XPS peaks.

Table S-1: Results readout for all perovskite samples produced in this study.

Table S-1 (continued from previous page). Columns: Sample | Computed Composition | Auto $E_g$ | Expert $E_g$
