Data-driven time-dependent state estimation for interfacial fluid mechanics in evaporating droplets

Droplet evaporation plays crucial roles in biodiagnostics, microfabrication, and inkjet printing. Experimentally studying the evolution of a sessile droplet consisting of two or more components needs sophisticated equipment to control the vast parameter space affecting the physical process. On the other hand, the non-axisymmetric nature of the problem, attributed to compositional perturbations, introduces challenges to numerical methods. In this work, droplet evaporation problem is studied from a new perspective. We analyze a sessile methanol droplet evolution through data-driven classification and regression techniques. The models are trained using experimental data of methanol droplet evolution under various environmental humidity levels and substrate temperatures. At higher humidity levels, the interfacial tension and subsequently contact angle increase due to higher water uptake into droplet. Therefore, different regimes of evolution are observed due to adsorption–absorption and possible condensation of water which turns the droplet from a single component into a binary system. In this work, machine learning and data-driven techniques are utilized to estimate the regime of droplet evaporation, the time evolution of droplet base diameter and contact angle, and level of surrounding humidity. Droplet regime is estimated by classification algorithms through point-by-point analysis of droplet profile. Decision tree demonstrates a better performance compared to Naïve Bayes (NB) classifier. Additionally, the level of surrounding humidity, as well as the time evolution of droplet base diameter and contact angle, are estimated by regression algorithms. The estimation results show promising performance for four cases of methanol droplet evolution under conditions unseen by the model, demonstrating the model’s capability to capture the complex physics underlying binary droplet evolution.

water loading. The combined influence of ambient temperature and relative humidity on early stages (i.e. pinned contact angle) of ethanol droplet was also examined 35 . In another study 36 the water loaded onto ethanol droplet was quantified by gas injection chromatography (GIC) under controlled ambient temperature and relative humidity. The observed reduction in ethanol concentration was attributed to both ethanol evaporation as well as water intake on the drop. They concluded that at low relative humidity, the main mechanism for water intake was that of adsorption-absorption, though at high relative humidity water condensation plays a more dominant role. While the changes in relative humidity are commonly imposed environmental conditions, controlling ambient temperature to tune the effect of humidity is rather an expensive method for practical applications. The effects of relative humidity on methanol droplet evaporation was regulated by adjusting the temperature of the substrate 37,38 which is less expensive compared to controlling ambient temperature. It was concluded that increasing substrate temperature maintains the liquid-gas interface temperature above the dew point which in turn limits the water condensation on the drop. A regime map was also proposed based on droplet evolution under various environmental conditions.
As the number of components in the droplet increases, the underlying physics becomes more complex. Recently, multi-component droplet evaporation revealed new phenomena such as spontaneous nucleation of oil microdroplets, phase transition, and multi-component diffusion [39][40][41][42] . Such intricate physics with numerous parameters in play makes experimental studies sophisticated and time-consuming while requiring advanced equipment to finely control the environmental condition. On the other hand, the highly non-axisymmetric nature of the problem due to compositional inhomogeneities brings up significant challenges to numerical models.
Machine learning methods have emerged as powerful tools for analyzing a wide range of fluid mechanics problems 43,44 such as turbulence [45][46][47][48][49] , phase transition 50 , ignition 51 , vortex vibrations 52,53 , and aerodynamics disturbances 54 . Image processing and pattern recognition techniques have been employed to analyze the remaining stains after evaporation of sessile droplets. Analysis of patterns in dried drops of biological fluids revealed a lot of information for medical diagnostics [55][56][57][58][59] . Two different studies 60,61 showed that the chemistry of fluid and substrate can be identified by recognition of patterns in the stains. The aforementioned studies utilized machine learning techniques to indirectly measure and detect the mechanisms inside droplets from the footprint after evaporation. Here, we introduce a data-driven approach to directly analyze the dynamics of binary droplet evaporation (induced by transfer of a second component present in the atmosphere).
In the present study, we use data-driven classification and regression algorithms to analyze the time-dependent behavior of a methanol droplet at different levels of environmental humidity and temperature of the substrate. Water uptake into droplet through adsorption-absorption and possibly condensation turns methanol droplet into a binary system. Based on the environmental condition, a droplet evolves in different regimes: evaporationdominated, transition, or condensation-dominated. The capability of the proposed model is evaluated by estimating four different parameters, namely: regime of droplet evaporation (through classification algorithms), level of surrounding humidity (through regression algorithms), time evolution of droplet base diameter (through regression algorithms), and time evolution of droplet contact angle (through regression algorithms). First, a classification algorithm is trained to estimate the regime of droplet evaporation through analysis of diameter and contact angle evolution over time. The objective of the model is to detect the regime of droplet evolution with even a single data point at a specific time. Second, a regression algorithm is utilized to detect the humidity of the surrounding by analyzing droplet evolution. The high hygroscopic nature of methanol allows greater amount of water uptake into the droplet in humid environments. The higher content of water in droplet increases the contact angle and alters the rate of change of volume. The regression model analyzes these changes and reversely estimates the humidity. Last, given the condition of the surrounding, the continuous evolution of macroscopic parameters of droplet, i.e., diameter and contact angle, is estimated. Our method shows great potential in opening up new paths to analyze more complicated multi-component droplet evolution and interfacial fluid mechanics in general. Estimating the evolution of different parameters of droplet is a crucial step in designing high quality and high-resolution finish products in droplet-based biodiagnostics, inkjet printing, and microfabrication technologies. Our proposed method provides necessary information on evaporation of organic liquid droplets under various environmental conditions with simple and easy-to-use algorithms without the need to perform complicated simulations. We show that the regime of droplet, relative humidity of surrounding, and time evolution of diameter and contact angle can be estimated under various conditions. In real-world applications anticipating the regime of droplet is of great importance as in many instances, the occurrence of one regime or the other should be avoided. For example, a droplet sitting on a surface forever is not ideal for high resolution of printing or biosensing.

Results
Physics of droplet evaporation. Droplet evaporation is influenced by numerous factors including liquid/substrate properties as well as environmental conditions [25][26][27][28]35,36,62 . We analyze the evolution of a sessile methanol droplet through macroscopic parameters: volume, V, diameter, D, contact angle, θ , and time, t, under controlled relative humidity of surrounding, RH, and substrate temperature, T (see Fig. 1a, inset). The variables are nondimensionalized as: t * = t/t f , V * = V /V 0 , D * = D/D 0 , θ * = θ/θ 0 , T * = T/T 0 , RH * = RH/RH 0 , where θ 0 = 90 • , T 0 = 35 • C, RH 0 = 100% , and t f , V 0 , D 0 , that are experimentally measured, stand for total evaporation time, initial volume, and initial diameter, respectively. The experiments are conducted in a chamber with controlled humidity and on a substrate with controlled temperature (Fig. 1a). Details of experimental procedure are given in "Methods" section. Three regimes of droplet evolution are observed under various relative humidity of the surrounding and substrate temperature (Fig. 1b, top left) namely: evaporation-dominated, transition, and condensation-dominated. Three sub figures represent the evolutions of θ * , V * , and D * over t * . Nondimensional plots are reported to better visualize different evolution patterns. www.nature.com/scientificreports/ At low relative humidity (the green-shaded region in Fig. 1b), change in substrate temperature does not alter the qualitative evolution of droplet. In this regime, the contact angle stays constant for most of droplet lifetime followed by a slight increase and a sharp decrease towards the end (Fig. 1b, bottom left). The modest rise in contact angle is attributed to the interplay of high evaporation rate of methanol and receding speed at the triple line 31 . Diameter and volume monotonically decrease during droplet lifespan.
Due to the high hygroscopic nature of methanol, at higher relative humidity, water vapor transfers into the droplet at the liquid-gas interface. Water adsorbing-absorbing and possibly condensing on the interface is reported in previous studies 30,[33][34][35][36][37] . The growth in the concentration of water content changes the interfacial tensions and results in higher contact angle 34,36,37 . Unlike low relative humidity, substrate temperature plays a determining role in the regime of droplet evolution at high relative humidity of the surrounding. In the transition regime (red-shaded region), contact angle rises to a maximum value before gradually decreasing towards the end of droplet lifetime. Increasing contact angle demonstrates water uptake into droplet while methanol is evaporating. At the point of maximum contact angle, most of methanol has already evaporated and droplet consists mainly of water. However, some studies revealed that a small amount of residual methanol remains until the end of droplet lifetime 31,32 . Even though diameter and volume decrease monotonically, two obvious slopes are observed in their evolutions (Fig. 1b, top right). The two slopes correspond to two stages: the initial stage when merely methanol evaporates and the second stage when water mainly evaporates at a slower rate.
When the humidity of the environment is high and the substrate temperature is sufficiently low, another regime is observed. In condensation-dominated (blue-shaded) regime, contact angle monotonically increases until it reaches a plateau. Both diameter and volume converge to a non-zero value. Lower substrate temperature enhances water uptake through condensation by dropping the liquid-gas interface temperature below that of dew point 37 . In this regime, droplet comes to a quasi-steady state with a remaining droplet consisting mainly of water 31,32,34,36 . Regime classification. We have used a classification algorithm to detect the regime of droplet evaporation as sketched in Fig. 1b. The classifier is trained with data on contact and diameter at each specific point in time and then classifies the regime of droplet evaporation. Dependence of variables is shown by the correlation matrix in Fig. 2a where RG stands for the regime of droplet evaporation. Diameter and volume are coupled for a spherical cap sessile droplet through the relation V = (π/3)(D/2) 3 (2 + cos θ)(1 − cos θ) 2 which assumes slow quasi-static evaporation. t * , D * , and θ * are used as input variables and RG is the target variable. It is observed that the contact angle is highly proportional to humidity because the higher the humidity, the higher the amount of water uptake into drop. Higher water content increases the interfacial tension at the triple line which results in higher contact angles. www.nature.com/scientificreports/ The framework for detection of the regime of droplet evaporation characterizes the behavioral pattern of droplet by learning the values of contact angle and diameter at each specific point in time and classifying them to each regime. The model then labels the test and validation sets based on similar evolution observed previously during training. The ratio of training to test set is 80-20%. Two classifiers of Naïve Bayes (NB) and decision tree (DT) are trained and confusion matrices are used to compare the performance of classifiers on the test set (Fig. 2b). Precision, recall, F-score, and overall accuracy values (shown in Table 1) provide a comprehensive evaluation of the performance of each classifier on the test set. Based on the results shown in the Fig. 2b and Table 1, DT outperforms NB for all regimes of the test set. It is also observed that the detection of the condensation-dominated regime is challenging for both classifiers. Detection of evaporation-dominated regime reaches an F score value of 0.97 with decision tree classifier. For NB, around half (43%) of the points in the condensation-dominated regime are classified as transition regime (see Fig. 2b). Replacing diameter with volume slightly improves the results for both classifiers (6% on average) for this regime. This is due to a more discernible evolution of V * compared to D * towards the end of droplet lifetime for this regime. However, since measuring diameter is a more direct approach and also more convenient for the user, the model is trained with diameter.
The capability of the model to detect the regime of droplet evaporation under each specific condition (RH and T) is evaluated through a validation set. The validation step is performed on a single experiment at each time that is held out during training/testing. Validation results, averaged for each condition, are presented in Table 2. The accuracy and recall are the same for validation because at each condition there is only one true regime. Figure 2c illustrates a sample of validation for RH = 80% and T = 35 • C. The classifier assigns a region for each point in time based on the value of contact angle and diameter. The true regime for this condition is the transition regime (red). The red regions on the two plots on the right side of Fig. 2c represents the regions that are classified correctly as transition regime and the green regions show the regions that are incorrectly classified as the evaporation-dominated regime. It is observed that both classifiers correctly detect the region of the majority of the data points although DT demonstrates better performance. NB struggles at the beginning and end of droplet lifetime. This issue is less pronounced for DT.  www.nature.com/scientificreports/ The capability of the model to estimate the regime of droplet is evaluated on an estimation data set from conditions that are unseen by the model and do not contribute to the model training, testing, and validation. The values of RH and T for these conditions are randomly selected in the range of 20%< RH <80% and 15 • C < T < 35 • C (shown with cross and star marks on the regime map of Fig. 1b). The model classifies each experiment under each regime with different ratios as shown in Table 3. For example, Experiment 1 with RH of 30% and T of 15 • C is close to the boundary of evaporation-dominated and transition regimes. With NB, 54% of the data points in this experiment are classified as evaporation-dominated regime and 46% as transition regime. This is expected due to the location of Experiment 1 on the regime map. It should be noted that the lines on the regime map are approximate boundaries. Figure 2d demonstrates regime estimation of all data points of experiment 4 with both classifiers. The shown evolution of contact angle is not similar to any of the evolutions shown in Fig. 1b subfigures. In fact, contact angle decreases at the beginning and then starts rising. Experiment #4 falls in transition regime. That is the reason that the profile for experimental data in Fig. 2d is colored in red for the true regime of experiment #4. As it is seen, NB and DT correctly classify 84 and 93% of the data points in experiment 4 to transition regime. The green regions represent the data points that are incorrectly classified as evaporation-dominated regime.
Relative humidity estimation. In this section, we show the ability of the model to detect environmental humidity by analyzing the evolution of contact angle and diameter through regression algorithms. Polynomial regression with four different orders (linear, quadratic, third-order, and fourth-order) and regression tree are used for training. The coefficient of determination ( R 2 ) increases from 0.66 for linear up to 0.93 for fourth-order polynomial regression. The test results for all five regression methods are shown in Fig. 3a. The horizontal axis shows the true value of RH * and the vertical axis shows the estimated values averaged over all the points for each RH * . As it can be seen, the higher the order of polynomial regression, the closer the average estimation to the ground-truth value and the smaller the error bar. Furthermore, regression tree performs more accurately compared to all polynomial regression methods. Model performance under each specific condition through validation set is shown in Fig. 2b. The consistent colors throughout Fig. 3a-c represent different regression methods and different markers are used for each substrate temperature in Fig. 3b. Based on the results shown in Fig. 3b, all methods except linear regression produce reasonably accurate results. The validation results get closer to actual values as the order of the polynomial regression increases. Nonetheless, it must be noted that higher order polynomial regression increases the computational cost as well as the chance of over-fitting. Performance of regression tree is comparable to third-order and fourth-order polynomial. The capability of different regression methods to estimate new humidity values by analyzing the time evolution of contact angle and diameter is presented in Fig. 3c. The new relative humidity values (30,33,65, and 75%) are randomly selected in the range of 20-80%. It is noteworthy that the model has not seen any data of droplet evolution under these RH values during training, testing, or validation. It is seen that, unlike testing and validation where increasing the order of polynomial or complexity of the model (i.e., regression tree) produces more accurate results, higher order polynomials do not result in better estimation of unseen conditions. As a matter of fact, linear, quadratic, and third-order polynomials estimate more accurately. This is a common issue when the model fits the training data very well and it negatively affects the model performance on the new data set. Figure 3c clearly indicates overfitting with fourth-order polynomial.  www.nature.com/scientificreports/ Diameter and contact angle estimation. In this section, the capability of the model to estimate the continuous evolution of contact angle and diameter over time is evaluated through regression algorithms. This means that the model estimates the evolution of diameter and contact angle at each time increment (approximately one second apart). The input variables include T * , RH * , t * , and θ * (or D * ) and the target variable is D * (or θ * ). By increasing the order of the polynomial, the coefficient of determination, R 2 , for training improves from 0.87 to 0.99, and from 0.78 to 0.96 for diameter and contact angle estimation, respectively. The performance of five different regression methods on the test set is presented in Fig. 4a. The first row represents the results when D * is the target variable and the second row illustrates the results for θ * as the target variable. The closer the distribution of data to the diagonal line in these plots, the better the performance of the model on the test set. Based on the results shown in Fig. 4a, the diameter test results saturate after third order polynomial while for contact angle the performance keeps improving when increasing the degree of polynomial from third to fourth. The validation results are summarized in Fig. 4b for D * (top) and θ * (bottom) in terms of R 2 and root mean square error (rmse). With D * being the target variable, an average R 2 of 0.8 or higher and average rmse less than 0.1 are achieved for all nine conditions. Going from linear to quadratic to third order polynomial increases and decreases the value of R 2 and rmse, respectively. The profiles of R 2 and rmse exhibit saturation, and further increase in the order of the polynomial does not improve model performance on validation data. This is consistent with the test results where model performance saturates at third order. Furthermore, regression tree demonstrates accuracy comparable to third-order and fourth-order polynomials. By comparing the range of axes in Fig. 4b-top with bottom, it is obvious that R 2 values are generally lower (hardly reaching 0.7) and error is higher when θ * is the target variable. In fact, there are a few instances where the average R 2 turns negative, suggesting that the overall estimation of the model is worse than an estimation with a constant average value.
The performance of the model on estimating the evolution of θ * and D * versus time under four new conditions that did not contribute to model training, testing, or validation is shown in Fig. 4c. One value of R 2 and rmse are reported for each condition (or experiment) which shows the overall quality of the fit. Higher coefficients of determination and lower rmse values demonstrate the better performance of the model in estimating D * then θ * evolution. Based on the results shown in Fig. 4c, third order polynomial regression has the best performance in estimating diameter. The results become less accurate with fourth-order polynomial which suggests over-fitting. It is interesting to note that regression tree, which had higher accuracy during testing and validation, is outperformed even by linear regression during estimation. The evolution of diameter versus time estimated by quadratic regression for Experiment 3 is depicted in Fig. 4d. As it can be seen, even with a quadratic regression, the model estimates the evolution of D * quite accurately for an unseen condition. Considering the range of values on axes of Fig. 4c-top and bottom, estimating the evolution of contact angle is more challenging for the model. Unlike estimating diameter, increasing the order of polynomials has a negligible effect. The accuracy of the model stays almost constant for linear, quadratic, and third order. However, it worsens drastically for fourth-order polynomial due to over-fitting the data. The R 2 and rmse values for contact angle estimation with fourth-order polynomial fall outside the range shown in the plot. Since the estimation of θ * is generally more challenging for the model, the effect of over-fitting is more noticeable compared to D * estimation. The overall better performance of the model for diameter estimation compared to contact angle estimation is due to the fact that diameter evolution is relatively smooth and therefore easier to estimate whereas contact angle evolution changes substantially under different conditions. Figure 4d-bottom illustrates contact angle evolution over time estimated with third-order polynomial for Experiment 1 which is in the evaporation-dominated regime. Although the maximum value of contact angle is underestimated with 3rd order polynomial, the time corresponding to the maximum contact angle is estimated accurately. Also, there is a good agreement between the estimated and actual contact angle evolution during most of droplet lifetime with R 2 value of 0.74 and rmse of 0.05.

Discussion
In this study, we have analyzed the complex physics of sessile droplet evaporation (which starts with a single component and turns into a binary system due to the transfer of a second component i.e., water) through machine learning, classification and regression algorithms. Four different parameters pertaining to droplet evaporation, namely: regime of droplet evaporation, level of surrounding humidity, time evolution of droplet base diameter, and time evolution of droplet contact angle, are estimated. Point-by-point analysis of droplet profile enables time-dependent estimations given a limited number of data points. This means that the model does not need the entire evolution profile of a droplet to make an estimation. Instead, only a few (or even a single) data points are (is) sufficient for estimation, although more data points result in more accurate estimation. The model estimation capability is then assessed on the data that do not contribute to training, testing, or validation. Two different classifiers are utilized to estimate the regime of droplet evaporation. NB is chosen as a simple easy-to-interpret algorithm, while DT serves as a more powerful algorithm. As expected, DT outperforms NB due to a more robust internal structure at the expense of computational cost and transparency. Both classifiers showed impressive performance (minimum 75% accuracy) on estimating the regime of droplet. Knowledge on droplet evaporation regime is necessary for compatible designs in numerous industries such as droplet-based biosensors or ink-jet printing.
Additionally, level of surrounding humidity, time evolution of droplet base diameter, and time evolution of droplet contact angle are estimated through regression techniques. Polynomial regressors, as well as regression www.nature.com/scientificreports/ tree, are trained through point-by-point analysis of droplet evolution. The model performance improved by increasing the order of the polynomial and using regression tree for training, test, and validation sets. However, when estimating the new conditions unseen by the model, fourth-order polynomial and regression tree suffered from data over-fitting. The best performance of the model is achieved by third-order polynomial. In general, the model estimation results are more accurate when estimating diameter evolution compared to contact angle estimation. This is due to smoother, hence easier to estimate, evolution of diameter with time. The sharp changes in θ * under different conditions make the estimation of its evolution challenging for the model. Information of this type is of great importance for technologies such as ink-jet printing, or droplet-based biodiagnostics where the estimations can provide critical information on the base diameter or contact angle of the droplet at a specific time.
In the current work, behavior of a sessile droplet transitioning from a single fluid into a binary mixture following transfer of a second component from the atmosphere is studied through data-driven techniques. The model demonstrated promising performance detecting the regime of droplet evolution, the humidity level surrounding droplet, and time evolution of diameter and contact angle. The current case study demonstrates the capability of the proposed model to analyze complex interfacial fluid mechanics problems through machine learning algorithms. Although we have estimated four parameters of droplet evaporation by this model, these kinds of techniques can be expanded to perform a wide range of estimations. Our preliminary study opens up new ways to study binary or multi-component droplet evolution which might lead to better analyzing the complex physics of the problem.

Methods
Experimental setup and procedures. The experiments are carried out in a chamber (with dimensions 127 × 127 × 76 mm 3 ) of drop shape analyzer (DSA 100) from KRÜSS. The relative humidity inside the chamber is controlled and the temperature is kept at room temperature (i.e., 23 • C ). The bottom side of the chamber is equipped with a Peltier plate (electrical system and a temperature bath) to control the substrate temperature. The top of the chamber has a small hole for passing the syringe. Both sides of the chamber have transparent windows for visualization purposes. An LED light is used for illumination and a CCD camera is utilized to capture the time evolution of droplet profile (Fig. 1a). The whole setup is mounted on an anti-vibration table to eliminate environmental disturbances.
Methanol is purchased from Fisher Scientific with a purity of 99.8%. The glass substrates are coated with a very thin PDMS (polydimethylsiloxane) layer to achieve spherical and reproducible droplets with measurable contact angles. In order to ensure that methanol does not interact with the PDMS coating, multiple methanol droplets are successively deposited at the same location on the substrate and let evaporate. No change is observed in the initial contact angle of droplets or in their evolution during evaporation. The relative humidity inside the chamber and the temperature of the substrate are set to desired values and enough time is passed to make sure quasi-steady state is achieved. Three values of relative humidity: 20%, 50%, and 80% alongside three values of substrate temperature: 15 • C , 23 • C (room temperature), and 35 • C are tested. A drop of methanol is gently deposited on the glass substrate. Droplet volume is under 5 µ l in order to keep droplet size below capillary length. The corresponding volume to the capillary length for methanol droplet is around 45 µ l. The evaporation process of methanol droplet is recorded by the CCD camera at 50 frames per second. All experiments under each relative humidity and substrate temperature condition are repeated five to ten times to ensure the reproducibility of the data. A KRÜ SS Drop Shape Analyzer software is utilized to measure the time evolution of contact angle ( θ ), base diameter (D), and volume (V). Due to the observed differences between the right and left contact angle values, elliptical fit is used for contact angle and volume measurements. For better analysis of droplet behavior, all dimensional parameters are nondimensionalized as followed: D * = D/D 0 , V * = V /V 0 , t * = t/t f ; where D 0 , V 0 , and t f are initial diameter, initial volume, and total evaporation time, respectively. Data acquisition. The data for the model is generated by experiments of methanol droplet evaporation under various environmental conditions. Nine different conditions are created by a combination of three levels of surrounding relative humidity: 20%, 50%, and 80% with three substrate temperatures: 15 • C, 23 • C, and 35 • C. For each of these nine conditions, the relative humidity in the chamber and substrate temperature is set to the desired values and enough time is passed to ensure quasi-steady state. Then a droplet of methanol is gently deposited on the substrate and the evolution of droplet base diameter, contact angle, and volume is recorded over time with a CCD camera of Drop Shape Analyzer (see Fig. 1a inset). Each point in time with corresponding contact angle, diameter, and volume is considered a data point for the model. Data partitioning and processing. The data generated under nine different conditions mentioned in the previous section are used to train, test, and validate the model. The droplet evaporation experiments for each of these nine conditions are repeated five to ten times to make sure that the results are consistent and reproducible. We carried out 60 droplet evaporation experiments under these nine conditions which created a total number of 10,850 data points that are used for training, testing and validating the model. There are additional 761 data points that are used to assess the performance of the model at the final step. The environmental conditions under which these data points are generated are completely different than the other 10,850 data points used for training, testing, and validating the model. Validation. Once the model is trained and tested, it is then examined on the validation set that was held out at the beginning of training. The validation set that is kept out, is the data pertaining to one whole experiment out of 60 experiments. This means that the model has not seen the data for the specific experiment in the validation set. Validation is performed to specifically evaluate the performance of the model under each specific condition of surrounding humidity and substrate temperature. This procedure is repeated 60 times until each experiment is held out once and validated. Validation results are averaged over all experiments under each specific RH and T condition.
State estimation. There are additional 761 data points, called estimation set, that are generated by four experiments of methanol droplet evaporation. It must be noted that the machine has not seen any data of droplet evaporation under these new conditions. The data in the estimation set does not contribute to the framework during training, testing, or validation. The surrounding humidity and substrate temperature for these new conditions are randomly selected to be in the range within which the model is trained (i.e., 20% < RH < 80% and 15 • C< T < 35 • C).
Performance criteria. The performance criteria of the model are reported by standard metrics. For classification, the confusion matrix of the model is used as well as the precision, recall, F score, and overall accuracy values for each regime. For validation, the accuracy values are the same as recall values for each combination of RH and T because there is only one true regime for each validation set. For regression methods, coefficient of determination ( R 2 ) is reported to show how well the model fits the data in the training set. For RH testing, validation, and estimation, the actual RH of the environment is compared against the estimated value of RH. When D * (or θ * ) is the target variable, the test results are demonstrated as estimated values versus actual values. The validation and estimation results are reported by R 2 that shows the quality of the fit; the proportion of variance in the target variable which is predictable from the input variables; and root mean squared error (rmse) which represents how much the estimation is off on average when estimating the average target variable. We have also tested the model convergence for both classification and regression algorithms. We trained the model by considering only fractions of the available data i.e., 80%, 70%, 60%, and 50%, and observed that the results for the test set remain unchanged. This proves the fact that the number of data points used for machine learning of this specific problem is converged.

Classifiers.
We have used Naïve Bayes Classifier 64-67 as a simple and easy-to-interpret algorithm. Since the algorithm is simple, there is less chance for over-fitting the data, it is faster and needs a smaller memory footprint. However, the restrictive underlying assumptions compromise its accuracy for real case scenarios when the variables are not fully independent of each other. Bagged Decision Tree 68-70 with 250 trees is also used. It is a powerful classifier with built-in support for crossvalidation and a specialized function to measure feature importance. However, it results in complex models that are not very transparent. It is often hard to understand how it makes estimations.