Deep learning-based estimation of Flory–Huggins parameter of A–B block copolymers from cross-sectional images of phase-separated structures

In this study, deep learning (DL)-based estimation of the Flory–Huggins χ parameter of A–B diblock copolymers from two-dimensional cross-sectional images of three-dimensional (3D) phase-separated structures was investigated. 3D structures with random networks of phase-separated domains were generated by real-space self-consistent field simulations in the 25–40 χN range for chain lengths (N) of 20 and 40. To confirm that the prepared data can be discriminated using DL, image classification was performed using the VGG-16 network. We comprehensively investigated the performance of the trained networks on the regression problem. The generalization ability was evaluated on independent images with unlearned χN values. We found that, except for large χN values, the standard deviations of the estimated χN were approximately 0.1 and 0.5 for A-component fractions of 0.2 and 0.35, respectively. Images for larger χN values were more difficult to distinguish. In addition, the learning performance for the 4-class problem was comparable to that for the 8-class problem, except when the χN values were large. This information is useful for the analysis of real experimental image data, where the variation of samples is limited.

Table S1. Confusion matrices of the 4-class problem at 100 epochs.

Table S2 lists the confusion matrices of the 8-class problem at 100 epochs. The error rates E were 2.5 × 10⁻⁴, 4.94 × 10⁻³, 1.46 × 10⁻², and 1.09 × 10⁻² for (f, N) = (0.2, 20), (0.2, 40), (0.35, 20), and (0.35, 40), respectively. For the 8-class problem, the images for f = 0.35 were more difficult to learn than those for f = 0.2; the same tendency was observed in the 4-class problem. These behaviors of E were consistent with the learning curves presented in Fig. S2.

Table S2. Confusion matrices of the 8-class problem at 100 epochs.
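For reference, E can be computed directly from a confusion matrix. The minimal Python sketch below assumes E is the fraction of off-diagonal entries; the matrix values are invented for illustration, not taken from Tables S1 or S2.

```python
import numpy as np

def error_rate(confusion):
    """E = fraction of misclassified images: one minus the trace of the
    confusion matrix divided by the total number of images."""
    return 1.0 - np.trace(confusion) / confusion.sum()

# Illustrative 4-class matrix (rows: true class, columns: predicted);
# the counts are invented, not taken from Table S1 or S2.
cm = np.array([[998,   2,   0,   0],
               [  3, 995,   2,   0],
               [  0,   4, 994,   2],
               [  0,   0,   1, 999]])
print(f"E = {error_rate(cm):.2e}")
```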
For verification, we performed two additional independent training runs for each (f, N). When (f, N) = (0.35, 20), we encountered a learning-failure case, as shown in Table S3 (a). As can be observed in Fig. S3 (a), the failure occurred because the trained network at 100 epochs happened to have a significantly low validation accuracy. In the third training, the errors in Table S3 (b) were smaller than those of the second training in Table S3 (a). This behavior is consistent with the learning curve presented in Fig. S3 (b).

S2. Confusion matrices of image classifications obtained from ML and DL for the 4-class problem

Tables S4–S6 present the confusion matrices of image classifications for the 4-class problem using ML with an SVM on the histogram-of-brightness and HoG features, and using DL on binarized images (see the sketch after Fig. S4 below). The overall performance of each method is listed in Table 2 in the main text. From Table S4 (a) and (b), for f = 0.2, ML with an SVM on the histogram of brightness clearly failed. From Table S4 (c) and (d), for f = 0.35, ML with the histogram of brightness was able to classify small χN values. As shown in Table S5, the error for ML with the HoG features was large. These behaviors of ML were similar to those of the 8-class problem, which is discussed in the next section. When DL was applied to binarized images, many images were incorrectly classified into adjacent classes. These behaviors are also similar to those of the 8-class problem.

S3. Confusion matrices of image classifications obtained from ML and DL for the 8-class problem

Tables S7 and S8 present the corresponding confusion matrices for the 8-class problem obtained using ML, and Table S9 shows the confusion matrices for the 8-class problem obtained using DL for binarized images. The overall performance is listed in Table 3 in the main text. The behaviors of the obtained results were similar to those of the 4-class problem discussed in the previous section (S2).

S5. Learning curves of the regression problem for 50 epochs

Figure S4 shows the learning curves for 50 epochs. This run was independent of the training used for Figs. 3–5 and Tables 4 and 5 in the main text. The results were similar to those obtained for 100 epochs.

Figure S4. Learning curves of the regression problem until 50 epochs.
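To make the ML baselines of sections S2 and S3 concrete, the following sketch (not the authors' code; the histogram bin count, HoG cell and block sizes, and SVM kernel are assumptions) builds SVM classifiers on the histogram-of-brightness and HoG features using scikit-learn and scikit-image.

```python
import numpy as np
from skimage.feature import hog
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def brightness_histogram(img, bins=64):
    # Normalized brightness histogram of a grayscale image in [0, 1].
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    return h

def hog_features(img):
    # Histogram-of-oriented-gradients descriptor; the cell and block
    # sizes are illustrative, not the paper's settings.
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def train_svm(images, labels, feature=brightness_histogram):
    # images: iterable of 2-D grayscale arrays; labels: chi*N class indices.
    X = np.array([feature(im) for im in images])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, labels)
```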

S6. Cases with long learning times (epochs)
Training runs of 1000 epochs were performed to investigate the crossover into overfitting. We also examined cases of 4000 epochs that clearly showed overfitting; those results are presented in the next section (S7). Figure S9 shows the learning curves for 1000 epochs. From the discrepancy between the training and evaluation MAEs, overfitting was observed. The distributions of the estimated χN are shown in detail in Figs. S10 and S11.
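The crossover can also be located programmatically from the two MAE histories. The sketch below uses a simple patience heuristic, which is an assumption on our part rather than the authors' criterion.

```python
import numpy as np

def overfit_epoch(val_mae, patience=20):
    """Epoch of the last evaluation-MAE improvement before `patience`
    stagnant epochs -- a simple proxy for the overfitting crossover."""
    best, stale = np.inf, 0
    for epoch, v in enumerate(val_mae):
        if v < best:
            best, stale = v, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch - patience
    return None  # no crossover detected within the run

# The widening gap val_mae - train_mae after this epoch is the
# overfitting signature discussed above.
```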

S7. Behaviors of overfitted networks with much longer learning times
To clarify the behaviors in the overfitting cases, we trained the regression network for 4000 epochs. Figure S12 shows the learning curve, and Figures S13 and S14 present the distributions of the estimated χN. We observed the typical failure behavior (sharp peaks at the χN values of the teaching images) in the estimations for unlearned χN values.
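This failure mode is easy to expose: histogram the network's estimates for test images sharing one unlearned χN and look for mass at the teaching values. A minimal sketch follows (the bin width of 0.05 matches the distributions in this supplement; the function and variable names are illustrative).

```python
import numpy as np

def chi_n_distribution(predictions, lo=25.0, hi=40.0, bin_width=0.05):
    """Normalized histogram of estimated chi*N values for test images
    that share a single true (unlearned) chi*N."""
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(predictions, bins=edges)
    return counts / max(counts.sum(), 1), edges

# An overfitted network concentrates this distribution into sharp peaks
# at the teaching chi*N values instead of around the true value.
```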

S8. Transfer learning with frozen network weights of the image classification problem
Transfer learning is a widely used technique for shortening the learning time (epochs) by using already learned network weights as initial weights. For the problems in the present work, it is prudent to use the weights learned in the image classification problem when learning the regression problem. In this section, transfer learning is performed by learning only the weights of the fully connected layer block shown in Fig. 8 (b); the weights of the convolution layer blocks of the network in Fig. 8 (b) were not changed (a minimal sketch is given below). The results of transfer learning in which all weights, including those of the convolution layer blocks, were updated are provided in the next section; the behaviors were similar to those with the convolution layer weights fixed. Figure S15 shows the learning curves of the transfer learning using the weights learned in the 8-class image classification problem as initial weights. For all cases, the training and evaluation MAEs became smaller than those shown in Fig. 3. We found that transfer learning shortened the learning time. For regression problems, it is also necessary to maintain the generalization performance for unlearned χN values, which can be investigated through the probability distribution functions.

Figure S15. Learning curves of the regression problem until 100 epochs using transfer learning from the classification problem.
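A minimal sketch of this frozen-weight transfer learning, assuming a Keras/TensorFlow implementation (the paper does not specify the framework; the checkpoint path, input size, head widths, and optimizer are all illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolution blocks of a VGG-16 trained on the 8-class problem; the
# checkpoint path and input shape are hypothetical.
base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                   input_shape=(224, 224, 1))
base.load_weights("vgg16_8class_conv.h5")  # hypothetical checkpoint
base.trainable = False                     # freeze the convolution blocks

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(1),                       # single chi*N regression output
])
model.compile(optimizer="adam", loss="mae")  # MAE, as monitored in Fig. S15
# For the all-weights comparison in the next section, set
# base.trainable = True before compiling.
```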
Figures S16 and S17 present the distributions of the estimated χN for the cases with transfer learning. The obtained results clearly showed the typical behavior of overfitting (sharp peaks at the χN values of the teaching images). For comparison with transfer learning without frozen weights, we also performed transfer learning in which all weights of the network in Fig. 8 (b) were updated; in the main text, the weights of the convolution layer blocks in Fig. 8 (b) were fixed. Figure S18 shows the learning curves, and Figures S19 and S20 present the distributions of the estimated χN. The obtained behaviors (sharp peaks at the χN values of the teaching images) were similar to those with the weights of the convolution layer blocks fixed. The size of each bin in these distributions was 0.05.

S10. Regression problem for the binarized images
To confirm the effects of the interfacial density gradients on the regression problem, we investigated the performance of regression using binarized images (see the sketch below). Figure S21 shows the learning curves for the binarized images. The obtained MAEs were worse than those for the gray-scale images. Figures S22 and S23 present the distributions of the estimated χN. The averages and standard deviations for each χN are listed in Tables 6 and 7 in the main text. The distributions for the binarized images were much wider than those for the gray-scale images, and the absolute differences from the true χN values were also larger. Therefore, we concluded that the gray-scale images provide essential information for estimating χN. This suggests that χN could be evaluated accurately without DL if an arithmetic method for estimating χN from a cross-sectional image were developed. At present, no such method is known; therefore, DL is an effective tool.

Figure S21. Learning curves of the regression problem for the binarized images.
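For concreteness, one way to binarize a grayscale cross-section is shown below; the Otsu threshold is an illustrative choice on our part, and the paper's binarization criterion may differ.

```python
import numpy as np
from skimage.filters import threshold_otsu

def binarize(img):
    """Binarize a grayscale cross-section, discarding the interfacial
    density gradient that the gray-scale images retain."""
    t = threshold_otsu(img)          # illustrative threshold choice
    return (img > t).astype(np.uint8)
```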

S11. SCF calculation
In the real-space SCF theory [52,53] for an A–B BCP, Gaussian statistics for the chain conformations are assumed owing to the screening effect in melts. Thus, a K-type (K = A or B) segment is characterized by the effective bond length bK, and the K-type block is characterized by the number of segments NK.
The local segment density φK(r) of the K-type (K = A or B) segment was obtained by solving a set of self-consistent equations using an iterative refinement method. The self-consistent external potential VK(r) imposed on the K-type segments is decomposed into two contributions: the direct interaction potential imposed by the nearest-neighbor segments and the constraining potential imposed by the segment density profile φK(r). Following references [52,53], VK(r) is

$$ V_K(\mathbf{r}) = \sum_{K'} \epsilon_{K,K'}\, \varphi_{K'}(\mathbf{r}) - \mu_K(\mathbf{r}), $$

where µK(r) is the chemical potential of the K-type segment. The first term represents the interaction energy between segments, where ε_{K,K′} is the nearest-neighbor pair interaction energy between a K-type segment and a K′-type segment, which is related to the Flory–Huggins interaction parameter via

$$ \chi_{K,K'} = z\beta \left[ \epsilon_{K,K'} - \tfrac{1}{2}\left( \epsilon_{K,K} + \epsilon_{K',K'} \right) \right], $$

where z is the number of nearest-neighbor sites and β = 1/k_BT. µK(r) can be regarded as the Lagrange multiplier that fixes the density of the K-type segments at position r to φK(r).
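As a numerical illustration of this χ–ε relation (the coordination number and pair energies below are invented, with k_BT = 1):

```python
def chi(z, eps_ab, eps_aa, eps_bb, kT=1.0):
    """Flory-Huggins parameter from nearest-neighbor pair energies:
    chi = z * beta * [eps_AB - (eps_AA + eps_BB) / 2], beta = 1/kT."""
    return (z / kT) * (eps_ab - 0.5 * (eps_aa + eps_bb))

# A simple cubic lattice (z = 6) with eps_AB = 0.25 kT and
# eps_AA = eps_BB = 0 gives chi = 1.5, i.e. chi*N = 30 for N = 20,
# inside the 25-40 chi*N range studied here.
print(chi(6, 0.25, 0.0, 0.0) * 20)  # -> 30.0
```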
The statistical weight of a K-type sub-chain from the i-th segment to the j-th segment is expressed as the path integral

$$ Q_K(i, \mathbf{r}_i; j, \mathbf{r}_j) = C_K \int_{\mathbf{r}(i)=\mathbf{r}_i}^{\mathbf{r}(j)=\mathbf{r}_j} \mathcal{D}\mathbf{r}(s)\, \exp\left\{ -\int_i^j \mathrm{d}s \left[ \frac{3}{2 b_K^2} \left( \frac{\partial \mathbf{r}(s)}{\partial s} \right)^{\!2} + \beta V_K(\mathbf{r}(s)) \right] \right\}, $$

where r_i and r_j denote the positions of the i-th and j-th segments, respectively, and CK denotes the normalization coefficient. For a small βVK(rj), the statistical weight is governed by the following Edwards equation:

$$ \frac{\partial}{\partial j} Q_K(i, \mathbf{r}_i; j, \mathbf{r}_j) = \left[ \frac{b_K^2}{6} \nabla_{\mathbf{r}_j}^2 - \beta V_K(\mathbf{r}_j) \right] Q_K(i, \mathbf{r}_i; j, \mathbf{r}_j). $$

Here, N = NA + NB is the total number of segments in a chain, and the block ratios are f = NA/N and fB = 1 − f, respectively.
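For illustration, the Edwards equation can be integrated forward in the segment index by standard operator splitting. The one-dimensional sketch below is not the authors' solver; the periodic grid, step sizes, and pseudo-spectral scheme are assumptions.

```python
import numpy as np

# Integrate the Edwards equation for Q_K forward in the segment index j.
# Q0 and V are arrays on a periodic grid; ds is the contour step, dx the
# grid spacing, b the effective bond length, beta = 1/kT.
def propagate(Q0, V, b=1.0, beta=1.0, ds=0.1, n_steps=200, dx=0.5):
    Q = Q0.astype(float).copy()
    k2 = (2.0 * np.pi * np.fft.fftfreq(Q.size, d=dx)) ** 2
    diffusion = np.exp(-ds * (b ** 2 / 6.0) * k2)  # (b_K^2/6) Laplacian term
    half_pot = np.exp(-0.5 * ds * beta * V)        # beta*V_K(r) term
    for _ in range(n_steps):
        Q *= half_pot
        Q = np.fft.ifft(diffusion * np.fft.fft(Q)).real
        Q *= half_pot
    return Q

# Densities phi_K(r) then follow from products of forward and backward
# statistical weights, closing the self-consistent loop with V_K(r).
```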