Introduction

Multivariate analysis is a powerful tool that can transform complex information into more useful sets of information, and that can extract vital differences from data that may look similar by other methods. Application of multivariate analysis to metabolite evaluations, so-called ‘metabolomics’, is well known. Principal component analysis (PCA) and partial least squares regression (PLSR), developed by Kowalski1, 2 and Wold,3 have been successfully applied to many applications such as the study of the impact of stress conditions on the plant metabolome,4 the evaluation of neurological disease progression5 and studies on toxicological mechanisms.6 An interesting application is the quality classification of Japanese green tea7 or curative plants8 through 1H nuclear magnetic resonance (NMR) spectra of their extracts. Multivariate analysis has also been applied to characterization of polymer materials. PCA and PLSR were utilized for the discrimination of ethylene-vinyl acetate copolymers with different compositions and the prediction of the content of vinyl acetate in the copolymers by infrared emission,9 Raman10 or near-infrared11 spectroscopy. PCA was also applied to analyze the Fourier transform infrared spectroscopy and differential scanning calorimetry data of various types of polyethylenes to classify chain-branching types, and chain-branching content and distribution.12

Recently, we have reported the PCA and PLSR of 13C NMR spectra of methyl methacrylate (MMA) and tert-butyl methacrylate (TBMA) copolymers, homopolymers and different blends of two methacrylates (poly MMA (PMMA) and poly TBMA (PTBMA)).13 The first principal component (PC1) scores contained most of the information of the chemical compositions of polymers, and PLSR (without assigning resonance peaks) predicted the chemical compositions accurately and precisely. The second principal component (PC2) scores reflected the heterogeneity of comonomer sequences. In that case, PCA and PLSR were performed on the spectrum regions of the carbonyl and quaternary backbone carbons. Maximization of the spectral information for entire spectrum regions requires further examination.

In this paper, the shapes of PCA score plots in each spectral region were compared to determine suitable spectral regions that maximized the complete spectral information. The relationship between PC1 scores and the chemical compositions of the samples, as well as the influence on the precision and accuracy of PLSR from the difference of the spectrum regions set, were examined. Furthermore, we focused on quantitative analysis of the heterogeneity of comonomer sequences of MMA-TBMA copolymers. The fractions of comonomer sequences at dyad and triad levels for two blends of copolymers and seven copolymers obtained at the end stage of polymerization by PLSR were determined.

Experimental procedure

Materials

MMA and TBMA (supplied by Mitsubishi Rayon, Tokyo, Japan) were purified by distillation under reduced pressure. 2,2′-Azobisisobutyronitrile (Wako Pure Chemical Industries, Osaka, Japan) was recrystallized from methanol. Ethyl lactate and methanol (Kishida Chemical, Osaka, Japan) were used without purification.

Polymerization

A mixture of the monomer (20 wt%) and 2,2′-azobisisobutyronitrile (0.5 mol% of the monomer mixture) were dissolved in ethyl lactate. Polymerization was carried out at 80 °C under a nitrogen atmosphere. After 3–7 min, the polymerization mixture was cooled to room temperature and poured into a large volume of a methanol/water mixture (3/7 vol/vol). The polymer precipitated and was collected by filtration and dried over night at 60 °C in vacuo. Table 1 summarizes the feed ratio and chemical composition (mol% in TBMA units), Mn, Mw/Mn and the yield of the polymer samples together with those of the homopolymers and the copolymers obtained at higher yields.13

Table 1 Preparation of PMMA, PTBMA and poly(MMA-co-TBMA)sa

Measurements and procedures

The homopolymers, copolymers and homopolymer blends were dissolved in chloroform-d (8% wt/vol). 1H and 13C NMR spectra of the sample solutions were measured at 55 °C on a JNM-ECX400 spectrometer (JEOL, Tokyo, Japan) equipped with a 10 mm multinuclear probe (1H: 45° pulse (8.5 μs), pulse repetition 8.90 s, 16 scans; 13C: 45° pulse (7.5 μs), pulse repetition 2.73 s, 10000 scans, with 1H broadband decoupling). Chemical composition was determined from the integral intensities of 1H NMR signals of the ester groups of MMA and TBMA units.

Each 13C NMR spectrum was stored into 32 768 complex data points covering a spectral width 31250 Hz, and zero-filled to 13 1072 points before Fourier transformation. An exponential apodization function was applied to free induction decays, which corresponded to a line-broadening factor of 3.0 Hz. The 13C NMR chemical shift of chloroform-d was set to 77.0 p.p.m. as an internal standard. Bucket integration at an interval of 0.25 p.p.m. was performed with the JEOL Alice2 ver.5 for metabolome ver.1.6 software for the following six spectral regions; 15.1–23.1 p.p.m. (α-methyl carbons), 26.0–29.0 p.p.m. (methyl carbons of tert-butyl groups in TBMA units), 44.1–48.1 p.p.m. (backbone quaternary carbons), 48.2–58.2 p.p.m. (ester methyl carbons of MMA units and backbone methylene carbons), 79.5–83.0 p.p.m. (quaternary carbons of tert-butyl groups in TBMA units) and 175–179 p.p.m. (carbonyl carbons). The sum of integral intensities in each resonance region was normalized to 100. The average integral intensity was subtracted from each integral intensity. The data set therefore consisted of mean-centered bucket integral values. PCA and PLSR of the data sets were conducted using the Sirius ver. 7.0 software (Pattern Recognition Systems, Bergen, Norway).

Results and Discussion

PCA

Figure 1 shows a 13C NMR spectrum of poly(MMA-co-TBMA) with 53.8 mol% in TBMA units (M-54 in Table 1). The spectra of α-methyl, backbone quaternary and carbonyl carbons of the copolymer were broad and complicated because of configurational and comonomer sequences. Therefore, the assignment of the individual peaks is troublesome.

Figure 1
figure 1

100 MHz 13C NMR spectrum of poly(MMA-co-TBMA) with 55.5 mol% in TBMA units (L-56),as measured in chloroform-d at 55 °C. (a) Whole, (b) carbonyl carbon, (c) backbone quaternary carbon and (d) α-methyl carbon regions.

The individual PCA for 27 samples (2 homopolymers, 9 blends and 16 copolymers) was performed on the six spectral regions described above. Figure 2 shows the PC1–PC2 score plots for these regions. The number in parentheses denotes the contribution rate of variance for each principal component. Spectral information in each region could mostly be explained by the PC1–PC2 score plots. The cumulative contribution rate was over 96% in each case. As shown Figures 2a–c, their score plots reflected on the spectral information of the carbonyl, backbone quaternary and α-methyl carbons were the same shapes, because their carbons were the common structures in MMA and TBMA units.

Figure 2
figure 2

First (PC1) and second (PC2) principal component score plots for (a) the carbonyl, (b) the backbone quaternary, (c) the α-methyl, (d) the backbone methylene and the methyl group of MMA pendant, (e) the quaternary of TBMA pendant and (f) the methyl groups of TBMA pendant 13C NMR signals of PMMA (♦), PTBMA (♦), homopolymer blends (), copolymers obtained at high conversions (▪) and low conversions (□). The number in parentheses is the contribution rate of variance for each principal component.

A new data set was constructed by combining bucket integral intensities of the three spectral regions of the carbonyl, backbone quaternary and α-methyl carbons to improve reliability of the PCA. Figure 3 shows the PC1–PC2 score plots for the combined data set. The contribution rates of PC1 and PC2 were 77.1% and 20.7%, respectively. Further investigation described below was made on this data set.

Figure 3
figure 3

First (PC1) and second (PC2) principal component score plots for the combined 13C NMR signals in the carbonyl, backbone quaternary and α-methyl groups of two homopolymers and 16 copolymers. See Figure 2 for each symbol. The number in parentheses is the contribution rate of variance for each principal component.

Statistical determination of chemical composition

Figure 4 shows 13C NMR extended spectra with the carbonyl, backbone quaternary and α-methyl carbons of PMMA, PTBMA and poly(MMA-co-TBMA) of L-54, together with the corresponding PCA loading histogram. The loading histograms indicate the pseudo-signal reflecting each principal component in the bucket integral intensities for the three resonance regions of all sample polymers. Positive loadings of PC1 were observed at the same chemical shift owing to the signals of PMMA, whereas negative ones were observed from PTBMA. Their loading histograms indicated that PC1 reflected the chemical compositions of MMA and TBMA units in polymer samples, as described in the previous section. The relationship between the PC1 score (y) and the TBMA unit composition in mol% (x) was derived as the following equation with a correlation coefficient (R2) of 0.998:

Figure 4
figure 4

13C NMR spectra of the carbonyl, backbone quaternary and α-methyl carbons of (a) PMMA (H-0), (b) poly(MMA-co-TBMA) (L-56) and (c) PTBMA (H-100). (d) The corresponding loading histograms for the first (PC1) and second (PC2) principal component of the 13C NMR spectra.

The R2 is slightly better than that reported in our previous paper (R2=0.997),13 where the two spectral regions due to the carbonyl and backbone quaternary carbons were used for the PCA.

To predict TBMA unit composition in polymer samples, a PLSR model was constructed with bucket-integral data sets of the three spectral regions and the TBMA unit composition data of the two homopolymers and nine their blends as a training (reference) set. The first latent variable (LV1) was used for the regression model because 99.5% of the spectral information was explained with only LV1. The LV1 loading histograms, shown in Figure 5, were the same shape as the PC1 loading. Therefore, this model was reasonable to predict chemical compositions in samples. Figure 6 shows the relationship between the TBMA unit compositions determined by 1H NMR and those predicted by PLSR. An excellent relationship was obtained with an R2 of 0.999, which is also slightly better than that in the previous paper (R2=0.998).13 The accuracy of predicted TBMA unit compositions by PLSR was evaluated with relative standard deviation, RSD. The RSD was calculated to be 3.4%, which is slightly worse than that of the 2.8% (this RSD of 2.8% is as same sense as the absolute error of 3.4% indicated in our previous report) in the previous paper.13 This is because of the two methyl groups in the 2,2′-azobisisobutyronitrile fragment,14 which were observed in the same region as the α-methyl carbons.

Figure 5
figure 5

The loading histograms of the PLSR model for predicting chemical compositions in samples, constructed in the 13C NMR spectra and TBMA unit compositions with the PMMA, PTBMA and nine of their blends.

Figure 6
figure 6

Correlation between predicted TBMA compositions by partial least square regression and observed by 1 H NMR for PMMA (♦), PTBMA (♦), nine homopolymer blends () and 16 copolymers (□, ▪). The solid line indicates the ideal correlation. Training (reference) set model: PMMA, PTBMA and nine homopolymer blends.

Relationship between the PC2 score and the heterogeneity of monomer sequences

The PC2-positive loadings in the PCA loading histograms, as shown in Figure 4, appeared at 176.50–177.75 p.p.m. in the resonance region of carbonyl carbon, at 44.60–44.85 p.p.m. and 46.35–46.85 p.p.m. in that of backbone quaternary carbon, 16.60–17.10 17.85–18.10 and 18.60–19.10 p.p.m. in that of α-methyl carbon, and corresponded to the signals of the two homopolymers. Conversely, the characteristic signals of copolymers, especially at 175.75–176.50, 178.00–179.00 and 44.85–46.35 p.p.m., arose on the negative side in PC2 loadings. The prime characteristic of copolymers is the presence of different monomeric sequences. Therefore, PC2 should reflect the heterogeneity of monomer sequences, as the stereoregularity of the copolymers is not expected to vary greatly with the feed monomer ratio in this copolymerization.15

To obtain quantitative information about the heterogeneity of monomer sequences from PC2, the fraction of hetero dyad sequence f12, consisting of M1 and M2 units was used as an index. The radical copolymerization of methacrylate monomers generally proceeds according to the terminal model. The f12 in the copolymer is expressed as Equation (1),16

where P12 and P21 denote the probability of addition to M2 monomer by the M1-ended radical and vice versa, respectively. P12 and P21 are defined by Equations (2) and (3) in the case of low conversions,17

where r1 and r2 denote reactivity ratios of monomers M1 and M2, respectively, and F1 and F2 denote chemical composition of M1 and M2 units in the copolymer obtained at an early stage of polymerization (F1+F2=1). Thus, the f12 in the copolymer obtained at low conversions is calculated by Equation (4):

The f12 depends on comonomer composition. A copolymer with equimolar composition, F1=F2=0.5, maximizes f12. A completely random copolymer whose comonomer sequence distribution obeys Bernoullian statistics (r1=r2=1) gives f12=0.5 at F1=0.5. Homopolymer and completely alternating copolymers exhibit extreme values of f12, 0 and 1, respectively. The f12 is equivalent to the RN defined by Harwood et al.18 (run number=100 f12).

The monomer reactivity ratios of MMA, rM and TBMA, rT were calculated to be 0.81±0.06 and 1.26±0.03 with an R2 of 0.997, respectively, by the Kelen-Tüdõs method19 using nine copolymers obtained at early stages of copolymerization (L-6–L-93, Table 1). These were comparable to previously reported values (rM=0.96 and rT=1.35 by Yuki et al.,15 rM=0.68 and rT=1.29 by Zhao et al.20). The fMT (M1: MMA, M2: TBMA) for L-6–L-93 were calculated with Equation (4). The good relationship between the PC2 score (w) and the fMT (z) of L-6–L-93 and the two homopolymers was obtained as the following equation with an R2 of 0.996:

PC2 scores succeeded in reducing the quantitative information of the fraction of hetero dyad sequences, fMT.

Statistical determination of dyad sequence distribution by PLSR

Dyad sequence distributions (dyad sequence distributions mean, strictly speaking, the distributions of sequence of two monomeric units in each copolymer chain. In this report, we simply define them as the fractions of each dyad sequence in measured samples) of copolymers should be determined by PLSR with a training set of copolymers obtained at low conversions, because the information of the fraction of two homo dyads, M1–M1: f11 and M2–M2: f22 were also included in the three spectral regions. The f11 in the copolymers obtained at an early stage of polymerization is expressed as Equation (5),16

where P11 denotes the probabilities of addition to the M1 monomer by the M1-ended radical (P11=1–P12).17 f22 is also expressed by replacing suffix number 1 with 2 in Equation (5). The fractions of three dyad sequences, fMM, fMT, fTT for PMMA, PTBMA and the copolymers of L-6–L-93 were calculated from Equations (4) and (5), as summarized in Table 2. Chemical composition was adopted as fMM and fTT of the blends of two homopolymers with fMT=0. Leave-one-out cross-validation21 was performed by PLSR to validate the training set models of fMM, fMT and fTT, which were constructed with 20 13C NMR spectra and theoretical fractions in Table 2. Each cumulative contribution rate of three PLSR models, as summarized in Table 3, was over 99% with two latent variables, LV1 and LV2. Figure 7 shows the LV1 and LV2 loading histograms of their PLSR models on the spectra regions of carbonyl carbons. The shapes of the LV1 and LV2 loading histograms of the fMM model coincided with those of PC1 and PC2 on PCA loadings, and indicated that the LV1 and LV2 of fMM model caught information on the homo and hetero dyad sequences in the polymer samples. The LV1 of the fMT model were the same as LV2 of the fMM model reflected the hetero dyad sequences. The positive and negative LV1 loading of the fTT model showed the same shape of PTBMA and PMMA spectra, which were the opposite sign of those of fMM model. The loading histograms on the spectra regions of the quaternary and α-methyl carbons showed the same results as those of the carbonyl carbon. Therefore their PLSR models were reasonable to predict the three fractions of dyad sequences in copolymer samples. Figure 8 (see Supplementary Table S1 and Figure S1) shows the PLSR models predicting fMM, fMT and fTT. This indicates that a good fit between theoretical and predicted fractions was obtained, with an R2 of 0.999–0.997 and an RSD of 3.5–7.1%. Thus, their PLSR models should be able to predict the fractions of three dyad sequences with good accuracy and precision from spectra of unknown samples.

Table 2 Theoretical fractions of dyad sequences for training set modelsa
Table 3 Contribution rates of PLSR models to predict the fractions of dyad sequences
Figure 7
figure 7

The loading histograms of the PLSR model, which predicts the fractions of three dyad sequences in samples, constructed with the 13C NMR spectra and theoretical fractions of dyad sequences with the two homopolymers, and nine copolymers and their blends obtained at early stages of copolymerization.

Figure 8
figure 8

PLSR models for predicting fractions of (a) MMA homo (fMM), (b) MMA-TBMA hetero (fMT) and (c) TBMA homo (fTT) dyad sequences from 13C NMR signals of the carbonyl, backbone quaternary and α-methyl carbons of PMMA (♦), PTBMA (♦), their blends () and poly(MMA-co-TBMA)s obtained at early stages of copolymerization (□).

Verifying adaptability to a wide variety of MMA-TBMA copolymers, the prediction of fMM, fMT and fTT of the two copolymers, which were blended with different large copolymers of dyad sequence distributions, were initially tried on the PLSR models in Figure 8. Table 4 shows the amount of the original copolymers and the TBMA unit compositions of the copolymer blends.

Table 4 Calculated and predicted fractions of dyad sequences for the blends of copolymers obtained at low conversions

The fractions of dyad sequences were determined by PLSR models in Figure 8. Table 2 summarizes the fractions of three dyad sequences calculated from the blend molar ratio of the original copolymers and those predicted by PLSR. Calculation procedures are described in Supplementary information. The predicted fractions of dyad sequences were excellent in relation to their predicted values, with an R2 of 0.995 and an RSD of 2.6% for six of the fractions of three dyad sequences. It was confirmed that the fractions of dyad sequences of copolymer samples were predicted with good accuracy and precision by using those training set models.

Figure 9 shows the predicted distribution of seven copolymers obtained at higher conversions, H-83–H-24 and M-78–M-29, using the training set model. These plots deviated slightly from the simulation for the copolymers obtained at early stages of copolymerization owing to the difference of conversions. The average fractions of each dyad sequence at full conversions calculated by the Spinner method22 showed the same tendency as Figure 8 (see Supplementary Table S1 and Supplementary Figure S1).

Figure 9
figure 9

Dyad sequence distributions of the copolymer obtained at higher conversions. Each dashed line indicates the dyad sequence calculated from Equations (4) and (5) obtained at low conversions.

PCA for side chain structures

PCA was also performed for the quaternary carbons (Figure 2e) and the methyl carbons (Figure 2f) of the tert-butyl ester groups in TBMA units. Only the plot of the PMMA homopolymer deviated far from the other plots, indicating that PMMA homopolymer was easily distinguished from other samples. This was because no signals were observed in these resonance regions of the spectrum of PMMA.

The PCA score plots for the backbone methylene carbons and the methyl carbons of methyl ester groups in MMA units (Figure 2d) were a similar shape to those for α-methyl, backbone quaternary and carbonyl carbons (Figures 2a–c), although the PC1 plot distances of the blends of homopolymers narrowed with increasing PC1 scores. The narrow plot distance can be attributed to incorporation of the signals from the methyl ester groups, which only belong to MMA monomeric units. If the signals of the methyl carbons of methyl ester groups in MMA units could be separated from those of the methylene carbons, the PCA score plots for the backbone methylene groups would be similar to those for the α-methyl, backbone quaternary and carbonyl carbons (Figures 2a–c).

Conclusions

We applied multivariate analysis of 13C NMR spectra of PMMA, PTBMA, their blends and poly(MMA-co-TBMA)s to compositional and sequential characterization. The two principal components of PCA explained the spectra of the majority of samples tested. PC1 and PC2 indicated the chemical composition and the fraction of hetero dyad sequence, respectively. Also, the dyad sequence distributions of copolymers obtained at higher conversions were successfully determined by PLSR using homopolymers and the copolymers at early stages of copolymerization as a training data set, without assignment of 13C NMR resonance peaks. We are currently investigating a quantitative determination method of stereoregularity in addition to chemical composition and comonomer sequence distribution of poly(MMA-co-TBMA).