Introduction

Microplastics, defined as plastic particles smaller than 5 mm, have become one of the most prominent global environmental pollution problems1,2. They may originate directly from industrial and personal products, or from the degradation of larger plastic items3. Direct sources of microplastics can be banned to a certain extent. However, the wide use of plastic products in daily life means that hundreds of millions of tons of plastic waste, the precursors of microplastics, are discharged into the environment each year4. As a result, microplastics have been detected in wastewater5,6, natural water7,8, and even drinking water9. Microplastic pollution has become a persistent environmental problem that urgently needs to be addressed. Therefore, comprehensive and accurate assessment of the environmental risks of microplastics (e.g., environmental behavior and ecotoxicity) is particularly important for developing effective environmental policies.

Previous studies showed that the large specific surface area of microplastics gives them a high adsorption capacity for coexisting organic pollutants, such as polycyclic aromatic hydrocarbons10 and polychlorinated biphenyls11. Some ionizable organic pollutants (e.g., antibiotics) can also be adsorbed on microplastics12. Adsorption may further alter the behavior and toxicity of both microplastics and organic pollutants: it changes the distribution of organic pollutants between the environmental phase and the microplastic phase13, and it may affect the structures and properties of microplastics and organic pollutants and, subsequently, their environmental transformations. More importantly, adsorption allows microplastics to carry more organic pollutants into organisms, which may increase the bioconcentration of chemicals and their toxicity14,15. Thus, quantitative measurement of the adsorption of organic pollutants on microplastics is necessary for a more comprehensive and accurate assessment of the environmental risk of both microplastics and organic pollutants.

Generally, the equilibrium partitioning coefficient of organic pollutants between microplastics and water (Kd) is used to represent the adsorption capacity. It can be determined through adsorption equilibrium experiments11. Previous studies12,16 indicate that the composition and properties of both the microplastics and the water medium can affect the measured Kd value. Thus, the specific environmental conditions should be considered when measuring Kd values, which greatly increases the amount of experimental work. Moreover, research on microplastics is still in its infancy and adsorption data are scarce, which limits further research on microplastics and their risk assessment. Therefore, there is an urgent need for a fast and accurate method to obtain Kd values under different adsorption conditions.

Quantitative structure–property relationship (QSPR) models have proved reliable for quickly predicting the properties of chemicals17,18. In particular, polyparameter linear free energy relationship (pp-LFER) models based on Abraham descriptors have been widely employed to predict the partitioning of chemicals between two phases and to explore partition mechanisms19,20. For example, many researchers have used pp-LFER to predict the adsorption capacity of large polymers (e.g., those used in equilibrium passive samplers)21. However, the large difference in polymer size may limit the application of these existing models to microplastics22,23,24. A few studies established pp-LFER models of log Kd under the corresponding experimental conditions based on their own measured values25,26,27. However, the lack of experimental Abraham descriptors for many nonpolar chemicals hampers the construction and application of pp-LFER models20,28. To expand the application range, descriptors that can be calculated theoretically (e.g., quantum chemical descriptors29) may be selected to build Kd prediction models. In addition, some ionizable organics such as antibiotics can also be adsorbed by microplastics. The distribution of their dissociation species varies with pH, which leads to different apparent Kd values. Thus, molecular dissociation at the relevant pH should be taken into account when developing QSPR predictive models.

In this study, we therefore collected Kd values for the three most frequently detected microplastics, polyethylene (PE), polypropylene (PP) and polystyrene (PS), in different waters, and employed the n-octanol/water distribution coefficient at a specific pH (log D) and six quantum chemical descriptors to establish new QSPR models. The main purpose is to develop a more practical computational method that can quickly predict the adsorption capacity of microplastics for organic pollutants in water environments with different pH values.

Results and discussion

QSPR models for the adsorption of PE

Three QSPR models of log Kd were developed for the adsorption of PE in seawater, freshwater and pure water, respectively:

$$\begin{aligned} {\text{Seawater:}}\quad \log K_{\text{d}} & = \left( 0.725 \pm 0.058 \right) \times \log D + \left( -36.236 \pm 9.034 \right) \times \varepsilon_{\alpha} \\ & \quad + \left( -23.169 \pm 4.501 \right) \times \varepsilon_{\beta} + \left( 17.856 \pm 2.572 \right) \end{aligned}$$
(1)
$${\text{Freshwater:}}\quad \log K_{{\text{d}}} = \left( {0.667 \, \pm \, 0.047} \right) \times \log D + \, \left( {1.714 \, \pm \, 0.302} \right)$$
(2)
$${\text{Pure}}\;{\text{water:}}\quad \log K_{{\text{d}}} = \left( {0.449 \, \pm \, 0.041} \right) \times \log D + \, \left( {0.265 \, \pm \, 0.115} \right) \, \times M_{{\text{w}}}^{\prime } \, + \, \left( {1.855 \, \pm \, 0.302} \right)$$
(3)

where log D is the n-octanol/water distribution coefficient at a specific pH value, εα is the covalent acidity, εβ is the covalent basicity and M′w is the scaled relative molecular mass (Mw/100). As shown in the Williams plot for model (3) (Fig. S1 of the Supplementary Information, SI), 17α-ethinyl estradiol had a standardized residual (SR = −3.392) with an absolute value larger than 3 and was therefore diagnosed as an outlier. Structural analysis showed that 17α-ethinyl estradiol differs significantly from the other compounds because of its acetylene group and steroidal ring system (an unsaturated benzene ring connected to a saturated six-membered ring). This discrepancy may be the main cause of the predictive inaccuracy. After removing it, the following model was obtained:

$${\text{Pure}}\;{\text{water:}}\quad \log K_{{\text{d}}} = \left( {0.486 \, \pm \, 0.035} \right) \times \log D + \, \left( {2.420 \, \pm \, 0.199} \right)$$
(4)
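As a minimal sketch of how models (1), (2) and (4) could be applied, the snippet below evaluates them from their central coefficients only; the ± uncertainties are ignored. The class name `LinearModel`, the keyword names `log_d`, `eps_alpha` and `eps_beta`, and the example input are illustrative assumptions, not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """log Kd = intercept + sum(coefficient * descriptor), central estimates only."""
    coefficients: dict
    intercept: float

    def predict(self, **descriptors: float) -> float:
        # Refuse to predict if a descriptor required by the model is missing.
        missing = set(self.coefficients) - set(descriptors)
        if missing:
            raise ValueError(f"missing descriptors: {sorted(missing)}")
        return self.intercept + sum(
            coef * descriptors[name] for name, coef in self.coefficients.items()
        )


# Central coefficients copied from models (1), (2) and (4) for PE.
PE_SEAWATER = LinearModel(
    {"log_d": 0.725, "eps_alpha": -36.236, "eps_beta": -23.169}, 17.856
)
PE_FRESHWATER = LinearModel({"log_d": 0.667}, 1.714)
PE_PURE_WATER = LinearModel({"log_d": 0.486}, 2.420)

if __name__ == "__main__":
    # Hypothetical compound with log D = 5.0 at the pH of interest.
    print(round(PE_FRESHWATER.predict(log_d=5.0), 3))
    print(round(PE_PURE_WATER.predict(log_d=5.0), 3))
```

Predictions of this kind are only meaningful for compounds inside the domain described by the Williams plots, and the descriptor values must be computed in the same way as in the study.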

The statistical parameters of the developed QSPR models are presented in Table 1. For models (1), (2) and (4), R2 = 0.868, 0.903 and 0.811, Q2 = 0.868, 0.903 and 0.811, and RMSE = 0.826, 0.686 and 0.612, respectively. These results indicate that the models have high goodness-of-fit. As shown in Table S1, all the VIF values (1.000–1.204) are less than 10, indicating no multicollinearity in the three models. The fitting plots (Fig. 1) show good agreement between the experimental and predicted log Kd values. As shown in Fig. 2, the distributions of the prediction errors show no dependence on the experimental log Kd values. Thus, the developed models have no systematic error, which is also supported by BIAS = 0.000–0.001 (Table 1).

Table 1 Statistical parameters of the regression models and simulated external validation.
Figure 1. Fitting plots of experimental and predicted log Kd by models (1), (2) and (4).

Figure 2. Distributions of prediction errors of log Kd calculated by models (1), (2) and (4).

For the simulated external validation, the redeveloped QSPR models (S1–S3), based on 70% of the experimental data and the same descriptors as in models (1), (2) and (4), show fitting performance (R2, Q2, RMSE and MAE) and regression coefficients similar to those of the models developed on the whole dataset (Table 1). Thus, the models are statistically stable. As the training subsets were randomly assigned, there is no chance correlation. The predictive performance of each rebuilt model on the test set (30% subset, marked with superscript b in Table 2) is listed in Table 1. The values of Q2, RMSE and MAE indicate excellent predictive quality of the developed QSPR models. The results of leave-one-out cross validation (Q2CV = 0.882–0.940) also show good robustness and internal predictivity.

Table 2 Experimental and predicted log Kd values of organic compounds and the values of the selected molecular descriptors in models (1), (2), (4), (5) and (6).

Williams plots were employed to test the application domain of QSPR models (1), (2) and (4). The calculated alert values h* are 0.324, 0.250 and 0.128, respectively. As shown in Fig. 3, three compounds (oxytetracycline, sulfadiazine and δ-hexachlorocyclohexane) for model (1) and one compound (2,2′,3,3′,4,4′,5-heptachlorobiphenyl) for model (4) are located to the right of h*. As their absolute SR values are < 3, these chemicals are not diagnosed as outliers. In summary, these results indicate that the developed QSPR models generalize well within their descriptor matrix. Given the molecular structures used for developing the models, QSPR model (1) can be used to predict the log Kd values between PE and seawater of organics including polychlorinated biphenyls, antibiotics, polycyclic aromatic hydrocarbons, chlorobenzenes, perfluorinated compounds and hexachlorocyclohexanes; model (2) can be employed to predict the log Kd values of polychlorinated biphenyls and antibiotics between PE and freshwater; and model (4) can be used to predict the adsorption onto PE in pure water of organic pollutants such as polychlorinated biphenyls, antibiotics, polycyclic aromatic hydrocarbons, chlorobenzenes, aromatic hydrocarbons and aliphatic hydrocarbons.

Figure 3. Williams plots for the applicability domain of models (1), (2) and (4). hi is the leverage value. (a) oxytetracycline; (b) sulfadiazine; (c) δ-hexachlorocyclohexane; (d) 2,2′,3,3′,4,4′,5-heptachlorobiphenyl.

The n-octanol/water distribution coefficient at a specific pH value (log D) was selected in all three log Kd predictive models for PE in seawater, freshwater and pure water. The experimental log Kd values correlate significantly with log D, with positive regression coefficients (0.725, 0.667 and 0.486) in models (1), (2) and (4). Thus, organic pollutants with high hydrophobicity are preferentially adsorbed onto PE. For example, hydrophobic polychlorinated biphenyls (PCBs) with large log D values exhibit higher log Kd values than ionizable organic pollutants (e.g., antibiotics). This is because the hydrophobicity of PE itself makes hydrophobic interaction the main mechanism in the adsorption of organic pollutants onto PE. The same adsorption mechanism was also confirmed by Hüffer et al., who established a prediction model based on the log Kow values of seven organic compounds30.
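As a rough worked illustration of what the log D coefficient implies (added here, using only the central coefficient of model (2) and ignoring its uncertainty), two compounds whose log D values differ by 2 units are predicted to differ in log Kd on PE in freshwater by

$$\Delta \log K_{\text{d}} = 0.667 \times \Delta \log D = 0.667 \times 2 = 1.334,$$

which corresponds to a ratio of Kd values of 10^1.334, roughly a factor of 22.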

For the adsorption of PE in seawater, εα and εβ, which represent covalent acidity and covalent basicity, respectively, were also selected. The quantum chemical descriptor εα makes a negative contribution to the log Kd values, suggesting that an organic pollutant with a large εα value prefers to dissolve in water, leading to a decrease in log Kd. This means that, at the adsorption interface, the surface of PE has a weaker H-accepting ability towards organic pollutants than water31. Similarly, the log Kd values increase with decreasing εβ, indicating that the H-donating ability of the PE surface is also weaker than that of water. It follows that hydrogen bond interaction is also an important mechanism for the interactions between PE and organic pollutants in seawater.

Compared with freshwater and pure water, the high salinity of seawater can enhance the dipole–dipole and dipole–induced dipole interactions in the system, which makes hydrogen bonds form more easily. As a result, εα and εβ play a more important role in the log Kd values of PE in seawater. In brief, the distribution of the studied organics between PE and water is mainly governed by hydrophobic interaction. For adsorption in seawater, hydrogen bond interaction is another important driving force.

QSPR model for the adsorption of PP

A QSPR model of log Kd was yielded for the adsorption of PP in seawater:

$${\text{Seawater:}}\quad \log K_{\text{d}} = \left( 0.751 \pm 0.035 \right) \times \log D + \left( -19.323 \pm 2.072 \right) \times \varepsilon_{\beta} + \left( 6.735 \pm 0.663 \right)$$
(5)

Values of R2, Q2, and RMSE are 0.939, 0.939 and 0.381, respectively. Thus, model (5) shows high goodness-of-fit and explains 94% of the variability of the whole dataset. The absence of multicollinearity in model (5) is supported by the VIF values (1.034 for both descriptors, Table S1). As shown in Fig. S2, the predicted log Kd values agree well with the experimental values. Fig. S3 and the BIAS value (−0.003) show that the prediction errors do not depend on the experimental log Kd values.

For the simulated external validation, the statistical parameters (R2 = 0.945, RMSE = 0.396 and MAE = 0.307) and regression coefficients of the training subset are similar to those of the whole dataset (Table 1 and model S4). Thus, model (5) is statistically stable and there is no chance correlation. As shown in Table 1, the high prediction quality of the developed QSPR model is supported by the predictive performance of the new model on the test subset (Q2 = 0.874, RMSE = 0.369 and MAE = 0.228). Furthermore, model (5) has good robustness and internal predictive ability (Q2CV = 0.957). The Williams plot for the applicability domain of model (5) (Fig. S4) shows that two compounds (sulfadiazine and γ-hexachlorocyclohexane) are located to the right of h* (0.257). However, these two compounds have absolute SR values < 3, indicating that they are not outliers. Thus, model (5) can be used to predict the log Kd values of PP in seawater for organics including polychlorinated biphenyls, chlorobenzenes, hexachlorocyclohexanes, polycyclic aromatic hydrocarbons and antibiotics.

For the adsorption of PP in sea water, log D and εβ were also selected in model (5). Thus, hydrophobic interaction and hydrogen bond interaction also play determining roles in the adsorption. However, unlike the log Kd predictive model of PE in seawater, the εα representing the covalent acidity is not selected in model (5). Such dissimilarity may come from the addition of methyl groups in the PP structure that reduces the difference of H-accepting ability between the microplastics and water, consequently resulting in a negligible contribution of εα in the adsorption of PP.

QSPR model for the adsorption of PS

For the adsorption of PS in seawater, the experimental log Kd values of 28 organic pollutants (of which 14 are ionizable compounds) were used to establish a predictive model:

$${\text{Seawater:}}\quad \log K_{\text{d}} = \left( 0.357 \pm 0.062 \right) \times \log D + \left( 3.766 \pm 0.384 \right) \times \pi + \left( -2.080 \pm 0.540 \right)$$
(6)

As shown in Tables 1 and S1, the statistical parameters (R2 = Q2 = 0.837) indicate good regression performance, and the calculated VIF values (1.000 for both descriptors) indicate no multicollinearity in model (6). Good agreement between the experimental and predicted log Kd values was also observed (Fig. S5). The pattern of prediction errors shown in Fig. S6 reveals no systematic error for model (6), which is also supported by BIAS = 0.000 (Table 1).

Based on the training subset (70%), similar regression coefficients and statistical parameters were obtained for the new model (S5) (Table 1). Comparable statistics were also obtained for the test set. Moreover, the Q2CV value of the leave-one-out cross validation (0.898) is higher than the acceptance criterion. Thus, model (6) has satisfactory robustness and internal predictive ability. As shown in the Williams plot (Fig. S7), three compounds (fluoranthene, chrysene and pentacosafluorotridecanoic acid) with |SR| < 3 lie to the right of h* (0.321), indicating that they are not outliers. In conclusion, model (6) can be employed to predict the adsorption capacity (log Kd) of PS for organic pollutants (especially ionizable organic pollutants) within the application domain in seawater. In a previous study20, the influence of dissociation on log Kd for ionizable organic pollutants was not considered in the construction of predictive models. In fact, the physicochemical properties (e.g., hydrophobicity) of the various dissociation species are quite different, which may significantly affect the partition of ionizable organic pollutants between PS and seawater. Therefore, predictive models established without considering the effect of pH on the distribution of dissociation species are only applicable to predicting log Kd values at the experimental water pH. In contrast, the QSPR model (6) constructed in this study extends the predictive application to various pH values. Limited by the number of ionizable compounds and the pH range used for model construction, the developed models are most suitable for the pH range of natural waters (6–9).

The presence of log D in model (6) shows that hydrophobic interaction can also enhance the adsorption of organics on PS in seawater. In addition to log D, π was also selected. The experimental log Kd values correlate positively with π (coefficient 3.766) in the QSPR model, indicating that chemicals with larger π values are preferentially adsorbed onto PS in seawater. As shown in Tables 2 and S2, organic compounds with strong π-electron conjugation in their structures generally have large π values. Thus, it can be inferred that π–π interaction also contributes to the adsorption onto PS. The phenyl groups in the PS structure produce stronger π–π interactions with organic chemicals than PE and PP do, thus yielding higher log Kd values (Table 2). For example, the log Kd value of phenanthrene on PS (5.50) is much higher than that on PE (4.440) and PP (4.000) in seawater. In brief, hydrophobic interaction and π–π interaction play important roles in the adsorption onto PS in seawater.

Materials and methods

Collection of experimental Kd values

In order to improve the predictive accuracy, the properties of microplastics and water environment media were considered by screening and classifying the experimental data used for modeling. For the adsorption of organic pollutants on PE, 37, 24 and 48 experimental Kd values were collected for seawater, freshwater, and pure water, respectively. For the adsorption of PP and PS in seawater, 35 and 28 experimental Kd values were selected, respectively. All these collected data are listed in Table 2. The unit of all Kd values was unified to kg/L. As the value of Kd is quite large, its logarithmic form (log Kd) was used for developing QSPR models. Experimental conditions for determining Kd values are shown in Table S3. Molecular structures for all organic pollutants, including polychlorinated biphenyls, polycyclic aromatic hydrocarbons, aromatic hydrocarbons, chlorobenzenes, hexachlorocyclohexanes, aliphatic hydrocarbons, antibiotics and perfluorinated compounds, are shown in Table S2.

Molecular structural parameters

Based on previous studies20,30, hydrophobic interaction, hydrogen bonding and π–π interaction may play important roles in the adsorption of organic pollutants onto microplastics. Thus, the n-octanol/water distribution coefficient at a specific pH value (log D), the scaled molecular mass (M′w = Mw/100) and six quantum chemical descriptors were calculated for developing the QSPR models (Table S4). The six selected quantum chemical descriptors are the scaled molecular volume (V′ = V/100), the ratio of average molecular polarizability to molecular volume (π = α/V), the most positive atomic charge on an H atom (q+), the most negative atomic charge (q−), the covalent acidity (εα = ELUMO − EHOMO-water), and the covalent basicity (εβ = ELUMO-water − EHOMO), where EHOMO is the energy of the highest occupied molecular orbital and ELUMO is the energy of the lowest unoccupied molecular orbital. For non-dissociable compounds, the n-octanol/water distribution coefficient is the same at all pH values. For the ionizable organics, the log D values for the relevant experimental conditions were obtained from SciFinder42. The values of Mw, V, π, q+, q−, EHOMO and ELUMO were extracted from the Gaussian output files.

The structures of all the molecules were optimized at the B3LYP/6-31G(d,p) level using the Gaussian 09 program package43, and confirmed to be local minima by vibrational frequency analyses with the same method. For the ionizable compounds, all dissociation species that may exist under the experimental pH conditions were optimized. The apparent value of each quantum chemical descriptor at a specific pH value can be calculated as:

$$X_{{{\text{pH}}}} = \, \sum \alpha_{i} X_{i}$$
(7)

where X stands for the quantum chemical descriptor, αi is the fraction of each dissociation species under the experimental pH conditions (Table S3), which can be calculated through the pKa values of the ionizable compounds (Table S5).
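To make Eq. (7) concrete, the sketch below computes species fractions for the simplest case, a monoprotic acid, from the Henderson-Hasselbalch relation and then forms the fraction-weighted descriptor. The restriction to a single pKa, the function names and the numerical inputs are assumptions made for illustration; the study itself treats all dissociation species listed in Tables S3 and S5.

```python
from typing import Sequence, Tuple


def monoprotic_acid_fractions(pka: float, ph: float) -> Tuple[float, float]:
    """Return (neutral fraction, anionic fraction) of a monoprotic acid HA."""
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA] from Henderson-Hasselbalch
    neutral = 1.0 / (1.0 + ratio)
    return neutral, 1.0 - neutral


def apparent_descriptor(fractions: Sequence[float], values: Sequence[float]) -> float:
    """Eq. (7): X_pH = sum_i alpha_i * X_i over all dissociation species."""
    if len(fractions) != len(values):
        raise ValueError("one descriptor value is needed per species")
    if abs(sum(fractions) - 1.0) > 1e-6:
        raise ValueError("species fractions must sum to 1")
    return sum(a * x for a, x in zip(fractions, values))


if __name__ == "__main__":
    # Hypothetical acid with pKa 6.5 at pH 8.0 and made-up descriptor values
    # for the neutral and anionic species.
    alphas = monoprotic_acid_fractions(pka=6.5, ph=8.0)
    print(alphas)
    print(apparent_descriptor(alphas, values=(0.30, 0.20)))
```

For compounds with several pKa values, the same weighted sum applies once the fraction of each species has been obtained from the full set of dissociation equilibria.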

Model development and validation

The initial prediction model can be expressed as follows:

$${\log}K_{{\text{d}}} = d\log D + vV^{\prime } \, + mM_{{\text{w}}}^{\prime } \, + a\varepsilon_{\alpha } + b\varepsilon_{\beta } + p\pi + fq^{ + } + eq^{-} + g$$
(8)

where d, v, m, a, b, p, f and e are fitting coefficients, and g is a regression constant. Model development and variable selection were performed by multiple linear regression (MLR)44 with a stepwise algorithm embedded in the software package SPSS 21.0. The squared correlation coefficient (R2) and root-mean-square error (RMSE) were calculated to characterize the fitting performance, and the predictive squared correlation coefficient (Q2) was used to represent the predictive ability of the developed QSPR models45. Statistically, the values of R2 and Q2 should be > 0.5. A larger value of Q2 indicates a stronger predictive ability of the model. The collinearity of the employed parameters was assessed by the variance inflation factor (VIF) values. The calculation details for all statistical parameters are listed in Text S1.
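The exact formulas used in the study are given in Text S1, which is not reproduced here, so the following is only a sketch based on common textbook definitions: R2 as one minus the ratio of residual to total sum of squares, RMSE with the number of compounds in the denominator, and, for a model with exactly two descriptors, VIF = 1/(1 − r²) with r the Pearson correlation between the descriptors. Function names and data are illustrative assumptions.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SSE/SST, with SST taken around the mean of the observed values."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error with n in the denominator (an assumption)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_descriptors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF shared by both descriptors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]    # made-up experimental log Kd values
    pred = [1.1, 1.9, 3.2, 3.8]   # made-up predicted log Kd values
    print(r_squared(obs, pred), rmse(obs, pred))
```

In the two-descriptor case both descriptors necessarily share one VIF value, which matches the identical VIF values reported for the two descriptors of model (5) and of model (6).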

The statistical robustness and predictive ability of the developed models were verified by the simulated external validation and leave-one-out cross validation46. The data set was randomly divided into a 70% training set and a 30% test subset25,29 (shown in Table 2). Based on the training set, a new model was rebuilt with the same descriptors selected by the whole dataset. Subsequently, log Kd values in the test subset were predicted and evaluated by the new models. The values of R2, Q2 and RMSE of the simulated external validation were calculated to estimate the predictive performance47. To assess the model robustness, cross-validated correlation coefficients (Q2CV) were calculated with Weka 3.8.048.
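The validation workflow can be sketched for a single-descriptor model such as models (2) and (4). The snippet below is a plain-Python illustration, not a reproduction of the SPSS or Weka calculations: it assumes Q2CV = 1 − PRESS/SST with SST taken around the mean of all observations, and it uses a seeded shuffle for the 70/30 split. All names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_q2(x: Sequence[float], y: Sequence[float]) -> float:
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    xs, ys = list(x), list(y)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    sst = sum((yi - mean_y) ** 2 for yi in ys)
    return 1.0 - press / sst


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random 70% training / 30% test split of the indices 0..n-1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(0.7 * n)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    log_d = [0.5, 1.8, 3.1, 4.4, 5.9, 6.7]    # made-up descriptor values
    log_kd = [2.1, 2.9, 3.8, 4.7, 5.6, 6.2]   # made-up experimental values
    print(loo_q2(log_d, log_kd))
    print(split_70_30(len(log_d)))
```

Other definitions of cross-validated Q2 exist (for example, using the training mean of each fold in the denominator), so values computed this way may differ slightly from those reported by a specific software package.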

Outliers and application domain

The Williams plot was used to visualize the application domain and identify outliers49,50, with the leverage value (hi) on the horizontal axis and the standardized prediction residual (SR) on the vertical axis. The hat matrix was used to calculate the hi values51. When the absolute value of SR is larger than 3, the compound is designated as an outlier and should be removed. The warning value (h*) is defined as h* = 3(p + 1)/n51, where p and n are the numbers of descriptors and compounds in the developed model, respectively, and p + 1 accounts for the regression constant. If hi > h*, the compound lies far from the center of the descriptor matrix. Thus, the Williams plot can also be used to describe the distribution of chemicals in the whole descriptor matrix.
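The quantities behind a Williams plot can be sketched as follows. The alert values reported for models (1), (2), (4), (5) and (6) (0.324, 0.250, 0.128, 0.257 and 0.321) are reproduced when the regression constant is counted as a fitted parameter, that is, h* = 3(p + 1)/n with p descriptors and n = 37, 24, 47, 35 and 28 compounds, so the sketch uses that form. The leverage function covers only the single-descriptor case, where the diagonal of the hat matrix has a closed form; the function names and the classification labels are illustrative assumptions.

```python
from typing import List, Sequence


def warning_leverage(n_descriptors: int, n_compounds: int) -> float:
    """h* = 3(p + 1)/n, counting the regression constant as a parameter."""
    return 3.0 * (n_descriptors + 1) / n_compounds


def leverages_one_descriptor(x: Sequence[float]) -> List[float]:
    """Hat-matrix diagonal for a model with one descriptor plus a constant."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def classify(leverage: float, sr: float, h_star: float) -> str:
    """Apply the two Williams-plot criteria described in the text."""
    if abs(sr) > 3.0:
        return "outlier"
    if leverage > h_star:
        return "high leverage, not an outlier"
    return "inside domain"


if __name__ == "__main__":
    for p, n in [(3, 37), (1, 24), (1, 47), (2, 35), (2, 28)]:
        print(p, n, round(warning_leverage(p, n), 3))
```

For models with several descriptors, the leverages are the diagonal elements of X(XᵀX)⁻¹Xᵀ, where X is the descriptor matrix with a leading column of ones; a linear algebra library is the practical way to compute them.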

Conclusions

QSPR models were established for predicting the adsorption capacity of organic pollutants on PE in seawater, freshwater and pure water, on PP in seawater and on PS in seawater. The statistical results and application domain validations indicate the satisfactory goodness-of-fit, robustness and predictive ability of the predictive models. The constructed models have two significant advantages: (1) the descriptors used in the models are not dependent on experimental values and can be simply obtained based on the structure of organic pollutants; (2) the models can be used to predict the log Kd values of ionizable compounds at various pH values.

Based on the descriptors selected in the predictive models, the main adsorption mechanisms between microplastics and organic pollutants were explored. For all the systems studied here, hydrophobic interaction was found to be an indispensable factor for the adsorption. Hydrogen bond interaction is also an important mechanism for the adsorption onto PE and PP in seawater, and π–π interaction for the adsorption onto PS in seawater. Thus, this study provides feasible tools to rapidly and easily predict the adsorption capacity of different microplastics for organic pollutants in various waters, and also reveals the possible adsorption mechanisms. It will be helpful for further investigation of the environmental risks of both microplastics and their coexisting organic pollutants. The application scope of the predictive models constructed in this study is still limited by the available experimental data. Therefore, it remains necessary to develop QSPR models for other types of microplastics in the future, or to develop predictive methods that do not depend on experimental data.