Correlation between the structure and skin permeability of compounds

Zeng, Ruolan; Deng, Jiyong; Dang, Limin; Yu, Xinliang

doi:10.1038/s41598-021-89587-5

Published: 12 May 2021

Scientific Reports volume 11, Article number: 10076 (2021)

Abstract

A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R² of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R² of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while being validated on a larger test set. An SVM algorithm was therefore successfully applied to develop a nonlinear QSAR model for skin permeability.

Introduction

Modeling the penetration of manmade and naturally derived chemicals through human skin is of great importance for the pharmaceutical and cosmetic industries, as well as for toxicology and the risk assessment of environmental and occupational hazards. Determining the skin permeability of chemicals is very time-consuming and expensive. Further, there are many ethical challenges associated with human and animal testing for the assessment of skin permeability^1,2.

Quantitative structure–activity/toxicity relationship (QSAR/QSTR) models^3,4,5,6 can be used to predict the physicochemical properties of compounds, even of those that have not been synthesized. Several researchers have carried out QSAR studies of the skin permeability of chemicals (expressed as the logarithm of the skin permeability coefficient, log K_p).

Patel et al. developed QSAR models for the skin permeability of 158 chemicals with multiple linear regression (MLR) analysis⁷. The model based on four descriptors fits the data well, with a coefficient of determination R² = 0.90. Fujiwara et al. proposed MLR QSARs for the skin permeability of 94 structurally diverse compounds⁸. The models obtained from ten data sets of skin permeability possess high R² values, with an average R² of 0.815. Magnusson et al. introduced a regression model (R² = 0.760) for the skin permeability of 269 compounds⁹. They found that molecular weight was the main determinant of log K_p and that the QSAR model could be improved by adding other descriptors such as melting point and hydrogen-bond acceptor capability. Chauhan and Shakya built a QSAR model for skin permeability from a training set of 150 compounds through partial least-squares regression¹⁰. The model, with an R² of 0.936 for the training set, was validated with a test set of 53 compounds; the root mean square (rms) error and R² for the test set were 0.670 and 0.542, respectively. Xu et al. proposed an expanded version of a linear free-energy relationship model for the skin permeability of complex chemical mixtures¹¹. The expanded model (R² = 0.70) showed a better fit and predictive power than the simple model (R² = 0.21). Chen et al. generated an MLR model for skin permeability with four molecular descriptors¹². The model has an R² of 0.858 for the training set (85 compounds) and 0.839 for the test set (21 compounds), values that indicate acceptable accuracy. All of the QSAR models mentioned above were obtained with linear techniques.

Generally, nonlinear QSAR models possess better statistical performance than linear QSAR models because the correlation between molecular physicochemical properties and structure descriptors is often nonlinear. Neely et al. constructed a nonlinear artificial neural network (ANN) model for the skin permeability of 160 molecular structures¹³. The ANN model (10-3-7-1), based on ten descriptors and two hidden layers, had an absolute-average percentage deviation, rms error, and correlation coefficient R of 8.0%, 0.34, and 0.93, respectively. Khajeh and Modarress introduced a novel nonlinear QSAR model for the skin permeability of 283 compounds with an adaptive neuro-fuzzy inference system (ANFIS), a hybrid of an ANN and a fuzzy inference system¹⁴. The ANFIS model was based on a training set of 225 compounds and validated with a test set of 58 compounds. The R² values for the two sets were 0.899 and 0.890, respectively. The model possesses good predictive ability, although nine compounds appear in duplicate in the data set.

ANN algorithms can easily become trapped in a local minimum and suffer from slow convergence¹⁵. The support vector machine (SVM) algorithm is based on the principle of structural risk minimization. SVMs can effectively avoid local optima and have particular advantages for practical problems involving limited training samples, high-dimensional data, and nonlinear data. The aim of this study was to develop a nonlinear SVM QSAR model for the skin permeability of a sufficiently large data set consisting of 274 compounds.

Materials and methods

Khajeh and Modarress reported 283 compounds and their experimental log K_p values¹⁴. After careful investigation, we found that the sample listed as p-Chlorobenzene should be 1-chloro-4-nitrobenzene and that 4-Chloro-4-phenylenediamine should be 4-Chloro-m-phenylenediamine. There are no counterions or organometallics in the data set. The molecular weights of the 283 compounds were calculated with ChemDraw Ultra 8.0 in ChemOffice 2004, and molecules with identical molecular weights were checked carefully to identify duplicates. Nine compounds appear in duplicate: 4-phenylenediamine (1,4-benzenediamine), 4-hydroxynitrobenzene (4-nitrophenol), methylhydroxybenzoate (methyl 4-hydroxybenzoate), 1,2-benzenediamine (2-phenylenediamine), 2-naphthol (naphthalene-2-ol), 2-nitro-1,4-phenylenediamine (2-nitro-4-phenylenediamine), 1-nonanol (Nonanol), 4-chloro-1,3-phenylenediamine (4-Chloro-m-phenylenediamine), and 1-heptanol (Heptanol). After these duplicates were deleted, 274 compounds remained. Table S1 in “Supplementary Materials” shows their SMILES structures and log K_p values. The skin permeability coefficients K_p are in units of cm/h, and the log K_p values range from −6.10 to −0.76. The Kennard-Stone algorithm¹⁶ was used to split the compounds into a training set (139 compounds) and a test set (135 compounds). The training set was used to adjust model parameters and train the QSAR models, and the test set was used to validate them.
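The Kennard-Stone split mentioned above can be illustrated with a minimal sketch. It assumes the textbook form of the algorithm (start from the two most distant samples, then repeatedly add the sample farthest from its nearest already-selected neighbour) and Euclidean distance; the descriptor space actually used for the split is not stated in the text, and the function name `kennard_stone` and the toy points are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start from the two mutually most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    # min_dist[i]: distance from candidate i to its nearest selected point.
    min_dist = {i: min(dist[i][first], dist[i][second]) for i in remaining}
    while len(selected) < n_select:
        # Add the candidate that is farthest from the selected set.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[i][nxt])
    return selected


# Toy one-dimensional "descriptor" values (made up for illustration).
toy = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_idx = kennard_stone(toy, 3)
test_idx = [i for i in range(len(toy)) if i not in train_idx]
print(train_idx, test_idx)
```

The selected indices would play the role of the training set and the rest the test set; because the selected points span the extremes of the descriptor space, the test samples tend to lie inside the region covered by the training samples.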

ChemDraw Ultra 8.0 in ChemOffice 2004 was used to draw the structures of the 274 compounds, which were converted into three-dimensional structures with Chem3D Ultra 8.0 and optimized with the semi-empirical AM1 method in MOPAC. Dragon 6.0¹⁷ was used to calculate 4885 molecular descriptors for each compound. After descriptors that were constant across all compounds or whose pairwise correlation coefficients exceeded 0.90 were deleted, 1820 descriptors (including Neoplastic-80) remained for descriptor selection. Stepwise MLR analysis in IBM SPSS Statistics 19 was performed to select the optimal subset of descriptors and develop MLR models.
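The descriptor pre-filtering step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which member of a highly correlated pair was discarded, so the sketch keeps the descriptor encountered first; the names `filter_descriptors` and `r_max` and the toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, r_max=0.90):
    """Drop constant descriptors and one member of each pair with |r| > r_max.

    columns maps a descriptor name to its list of values (one per compound).
    Assumption: of two highly correlated descriptors, the earlier one is kept.
    """
    kept = []
    for name, values in columns.items():
        if max(values) == min(values):
            continue  # constant descriptor carries no information
        if any(abs(pearson(values, columns[k])) > r_max for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept


# Toy descriptor table (made-up values, four compounds).
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with d1
    "d3": [5.0, 5.0, 5.0, 5.0],    # constant
    "d4": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with d1
}
kept = filter_descriptors(toy)
print(kept)
```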

For non-linear regression, SVM algorithms map the input variables into a high-dimensional feature space, in which linear regression analysis is carried out^18,19. For sample data $(y_{1}, x_{1}), \ldots, (y_{l}, x_{l})$, $x \in R^{n}$, $y \in R$, the regression function is expressed as follows:

$$ f(x) = \varphi^{T}(x)\,w + b $$

(1)

The optimal regression function can be obtained by means of the following minimization problem:

$$ \min_{w,b,\xi,\xi^{*}} J(w,\xi,\xi^{*},b) = \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i} (\xi_{i} + \xi_{i}^{*}) $$

(2)

Subject to Eqs. (3–4):

$$ y_{i} - \varphi^{T} (x_{i} )w - b \le \varepsilon + \xi_{i} $$

(3)

$$ \varphi^{T} (x_{i} )w + b - y_{i} \le \varepsilon + \xi_{i}^{*} $$

(4)

In SVM regression, the ε-insensitive loss function is employed for minimizing the training error:

$$ \left| f(x) - y \right|_{\varepsilon} = \begin{cases} 0, & \left| f(x) - y \right| < \varepsilon \\ \left| f(x) - y \right| - \varepsilon, & \left| f(x) - y \right| \ge \varepsilon \end{cases} $$

(5)

Thus, Eq. (1) becomes:

$$ f(x) = \sum_{i=1}^{l} (a_{i} - a_{i}^{*})\,\varphi(x_{i}) \cdot \varphi(x) + b $$

(6)

By applying a kernel function $K(x_{i}, x) = \varphi(x_{i}) \cdot \varphi(x)$, Eq. (6) can be expressed as a sum over the s support vectors:

$$ f(x) = \sum_{i=1}^{s} (a_{i} - a_{i}^{*})\,K(x_{i}, x) + b $$

(7)

The Gaussian radial basis function (RBF) kernel was used in this work:

$$ K(\mathrm{X}_{i}, \mathrm{X}_{j}) = \exp\left( -\gamma \left\| \mathrm{X}_{i} - \mathrm{X}_{j} \right\|^{2} \right) $$

(8)

The prediction performance of an SVM model depends strongly on the parameters C and γ. Both were optimized with a genetic algorithm. In this study, models were developed with the LibSVM toolbox²⁰ running on the MATLAB platform; the toolbox can be downloaded freely from https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
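To make the pieces of the SVM regression concrete, the sketch below writes out the ε-insensitive loss of Eq. (5), the RBF kernel of Eq. (8), and the kernel expansion of Eq. (7) in plain Python. It is not the LibSVM implementation used in the study; the support vectors, dual coefficients (a_i − a_i*), and bias b in the example are invented numbers chosen only to show the arithmetic.

```python
import math


def eps_insensitive(residual, eps):
    """Eq. (5): zero inside the eps tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def rbf_kernel(x, z, gamma):
    """Eq. (8): exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Eq. (7): weighted sum of kernel values over the support vectors plus b.

    dual_coefs[i] stands for the difference (a_i - a_i*).
    """
    return b + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Made-up model with two one-dimensional support vectors.
support_vectors = [(0.0,), (1.0,)]
dual_coefs = [1.0, -0.5]
prediction = svr_predict((0.0,), support_vectors, dual_coefs, b=0.2, gamma=1.72)
print(prediction)
print(eps_insensitive(0.0005, 0.001), eps_insensitive(0.5, 0.001))
```

A larger γ makes each kernel term decay faster with distance, so each support vector influences a smaller neighbourhood; C controls how heavily residuals outside the ε tube are penalized during training.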

Results and discussion

After stepwise MLR analysis was carried out in IBM SPSS Statistics 19 for the skin permeability log K_p of the 274 compounds with 1820 descriptors, a three-descriptor QSAR model was obtained, which includes A log P, X3v, and Neoplastic-80.

The Ghose–Crippen–Viswanadhan octanol–water partition coefficient (A log P) is based on the A log P model²¹ and calculated by:

$$ {\text{A}}\log {\text{P}} = \sum\limits_{i} {n_{i} a_{i} } $$

(9)

where n_i is the number of atoms of type i and a_i is the corresponding hydrophobicity constant. Previous works have shown that A log P is positively correlated with skin permeability log K_p. In this work, the descriptor A log P was converted to a new descriptor, cos²[(4.31 + A log P)/8.66]. A regression of the skin permeability log K_p of the 274 compounds on cos²[(4.31 + A log P)/8.66] resulted in Eq. (10) and the following statistical parameters:

$$ \begin{aligned} & \log K_{\text{p}} = 0.624 - 5.178\,\cos^{2}\left[ (4.31 + \text{A}\log\text{P})/8.66 \right] \\ & n = 274,\; R = 0.695,\; R^{2} = 0.483,\; R^{2}_{\text{adj}} = 0.481,\; \text{se} = 0.713,\; F = 253.903 \\ \end{aligned} $$

(10)

where n is the number of samples, R is the correlation coefficient, R² is the coefficient of determination, R²_adj is the adjusted R², se is the standard error of the estimate, and F is the Fisher ratio. Figure 1 shows the correlation between cos²[(4.31 + A log P)/8.66] and log K_p. The descriptor cos²[(4.31 + A log P)/8.66] (or A log P) describes the hydrophobic character of a compound and is related to log K_p.

Connectivity indices are used widely in QSARs. They are based on the H-depleted molecular graph, whose vertices represent non-hydrogen atoms, and are correlated with the number of connected non-hydrogen atoms¹⁷. The general formula for calculating connectivity indices is:

$$ Xk = \sum_{j = 1}^{K} \left( \prod_{i = 1}^{n} \delta_{i} \right)_{j}^{-1/2} $$

(11)

where k is the order of the index, an integer ranging from 0 to 5; K is the total number of kth-order paths present in the molecular graph; n is the number of vertices in each path; and δ_i is the degree of vertex i. Valence connectivity indices (Xkv) account for the presence of heteroatoms as well as double and triple bonds in the molecule by replacing the vertex degree with the valence vertex degree. The valence connectivity index of order 3, X3v, describes molecular size and shape.
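As an illustration of Eq. (11), the sketch below enumerates the paths of a given order in a small H-depleted graph and sums the inverse square roots of the products of vertex degrees. It uses the simple vertex degree rather than the valence vertex degree, so it computes Xk, not the Xkv values that Dragon reports; for a saturated hydrocarbon such as n-butane the two degrees coincide. The function name and the adjacency-list encoding are assumptions made for this example.

```python
def path_connectivity_index(adjacency, order):
    """Sum of (product of vertex degrees)^(-1/2) over all paths with `order` edges.

    adjacency maps each vertex (an int) to the list of its neighbours in the
    H-depleted molecular graph. Simple vertex degrees are used.
    """
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == order + 1:
            # Each undirected path is found from both ends; count it once.
            if path[0] <= path[-1]:
                product = 1
                for v in path:
                    product *= degree[v]
                total += product ** -0.5
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for start in adjacency:
        extend([start])
    return total


# n-Butane as an H-depleted graph: C0-C1-C2-C3.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x1 = path_connectivity_index(butane, 1)  # three bonds
x3 = path_connectivity_index(butane, 3)  # the single three-bond path
print(x1, x3)
```

For n-butane there is exactly one third-order path, with vertex degrees 1, 2, 2, and 1, so the third-order index is (1·2·2·1)^(−1/2) = 0.5.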

By correlating log K_p to the two descriptors, cos²[(4.31 + A log P)/8.66] and X3v, we obtained the following regression equation:

$$ \begin{aligned} & \log K_{\text{p}} = 2.230 - 6.643\,\cos^{2}\left[ (4.31 + \text{A}\log\text{P})/8.66 \right] - 0.245\,\text{X3v} \\ & n = 274,\; R = 0.918,\; R^{2} = 0.844,\; R^{2}_{\text{adj}} = 0.842,\; \text{se} = 0.393,\; F = 731.089 \\ \end{aligned} $$

(12)

Compared with Eq. (10), the quality of Eq. (12) improved noticeably when the descriptor X3v was added. Figure 2 shows the correlation between the experimental log K_p values and those calculated with Eq. (12). As illustrated in Fig. 2, two samples, ouabain (No. 5 in Table S1) and fluocinonide (No. 11), had larger prediction errors for log K_p. Thus, a further molecular descriptor was needed.

The descriptor Neoplastic-80, the Ghose–Viswanadhan–Wendoloski antineoplastic-like index at the qualifying range that covers approximately 80% of the drugs studied, depends on A log P and reflects molecular polarity and hydrophobicity¹⁷. Neoplastic-80 equals 1 for a molecule that has a benzene ring, heterocyclic ring, aliphatic amine, carboxamide group, alcoholic hydroxyl group, carboxy ester and/or keto group when its A log P value is in the range of −1.5 to 4.7, its molar refractivity is 43–128, its molecular weight is 180–470, and its total number of atoms is 21–63; otherwise Neoplastic-80 equals zero. A molecule with a larger Neoplastic-80 value might have a smaller log K_p value. Regression analysis between the log K_p values of the 274 compounds and the three descriptors stated above resulted in Eq. (13):

$$ \begin{aligned} & \log K_{\text{p}} = 2.209 - 6.698\,\cos^{2}\left[ (4.31 + \text{A}\log\text{P})/8.66 \right] - 0.174\,\text{X3v} - 0.704\,\text{Neoplastic-80} \\ & n = 274,\; R = 0.945,\; R^{2} = 0.893,\; R^{2}_{\text{adj}} = 0.892,\; \text{se} = 0.324,\; F = 756.879 \\ \end{aligned} $$

(13)

The correlation coefficient R of 0.945 in Eq. (13) was slightly higher than the 0.942 of the previously reported model¹³. Moreover, Eq. (13) accurately predicts the skin permeability log K_p of the compounds, including the two samples stated above (Nos. 5 and 11 in Table S1 in “Supplementary Materials”), since Fig. 3 shows no samples with obviously larger errors. When A log P itself, together with X3v and Neoplastic-80, was used directly to develop the MLR model, the correlation coefficient R was only 0.939, lower than the 0.945 of Eq. (13). Thus the three descriptors cos²[(4.31 + A log P)/8.66], X3v, and Neoplastic-80, shown in Table S1 in “Supplementary Materials”, were used to develop QSAR models.
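For readers who want to evaluate Eq. (13) numerically, a minimal sketch follows. It assumes that the cosine argument is in radians and that the Neoplastic-80 ranges quoted above are inclusive; detection of the listed functional groups is not attempted and is passed in as a boolean. The function names and the descriptor values in the example are hypothetical and are not taken from Table S1.

```python
import math


def neoplastic_80(alogp, molar_refractivity, mol_weight, n_atoms,
                  has_listed_group):
    """Range rule for Neoplastic-80 as described in the text.

    has_listed_group: True if the molecule contains at least one of the listed
    structural features (supplied by the caller; not detected here).
    Assumption: all range limits are inclusive.
    """
    in_range = (
        -1.5 <= alogp <= 4.7
        and 43 <= molar_refractivity <= 128
        and 180 <= mol_weight <= 470
        and 21 <= n_atoms <= 63
    )
    return 1 if (has_listed_group and in_range) else 0


def transformed_alogp(alogp):
    """cos^2[(4.31 + A log P) / 8.66], argument assumed to be in radians."""
    return math.cos((4.31 + alogp) / 8.66) ** 2


def log_kp_eq13(alogp, x3v, neo80):
    """log K_p (K_p in cm/h) from the coefficients of Eq. (13)."""
    return 2.209 - 6.698 * transformed_alogp(alogp) - 0.174 * x3v - 0.704 * neo80


# Hypothetical descriptor values, chosen only to exercise the functions.
neo = neoplastic_80(alogp=2.0, molar_refractivity=80.0, mol_weight=300.0,
                    n_atoms=40, has_listed_group=True)
estimate = log_kp_eq13(alogp=2.0, x3v=1.5, neo80=neo)
print(neo, round(estimate, 3))
```

Over the interval where (4.31 + A log P)/8.66 lies between 0 and π/2, the transformed descriptor decreases as A log P increases, so the negative coefficient in Eq. (13) gives a log K_p that rises with hydrophobicity, in line with the positive correlation noted earlier.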

A correlation analysis between the skin permeability log K_p of 139 compounds in the training set and the three descriptors resulted in Eq. (14) (i.e., MLR model):

$$ \begin{aligned} & \log K_{\text{p}} = 2.068 - 6.515\,\cos^{2}\left[ (4.31 + \text{A}\log\text{P})/8.66 \right] - 0.722\,\text{X3v} - 0.168\,\text{Neoplastic-80} \\ & n = 139,\; R = 0.949,\; R^{2} = 0.901,\; R^{2}_{\text{adj}} = 0.899,\; \text{se} = 0.349,\; F = 410.148 \\ \end{aligned} $$

(14)

The characteristics of the molecular descriptors in the MLR model are listed in Table 1. As can be seen in Table 1, the three descriptors, cos²[(4.31 + A log P)/8.66], X3v, and Neoplastic-80, were all significant and contributed to log K_p, because their significance values (P values) are less than 0.05. In addition, their variance inflation factors (VIF) were far less than ten, suggesting that the three descriptors describe different structural factors affecting skin permeability log K_p. The t-test can be used to measure the significance of a descriptor's contribution to a molecular physicochemical property: the higher the absolute value of the t statistic, the greater the significance of the descriptor. According to Table 1, the absolute t values increased in the sequence Neoplastic-80, X3v, and cos²[(4.31 + A log P)/8.66]; the significance of the descriptors increased in the same sequence.
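The variance inflation factor mentioned above can be computed as VIF_j = 1/(1 − R_j²), where R_j² comes from regressing descriptor j on the other descriptors; equivalently, VIF_j is the j-th diagonal element of the inverse of the descriptor correlation matrix. The sketch below uses the second form with a small Gauss-Jordan inversion. SPSS was used in the study; this code, its function names, and the toy columns are illustrative assumptions only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each descriptor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Toy descriptors: the first two have r = 0.6, the third is uncorrelated.
toy_columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [1.0, -1.0, -1.0, 1.0],
]
vifs = variance_inflation_factors(toy_columns)
print([round(v, 4) for v in vifs])
```

In the toy data the first two descriptors share a correlation of 0.6, giving VIF = 1/(1 − 0.36) = 1.5625 for each, while the uncorrelated third descriptor has VIF = 1.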

Table 1 Characteristics of molecular descriptors in MLR model.

The MLR model was further used to predict the skin permeability log K_p of the 135 compounds in the test set. The correlation coefficient R for the test set was 0.928. The rms errors for the training set, test set, and total set were 0.343, 0.302, and 0.323, respectively. The predicted log K_p values are illustrated in Fig. 4 and listed in Table S1 in “Supplementary Materials”.

The three molecular descriptors in Eq. (14) were used as input variables to develop SVM models for skin permeability log K_p from the training set of 139 compounds, by applying the LibSVM toolbox on the MATLAB R2014a software platform. A genetic algorithm was adopted to optimize the SVM parameters C and γ under the following conditions: the search range of C was [0, 1000], the search range of γ was [0, 10], the m in m-fold cross-validation was 5, the maximum number of generations was 200, the maximum population size was 20, and the ε in the ε-insensitive loss function was 0.001.
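The parameter search described above can be sketched with a small real-coded genetic algorithm. Only the search ranges and the population size come from the text; the selection, crossover, and mutation operators below (tournament selection, blend crossover, Gaussian mutation, elitism) are assumptions, since the operators used by the authors are not described. In the study the fitness would be a five-fold cross-validation error of the SVM; here `toy_cv_error` is an invented surrogate so that the example runs without an SVM library.

```python
import random

BOUNDS = [(0.0, 1000.0), (0.0, 10.0)]  # search ranges for C and gamma


def genetic_search(fitness, bounds, pop_size=20, generations=200,
                   mutation_rate=0.2, seed=0):
    """Minimize fitness(params) over box bounds with a simple elitist GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()
            child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[k] = min(max(child[k], lo), hi)  # stay inside the range
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history


def toy_cv_error(params):
    """Invented stand-in for a cross-validation error surface."""
    c, gamma = params
    return (c - 10.0) ** 2 / 1e4 + (gamma - 2.0) ** 2


best, history = genetic_search(toy_cv_error, BOUNDS, generations=50)
print(best, history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness can never get worse from one generation to the next.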

The optimization yielded the SVM parameters C = 7.2906 and γ = 1.7200, with an internal correlation coefficient of 0.82 based on the leave-one-out (LOO) cross-validation method. The optimal SVM model was further validated with the test set of 135 compounds. The SVM prediction results are listed in Table S1 in “Supplementary Materials” and illustrated in Fig. 5. The coefficient of determination R² and rms error for the training set of 139 compounds were 0.946 and 0.253, respectively; R² and the rms error for the test set of 135 compounds were 0.872 and 0.302, respectively; and R² and the rms error for the total set were 0.925 and 0.270, respectively. The rms errors of the SVM model for the training set and total set (0.253 and 0.270) were lower than those of Eq. (14), the MLR model in this study (0.343 and 0.323), while the rms error for the test set was the same for both models (0.302). These results suggest non-linear relationships between the skin permeability log K_p and the molecular descriptors used.

The SVM model was further evaluated with the criteria by Golbraikh and Tropsha:²²

$$ q_{{{\text{ext}}}}^{{2}} = 1 - \frac{{\sum {\left( {y_{i} - \widetilde{y}_{i} } \right)^{2} } }}{{\sum {\left( {y_{i} - \overline{y}_{{{\text{train}}}} } \right)^{2} } }} = 0.905 > 0.5 $$

(15)

$$ 0.85 < k = \frac{{\sum {y_{i} \widetilde{y}_{i} } }}{{\sum {\widetilde{y}_{i}^{2} } }} = 1.001 < 1.15 $$

(16)

$$ 0.85 < k^{\prime} = \frac{{\sum {y_{i} \widetilde{y}_{i} } }}{{\sum {y_{i}^{2} } }} = 0.985 < 1.15 $$

(17)

$$ R_{0}^{2} = 1 - \frac{{\sum {\left( {\widetilde{y}_{i} - y_{i}^{{r_{0} }} } \right)^{2} } }}{{\sum {\left( {\widetilde{y}_{i} - \overline{{\widetilde{y}}} } \right)^{2} } }} = 0.858 $$

(18)

$$ R_{0}^{\prime 2} = 1 - \frac{{\sum {\left( {y_{i} - \widetilde{y}_{i}^{{r_{0} }} } \right)^{2} } }}{{\sum {\left( {y_{i} - \overline{y} } \right)^{2} } }} = 0.872 $$

(19)

$$ \left| R^{2} - R_{0}^{2} \right| / R^{2} = 0.016 < 0.1 $$

(20)

$$ \left| R^{2} - R_{0}^{\prime 2} \right| / R^{2} = 0 < 0.1 $$

(21)

where $q_{\text{ext}}^{2}$ is the external correlation coefficient; R₀² and R₀′² are the determination coefficients of the predicted vs. the observed values and of the observed vs. the predicted values, respectively; k and k′ are the slopes of the regression lines of the predicted vs. the observed values and of the observed vs. the predicted values, respectively; $\overline{y}_{\text{train}}$ is the average value of the training set; $y_{i}$ and $\widetilde{y}_{i}$ are the observed and the predicted activities, respectively; $y_{i}^{r_{0}} = k\widetilde{y}_{i}$ and $\widetilde{y}_{i}^{r_{0}} = k^{\prime}y_{i}$. Our SVM model thus satisfied the validation criteria^22,23.
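Part of this external validation can be reproduced with a few lines of code. The sketch below covers only Eqs. (15) to (17), that is q²_ext and the slopes k and k′, plus the rms error used throughout the paper; R₀² and R₀′² are left out. The function name and the toy observed and predicted values are hypothetical.

```python
import math


def external_validation(y_obs, y_pred, y_train_mean):
    """q2_ext (Eq. 15), slopes k and k' (Eqs. 16-17), and rms error.

    y_obs and y_pred are the observed and predicted values of the test set;
    y_train_mean is the mean observed value of the training set.
    """
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - y_train_mean) ** 2 for o in y_obs)
    cross = sum(o * p for o, p in zip(y_obs, y_pred))
    return {
        "q2_ext": 1.0 - ss_res / ss_tot,
        "k": cross / sum(p * p for p in y_pred),
        "k_prime": cross / sum(o * o for o in y_obs),
        "rms": math.sqrt(ss_res / len(y_obs)),
    }


def passes_slope_criteria(stats):
    """Thresholds quoted in Eqs. (15)-(17): q2_ext > 0.5, 0.85 < k, k' < 1.15."""
    return (stats["q2_ext"] > 0.5
            and 0.85 < stats["k"] < 1.15
            and 0.85 < stats["k_prime"] < 1.15)


# Toy data (made up): three test samples and a training-set mean of 2.0.
stats = external_validation([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], 2.0)
print(stats, passes_slope_criteria(stats))
```

In the toy case q²_ext is exactly 0.5 and k′ is 17/14, so the example deliberately fails the thresholds; a model with perfect predictions would give q²_ext = k = k′ = 1 and an rms error of zero.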

The coefficient of determination R² (= 0.946) in this study is higher than the R² values of 0.90⁷, 0.815⁸, 0.760⁹, 0.936¹⁰, 0.70¹¹, 0.858¹², and 0.93¹³. In addition, the rms errors of the training set, test set, and total set from the ANFIS model of Khajeh and Modarress, which dealt with the 283 samples, were 0.318, 0.308, and 0.316, respectively¹⁴, all greater than the rms errors (0.253, 0.302, and 0.270, respectively) from our SVM model. Compared with other models reported in the literature^9,10,11,12,13,14, our SVM model shows better statistical performance while being validated on a larger test set.

Conclusions

A three-descriptor SVM model with SVM parameters C of 7.2906 and γ of 1.7200 was successfully built, by means of a genetic algorithm, for the skin permeability log K_p of a sufficiently large data set consisting of 274 compounds. The SVM model possesses rms errors of 0.253 for the training set (139 compounds), 0.302 for the test set (135 compounds), and 0.270 for the total set (274 compounds). Compared with other QSARs for skin permeability log K_p reported in the literature, our SVM model shows better statistical performance while being validated on a larger test set. There are non-linear relationships between the skin permeability log K_p and the molecular descriptors used. It is therefore reasonable to apply an SVM algorithm to develop a nonlinear QSAR model for skin permeability.

Data availability

All data generated or analysed during this study are included in this published article (and its “Supplementary Information” files).

References

1. Fitzpatrick, D., Corish, J. & Hayes, B. Modelling skin permeability in risk assessment – The future. Chemosphere 55, 1309–1314 (2004).
2. Alves, V. M. et al. Predicting chemically-induced skin reactions. Part II: QSAR models of skin permeability and the relationships between skin permeability and skin sensitization. Toxicol. Appl. Pharmacol. 284, 273–280 (2015).
3. Varpe, B. D. et al. 3D-QSAR and Pharmacophore modeling of 3,5-disubstituted indole derivatives as Pim kinase inhibitors. Struct. Chem. 31, 1675–1690 (2020).
4. Heo, S. K., Safder, U. & Yoo, C. K. Deep learning driven QSAR model for environmental toxicology: Effects of endocrine disrupting chemicals on human health. Environ. Pollut. 253, 29–38 (2019).
5. Lotfi, S., Ahmadi, S. & Zohrabi, P. QSAR modeling of toxicities of ionic liquids toward Staphylococcus aureus using SMILES and graph invariants. Struct. Chem. 31, 2257–2270 (2020).
6. Rahmani, N., Abbasi-Radmoghaddam, Z., Riahi, S. & Mohammadi-Khanaposhtanai, M. Predictive QSAR models for the anti-cancer activity of topoisomerase IIα catalytic inhibitors against breast cancer cell line HCT15: GA-MLR and LS-SVM modeling. Struct. Chem. 31, 2129–2145 (2020).
7. Patel, H., ten Berge, W. & Cronin, M. T. D. Quantitative structure-activity relationships (QSARs) for the prediction of skin permeation of exogenous chemicals. Chemosphere 48, 603–613 (2002).
8. Fujiwara, S.-I., Yamashita, F. & Hashida, M. QSAR analysis of interstudy variable skin permeability based on the “latent membrane permeability” concept. J. Pharm. Sci. 92, 1939–1946 (2003).
9. Magnusson, B. M., Anissimov, Y. G., Cross, S. E. & Roberts, M. S. Molecular size as the main determinant of solute maximum flux across the skin. J. Invest. Dermatol. 122, 993–999 (2004).
10. Chauhan, P. & Shakya, M. Role of physicochemical properties in the estimation of skin permeability: In vitro data assessment by Partial Least-Squares Regression. SAR QSAR Environ. Res. 21, 481–494 (2010).
11. Xu, G., Hughes-Oliver, J. M., Brooks, J. D. & Baynes, R. E. Predicting skin permeability from complex chemical mixtures: Incorporation of an expanded QSAR model. SAR QSAR Environ. Res. 24, 711–731 (2013).
12. Chen, C.-P., Chen, C.-C., Huang, C.-W. & Chang, Y.-C. Evaluating molecular properties involved in transport of small molecules in stratum corneum: A quantitative structure-activity relationship for skin permeability. Molecules 23, 911 (2018).
13. Neely, B. J., Madihally, S. V., Robinson, R. L. & Gasem, K. A. M. Nonlinear quantitative structure-property relationship modeling of skin permeation coefficient. J. Pharm. Sci. 98, 4069–4084 (2009).
14. Khajeh, A. & Modarress, H. Linear and nonlinear quantitative structure–property relationship modelling of skin permeability. SAR QSAR Environ. Res. 25, 35–50 (2014).
15. Zhou, T., Lu, H., Wang, W. & Yong, X. GA-SVM based feature selection and parameter optimization in hospitalization expense modeling. Appl. Soft. Comput. 75, 323–333 (2019).
16. Daszykowski, M. et al. TOMCAT: A MATLAB toolbox for multivariate calibration techniques. Chemom. Intell. Lab. Syst. 85, 269–277 (2007).
17. Talete srl. DRAGON (Software for Molecular Descriptor Calculation) Version 6.0. http://www.talete.mi.it/ (2012).
18. Yu, X., Xu, L., Zhu, Y., Lu, S. & Dang, L. Correlation between 13C NMR chemical shifts and complete sets of descriptors of natural coumarin derivatives. Chemom. Intell. Lab. Syst. 184, 167–174 (2019).
19. Yu, X. Prediction of depuration rate constants for polychlorinated biphenyl congeners. ACS Omega 4, 15615–156205 (2019).
20. Chang, C. C. & Lin, C. J. LIBSVM: A library for support vector machines. Acm. T. Intel. Syst. Tec. 2, 27 (2011).
21. Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: An analysis of ALOGP and CLOGP methods. J. Phys. Chem. A 102, 3762–3772 (1998).
22. Golbraikh, A. & Tropsha, A. Beware of q2. J. Mol. Graph. Model. 20, 269–276 (2002).
23. Yu, X., Bing, Y., Yu, W. & Wang, X. DFT-based quantum theory QSPR studies of molar heat capacity and molar polarization of vinyl polymers. Chem. Pap. 62, 623–629 (2008).

Acknowledgements

This work was supported by the Provincial Natural Science Foundation of Hunan (No. 2020JJ6012) and the Open Project Program of Hunan Provincial Key Laboratory of Environmental Catalysis & Waste Regeneration (Hunan Institute of Engineering) (No. 2018KF11).

Author information

Authors and Affiliations

Hunan Provincial Key Laboratory of Environmental Catalysis & Waste Regeneration, College of Materials and Chemical Engineering, Hunan Institute of Engineering, Xiangtan, 411104, Hunan, China
Ruolan Zeng, Jiyong Deng, Limin Dang & Xinliang Yu

Contributions

R.Z., J.D.: data curation, manuscript revision. L.D.: data collection and curation, descriptor calculation. X.Y.: conceptualization, methodology, software, model development, writing of the original draft, manuscript revision.

Corresponding authors

Correspondence to Jiyong Deng or Xinliang Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table S1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zeng, R., Deng, J., Dang, L. et al. Correlation between the structure and skin permeability of compounds. Sci Rep 11, 10076 (2021). https://doi.org/10.1038/s41598-021-89587-5

Received: 14 January 2021
Accepted: 28 April 2021
Published: 12 May 2021
DOI: https://doi.org/10.1038/s41598-021-89587-5
