Abstract
The ongoing COVID-19 pandemic continues to pose significant challenges worldwide, despite widespread vaccination. Researchers are actively exploring antiviral treatments to assess their efficacy against emerging virus variants. The aim of this study is to employ the M-polynomial and neighborhood M-polynomial approaches together with QSPR/QSAR analysis to evaluate specific antiviral drugs: Lopinavir, Ritonavir, Arbidol, Thalidomide, Chloroquine, Hydroxychloroquine, Theaflavin and Remdesivir. Degree-based and neighborhood degree sum-based topological indices computed on molecular multigraphs give insight into the physicochemical properties of these drugs. Properties such as polar surface area, polarizability, surface tension, boiling point, enthalpy of vaporization, flash point, molar refraction and molar volume are crucial in predicting their efficacy against viruses. These properties influence the solubility, permeability and bioavailability of the drugs, which in turn affect their ability to interact with viral targets and inhibit viral replication. In the QSPR analysis, molecular multigraphs yield correlation coefficients exceeding those from simple graphs: molar refraction (MR) (0.9860), polarizability (P) (0.9861), surface tension (ST) (0.6086) and molar volume (MV) (0.9353) using degree-based indices, and flash point (FP) (0.9781) and surface tension (ST) (0.7841) using neighborhood degree sum-based indices. QSAR models, constructed through multiple linear regression (MLR) with a backward elimination approach at a significance level of 0.05, exhibit promising predictive capability for the biological activity \(IC_{50}\) (half maximal inhibitory concentration). Notably, the agreement between the observed and predicted values for Remdesivir, with observed \(pIC_{50} = 6.01\) and predicted \(pIC_{50} = 6.01\) (where \(pIC_{50}\) is the negative logarithm of \(IC_{50}\)), underscores the accuracy of multigraph-based QSAR analysis.
The primary objective is to showcase the valuable contribution of multigraphs to QSPR and QSAR analyses, offering crucial insights into molecular structures and antiviral properties. The integration of physicochemical applications enhances our understanding of factors influencing antiviral drug efficacy, essential for combating emerging viral strains effectively.
Introduction
Graph theory has seen a surge in its application to pharmacology and medicine, with chemical graph theoreticians focusing on computing topological indices of drug structures to gain insights into molecular properties and aid in drug development. SARS-CoV-2, a single-stranded RNA virus, causes COVID-19, the first major pandemic of the twenty-first century. In 2003, SARS, caused by a new coronavirus strain, led to 916 deaths globally. Similarly, COVID-19 emerged in December 2019 in Wuhan, China, and was declared a global public health emergency by the WHO in January 2020^{1}. Even past the midpoint of 2023, the coronavirus pandemic situation persisted. As of May 12, 2024, 10:39am CEST, the World Health Organization (WHO) has reported a global total of 775,379,864 confirmed COVID-19 cases, with 7 million recorded fatalities. For the latest statistics, refer to https://covid19.who.int/.
Our research builds on prior studies suggesting that accounting for double bonds can improve correlation results in molecular modeling. It is inspired by the observation of Kier et al.^{2} in “Medicinal Chemistry: A Series of Monographs” that double-edge counts provide a more accurate representation of double bonds. Recent work by Simon et al. also indicated improved correlations for molecules with weighted Wiener indices compared to traditional Wiener indices for simple graphs, while Zakharov et al. proposed a novel approach using multigraphs for enhanced statistical QSAR model building^{3,4}. Using these insights, we conducted a comparative analysis between simple and complex (multigraph) models to investigate the impact of double bonds on property estimation accuracy. Topological indices (TI’s) capture the structure-property relationships of chemical compounds, providing numerical parameters for QSPR and QSAR studies. Research on TI’s has led to the development of over 3000 indices, reflecting the structural properties of the graphs used for their calculation. Most recently, research by Sakander Hayat et al. has explored the use of temperature-based topological indices, valency-based descriptors, distance-based graphical indices and eigenvalue-based indices to predict physicochemical and thermodynamic properties of polycyclic aromatic hydrocarbons and benzenoid hydrocarbons^{5,6,7,8,9,10}. QSPR/QSAR analyses of antiviral drugs, corona drugs and anticancer drugs have also recently been carried out using degree, reverse degree, distance and neighborhood based topological descriptors^{11,12,13,14,15,16}. Research by Zaman et al.^{17,18,19,20,21,22,23,24,25,26} delves into diverse applications of analytical and theoretical studies in chemistry and related fields, focusing on structural analysis, topological characterization and mathematical modeling of various nanostructures, biochemical networks and metal-organic models.
The author’s work explores the relationships between molecular topology, irregular molecular descriptors, and novel topological indices, offering insights into the structural properties of complex materials and nanostructures.
This article represents chemical structures using hydrogen suppressed molecular multigraphs with the inclusion of double bonds. A multigraph is a graph containing multiple edges, where multiple edges indicate more than one connection between two vertices, and loops represent edges connecting the same vertex at both ends^{27}. Marrero Ponce in^{28} discusses the application of QSPR/QSAR analysis for pseudographs (graphs with loops and parallel edges), with considerations for heteroatoms using the Valence delta concept^{29}. This study compares multigraph and simple graph modeling approaches using topological structure descriptors to estimate physicochemical and biological activity through QSPR/QSAR analysis. Multiple linear regression techniques validate correlation values, aiding in understanding estimators and identifying potential drugs. Notably, no previous literature directly compares multigraph and simple graph efficacy in this context, making this study’s contribution novel and original.
In this study, multigraphs are employed to establish correlations between the physicochemical properties and biological activity of the antiviral drugs. Our QSAR model, utilizing multigraphs, demonstrates a stronger association between the studied biological activity \((pIC_{50})\) and the topological indices than the QSAR model proposed by Kirmani et al.^{11}. The scientific literature has introduced several graph polynomials to aid in the calculation of various graph indices. Distance-based polynomials such as the Hosoya polynomial, PI polynomial, Schultz polynomial and modified Schultz polynomial have been suggested in previous studies^{30,31,32}. In addition, Deutsch and Klavžar (2015)^{33} developed the M-polynomial as a means to compute different degree-based TI’s.
The M-polynomial of a graph \(\mathscr {G}\) is defined as
\(M(\mathscr {G}; x, y) = \sum _{j \le k} m_{jk}\, x^{j} y^{k}\) (1)
In this context, \(m_{jk}\) is the number of edges uv \(\in\) \(E(\mathscr {G})\) with \((d_u, d_v) = (j, k)\), where \(d_u\) and \(d_v\) are the degrees of vertices u and v, respectively. The NM-polynomial, akin to the M-polynomial, is a polynomial designed specifically for neighborhood degree sum-based indices^{34}. It serves a similar purpose and is defined as
\(NM^{*}(\mathscr {G}; x, y) = \sum _{j \le k} nm^{*}_{jk}\, x^{j} y^{k}\) (2)
Here \(nm^{*}_{jk}\) is the number of edges uv \(\in\) \(E(\mathscr {G})\) with \((nd^{*}_u, nd^{*}_v) = (j, k)\), where \(nd^{*}_u\) and \(nd^{*}_v\) denote the neighborhood degree sums of the vertices u and v in the graph, respectively. The objective of this research is to create reliable QSPR/QSAR models that can effectively forecast the physicochemical and biological properties of drugs targeting COVID-19. Throughout the article, the abbreviations ‘NBD’ (neighborhood degree sum-based indices) and ‘D’ (degree-based indices) are used in specific sections for convenience.
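The two edge partitions behind these polynomials can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy multigraph are hypothetical, a double bond is modeled as two parallel edges, and the neighborhood degree sum is assumed to count a neighbor once per connecting edge.

```python
from collections import Counter


def vertex_degrees(edges):
    """Degree of each vertex; every parallel edge counts once."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_partition(edges):
    """m_{jk}: number of edges whose endpoint degrees are (j, k), j <= k."""
    deg = vertex_degrees(edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def neighborhood_partition(edges):
    """nm*_{jk}: same partition using neighborhood degree sums.

    Assumption: a neighbor joined by a double bond contributes its
    degree twice (once per parallel edge).
    """
    deg = vertex_degrees(edges)
    nd = Counter()
    for u, v in edges:
        nd[u] += deg[v]
        nd[v] += deg[u]
    return Counter(tuple(sorted((nd[u], nd[v]))) for u, v in edges)


# Hypothetical toy multigraph: path a-b-c-d with a double bond between b and c.
toy_edges = [("a", "b"), ("b", "c"), ("b", "c"), ("c", "d")]
m = degree_partition(toy_edges)
nm = neighborhood_partition(toy_edges)
print(dict(m))
print(dict(nm))
```

For this toy graph the M-polynomial would read \(2xy^{3}+2x^{3}y^{3}\), since each coefficient is simply the count stored under the key (j, k).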
Material and method
In our study, we used algebraic polynomials to determine the topological indices of several antiviral drug structures. Table 1 presents the relationship between the different TI’s and the M-polynomial and NM-polynomial from which they are derived; the evaluation at x = 1 and y = 1 used in Table 1 is proved by Sandi Klavžar in^{33}. Neighborhood degree sum-based topological indices, as discussed in references^{35,36}, demonstrate a remarkable capability to predict various physicochemical properties with high accuracy. Furthermore, a parallel effort has led to the construction of several other neighborhood degree sum-based topological indices, along with their corresponding classical degree-based topological indices, as detailed in references^{37,38,39}. Mondal et al. conducted a study^{28} to assess the efficacy of four antiviral drugs in the treatment of COVID-19 patients, employing the M-polynomial and NM-polynomial methods for evaluation. Additionally, Kirmani et al.^{11} recently developed QSPR/QSAR models using linear and multiple linear regression to relate the physicochemical and biological properties of potential antiviral drugs to TI’s in the context of COVID-19 treatment.
To model the antiviral activity of drugs investigated for COVID-19 treatment, a combination of ten ‘D’ and ten ‘NBD’ based TI’s was employed alongside eight physicochemical properties: polar surface area, polarizability, surface tension, boiling point, enthalpy of vaporization, flash point, molar refraction and molar volume. The study focused on the drugs Hydroxychloroquine, Theaflavin, Lopinavir, Ritonavir, Arbidol, Chloroquine and Remdesivir. Thalidomide was excluded from the QSAR study due to insufficient available data on its antiviral activity. Fig. 1 displays the chemical structures of these drugs, which were drawn with ChemSketch. Within this article, the QSAR model uses the biological activity \(IC_{50}\) (half maximal inhibitory concentration) to predict the antiviral activity of the mentioned drugs, with multiple linear regression (MLR) as the statistical technique. \(IC_{50}\) is a widely used measure in drug development to assess the strength of potential drug candidates and compare their efficacy. It is also used in biochemical studies to understand the properties of proteins and enzymes. \(pIC_{50}\) represents the negative logarithm of \(IC_{50}\). The physicochemical properties and biological activity data of the antiviral drugs are presented in Table 2. The physicochemical values were sourced from ChemSpider, and the half-maximal inhibitory concentrations (\(IC_{50}\)) of antiviral activity for the compounds were collected from the scientific literature^{11,40,41,42,43} and converted to their negative logarithmic scale (\(pIC_{50}\)) to facilitate data analysis and interpretation.
Results and discussions
Computation of M-polynomial and NM-polynomial of Lopinavir
In this section, we present the main computational findings of our study. We analyze the molecular multigraph of Lopinavir and derive its M-polynomial and NM-polynomial, as described in the theorem below. We then extended the analysis to seven additional molecular drug structures, computing the M-polynomial and NM-polynomial for each; the corresponding values can be found in Table 3. Only the computation for Lopinavir is shown in detail. Fig. 2 shows the molecular multigraph of Lopinavir, and Fig. 3 shows the 3D plots of its M-polynomial and NM-polynomial. The differences in the surface patterns imply that the degree-based and neighborhood degree sum-based topological indices derived from these polynomials will also differ in their numerical values and interpretations. To determine the superiority of one index over another, further analysis is required, such as comparing their performance in QSPR/QSAR models, evaluating their correlation coefficients with experimental data, and assessing their ability to discriminate between different molecular structures.
Theorem 1
Let \(\mathscr {L}\) be the molecular multigraph of Lopinavir. Then we have,
Proof
Consider \(\mathscr {L}\), the molecular multigraph representing Lopinavir (refer to Fig. 2). It comprises a total of 61 edges. Let \(\Gamma _{(j,k)}\) denote the set of edges whose endpoints have degrees j and k, respectively, i.e., \(\Gamma _{(j,k)} = \{uv \in E(\mathscr {L}): \Delta (u) = j, \Delta (v) = k \}\), where \(\Delta (u)\) is the degree of u. Let \(m_{(j,k)}\) be the number of edges in \(\Gamma _{(j,k)}\). From Fig. 2 it is clear that \(m_{(1,3)} = 3, m_{(1,4)} = 2, m_{(2,2)} = 4, m_{(2,3)} = 7, m_{(2,4)} = 13, m_{(3,3)} = 18, m_{(3,4)} = 11, m_{(4,4)} = 3\). To derive the M-polynomial of \(\mathscr {L}\), we use Eq. (1).
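By using the values of \(m_{(j,k)}\), we get \(M(\mathscr {L};x,y) = 3xy^{3}+2xy^{4}+4x^{2}y^{2}+7x^{2}y^{3}+13x^{2}y^{4}+18x^{3}y^{3}+11x^{3}y^{4}+3x^{4}y^{4}\).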
Let \(\Gamma ^{*}_{(j,k)}\) be the set of all edges whose endpoints have neighborhood degree sums j and k, respectively, i.e., \(\Gamma ^{*}_{(j,k)} = \{uv \in E(\mathscr {L}): nd^{*}_u = j, nd^{*}_v = k \}\). Let \(nm^{*}_{(j,k)}\) be the number of edges in \(\Gamma ^{*}_{(j,k)}\). From Fig. 2 it is clear that \(nm^{*}_{(3,5)} = 2, nm^{*}_{(3,6)} = 1, nm^{*}_{(4,4)} = 1, nm^{*}_{(4,5)} = 1, nm^{*}_{(4,6)} = 3, nm^{*}_{(4,7)} = 4, nm^{*}_{(4,8)} = 2, nm^{*}_{(5,9)} = 1, nm^{*}_{(5,10)} = 1, nm^{*}_{(6,6)} = 10, nm^{*}_{(6,7)} = 14, nm^{*}_{(6,10)} = 1, nm^{*}_{(7,7)} = 3, nm^{*}_{(7,8)} = 11, nm^{*}_{(7,9)} = 1, nm^{*}_{(7,10)} = 1, nm^{*}_{(8,10)} = 3, nm^{*}_{(9,10)} = 1\). To derive the NM-polynomial of \(\mathscr {L}\), we use Eq. (2), which gives \(NM^{*}(\mathscr {L};x,y) = 2x^{3}y^{5}+x^{3}y^{6}+x^{4}y^{4}+x^{4}y^{5}+3x^{4}y^{6}+4x^{4}y^{7}+2x^{4}y^{8}+x^{5}y^{9}+x^{5}y^{10}+10x^{6}y^{6}+14x^{6}y^{7}+x^{6}y^{10}+3x^{7}y^{7}+11x^{7}y^{8}+x^{7}y^{9}+x^{7}y^{10}+3x^{8}y^{10}+x^{9}y^{10}\).
The Mpolynomial and NMpolynomial are computed to derive a range of ’D’ and ’NBD’ TI’s for the molecular multigraph representing Lopinavir. These findings are summarized in the following theorem. \(\square\)
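A quick consistency check on the two edge partitions quoted in the proof is that each polynomial, evaluated at x = y = 1, must return the total number of edges (61). The following sketch, with hypothetical helper names, performs that check and prints the M-polynomial in the notation used here.

```python
# Edge partitions of the Lopinavir multigraph as quoted in the proof above.
M_PART = {(1, 3): 3, (1, 4): 2, (2, 2): 4, (2, 3): 7,
          (2, 4): 13, (3, 3): 18, (3, 4): 11, (4, 4): 3}
NM_PART = {(3, 5): 2, (3, 6): 1, (4, 4): 1, (4, 5): 1, (4, 6): 3, (4, 7): 4,
           (4, 8): 2, (5, 9): 1, (5, 10): 1, (6, 6): 10, (6, 7): 14,
           (6, 10): 1, (7, 7): 3, (7, 8): 11, (7, 9): 1, (7, 10): 1,
           (8, 10): 3, (9, 10): 1}


def evaluate(partition, x, y):
    """Evaluate sum of c * x^j * y^k over the partition."""
    return sum(c * x**j * y**k for (j, k), c in partition.items())


def _power(var, exponent):
    """Render var^{exponent}, dropping an exponent of 1."""
    return var if exponent == 1 else f"{var}^{{{exponent}}}"


def as_latex(partition):
    """Render the polynomial with terms ordered by (j, k)."""
    terms = []
    for (j, k), c in sorted(partition.items()):
        coeff = "" if c == 1 else str(c)
        terms.append(f"{coeff}{_power('x', j)}{_power('y', k)}")
    return "+".join(terms)


print(evaluate(M_PART, 1, 1), evaluate(NM_PART, 1, 1))
print(as_latex(M_PART))
```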
Theorem 2
Let \(\mathscr {L}\) be the molecular multigraph of Lopinavir. Then the respective index values given in Table 3 hold.
Proof
Initially, we determine the degreebased indices by referring to Table 1. Let \(M(\mathscr {L};x,y) = t(x,y) = 3xy^{3}+2xy^{4}+4x^{2}y^{2}+7x^{2}y^{3}+13x^{2}y^{4}+18x^{3}y^{3}+11x^{3}y^{4}+3x^{4}y^{4}\). Then we have,

1.
\(M_1(\mathscr {L}) = (D_x+D_y)t(x,y)_{x=y=1} =12xy^{3}+10xy^{4}+16x^{2}y^{2}+35x^{2}y^{3}+78x^{2}y^{4}+108x^{3}y^{3}+77x^{3}y^{4} +24x^{4}y^{4} = 360.\)

2.
\(M_2(\mathscr {L}) = (D_xD_y)t(x,y)_{x=y=1} = 9xy^{3}+8xy^{4}+16x^{2}y^{2}+42x^{2}y^{3}+104x^{2}y^{4}+162x^{3}y^{3}+132x^{3}y^{4}+48x^{4}y^{4} = 521\)

3.
\(mM_2(\mathscr {L}) = S_xS_yt(x,y)_{x=y=1} = xy^{3}+\frac{2}{4}xy^{4}+x^{2}y^{2}+\frac{7}{6}x^{2}y^{3}+\frac{13}{8}x^{2}y^{4}+\frac{18}{9}x^{3}y^{3}+\frac{11}{12}x^{3}y^{4}+\frac{3}{16}x^{4}y^{4} = 8.3958\)

4.
\(ReZG_3(\mathscr {L}) = D_xD_y(D_x+D_y)t(x,y)_{x=y=1} = 36xy^{3}+40xy^{4}+64x^{2}y^{2}+210x^{2}y^{3}+624x^{2}y^{4}+972x^{3}y^{3}+924x^{3}y^{4}+384x^{4}y^{4} = 3254\)

5.
\(F(\mathscr {L}) = (D_x^{2}+D_y^{2})t(x,y)_{x=y=1} = 30xy^{3}+34xy^{4}+32x^{2}y^{2}+91x^{2}y^{3}+260x^{2}y^{4}+324x^{3}y^{3}+275x^{3}y^{4}+96x^{4}y^{4} = 1142\)

6.
\(SDD(\mathscr {L}) = (S_xD_y+S_yD_x)t(x,y)_{x=y=1} = \frac{30}{3}xy^{3}+\frac{34}{4}xy^{4}+\frac{32}{4}x^{2}y^{2}+\frac{91}{6}x^{2}y^{3}+\frac{260}{8}x^{2}y^{4}+\frac{324}{9}x^{3}y^{3} +\frac{275}{12}x^{3}y^{4}+ \frac{96}{16}x^{4}y^{4} = 139.0833\)

7.
\(H(\mathscr {L}) = 2S_xJt(x,y)_{x=1} = 2\left( \frac{7}{4}x^{4}+\frac{9}{5}x^{5}+\frac{31}{6}x^{6}+\frac{11}{7}x^{7}+\frac{3}{8}x^{8}\right) = 21.3262\)

8.
\(I(\mathscr {L}) = S_xJD_xD_yt(x,y)_{x=1} = \frac{25}{4}x^{4}+\frac{50}{5}x^{5}+\frac{266}{6}x^{6}+\frac{132}{7}x^{7}+\frac{48}{8}x^{8} = 85.4405\)

9.
\(A(\mathscr {L}) = S_x^{3}Q_{2}JD_x^{3}D_y^{3}t(x,y)_{x=1} = 42.125x^{2}+60.7407x^{3}+309.0313x^{4}+152.064x^{5}+56.8889x^{6} = 620.8499\)

10.
\(R_{\alpha }(\mathscr {L}) = D_x^{\alpha }D_y^{\alpha }t(x,y)_{x=y=1} = 3(3)^{\alpha }+2(4)^{\alpha }+4(4)^{\alpha }+7(6)^{\alpha }+13(8)^{\alpha }+18(9)^{\alpha }+11(12)^{\alpha }+3(16)^{\alpha }\), which for \(\alpha = -\frac{1}{2}\) gives 22.1114.
Next, we compute the neighborhood degree sum-based indices by taking \(NM^{*}(\mathscr {L}) = t(x,y) = 2x^{3}y^{5}+x^{3}y^{6}+x^{4}y^{4}+x^{4}y^{5}+3x^{4}y^{6}+4x^{4}y^{7}+2x^{4}y^{8}+x^{5}y^{9}+x^{5}y^{10}+10x^{6}y^{6}+14x^{6}y^{7}+x^{6}y^{10}+3x^{7}y^{7}+11x^{7}y^{8}+x^{7}y^{9}+x^{7}y^{10}+3x^{8}y^{10}+x^{9}y^{10}\). Applying the operators of Table 1 to this polynomial, which follows from the edge partition \(\Gamma ^{*}_{(j,k)}\), yields the ‘NBD’ indices in the same way, thus concluding the proof. The obtained values of the ‘D’ and ‘NBD’ indices, calculated using the M-polynomial and NM-polynomial, are displayed in Tables 3 and 4, respectively. \(\square\)
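Because every operator in the proof acts term by term, each degree-based index reduces to a weighted sum over the edge partition. The sketch below recomputes the ten indices directly from the \(m_{(j,k)}\) values; the helper name `index` and the dictionary keys are illustrative, and \(\alpha = -1/2\) is assumed for the Randić-type index because that value reproduces the quoted 22.1114.

```python
# Degree-based edge partition of the Lopinavir multigraph (from the proof).
M_PART = {(1, 3): 3, (1, 4): 2, (2, 2): 4, (2, 3): 7,
          (2, 4): 13, (3, 3): 18, (3, 4): 11, (4, 4): 3}


def index(partition, weight):
    """Sum of m_{jk} * weight(j, k) over all edge classes."""
    return sum(c * weight(j, k) for (j, k), c in partition.items())


indices = {
    "M1": index(M_PART, lambda j, k: j + k),
    "M2": index(M_PART, lambda j, k: j * k),
    "mM2": index(M_PART, lambda j, k: 1 / (j * k)),
    "ReZG3": index(M_PART, lambda j, k: j * k * (j + k)),
    "F": index(M_PART, lambda j, k: j * j + k * k),
    "SDD": index(M_PART, lambda j, k: (j * j + k * k) / (j * k)),
    "H": index(M_PART, lambda j, k: 2 / (j + k)),
    "I": index(M_PART, lambda j, k: j * k / (j + k)),
    "A": index(M_PART, lambda j, k: (j * k / (j + k - 2)) ** 3),
    # Assumption: alpha = -1/2 for the general Randic index.
    "R(-1/2)": index(M_PART, lambda j, k: (j * k) ** -0.5),
}

for name, value in indices.items():
    print(f"{name:8s} {value:.4f}")
```

The same `index` helper could be applied to the neighborhood partition to obtain the ‘NBD’ counterparts, with the weights taken from Table 1.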
QSPR analysis of selected antiviral drugs with its target properties
Regression analyses
To clarify the physical significance of our results, we include concise discussions on the effectiveness of the computed topological indices. These quantitative measures reveal key structural attributes, with higher values indicating enhanced stability and lower reactivity, and lower values suggesting potential reactivity sites. Our study validates the predictive power of these indices by demonstrating strong correlations with experimental properties, supporting their use in understanding structure-property relationships and guiding drug design and development. We highlight the practical applications in drug delivery and material design while acknowledging the need to consider molecular context and explore advanced methods for improved accuracy. The correlation values between the ‘D’ and ‘NBD’ based TI’s and the physicochemical properties of the antiviral drugs (COVID-19 drugs) are given in Tables 5 and 6. From Table 5 we observe that the inverse sum indeg index (estimator) shows a strong positive relationship with boiling point (outcome variable), which is depicted in Fig. 4.
From Fig. 5 we observe that the correlation coefficients ‘r’ for surface tension (ST), molar refractivity (MR), molar volume (MV) and polarizability (P) are higher for the multigraph representation than for the simple graph representation of the selected antiviral drugs. The existence of a double bond in a molecule can greatly impact its properties, including polarity, conjugation and reactivity. These changes, in turn, can affect the molecule’s solubility, stability and biological activity. For example, a double bond introduces regions of different electron density, resulting in a shift in polarity; it can make the molecule more or less polar depending on the surrounding atoms and functional groups. Molecular multigraphs can therefore provide a more detailed and nuanced representation of the chemical structure. For comparison, the highest correlation coefficients reported in^{11} for the simple graph representation of seven drugs using degree-based indices are r = 0.9709 for MR, 0.9710 for P, 0.5115 for ST and 0.9108 for MV. The corresponding high correlation values for molecular multigraphs are shown in Table 5 in bold with an asterisk (*). Similarly, from Table 6 we observe that the neighborhood inverse sum indeg index (NI) (predictor variable) shows a strong positive relationship with boiling point (outcome variable), which is depicted in Fig. 6.
From Fig. 7 we observe that the correlation coefficients ‘r’ for flash point (FP) and surface tension (ST) are higher for the multigraph representation than for the simple graph representation of the selected antiviral drugs. For comparison, the highest correlation coefficients reported in^{11} for the simple graph representation of seven drugs using neighborhood degree sum-based indices are r = 0.9629 for FP and r = 0.6682 for ST. The corresponding high correlation values for molecular multigraphs are shown in Table 6 in bold with an asterisk (*).
Note: We also observed that, for both ‘D’ and ‘NBD’ based indices, the highest correlation values for multigraphs are nearly identical to those found for simple graphs. For example, the simple graph representation in^{11} gives 0.9920 for BP and 0.9887 for E, whereas multigraphs give 0.9864 for BP and 0.9827 for E. The differences are small, and for some properties the multigraph values are higher than the simple graph values. However, when there is a low correlation between chemical structure descriptors and a target property, it suggests that additional factors may play a more significant role in determining the target property. Further analysis or experimentation might be necessary to identify and understand those factors.
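The correlation coefficient ‘r’ compared throughout this section is the Pearson coefficient between an index and a property across the drugs. A minimal standard-library sketch follows; the function name and the sample numbers are purely illustrative and are not taken from Tables 2 to 6.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical index values and property values, for illustration only.
index_values = [1.0, 2.0, 3.0, 4.0]
property_values = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(index_values, property_values), 4))
```

In the study's setting, `xs` would hold one topological index evaluated on each drug and `ys` the matching physicochemical property.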
QSAR analyses of biological activity \(pIC_{50}\) versus degree-based and neighborhood degree sum-based indices as predictors
Within this section, we employed IBM SPSS Statistics Version 27.0.1.0 software (see https://www.ibm.com/support/pages/downloadingibmspssstatistics27010) to carry out multiple linear regression analyses. The biological activity \(IC_{50}\), expressed as \(pIC_{50}\), was used as the dependent variable, and several ‘D’ and ‘NBD’ based indices (see Table 1) were used as independent variables. \(IC_{50}\), also known as the half maximal inhibitory concentration, is a parameter that measures the effectiveness of a drug or compound in inhibiting a specific biological or biochemical process. It represents the concentration at which the drug blocks the target protein’s function by 50 %. \(pIC_{50}\) is a transformed version of \(IC_{50}\), where the “p” stands for the negative logarithm (base 10) of the \(IC_{50}\) value. \(pIC_{50}\) is used in regression analyses in place of \(IC_{50}\) because it is more nearly linearly related to drug potency. The selection of the optimal multiple linear regression model was based on these statistical criteria: Fisher ratio (F), squared multiple correlation coefficient \((R^2)\), adjusted correlation coefficient \((R^{2}_{adj})\), Durbin–Watson value (DW), variance inflation factor (VIF), tolerance value and significance (Sig). The main difference between QSPR and QSAR is the type of property that is being predicted. QSPR models use statistical and mathematical methods to establish a link between the molecular structure of compounds and their physicochemical properties. QSAR models, on the other hand, employ statistical and machine learning techniques to establish a correlation between the molecular structure of compounds and their biological activities.
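The conversion from \(IC_{50}\) to \(pIC_{50}\) can be sketched as follows. The text does not state the concentration unit, so this example assumes the common convention that \(IC_{50}\) is expressed in mol/L before taking the logarithm; the function names are hypothetical.

```python
from math import log10


def pic50_from_molar(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 given in mol/L (assumed convention)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -log10(ic50_molar)


def pic50_from_micromolar(ic50_micromolar):
    """Convenience wrapper for IC50 values reported in micromolar."""
    return pic50_from_molar(ic50_micromolar * 1e-6)


def ic50_micromolar_from_pic50(pic50):
    """Inverse conversion back to a micromolar concentration."""
    return 10 ** (-pic50) * 1e6


print(round(pic50_from_micromolar(1.0), 2))
print(round(ic50_micromolar_from_pic50(6.01), 3))
```

Under this convention a \(pIC_{50}\) of 6.01, the value quoted for Remdesivir, would correspond to an \(IC_{50}\) of roughly 0.98 micromolar; a higher \(pIC_{50}\) always means a more potent compound.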
MLR model and MLR analyses
Multiple linear regression (MLR)^{55} is a statistical technique that explores the linear relationship between a dependent (target) variable Y, here \(pIC_{50}\), and multiple independent (predictor) variables X, here 2D descriptors. Its purpose is to find the best-fitting regression equation, the one that minimizes the differences between the predicted and actual values of the dependent variable. Using the least squares fitting technique, MLR estimates the regression coefficients of the model, and the quality of the fit is summarized by \(r^2\). The regression equation is formulated as follows:
\(Y = c + b_1 I_1 + b_2 I_2 + \cdots + b_n I_n\)
In the regression equation, the dependent variable is represented as Y, and the regression coefficients ‘b’ correspond to the independent variables ‘I’. The intercept or regression constant is denoted as ‘c’^{56}. Kirmani et al.^{11} conducted a QSAR analysis on antiviral drugs represented as simple graphs, which suggested a weak association between biological activity \((pIC_{50})\) and TI’s. Inspired by their approach, we applied a similar analysis using molecular multigraphs for our selected drugs and obtained a well-fitting QSAR model by the backward elimination method, as elaborated in the following sections.
Multicollinearity and VIF^{57}
Multicollinearity refers to high correlation among independent variables, which can result in unstable and unreliable regression coefficient estimates. The variance inflation factor (VIF) is a measure used to evaluate the presence of multicollinearity in regression analysis, commonly reported by tools such as SPSS, and it is defined as \(VIF = \frac{1}{1-R^2}\), where \(R^2\) is obtained by regressing one independent variable on the remaining ones. Since \(0 \le R^2 < 1\), VIF is never smaller than 1. VIF values between 1 and 10 are taken to indicate no serious multicollinearity, while values above 10 suggest its presence. Our initial regression models showed signs of multicollinearity, as some independent variables had correlation coefficients near 1 and corresponding VIF values outside the range of 1 to 10. This implies that the model may struggle to accurately estimate the individual effects of these correlated variables. Hence, it is crucial to address this issue to ensure the reliability and accuracy of our regression results.
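As a small numerical illustration of how VIF and tolerance relate, the sketch below computes both from the \(R^2\) of one predictor regressed on the others. The function names are hypothetical, and the \(R^2\) inputs are illustrative rather than values from this study.

```python
def tolerance(r_squared):
    """Tolerance = 1 - R^2 for one predictor regressed on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor, the reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared)


# Illustrative R^2 values only.
for r2 in (0.0, 0.5, 0.8, 0.9, 0.99):
    print(f"R^2={r2:<5} tolerance={tolerance(r2):.2f} VIF={vif(r2):.1f}")
```

The printout makes the thresholds concrete: an \(R^2\) of 0.8 already gives a VIF of about 5, and 0.9 gives about 10.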
QSAR model for \(pIC_{50}\)
The correlation matrix is a helpful tool for detecting multicollinearity in regression models. It displays the pairwise correlations between multiple variables, indicating the strength and direction of their relationship. By examining the matrix for high correlations between independent variables, we can identify multicollinearity and take appropriate measures to address it. In Supplementary Table S1, we present the correlation matrix between the various ‘D’ and ‘NBD’ based indices. One of the primary goals of QSAR analysis is to identify the molecular descriptors that are most strongly correlated with the target property. When numerous descriptors are available, including all of them in the model may not be practical, so variable selection techniques are used to identify the most significant ones, which helps improve the predictive performance of the model. Stepwise regression is one such variable selection method that is commonly used in QSAR analysis. It involves iteratively adding or removing descriptors based on their statistical significance in predicting the target property. The process continues until no more significant descriptors remain to be added or removed, resulting in an effective model.
We began constructing simple linear regression models using topological indices that had the lowest correlation (specifically, 0.1170 between \(NDe_3\) and \(NmM_2\)). This led to the development of two monoparameter models. However, both models demonstrated a weak correlation with \(pIC_{50}\).
\(n=7, r=0.3976, R^2=0.1581, R_A^{2} = -0.01026, SE=0.4512, F=0.9390, PE=0.2121\)
Here n is the number of drugs used, r (R) is the simple (multiple) correlation coefficient, \(R_A^{2}\) is the adjusted \(R^{2}\), SE is the standard error, F is Fisher’s statistic and PE is the probability error.
By employing stepwise regression analysis, various combinations of two topological indices were examined. The following biparametric model shows markedly improved statistical measures in comparison to the monoparametric Model 1.
\(n=7, r=0.7292, R^2=0.5317, R_A^{2}=0.2976, SE=0.3762, F=2.2711, PE= 0.1179\).
To improve the statistical parameters of the models, trials were conducted to determine the correlation between three combined TI’s and the biological activity \(pIC_{50}\). However, the resulting model exhibited only marginal improvements in its statistical measures.
\(n=7, r=0.8950, R^2=0.8011, R_A^{2}=0.6022, SE=0.2831, F=4.0282, PE= 0.0501\).
By applying successive Stepwise regression, a tetraparametric model was derived, showcasing notable enhancements in the statistical parameters.
\(n=7, r=0.9689, R^2=0.9389, R_A^{2}=0.8167, SE=0.1921, F=7.6844, PE= 0.0154\).
After employing successive Stepwise regression, a pentaparametric model was obtained, demonstrating enhanced statistical parameters.
\(n=7, r=0.9819, R^2=0.9642, R_A^{2}=0.7854, SE=0.2079, F=5.3922, PE= 0.0090\).
In the aforementioned QSAR models, the F-value is the ratio between the variability accounted for by the model and the remaining variability ascribed to error. It is used as an indicator of the model’s statistical significance, with a higher F-value suggesting a greater probability of statistical significance. The probability error (PE), more commonly called the probable error of the correlation coefficient, is computed as \(PE = \frac{2(1-r^2)}{3\sqrt{n}}\)^{56}. The p-value is a statistical measure that evaluates the likelihood of observing the given outcomes if the null hypothesis is true. It quantifies the level of evidence against the null hypothesis, indicating the strength of the observed results. A predetermined significance level, commonly set at 0.05, is used as a threshold to determine the statistical significance of the study findings and decide whether to reject the null hypothesis. In our QSAR models above, we encountered insignificant results, as the p-values of the predictors were greater than 0.05. Selecting the least correlated variables can reduce the problem of pairwise correlations (correlations between two variables), but it does not account for the possibility of higher-order correlations among the variables (multicollinearity). Because all p-values were greater than 0.05, none of the predictor variables included in these models could be retained. To mitigate this problem, we used the backward elimination method. The objective was to identify a subset of predictor variables that exhibited the most robust association with the response variable \((pIC_{50})\) while avoiding overfitting the model due to an excessive number of predictors.
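The summary statistics quoted for the five models are linked by standard formulas, which makes them easy to cross-check. The sketch below assumes an ordinary least squares model with an intercept, n observations and p predictors; the function names are hypothetical. With the quoted \(R^2\) values and n = 7 it reproduces the reported adjusted \(R^2\), F and PE to within rounding (for the one-parameter model the adjusted \(R^2\) evaluates to about -0.0103, matching the quoted figure in magnitude).

```python
from math import sqrt


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F ratio: explained variance per predictor over residual variance."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def probable_error(r2, n):
    """PE = 2(1 - r^2) / (3 sqrt(n)), the formula given in the text."""
    return 2.0 * (1.0 - r2) / (3.0 * sqrt(n))


# R^2 values quoted for the models with 1 to 5 indices, all with n = 7.
quoted_r2 = {1: 0.1581, 2: 0.5317, 3: 0.8011, 4: 0.9389, 5: 0.9642}
for p, r2 in quoted_r2.items():
    print(p, round(adjusted_r2(r2, 7, p), 4),
          round(f_statistic(r2, 7, p), 4),
          round(probable_error(r2, 7), 4))
```

The output also shows why the five-parameter model is not preferred: its adjusted \(R^2\) and F drop relative to the four-parameter model even though \(R^2\) rises, because only one residual degree of freedom remains.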
Backward elimination method and validation
Backward elimination is a feature selection method used in statistical modeling and machine learning. It aims to identify the most relevant subset of features (independent variables) for a given predictive model. The method starts with a full model that includes all available features and iteratively eliminates features that are found to be non-significant. For a QSAR study using TI’s with the backward elimination method, see^{58}. By conducting a 2D-QSAR analysis on the biological activity \(pIC_{50}\) of the antiviral drugs, we generated multiple QSAR models. During the stepwise regression process, we identified and eliminated five independent variables that exhibited insignificant associations with the \(pIC_{50}\) (biological activity) outcome. Initially, our study encompassed a total of 18 independent (predictor) variables; after removing the insignificant features, 13 predictors remained, namely \(M_1\), F, \(M_2\), H, SDD, \(mM_2\), A, NH, I, \(NM_1\), \(ReZG_3\), \(NDe_5\) and NI. Backward elimination started from these 13 predictors, with the aim of identifying the subset of predictors (independent variables) that displayed the strongest association with \(pIC_{50}\). The best linear model for \(pIC_{50}\) contains three topological indices: \(ReZG_3\), \(NDe_5\) and NH. This model, model 3 in Table 7, showed the best combination of predictors based on various statistical parameters.
Validation: Durbin–Watson statistics and tolerance^{59}
The Durbin–Watson (DW) statistic is used to measure autocorrelation in regression residuals. Autocorrelation occurs when residuals are correlated with one another in sequence, violating the assumption of independence. The statistic ranges from 0 to 4: a value of 2 indicates the absence of autocorrelation, a value below 2 indicates positive autocorrelation, and a value above 2 suggests negative autocorrelation. A value close to 2 therefore indicates no significant autocorrelation in the residuals, which suggests that the model effectively represents the relationship between the variables. In our final QSAR model 3, the DW value is around 2, indicating that the errors are uncorrelated. Tolerance is employed as an indicator of multicollinearity, measuring the correlation among independent variables in a model. It is represented on a scale from 0 to 1, with a tolerance value nearing 1 indicating a lower degree of correlation among predictor variables, thus suggesting reduced multicollinearity. Conversely, a tolerance value close to 0 indicates high correlation among predictors, suggesting a potential issue of multicollinearity.
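The Durbin–Watson statistic itself is a simple ratio: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. A minimal sketch follows; the function name is hypothetical and the residual sequences are artificial examples chosen to hit the extremes of the scale, not residuals from model 3.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) for residuals in order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Artificial residual patterns, for illustration only.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs
```

Identical consecutive residuals push the statistic toward 0 (positive autocorrelation), while strictly alternating residuals push it above 2 (negative autocorrelation).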
Discussion
Backward elimination typically uses a significance threshold (p-value) to decide whether a predictor should be removed from the model. A predictor that already fails the threshold at the outset is treated as nonsignificant and excluded directly without further evaluation. In our analysis, 8 of the 13 predictors did not meet the required statistical criteria (p-value, VIF and tolerance): they did not contribute significantly to the model and may have exhibited multicollinearity. They were therefore excluded from further analysis. The remaining 5 predictors were carried forward to backward elimination, the results of which are presented in Table 7. Among these models, model 3 is the best for predicting the biological activity \(pIC_{50}\) according to the following criteria: \(VIF < 5\), tolerance values not close to zero, DW = 1.850, and all p-values below 0.05.
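The VIF and tolerance criteria are two views of the same quantity. Under the standard definitions (assumed here, since the text does not state the formulas applied by the software), with \(R_j^2\) denoting the coefficient of determination obtained by regressing predictor \(j\) on the remaining predictors:

```latex
\[
  \mathrm{Tol}_j = 1 - R_j^2 ,
  \qquad
  \mathrm{VIF}_j = \frac{1}{\mathrm{Tol}_j} = \frac{1}{1 - R_j^2} ,
  \qquad
  \mathrm{VIF}_j < 5 \iff \mathrm{Tol}_j > 0.2 .
\]
```

Under these definitions, the threshold \(VIF < 5\) corresponds to requiring a tolerance above 0.2 for every predictor retained in the model.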
Ordinary residuals or regular residuals^{59}
Regular residual \(=\) observed value − predicted value. In other words, a residual is the difference between the observed value of the dependent variable and the value estimated by the regression model. It represents the variability that the model was unable to explain and corresponds to the vertical distance between an observed data point and the regression line or curve. The observed and predicted values of the biological activity \(pIC_{50}\) for seven antiviral drugs are compared in Table 8. Figure 8 illustrates the linear relationship between the observed and predicted \(pIC_{50}\) values obtained from model 3 for these drugs.
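As a small worked illustration of the residual definition, the sketch below computes observed minus predicted values. Only the Remdesivir pair (observed and predicted \(pIC_{50}\) both 6.01) is taken from the paper. The second entry is a made-up placeholder, not a value from Table 8, and the function name is hypothetical.

```python
def regular_residuals(observed, predicted):
    """Return observed - predicted (rounded to 2 decimals) for every name in observed."""
    return {name: round(observed[name] - predicted[name], 2) for name in observed}


# Remdesivir values are those reported in the abstract; the other entry is hypothetical.
observed = {"Remdesivir": 6.01, "hypothetical drug": 5.20}
predicted = {"Remdesivir": 6.01, "hypothetical drug": 5.35}

res = regular_residuals(observed, predicted)
```

A residual of zero, as for Remdesivir, means the model reproduces the observed activity exactly. A negative residual means the model overestimates the observed value.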
Conclusion
This study evaluates various antiviral drugs for treating COVID-19, using molecular multigraphs to analyze their chemical structures. Through edge partition techniques, M-polynomial and NM-polynomial expressions were derived, leading to the computation of ‘D’- and ‘NBD’-based indices. The research also involved a thorough QSPR investigation of the antiviral drugs as multigraphs, showcasing the predictive power of the computed topological indices (TI’s) in determining physicochemical properties. Notably, the inverse sum indeg and neighborhood inverse sum indeg indices exhibited a strong positive correlation with boiling point (BP), surpassing other indices.
Further, the biological activity \(pIC_{50}\) of these antiviral drugs was estimated in a QSAR analysis using multiple linear regression (MLR) in conjunction with the backward elimination approach. The validation criteria were designed to assess the accuracy and predictive capability of the MLR model. The results show that the MLR model is an effective tool for estimating \(pIC_{50}\), with specific TI’s such as NH, \(NDe_5\) and \(ReZG_3\) showing significant predictive potential. In addition, the observed and predicted \(pIC_{50}\) values of the drugs for the best model, evaluated using cross-validation techniques, show only minor variation, resulting in low residuals.
The study highlights the importance of considering multigraphs as graph models, offering a novel perspective on drug connectivity analysis. By diverging from conventional approaches focused on simple graphs, the research has provided insights into optimizing the drug selection process. In conclusion, there remains an open challenge in incorporating chemometric methods (statistical and mathematical techniques for analyzing chemical data) to further refine these models. Using these techniques, researchers can advance our understanding of drug behavior and improve strategies for enhancing drug effectiveness.
Data availability
The paper includes the information used to verify the study’s findings.
References
Pillaiyar, T., Manickam, M., Namasivayam, V., Hayashi, Y. & Jung, S. H. An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy. J. Med. Chem. 59, 6595–6628 (2016).
Hite, G. Medicinal Chemistry: A Series of Monographs: By George deStevens 1st edn. (Academic Press, 1964).
Brezovnik, S., Tratnik, N. & Žigert Pleteršek, P. Weighted Wiener indices of molecular graphs with application to alkenes and alkadienes. Mathematics 9, 153 (2021).
Zakharov, A. B., Tsarenko, D. K. & Ivanov, V. V. Topological characteristics of iterated line graphs in the QSAR problem: A multigraph in the description of properties of unsaturated hydrocarbons. Struct. Chem. 32, 1629–1639 (2021).
Hayat, S., Alanazi, S. J. & Liu, J. B. Two novel temperature-based topological indices with strong potential to predict physicochemical properties of polycyclic aromatic hydrocarbons with applications to silicon carbide nanotubes. Phys. Scr. 99, 055027 (2024).
Hayat, S., Mahadi, H., Alanazi, S. J. & Wang, S. Predictive potential of eigenvalues-based graphical indices for determining thermodynamic properties of polycyclic aromatic hydrocarbons with applications to polyacenes. Comput. Mater. Sci. 238, 112944 (2024).
Hayat, S. & Liu, J. B. Comparative analysis of temperature-based graphical indices for correlating the total \(\pi\)-electron energy of benzenoid hydrocarbons. Int. J. Mod. Phys. B 2550047 (2024).
Hayat, S., Khan, A., Ali, K. & Liu, J. B. Structure-property modeling for thermodynamic properties of benzenoid hydrocarbons by temperature-based topological indices. Ain Shams Eng. J. 15, 102586 (2024).
Hayat, S. Distance-based graphical indices for predicting thermodynamic properties of benzenoid hydrocarbons with applications. Comput. Mater. Sci. 230, 112492 (2023).
Hayat, S., Suhaili, N. & Jamil, H. Statistical significance of valency-based topological descriptors for correlating thermodynamic properties of benzenoid hydrocarbons with applications. Comput. Theor. Chem. 1227, 114259 (2023).
Kirmani, S. A. K., Ali, P. & Azam, F. Topological indices and QSPR/QSAR analysis of some antiviral drugs being investigated for the treatment of Covid-19 patients. Int. J. Quantum Chem. 121, e26594 (2021).
Bokhary, S. A. U. H., Siddiqui, M. K. A. & Cancan, M. On topological indices and QSPR analysis of drugs used for the treatment of breast cancer. Polycycl. Arom. Compds. 42, 6233–6253 (2022).
Shirakol, S., Kalyanshetti, M. & Hosamani, S. M. QSPR analysis of certain distance-based topological indices. Appl. Math. Nonlinear Sci. 4, 371–386 (2019).
Shanmukha, M. C., Basavarajappa, N. S., Shilpa, K. C. & Usha, A. Degree-based topological indices on anticancer drugs with QSPR analysis. Heliyon 6 (2020).
Kirmani, S. A. K., Ali, P., Azam, F. & Alvi, P. A. On ve-degree and ev-degree topological properties of hyaluronic acid-anticancer drug conjugates with QSPR. J. Chem. 2021, 1–23 (2021).
Arockiaraj, M., Greeni, A. & Kalaam, A. Linear versus cubic regression models for analyzing generalized reverse degree based topological indices of certain latest corona treatment drug molecules. Int. J. Quantum Chem. 123, e27136 (2023).
Zaman, S., Jalani, M., Ullah, A. & Saeedi, G. Structural analysis and topological characterization of sudoku nanosheet. J. Math. (2022).
Ullah, A., Zaman, S., Hamraz, A. & Saeedi, G. Network-based modeling of the molecular topology of fuchsine acid dye with respect to some irregular molecular descriptors. J. Chem. (2022).
Ullah, A., Zaman, S. & Hamraz, A. Zagreb connection topological descriptors and structural property of the triangular chain structures. Phys. Scr. 98, 025009 (2023).
Zaman, S., Jalani, M., Ullah, A., Ali, M. & Shahzadi, T. On the topological descriptors and structural analysis of cerium oxide nanostructures. Chem. Pap. 77, 2917–2922 (2023).
Zaman, S., Jalani, M., Ullah, A., Ahmad, W. & Saeedi, G. Mathematical analysis and molecular descriptors of two novel metal-organic models with chemical applications. Sci. Rep. 13, 5314 (2023).
Ullah, A., Bano, Z. & Zaman, S. Computational aspects of two important biochemical networks with respect to some novel molecular descriptors. J. Biomol. Struct. Dyn. 42, 791–805 (2024).
Hakeem, A., Ullah, A. & Zaman, S. Computation of some important degree-based topological indices for γ-graphyne and zigzag graphyne nanoribbon. Mol. Phys. 121, e2211403 (2023).
Zaman, S., Salman, M., Ullah, A., Ahmad, S. & Abdelgader Abas, M. Three-dimensional structural modelling and characterization of sodalite material network concerning the irregularity topological indices. J. Math. 1–9 (2023).
Zaman, S., Ullah, A. & Shafaqat, A. Structural modeling and topological characterization of three kinds of dendrimer networks. Eur. Phys. J. E 46, 36 (2023).
Ullah, A., Zaman, S., Hussain, A., Jabeen, A. & Belay, M. Derivation of mathematical closed form expressions for certain irregular topological indices of 2D nanotubes. Sci. Rep. 13, 11187 (2023).
Trudeau, R. J. Introduction to Graph Theory (Courier Corporation, 2013).
Marrero-Ponce, Y. Linear indices of the “molecular pseudograph’s atom adjacency matrix”: Definition, significance-interpretation, and application to QSAR analysis of flavone derivatives as HIV-1 integrase inhibitors. J. Chem. Inf. Comput. Sci. 44, 2010–2026 (2004).
Kier, L. & Hall, L. Molecular connectivity VII: Specific treatment of heteroatoms. J. Pharmaceut. Sci. 65, 1806–1809 (1976).
Stevanović, D. Hosoya polynomial of composite graphs. Discrete Math. 235(1–3), 237–244 (2001).
Khadikar, P. On a novel structural descriptor PI. Natl. Acad. Sci. Lett. 23, 113–118 (2000).
Schultz, H. P. Topological organic chemistry. 1. Graph theory and topological indices of alkanes. J. Chem. Inf. Comput. Sci. 29.
Deutsch, E. & Klavžar, S. M-polynomial and degree-based topological indices. arXiv preprint arXiv:1407.1592 (2014).
Mondal, S., De, N. & Pal, A. On some general neighborhood degree based topological indices. Int. J. Appl. Math. 32, 1037 (2019).
Shanmukha, M. C., Basavarajappa, N. S., Usha, A. & Shilpa, K. C. Novel neighbourhood redefined first and second Zagreb indices on carborundum structures. J. Appl. Math. Comput. 66, 263–276 (2021).
Ghorbani, M. & Hosseinzadeh, M. A. Computing abc4 index of nanostar dendrimers. Optoelectron. Adv. Mater. Rapid Commun. 4, 1419–1422 (2010).
Graovac, A., Ghorbani, M. & Hosseinzadeh, M. A. Computing fifth geometric-arithmetic index for nanostar dendrimers. J. Discrete Math. Appl. 1, 33–42 (2011).
Mondal, S., De, N. & Pal, A. On some new neighbourhood degree based indices. Acta Chem. Iasi 27, 31–46 (2019).
Mondal, S., Siddiqui, M. K., De, N. & Pal, A. Neighborhood M-polynomial of crystallographic structures. Biointerface Res. Appl. Chem. 11.
Pizzorno, A. et al. In vitro evaluation of antiviral activity of single and combined repurposable drugs against SARS-CoV-2. Antiviral Res. 181, 104878 (2020).
Fan, S. et al. Research progress on repositioning drugs and specific therapeutic drugs for SARS-CoV-2. Future Med. Chem. 12, 1565–1578 (2020).
Jang, M. E. A. Tea polyphenols EGCG and theaflavin inhibit the activity of SARS-CoV-2 3CL-protease in vitro. Evid.-Based Complem. Altern. Med. (2020).
Cicka, D. & Sukhatme, V. P. Available drugs and supplements for rapid deployment for treatment of COVID-19. J. Mol. Cell Biol. 13, 232–236 (2021).
Gutman, I. & Trinajstic, N. Graph theory and molecular orbitals: Total pi-electron energy of alternant hydrocarbons. Chem. Phys. Lett. 17, 535–538 (1972).
Miličević, A., Nikolić, S. & Trinajstić, N. On reformulated Zagreb indices. Mol. Divers. 8, 393–399 (2004).
Ranjini, P. S., Lokesha, V. & Usha, A. Relation between phenylene and hexagonal squeeze using harmonic index. Int. J. Graph Theory 1, 116–121 (2013).
Ghorbani, M. & Hosseinzadeh, M. The third version of Zagreb index. Discrete Math. Algorithms Appl. 5, 1350039 (2013).
Furtula, B. & Gutman, I. A forgotten topological index. J. Math. Chem. 53, 1184–1190 (2015).
Randic, M. Characterization of molecular branching. J. Am. Chem. Soc. 97, 6609–6615 (1975).
Favaron, O., Mahéo, M. & Saclé, J. F. Some eigenvalue properties in graphs (conjectures of Graffiti-II). Discrete Math. 111, 197–220 (1993).
Vukičević, D. & Gašperov, M. Bond additive modelling 1. Adriatic indices. Croatica Chem. Acta 83, 243–260 (2010).
Fajtlowicz, S. On conjectures of Graffiti-II. Congr. Numer. 60, 187–197 (1987).
Furtula, B., Graovac, A. & Vukičević, D. Augmented Zagreb index. J. Math. Chem. 48, 370–380 (2010).
Hosamani, S. M. Computing Sanskruti index of certain nanostructures. J. Appl. Math. Comput. 54, 425–433 (2017).
Cohen, J., Cohen, P., West, S. G. & Aiken, L. S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (Routledge, 2013).
Devillers, J. Neural Networks in QSAR and Drug Design (Academic Press, 1996).
Johnson, R. A. & Wichern, D. W. Applied Multivariate Statistical Analysis (2002).
Esmaeili, E. & Shafiei, F. QSAR study on the physicochemical parameters of barbiturates by using topological indices and MLR method. Bulgar. Chem. Commun. 50, 44–49 (2018).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning Vol. 112 (Springer, 2013).
Author information
Contributions
M. Suresh introduced the parameter and helped with proofreading. Ugasini Preetha P. analyzed, calculated and computed the main results. Fikadu Tesgera Tolasa helped in providing drug properties and in the overall management of the article. Ebenezer Bonyah provided software tools and helped with the graphical work. All authors contributed equally to the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
P, U.P., Suresh, M., Tolasa, F.T. et al. QSPR/QSAR study of antiviral drugs modeled as multigraphs by using TI’s and MLR method to treat COVID-19 disease. Sci Rep 14, 13150 (2024). https://doi.org/10.1038/s41598-024-63007-w
DOI: https://doi.org/10.1038/s41598-024-63007-w