Designing the structure of cationic star-shaped trimeric surfactants most active in micelle formation using molecular connectivity indices

A model expressing the relationship between the logarithm of the critical micelle concentration of cationic star-shaped trimeric surfactants and topological indices was obtained using only molecular connectivity indices. Based on this model, an exemplary compound most active in micelle formation was designed. The analysis of the influence of various structural factors on the critical micelle concentration was supported by atomic charge studies. The obtained model will be used to design new star-shaped trimeric surfactants that are more active in micelle formation and to predict their critical micelle concentration.


Anna Mozrzymas
Oligomeric surfactants, with multiple hydrophobic and hydrophilic groups, are a new class of surfactants that have attracted attention in recent years. Dimeric surfactants, also called gemini surfactants, are the simplest oligomeric surfactants; they contain two hydrophobic and two hydrophilic groups connected by a spacer group. Trimeric surfactants are a natural extension of gemini surfactants, and their structure is intermediate between those of dimeric and higher oligomeric surfactants. The first reports on trimeric surfactants were by Raoul Zana et al. [1][2][3]. In general, oligomeric surfactants consist of two or more hydrophobic tails and polar head groups linked by spacer groups. The spacer groups of trimeric and higher oligomeric surfactants can be linear, ring-like or star-shaped; accordingly, oligomeric surfactants are categorized as linear, ring-like or star-shaped, and their aggregation behavior strongly depends on this topological structure [4]. Star-shaped trimeric surfactants exhibit more distinctive self-aggregation behavior in aqueous solution than their linear analogues [4].
In general, trimeric surfactants show excellent surface activity and self-aggregation properties in aqueous solution. Experimental studies [5][6][7][8][9] and theoretical considerations [10] show that the critical micelle concentration decreases with increasing degree of oligomerization. Thus, the critical micelle concentration of trimeric surfactants is much lower than those of the corresponding monomeric and dimeric analogues. Cationic trimeric compounds also exhibit strong antimicrobial activity: they are active against a broad range of microorganisms, such as bacteria and fungi [11,12], and may also be usable against some viruses.
The main aim of this work is to design the structure of cationic star-shaped trimeric surfactants most active in micelle formation, that is, to study the effect of structural modifications on the critical micelle concentration. This work continues earlier studies on the quantitative structure-property relationships (QSPR) of cationic surfactants [13][14][15]. The subject of those studies was the critical micelle concentration (cmc) of cationic monomeric [13] and cationic gemini [14,15] surfactants. In papers [13][14][15], the relationship between the cmc and the structure of cationic surfactants was investigated using molecular connectivity indices only. In the present work, these topological indices are also used to study the influence of chain structure on the cmc of star-shaped cationic trimeric surfactants. As suggested in [4], the topological structure of oligomeric surfactants strongly affects their aggregation behavior. Therefore, topological descriptors such as the molecular connectivity indices [16] can represent the structure of oligomeric surfactants very well in studies of self-aggregation properties, in particular the cmc, as shown in the previous papers on gemini surfactants [14,15].
The critical micelle concentration depends not only on geometrical structure but also on a number of other parameters, among them the type of counterion. Therefore, to minimize the influence of factors other than geometry on the cmc, only star-shaped cationic surfactants with bromide counterions were taken into account. Although the QSPR model was derived for compounds with a fixed counterion, it can be used to design cationic star-shaped trimeric surfactants with other counterions, because the effect of a geometrical modification on the cmc, i.e. whether it increases or decreases the cmc, should be the same.
Semi-empirical calculations of atomic charges were also performed to investigate the effect of branches and heteroatoms in the spacer group on the critical micelle concentration of the studied trimeric surfactants.

Results and discussion
All investigated cationic trimeric surfactants are of the star-shaped type, and their geometrical structures differ significantly. The entire data set includes thirteen training set compounds (compounds 1-13) and five test compounds (compounds T1-T5). The structures of the molecules, taken from the literature, together with the logarithms of the literature cmc values are shown in the Methods section.
Molecular connectivity indices were calculated from the structural formulas (molecular graphs) of molecules 1-13 using the expressions given in the Methods section (Eqs. 3-5). The values of five molecular connectivity indices and five valence molecular connectivity indices, each of orders zero to four, for the training compounds are given in Table 1.
Based on the values of the connectivity indices (Table 1) and the logarithms of the literature cmc values of training compounds 1-13, polynomial regression analysis with the stepwise method gave the two-variable equation (Eq. 1) relating Log10 cmc to the molecular connectivity indices:
$$\log_{10}\mathrm{cmc} = b_0 - 0.00709\,\left({}^{1}\chi^{\nu}\right)^{2} + 0.00428\,\left({}^{2}\chi\right)^{2} \qquad (1)$$
where $b_0$ is the regression intercept. The statistical characteristics of the equation variables are given in Table 2. The obtained model (Eq. 1) contains the first-order valence molecular connectivity index ${}^{1}\chi^{\nu}$ and the second-order molecular connectivity index ${}^{2}\chi$. The index ${}^{1}\chi^{\nu}$ is a path-type index representing one-bond fragments in a molecule. Its values differ between isomers [16] and decrease with increasing branching. Being a valence index, ${}^{1}\chi^{\nu}$ also distinguishes heteroatoms and multiple bonds. The second-order index ${}^{2}\chi$ represents two-bond fragments in the molecule. Its values also differ between isomers [16], but in this case they increase with increasing branching. Both ${}^{2}\chi$ and ${}^{1}\chi^{\nu}$ increase with the number of atoms in the molecule, whether atoms are added by extending the hydrocarbon chains or through branching.
From Table 1 it can be concluded that the values of the ${}^{2}\chi$ index are smaller than those of the ${}^{1}\chi^{\nu}$ index. In Eq. (1), the $({}^{1}\chi^{\nu})^{2}$ variable has a negative coefficient and the $({}^{2}\chi)^{2}$ variable a positive one, with absolute values of 0.00709 and 0.00428, respectively. Because both the index value and the absolute coefficient are larger for the $({}^{1}\chi^{\nu})^{2}$ term, this term generally dominates, and Eq. (1) therefore predicts that the cmc decreases as the ${}^{1}\chi^{\nu}$ index increases.
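A minimal Python sketch of how Eq. (1) can be used to compare two structures. Because the intercept cancels when two compounds are compared, only the two squared-term coefficients quoted above are needed. The function names are hypothetical, and the index values in the example are illustrative assumptions, not values from Table 1.

```python
# Squared-term coefficients of Eq. (1), as quoted in the text.
COEF_CHI1V_SQ = -0.00709  # multiplies (1chi^v)^2
COEF_CHI2_SQ = 0.00428    # multiplies (2chi)^2


def delta_log_cmc(chi1v_a, chi2_a, chi1v_b, chi2_b):
    """Change in Log10 cmc predicted by Eq. (1) when going from compound A to B.

    The intercept of Eq. (1) cancels in the difference, so it is not needed.
    """
    return (COEF_CHI1V_SQ * (chi1v_b ** 2 - chi1v_a ** 2)
            + COEF_CHI2_SQ * (chi2_b ** 2 - chi2_a ** 2))


def cmc_ratio(chi1v_a, chi2_a, chi1v_b, chi2_b):
    """Predicted ratio cmc_B / cmc_A (independent of the cmc units)."""
    return 10 ** delta_log_cmc(chi1v_a, chi2_a, chi1v_b, chi2_b)


# Hypothetical index values, NOT taken from Table 1: compound B has both
# indices larger than compound A, as when the tails are lengthened.
step = delta_log_cmc(20.0, 15.0, 21.5, 16.06)
print(f"change in Log10 cmc: {step:.3f}")
print(f"cmc_B / cmc_A: {cmc_ratio(20.0, 15.0, 21.5, 16.06):.2f}")
```

With these illustrative inputs the predicted change is about -0.30, i.e. compound B would have roughly half the cmc of compound A, even though the positive $({}^{2}\chi)^{2}$ term partly offsets the negative one.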
The graphical comparison of the Log10 cmc values calculated using Eq. (1) with the experimental Log10 cmc values is shown in Fig. 1.
The Log10 cmc values calculated using Eq. (1) and the experimental Log10 cmc values for the studied trimeric surfactants, along with the residuals, are given in Table 3.
The plot of residuals versus the experimental Log10 cmc values is shown in Fig. 2.
As can be seen from Table 3 and Figs. 1-2, the Log10 cmc values calculated using Eq. (1) are very close to the literature values.
As shown in Table 6 in the Methods section, Eq. (1) was obtained from molecules with different topological structures. These compounds differ in the length of the hydrophobic chains as well as in the structure of the spacer group.

Hydrophobic tails effect
The experimental cmc values, and thus the Log10 cmc values, of star-shaped cationic trimeric surfactants decrease as the tails are lengthened from ten to fourteen (compounds 1-3) or from ten to sixteen (compounds 4-6 and T4) carbon atoms. Plots of the Log10 cmc values calculated using Eq. (1) and of the experimental Log10 cmc values versus the alkyl chain carbon number for compounds 1-3 and for compounds 4-6 and T4 are shown in Figs. 3-4.
As shown in Figs. 3-4, the Log10 cmc values calculated using Eq. (1) also decrease as the tails are lengthened from ten to eighteen carbon atoms, and within this range the dependence of both the calculated and the experimental Log10 cmc values on the number of carbon atoms in the alkyl chains is linear. As can be seen in Figs. 3-4, … atoms chain, but this deviation is probably within the margin of experimental error. Thus, in the range of ten to eighteen carbon atoms, the dependence of Log10 cmc of star-shaped cationic trimeric surfactants on the number of alkyl chain carbon atoms (n) can be described by a linear function. However, for longer chains, above eighteen carbon atoms, the dependence of the Log10 cmc values calculated using Eq. (1) on the number of carbon atoms will be non-linear because Eq. (1) itself is non-linear. A non-linear dependence of the logarithm of cmc on chain length has been observed for various cationic gemini surfactants with spacer groups or tails of different nature and flexibility [25][26][27]. A non-linear relationship between Log10 cmc and the number of tail carbon atoms for cationic gemini surfactants was also obtained from a theoretical model [28].
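The non-linearity of Eq. (1) in the tail length can be made explicit with a short worked example. Assume (an assumption derived here from the index definitions in the Methods section for long saturated tails, not taken from the source) that adding one methylene group to each of the three tails raises ${}^{1}\chi^{\nu}$ by $3\times(2\cdot2)^{-1/2} = 1.5$ and ${}^{2}\chi$ by $3\times(2\cdot2\cdot2)^{-1/2} = 3/\sqrt{8} \approx 1.061$, with compound-specific constants $\alpha$ and $\gamma$, the intercept $b_0$ of Eq. (1), and the two coefficients quoted for Eq. (1). Substituting gives a quadratic in n whose leading coefficient $c_2$ is:

```latex
\begin{aligned}
{}^{1}\chi^{\nu}(n) &= \alpha + 1.5\,n, \qquad {}^{2}\chi(n) = \gamma + \tfrac{3}{\sqrt{8}}\,n,\\
\log_{10}\mathrm{cmc}(n) &= b_0 - 0.00709\,(\alpha + 1.5\,n)^{2} + 0.00428\,\bigl(\gamma + \tfrac{3}{\sqrt{8}}\,n\bigr)^{2},\\
c_2 &= -0.00709 \cdot 2.25 + 0.00428 \cdot \tfrac{9}{8} = -0.01595 + 0.00482 \approx -0.0111,\\
\Delta^{2}\log_{10}\mathrm{cmc} &= 2\,c_2 \approx -0.022 \quad \text{per one-carbon step in each tail.}
\end{aligned}
```

Under these assumptions the second difference is constant and small, so calculated points over a limited range of n lie close to a straight line, while the accumulated departure from linearity grows with the square of the range of n considered.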

Spacer group effect
The structure and nature of the spacer group play an important role in micelle formation by oligomeric surfactants. Studies on gemini surfactants [29][30][31][32][33] show that spacer features such as flexibility and hydrophobicity have a significant impact on the aggregation behavior of these compounds in aqueous solution. Therefore, the effect on the cmc of star-shaped trimeric surfactants of factors that influence flexibility and hydrophobicity, such as spacer length, branching and heteroatoms, was investigated using Eq. (1) (Table 4). Table 4 shows star-shaped trimeric surfactants with different spacer group structures together with the corresponding Log10 cmc values calculated using Eq. (1).
Inspection of the data in Table 4 shows that the critical micelle concentration is greatest for compound S10 and lowest for compound S1.
For the compounds with different central groups in the spacer group (compounds S1-S4), the cmc is lowest for compound S1, with the cyclohexane-based spacer group, and highest for compound S4, with a central nitrogen atom in the spacer group. The low cmc of compound S1 may be due to its more hydrophobic spacer, which may be incorporated into the interior of the micelle. Table 4 also shows that the cmc of compound S2 is lower than that of compound S4. Similar results were obtained by Nacham et al. [21], who studied, among others, ionic liquid (IL)-based star-shaped trimeric surfactants containing triethylamine and triethylbenzene spacer groups. As shown in [21], the cmc of the IL-based trimeric surfactant containing the triethylbenzene spacer group is lower than that of the surfactant containing the triethylamine spacer group with identical alkyl tails and head groups. The authors of [21] also suggest that this difference may be due to the higher hydrophobicity imparted by the benzyl core of the investigated IL-based trimeric surfactant. Table 4 also shows that the cmc of compound S1 is lower than that of compound S2. This difference may result from the substituted cyclohexane ring being more flexible than the benzene ring, so that the cyclohexane-based spacer group is more easily incorporated into the interior of the micelle.
Among the compounds having nitrogen as the central group and the same number of non-hydrogen atoms in the spacer group (compounds S4-S6 and S8-S10), the cmc is lowest for compound S4 and greatest for compound S10. For the compounds with straight spacer chains (compounds S4-S6), the cmc increases in the order -CH2- < -NH- < -O-. A similar order holds for the compounds with branched spacer groups (compounds S8-S10): -CH3 < -NH2 < -OH. In addition, comparing the compounds with straight star-shaped spacer chains (S4-S6) with the corresponding compounds having branched spacer groups (S8-S10) shows that the cmc values of the compounds with branched spacers are greater. In summary, Table 4 shows that, for star-shaped spacers with the same number of non-hydrogen atoms, both branching and heteroatoms increase the critical micelle concentration.
www.nature.com/scientificreports/
The effect of spacer branches and heteroatoms on the cmc of the compounds presented in Table 4 (compounds S4-S6 and S8-S10) was also analyzed using atomic charges. The total charges of the different spacer functional groups found in the Y position of compounds S4-S6 and S8-S10 are presented in Table 5.
The AM1 semi-empirical calculations (Table 5) show that highly electronegative atoms such as oxygen and nitrogen introduce a large negative charge into the spacer group, larger than that introduced by a carbon atom, so the total charges of functional groups containing heteroatoms are negative. For both straight and branched spacer groups, the total charge changes from positive for the functional group with a carbon atom to negative for the functional groups with nitrogen or oxygen, and the negative charge is greater for the group with an oxygen atom than for the group with a nitrogen atom. Comparing these results with the data in Table 4, it can be concluded that the cmc increases with increasing negative charge of the functional group.
Table 5 also shows that, compared with the corresponding functional group in the straight spacer chain, the functional group in the branched spacer has a lower total (negative) charge when its non-hydrogen atom in the Y position is oxygen or nitrogen, and a greater (positive) charge when that atom is carbon. When the non-hydrogen atom in the functional group is carbon, these results agree very well with the data in Table 4 discussed above: the greater the positive charge, the greater the critical micelle concentration. For the functional groups containing heteroatoms, the cmc is greater for the compounds with branched spacer groups (Table 4), for which the negative charge of the functional group is lower than that of the analogous functional group in the straight spacer chain (Table 5).
The obtained model also shows that the cmc of star-shaped trimeric surfactants decreases with increasing length of the spacer group chains (compounds S4 and S7). This is because the length of the hydrocarbon tails has a significant impact on the cmc values of the training compounds, and hence on the obtained model. However, the experimental data [34] show the opposite trend. As shown in [34], the cmc of the 3C12 tris-s-Q cationic trimeric surfactants is greater for s = 6 than for s = 3, which suggests that the cmc increases as the spacer chain length (s) increases from three to about six methylene groups. A similar effect of spacer chain length is observed for dimeric surfactants [29,30]: the cmc values of 12-s-12 gemini surfactants increase with the number of spacer carbon atoms up to four or five methylene groups and then decrease with further elongation of the spacer chain [29,30]. Therefore, the cmc of trimeric surfactants may pass through a maximum at a spacer chain length of about six carbon atoms and then, as for gemini surfactants, decrease with further chain elongation. To the best of the author's knowledge, star-shaped trimeric surfactants with spacer chains longer than six methylene groups have not been studied. Therefore, when examining the effect of spacer chain elongation on the cmc, the obtained model should probably be applied only to compounds with more than six carbon atoms in each chain of the spacer group.
Finally, if only the changes in cmc caused by changes in geometrical structure are examined, Eq. (1), although derived for bromide compounds, can also be used for compounds with other counterions, because the direction of the change in cmc (increase or decrease) should be the same. In other words, when examining changes in the cmc of bromide star-shaped trimeric surfactants, we can expect similar changes for the analogous chloride compounds. For example, for the chloride compounds 1,1,1-Tris[2-hydroxy-3-(dodecyldimethylammonio)-propoxymethyl]ethane Trichloride [35] and the so-called III-12-4 trimeric surfactant [36], with experimental cmc values of 0.223 mM and 0.15 mM, respectively, the values calculated using Eq. (1) are 0.198 mM and 0.187 mM, respectively. Thus, going from the first compound to the second, the calculated cmc decreases, as observed experimentally.
Table 5. Total charges of the different functional groups found in the Y position (see Table 4).

Molecular connectivity indices
Molecular connectivity indices are topological descriptors used to characterize molecules in structure-property and structure-activity studies; they are calculated from the molecular graph. The molecular graph is a graphical representation of the structural formula of a chemical compound in which vertices represent atoms and edges represent covalent bonds; for the connectivity indices used here, only non-hydrogen atoms are included as vertices.
The first connectivity index was proposed by Randić [37] and was defined as:
$$\chi = \sum_{\text{bonds } ij} \left(\delta_i\,\delta_j\right)^{-1/2} \qquad (2)$$
where $\delta_i$ is the connectivity degree, i.e. the number of non-hydrogen atoms to which the i-th non-hydrogen atom is bonded. The Kier and Hall molecular connectivity indices [16] are generalizations of Randić's connectivity index, and the m-th order Kier-Hall molecular connectivity index is defined as [16]:
$$^{m}\chi_{k} = \sum_{j=1}^{n_m} \prod_{i=1}^{m+1} \left(\delta_i\right)_j^{-1/2} \qquad (3)$$
where $\delta_i$ is the connectivity degree, m is the order of the connectivity index, k denotes the type of molecular fragment, for example path (p), cluster (c) or path-cluster (pc), and $n_m$ is the number of fragments of type k and order m. The m-th order valence molecular connectivity index is defined as [16]:
$$^{m}\chi^{\nu}_{k} = \sum_{j=1}^{n_m} \prod_{i=1}^{m+1} \left(\delta^{\nu}_i\right)_j^{-1/2} \qquad (4)$$
where the valence connectivity degree $\delta^{\nu}_i$ is defined as:
$$\delta^{\nu}_i = \frac{Z^{\nu}_i - h_i}{Z_i - Z^{\nu}_i - 1} \qquad (5)$$
where $Z^{\nu}_i$ is the number of valence electrons of the i-th atom, $h_i$ is the number of hydrogen atoms bonded to the i-th atom and $Z_i$ is the total number of electrons of the i-th atom.
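The definitions above can be illustrated with a small Python sketch for path-type fragments, which are the only fragment type for orders 0-2 (cluster and path-cluster fragments appear from order 3 upward and are not enumerated here). The function names are hypothetical, the hydrogen-suppressed graphs are written by hand, and this is an illustration of the formulas, not the software used in this work.

```python
import math


def valence_degree(z_valence, z_total, n_hydrogens):
    """Kier-Hall valence connectivity degree: (Z^v - h) / (Z - Z^v - 1)."""
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


def path_chi(adjacency, degrees, order):
    """Path-type connectivity index of the given order (m bonds per fragment).

    adjacency: dict mapping each non-hydrogen atom to the set of its
               non-hydrogen neighbours (hydrogen-suppressed graph)
    degrees:   dict mapping each atom to delta_i (simple index) or
               delta_i^v (valence index); all degrees must be non-zero
    """
    paths = set()

    def extend(path):
        if len(path) == order + 1:
            # a path and its reverse are the same fragment: keep one orientation
            paths.add(min(tuple(path), tuple(reversed(path))))
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:  # simple paths only, no revisited atoms
                extend(path + [neighbour])

    for atom in adjacency:
        extend([atom])

    total = 0.0
    for fragment in paths:
        product = 1.0
        for atom in fragment:
            product *= degrees[atom]
        total += product ** -0.5  # (delta_1 * ... * delta_{m+1})^(-1/2)
    return total


# n-butane C1-C2-C3-C4 and isobutane (central C0 with three methyl groups)
butane = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
butane_deg = {a: len(nb) for a, nb in butane.items()}
isobutane = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
isobutane_deg = {a: len(nb) for a, nb in isobutane.items()}

# ethanol CH3-CH2-OH with valence degrees from the definition of delta^v
ethanol = {"C1": {"C2"}, "C2": {"C1", "O"}, "O": {"C2"}}
ethanol_vdeg = {
    "C1": valence_degree(4, 6, 3),  # CH3 -> 1
    "C2": valence_degree(4, 6, 2),  # CH2 -> 2
    "O": valence_degree(6, 8, 1),   # OH  -> 5
}

print(round(path_chi(butane, butane_deg, 1), 3))        # 1.914
print(round(path_chi(butane, butane_deg, 2), 3))        # 1.0
print(round(path_chi(isobutane, isobutane_deg, 1), 3))  # 1.732
print(round(path_chi(isobutane, isobutane_deg, 2), 3))  # 1.732
print(round(path_chi(ethanol, ethanol_vdeg, 1), 3))     # 1.023
```

The butane/isobutane pair shows the behaviour described in the Results section: branching lowers the first-order index and raises the second-order one.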
The values of the molecular and valence molecular connectivity indices of training compounds are contained in Table 1.

Atomic charges
Atomic charges were calculated using the semi-empirical molecular orbital package MOPAC 7 included in the VEGA program [38,39], employing the semi-empirical AM1 method. All calculated atomic charges are expressed in atomic units (a.u.).

Statistics
Each formula expressing the relationship between Log10 cmc and the molecular connectivity indices was generated using the least-squares method, and the final equation was obtained using the stepwise method. The Pearson correlation coefficient (r), the adjusted coefficient of determination ($r^{2}_{\mathrm{adj}}$), the standard deviation of the fit (s) and the Fisher ratio (F) were used to select the best model. The model was selected according to the following principles: the highest correlation coefficient, adjusted coefficient of determination and Fisher value, the lowest standard deviation of the fit, and the smallest possible number of significant descriptors in the model. For a good-quality QSPR model, the correlation coefficient r should be close to 1. High values of F indicate that the model is statistically significant. The number of variables in the model should not exceed the number of compounds divided by five [40]. The statistical characteristics of the equation variables include the standard error, t-value and p-value. A high absolute Student t value indicates that the coefficient of the variable is substantially larger than its standard error. A variable with a p value below 0.05 is considered statistically significant [41].
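The fit statistics listed above can also be computed directly; the following is a minimal NumPy sketch, assuming an ordinary least-squares model with an intercept. The function name and the returned keys are hypothetical, and p-values are left out because they need the Student t distribution (for example `scipy.stats.t.sf`), which would add a SciPy dependency.

```python
import numpy as np


def fit_and_score(X, y):
    """Least-squares fit with intercept plus the fit statistics named in the text.

    X: (n, p) array of descriptors, one column per model variable
    y: (n,) array of Log10 cmc values
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # intercept column + descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    dof = n - p - 1                               # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    s = np.sqrt(ss_res / dof)                     # standard deviation of the fit
    cov = s ** 2 * np.linalg.inv(A.T @ A)         # covariance of the coefficients
    se = np.sqrt(np.diag(cov))                    # standard errors
    return {
        "coef": coef,
        "r2": r2,
        # equals Pearson r between observed and fitted values for OLS with intercept
        "r": np.sqrt(r2),
        "r2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "s": s,
        "F": (r2 / p) / ((1.0 - r2) / dof),       # Fisher ratio
        "se": se,
        "t": coef / se,                           # Student t values
        "enough_compounds": p <= n / 5,           # variables <= compounds / 5
    }
```

Called with the training-set values of the model variables and the corresponding Log10 cmc values, it returns the quantities used for model selection; `enough_compounds` checks the compounds-per-variable rule (13 compounds allow at most two variables).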
The leave-one-out cross-validated correlation coefficient ($Q^{2}_{\mathrm{LOO}}$) and the predictive correlation coefficient ($R^{2}_{\mathrm{pred}}$) were also used for internal and external validation of the derived model, respectively. A model is considered excellent if $Q^{2}_{\mathrm{LOO}}$ is equal to or greater than 0.9 [42]. For an acceptable QSPR model, $Q^{2}_{\mathrm{LOO}}$ and $R^{2}_{\mathrm{pred}}$ should be greater than 0.5 [40,43].
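A minimal NumPy sketch of one common way to compute the two validation statistics, assuming an ordinary least-squares model with an intercept. The $R^{2}_{\mathrm{pred}}$ formula below, with the training-set mean in the denominator, is one widely used convention and may differ in detail from the one used in this work; all function names are hypothetical.

```python
import numpy as np


def _ols(X, y):
    """Least-squares coefficients (intercept first) for y ~ X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def _predict(coef, X):
    return coef[0] + X @ coef[1:]


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i
        coef = _ols(X[keep], y[keep])         # refit without it
        y_i = _predict(coef, X[i:i + 1])[0]   # predict the left-out compound
        press += (y[i] - y_i) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def r2_pred(X_train, y_train, X_test, y_test):
    """External R^2_pred for a test set; denominator uses the training-set mean."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    coef = _ols(X_train, y_train)
    press = np.sum((y_test - _predict(coef, X_test)) ** 2)
    return 1.0 - press / np.sum((y_test - y_train.mean()) ** 2)
```

For a least-squares model, each left-out residual is at least as large as the ordinary residual, so $Q^{2}_{\mathrm{LOO}}$ never exceeds the training $r^2$; a large gap between the two signals overfitting.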
The statistical calculations were performed using STATISTICA 12 [44].

Conclusions
The aim of the presented work was to find a simple equation expressing the dependence of the critical micelle concentration of cationic star-shaped trimeric surfactants on their geometrical structures, represented by topological indices, that allows the impact of structural modifications on cmc values to be examined and can thus help in the design of novel star-shaped cationic trimeric surfactants. Using polynomial regression analysis, an equation with two molecular connectivity indices was obtained. With this model, the influence of the hydrophobic tail length and of the structure of the star-shaped spacer group on the critical micelle concentration was examined. The analysis of tail lengths between ten and eighteen carbon atoms confirmed the experimental results and the Klevens equation [45], which expresses a linear dependence of the logarithm of cmc on the number of carbon atoms in the alkyl chains.
The analysis of star-shaped spacers with the same number of non-hydrogen atoms shows that heteroatoms increase the cmc. These results were confirmed by atomic charge analysis: functional groups containing highly electronegative atoms, such as oxygen or nitrogen, introduce negative charge into the spacer group, which increases the critical micelle concentration. The spacer group analysis also reveals that branching increases the critical micelle concentration as well.

Figure 2. Plot of residuals versus the experimental Log10 cmc values for the training set (rhombuses) and test compounds (triangles).

Table 1. Values of molecular connectivity indices.

Table 3. Calculated and literature Log10 cmc values of compounds from the training and test sets.