Introduction

Growing concerns about the environment and energy supply have stimulated intensive basic and industrial research in renewable energy, specifically solar energy1,2,3,4,5 since the pioneering work on dye-sensitized solar cells (DSSCs) by O’Regan and Grätzel.6 In DSSCs, the photoactive material (sensitizer) absorbs sunlight in the form of photons, whose energy is converted to electricity through a series of complicated steps. Sensitizers are anchored to a wide band gap semiconductor such as TiO2, which serves as the working electrode. The system is filled with an electrolyte mixture containing a redox shuttle such as iodide/triiodide (I−/I3−) and additives. Such a composition has been considered one of the high-performance electrolytes in DSSCs since the beginning of their development.7 Photoexcitation of the sensitizer allows an electron to be injected into the conduction band of the metal oxide, whose nanoporous structure conveys the electron to the external circuit. The redox shuttle reduces the oxidized dye back to its ground state and is itself regenerated at the counter electrode. The detailed mechanism of DSSCs can be found elsewhere.7, 8

In DSSCs, the sensitizer is probably one of the key elements since it governs photon harvesting and electron transport after injection of electrons into the nanostructured semiconductor surface. Additionally, the molecular structure of the sensitizer can stabilize the dye and provide a means to control the interfacial behavior of electron transfer at the TiO2/dye/electrolyte interface.9, 10 Different research groups have already employed a substantial number of research techniques and computational tools to develop sensitizers with better power conversion efficiency (PCE)11,12,13,14 than the existing ones. Pastore et al.3 reported an ab initio approach to model different interfaces of DSSCs by DFT and TD-DFT. Prolific research on ab initio simulation of different interfaces was performed by Anselmi et al.15

DSSCs with metal-based dye sensitizers, such as ruthenium-based16 and Zn-based dyes, have achieved a PCE of 13%11 to date. On the other hand, the PCE of solar cells based on metal-free (pure organic) dye sensitizers amounts to 12.5%.12 Environmental friendliness and ease of fabrication make the latter more prominent in the scientific community. Though a variety of metal-free organic dyes such as arylamine,17, 18 perylene,13 and anthocyanin dyes19 have been explored, arylamine organic dyes (AODs) have received enormous attention in the last decade because of their high molar absorption coefficients and tunable donor–acceptor moieties.17, 18 The optimal performance of the solar cell is governed by multifaceted interactions between the device components, whose many possible combinations are experimentally challenging to address. Careful investigation is therefore needed to design photovoltaic devices around efficient organic dye sensitizers. However, the necessary experimental work requires a significant amount of resources and time. This limitation could be addressed by efficient and comparatively low-cost computational protocols that help to screen the different components of DSSCs and thereby save substantial time and resources. Therefore, there is significant interest in in silico modeling techniques that could predict PCE values before new sensitizers are designed. Such predictions could significantly expedite experimental efforts, narrowing the range of promising acceptor materials by ruling out candidates that are doomed to fail experimental tests.

The use of in silico methods for chemical property prediction is well established, and the quantitative structure-property relationship (QSPR) method has emerged as an important computational tool with a diverse range of applications.20 Robust and validated QSPR models can predict properties for new or untested molecular structures and provide insights that expedite the design of new compounds with enhanced desired properties.21 In this context, QSPR provides a promising way to estimate the PCE value of dyes based on characteristics derived from quantum chemical calculations and molecular structure. Recently, QSPR models have been used to predict PCE values for different types of solar cells.22,23,24,25,26 Venkatraman and Alsberg22 suggested that photovoltaic properties such as the PCE of phenothiazine dyes could be predicted by QSPR with the use of eigenvalue-like descriptors. In our ongoing research, we have already successfully employed the QSPR method to model and suggest improvements for the PCE of polymer-based solar cells23 and of AODs for DSSCs specific to cobalt electrolytes.25 Venkatraman et al.26 investigated a de novo computational design methodology to design coumarin-based dye sensitizers with improved properties for use in DSSCs. Li et al.24 established a unique cascaded QSPR model to predict the PCE of AODs for DSSCs. In that study, the authors modeled a large number of dyes whose experimental values are not uniform, as most of them were obtained from experiments with iodine electrolytes and a few with cobalt electrolytes. Such an approach therefore violates Organization for Economic Co-operation and Development (OECD) principle 1 of a “defined endpoint”. Also, the authors developed a global model with AODs from diverse chemical classes, but the major problem is that this kind of global model identifies features and/or fragments specific to the class whose contribution is highest in the data set. Therefore, results obtained from a global model with so much structural variation may sometimes be misleading. In addition, the large proportion of triphenylamine (TPA) dyes in the data set biases the interpretation toward that class, while the small numbers of indoline, N,N′-dialkylaniline, tetrahydroquinoline, phenothiazine, phenoxazine, and carbazole dyes do not contribute toward the final findings.

Herein, we have carefully collected 273 AODs divided into 11 chemical classes (Table S1 in Supplementary file) whose experiments were performed explicitly with a TiO2 film and an iodine electrolyte for DSSCs.17, 18 This allows us to model PCE with descriptors computed directly from the dye structures, without using any experimental properties of DSSCs as descriptors. We have employed direct QSPR analysis combined with state-of-the-art quantum chemical approaches, namely density functional theory (DFT) and time-dependent DFT (TD-DFT), to understand the basic electron transfer mechanism, structural attributes, and material properties that are chiefly responsible for the high PCE of DSSCs for 11 chemical classes of AODs as sensitizers specific to the iodine electrolyte. The identified properties and structural fragments are particularly valuable for guiding future synthetic efforts toward new organic dyes with improved PCE. Thus, following QSPR modeling, we have designed a series of new dyes with enhanced PCE values. They could be considered as future sensitizers for the least explored chemical classes (tetrahydroquinoline, N,N′-dialkylaniline, and indoline) among the 11 studied AOD classes for DSSCs. As in silico prediction of the PCE value is not the sole criterion for identifying the best sensitizers, we have additionally computed different electrochemical parameters and absorption spectra to check the criteria required for efficient flow of electrons in DSSCs.

Results

Computational QSPR results

Eleven QSPR models were developed employing experimental PCE data. All models were validated on the relevant test set for each chemical class to check their predictability on new data. Highly acceptable internal and external validation results were obtained and are reported along with all 11 QSPR model equations (Eqs. 1–11) in Table 1. External predictability was additionally evaluated using the Golbraikh and Tropsha criteria, which all 11 models passed satisfactorily. The contribution and/or effect of each descriptor toward PCE, along with its detailed mechanistic interpretation, is given in Table S2. Goodness-of-fit and high predictability are shown by the scatter plots of experimentally evaluated (observed) vs. predicted (by the QSPR model) PCE values of AODs for all 11 classes in Fig. 1.

Table 1 Statistical quality of QSPR models developed based on different division tools
Fig. 1 Scatter plots of experimentally evaluated vs. predicted PCE values of AODs for all 11 chemical classes

A Y-randomization study was also performed to ensure that the developed models were not the result of chance alone, and all 11 models passed the stipulated $^{c}R_p^2$ threshold value of 0.5. The lowest $^{c}R_p^2$ value of 0.55 was observed for substituted-TPA dyes, while tetrahydroquinoline dyes showed the highest value of 0.76. Furthermore, the applicability domain (AD) study with the standardization-based technique revealed that all test compounds were inside the AD, so their predictions were reliable for all 11 models. To double-check the AD results with another tool, we employed the Euclidean distance approach, which suggested that, except for one test molecule each from the truxene-based TPA (Compound ID: 167) and indoline (Compound ID: 187) dyes, all test compounds from these two classes and from the remaining nine classes were inside the AD defined by the training set molecules of the respective models. Thus, we can confidently predict 97% of the test compounds across the 11 classes with the developed models after meticulous validation and AD testing (see Table S3 in Supplementary file).
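The standardization-based AD check mentioned above can be illustrated with a minimal sketch. The decision rule used here (a standardized-descriptor limit of 3 and the fallback quantity mean + 1.28 × standard deviation) follows the commonly described form of the standardization approach; these thresholds, the use of the sample standard deviation, and the toy descriptor values are assumptions that are not stated in this text.

```python
import statistics

def in_domain(train_rows, query_row, limit=3.0):
    """Standardization-based AD check for one query compound.

    train_rows: descriptor vectors of the training compounds.
    query_row:  descriptor vector of the compound to be checked.
    The limit of 3 and the factor 1.28 are assumed, not taken from the text.
    """
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    # Standardized distance of the query from the training mean, per descriptor.
    s = [abs(x - m) / sd for x, m, sd in zip(query_row, means, sds)]
    if max(s) <= limit:
        return True          # every descriptor is within the limit
    if min(s) > limit:
        return False         # every descriptor is beyond the limit
    # Mixed case: decide on the upper bound of the standardized values.
    return statistics.mean(s) + 1.28 * statistics.stdev(s) <= limit

# Hypothetical two-descriptor training set (not data from this study).
train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0], [5.0, 14.0]]
inside = in_domain(train, [3.5, 12.5])
outside = in_domain(train, [20.0, 40.0])
```

A query close to the training mean is accepted, while a query far from it in every descriptor is rejected.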

Discussion

Future dye designing and prediction

Dyes with high power conversion efficiency were designed for the indoline, N,N′-dialkylaniline, and tetrahydroquinoline classes using the structural fragments and quantum properties identified by the respective QSPR models. The QSPR equations of these chemical classes are therefore discussed in detail to explain the idea behind the design of future dyes, in keeping with OECD principle 5, which calls for a mechanistic interpretation of developed models.

Indoline

The QSPR model (Eq. 7) for indoline identified four descriptors that correlate with the experimental PCE. The order of significance of the modeled descriptors, based on their standardized coefficients, is ICR > SaasC > nO > ECP. Except for ICR, the remaining three descriptors had negative contributions toward PCE. No intercorrelation was observed among the four descriptors, as checked through Pearson correlation. ICR is a topological index that encodes radial centric information27 and is defined in Eq. 12:

$${}^{V}\overline{I}_{C,R} = - \sum\limits_{g = 1}^{G} \frac{n_g}{A} \log_2 \frac{n_g}{A}$$
(12)

where $n_g$ is the number of graph vertices having the same atom eccentricity η, G is the number of different vertex equivalence classes, and A is the number of graph vertices. The index accounts for bulky substitution surrounding the indoline fragment while the other required properties discussed below are maintained. SaasC is the sum of E-state values for aasC-type fragments, whereas nO is a constitutional index that counts the oxygen atoms present in a dye. As both descriptors had negative contributions, avoiding this fragment and oxygen atoms favors a higher PCE value for indoline dyes. Therefore, except for the carboxylic group, which is the anchoring acceptor group in the dye, we avoided oxygen atoms and unwanted aromatic substitution when designing future indoline dyes. ECP is the electronic chemical potential, which depends on the energies of the HOMO and LUMO levels of the studied dyes in the ground state. Among all descriptors, ECP had the smallest contribution. Therefore, guided by the optimum ECP values of existing dyes, we kept the LUMO and HOMO energy levels of the designed dyes within the thresholds set by the conduction band level of the TiO2 semiconductor and the I−/I3− redox potential.28 We tuned the ECP value by adding an electron-donating group that lowers the HOMO level while facilitating an increase in the LUMO level, as in D-D-π-A29 organic dyes.
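To make Eq. 12 concrete, the following minimal sketch computes the index for a small molecular graph: vertices are grouped by eccentricity, and the Shannon entropy of the class sizes is returned. The adjacency-list input, the function names, and the five-vertex toy graph are illustrative assumptions; descriptor software may differ in how the graph is built (for example, hydrogen-depleted skeletons).

```python
import math
from collections import Counter, deque

def eccentricities(adjacency):
    """Eccentricity of every vertex of a connected, unweighted graph (BFS)."""
    result = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = max(dist.values())
    return result

def radial_centric_information(adjacency):
    """Eq. 12: entropy (in bits) of the classes of vertices sharing one eccentricity."""
    ecc = eccentricities(adjacency)
    n_vertices = len(ecc)                           # A in Eq. 12
    class_sizes = Counter(ecc.values()).values()    # n_g for g = 1..G
    return -sum((n / n_vertices) * math.log2(n / n_vertices) for n in class_sizes)

# Hypothetical toy graph: a linear chain of five atoms.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
icr = radial_centric_information(path5)
```

For the chain, the eccentricity classes have sizes 2, 2, and 1, so the index is about 1.52 bits; a graph in which all vertices share one eccentricity (for example, a ring) gives 0.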

N,N′-dialkylaniline

Equation 8 suggests that two properties, characterized by two descriptors, are crucial for the PCE of N,N′-dialkylaniline dyes. The first descriptor, f, is the oscillator strength corresponding to the maximum absorption energy of the dye sensitizer, and the second, G(N..N), is the sum of geometrical distances between nitrogen atom pairs (N..N) in the studied dyes. Both descriptors have positive and almost equal contributions toward PCE, and the studied compounds follow the same trend: dyes 198, 205, and 206, which have higher oscillator strengths and higher G(N..N) values, showed superior PCE (6.8, 6.3, and 6.6, respectively). As the equation suggests, reasonably high oscillator strengths can be associated with an efficient excited-state configuration and consequent electron injection into the TiO2 conduction band. This hypothesis is also supported by literature data.30, 31 Introducing an aromatic ring system with a nitrogen heteroatom results in higher G(N..N) values and, interestingly, the oscillator strength also increases for this type of fragment. Therefore, dyes were designed to maintain high oscillator strength and G(N..N) values by introducing 3,5-dihydrothieno[3,2-b:4,5-b′]dipyrrole, 3,6-dihydropyrrolo[3,2-b]pyrrole, and 4H-thieno[3,2-b]indole ring systems, based on information gathered from the studied dye structures and the literature.16, 17

Tetrahydroquinoline

The developed QSPR equation (Eq. 9) for tetrahydroquinoline consists of only one descriptor, SdO, an atom-type E-state index defined as the sum of E-state values of =O fragments in a dye, which contributes positively toward PCE. Analysis of the modeled dyes makes it clear that dyes such as 216 and 214 had higher PCE values (8.2 and 7.2, respectively) owing to higher values of the SdO descriptor (24.50 and 24.14, respectively). In contrast, dyes such as 207, 210, and 218 showed PCE values in the lower range among the studied dyes owing to lower values of this descriptor. Therefore, to design new tetrahydroquinoline dyes, we introduced =O fragments into the aromatic ring system while maintaining conjugation and resonance. The decision to introduce =O fragments only in aromatic rings follows from our previous study,23 which suggested that a carbonyl fragment attached to an aromatic ring can induce a mesomeric effect that is stronger than the inductive effect resulting from a C=O group in a saturated carbon chain. Of the two, the inductive effect leads to a lower PCE value and the mesomeric effect to a higher one.

Considering the above mechanistic interpretation and the effects of the individual descriptors on PCE, we designed ten dyes for each class. The required descriptors were computed for each dye to predict its PCE with the respective QSPR equation. As we seek dyes with better power conversion efficiency than the existing ones, we set predicted PCE thresholds of 8.2, 6.8, and 9.52 for tetrahydroquinoline, N,N′-dialkylaniline, and indoline, respectively (the highest experimental PCE values for the studied classes), for the next level of screening. Along with the predicted value, we checked the AD of the designed compounds to confirm that they lie in the same chemical space as the training set compounds of the relevant models. Based on the PCE thresholds and AD screening, seven dyes from each class were selected for the next level of screening through absorption spectra and electrochemical parameter calculations, in order to report the “lead dyes” as sensitizers for future DSSCs.

Absorption spectra of designed dyes

The absorption spectra of the selected dyes were computed at the TD-CAM-B3LYP/6-31G(d,p) level in acetonitrile solvent using the CPCM model. The simulated absorption spectra are shown in Fig. 2a for all three chemical classes. The excitation energy, oscillator strength, maximum wavelength, and light harvesting efficiency (LHE) of the dye sensitizers in the solvent phase are given in Tables 2–4 for tetrahydroquinoline, N,N′-dialkylaniline, and indoline dyes, respectively. LHE is one of the important photophysical parameters for designing an efficient dye sensitizer. It can be calculated as follows:

$$LHE = 1 - {10^{ - f}}$$
(13)

where, f is the oscillator strength.
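Eq. 13 is simple enough to evaluate directly. The short sketch below does so for a few oscillator strengths; the function name and the sample values of f are illustrative assumptions, not values taken from Tables 2–4.

```python
def light_harvesting_efficiency(f):
    """Eq. 13: LHE = 1 - 10**(-f), with f the oscillator strength."""
    if f < 0:
        raise ValueError("oscillator strength must be non-negative")
    return 1.0 - 10.0 ** (-f)

# Hypothetical oscillator strengths, chosen only to show the trend.
for f in (0.5, 1.0, 2.0):
    print(f"f = {f:.1f}  LHE = {light_harvesting_efficiency(f):.3f}")
```

Because LHE saturates toward 1, raising f from 1 to 2 only moves LHE from 0.90 to 0.99, so differences in f matter most for weakly absorbing dyes.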

Fig. 2 (a) Simulated absorption spectra and (b) energy diagram of HOMO and LUMO along with energy levels of TiO2 and iodine electrolyte for the designed “lead dyes” of the tetrahydroquinoline (A), N,N′-dialkylaniline (B), and indoline (C) families

Table 2 Calculated descriptor, predicted %PCE and TD-DFT parameters [excitation energy (E), wavelength (λ), oscillator strengths (f), and light harvesting efficiency (LHE) in acetonitrile (AN) media] of the designed Tetrahydroquinoline dyes
Table 3 Calculated descriptor, predicted %PCE and TD-DFT parameter [excitation energy (E), wavelength (λ), oscillator strengths (f), and light harvesting efficiency (LHE) in acetonitrile (AN) media] of the designed N,N′-dialkylaniline dyes
Table 4 Calculated descriptor, predicted %PCE and TD-DFT parameter [excitation energy (E), wavelength (λ), oscillator strengths (f), and light harvesting efficiency (LHE) in acetonitrile (AN) media] of the designed indoline dyes

These calculations are important because the absorption peak and its oscillator strength arise from the S0→S1 transition, an intramolecular charge transfer transition from the HOMO to the LUMO. For the dyes of the respective chemical classes, the calculated results show tendencies comparable with those of the standard dyes under the mentioned experimental conditions.32 The trend of maximum absorption wavelength is THQ5>THQ8>THQ7>THQ9>THQ1>THQ2>THQ3 for tetrahydroquinoline, NDI4>NDI10>NDI8>NDI7>NDI3>NDI1>NDI6 for N,N′-dialkylaniline, and IND3>IND5>IND10>IND9>IND7>IND4>IND2 for indoline dyes. This trend reveals a red-shift and an amplification of the absorption intensity with increasing electron-donating ability of the donor groups in the designed dyes. For dyes such as THQ5, THQ8, THQ7, and THQ9, longer absorption wavelengths and wider spectra are observed. These effects are due to the presence of the diketopyrrolopyrrole moiety as an acceptor group, which generates a longer conjugation length and superior charge transfer compared to the remaining three dyes of the same class, resulting in better predicted PCE values. For all three classes of dyes, the wide absorption covering the whole visible and infrared regions is consistent with a smaller HOMO-LUMO energy gap, as confirmed in Fig. 2b, which presents an energy diagram of the HOMO and LUMO for all the designed dyes and is discussed in the next section. Notably, all seven dyes of each class met the absorption spectra requirements for the next level of screening.

Electrochemical parameters of designed dyes

The next level of screening was performed to match the appropriate energy levels of the HOMO and LUMO orbitals of the designed dyes with the I−/I3− redox potential and the conduction band level of the TiO2 semiconductor.27 Thus, the molecular orbital energies of all dyes were computed at the B3LYP/6-31G(d,p) level in acetonitrile solvent and are illustrated in Fig. 2b. Most importantly, the LUMO levels of all dyes in all three classes lie above the conduction band of TiO2, leading to efficient injection of excited electrons into the semiconductor electrode. Likewise, the HOMO levels of all dyes were lower than the I−/I3− redox potential, which is an important factor for efficient dye regeneration.33 All three panels of Fig. 2b thus indicate that the oxidized dyes can efficiently accept electrons from the electrolyte system.
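The two alignment conditions described above (LUMO above the TiO2 conduction band, HOMO below the redox potential) amount to a simple screening test, sketched below. The reference levels of −4.0 eV for the TiO2 conduction band and −4.8 eV for the iodide/triiodide couple are commonly quoted literature values and are assumptions here, as are the orbital energies in the example; none of them are taken from Fig. 2b.

```python
# Assumed reference levels on the vacuum scale (eV); not stated in this text.
TIO2_CB_EV = -4.0
IODIDE_REDOX_EV = -4.8

def passes_alignment(homo_ev, lumo_ev, cb_ev=TIO2_CB_EV, redox_ev=IODIDE_REDOX_EV):
    """True if the dye can both inject electrons and be regenerated."""
    injection_ok = lumo_ev > cb_ev        # LUMO lies above the conduction band
    regeneration_ok = homo_ev < redox_ev  # HOMO lies below the redox potential
    return injection_ok and regeneration_ok

def energy_gap(homo_ev, lumo_ev):
    """HOMO-LUMO gap in eV."""
    return lumo_ev - homo_ev

# Hypothetical dye: HOMO at -5.40 eV, LUMO at -3.20 eV.
ok = passes_alignment(-5.40, -3.20)
gap = energy_gap(-5.40, -3.20)
```

A dye failing either condition would be dropped before any further screening, regardless of its predicted PCE.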

The HOMO energy levels of THQ1, THQ2, and THQ3 were computed to be −5.40, −5.54, and −5.80 eV, respectively. Here, the HOMO levels rose steadily with increasing donor ability, approaching the redox potential of the electrolyte system, while the LUMO levels changed little. The energy gap for these dyes was also high, which is unfavorable for charge migration from donor to acceptor (compared with the other designed dyes of this class, though still much better than the existing ones). In contrast, for THQ5, THQ7, THQ8, and THQ9, the energy gap is more favorable for charge transfer. With the diketopyrrolopyrrole34 fragment, which acts as an additional donor and acceptor, a solar cell with high power conversion efficiency can be achieved.35 Structurally, THQ8 differs from THQ5 only in the number of conjugation units, which helps to decrease the energy gap and increase the PCE value of THQ8 compared to THQ5. On the other hand, the mesomeric effect and conjugation enhanced the PCE of THQ9 compared to THQ7, although the energy gap was almost the same for both dyes.

Among the NDI dyes, NDI6 carries the 3,5-dihydrothieno[3,2-b:4,5-b′]dipyrrole ring system, which generates the lowest band gap (1.5 eV) among all the designed dyes. NDI7, NDI8, and NDI10 contain the 3,6-dihydropyrrolo[3,2-b]pyrrole ring system, resulting in a slightly higher band gap (1.6 eV for all), but the HOMO and LUMO levels of all four compounds are comparable. The 4H-thieno[3,2-b]indole ring system produces a somewhat higher band gap for NDI4. For NDI dyes, it can thus be concluded that a dye with a narrow energy gap would be a proficient sensitizer with a better predicted PCE value.

The IND dyes containing the 2,7-dihydronaphtho[1,2-d:5,6-d′]diimidazole (IND3 and IND5) and naphtho[1,2-c:5,6-c′]bis([1,2,5]thiadiazole) (IND4, IND7, and IND9) fragments showed lower-range and higher-range band gaps, respectively. IND10, with the 7H-imidazo[4′,5′:5,6]naphtho[1,2-c][1,2,5]thiadiazole fragment, is characterized by a predicted medium-range band gap of 1.93 eV and emerged as the most power conversion efficient dye among the designed ones. Therefore, the analysis of the designed dyes from all three classes indicates that the HOMO and LUMO levels and the band gap must be well balanced. Also, all dyes need to meet the conduction band and redox potential thresholds to be able to act as efficient sensitizers for DSSCs.

To gain insight into the electronic structures, the frontier molecular orbitals were examined, since the relative ordering of the HOMO and LUMO offers a sensible qualitative indication of the charge transfer features of the designed dyes. The molecular orbitals of the best dye of each class (THQ9, NDI6, and IND10) are depicted in Fig. 3 along with the electron density differences between the ground and excited states. As required for standard organic sensitizers, the electron densities of the HOMO and LUMO of the lead dyes were delocalized on the donor and acceptor groups, respectively. Thus, intramolecular charge transfer (ICT) occurs from the donor to the acceptor group through the transition from the HOMO to the LUMO. The shift of electron density from the donor group to the acceptor group points to the ICT process that takes place upon photoirradiation. Although Fig. 3 displays only the best designed dyes, the pattern is the same for all lead dyes. The contour plots strongly confirm the outcome of the energy diagrams.

Fig. 3 Contour plots of the HOMO and LUMO of the dyes with the highest predicted %PCE for all three classes at the B3LYP/6-31G(d,p) level of theory, along with the charge density difference (∆ρ) between the excited and ground states of the dyes. Purple and cyan colors indicate increase and decrease of the charge density, respectively

In conclusion, seven dyes were selected as “lead dyes” for each class. This was accomplished by applying the QSPR models and AD study, followed by stringent screening through the absorption spectra, energy diagrams, and evaluation of the electronic structures and densities required for efficient charge transfer. To gain further insight into electron injection, dye recombination, and other electronic processes in DSSCs, we are working on a first-principles approach to the different interfaces at the heart of DSSCs. This future study will provide a deeper understanding of the newly designed dye sensitizers and their interaction with the semiconductor, the solution environment, and/or the electrolyte upon adsorption onto the semiconductor. The final results will be summarized along with the outcome of a wet-lab collaboration.

This is the first comprehensive computational study in which QSPR models were generated and then used to design new dye sensitizers for DSSCs. Each dye was then evaluated against all validation criteria of the QSPR models and screened through extensive electrochemical studies to confirm it as a future “lead dye”. The complete workflow and outcome of the study are summarized in Fig. 4. The findings of this study can be summarized as follows:

  • We have developed 11 QSPR models for diverse chemical classes of 273 AODs for DSSCs. This helped us to classify the indispensable fragments and structural features of studied dyes which are most responsible for high PCE value through rigorous validation criteria.

  • The properties identified from the QSPR models, together with a series of electrochemical parameters and absorption spectra, assisted us in designing seven lead dyes with high power conversion efficiency for each of the tetrahydroquinoline, N,N′-dialkylaniline, and indoline classes. The potentially best dyes have predicted PCE values of 18.88 (THQ9), 19.24 (NDI6), and 13.87 (IND10), corresponding to increments of 130, 183, and 46%, respectively, over the best existing dyes studied in each class. This is an encouraging outcome. Along with the elevated predicted PCE values, the electrochemical parameters, absorption spectra, and HOMO and LUMO contour plots of each lead dye helped us to check the electrochemical environment required for a fast electron transfer rate in the composite DSSC system, supporting their better PCE.

  • The constructed QSPR models for each chemical class are important to predict and exemplify the nature of donor:π-bridge:acceptor relationships vital for PCE.
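The percentage increments quoted in the second finding can be reproduced from the predicted PCE values of the best designed dyes and the highest experimental PCE of each class (8.2, 6.8, and 9.52, as given with the screening thresholds). The short check below is only an arithmetic illustration; the variable and function names are our own.

```python
# Predicted PCE of the best designed dye in each class (from the text).
best_predicted = {"THQ9": 18.88, "NDI6": 19.24, "IND10": 13.87}
# Highest experimental PCE of the corresponding class (screening thresholds).
best_experimental = {"THQ9": 8.2, "NDI6": 6.8, "IND10": 9.52}

def percent_increment(predicted, reference):
    """Relative gain of the predicted value over the reference, in percent."""
    return 100.0 * (predicted - reference) / reference

increments = {
    dye: round(percent_increment(best_predicted[dye], best_experimental[dye]))
    for dye in best_predicted
}
print(increments)  # {'THQ9': 130, 'NDI6': 183, 'IND10': 46}
```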

Fig. 4 Schematic representation of the complete workflow from modeling to design to prediction to quantum study and their outcomes

The study offers data on a large number of AODs that experimentalists can use to reduce experimental effort, time, and funds many-fold. Additionally, the exploratory features may help to design more efficient units for other chemical classes that are not explored in the present work.

Methods and materials

The data set consists of 273 AODs across 11 chemical classes used as sensitizers with a TiO2 film and a liquid iodine electrolyte, with experimental PCE values collected from the literature.17, 18 It covers all arylamines available in the literature under the mentioned experimental conditions. The complete scheme of the study and the molecular structures are given in Figure S1 and Table S1, respectively, in the Supplementary material.

Dye structures were drawn and molecular geometries were computed by a molecular mechanics approach in the HyperChem 8.07 software package.36 Subsequently, geometry optimizations were performed in the gas phase employing DFT and TD-DFT methods with the B3LYP and CAM-B3LYP exchange-correlation functionals, respectively, with the same 6-31G(d,p) basis set, in the GAUSSIAN 09 software package.37 The rationale behind the selection of the functionals and basis set is given elsewhere.25, 29, 38, 39

Regarding descriptor selection, we first computed 32 quantum-mechanical descriptors from the Gaussian output files of the DFT and TD-DFT calculations. Then the DRAGON 6 software40 was employed to generate 248 descriptors (constitutional indices, ring descriptors, topological indices, connectivity indices, functional group counts, atom-type E-state indices, and 3D atom pairs) from the optimized structures in order to identify the crucial structural features responsible for better PCE. The comprehensive list of computed descriptors is reported in Table S4 in Supplementary file.

The entire pool of 280 descriptors was first pretreated with a variance cut-off of 0.0001 and a correlation coefficient cut-off of 0.99 to remove intercorrelated descriptors and reduce the noise level among the input descriptors. Then, a genetic algorithm (GA) was applied to the pretreated pool, using the Genetic Algorithm 1.4 software package,41 to select the best possible set of descriptors for QSPR modeling. In this way, 100 equations were generated through GA for the entire data set to identify the 50 most frequent descriptors in the pool. This process can be regarded as “descriptor reduction” or “descriptor thinning”. Its purpose was to identify the most pertinent descriptors for our models, since they reveal the requisite properties of all modeled dyes and, being fewer than in the preliminary pool, can be handled efficiently.
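The pretreatment step (variance cut-off of 0.0001, correlation cut-off of 0.99) can be sketched as a simple filter. This is a minimal illustration and not the software used in the study: the use of the sample variance, the rule of keeping the first descriptor of a correlated pair, and the toy descriptor table are all assumptions.

```python
import math

def variance(col):
    """Sample variance of one descriptor column."""
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / (len(col) - 1)

def pearson(a, b):
    """Pearson correlation coefficient of two non-constant columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ssa * ssb)

def pretreat(descriptors, var_cut=0.0001, corr_cut=0.99):
    """Return the names of descriptors that survive both filters.

    descriptors: dict mapping descriptor name -> list of values per compound.
    Assumption: of two highly correlated descriptors, the first one is kept.
    """
    kept = []
    for name, col in descriptors.items():
        if variance(col) < var_cut:
            continue  # nearly constant descriptor carries no information
        if any(abs(pearson(col, descriptors[k])) > corr_cut for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept

# Hypothetical descriptor table for four compounds.
pool = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0001],   # almost perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],      # constant
    "d": [1.0, 3.0, 2.0, 5.0],
}
selected = pretreat(pool)
```

In this toy table only "a" and "d" remain, since "b" duplicates "a" and "c" has zero variance.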

The choice of training and test sets plays a vital role in building a statistically significant QSPR model. To avoid bias in dividing the data set, we used a Euclidean distance-based technique to split it into training and test sets in a 3:1 ratio.41 The pool of descriptors selected in the descriptor reduction step was used for final model development by GA followed by multiple linear regression (MLR) analysis in the MLR Plus Validation GUI 1.2 software,41 employing the training set molecules of each chemical class. The developed models were subsequently validated with the respective test set molecules to check their predictability on compounds not involved in model development. The GA settings were as follows: total number of iterations, 100; cross-over probability, 1; mutation probability, 0.5; smoothing parameter (LOF calculation), 1.
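The exact procedure of the Euclidean distance-based division is not detailed here, so the following is only one plausible sketch of a 3:1 split: compounds are ranked by their mean Euclidean distance to all other compounds in descriptor space, and every fourth compound in that ranking is assigned to the test set. The ranking rule, the absence of descriptor scaling, and the toy coordinates are assumptions.

```python
import math

def mean_distances(rows):
    """Mean Euclidean distance of each compound to all other compounds."""
    n = len(rows)
    return [
        sum(math.dist(rows[i], rows[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def split_3_to_1(rows):
    """Return (train_indices, test_indices) in an approximate 3:1 ratio."""
    d = mean_distances(rows)
    order = sorted(range(len(rows)), key=lambda i: d[i])
    test = order[2::4]  # every fourth compound of the distance-ranked list
    train = [i for i in order if i not in test]
    return sorted(train), sorted(test)

# Hypothetical two-descriptor data for eight compounds.
compounds = [
    [0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5],
    [4.0, 2.5], [5.0, 2.0], [6.0, 3.5], [9.0, 7.0],
]
train_idx, test_idx = split_3_to_1(compounds)
```

Picking test compounds at regular intervals of the ranked list spreads them over the range of distances covered by the training set, which is the motivation for a distance-based division.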

As statistical metrics establish the robustness and predictability of QSPR models, internal and external validation techniques were employed for model validation.42,43,44 The robustness of each QSPR model was checked by the Y-randomization technique to determine whether the model was obtained by chance. The randomization was performed 100 times by rearranging the response while retaining the original descriptor matrix (independent variables). Then the $^{c}R_p^2$ metric41, 43 was computed. In accordance with OECD principle 3, the AD was checked for each model using the training set compounds to confirm the reliability of the test set predictions. Two different techniques were used: (a) the standardization-based technique45 and (b) the Euclidean distance approach.41
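The Y-randomization test described above can be sketched for the simplest case of a one-descriptor linear model. The sketch assumes the usual definition $^{c}R_p^2 = R\sqrt{R^2 - R_r^2}$, where $R_r^2$ is the mean $R^2$ of the randomized models; the single-descriptor restriction, the fixed random seed, and the toy data are assumptions made for illustration.

```python
import math
import random

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (one descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def c_rp2(x, y, n_rounds=100, seed=0):
    """cRp2 from n_rounds models rebuilt on a shuffled response."""
    rng = random.Random(seed)
    r2 = r_squared(x, y)
    randomized = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        randomized.append(r_squared(x, shuffled))
    r2_random = sum(randomized) / n_rounds
    return math.sqrt(r2) * math.sqrt(max(r2 - r2_random, 0.0))

# Hypothetical descriptor values and responses with a clear linear trend.
x = [float(i) for i in range(1, 13)]
y = [0.5 * xi + (0.2 if i % 2 else -0.2) for i, xi in enumerate(x)]
score = c_rp2(x, y)
```

A model that reflects a real structure-property relationship keeps a high score because shuffling destroys the fit; a chance correlation would give randomized models nearly as good as the original and a score below the 0.5 threshold.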

Many dyes already exist in the TPA chemical class, and successful modeling has already been performed for phenothiazine22 and carbazole46, 47 dyes. Therefore, we concentrated on designing new dyes for the tetrahydroquinoline, indoline, and N,N′-dialkylaniline classes using their respective QSPR models, as these dyes are the least explored for DSSCs. We designed ten new dyes for each of these three chemical classes. The suitability of the designed dyes as potential candidates for DSSCs was thoroughly screened via different electrochemical parameters such as redox potential, band alignment, and LHE. Absorption spectra of the designed dyes were simulated with the conductor-like polarizable continuum model (C-PCM)48 at the TD-CAM-B3LYP/6-31G(d,p) level of theory in acetonitrile solvent.

Data availability

All data are available in supplementary information.