Machine-learning assisted design principle search for singlet fission: an example study of cibalackrot

Weber, Fabian; Mori, Hirotoshi

doi:10.1038/s41524-022-00860-1

Download PDF

Article
Open access
Published: 20 August 2022

Machine-learning assisted design principle search for singlet fission: an example study of cibalackrot

npj Computational Materials volume 8, Article number: 176 (2022) Cite this article

2269 Accesses
8 Citations
2 Altmetric
Metrics details

Subjects

Abstract

This work uses quantum chemistry calculations and machine learning to explore design rules for singlet fission in a chemical space of four million indigoid derivatives. We identify ~400,000 derivatives of 2,2′-diethenyl cibalackrot, which theoretically fulfil the energy conditions for exoergic singlet fission above the silicon band gap energy. Probing this database with a random forest classifier, we observe that small substituents with positive mesomeric effects and weak negative inductive effects reinforce the desired energetic conditions when placed at specific positions. Finally, a subset of molecules that reflects the random forest classifier’s rules are investigated for their quantum chemical properties to translate the desirable structural motifs into wavefunction-based design rules. Here, direct correlations between the energetic condition for singlet fission, the biradical character and the charge and triplet spin density in prominent molecular regions are identified, providing insights that may serve as a guide for singlet fission core structure development.

Human- and machine-centred designs of molecules and materials for sustainability and decarbonization

Article 24 August 2022

Principle and design of pseudo-natural products

Article 03 February 2020

Exploring chemical compound space with quantum-based machine learning

Article 12 June 2020

Introduction

The transition from fossil fuels to environment-friendly and renewable sources of energy has become one of the central challenges for today’s society and across scientific disciplines¹. Though global energy studies report that there has been record-breaking growth in sustainable energy production, further improvements are desirable^2,3,4. Solar energy is a promising solution for the near future, owing to its availability and potentially high output. Since the first practical silicon photovoltaic cells were developed more than 50 years ago⁵, several methods of harvesting energy from sunlight have been investigated. Today, four generations of photovoltaic cells are recognised^6,7, which are categorised based on factors such as the absorbed spectral bandwidth, and the power conversion efficiency and their upper thermodynamic maxima. However, although record conversion efficiencies are reported frequently^8,9, many of these prototypes are not suitable for mass production as they use rare or toxic chemicals, or due to their short operational lifetimes. Considering the demand for a global solution, an upgrade to mass-producible first-generation technology could be valuable.

One such strategy aims to exceed the Shockley-Queisser (SQ) limit in single junction cells¹⁰ by applying a multiple exciton generation technique called singlet fission (SF)^11,12,13. In SF, higher-energy photons initialise local excitations in a SF-absorber material, which then split into pairs of lower-energy excitons. The currently widely accepted mechanism proceeds from a two-site singly excited state (S₁,S₀), to a coupled singlet of two triplets ¹(T₁, T₁) and finally to two independent spin-dephased triplet excitons (T₁ + T₁) after decoupling^14,15,16. However, recent reports suggest that there may be an additional spatially separated exciton state ¹(T₁…T₁) between formation and fission¹⁷, and that quintet spin states may also contribute to the final spin-dephasing mechanism^18,19. Ultimately, when the energy of the decoupled T₁ states are high enough, both triplet excitons can be used to drive a photovoltaic current²⁰. Note, that while triplet excitons with energies below the PV semiconductor’s band-gap are also generally available for this process, the occurrence of dark recombination currents can lead to a voltage penalty in that case²¹. Finally, when more than one exciton per photon can be made available as charge carriers the device formally exceeds the SQ limit.

The mechanism of SF also dictates its energy requirement. That is, the singlet state excitation energy E(S₁, S₀) must be larger than the final state E(T₁ + T₁), which can be simplified into an approximate monomeric condition E(S₁) ≥ 2E(T₁). Furthermore, it has been reported that this process becomes less efficient as the inequality increases, which is assumed to be connected to heat loss related reaction channels²². So far, no collective structural design rules have been formulated to allow the free design of organic SF molecules, but several trends that promote SF have been identified from quantum chemistry and experimental studies. For example, in polycyclic aromatic hydrocarbon SF molecules, as well as (BN)₂-pyrenes, a relationship between the SF criterion E(SF) = E(S₁) − 2E(T₁) and the multiple diradical nature of the structure y₀, y₁ was reported^23,24. Here, a correlation between the wavefunctions of the dark excited singlet state (¹[TT]) and the radicaloid ground state allows to connect y₀ and y₁ to the energetic conditions of singlet fission²⁵. Further, both concepts of Hueckel and Baird aromaticity have been found useful in the design of SF molecules when the molecular design is chosen carefully, as both are able to lower the excited triplet state’s excitation energy^26,27,28. In addition, a linear correlation between the delocalised π electrons in specific aromatic rings and the first triplet excitation energy E(T₁) was found²⁷. Inspired by these connections between electronic structure theory and energetic SF conditions that may serve as design criteria even across different core structure designs, we aim to identify further complementary trends by applying machine-learning assisted design rule extraction to more than four million derivatives of 2,2′-diethenyl cibalackrot and check for the validity of known concepts in this large sample space.

The article is structured as follows. In the results section, we first explain how the chemical space of 2,2′-diethenyl cibalackrot was chosen through pre-screening a smaller set of cibalackrot derivatives with different substituents at ten residue positions. Next, we provide details on the calculation and prediction of excited state energies for the over four million structures, including the specifications and performance of the machine learning model used for predictions. In the second half of the results section, we present a three-step analysis procedure in which we extract common structural designs from the predicted chemical groups of the singlet fission photovoltaics (SF-PV) candidates, generalise these motifs towards distinguishing from non-SF-PV using machine learning-assisted classification and finally infer wavefunction-based rules a posteriori from the logic of the trained classifier.

Results

Pre-screening for a suitable chemical subspace

From the initial full cibalackrot space of 215 million possible substituted structures, a set of 2000 structures was constructed using randomised substitution patterns (cf. histogram in Supplementary Fig. 1 of the ESI). For these structures, we calculated the singlet and triplet excitation energies E(S₁) and E(T₁) using time-dependent density-functional theory (TDDFT) and the delta self-consistent field (Δ-SCF) method, as described in the Methods section. Figure 1a shows the distribution of these energies for the 1975 converged molecules, where the solid black line represents the energy condition threshold for SF, and the black dashed line represents the silicon band gap energy threshold of 1.12 eV²⁹.

**Fig. 1: Excitation energy distributions (left) and chemical compositions (right) of highlighted orange samples from respective distribution.**

This distribution shows that 3.2% of the initially screened molecules fulfilled the SF energy requirements, which highlights the importance of pre-screening prior to machine learning. The mean energies $\tilde{E}({{{{\rm{T}}}}}_{{{{\rm{1}}}}})$ and $\tilde{E}({{{{\rm{S}}}}}_{{{{\rm{1}}}}})$ were 1.27 and 2.27 eV, respectively. It should be noted that, in this random set, there were nine structures that fulfilled the SF energy condition and exceeded the silicon band-gap threshold. However, since these were not enough to draw any conclusions about the group of SF-PV molecules, the pre-screening focused on molecules that fulfilled the SF criterion (cf. orange dots in panel on left of Fig. 1a). By considering their specific substitution patterns, a histogram of the chemical groups at their respective positions in the molecule was extracted for the SF candidates (cf. panel on right Fig. 1a). For generalisation, inversion-symmetry equivalent positions were grouped together according to the colour code in Fig. 2a.

**Fig. 2: Structure of studied molecules.**

Looking at the overall substitution counts (cf. right of Fig. 1a) shows that molecules which fulfilled the SF condition showed more cyano- (‘CN’) and ethenyl- (‘Vi’, as for vinyl-) groups, and relatively few phenyl-group substitutions (‘Ph’). From an energetic perspective, cyano-groups were often found in structures with lower singlet and triplet excitation energies (compare energetic quadrant analysis, Supplementary Fig. 2 in the ESI). However, the ethenyl-groups on the central positions R₅ and R₆ strongly reduced E(T₁), while having little effect on E(S₁).

To investigate the cause of this asymmetric red-shift, we conducted a detailed comparison of a sample 2,2′-diethenyl SF candidate molecule (referred to as CIBA_Vi) and its 2,2′-dihydrogen counterpart CIBA_Hyd. The molecules had excitation energies of E_Vi(S₁) = 2.20 eV (E_Hyd(S₁) = 2.41 eV) and E_Vi(T₁) = 1.03 eV (E_Hyd(T₁) = 1.35 eV), respectively. Visualising the spin densities of the Δ-SCF triplet state, the two molecules (Fig. 3a, b, respectively) revealed that a portion of the spin density was delocalised onto the ethenyl units, which hints at a stabilising effect for the triplet excitation energy. Considering that also a close connection between radical stabilising groups at the 2,2′ positions and SF had been reported on differently 2,2′-functionalised cibalackrot molecules earlier²⁷, it is likely that the same effect also adds to the stabilisation here as well. To probe this notion in more generality for also other positions and molecules, the relationship between spin density, the biradical character y₀ and the energetic SF criterion will be investigated in more detail in a later chapter of the results sections.

**Fig. 3: Wavefunction properties of the first excited triplet and singlet states of two molecules.**

For analysing the character of the excited states and further check for potential reasons for the asymmetric shift, the hole and particle wavefunctions for both the singlet and triplet excited state were calculated from natural transition orbitals (NTO) of TDDFT calculations. Note, that both excitations use the same singlet ground state as initial reference for constructing the one-particle transition density matrices (1-TDM) so that a direct comparison of the transition characters can be made. Since the spin-forbidden singlet to triplet excitation would formally yield zero for the NTOs, a spin-free 1-TDM was used here instead. Also, note that in both cases the transitions reflect a mostly pure HOMO to LUMO transition, as was observed in other cibalackrot derivates earlier as well²⁷.

Looking at the nature of the transitions (Fig. 3c–f), it can be seen that in all cases the hole and particle wavefunctions describe ππ^* transitions with similar patterns of sign-changes in the respective wavefunctions. This shows that the introduction of the ethenyl groups does not change the nature of transition but rather extends the delocalisation for both the singlet and triplet excitation. Hence, when judging from a purely spin-free perspective the stabilisation caused by the ethenyl groups should be similar in magnitude for both singlet and triplet state. Hence, it can be inferred that the remaining difference between the excitation energies and cause of asymmetric stabilisation must be related to purely spin-related differences; which could result in a direct correlation between spin-related properties and the energy difference E(S₁) − E(T₁) (and the singlet fission criterion E(SF) = E(S₁) − 2E(T₁), similarly), assuming that the transition characters behave in the same way for all chemical modifications. Again, a generalisation of this hypothesis towards a larger number of molecules shall be revisited in more depth in the later sections of the results.

In any case, the asymmetric red-shift caused by the 2,2′-diethenyl modification was deemed favourable for discovering SF-PV structures, and hence these substituents were incorporated into the core structure for all further investigations, focusing the chemical space to a subspace of 4,573,800 structures.

Machine learning of excitation energies

After selecting the 2,2′-diethenyl subspace, a set of 7214 molecules was obtained for machine learning their respective E(S₁) and E(T₁) energies. Comparing the resulting energy distribution with the one obtained previously (see Fig. 1a, b) revealed that the mean excitation energies ($\tilde{E}({{{{\rm{T}}}}}_{{{{\rm{1}}}}})=$ 1.09 eV and $\tilde{E}({{{{\rm{S}}}}}_{{{{\rm{1}}}}})=$ 2.19 eV) were indeed red-shifted, especially to lower triplet excitation energies. This also increases the amount of molecules that satisfy the SF condition in this set to 52.4%.

Using this set of molecules, one machine learning model was trained for each the first singlet and triplet excitation energies, as described in the Methods section. The learning curves of the two models (cf. Fig. 4) displayed convergent prediction behaviour as the size of the training set increased, and they reached mean absolute errors (MAE) of ~0.025 eV, with coefficients of determination R² above 0.7. The S₁ model performed slightly better than the T₁ model in terms of the MAE and R², which we ascribe to the fact that the underlying PM6 wavefunction used for some of the features was of singlet character itself.

**Fig. 4: Learning curves of the machine learning models.**

The histograms of the signed prediction errors (cf. Supplementary Fig. 3 of the ESI) were used to obtain the respective standard deviations of the models, σ_T = 0.031 eV and σ_S = 0.034 eV.

Note that despite the potential for higher accuracy offered by current deep neural network models for excited state energy predictions, such models generally require much more training data, which makes them more attractive for much larger chemical spaces than the one considered here³⁰. However, since the training reference values were obtained from theoretical calculations themselves, such deviations can be considered negligible compared to the errors inherent to the density functional theory (DFT) reference itself. Judging from benchmark studies on the performance of DFT methods for the description of singlet fission molecules^31,32,33, it is thus fair to assume that the actual excitation energies of our predicted overall distribution will be globally shifted by somewhere between 0.1 and 0.3 eV with respect to possible experimental values, while the trends when comparing individuals inside the distribution are predicted correctly.

Distributions of energies and chemical groups of predictions

Utilising these models, the first singlet and triplet excitation energies for all 4,463,586 PM6-converged molecules were predicted. In the following analyses, the SF-PV space of interest is given by structures that lie close to the SF energy criterion (∣E^pred(S₁) − 2E^pred(T₁)∣ < 0.03 eV) and also exceed the triplet energy threshold of E^pred(T₁) > 1.12 eV. The energy window of 0.03 eV was chosen as it had the same order of magnitude as a single standard deviation in the models. Take note that while the histogrammatic trends presented in this initial analysis are helpful for understanding what general types of molecules tend to fulfil the SF-PV condition and what underlying design principles might exist, the observations drawn from inside the SF-PV chemical subspace are likely not sufficient for generalisation towards also the non-SF-PV chemical space. Hence, the information highlighted in this chapter must not be confused with actual general design rules that have been checked against the full chemical space. The strategy that was applied to develop globally applicable decision rules, will be addressed in the next part of the results section, instead.

The distribution of singlet and triplet excitation energies inside the predicted chemical space was similar to the training set, as shown in Fig. 1c. The mean energies were $\tilde{E}({{{{\rm{S}}}}}_{{{{\rm{1}}}}})=$ 2.19 eV and $\tilde{E}({{{{\rm{T}}}}}_{{{{\rm{1}}}}})=1.10$ eV, with 50.16% of the structures fulfilling the SF energetic condition. In addition, 393,841 (8.82%) structures also satisfied the desired, slightly exoergic SF-PV condition (orange dots in Fig. 1c). Their chemical composition analysis revealed that there were below average amounts of cyano-, ethenyl- and thiophenyl-groups, where the low number of cyano-groups might seem surprising compared to the previous results in Fig. 1b. However, it was observed that structures with higher singlet and triplet excitation energies generally featured fewer cyano-groups. This can be confirmed by chemical composition analysis of the four quadrants surrounding the mean $\tilde{E}({{{{\rm{S}}}}}_{{{{\rm{1}}}}})$ and $\tilde{E}({{{{\rm{T}}}}}_{{{{\rm{1}}}}})$ values (cf. Supplementary Fig. 2 in the ESI). The most common residues were hydrogen, followed by fluoro-, methoxy-, chloro- and phenyl-groups, displaying an overall slight preference towards electron donating substituents.

In terms of positioning of the groups, the thiophenyl-group displayed a significant anomaly, as it almost never occurred at the R₁ position. Since we assumed a potential design principle was the reason for this anomaly, we attempted to formulate a possible explanation for this observation. If a bulky thiophenyl substituent was located at this position, electronic repulsion between the sulfur and oxygen lone pairs might have had a negative effect on the energy requirements. This effect would be less pronounced for the phenyl-group, which lacks the sterically demanding sulfur lone-pair electrons, thus explaining why phenyl was instead only slightly disfavoured at this position. Probing the suspected design principle against non-SF-PV structures by grouping together all SF-PV and non-SF-PV molecules with none, one or two bulky groups (phenyl- and thiophene) at either (and both) of the R₁/R₁₀ positions strengthened this suspicion, as indeed decreasing portions of 54.8%, 38.1% and 28.5% of the respective subspaces matched singlet fission requirements.

However, to confirm the suspected physicochemical origin of our hypothesis would require access to the electron density surrounding the ketone oxygen for all 4,463,586 structures. Since we had no direct access to this property, we instead used an auxiliary descriptor for probing the hypothesis. For this purpose, we used the available preoptimised structural data of all structures and calculated the sum of local nuclear potentials surrounding the ketone oxygen atoms as a rough approximation to the local electronic repulsion. Using this description, however, no conclusive correlation was identified (cf. Supplementary Fig. 5 in the ESI). Hence, while smaller groups at the R₁/R₁₀ positions correlate with SF-PV, the exact cause for this remains to be uncovered in future work.

Next, to include two-body relationships, a connectivity diagram was used to illustrate the 200 most commonly co-occurring combinations of groups and positions (see Fig. 5a)³⁴. To facilitate the discussion, we introduce the shorthand notation Grp1_Pos1/Grp2_Pos2, indicating chemical group Grp1 located at position Pos1 and chemical group Grp2 located at position Pos2, respectively. In this diagram, the brightness of connecting lines represents the number of occurrences for each such combination of sites and groups, given that the probed molecule satisfies the SF-PV criteria (i.e. conditional probability). Owing to the nature of the artificial randomness of structures, the balance between symmetrically equivalent modifications (e.g. OMe₇/Hyd₄ and OMe₄/Hyd₇) was not exactly equal. A list of the ten most and least common co-occurring pairs can be found in Supplementary Table 1 of the ESI.

**Fig. 5: Most commonly co-occurring combinations of groups and positions.**

As shown, the most common modification (4.74%) contains a methoxy-group at position R₄ and hydrogen at position R₇. This percentage was significantly elevated, considering that the mean conditional frequency for all possibilities was 1.56%. In accordance with the previous analysis, there were many methoxy-, fluoro- and hydrogen modifications among the high-percentage co-occurring entries at positions R₄ and R₇. However, small groups with positive mesomeric effects, such as the methoxy-, fluoro- and chloro-groups, also occurred together in pairs relatively often (cf. connectivity diagram of chemical groups, Supplementary Fig. 4b in the ESI). Similarly, by analysing the least common combinations, we confirmed that there were only few structures among the SF-PV candidates that featured any two of the electron-withdrawing cyano- or bulky thiophenyl-groups simultaneously.

Focusing on just positional arguments in these combinations, the symmetry-equivalent positions R₄ and R₇ displayed the strongest overall connection and had the highest summed density of connections to any other node (cf. Supplementary Fig. 4a in the ESI). Conversely, R₃ and R₈ displayed the lowest density of connections, and the weakest connection was found between R₃ and R₁. Likewise focusing only on the chemical groups regardless of their position, mostly pairs of hydrogen were observed, while there were almost no combinations containing pairs of any of the larger phenyl- and thiophenyl-substituents, or the strongly electron withdrawing cyano-group (cf. Supplementary Fig. 4b in the ESI).

In summary, these considerations indicate that the R₄ and R₇ positions can be expected to play a significant role in potential design principles for SF-PV in 2,2′-diethenyl cibalackrot. Further, it is implied that such a design principle favours small, electron donating groups at these positions (methoxy- and hydrogen groups in 42.9% of cases, cf. Fig. 1c). However, likely due to the non-local nature of ππ^* excitations, note that these positions are also strongly connected to all remaining positions in the molecule (cf. Supplementary Fig. 4a). Hence, a clear physicochemical origin may be difficult to pinpoint without additional data. Take note, that a full ‘top-down search’ for any potential origin property always requires that any such probed origin property is available for all structures of the database (like in the case for the repulsion of ketone groups above). To work around this problem, we therefore instead choose a ‘bottom-up approach’, in which we first train a classification model to identify SF-PV structures from the available information and then search for correlations in the physicochemical properties of a smaller group of representative molecules chosen by the classifier logic.

Random forest classification

To identify generally applicable structural design principles, a random forest classifier was trained using the predicted excitation energies of the database. The features were based exclusively on the chemical groups and their positions, with the aim of translating the energetic region of interest directly into structural design rules relevant to synthesis. A detailed description of the construction of the three groups of features can be found in the Methods section. For training and testing purposes, two types of datasets were created from the full database of predictions. Both types contained all 393,841 SF-PV candidate structures and an equally large amount of non-SF-PV molecules. In the first type, all the non-SF-PV structures were taken from one of the energetic quadrants (cf. Fig. 1c), and in the second type, those structures were taken randomly from anywhere. In the following discussion, the former is called a balanced quadrant set, and the latter is called a balanced random set.

The random forest model was found to be most suitable for classification, when it was trained using the upper-right balanced quadrant set (i.e. energetically close to the SF-PV distribution itself). Using this training method, an accuracy of 80.3% was achieved when performing classification on a random balanced test set. From the confusion matrix of this test set, equally high precision and sensitivity were found (cf. Supplementary Fig. 6 in the ESI). When the balanced random set was applied for training, the resulting random forest model achieved a slightly better accuracy of 82.3%. However, in this case, the precision and sensitivity of the model was unfavourable for the desired application, because there were far more false-positive classifications. In particular, because the actual chemical space is highly imbalanced (91.2% non-SF-PV and 8.8% SF-PV entities), a high sensitivity is desirable. Finally, when the model was trained in any of the other quadrants, the performance in the respective quadrant was highest individually, but it failed to generalise towards any other quadrant. The largest discrepancy was observed when training was conducted using the lower-left balanced quadrant set (testing accuracy of 97.7%), and the resulting model was tested using the upper-right balanced quadrant set (testing accuracy 55.2%). The training statistics for all quadrants can be found in Supplementary Table 3 of the ESI.

From a chemical informatics perspective, these findings can be explained by the fact that the different energetic quadrants do not share enough similarities to learn anything other than the difference between the quadrants themselves. In the previous extreme case, the model practically learnt whether molecules were part of the bottom-left quadrant or not by judging the presence or absence of cyano-groups. However, this approach to classification cannot be used to make decisions about the upper-right quadrant, where few cyano-groups are featured anyway (see chemical composition of energetic quadrants, Supplementary Fig. 2 in the ESI).

To understand the meaning behind the learned design principles, the importance of the features can be interpreted together with the distribution of the observed chemical residues. Figure 5b is a graphical representation of the importance of all the features. The brightness of individual nodes Grp_Pos reflects how important the singular combination of group and position is for classification. The opacity of the lines between two opposing nodes gives the importance of a positional either–or presence ${{{{\rm{Grp}}}}}_{{{{{\rm{PoS}}}}}_{1},{{{{\rm{PoS}}}}}_{2}}$ (i.e. whether the group is found at either or both of the two positions). Finally, the intensity of the differently coloured edges around the nodes represents the importance of typical, common, or uncommon groups of chemical residues for a specific position (blue, green and red, respectively). A list of the actual importance values can be found in Supplementary Table 2 in the ESI, and a more detailed explanation of the features is given in the ‘Methods’ section.

The most important feature for classification was the presence (or absence) of a thiophenyl group at position R₁₀ (i.e. unc₁₀ in 5b), followed by the inversion-symmetric Boolean of whether or not there was a thiophenyl substituent at position R₄ or R₇ (i.e. Thi_4,7). Furthermore, the chemical distribution of the SF-PV molecules was devoid of thiophenyl across all these positions (cf. Fig. 5a). Comparing in this fashion what features are important for classification and what groups were present in the SF-PV molecules, the absence of Thi_4,7,10 can be considered as a generalisable design rule for the specific set of molecules and all 2,2′-diethenyl cibalackrot derivatives. By going through the list of importances in this way, it can be seen that similar clear generalisations include the absence of Ph_3,8 and the presence of typ₇ (i.e., hydrogen or methoxy substituents at position R₇).

In summary, bulky substituents like the phenyl- and thiophenyl-groups are generally confirmed to be unfavourable close to the central positions, whereas the small electron-donating methoxy-group is favoured on the R₄ and R₇ positions. These findings are consistent with the actual distribution of groups in the energetic region, and were identified as global classification rules to distinguish from the non-SF-PV molecules. Further, note that because this random forest model does not possess chemical intuition (e.g. concepts of electron donating/withdrawing or bulky groups) the design principles learnt by the model can be understood as purely data-driven and free of human bias except for the nature of features that the model learned from.

Transfer towards wavefunction-based design strategies

When using the trained random forest model to predict classes for all 4,463,586 PM6-converged molecules, each structure was associated with its own class probability for each SF-PV and non-SF-PV. Interpreting this probability as a measure of how representative these structures were for their individual group and thus to which extent they represent the related design rules, the most confident 500 SF-PV and 500 non-SF-PV structures were selected for an a posteriori in-depth analysis of their wavefunction characteristics. This way, we attempt to transfer the structural design rules of the random forest classifier towards more general electronic structure characteristics that might be applicable also for other core structures and substituents.

For all selected structures, the SF criterion energy E(SF) = E(S₁) − 2E(T₁) was calculated from the respective TDDFT and Δ-SCF results after performing a B3LYP structure optimisation, as for the training data. Inspired by the work of Zeng et al.²⁷, we extracted several different wavefunction-related features from the individual rings A–C and A’–C’ and from the α-carbon atoms (i.e., the 2,2′ positions which connect to the ethenyl moieties) and investigated these local properties for correlations with singlet fission. To ensure the analysis was insensitive to inversion symmetry on the artificially random cases of either image or inverted image, all features were summed up inside every pair of inversion-symmetric regions of the core scaffold (i.e. rings A+A’, B+B’ and C+C’ and the 2+2’ carbon atoms that connect to the ethenyl groups). Note that features were only extracted on the original cibalackrot core to improve transferability (i.e. including the ketone oxygen atoms, but excluding the 2,2′-diethenyl substituents).

Besides these local properties, a connection towards the biradical character y₀ of each structure was investigated, since this property had been reported to show correlations with singlet fission in earlier works as well²⁷. Here, we evaluated y₀ as the occupation number of the lowest unoccupied natural orbital (LUNO) from a broken-symmetry unrestricted Hartree–Fock (BS-UHF) calculation²⁵.

A consistent gradient in the SF criterion energy can be observed with respect to the sum of ground state Mulliken charges and spin densities in specific regions of the core structure (cf. Fig. 6). As shown in the upper row of graphs, charge neutral 2+2’-carbon atoms tended to fulfil the SF condition, with a further separation becoming apparent when the charge in other core regions (A+A’, B+B’ and C+C’) was considered at the same time. Similarly, a separation into clusters was obtained when considering the spin densities of the C+C’ region against the other core regions. Here, low spin densities in both the C+C’ region and other regions are desirable.

**Fig. 6: Distributions of Mulliken charges and spin densities colour coded by the SF criterion energy.**

We closer examined these local features by considering their correlations with respect to the individual E(S₁) and E(T₁) excitation energies and found that more charge neutral 2+2’-carbon atoms led to lower triplet energies, while the singlet energies remained mostly unaffected. A similar independence for excited singlet energies was observed for the sum of spins in the C+C’ rings. These observations show that these features may become useful for tuning the energetics in favour of singlet fission and generalise our previously suspected design concept of energetically asymmetric shifts towards the other substituent positions as well.

It is assumed that the independence of E(S₁) with respect to the charge on the 2,2′-carbon atoms can be explained by the electro-neutrality of the 2,2′-carbon atoms and their attached ethenyl substituents. Considering that the first singlet excitation is of global ππ^* nature, the charge of two singular carbon atoms inside the conjugated network will be efficiently re-distributed among the rest of the core π-network upon excitation anyway, regardless of their initial charge on those specific atoms - thus being of little importance to the overall E(S₁) excitation energy. However, since the electronic (spin-)density in the triplet state does not have the same global extent, but is instead accumulated strongly on the C,C’-rings and the two ethenyl substituents (cf. Fig. 3a, b), the observed negative correlation can be related to a situation where more negative local charge on the 2,2′-carbon atoms could lead to a higher energy requirement for E(T₁).

As for the second correlated feature, higher spin densities forming in any part of the molecule can be practically associated with an equally larger excitation energy for their formation, explaining the origin of the positive correlation between density and E(T₁). This is indirectly confirmed by noting that any combination of axes (i.e. plotting A+A’ against B+B’ instead) would result in an equally correlated picture. The reason why there is no correlation with E(S₁), however, is likely connected to the same asymmetric red-shift of E(T₁) versus E(S₁) as observed for the example study earlier (cf. Fig. 3c–f). While both excitation energies change with the structure in a similar way due to the practically identical hole and particle wavefunctions, only the triplet energy will additionally benefit from spin-related effects.

Finally, the evaluation of the biradical characters y₀ of 921 molecules that converged to a broken-symmetry solution implies that a third correlation with respect to the singlet fission energy criterion E(SF) is contained in the structural design principles of the random forest classifier (cf. Fig. 7). Seeing how radical stabilisation was found to be beneficial for SF for substituents at the 2,2′-positions of cibalackrot²⁷, we can thus conclude that also other positions may be used to improve the radical stabilisation further. Take note though, that even if correlations exist for the biradical value y₀ inside the chosen samples it is not a straightforward task to identify what groups and positions exactly caused which shift, due to the non-locality of the excited state wavefunction that was briefly mentioned at the end of the first half of the results section.

**Fig. 7: Correlation diagram between the singlet fission energy criterion E(SF) and the biradical value y₀.**

In summary, the sampled designs follow an asymmetric shift of the T₁ versus S₁ states’ excitation energies. More specifically, this asymmetric shift is found to be correlated to local changes in the spin density of the C,C’-ring regions and the charges on the 2,2′-carbon atoms. Both findings imply that the purely data-driven classification model has learned a design strategy, where local changes in the electronic structure close to the central part of the molecule (i.e. in the region where most of the spin density is accumulated) play a key role for designing singlet fission molecules. The design principles of the model are further backed up by a simultaneous increase in the radical stabilisation through the biradical character (i.e. y₀-value). Overall, this means that our envisioned three-step procedure was successful in learning structural design rules that could be transferred into relevant underlying wavefunction-related properties.

Discussion

Machine learning has become an invaluable asset when designing materials. On one hand, it can be used to predict key properties of molecules in an efficient way to construct databases of millions of molecules in a relatively short amount of time. This makes it possible to screen for candidate molecules in silico and identify those that satisfy the desired criteria, such as the energetic condition for SF^35,36. On the other hand, by analysing large databases, it is possible to search for correlations with specific properties to help identify potential chromophores and broaden our physicochemical understanding^25,37. In this work, these directions were rigorously combined to search for potential SF structures from a generated database, investigate the common design principles inside their molecular structure and finally translate them into more basic properties, that is, their wavefunction and electronic structure.

For this purpose, we first trained a kernel ridge regression (KRR) model to predict the first excited singlet and triplet excitation energies of the 4,573,800 chemical derivatives of 2,2′-diethenyl cibalackrot in TDDFT quality (MAEs of 0.025 and 0.026 eV, respectively) based on features obtained from semi-empirical PM6 calculations. Using this database of excitation energies, the suitability of each candidate structure for SF-PV was assessed based on two factors: (a) whether the predicted singlet excitation energy was within the range of twice the predicted triplet excitation energy (i.e. ∣E(S₁) − 2E(T₁)∣ < 0.03 eV allowing only for slightly exoergic candidates); and (b), whether the predicted triplet excitation energy E(T₁) was larger than the band gap energy of silicon (1.12 eV²⁹) to avoid voltage penalties in hypothetical devices²¹. We found that 30.15% of the 2,2′-diethenyl subspace fulfilled the SF condition (a), and 8.82% of the overall chemical space also fulfilled the photovoltaic condition (b). The comparably high percentage of SF candidates was attributed specifically to the double ethenyl functionalisation of the core structure, which greatly (and almost exclusively) decreased the triplet excitation energy compared to the unfunctionalised cibalackrot. This observation is in line with previous studies on the 2,2′-substituent position and was previously attributed to the radical stabilising effect of the ethenyl substituent²⁷. A collection of ten SF-PV structures that were identified within our predictions along with their a posteriori confirmed TDDFT/Δ-SCF excitation energies can be found in Fig. 8. Note that these structures were specifically selected from only symmetric molecules to increase their potential for synthesis with varying degrees of functionalisation as a preview of the diversity of the database. Further, since cibalackrot is a well-known indigoid vat dye with more than 100 years of history and former commercial use^38,39, its modifications may be equally promising and available for applications.

**Fig. 8: Selection of ten symmetric SF-PV candidate structures.**

To extract which chemical modifications were beneficial for singlet fission photovoltaics and to generalise towards wavefunction-based properties, a three-step extraction procedure was performed on the predicted database. In the first step, the distribution of chemical groups at different substituent positions in the set of predicted SF-PV candidate molecules was screened. Here, it was found that electron withdrawing groups (cyano- most specifically) were less common in the predicted SF-PV designs at practically any position, whereas electron donating groups (methoxy- and hydrogen substituents at the central positions, most specifically) were overall more often present (cf. Fig. 9a). However, even if these observations from inside the PV-SF set are understood as characteristic for the subspace itself, we would like to raise awareness that they remain difficult to generalise towards global design rules for the full 2,2′-diethenyl cibalackrot space. For example, while it was found that roughly 280,000 out of 400,000 SF-PV structures featured an electron donating group at position R₄, another 2,250,000 of non-SF-PV structures fall into this same category. Further, due to the highly non-local nature of the electronic structure, formulating design recommendations for several positions will become increasingly difficult, since making the decision of actually placing a specific group will lead to a changed situation for the remaining positions (example shown in Fig. 9b).

**Fig. 9: Structural occurrences of electron withdrawing and donating groups.**

To account for this problem, the second step of the procedure is aimed at extracting design principles that are both applicable to SF-PV molecules while excluding non-SF-PV structures. For this purpose, a random forest classification approach was chosen to treat the non-local nature of the problem. The model achieved an overall classification accuracy of 80.3% with equally high sensitivity and precision when classifying a balanced test set of SF-PV molecules and otherwise random non-SF-PV structures. By analysing the importance values of the classification features (Fig. 5b) together with the chemical distributions of the previous analysis step (Fig. 5a), we further deduced that bulky substituents such as phenyl- and thiophene are disfavoured especially for positions R₁ and R₁₀ which are close to the ketone oxygen and confirm that cyano-groups are not only rare in SF-PV molecules, but also carry significant exclusivity towards the non-SF-PV structures. While we are aware that this is a somewhat unusual perspective, note that we consider this combined analysis of which groups were typically present (or not) together with which factors were important in the classifier to be our most compact and concise attempt at capturing all possible structural recommendations in one image, while both paying the necessary respect to the non-locality of the problem and balancing the aforementioned generality and exclusivity that is necessary for formulating a valid recommendation.

In the last step of the analysis we transfer the structural logic learnt by the random forest model towards wavefunction-based properties that are related with SF-PV. This was done by analysing the most likely 500 SF-PV molecules and 500 non-SF-PV molecules according to the model and subsequently perform (TD)DFT, UDFT and broken-symmetry UHF calculations. Relating the SF criterion energy E(SF) = E(S₁) − 2E(T₁) to different wavefunction-related quantities (cf. Figs. 6 and 7), three meaningful correlated features were found. These are the sum of the ground state charges at the 2,2’-carbon atoms, the sum of T₁ spin density inside the C,C’-rings and the occupation numbers of the LUNO of the broken-symmetry UHF solution (i.e. the y₀-value as used in Minami et al.²⁵). Note that these are identical or at least in part related to previously reported correlations for singlet fission in a chemical space of cibalackrot that focused on 2,2′-derivates²⁷. Thus, our results not only serve as a proof of concept for the purely data-driven extraction method of design principles for future properties and chemical spaces, but further generalise design rules known to be beneficial for singlet fission towards a much larger chemical space when following the design rules provided by the random forest model.

Methods

Generation of chemical space

For the generation of cibalackrot derivatives, the core structure was decorated with eight different substituent groups (cf. Fig. 2a, b) that are commonly used in SF studies^27,36,40. The chloro- and fluoro-groups were chosen to represent small substituents with mesomeric electron-donating and inductive withdrawing effects, the cyano-group was used to represent a medium-sized electron-withdrawing substituent, and the methoxy- and ethenyl-groups were considered as electron-donating medium-sized groups. For eventual tuning of the crystal structure packing^41,42, the bulky phenyl-group was added. Finally, the 5-methylthiophenyl substituent (in the following just referred to as thiophenyl) was introduced. For simplicity of comparison with their study on aromaticity, we adopted the naming convention for 5- and 6-membered rings used by Zeng et al.²⁷.

We deemed it difficult to attach large numbers of phenyl- and thiophenyl-groups simultaneously from steric hindrance and potential synthetic perspectives, so we limited the number of such groups on either side of the half-structures (R₁–R₅ and R₆–R₁₀, respectively) to one at a time. Following these rules for chemical space generation and considering the inversion symmetry of the core scaffold, the full catalogue of structures included 215,001,216 molecules.

To identify potential SF candidate molecules from this pool of possibilities, a pre-screening was performed to find a chemical subspace in which to carry out the machine learning study (cf. ‘Results’). For this purpose, a set of 2000 structures was randomly selected from the full space, and it was investigated using quantum chemistry calculations. After this initial screening, the chemical space was narrowed down to 4,573,800 molecules, in which the central residues (R₅ and R₆) were fixed to ethenyl-groups. From this reduced space, a set of 7500 molecules was randomly selected to serve as the training set for machine learning.

Utilising the inversion symmetry of the core scaffold, initial guess structures for the calculations were efficiently generated from combinations of the half-structures (indicated by the dashed line in Fig 2a). The half-structures themselves were constructed using the Schrodinger Materials Science Suite⁴³. Note that, for the initial guess, no structural optimisation other than that provided by the Schrodinger Suite was performed⁴³. In some instances when the substituents were too close or overlapping, the guess structures were adjusted prior to any quantum chemistry calculations by rotating the substituents slightly apart.

Quantum chemistry calculations

To generate the excitation energies of the chemical space, a machine learning model was developed. For this purpose both labels (i.e. answer data E(S₁) and E(T₁)) and features (i.e. describing data) for the excitation energies of the first singlet and triplet states, had to be obtained for a set of molecules.

The label excitation energies of the 7500 selected training set structures were obtained with the following quantum chemistry protocol. After pre-optimisation of the guess structures using the semi-empirical PM6 method⁴⁴, the remaining 7214 converged structures were used in restricted density functional theory (RDFT)^45,46 calculations for final structure optimisation. The excited singlet energies E(S₁) were calculated using TDDFT^47,48 for the RDFT optimised ground state structures. Previous benchmark studies reported that triplet energies from TDDFT calculations tend to be unreliable for the study of SF materials, so the Δ-SCF method was used to calculate the E(T₁) energy labels instead^32,36.

All quantum chemistry calculations were conducted using the Gaussian programme package⁴⁹ and utilised the B3LYP functional in the respective restricted or unrestricted version^50,51,52,53. Similar to other studies³⁶, we found that combinations with the 6-31G(d’) basis set^54,55 had a favourable balance between accuracy (compared to available data from other studies) and computation time for several available standard methods. It should be mentioned that, although we expected the reported combination of the M06-2X functional and the 6-311+G(d,p) basis set to give a more accurate image²⁷, the trade-off with respect to the higher calculation time was considered unfavourable for the high-throughput nature of our study. Calculation of the diracdical character y₀ were performed by running BS-UHF calculations at the B3LYP optimised ground state structures at the higher 6-311+G(d,p) basis⁵⁶.

The features required to train the machine learning model were obtained from the semi-empirical PM6 structure optimisation calculations (which also served as pre-optimisation for the training set). On average, one of these calculations took less than 40 CPU seconds per structure, so they presented a balanced way of obtaining useful electronic structure information and structural features for the 4,573,800 molecules considered. In the following discussion, the generation of the features will be considered together with the central aspects of the machine learning model. For a general overview, Westermayr et al. recently compiled a list of state-of-the-art machine learning models for excited state properties⁵⁷.

General machine learning model specifications

The machine learning technique applied in this work was KRR,⁵⁸ as implemented in the scikit-learn Python package⁵⁹. This is a non-linear method where a sum of products is used to predict a desired property (here, energy E^pred) connected to a molecule $M^{\prime}$ with respect to a training set M with known reference energies. That is,

$${E}^{{{{\rm{pred}}}}}(M^{\prime} ,{{{\bf{M}}}})=\mathop{\sum}\limits_{i}{\alpha }_{i}f(M^{\prime} ,{M}_{i}).$$

(1)

Here, α_i are the coefficients to be trained, and the function $f(M^{\prime} ,{M}_{i})$ determines how similar molecule $M^{\prime}$ is to the i-th molecule of the training set M_i. Training of the coefficients is performed by minimising the expression

$${\min }_{\alpha }\mathop{\sum}\limits_{j}{({E}^{{{{\rm{pred}}}}}({M}_{j})-{E}_{j}^{{{{\rm{lab}}}}})}^{2}+\lambda \mathop{\sum}\limits_{j}{\alpha }_{j}^{2},$$

(2)

for the training subset. Here the superscript ‘lab’ indicates the label values for the respective excitation energies E. To increase the generality, a regularisation term with respect to the norm of the coefficients was added. The corresponding prefactor λ is a hyperparameter of the model itself, and it does not change during training; instead, it needs to be optimised separately via cross-validation. Through kernelisation, the minimisation problem can be expressed as the matrix inversion problem

$${{{\boldsymbol{\alpha }}}}={({{{\bf{K}}}}+\lambda {{{\bf{I}}}})}^{-1}{{{{\bf{E}}}}}^{{{{\rm{ref}}}}},$$

(3)

in which I is the identity matrix and K is the so-called kernel matrix. Individual elements of the kernel matrix are obtained as K_ij = f(M_i, M_j). To design a machine learning model suitable for predicting excitation energies, different descriptive expressions for f correlated to the desired quantity must be defined.

In this work, f took the form of products of Gaussians with respect to the abstract distance measures between two molecules d_g(M_i, M_j),

$$f({M}_{i},{M}_{j})=\exp \left(-\left(\mathop{\sum}\limits_{g}\frac{{d}_{g}{({M}_{i},{M}_{j})}^{2}}{2{\sigma }_{g}^{2}}\right)\right).$$

(4)

Here, the width of each respective Gaussian σ_g is a hyperparameter of the machine learning model. Note that in practice, however, we made an initial guess for the individual σ_g based on the data distribution in histograms of the training set’s distances d_g, and then scaled all σ_g according to a single, collective hyperparameter σ. Therefore the model only uses two hyperparameters that needed optimisation through cross-validation.

Depending on the functional choice of the distance measures d_g, the machine learning model will be more or less capable of relating the features of a set of molecules to their respective labels. To prevent the model from learning the training set without being able to generalise to unknown structures (i.e., to prevent overfitting), we performed stratified 5-fold cross-validation for the selection of hyperparameters and benchmarked models that were trained with differently split compositions of 80% training and 20% testing sets.

Overall, the final models use a total of 54 features belonging to fifteen types of distance measure descriptors d_g, the detailed descriptions of which can be found in the ESI. Note that, even though the features were designed to support machine learning correlations, not all of them necessarily reflected physico- and quantum chemical behaviours. Further, the landscape of machine learning assisted chemical design studies is growing rapidly, and by the nature of the task, the developed features are often custom-made solutions for a given problem, rather than universally applicable quantities whose true origin can be easily tracked. Therefore, although we reference special cases where we know of an earlier application or definition of a feature^60,61, the list of references is almost certainly incomplete.

Finally, take note that in it’s current form, the models for energy prediction designed here are not transferable to other chemical spaces in their current form, since they were only developed as a tool to predict the excitation energies of cibalackrot-type molecules.

Random forest classifier

To determine a mapping between the predicted excitation energies and chemical structures, a random forest classifier^62,63, as implemented in the sci-kit learn Python programme package⁵⁹, was trained to distinguish SF-PV structures from non-SF-PV structures only using information regarding the chemical substituents and their positions. In total, 120 features were used for training, which were subdivided into three groups of increasing generality as follows.

In the first group of features, each possible combination of the eight chemical substituents at the eight possible functionalisation sites R₁–R₄ and R₇–R₁₀ was evaluated. This resulted in 8² = 64 different Boolean features (e.g., checking for specific modifications such as Ph₁). Similarly, a second group was constructed such that any of the pairwise inversion-symmetry equivalent positions counted for the determination of the Boolean. In other words, for all eight chemical groups, the four pairs of positions were evaluated in the same way as for group one (e.g., specific modification Ph_1,10 at R₁ or R₁₀). This treatment resulted in 4 ⋅ 8 = 32 features for the second group.

Finally, a third group of features was generated based on the predicted chemical group distribution of SF-PV molecules (cf. Fig. 1c). For this purpose, the substituent sites of each molecule were evaluated based on whether they were carrying a substituent that was typical, common, or uncommon in SF-PV molecules. The three cases were distinguished as follows.

In the predicted distribution of chemical groups in SF-PV molecules (cf. Fig. 1c), for each position, a specific portion of molecules was found to carry a chemical group at given pairs of chemical inversion-symmetry equivalent sites, expressed as p^Pos(Grp). For each site, the mean percentage ${\tilde{p}}^{{{{\rm{Pos}}}}}$ and standard deviation σ^Pos was calculated. The portion of phenyl- and thiophenyl groups was different owing to the choice of structure generation rules per se; therefore, to equalise the picture somewhat, the portion of phenyl- and thiophenyl groups was doubled for the purpose of p^Pos(Grp). For each position, the distinction between typical, common and uncommon groups was then performed according to:

$${{{\rm{Type}}}}\,\,{{{\rm{of}}}}\,\,{{{{\rm{Grp}}}}}_{{{{\rm{Pos}}}}}=\left\{\begin{array}{l}{{{\rm{typical}}}}\,\,\,\,{{{\rm{for}}}}\,\,{p}^{{{{\rm{Pos}}}}}({{{\rm{Grp}}}}) \,>\, {\tilde{p}}^{{{{\rm{Pos}}}}}+{\sigma }_{p}^{{{{\rm{Pos}}}}}/2,\quad \\ {{{\rm{uncommon}}}}\,\,\,\,{{{\rm{for}}}}\,\,{p}^{{{{\rm{Pos}}}}}({{{\rm{Grp}}}}) \,<\, {\tilde{p}}^{{{{\rm{Pos}}}}}-{\sigma }_{p}^{{{{\rm{Pos}}}}}/2,\quad \\ {{{\rm{common}}}}\,\,\,\,\,{{\mbox{for all remaining cases.}}}\,\quad \end{array}\right.$$

(5)

Following this scheme, for each position of each molecule, one of three cases was assigned. These three cases were subsequently one-hots encoded to obtain three Boolean features for each of the eight positions, giving a total of 24 features for this last group.

To check the generality of the resulting random forest classifier, all structures were randomly subjugated to an inversion operation before feature generation. The model used 300 different decision trees with a maximum depth of 20 distinguishing steps. It was confirmed that for different randomisation seeds, training-testing splits and slightly changed model size and depth parameters, the arising classification rules were approximately identical, within reasonable margins. The training itself was conducted with a balanced data subset of 50% SF-PV and 50% non-SF-PV molecules. From the full chemical space, different subsets were constructed. First, a balanced set with non-SF-PV structures taken from all energetic regions of the E(S₁)/E(T₁) distribution was considered. Second, non-SF-PV structures were taken from specific energetic quadrants with respect to the mean energies $\tilde{E}({{{{\rm{S}}}}}_{{{{\rm{1}}}}})$ and $\tilde{E}({{{{\rm{T}}}}}_{{{{\rm{1}}}}})$ (cf. grey lines on left of Fig. 1c). From each of the quadrants, one additional balanced set was generated. Training was conducted using each of the subsets, which resulted in different classification rules each time. To test the quality of the resulting models, each dataset was split 80:20 into training and testing sets. The quality of each model was determined using the test set of the balanced random set. The best performing model was considered based on accuracy and sensitivity with an emphasis on the latter, since in the actual chemical space much more non-SF-PV structures are expected. It should be noted that although all the datasets used the same SF-PV molecules before the training/testing split, the non-SF-PV entries of each quadrant may or may not have randomly overlapped with the balanced random set.

Once the model was chosen, the probability of each classification, ’SF-PV’ or ’non-SF-PV’, was calculated for all 4,463,586 molecules. From these molecules, the ones with the highest probabilities for SF-PV and non-SF-PV were treated as the most confident; i.e., most congruent with the learned classifier. Consequently, these confident cases were expected to yield the best separation in subsequent investigations of the quantum chemical properties, as performed in the results section.

Data availability

Owing to the size of the database, it is not practical to upload the full contents. Instead, the PM6 optimised structures alongside their predicted energies are available upon reasonable request from the authors.

Code availability

Template Gaussian inputs that were used here, as well as a working sample for the machine learning model predictions, are available upon reasonable request from the authors.

References

Ripple, W. J. et al. World scientists’ warning of a climate emergency 2021. Bioscience 71, 894–898 (2021).
Article Google Scholar
BP Statistical Review of World Energy 2021, https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/downloads.html (Accessed 22 October 2021).
World Energy Outlook 2021. International Energy Agency, https://doi.org/10.1787/14fcb638-en, (2021).
Nerini, F. F. et al. Mapping synergies and trade-offs between energy and the Sustainable Development Goals. Nat. Energy 3, 10–15 (2018).
Article Google Scholar
Green, M. A. Silicon photovoltaic modules: a brief history of the first 50 years. Prog. Photovolt. Res. Appl. 13, 447–455 (2005).
Article Google Scholar
Chawla, R., Singhal, P. & Garg, A. K. Photovoltaic review of all generations: environmental impact and its market potential. Trans. Electr. Electron. Mater. 21, 456 (2020).
Article Google Scholar
Conibeer, G. Third-generation photovoltaics. Mater. Today 10, 42 (2007).
Article CAS Google Scholar
Green, M. A. et al. Solar cell efficiency tables (Version 58). Prog. Photovolt. Res. Appl. 29, 657–667 (2021).
Article Google Scholar
Dambhare, M. V., Butey, B. & Moharil, S. V. Solar photovoltaic technology: a review of different types of solar cells and its future trends. J. Phys. Conf. Ser. 1913, 012053 (2021).
Article CAS Google Scholar
Shockley, W. & Queisser, H. J. Detailed balance limit of efficiency of p-n junction solar cells. J. Appl. Phys. 32, 510 (1961).
Article CAS Google Scholar
Hanna, M. C. & Nozik, A. J. Solar conversion efficiency of photovoltaic and photoelectrolysis cells with carrier multiplication absorbers. J. Appl. Phys. 100, 074510 (2006).
Article CAS Google Scholar
Singh, S., Jones, W. J., Siebrand, W., Stoicheff, B. P. & Schneider, W. G. Laser generation of excitons and fluorescence in anthracene crystals. J. Chem. Phys. 42, 330 (1965).
Article CAS Google Scholar
Xia, J. et al. Singlet fission: progress and prospects in solar cells. Adv. Mater. 29, 1601652 (2017).
Article CAS Google Scholar
Chan, W.-L. et al. The quantum coherent mechanism for singlet fission: experiment and theory. Acc. Chem. Res. 46, 1321–1329 (2013).
Article CAS Google Scholar
Mirjani, F., Renaud, N., Gorczak, N. & Grozema, F. C. Theoretical investigation of singlet fission in molecular dimers: the role of charge transfer states and quantum interference. J. Phys. Chem. C 118, 14192–14199 (2014).
Article CAS Google Scholar
Smith, M. B. & Michl, J. Singlet fission. Chem. Rev. 110, 6891–6936 (2010).
Article CAS Google Scholar
Seiler, H. et al. Nuclear dynamics of singlet exciton fission in pentacene single crystals. Sci. Adv. 7, eabg0869 (2021).
Article CAS Google Scholar
Abraham, V. & Mayhall, N. J. Revealing the contest between triplet-triplet exchange and triplet-triplet energy transfer coupling in correlated triplet pair states in singlet fission. J. Phys. Chem. Lett. 12, 10505–10514 (2021).
Article CAS Google Scholar
Sakai, H. et al. Multiexciton dynamics depending on intramolecular orientations in pentacene dimers: recombination and dissociation of correlated triplet pairs. J. Phys. Chem. Lett. 9, 2254–3360 (2018).
Article CAS Google Scholar
Einzinger, M. et al. Sensitisation of silicon by singlet exciton fission in tetracene. Nature 571, 90–94 (2019).
Article CAS Google Scholar
Daiber, B., van de Hoeven, K., Futscher, M. H. & Ehrler, B. Realistic Efficiency Limits for Singlet-Fission Silicon Solar Cells. ACS Energy Lett. 6, 2800–2808 (2021).
Article CAS Google Scholar
Zhang, Y.-D. et al. Excessive exoergicity reduces singlet exciton fission efficiency of heteroacenes in solutions. J. Am. Chem. Soc. 138, 6739–6745 (2016).
Article CAS Google Scholar
Zeng, T. et al. Identifying (BN)2 pyrenes as a new class of singlet fission chromophores: significance of azaborine substitution. J. Phys. Chem. Lett. 9, 2919–2927 (2018).
Article CAS Google Scholar
Kubo, T. Recent progress in quinoidal singlet biradical molecules. Chem. Lett. 44, 111–122 (2015).
Article CAS Google Scholar
Minami, T. & Nakano, M. Diradical character view of singlet fission. J. Phys. Chem. Lett. 3, 145–150 (2012).
Article CAS Google Scholar
El Bakouri, O., Smith, J. R. & Ottosson, H. Strategies for design of potential singlet fission chromophores utilizing a combination of ground-state and excited-state aromaticity rules. J. Am. Chem. Soc. 142, 5602–5617 (2020).
Article CAS Google Scholar
Zeng, W., Szczepanik, D. W., Bronstein, H. & Ottosson, H. Excited state character of Cibalackrot-type compounds interpreted in terms of Hückel-aromaticity: a rationale for singlet fission chromophore design. Chem. Sci. 12, 6159 (2021).
Article CAS Google Scholar
Fallon, K. J. et al. Exploiting excited-state aromaticity to design highly stable singlet fission materials. J. Am. Chem. Soc. 141, 13867–13876 (2019).
Article CAS Google Scholar
Klimm, D. Electronic materials with a wide band gap: recent developments. IUCrJ 1, 281–290 (2014).
Article CAS Google Scholar
Lu, C. et al. Deep learning for optoelectronic properties of organic semiconductors. J. Phys. Chem. C 124, 7048–7060 (2020).
Article CAS Google Scholar
Han, J., Rehn, D. R., Buckup, T. & Dreuw, A. Evaluation of single-reference DFT-based approaches for the calculation of spectroscopic signatures of excited states involved in singlet fission. J. Phys. Chem. A 124, 8446–8460 (2020).
Article CAS Google Scholar
Grotjahn, R., Maier, T. M., Michl, J. & Kaupp, M. Development of a TDDFT-based protocol with local hybrid functionals for the screening of potential singlet fission chromophores. J. Chem. Theory Comput. 13, 2984–4996 (2017).
Article CAS Google Scholar
Laurent, A. D. & Jacquemin, D. TD-DFT benchmarks: a review. Int. J. Quantum Chem. 113, 2019–2039 (2013).
Article CAS Google Scholar
Gramfort, A. et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013).
Article Google Scholar
Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).
Article CAS Google Scholar
Perkinson, C. F. et al. Discovery of blue singlet exciton fission molecules via a high-throughput virtual screening and experimental approach. J. Chem. Phys. 151, 121102 (2019).
Article CAS Google Scholar
Liu, X., Tom, R., Gao, S. & Marom, N. Assessing zethrene derivatives as singlet fission candidates based on multiple descriptors. J. Phys. Chem. C. 124, 26134–26143 (2020).
Article CAS Google Scholar
Seixas de Melo, J. et al. Photophysics of an Indigo Derivative (Keto and Leuco Structures) with Singular Properties. J. Phys. Chem. A. 110, 13653–13661 (2006).
Article CAS Google Scholar
Engi, G. Über neue Derivate des Indigos und anderer indigoider Farbstoffe. Z. Angew. Chem. 27, 144–148 (1914).
Article CAS Google Scholar
Shen, L. et al. Effects of aromatic substituents on the electronic structure and excited state energy levels of diketopyrrolopyrrole derivatives for singlet fission. Phys. Chem. Chem. Phys. 20, 22997 (2018).
Article CAS Google Scholar
Zaykov, A. et al. Singlet fission rate: optimized packing of a molecular pair. Ethylene as a model. J. Am. Chem. Soc. 141, 17729–17743 (2019).
Article CAS Google Scholar
Suarez, L. E. A., de Graaf, C. & Faraji, S. Influence of the crystal packing in singlet fission: one step beyond the gas phase approximation. Phys. Chem. Chem. Phys. 23, 14164 (2021).
Article Google Scholar
Schrödinger Release 2021-4: Maestro. (Schrödinger, LLC, New York, NY, 2021).
Stewart, J. J. P. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. J. Mol. Model. 13, 1173–1213 (2007).
Article CAS Google Scholar
Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B71 (1964).
Article Google Scholar
Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965).
Article Google Scholar
Bauernschmitt, R. & Ahlrichs, R. Treatment of electronic excitations within the adiabatic approximation of time dependent density functional theory. Chem. Phys. Lett. 256, 454–464 (1996).
Article CAS Google Scholar
Casida, M. E., Jamorski, C., Casida, K. C. & Salahub, D. R. Molecular excitation energies to high-lying bound states from time-dependent density-functional response theory: Characterization and correction of the time-dependent local density approximation ionization threshold. J. Chem. Phys. 108, 4439–4449 (1998).
Article CAS Google Scholar
Frisch, M. J. et al. Gaussian 16, Revision C.01. (Gaussian, Inc., Wallingford CT, 2016).
Becke, A. D. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648–5652 (1993).
Article CAS Google Scholar
Lee, C., Yang, W. & Parr, R. G. Development of the Colle–Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 37, 785–789 (1988).
Article CAS Google Scholar
Vosko, S. H., Wilk, L. & Nusair, M. Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis. Can. J. Phys. 58, 1200–1211 (1980).
Article CAS Google Scholar
Stephens, P. J., Devlin, F. J., Chabalowski, C. F. & Frisch, M. J. Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J. Phys. Chem. 98, 11623–11627 (1994).
Article CAS Google Scholar
Petersson, G. A. et al. A complete basis set model chemistry. I. The total energies of closed-shell atoms and hydrides of the first-row atoms. J. Chem. Phys. 89, 2193 (1988).
Article CAS Google Scholar
Petersson, G. A. & Al-Laham, M. A. A complete basis set model chemistry. II. Open-shell systems and the total energies of the first-row atoms. J. Chem. Phys. 94, 6081 (1991).
Article CAS Google Scholar
Krishnan, R., Binkley, J. S., Seeger, R. & Pople, J. A. Self-consistent molecular orbital methods. XX. A basis set for correlated wave functions. J. Chem. Phys. 72, 650 (1980).
Article CAS Google Scholar
Westermayr, J. & Marquetand, P. Machine learning for electronically excited states of molecules. Chem. Rev. 121, 9873–9926 (2021).
Article CAS Google Scholar
Murphy, K. P. Machine Learning: A Probabilistic Perspective. 492–493. (The MIT Press, Cambridge, MA, 2012).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Google Scholar
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Article CAS Google Scholar
Krygowski, T. M. Crystallographic studies of inter- and intramolecular interactions reflected in aromatic character of .pi.-electron systems. J. Chem. Inform. Comput. Sci. 33, 70–78 (1993).
Article CAS Google Scholar
Breiman, L. Arcing classifiers. Ann. Stat. 26, 801–824 (1998).
Article Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Dennington, R., Keith, T. A., Millam, J. M. GaussView Version 6.1. (Semichem Inc. Shawnee Mission, KS, 2016).
Hermann, G. et al. ORBKIT: a modular Python toolbox for cross-platform postprocessing of quantum chemical wavefunction data. J. Comput. Chem. 37, 1511–1520 (2016).
Article CAS Google Scholar
Pohl, V., Hermann, G. & Tremblay, J. C. An open-source framework for analyzing n-electron dynamics: I. Multideterminantal wave functions. J. Comput. Chem. 38, 1515–1527 (2017).
Article CAS Google Scholar
Pohl, V., Hermann, G. & Tremblay, J. C. An open-source framework for analyzing n-electron dynamics: II. Hybrid density functional theory/configuration interaction methodology. J. Comput. Chem. 38, 2378–2387 (2017).
Article CAS Google Scholar

Download references

Acknowledgements

This project was funded by the Japan Society for the Promotion of Science (P20703) and through Kakenhi (20F20703). Calculations were carried out at the Research Center for Computational Science (Okazaki, Japan). F.W. acknowledges fruitful discussions on machine learning with R. Suzuki and many thought provoking questions from the group members and students of HM during the regular seminars.

Author information

Authors and Affiliations

Department of Applied Chemistry, Chuo University, 112-8551 Tokyo-to, Bunkyo-ku, Kasuga, 1-13-27, Japan
Fabian Weber & Hirotoshi Mori
Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, Okazaki, 444-8585, Japan
Hirotoshi Mori

Authors

Fabian Weber
View author publications
You can also search for this author in PubMed Google Scholar
Hirotoshi Mori
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

The research outline and decision on a core structure of interest for this work were conceived to equal parts by both authors. Programming and execution of quantum chemistry related inputs, database archiving and machine learning codes were performed by F.W. The contents of inputs and codes (i.e. feature selection and generation method, label generation method, machine learning type for both energy predictions and classification) were decided on by both authors equally through regular discussion and seminars. The initial draft of the manuscript and its figures were composed by F.W. and refined to its current form, including the manuscript review process, by both authors.

Corresponding authors

Correspondence to Fabian Weber or Hirotoshi Mori.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Weber, F., Mori, H. Machine-learning assisted design principle search for singlet fission: an example study of cibalackrot. npj Comput Mater 8, 176 (2022). https://doi.org/10.1038/s41524-022-00860-1

Download citation

Received: 23 March 2022
Accepted: 29 July 2022
Published: 20 August 2022
DOI: https://doi.org/10.1038/s41524-022-00860-1