Comparing data driven and physics inspired models for hopping transport in organic field effect transistors

The past few decades have seen an uptick in the scope and range of device applications of organic semiconductors, such as organic field-effect transistors, organic photovoltaics and light-emitting diodes. Several researchers have studied electrical transport in these materials and proposed physical models to describe charge transport with different material parameters, with most disordered semiconductors exhibiting hopping transport. However, there exists a lack of a consensus among the different models to describe hopping transport accurately and uniformly. In this work, we first evaluate the efficacy of using a purely data-driven approach, i.e., symbolic regression, in unravelling the relationship between the measured field-effect mobility and the controllable inputs of temperature and gate voltage. While the regressor is able to capture the scaled mobility well with mean absolute error (MAE) ~ O(10–2), better than the traditionally used hopping transport model, it is unable to derive physically interpretable input–output relationships. We then examine a physics-inspired renormalization approach to describe the scaled mobility with respect to a scale-invariant reference temperature. We observe that the renormalization approach offers more generality and interpretability with a MAE of the ~ O(10–1), still better than the traditionally used hopping model, but less accurate as compared to the symbolic regression approach. Our work shows that physics-based approaches are powerful compared to purely data-driven modelling, providing an intuitive understanding of data with extrapolative ability.

www.nature.com/scientificreports/ understanding of the physics of transport and hence can be predictive in nature. In essence, this boils down to the dependence of charge hopping upon the thermodynamic temperature (T) and the modulation of additional carriers in the polymeric channel via the applied gate voltage (V g ) in the OFET configuration. Similar efforts at understanding physical principles in other materials and physical systems have been employed in recent times with simulated physics datasets. "AI Feynman" (AI stands for Artificial Intelligence) is a recursive symbolic regression algorithm, which connects neural networks with physics-inspired methods such as dimensionality analysis, generalized symmetries, multiplicative separability, and inversion to discover physical equations that explain the data 13,14 . The use of symbolic neural networks to learn the dynamics of data with a Partial Differential Equation (PDE-NET) fixes the functional form but is able to capture the underlying parameters 15 . Physics-informed Neural Networks (PINNs) respect the laws of physics while solving both differentiable (data-efficient approximators) and discrete datasets and have been used to solve problems in fluids 16 , quantum mechanics and more 17 . In this work, we investigate the efficacy of a powerful mathematical approach, namely symbolic regression (SR), in recapturing underlying physics after fitting transport data of three different OFETs based on organic molecules of p-type pentacene, and n-type buckminsterfullerene (C 60 ) as well as conjugated polymer poly(3-hexylthiophene-2,5-diyl) (P3HT). While the regressor can fit the experimental data after carefully tuning the hyperparameters, physical interpretability is not easy and guaranteed in all cases. The regressor does succeed in identifying the physically meaningful and essential terms to describe the transport. However, it is unable to decouple the effect of the activation energy modulated by varying gate voltage from the magnitude of the thermal energy window that is directly proportional to temperature. We see that the regressor treats the whole dataset as one entity, as opposed to a system of five curves following the same bi-parametric equation, where the data in each curve is generated by only varying the value of one of the parameters, while keeping the other parameter constant. The resulting equation fits the entire dataset at once instead of providing a solution that is repeatable curve-by-curve by varying the value of the second variable/parameter (gate voltage in this case). To overcome this missing physical insight, we compare this with a renormalization approach to describe the transport by defining dimensionless quantities from the temperature dependent mobility data. Derivation of dimensionless quantities captures both the effects of temperature as well as gate voltage embedded in a single parameter, rather than in multiple parameters as seen in the existing models 18 , and provides us with a one-shot approach in describing and characterising hopping transport in organic semiconductors. We demonstrate that this approach is powerful as it also helps to differentiate between processing conditions for the same molecules by comparing the extent of activation energy that is accessed over the range of the applied gate voltages. Figure 1 depicts various approaches for extracting input-output relationships involved in hopping transport phenomena.
As a benchmark, we use the model derived by Bassler et al. for the field-effect mobility (μ e ) using the effective medium approximation and the concept of effective transport energy 11,19 , while considering Gaussian density of states (DOS) and Miller-Abrahams jump rates for thermally activated Arrhenius-type hopping. Here, Figure 1. An overview of various approaches to extract the underlying input-output relationships from temperature-dependent field-effect mobility data. Here, the left panel consists of a representative temperaturedependent field-effect mobility data at different gate voltages. The middle panel captures the methods followed in each approach. The panel on the right shows the final solution form for each approach. For the Bassler Model, the unknowns are coloured, and the derived equation fits these values. In Symbolic Regression, inputs (X 0 and X 1 ), constants and mathematical operations are fed into the algorithm, while a solution tree produces a fitting function, Y. For the Renormalization Model, a generic fit by rescaling the inputs is obtained.  As illustrated in Fig. S3, σ indicates the broadening of the Gaussian DOS, a/b is the ratio of inter-site distance and localization radius. Further, n/N is the ratio of occupied and total number of transport states. T 0 is the isokinetic temperature at which all curves of varying E a converge; n is the number of occupied transport states, which can be tuned by the applied gate voltage (V g ) since it is directly proportional to V g . The total number of transport states, N, is a constant of the order of ~ 10 23 cm −3 , which relates to the lattice constant (a) as N = a −3 for an assumed cubic lattice.
In the equation above, the hopping process is affected by both spatial (a, b) and energetic terms ( E a ) . The spatial terms indicate the extent of hopping (given by a) and the localization radius between two sites (given by b), while the energetic terms specify the activation required for the phonon-assisted excitation, spatial tunnelling, and phonon emission-all together considered as one hopping event-to take place from one site to another. The tunnelling strongly depends on the extent of wave function overlap between neighbouring molecular orbitals. This results in an increase in mobility when the wave functions overlap is higher, which thereby decreases the activation energy for the hopping event 20,21 . The activation energy also relates to the structure and morphology of the semiconducting layer through molecular orientation and processing conditions.

Data description
The temperature-dependent mobility data for pentacene, P3HT, C 60 OFETs is obtained from literature [22][23][24] . While pentacene and P3HT were fabricated through solution processing methods on Si/SiO 2 substrates with Au contacts, C 60 film was fabricated through standard flush evaporation or hot-wall epitaxy 25 on ITO/divinyltetramethyldisiloxane-bis(benzocyclobutane) (BCB) substrates with LiF/Al top contacts. The pentacene film can be regarded as polycrystalline, while the P3HT film has ordered regions that alternate with disordered regions. The C 60 film is of greater crystallinity compared to both P3HT and pentacene films. The field-effect mobility (output) data recorded as a function of the change in two variables namely temperature and gate voltage (inputs), is shown in Fig. 2a for pentacene OFET and Fig. S1 for P3HT and C 60 OFETs. Typical of materials science dataset sizes, each OFET data is comprised of ~ 100 points, stacking input-output information from five temperaturedependent curves at five different gate voltages that were used to record the data (as seen in Fig. 2a). The activation energy can be regulated by varying the applied gate voltage, i.e., a larger gate voltage signifies that a smaller activation energy is needed.

Symbolic regression: an approach for data-driven learning of OFET behaviour
First, we introduce Symbolic Regression (SR) and discuss the fundamentals of genetic programming and how it is employed specifically in this work to extract the underlying input-output relationship in the form of comprehensive mathematical expressions 26,27 . Although other machine learning methods are useful in extracting input-output relations, they employ predetermined functional forms to fit the data (for instance, polynomial regression). On the contrary, SR attempts to solve such problems without requiring any prior knowledge of the functional forms, by utilizing a finite set of operators and functions (like arithmetic, trigonometric, exponential). Such a symbolic input-output mapping is beneficial as it offers interpretability in applied physics and material science, which is the motivation behind using this technique over other machine learning models [27][28][29] .
For characterizing hopping-like transport phenomena in OFETs, several models have been proposed to relate the field-effect mobility with inputs like temperature and back-gate voltage. However, these models differ in both the functional forms as well as the nature and number of the transport parameters used. In the present work, SR is applied on each OFET dataset where the input variables are defined as X 0 = E a and X 1 = k B T and the output Figure 2. (a) Temperature-dependent mobility data for pentacene OFET; (b) Simultaneous visualization of the fits obtained from symbolic regression (SR) (red symbol) and the Bassler model (blue symbol) with respect to the experimental data (black symbol) (data extracted from Ref. 19 ). prefactor . The activation energy E a for each V g curve is obtained from the slope of the respective curve by expressing the logarithmic mobility as a function of the reciprocal temperature. The thermal energy associated at temperature T is calculated as a product of the temperature T and the Boltzmann constant, k B . This choice of the two input variables, X 0 = E a and X 1 = k B T is so that both the variables are in the same units (eV) and hence guide the SR outcome expressions into a meaningful and interpretable form, since the range of values for both E a and k B T were similar. This is of special significance because a choice of E a ≫ k B T would mean that the hopping energy is too large, and transport does not take place. On the other hand, E a ≪ k B T would result in a transport mechanism that is different from hopping, akin to metallic conduction, which is unlikely in organic thin films. Further, the usage of dimensionless input variables obtained by normalising the existing X 0 and X 1 parameters with the thermal energy at room temperature, i.e., k B (300K) , was also explored and discarded as it did not help in accuracy or interpretability.
The output variable Y, denoting revised mobility, is a dimensionless quantity. The Bassler model expression can be written in log 10 scale as, where, µ e prefactor is the dimensionless revised mobility. The framework applied in this paper seeks to find out a mapping Y = f (X 0 , X 1 ) . We use the gplearn package in Python to perform SR 26 .
GP minimizes the Mean Absolute Error (MAE) between the experimental data and the predictions to produce symbolic expressions. In addition, upon evaluating the best values of SR parameters, specifically, the parsimony coefficient and the tournament size (details can be found in Supplementary Information) are determined using grid search, and the resulting trees render mathematical expressions for the three different OFETs as shown in Table 1. After inspecting the results from SR, we find that the outcome expressions fit the experimental data with an MAE of ~ 0.02-0.03 with a fit better than that offered by the Bassler model; a comparison of the results between the SR outcome and the Bassler Model, is shown in Fig. 2b.
In Table 1, SR outcome expressions capture the expected terms according to the Bassler model, i.e., E a and (E a /k B T), and include higher-order non-linear terms. Therefore, the coefficients attached to these terms, expectedly do not match the coefficients in Eq. (1) 27 . In summary, the SR gives us better MAE at a cost of lack of generality from a physical equation, which is what the Bassler Model offers. An innate limitation of the SR approach is limited accuracy with sparse data, which can be mitigated by using more densely populated data 13 .

Physics-inspired model through renormalization
In order to extract underlying physical equations, we next consider a simple yet robust physics-driven approach that makes use of fewer parameters to characterize the underlying transport phenomena in OFETs. In solid-state physics, the concept of renormalization theory is utilized to renormalize the inputs into dimensionless quantities in such a way that certain critical points are identified as scale-invariant, which captures the physics behind the observed data. This has been employed in the study of second-order magnetic phase transitions, for instance 30,31 .
We consider the same OFET data as used for the SR and the Bassler Model. We notice that the carrier mobility, the average velocity of the charge carrier to the electric field, of these disordered systems strongly depends on the carrier concentration, i.e., on applied gate bias ( µ ∝ V n g , where n is the exponent) 32 . The carrier mobility of these organic materials is expected to follow a similar trend with several parameters needed to describe its temperature and field-dependent behaviour. Typically, the carrier mobility of disordered systems can be represented with the help of a scaling law given by µ ∝ T γ e ( E k B T ) 33 . In this particular case, for each mobility curve at fixed gate bias (V g ), T p is the temperature corresponding to the maximum mobility (µ p ) and T r is the reference temperature corresponding to half of the maximum mobility (µ p /2). Due to the limitation of a small number of data points, T r is obtained by interpolation directly from the mobility-temperature plots. We choose T p as 300 K as the scale-invariant point (T p ) in this work for all the studied OFETs.
Upon extracting the T r values, we defined two dimensionless quantities µ ′ = µ µ p and θ ′ = T r −T p T−Tp for every data point in the dataset. μ′ always takes a value greater than 0 and reaches a maximum value of 1 at T = T p , i.e., 0 ≤ µ′ ≤ 1. We have obtained the universal plot using temperature-dependent mobility at different V g . The T−Tp curves collapse into a single curve for different V g , as seen for pentacene OFET data in Fig. 3a.
The obtained universal curve is well fitted by an exponential function µ ′ = aexp b θ ′ +c with a ∼ = 1.04, b ∼ = −0.65, and c ∼ = −0.13 being the fitting constants, respectively. With simple dimensionless normalization parameters we find a comprehensive universality, where all the mobility curves at different gate voltages collapse into a single curve, exhibiting a general form across different organic semiconducting materials of the same hopping mechanism, as evidenced from Fig. 3b.
Furthermore, the knowledge of the T r value enables us to calculate the value of the activation energy for that temperature-dependent mobility curve. The activation energy E a can be expressed in terms of T p and T r as E a = k B ln (2) T r T p T p −T r , by evaluating the Bassler model equation at T = T p and T = T r . Section S1 in the Supplementary Information gives a more detailed derivation of this relationship. Figure 4a shows the obtained/calculated range of activation energies, which are accessible to five different molecules by varying the gate voltage. In general,  The horizontal axes denote the semiconductor for which the activation energy E a is plotted on the vertical axes. For each organic system, the plots display the entire range of E a values, with the data points being indicated by solid symbols. The minimum and maximum E a values are marked by the topmost and bottommost data points, respectively. The corresponding mean and median E a values for each OFET are indicated by the unfilled square and horizontal line, respectively, within each box; (a) activation energies (E a ) accessed by each OFET as calculated from the range of gate voltages used in calculating the mobilitytemperature dependence for each molecule. The E a values have been calculated from the values of T r and T p for each curve and are shown for each OFET (filled spheres). The inset shows the semiconducting material whose E a is plotted; (b) variation in the range of E a accessed by each OFET fabricated using different processing techniques for the same semiconducting material. The inset represents the different processing conditions used as indicated by varying box patterns. www.nature.com/scientificreports/ at low gating (low energy state filling) charge carriers require more energy for the hopping to occur, and this activation energy decreases as the gating voltage is increased (more negative values). The strength of this approach, in comparison to SR and the Bassler Model described earlier, is that a general rule to fix T r as a function of V g , and the values of E a , are sufficient to capture the trends for five different organic systems. This, therefore, can be regarded as a generic fitting approach wherein the only descriptor required to convert the µ-T data to the µ′-θ′ data is the value of T r for each V g curve. Figure 4b shows the activation energies accessible to each OFET that were fabricated under different processing conditions. Our results indicate that solution-processed P3HT films with chloroform (CH) require higher activation energies on average than the films grown in trichlorobenzene (TCB) 34 . The mobility is much higher for the TCB films as they require lower activation energy for the hopping transport. This is in line with previous reports on the solution processing of P3HT, wherein the higher mobility of TCB films is attributed to the lower volatility and improved crystallization resulting from deposition by spin-coating 35 .
Similarly, for C 60 , the films grown through hot wall epitaxy at T sub = 250 °C show a lower band of activation energies accessed for almost similar values of gate voltage as compared to the films grown at T sub = 130 °C 24 . This is again consistent with the overall level of crystallization of the resultant films, as evidenced by the AFM images and the value of the broadening of the DOS in either case (75.5 meV at T sub = 130 °C and 53.5 meV at T sub = 250 °C). A narrow broadening or a sharper DOS is seen in the case of molecules that have better crystallite ordering. Thus, the epitaxially grown pentacene films 36 access a small range of activation energies and are less dispersed, as compared to the solution processed films 22 .

Discussion
We have shown previously that the activation energy E a can be expressed as E a = k B ln(2) T r T p T p −T r . This relation is invertible for all molecules as both the original relation and the universal relation express mobility as an exponential function of the negative of the reciprocal temperature. The original and reconstructed fits for pentacene are shown in Fig. 5.
This approach is not only useful for studying the effect of gate voltage on the activation energy, but also that of the processing conditions such as the choice of solvent, polymer weight, and physical growth temperature, to name a few. This approach could also be extended to other techniques of controlling the extent of energy state filling such as chemical (or vacuum de-doping). Also, we show that the choice of T r does not alter the nature of the function that fits the scaled data (only the coefficients a, b, c vary). We illustrate this in Section S2 of the SI with the help of temperature-dependent four-probe mobility data of a 2L-pentacene OFET 36 , where mobility at T = T r was taken as 70% of µ p . Another salient feature is that the general fit shown in Fig. 3b can also act as a screening technique to determine which OFETs exhibit thermally activated hopping transport, with the help of very few measurements at different temperatures. For example, with mobility measurements at two different temperatures under constant gate voltage, with one at T p and another below T p , the µ′ values of these points can be calculated as µ ′ = µ µ p and subsequently the corresponding θ′ values can be extracted from the plot (alternatively, the exponential equation) of the universal fit (Fig. 3b). From the θ′ values thus obtained, the T r value of the data point below T p can be determined. By virtue of the definition of the scaling relations, upon calculating µ′ and θ′ values at T = T r , we obtain values of 0.5 and 1 for µ′ and θ′, respectively, if the transport in the OFET occurs through thermally activated hopping. Hence, this not only serves as a screening/sanity check for the nature of charge transport but also helps in evaluating the activation energy at the applied gate voltage with the help of the T r value determined thus.
A summary of the different data-fitting approaches is available in Table 2. Figure 5. Comparison between the experimental data (blue symbols) and the reconstructed values from renormalization (red symbols) for pentacene OFET (data extracted from Ref. 19 ).

Conclusions
In this work, we use gate voltage and temperature-dependent OFET data from various small molecules/polymers and show that symbolic regression outperforms physics-inspired fitting techniques such as the Bassler model and the renormalization approach in fitting the OFET data, by one order of magnitude in the corresponding MAE values, although not offering generality achieved through physics-inspired fitting approaches. Through simple graphical value extraction, we demonstrate that the hopping mechanism in OFETs can be described by a general equation that relates normalized mobility with a scaled, dimensionless temperature descriptor. It is challenging to come up with a single model that accurately captures mobility characteristics in both low and high temperature regions; however, the SR outcome and physics-inspired model prove to be better than the baseline in this regard. Although powerful mathematical approaches like symbolic regression may offer reasonably good performance in terms of accuracy of fits, the physical interpretability is often incomplete and not consistent with the established know-how. We also proffer that a similar methodology can be extended to other transport mechanisms wherein the mobility is normalized with respect to the highest observed value and the temperature is then reduced to a dimensionless quantity that encompasses the dependency of the mobility on both the temperature and another experimental variables such as the gate effect or other processing conditions.