Background & Summary

Skin is the largest organ in the human body and functions primarily to protect the body from external factors. Due to the key role skin plays in safeguarding the body, understanding how chemicals penetrate has applications across multiple disciplines. Most notably, chemical penetration of human skin has significance with regard to determining the risk and toxicity of environmental and occupational hazards as well as the efficacy of topical pharmaceuticals. With the knowledge of skin absorption growing in importance for various fields, the need for a database comprised of chemicals and their dermal absorption parameters, such as permeability and diffusion coefficients also grows. In vivo experimentation often requires large investments of time and money and may involve ethical issues; for this reason In vivo experimentation is not always plausible. In silico tools require toxicokinetic datasets to be able to simulate a wide variety of chemicals. “High throughput toxicokinetics”, or httk1, and other models will benefit from having open access datasets with physiochemical parameters and diffusion descriptors. Total accumulation over time can be used to quantify dermal absorption parameters, such as permeability and as such, multi-linear regression techniques have also been applied to dermal permeability datasets to obtain QSAR (Quantitative Structural Activity Relationships) equations for different exposure scenarios2,3,4.

Earlier dermal absorption in vitro experiments separated the upper layers of skin (largely referred to as the whole epidermis) to quantify dermal absorption parameters. The assumption that the uppermost skin layers offered the highest resistance to absorption motivated the experimental choice for epidermis use5. The number of layers included in the in vitro experiments has varied over time. Since the highest barrier to dermal penetration has been thought to be in the upper layers (the stratum corneum and the viable epidermis), earlier in vitro experiments included only these layers. However, inclusion of a partial dermis has become common practice when using the data from dermatomed experiments. Unlike the epidermis, which includes a lipid barrier in the stratum corneum (typically modeled by a “brick and mortar” structure6,7), the dermis is an aqueous barrier that contains collagen and plasma proteins contributing to binding, and facilitates capillary transport8 For this work, data was compiled across the epidermis, stratum corneum, and dermis in order to create a unique and accessible database.

An earlier skin database (HuskinDB) has been published in Scientific Data, but it is limited to inclusion of permeability coefficients only3. Our work added diffusion and partition coefficients for each layer. The permeability coefficients (kp) can be related to the diffusion coefficients (D) and partition coefficients (P) using the ideal membrane equation when the layer depth (l) is known9: \({k}_{p}=\frac{P\ast D}{l}\). Further, our database also includes chemical descriptors for each identified chemical, adding the ability to explore QSAR models such as the Potts-Guy model.

Our created database contains publicly available experimental data that were collected from multiple sources. Experimental permeability and diffusion coefficient values10,11,12 along with chemical descriptors13,14,15 were collected and included in the database. In addition to compiling diffusion and permeability coefficients across three layers, another valuable aspect of this database is the focus on chemical features which includes those that are indicative of volatility such as melting point and vapor pressure. Volatility was not explored in the Potts-Guy Equation16, which is often cited when discussing skin permeability. Including these features is unique, as volatility is largely unexplored in regards to dermal absorption. A major application of the database is to use it in connection with mathematical modeling to quantify and predict permeability, partition, and diffusion coefficients.


The data were compiled from the literature and began with 50 cosmetic chemicals from one source that were measured for penetration in the skin under a standardized protocol in aqueous buffers10. The database was further expanded to include non-volatile chemicals11 and hydrocortisones12 for a total of 73 distinct chemicals that are identifiable by name, CAS (Chemical Abstracts Service) number, DSSTox (Distributed Structure-Searchable Toxicity) Substance ID, and SMILES (Simplified Molecular Input Line Entry System).

Source identification

Identification of a potential data source from the open literature was a key step for the development of this database. PubMed and GoogleScholar were used as primary search engines. Query phrases used included “human dermal absorption”, “aqueous vehicles”, “in vitro measurements”, and “epidermis, SC, and dermis”.

The search was limited to publication between the years 1970 and 2022; details for the experimentation leading to data collection was required to be provided within the publication itself. Once a manuscript was identified, a researcher read the paper and decided if the data reported met the selection criteria. The criteria specified were: human skin, in vitro experiments, aqueous vehicle, and known dose. The researcher determined if the data published could be used for the database. The publications included in this database evaluated drug permeation utilizing Franz diffusion cells with human skin plugs that were removed during surgery. This, however, is a criteria that was not determined a priori.

Recent publications typically include a table or electronic dataset reporting the values. Older manuscripts had their data entered by hand and curated by two separate individuals followed by a verification by a third. In all cases, the data were regarded as valid as reported. Only unit conversions were performed by the researcher to ensure that all data in this database had consistent units.

Data content

The following criteria were considered prior to including data from a publication:

  • The publication was publicly accessible

  • The primary source of data were the publication or the associated excel file

  • The units were included or able to be determined from the publication’s text

  • The permeability coefficient (kp) and/or diffusion coefficient was included with specifications of the layer(s) or could be calculated from other data

  • Any chemical vehicle(s), in addition to the aqueous buffer, were identified.

As as result of this criteria, the three sources of permeability and diffusion coefficients used in this database are Ellison et al.10, Krestos et al.11, and Anderson et al.12. Experimentation is detailed in the corresponding publications and was reviewed by all researchers to ensure all necessary criteria was met.

In order to provide a consistent set of chemical descriptors, features not included with the experimental data were pulled from the EPA’s CompTox Chemicals Dashboard13 as well as the PaDEL-Descriptor15 and PubMed14 to allow for a more robust set of features, including structural information as well as the highlighted features below:

  • Molecular Weight (MW)

  • Vapor Pressure

  • Index of Refraction

  • Molar Refractivity

  • Henry’s Constant

  • Polarizability

  • Surface Tension

  • Molar Volume

  • Boiling Point

  • Melting Point

  • Water Saturation (Sw)

  • Octanol-Water Partition Coefficient (\(\log P\))

  • Bioconcentration Factor

  • Biodegradation Half Life

  • Michaelis constant (Km)

  • Atmospheric Hydroxylation Rate

  • Water Solubility

  • Density

  • Flash Point

  • Soil Adsorption Coefficient

Some features were reported more than once in the event of a unit conversion such as kp, which is reported in both centimeters/hour and centimeters/second. In the event that the data were unavailable for a specific chemical, the entry was left blank and that chemical was not included in any analysis of that feature. A list of features and the corresponding units, excluding some features that are dimensionless, can be found in Tables 14.

Table 1 Units and approximate ranges of various features.
Table 2 Units and approximate ranges of various features in the epidermis.
Table 3 Units and approximate ranges of various features in the stratum corneum.
Table 4 Units and approximate ranges of various features in the dermis.

Data usage and calculations

Figure 1 shows the distributions for molecular weight (Fig. 1a) and \(\log P\) (Fig. 1b). The values for molecular weights for the chemicals fall between 18 g/mol and 519 g/mol whereas the values for \(\log P\) fall between -3 and 5. In Fig. 2, molecular weight is plotted against the values for \(\log {k}_{p}\) in the dermis (Fig. 2a) and all layers of the skin (Fig. 2b). While certain subsets of the data may show a trend, the data overall do not indicate a correlation between \(\log {k}_{p}\) and molecular weight. Similarly in Fig. 3, the diffusion coefficients are plotted against the molecular weights.

Fig. 1
figure 1

(a) Distribution of molecular weight of all chemicals, (b) distribution of \(\log P\) values for all chemicals.

Fig. 2
figure 2

Molecular weight versus the value of \(\log {k}_{p}\) of all chemicals in (a) the dermis and (b) all layers.

Fig. 3
figure 3

Molecular weight versus the diffusion coefficients for (a) all chemicals in the dermis (b) non-volatile chemicals in the dermis, (c) all chemicals in all layers.

The relationship between the diffusion coefficients (cm2/s) in the dermis and molecular weight (g/mol) for non-volatile chemicals shown in Fig. 3b indicates a negative correlation between the two. It is important to note, however, that the figure only includes a small subset of the chemicals in a single layer of skin. The other plots in Fig. 3, and the data in this compiled database, show no significant correlation between the molecular weights and diffusion coefficients despite the common assumption that larger chemicals would have lower diffusion coefficients. This lack of correlation further supports the need for a robust database with various features that may contribute to QSAR models in varying degrees.

Dermal permeability (kp) is probably the most common parameter used to estimate dermal penetration and net absorption. Using ideal membrane theory, the diffusion constant is directly proportional to permeability, although modified by partitioning and membrane depth. The Potts-Guy correlation equation16 describe a direct relationship between \(\log {k}_{p}\) and \(\log P\), particularly for the epidermis (consisting of the stratum corneum and viable epidermis) skin barrier. Figure 4 summarizes individual correlations between \(\log {k}_{p}\), MW, and \(\log P\) for different layers.

Fig. 4
figure 4

Scatter plot matrices for molecular weight, \(\log P\), and \(\log {k}_{p}\) values in all layers.

The main aim of this dermal database was to aid in the development of mathematical models and computer simulations such that more information can be learned and extrapolated regarding how chemicals diffuse and permeate within the skin’s layers. The example chosen for this paper is the mechanistic modeling of dermis diffusion coefficient since the diffusion constant is rarely included in QSAR models.

Example applications for dermis layer

Since the dermis contains plasma proteins, an additional descriptor included was fraction unbound in the plasma (fu). The dermis diffusion constant was calculated using the diffusion equation presented by Chen et al.17. Predictions were compared to experimental values obtained from Hewitt et al. and Kretsos et al.10,11 Fig. 5 presents the results for the Kretsos dataset containing the 13 chemicals.

Fig. 5
figure 5

Predicted diffusion coefficient vs experimental values collected in dermis. Chemical descriptors used: MW, \(\log P\), and fu (fraction unbound).

Data Records

The database is deposited on the Dryad Digital Repository as a series of Microsoft Excel files prepared to be used in coding18. It is presented as individual files for each layer (epidermis, stratum corneum, dermis) and chemical type (fragrance related, non-volatile, hydrocortisone). Additional spreadsheets containing all information, the chemical descriptors, and time course data are also included along with a notated and color-coded file which is condensed and not recommended for coding.

Data Validation

The collection of experimental data was collected from its corresponding publication10,11,12 and the additional features were collected from the EPA CompTox Chemicals Dashboard (version 2.2.0)13, Padel-descriptor15, as well as additional literature14,19. The database was curated by a team of two and reviewed by an additional team member in order to ensure that the data were accurately reported with correct units. The dermal absorption coefficients were collected from peer-reviewed publications and included in the database, taking into account any additional supplementary materials and corrections.

Technical Validation

Chemicals identifiers were used as reported in the open literature. Many publications used CAS numbers to identify the chemical. If a chemical name was given without CAS number, the US EPA Dashboard was used to obtain unique identifiers for each chemical (CAS number and DSSTox ID). The Dashboard has a synonym function designed specifically to identify chemicals by different names. PubChem was also used to further confirm a chemical’s identity and corresponding DSSTox ID. Agreement between these different sources ensured that the chemicals were correctly identified and are provided for the users’ convenience.

To obtain the fraction unbound needed for the dermis calculations, OPERA (version 2.8)20 was used. The fraction unbound predictions from OPERA are also included in the EPA CompTox Chemicals Dashboard. However, extracting multiple values for different chemicals is easily done in the original software. A function within the US EPA HTTK package can also be used to download multiple values for fraction unbound if desired. The fraction unbound was only needed in the dermis, since this layer contains plasma proteins that exhibit binding and affect overall absorption into the dermis. Because the dermis is important for capillary absorption into the blood stream, the additional descriptors were a valuable addition.

Usage Notes

The database is built in order to be easily integrated into coding, particularly with R Studio21. The spreadsheet is formatted such that it can be used as a whole and simultaneously functions as separated databases for each layer and subset of chemicals.