Introduction

Epidemiological studies over the last several decades have provided strong evidence that parental life experiences shape offspring development, stimulating new consideration of the factors that underlie intergenerational programming of disease risk and resilience1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16. While these associations are well established, the molecular mechanisms involved remain elusive, especially for paternal transmission. Recent studies focusing on the influence of preconception paternal insults implicate epigenetic processes important for germ cell development and maturation17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46. Advances in high-throughput next-generation sequencing have revealed highly complex populations of RNA in sperm, and in animal studies, small noncoding RNA (sncRNA) populations have emerged as causal agents in the germline transmission of paternal experience44,47,48,49,50,51. Animal studies have provided the strongest evidence that paternal experiences, including various types of stress, drugs, and dietary manipulations, are associated with lasting changes in sperm RNA expression, where changes in specific miRNA and tRNA fragments have often been reported17,18,19,20,21,22,23,24,25,27,28,31,32,36,37,38,40,41,52,53. Further, injection of these RNAs into fertilized zygotes or incubation of sperm with RNA-containing extracellular vesicles, replicated the offspring phenotype, supporting a causal importance of sperm RNA changes17,19,20,21,22,23,24,25.

A growing number of human subject studies also demonstrated sperm sncRNA populations vary in association with diverse environmental exposures or experiences (e.g., smoking, diet, obesity, and stress/trauma)28,39, 42,53,54,55,56,57,58. However, the majority of these studies were constrained to a single timepoint or to a single within-subject comparison. Without time as a factor, it is challenging to determine environmentally-driven impact on sperm RNAs vs. population variance. Before we can begin searching for candidate sperm RNAs associated with or predictive of offspring disease risk (e.g., paternal stress or trauma and child autism or schizophrenia risk), a critical first step must be to develop a framework from a ‘normative’ dataset that includes within- and between-subject comparisons.

In the current studies, our goal was to assess the normative composition and dynamic changes in sperm sncRNA (including miRNA, piRNA, and tRNA) from a cohort of healthy human subjects from repeated monthly collections over 6 months. This repeated measures design allowed us to define both between-subject and within-subject variation in sperm sncRNA content with time as a factor. In addition, as our lab and others have previously demonstrated in animal models that sperm sncRNA are responsive to prior chronic stress experience, we modeled monthly transcriptomic data aligned with prior subject perceived stress state to identify specific sperm RNAs that fit strict criteria for consistent detection within- and between-subjects17,18,21,28,31. Unlike sperm DNA methylation, which is relatively stable over time within and between subjects, specific populations of sperm sncRNA appear to be more dynamic59,60. Our goal is that by providing the field with this comprehensive ‘normative’ dataset modeled with perceived stress, we can begin building a powerful framework to be utilized across cohorts and areas of study, and therefore novel and disease-predictive sncRNAs in human sperm will eventually be identified.

Results

Demographic and clinical characteristics of a normative cohort of males

To establish the baseline characteristics of the sperm sncRNA complement, we recruited men from a relatively homogenous and ‘normative’ population of University of Pennsylvania students. Subjects between the ages of 18 and 25 were screened and excluded for major medical illness, mental health diagnoses, psychotropic medication use, and substance abuse. Following screening and baseline assessments, enrolled subjects returned monthly for 6 visits to donate semen samples for sperm sncRNA analysis. In addition, with each sample donation subjects completed psychological inventories, including the Perceived Stress Scale (PSS) (Fig. 1A,B). The PSS is commonly used to assess perception of stress over the previous month. It is the most commonly used psychological instrument to measure the degree to which situations in a subject’s life are appraised as stressful and taps into how unpredictable, uncontrollable and overloaded respondents experience their lives61. Baseline demographics and results from an Adverse Childhood Experiences (ACE) questionnaire and State-Trait Anxiety Inventory (STAI) demonstrate the final study cohort (N = 17) was relatively homogenous (Table 1). Subjects were between 19–25 years old (mean = 22.8, SD = 1.8), single, and without children. In addition, most subjects had ACE scores of 0, while only one subject had an ACE = 1 and two had an ACE = 2. Subjects scored between 22–39 on the STAI-Trait questionnaire (mean = 30.7, SD = 6.0). Mature sperm was enriched from cryopreserved samples collected from these individuals, then sncRNA was isolated and subject to small RNA sequencing. Sample characteristics are listed in Supplemental Table 1.

Figure 1
figure 1

A normative cohort for within- and between-subject expression characteristics of sperm sncRNA populations associated with variable perceived stress experience across the 6-month study period. (A) To assess the normative composition and dynamic changes in human sperm small non-coding RNA (sncRNA) populations (including miRNA, tRNA, and piRNA), potential subjects were screened (indicated by brown shading of the first visit) to enroll healthy young men (ages 18–25) who completed psychological inventories, including the Perceived Stress Scale (PSS), and donated sperm monthly over 6 months (N = 17 subjects). Mature sperm was enriched from cryopreserved samples collected from these individuals, then sncRNA was isolated and subject to small RNA sequencing, followed by analysis and statistical modeling of sncRNA expression data and PSS scores. (B) On the same day of each monthly sperm collection, subjects also completed the Perceived Stress Scale (PSS) as part of their psychological inventory. The total score of this clinically validated measure of perceived stress score for each month (y-axis) is plotted over time (x-axis) for each subject (indicated by plot numbering). (C) Hypothetical expression plot of a sncRNA for 5 subjects over 6 months to illustrate the within- and between-subject structure and analysis of the data collected in our study. This structure was exploited to develop screening criteria for the identification of potential environmentally responsive ‘dynamic’ sperm sncRNA. These criteria were based on three a priori assumptions. First, we focused on sncRNA with a high degree of within-subject variation across time, assuming variation reflects responses to external stimuli. Second, we assumed that for a ‘dynamic’ sncRNA to be functionally relevant, it should also be expressed at relatively high levels (within the top quartile of expression) during at least one time point over the 6 months for a given subject. Third, we assumed sncRNA meeting these first two criteria (variable and relevant expression level) in multiple subjects within the final cohort were more likely to reflect a conserved response to variation in extrinsic factors in the environment, and therefore meet our final criteria as ‘dynamic’ to be tested in our statistical modeling. In the hypothetical plot provided, potential sncRNA expression patterns shown include: low expression and low variation in Subject 1, relevant high expression but low variation in Subject 2, low expression and high variation in Subject 3, relevant high expression and high variation in Subjects 4 and 5, exhibiting between-subject overlap. (A,C) by Tim Phelps © 2020 JHU AAM. (B) was generated using R (version 3.5.2) and the package ‘ggplot2’ (version 3.2.0) (https://cran.r-project.org/).

Table 1 Baseline demographics and assessments including ACE questionnaire and STAI inventory for study subjects.

Quality control and alignment to the Ensembl ncRNA transcriptome

Mature sperm cells are generally thought to be transcriptionally inert. As a result, the RNA content of mature sperm differs from most cells in several ways. The mature sperm RNA complement is dominated by sncRNA and lacks intact ribosomal RNA. This is likely due to the active degradation of specific pools of RNA or the protection of specific species from a broader degradation. These characteristics present a challenge for efforts to characterize sperm RNA populations, largely driven by a poor understanding of the functional role of sncRNA and a parallel dearth of annotations for sncRNA in standard reference genomes. In an effort to circumvent this lack in annotation, we have instead aligned our sequencing data to specific sncRNA transcriptomes curated to contain known or predicted sncRNA species. Initially, a dataset consisting of reads aligned to the Ensembl ncRNA reference transcriptome was used to establish various quality control criteria to apply across sperm sncRNA transcriptomes of interest (miRNA, piRNA, and tRNA). These included filtering criteria to exclude sncRNA features not consistently present across samples and multivariate analyses to identify samples that would be excluded as outliers for technical reasons. Sequencing libraries were generated from 100 samples collected from 17 subjects. In addition, libraries were generated from a common pool of sperm RNA for each round of library preparation and sequencing to serve as technical replicates to assess potential batch effects. These libraries were sequenced to an average depth of 8.2 × 106 reads. Across samples, 29% of these reads aligned to 37,110 features in the Ensembl ncRNA reference transcriptome (Supplemental Table 1). To balance the detection of low-abundance transcripts against the characterization of transcripts consistently present across sperm samples, we retained transcripts with abundances ≥ 1 CPM in at least 75% of samples. This filtering criterion resulted in a near normal distribution of the log2 expression (log2 CPM) of 11,074 retained features (Supplemental Fig 1). Hierarchical clustering of samples based on ncRNA transcript expression, post-filtering and TMM normalization, identified samples 4-01 and 6-01 as outliers (Supplemental Fig 2A). This was likely driven by the large number of ncRNA features absent from these sample; in sample 4-01 and 6-01, 3695 and 969 features, respectively, had CPM = 0. The sample with the next largest number of absent features was 17-01 with 150; therefore 4-01 and 6-01 were excluded from further analyses. Though subjects were assigned to library prep/sequencing runs in a non-systematic manner, samples from the same subject were generally prepared and sequenced in the same run; therefore, it was important to test for batch effects. The clustering analysis grouped the 3 technical replicate samples into a single exclusive cluster, demonstrating any batch effects were minimal. The two outlier samples and the overlapping technical replicates were also identifiable in a Principal Component Analysis (PCA) when samples are plotted along the two components (PC1 and PC2) that account for the largest proportions of total variance in the dataset (37.3% and 12.2% respectively) (Supplemental Fig 2B).

Figure 2
figure 2

‘Dynamic’ sperm sncRNA populations display characteristic within-subject patterns of expression across time. Graphs represent the expression patterns of sperm sncRNA, the ranked expression (highest to lowest) of each (A) miRNA, (B) tRNA, and (C) piRNA is plotted by time within each subject. The final pool of ‘dynamic’ sncRNA (N = 33 miRNA, N = 17 tRNA, and N = 97 piRNA) are plotted in dark blue, while all other consistently expressed sncRNA are plotted in light blue. This figure generated using R (version 3.5.2) and the package ‘ggplot2’ (version 3.2.0) (https://cran.r-project.org/).

Characterizing the class-specific sncRNA transcriptomes of sperm

There are 3 classes of sncRNA in sperm that are of particular relevance to the goals of the current study, miRNA, tRNA, and piRNA. None of these classes are present in their entirety within the Ensembl ncRNA transcriptome, therefore we aligned our sequencing data separately to more comprehensive class-specific reference transcriptomes. For miRNA, reads were initially aligned to 2657 features in the reference transcriptome obtained from miRbase. After filtering for features consistently present across samples (CPM ≥ 1 in 75% of samples), we identified 254 total miRNA in mature sperm. For tRNA, reads initially aligned to 425 features in the genomic tRNA database reference transcriptome from GtRNAdb. After filtering, we identified 194 total tRNA consistently present in mature sperm. For piRNA, reads initially aligned to 32,827 features in the reference transcriptome obtained from piRbase. After filtering, 837 total piRNA were consistently present in mature sperm. Hierarchical clustering and PCA demonstrate broad clustering of samples from the same subject, suggesting between-subject variation in sperm sncRNA expression is greater than within-subject variation (Supplemental Figs 3, 4, and 5). For miRNA, 70% of samples (N = 69) were grouped in clusters exclusively with samples from the same subject (Supplemental Fig 3A). These clusters ranged in size from 2 to 6 samples. The same was true for 78% of samples (N = 76) based on tRNA expression (Supplemental Fig 4A) and 66% of samples (N = 65) based on piRNA expression (Supplemental Fig 5A). This makes sense biologically, as between-subject variation is likely driven by the combination of differences in genetic background and past life experience, which should be greater than any changes in experience over the course of the 6 monthly collections.

Figure 3
figure 3

Illustrations by Tim Phelps © 2020 JHU AAM.

Individual ‘dynamic’ sperm miRNA and tRNA, but not piRNA, are significantly correlated with perceived stress experience, potentially acting to influence the earliest stages of development to shape long-term health outcomes. (A) Summary of analysis results, including the class-specific numbers of consistently expressed (total) and ‘dynamic’ sperm sncRNA, and the number of ‘dynamic’ sncRNA with statistically significant associations with perceived stress experience. (B) Theoretical model linking a chain of events beginning with an environmental experience or exposure, including stress, impacting sperm sncRNA levels. Acting individually or interacting together as part of a broader sncRNA code, these ‘dynamic’ sncRNA are able to transmit the encoded information regarding the paternal environment to offspring, potentially impacting developmental processes (e.g., rates of fertility, embryo division and implantation). Even if the initial impact of these ‘dynamic’ sperm sncRNA was small (indicated by ⍺1 or ⍺2), consequent shifts in the timing of developmental windows of susceptibility to additional events could produce significant differences over time, either in positive (resilience, indicated in blue) or negative (risk, indicated in red) directions when compared to a typical developmental trajectory (as indicated in black).

To identify the top expressed features, each class of sperm sncRNA was ranked by expression from highest to lowest. The top 50 of each class are displayed in Tables 2, 3, and 4. To examine the stability of these rankings, we looked at how many months individual sncRNA were ranked in the top quartile of expression for each subject. Plotting a histogram of the number of months in which a given sncRNA was expressed in the top quartile of its class demonstrates that sncRNA expressed at a high level in at least 1 month were most likely to be highly expressed in all 6 collections from a subject (Supplemental Figs 6A, 6C, and 6E). The only subjects for which this was not the case were subject 11 for miRNA and piRNA and subject 17 for tRNA. The same is not true for sncRNA expressed at levels in the bottom quartile (Supplemental Figs 6B, 6D, and 6F).

Table 2 Top expressed miRNA.
Table 3 Top expressed tRNA.
Table 4 Top expressed piRNA.

‘Dynamic’ sperm sncRNA

To identify potential environmentally responsive ‘dynamic’ sncRNA, we developed screening criteria based on three a priori assumptions: (1) we assumed ‘dynamic’ sncRNA were likely to exhibit a higher degree of variation in expression over time in response to changes in the environment; (2) given the disparity in the amount of RNA present in sperm relative to ovum, we assumed sncRNA with the potential to impact offspring development would need to be highly expressed; and (3) we assumed sncRNA meeting criteria based on these first two assumptions in multiple subjects were more likely to reflect a conserved functional response to extrinsic factors62,63,64. The analysis we performed to characterize these properties and identify ‘dynamic’ sncRNA is illustrated in Fig. 1C and described in greater detail in the “Methods”. An initial pool of candidates consisted of sncRNA that exhibited within-subject variation (CV expression) ranked in the top quartile of each class of features, while also being expressed in the top quartile in at least one timepoint over the 6 months for a given subject. One hundred seventeen miRNA, 75 tRNA, and 369 piRNA met these criteria in at least one subject. The expression of these initial candidates is plotted for each subject in Supplemental Fig 7. We then asked how many of these initial candidates overlapped between subjects (Supplemental Fig 8). Thirty-three miRNA, 17 tRNA, and 97 piRNA met the within-subject criteria in at least 25% (N = 5) of subjects (count candidate ≥ 5), constituting our final pool of ‘dynamic’ sperm sncRNA. These sncRNA are displayed in Tables 5, 6, and 7 and their expression over time is plotted for each subject in Fig. 2A–C, respectively.

Table 5 'Dynamic' sperm miRNA.
Table 6 'Dynamic' sperm tRNA.
Table 7 'Dynamic' sperm piRNA.

Perceived stress and ‘dynamic’ sperm sncRNA

To assess the potential relationship between perceived stress and individual ‘dynamic’ sncRNA, we conducted a series of linear models to test for relationships between ‘dynamic’ sncRNA expression and PSS scores. Based on data from our lab in a mouse model of paternal stress, we hypothesized that there would be a delay in the impact of stress experience on the expression of ‘dynamic’ sncRNA17. Therefore, we evaluated the following seven relationships for an individual sncRNA’s expression level in a sperm sample and: (1) PSS score at the time it was collected (t), (2) PSS score at the time of the previous collection (t − 1), (3) PSS score at t − 2, (4) PSS score at t − 3, (5) change in PSS score between t and t − 1, (6) change in PSS score between t and t − 2, and (7) change in PSS score between t and t − 3. The interpretation of the first four models was that the level of a sncRNA changes in a direct relationship with PSS score, possibly with a delay (i.e., a rheostat model), whereas the last three model an increase or decrease in a sncRNA relative to the magnitude of a change in PSS score over the specified period (i.e., a change detector). Ultimately, we identified five ‘dynamic’ miRNA with associations to PSS scores that passed our significance cutoffs (p < 0.01 and FDR < 0.2): let-7f-5p, miR-181a-5p, miR-4454, miR-6765-3p, and miR-12136 (Table 8). We identified four ‘dynamic’ tRNA with significant associations to PSS scores: tRNA-Gly-GCC-3-1, tRNA-Lys-CTT-1-1, tRNA-Lys-CTT-2-1, and tRNA-Lys-CTT-4-1 (Table 8). There were no significant associations between ‘dynamic’ piRNA and PSS scores (Fig. 3A).

Table 8 Characteristics of linear fixed effects models assessing the relationship between perceived stress experience and the expression of ‘dynamic’ sncRNA: Table includes model regression coefficients (log2FC), adjusted p-values, and FDR. Significant relationships are highlighted in bold.

Discussion

There is a growing appreciation for the importance of the paternal preconception environment in the developmental programming of offspring1,15,44,48,49,50,51. Though the mechanisms underlying this developmental plasticity are likely adaptive, in the context of human health, these effects may be expressed as changes in disease risk or resilience. Recent work from our lab and others demonstrating direct causal associations between sperm RNA and complex offspring phenotypes have shifted the focus of the investigations into the mechanisms underlying intergenerational transmissions to sperm RNA17,19,20,21,22,23,24,25.

In the early 2000s, Stephen Krawetz and colleagues demonstrated that human sperm contained specific populations of RNA, that the RNA species present were conserved across healthy subjects, and that these RNA were delivered to the ovum, where they played a functional role in early zygote development65,66. Though the initial focus of this work was on protein-coding RNA, these studies also identified sncRNA in human sperm, including miRNA, tRNA, and piRNA, and as our understanding of the functional relevance of sncRNA in cell physiology has advanced, so too has our appreciation for the importance of sncRNA in the function of the germ cell and in regulating the earliest processes of newly fertilized zygotes26,47,62,66,67,68,69,70. A portion of the sncRNA present in sperm may only be remnants of spermatogenic processes, such as the RNA fragmentation products of ribosomal RNA subunits extensively degraded to suppress spurious protein translation in these transcriptionally quiescent cells71. However, it is clear that much of the sncRNA content of sperm is not the product of stochastic processes, but is actively shaped, in part, through interactions with extracellular vesicles (EVs) secreted by somatic cells along the reproductive tract, including the epididymis63,72,73,74.

Mechanistic studies in animal models show an association of changes in germ cell sncRNA with intergenerational transmission17,18,19,20,21,22,23,24,25,27,28,31,32,36,37,38,40,41,52,53. Several labs independently demonstrated that injecting total RNA isolated from sperm exposed to environmental manipulations, specific classes of sncRNA (often differentiated by size), or even specific environmentally-responsive sncRNA into newly fertilized zygotes was sufficient to transmit/phenocopy complex phenotypes in affected offspring17,19,20,21,22,23,24,25. Epidemiological studies suggest that similar processes may link paternal adverse experiences and offspring disease risk, but causal or prospective data are lacking28,39,42,53,54,55,56. Progress in the field has been held back by a lack of critical details regarding many of the necessary factors to design prospective clinical studies and test such hypotheses. There is a primary need to first understand fundamental dynamics of sperm transcriptomics. For example, to differentiate between variation driven by genetics vs. environment (intrinsic vs extrinsic factors), it is necessary to examine sperm content over multiple time points (within- vs between-subject comparisons). Therefore, we have established an extensive dataset describing the dynamics of sperm sncRNA expression over time and across a normative cohort of human subjects. Of course, defining any cohort as ‘normative’ can be problematic. Our selection criteria were developed to recruit a healthy group of males with minimal heterogenous and confounding characteristics. However, age can influence sperm epigenetics, including sncRNA content, and our cohort may not comprise the typical age distribution of males with reproductive intent17,75,76,77,78. In addition, the lived experience of people from different racial or ethnic groups can vary dramatically and are likely to influence the dynamics of specific sperm sncRNA expression. Future studies will need to determine how generalizable our current model is from this initial cohort across the diversity present in the broader population.

As has been previously reported, miRNA comprised a smaller fraction of the total sperm sncRNA pool in our subjects54,56. However, each miRNA can regulate the expression of hundreds of genes, and multiple miRNAs can collaborate in targeting extensive cellular processes and molecular pathways79. For example, we previously demonstrated that injecting a specific pool of 9 stress-sensitive miRNA into newly fertilized mouse zygotes extensively altered the expression of specific target stored maternal mRNA transcripts within 24 h19. In our current study, an average of 0.73% of total reads aligned to mature miRNA (compared to 29% aligning to the Ensembl ncRNA reference transcriptome), and after filtering, we identified a pool of 254 consistently expressed miRNA. We noted that several of the top-expressed miRNA in this pool have known functions in spermatogenesis or the epididymal maturation of sperm, including miR-10a-5p (consistently one of the highest expressed sperm miRNA), miR-30a-5p, and miR-26a-5p (84–86)80,81,82. Others may play important roles in the earliest stages of zygotic development81. For example, miR34c-5p is among the highest expressing sperm miRNA in humans and mice, and is required for the first cleavage division in mouse zygotes68,83,84. These studies suggest functional roles for sperm miRNA in biological processes important to reproduction and that may impact post-fertilization embryo development.

Our experimental design did not include manipulations of human subjects or interventions; instead, it was intended to build an initial framework from a ‘normative’ human subject cohort, including examination within subjects over time and comparisons between subjects. By collecting samples across an extended period, we anticipated exploiting variation in the experiences of participants to screen for environmentally responsive ‘dynamic’ sperm sncRNA. Using screening criteria based on three a priori assumptions, as detailed in the “Methods”, we identified 33 final ‘dynamic’ miRNA—highly expressed and with a sufficient dynamic pattern over time, both within- and between-subjects. In validation of our assumptions and selection criteria, we found that several of the ‘dynamic’ miRNA identified were also previously reported in sperm from rodents and humans, and associated with male environmental perturbations, including chronic stress18,21,28. For example, miR-449a levels were reduced in sperm from adult men who had experienced a high number of adverse childhood events and in the sperm of male mice following chronic stress28. Further, in two independent models, miR-375-3p was elevated in the sperm from adult male mice following prior chronic stress experience that occurred in the postnatal or pubertal/adult period18,21.

tRNA are an abundant class of sncRNA also found in mature sperm73,84,85. In our study, just over 2% of total reads aligned to tRNA, and we identified 194 consistently expressed tRNA. In addition to the role intact tRNA play in protein translation, tRNA-derived RNA fragments (tRFs) regulate gene expression, and in some instances act in concert with cellular machinery already in place for miRNA and piRNA actions86,87,88. In 2016, two studies implicated sperm tRFs in the epigenetic germline inheritance of complex metabolic phenotypes following paternal high fat or low protein dietary exposures in mice23,25. At ~ 75 bp, intact tRNA are significantly longer than either miRNA or piRNA, and also longer than the 36 bp read lengths of our sequencing dataset, and therefore, we were limited in our ability to differentiate between sequence reads derived from intact tRNA and those from tRFs88.

In examining the expression pattern of sperm tRNAs across subjects, it was clear that this class of sperm sncRNA is far less variable overall than either miRNA or piRNA. However, we did identify a subset of 17 sperm tRNA that met our criteria for ‘dynamic’ sncRNA. Focusing on the expression of these ‘dynamic’ tRNA over time, there were clearly subjects who had more variable expression than others, which might correlate with between-subject variation in environmental factors that were not examined in our study. It was also clear that the most ‘dynamic’ tRNA for a given subject tended to be present at lower overall expression levels. Three of the final ‘dynamic’ tRNA we identified, tRNA-Gly-GCC-1-1, tRNA-Gly-GCC-2-1, and tRNA-Gly-GCC-3-1, transport the same amino acid (glycine) and share the same anticodon sequence (i.e., isodecoders). Reduced sperm expression of tRFs derived from two other tRNA-Gly-GCC isodecoders were previously associated with poor embryo quality when those human sperm samples were used for in vitro fertilization, supporting an important biological function for these fragments in sperm85.

Consistent with previous reports, piRNA were the most prevalent class of sperm sncRNA in our dataset in terms of expression abundance54,80. An average of 4.9% of total sequence reads aligned to piRNA across subjects. In addition to comprising the largest proportion of the total sperm sncRNA pool, far more piRNA (837) were consistently expressed across sperm samples. This was not unexpected, as tens of thousands of human piRNA have been annotated89. piRNA are predominately expressed in the germline where they fulfill their canonical role in maintaining genomic stability by repressing repetitive elements; though there is a growing appreciation for the role piRNA play in the post-transcriptional regulation of protein-coding transcripts80,90,91. Interestingly, in D. melanogaster and C. elegans, piRNA play a key role in transgenerational epigenetic inheritance of specific traits92,93. In rodents, changes in sperm piRNA expression were reported in association with male dietary manipulations and early life chronic stress experiences20,21,41.

Of consistently expressed sperm piRNA, 97 met selection criteria and were categorized as ‘dynamic’. Compared to miRNA, little is known about the functional role of individual piRNA, especially in mature sperm. Interestingly, in our study, we noted that when viewed across time, piRNA expression patterns in many subjects appear to display a bi-monthly cycle. Similar but less apparent patterns were also noted in the miRNA plots. Supporting a potential biological relevance, 15 of the 97 ‘dynamic’ piRNA had a cyclical expression pattern in at least 25% of our subjects. Such an expression pattern may reflect cyclical changes in extrinsic environmental factors or an intrinsic rhythm in male fertility not previously described.

Recent studies from our lab and others have demonstrated that the sperm sncRNA content is responsive to prior stress experience17,18,21,28,31. Further, our recent molecular studies in mice identified the specific timing at which previously stressed males were able to transmit a specific phenotype to their offspring17. These studies demonstrated a key finding in the field, that intergenerational transmission may occur, and in some contexts may only occur after an extended period following the cessation of the insult. This was a critical piece of the puzzle for formulating hypotheses in modeling human subject data—when to expect a detectable change to show up in the sperm after a given experience? Therefore, we evaluated a series of statistical models to align our sncRNA expression data with that of a clinical measure of perceived stress. These analyses identified significant associations between the expression of several ‘dynamic’ sncRNA and either the subject’s absolute PSS score, concurrently or at previous timepoints (rheostatic model), or the change in PSS score between timepoints (delta-detector). Five ‘dynamic’ miRNA and four tRNA were determined to be significantly associated with perceived stress experience. Of these, we noted that several had been identified in previous studies as important, including the miRNA, let-7f-5p and miR-181a-5p, and the tRNA, tRNA-Gly-GCC-3-1, tRNA-Lys-CTT-1-1, tRNA-Lys-CTT-2-1, and tRNA-Lys-CTT-4-125,56,94. The miRNA let-7f-5p and tRFs derived from tRNA-Gly-GCC and tRNA-Lys-CTT isodecoders were previously identified as differentially expressed in the sperm of male mice following exposure to a low-protein diet25. In that study, injection of a tRF derived from the 5′ end of tRNA-Gly-GCC depressed the levels of genes highly expressed in preimplantation embryos targeted by the endogenous retroelement MERVL. tRNA-Lys-CTT levels were increased in human sperm following a high sugar diet exposure56.

Of great relevance to our results, a recent study identified significant correlations between levels of prior childhood trauma and adult plasma miR-181a-5p levels and several members of the let-7 family, similar to our results in sperm94. This suggests that these miRNA may be part of an evolutionarily conserved stress-responsive mechanism, conserved across tissues and species. How this would impact post-conception reproduction or embryonic development is unknown, but one of the top predicted gene targets of these miRNA, protogenin (PRTG), could play a role95,96,97. PRTG is involved in regulating early embryonic developmental transitions and trophoblast differentiation, and has been associated with ADHD and measures of cognitive development in multiple studies95,96,98,99. Therefore, changes in levels of sperm let-7f-5p and miR-181a-5p delivered at fertilization could regulate PRTG expression, among other genes, impacting rates of embryo division and implantation. Even if these effects were small, consequent shifts in the timing of developmental windows of susceptibility to environmental signals could produce significant differences in development and health outcomes mapped onto genetic risk (as shown in the schematic in Fig. 3B).

Conclusion

In these studies, we have utilized between- and within-subject sperm sncRNA comparisons and provided an initial framework onto which additional human subject data can be built. These data confirm high expressing common miRNA, tRNA, and piRNA that were dynamic in their pattern of expression over time, likely responsive to a factor in the internal or external environment. Further, using our perceived stress state analyses, we were able to identify miRNA and tRNA that fit strict modeling criteria for changing their expression levels following a previous perceived high stress state. Much work remains to be done in this field, but these data provide a powerful starting point. From here, it is conceivable that one day we may be able to map onto such a normative data set the sncRNA expression of fathers of children with neurodevelopmental disorders or men with traumatic experiences, such as returning from military service, and be able to identify biomarkers predictive of offspring developmental risk and resilience factors.

Methods and materials

Subject recruitment

A cohort of 18 healthy males was recruited from the University of Pennsylvania student body to establish normative sperm molecular signatures as a benchmark for comparison to later clinical populations. The study was approved by the Perelman School of Medicine at the University of Pennsylvania Institutional Review Board, all participants provided written informed consent, and all research was performed in accordance with relevant guidelines. Subjects between the ages of 18 and 25 were screened for history of major medical illnesses, mental health diagnoses, and substance abuse. Grounds for exclusion included: (1) history (participant self-report) of major medical illnesses or other current medical conditions that the physician investigator deemed as contraindicated for study participation; (2) regular or recreational use of any psychotropic medication (e.g., antidepressants, antipsychotics, psychostimulants or anxiolytics), as per self-report, (3) recent (within previous year) diagnosis (per MINI International Neuropsychiatric Interview) or treatment for any psychiatric disorder or substance use disorder (previous 2 years), (4) lifetime history of schizophrenia or other psychotic disorder, substance addiction disorder (excepting nicotine), (5) current use of any tobacco products, determined by urine cotinine level; and (6) positive drug screen for any substance, determined by urine drug screen at screening100.

Study procedures

The study involved a total of 7 visits. The first visit was a screening visit to determine participant eligibility. The following 6 visits were sperm collection visits. During the screening visit, subjects underwent an in-office assessment including a urine toxicology screen, urine cotinine screen, and clinical assessments, including the Adverse Childhood Experiences (ACE) questionnaire, and the MINI International Neuropsychiatric Interview100,101. Subsequent visits (2–7) took place once a month for 6 months. At these visits, subjects submitted a semen sample, collected at home within the previous hour, to experienced andrologists at Penn Fertility Care clinic for processing and sample cryopreservation. Participants were asked to abstain from ejaculation for 48 h prior to semen collection. Within the same day, participants also completed a series of questionnaires to assess stress and anxiety experienced over the previous month, including the Perceived Stress Scale (PSS) and the Spielberger State Trait Anxiety Inventory (STAI)61,102. The PSS is the most commonly used instrument to assess perception of stress over the previous month61. It measures the degree to which situations in a subject’s life are appraised as stressful. The instrument taps into how unpredictable, uncontrollable and overloaded respondents experience their lives. The 10-item scale includes a number of direct queries about current levels of experienced stress during the last month. One participant did not return for their final donation, therefore only timepoints 1–5 were available for subject 11. In addition, one subject was excluded due to consistently low sperm quality across donated samples, leaving a final study cohort of 17 subjects for sperm sncRNA analysis.

Sperm sncRNA sequencing

Procedures for the isolation of small RNA from mature sperm were adapted from a published protocol103. Briefly, cryopreserved sperm samples were thawed, suspended in PureSperm Buffer (Nidacon), then mature sperm were enriched by centrifugation (300g, 15 min) through a 50% PureSperm density gradient (Nidacon). Sperm were then lysed in TRIzol-LS (Thermo Fisher) reagent, supplemented with 0.2 M β-mercaptoethanol and 100 mg of nuclease-free stainless-steel beads, by homogenization on a Disruptor Genie (Scientific Industries) at 3000 rpm for 5 min. RNA, enriched for small RNA, was isolated using Qiagen’s miRNeasy Mini kit according to manufacturer’s instructions. RNA concentration and quality were assessed using Agilent’s small RNA chips run on a Bioanalyzer 2100 (Agilent Technologies). The small RNA content of sperm samples was analyzed by small RNA sequencing. Libraries were constructed using the TruSeq small RNA Library Prep Kit (Illumina). RNA input for library preparation was standardized to 10 ng of small RNA, in accordance with the manufacturer’s protocol. Post-PCR cleanup and size selection for products > 100 bp was performed using AMPure XP bead purification. Library size distribution and quantification was performed on a TapeStation 4200 (Agilent) using their High Sensitivity D1000 screentape. Individually barcoded libraries were pooled to achieve ~ 10 million reads per sample and sequenced on an Illumina NextSeq 550 (36-bp single-end).

Bioinformatic analysis pipeline

The small non-coding RNA sequencing (sncRNASeq) analytical pipeline was designed using the snakemake framework and is available via GitHub at https://github.com/acshetty/sncRNA-seq-analysis104. Reference transcriptome sequences in FASTA format were downloaded from public repositories. These included the GRCh38 ncRNA reference from Ensembl, the miRNA reference from miRbase 21, the tRNA reference generated from GRCh37 by GtRNAdb 2.0, and the piRNA reference from piRBase v1.0105,106,107. Using ‘index_ref’ component, each reference FASTA file was indexed using the ‘bowtie-build’ from the Bowtie short read aligner software108. The sequencing reads have lengths longer than the average size of most sncRNAs which may result in the inclusion of adapter sequence at the 3′-end of the read sequence. Hence, the ‘trim_fq’ component was invoked in order to remove trailing adapter sequence using the Trimmomatic tool109. After trimming, reads shorter than 15 nucleotides were discarded before downstream analyses. The trimmed reads for each sample were then aligned, using the ‘align_reads’ component, to each of the different sncRNA class-specific reference transcriptomes using the Bowtie short read aligner108. Reads were aligned allowing for 2 mismatches and a seed length of 15 nucleotides. The alignment statistics were summarized for each sample across each of the different reference sequences using the ‘merge_alignment_statistics’ component. The raw expression values were computed using the ‘compute_expr’ component based on the number of reads aligned to each of the sncRNAs specified in their respective reference files. For each type of sncRNA, the raw expression values were merged across samples using the ‘merge_expr’ component to generate a count matrix for downstream analysis.

Characterizing the dynamics of sperm sncRNA expression

Expression of sncRNA were adjusted for differences in library sequencing depth to generate counts per million reads (CPM) following TMM normalization using the Bioconductor package ‘edgeR’ (version 3.24.3)110. Features were retained for analysis if they were expressed at ≥ 1 CPM in at least 75% of samples. sncRNA aligning to each reference transcriptome were analyzed separately. Analyses were performed at the level of the individual sample, within each subject, and between subjects. At the sample level, the expression (CPM) of each feature (i.e. miRNA, tRNA, or piRNA) was ranked from highest to lowest and categorized if expressed in the top or bottom quartiles. These expression values were used to perform analyses of expression and variation at the within-subject level. Within-subject measures of expression include average expression, average ranked expression, and the number of collections a feature was expressed in the top or bottom quartile. Within-subject measures of variation included the coefficient of variation (CV) of a feature’s expression across collections, which was used to rank features from most to least variable, then categorize features if in the top or bottom quartiles of within-subject variation. Between-subject measures of expression (average expression and average rank expression) and variation [average CV expression (within-subject) and average rank CV expression (within-subject)] were calculated for each sncRNA from mean within-subject expression levels summarized across subjects.

‘Dynamic’ sperm sncRNA

To identify potential environmentally responsive ‘dynamic’ sncRNA, we exploited the within- and between-subjects structure of the sperm sncRNA expression data to developed screening criteria based on three a priori assumptions: (1) we assumed ‘dynamic’ sncRNA were likely to exhibit a higher degree of variation in expression over time in response to changes in the environment; (2) given the disparity in the amount of RNA present in sperm relative to ovum, we assumed sncRNA with the potential to impact offspring development would need to be highly expressed; and (3) we assumed sncRNA meeting criteria based on these first two assumptions in multiple subjects were more likely to reflect a conserved functional response to extrinsic factors62,63,64. The analysis we performed to characterize these properties and identify ‘dynamic’ sncRNA is illustrated in Fig. 1C. The first two criteria were based on within-subject expression characteristics. Features (sncRNA) were categorized as candidates in a given subject if: (1) a feature exhibited within-subject variation (CV expression) ranked in the top quartile of each class of sncRNA, and (2) the feature’s expression was ranked in the top quartile in ≥ 1 collection over the 6 months. (3) The final criteria required that a ‘dynamic’ sncRNA met the first two within-subject criteria in at least 25% (N = 5) of subjects (count candidate ≥ 5).

Identifying ‘dynamic’ sperm sncRNA responsive to perceived stress experience

The relationship between normalized expression counts for the subsets of ‘dynamic’ sperm sncRNA and PSS scores were analyzed by implementing linear fixed effects models using the Bioconductor package ‘limma’ (version 3.38.3)111. We implemented 7 different models to evaluate the relationship between the expression of individual ‘dynamic’ sncRNA and PSS scores, testing the following associations: (1) miRNA ~ PSS score at the time of collection (t); (2) miRNA ~ PSS score at collection t − 1; (3) miRNA ~ PSS score at collection t − 2; (4) miRNA ~ PSS score at collection t − 3; (5) miRNA ~ (change in PSS score between t and t − 1); (6) miRNA ~ (change in PSS score between t and t − 2); (7) miRNA ~ (change in PSS score between t and t − 3). The first four of these models tested for the relationship between expression of a miRNA and PSS scores using a ‘rheostat’ approach allowing for a delayed response. The final three models tested for the relationship between miRNA expression and the change in PSS scores using a ‘delta detector’ approach. The final results were tabulated and filtered for a p-value < 0.01 and FDR < 0.2 to detect significant associations.

Software used for data analyses

Sperm sncRNA sequencing data was trimmed, aligned, and counted using the small non-coding RNA sequencing (sncRNASeq) analytical pipeline, which is available in its entirety via GitHub at https://github.com/acshetty/sncRNA-seq-analysis. All other data processing, visualization, and statistical modeling were performed in the R software environment (version 3.5.3)112. Expression normalization was performed using the Bioconductor package ‘edgeR’ (version 3.24.3)110. Multivariate analyses were performed with the base R package ‘stats’ (version 3.5.3)112. Most data visualizations were generated using ‘ggplot2’ (version 3.2.0)113. Linear modeling of the relationships between ‘dynamic’ sncRNA expression and PSS scores was performed using the Bioconductor package ‘limma’ (version 3.38.3)111.