Transcript- and protein-level analyses of the response of human eosinophils to glucocorticoids

Glucocorticoids are first-line agents for the treatment of many eosinophil-associated disorders; however, their effects on human eosinophils remain poorly understood. To gain an unbiased, genome-wide view of the early transcriptional effects of glucocorticoids on human eosinophils in vivo, RNA sequencing was performed on purified blood eosinophils obtained before and 30, 60, and 120 minutes after administration of a single dose of oral prednisone (1 mg/kg) to three unrelated healthy subjects with hypereosinophilia of unknown significance. The resulting dataset is of high quality and suitable for differential expression analysis. Flow cytometry and qPCR were then performed on three additional cohorts of human subjects, to validate the key findings at the transcript and protein levels. The resulting datasets provide a resource for understanding the response of circulating human eosinophils to glucocorticoid administration.

Background & Summary
Glucocorticoids effectively suppress eosinophilia and its clinical manifestations, and they are first-line agents in a variety of eosinophil-associated disorders, including hypereosinophilic syndromes 1 and eosinophilic granulomatosis with polyangiitis 2 . Although commonly used to treat these and other disorders, glucocorticoids have non-specific effects and their use is associated with significant toxicity. Glucocorticoids act primarily through the induction of changes in gene expression. The glucocorticoid receptor is a transcription factor that binds glucocorticoids in the cytoplasm, after which the complex translocates to the nucleus. The ligand-bound glucocorticoid receptor complex then binds genomic DNA directly, or indirectly via tethered interactions with other proteins 3,4 . The glucocorticoid receptor complex also interferes with the activity of other transcription factors, notably NF-κB and AP-1 4 .
Although the molecular biology of glucocorticoid receptor signaling has been the subject of intensive study, the specific mechanisms responsible for the clinically beneficial actions of glucocorticoids in different cell types and disease states remain poorly understood 5 . In the case of eosinophils, one difficulty in studying the glucocorticoid response in vivo is that systemic administration of glucocorticoids consistently leads to a transient but profound drop in the number of circulating eosinophils 6,7 . Consequently, serial sampling of sufficient numbers of eosinophils after glucocorticoid administration in healthy volunteers with normal eosinophil counts (<500/μL) is problematic. Although patients with hypereosinophilic syndromes (HES) have adequate numbers of blood eosinophils for study, glucocorticoid responses in these patients vary considerably and may not reflect normal pathways. Hypereosinophilia of unknown significance (HE US ) is a rare trait in humans, defined by persistently elevated levels of blood eosinophils (≥1,500/μL for more than 5 years) 8 and no evidence of clinical manifestations or organ system involvement attributable to the eosinophilia. Moreover, the eosinophil response to glucocorticoids appears to be normal in these subjects. To study the early transcriptional response of human eosinophils to in vivo administration of a glucocorticoid, we studied three unrelated subjects with HE US (cohort 1). A single 1 mg/kg oral dose of the glucocorticoid prednisone, a dose commonly used in clinical practice, was administered to each subject. Eosinophils were successfully isolated from peripheral blood drawn prior to and at 30, 60, and 120 min after glucocorticoid administration. Total RNA was extracted from the isolated eosinophils without further in vitro manipulation, and high-throughput sequencing (RNA-seq) was performed.
Each experimental stage was subjected to thorough and rigorous technical design and validation, leading to the generation of a high-quality RNA-seq dataset that is suitable for differential expression analysis. An initial analysis of the RNA-seq data revealed the induction of a pro-apoptotic transcriptional program and differential expression of key genes related to leukocyte migration 9 . To further evaluate these two observations, we performed three additional sets of experiments: 1. In vitro assessment of glucocorticoid-induced changes in gene expression for key eosinophil migration and apoptosis genes (cohort 2). We sampled circulating eosinophils from five donors: four unrelated donors with normal eosinophil counts and one subject with HE US . We then exposed the cells in vitro to the glucocorticoid dexamethasone (or vehicle, as a negative control) for 30, 60 and 120 min. By qPCR, we evaluated the transcriptional response of three key genes implicated in eosinophil migration (CXCR4, CCR1, CCR3) and seven genes involved in eosinophil apoptosis (BCL2L11, XIAP, CASP9, PAK1, TNFAIP3, NOTCH1, ZBTB16). As a reference gene, we measured expression of the gene encoding the 18S subunit of ribosomal RNA. As a positive control, we measured expression of TSC22D3, which is known to increase in expression in response to glucocorticoids in multiple cell types. 2. In vivo assessment of apoptosis and cell viability in human eosinophils after glucocorticoid administration (cohort 3). To evaluate whether the induction of a pro-apoptotic transcriptional program led to eosinophil apoptosis in vivo prior to the egress of eosinophils from the peripheral circulation, which occurs between 60 and 120 minutes (min) after glucocorticoid administration 9 , we studied a third cohort, comprised of three subjects: two unrelated donors with normal eosinophil counts received a single dose of IV methylprednisolone 250 mg and one patient with HES received a single dose of oral prednisone 1 mg/kg. 
We then performed flow cytometry for Annexin V and 7-aminoactinomycin D (7-AAD) on circulating eosinophils sampled before and 120 minutes after glucocorticoid administration. 3. In vitro assessment of surface expression of key eosinophil migration proteins after glucocorticoid exposure (cohort 4). To assess whether the transcript-level changes observed in key eosinophil migration genes led to changes in protein abundance at the cell surface within the time frame of glucocorticoid-induced eosinopenia, we studied a fourth cohort, composed of six unrelated donors with normal eosinophil counts. Peripheral blood leukocytes from each subject were exposed in vitro to the glucocorticoid methylprednisolone (or vehicle, as a negative control). We then performed flow cytometry, gating on eosinophils, to evaluate changes in surface expression of the proteins CXCR4, CCR1 and CCR3. Figure 1 summarizes the study design. We anticipate that the RNA-seq, qPCR, and flow cytometry datasets described here can be used for hypothesis generation in studies of the baseline state of circulating human eosinophils and in studies of the mechanisms behind glucocorticoid-induced eosinopenia and glucocorticoid resistance.

Methods
These methods are expanded versions of descriptions in our related work 9 .

Human subjects
Patients with HE US were enrolled under NIH protocol NCT00001406. Patients with HES were enrolled under NIH protocol NCT01524536. Normal donors (ND) were enrolled under NIH protocols NCT02798523, NCT00001846, and NCT000090662. The Institutional Review Board of the National Institute of Allergy and Infectious Diseases at the National Institutes of Health approved each protocol. Informed consent was obtained from each subject prior to enrollment. Demographic information for each subject is provided in Table 1.
For the purpose of these studies, a normal donor was defined as an individual without a history of severe allergic reaction to glucocorticoids, autoimmune or autoinflammatory diseases, active solid or hematologic malignancy, diabetes mellitus, cancer chemotherapy within the previous 5 years, surgery within the previous 8 weeks, history of a recent infection (within the previous 30 days), a positive test for human immunodeficiency virus, hepatitis A, B or C virus infection, a history of parasitic, amebic, fungal or mycobacterial infections or other possible latent infections, a history of a bleeding disorder, vaccination within the previous 30 days, a body mass index (BMI) below 18 or above 35, pregnancy, or breastfeeding. Volunteers were not included in the study if they had taken any of the following in the 30 days prior to the screening visit: a glucocorticoid (including topical or inhaled), a nonsteroidal anti-inflammatory drug (including aspirin and selective COX-2 inhibitors), an anti-epileptic drug, an anticoagulant, a statin, a selective serotonin reuptake inhibitor, a macrolide, an azole, diltiazem, troglitazone, rifabutin, ranitidine, rifampin, quinine, quinidine, cyclosporine, amiodarone, St. John's wort, or immunosuppressive or immunomodulatory drugs. Baseline studies included a complete blood count, electrolytes, liver function tests, an interferon gamma release assay for latent tuberculosis infection, and an electrocardiogram. Values outside of the NIH Department of Laboratory Medicine normal reference range and deemed clinically significant by the principal investigator, or any condition that, in the investigator's opinion, may put the participant at undue risk, were also used as exclusion criteria.

Figure 1. Study design. In cohort 1, three subjects with HE US received a single weight-based dose of prednisone (1 mg/kg). Peripheral blood was collected pre- and post-treatment at 30, 60, and 120 min. Eosinophils were separately isolated from each sample. Eosinophil purity was measured by cytospin preparation stained with eosin and methylene blue. RNA was extracted from isolated eosinophils without further in vitro manipulation. Sequencing libraries were separately prepared for each sample, and subjected to RNA-seq. In cohort 2, circulating eosinophils were isolated from five additional unrelated subjects (four donors with normal eosinophil counts and one subject with HE US ). Purified eosinophils from each subject were cultured with media or 5 μM dexamethasone for 30, 60 and 120 min. RNA was extracted from purified eosinophils without further in vitro manipulation and subjected to qPCR. In cohort 3, three unrelated subjects (two donors with normal eosinophil counts and one patient with HES) were studied. Donors with normal eosinophil counts received a single dose of 250 mg of IV methylprednisolone and the patient with HES received a single weight-based dose of 1 mg/kg of oral prednisone. Peripheral blood was collected pre- and post-treatment at 120 min. Purified eosinophils were stained for apoptosis and cell viability with Annexin-V and 7-AAD, respectively, and analyzed by flow cytometry. In cohort 4, peripheral blood leukocytes (PBL) were isolated from whole blood collected from six unrelated donors with normal eosinophil counts. PBL were cultured with vehicle, 20 μM methylprednisolone or 200 μM methylprednisolone for 120 min. Surface expression of CXCR4, CCR1, and CCR3 was assessed by flow cytometry.

Blood collection
Peripheral blood was collected in Vacutainer EDTA blood collection tubes (Becton Dickinson, Cat No. 366643). For cohort 1, 10 cc peripheral blood was collected at each time point. For cohort 2, 100 cc peripheral blood was collected from normal donors and 60 cc peripheral blood was collected from the HE US patient. For cohort 3, 30 cc of peripheral blood was collected at each time point from normal donors and 20 cc peripheral blood was collected at each time point from the HES patient. For cohort 4, 30 cc of peripheral blood was collected.

Eosinophil purification and documentation of purity
At each time point, eosinophils were immediately purified from whole blood by negative-selection immunomagnetic purification with the MACSxpress Eosinophil Isolation Kit followed by the removal of residual erythrocytes with the Erythrocyte Depletion Kit (Miltenyi Biotec, Cat. Nos 130-104-446 and 130-098-196, respectively). The eosinophil fraction was counted, and eosinophil purity was determined by counting a minimum of 300 cells on a cytospin preparation stained with eosin and methylene blue (Kwik-Diff Solution, Thermo Fisher, Cat. No. 9990700). Eosinophil counts from whole-blood samples were obtained with a Siemens ADVIA 120 hematology system.

RNA extraction for RNA-seq
Eosinophils were isolated from the peripheral blood of each of the three subjects in cohort 1, as described above. Purified eosinophils were then centrifuged at 170 × g for 10 min at 4°C. The resulting pellets were resuspended and homogenized in 500 μL of TRIzol reagent (Thermo Fisher, Cat. No. 15596018) and stored at −80°C. On the day of RNA extraction, the samples were thawed and allowed to return to room temperature (RT). For every mL of TRIzol, 0.1 mL of the phase separation reagent 1-bromo-3-chloropropane (BCP) (Molecular Research Center, Inc. Cat. No. BP151) was added. The samples were then homogenized by vigorously shaking to an emulsion followed by mixing at RT in an Eppendorf Thermomixer 5436 (Millipore Sigma, Cat. No. Z368164) at 1200 rpm for 15 min. Samples were then centrifuged at 12,000 × g for 15 min at 4°C to separate the phases. The RNA, which is in the upper aqueous phase, was transferred to a new microcentrifuge tube and kept on ice. One back-extraction was performed by adding RNase-free water (one-half of the initial TRIzol volume) to the organic phase, mixing and centrifuging as above. The back-extracted aqueous phase was recovered and pooled with the initial aqueous phase. One volume of 100% ethanol was added to each sample, to precipitate the RNA. This was followed by column-based RNA purification with the RNA Clean & Concentrator-5 kit (Zymo Research; Cat. No. R1016). For this, half the volume of each sample (~600 μL) was transferred to a Zymo-Spin IC column. The columns were first centrifuged at 2,000 × g for 150 seconds (sec). The flow-through was reloaded on the column, centrifuged at 10,000 × g for 30 sec, then discarded. The two centrifugations were repeated for the remaining half of the sample on the respective IC columns. This was followed by one wash with RNA prep buffer and two washes with RNA wash buffer, following the manufacturer's instructions.
Two elutions of 10 μL each, with centrifugation at 10,000 × g for 1 min at RT, were performed with RNase-free water pre-heated to 94°C. The purified RNA samples were stored at −80°C.

RNA sequencing
Sequencing libraries were prepared with the TruSeq Stranded Total RNA with Ribo-Zero Gold Kit (Illumina, Cat. No. RS-122-2303), following the manufacturer's high-sample (HS) protocol. The input amount of total RNA per sample ranged from 0.4 to 1.5 μg with a mean of 1 μg and a standard deviation of 0.321 μg. The initial step is ribosomal RNA (rRNA) removal from total RNA. The Ribo-Zero Gold reagent depletes samples of both cytoplasmic and mitochondrial rRNA. The next steps are RNA fragmentation and first-strand cDNA synthesis. The latter is carried out in the presence of Actinomycin D, which specifically inhibits DNA-dependent, but not RNA-dependent, DNA synthesis 10 . This is followed by second-strand cDNA synthesis, adenylation of the 3' ends, adapter ligation, and enrichment of the

Data processing
The output of an Illumina sequencing run is a set of base call (.bcl) files. The .bcl files for this dataset were converted to read-level data in FASTQ format with bcl2fastq v.2.17.1.14 (Illumina, Inc.). Adapter sequences were trimmed with Cutadapt v.1.10 11 in Python v.2.7.9, with the following sequences as input: Adapter-trimmed reads under 20 base pairs were discarded. The adapter-trimmed FASTQ files were aligned to the reference human genome assembly (GRCh38) with Bowtie2 v.2.2.5 12 and TopHat v.2.0.14 13,14 . Because Bowtie2 is not haplotype-aware, haplotype sequences were excluded from the GRCh38 reference assembly to generate the Bowtie2 genome index file. The transcript annotation (GTF) file was obtained from GENCODE (release 23) 15 and was also modified to exclude all haplotype sequences prior to generation of the Bowtie2 transcriptome index file. Only fragments in which both paired-end reads were successfully aligned were kept. The binary alignment files (.bam) were then used for generation of a matrix of read counts with the featureCounts program of the package Subread v.1.5.1 16 . Paired-end exonic fragments were grouped at the level of genes, based on the GENCODE 23 annotation file.
Normalization was performed with the DESeq2 17 package in R v.3.3.1 18 . DESeq2 takes as input a matrix of unnormalized read counts, such as the one we generated with featureCounts for this dataset. This matrix (K) has the following format:
$$K = \begin{pmatrix} K_{11} & \cdots & K_{1m} \\ \vdots & \ddots & \vdots \\ K_{n1} & \cdots & K_{nm} \end{pmatrix}$$
where $K_{ij}$ is the read count for transcript $i$ (rows, $i = 1, \dots, n$) in sample $j$ (columns, $j = 1, \dots, m$). To normalize the read count data, DESeq2 uses the median-of-ratios method. In summary, the read count in transcript $i$ from sample $j$ is normalized by a factor $s_{ij}$ to account for differences in sequencing depth between samples. For this, DESeq2 first obtains a pseudo-reference sample for each transcript, by taking the geometric mean for the transcript across samples (row-wise):
$$K_i^{R} = \left( \prod_{j=1}^{m} K_{ij} \right)^{1/m}$$
It then divides each read-count value by the pseudo-reference sample for its transcript (excluding pseudo-reference sample values of zero), obtaining the following ratio:
$$\frac{K_{ij}}{K_i^{R}}$$
For each sample (column-wise), it then calculates the median of the ratios, to get the size factor $s_j$. All values within a sample are then divided by the same size factor, so that $s_{ij} = s_j$. The resulting values are the normalized read counts. A matrix of normalized read counts for this dataset, obtained as described above, is available as a supplementary file with the GEO upload of this dataset (GSE111789). Table 2 shows the RNA-seq data processing pipeline, including the versions of all software and the specific variables and parameters used to generate, test, and process the dataset.
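As an illustration only, the median-of-ratios normalization described above can be sketched in a few lines of Python with NumPy. This is not the DESeq2 implementation (which is an R package), and the function names `median_of_ratios_size_factors` and `normalize_counts` are hypothetical names chosen for this sketch. It assumes a count matrix with transcripts in rows and samples in columns, and it drops transcripts whose pseudo-reference value is zero, as stated in the text.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Return one size factor per sample (column) of a transcript x sample matrix.

    Illustrative sketch of the median-of-ratios idea; not the DESeq2 code.
    """
    counts = np.asarray(counts, dtype=float)
    # A transcript with a zero count in any sample has a geometric mean of zero,
    # so it is excluded from the size-factor calculation.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Row-wise geometric mean, computed in log space for numerical stability.
    log_pseudo_reference = log_counts.mean(axis=1)
    # Log of the ratio K_ij / K_i^R for every usable transcript and sample.
    log_ratios = log_counts - log_pseudo_reference[:, None]
    # Column-wise median of the ratios gives the size factor s_j.
    return np.exp(np.median(log_ratios, axis=0))


def normalize_counts(counts):
    """Divide every column of the count matrix by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)


# Toy example: the second sample was sequenced at twice the depth of the first.
toy = np.array([[10, 20], [100, 200], [5, 10], [0, 7]])
size_factors = median_of_ratios_size_factors(toy)
normalized = normalize_counts(toy)
```

In the toy example, the two samples differ only by a factor of two in depth, so the normalized counts of the first three transcripts coincide across samples. The last transcript has a zero count and therefore does not contribute to the size factors, although it is still divided by them.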

Code availability

In vitro cell culture system for qPCR
Eosinophils were isolated from the peripheral blood of each of the five subjects in cohort 2, as described above. Purified eosinophils were then centrifuged at 170 × g for 10 min at 4°C.

Analysis of eosinophil apoptosis and viability by flow cytometry
Eosinophils were isolated from the peripheral blood of each of the three subjects in cohort 3, as

Data Records
The RNA-seq dataset is deposited in Gene Expression Omnibus (GEO) under series number GSE111789 (Data Citation 1). The GEO entry includes links to the raw data in FASTQ format, which is deposited in the Sequence Read Archive (SRA) under SRP135489 (Data Citation 2). A file with processed data is provided with the GEO series record. The values correspond to read counts normalized by library size. The file contains a table with 58,765 rows and 13 columns. The first row is a header row, and it is followed by 58,764 rows, each corresponding to one transcript. The first column has the transcript identifiers and it is followed by 12 columns, one for each sample. Table 3 provides a description of each sample and its respective GEO sample accession (GSM) number. The qPCR dataset is deposited in the figshare database (Data Citation 3). Table 4 provides a description of each sample. The apoptosis flow cytometry experiment files for cohort 3 each contain five or six .fcs files corresponding to each condition (unstained or stained) at each time point (baseline, 60 minutes and/or 120 minutes). Table 5 provides a description of each sample and its respective .fcs file name as uploaded to FlowRepository under Repository ID FR-FCM-ZYNE (Data Citation 4). The surface marker flow cytometry experiment files for cohort 4 each contain seven or eight .fcs files corresponding to each condition (unstained or stained) at each time point (baseline or 120 minutes) with each treatment (none, vehicle, 20 μg/dL methylprednisolone, and 200 μg/dL methylprednisolone). Table 6 provides a description of each sample and its respective .fcs file as uploaded to FlowRepository under Repository ID FR-FCM-ZYND (Data Citation 5).

Technical Validation

Eosinophil purity
Eosinophil purity was defined as the proportion of eosinophils among all counted leukocytes on a cytospin preparation. A minimum eosinophil purity of 98% was documented in all samples.

RNA quality control
Quality of the isolated RNA samples for RNA-seq was assessed by microfluidic electrophoresis on an Agilent 2100 Bioanalyzer system (Agilent, Cat. No. G2939A), with RNA 6000 Nano chips (Agilent, Cat. No. 5067-1511). Each electropherogram was manually reviewed. As a second measure of quality, we used the RNA integrity number (RIN), which is calculated by a proprietary algorithm of Agilent Technologies. This algorithm was developed based on manual grading of electropherograms and machine learning, and it offers a quantitative assessment of RNA quality. This assessment is based on a numbering system from 1 to 10, with 1 being the most degraded profile and 10 being the most intact.
The total RNA samples in this dataset had a mean RIN of 9 with a standard deviation of 0.439. A representative sample from this study, with a RIN of 9, is presented in Fig. 2a. The samples had concentrations in the range of 5 to 509 ng/μL, with a mean of 122 ng/μL and standard deviation of 128 ng/μL.

RNA-seq library validation
The size distribution and quality of the dsDNA sequencing libraries were analyzed by microfluidic electrophoresis on an Agilent Technologies 2100 Bioanalyzer (Agilent, Cat. No. G2939A), with DNA 1000 chips (Agilent, Cat. No. 5067-1504). The size distribution was consistent for all the samples, with a mode length of 300 bp. A representative library from this study is displayed in Fig. 2b. The mean library

Quality control of the sequencing reads
We performed quality control of the sequencing read files for this dataset (in FASTQ format), after adapter-sequence trimming, with the software FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc). We are not aware of a single specific criterion that reliably measures the quality of a FASTQ file. However, FastQC provides several data displays that are useful for identifying outlier samples. We relied primarily on the following three: a. The distribution of quality scores for each base across all reads in a file (a representative plot from this dataset is in Fig. 2c). b. The distribution of the mean quality scores across all reads in a file (a representative plot from this dataset is in Fig. 2d). c. The sequence content (proportion of each of the four nucleotides A, T, C, and G) for each base across all reads in a file (a representative plot from this dataset is in Fig. 2e).
In the example displayed in Fig. 2c, which is representative of this dataset, only one base position had an interquartile range extending below a quality score of 28. At every other position, at least 75% of the reads therefore have quality scores above 28, which corresponds to a probability of no more than 0.0016 that a base call is wrong. Figure 2d shows a mode quality score of 35, indicating on average a very low probability (around 0.0003) of a base call being wrong. Based on this type of analysis, the sequencing reads for all FASTQ files were of high quality, and we did not identify any outlier samples in the dataset.
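The error probabilities quoted above follow from the standard Phred definition of a quality score, Q = −10 log10(e), where e is the estimated probability that a base call is wrong. The short Python sketch below shows the conversion in both directions; the function names are hypothetical and are not part of FastQC.

```python
import math


def phred_to_error_probability(quality_score):
    """Convert a Phred quality score Q into the error probability e = 10^(-Q/10)."""
    return 10 ** (-quality_score / 10)


def error_probability_to_phred(error_probability):
    """Convert an error probability e into the Phred quality score Q = -10*log10(e)."""
    return -10 * math.log10(error_probability)


# The two thresholds discussed in the text.
p_q28 = phred_to_error_probability(28)
p_q35 = phred_to_error_probability(35)
```

Rounded to four decimal places, `p_q28` and `p_q35` reproduce the values of 0.0016 and 0.0003 given in the text.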

Alignment statistics
The distribution of the proportion of all adapter-trimmed reads that were successfully aligned (mapped) to the reference genome among the 12 samples in this study is presented in Fig. 2f. This percentage of aligned reads (referred to in TopHat2 as the overall mapping rate) had a mean of 83.23% with a standard deviation of 1.05%. Of the aligned reads from each sample, the proportion of reads that did not align uniquely to the reference genome (multi-mappers) had a mean of 10% and a standard deviation of 1.98%. The distribution of the percentage of multi-mappers in this study is presented in Fig. 2g. The total number of aligned read pairs (where each read pair is counted once) had a mean of 16,730,000 and a standard deviation of 3,310,000. The distribution of aligned read pairs for the 12 samples in this study is presented in Fig. 2h.

Correlation among biological replicates
To assess the level of similarity in the baseline transcriptome of circulating eosinophils from the three unrelated patients studied, we plotted the normalized read count values at baseline for all expressed transcripts in each of the three pairwise comparisons (Fig. 3a). An expressed transcript was defined as a transcript with a read count > 0 in at least one of the two samples being compared. The baseline eosinophil transcriptomes of the three unrelated patients were similar, with Spearman's rank correlation coefficients of 0.94 to 0.95 (Fig. 3a).
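The pairwise comparison described above can be sketched as follows. This is an illustrative Python example, not the code used for the study: the helper names are hypothetical, the input vectors are toy values, and Spearman's coefficient is computed as the Pearson correlation of average ranks after keeping only transcripts that are expressed (read count > 0) in at least one of the two samples.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_values = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(values) and sorted_values[end + 1] == sorted_values[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman_expressed(sample_a, sample_b):
    """Spearman correlation over transcripts expressed in at least one sample."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    expressed = (a > 0) | (b > 0)
    rank_a = average_ranks(a[expressed])
    rank_b = average_ranks(b[expressed])
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Toy normalized counts for five transcripts in two samples.
rho = spearman_expressed([0, 1, 2, 3, 0], [0, 2, 4, 9, 5])
```

In the toy data, the first transcript has a count of zero in both samples and is excluded before ranking, so the coefficient is computed over the remaining four transcripts.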
To evaluate whether the overall change in the eosinophil transcriptome induced by the glucocorticoid stimulus was similar in the three patients at the three post-treatment sampling times, we performed principal component analysis (PCA) on the count data at each time point. For this, we first obtained data in the log scale normalized for library size, by applying the regularized log transformation (rlog) method as implemented in DESeq2 17 . This method avoids a common property of the standard logarithm transformation, which is the spreading apart of data for genes with low counts. The rlog transformation behaves similarly to a log2 transformation for genes with high counts but shrinks the values for genes with low counts. In short, this method fits a model:
$$\log_2 q_{i,j} = \beta_{i,0} + \beta_{i,j}$$
where $q_{i,j}$ is a parameter proportional to the true read count of transcript $i$ in sample $j$, $\beta_{i,0}$ is an intercept which does not undergo shrinkage, and $\beta_{i,j}$ is a sample-specific effect, which is shrunk towards zero based on the trend of the dispersion across read-count values in the samples. The goal of performing this transformation prior to PCA is to render the data homoscedastic. The PCA plot of the rlog-transformed data for the 12 samples in this study is presented in Fig. 3b. All patients showed a decrease with respect to PC2 over time and a decrease with respect to PC1, except for a slight increase in patient 2 from baseline to 30 min. This suggests that the overall effect of the glucocorticoid over time in the eosinophil transcriptome was similar in the three patients studied.
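For readers who want to explore the sample-level structure of the count matrix outside of R, a minimal PCA sketch in Python is shown below. It is not the analysis performed for this study: in particular, it substitutes a plain log2(count + 1) transformation for the rlog transformation of DESeq2, so low-count transcripts are not shrunk and the resulting plot will not match Fig. 3b exactly. The function name `pca_sample_scores` and the simulated input are assumptions made for illustration.

```python
import numpy as np


def pca_sample_scores(normalized_counts, n_components=2):
    """Project samples onto the leading principal components.

    normalized_counts: transcript x sample matrix of normalized read counts.
    Uses log2(count + 1) as a simple stand-in for the rlog transformation.
    """
    log_counts = np.log2(np.asarray(normalized_counts, dtype=float) + 1.0)
    samples = log_counts.T  # one row per sample, one column per transcript
    centered = samples - samples.mean(axis=0)
    # Singular value decomposition of the centered matrix gives the PCA solution.
    u, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * singular_values[:n_components]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, explained[:n_components]


# Simulated counts: 200 transcripts in 12 samples (placeholder data only).
rng = np.random.default_rng(0)
simulated = rng.poisson(50, size=(200, 12))
scores, explained = pca_sample_scores(simulated)
```

Each row of `scores` holds the PC1 and PC2 coordinates of one sample, and `explained` holds the fraction of variance captured by each of the two components.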

Quality control of qPCR data
A no-template control was used to verify that no signal was produced from the reagents with water alone. The gene encoding the 18S subunit of ribosomal RNA was used as a reference gene to normalize all qPCR results. For qPCR, TaqMan primer/probe sets were designed to span at least one intron-exon boundary; therefore, DNA containing introns was excluded. The gene TSC22D3, which is known to increase in response to glucocorticoids in multiple cell types, was included as a positive control in qPCR experiments. The expected increase in TSC22D3 expression was observed in all glucocorticoid-treated samples and in none of the vehicle-treated samples.

Quality control of the flow cytometry data
Autofluorescence in each sample set was established with an unstained control. Single-color controls using PBL at baseline stained with each fluorophore were used to account for spillover and assign a compensation matrix to the samples. Fluorescence minus one (FMO) controls were used to assign gates. Isotype controls were used as negative controls to ensure that non-specific staining did not occur. All isotype controls were negative for their corresponding fluorochrome.

The quality scores on the y-axis are defined as $-10\log_{10} e$, where $e$ is the estimated probability of a base call being wrong. Therefore, a quality score of 30 means that the estimated probability of a base being wrong is 1/1000. For each position in the sequenced reads, the corresponding box plot displays the distribution of quality scores across all the sequences in a FASTQ file. In each box plot, the red line displays the median, the yellow box the interquartile range (25-75%), and the lower and upper whiskers the 10th and 90th percentiles, respectively. The blue line displays the mean quality scores. (d) Distribution of the mean quality score by read for a representative FASTQ file from this study. (e) Sequence content in a representative FASTQ file from this study. For each position in the sequenced reads, the sequence content is the proportion of each of the four nucleotides (A, T, C, and G) at that position. (f) Distribution of mapping rates for the 12 samples in this study. The mapping rate is defined as the percent of reads in each sample that were uniquely aligned to the reference genome. (g) Distribution of the percentage of multi-mappers for the 12 samples in this study. This represents the percentage of mapped reads that aligned to more than one location in the reference genome. (h) Distribution of the number of aligned read pairs for the 12 samples in this study. Each read pair is counted once.

Usage Notes
GEO requests the upload of untrimmed FASTQ files, which are made publicly available through SRA. Therefore, the publicly available raw data files for this study have untrimmed, 2 × 94 bp reads. As described in the Methods section and in Table 2, we chose to trim adapter sequences prior to alignment. We recommend that investigators using our data, particularly those trying to reproduce our results, do the same.