Profiling the small non-coding RNA transcriptome of the human placenta

Proper functioning of the human placenta is critical for maternal and fetal health. While microRNAs (miRNAs) are known to impact placental gene expression, the effects of other small non-coding RNAs (sncRNAs) on the placental transcriptome are not well-established, and are emerging topics in the study of environmental influence on fetal development and reproductive health. Here, we assembled a cohort of 30 placental chorionic villi samples of varying gestational ages (M ± SD = 23.7 ± 11.3 weeks) to delineate the human placental sncRNA transcriptome through small RNA sequence analysis. We observed expression of 1544 sncRNAs, which include 48 miRNAs previously unannotated in humans. Additionally, 18,003 miRNA variants (isomiRs) were identified from the 654 observed miRNA species. This characterization of the term and pre-term placental sncRNA transcriptomes provides data fundamental to future investigations of their regulatory functions in the human placenta, and the baseline expression pattern needed for identifying changes in response to environmental factors, or under disease conditions. Measurement(s) small non-coding RNA Technology Type(s) small RNA sequencing assay Sample Characteristic - Organism Homo sapiens Measurement(s) small non-coding RNA Technology Type(s) small RNA sequencing assay Sample Characteristic - Organism Homo sapiens Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.14233229


Background & Summary
The placenta is essential for the maintenance of pregnancy and the regulation of fetal growth and development 1 . Regulation of gene expression through epigenetic modifications 2 and transcription factor availability 3 is critical for healthy placental functioning [4][5][6] .
Small non-coding RNAs (sncRNAs) have the ability to regulate gene expression through a variety of epigenetic and post-transcriptional mechanisms 7 . Certain sncRNA subtypes, including microRNAs (miRNAs), have been associated with gene deregulation in pregnancy-associated diseases, including preeclampsia 8 . Specifically, miRNAs originating from the chromosome 14 and 19 miRNA clusters (C14MC and C19MC, respectively) have established contributions to placental gene regulation [9][10][11] .
However, studies of placental sncRNAs have focused almost exclusively on canonical miRNAs 12,13 , and not on other RNA species, such as PIWI-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and miRNA variants (isomiRs) that can also influence the genetic and epigenetic regulation of transcription [14][15][16] . Furthermore, past efforts to identify human miRNAs have typically prioritized those that are highly expressed across multiple tissues 17 . MiRNA discovery efforts that focused on individual tissues have successfully identified novel, tissue-specific miRNAs that had previously been overlooked 18,19 , but the placenta has yet to be studied in this fashion. As a resource for future placental biology investigations, we have profiled and quantified the expression of annotated sncRNAs and determined the expression pattern of novel (previously-unannotated) miRNAs within the human placenta.
We isolated and sequenced the small RNA fractions of 32 placental (chorionic villi) samples of varying fetal sexes and gestational ages, 30 of which met our threshold for total high-quality reads (Table 1). Trimmed sequencing reads were input into the miRMaster platform, which performs quality filtering, aligns reads to annotated sncRNAs, quantifies sncRNA expression, and predicts novel miRNA sequences 20 . We considered a sncRNA to be 'placentally expressed' if it was present at ≥ 1 read per million (RPM) in at least 10% (3/30) of the placental samples. We considered a sncRNA to be expressed during a given trimester if it was present at ≥ 1 RPM in at least 10% of samples from that trimester. Raw data are made available for investigating sncRNA expression related to specific biological features 21 .
A total of 654 miRNAs were placentally expressed, along with 277 piRNAs, 231 snoRNAs, 161 snRNAs, and 221 tRNAs (Fig. 1a). For all trimesters, miRNA reads made up a large majority (88-93%) of the total sncRNA reads (Fig. 1b). Of the 654 identified miRNAs, 48 were novel miRNA sequences (Online-only Table 1, GSE164178) 21 . In both individual features, such as length and GC content, and composite metrics, such as novoMiRank score 22 and miRMaster-computed probability of their precursor being a true precursor, these novel miRNAs closely resemble the annotated placentally expressed miRNAs (Table 2).
In addition, there were 18,003 distinct isomiRs, which are natural variations of canonical miRNAs 23 , that were placentally expressed 21 . Additions or deletions of nucleotides at the 3′ end of isomiRs were far more prevalent (72% of all placentally expressed isomiRs) than at the 5′ end (18%) ( Table 3). The relative proportions of isomiR types were similar across trimesters, but a greater total number of isomiRs were expressed during Trimesters 1 and 2 (17,870 and 16,541, respectively), than in term samples (12,444) (Table 3).
For each sncRNA subtype, we calculated the fold change in RPM values for each trimester relative to the average RPM for all samples. Expression of miRNAs peaks in second trimester samples, while expression of piRNAs, snRNAs, snoRNAs, and tRNAs is lowest in second trimester samples (Fig. 1c). PiRNA expression is highest in first trimester samples, while snRNA, snoRNA, and tRNA expression is highest in term samples (Fig. 1c).
SncRNAs of all subtypes, including novel miRNAs, were found to be expressed from almost all chromosomes (Fig. 2a). Regions of high miRNA expression were found on chromosomes 14 and 19, at the well-characterized C14MC and C19MC clusters (Fig. 2b). www.nature.com/scientificdata www.nature.com/scientificdata/

Methods
Sample acquisition. This study used 32 de-identified chorionic villi samples collected at the BC Women's Hospital and Health Centre from first trimester, second trimester, and term pregnancies, including 14 that had been previously collected 24 . Thirty of the samples had sufficient (> 1 million) high-quality sequencing reads to be included in further analysis. First trimester samples were obtained from elective terminations, while second trimester terminations were due to various fetal demise conditions, including but not limited to anencephaly, spina bifida, and preterm membrane rupture. Mode of delivery for 3/9 term samples was caesarean section. Cases with   Table 2. Comparison between annotated and novel placentally expressed miRNAs. All values are given as the mean, plus or minus the standard deviation. *The novoMiRank score and miRMaster-assigned probability values for annotated placentally expressed miRNAs were calculated from the annotated miRNAs that miRMaster also predicts 20 to be true miRNAs (540 out of 606). † The novoMiRank score and miRMaster-assigned probability values for novel placentally expressed miRNAs were calculated from all (48) novel miRNAs predicted by miRMaster.
www.nature.com/scientificdata www.nature.com/scientificdata/   www.nature.com/scientificdata www.nature.com/scientificdata/ known chromosome abnormalities were excluded. Six of the samples were from fetuses that were phenotypically classified as having neural tube defects (Table 1).
For all cases ascertained before the termination of pregnancy, written consent was obtained. For all cases obtained retrospectively from pathological autopsy specimens, biospecimens were de-identified and all links to clinical data were removed. No identifiable information for any cases is presented in this publication. Ethics approval was obtained from the joint University of British Columbia/Children's Hospital and Women's Health Centre of British Columbia Research Ethics Board (H10-01028, H16-02280, and H04-70488).
After placental membrane removal, 30 mg of chorionic villi was sampled from the fetal-facing side of the placenta. Processing time after delivery ranged from 1-192 hours, and samples were RNAlater preserved. During the time of extraction, excess RNAlater was removed by blotting with Kimwipes (Kimberly-Clark, USA), after which samples were homogenized in the Next Advance Bullet Blender Tissue Homogenizer (Next Advance, USA), using the 3.2 mm Stainless Steel Beads (Next Advance, USA), with 1 ml of TRIzol reagent (ThermoFisher Scientific, USA). Samples were then incubated for 5 minutes at room temperature (RT), and then centrifuged at 7000 rpm for three minutes. 200 ml of chloroform (ThermoFisher Scientific, USA) was added to the supernatant, which was thoroughly mixed by inversion, and incubated for five minutes at RT. Samples were centrifuged at 4 °C at 9000 rpm for 20 minutes, and 500 μl of isopropanol (ThermoFisher Scientific, USA) was added to the aqueous phase, which was mixed by inversion and incubated for 10 minutes at RT. Samples were further centrifuged at 4 °C at 9000 rpm for 15 minutes. Supernatant was discarded and RNA pellet was washed in 75% ethanol (Commercial Alcohols, diluted with Ultrapure Distilled Water-RNAse/DNAse free, Gibco-LifeTechnologies) by gently inverting. Samples were centrifuged at 4 °C at 9000 rpm for 10 minutes. Supernatant was discarded, and RNA pellet was air-dried for five minutes at RT. RNA was eluted in 50 μl of nuclease-free water (Ultrapure Distilled Water-RNAse/ DNAse free, Gibco-LifeTechnologies). Genomic DNA removal was carried out using the RNase-Free DNase Set (Qiagen, Germany). RNA concentration was measured on a Nanodrop 2000 (ThermoFisher Scientific, USA). RNA quality was assayed on an Agilent Bioanalyzer 2100 (Agilent, USA). Prior to sequencing, small RNA fractions were depleted of ribosomal RNA by hybridization, using the NEBNext rRNA Depletion Kit (New England BioLabs, USA).  www.nature.com/scientificdata www.nature.com/scientificdata/ Sequencing and quality control. Samples were sequenced at Canada's Michael Smith Genome Sciences Centre in Vancouver, using their standard ribodepleted strand-specific RNA (ssRNA) sequencing protocol 25 . This protocol includes plate-based ssRNA library construction, followed by sequencing on an Illumina HiSeq 2500 using the 3′ TruSeq small RNA adapter. No negative sequencing controls or positive spike-in controls were used.
Sequencing reads were subjected to a series of quality control steps, in order to trim adapters and discard reads that were < 16 nucleotides. Trimmed reads (FASTQ) were processed through the miRMaster platform (accessed on Mar. 2018), under default settings 20 . Reads with a Phred quality score < 20 were discarded, and samples with < 1 million remaining reads were excluded from further analysis (2/32 samples).

Detection of annotated sncRNAs.
MiRMaster maps reads to the human genome (hg38) using Bowtie 2 and assigns them to different classes of sncRNAs (miRNA, piRNA, snRNA, snoRNA, or tRNA). Reads that nearly mapped to annotated miRNAs (miRBase v21), with 5′ or 3′ additions or deletions of nucleotides and up to two mismatches, were classified as isomiRs. Reads were then quantified by miRMaster and scaled on a per sample basis by units of reads per million. For sncRNA sequences that mapped to multiple locations in the genome, only the reads derived from the locus with the highest mean expression were retained. Sequences expressed at ≥ 1 RPM in ≥ 10% (3/30) of the samples were considered to be 'placentally expressed' . www.nature.com/scientificdata www.nature.com/scientificdata/ Discovery of novel (previously-unannotated) miRNAs. All reads not aligning with annotated sncR-NAs were assessed by miRMaster, using a machine learning algorithm trained to classify sequences as true or false miRNA precursors. MiRMaster employs the AdaBoost algorithm, trained on a set of 216 miRNA features, including nucleotide ratios, free energy metrics, and folding metrics 20 . Prospective miRNA precursors were also scored by novoMiRank. These scores represent the extent to which a prospective precursor differs from early miRBase-catalogued precursors in 24 features, including nucleotide composition, loop length, and the genomic proximity of other miRNA precursors 22 . Prospective miRNA precursors were filtered according to their miRMaster-assigned probability of being a true precursor (≥ 65%) and their novoMiRank score (≤ 1.5), and the corresponding prospective miRNAs were filtered by their expression level (≥ 1 RPM in ≥ 10% of samples). For novel miRNA sequences that could be derived from multiple prospective miRNA precursors, only the sequence derived from the prospective precursor with the highest miRMaster-assigned probability of being a true precursor was retained.

Data Records
FASTQ files containing raw sequencing reads can be accessed through the NCBI Sequence Read Archive 26 . CSV files detailing the reads per million expression values of sncRNAs in each placental sample can be accessed through the Gene Expression Omnibus 21 . Data are provided for all sncRNAs and isomiRs with at least one read in one sample, not only those that were placentally expressed. Similarly, data are provided for all candidate novel miRNAs, including those that did not pass filters for expression, novoMiRank score, or miRMaster-assigned probability of being a true precursor. Each sample has a separate file for expression of annotated sncRNAs, novel miRNAs, and isomiRs.
In order to ensure the accuracy of sequencing reads, reads with Phred scores < 20 were discarded. Prior to this filtering, FastQC v0.11.9 was used to assess the overall sequencing quality for each sample 27 . The mean (SD) pre-filtering Phred score for individual samples was 31.83 ± 2.99 (Table 4). Samples with the library ID 'MX1355' (n = 5) possessed a lower mean (SD) Phred score of 25.37 ± 0.67, which most likely represents a batch effect ( Table 4). The median sample yielded an average Phred score of > 20 at positions 1-27 of the trimmed reads, indicating that mature miRNAs (length 18-25 nt) were being accurately quantified (Fig. 3a). Reads with average Phred scores of 35-37 are the most abundant in all samples (Fig. 3b). The mean (SD) GC content for individual samples was 48.7 ± 1.5% (Table 4).
To confirm that the six second trimester samples from fetuses with neural tube defects did not have dramatically different non-coding transcriptomes from the other second trimester samples, multidimensional scaling using Principal Component Analysis was performed on their expression of placentally expressed sncRNAs. When plotted for the first two principal components, the samples with neural tube defects were not distinct from the other second trimester samples (Fig. 3c). The possibility that some preterm samples would have developed observable neural tube defects or other placental or fetal dysfunctions had the pregnancies progressed further cannot be excluded. However, the lack of outliers in the first two principal components indicates, based on the non-coding transcriptomic data that we present, that none of the samples were significantly altered at the time of sampling (Fig. 3c).

Code availability
All data for this project were processed in MATLAB R2017b. The code used to process these data has been deposited in the Figshare repository, and is publicly available 28 .