Comparative, transcriptome analysis of self-organizing optic tissues

Embryonic stem (ES) cells have a remarkable capacity to self-organize complex, multi-layered optic cups in vitro via a culture technique called SFEBq. During both SFEBq and in vivo optic cup development, Rax (Rx) expressing neural retina epithelial (NRE) tissues utilize Fgf and Wnt/β-catenin signalling pathways to differentiate into neural retina (NR) and retinal-pigmented epithelial (RPE) tissues, respectively. How these signaling pathways affect gene expression during optic tissue formation has remained largely unknown, especially at the transcriptome scale. Here, we address this question using RNA-Seq. We generated Rx+ optic tissue using SFEBq, exposed these tissues to either Fgf or Wnt/β-catenin stimulation, and assayed their gene expression across multiple time points using RNA-Seq. This comparative dataset will help elucidate how Fgf and Wnt/β-catenin signaling affect gene expression during optic tissue differentiation and will help inform future efforts to optimize in vitro optic tissue culture technology.


Background & Summary
During development, rx expressing neural retina epithelium (NRE) differentiates into neural retina (NR) and retinal pigmented epithelium (RPE) 1 (Fig. 1a), tissues with distinct morphologies and gene expression patterns (Fig. 1b,c). For instance, NR tissues show a comparatively thickened morphology, express the transcription factor gene chx10 (also called vsx2), sustain a high expression level of rx, and eventually perform the light-sensing duties of the mature eye 2 (Fig. 1b). Conversely, the RPE is a comparatively thin, pigmented monolayered epithelium that expresses the transcription factor Mitf 1 (Fig. 1b) and other pigmented cell markers such as tyrosinase (tyr) (note, Mitf is also expressed in pigmented cells such as melanophores and peripheral retinal components such as the pigmented ciliary body 3 ).
Amazingly, embryonic stem cells have the capacity to recapitulate optic cup formation via a culture technique called SFEBq (serum-free floating culture of embryoid body-like aggregate with quick reaggregation) [4][5][6][7] (Fig. 1d, Supplementary Fig. 1a). SFEBq-generated optic cups, although not totally analogous to their in vivo counterparts, facilitate the self-organization of Chx10+ NR-like and Mitf+ RPElike tissues 7 (Fig. 1e,f). In this way, SFEBq provides a convenient in vitro method to generate NR-like and RPE-like tissue for further analyses.
The principal goal of this study, thus, was to utilize RNA-Seq in combination with SFEBq in order to better understand how Fgf and Wnt/β-catenin signalling affect the transcriptome of Rx+ optic progenitor tissue. Towards this end, we utilized a previously established Rx::GFP reporter mouse ESC line, allowing us to monitor the generation of Rx+ SFEBq tissue in realtime 6,7 .
Using the Rx::GFP reporter line and a Wnt/β-catenin signaling reporter 'TOP::DsRed' 15 , we confirmed that SFEBq tissue with relatively high Wnt/β-catenin signaling correlated with RPE-like characteristics, such as Mitf expression and a comparatively thin tissue morphology (Fig. 2b,c). Consistently, we found that exposure of Day 10 Rx::GFP+//TOP::DsRed tissue explants to CHIR99201 (a chemical agonist of Wnt/β-catenin signaling 16 , a treatment hereon simply referred to as 'Wnt stimulation') strongly activated the TOP::DsRed reporter by Day 12 and resulted in tissue displaying RPE-like morphology by Day 15 (Fig. 2d, Data Citation 1). Conversely, exposing Day 10 Rx::GFP+ tissue explants to Fgf stimulating conditions resulted in highly expressing Rx::GFP+ tissue that displayed NR-like morphology by Day 15 (Fig. 2d, Data Citation 1).
We further analyzed these Day 15 Wnt or Fgf stimulated tissues via immunohistochemistry. Day 15 Wnt stimulated tissue was majority Mitf+, whereas Fgf stimulation produced tissue that was majority Chx10+ (Fig. 2e,f). In addition, we found that Fgf stimulation but not Wnt stimulation allowed the appearance of postmitotic retinal ganglion cells as evidenced by expression of Pou4f2 (also called Brn-3b 17,18 ), a marker that was not present in Day 10 Rx::GFP+ tissue ( Supplementary Fig. 2a). However, it is important to note that some Fgf stimulated aggregates displayed a small portion of Mitf+ tissue (Fig. 2f), and Wnt stimulated tissue was not 100% positive for Mitf (Fig. 2e). Thus, Wnt and Fgf stimulating conditions produce Day 15 tissue aggregates that are majority, but not absolutely, RPE-like and NR-like in identity.
We next performed RNA-Seq analyses to measure the gene expression changes in Day 10 Rx::GFP+ tissue following Wnt or Fgf stimulation. We collected five groups of samples in biological triplicate ( Fig. 3a): Day 10 Rx::GFP+ tissue (i.e., the starting material, Group 1); Day 12 and Day 15 Fgf stimulated tissue (Day 12 +Fgf and Day 15 +Fgf, Groups 2 and 4); Day 12 and Day 15 Wnt/β-catenin stimulated tissue (Day 12 +Wnt and Day 15 +Wnt, Groups 3 and 5). We then extracted high-quality total RNA from these samples and prepared paired-end libraries for sequencing on an Illumina HiSeq platform (Table 1,  Table 2, Fig. 3b). This approach generated on average~20 million paired-end reads per sample, and all samples possessed a suitable level of read quality and a high mapping rate (Fig. 3c, Technical Validation, Table 3).
Ultimately, this RNA-Seq analysis measured~18000 gene expression level changes, revealing significant differences in gene expression profiles between the groups (Fig. 3d). In the RNA-Seq data, we examined the expression patterns of some known NR, RPE, Wnt/β-catenin-target, and Fgf-target genes (Fig. 3e, see Technical Validation). Notably, the RNA-Seq data produced gene expression patterns that correlated with the immunohistochemical analyses of Chx10 (Vsx2), Mitf, and Pou4f2 (Supplementary Fig. 2b-f). Nevertheless, it's important to note that our RPE-like and NR-like tissue samples were generated in vitro, and thus, our dataset would not be expected to completely mirror in vivo NR and RPE gene expression patterns.
In conclusion, our dataset (GSE62432, Data Citation 2, Supplementary Table 1) is a genetic analysis of how Rx+ tissue responds to Fgf and Wnt/β-catenin signaling pathways as it differentiates towards NRlike and RPE-like tissue. This data may be helpful for future work in optimizing in vitro optic tissue engineering as well as future studies examining the developmental and cellular biology of eye. For instance, with these data, we can ask questions such as: In vitro generated optic tissues have been shown to integrate into host eyes following transplantation and thus hold immense potential in future cell replacement therapies [19][20][21] . In this regard, our dataset may

Generation of NR-like and RPE-like tissues using SFEBq
SFEBq and optic cup culture was performed using Rx::GFP murine ES cells 6 according to the protocol described by Eiraku and colleagues 22 with some minor modifications (detailed graphical overview of the SFEBq protocol and optic tissue culture protocol used in this study can be found in Supplementary Fig.  1a-j). For instance, unlike Eiraku and colleagues (2011), we cultured Rx::GFP murine ES cells in the presence of 2i conditioned media (i.e., ES cell media containing 3 μM CHIR99201 and 1 μM PD 0325901) due to its reported effect of promoting a uniform 'ground state' within ES cells 23 , and in addition, it had previously been shown that 2i culture improves ES cell differentiation rate of some neuronal lineages 24 . However, because the CHIR99201 compound 16 induces Wnt/β-catenin signaling via inhibition of GSK3, a property that we later utilized to promote the RPE-like tissue differentiation pathway in Day 10 Rx::GFP + tissue explants, it was important to confirm that 2i culture of ES cells did not bias ES cells towards a specific tissue fate prior to differentiation experiments. Towards this aim, we used RT-qPCR to examine Day 10 SFEBq aggregates for rx, vsx2, and mitf expression, finding no significant differences between aggregates generated from 2i-cultured or LIF-only cultured ES cells (Supplementary Fig. 2k).
Like Eiraku and colleagues 22 Supplementary Fig. 1l), and culturing the self-organizing Day 10 Rx::GFP+ tissue without exogenous Wnt of Fgf stimulation (i.e., RMM only) produces Day 15 aggregates that contain a comparatively heterogeneous mix of Mitf+, Chx10+, and Pou4f2+ tissues (Supplementary Fig. 1m). The endogenous expression of Fgf and Wnt ligands in Day 10 Rx::GFP+ tissue means that exogenous Wnt or Fgf stimulation may not completely negate some Fgf or Wnt signaling events in these samples, respectively.

Immunohistochemistry and live-imaging
Sectioning and immunohistochemistry was performed as previously described 25 . Antibodies were used as follows: Rx: (rabbit, 1:1000, custom 26  RNA extraction, cDNA synthesis, and RT-qPCR RNA was extracted using the RNeasy kit (Qiagen) using the company-provided protocol. SFEBq aggregates were added to 700 μl buffer RLT and spun through QIAshredder (Qiagen) prior to RNA extraction. The cDNA samples for RT-qPCR reactions were generated using the High Capacity cDNA kit (Applied Biosystems). The qPCR reactions were performed using a 7500 Fast Real-Time PCR System

Library preparation and sequencing
Using total RNA extracted as above, sequencing libraries were prepared from 700 ng total so that the library amplification with PCR required no more than 9 cycles. Sequencing was performed on Illumina HiSeq in Rapid mode with 101 cycles, with all the 15 libraries multiplexed in 2 lanes. Details of sequencing and read statistics are described in Table 3. Base calling was processed with RTA 1.17.21.3. Fastq files were generated with bcl2fastq 1.8.4 (illumina) and deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE62432.

RNA-Seq data analysis
Quality check and mapping. The quality of the RNA-Seq reads was evaluated using the version 0.10.1 FastQC quality check package 27 . Having ensured high quality of the data, sequence reads for each library were mapped independently to the mouse genome assembly mm10, using the spliced aligner Tophat (v2.0.8b) with default parameter settings 28 . This yielded a high percentage of unique and properly paired reads,~87% for all libraries. Next, for each library we estimated the number of sequence reads overlapping at any given nucleotide position in the reference genome at a 100-bp resolution. The count of reads aligning at each position was then normalized to per million reads of their respective library sizes. This data was converted to wiggle-formatted files and eventually used to visualise the coverage of mapped reads in the form of Circos plots. Circos plots were generated using the Circster application of the Galaxy project 29 .

Expression quantification and downstream analysis
Successfully mapped reads were quantified against the annotated UCSC transcriptome for mm10 to estimate the number of fragments originating from individual genes using the Cuffdiff program of the Cufflinks package 30 (v2.1.1). The count data estimated by Cuffdiff was then used as the input to bioconductor package edgeR (v3.2.4) to assess the biological variability in the samples and test for differential expression 31 . In addition to identifying the significantly differentially regulated genes edgeR also provides the normalized expression values for each gene in each library. These normalized expression values referred to as counts per million (CPM) were used for all downstream analysis. Using the normalized expression values we performed Principal Component Analysis (PCA) to assess the variability among the samples as well as the fidelity within the replicates of each sample. Before performing PCA all expression values were log2 transformed and genes with zero values were replaced by the minimum non-zero expression value of the entire dataset. PCA was implemented using the prcomp() function of the R programming language. Furthermore the expression patterns of some of the relevant genes were analysed using the pheatmap package of R. For plotting the heatmap, gene expression values of each gene were normalized by a constant factor representing the highest expression value of that gene across all samples. These values were further scaled up by a factor of 100 and then log-transformed.

Data Records
Transcriptome-scale expression profile of SFEBq-generated Day 10 Rx::GFP+ optic tissue was performed using RNA-Seq. The Day 10 Rx::GFP+ tissue was stimulated by exogenous Fgf or Wnt signaling culture conditions and profiled at Days 12 and 15 by RNA-Seq. Three biological replicates were provided for all samples for each time point. The raw sequencing data in the form of fastq files and processed data showing normalized expression values has been submitted to Gene Expression Omnibus (GEO). The GEO accession number GSE62432 provides access to all the raw and processed data generated by RNA-Seq (Data Citation 2). The processing of all samples is summarized in Tables 1-3.

Technical Validation
Quality control of RNA, sequencing libraries and high throughput sequencing Quality of the total RNA was measured by RNA Pico Kit (Agilent) and all samples with sufficiently high RNA Integrity Number (RIN) were used for this study (average RIN was 8.9 with standard deviation of 0.7). Sequencing libraries were evaluated by High Sensitivity DNA Assay Kit (Agilent), which indicated a uniform size range across all libraries (Fig. 3b). Each library was sequenced to a depth of~20 million reads among which about 87% of the reads mapped uniquely to the mouse genome assembly mm10 (Table 3). In addition PCA plots displayed high agreement between the biological replicates thus ensuring us of a sufficiently high quality dataset (Fig. 3d).

Usage Notes
For RNA-Seq we recommend using the splice-aware software such as Tophat2 for efficient and accurate mapping to the genome. Expression quantification and differential expression can be best achieved by softwares such as edgeR or DEseq. These programs base their statistical inference on Negative Binomial (NB) distribution, which is required to correctly model the biological variation between samples. Some recent protocols elaborate the details to analyze RNA-Seq data 38,39 .