Identification of candidate genes and pathways in retinopathy of prematurity by whole exome sequencing of preterm infants enriched in phenotypic extremes

Retinopathy of prematurity (ROP) is a vasoproliferative retinal disease affecting premature infants. In addition to prematurity itself and oxygen treatment, genetic factors have been suggested to predispose to ROP. We aimed to identify potentially pathogenic genes and biological pathways associated with ROP by analyzing variants from whole exome sequencing (WES) data of premature infants. As part of a multicenter ROP cohort study, 100 non-Hispanic Caucasian preterm infants enriched in phenotypic extremes were subjected to WES. Gene-based testing was done on coding nonsynonymous variants. Genes showing enrichment of qualifying variants in severe ROP compared to mild or no ROP from gene-based tests with adjustment for gestational age and birth weight were selected for gene set enrichment analysis (GSEA). Mean BW of included infants with pre-plus, type-1 or type 2 ROP including aggressive posterior ROP (n = 58) and mild or no ROP (n = 42) were 744 g and 995 g, respectively. No single genes reached genome-wide significance that could account for a severe phenotype. GSEA identified two significantly associated pathways (smooth endoplasmic reticulum and vitamin C metabolism) after correction for multiple tests. WES of premature infants revealed potential pathways that may be important in the pathogenesis of ROP and in further genetic studies.

Gene-based analysis. Gene-based analysis using SKAT-O, which combines the effects of all rare variants across a gene, with adjustment for BW and GA revealed no genes that reached genome-wide significance (P < 2.5 × 10 -6 ). From the rare/low-frequency variants analysis, the most strongly associated genes by SKAT-O included RNF225, PMS1, PRMT3, SPANXD, ANKRD30A and THSD4 (Table 2). Among the 263 candidate genes, CD36, NOX4, NTRK2, IGFBP7, and NTF4 were the most strongly associated with ROP (Supplemental Table S3). From the common variants analysis, which analyzes each variant separately, the most strongly associated genes by SKAT-O included ZNF341, ALG10B, OR4D6, KCNE1, and FAM198A (Table 3).
In order to check if there is confounding between genes that are contributing to prematurity and genes that are contributing to ROP specifically, we compared the twenty genes with the largest associations from the rare variant and common variant analysis sets with the largest gene list for prematurity to date 29 and found no intersection.
Gene set enrichment analysis (GSEA). GSEA identified one pathway (GO:0005790, smooth endoplasmic reticulum) from the rare/low-frequency variants analysis that was significantly enriched after correction for multiple tests (Table 4). GSEA also identified 25 pathways from databases including GO, KEGG, PANTHER, Reactome, and Wikipathways with empirical P values of less than 0.05, including the beta2 adrenergic receptor signaling pathway, dopamine receptor mediated signaling pathway, T cell activation, nicotinate/nicotinamide metabolism, and platelet-derived growth factor (PDGF) signaling pathway (Table 4). From the common variants analysis, GSEA identified one significantly enriched pathway (Reactome R-HSA-196836, Vitamin C metabolism) ( Table 5). Other pathways with empirical P values of less than 0.05 are listed in Table 5.

Discussion
This study aimed to identify potential genes and pathways associated with ROP by analyzing rare and lowfrequency variants from whole exome sequencing data of 100 preterm infants enriched in phenotypic extremes.
The key findings from this study are as follows: (1) gene-based analysis with adjustment for BW and GA revealed no genes that reached genome-wide significance; and (2) GSEA identified 2 significantly enriched pathways (smooth endoplasmic reticulum and vitamin C metabolism) that may be important in the pathogenesis of ROP.
Gene-based analysis, which aggregates all the rare variants in each gene, using SKAT-O with adjustment for BW and GA identified many genes with low P values, although no genes reached formal genome-wide significance (P < 2.5 × 10 -6 ). The top most strongly associated genes included several genes of potential interest (Table 2 and 3). THSD4 (P = 5.04 × 10 -4 ) encodes thrombospondin type-1 domain-containing protein 4, also known as "A disintegrin and metalloproteinase with thrombospondin motifs-like protein 6" (ADAMTSL-6). ADAMTSL-6, expressed in various tissues including retina (from FANTOM5 data), is an extracellular matrix protein that promotes assembly of the fibrillin-1 matrix 30 . Fibrillin-1 controls activation of TGF-β 31,32 , which is an angiogenic activator and has been implicated in retinal vascular diseases such as diabetic retinopathy 33,34 . Therefore, THSD4 is one potential target for studies on ROP pathogenesis. OPA1, the most common gene mutated in dominant optic atrophy, encodes a dynamin-related GTPase that is necessary for mitochondrial inner membrane fusion and maintenance of mitochondrial architecture 35 . Recently, it was suggested that diabetes resulted in reduced opa1 gene expression and mitochondrial dysfunction in an animal model of diabetic retinopathy (Verma A et al. IOVS 2016;57:ARVO E-Abstract 5446). However, the role of OPA1 in retinal angiogenesis has not been investigated.
Calpain 2, which is encoded by CAPN2, has been known to be involved in neurodegeneration. A recent study showed that inhibition of calpain 2 ameliorated retinal ischemic injury, suggesting that calpain-2 inhibitor might prevent ischemia-induced retinal degeneration 36 . Therefore, CAPN2 is a potential target for the treatment of ROP.
Among the 263 candidate genes, CD36 showed the lowest P value from SKAT-O test (Supplemental Table S3). CD36 encodes platelet glycoprotein 4, also known as the thrombospondin receptor. Platelet glycoprotein 4 was suggested to be involved in angiogenesis in various ways with or without mediating thrombospondin 37,38 . Also, as a multi-ligand scavenger receptor, it has been implicated in retina homeostasis 38 . Therefore, CD36 may also be a potential target for future ROP studies.
GSEA identified several pathways that may be important in the pathogenesis of ROP. Identified pathways included the β2 adrenergic receptor signaling pathway, dopamine receptor mediated signaling pathway, T cell activation, PDGF signaling pathway, Robo4 and VEGF signaling pathways crosstalk, and vitamin C (ascorbate)  (Table 4 and 5). A series of studies showed that the β2 adrenergic receptors play a pivotal role in the regulation of vascular endothelial growth factor (VEGF) production and retinal neovascularization 39 . Also, the β1/β2 adrenergic receptor antagonist propranolol showed reduction in VEGF expression, retinal neovascularization and vascular leakage in oxygen induced retinopathy 39 . A recent systematic review on clinical trials which investigated the effect of beta-blockers on ROP concluded that low to moderate quality evidence suggests that prophylactic administration of oral beta-blockers might reduce progression towards stage 3 ROP and decrease the need for treatment 40 .
Retinal dopamine, which is synthesized in and released by subtypes of amacrine and interplexiform cells, and its receptor signaling pathway has not been associated with retinal angiogenesis 41 . However, studies suggested that dopamine mediates diverse functions including retina development, visual signaling, and myopic eye growth 41,42 . Thus, further studies to examine the relationships between dopamine receptor mediated signaling and retinal vascular development or ROP is warranted.
A growing body of evidence supports important roles of inflammation in ROP. Recently, a study showed that regulatory T cells are recruited to the retina in oxygen-induced retinopathy and reduce severe microvascular disease 43 . In this manner, the T cell activation pathway might be related to development of ROP. The PDGF signaling pathway was also important for pericyte viability and the subsequent prevention of VEGF/ VEGFR-2 overexpression and angiogenesis in oxygen-induced retinopathy 44 . Although PDGF antagonists have been tried for patients with neovascular age-related macular degeneration with unclear benefits (https:// clini caltr ials. gov/ ct2/ show/ NCT01 944839; Last accessed on 09/13/2019) 45 , the exact role of PDGF in ROP requires further investigation.  www.nature.com/scientificreports/ One of the identified pathways (GO:0005790, smooth endoplasmic reticulum) was significantly enriched after correction for multiple tests. In general, smooth endoplasmic reticulum is associated with production of carbohydrate and lipids such as steroid hormones, cholesterol and membrane phospholipids. It also plays an important role in protein modification and intracellular protein transport. However, whether biological processes in smooth endoplasmic reticulum have specific roles in ROP is not known.
A few studies investigated the role of vitamin C in ROP or retinal angiogenesis. An in vitro study showed that vitamin C prevented VEGF-induced increases in endothelial permeability 46 , and retinal level of vitamin C was reduced in the rat model of ROP 47 . However, a randomized controlled trial in 2005 which compared high or low dose supplementation of vitamin C on clinical outcome including ROP showed no significant effects on the development of (any stage) ROP 48 . Further studies to examine the relationships between vitamin C metabolism and ROP is warranted.
This study has several limitations. First, the statistical power was not high because of small sample size (58 severe ROP and 42 mild or no ROP). The small sample size may be one of the reasons why no genes reached genome-wide significance in this study. Second, although we tried to include phenotypic extremes to overcome low statistical power due to small sample size, we could not identify enough phenotypically extreme patients from the i-ROP database. In this study, the severe ROP group included 12 patients with AP-ROP and the remaining patients were selected for highest birthweight in severe ROP group and lowest birthweight in no or mild ROP group. Thus, we believe that our subjects are enriched in phenotypic extremes, which may mean possible enrichment of rare pathogenic variants in our subjects. Third, as a result of the first two limitations, identified association with pathways that are plausible biologically need to be replicated in larger studies. Table 4. Results of rare/low-frequency variants analysis: a list of the most significantly enriched pathways (P value < 0.05) from gene set enrichment analysis using WebGestalt. SNARE soluble N-ethylmaleimidesensitive factor attachment protein receptor, PDGF platelet-derived growth factor, APC/C anaphase-promoting complex, GPCR G-protein-coupled receptor. *By Benjamini-Hochberg procedure. Numbers in bold indicate the smallest adjusted p-values, i.e. most significant pathway enrichments among all rows in the table. Subjects. The subjects for this study were selected from participants of the Imaging and Informatics for ROP (i-ROP) study, a prospective multicenter cohort study which enrolled preterm infants for ROP screening and collected clinical and imaging data (retinal images obtained using a wide-angle fundus camera [RetCam; Natus Medical Incorporated, Pleasanton, CA]) and blood or saliva samples. In this study, the ROP diagnosis of each eye exam was made by combining clinical exam at each study center and image-based diagnoses by 3 trained graders, as previously described 50 . The i-ROP study database was reviewed to identify 100 non-Hispanic Caucasian infants enriched in phenotypic extremes (e.g. AP-ROP, non-LBW infants with severe ROP, LBW infants with no ROP) to maximize variant discovery. Among the 373 non-Hispanic Caucasian infants enrolled between July 2011 and October 2016, patients with AP-ROP in at least one eye were selected first. Remaining cases were selected for lowest birthweight in the case of no or mild ROP (defined as ROP less than type 2 ROP), and highest birthweight in severe ROP (defined as pre-plus, type-2 or type 1 ROP), with an enforced ratio of approximately 4:1:1:4 of no, mild, pre-plus/type-2, and type 1 ROP, respectively. In cases where a patient had a twin in the selected set, the most phenotypically extreme infant of the two was selected. Patients without detailed information on demographics, ROP screening, co-morbidities of prematurity, or imaging data were excluded.
Whole exome sequencing. Genomic DNA was extracted from peripheral blood samples and wholeexome sequencing was performed at Beijing Genomics Institute (BGI; Hong Kong, China). Briefly, genomic DNA was randomly fragmented and enriched for exome sequences using the SureSelect Human All Exon Kit (Agilent Technologies, Santa Clara, CA, USA) and sequencing was performed on a BGISEQ-500 sequencing platform (BGI; Hong Kong, China). Sequencing-derived raw image files were processed by BGISEQ-500 basecalling Software for base-calling and the sequence data of each sample was generated as paired-end reads 51 .

Bioinformatics analyses.
After removing reads containing sequence adaptors and low-quality reads, reads of each sample were aligned to the reference human genome, Genome Reference Consortium Human Build 37 using Burrows-Wheeler Aligner software. Local realignment around insertions and deletions (InDels) and base quality score recalibration were performed using Genome Analysis Toolkit (GATK), with duplicate reads removed by Picard MarkDuplicates. The HaplotypeCaller of GATK was used to call SNPs. After performing hard-filtering for SNPs and Indels as previously described 52  Selection of candidate genes. Previously reported candidate genes in ROP and additional genes involving pathways in retinal angiogenesis/vasculogenesis, neuronal development, neuroprotection, and retinal inflammation were selected (n = 164; Supplemental Table S4). Also, additional genes (n = 99) that are related to the 164 candidate genes were identified using Cytoscape with GeneMANIA plugin which finds related genes using large-scale functional association data including protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity (Supplemental Table S5) 59 .
Gene-based test. Gene and pathway analyses were run on two subsets of data. First, gene-based testing was performed to detect rare and low-frequency variants with relatively large effect. For the test, variants with a MAF of greater than 0.05 in 1000 Genomes Project and ESP6500 were excluded. In gene-based testing, in order to provide sufficient power, all rare variants within the gene are combined for the statistical analysis. Second, for the common variant analysis, variants with a MAF of ≥ 0.05 in 1000 Genomes Project or ESP6500 were selected. In common variant analysis, each variant is analyzed separately. The qualifying variants for gene-based tests included coding nonsynonymous variants, including missense, stop-gain, stop-loss, start loss, and splice acceptor and donor variants that were predicted to be functionally damaging from at least 1 of the 5 prediction algorithms (SIFT, PolyPhen2, MutationAssessor, FATHMM, and MutationTaster). These qualifying variants in individual genes from the two subsets of data (rare / low-frequency variants and common variants) were subject to sequence kernel association optimal unified test (SKAT-O) 60 using the genipe (GENome-wide Imputation PipelinE) v.1.4.0 module 61 with adjustment for BW and GA.