New Statistical Methods for Constructing Robust Differential Correlation Networks to characterize the interactions among microRNAs

The interplay among microRNAs (miRNAs) plays an important role in the developments of complex human diseases. Co-expression networks can characterize the interactions among miRNAs. Differential correlation network is a powerful tool to investigate the differences of co-expression networks between cases and controls. To construct a differential correlation network, the Fisher’s Z-transformation test is usually used. However, the Fisher’s Z-transformation test requires the normality assumption, the violation of which would result in inflated Type I error rate. Several bootstrapping-based improvements for Fisher’s Z test have been proposed. However, these methods are too computationally intensive to be used to construct differential correlation networks for high-throughput genomic data. In this article, we proposed six novel robust equal-correlation tests that are computationally efficient. The systematic simulation studies and a real microRNA data analysis showed that one of the six proposed tests (ST5) overall performed better than other methods.

mean expression in that it can identify changes of co-expression between disease and normal subjects and provide a potential biological interaction when they don't show different mean expression level 8,9 . To the best of our knowledge, no research has constructed differential correlation networks using miRNAs yet.
To construct a differential correlation network, we usually first test for each pair of nodes if the correlation between the pair is the same or not between cases and controls. We then connect a pair of nodes if the difference of the correlations between cases and controls is greater than a threshold. The benchmark statistical method for testing equal correlation of a pair of random variables between two independent populations is the Fisher's Z-transformation test. However, it is sensitive to the violation of the normality assumption. The normal distribution cannot be guaranteed in real data analysis. The violation of the normality assumption would result in inflating type I error rate (i.e., false positive rate).
Some improved tests for equal correlation have been proposed to be robust against the violation of the normality assumption. However, there are some limitations of those methods. For instance, the two methods (twocor and twopcor) proposed by Wilcox 10 are bootstrapping based methods, which are computationally intensive. To construct a gene differential correlation network, we need to test equal correlation between cases and controls for G(G-1)/2 pairs of gene probes, where G is the number of gene probes, which is usually large (~20,000) in whole genome-wide data analysis. Hence, it is not efficient to use bootstrapping-based methods to construct gene differential correlation networks.
ROS-DET (combination of robust correlations and hypothetical testing) proposed by Kayano et al. (2011) only focuses on pairs of genes that have positive correlations in one subject group and negative correlations in another subject group 11 . It ignores the scenarios where two correlations are significantly different but have the same directions. The result of other methods without bootstrapping, such as Zou's method 12 and HC4 (heteroscedastic-consistent estimators) method 13 , can be unsatisfactory, even under normality assumption 14 . Hence, there is a great need to develop a robust and fast test for equal correlation. In this article, we proposed 6 novel tests for equal correlation and performed systematic simulation studies and a real data analysis to compare the performances of these new methods with existing methods. For the real dataset, we construct a robust differential correlation network. Note that the term 'robustness' in this article is different from that in the theory of robust network, in which robustness indicates a network performs well after attacks. In this article, robustness indicates that the method performs well when model assumptions, such as normality assumption, are violated.

Results
simulation studies. We compared the 6 proposed equal-correlation tests (ST1, ST2, ST3, ST4, ST5, ST6) with 4 existing tests (twopcor, twocor, twohc4cor 10 and Fisher's Z-transformation test) using systematic simulation studies, in which we evaluated if the 6 proposed methods could achieve higher power than the 4 existing methods, while keeping the nominal type I error rate (0.05), when random variables X and Z are generated from normal or non-normal distributions. The methods twopcor and twocor are bootstrapping-based methods.
Following Wilcox (2009), we generated the observations of the random variables X and Z from g-and-h-distributions 15 (see Section 3 of Supplementary Document I for the definition of a g-and-h distribution) for cases and controls. In a g-and-h distribution, the parameters g and h are both non-negative. If both g and h are equal to zero, then the g-and-h distribution is the standard normal distribution. A positive value of g indicates a skewed distribution. A positive value of h indicates that the g-and-h distribution has heavier tail than the standard normal distribution. As g becomes larger, the g-and-h distribution would be more asymmetric. As h becomes Figure 1. An illustration of a differential correlation network. In this toy example, the correlations between node 1 and node 5 and between node 5 and node 6 are different between network for controls and network for cases.
Also following Wilcox (2009), we generated observations for a pair of random variables X ad Z. We first generated random numbers X1 and e 1 for cases and X2 and e 2 for controls based on g-and-h distributions. Then we generated random numbers of Z1 and Z2 via the formula z 1 = θ 1 x 1 + λ j (x 1 )e 1 for cases and z 2 = θ 2 x 2 + λ j (x 2 )e 2 for controls, j = 1, 2, 3. The parameters θ 1 and θ 2 indicate the magnitude of correlations between z 1 and x 1 and between z 2 and x 2 , respectively. For example, θ 1 = 0 indicates z 1 and x 1 are uncorrelated. θ 1 > 0 indicates z 1 and x 1 are positively correlated. θ 1 < 0 indicates z 1 and x 1 are negatively correlated. The functions λ j (x), j = 1, 2, 3 indicate the three variance patterns (vp) of z. For vp1, λ 1 ( vp2, the conditional variance of Z given X, would be large if X is close to its mean; With vp3, the conditional variance of Z given X, would be small if is close to its mean. The random error terms e 1 and e 2 have the same distributions as x 1 and x 2 14 . To evaluate the effect of sample size on the performances of the equal-correlation tests, we considered 3 different sample sizes: (1) 30 cases and 30 controls; (2) 100 cases and 100 controls; and (3) 200 cases and 200 controls.
For each scenario, we generated 100 datasets. For each dataset, we generated 1000 pairs of random variables X and Z for cases and controls, respectively. Because bootstrapping-based methods (twopcor and twocor) would cost too much time, we evaluated the performances of twopcor and twocor only in the scenarios where nCases = nControls = 100, g = 0.2, and h = 0.2 (nCases is the number of cases and nControls is the number of controls).
To evaluate the performances of equal-correlation tests, we used Type I error rate and power in simulation studies. Figures 2 and S6 showed that the median Type I error rates of twocor and ST5 did not exceed the nominal level 0.05 in all scenarios in the simulation studies, which shows the excellent robustness of twocor and ST5. www.nature.com/scientificreports www.nature.com/scientificreports/ The median Type I error rates of ST1 and ST2 (when h = 0, θ 1 = 0, θ 2 = 0), and ST6 (when variation pattern is vp3, h = 0, θ 1 = 1, θ 2 = 1 and n = 100 or 200) were just a little bit higher than 0.05 in few scenarios, which shows the symmetry does not affect the type I error rates of ST1, ST2 and ST6, but the heave tail does. For twopcor, the median Type I error rates were higher than 0.05 when the variance pattern was vp1 or vp2. When θ 1 = 1, θ 2 = 1(i.e., correlations between X and Z are non-zero, but the same, in both cases and controls), the median Type I error rates of twohc4cor and ST4 are higher than 0.05; For ST3 the median Type I error rates are higher than 0.05 when θ 1 = 0, θ 2 = 0 (i.e., correlations between X and Z are zero in both cases and controls).
For the power analyses, the median powers of all six methods increase as sample size increases. (see Fig. 3) For all six methods, the powers were smaller when the variation pattern was vp2. Among the methods (ST1, ST2, ST5, ST6 and twocor) that have the median Type I error rates smaller than 0.05 in almost all simulation scenarios, the ST5 and twocor always have the highest power, which is not affected by sample sizes, distributions and variance patterns. The median powers of ST5 were almost equal to those of twocor in all scenarios. ' We summarized the simulation results in Fig. 4 and Table S1. Figure 4 is a plot of the number n reject of scenarios with mean type I error rate significantly >0.05 versus the median rank of power m. For each scenario and each equal-variance test, we used one-sample t-test to test the null hypothesis H 0 that the mean of the 100 estimated type I error rates from 100 simulated data sets is significantly ≤0.05 versus the alternative hypothesis H a that the mean of the 100 estimated type I error rates from 100 simulated data set is significantly >0.05. If the p-value of the one-sample t-test <0.05, we claimed for this scenario and this equal-variance test, the mean type I error rate >0.05. For a given scenario, we ranked in terms of power the equal-variance tests that did not reject the null hypothesis H 0 . For ranks with ties, average ranks were used. For the equal-variance tests that rejected the null hypothesis H 0 , we set their ranks as missing values. Since for each scenario evaluating power (θ 1 = 0, θ 2 = 1), there are two corresponding scenarios evaluating Type I error rate (θ 1 = θ 2 = 0 or θ 1 = θ 2 = 1), we set ranks as missing values if one of the two corresponding scenarios with mean Type I error rate > 0.05. We then obtained the median m of the rank for each equal-variance test. The right panel of Fig. 4 is based on all scenarios. The proposed equal-variance test ST5 located at the left-bottom corner of the left panel of Fig. 4 (n reject = 0, m = 1.75), indicating it had the smallest number of false positive rate and largest power, hence performed best. The methods twocor and twopcor have not appeared in the right panel of Fig. 4 because they are bootstrapping-based methods, which are computationally intensive. We only include them in the scenarios where nCases = nControls = 100, g = 0.2, and h = 0.2. The left panel of Fig. 4 is the plot for these scenarios, from which we can see that twocor performed the best (n reject = 0, m = 1.5) and ST5 performed the second (n reject = 0, m = 2).
Real data analysis. We downloaded the miRNA dataset GSE15008 (https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc=gse15008) from the public data repository Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo) www.nature.com/scientificreports www.nature.com/scientificreports/ to construct differential correlation networks of miRNAs using the 6 equal-correlation tests (3 proposed tests and 3 existing tests). GSE15008 contains 677 miRNAs from 174 non-small-cell lung cancer (NSCLC) tissues and 187 adjacent normal tissues from patients. After data preprocessing, 178 miRNAs were kept for differential correlation analysis (The details about the data preprocessing are shown in Section 4 of Supplementary Documents I). The quantile plots (Fig. S2) and plots of the top 2 principal components (Fig. S3) based on the 178 miRNAs showed no obvious patterns.
Before constructing differential correlation network, we inspected the normality of expression data of the 178 selected miRNAs using Shapiro-Wilk test for cancer tissues and for normal tissues, separately. The normality assumption for majority of microRNAs is violated. Specifically, 80.34% of FDR adjust p-values for testing normality are smaller than 0.05 for cancer tissues, 63.48% of FDR adjust p-values are smaller than 0.05 for normal tissues, and 88.20% of FDR adjust p-values are smaller than 0.05 for either cancer tissues or normal tissues. Please see Table S4 for p-values and adjusted p-values of miRNAs. We also did Wilcoxon signed rank test for each of the 178 selected miRNAs to check if they are differentially expressed between cancer tissues and adjacent normal tissues. Please see their FDR adjusted p-values in Table S5.
We randomly divided the 174 NSCLC samples into two equal parts. One part (87 samples) formed the cases of the discovery set and the other part (87 samples) formed the cases of the validation set. Similarly, we randomly divided the 187 normal samples into roughly two equal parts. One part (94 samples) formed the controls of the discovery set and the other part (93 samples) formed the controls of the validation set.
We compared ST1, ST5, ST6, Fisher's test, twohc4cor, and twocor in the real data analysis. We did not include ST2 because of its low power in the simulation studies. We did not include ST3 and ST4 because they were no better than ST5 and ST6 in the simulation studies. We did not include twopcor because of its long computational time and its low power in the simulation studies.
We claimed that the differential correlation of a pair of miRNAs is validated if its FDR-adjusted p-value < 0.05 in the discovery set and raw p-value < 0.05 in the validation set. We applied Benjamini and Yekutieli method 16 to calculate FDR-adjusted p-values.
We called the network formed by the pairs of miRNAs with validated differential correlations as the validated differential correlation network. We visualize validated differential correlation networks by Cytoscape.
We chose the miRNA having the maximum number of edges in the validated differential correlation network as the hub miRNA. We applied the web-tool miRSystem 17 to predict the genes targeted by the hub miRNA and to obtain the KEGG pathways enriched in these genes.
To evaluate the differential correlation networks in the real data analysis, we used the proportion of the validated edges in the discovery set = r e e 1 2 , where e 1 is the number of validated edges (i.e., the pairs of microRNAs having FDR-adjusted p-values < 0.05 in the discovery set and raw p-values < 0.05 in the validation set). e 2 is the number of edges detected based on only the discovery set (i.e., the pairs of microRNAs having FDR-adjusted p-values < 0.05 in the discovery set).
The numbers e 1 of validated edges obtained by the 6 proposed tests are 0 (ST1), 35 (ST5), 37 (ST6), 71 (Fisher), 0 (twohc4cor) and 440 (twocor). ST6 had the highest validation rate (r = 100.00%), followed by ST5 (r = 94.59%), Fisher (r = 88.75%), and twocor (r = 74.83%). In terms of running time, Fisher is the fastest method among the 6 methods, which took only 2.42 seconds. ST1 is the second fastest method, which took 5.97 seconds. ST5 and ST6 took 35.92 seconds and 37.32 seconds, respectively. Twohc4cor used 89.88 seconds, which is around 2.5 more times than ST5/ST6. Twocor, the bootstrapping-based method, took the longest time: 31429.21 seconds (i.e., 8.73 hours). The results of the differential correlation analyses of the 6 methods for the real dataset GSE15008 are summarized in Table 1. . n reject is the number of scenarios with Type I error rate > 0.05, m is the median ranks of power. We set ranks as missing value if one of the two corresponding Type I error rates (two scenarios with same sample size, variation pattern, g and h) is larger than 0.05. The right panel was obtained based on all scenarios, the left panel was obtained based on scenarios including twopcor and twocor. Figure 5 showed the validated differential correlation networks based on ST5, ST6, Fisher, and twocor. The hubs in the differential correlation networks obtained by ST5 and ST6 are the same, i.e., hsa-miR-143, which targets 531 genes (Table S2) Table S3). The hub detected by Fisher is has-miR-492, which targets 107 genes (Table S2) (Table S3).
Note that microRNAs in the differential correlation network are not necessarily having different mean expression levels between cases and controls (i.e., not necessarily differentially expressed). Hence, constructing differential correlation network can identify disease-associated microRNAs that would be missed in differential expression analysis. For example, in our real data analysis, the microRNA hsa-miR-152 in differential correlation networks detected by ST5 and ST6 is not differentially expressed between cancer tissue and adjacent normal tissue (FDR adjusted p-value of the Wilcoxon signed rank test equals to 0.9997). Please see Fig. 6 (scatter plots of expression levels with hsa-miR-152 and has-miR-145) and Fig. 7 (histograms of has-miR-152) for cases and for controls, separately.  Table 1. Summary of the differential correlation analyses of the 6 methods for the real dataset GSE15008. www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
A differential correlation network of miRNAs, characterizing the differences between the co-expression network among cases and that among controls, could help understand how pairs of miRNAs affect the disease of interest. In this article, we proposed six novel robust tests (ST1, ST2, ST3, ST4, ST5, and ST6) for equal correlation and compared them with four existing tests (twocor, twopcor, Fisher's Z-transformation test, and twohc4cor) in the construction of differential correlation networks. Simulation studies showed that ST5 performed as well as twocor (a bootstrapping-based method), and had the highest powers among the tests that kept the nominal type I error rates in all scenarios in our simulation studies. Real data analysis also showed the good performance of ST5. ST5 had the second highest validation rate (r = 94.59%) in the real data analysis, followed by Fisher (r = 88.75%) and twocor (r = 74.83%), while ST6 had the highest validation rate (r = 100%). Furthermore, ST5 is computationally fast. Hence, the proposed equal-correlation test ST5 could be used to construct robust differential correlation networks in genomic data analysis.
Although twocor showed good performance in simulation studies, twocor detected too many differential correlation edges (e 1 = 395, e 2 = 537), but had the smallest validation rate (r = 73.56%) in the real data analysis. Moreover, as a bootstrapping-based method, twocor is computationally intensive. Therefore, twocor probably is not suitable to construct differential correlation networks in genomic data analysis, in which the number of nodes is large.
In the real data analysis, the rank of the proportion of validated edges is ST6, ST5, Fisher, and twocor. ST1 and twohc4cor failed to detect any significant differential correlation based on the discovery sets. It indicates that ST1 and twohc4cor are not powerful methods, which are also shown in the simulation studies. In the validated differential correlation networks of ST5, ST6 and twocor, the top three nodes having the largest number of edges are  www.nature.com/scientificreports www.nature.com/scientificreports/ same (hsa-miR-143, hsa-miR-145, hsa-miR-363). The miRSystem predicted that hsa-miR-143 (the hub detected by ST5 and ST6), has-miR-145, and hsa-miR-363 (the hub detected by twocor) are connected to pathways related to lung cancer, such as GNRH signaling pathway 18 (empirical-p-value is 0.01490), calcium signaling pathway 19 (empirical-p-value is 0.01872), and TGF-BETA signaling pathway 20 (empirical-p-value is 0.02706). However, hsa-miR-492 (detected by Fisher) is not related to lung cancer (Table S3). We surmised the reason why the hub selected by Fisher is not related to lung cancer is the high false positive rates of the Fisher's Z-transformation test.
From Fig. 4, we observed that the power increases as sample size increases in our simulation studies, which matches our intuition. Specifically, in Fig. 4 Figure S6 contains more examples. (You can download Figure S6 at https://sites. google.com/view/weiliangqiu/supplementary-documents/ figure-s6).
In this article, there are a couple of limitations. Firstly, the sample size (174 NSCLC tissues and 187 normal tissues) of the GEO dataset GSE15008 is not very large and we did not find an independent dataset to do validation. Instead, we randomly split the GSE15008 dataset into two sets: discovery set and validation set. Secondly, we could not derive the asymptotic or approximate distribution of the ST5 test statistic yet. Instead, we numerically demonstrated that the distribution of the ST5 test under the null hypothesis could be approximated by the chi square distribution with one degree of freedom. Further research on the asymptotic distribution of ST5 is needed.
Although ST5 combines ST3 and ST4, the degree of freedom of the corresponding chi square test statistic is still one, like ST1, ST2, ST3, and ST4. Although ST6 combines ST3 and ST4, it has 2 degree of freedom. That is, ST5 uses more information, in the meantime it does not increase degree of freedom. We think this is the possible reason why ST5 overall performed better than ST1, ST2, ST3, ST4, and ST6. Further investigations are warranted.
In summary, we proposed 6 robust tests for equal correlation to construct differential correlation networks and found ST5 had overall good performance in both the simulation studies and the real data analysis. However, if the expression data follow normal distribution, Fisher-z test should be used due to high computational efficiency and high detection ability. The ST5 test is highly recommended to construct differential correlation network to characterize the effects of the interactions between genes (not limited to mi-RNAs) on diseases (not limited to lung cancer) when the expression data violates normality assumption for majority of genes, hence to help uncover the molecular mechanisms of complex human diseases.

Methods
To construct a differential correlation network for a set of miRNAs, we first test for each pair of miRNAs if their correlation among cases is the same as that among controls using an equal-correlation test. We then set a criterion to determine if the test is significant or not (i.e., if the pair of miRNAs should be connected by an edge in the differential correlation network or not). To determine if a test is significant or not, we need to control for multiple testing since G(G-1)/2 tests are performed, where G is the total number of miRNAs. We claimed that the differential correlation of a pair of miRNAs is validated if its FDR-adjusted p-value < 0.05 in the discovery set and raw p-value < 0.05 in the validation set.
Novel equal-correlation tests. For given two random variables X and Z, we would like to test if the correlation corr(X, Z) in cases is the same as that in controls. Inspired by the joint test of equal mean and equal variance proposed by Ahn and Wang 21 and the improved Ahn and Wang's equal-variance tests proposed by Qiu et al. 22 , we proposed 6 robust tests (ST1, ST2, ST3, ST4, ST5, ST6) for equal correlation without the normality assumption.

ST1 Test.
Let's consider the following logistic regression: That is, when U I is large, we reject β = H : 0 . is equivalent to test for ρ 1 = ρ 2 . Note that in logistic regression (1), the random variable y i is conditional on the random variable w i I . We denote the score test in (2) as ST1 Test.
ST2 Test. ST2 test is an improved version of ST1 by replacing the sample variances by the square of median absolute deviations (MAD) since MAD is more robust to outliers than standard deviation. Let's consider the following logistic regression:    are complement of each other. So, we combined the test statistics of ST3 and ST4 to form the test statistic of ST5. Specifically, the test statistic of ST5 is the average of T III and T IV : It is challenging to derive the asymptotic distribution of the test statistic T V . We guess that the asymptotic distribution of the test statistic T V is close to the chi square distribution with one degree of freedom χ ( ) 1 2 . To numerically support this guess, we generated a simulated dataset with 50,000 pairs of random variables X and Z for 100 cases and 100 controls from a g-and-h distribution under the null hypothesis H V 0 that the correlation between X and Z in cases is the same as that in controls (see Simulations Section for more details). We then calculated the values of T V for each of the 50,000 pairs of random variables X and Z and drew the histogram of these 50,000 values of T V . Figure 8 showed that the histogram of the test statistic T V is very close to the density of the chi-squared distribution with one degree of freedom. Hence, in this article, we assumed that . Further investigation is warranted. ST6 Test. We can combine ST3 and ST4 based on the following multiple logistic regression: