Detrended Partial-Cross-Correlation Analysis: A New Method for Analyzing Correlations in Complex System

In this paper, a new method, detrended partial-cross-correlation analysis (DPCCA), is proposed. It builds on detrended cross-correlation analysis (DCCA) and adds the partial-correlation technique, so that it can quantify the relations between two non-stationary signals (with the influences of other signals removed) on different time scales. We illustrate the advantages of this method with two numerical tests. Test I shows the advantages of DPCCA in handling non-stationary signals, while Test II reveals the “intrinsic” relations between two considered time series with the potential influences of other unconsidered signals removed. To further show the utility of DPCCA in natural complex systems, we provide new evidence on how the winter-time Pacific Decadal Oscillation (PDO) and the winter-time Nino3 Sea Surface Temperature Anomaly (Nino3-SSTA) affect the Summer Rainfall over the middle-lower reaches of the Yangtze River (SRYR). By applying DPCCA, more significant correlations between SRYR and Nino3-SSTA on time scales of 6 ~ 8 years are found over the period 1951 ~ 2012, while significant correlations between SRYR and PDO arise on time scales of 35 years. With these physically explainable results, we are confident that DPCCA is a useful method for addressing complex systems.

In traditional statistics, one can apply filter methods (low-pass, high-pass, and band-pass filters) to discuss the correlations of two considered time series on different time scales [13-15]. However, the cut-off frequency of a low-pass (high-pass) filter, or the band-width, is usually chosen subjectively, which makes these simple filter methods inappropriate for cross-correlation research over different time scales. Another method, cross-spectral analysis (CSA) [16,17], may also be useful in discussing connections of two time series on different time scales, but it requires the analyzed data to be stationary with no external trends, which is of course rare in nature. Recently, a new method based on detrended covariance, detrended cross-correlation analysis (DCCA), has been proposed and widely used [18]. DCCA is a modification of standard covariance analysis that can be applied to non-stationary time series [19]. DCCA is also a generalization of detrended fluctuation analysis (DFA) [20,21] that can be used to investigate the power-law cross-correlations between two simultaneously recorded time series. By further calculating the DCCA cross-correlation coefficient $r_{DCCA}$ according to the procedure proposed by [22],

$$r_{DCCA}(s) = \frac{F^{2}_{DCCA}(s)}{F_{DFA\{x^{(1)}_i\}}(s)\, F_{DFA\{x^{(2)}_i\}}(s)}, \tag{1}$$

where $F_{DCCA}$ is the fluctuation function obtained from DCCA [18], $F_{DFA}$ is the fluctuation function obtained from DFA [20], and $\{x^{(1)}_i\}$, $\{x^{(2)}_i\}$ are the two considered time series, one can quantify the level of cross-correlations on different time scales. Therefore, during the past few years, signals from various fields such as economics [23], seismic studies [24], traffic flows [25], as well as geophysical systems [26], have been analyzed by using DCCA and its multifractal version MFDCCA [27]. In this study, we will mainly focus on the DCCA cross-correlation coefficient $r_{DCCA}$ derived from DCCA.
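The DCCA cross-correlation coefficient described above, that is, the detrended covariance of two profiles divided by the product of their two DFA fluctuation functions, can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes NumPy, overlapping boxes of s + 1 points, and a polynomial detrending order chosen by the caller. The function name `dcca_coefficient` and the synthetic series are hypothetical.

```python
import numpy as np

def dcca_coefficient(x1, x2, s, order=1):
    """Sketch of r_DCCA at scale s, using N - s overlapping boxes of s + 1 points."""
    # Profiles (cumulative sums) of the two series.
    p1 = np.cumsum(np.asarray(x1, dtype=float))
    p2 = np.cumsum(np.asarray(x2, dtype=float))
    t = np.arange(s + 1)
    f2_cross = f2_x1 = f2_x2 = 0.0
    for i in range(p1.size - s):
        b1 = p1[i:i + s + 1]
        b2 = p2[i:i + s + 1]
        # Detrended walk: profile minus the local polynomial trend in this box.
        r1 = b1 - np.polyval(np.polyfit(t, b1, order), t)
        r2 = b2 - np.polyval(np.polyfit(t, b2, order), t)
        f2_cross += np.mean(r1 * r2)  # contributes to F^2_DCCA
        f2_x1 += np.mean(r1 * r1)     # contributes to F^2_DFA of x1
        f2_x2 += np.mean(r2 * r2)     # contributes to F^2_DFA of x2
    # The common factor 1 / (N - s) cancels in the ratio.
    return f2_cross / np.sqrt(f2_x1 * f2_x2)

# Assumed synthetic data: y shares the component x and adds independent noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = x + rng.standard_normal(2000)
r_self = dcca_coefficient(x, x, s=10)   # a series against itself
r_anti = dcca_coefficient(x, -x, s=10)  # a series against its mirror image
r_xy = dcca_coefficient(x, y, s=10)     # partially shared variability
```

Evaluating the function over a range of `s` values gives the scale-dependent curve that the text refers to; by the Cauchy-Schwarz inequality the result always lies between -1 and 1.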
In Figure 2 (top panel), we analyze the relations between SRYR and the leading Nino3-SSTA by calculating the DCCA cross-correlation coefficient $r_{DCCA}$. SRYR is correlated with Nino3-SSTA on time scales of 5 ~ 7 years, with cross-correlation coefficients larger than 0.3, while on other time scales the cross-correlations drop to a very low level (around 0.1). This result is in line with our discussion above, and indicates that studying correlations on different time scales is very important for better understanding the whole complex system. However, it should be further noted that signals observed from a complex system are normally linked via interwoven heterogeneous ties. Quantifying cross-correlations between only two signals may not be sufficient and can provide erroneous results, especially when the two signals are both correlated with other signals simultaneously. An example is the relations among SRYR, Nino3-SSTA, and the Pacific Decadal Oscillation (PDO). PDO is a pattern of warm or cold anomalous surface waters in the North Pacific (north of 20°N), with an inter-decadal time scale of about 30 years [28]. It is well known that both the winter-time Nino3-SSTA and the winter-time PDO index can be considered as important precursor factors of the following summer rainfall over the Yangtze River [29,30]. However, since PDO and El Niño are also coupled with each other [31,32], simple analysis based on either the PDO index or Nino3-SSTA may provide biased information. As shown in Figure 2 (bottom panel), we calculated the DCCA cross-correlation coefficient $r_{DCCA}$ to study the relations between SRYR and the PDO index. The PDO index was downloaded from the National Oceanic & Atmospheric Administration (NOAA, http://www.esrl.noaa.gov/psd/data/climateindices/), with only winter-time data selected.
It is obvious that the results for PDO and those for Nino3-SSTA have similar patterns, especially on small time scales of 4 ~ 8 years (the typical El Niño scale) and large time scales of 30 ~ 45 years (the typical PDO scale), which indicates strong coupling between PDO and Nino3-SSTA. When making a diagnostic analysis or prediction, one usually prefers to take as many related factors as possible into account to improve the accuracy. Here we argue that one first needs to establish to what extent and on which time scales each related factor independently affects the system of interest, and only then consider how these factors are connected to each other. One way to address this is by applying detrended partial-cross-correlation analysis (DPCCA). DPCCA is based on DCCA and thus can provide information on different time scales. Compared to the DCCA cross-correlation coefficient $r_{DCCA}$, the coefficient $r_{DPCCA}$ calculated from DPCCA additionally incorporates the partial-correlation technique, and is therefore expected to be useful in quantifying correlations among multiple signals (not only two signals) in a complex system.
In this report, we will first illustrate the advantages of DPCCA by conducting two numerical tests. Test I shows the advantages of DPCCA in handling non-stationary signals, while Test II illustrates the advantage of DPCCA in revealing “intrinsic” relations between two time series of interest, with the potential influences of other unconsidered signals removed. Furthermore, the utility of DPCCA is confirmed by revisiting the climatic example mentioned above. Results and discussions are shown in the next sections. In the last part of this report, we will show explicitly how DPCCA is designed.

Results
Advantages of DPCCA. Since DPCCA is based on the DCCA method and extended by incorporating partial cross-correlation analysis (PCCA), it is expected to have the advantages of both methods. We therefore perform two tests to verify the utility of DPCCA, as described below.
Test I: According to [18], DCCA is designed to investigate cross-correlations between two time series with nonstationarity. When nonstationarities such as local trends or a periodic background exist and no detrending is applied, there will be crossovers in the fluctuation function $F_{DCCA}$ as a function of time scale [33,34], and the DCCA cross-correlation coefficient $r_{DCCA}$ calculated from Eq. (1) will be spuriously high [35]. Fortunately, by choosing an appropriate detrending order, DCCA is able to remove the effects of nonstationarity and provide reliable information on the cross-correlation [36]. DPCCA should share this advantage. Suppose we have three independent and identically distributed (i.i.d.) Gaussian variables (with length of 10,000). They are not related to each other. However, if we generate another two time series from these variables, the two newly generated time series will be correlated, and both are related to $\{x^{C}_i\}$. By applying the partial cross-correlation analysis to the three time series, the PCCA coefficient between the two variables drops to 0.59 (the red line), which is still significantly higher than zero. If the influence of quadratic trends is removed by DCCA, the coefficient $r_{DCCA}$ further decreases to 0.51 (the yellow line), which is still not the expected result. In this case, however, if we apply DPCCA, the “intrinsic” relation between the two series is finally obtained (the black curve in Figure 3c). In fact, not only when quadratic trends exist but also for cases with cubic or even sinusoidal trends, DPCCA still gives reliable and accurate results, as shown in Figure 3d, where we show the cases with “No Trend”, “Linear Trend”, “Quadratic Trend”, “Cubic Trend”, and “Sinusoidal Trend”. By applying DPCCA with an appropriate detrending order (see the discussions in [36] and in the “Methods” section: one can remove the non-stationary effects by subtracting local trends of an appropriate polynomial order; normally, DPCCAn denotes a polynomial order of n), the expected results still arise (the black line), while other methods, such as PCCA (the red line) and the DCCA cross-correlation coefficient $r_{DCCA}$ (the yellow line), fail. Therefore, this test confirms that DPCCA has the advantages of DCCA.
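The reason why plain partial correlation is not enough when trends are present can be illustrated with a small synthetic experiment. The sketch below is not the configuration used in Test I, whose exact construction is not reproduced here; it assumes, hypothetically, that the two correlated series are sums of independent Gaussian noise and a shared series `x_c`, and that an arbitrary quadratic trend is added to both. The partial correlation is computed from the inverse of the ordinary correlation matrix, without any detrending, and NumPy is assumed. The numbers it produces are not meant to match the values quoted in the text.

```python
import numpy as np

def partial_correlation(data):
    """Partial correlations between the rows of data, given all remaining rows."""
    rho = np.corrcoef(data)          # ordinary correlation matrix
    c = np.linalg.inv(rho)           # inverse correlation matrix
    d = np.sqrt(np.diag(c))
    p = -c / np.outer(d, d)          # standard partial-correlation formula
    np.fill_diagonal(p, 1.0)
    return p

rng = np.random.default_rng(0)
n = 10_000
x_a, x_b, x_c = rng.standard_normal((3, n))

# Hypothetical construction: both series contain the shared component x_c.
x_ac = x_a + x_c
x_bc = x_b + x_c

# Ordinary correlation is inflated by the shared component ...
plain = np.corrcoef(x_ac, x_bc)[0, 1]
# ... and controlling for x_c removes it.
partial = partial_correlation(np.vstack([x_ac, x_bc, x_c]))[0, 1]

# Assumed quadratic trend added to both series (amplitude chosen arbitrarily).
trend = 5.0 * np.linspace(0.0, 1.0, n) ** 2
partial_trended = partial_correlation(
    np.vstack([x_ac + trend, x_bc + trend, x_c])
)[0, 1]
```

In this setup `partial` is close to zero while `partial_trended` is clearly positive, because the common trend is not explained by `x_c` and therefore survives the partial-correlation step. This is the situation in which detrending has to be combined with partial correlation.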
Test II: Another advantage of DPCCA should originate from the partial cross-correlation analysis. Compared with the DCCA cross-correlation coefficient $r_{DCCA}$, DPCCA can be used to investigate the correlations of multi-signals in a complex system, and to find the “intrinsic” relations between two considered signals. Suppose we have two independent and identically distributed (i.i.d.) Gaussian variables (with length of 10,000). By adding sinusoidal signals x S1 successfully, and reveal the 100 (days) periodic signal accurately. Therefore, this test confirms that DPCCA inherits the advantages of the partial-correlation technique.
Application of DPCCA to natural complex systems. Since natural signals are normally recorded from complex systems, they are usually characterized by non-stationarity and are often correlated with multiple other signals. Therefore, the DPCCA method proposed in this report could be widely used in various fields. In the following, we further illustrate the utility of DPCCA by revisiting the climatic example mentioned in the introduction.
We study how the winter-time Pacific Decadal Oscillation (PDO) and the winter-time Nino3 Sea Surface Temperature Anomaly (Nino3-SSTA) affect the Summer Rainfall over the middle-lower reaches of the Yangtze River (SRYR) over the past 60 years. It has been well recognized that the summer rainfall over China is influenced by two main modes of Pacific SST variation: PDO and El Niño (with Nino3-SSTA as an indicator [37]). Their winter signals are both considered as important precursor factors of the summer rainfall over China [29,30]. However, since PDO and El Niño are also coupled with each other [31,32] (Figure 2), simple predictions based on either PDO or Nino3-SSTA are not entirely reliable. Therefore, we need to reveal the “intrinsic” relations between SRYR and PDO, as well as the “intrinsic” relations between SRYR and Nino3-SSTA. Figure 5 shows the results, where significant differences between the outputs of DPCCA and DCCA are presented. For the relations between SRYR and Nino3-SSTA, after removing the influence of PDO, much higher (positive) cross-correlation coefficients $r_{DPCCA}$ are found over all time scales. Especially on time scales of 5 ~ 8 years (the gray area), which correspond to the typical period of El Niño, more significant cross-correlations between SRYR and Nino3-SSTA are found (exceeding the 95% confidence level). As for the relations between SRYR and PDO, after removing the influence of Nino3-SSTA, much lower (negative) cross-correlation coefficients $r_{DPCCA}$ are obtained over all time scales. If we calculate the DCCA cross-correlation coefficient $r_{DCCA}$ only, positive correlations between SRYR and PDO on time scales of 6 ~ 8 years are found, although they are not significant. After removing the effect of Nino3-SSTA, the positive correlations disappear.
Interestingly, on time scales of about 35 years (the gray area), which correspond to the typical period of PDO, significant (negative) correlations between SRYR and PDO arise (exceeding the 95% confidence level). Masked by El Niño, this signal cannot be revealed from $r_{DCCA}$. These results indicate that El Niño has important impacts on SRYR during its typical period (5 ~ 8 years), while at the multidecadal scale SRYR may be modulated by the PDO. This finding is in line with previous studies. In fact, it has been well accepted that during the period of El Niño, a persistent anomalous anticyclone over the Western North Pacific (WNP) can bring a large amount of water vapor to East Asia, which leads to an increase of precipitation over the Yangtze River [38-41]. However, modulated by the locations and strengths of the WNP monsoon trough and the WNP subtropical high (WNPSH) [41,42], which may be related to the variations of PDO, the effects of El Niño on East Asia can also vary on the multidecadal scale. For example, before the late 1970s, positive (negative) winter-time Nino3-SSTA usually corresponded to less (more) rainfall over the Yangtze River. Due to a westward expansion of the WNPSH after the late 1970s, summer precipitation increased over the Yangtze River [43,44]. Therefore, for a better understanding of the Summer Rainfall over the middle-lower reaches of the Yangtze River, different mechanisms on different time scales should be considered carefully. In this example, the cross-correlation coefficient $r_{DPCCA}$ obtained from DPCCA performs better than $r_{DCCA}$ from DCCA.

Discussion
In this report, we proposed a new method, Detrended Partial-Cross-Correlation Analysis (DPCCA), which can be used to diagnose “intrinsic” relations between two nonstationary signals (with the influences of other signals removed) on different time scales. This method is based on Detrended Cross-Correlation Analysis (DCCA) and extended by including Partial-Cross-Correlation Analysis (PCCA), and therefore has the advantages of both DCCA and PCCA. To illustrate the advantages, we performed two simple tests. Test I showed that DPCCA can indeed provide robust results even when nonlinear trends are mixed into the data being analyzed, and can further show the relations between the two considered series on different time scales. Test II illustrated the ability of DPCCA to investigate correlations when multi-signals are linked via interwoven ties, as shown in Figure 4. In general, DPCCA performs better in dealing with correlations in complex systems. However, when applying it, there are two points that need to be considered.
i) Significance testing. With DPCCA, one can obtain cross-correlations on different time scales. However, to determine whether the calculated correlations are statistically significant, one cannot simply apply Student's t-test, because the degrees of freedom change with the time scale. Normally, Monte-Carlo tests have to be applied to decide whether the obtained cross-correlations are significant on a given time scale [45] (Figure 5, the blue line).
ii) Background assumptions. When applying DPCCA, one has to pay attention to the background assumptions of partial cross-correlation analysis. That is, the considered multi-signals should have linear relationships with each other. This is the main deficiency of PCCA. However, by using DCCA, we believe this deficiency can be reduced to some extent, since only the relationships on different time scales are discussed, rather than those over the whole length. Nevertheless, we would like to stress that more advanced analytical methods are needed, especially methods based on nonlinear frameworks.
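The Monte-Carlo idea in point i) can be sketched as follows. The specific surrogate procedure of [45] is not described here, so this example assumes a simple shuffling test: one series is randomly permuted many times, the statistic is recomputed for each surrogate, and the 95th percentile of the absolute values is used as the significance threshold. The function names, the number of surrogates, and the synthetic data (62 points, mirroring the 1951 ~ 2012 record length) are hypothetical, and an ordinary Pearson correlation stands in for the scale-dependent coefficient to keep the sketch short.

```python
import numpy as np

def monte_carlo_threshold(stat, x, y, n_surrogates=500, level=0.95, seed=0):
    """Null threshold of |stat(x, y)| obtained by shuffling x (assumed surrogate scheme)."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        # Shuffling destroys any relation between x and y.
        null[i] = abs(stat(rng.permutation(x), y))
    return np.quantile(null, level)

def pearson(x, y):
    """Stand-in statistic; a coefficient at a fixed scale s could be passed instead."""
    return np.corrcoef(x, y)[0, 1]

# Hypothetical data: two short, strongly related series.
rng = np.random.default_rng(1)
x = rng.standard_normal(62)
y = 0.8 * x + 0.6 * rng.standard_normal(62)

threshold = monte_carlo_threshold(pearson, x, y)
observed = pearson(x, y)
is_significant = abs(observed) > threshold
```

Shuffling also destroys the autocorrelation of the shuffled series, so for strongly persistent records a surrogate scheme that preserves the power spectrum may be the more appropriate choice. The threshold has to be recomputed for every time scale, since the effective number of independent boxes shrinks as the scale grows.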
We applied DPCCA to a climatic example that deals with how the winter-time Pacific Decadal Oscillation (PDO) and the winter-time Nino3 Sea Surface Temperature Anomaly (Nino3-SSTA) affect the Summer Rainfall around the Yangtze River (SRYR) over the past decades. Since PDO has a longer variation period (about 30 years), it can be considered as a variable background. With PDO controlled, the relations between Nino3-SSTA and SRYR seem to be more apparent on time scales of 5 ~ 8 years. Similarly, with Nino3-SSTA controlled, significant relations between PDO and SRYR emerge on time scales of about 35 years. Due to possible influences of nonlinear effects, our results may still be problematic. Nevertheless, considering that traditional Cross-Correlation Analysis is still the main analytical method in various fields, our study improves the ability to analyze cross-correlations among multiple variables on different time scales. From the two numerical tests and the climatic example, we can summarize the advantages of DPCCA: i) DPCCA can be used to reveal the “intrinsic” relations between two considered variables by removing the possible influences of other unconsidered signals, ii) DPCCA is appropriate for the study of non-stationary variables, and iii) DPCCA can show the correlation levels on different time scales. Based on these advantages, we are convinced that this method will have extensive application prospects.

Methods
In this section, we will show the details on how the method, DPCCA, is designed.
Suppose we have m time series $\{x^{(j)}_i\}$, where $j = 1, 2, 3, \dots, m$ and $i = 1, 2, 3, \dots, N$. Each time series can be considered as a random walk, and we can define the so-called profile as

$$P^{(j)}_k = \sum_{i=1}^{k} x^{(j)}_i,$$

where $j = 1, 2, 3, \dots, m$ and $k = 1, 2, 3, \dots, N$. Similar to the procedure in DCCA, one first divides the entire profile into $N - s$ overlapping boxes. Each box i contains $s + 1$ values, starting at i and ending at $i + s$. In each box i, we can determine the “local trend” $\tilde{P}^{(j)}_{k,i}$ ($i \le k \le i + s$) by using a polynomial fit, and further define the “detrended walk” as the difference between the original profile and the local trend:

$$Y^{(j)}_{k,i} = P^{(j)}_k - \tilde{P}^{(j)}_{k,i}.$$

In this way, we obtain one detrended residual series $Y^{(j)}$ for each time series. From the covariances between any two residual series $Y^{(j_1)}$ and $Y^{(j_2)}$, where $j_1, j_2 = 1, 2, 3, \dots, m$, we can obtain a covariance matrix. According to [22], the cross-correlation levels between any two time series can then be estimated, and applying the partial-correlation technique to them gives the coefficients $r_{DPCCA}(j_1, j_2; s)$, which can be used to characterize the “intrinsic” relations between the two time series on time scales of s. Note that we use the word “intrinsic” here to indicate a condition in which the influences of the other time series have been removed, or a situation in which the other time series remain unchanged. By changing s, similar to the DCCA cross-correlation coefficient $r_{DCCA}$, we can further estimate the partial cross-correlation levels on different time scales.
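The procedure described in this section can be sketched in Python. This is a minimal sketch and not the authors' implementation. Because the explicit formulas for the covariance matrix and for $r_{DPCCA}$ are not reproduced above, the code assumes the usual choices: the covariance of two residual series is averaged over all boxes, the correlation matrix is obtained by normalising with the square roots of the diagonal terms, and the partial correlation follows from the inverse C of the correlation matrix as $-C_{j_1 j_2} / \sqrt{C_{j_1 j_1} C_{j_2 j_2}}$. The function names and the synthetic test data are hypothetical, and NumPy (version 1.20 or later, for `sliding_window_view`) is assumed.

```python
import numpy as np

def detrended_residuals(x, s, order=1):
    """Detrended walk of the profile of x in all N - s overlapping boxes."""
    profile = np.cumsum(np.asarray(x, dtype=float))
    # Row i holds profile[i : i + s + 1]; shape (N - s, s + 1).
    boxes = np.lib.stride_tricks.sliding_window_view(profile, s + 1)
    # Local time axis rescaled to [-1, 1] to keep the fit well conditioned.
    t = np.linspace(-1.0, 1.0, s + 1)
    design = np.vander(t, order + 1)
    # Least-squares polynomial fit of every box at once.
    coef, *_ = np.linalg.lstsq(design, boxes.T, rcond=None)
    return boxes - (design @ coef).T

def dpcca(series, s, order=1):
    """Return (rho, partial): detrended correlation and partial-correlation
    matrices at scale s for the rows of series (shape m x N)."""
    resid = np.array([detrended_residuals(x, s, order).ravel() for x in series])
    # Detrended covariance matrix, averaged over all boxes and positions.
    cov = resid @ resid.T / resid.shape[1]
    d = np.sqrt(np.diag(cov))
    rho = cov / np.outer(d, d)
    # Assumed partial-correlation step via the inverse correlation matrix.
    c = np.linalg.inv(rho)
    dc = np.sqrt(np.diag(c))
    partial = -c / np.outer(dc, dc)
    np.fill_diagonal(partial, 1.0)
    return rho, partial

# Hypothetical test data: two series share the component x_c and a quadratic trend.
rng = np.random.default_rng(0)
n = 10_000
x_a, x_b, x_c = rng.standard_normal((3, n))
trend = 5.0 * np.linspace(0.0, 1.0, n) ** 2
data = np.vstack([x_a + x_c + trend, x_b + x_c + trend, x_c])

rho, partial = dpcca(data, s=10, order=2)
```

For these data `rho[0, 1]` stays near 0.5, reflecting the shared component, while `partial[0, 1]` is close to zero once `x_c` is controlled for and the trend has been removed locally. Looping over a range of `s` values yields the scale-dependent curves discussed in the text; larger scales need longer series, since the number of effectively independent boxes decreases with s.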