Abstract
Vine copulas have become the standard tool for modelling complex probabilistic dependence. It has been shown that the number of regular vines grows extremely quickly with the number of nodes. Chimera is the first attempt to map the vast space of regular vines. Software for operating with regular vines is available for R, MATLAB and Python. However, no dataset containing all regular vines is available. Our atlas of regular vines, Chimera, comprises all 24 4 × 4 matrices representing regular vines on 4 nodes, 480 5 × 5 matrices representing regular vines on 5 nodes, 23,040 6 × 6 matrices representing regular vines on 6 nodes, 2,580,480 7 × 7 matrices representing regular vines on 7 nodes and 660,602,880 8 × 8 matrices representing regular vines on 8 nodes. Regular vines in Chimera are classified according to their tree-equivalence class. We fit all regular vines to synthetic data to demonstrate the potential of Chimera. Chimera thus provides a tool for researchers to navigate this vast space in an orderly fashion.
Background & Summary
Regular vines are graphs (or a sequence of graphs) that facilitate the characterization of complex multidimensional probability distributions. Regular vines, used together with bivariate copulas, are the building blocks of multivariate distributions commonly referred to as vine copulas. The first vine copula, and implicitly also the first regular vine, was introduced by Joe in 1994^{1}, while the first formal definition of regular vines (and vine copulas) was presented by Cooke in 1997^{2}. Only in 2009 were vine copulas presented as statistical models^{3}. Their flexibility has made them the standard tool for modelling complex multidimensional probability distributions in different fields. Vine copulas add flexibility because they construct a probability distribution from bivariate pieces rather than trying to represent a joint distribution with a particular multidimensional parametric family.
While theoretical developments are still being made, vine copulas on different numbers of variables have found application in virtually all fields of science and engineering. Recent example applications can be found in finance, business and economics^{4,5,6,7,8,9,10}, coastal management^{11}, earth sciences^{12,13,14} and engineering^{15,16,17,18,19,20,21,22,23}, where the number of variables in the respective vine copula models ranges from 3 to 10. In a recent study by the authors, vine copulas on 6 variables (23,040 models) were fitted to two sets of variables including waves, currents and hydrodynamic forces acting on a submerged floating tunnel for its evaluation under different design configurations. In health sciences, the spatial dependence of COVID-19 infection rates was modelled with a vine copula of 21 variables^{24}, while a vine copula of 4 variables was implemented to create a secure method to transfer sensitive data without accidental leakages^{25}.
Despite their popularity for modelling multidimensional probability distributions, the use of vine copulas on 6 or more variables relies mostly on heuristics^{26}. This is partly because the non-unique decomposition of a multidimensional probability distribution into bivariate building blocks causes the number of regular vines to grow extremely quickly with the number of variables under consideration. In particular, previous research has shown that the number of regular vines on d nodes is \(\frac{d!}{2}\times {2}^{\left(\begin{array}{c}d-2\\ 2\end{array}\right)}\)^{27,28}. Summed over 4 to 8 nodes, this already amounts to 663,206,904 regular vines. The heuristics previously mentioned have been poorly tested, to some extent because a dataset containing all regular vines on more than 5 nodes is not available. In fact, an atlas of regular vines in higher dimensions would enable brute-force testing of all possible regular vine structures (assuming unlimited computational power), paving the way to improved heuristics. Regular vines on 5 nodes have been obtained in the past through permutation per equivalence class (see for example^{29}). To our knowledge, this method has not been successfully used for more than 5 variables, nor is a dataset with regular vine matrices on more than 5 variables available.
In order to fill this gap, in this paper we introduce our atlas of regular vines from 4 to 8 elements: Chimera. A Chimera is an imaginary creature from Greek mythology that has the head of a lion, mid body of a goat and lower body of a serpent. Like all fantastic creatures, it is made up of “simpler” pieces of other real or imaginary creatures. Trees are the “simpler” pieces that give rise to vines. Regular vines are created very much like the zoology of the fantastic. In order to remind us of this fact our atlas is named Chimera. The data contained in Chimera consist of 663,206,904 matrices representing the regular vines of interest. The objective of this paper is thus to make these matrices available to researchers rather than to provide new algorithms for producing them or a new proof of the number of regular vines as a function of the number of nodes. The data are available for R, MATLAB and Python, since software implementations for manipulating vine copulas exist in all 3 languages^{30,31,32}. Finally, we illustrate the potential of Chimera by fitting all vine copulas from 4 to 8 nodes to synthetic data. Throughout this paper we used the high performance computer DelftBlue^{33} to implement our atlas and fit vine copulas to synthetic data.
Methods
Since our data relates to graphs, we introduce the basic definitions required for characterizing regular vines and representing them as matrices. We assume that the reader is familiar with concepts of graph theory and repeat the most important concepts required for our purpose for completeness.
Definitions
In this section we introduce some basic definitions. A more extended treatment may be found, for example, in^{30}. A vine is a set of nested trees. A tree is an undirected acyclic graph. More formally, a connected graph T = {N, E} is called a labeled tree with nodes N = {1, 2, …, d} and edges E, where E is a subset of pairs of N with no cycle. In this paper the interest is in regular vines.
A regular vine V on d elements (edges or nodes) is a sequence of trees \({T}_{1},\ldots ,{T}_{d-1}\) such that: (i) T_{1} is a tree with node set \({N}_{1}=\{1,\ldots ,d\}\) and edge set E_{1}, (ii) for \(j\ge 2\), T_{j} is a tree with node set N_{j} = E_{j−1} and edge set E_{j}, and (iii) for \(j=2,\ldots ,d-1\) and \(\{a,b\}\in {E}_{j}\) it must hold that \(|a\cap b|=1\). Property (iii) is often referred to as the proximity condition, which ensures that if there is an edge e connecting a and b in tree T_{j}, \(j\ge 2\), then a and b (which are edges in T_{j−1}) must share a common node in T_{j−1}. Thus, a regular vine on d elements is one in which two edges in tree j are joined by an edge in tree j + 1 only if these edges share a common node in tree j.
For e ∈ E_{j}, \(j\le d-1\), the constraint set associated with e is the complete union \({U}_{e}^{* }\) of e, that is, the subset of \({N}_{1}=\{1,\ldots ,d\}\) reachable from e by the membership relation.
For \(j=1,\ldots ,d-1\) and \(e\in {E}_{j}\), if \(e=\{i,k\}\) then the conditioning set associated with e is \({D}_{e}=\left\{{U}_{i}^{* }\cap {U}_{k}^{* }\right\}\) and the conditioned set associated with e is \(\left\{{C}_{e,i},{C}_{e,k}\right\}=\left\{{U}_{i}^{* }\backslash {D}_{e},{U}_{k}^{* }\backslash {D}_{e}\right\}\). Note that for \(e\in {E}_{1}\), the conditioning set is empty. Note as well that the order of an edge is the cardinality of its conditioning set. For \(e\in {E}_{j}\), \(j\le d-1\), \(e=\{i,k\}\) we have \({U}_{e}^{* }={U}_{i}^{* }\cup {U}_{k}^{* }\). Thus, nodes of T_{1} reachable from a given edge via the membership relation are elements of the constraint set of that edge. When two edges in T_{j} are joined by an edge in T_{j+1}, the intersection of the respective constraint sets forms the conditioning set. The symmetric difference of the constraint sets is the conditioned set of this edge. Figure 1 presents examples of regular vines on 5 elements. Note that the conditioned and conditioning sets are presented as \({C}_{e,i},{C}_{e,k}\,|\,{D}_{e}\).
Regular vines can be stored as matrices to facilitate their manipulation. The matrix representation was introduced to show that the number of regular vines on d nodes is \(\frac{d!}{2}\times {2}^{\left(\begin{array}{c}d-2\\ 2\end{array}\right)}\)^{27}. The matrix representation is used in software implementations in R^{30}, Python^{32} and MATLAB^{31}. Our data consist precisely of all 24 4 × 4 matrices representing regular vines on 4 nodes, 480 5 × 5 matrices representing regular vines on 5 nodes, 23,040 6 × 6 matrices representing regular vines on 6 nodes, 2,580,480 7 × 7 matrices representing regular vines on 7 nodes and 660,602,880 8 × 8 matrices representing regular vines on 8 nodes.
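As a quick arithmetic check of the counting formula and of the dataset sizes quoted above, the following minimal sketch evaluates d!/2 × 2^C(d−2, 2) for d = 4, …, 8. The function name `n_regular_vines` is ours and is not part of Chimera or of any vine copula package.

```python
from math import comb, factorial


def n_regular_vines(d: int) -> int:
    """Number of regular vines on d nodes: d!/2 * 2**C(d-2, 2)."""
    return factorial(d) // 2 * 2 ** comb(d - 2, 2)


# Sizes of the five collections of matrices, d = 4..8
counts = {d: n_regular_vines(d) for d in range(4, 9)}
for d, n in counts.items():
    print(f"d = {d}: {n:,} regular vine matrices")
print(f"total for d = 4..8: {sum(counts.values()):,}")
```

The five values are 24, 480, 23,040, 2,580,480 and 660,602,880, which add up to 663,206,904.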
Since R is by far the most widely used implementation, we follow the definition of a regular vine matrix provided in^{30}. Let M be an upper triangular matrix with entries m_{i, j} for i≤j. The elements m_{i, j} take values in {1, …, d}. The matrix M is called a regular vine matrix, or a matrix representation of a regular vine, if it satisfies the following conditions:

1.
\(\left\{{m}_{1,i},\ldots ,{m}_{i,i}\right\}\subset \left\{{m}_{1,j},\ldots ,{m}_{j,j}\right\}\) for \(1\le i\le j\le d\). This means that the entries of a specific column are also contained in all columns to the right of this column.

2.
m_{i,i} ∉ {m_{1, i−1}, …, m_{i−1, i−1}}. This means that the diagonal entry of a column does not appear in any column further to the left.

3.
For i = 3, …, d and k = 1, …, i−1 there exist (j, l) with \(j\le i\) and \(l\le j\) such that \(\{{m}_{k,i},\{{m}_{1,i},\ldots ,{m}_{k-1,i}\}\}=\{{m}_{j,j},\{{m}_{1,j},\ldots ,{m}_{l,j}\}\}\) or \(\{{m}_{k,i},\{{m}_{1,i},\ldots ,{m}_{k-1,i}\}\}=\{{m}_{l,j},\{{m}_{1,j},\ldots ,{m}_{l-1,j},{m}_{j,j}\}\}\). This last statement means that the elements of M should comply with the proximity condition for regular vines.
The regular vine matrices for the examples in Fig. 1 are:
Here, matrix A corresponds to the vine at the top of Fig. 1, matrix B corresponds to the vine in the middle and matrix C corresponds to the vine at the bottom. For example, the edges of T_{1} of the first regular vine in Fig. 1 correspond to \(\left\{({a}_{5,5},{a}_{1,5}),({a}_{4,4},{a}_{1,4}),({a}_{3,3},{a}_{1,3}),({a}_{2,2},{a}_{1,2})\right\}=\left\{(1,2),(2,3),(3,4),(4,5)\right\}\). The edges of T_{2} for the same vine correspond to \(\left\{({a}_{5,5},{a}_{2,5}|{a}_{1,5}),({a}_{4,4},{a}_{2,4}|{a}_{1,4}),({a}_{3,3},{a}_{2,3}|{a}_{1,3})\right\}=\left\{(1,3|2),(2,4|3),(3,5|4)\right\}\). For T_{3}, the edges are given by \(\left\{({a}_{5,5},{a}_{3,5}|{a}_{2,5},{a}_{1,5}),({a}_{4,4},{a}_{3,4}|{a}_{2,4},{a}_{1,4})\right\}=\left\{(1,4|3,2),(2,5|4,3)\right\}\). The single edge of T_{4} for this regular vine is given by \(\left\{({a}_{5,5},{a}_{4,5}|{a}_{3,5},{a}_{2,5},{a}_{1,5})\right\}=\left\{(1,5|4,3,2)\right\}\). Chimera stores regular vines as matrices, following the definition of a regular vine matrix presented above and exemplified with the first regular vine in Fig. 1 and its representation as regular vine matrix A. More details about how the matrices are presented in Chimera are given in section Data Records.
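The way edges are read off a regular vine matrix can be sketched in a few lines of Python. The matrix `A` below is not copied from the paper's display; it is assembled from the index-to-value correspondences quoted above (for instance a_{5,5} = 1 and a_{1,5} = 2), with a_{1,1} = 5 following from conditions 1 and 2, so it should be read as an assumption. The helper names `tree_edges` and `satisfies_conditions_1_and_2` are ours.

```python
# Matrix A as implied by the edge lists in the text; zeros fill the unused
# lower triangle. This reconstruction is an assumption, not the printed matrix.
A = [
    [5, 5, 4, 3, 2],
    [0, 4, 5, 4, 3],
    [0, 0, 3, 5, 4],
    [0, 0, 0, 2, 5],
    [0, 0, 0, 0, 1],
]


def tree_edges(M, k):
    """Edges of tree T_k as (conditioned pair, conditioning set)."""
    d = len(M)
    edges = []
    for col in range(k, d):  # 0-based columns k..d-1, i.e. columns k+1..d
        conditioned = (M[col][col], M[k - 1][col])
        conditioning = frozenset(M[row][col] for row in range(k - 1))
        edges.append((conditioned, conditioning))
    return edges


def satisfies_conditions_1_and_2(M):
    """Check nested columns (condition 1) and new diagonal entries (condition 2)."""
    d = len(M)
    cols = [{M[row][c] for row in range(c + 1)} for c in range(d)]
    nested = all(cols[c] <= cols[c + 1] for c in range(d - 1))
    new_diagonal = all(M[c][c] not in cols[c - 1] for c in range(1, d))
    return nested and new_diagonal


for k in range(1, len(A)):
    print(f"T_{k}:", [(pair, sorted(cond)) for pair, cond in tree_edges(A, k)])
print("conditions 1 and 2 hold:", satisfies_conditions_1_and_2(A))
```

Condition 3 (the proximity condition) is deliberately left out of this sketch; only the two simpler conditions are checked.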
The first catalogues classifying regular vines are presented in^{27} for up to 7 elements and in^{28} for up to 8 elements. Those catalogues, however, do not present data corresponding to the regular vine matrices of all vines but only enumerate them. The construction of those catalogues consisted roughly of: (i) generating all trees in the first level of the regular vine through Prüfer codes^{34} (see section Technical Validation for a description of Prüfer’s procedure), and (ii) recursively constructing the line graph (defined below) of each tree in the regular vine and finding all possible spanning trees of each such line graph. This procedure guarantees the uniqueness of each vine. The procedure followed to construct Chimera is similar to the one presented in^{27} and^{28} except that it does not use Prüfer codes. It does, however, still rely on the concept of a line graph.
Given a graph G = (N, E), its line graph L(G) is a graph \(({N}_{\ell },{E}_{\ell })\) such that:

Every \(e\in E\) corresponds to an \({n}_{\ell }\in {N}_{\ell }\) and,

\({n}_{i},{n}_{j}\in {N}_{\ell }\), with \(i\ne j\) are adjacent if and only if their corresponding edges share a common endpoint (“are incident”) in G.
That is, L(G) is the intersection graph of the edges of G, representing each edge by the set of its two endpoints. Notice that, by definition, all spanning trees of the line graph comply with the proximity (regularity) condition for vines. Line graphs are also known as derived graphs, interchange graphs, adjoint graphs and edge-to-vertex duals. Harary^{35} notes that the concept of the line graph of a given graph is so natural that it has been rediscovered independently by many authors. The line graphs of the first tree of the regular vines presented in Fig. 1 are shown in Fig. 2. Notice that the first line graph shown in Fig. 2 has only one spanning tree. This type of graph is usually referred to as a “line”, while the line graph of the first tree of the third regular vine shown in Fig. 1 (which is usually referred to as a “star”) is a complete graph (all nodes are adjacent to each other) and hence has 4^{4−2} = 16 spanning trees.
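The spanning-tree counts of the three line graphs can be cross-checked with a short sketch. It builds the line graph from an edge list and counts spanning trees with Kirchhoff's matrix-tree theorem, which is a different technique from the brute-force search used to build Chimera and is chosen here only as an independent check. The edge lists of the three first trees are those implied by the text (a line, the tree with edges (1,2), (2,3), (3,4), (3,5), and a star centred at node 3); the function names are ours.

```python
from fractions import Fraction
from itertools import combinations


def line_graph(tree_edges):
    """Nodes are the edges of the tree; two are adjacent if they share an endpoint."""
    nodes = [frozenset(e) for e in tree_edges]
    edges = [frozenset((a, b)) for a, b in combinations(nodes, 2) if a & b]
    return nodes, edges


def count_spanning_trees(nodes, edges):
    """Matrix-tree theorem: determinant of the Laplacian minus one row and column."""
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    lap = [[Fraction(0)] * n for _ in range(n)]
    for edge in edges:
        a, b = (index[v] for v in edge)
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1
    minor = [row[1:] for row in lap[1:]]
    det = Fraction(1)
    for c in range(n - 1):  # exact Gaussian elimination
        pivot = next((r for r in range(c, n - 1) if minor[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            minor[c], minor[pivot] = minor[pivot], minor[c]
            det = -det
        det *= minor[c][c]
        for r in range(c + 1, n - 1):
            factor = minor[r][c] / minor[c][c]
            minor[r] = [x - factor * y for x, y in zip(minor[r], minor[c])]
    return int(det)


first_trees = {
    "line": [(1, 2), (2, 3), (3, 4), (4, 5)],
    "middle": [(1, 2), (2, 3), (3, 4), (3, 5)],
    "star": [(1, 3), (2, 3), (3, 4), (3, 5)],
}
for name, tree in first_trees.items():
    print(name, count_spanning_trees(*line_graph(tree)))
```

Under these assumptions the three counts are 1, 3 and 16, matching the values stated in the text.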
The steps taken to generate all regular vine matrices contained in Chimera are:

1.
A library of non-isomorphic trees is constructed. Two graphs G = {V, E} and H = {W, F} are isomorphic if there is a bijective function f : V→W such that ∀ \({v}_{1},{v}_{2}\in V\), \(\{{v}_{1},{v}_{2}\}\in E\iff \{f({v}_{1}),f({v}_{2})\}\in F\). Loosely speaking, two trees are non-isomorphic if they do not have the same structure. The library constructed for Chimera consists of the 45 trees presented in Table S1 of the supplement. The trees are denoted T4, T5, …, T47, T48. Notice that by labeling these trees through different permutations all possible trees on 4 up to 8 nodes are obtained.

2.
Starting with a complete graph on d nodes (see the definition of a complete graph above), all d^{d−2} labelled trees on d nodes are found by brute force. Arthur Cayley^{36} was the first to note that for every positive integer d, the number of trees on d labeled nodes is d^{d−2}. For any labeled complete graph with d nodes, the number of spanning trees of this graph must thus be d^{d−2}. For example, the line graph at the bottom of Fig. 2 is a complete graph (all nodes share an edge with each other) on 4 nodes. This graph must have 16 labeled spanning trees, of which 4 are of type T5 in Table S1 of the supplement and 12 are of type T4 in the same table. Once all trees for the first level of the regular vine are found, they are categorized according to their non-isomorphic tree from step 1. For example, T4 in Table S1 of the supplement has \(\frac{4!}{2}=12\) ways of being labelled, that is, all possible permutations of the numbers 1, 2, 3, 4 divided by 2 to avoid repetitions (for example, the tree 1234 is equal to 4321, hence this permutation must not be counted twice). Similarly, T5 in Table S1 of the supplement has 4 possible ways of being labeled, obtained by assigning each of the numbers 1, …, 4 in turn to the node adjacent to all other nodes. These will be used as the trees in the first level of the regular vines.

3.
At this step Prüfer codes are also obtained for each labeled tree. See section Technical Validation below, where Prüfer codes are discussed. Steps 1 to 3 are performed using the Python script geninput.py, which is available in the 4TU data repository under the Python data collection^{37}.

4.
For each non-isomorphic tree in step 1, a line graph is constructed for the edges of the tree in the first level of the regular vine, and all spanning trees of this graph are obtained, again by brute force. For example, the line graph of T_{1} in the first regular vine of Fig. 1 is the first graph presented in Fig. 2. Notice that this line graph is a tree (a so-called line) and has only one spanning tree (which is the graph itself). The line graph of T_{1} of the second regular vine is the second graph in Fig. 2. This graph has 3 spanning trees. The edge sets of these spanning trees are {{(1, 2), (2, 3)}, {(2, 3), (3, 4)}, {(3, 4), (3, 5)}}, {{(1, 2), (2, 3)}, {(2, 3), (3, 5)}, {(3, 5), (3, 4)}} and {{(1, 2), (2, 3)}, {(2, 3), (3, 5)}, {(2, 3), (3, 4)}}. T_{1} of the third regular vine shown in Fig. 1 is a so-called star (all edges share a common node, which is node 3 in this case). Its line graph is the complete graph shown at the bottom of Fig. 2 which, as explained in step 2 above, has 4^{2} = 16 spanning trees.

5.
Step 4 is repeated for each tree in each level of the regular vine until the last level of the vine. The results are written as a regular vine matrix if the first tree of the vine corresponds to a line (such as in the first regular vine presented in Fig. 1) or matrices whenever the first tree of the regular vine is not a line. Notice that at this point regular vines are classified according to their tree-equivalent class. Two vines are tree-equivalent if they share the same non-isomorphic tree in each level of the vine. For example, by permuting nodes 4 and 5 in the first regular vine shown in Fig. 1, two distinct regular vines (and hence regular vine matrices) are obtained. However, these fall in the same tree-equivalent class. Notice that by permuting nodes 4 and 5 in the second and third regular vines shown in Fig. 1, exactly the same regular vines (and hence regular vine matrices) are obtained. However, by permuting nodes 5 and 3 (for example), distinct regular vines within the same tree-equivalent class are obtained in each case. Tree-equivalent classes for all regular vines on up to 8 nodes are presented through their tree sequence in Table S2 of the supplement. The number of distinct regular vines (and regular vine matrices) within each tree-equivalent class is also shown in the same table.

6.
Finally, all regular vines (and consequently their matrix representations) within each tree-equivalent class are found through permutation. Steps 4 to 6 are performed using the Python script genmatrix.py, which is available in the 4TU data repository under the Python data collection^{37}. This script was specifically modified and implemented for use on the high performance computer DelftBlue^{33} of the Technical University of Delft.
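The core of the steps above (brute-force spanning trees of a complete graph, followed by repeated line-graph construction) can be illustrated with a minimal counting sketch. It is not the geninput.py or genmatrix.py code from the repository: it skips the classification into tree-equivalent classes and the writing of matrices, and simply counts regular vines for small d so that the result can be compared with the known totals. All function names are ours.

```python
from itertools import combinations


def spanning_trees(nodes, edges):
    """All spanning trees by brute force: acyclic edge subsets of size n - 1."""
    nodes = list(nodes)
    found = []
    for subset in combinations(edges, len(nodes) - 1):
        parent = {v: v for v in nodes}  # union-find forest for cycle detection

        def root(v):
            while parent[v] != v:
                v = parent[v]
            return v

        for edge in subset:
            a, b = edge
            ra, rb = root(a), root(b)
            if ra == rb:  # this edge would close a cycle
                break
            parent[ra] = rb
        else:
            found.append(frozenset(subset))
    return found


def line_graph_edges(tree):
    """Pairs of edges of `tree` that share exactly one node (proximity condition)."""
    return [frozenset((a, b)) for a, b in combinations(tree, 2) if len(a & b) == 1]


def count_vines_from(tree):
    """Number of ways to complete a vine whose current tree is `tree`."""
    if len(tree) == 1:  # last tree reached: nothing left to choose
        return 1
    next_trees = spanning_trees(tree, line_graph_edges(tree))
    return sum(count_vines_from(t) for t in next_trees)


def count_regular_vines(d):
    """Count regular vines on d labelled nodes by exhaustive enumeration."""
    nodes = range(1, d + 1)
    complete = [frozenset(pair) for pair in combinations(nodes, 2)]
    return sum(count_vines_from(t) for t in spanning_trees(nodes, complete))


for d in (4, 5):
    print(d, count_regular_vines(d))
```

For d = 4 and d = 5 the sketch returns 24 and 480, in agreement with the closed-form count. The number of subsets tested grows quickly with d, which is consistent with the use of a high performance computer for the larger cases.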
Using all regular vine matrices in Chimera to fit vine copulas to synthetic data
Vine copulas characterize complex multidimensional probability distributions. In real applications, the structure of the vine copula (e.g., trees and bivariate dependence) is fitted (and its goodness of fit evaluated) based on available observations. In our case, to illustrate the possibilities of Chimera, we fit all vine copulas in 4, 5, 6, 7 and 8 variables to synthetic data. Five synthetic data sets, of 1000 observations each, are generated with regular vines. The details are given in section 2 of the supplement. For example, in section 2.1.1 of the supplement, 1000 samples are generated from a regular vine whose first tree is 2314 (see M_{1}) with the bivariate copulas and parameters shown in Tables S3, S4. All 24 vine copulas on 4 variables are fitted to the synthetic data using the 24 regular vine matrices representing regular vines on 4 nodes included in Chimera. The fit selected through a brute-force procedure, that is, the one with minimum Akaike’s Information Criterion (AIC), is also shown as R_{1} in section 2.1.2 of the supplement. Tables S6, S7 of the supplement show the bivariate copulas and parameters corresponding to R_{1}. Notice that in this case a brute-force procedure is able to find the regular vine that was originally used to generate the synthetic data. The Python package “pyvinecopulib”^{32} was used.
This process was repeated for synthetic datasets with 5, 6, 7 and 8 variables. Notice that in most cases a brute-force procedure based on AIC is able to capture the regular vine that generates the synthetic data, except for 7 variables where M_{4} ≠ R_{4}. The copulas in each tree are not always captured exactly. However, general characteristics of the joint distribution (upper or lower tail dependence, for example) are. Datasets on 4 and 5 variables can be fitted relatively easily (depending on the sample size) on a personal computer with the aid of Chimera. Relatively small samples (300 for example) of a 6-dimensional distribution can be fitted within days on a standard personal computer. In order to fit 7- and 8-dimensional vine copulas to data, the DelftBlue supercomputer was used. Notice that the computational time required to fit all vine copula models on 8 elements to the sample amounts to approximately 12 years (Table S29 in section 3.4 of the supplement). Fitting all vine copula models to 1000 samples of a 7-dimensional data set on the DelftBlue supercomputer is a matter of hours when computing in parallel. The fitting of vine copulas on 8 variables is, however, more challenging and takes days of parallel computing rather than hours. A more extended discussion of the computational challenges of fitting vine copulas on 7 or more variables is presented in section 3 of the supplement. A box plot showing the AIC of all vine copula models fitted to synthetic data using the regular vines (represented by their regular vine matrices) included in Chimera is presented in Fig. 3. An investigation of one of the most commonly used fitting algorithms^{26} for vine copulas on up to 8 nodes using Chimera is the subject of recent research by the authors.
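For reference, the brute-force selection described in this section can be written compactly using the standard definition of AIC. The notation is ours and is not taken from the supplement: \(\mathcal{M}_d\) denotes the set of regular vine matrices on d nodes in Chimera, \(\ell\) the maximized log-likelihood of the vine copula fitted on matrix M, and \(k_M\) the number of estimated copula parameters.

```latex
\[
  \mathrm{AIC}(M) = -2\,\ell\bigl(\hat{\theta}_M\bigr) + 2\,k_M ,
  \qquad
  R = \operatorname*{arg\,min}_{M \in \mathcal{M}_d} \mathrm{AIC}(M),
  \qquad
  |\mathcal{M}_d| = \frac{d!}{2}\, 2^{\binom{d-2}{2}} .
\]
```

For d = 4 the minimum is taken over 24 fitted models, while for d = 8 it is taken over 660,602,880, which is why the computing times differ so strongly between dimensions.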
Data Records
Our atlas Chimera is hosted in the 4TU research data repository^{37}. Different files are available for the different platforms (R, MATLAB and Python). The data containing regular vine matrices were originally created in Python and then transformed to R and MATLAB formats. The naming convention for the available files is presented in Table 1.
Figure 4 shows a screenshot of file submats_4_T4Matlab.mat. The MATLAB data in Fig. 4 is a structure array named “MatlabVineArrays”. It contains a total of 12 elements, each with 3 fields: the “Type”, which corresponds to a tree-equivalent regular vine class, the regular vine matrix number (“Number”) and the matrix itself (“VineMatrix”). The tree-equivalent class refers to the tree sequence corresponding to the particular tree in each level of the regular vine. The non-isomorphic trees used in the construction of tree-equivalent regular vines included in Chimera are presented in the supplement.
Table S1 of the supplement presents the non-isomorphic trees (and their labels) used in the construction of each regular vine included in Chimera. Table S2 of the supplement presents: (i) all tree-equivalent classes (using the tree sequence), (ii) the naming convention (with Python extension) and (iii) the number of regular vine matrices included in each tree-equivalent class. There are a total of 22 MATLAB files submat_4_T4Matlab.mat,…, submats_7_T25Matlab.mat which contain all regular vine matrices for regular vines on 4, …, 7 nodes. Altogether the 22 MATLAB files occupy ≈40 Mb.
Figure 5 shows a representation of the dataset in R. The data is ordered within lists, the main list is called “RVineArrays” and the nested lists contain the vine matrices (“Matrix”) and their respective tree sequences (“Type”).
For Python, the extension of the file is “pbz2”, because the amount of data increases drastically after 7 nodes (the total size of the Python data is ≈3.9 Gb). The initial ASCII files are compressed using the cPickle module in Python and supplied in a digital format. An example Python script is included to retrieve data from the binary files (see section Code Availability). Essentially, each matrix available in a file is presented with an index number (“index”), the tree type in the first level of the vine (“mat_type”) and the matrix (“matrix”) to be used within pyvinecopulib^{32}, which is the Python library available for operating with regular vines.
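As an illustration of the file format only, the sketch below reads and writes a bz2-compressed pickle file, which is what the “pbz2” extension conventionally denotes. This is an assumption about the layout of the Chimera files, and the script get_matrices.py shipped with the dataset remains the reference way to load them. In particular, the actual files store instances of a class defined in that script, so unpickling them requires that class to be importable. The record below is a plain dictionary with the three field names mentioned above and a placeholder matrix; it is hypothetical demonstration data, not an entry of Chimera.

```python
import bz2
import os
import pickle
import tempfile


def save_pbz2(obj, path):
    """Write one Python object as a bz2-compressed pickle."""
    with bz2.BZ2File(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pbz2(path):
    """Read one Python object back from a bz2-compressed pickle."""
    with bz2.BZ2File(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Hypothetical record layout using the field names quoted in the text.
    demo = [{"index": 0, "mat_type": "T4", "matrix": [[0] * 4 for _ in range(4)]}]
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.pbz2")
        save_pbz2(demo, path)
        print(load_pbz2(path)[0]["mat_type"])
```

Pickle files can execute arbitrary code when loaded, so only files obtained from the official repository should be opened this way.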
Finally, all files available for Python are presented in Table S2 in the supplement. The 660,602,880 8 × 8 matrices representing regular vines on 8 nodes are only available for Python. To construct the regular vine matrices with the methods described above, the high performance cluster (supercomputer) DelftBluePhase1^{33} of the Technical University of Delft was used with parallel processing.
For example, as may be seen in Table S2 in the supplement, for the Python data set, submat_7_T25.pbz2 will contain all regular vine matrices whose first tree corresponds to T25 (shown in Table S1 of the supplement). A total of 22 distinct tree-equivalent regular vines (tree sequences) have T25 in the first tree of their tree sequence. There are a total of 161,280 regular vine matrices distributed among the 22 tree-equivalent classes.
There are 576 times more regular vines on 9 nodes than on 8, and 737,280 times more on 10 nodes than on 8. The computational, processing and storage requirements for extending Chimera to regular vine matrices on 9 and 10 elements are not yet clear to the authors, nor is the feasibility of using such an extended catalogue in practice. These questions will be the subject of future research by the authors and, hopefully, by other research groups interested in Chimera.
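The two growth factors follow directly from the counting formula. Writing N(d) for the number of regular vines on d nodes (notation ours), a short derivation is:

```latex
\[
  N(d) = \frac{d!}{2}\, 2^{\binom{d-2}{2}},
  \qquad
  \frac{N(d+1)}{N(d)} = (d+1)\, 2^{\binom{d-1}{2}-\binom{d-2}{2}} = (d+1)\, 2^{\,d-2},
\]
\[
  \frac{N(9)}{N(8)} = 9 \cdot 2^{6} = 576,
  \qquad
  \frac{N(10)}{N(8)} = \bigl(9 \cdot 2^{6}\bigr)\bigl(10 \cdot 2^{7}\bigr) = 737{,}280 .
\]
```

In absolute terms this gives N(9) = 380,507,258,880 and N(10) = 487,049,291,366,400 regular vines.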
Technical Validation
Notice that the application of the methods described in section Methods guarantees the construction of all unique regular vine matrices. The procedure described in section Methods generates labelled trees through brute force. By obtaining Prüfer codes in step 3 of the procedure to generate regular vines, we make sure that we have taken into account exactly d^{d−2} labelled trees to construct the regular vines in Chimera.
Prüfer’s procedure is based on the fact that there is a one-to-one correspondence between the set of trees with d labeled nodes and sequences of integers in {1, …, d} of length d−2. In his paper Prüfer obtains the correspondence by the following procedure. For a given tree, choose a root, for example d; choosing any other node as the root would not change the procedure, only the labelling of trees. Remove the endpoint with the smallest label (other than the root). The endpoints are the nodes with degree one in the tree; they are sometimes referred to as leaves. Let \({\ell }_{1}\) be the label of the unique node which is adjacent to the removed endpoint. Removing the endpoint and the edge incident to it gives a tree on d−1 nodes. Repeat the operation with the new tree on d−1 nodes to obtain \({\ell }_{2}\) and so on. The process is terminated when a tree on two nodes has been found. The reader may check that the trees on the first level of the regular vines shown in Fig. 1 have Prüfer codes (2, 3, 4), (2, 3, 3) and (3, 3, 3) respectively.
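The procedure can be sketched in a few lines of Python. The function below is an illustrative implementation of the steps just described, with node d as the root; it is not taken from geninput.py. The edge lists of the three first-level trees are those implied by the text, and the name `prufer_code` is ours.

```python
def prufer_code(edges, d):
    """Prüfer code of a labelled tree on nodes 1..d, using d as the root."""
    neighbours = {v: set() for v in range(1, d + 1)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    code = []
    while len(neighbours) > 2:
        # endpoint (leaf) with the smallest label, never the root d
        leaf = min(v for v in neighbours if len(neighbours[v]) == 1 and v != d)
        (adjacent,) = neighbours[leaf]
        code.append(adjacent)
        neighbours[adjacent].remove(leaf)
        del neighbours[leaf]
    return tuple(code)


first_trees = [
    [(1, 2), (2, 3), (3, 4), (4, 5)],  # line
    [(1, 2), (2, 3), (3, 4), (3, 5)],  # first tree of the second vine
    [(1, 3), (2, 3), (3, 4), (3, 5)],  # star centred at node 3
]
for tree in first_trees:
    print(prufer_code(tree, 5))
```

The three codes produced are (2, 3, 4), (2, 3, 3) and (3, 3, 3), as stated above.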
The catalogues presented in^{27} and^{28} enumerate regular vines through Prüfer codes rather than the brute-force procedures described in the Methods section. Notice that the number of regular vine matrices available in Chimera, presented per tree-equivalence class in Table S2 of the supplement, coincides exactly with the enumeration presented in^{27} and^{28}, which was obtained through different procedures. Finally, as observed in section Using all regular vine matrices in Chimera to fit vine copulas to synthetic data, all regular vine matrices included in Chimera were used to fit vine copulas to synthetic data using the Python package “pyvinecopulib”^{32}, resulting in unique goodness-of-fit measures based on likelihood, such as Akaike’s Information Criterion (AIC).
Code availability
The scripts used to generate regular vine matrices in Python are included in the 4TU data repository under the Python data collection^{37} (see the Methods section). The files containing regular vine matrices on up to 8 nodes for Python are compressed files in pbz2 format, which need to be decompressed before use. For future users of the dataset, a specific script get_matrices.py is available together with the files in the repository^{37}. This script provides an example and contains subroutines and the Python tree-equivalent class definition for each one of the matrices of interest. Roughly, the get_matrices.py script retrieves the matrices from files in a user-specified directory for the specified number of nodes. By default, an array is returned with all matrices as a Python class, containing the tree-equivalent class (tree sequence type), index and matrix. For convenience, a user can also select parts of the dataset based on the tree-equivalent class, which relates to the file names of the dataset.
References
Joe, H. Multivariate extreme-value distributions with applications to environmental data. Canadian Journal of Statistics 22, 47–64, https://doi.org/10.2307/3315822 (1994).
Cooke, R. M. Markov and entropy properties of tree- and vine-dependent variables. In Proceedings of the ASA Section of Bayesian Statistical Science (American Statistical Association, 1997).
Aas, K., Czado, C., Frigessi, A. & Bakken, H. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44, 182–198, https://doi.org/10.1016/j.insmatheco.2007.02.001 (2009).
Eita, J. & Djemo, C. Quantifying foreign exchange risk in the selected listed sectors of the johannesburg stock exchange: An sv-evt pairwise copula approach. International Journal of Financial Studies 10, https://doi.org/10.3390/ijfs10020024 (2022).
Li, H., Liu, Z. & Wang, S. Vines climbing higher: Risk management for commodity futures markets using a regular vine copula approach. International Journal of Finance and Economics 27, 2438–2457, https://doi.org/10.1002/ijfe.2280 (2022).
Czado, C. et al. Vine copula based dependence modeling in sustainable finance. The Journal of Finance and Data Science 8, 309–330, https://doi.org/10.1016/j.jfds.2022.11.003 (2022).
Yang, L. & Czado, C. Two-part d-vine copula models for longitudinal insurance claim data. Scandinavian Journal of Statistics 49, 1534–1561, https://doi.org/10.1111/sjos.12566 (2022).
Czado, C. & Nagler, T. Vine copula based modeling. Annual Review of Statistics and Its Application 9, 453–477, https://doi.org/10.1146/annurev-statistics-040220-101153 (2022).
Sahin, Ö. & Czado, C. Vine copula mixture models and clustering for non-gaussian data. Econometrics and Statistics 22, 136–158, https://doi.org/10.1016/j.ecosta.2021.08.011 (2022). The 2nd Special issue on Mixture Models.
So, M. K. & Yeung, C. Y. Vine-copula garch model with dynamic conditional dependence. Computational Statistics & Data Analysis 76, 655–671, https://doi.org/10.1016/j.csda.2013.08.008 (2014). CFEnetwork: The Annals of Computational and Financial Econometrics.
Xiao, Z. & Bai, X. Impact of local port disruption on global container trade: An example of stressing testing chinese ports using a d-vine copula-based quantile regression. Ocean & Coastal Management 228, 106295, https://doi.org/10.1016/j.ocecoaman.2022.106295 (2022).
Carrera, D., Bandeira, L., Santana, R. & Lozano, J. A. Detection of sand dunes on mars using a regular vine-based classification approach. Knowledge-Based Systems 163, 858–874, https://doi.org/10.1016/j.knosys.2018.10.011 (2019).
Farrokhi, A., Farzin, S. & Mousavi, S.-F. Meteorological drought analysis in response to climate change conditions, based on combined four-dimensional vine copulas and data mining (vcdm). Journal of Hydrology 603, 127135, https://doi.org/10.1016/j.jhydrol.2021.127135 (2021).
Kreuzer, A., Dalla Valle, L. & Czado, C. A Bayesian Non-Linear State Space Copula Model for Air Pollution in Beijing. Journal of the Royal Statistical Society Series C: Applied Statistics 71, 613–638, https://doi.org/10.1111/rssc.12548 (2022).
Xiao, Q. et al. Reliability analysis of bridge girders based on regular vine gaussian copula model and monitored data. Structures 39, 1063–1073, https://doi.org/10.1016/j.istruc.2022.03.064 (2022).
Liao, Z. & Li, Y. Probabilistic forecasting of wind-photovoltaic-load power based on temporal-spatial correlation modelling of regular vine copula-DBN. Dianli Zidonghua Shebei/Electric Power Automation Equipment 42, 113–120, https://doi.org/10.16081/j.epae.202112021 (2022).
Dong, W. et al. Regional wind power probabilistic forecasting based on an improved kernel density estimation, regular vine copulas, and ensemble learning. Energy 238, https://doi.org/10.1016/j.energy.2021.122045 (2022).
Tu, Q. et al. Forecasting scenario generation for multiple wind farms considering time-series characteristics and spatial-temporal correlation. Journal of Modern Power Systems and Clean Energy 9, 837–848, https://doi.org/10.35833/MPCE.2020.000935 (2021).
Tao, Y., Wang, Y., Wang, D., Ni, L. & Wu, J. A C-vine copula framework to predict daily water temperature in the Yangtze River. Journal of Hydrology 598, 126430, https://doi.org/10.1016/j.jhydrol.2021.126430 (2021).
Pouliasis, G., Torres-Alves, G. A. & Morales-Napoles, O. Stochastic modeling of hydroclimatic processes using vine copulas. Water 13, https://doi.org/10.3390/w13162156 (2021).
Torres-Alves, G. A. & Morales-Napoles, O. Reliability analysis of flood defenses: The case of the Nezahualcoyotl dike in the Aztec city of Tenochtitlan. Reliability Engineering & System Safety 203, 107057, https://doi.org/10.1016/j.ress.2020.107057 (2020).
Jäger, W. S. & Napoles, O. M. A vine-copula model for time series of significant wave heights and mean zero-crossing periods in the North Sea. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering 3, https://doi.org/10.1061/ajrua6.0000917 (2017).
Coblenz, M., Holz, S., Bauer, H.-J., Grothe, O. & Koch, R. Modelling Fuel Injector Spray Characteristics in Jet Engines by Using Vine Copulas. Journal of the Royal Statistical Society Series C: Applied Statistics 69, 863–886, https://doi.org/10.1111/rssc.12421 (2020).
D’Urso, P., De Giovanni, L. & Vitale, V. A D-vine copula-based quantile regression model with spatial dependence for COVID-19 infection rate in Italy. Spatial Statistics 47, 100586, https://doi.org/10.1016/j.spasta.2021.100586 (2022).
Chu, A. M., Ip, C. Y., Lam, B. S. & So, M. K. Vine copula statistical disclosure control for mixed-type data. Computational Statistics & Data Analysis 176, 107561, https://doi.org/10.1016/j.csda.2022.107561 (2022).
Dissmann, J., Brechmann, E., Czado, C. & Kurowicka, D. Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis 59, 52–69, https://doi.org/10.1016/j.csda.2012.08.010 (2013).
Morales-Nápoles, O. Counting vines. In Dependence Modeling: Vine Copula Handbook, 189–218, https://doi.org/10.1142/9789814299886_0009 (2010).
Morales-Nápoles, O. Bayesian belief nets and vines in aviation safety and other applications. PhD Thesis, Delft Institute of Applied Mathematics, TU Delft (2010).
Joe, H. Dependence comparisons of vine copulae with four or more variables. In Dependence Modeling: Vine Copula Handbook, 139–164, https://doi.org/10.1142/9789814299886_0007 (2010).
Czado, C. Analyzing dependent data with vine copulas: A practical guide with R. Lecture Notes in Statistics 222, 1–242, https://doi.org/10.1007/978-3-030-13785-4_1 (2019).
Coblenz, M. MATVines: A vine copula package for MATLAB. SoftwareX 14, 100700, https://doi.org/10.1016/j.softx.2021.100700 (2021).
Vatter, T. & Nagler, T. Pyvinecopulib 0.6.1, https://vinecopulib.github.io/pyvinecopulib/ (2022).
Delft High Performance Computing Centre (DHPC). DelftBlue Supercomputer (Phase 1). https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase1 (2022).
Prüfer, H. Neuer Beweis eines Satzes über Permutationen. Archiv der Mathematischen Physik 27, 742–744 (1918).
Harary, F. Graph Theory (Addison-Wesley, Reading, MA, 1969).
Cayley, A. A theorem on trees. Quart. J. Pure Appl. Math. 23, 376–378 (1889).
’t Hart, M., Morales-Nápoles, O., Torres-Alves, G. & Rajabi-Bahaabadi, M. Chimera: an atlas of regular vine on up to 8 nodes. 4TU.ResearchData. https://doi.org/10.4121/c17b8790-dfd2-4394-854a-7d98fd254c15 (2023).
Acknowledgements
C.M.P.H. and G.A.T.A. were partially funded by a project of the Chinese engineering and construction company China Communications Construction Co., Ltd. (CCCC) that is jointly carried out by 8 institutions, including universities, scientific research institutes, engineering consulting firms, and design and construction companies. The authors are grateful to Elisa Ragno for comments and suggestions that improved the presentation of the paper.
Author information
Contributions
O.M.N. conceived and designed the dataset, provided resources, implemented software and wrote the initial draft. M.R.B. and G.A.T.A. implemented software. C.M.P.H. wrote the initial draft and implemented software. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Morales-Nápoles, O., Rajabi-Bahaabadi, M., Torres-Alves, G.A. et al. Chimera: An atlas of regular vines on up to 8 nodes. Sci Data 10, 337 (2023). https://doi.org/10.1038/s41597-023-02252-6
DOI: https://doi.org/10.1038/s41597-023-02252-6