Groupwise image registration based on a total correlation dissimilarity measure for quantitative MRI and dynamic imaging data

The most widespread technique used to register sets of medical images consists of selecting one image as fixed reference, to which all remaining images are successively registered. This pairwise scheme requires one optimization procedure per pair of images to register. Pairwise mutual information is a common dissimilarity measure applied to a large variety of datasets. Alternative methods, called groupwise registrations, have been presented to register two or more images in a single optimization procedure, without the need of a reference image. Given the success of mutual information in pairwise registration, we adapt one of its multivariate versions, called total correlation, in a groupwise context. We justify the choice of total correlation among other multivariate versions of mutual information, and provide full implementation details. The resulting total correlation measure is remarkably close to measures previously proposed by Huizinga et al. based on principal component analysis. Our experiments, performed on five quantitative imaging datasets and on a dynamic CT imaging dataset, show that total correlation yields registration results that are comparable to Huizinga’s methods. Total correlation has the advantage of being theoretically justified, while the measures of Huizinga et al. were designed empirically. Additionally, total correlation offers an alternative to pairwise mutual information on quantitative imaging datasets.

techniques that are based on the concept of mutual information.Though the formulation of mutual information for two images is unique, several multivariate versions exist for its generalization for more than two images.We provide theory about the main multivariate dissimilarity measures based on mutual information, that could be used for the groupwise registration of medical images.These dissimilarity measures are called interaction information 5 , total correlation 6 and dual total correlation 7 .Total correlation is the groupwise dissimilarity measure that we propose to adapt in the context of groupwise image registration.
A preliminary version of our work was presented at a conference 8 .In the present article, we provide full theoretical developments, extensive implementation details, and additional experimental analyses.
Competing state-of-the-art dissimilarity measures for groupwise registration include the sum of variances developed by Metz et al. 9 , the groupwise mutual information method of Bhatia et al. 10 , and the groupwise dissimilarity measures based on principal component analysis (PCA) previously developed by Huizinga et al. 11 .The expression of the total correlation dissimilarity measure that we propose is remarkably close to Huizinga's PCA-based groupwise dissimilarity measures, which were shown to outperform competing pairwise and groupwise state-of-the-art methods on qMRI datasets.The experiments conducted in this article consist of using groupwise total correlation for the registration of a dynamic CT imaging dataset, and of five quantitative magnetic resonance imaging (qMRI) image datasets.Registration results are compared to Huizinga's methods, but also to pairwise registration based on mutual information.

Results
Groupwise registration based on the total correlation dissimilarity measure TC  that we propose in this study is tested on six different types of image datasets, which overall represents 42 subjects.Dynamic series of CT images were acquired for the first type of image dataset, denoted CT-LUNG.The five other types of datasets, denoted T1MOLLI-HEART, T1VFA-CAROTID, ADC-ABDOMEN, DTI-BRAIN, and DCE-ABDOMEN, are qMRI datasets for which multiple MR images were acquired using different acquisition parameters (or at multiple time points after injection of a contrast agent).For these five qMRI datasets, we fitted a qMRI model to the image intensities at each spatial location, and extracted quantitative images: spin-lattice relaxation time (T 1 ) images for T1MOLLI-HEART and T1VFA-CAROTID, apparent diffusion coefficient (ADC) images for ADC-ABDOMEN, mean diffusivity (MD) images for DTI-BRAIN, and transfer constant (K trans ) images for DCE-ABDOMEN.More details on the image datasets are provided in the Experiments section of the present article.

Registration accuracy.
Figure 1 provides a visualization of the image alignment for a CT-LUNG dataset, gathering 10 CT images acquired at different time points from the lung area of a patient.Misalignments due to respiratory motion are visible when no registration is applied between the images (Fig. 1a), while they disappear after applying image registration based on Huizinga's PCA2  (Fig. 1b) or on the total correlation dissimilarity measure  TC proposed in this article (Fig. 1c).Visual differences between the results obtained with PCA2  and TC  are more limited and harder to identify.
For the five qMRI datasets, Fig. 2 provides quantitative parameter images obtained by applying curve fitting to the images before registration, after registration using Huizinga's  PCA2 groupwise dissimilarity measure, and after registration using the total correlation dissimilarity measure  TC proposed in this article.The fitting models used to derive the qMRI images assume that spatial correspondence is ensured between the images used for curve fitting.It is therefore expected that quantitative images obtained after image registration will be more reliable than without image registration 11,12 .Based on Fig. 2, visual differences in the estimates tissue maps are easily noticeable between the case before image registration, on the one hand, and the cases with  PCA2 or  TC , on the other hand.Such differences are particularly visible at organ interfaces.Slighter changes, identified by green arrows, can be identified between the tissue maps obtained with PCA2  and  TC .Full registration accuracy results in terms of landmark/volume correspondence (mTRE or Dice coefficient), registration transformation smoothness (denoted ), and uncertainty estimation (Cramér-Rao lower bound, denoted CRLB), are provided as supplementary material (Tables S1 to S6) for the following dissimilarity measures: pairwise mutual information MI  , Huizinga's dissimilarity measures based on PCA  PCA and  PCA2 , and the total correlation dissimilarity measure proposed in this article TC  .Table 1 presents a partial version of the registration accuracy results, based on the middle value of the control point spacings that were used for the non-rigid B-spline transformation model: 13 mm for CT-LUNG, 64 mm for T1MOLLI-HEART, 16 mm for T1VFA-CAROTID, 64 mm for ADC-ABDOMEN, and 64 mm for  1a.For all dataset, better alignments (i.e.lower mTRE) or overlaps (i.e. higher Dice coefficients) were obtained with the groupwise measures  TC ,  PCA and  PCA2 than with pairwise mutual information MI  , with one exception: the mTRE obtained with PCA2  for the CT-LUNG dataset is higher than the mTRE obtained with  MI .The Dice coefficients and mTRE results are very similar for  TC ,  PCA and  PCA2 .The only case for which  TC performs slightly worse than the two other groupwise measures is on the DCE-ABDOMEN dataset. for the T1MOLLI-HEART and DCE-ABDOMEN datasets, while they are quite similar for T1VFA-CAROTID and DTI-BRAIN, and higher (i.e.worse) for the ADC-ABDOMEN dataset.The 90 th CRLB obtained with TC  is higher than the 90 th CRLB obtained with PCA  and PCA2  for two datasets (ADC-ABDOMEN and DCE-ABDOMEN), while it is similar or better for three datasets (T1MOLLI-HEART, T1VFA-CAROTID, and DTI-BRAIN).The full results (Tables S1 to S6) are consistent with the results presented in Table 1a-c.

Multivariate joint normality.
As detailed in the Method section, the computation of the total correlation dissimilarity measure TC  that we propose is based on the approximation that the intensity distribution of the images to register is multivariate normal.Cumulative distribution functions (CDF) of the squared Mahalanobis distance d 2 , representing the intensity distribution for each of the six dataset types, are plotted in Fig. 3.According to these plots, none of these measure CDF follows the theoretical multivariate normal CDF (χ G 2 distribution), which suggests that the image intensities of the images do not follow a multivariate normal distribution.
Computational efficiency of total correlation  TC .Figure 4 illustrates the evolution of the average time per iteration obtained with groupwise total correlation TC  for three image registration parameters: the number of B-spline control points per image, the number of images G, and the number of spatial samples taken to evaluate the dissimilarity measure.The results show that the average registration time per iteration monotonically increases with each of the considered registration parameter.With the present implementation of TC  and of the registration components of the elastix software used to perform the registrations, the results indicate that the  number of B-spline control points has a limited influence on the average time per iteration as it remains close to 9 seconds for the whole span of numbers of B-spline control points that we considered.The experiments suggest that the number of images G influences the computation time most.For instance, when the number of image is G = 40, the average iteration time is 5 seconds, while this time reaches about two minutes for G = 160 images.In terms of the number of spatial samples, multiplying the number of spatial samples by 4 ends up in an average time per iteration that is multiplied by 6.

Discussion
The focus of this paper was to adapt a multivariate version of mutual information in the context of the groupwise registration of medical images, so that it can be used to register two or more images in one optimization procedure.
Among the main multivariate versions of mutual information, namely interaction information II  , total correlation TC  and dual total correlation  TC , total correlation  TC theoretically allows to quantify the shared information between any subset of the images to register.Besides, the expression of total correlation is particularly straightforward to apply for the registration of G ≥ 2 images, provided that the image intensity distribution is approximated by a multivariate normal distribution.
The expression of the approximated total correlation dissimilarity measure TC  that we devise is remarkably analogous to the expressions of two other dissimilarity measures PCA  and  PCA2 introduced by Huizinga et al. 11 , which were developed based on the intuition that an aligned set of images can be described by a small number of high eigenvalues.The expressions of these dissimilarity measures are all sums of functions of the eigenvalues of the correlation matrix K (compare Equations ( 18), ( 25) and ( 26)).Huizinga et al. 11 had proposed to weigh more the last eigenvalues (the λ i with the highest i indexes) than the first ones (the λ i with the lowest i indexes) so that as much variance as possible is explained by a few large eigenvectors.The form of  TC obtained in this study confirms the intuition of Huizinga et al. 11 , since the natural logarithm in Equation ( 18) also puts more weight on the lower eigenvalues than on the higher ones.
Results obtained on a dynamic imaging dataset and on five multi-parametric datasets show that the total correlation method that we propose yields comparable results as PCA-based methods of Huizinga et al. 11 , and better registration results than pairwise mutual information MI  .The main advantage of TC  with respect to PCA  and PCA2  is that it is more theoretically justified: the contribution of each eigenvalue used to compute TC  is automatically calibrated and follows naturally from the concepts of multivariate mutual information, whereas for PCA  and  PCA2 , the eigenvalue calibration was empirically chosen.Our study shows that even though the intensity distribution of the datasets to register is not multivariate normal (Fig. 3), TC  yields registration results that are better than mutual information and similar to the PCA dissimilarity measures of Huizinga et al. 11 .This is the case for a total of six diverse multi-parametric datasets, which suggests that approximating the intensity distributions, as done in this article, yields optimization minima that result in comparable or better registration accuracies than other state-of-the-art pairwise and groupwise techniques.On multi-parametric datasets, the results suggest that the approximation by a multivariate normal distribution is not detrimental to the registration results.
In the current implementation of the total correlation registration technique, increases in the number of images G have the largest impact on the average time per iteration, which is not surprising as both the amount of image data to register and the number of transformations to estimate scale with a factor G; moreover, estimating the correlation matrix K and its eigenvalue decomposition become increasingly computationally demanding.Further optimizations could improve the scalability of total correlation with respect to the number of images.The computation time also scales linearly with the number of spatial samples.Thanks to the use of the stochastic gradient descent optimization routine, we were able to use a relatively low number (2048) of spatial samples in our experiments, while still achieving accurate registration.
Other possible applications of the total correlation dissimilarity measure proposed in this article include motion tracking in ultrasound image sequences 13,14 , motion compensation in dynamic PET 15 or dynamic contrast-enhanced CT 16 , and for population template construction 17 .Future research should validate the performance of the method in such contexts.

Conclusion
In conclusion, we proposed an implementation of an approximated version of total correlation  TC for which the registration results are comparable to the results obtained with the dissimilarity measures of Huizinga et al. 11 , on multi-parametric datasets.Our results indicate that approximating the intensity distributions by a joint normal distribution for the sake of efficient calculation of the entropy, used to derive total correlation TC  , does not constitute a limitation in the practical application of TC  to quantitative imaging datasets.Total correlation  TC has the advantage of being elegant and theoretically justified, while the dissimilarity measures  PCA and  PCA2 proposed by Huizinga et al. 11 were elaborated empirically.Additionally, groupwise total correlation TC  offers an alternative to pairwise registration based on mutual information on multi-parametric imaging datasets.

Let us consider
, a series of G images that have to be registered.Each image M g , consists of N voxels.To quantify how well the G images are aligned, a dissimilarity measure has to be defined.In this study, we consider dissimilarity measures based on the concepts of mutual information.We choose the convention to formulate the measures as dissimilarity measures instead of similarity measures, so that the registration problem can be written as a cost function minimization problem.

Pairwise mutual information.
Mutual information is a robust measure that is commonly used for the pairwise registration of datasets of medical images, including multimodal datasets 3 .For G = 2 images M 1 and M 2 , the negated mutual information MI  is computed as follows 1,3 : with H(M 1 ) the entropy 18 of image M 1 , H(M 2 ) the entropy of image M 2 , and H(M 1 , M 2 ) the joint entropy of M 1 and M 2 .For two images M 1 and M 2 , the joint entropy can be computed as follows 19 : x x where x 1 and x 2 represent the discrete values of images M 1 and M 2 , respectively.P(x 1 , x 2 ) is the probability of these values occuring together.P(x 1 , x 2 ) ln[P(x 1 , x 2 )] is defined to be 0 if P(x 1 , x 2 ) equals 0.
When the dataset of images to register contains G > 2 images, it is still possible to use a pairwise method to register the images, but several independent registration procedures have to be performed.A typical method consists of selecting one of the images as fixed reference, and then successively applying pairwise registration with the remaining G−1 images considered as moving images (Fig. 5a).This technique is not well suited to registration problems for which there is no obvious reference image.Besides, the registration results may be different according to the choice of fixed reference image, as shown by Geng et al. 4 .Seghers et al. 20 introduced a method that we will refer to as semi-groupwise, which is based on multiple pairwise registrations and does not require the selection of a reference space.For each i, M i is taken as fixed image and G−1 independent registration are performed between each remaining image, M j , yielding G−1 transformations T i→j per fixed image M i .Each image M i is then resampled into an average or mid-point image space using T i 1 − (x), the inverse of the arithmetic mean of the transformations T i→j (Fig. 5b).The method of Seghers et al. 20 has the disadvantage of requiring G × (G−1) registration procedures, which becomes computationally complex when G grows.It also does not allow to register all images in a single optimization procedure.

Groupwise dissimilarity measures based on multivariate mutual information.
Groupwise registration techniques allow to register G ≥ 2 images.In this study, we will focus on groupwise techniques that allow to register all images in one optimization procedure, and that treat the images equally (Fig. 5c).In particular, the order in which the images are supplied should have no influence on the value of the groupwise dissimilarity measure , , )


, and therefore no influence on the registration results.This article more precisely focuses on groupwise generalizations of mutual information, given the wide interest and range of applications of that dissimilarity measure in the context of pairwise image registration 3 .There exist multiple multivariate forms of mutual information [5][6][7] , the concepts of which can be used for groupwise image registration.
The first multivariate generalization of mutual information is known as interaction information 5 , denoted II  .It measures the amount of information shared by all the images.For the G images of , negated interaction information is written: meaning that V can be any subset of images of  (e.g. if G = 3, then V successively represents the following subsets of images: {M 1 }, {M 2 }, {M 3 }, {M 1 , M 2 }, {M 1 , M 3 }, {M 2 , M 3 }, and {M 1 , M 2 , M 3 }), |V| the number of images in the corresponding subset, and H(V) the joint entropy of the subset V.For G images M 1 ...M G , the joint entropy is the generalization of Equation (2): where the x 1 , ..., x G are the values of images M 1 , ..., M G , respectively.The same definitions as for P(x 1 , x 2 ) and P(x 1 , x 2 )ln[P(x 1 , x 2 )] are directly extended for P(x 1 , ..., x G ) and P(x 1 ,..., x G )ln[P(x 1 , ..., x G )]. Interaction information quantifies the information shared together by images M 1 , ..., M G

21
. This means that if at least one of the images of  shares no information with all other images, the interaction information will be zero 21,22 .
The second form of multivariate mutual information, called total correlation 6 , measures the amount of information shared between any subset of .The negated total correlation is written as: with  H( ) the joint entropy of the images of the set . The third form is a refinement of total correlation called dual total correlation 7 , and can be written as: is the conditional entropy 19 of is the entropy of the image M g given the knowledge of images {M 1 , ..., Theoretically, both total correlation and dual total correlation quantify the amount of shared information between all possible combinations of images, while interaction information only quantifies the amount of information shared by all images 23 .Venn diagrams 19,23,24 for  II , TC  and  DTC are shown in Fig. 6.In the context of image registration,  TC and DTC  seem more adapted than II  , as they are built to quantify shared information not only between all images, but also between any of their subsets 21,22 .In particular, including an image with little dependence towards the others would impair the registration of the remaining images when using  II , while this would theoretically not be the case when using  TC or  DTC .We therefore chose to consider the dissimilarity measures based on total correlation in order to construct a groupwise dissimilarity measure.

Groupwise total correlation.
In this section, we describe how total correlation, as expressed in Formula (5), can be brought to practical use in the context of image registration.As such, computing total correlation implies computing the joint entropy  H( ), but this computation is subject to the curse of dimensionality 25 : the evaluation of joint entropy requires to compute a G-dimensional joint histogram that becomes increasingly sparser as G increases, and therefore becomes computationally prohibitive.
Let us consider a random variable ∈  X G following a G-variate normal distribution given by: T 1 G G a covariance matrix, and with det(.) the determinant operator.Ali Ahmed et al. 26 have shown that the entropy of the multivariate normal variable X may be written as: To circumvent the curse of dimensionality, and make it possible to use registration in a groupwise manner on datasets containing any number G ≥ 2 images, we propose to use Equation (8) . For the sake of efficient calculation of the entropy, we approximate the intensity distribution of the images by a joint normal distribution, and we make the hypothesis that the minimum of the resulting cost function is still a good solution for the underlying registration problem.Let M be a N × G matrix in which each image M g is represented by a column.The matrix C of covariances between the images M g is obtained as follows: with M, a matrix that has in each of its column the column-wise average of M. To make the method robust to linear intensity scalings and offsets, we incorporate an intensity standardization (i.e.z-score) within the definition of the dissimilarity measure.This is done by computing the entropy  H( ) using the correlation matrix K instead of the covariance matrix C, with: with Σ a diagonal matrix with the standard deviations of the columns of M as its diagonal elements.A diagonal element Σ gg of Σ verifies: where the M g,i are the individual voxel values and M g the average voxel value of image M g .By construction, each diagonal element of the correlation matrix K is equal to 1.The expression of the joint entropy therefore becomes: Equation ( 12) can also be used to derive the marginal entropies H(M g ).When considering only one image M g , the correlation matrix K is the scalar 1.All H(M g ) are therefore constant and equal to: g By combining Equations ( 5), ( 12) and ( 13), we define the dissimilarity measure based on total correlation  TC as follows:  3), ( 5) and (6).In (c), the dark greay area signifies that its contribution to the dissimilarity measure is twice as high as the contribution of each light-grey area.

Gradient-based optimization and implementation.
To implement the approximated version of  TC provided in Equation ( 14), we define an interpolation scheme based on B-splines.This scheme associates with each original image M g a continuous and differentiable function M g (x) of the spatial coordinate x.The aim is to simultaneously bring the images M g (x) to an average space by means of a transformation T(x, μ), where μ is a vector containing the parameters μ g that correspond to the transformation T g (x, μ g ) related to each image M g .Examples of transformation models are the affine model, or the non-rigid model in which deformations are modeled by cubic B-splines 27 .
In the groupwise scheme, the measure  quantifies the dissimilarity between all transformed images M g (T g (x, μ g )).We adopted the pull-back definition of a warped image.Groupwise registration can therefore be formulated as the constrained minimization of the dissimilarity measure  with respect to μ, as previously proposed by Huizinga et al. 11 : ))) ( 15) subject to the following constraint, allowing to define a mid-point space 28 : The implementation of the total correlation dissimilarity measure TC  was performed as part of the open source software package elastix 29 .The adaptive stochastic gradient descent (ASGD) developed by Klein et al. 30 is used as optimization method for image registration.This method randomly samples positions in the image space at each iteration in order to reduce computation time.Sampling is done off the voxel grid, which was shown to be necessary to reduce interpolation artefacts 29 .A multi-resolution strategy is used: the images are Gaussian-blurred with a certain standard deviation, which is decreased at each resolution level.This means that the large deformations are corrected first, and that finer deformations are corrected in subsequent levels.Linear interpolation is used to interpolate the images during registration, which reduces computation time, but cubic B-spline interpolation was used to produce the final registered images.For the chosen ASGD optimization method, the gradient of the dissimilarity measure is needed.Based on Equation ( 14) and van der Aa et al. 31 , we find: where v j T is the j th eigenvector of K. Similarly to van der Aa et al. 31 , we assume that the repetition of eigenvalues is unlikely.
When the eigenvalues λ j tend towards zero, evaluating  TC implies taking the natural logarithm of a near-zero number (as shown in Equation ( 14)), which might result in a failing optimization.We therefore introduce an adjusting constant c ∈  + that is added to the eigenvalue λ j before taking the natural logarithm: where I is the identity matrix.The gradient of the adjusted total correlation dissimilarity measure therefore becomes: To derive an appropriate value for c, we make the assumption that the first mode, corresponding to λ 1 , accounts for half of the total data variation.Given that the trace of K is equal to the sum of its eigenvalues, we can write that K tr( ) In addition, the diagonal elements of the correlation matrix K are all equal to 1, which induces that The assumption that the first mode accounts for half of the total data variation therefore yields λ 1 = G/2.We then constrain the ratio (λ 1 + c)/(λ G + c) to G, so that the weights 1/(λ i + c) in Equation ( 19) remain within a known, finite range.We also make the assumptions that c ≪ G and that λ G ≪ c.This leads to the solution c = 0.5.In addition to solving a computational issue, the constant c introduces a lower bound on the variance associated with each eigenvector.Initial experiments confirmed that with this choice for c, occasional numerical instabilities were successfully eliminated, while not visibly affecting the results in other cases.
Based on Equation (10), the expression of ∂K/∂μ p in Equation ( 19) becomes: The property of commutativity of the dot product yields: T T T T = with A and B, two matrices and v a vector.Using Equations (19-21), the derivative of TC  with respect to an element μ p becomes:


To obtain ∂Σ −1 /∂μ p , the diagonal elements Σ − gg 1 of the diagonal matrix Σ −1 can be derived one by one: The quantity ∂M/∂μ p is computed as follows: Related groupwise dissimilarity measures.Huizinga et al. 11 previously presented two dissimilarity measures, the expressions of which are close to the total correlation measure presented in the current article.Huizinga's dissimilarity measures are based on PCA and rely on the idea that an aligned set of multi-parametric images can be described by a small number of high eigenvalues, since the underlying model m g is low-dimensional (i.e. the size Γ of θ is lower than G).A misaligned set of multi-parametric images would, on the contrary, be characterized by an eigenvalue spectrum that is more flat: more eigenvalues of average intensity are required for describing the data in that case.
The first dissimilarity measure introduced by Huizinga et al. 11 , denoted PCA  , quantifies the difference between the sum of all eigenvalues and the sum of the first few eigenvalues: with L a user-defined constant with 1≤L≤G, and . This means that PCA  is the sum of the lowest G−L eigenvalues.Contrary to PCA  , the second dissimilarity measure, denoted PCA2  , does not require the selection of an arbitrary cut-off L. It consists of weighting the last eigenvalues more than the first ones: The dissimilarity measures of Huizinga et al. 11 were developed based on different ideas than total correlation:  PCA and PCA2  were developed based on the concepts of PCA, while TC  is a multivariate derivation of mutual information.Nevertheless, the expressions of PCA  and PCA2  , on the one hand, and of TC  , on the other hand, happen to resemble each other quite closely: all of them consists of a sum of functions of the eigenvalues.
The main disadvantage of Huizinga's  PCA with respect to the other techniques is that it requires to choose the cut-off L. In PCA2  , this user-defined constant is avoided, but the weights j in Equation ( 12) are actually still chosen arbitrarily.For the total correlation dissimilarity measure TC  we propose is that the contribution of each eigenvalue follows naturally from the derivation of mutual information.A key asset of TC  is therefore that the influence of each eigenvalue is automatically calibrated, because the expression of the dissimilarity measure is derived from the concept of mutual information.
Implementation codes.The implementation of total correlation will be made available within the open source image registration package elastix, downloadable at the following address: http://elastix.isi.uu.nl.

Experiments
The quantitative imaging datasets previously considered by Huizinga et al. 11 are covered by the more generic term of multi-parametric datasets, i.e. datasets {M 1 , ..., M G } for which the images M g are characterized by an underlying model m g describing their intensity values, such that:  32 .
In particular, Huizinga et al. 11 applied the groupwise dissimilarity measures PCA  and  PCA2 to a variety of multi-parametric datasets, and compared the results with other state-of-the-art techniques: pairwise mutual information MI  , the accumulated pairwise estimates (APE) introduced by Wachinger and Navab 33 , the groupwise sum of variances designed by Metz et al. 9 , and the groupwise mutual information method of Bhatia et al. 10 .Huizinga et al. 11 concluded that their measures  PCA and  PCA2 yielded better or equal registration results with respect to the other tested methods.
The present experiment uses total correlation  TC as groupwise dissimilarity measure for the registration of the same datasets as in Huizinga et al. 11 .On these datasets, the methods of Huizinga et al. 11 were shown to be the best ones, which is why we will compare the registration results of TC  with  PCA and  PCA2 only.The results reported by Huizinga et al. 11 for the other dissimilarity measures are directly comparable with the results reported in the present study.
Description of the six image datasets.The first dataset, denoted CT-LUNG 34 , consists of ten patient subsets containing G = 10 three-dimensional CT images of the thorax.The intensity distribution in this dynamic imaging dataset are analogous in all images, which means that the model m g can be considered as a constant (see Equation ( 27)): it is therefore a particular case of multi-parametric dataset.The second study, denoted T1MOLLI-HEART 35 , consists of nine T 1 -weighted MRI datasets of porcine hearts with transmural myocardial infarction of the lateral wall.G = 11 two-dimensional images were acquired for nine subjects.For each registration case, a voxelwise curve fitting was applied to the registered images, producing quantitative T 1 maps.The third study, denoted T1VFA-CAROTID 36 , involves MRIs of the carotid arteries.G = 5 three-dimensional images were acquired for 8 human patients.For each patient, the images were registered and fitted to obtain quantitative T 1 maps.The fourth study consists of DW-MR images of the abdominal region, and is denoted ADC-ABDOMEN 12 .Five datasets, each of them including G = 19 three-dimensional images, were registered and fitted to produce ADC maps.The fifth study is denoted DTI-BRAIN [37][38][39][40][41] and consists, for each of the five considered datasets, of registering diffusion tensor images (DTI) of the brain.The number of images to register varied between G = 33 and G = 70 for each dataset 11 .The fitted parameter is the mean diffusivity (MD).The sixth study involves DCE images of the abdomen.Five DCE-ABDOMEN 42 datasets were acquired, each of them containing G = 160 three-dimensional images.The fitted parameter of interest considered in this study is K trans .The full descriptions of the fitting models are provided by Huizinga et al. 11 .
All human data used in this study came from anonymized datasets.Data from the CT-LUNG dataset was obtained from a publicly available dataset 34 available at the following address: https://www.dir-lab.com.The ethics committee of the Academisch Medisch Centrum, Amsterdam, the Netherlands, approved the research related to the T1VFA-CAROTID and DCE-ABDOMEN datasets.The Research Ethics Committee of the Royal Marsden Hospital, United Kingdom, approved the research related to the ADC-ABDOMEN dataset.The medical ethics committee for research in humans of the University Medical Center Utrecht, the Netherlands, approved the research performed on the DTI-BRAIN dataset.Informed consent was obtained from all patients in human datasets.Porcine data from the T1MOLLI-HEART dataset were approved by the Animal Ethics Committee of the Erasmus MC Rotterdam, the Netherlands.All studies were carried out in accordance with the relevant guidelines and regulations.
Registration characteristics.We selected the same registration settings as Huizinga et al. 11 , for comparison purposes.The dissimilarity measures were applied in identical environments.Apart from the dissimilarity measure, all other registration settings such as the choice of optimizer, number of resolutions, number of iterations or number of considered samples, were identical.Two resolutions of 1000 iterations were used for all six image datasets.To account for deformations caused by heart-pulsations and breathing, we used a B-spline transformation model for the CT-LUNG, T1MOLLI-HEART, T1VFA-CAROTID, ADC-ABDOMEN and DCE-ABDOMEN datasets.The registrations were performed for three distinct B-spline grid spacings: 32 mm, 64 mm and 128 mm for the T1MOLLI-HEART, ADC-ABDOMEN, DCE-ABDOMEN datasets, 8 mm, 16 mm and 32 mm for the T1VFA-CAROTID dataset, and 6 mm, 13 mm and 20 mm for the CT-LUNG dataset.All results are reported as supplementary material (Tables S1 to S6). Results for the intermediate values of the spacings (i.e.either 64 mm, 16 mm or 13 mm), are reported in the Results section of this article.To account for deformations caused by head motion and eddy current distortions, we used an affine transformation model for the DTI-BRAIN dataset.When applying PCA  , the value of L was 1 for CT-LUNG, 3 for T1MOLLI-HEART, 1 for T1VFA-CAROTID, 4 for ADC-ABDOMEN, 7 for DTI-BRAIN, and 4 for DCE-ABDOMEN.

Evaluation measures.
No ground truth alignment was available for any of the six datasets considered.However, registration performance was evaluated based on four different measures, described in Huizinga et al. 11 , and briefly described in this section.
The first two measures are based on landmark correspondence and overlap of volumes of interest.Landmarks were manually defined on images of the T1VFA-CAROTID and DCE-ABDOMEN datasets.The correspondence between the corresponding landmarks was evaluated by computing a mean target registration error (mTRE).In the T1MOLLI-HEART case, segmentations of the myocardium were outlined on between 6 and 9 images per patient.In the ADC-ABDOMEN case, the spleen was manually delineated on 8 images.For these two cases, the overlap between the segmented structures was then evaluated using a Dice coefficient.For the DTI-BRAIN study, neither landmarks nor structures could be reliably identified on the diffusion weighted images, which is why no overlap or point correspondence was calculated 11 .
The second measure quantifies the smoothness of the transformation obtained through registration.Extreme and non-smooth deformations are unexpected.The smoothness of the deformation field can therefore be used to identify such undesirable transformations.A smoothness quantification can be obtained by computing the standard deviation of the determinant of ∂T g /∂x over all x for all images: g .Smoothness was quantified for all datasets except for DTI-BRAIN because an affine transformation was used in that last case.The smoother the transformation, the lower the quantity g .The last evaluation measure is an uncertainty estimation of the qMRI fit.For the five qMRI datasets, curve fittings were performed to generate quantitative images.The fitted values were evaluated in the myocardium for the T1MOLLI-HEART dataset (T 1 values), in the carotid artery wall for the T1VFA-CAROTID dataset (T 1 values), in the spleen for the ADC-ABDOMEN dataset (ADC values), in the brain parenchyma for the DTI-BRAIN dataset (MD values), and in the pancreas for the DCE-ABDOMEN dataset (K trans values).The qMRI models were fitted using a maximum likelihood estimator that takes into account the Rician characteristic of the noise in MR data.We used the fitting same method as Huizinga et al. 11 , based on the work of Poot et al. 43 .The uncertainty of these fitted qMRI model parameters can be quantified by the 90 th percentile of the square root of Cramér-Rao lower bound (CRLB), which provides a lower bound for the variance of the maximum likelihood parameters.This uncertainty estimate is denoted 90 th CRLB.

Assessment of multivariate joint normality.
As mentioned in the Method section, the computation of the total correlation dissimilarity measure  TC that we propose is based on the approximation that the intensity distribution of the images to register is multivariate normal.For most datasets, however, the intensity distribution is expected not to be multivariate normal.The underlying idea is that the approximated dissimilarity measure will result in the same minimization result as if the approximation had not been done.
A second interest of the experimental setting is therefore to evaluate how multivariate normal the intensity distributions are for the six types of datasets that are registered in this study, and in the light of the registration accuracy results, to assess whether the approximation that we made can be considered as sensible on multi-parametric datasets.
The joint normality of two images can be assessed by computing and visualizing their joint histogram.Assessing joint normality on more images requires other methods.A possible graphical approach to analyze the multivariate joint normality of G images is to compare the distributions of observed Mahalanobis distances with the distribution of a chi-square distribution with G degrees of freedom χ G Computational efficiency of total correlation  TC .To study the computational efficiency of the pro- posed total correlation dissimilarity measure  TC , the average time per iteration is studied by varying three registration parameters: the number of images G that are simultaneously registered, the number of spatial samples taken to evaluate the groupwise dissimilarity measure, and the number of B-spline control points of the transformation model used to warp the images.The influence of these three parameters on the average time per iteration is studied by varying each of them while setting the two remaining ones at values in the range of those described in the Registration characteristics section: • when the number of B-spline control points evolves, the number of images G is set to 50, and the number of spatial samples to 1024.The numbers of B-spline control points per image vary between 50 and 20000; • when the number of images G evolves, the number of B-spline control points is set to 500 per image, and the number of spatial samples to 1024.The numbers of images G cover the characteristics of the images described in the 'Description of the six image datasets' section (i.e.G = 5...160); • when the number of spatial samples evolves, the number of B-spline control points is set to 500 per image, and the number of images G is set to 50.We considered numbers of spatial samples between 16 samples and 8192.

Figure 1 .
Figure 1.Registration results for a CT-LUNG dataset.The images denoted '2' and '3' stack the voxel information of G = 10 images at the locations defined by the dotted lines drawn in the image denoted '1' (vertical line: '2' , horizontal line: '3').

Figure 2 .
Figure 2. Tissue maps generated before image registration (top), after image registration with  PCA2 (middle), and after image registration with TC  (bottom).The fitted values are shown in the myocardium for T1MOLLI-HEART, in the carotid artery wall for T1VFA-ABDOMEN, in the spleen for ADC-ABDOMEN, in the brain parenchyma for DTI-BRAIN, and in the pancreas for DCE-ABDOMEN.Slight visual changes between the tissue maps obtained with PCA2  and  TC are identified by green arrows.

Figure 3 .
Figure 3. Cumulative distribution functions for one subject of the six image datasets (aligned case).The observed CDF (blue) is compared with the theoretical CDF of a chi-square distribution with G degrees of freedom (red).

Figure 4 .
Figure 4. Average time per iteration with respect to the number of B-spline control points per image (a), the number of images G (b), and the number of spatial samples (c).

Figure 5 .
Figure 5. (a) Pairwise registration scheme (the orange frame indicates that this method requires the selection of a reference image), (b) semi-groupwise registration scheme proposed by Seghers et al.20 , and (c) groupwise registration scheme.

Figure 6 .
Figure 6.Venn diagram representations for three images M 1 , M 2 and M 3 .(a) The green, red and cyan circle represent the entropy of each image.The fact that the images share information is symbolized by the fact that these circles overlap.Subfigures (b), (c) and (d) were constructed based on Equations (3), (5) and(6).In (c), the dark greay area signifies that its contribution to the dissimilarity measure is twice as high as the contribution of each light-grey area.
It was verified that the derivative M/ p μ ∂ ∂ of the mean intensities was negligibly small and it was therefore ignored in the implementation.
θ a vector (dimension Γ < G) containing the model parameters, and ε the noise at coordinate x.An example of model is the monoexponential model θ tensor imaging, with θ = (S 0 , D 11 , D 12 , D 13 , D 22 , D 23 , D 33 ), u g the diffusion gradient direction vector, D a 3 × 3 symmetric diffusion tensor, and b the b-value

2 . A squared Mahalanobis distance d i 2 (, 1 . 2 χ
with i = 1...N) can be computed at each voxel location M g (i), by: with y i = [M 1 (i), .., M G (i)] T , the sample mean vector y yN / It has been shown that the sample squared Mahalanobis distance converges to G 2 graphically check whether the distribution of intensities of M is joint normal, we will plot the cumulative distribution function (CDF) of d 2 and χ G 2 in the same graph.If the CDF of the squared Mahalanobis distances d 2 approaches this of G , then we will consider the data as joint normal.

Table 1 .
Registration results.TC,  PCA and  PCA2 yield lower (i.e.better) values of MI .The only case for which  TC performs slightly worse than the two other groupwise measures is on the T1VFA-CAROTID dataset.Table1cprovides uncertainty estimations of the qMRI fit (90 th CRLB).The results indicate that the values of 90 th CRLB are lower (i.e.better) with  TC than with MI g.In all cases, g than  20, and (c) groupwise registration scheme.