Accurate and Robust Non-rigid Point Set Registration using Student’s-t Mixture Model with Prior Probability Modeling

A new accurate and robust non-rigid point set registration method, named DSMM, is proposed for registration in the presence of significant amounts of missing correspondences and outliers. The key idea of this algorithm is to consider the relationship between the point sets as random variables and to model the prior probabilities via a Dirichlet distribution. We assign different prior probabilities of each point to its correspondences in the Student’s-t mixture model. We then incorporate the local spatial representation of the point sets by representing the posterior probabilities in a linear smoothing filter and obtain closed-form mixture proportions, leading to a registration algorithm that is computationally efficient compared with other Student’s-t mixture model based methods. Finally, by introducing hidden random variables in the Bayesian framework, we propose a general mixture model family that generalizes mixture-model-based point set registration, where the existing methods can be considered as members of the proposed family. We evaluate DSMM and other state-of-the-art finite-mixture-model-based point set registration algorithms on artificial point sets and on various 2D and 3D point sets, where DSMM demonstrates its statistical accuracy and robustness, outperforming the competing algorithms.

point set registration as an alignment between two distributions. This approach parameterizes the point sets using explicit TPS parameterizations, which is equivalent to a regularization of second order derivatives of the transformation. Their algorithm attempts to align the given two point sets without explicitly estimating the correspondences, leading to a more robust algorithm against degeneration (such as missing correspondences and outliers).
Chui et al. 11 pointed out that the alternating estimation of correspondence and transformation in the RPM algorithm is equivalent to the Expectation Maximization (EM) framework for a Gaussian mixture model (GMM), in which one point set is considered as the GMM centroids and the other as the data 12. GMM is a well-known mixture model, widely used to formulate non-rigid point set registration because it is a natural and simple way to describe the given point sets. Revow et al. 13 represented contour-like point sets using splines and modeled them with a probabilistic GMM formulation, where the GMM centroids were uniformly positioned along the contours. This algorithm allows non-rigid transformations of point sets. Similar to 9, Myronenko et al. 14 proposed a robust point set registration framework. Myronenko et al. 15 later introduced the Coherent Point Drift (CPD) algorithm, which forces the points to drift coherently by regularizing the transformation following the Motion Coherence Theory (MCT) 16,17. The major difference between the algorithms proposed in 9 and 15 is that 9 re-parameterizes the transformation via TPS, while 15 re-parameterizes it using Gaussian radial basis functions (GRBF). However, the CPD algorithm assigns the same mixture proportion to all mixture components and introduces an additional uniform distribution into the mixture model to improve robustness against outliers, noise and occlusion 18. Jian and Vemuri 19 modeled both point sets using GMMs and introduced a general robust framework involving the minimization of the L2 distance between Gaussian mixtures. Tustison et al. 20 also represented point sets using a GMM with an anisotropic covariance. In addition, features such as mutual information 21 and shape 22–24 extracted from images or point sets have been incorporated into point set registration. Wang et al. 25 generalized the L2 divergence and obtained closed-form solutions for registration.
Subsequently, Wang et al. 26 used a similar model to simultaneously align multiple point sets. However, it is well known that the GMM-based non-rigid point set algorithms are sensitive to significant amounts of outliers and missing correspondences since they use an additional component to represent the heavy tail of the mixture model 27 .
There are also several algorithms that align two point sets using the Student's-t mixture model (SMM) to improve accuracy and robustness against outliers and missing correspondences. SMM was introduced as an alternative to GMM, providing an effective and non-heuristic means of handling degradations such as missing correspondences and outliers 28. It is worth pointing out that, mathematically, the Student's-t distribution converges to a Gaussian distribution as the degree of freedom (DoF) γ → ∞, which makes the Gaussian mixture model a special case of the Student's-t mixture model 27. The Student's-t mixture model has heavy tails, leading to a natural and elegant model for point sets with degradations 29. Gerogiannis et al. 30,31 proposed an SMM-based rigid point set registration algorithm that was more robust than the GMM-based algorithms. However, that algorithm is limited to rigid point set registration. In previous work, we introduced an SMM-based non-rigid point set registration method (called pSMM in this paper) for contour-like and surface-like point sets 32, and subsequently applied it to matching surface-like points 33. Unfortunately, pSMM uses the EM framework to calculate the prior probabilities directly, which is a least-squares-based method for fitting parameters whose lack of robustness is well known. Moreover, it is difficult to obtain closed-form solutions for SMM-based non-rigid point set registration in the EM framework 34. To overcome this problem, Peel and McLachlan 34 considered the SMM as an infinite mixture of scaled GMMs in integral form to obtain closed-form solutions in the EM framework 35,36. Liu and Rubin showed that the convergence of SMM parameter estimation in the EM framework is slow, and subsequently extended the EM framework in the form of the ECM and ECME algorithms 37,38.
Recently, the Student's-t distribution and the Student's-t mixture model have also demonstrated their accuracy and robustness against outliers in various applications, such as data clustering 39,40, data classification 41, and image segmentation 42–44. However, the prior distribution of the SMM does not depend on the given point sets, and the same mixture proportion is assigned to all data in the existing approaches 29,31,40. Additionally, the existing point set registration approaches do not take into account the local spatial representation of the input point sets. To overcome the lack of local spatial representation, Ma et al. 45 introduced a novel transformation estimation method using the L2E estimator for building robust sparse and dense correspondences. Some feature descriptors, such as shape context, are used to establish rough correspondences in their work. Ma et al. 46 considered point set registration as the estimation of a mixture of densities, where the local feature is used to assign the membership probabilities of the mixture model.
In this paper, we propose a more accurate and robust non-rigid point set registration algorithm, called DSMM, which uses a Dirichlet distribution in the Student's-t mixture model to formulate varying mixture proportions and assign them to the corresponding mixture components, instead of the single shared value used in the existing methods. Compared with the existing state-of-the-art point set registration algorithms (including pSMM), the key contributions of our work are: (1) We introduce the idea of considering the mixture component label vector as random variables, which is a major difference from existing point set registration methods, where the component labels are treated as discrete labels. We consequently use the Dirichlet distribution as a natural model for formulating the mixture proportions in the Student's-t mixture model, and assign a separate mixture proportion $w_{mn}$ to each observation $x_m$ belonging to the corresponding component $y_n$. It is worth pointing out that the main difference between DSMM and pSMM is that pSMM uses a least-squares method to estimate the prior probabilities, while DSMM uses a Dirichlet distribution to model them, as detailed in subsection 2.2. (2) We further propose a general mixture model family for point set registration based on the hidden variables in the Bayesian framework, which reveals the relationship between DSMM and the existing methods (subsection 2.3). We consider the Student's-t mixture model as an infinite mixture of scaled Gaussian mixture models, as Peel and McLachlan did 34, and subsequently parameterize the hidden variables using a Dirichlet distribution. (3) To incorporate the local spatial relationship between neighboring points, we further formulate the mixture proportions through the parameters of the Dirichlet distribution by representing the posterior probabilities in a linear smoothing filter. The rest of this paper is organized as follows.
In section 2, we present the main idea of using the Dirichlet distribution to model the mixture proportions of the Student's-t mixture model, and further propose a general mixture model family for point set registration, of which DSMM and the existing approaches can be considered members. Section 3 contains qualitative and quantitative evaluations on 2D and 3D point sets with outliers and missing correspondences. Finally, we present a discussion in section 4 and a conclusion in section 5.

Method
Student's-t mixture model for registration. In this section, we start by briefly reviewing our previous work on point set registration based on the Student's-t mixture model 32. Let $X_{M \times D} = (x_1, \ldots, x_M)^T$ denote a D-dimensional point set considered as the observations, and let $Y_{N \times D} = (y_1, \ldots, y_N)^T$ denote the other D-dimensional point set. Each point $y_n$ is considered as a component of the Student's-t mixture model. The probability density function of the Student's-t mixture model with N components is defined as
$$p(x_m) = \sum_{n=1}^{N} w_n \, S(x_m \mid y_n, \sigma^2, \gamma_n), \tag{1}$$
where $w_n$ is the prior probability (mixture proportion) for $y_n$, satisfying the constraint
$$0 \le w_n \le 1, \qquad \sum_{n=1}^{N} w_n = 1. \tag{2}$$
$S(x_m \mid y_n, \sigma^2, \gamma_n)$ represents the probability density of the multivariate Student's-t distribution, which takes the form
$$S(x_m \mid y_n, \sigma^2, \gamma_n) = \frac{\Gamma\left(\frac{\gamma_n + D}{2}\right)}{(\pi \gamma_n \sigma^2)^{D/2} \, \Gamma\left(\frac{\gamma_n}{2}\right) \left(1 + d(x_m, y_n; \sigma^2)/\gamma_n\right)^{(\gamma_n + D)/2}}, \tag{3}$$
where $d(x_m, y_n; \sigma^2) = (x_m - y_n)^T (x_m - y_n)/\sigma^2$ is the Mahalanobis squared distance between $x_m$ and $y_n$, and $\Gamma(\cdot)$ is the Gamma function. In our registration method, each Student's-t distribution $S(x_m \mid y_n, \sigma^2, \gamma_n)$, which is called a component of the mixture model, has its own parameter set $\Theta_n = \{y_n, \sigma^2, \gamma_n\}$ with its component centroid $y_n$, variance $\sigma^2$ (or precision $1/\sigma^2$) and degree of freedom $\gamma_n$.
Mathematically, the multivariate Student's-t distribution converges to the Gaussian distribution as γ → ∞. The Student's-t distribution provides a heavy-tailed model for fitting degraded data, such as data with longer-than-normal tails, outliers, and missing correspondences.
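The heavy-tail behaviour and the Gaussian limit can be checked numerically. The following is a minimal sketch, assuming the isotropic form of the Student's-t density with covariance σ²I; the function names `student_t_pdf` and `gaussian_pdf` are illustrative and not taken from the paper.

```python
import math

def student_t_pdf(x, y, sigma2, gamma):
    """Isotropic multivariate Student's-t density S(x | y, sigma2, gamma)."""
    D = len(x)
    # Mahalanobis squared distance for the covariance sigma2 * I
    d = sum((a - b) ** 2 for a, b in zip(x, y)) / sigma2
    log_norm = (math.lgamma((gamma + D) / 2) - math.lgamma(gamma / 2)
                - 0.5 * D * math.log(math.pi * gamma * sigma2))
    return math.exp(log_norm - 0.5 * (gamma + D) * math.log1p(d / gamma))

def gaussian_pdf(x, y, sigma2):
    """Isotropic multivariate Gaussian density with mean y and covariance sigma2 * I."""
    D = len(x)
    d = sum((a - b) ** 2 for a, b in zip(x, y)) / sigma2
    return math.exp(-0.5 * d - 0.5 * D * math.log(2 * math.pi * sigma2))

center = (0.0, 0.0)
far_point = (6.0, 0.0)  # a point six standard deviations from the centroid

heavy = student_t_pdf(far_point, center, 1.0, 1.0)        # gamma = 1 (Cauchy-like tails)
near_gauss = student_t_pdf(far_point, center, 1.0, 1e6)   # very large gamma
gauss = gaussian_pdf(far_point, center, 1.0)

print(heavy, near_gauss, gauss)
```

With γ = 1 the density at the distant point is several orders of magnitude larger than the Gaussian density, which is why an outlier is penalized far less severely; with a very large γ the two densities nearly coincide.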
Prior probability modeling with Dirichlet distribution. The prior probability $w_n$ in Eq. (1) represents the mixture proportion of the n-th component in the mixture model. Unfortunately, in the previous work 14,15,47, the same mixture proportion $w_n$ is assigned to all correspondences, which is unreasonable because the observations vary in their locations. Moreover, the existing methods estimate the prior probabilities via a least-squares-based method in the EM framework, leading to a well-known under-fitting problem for complex point set registration. Another limitation is that each observation is considered independent of its neighbors. Therefore, these methods do not take into account the spatial correlation between neighboring points in the decision process. To overcome the under-fitting problem and improve the robustness to noise, outliers and occlusion, we introduce the Dirichlet distribution for modeling the prior probabilities and assign different prior probabilities between the observations and their correspondences.
Firstly, we rewrite the density function of the Student's-t mixture model at an observation $x_m$, which takes the form
$$p(x_m) = \sum_{n=1}^{N} w_{mn} \, S(x_m \mid y_n, \sigma^2, \gamma_n). \tag{4}$$
Specifically, the parameter $w_{mn}$ denotes the mixture proportion of the component $y_n$ for its correspondence $x_m$.
Secondly, we introduce the hidden variables 27 in the Bayesian approach to model the prior probabilities in our method. In the Bayesian approach, the complete-data vector, which is composed of the observations and the hidden variables, is given by Eq. (5), where the discrete label $z_n = (z_{1n}, \ldots, z_{Mn})^T$ denotes the component label vector, which defines the relationships between $x_m$ and $y_n$ ($n = 1, \ldots, N$; $m = 1, \ldots, M$). $z_{mn}$ is 1 or 0 depending on whether $x_m$ belongs to the n-th component:
$$z_{mn} = \begin{cases} 1, & x_m \text{ belongs to the } n\text{-th component} \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
$u_1, \ldots, u_N$ represent the hidden variables associated with the scaling weights of the covariance of the equivalent Gaussian distributions, which are defined as
$$u_n \mid z_{mn} = 1 \sim f_\Gamma\left(\tfrac{\gamma_n}{2}, \tfrac{\gamma_n}{2}\right), \tag{7}$$
where $f_\Gamma$ denotes the Gamma distribution. According to Eq. (7), $u_1, \ldots, u_N$ are independent variables if $z_1, \ldots, z_N$ are given. Consequently, $x_m$ is a random variable defined as 34
$$x_m \mid u_n, z_{mn} = 1 \sim f_N\left(y_n, \sigma^2/u_n\right), \tag{8}$$
where $f_N(y_n, \sigma^2/u_n)$ is a Gaussian distribution with mean $y_n$ and covariance $\sigma^2/u_n$. We now focus on the hidden variable $z_n$, which is considered as an independent variable in pSMM. We consider $z_n = (z_{1n}, \ldots, z_{Mn})$ as a probability label vector and formulate it by the Dirichlet distribution and the Dirichlet law 44,48,49 to model the prior probabilities accurately. The Dirichlet distribution is a natural and powerful way of modeling complex data by varying its parameters. According to 29, we get the conditional probability of the probability label $z_n$ in Eq. (9), where $\alpha_n = \{\alpha_{1n}, \ldots, \alpha_{Mn}\}$, satisfying $0 < \alpha_{mn} < 1$, is the vector of the Dirichlet parameters. $p(z_n \mid \xi_n)$ and $p(\xi_n \mid \alpha_n)$ take the forms of Eqs (10) and (11), respectively. Combining Eqs (9), (10) and (11), the probability label subsequently takes the form of Eq. (12). According to the property of the probability density function, $p(\xi_m \mid \alpha_m)$ always satisfies the condition in Eq. (13). Using Eq. (13) to rewrite Eq. (12), we obtain the probability in Eq. (14). We now consider the condition on the discrete label $z_{mn}$ in Eq. (6).
Considering $\Gamma(x + 1) = x\Gamma(x)$, the closed-form solution of the prior probability $w_{mn}$ is finally given by Eq. (15). However, the components in the mixture model are still assumed to be independent and identically distributed, which brings the attendant problem that no neighborhood information is available to the registration process, since each point $x_m$ is treated as independent of its neighbors. To solve this problem, we constrain the Dirichlet distribution with a local spatial representation by defining the parameter $\alpha_{mn}$ of the Dirichlet distribution as in Eq. (16) 42, where $N_n$ stands for the number of neighbors located in the window around the point $y_n$, and $y_i \in \partial y_n$ indicates that $y_i$ lies in the neighborhood of the given point $y_n$. $\alpha$ is a local spatial constraint coefficient of the Dirichlet distribution. $\alpha_{mn}$ contains the neighborhood information that imposes a spatial constraint on the registration. Moreover, only one parameter, $\alpha$, needs to be calculated in the EM framework, rather than the M × N parameters $\alpha_{mn}$ of the traditional Student's-t mixture model, which makes our method computationally efficient. We thus model the prior probability $w_{mn}$ accurately and incorporate the local spatial constraint in a simple way. Combining Eqs (15) and (16), $w_{mn}$ is obtained in closed form. To get a solution for $\alpha$, we separate $w_{mn}$ from the probability density function (4) and estimate it by minimizing the negative log-likelihood function. We obtain the iterative solution of $\alpha$ by minimizing $E(w_{mn})$ or, equivalently, by solving the equation that results from this minimization. Comparing with the mathematical expressions in the MRF method 50, we find a connection between the proposed Dirichlet-based spatial representation and the MRF method. The energy function $U_{MRF}$ of the MRF method in 50 degenerates to the prior probability in Eq. (18) of our method if $U_{MRF}$ is set up as a diagonal matrix with diagonal entries −1, which implies that the Dirichlet distribution models the prior probabilities by using a spatial clustering method. A limitation of the previous methods is that they consider each point to be independent of its neighbors, which results in a lack of spatial correlation between neighboring points.
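The role of the identity Γ(x + 1) = xΓ(x) in producing a closed-form prior can be illustrated with the standard Dirichlet compound multinomial. The sketch below assumes the usual textbook form of that distribution and a one-hot label vector; `dcm_log_pmf` is a hypothetical helper name, and the normalized ratio it recovers is an assumption about the general shape of the closed-form prior rather than a reproduction of the paper's equation.

```python
import math

def dcm_log_pmf(counts, alphas):
    """Log-probability of a count vector under the Dirichlet compound multinomial."""
    total = sum(counts)
    a_sum = sum(alphas)
    log_p = math.lgamma(total + 1) + math.lgamma(a_sum) - math.lgamma(a_sum + total)
    for c, a in zip(counts, alphas):
        log_p += math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c + 1)
    return log_p

alphas = [0.2, 0.5, 0.9]   # illustrative Dirichlet parameters for three components
one_hot = [0, 1, 0]        # the observation is labelled with the second component

p_dcm = math.exp(dcm_log_pmf(one_hot, alphas))
p_closed = alphas[1] / sum(alphas)   # what the Gamma-function ratios collapse to

print(p_dcm, p_closed)
```

For a one-hot label every Gamma ratio cancels except Γ(α + 1)/Γ(α) = α for the selected component and Γ(A)/Γ(A + 1) = 1/A for the sum A of the parameters, so the probability reduces to a normalized Dirichlet parameter.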
The parameter set of non-rigid point set registration is defined as $\Psi = (w_1, \ldots, w_N, \gamma_1, \ldots, \gamma_N, y_1, \ldots, y_N, \sigma^2)$, where $w_n = (w_{1n}, \ldots, w_{Mn})$ represents the prior probability, whose solution has been discussed above. We subsequently separate the parameters of the SMM and estimate them by maximizing their log-likelihood or, equivalently, by minimizing the negative log-likelihood function in the EM method. We now briefly review the solution for these parameters, which is detailed in our previous work 32,33. Firstly, considering Eq. (19), we calculate $u_{mn}$. The solution of $\gamma_n$ at the k-th iteration is obtained iteratively by minimizing $E(\gamma_n)$. Here, $G_{M \times M}$ is a Gaussian kernel matrix with elements $g_{ij} = \exp\left(-\|y_i - y_j\|^2/(2\beta)^2\right)$, which is used to reduce the oscillating energy at high frequencies. $\beta$ is the width of the smoothing Gaussian filter, defining the model of the smoothness regularization. $G(m, \cdot)$ is the column vector of the kernel matrix $G_{M \times M}$, and $W_{M \times D}$ is the weight matrix of $G_{M \times M}$. $W$ is obtained from $\partial E(y_n, \sigma^2)/\partial W = 0$, where $P$ is an M × N matrix with elements $\tilde{p}_{mn} = p_{mn} u_{mn}$, denoting the posterior probability density corrected by $u_{mn}$; $\mathbf{1}$ is a column vector of all ones; $I$ is an identity matrix; and $\mathrm{diag}(\cdot)$ denotes a diagonal matrix. $\lambda$ represents the trade-off between the goodness of the maximum likelihood fit and the regularization. Similarly, $\sigma^2$ is obtained from $\partial E(y_n, \sigma^2)/\partial(\sigma^2) = 0$.
(1) … for modeling the prior probabilities, and then assign various mixture proportions (prior probabilities) $w_{mn}$ of the n-th component to its m-th correspondence. Rather than taking a point estimate, we model the prior probabilities using the Dirichlet distribution, which gives the posterior probability distribution over all model parameters in the E-step of iteration (k + 1) by using the observed data together with the prior distributions. Subsequently, we use these posterior probability distributions to estimate the prior probabilities in the M-step. In general, compared with a least-squares-based estimation, estimating the prior probabilities via the Dirichlet distribution can yield a robust and stable result because the resulting uncertainty is included in the estimation. (2) We incorporate the local spatial relationship between neighboring points into the Dirichlet distribution parameters in a simple and natural way by representing their posterior probabilities in a linear smoothing filter, so that spatial correlation is taken into account in the registration process. Furthermore, this potentially offers a universal way of incorporating more sophisticated filters for local spatial representation in the mixture model 52.
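The idea of passing the posterior probabilities through a linear smoothing filter can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain mean filter over each component's neighborhood followed by normalization across components, it omits the spatial constraint coefficient α, and it is not the paper's exact definition of the Dirichlet parameters. The names `smooth_priors`, `posterior` and `neighbors` are hypothetical.

```python
def smooth_priors(posterior, neighbors):
    """Mean-filter the posteriors over each component's neighbourhood, then normalize.

    posterior[m][n]: posterior probability of observation m for component n.
    neighbors[n]:    indices of the components inside the window around component n
                     (assumed here to include n itself).
    """
    priors = []
    for row in posterior:
        smoothed = [sum(row[i] for i in neighbors[n]) / len(neighbors[n])
                    for n in range(len(row))]
        total = sum(smoothed)  # assumed positive for every observation
        priors.append([s / total for s in smoothed])
    return priors

posterior = [[0.7, 0.2, 0.1],
             [0.1, 0.1, 0.8]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]  # three components arranged along a chain

priors = smooth_priors(posterior, neighbors)
for row in priors:
    print(row)
```

Each row of the result sums to one, so it can serve as a set of per-observation mixture proportions, and a component whose neighbors also explain the observation well receives a larger proportion than an isolated one.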
To summarize the proposed method and reveal the theoretical differences between DSMM and pSMM, we represent the joint distribution of all random variables in our method via a directed graphical model, as shown in Fig. 1. Moreover, we quantitatively evaluate the performance of DSMM, pSMM and other competing methods in the following experiments, which more intuitively reflect the benefit of modeling the prior probabilities via the Dirichlet distribution.
Family of the mixture-model-based registration. We make the interesting observation that the mixture-model-based registration methods (including the proposed method) can be generally modeled as infinite Gaussian mixture models at a single observation x for potential outliers or data with longer-than-normal tails, which takes the form of Eq. (25), where $f_N$ is a general symbol denoting a Gaussian probability density function.
We now assume that H is a chi-squared distribution with degree of freedom $\gamma_n$ and that its random variable is $u_n \sim G(u \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} u^{\alpha - 1} e^{-\beta u}$, where $G(u \mid \alpha, \beta)$ denotes the Gamma distribution. In our method, we choose $\alpha = \beta = \gamma_n/2$. According to 27,34, the Student's-t distribution can then be rewritten as an infinite mixture of scaled Gaussian distributions. Therefore, we conclude that the Student's-t mixture model is a member of the general mixture model family.
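The scale-mixture representation can be verified numerically in one dimension. The sketch below integrates a Gaussian with variance σ²/u against a Gamma density with α = β = γ/2 using a simple midpoint rule and compares the result with the univariate Student's-t density. The function names, the integration range and the step count are illustrative choices, not part of the original method.

```python
import math

def student_t_1d(x, y, sigma2, gamma):
    """Univariate Student's-t density with location y, scale sigma2 and DoF gamma."""
    log_norm = (math.lgamma((gamma + 1) / 2) - math.lgamma(gamma / 2)
                - 0.5 * math.log(math.pi * gamma * sigma2))
    return math.exp(log_norm
                    - 0.5 * (gamma + 1) * math.log1p((x - y) ** 2 / (gamma * sigma2)))

def gamma_pdf(u, a, b):
    """Gamma density with shape a and rate b, evaluated at u > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(u) - b * u)

def gaussian_1d(x, y, var):
    """Univariate Gaussian density with mean y and variance var."""
    return math.exp(-0.5 * (x - y) ** 2 / var) / math.sqrt(2 * math.pi * var)

def scale_mixture_1d(x, y, sigma2, gamma, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the integral of N(x | y, sigma2/u) * Gamma(u) du."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += gaussian_1d(x, y, sigma2 / u) * gamma_pdf(u, gamma / 2, gamma / 2)
    return total * h

direct = student_t_1d(1.5, 0.0, 1.0, 3.0)
mixed = scale_mixture_1d(1.5, 0.0, 1.0, 3.0)
print(direct, mixed)
```

The two values agree to within the accuracy of the quadrature, which is the sense in which each Student's-t component behaves as a continuum of Gaussians with randomly scaled variance.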
We subsequently simplify the infinite mixture to a finite mixture with two components by placing the mass (1−ε) at the point $u_n = 1$ and the mass ε at the point $u_n = 1/c$. Eq. (25) therefore transforms to a Gaussian scaled mixture, Eq. (26), where $f_N(x \mid y, \sigma^2)$ denotes the Gaussian distribution with mean y and variance $\sigma^2$; ε is a small value representing the small proportion of degenerated observations in the mixture; and c is a relatively large value representing the relatively large variance of the potential degeneration. In the two-component mixture, the first term denotes the probability density of the potential degeneration, while the second term denotes the probability density of the normal data.
Compared with the Student's-t mixture model, the major limitation of the Gaussian scaled mixture is its lack of robustness to degeneration, because it relies on an additional Gaussian component to capture the tail of the distribution, as shown in Eq. (27). We further simplify the Gaussian scaled mixture model. We now assume that $\varphi_1$ is a uniform distribution, given by $\varphi_1 = N/M$; that $\varphi_2$ is a Gaussian distribution; and that $w_{mn}$ is fixed as a constant, $w_{mn} = 1/M$. Eq. (26) finally transforms to $\varepsilon/N + (1 - \varepsilon) f_N(x \mid y, \sigma^2)/M$, which takes the same form as CPD. It is therefore clear that CPD is a member of the large family formulated by Eq. (26). Moreover, it is worth pointing out that RPM-based registration methods, such as RPM-TPS and RPM-RBF, are mathematically equivalent to CPD in the EM framework, which makes RPM-based methods members of the mixture model family as well. Theoretically, the discrete latent variable $z_{mn}$ specifies which component of the Student's-t mixture model generates the observation $x_m$, and the continuous latent variable $u_{mn}$ specifies the scaling of the corresponding equivalent Gaussian distribution. Consequently, pSMM transforms to CPD if $z_{mn} = 1$, $u_{mn} = 1$, and $\gamma_n \to \infty$ simultaneously. The degree of freedom $\gamma_n$ is a trade-off between robustness and efficiency. A small DoF $\gamma_n$ assigns a small weight to outliers or missing correspondences depending on the input data, while a relatively large DoF tends to fit a Gaussian mixture model to the data. In effect, the degree of freedom reflects the assumption about the amount of noise in the point sets, which plays an important role in point matching. To initialize the degrees of freedom, we always use the value 1 (the multivariate Student's-t distribution reduces to the Cauchy distribution when γ = 1) to maximize robustness at the beginning of the registration process.
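The effect of the degree of freedom on outlier weighting can be made concrete with the conditional expectation of the scale variable used in EM for Student's-t models, $(\gamma + D)/(\gamma + d)$, where d is the Mahalanobis squared distance. The sketch below assumes this standard form (following Peel and McLachlan) for the weight; the function name `scale_weight` and the sample points are illustrative.

```python
def scale_weight(x, y, sigma2, gamma):
    """Expected scale weight of x for a Student's-t component centred at y.

    Uses the standard EM expression (gamma + D) / (gamma + d), where d is the
    Mahalanobis squared distance for the isotropic covariance sigma2 * I.
    """
    D = len(x)
    d = sum((a - b) ** 2 for a, b in zip(x, y)) / sigma2
    return (gamma + D) / (gamma + d)

centroid = (0.0, 0.0)
inlier = (0.1, 0.0)
outlier = (8.0, 0.0)

w_in = scale_weight(inlier, centroid, 1.0, 1.0)         # gamma = 1, close point
w_out = scale_weight(outlier, centroid, 1.0, 1.0)       # gamma = 1, distant point
w_out_gauss = scale_weight(outlier, centroid, 1.0, 1e9)  # near-Gaussian regime

print(w_in, w_out, w_out_gauss)
```

With γ = 1 the distant point receives a weight close to zero and barely influences the parameter updates, whereas for a very large γ every point receives a weight close to 1, which corresponds to the Gaussian (CPD-like) behaviour described above.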
In the existing methods, the major disadvantage is that the parameter $z_{mn}$ is considered as a discrete label, $z_{mn} \in \{0, 1\}$. Another limitation of the existing mixture-model-based methods is their under-fitting of the prior probabilities. This is easily understood by recalling the maximization of the prior probabilities in the EM framework: the estimate of the prior probabilities in these methods is mathematically a least-squares solution, leading to a well-known under-fitting problem. To obtain a more precise model, we consider the label $z_{mn}$ as a random variable following a multinomial distribution with probability vector $\xi_n = \{\xi_{1n}, \ldots, \xi_{Mn}\}$. The conditional distribution then follows from the definition of the multinomial distribution. When the multinomial distribution is used to generate the correspondences, the distribution of the number of emissions (i.e., counts) of an individual component follows a binomial law 53,54. This reveals that, under the multinomial model, a point corresponds to multiple components only with small probability, since the count of a single point corresponding to components decays exponentially. A better approach is hierarchical: the probabilities of correspondences between point $x_m$ and component $y_n$ are generated by a multinomial distribution whose parameters are formulated by a Dirichlet distribution, which is also called the Dirichlet compound multinomial 55. As discussed in subsection 2.2, we finally formulate the mixture proportions by using the parameters of the Dirichlet distribution. Jian et al. 19 revealed the relationship between point set registration methods from the viewpoint of the divergence function.
In summary, we generalize a family of mixture-model-based point set registration methods from the viewpoint of hidden variables in the Bayesian framework, and summarize the relationship between DSMM and the existing mixture-model methods in Table 1.
Data availability statement. All data were obtained from public data collections, including dir-lab (https://www.dir-lab.com/index.html) and ADNI (http://www.adni-info.org/). All of these databases allow researchers to reproduce their images and data.

Results
In this section, we qualitatively and quantitatively evaluate DSMM on various point sets, such as artificial data, points extracted from various medical images, and points from surface scan models. These point sets have various shapes, including 2D contour-like point sets and 3D cloud-like and surface-like point sets. To show the performance of our method, we compare DSMM with other state-of-the-art non-rigid point set registration methods (PR-GLS 46, pSMM 32, GMM-L2 19, CPD 15, RPM-TPS 9, and its variant RPM-RBF) in the following evaluations. The performance of DSMM and pSMM is shown intuitively in these evaluations. It is worth pointing out that we apply DSMM directly to all point sets without any preprocessing (including rigid registration initialization) except data normalization. We simply set β = 2 and the initial value of the DoF γ = 1 in all tests, which also reflects the robustness of DSMM.
Qualitative evaluations. We first demonstrate the qualitative evaluation of DSMM on 2D contour-like point sets. Specifically, Fig. 2 shows three examples of 2D contour-like Corpus Callosum (CC) point sets, which are from http://www.nitrc.org/. Each point set contains 63 points extracted from the outer contour of the CC in brain MR images of several normal subjects. The top row of Fig. 2 shows three pairs of Corpus Callosum point sets before registration, and the bottom row shows the performance of DSMM. We then add various numbers of additional random outliers drawn from a uniform distribution. Examples of such point sets (with an additional 32%, 48% and 63% outliers) are shown in the top row of Fig. 3. The middle row shows the final registration results, which demonstrate that the data points are accurately matched to their correspondences despite the impact of the outliers. To show the displacement vectors of the outliers intuitively, we overlay the warp of the outliers on the point sets before registration, which demonstrates that the transformation maps most outliers to sound positions, except for a few points that are very close to the data points.
Quantitative evaluations. We perform quantitative evaluations on 2D contour-like datasets and on 3D cloud-like and 3D surface-like datasets for DSMM and other competing non-rigid point set registration algorithms. As a statistical measure, we use the mean 3D Euclidean distance between correspondences and its standard deviation. In the quantitative evaluations, we show the performance of DSMM, PR-GLS 46, pSMM 32, GMM-L2 19, CPD 15, RPM-TPS 9, and its variant RPM-RBF. Compared with the existing methods, the major difference of our method is that DSMM models the prior probabilities using the Dirichlet distribution and assigns varying prior probability values to the components, while the existing methods estimate a prior probability by a least-squares solution. PR-GLS assigns the membership probability $w_{mn}$ based on the shape context feature, so that local structure information can also be used to achieve good performance.
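The error measure described above can be computed as in the following minimal sketch. It assumes that the i-th aligned point corresponds to the i-th target point and uses the population standard deviation; both the correspondence ordering and the choice of population rather than sample deviation are assumptions, and `registration_error` is an illustrative name.

```python
import math

def registration_error(aligned, target):
    """Mean and (population) standard deviation of the Euclidean distances
    between corresponding points; aligned[i] is assumed to match target[i]."""
    dists = [math.dist(a, t) for a, t in zip(aligned, target)]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, math.sqrt(var)

aligned = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]

mean_err, std_err = registration_error(aligned, target)
print(mean_err, std_err)
```

In this toy example the two point-to-point distances are 3 and 4, so the mean error is 3.5 with a standard deviation of 0.5; a smaller mean indicates a better alignment.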
We perform the first quantitative evaluation on 2D Chinese characters 46 with deformation, noise, outliers and occlusion (the ratio of noise, outliers, and occlusion ranges from 10% to 50%). Each point set contains 105 normal points. The superimposed points of the Chinese character are shown in the top row of Fig. 4. The goal of our experiments is to align the template points (black "+") to their correspondences in the target point set (red "o"). We break the one-to-one correspondence by adding noise and outliers and by removing points in these datasets. DSMM matches the correspondences accurately and robustly, as shown in the bottom row of Fig. 4. Figure 5 shows the statistical registration results of DSMM and the other competing methods. The y axis of the bar chart in Fig. 5 indicates the mean registration error of each method, where a small error value indicates good performance. Benefitting from the Dirichlet distribution and the Student's-t mixture model, DSMM achieves the best statistical results, which are slightly better than those of PR-GLS and significantly better than those of the other five methods.
The second quantitative evaluation is performed on 20 samples of real 3D cloud-like lung datasets, with 10 point sets extracted from thoracic 4D CT images 55,56 and the other 10 point sets extracted from COPD images 56, shown in Fig. 6. Each sample has a pair of 3D lung point sets: one is identified from the maximum inhalation phase image and the other from the maximum exhalation phase image. Each 3D lung point set has 300 points, which were selected by experts so that the two point sets correspond to each other. Matching cloud-like point sets accurately is a demanding task for non-rigid registration algorithms because such data lack topological or geometric structure. Table 2 reports the mean 3D Euclidean distance between correspondences for each point set before registration. Figures 7 and 8 respectively show the performance of … Table 3, we can see that DSMM performs better than the other mixture-model-based algorithms on real 3D cloud-like point sets.
Figure 2. Performance of DSMM on 2D Corpus Callosum data. The target points are denoted by red "o" and the template points by black "+". Each point set contains 63 points. We align the black "+" to the red "o" by using DSMM. The point sets before registration are superimposed in the top row, and the performance of DSMM is shown in the bottom row.
Figure 3. Performance of DSMM on 2D Corpus Callosum data with outliers. The one-to-one correspondence has been broken by adding different numbers of additional uniformly distributed outliers to both the template set and the target set. In the top row, the red "o" represents the data points in the target set, and the black "+" represents the correspondences in the template set. For clarity, we denote outliers in the target set with red "∇" and outliers in the template set with black "∆". The middle row shows the transformations that map all data points in the template set to their correspondences, resisting the influence of the outliers in the target set. In the bottom row, we overlay the transformations of the outliers in the template set on the degenerated points before alignment, demonstrating that our method can handle most outliers, except for a few points very close to the black "+". The results show that DSMM is accurate and robust against a significant number of outliers.
To evaluate the performance of our method under various distortions, we perform the third quantitative evaluation on 4D CT point sets identified from thoracic 4D CT images 56 . Each 4D CT point set consists of six expiratory phases (T00, T10, T20, T30, T40 and T50), and there are 75 points (a subset of the point set containing 300 points) in each sample. The T00 point sets are identified from the maximum inhalation phase images, and the T50 point sets from the maximum exhalation phase images. The T10, T20, T30 and T40 point sets are extracted from the expiratory phase images between the maximum inhalation phase and the maximum exhalation phase. As shown in Fig. 10, the red "o" denotes the points in the T00 image and the black "+" denotes the points in the T10~T50 images. Fig. 10 also shows the transformation vectors between correspondences. Table 4 reports the performance of DSMM on T00 and T50 of each subject.
We further test the ability of our algorithm to handle outliers and missing correspondences in a subsequent evaluation on the point sets from 4D CT images. To break the one-to-one correspondence between the given point sets and introduce missing correspondences, we randomly delete an increasing number of points from both the target and the template point sets, as shown in the top row of Fig. 11. In the first subfigure we do not delete any point, while in the other subfigures we remove 15, 30, 45, 60 and 75 points, respectively, from both the target set and the template set, which leaves only 270, 240, 210, 180 and 150 correspondences in subfigures (b)~(f). To make the outliers explicit, we use red "∇" to denote the outliers in the target set, whose correspondences have been removed from the template set, and black "∆" to denote the outliers in the template set. We then test DSMM and other algorithms on these pairs of incomplete samples. Figure 11 shows the performance of our method on these incomplete data. For clarity, we show only the correspondences in the result subfigures, which show that only a few points diverge from the ground truth even when 75 points are removed from each data set. As is evident from Fig. 11, our method performs well in the presence of significant amounts of missing correspondences and outliers, owing to the local spatial representation and the prior probability modeling of each component in the mixture model. Figure 12(a-e) shows the mean 3D Euclidean distance between correspondences for the different algorithms on the incomplete data sets, which indicates the statistical accuracy and robustness of our method.
Finally, we conduct the last quantitative experiment on matching 3D surface-like "wolf" shapes. Each point set contains about 5000 points, and some points are absent between the template and target point sets. We show only 1600 points in the top row of Fig. 13 for clarity. To evaluate the robustness of DSMM to occlusion and outliers, we remove about 25 percent of the points to represent occlusion and add about 25 percent of the total number of points to represent outliers; these cases are shown in the middle and right columns, respectively. The top row of Fig. 13 shows the superimposed points before registration, and the bottom row shows the matching results of DSMM. Figure 14 shows quantitative comparisons of DSMM and other competing methods on the wolf data, where DSMM achieves the best results on both the ideal and the degraded data.
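The two degradation protocols used in these experiments, deleting points to create missing correspondences and adding points to create outliers, can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the deleted indices are drawn as two disjoint sets (chosen so that removing n points from each set of N leaves exactly N - 2n correspondences, matching the counts quoted above), and outliers are drawn uniformly from the bounding box of the point set, which the text does not specify.

```python
import random


def break_correspondences(template, target, n_remove, rng):
    """Delete n_remove points from each set, using disjoint index sets.

    Assumption (illustrative): template[i] corresponds to target[i].
    Returns both reduced sets and the number of surviving correspondences.
    """
    n = len(template)
    if len(target) != n or 2 * n_remove > n:
        raise ValueError("need equal-sized sets and 2 * n_remove <= size")
    chosen = rng.sample(range(n), 2 * n_remove)
    drop_template = set(chosen[:n_remove])
    drop_target = set(chosen[n_remove:])
    template_kept = [p for i, p in enumerate(template) if i not in drop_template]
    target_kept = [p for i, p in enumerate(target) if i not in drop_target]
    return template_kept, target_kept, n - 2 * n_remove


def add_uniform_outliers(points, n_outliers, rng):
    """Append outliers drawn uniformly from the bounding box of `points`."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    outliers = [tuple(rng.uniform(lo[d], hi[d]) for d in dims)
                for _ in range(n_outliers)]
    return list(points) + outliers


rng = random.Random(0)  # fixed seed so the toy example is reproducible
template = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
target = [(x + 0.05, y, z) for x, y, z in template]  # toy "deformation"

template_missing, target_missing, n_shared = break_correspondences(
    template, target, 75, rng)
target_with_outliers = add_uniform_outliers(target, 75, rng)  # 25 percent extra
```

With 300 points and 75 deletions per set, each reduced set keeps 225 points and 150 correspondences survive, the same count as in the most degraded case described for Fig. 11.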

Figure 10. [...] (T00, T10, T20, T30, T40 and T50), where T00 is from the maximum inhalation phase and T50 is from the maximum exhalation phase image. We denote the points from the maximum inhalation phase with red "o" and the points from the other phases with black "+". (a) Performance of DSMM on a point set between T00 and T10, (b) T00 and T20, (c) T00 and T30, (d) T00 and T40, (e) T00 and T50.
Table 4. Mean 3D Euclidean magnitude distance (unit: mm) between correspondences of T00 and T50 using DSMM.
Figure 11. Performance of DSMM on the incomplete 3D lung data pairs identified from 4D CT images. We randomly remove an increasing number of points from both the target point set and the template point set, which breaks the one-to-one correspondence between the given data. For clarity, the red "∇" denotes the missing correspondences in the target set, whose correspondences are removed from the template set, and the black "∆" denotes the missing correspondences in the template set. Our method shows stable performance in the presence of missing correspondences and occlusion. The top image in each subfigure shows the initial configuration of the incomplete data, and the bottom image shows the result of our method.

Discussion
Point set registration is a key problem in various applications. We focus on the modeling of point set registration, a core issue that has received sustained attention in recent years. In this work, we introduce an SMM-based non-rigid point set registration approach, named DSMM, which models the prior probabilities using the Dirichlet distribution and the Dirichlet law. The main motivation of our method is to use a Bayesian framework to estimate the prior probabilities, since the existing methods estimate them via a least-squares method, which is well known to lack robustness.
Fortunately, the Dirichlet distribution and its mixture models form a fully Bayesian framework, which can automatically determine the model complexity (in terms of the total number of necessary mixture components) from the data, without depending on any prior knowledge. Concretely, we first treat non-rigid point set registration as a probability density estimation problem, where one point set is represented as the centroids of a Student's-t mixture model and the other as the data set. The main advantage of the multivariate Student's-t distribution is that it is heavier tailed than the Gaussian distribution, and hence an SMM is more robust against degradations than a GMM. Second, we explicitly exploit the Dirichlet distribution and the Dirichlet law to incorporate the local spatial representation of the given point sets. We then assign different prior probability values depending on the input point sets, instead of the same value for all points, which makes DSMM more accurate than other existing algorithms. Third, we formulate the SMM as an infinite scale mixture of Gaussians in integral form in order to obtain closed-form solutions. Subsequently, we iteratively fit the SMM centroids to the data set using the EM framework and estimate the posterior probabilities of the centroids, which provide correspondence probabilities between the target point set and the template set. Finally, we calculate all registration parameters and the transformation via the EM framework. We perform qualitative and quantitative evaluations of DSMM on various shapes. These evaluations indicate the favorable performance of DSMM.
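The scale-mixture view of the Student's-t distribution mentioned above can be checked numerically in one dimension. The sketch below is illustrative and is not the paper's implementation: it uses the standard identity that a Student's-t density with ν degrees of freedom equals a Gaussian whose precision is scaled by a latent variable u, integrated over u ~ Gamma(ν/2, ν/2). It also shows the standard expected scale weight from EM for t-distributions, (ν + d)/(ν + squared Mahalanobis distance), which shrinks for distant points. All function names and the integration settings (`u_max`, `steps`) are assumptions chosen for this example.

```python
import math


def student_t_pdf(x, mu, sigma2, nu):
    """Closed-form univariate Student's-t density (location mu, scale sigma2)."""
    z = (x - mu) ** 2 / sigma2
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * sigma2))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z / nu))


def scale_mixture_pdf(x, mu, sigma2, nu, u_max=80.0, steps=40000):
    """Integrate N(x | mu, sigma2 / u) * Gamma(u | nu/2, nu/2) over u.

    Uses a plain midpoint rule on (0, u_max); the settings are illustrative.
    """
    a = b = nu / 2  # Gamma shape and rate
    h = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        gauss = (math.sqrt(u / (2 * math.pi * sigma2))
                 * math.exp(-0.5 * u * (x - mu) ** 2 / sigma2))
        log_gamma_pdf = (a * math.log(b) + (a - 1) * math.log(u)
                         - b * u - math.lgamma(a))
        total += gauss * math.exp(log_gamma_pdf)
    return total * h


def expected_scale(x, mu, sigma2, nu, dim=1):
    """Posterior mean of the latent scale u; small values down-weight a point."""
    return (nu + dim) / (nu + (x - mu) ** 2 / sigma2)


closed_form = student_t_pdf(1.2, 0.0, 1.0, 3.0)
mixture = scale_mixture_pdf(1.2, 0.0, 1.0, 3.0)
weight_inlier = expected_scale(0.5, 0.0, 1.0, 3.0)
weight_outlier = expected_scale(8.0, 0.0, 1.0, 3.0)
```

The two density values agree up to integration error, and the weight of the distant point is much smaller than that of the nearby one, which is the mechanism behind the robustness of Student's-t mixtures compared with Gaussian mixtures.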