3D brain tumor segmentation using a two-stage optimal mass transport algorithm

Optimal mass transport (OMT) theory, the goal of which is to move any irregular 3D object (i.e., the brain) without causing significant distortion, is used to preprocess brain tumor datasets for the first time in this paper. The first stage of a two-stage OMT (TSOMT) procedure transforms the brain into a unit solid ball. The second stage transforms the unit ball into a cube, as it is easier to apply a 3D convolutional neural network to rectangular coordinates. Small variations in the local mass-measure stretch ratio among all the brain tumor datasets confirm the robustness of the transform. Additionally, the distortion is kept at a minimum with a reasonable transport cost. The original 240 × 240 × 155 × 4 dataset is thus reduced to a cube of 128 × 128 × 128 × 4, which is a 76.6% reduction in the total number of voxels, without losing much detail. Three typical U-Nets are trained separately to predict the whole tumor (WT), tumor core (TC), and enhanced tumor (ET) from the cube. An impressive training accuracy of 0.9822 in the WT cube is achieved at 400 epochs. An inverse TSOMT method is applied to the predicted cube to obtain the brain results. The conversion loss from the TSOMT method to the inverse TSOMT method is found to be less than one percent. For training, good Dice scores (0.9781 for the WT, 0.9637 for the TC, and 0.9305 for the ET) can be obtained.
Significant improvements in brain tumor detection and segmentation accuracy are achieved. For testing, postprocessing (rotation) is added to the TSOMT, U-Net prediction, and inverse TSOMT steps for an accuracy improvement of one to two percent. It takes 200 seconds to complete the whole segmentation process for each new brain tumor dataset.

The introduction of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) 1-3 has generated enormous research interest in the image processing and machine learning (ML) community, as has the BRATS challenge, which is held regularly (for a comprehensive review, see 4 ). This dataset, created by a group of distinguished neurologists, has quickly become one of the leading medical datasets due to its openness, robustness and regular maintenance. Accordingly, the automated segmentation of brain tumors, which would greatly benefit patients, has been sought by many researchers. In the early stage, top-ranked performers used random forest classification [5][6][7] . Later, a convolutional neural network (CNN) with two layers, proposed by Zikic et al. 8 , and a CNN with eight layers, proposed by Randhawa et al. 9 , gained traction. More sophisticated CNN structures are being developed. Ronneberger et al. 10 first introduced U-Net. Kamnitsas et al. 11 presented an ensemble of multiple models and architectures, which consists of two fully convolutional networks (FCNs) and a U-Net. Isensee et al. 12 have also shown the resiliency of the U-Net model, which is the most widely used model in the BRATS challenge.
It is expected that a 3D CNN produces higher accuracy than its 2D counterpart because valuable information along the z-axis is taken into consideration 4,13 . However, the GPU memory constraint is the main issue 14 given that the size of each BRATS image is 240 × 240 × 155 × 4, which represents the x, y, and z coordinates of the brain scan and the 4 scan modalities [fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1), T1-weighted contrast-enhanced (T1CE), and T2-weighted (T2) magnetic resonance imaging (MRI)], respectively. To address the problem of memory shortage in 3D operations, Casamitjana et al. 15 applied a 3D U-Net using data reshaped to 140 × 140 × 140, which suits the memory limit. However, by reducing the dimensionality, detailed information could be lost. Isensee et al. 16 trained their network using a randomly sampled patch size of 128 × 128 × 128 with 16 filters 10 , which is expected to cover the entire 240 × 240 × 155 region. This approach has the benefit of retaining all of the information but needs more voxels than the original images. Thus, properly addressing the 3D structure is crucial for the implementation of 3D segmentation. Optimal mass transport (OMT) theory has been widely applied in various fields, such as image processing [17][18][19] , data compression 20 and generative adversarial networks 21 . The OMT problem, a pioneering formulation that minimizes the cost of transporting a pile of solid mass from one place to another without losing any detail, was first posed by Monge in 1781 (see 22 for details). In 1948, in light of the linear programming technique, Kantorovich 23 relaxed the transport map to a transport scheme/plan given by a joint probability distribution and first proved the existence and uniqueness of the OMT problem. For their excellent work on the optimal allocation of scarce resources, Kantorovich and Koopmans shared the Nobel Prize in Economics in 1975.
Then, in 1991, Brenier 24 began studying the characteristics, uniqueness and existence of the OMT map and developed an alternative scheme for solving the OMT problem with a quadratic distance as the cost function for a special class of convex domains. Compared to the Monge-Kantorovich approach, the Monge-Brenier optimization method can significantly reduce the number of unknown variables from O(n²) to O(n), where n is the number of discretized sample points on the target domain, and can solve the optimal transport map via the gradient descent method of a special convex function. For the theoretical OMT problem, in the 1990s, Caffarelli 25 established the regularity of the OMT map, Gangbo and McCann 26 proposed a geometric interpretation of the OMT problem, and Evans 27 expressed the OMT problem via partial differential equations (PDEs). A very nice review, "Optimal Transport: Old and New", summarizing the contributions of predecessors was published by Villani 28 . For the numerical OMT problem, in 2017, Su et al. 29 proposed a volume-preserving mesh parameterization between a simply connected 3D tetrahedral mesh M with a boundary of the genus-zero surface and a unit solid ball B³. The related algorithm in 29 was designed to first compute the volumetric harmonic map between M and B³ and then, based on the newly discovered variational principle 30 , to compute the OMT map via Brenier's approach 24 . However, the harmonic map from M to B³ computed in the first step of 29 is, in general, not an OMT map. Very recently, in 2019, Yueh et al. 31 proposed a novel algorithm for the volume-preserving parameterization from M to B³ that minimizes the volumetric stretch energy by modifying the Laplacian matrix with the current volume-stretching factor at each iteration step. However, there are infinitely many volume-preserving maps between M and B³, and the resulting map of 31 is, in general, not an OMT map from M to B³.
In this work, the applicability of moving a solid (brain or any human organ) from one place to another is investigated. A two-stage OMT (TSOMT) procedure is carried out prior to 3D U-Net training and inference.
Step one is to transform the brain into a unit solid ball to ensure the convergence of the OMT. Since implementing a 3D CNN in spherical coordinates is complicated, step two is to transform the unit ball into a cube of size 128 × 128 × 128. Rather than the threefold increase in the number of voxels incurred by another 3D U-Net approach 16 , we decrease the total number of voxels by 76.6%. It is our belief that with fewer voxels, 3D U-Net training can more easily find a good local minimum and thus achieve better performance. After applying the 3D U-Net to this 128 × 128 × 128 cube, the inverse TSOMT returns the dimension to 240 × 240 × 155. In doing so, without losing much detail, the complexity of the 3D U-Net training is greatly reduced. Task01_BrainTumor, a subset of the data used in the BRATS 2016 and BRATS 2017 challenges, is the brain tumor dataset included in the Medical Segmentation Decathlon (MSD) Challenge 32,33 . In this paper, 484 3D brain images from the training set of Task01_BrainTumor are divided into training, validation and testing sets comprising 400, 29 and 55 samples, respectively. Training Dice scores of 0.9781, 0.9637, and 0.9305 and testing Dice scores of 0.9202, 0.8794, and 0.8420 can be achieved at 400 epochs for the whole tumor (WT), tumor core (TC), and enhanced tumor (ET) predictions, respectively.

Our contributions. The TSOMT procedure maps irregular 3D brain images to a 3D cube with minimal distortion to match the input format of the ML program. The combination of the TSOMT method and the 3D U-Net in the training step, to the best of our knowledge, greatly surpasses previous methods. The key idea is that the transformation preserves the global features of the data, unlike previous methods. The main contributions of this paper are summarized as follows.
• The properties of our proposed TSOMT and inverse TSOMT methods are thoroughly examined, including their abilities to control the density, preserve the local mass, and minimize both the transportation cost and the conversion loss when mapping an irregular 3D brain image into a canonical 3D cube. The density can be enlarged by contrast-enhanced histogram equalization to make the various tumors more detectable by the 3D U-Net algorithm. The small standard deviation (SD) in the local mass ratios of the TSOMT for the 484 samples in the MSD Challenge shows the robustness of the transport methods. • An MRI brain image in the MSD Challenge is described by a 240 × 240 × 155 × 4 tensor. In contrast, the TSOMT method uses only a 128 × 128 × 128 × 4 tensor to optimally express all the effective information of a brain image, as the large amount of air in the original raw data is completely excluded by the TSOMT deformation. The selection of a 128 × 128 × 128 cube has three advantages: (i) it matches the physical distance among the voxels of the brain image scanned by MRI; (ii) the conversion loss of a 3D brain image between the TSOMT and the 128 × 128 × 128 cube is only 0.5%, which is satisfactory and appropriate for 3D U-Net training; and (iii) applying the U-Net algorithm to hundreds of training brain images on a workstation running MATLAB with four NVIDIA Tesla V100S 32 GB GPUs respects the limited memory capacity. • The TSOMT technique differs from the method developed by Isensee et al. 12,16,34 , in which a 3D brain image is covered with 16 randomly selected 128 × 128 × 128 cubes. The overlapping tile strategy 10 uses several subvoxels as inputs to achieve the seamless segmentation of arbitrarily large images. If each voxel consumes 1 byte, then the size of the input data for one brain image using the method of 12,16,34 would be 32 MB.
On the other hand, the TSOMT method transforms each brain image into a 128 × 128 × 128 cube that consumes 2 MB, which is more economical in terms of the input size and frees up considerable capacity for augmenting the data from different sources to achieve more accurate and effective results. • Via various density settings of the brain image and rotations of the unit solid ball, each 3D brain image can be used to construct several different augmented tensors. Numerical experiments demonstrate that the Dice scores of the WT, TC, and ET are improved by such augmented data, as shown in Fig. 7. Furthermore, since the TSOMT procedure can skillfully represent the global information of a brain image, we also propose a postprocessing scheme that applies mirroring and rotation techniques to increase the Dice scores of the WT, TC, and ET. The numerical results demonstrate that this postprocessing scheme improves the associated Dice scores by one to two percent. • We implement the TSOMT method and the U-Net algorithm on the MSD Challenge dataset and examine the bias (underfit) and variance (overfit) of the learning algorithm, as shown in Fig. 6. The optimal number of epochs is between 45 and 55. Furthermore, numerical experiments show that after 400 epochs, the training and testing Dice scores reach 0.9781 and 0.9202, respectively, for the WT. The TSOMT and 3D U-Net approaches significantly improve the accuracy of brain tumor detection and segmentation. For each testing case, the TSOMT, U-Net inference, inverse TSOMT, and postprocessing steps can be accomplished in fewer than 200 seconds.
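The data-size figures quoted in these contributions follow from simple arithmetic; the short standalone check below assumes 1 byte per voxel, as stated above.

```python
# Sanity check of the data-size figures quoted above (pure arithmetic).
orig_voxels = 240 * 240 * 155      # one modality of a raw MSD brain image
cube_voxels = 128 ** 3             # one modality after the TSOMT map

reduction = 1 - cube_voxels / orig_voxels
print(f"voxel reduction: {reduction:.1%}")   # ~76.5%, quoted as 76.6% above

# Input size per brain at 1 byte per voxel:
patch_bytes = 16 * cube_voxels     # 16 random 128^3 patches (patch-based method)
tsomt_bytes = cube_voxels          # a single TSOMT cube
print(patch_bytes // 2**20, "MB vs", tsomt_bytes // 2**20, "MB")  # 32 MB vs 2 MB
```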
The remainder of this paper is organized as follows. "OMT formulation and preprocessing" details the OMT formulation, including problem statements, numerical algorithms, TSOMT maps and their related properties. The detailed numerical results of the TSOMT are shown in "Convergence verification and conversion loss of the TSOMT map". "Training setup" shows the U-Net structure and evaluation criterion. The training and testing results are discussed in "Results and discussion". Finally, "Conclusions" summarizes the results and outlines further research directions.

OMT formulation and preprocessing
The OMT problem. We now state the classical OMT problem raised by Monge 22 . Let (X, µ) and (Y, ν) be two measurable spaces with probability measures µ and ν, respectively, having the same total mass ∫_X dµ = ∫_Y dν. A map T : X → Y is said to be mass-preserving if µ(A) = ν(T(A)) and µ(T⁻¹(B)) = ν(B) for all measurable sets A ⊆ X and B ⊆ Y, respectively. Let F_{µ,ν} denote the set of all mass-preserving maps from X to Y. For a given transport cost function κ : X × Y → R, the OMT problem with respect to κ is to find a T* ∈ F_{µ,ν} that solves the optimization problem

min_{T ∈ F_{µ,ν}} ∫_X κ(x, T(x)) dµ(x).  (1)

Discrete OMT problem. The original OMT problem (1) raised by Monge does not necessarily contain an optimal solution. Kantorovich relaxed the transport map and first proved the existence and uniqueness of the optimal transport plan. However, the relaxed approach of Kantorovich may result in a non-bijective map. In practical applications, the OMT map must be a bijective map. Thus, we return to the original OMT problem to develop an efficient, reliable and robust numerical method. A discrete version of the OMT problem is described as follows. Let (M, ρ) be a 3-manifold with a spherical boundary and a density map ρ on V(M), where M is a tetrahedral mesh composed of an ordered vertex set V(M), an edge set E(M), a triangular face set F(M) and a tetrahedron set T(M). Let (B³, δ) be a unit solid ball with a constant density function δ ≡ 1. We further define the piecewise linear density function of ρ on T(M) and F(M) by

ρ(τ) = (1/4) Σ_{v ∈ τ} ρ(v) for τ ∈ T(M),  ρ(α) = (1/3) Σ_{v ∈ α} ρ(v) for α ∈ F(M),  (2a)

for a given density ρ(v) for v ∈ V(M). Furthermore, we define the local mass and local area measure at v ∈ V(M) by

m_ρ(v) = (1/4) Σ_{τ ∈ T(M), v ∈ τ} ρ(τ)|τ|,  m_A(v) = (1/3) Σ_{α ∈ F(∂M), v ∈ α} ρ(α)|α|,  (2b)

respectively, where |τ| is the volume of τ and |α| is the area of α. Denote by F_ρ the set of all mass-preserving simplicial maps f_ρ : M → B³ with respect to ρ satisfying ρ(τ)|τ| = δ|f_ρ(τ)| = |f_ρ(τ)| for each τ ∈ T(M) and f_ρ(τ) ∈ T(B³).
Note that the linear simplicial bijective map between τ and f_ρ(τ) is given by the barycentric coordinates on the tetrahedron τ. The discrete OMT problem with respect to the 2-norm ‖·‖₂ is to find an f*_ρ ∈ F_ρ that solves the optimization problem

min_{f_ρ ∈ F_ρ} T_c(f_ρ) := Σ_{v ∈ V(M)} m_ρ(v) ‖v − f_ρ(v)‖₂²,  (3)

where m_ρ(v) is the local mass at vertex v ∈ V(M) as in (2b) and T_c(f_ρ) is the transport cost of f_ρ. It is easily seen that the set F_ρ is convex from the definition

(λ f₁ + (1 − λ) f₂)(τ) = λ f₁(τ) + (1 − λ) f₂(τ),

where λ ∈ [0, 1] for f₁, f₂ ∈ F_ρ and τ ∈ T(M). The projected gradient method is a natural candidate for solving the discrete OMT problem (3). However, because the size of V(M) is too large and the projection onto F_ρ requires solving some highly nonlinear equations, we are motivated to propose an alternative approach. We now introduce the computational method for the discrete OMT problem (3). Without loss of generality, each tetrahedral mesh M is centralized and normalized so that the center of mass is located at the origin and the volume is 4π/3.
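In code, the discrete transport cost is simply a mass-weighted sum of squared vertex displacements. The sketch below is illustrative only (toy data rather than the paper's tetrahedral mesh); the masses would come from the local mass definition in (2b).

```python
def transport_cost(vertices, images, masses):
    """T_c(f) = sum over v of m_rho(v) * ||v - f(v)||_2^2.

    vertices, images : paired lists of 3D points v and f(v)
    masses           : local mass m_rho(v) at each vertex
    """
    return sum(m * sum((a - b) ** 2 for a, b in zip(v, fv))
               for v, fv, m in zip(vertices, images, masses))

# Toy example: two vertices, each displaced by unit distance.
cost = transport_cost([(0, 0, 0), (1, 0, 0)],
                      [(0, 0, 1), (1, 1, 0)],
                      [0.3, 0.7])
print(round(cost, 10))  # 0.3*1 + 0.7*1 = 1.0
```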
Numerical OMT algorithm. In this subsection, we first develop an area-measure-preserving method for solving the OMT problem from the boundary ∂M to a unit ball. Then, using such a ball area-preserving OMT map as an initial boundary map, we develop a homotopy method to compute the desired mass-measure-preserving OMT map.
(i) Discrete OMT problem on ∂M: To reduce the number of variables in (3), we first consider solving the OMT problem on ∂M. In detail, the area-measure-preserving piecewise linear map f|_∂M ≡ g : ∂M → S², represented by the vector g = [g(v₁), …, g(v_{n_B})]ᵀ with n_B := #V(∂M), is iteratively updated to minimize the area energy functional

E_A(g) = (1/2) trace(gᵀ L_A(g) g) + λ Σ_{i=1}^{n_B} ‖v_i − g(v_i)‖₂²,  (4)

where λ is a regularization parameter and L_A(g) is the area-weighted Laplacian matrix, similar to that of the stretch energy minimization algorithm 35 , with off-diagonal entries

[L_A(g)]_{i,j} = −(1/2) [cot θ_{i,j}(g) / σ_{g⁻¹}(α_{i,j}) + cot θ_{j,i}(g) / σ_{g⁻¹}(α_{j,i})]  (5)

and diagonal entries [L_A(g)]_{i,i} = −Σ_{j≠i} [L_A(g)]_{i,j}, where σ_{g⁻¹}(α) = ρ(α)|α| / |g(α)|, which depends on ρ, is the area-measure stretch factor of g on the triangular face α ∈ F(∂M) with ρ(α) defined in (2a), α_{i,j} and α_{j,i} are the two faces sharing the edge [v_i, v_j], and θ_{i,j}(g) and θ_{j,i}(g) are the two angles opposite to edge g([v_i, v_j]).
The first term in (4) is designed to smooth the iterative vector g to avoid folding so that the resulting map g is as bijective as possible. The coefficients of L_A(g) in (5) are modified by imposing the area-measure stretch factor σ_{g⁻¹}(α) for α ∈ F(∂M) in the denominator. It is expected that the area ratio σ_{g⁻¹}(α) between ρ(α)|α| and |g(α)| is close to one for all α ∈ F(∂M) when the iteration on (4) converges. The second term in (4), with the regularization parameter λ, is required to minimize the sum of the distances between v_i and g(v_i) for all v_i ∈ V(∂M).
Algorithm 1 computes the discrete OMT on ∂M. Steps 1-10 of Algorithm 1 provide an initial spherical area-measure-preserving map g, similar to the stretch energy minimization algorithm 31 . To minimize the energy functional (4) while maintaining the spherical image constraint, we apply the stereographic projection to map the spherical image of g onto the extended complex plane C, and the vertices v_i ∈ V(∂M) are projected accordingly. Then, the energy functional (4) is optimized on C by alternatingly solving for the map h on the unit disk associated with the southern hemisphere and on the inversion of the northern hemisphere in steps 13-18 of Algorithm 1.

(ii) Homotopy method for the OMT problem on M: We now construct a homotopy g_t : ∂M → R³ for the boundary map by

g_t = (1 − t) id|_∂M + t g,  t ∈ [0, 1].  (6)

To find the interior map that minimizes the cost (3), we start from the identity map and consecutively update the interior with the boundary map given by g_t. Let

0 = t₀ < t₁ < ⋯ < t_p = 1, t_k = k/p,  (7)

be the uniform partition of the interval [0, 1] into p subintervals. For k = 1, …, p, we compute the interior map f^(k) by solving the linear system

[L_V(f^(k−1))]_{I,I} f^(k)_I = −[L_V(f^(k−1))]_{I,B} g_{t_k},  f^(k)_B = g_{t_k},  (8)

where I and B denote the interior and boundary vertex indices, respectively, f^(0) ≡ id, and L_V(f) is the mass-weighted Laplacian matrix, similar to that of the volumetric stretch energy minimization algorithm 31 , whose cotangent-type coefficients are scaled by the volume-measure stretch factors as in the literature 31,[35][36][37] . The detailed steps are summarized in Algorithm 2.

3D brain images with a discrete structure. All 3D brain images in the MSD Challenge have 240 × 240 × 155 voxels, and the irregular 3D brains, on average, account for approximately 16% (between 12% and 20%) of all voxels. However, because of the limitation of the computer's capacity, the tensor input of the U-Net algorithm 38 is restricted to 128 × 128 × 128 voxels, which is undoubtedly a bottleneck for applying the U-Net algorithm to train the 3D brain images for brain tumor segmentation.
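The homotopy stepping in (ii) above admits a compact sketch. The blend-then-renormalize formula here is our illustrative reading of g_t (the starting boundary position blended toward the OMT boundary image and projected back onto the unit sphere); the paper's exact construction may differ, and the interior linear solve is omitted.

```python
import math

def normalize(p):
    """Project a point back onto the unit sphere."""
    r = math.sqrt(sum(c * c for c in p))
    return tuple(c / r for c in p)

def homotopy_boundary(v, g_v, t):
    """g_t(v): blend the start position v toward the OMT image g(v),
    then renormalize (illustrative reading of the homotopy)."""
    blended = tuple((1 - t) * a + t * b for a, b in zip(v, g_v))
    return normalize(blended)

# Step t_k = k/p, k = 1..p, as in the uniform partition with p = 8.
p = 8
v, g_v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
for k in range(1, p + 1):
    g_t = homotopy_boundary(v, g_v, k / p)  # the interior solve would follow here
print(g_t)  # at t = 1 the boundary point reaches g(v): (0.0, 1.0, 0.0)
```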
The most intuitive first thought is to partition the original brain image into several small cubes of 128 × 128 × 128 voxels or to randomly select some small cubes to cover the brain image. In the former, some cubes contain images from the outside of the brain that do not contribute much to the segmentation training, while in the latter, some cubes have overlapping areas that contain local and partially repeated information, which reduces the segmentation training accuracy. These shortcomings motivated us to introduce the OMT method. A 3D MRI brain image can be represented as grayscale values in [0, 1] at each grid point (the center of a voxel) of a cuboid I with a rectangular 240 × 240 × 155 grid. If we systematically partition I into Delaunay tetrahedral meshes with vertices at the grid points of I, then a 3D image can further be represented as a piecewise linear map ϕ : I → [0, 1] defined by the barycentric coordinates on the tetrahedrons. A real 3D brain image can be constructed as a tetrahedral mesh M ⊆ I composed of an ordered vertex set V(M), an edge set E(M), a triangular face set F(M) and a tetrahedron set T(M). Each vertex v ∈ M ⊂ I can be equipped with a density value

ρ(v) = exp(ϕ(v)).  (9)

As above, the piecewise linear density function ρ(τ) on T(M) and the local mass m_ρ(v) at v ∈ V(M) are given in (2a) and (2b), respectively.

Two-stage OMT map. Let (M, ρ) be a 3D brain image with density function ρ and (N, δ ≡ 1) be a solid cube with constant density δ ≡ 1. The canonical 3D cube N is typically suitable for the tensor input of the U-Net algorithm. However, it is difficult to find an OMT map from (M, ρ) to (N, δ) directly. The main reason is that the cube N is not a spherically symmetric domain, so the stereographic projection cannot be used. Therefore, the iteration for minimizing the area energy (4) exhibits oscillating behavior and cannot converge.
In addition, it is not easy to directly find mass-preserving OMT maps f between any two 3-manifolds (M, µ) and (N, ν) because the mass stretch factor (µ(τ)|τ|)/(ν(τ)|f(τ)|), for τ ∈ T(M), has difficulty converging to one. Hence, we compose two OMT maps,

f*_{ρ,δ} := (f*_δ)⁻¹ ∘ f*_ρ : M → N, with f*_ρ : (M, ρ) → (B³, 1), f*_δ : (N, δ) → (B³, 1) and f*_δ(τ′) ∈ T(B³) for τ′ ∈ T′(N).

Here, T′(N) denotes the Delaunay tetrahedral mesh for the cube N. We now show that f*_{ρ,δ} keeps the local mass stretch factor (ρ(τ)|τ|)/|f*_{ρ,δ}(τ)| equal to one. In light of Fig. 1b, we have

ρ(τ)|τ| = |f*_ρ(τ)| = |(f*_δ)⁻¹(f*_ρ(τ))| = |f*_{ρ,δ}(τ)|

for τ ∈ T(M), which implies that f*_{ρ,δ} is a mass-preserving map with δ = 1. The main purpose of the OMT problem is to find a mass-preserving map between a 3-manifold with a spherical boundary and a unit solid ball while keeping the deformation between the two manifolds as small as possible. Although we want to seek an OMT map from a 3-manifold to a cube, because the numerical convergence technique does not support this directly, we must first find the OMT map from the 3-manifold to the unit ball. Fortunately, an OMT map from the 3-manifold to a cube can then be obtained by composing the OMT map from the 3-manifold to the unit ball with the inverse OMT map (f*_δ)⁻¹ from the unit ball to the cube, as shown in Fig. 1a.
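The Delaunay tetrahedral meshes mentioned above can be built cell by cell. The sketch below uses the standard Kuhn (Freudenthal) six-tetrahedron split of a single voxel cell, one admissible choice (not necessarily the split used in the paper), and verifies that the pieces tile the cell.

```python
def tet_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d) via a 3x3 determinant / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return det / 6.0

# Corners of the unit voxel cell [0,1]^3, index = 4x + 2y + z:
corners = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
# Kuhn split: one tetrahedron per monotone lattice path from corner 0 to 7.
tets = [(0, 4, 6, 7), (0, 4, 5, 7), (0, 2, 6, 7),
        (0, 2, 3, 7), (0, 1, 5, 7), (0, 1, 3, 7)]
total = sum(abs(tet_volume(*[corners[i] for i in t])) for t in tets)
print(round(total, 9))  # 1.0: the six tetrahedra fill the whole cell
```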

Advantages of the TSOMT map.
The TSOMT map f*_{ρ,δ} : M → N is the key ingredient for tumor segmentation in 3D brain images using U-Net 38 . It has the following three advantages. (i) The TSOMT technique uniformly converts each 3D brain image into a canonical 3D cube remeshed by n × n × n while keeping the deformation minimal and preserving the local mass. This cube is typically suitable for the tensor input of the U-Net algorithm (see the following subsection for the conversion loss). Furthermore, the mesh grid of the cube can easily be refined or coarsened depending on the accuracy requirement. (ii) The map f*_{ρ,δ} between the 3D cube N and the brain M, as in Fig. 1, is a one-to-one map. It can skillfully represent the global information of a brain image and provide a more complete density distribution to the supervised learning algorithm, which can train the prediction function to be more accurate. In contrast, other methods that randomly select several small cubes to cover the brain image 34 or use a sliding window combined with the intersection-over-union technique can express only local information, and the accuracy improvement is reduced by multiple overlapping regions. (iii) The tumor volume with lesions in the brain, in general, accounts for less than one-tenth of the total brain volume. In the supervised learning training data, there is thus an imbalance between the labeled tumor area and the unlabeled healthy area. This phenomenon has a great impact on the effectiveness of supervised learning. Fortunately, the local measure ρ(τ)|τ| = |f*_{ρ,δ}(τ)| is preserved by the TSOMT map for τ ∈ T(M). This means that where the density ρ(τ) in the tumor is relatively large, the local volume in the cube is also relatively large and contains more mesh points; as a result, the tumor and healthy regions in the cube are more balanced, and the diseased region can be more accurately determined.
Since the OMT problem is highly nonlinear, a strict mathematical proof is still lacking, but a numerical verification indicates that the computed OMT map is almost one-to-one, and tetrahedral flipping rarely occurs. It can be reasonably asserted that the TSOMT map between the brain image and the cube preserves the local mass and keeps the global information of the density distribution. The resulting cube can easily be uniformly remeshed by n × n × n as an input for the U-Net. Based on these advantages, we believe that the TSOMT map provides a more effective tensor input for the U-Net algorithm, which will inevitably obtain more accurate training results than the other existing methods.

Convergence verification and conversion loss of the TSOMT map
Verification criterion of convergence. To better understand the convergence behavior of the OMT algorithm, we introduce a global distortion measurement for the accuracy of a mass-preserving map f by the total mass-measure distortion defined as

D_M(f) = Σ_{τ ∈ T(M)} | ρ(τ)|τ| − |f(τ)| |.  (10)

The smaller D_M(f) is, the better the mass preservation of f. In addition, to express the local distortion of the map f computed by the OMT algorithm, we define the local mass-measure stretch ratio R_M of f at a vertex v as

R_M(f, v) = m_ρ(v) / m_δ(f(v)),  (11)

where m_δ(f(v)) is the local measure at f(v) in the image mesh. The closer R_M(f, v) is to one, the better the mass preservation of f at v. Furthermore, we verify the bijectivity of the computed OMT map by checking the number of folded tetrahedrons. A map f is bijective if the number of folded tetrahedrons in T(f(M)) is zero. In general, the convex combination maps on 3-manifolds cannot guarantee bijectivity 39 . The ideal way to illustrate the bijectivity is to numerically check the bijectivity rate of the map, defined as

Bij(f) = #{unfolded tetrahedrons in T(f(M))} / #T(f(M)) × 100%.  (12)

The method for checking folded tetrahedrons can be found in the literature 31 . Let I_i, i = 1, …, 4, be the brain images of the ith view corresponding to FLAIR, T1, T1CE and T2, respectively.
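Both distortion measures can be evaluated directly from per-tetrahedron quantities. A minimal standalone sketch with toy numbers (here the stretch ratio is evaluated per element, with ρ(τ)|τ| as the source mass and |f(τ)| as the image volume under δ = 1, rather than aggregated per vertex):

```python
def total_mass_distortion(rho_vols, image_vols):
    """D_M(f): summed absolute difference between the weighted source
    volume rho(tau)|tau| and the image volume |f(tau)| (delta = 1)."""
    return sum(abs(a - b) for a, b in zip(rho_vols, image_vols))

def stretch_ratio(rho_vol, image_vol):
    """Mass-measure stretch ratio; 1 means perfect local mass preservation."""
    return rho_vol / image_vol

rho_vols = [0.20, 0.30, 0.50]    # rho(tau)|tau| per tetrahedron
image_vols = [0.20, 0.33, 0.47]  # |f(tau)| per tetrahedron
D = total_mass_distortion(rho_vols, image_vols)
ratios = [stretch_ratio(a, b) for a, b in zip(rho_vols, image_vols)]
print(round(D, 6))                    # 0.06
print([round(r, 3) for r in ratios])  # [1.0, 0.909, 1.064]
```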
To clearly distinguish between tumor and nontumor regions in the brain, the standard grayscale of I_i can be enhanced by the histogram equalization algorithm 40 . Since the grayscale of I₁ for FLAIR has a clearer gap between tumors and nontumors, to save computational costs, we enhance only the standard grayscale of I₁ by 40 and obtain the contrast-enhanced grayscale φ₁ : I₁ → [0, 1]. Then, we construct the OMT map with φ₁ only and share this OMT map among the four views FLAIR, T1, T1CE and T2. As in (9), we define ρ₁(v) := exp(φ₁(v)) for each vertex v ∈ M₁ and construct the TSOMT map

f*₁ : (M, ρ₁) → (N, δ ≡ 1),  (13)

with N being a cube, as in Fig. 1, and let L be the cube remeshed by the uniform n × n × n grid points. We consider the map in Fig. 2 and define the grayscale values and labels on N₁ × N₂ × N₃ × N₄ × L, which typically forms effective training input data for the U-Net algorithm comprising an n × n × n × 4 tensor and an n × n × n label.
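The contrast-enhancement and density steps can be sketched in a few lines. The equalization below is a minimal empirical-CDF variant (reference 40's exact algorithm may differ); the density follows ρ₁(v) = exp(φ₁(v)) as above.

```python
import math

def equalize(gray):
    """Minimal histogram equalization of grayscale values in [0, 1]:
    map each value to its empirical CDF (an illustrative variant)."""
    order = sorted(gray)
    n = len(gray)
    cdf = {g: (order.index(g) + order.count(g)) / n for g in set(gray)}
    return [cdf[g] for g in gray]

gray = [0.10, 0.10, 0.12, 0.90]    # raw FLAIR grayscale at four vertices
phi = equalize(gray)               # contrast-enhanced grayscale phi_1
rho = [math.exp(p) for p in phi]   # density rho_1(v) = exp(phi_1(v))
print(phi)                         # [0.5, 0.5, 0.75, 1.0]
print([round(r, 3) for r in rho])  # [1.649, 1.649, 2.117, 2.718]
```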
Numerical convergence of the TSOMT map. We test 484 3D brain images from the MSD Challenge dataset and compute the OMT maps f*_ρ and f*_δ with n = 128 and ρ = ρ₁ by the numerical OMT algorithm. For a typical 3D brain image, say no. 021, in Fig. 3a, we show the corresponding transport costs T_c(f_ρ) of (3) for the OMT map f*_ρ in blue and the total mass-measure distortions D_M(f_ρ) of (10) in red vs. the number of partitions p in (7). We see that as p increases, the transport cost increases, while the total mass-measure distortion D_M(f_ρ) decreases. The reasons are as follows: (i) when p increases and becomes sufficiently large, the boundary map of the homotopy becomes very fine, which makes the ratio max_{τ ∈ T(M)} ρ(τ)|τ|/|f_ρ(τ)| closer to one and makes D_M(f_ρ) undoubtedly smaller; (ii) on the other hand, when p decreases, the total mass-measure distortion constraint is more relaxed, which makes the corresponding transport cost smaller. The average of the intersections of the blue and red lines is roughly at p = 8. In Fig. 3b, we plot the histogram of the local mass-measure stretch ratio R_M(f_ρ, v) in (11) for p = 4, 8 and 12. We observe that the deviations in R_M(f_ρ, v) show a downward trend as p goes from 4 to 12. In more detail, in Table 1, we list the transport cost T_c(f*_ρ), the total distortion D_M(f*_ρ), the mean, the SD, and the number of foldings from the 2.2 × 10⁵ vertices of M for p = 4, 8 and 12. We see that the transport cost and the total distortion for p = 8 are between those for p = 4 and 12. Therefore, in practice, we reasonably choose p = 8 for the OMT algorithm.
To show that the map f*_{ρ,δ} computed by the TSOMT algorithm with p = 8 always has satisfactory local distortion, in Fig. 4 we show the mean and mean ± SD of the local mass-measure stretch ratio R_M(f*_{ρ,δ}, v) for each brain image. Here, R_M(f*_{ρ,δ}, v) is defined in (11) with f = f*_{ρ,δ}. We see that the mean of R_M(f*_{ρ,δ}, v) is always extremely close to one, and the SD of R_M(f*_{ρ,δ}, v) is bounded by 0.112, which is satisfactory. Furthermore, we verify the bijectivity of the computed TSOMT map by checking the bijectivity rate in (12). It is worth noting that among all 484 maps computed by the TSOMT algorithm, the number of folded tetrahedrons f*_{ρ,δ}(τ) is at most 19, while the number of tetrahedrons of each mesh is more than 1 million. As a result, the bijectivity rate of every TSOMT map is larger than 99.99%, which is quite satisfactory. Thanks to the large-scale bounded distortion mapping technique 41 , the folded tetrahedrons can be modified and unfolded to obtain 100% bijective mass-preserving maps by slightly sacrificing the total mass distortion.
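Folded tetrahedra can be detected from the sign of the image volume, which also yields the bijectivity rate. A standalone sketch with toy tetrahedra (not the paper's meshes):

```python
def signed_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d); the sign encodes orientation."""
    u, v, w = (tuple(q[i] - a[i] for i in range(3)) for q in (b, c, d))
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0])) / 6.0

def bijectivity_rate(tets_src, tets_img):
    """Fraction of tetrahedra whose orientation survives the map;
    an image tetrahedron with flipped sign counts as folded."""
    folded = sum(1 for s, t in zip(tets_src, tets_img)
                 if signed_volume(*s) * signed_volume(*t) < 0)
    return 1 - folded / len(tets_src)

src = [((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))] * 2
img = [((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)),   # orientation kept
       ((0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1))]   # flipped -> folded
print(bijectivity_rate(src, img))  # 0.5
```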

Conversion loss between cubes and original brains.
As shown in the previous section, the TSOMT map transforms irregular 3D brain images into cubes while preserving the local mass and minimizing the deformation, which allows the U-Net algorithm to train an effective prediction function for brain tumor segmentation. Since the tensor cube is needed as the input for the U-Net, it is necessary to compare the conversion loss between the original brain images and cubes of various sizes. We first remesh the original brain with roughly 2 × 10⁵ vertices, which is an appropriate magnitude for the stability of the accuracy. Once the cuboid L is labeled by l(w) ∈ {0, 1, 2, 3} at each voxel center w in L, we construct the associated conversion label set on a brain image M ⊆ L by using the inverse function f*₁⁻¹ ≡ f*_{ρ₁,δ}⁻¹ of (13) as follows: • For a voxel v ∈ M ⊆ L, if there is a voxel center w ∈ L̃ such that f*₁⁻¹(w) ∈ v, then v is labeled by l(w). In Table 2, we display the averages of the conversion loss between brains and cubes, as in (15), for the WT, TC, and ET of all 484 brain images in the MSD Challenge dataset with typical grid sizes of 96³ and 128³ by the TSOMT map f*_{ρ₁,δ}. We observe that the deformation of TSOMT-ρ₁ with a cube size of 128 × 128 × 128 does not result in considerable accuracy loss, and the maximum conversion loss for the ET is less than 0.77%. On the other hand, although a cube size of 96 × 96 × 96 would conserve considerable computational resources, the maximum conversion loss for the ET is 3.65%, which is not conducive to constructing a good prediction function. Therefore, to train an effective prediction function by U-Net, we take the appropriate input tensor size of 128 × 128 × 128 for the 3D U-Net algorithm.
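The label pull-back from the cube to the brain can be sketched as a dictionary pass. All names below are hypothetical placeholders; the real inverse map f*₁⁻¹ is piecewise linear on the tetrahedral mesh, so a point-in-voxel location step replaces the lookup table.

```python
def pull_back_labels(cube_labels, inverse_map, brain_voxels):
    """Transfer cube labels back to brain voxels through the inverse map.

    cube_labels  : {cube voxel center w: label in {0, 1, 2, 3}}
    inverse_map  : {w: the brain voxel v that contains f^{-1}(w)}
    brain_voxels : all brain voxels; voxels never reached keep label 0
    """
    out = {v: 0 for v in brain_voxels}
    for w, label in cube_labels.items():
        out[inverse_map[w]] = label
    return out

# Hypothetical two-voxel example.
labels = pull_back_labels({"w1": 2, "w2": 0},
                          {"w1": "v1", "w2": "v2"},
                          ["v1", "v2", "v3"])
print(labels)  # {'v1': 2, 'v2': 0, 'v3': 0}
```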
Furthermore, in Table 3, we show the average percentages of the WT, TC, and ET in the original brain and in the cube under the TSOMT-ρ1 map with a grid size of 128 × 128 × 128. For instance, the WT accounts for 7.16% of the raw data of the original brain. However, under the enhanced histogram equalization 40 of the grayscale and the TSOMT-ρ1 map, the WT proportion nearly doubles in the cube, reaching 13.33%. Applying contrast enhancement to the grayscale image 40 indeed helps the 3D U-Net algorithm to better detect various tumors.
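For intuition about why equalization raises the tumor proportion: it boosts the density of the bright tumor regions, and the mass-preserving TSOMT map then allocates them a larger volume fraction of the cube. Below is a textbook (non-enhanced) histogram equalization sketch in Python; the authors use an enhanced variant (ref. 40), so this is illustrative only.

```python
# Textbook histogram equalization: remap intensities through the empirical
# CDF so the output histogram is approximately uniform on [0, 1].
# The paper's "enhanced" variant (ref. 40) differs; this is for intuition.

def equalize(values, levels=256):
    """values: integer intensities in [0, levels); returns floats in (0, 1]."""
    n = len(values)
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, run = [], 0
    for h in hist:
        run += h
        cdf.append(run / n)          # cumulative distribution per level
    return [cdf[v] for v in values]

# A dark, low-contrast image gets stretched across the full output range.
img = [10, 10, 10, 12, 12, 200]
print(equalize(img))
```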
Training setup
U-Net structure. In this paper, we adopt the U-Net algorithm because medical images have blurred boundaries and complex gradients, so high-resolution information is required for precise segmentation. The U-Net algorithm 10,42 combines low-resolution information (providing a basis for object category recognition) with high-resolution information (providing a basis for precise segmentation and positioning), which makes it well suited for medical image segmentation. The encoding stages use (32, 64, 128) filters. The main idea is to supplement each decoding stage, through a bridge structure, with its corresponding encoding stage, which increases the resolution of the output based on this extra information. It is widely accepted that this bridge structure (encode-bridge-decode) is far better for segmentation applications than a simple encode-decode without a bridge. In each encoding stage, there is a 3D CNN followed by batch normalization (BN) and a rectified linear unit (ReLU) activation function, which often does not require dropout for regularization. The same structure is also applied to the decoding and bridge stages. The last convolutional layer is followed by a fully connected layer, and the final layer of the CNN employs a softmax classifier with 2 output nodes for the 2 desired classes. Table 4 shows the total learnable parameters of each layer in the entire network.
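The parameter counts in Table 4 can be estimated with a back-of-the-envelope sketch. The encoder filter counts (32, 64, 128) follow the text; the 3 × 3 × 3 kernel size and the exact channel wiring are assumptions for illustration, not the paper's verified layout.

```python
# Back-of-the-envelope parameter count for one 3D convolution layer:
# k^3 weights per (input channel, output channel) pair, plus one bias per
# output channel. Kernel size k = 3 is an assumption, not stated in the text.

def conv3d_params(c_in, c_out, k=3):
    """Learnable parameters of a 3D conv layer with kernel k x k x k."""
    return k**3 * c_in * c_out + c_out

# Hypothetical encoder wiring: 4 input modalities, then (32, 64, 128) filters.
encoder = [(4, 32), (32, 64), (64, 128)]
for c_in, c_out in encoder:
    print(f"{c_in:>3} -> {c_out:>3}: {conv3d_params(c_in, c_out)} params")
```

Sums of such per-layer counts (over encoder, bridge, decoder, and the final classifier) give the network totals reported in Table 4.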
A deep learning neural network learns to map a set of inputs to a set of outputs from training data using an optimization process that requires a loss function to calculate the model error. From Table 3, the tumor percentage is small even after the OMT and enhanced histogram equalization. A well-known deep learning loss function for highly unbalanced segmentations is therefore adopted to ensure convergence 43,44:

L(Y, T) = 1 − 2 (Σ_{k=1}^{K} w_k Σ_{m=1}^{M} Y_{km} T_{km}) / (Σ_{k=1}^{K} w_k Σ_{m=1}^{M} (Y_{km} + T_{km})),

where Y is the predicted probability, T is the ground truth, K is the number of classes, M is the number of elements along the first two dimensions of Y, and w_k = (Σ_{m=1}^{M} T_{km})⁻². The hyperparameters in the U-Nets are chosen as follows: encoder depth: 3; initial learning rate: …

Data augmentation. Detecting malignant tumors in the brain and precisely segmenting them are difficult tasks in neurology because it is not easy to find enough well-labeled training datasets. In practice, these training data must be exactly diagnosed by a doctor, properly labeled, and deidentified. Therefore, well-labeled training sets for U-Net-based detection of brain lesions are relatively small. Nevertheless, in ML, augmenting the amount of data is an effective way to improve the efficiency of U-Net training. U-Net usually requires multiview images of the same type but from different angles for training. For example, the image of a tumor inside the brain is still a tumor after arbitrary rotation or mirroring. According to our experience, the Dice score will increase via simple data augmentations such as rotating, mirroring, shearing, and cropping. As in (13), if we additionally construct f*₂ : (M, ρ₂) → (N, δ ≡ 1) with ρ₂ ≡ exp((1/4) Σ_{i=1}^{4} ϕᵢ), where ϕ₁, …, ϕ₄ are the piecewise linear maps corresponding to FLAIR, T1, T1CE, and T2, then f*₂ forms a mass-preserving TSOMT map from M to N. We then perform the data augmentation {N_{1,j} × N_{2,j} × N_{3,j} × N_{4,j} × L_j}_{j=1}^{2}, as shown in Fig. 2.
This data augmentation can be regarded as a perturbation of the size or shape of the brain tumor, and it effectively forms an expanded dataset for our training process. In practice, we first create a sharp contrast-enhanced grayscale image φ₁ : I₁ → [0, 1] by using the histogram equalization algorithm 40, as before, and then compute the TSOMT map f*₁ as in (13); we share this OMT among the four views, namely, FLAIR, T1, T1CE, and T2, to construct four cubes {N_{i,1}}_{i=1}^{4} and one label set L₁, as in (14). Similarly, by using the TSOMT map f*₂, we construct {N_{i,2}}_{i=1}^{4} and L₂. After all the preprocessing is complete, the 400 brain samples are expanded into 800 cube samples. Next, we use the U-Net algorithm to train three nets: Net 1 for the WT {1, 2, 3}, Net 2 for the TC {2, 3}, and Net 3 for the ET {3}. After Net 1, Net 2, and Net 3 are trained, we use the following definition to predict the tumor labels of the brains in the training and testing sets.
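The class-weighted (generalized) Dice loss adopted above, with w_k = (Σ_m T_km)⁻², can be sketched as follows. The authors train with MATLAB; this Python version follows the standard generalized Dice formulation and may differ in minor details (e.g., epsilon handling) from their toolchain.

```python
# Sketch of the generalized Dice loss with class weights w_k = (sum_m T_km)^-2,
# which down-weights large classes so small tumors still drive the gradient.
# The epsilon smoothing is an implementation assumption.

def generalized_dice_loss(Y, T, eps=1e-8):
    """Y, T: per-class lists of per-voxel values (probabilities / one-hot)."""
    num = den = 0.0
    for y_k, t_k in zip(Y, T):
        w_k = 1.0 / (sum(t_k) ** 2 + eps)           # (sum_m T_km)^-2
        num += w_k * sum(y * t for y, t in zip(y_k, t_k))
        den += w_k * sum(y + t for y, t in zip(y_k, t_k))
    return 1.0 - 2.0 * num / (den + eps)

# Perfect prediction -> loss near 0; fully disjoint prediction -> loss near 1.
T = [[1, 1, 0, 0], [0, 0, 1, 1]]                    # one-hot ground truth
print(round(generalized_dice_loss(T, T), 6))
print(round(generalized_dice_loss([[0, 0, 1, 1], [1, 1, 0, 0]], T), 6))
```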
Our calculations are implemented in MATLAB R2020a. The training is carried out in a Tesla V100S PCIe 32 GB×4 workstation and is stopped at 400 epochs 45 . Each epoch takes approximately 12 minutes. The computational time of the OMT is 20 seconds per iteration on a personal computer equipped with an NVIDIA GeForce RTX 2080 GPU. Thus, the inference including TSOMT preprocessing, U-Net prediction, and inverse TSOMT processing takes 200 seconds to complete on the same PC.

Flowchart of the segmentation maps.
To further visualize the flowchart with 2D cross-section images, Fig. 5 illustrates the whole inference process. The four brain images (FLAIR, T1, T1CE, and T2) are first transformed into cubes by the TSOMT separately. The original brain of size 240 × 240 × 155 × 4 is thus reduced to 128 × 128 × 128 × 4 cube images with 4 modalities. The cube images are processed by Net 1, Net 2, and Net 3 to obtain three segmentation images of size 128 × 128 × 128. The inverse TSOMT procedure then converts the segmentation images to 3 × (240 × 240 × 155). These three images can then be merged with simple logic to obtain the final predicted brain result (240 × 240 × 155). By matching the ground truth with the prediction voxel by voxel, the error is calculated as the ratio of incorrectly predicted voxels to the entire tumor, from which the accuracy follows.
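The "simple logic" for merging the three binary predictions is not spelled out in the text; given the nested classes WT {1, 2, 3}, TC {2, 3}, and ET {3}, one natural rule, sketched below as an assumption, is to let the most specific positive prediction win.

```python
# Hypothetical merging rule (the paper's exact logic is not given): since
# ET ⊂ TC ⊂ WT by label definition, the most specific positive net decides.

def merge_labels(wt, tc, et):
    """wt, tc, et: binary predictions for one voxel -> final label 0..3."""
    if et:
        return 3          # enhanced tumor
    if tc:
        return 2          # tumor core (non-enhancing)
    if wt:
        return 1          # whole tumor (edema)
    return 0              # healthy tissue

print([merge_labels(*v) for v in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]])
# [0, 1, 2, 3]
```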
Evaluation criteria. In this paper, we use the Dice score, sensitivity (recall), specificity, precision, and the 95th percentile of the Hausdorff distance (HD95) as indicators to evaluate the segmentation performance of our proposed algorithm. These scores follow from the voxelwise confusion matrix of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). For a fixed brain tumor, as in (15), let A and B denote the positive ground truths and predictions, respectively, for the WT, TC, and ET. Then, the Hausdorff distance (HD) is defined as

HD(A, B) = max{ sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) }.

HD95 is similar to the HD but calculates the 95th percentile of the distances between the boundary points in A and B. In the following, we apply the OMT technique to the MSD Challenge dataset to obtain a good training set and make highly accurate predictions.
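The confusion-matrix table did not survive extraction; the standard voxelwise definitions behind the first four scores are sketched here in Python for reference.

```python
# Standard voxelwise metrics from confusion counts; these are the textbook
# definitions implied by the evaluation criteria above.

def scores(tp, fp, fn, tn):
    return {
        "dice":        2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),          # a.k.a. recall
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }

# Toy example: 100 true tumor voxels, 10 missed, 10 false alarms.
s = scores(tp=90, fp=10, fn=10, tn=890)
print(s)   # dice 0.9, sensitivity 0.9, specificity ~0.989, precision 0.9
```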

Results and discussion
Dice score and loss plot. A deep learning neural network learns to map a set of inputs to a set of outputs from training and validation data using an optimization process that requires loss functions for the training and validation sets to calculate the model error. For convenience, we implement the TSOMT procedure with a size of 96 × 96 × 96 and the U-Net algorithm with a minibatch size of 8 on the MSD Challenge dataset, with 400, 29, and 55 samples for training, validation, and testing, respectively. Figure 6 plots the loss values of the training set (red solid line) and the validation set (red dotted line), as well as the Dice scores of the training set (blue solid line) and the testing set (blue dotted line), for the WT, TC, and ET every 10 epochs. The two red loss curves in each subfigure of Fig. 6 indicate overfitting (high variance) when the number of epochs nears 400 and underfitting (high bias) when the number of epochs is near 10; the optimal number of epochs for the WT, TC, and ET lies between 45 and 55. The two blue Dice-score curves in each subfigure of Fig. 6 intersect at epoch 50 for the WT and TC and at epoch 100 for the ET, with Dice scores of 0.9072, 0.8608, and 0.8305, respectively, which coincides with the optimal number of epochs for the bias-variance tradeoff.
The blue solid training curves in Fig. 6 usefully reflect the training of the model. The Dice scores for the WT, TC, and ET increase only marginally near 400 epochs, approaching 0.9716, 0.9318, and 0.8879, respectively. The trends of both the Dice scores and the loss values show a typical training history. These results are also consistent with the findings reported in Table 2, where the conversion loss increases in the order WT < TC < ET, and in Table 3, where the tumor percentages decrease in the order WT > TC > ET. A small tumor percentage implies more unbalanced data and thus a smaller Dice score.
However, the Dice scores for the WT, TC, and ET on the testing set plateau throughout the optimization process, since the training set does not fully reflect the characteristics of the testing set. This drawback can be mitigated by including more brain tumor cases in the training set; because the TSOMT method reduces the size of each case, our approach can accommodate more training cases if needed.
Here, the 484 brain images of the MSD Challenge dataset are divided into a training set and a testing set with 400 and 84 samples, respectively. Table 5 lists the numerical results of Net 1, Net 2, and Net 3 at epoch 400 for the WT, TC, and ET. The conversion losses between the 128 × 128 × 128 cubes and the brains for the WT, TC, and ET are quite reasonable (within 0.4% ∼ 1.3%). With U-Net training included, the conversion losses are comparable to those in Table 2, where U-Net training is not included. Additionally, the Dice scores for the brains and cubes show a decreasing trend of WT > TC > ET because of the small tumor percentages.
Other measures of the voxelwise overlap in the segmented regions, such as the sensitivity and specificity, as well as HD95, which evaluates the distance between segmentation boundaries, are all reported in Table 6. For all these metrics, the training results are expected to be better than the testing results. However, HD95 for the ET shows the opposite (10.5229 for the training set vs. 6.011 for the testing set). There were 12 cases without ET labels in the 400-sample training set and none in the 84-sample testing set, and those 12 cases dramatically increase the HD. The HD values computed without those 12 cases are enclosed in parentheses.
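For completeness, HD95 can be computed brute-force as below; real evaluation toolkits use spatial indexing, but the O(|A||B|) version in Python makes the definition concrete. The percentile convention (nearest-rank) is an implementation assumption.

```python
# Brute-force 95th-percentile Hausdorff distance between two boundary point
# sets A and B. Nearest-rank percentile is an assumption; evaluation suites
# may interpolate instead.
import math

def hd95(A, B):
    def directed(src, dst):
        # Sorted distances from each point in src to its nearest point in dst.
        ds = sorted(min(math.dist(a, b) for b in dst) for a in src)
        idx = min(len(ds) - 1, math.ceil(0.95 * len(ds)) - 1)
        return ds[idx]
    return max(directed(A, B), directed(B, A))

# Two parallel boundaries one unit apart: every nearest distance is 1.
A = [(0, 0), (1, 0), (2, 0)]
B = [(0, 1), (1, 1), (2, 1)]
print(hd95(A, B))   # 1.0
```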
Postprocessing. It is known that simple postprocessing can considerably enhance segmentation performance by providing additional information 4. Here, our main focus is to use postprocessing for testing. To provide additional information, mirroring and rotation techniques are added during the testing process. Since the TSOMT map skillfully represents the global information of a brain image and provides a complete density distribution to the U-Net algorithm, we apply the mirroring and rotation techniques during testing to increase the Dice scores of the WT, TC, and ET. Let C₁ × C₂ × C₃ × C₄ denote the cube corresponding to the TSOMT map of the testing data. We add four schemes {Rₖ}_{k=1}^{4} to the testing cube C₁ × C₂ × C₃ × C₄, where R₁ represents a 90-degree counterclockwise rotation, R₂ indicates mirroring from left to right, R₃ denotes mirroring upside down, and R₄ represents mirroring from left to right followed by a 90-degree counterclockwise rotation, provided by the MATLAB functions rot90, fliplr, and flipud. For each scheme Rₖ, we perform the following two steps: (i) rotate or mirror C₁ × C₂ × C₃ × C₄ by Rₖ to obtain a new cube, and (ii) predict the probability pₖ of each voxel of the new cube being a tumor. Let p₀ be the predicted probability of a voxel of C₁ × C₂ × C₃ × C₄ being a tumor.
Then, we use the following rule to determine the final judgment of each voxel: if Σ_{k=0}^{4} pₖ > Σ_{k=0}^{4} (1 − pₖ), i.e., if the average probability exceeds 0.5, then the voxel is a tumor; otherwise, it is healthy. Table 7 shows the brain Dice scores of the WT, TC, and ET for the schemes {Rₖ}_{k=1}^{4} at epoch 400 and the percentage improvement. The numerical results demonstrate that the proposed schemes can improve the associated Dice scores by one to two percent. The whole testing procedure is thus as follows: TSOMT, U-Net inference, inverse TSOMT, and mirroring and rotation.

Effect of augmented rotations. As indicated in the third item of the main contributions, the TSOMT method can substantially increase the amount of augmented data to achieve more accurate and effective results. We apply the U-Net algorithm with 250 epochs to the MSD Challenge dataset (484 samples) preprocessed by the TSOMT method. To increase the augmented data, we construct densities ρᵢ, i = 2, 3, 4, and randomly rotate the 400 unit solid balls with density ρᵢ in the first-stage OMT for i = 2, 3, 4. Figure 7 plots the Dice scores of the training set vs. the testing set from 50 to 250 epochs with blue, red, green, and black lines for the training data without augmented rotations and with one, two, and three augmented rotations, respectively. The training data with more augmented rotations have a noticeable effect on improving the Dice score of the testing data. This result confirms the advantage of introducing the TSOMT procedure to represent an effective 3D brain image by a 128 × 128 × 128 cube.
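The mirroring-and-rotation voting rule from the postprocessing step (average the five per-voxel tumor probabilities and threshold at 0.5) can be sketched as below; the rotation and mirroring transforms themselves (MATLAB rot90, fliplr, flipud) are abstracted away, and the function name is illustrative.

```python
# Test-time-augmentation vote for one voxel: p0 comes from the untransformed
# cube, p1..p4 from the four rotated/mirrored cubes (mapped back to the same
# voxel). The voxel is declared tumor when the mean probability exceeds 0.5.

def tta_vote(p0, augmented_probs):
    """p0: float; augmented_probs: iterable of floats p1..p4. Returns bool."""
    probs = [p0] + list(augmented_probs)
    p_bar = sum(probs) / len(probs)
    return p_bar > 1.0 - p_bar          # equivalent to p_bar > 0.5

print(tta_vote(0.6, [0.4, 0.7, 0.55, 0.5]))   # True: mean 0.55 > 0.5
print(tta_vote(0.3, [0.4, 0.2, 0.45, 0.5]))   # False: mean 0.37 < 0.5
```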

3D prediction.
To evaluate the performance of the proposed segmentation algorithm, we compare the segmentation output with the ground-truth labels, which are prepared by field experts. The qualitative 3D segmentation results vs. the ground-truth labels are plotted in Fig. 8 for case 410, where the brain Dice scores of the WT, TC, and ET are 0.9738, 0.9188, and 0.8585, respectively. Beyond this single case, for the training set (nos. 001-400), the best, average, and worst Dice scores for the WT are 0.9948, 0.9852, and 0.9619, respectively. This finding implies that the segmentation of the lesion areas achieved by the proposed model is very close to the ground truth. Figure 9 plots cross-sections of the ground truth and predictions for the cases with the most incorrectly predicted voxels (452, 416, and 462). The incorrect predictions are further grouped into false negatives and false positives in the figure. The false negative:false positive voxel ratios for the WT, TC, and ET over all testing cases are 1.128, 1.529, and 1.904, respectively. Our approach thus tends toward false negatives rather than false positives, and the increasing trend of WT < TC < ET conceptually agrees with previous observations. Furthermore, the TSOMT and inverse TSOMT methods do not alter this deficiency qualitatively, because the conversion losses from cubes to brains for these three cases are all smaller than 1.2%; the conversion loss is too small to induce any new failure mode. In addition, the brain Dice scores of cases (452, 416, 462) for the WT, TC, and ET are (0.7312, 0.9549, 0.9239), (0.9260, 0.6371, 0.7435), and (0.8348, 0.7280, 0.4130), respectively. False predictions of the WT, TC, and ET are seemingly independent of each other, which justifies the use of three U-Nets. A typical incorrect WT prediction is shown for case no. 452.
Based on this typical type of brain tumor, we observe that, for the best testing case, the incorrectly distinguished false negative and false positive regions almost always occur at the junction boundaries between different levels of tumor and nontumor. This deficiency can hopefully be remedied by boundary image processing 46, boundary segmentation 47, and edge detection 48,49 methods. On the other hand, for the worst testing case (no. 452), with a more severely spreading tumor, there is a relatively large region on the outermost periphery of the tumor that is very vague and difficult to classify as either positive or negative. The tumor type of this case significantly reduces the prediction accuracy. Inaccurately predicted tumor locations also occur at the interfaces between different tumor or nontumor labels but seem to expand outward more. How to improve image segmentation techniques for such cases remains a challenging problem.
TC and ET false predictions are more of an embedded type than a boundary type within the WT. Case no. 416 (TC) and case no. 462 (ET) are plotted. In these two cross-sections, the false positives are negligible. The TC and ET predictions are inherently less accurate because of their smaller regions (see Table 7 for the testing results). Additionally, the prediction contains many scattered necrotic regions in the TC and ET; as a result, the false negatives exhibit a scattered, embedded pattern.

Testing on the BRATS 2020 dataset. In the above sections, we trained a U-Net model on 400 3D brain images from the MSD Challenge dataset with augmented rotations by using the TSOMT method, the U-Net algorithm, and the postprocessing correction technique, and we successfully achieved satisfactory Dice scores for the 84 MSD testing samples, as shown in Table 7.
We now use the well-trained U-Net model to infer brain tumors for the testing brain images from the BRATS 2020 dataset 1-3. We first check the 369 brain images of the training data in BRATS 2020 and find that 210 samples are duplicates of samples in the MSD dataset. We then take the remaining 159 samples as the testing data and infer brain tumors. The Dice scores of the WT, TC, and ET are 0.9002, 0.7387, and 0.7723, respectively, which appear satisfactory. All 159 predicted NII files are available online at https://reurl.cc/2rrrQE. Furthermore, an online evaluation platform for BRATS 2020 was recently opened and provides 125 unlabeled brain images for testing. The feedback Dice scores of the WT, TC, and ET predicted by our U-Net model are 0.8829, 0.7289, and 0.7297, respectively (see the online record of the team "GIMILab" at https://www.cbica.upenn.edu/BraTS20/lboardValidation.html). The resulting scores are not especially high but are still acceptable. The main reason could be that the current testing data have various formats or flaws that have not been standardized. We hypothesize that this testing dataset needs to be classified first, which constitutes our research topic in the next phase. We believe that the TSOMT technique developed in this paper has numerous advantages and good efficiency.

Conclusions
This work mainly introduces the TSOMT technique to the research area of 3D medical image segmentation. We first propose an efficient and reliable numerical algorithm for the computation of the OMT map. Then, we use it to develop the TSOMT for transforming an irregular 3D brain image into a cube while maintaining minimal deformation. The TSOMT map is one-to-one and mass-preserving and minimizes the transport costs.
The concept of representing an irregular brain image by a unit cube with minimal deformation is, to our knowledge, proposed here for the first time in this research field; it is particularly beneficial to the U-Net algorithm's input format for creating training and testing sets. Expressing brain images as cubes greatly reduces the training input size from 240 × 240 × 155 to 128 × 128 × 128 and reduces the computational time required for training. In addition, this preprocessing by the TSOMT map does not cause a considerable loss of accuracy; indeed, the conversion loss with the 128³ grid points on the cube that we set is usually only between 0.51% and 0.77%. The transport cost, distortion, and conversion loss are all carefully examined. The small SD of the local mass ratios of the TSOMT over all 484 cases shows the robustness of the transform.
Another advantage of the OMT technique is that it can first perform histogram equalization on the density of the original brain tumors. Generally, the tumor site, despite its higher density, accounts for a small volume of the brain image. Nevertheless, we can utilize the mass-preserving property of the TSOMT to increase the proportion of brain tumors in the cube. This technique is very conducive to effective U-Net learning for tumor segmentation.
The training accuracy at 400 epochs reaches 0.9822 for the WT on the cube using a 3D U-Net. The inverse TSOMT method returns the dataset to the original size of 240 × 240 × 155 and, owing to the conversion loss, lowers the accuracy to 0.9781 for the WT on the brain. This TSOMT, U-Net inference, and inverse TSOMT approach significantly improves brain tumor detection and segmentation accuracy. Training data with more augmented rotations have a noticeable effect on improving the Dice score of the testing data. For the testing cases, a simple mirroring and rotation postprocessing technique, yielding an accuracy improvement of one to two percent, is added to the whole flow. For each new case, the overall flow can be completed in fewer than 200 seconds.
The high accuracy in our numerical experiments once again reflects the superiority of the TSOMT. The associated cube retains almost all the global information of the brain image and is well suited as U-Net input. Intuitively, the TSOMT combined with U-Net can master the essential learning rules and has a good chance of realizing a high-precision system for brain tumor segmentation. In this sense, the TSOMT map compares favorably with other preprocessing methods. Expanding the dataset with different tumor types, in combination with the virtues of the TSOMT conversion, thus has real potential in the field of 3D medical image segmentation.