A novel 2-phase residual U-net algorithm combined with optimal mass transportation for 3D brain tumor detection and segmentation

Utilizing the optimal mass transportation (OMT) technique to convert an irregular 3D brain image into a cube, a required input format for a U-net algorithm, is a brand new idea for medical imaging research. We develop a cubic volume-measure-preserving OMT (V-OMT) model for the implementation of this conversion. The contrast-enhanced histogram equalization grayscale of fluid-attenuated inversion recovery (FLAIR) in a brain image creates the corresponding density function. We then propose an effective two-phase residual U-net algorithm combined with the V-OMT algorithm for training and validation. First, we use the residual U-net and V-OMT algorithms to precisely predict the whole tumor (WT) region. Second, we expand this predicted WT region with dilation and create a smooth function by convolving the step-like function associated with the WT region in the brain image with a 5 × 5 × 5 blur tensor. Then, a new V-OMT algorithm with mesh refinement is constructed to allow the residual U-net algorithm to effectively train Net1–Net3 models. Finally, we propose ensemble voting postprocessing to validate the final labels of brain images. We randomly chose 1000 and 251 brain samples from the Brain Tumor Segmentation (BraTS) 2021 training dataset, which contains 1251 samples, for training and validation, respectively. The Dice scores of the WT, tumor core (TC) and enhanced tumor (ET) regions for validation computed by Net1–Net3 were 0.93705, 0.90617 and 0.87470, respectively. A significant improvement in brain tumor detection and segmentation with higher accuracy is achieved.

attention and interest from researchers in this field. The brain samples were scanned with four modalities, namely, fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1), T1-weighted contrast-enhanced (T1CE), and T2-weighted (T2), by multiparametric magnetic resonance imaging (mpMRI). The challenge is the evaluation of state-of-the-art methods for the task of brain tumor segmentation of whole tumor (WT, labeled {2,1,4}), tumor core (TC, labeled {1,4}), and enhanced tumor (ET, labeled {4}) regions in the human brain. To address this issue, in the early years, random forest algorithms and machine learning techniques were used to perform image classification 6-8 and segmentation 6,9-11. In 2021, Biratu et al. provided a comprehensive survey 12 of three model techniques, including region growing 13, shallow machine learning 14, and deep learning 15, for brain tumor segmentation and classification. Later, BraTS 2021 3,4,16 was expanded to include a large number of new brain samples in the database, providing 1251 labeled samples for training and 219 unlabeled samples for validation. Subsequently, CNN structures with two layers 17 and eight layers 18 were proposed and made good progress in brain tumor segmentation. Then, a more sophisticated multiple CNN architecture, called the U-net model, was first developed in 19 and improved in 20 by assembling two full CNNs and U-net. The merits of applying the U-net model to the challenge of MSD 2018 1,2 were first proposed by Isensee et al. 21. In 2018, a variant U-net, called the residual U-net 22 (ResUnet), was proposed to enhance the segmentation accuracy. By combining U-net and residual units 23,24, ResUnet simplifies the training of deep networks, promotes the dissemination of information via a large number of skip connections, and implements a network designed with leaner parameters and superior performance.
Therefore, we adopt ResUnet as the U-net architecture used for the model training and prediction in this paper.
For the study of brain tumor segmentation, preprocessing that effectively represents large quantities of input data for CNNs is crucial. For example, an irregular 3D physical brain image obtained from an MRI is generally composed of 1.5 million voxels. A natural way to fit the tensor input format of a U-net architecture is to overlay the irregular brain image with several randomly selected cubes that cover it seamlessly (e.g., 16 cube filters were used by Isensee et al. 25). Nevertheless, the random sampling technique may lose the global information of the brain image, and it increases the quantity of input data. On the other hand, an efficient two-stage optimal mass transportation (2SOMT) algorithm, newly proposed by Lin et al. 26, was designed to first transform an irregular 3D brain image into a unit ball and then into a cube with minimal distortion and transport costs. This strategy can greatly reduce the volume of input data and retain the global information of the 3D brain image, so the existing computing resources can be effectively used to attain the expected result. However, 2SOMT did not fully exploit the density distribution of the brain image, so a U-net algorithm could not predict the tumor region more accurately. In addition, 2SOMT may incur additional conversion loss because it composes two transformations, from a brain to a ball and then from the ball to a cube. Thus, we are motivated to transform an irregular brain image directly into a cube with a more precise density function for detecting possible tumor regions, so that a U-net algorithm is better positioned to learn to label the segmentation.
Optimal mass transportation (OMT) is a very old optimization problem that was raised by Monge in 1781 (see 27 for details) to find an optimal solution that minimizes the transport cost and preserves the local mass ratios between two spaces. The existence and uniqueness of a solution to the OMT problem was proven by Kantorovich 27 by relaxing the probability measure with a joint probability distribution. The regularity condition for the solution of the OMT problem was first shown by Caffarelli 11 , and an elegant theoretical survey paper "Optimal Transport: Old and New", which summarized the achievements of predecessors, was published by Villani 28 . For numerical methods, Brenier 10 proposed an alternative scheme for solving the OMT problem with a quadratic cost function for a special class of convex domains. Based on Brenier's approach and the variational principle 29 , Su et al. 30 developed a volume-preserving parameterization from a 3-manifold M with a spherical boundary to a unit ball B 3 . Recently, Yueh et al. 31 proposed a novel algorithm to compute a volume-preserving parameterization from M to B 3 by modifying the denominators of the coefficients of the corresponding Laplacian matrix by imposing the local volume stretch factor at each iteration step and adopted the projected gradient method (PGM) combined with the homotopy technique in 32 to find the OMT map between M and B 3 . In addition, the 2SOMT procedure from M to B 3 and from B 3 to a cube was developed by Lin et al. 26 and applied prior to ResUnet training and inference in 3D brain tumor segmentation.
In this paper, we study the applicability of mapping an irregular 3D image (i.e., a human brain) to a canonical domain (i.e., a cube or a cuboid), which minimizes the transport cost and preserves the local mass ratios. First, based on the homotopy technique, a direct one-stage OMT approach from a 3-manifold M with a genus-zero boundary to a cube is developed for 3D ResUnet training and inference to improve the higher conversion loss of 2SOMT 26 from M to B 3 and B 3 to a cube. Thus, we can construct a one-to-one correspondence between the input data of irregular images and the associated cubic tensors. With slight conversion loss between OMT maps, the usage of the capacity of the training data of the 3D ResUnet model is greatly reduced, and it is our belief that 3D ResUnet training can easily find a local minimum and achieve better performance.
Next, we propose a two-phase ResUnet with OMT (2P-ResUnet-OMT) algorithm utilizing the density distribution of brain tumor features and train four related networks to detect tumor regions and segment tumor labels. Given an irregular 3D brain, in Phase I, we first construct the associated density map at each vertex according to the normalized contrast-enhanced histogram equalization (CEHE) grayscale values of the FLAIR modality of a brain image by MRI. Then, we compute OMT maps from brain images to cubes for the training set and train Net0 by the ResUnet algorithm for the detection of possible tumor regions. In fact, no information about the tumor is available at the beginning, so the CEHE grayscales of FLAIR, which typically reflect the distribution of WT, are an effective way to detect tumor regions. Next, we expand these possible tumor regions by 5 voxels with morphological dilation. In Phase II, because ET ⊂ TC ⊂ WT, we construct a smooth density function by convolving the step-like function, which equals exp(FLAIR) on the expanded WT region and 1.0 elsewhere, with a 5 × 5 × 5 blur box tensor. We remesh the tetrahedrons with finer meshes in the higher density region in the brain so that the target tumor region can be enlarged in the cube by OMT and better viewed and learned by ResUnet. We then train Net1, Net2 and Net3 for WT, TC and ET, respectively, by the ResUnet algorithm.

1. Our proposed 2P-ResUnet-OMT algorithm transforms an irregular 3D brain image into a cube to satisfy the input format of ResUnet while preserving the local mass ratios between two domains and minimizing the transport cost and the distortion. These advantages for 2SOMT were highlighted in 26. However, 2SOMT did not make full use of estimating the distribution of the density function, so ResUnet could not infer the target object accurately. 2P-ResUnet-OMT fully grasps the distribution of the associated density function to create an effective OMT map from an irregular 3D domain to a cube and provides it to ResUnet for training a high-performance prediction network. 2P-ResUnet-OMT inherits the advantages of 2SOMT in that it needs to use only a cube to represent an irregular 3D brain image without losing the most important global features and conversion accuracy. In this way, the computational cost and memory footprint of ResUnet training are greatly reduced, and the saved capacity can be used for data augmentation within the available memory.

2. One of the characteristics of the OMT map is to preserve the local mass ratios. With this feature, in Phase II of 2P-ResUnet-OMT, we apply mesh refinement on the expanded WT region detected during Phase I. The mesh refinement technique can increase the number of tetrahedrons in a specific region in the brain and enlarge the portion of volume appearing in the target domain; that is, the ResUnet algorithm views the region as if through a magnifying glass and can learn to mark the segmentation labels well. The numerical experiment with the trained Net1-Net3 models combined with ensemble voting shows that the Dice scores of validation for WT, TC and ET can reach 0.93705, 0.90617 and 0.87470, respectively; hence, this approach significantly boosts the accuracy of brain tumor detection and segmentation.

3. The OMT approach must convert the labels predicted by ResUnet back to a brain image. Therefore, to evaluate the Dice score more precisely, we propose a new conversion technique with ensemble voting postprocessing that converts the predicted labels on the cube back to each voxel of the brain, using the multiple values on the cube validated by various models to evaluate the labels corresponding to voxels in the brain image. The high validation Dice scores on the BraTS 2021 validation data suggest that using a cube to represent an irregular 3D brain image by OMT is an innovative and streamlined approach for CNN training and prediction.
This paper is organized as follows. In "Discrete OMT problems and cubic OMT maps", we introduce the discrete OMT problem and the spherical-cubic area-measure-preserving and cubic volume-measure-preserving OMT maps. In "Two-phase ResUnet with OMT for training and validation", we propose a two-phase ResUnet model with OMT maps for training and validation. For the evaluation of high Dice scores, we develop an effective conversion technique to convert the predicted labels on the cube back to the brain image using all related probability information corresponding to each voxel in the brain image. In "Results and discussions", we show the improvement in the Dice score obtained by the ResUnet models in Phase II with mesh refinements on the expanded WT region provided by Phase I and the ensemble voting postprocessing for the label evaluation. Finally, concluding remarks are given in "Conclusions".

Discrete OMT problems and cubic OMT maps
Let M be a simplicial 3-complex that describes an irregular 3D brain image with a genus-zero boundary. M is generally composed of sets of vertices V(M) , edges E(M) , faces F(M) and tetrahedrons T(M) . A discrete OMT problem consists of finding a bijective function that maps M to a canonical simple domain with minimal distortion. The canonical shape could be a ball B 3 or a unit cube C 3 . A tensor form is necessary for the input of the U-net algorithm; therefore, a cube or a cuboid is the target domain for M . In this section, we propose a one-stage OMT approach to map M to C 3 .
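Discrete OMT problem. Let $\rho$ be a density map on $V(M)$. The piecewise linear density functions of $\rho$ on $F(\partial M)$ and $T(M)$ are defined by

$$\rho(\alpha) = \frac{1}{3}\sum_{v_i \in V(\alpha)} \rho(v_i), \qquad \rho(\tau) = \frac{1}{4}\sum_{v_i \in V(\tau)} \rho(v_i), \tag{1}$$

respectively, where $\alpha \in F(\partial M)$ and $\tau \in T(M)$. Furthermore, we define the local area/volume measures (i.e., local mass) at a vertex $v$ by

$$a_\rho(v) = \frac{1}{3}\sum_{\alpha \in N_F(v)} \rho(\alpha)\,|\alpha|, \qquad m_\rho(v) = \frac{1}{4}\sum_{\tau \in N(v)} \rho(\tau)\,|\tau|, \tag{2}$$

respectively, where $|\alpha|$ and $|\tau|$ are the area and volume of $\alpha$ and $\tau$, and $N_F(v)$ and $N(v)$ are the sets of boundary triangles and tetrahedrons that contain $v$. Denote

$$G_\rho = \big\{ g : \partial M \to \partial C^3 \;\big|\; \rho(\alpha)|\alpha| = |g(\alpha)|,\ \forall \alpha \in F(\partial M) \big\}, \qquad F_\rho = \big\{ f : M \to C^3 \;\big|\; \rho(\tau)|\tau| = |f(\tau)|,\ \forall \tau \in T(M) \big\} \tag{3}$$

as the sets of all area-/volume-measure-preserving (i.e., mass-preserving) piecewise linear maps from $\partial M$ to $\partial C^3$ and from $M$ to $C^3$, respectively, in which the bijective maps between $\alpha$ and $g(\alpha)$, as well as $\tau$ and $f(\tau)$, are determined by the barycentric coordinates on $\alpha$ and $\tau$, respectively. For given $g \in G_\rho$ and $f \in F_\rho$, we define the transport costs of $g$ and $f$, respectively, by

$$d_\rho(g) = \sum_{v \in V(\partial M)} \|v - g(v)\|_2^2\, a_\rho(v), \qquad c_\rho(f) = \sum_{v \in V(M)} \|v - f(v)\|_2^2\, m_\rho(v), \tag{4}$$

where $a_\rho(v)$ and $m_\rho(v)$ are the local area/volume measures at $v \in V(\partial M)$ and $v \in V(M)$, respectively, as in (2).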
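The discrete OMT problems on $\partial M$ and $M$ with respect to $\|\cdot\|_2$ consist of finding $g^*_\rho \in G_\rho$ and $f^*_\rho \in F_\rho$ that solve the optimization problems

$$g^*_\rho = \operatorname*{argmin}_{g \in G_\rho} d_\rho(g), \qquad f^*_\rho = \operatorname*{argmin}_{f \in F_\rho} c_\rho(f), \tag{5}$$

where $d_\rho(g)$ and $c_\rho(f)$ are given in (4). Without loss of generality, hereafter, each simplicial 3-complex $M$ is centralized and normalized so that the center of mass is located at the origin and the mass is one.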
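The local mass and the transport cost used in this section can be illustrated on a toy mesh. The following is a minimal sketch, not the authors' implementation: it assumes that the density of a tetrahedron is the average of its four vertex densities and that each tetrahedron distributes its mass equally among its four vertices. The function names `tet_volume`, `local_masses` and `transport_cost` are hypothetical.

```python
def tet_volume(p0, p1, p2, p3):
    """Unsigned volume of the tetrahedron with corners p0..p3."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    c = [p3[k] - p0[k] for k in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0


def local_masses(vertices, tets, rho):
    """Local mass per vertex (assumed rule: a quarter of rho(tau)*|tau| per corner)."""
    mass = [0.0] * len(vertices)
    for tet in tets:
        rho_tau = sum(rho[i] for i in tet) / 4.0   # piecewise linear density on tau
        volume = tet_volume(*(vertices[i] for i in tet))
        for i in tet:
            mass[i] += rho_tau * volume / 4.0
    return mass


def transport_cost(vertices, images, mass):
    """Mass-weighted sum of squared Euclidean displacements of the vertices."""
    return sum(
        m * sum((v[k] - w[k]) ** 2 for k in range(3))
        for v, w, m in zip(vertices, images, mass)
    )


# Toy mesh: a single tetrahedron with uniform density.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tets = [(0, 1, 2, 3)]
rho = [1.0, 1.0, 1.0, 1.0]
mass = local_masses(vertices, tets, rho)
shifted = [(x + 1.0, y, z) for (x, y, z) in vertices]   # translate by one unit in x
print(sum(mass), transport_cost(vertices, shifted, mass))
```

In this toy case the total mass equals the volume of the tetrahedron, and a unit translation costs exactly that mass, because every vertex moves a squared distance of one.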
Cubic area-measure-preserving OMT maps. Let $M$ be a simplicial 3-complex with a genus-zero boundary and of mass one, with density functions $\rho : T(M) \to \mathbb{R}$ and $\rho : F(\partial M) \to \mathbb{R}$ defined on the tetrahedrons of $M$ and the triangles of $\partial M$, respectively. Let $m = \#V(\partial M)$. A piecewise linear function $g$ on $\partial M$ is given by the barycentric coordinates of its values at the $m$ boundary vertices; $g$ is called the function induced by the matrix $\mathbf{g}$ of these vertex values, and $\mathbf{g}$ is the inducing matrix for $g$. The area-weighted stretch energy 33 on $\partial M$ is defined through the area-weighted Laplacian matrix $L_S(g)$, whose entries depend on $\theta_{i,j}(g)$ and $\theta_{j,i}(g)$, the two angles opposite to the edge $g([v_i, v_j])$.

To compute the cubic area-measure-preserving OMT (A-OMT) map from $\partial M$ to $\partial C^3$, we utilize the PGM proposed in 32, which efficiently computes the A-OMT maps $h^*_\rho : \partial M \to S^2$ and $h^*_1 : \partial C^3 \to S^2$ ($\rho = 1$), where $S^2$ denotes the unit sphere in $\mathbb{R}^3$. Then, the composition map $g^*_\rho = (h^*_1)^{-1} \circ h^*_\rho : \partial M \to \partial C^3$, illustrated in Fig. 1, is the desired A-OMT map. The computational procedure is summarized in Algorithm 1.

Cubic volume-measure-preserving OMT maps. In this section, we develop the OMT algorithm for directly computing the cubic OMT map $f^*_\rho$, as in (5), from $M$ to $C^3$. Let $g^*_\rho$ be the cubic A-OMT map from $\partial M$ to $\partial C^3$ computed by Algorithm 1. We construct a homotopy $g_\zeta : \partial M \to \mathbb{R}^3$ for the boundary maps that ends at $g^*_\rho$ and is discretized into $p$ steps. For $k = 1, \ldots, p$, we compute the interior map by solving a linear system with the boundary values prescribed by the homotopy. The corresponding computational procedure is stated in Algorithm 2.

For the map $f$ obtained from (9), we define the total mass distortion $d_M(f)$ and the local mass ratio $r_f(v)$ in (10), where $N(v)$ is the set of 1-ring neighboring tetrahedrons of $v$.

A physical brain image $M$ is contained in $I_s$ and accounts for approximately 12-20% of the voxels. Suppose $M$ is a simplicial 3-complex with a genus-zero boundary composed of tetrahedral meshes representing a brain image. Furthermore, $I_1$ records the adapted CEHE grayscales of FLAIR. Because the FLAIR modality typically reflects the distribution of WT = {2, 1, 4}, the adapted CEHE grayscale at voxel $I_1(i, j, k)$ is used to define the density map $\rho_\gamma$ on $V(M)$ in (11), where $\gamma$ is a value chosen from the interval [1, 2].

Two-phase ResUnet with OMT for training. For the given samples in the training set of 3D brain images, we propose a 2P-ResUnet-OMT algorithm with density function estimates to construct an effective input tensor for the ResUnet algorithm. In general, a real brain image contains approximately 1.5 million vertices. Therefore, it is reasonable to cover a brain image with $128^3$ voxels.
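The voxel counts behind this choice can be checked with a few lines of arithmetic. The sketch below uses the raw volume size 240 × 240 × 155 quoted in the Conclusions and the 12-20% brain fraction stated above; the variable names are illustrative only.

```python
raw_voxels = 240 * 240 * 155      # voxels in one raw BraTS volume
cube_voxels = 128 ** 3            # voxels in the target cube

# A brain occupies roughly 12-20% of the raw volume.
brain_low = 0.12 * raw_voxels
brain_high = 0.20 * raw_voxels

print(raw_voxels, cube_voxels)
print(round(brain_low), round(brain_high))
print(cube_voxels > brain_high)   # the cube has room for every brain voxel
```

The brain therefore occupies roughly 1.07 to 1.79 million voxels, consistent with the figure of about 1.5 million, and a cube of $128^3$ (about 2.1 million) voxels is large enough to hold it while being more than four times smaller than the raw volume.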

Two-phase ResUnet with OMT for training and validation
The tumor regions of the brain are initially unknown; therefore, in the first phase, we utilize the grayscale of FLAIR to construct the density function in (11) for OMT and train a Net0 by the ResUnet algorithm to detect the possible tumor region of WT. For better coverage, we expand the possible tumor regions by a few voxels with dilation. In the second phase, we construct a new density function in (12) (see the following) according to the predicted and outer expanded tumor regions with higher densities for a new OMT, yielding enlarged tumor regions in the target cube while retaining unchanged nontumor regions. We then train Net1, Net2 and Net3 for WT, TC and ET, respectively, by the ResUnet algorithm. Consequently, the new OMT provided in Phase I is implemented analogously to a magnifying glass that enhances viewing and marking the brain tumor segmentations in Phase II.
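The Phase II density construction described above (dilate the predicted WT region, assign a higher density inside it, then smooth with a blur box) can be sketched on a toy grid. This is an illustration under stated assumptions rather than the authors' code: the structuring element is taken to be a cube, the density inside the expanded region is assumed to be exp(γ · FLAIR), and voxels outside the grid are simply ignored by the blur. The names `dilate`, `step_density` and `box_blur` are hypothetical.

```python
import math
from itertools import product


def dilate(region, m, shape):
    """Grow a set of voxel indices by m voxels using a cubic structuring element."""
    grown = set()
    for i, j, k in region:
        for a, b, c in product(range(-m, m + 1), repeat=3):
            p = (i + a, j + b, k + c)
            if all(0 <= p[d] < shape[d] for d in range(3)):
                grown.add(p)
    return grown


def step_density(flair, region, gamma):
    """Step-like density: exp(gamma * FLAIR) inside the region, 1.0 elsewhere (assumed form)."""
    return {p: math.exp(gamma * g) if p in region else 1.0 for p, g in flair.items()}


def box_blur(density, width=5):
    """Average each voxel over a width^3 box, ignoring neighbours outside the grid."""
    r = width // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    blurred = {}
    for p in density:
        values = [
            density[q]
            for q in ((p[0] + a, p[1] + b, p[2] + c) for a, b, c in offsets)
            if q in density
        ]
        blurred[p] = sum(values) / len(values)
    return blurred


# Toy example: 9 x 9 x 9 grid, constant normalized FLAIR, one predicted tumor voxel.
shape = (9, 9, 9)
flair = {p: 1.0 for p in product(range(9), repeat=3)}
predicted_wt = {(4, 4, 4)}
expanded = dilate(predicted_wt, 1, shape)          # the paper dilates by 5 voxels
smooth = box_blur(step_density(flair, expanded, gamma=1.0))
print(len(expanded), smooth[(0, 0, 0)], smooth[(4, 4, 4)])
```

Far from the tumor the smoothed density stays at 1.0, while inside the expanded region it rises smoothly toward exp(γ · FLAIR); by mass preservation, the OMT map then allots a larger share of the cube to that region.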
Phase I. We first construct training tensors by using the OMT algorithm with the density ρ γ (v) , as in (11).
We compute the OMT map $f^*_{\rho_\gamma}$ with Algorithm 2 from $M$ to a 128 × 128 × 128 cube $N$.

Net0 and Net1-Net3 for validation. With the networks trained in Phases I and II, respectively, we use Net0 to detect the possible tumor region of WT = {2, 1, 4} with the density function $\rho_\gamma(v)$ defined in (11), cover WT by $m$ voxels with dilation to obtain the expanded region $T \subseteq M$, and construct a new density function $\rho_\gamma$, as in (12) and (13); see Fig. 4.

In (ii), for each voxel $v_j$ in $M$, we utilize the multiple values $p^t_i$, $i = 1, \ldots, n(j)$, on the cube to define the most likely probability, which can be used to make a more precise evaluation of the label prediction.

Now, we denote GT as the ground truth of WT ⊃ TC ⊃ ET and PD as the prediction of WT, TC and ET obtained by (i)-(iii) above. The relationship between the sets GT and PD is plotted in Fig. 5. Let $GT^c$ and $PD^c$ be the complements of GT and PD, respectively. We consider the confusion matrix

$$\begin{array}{c|cc} & PD & PD^c \\ \hline GT & TP = |GT \cap PD| & FN = |GT \cap PD^c| \\ GT^c & FP = |GT^c \cap PD| & TN = |GT^c \cap PD^c| \end{array}$$

Here, we recall the following metrics 1 for numerical experiments:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}. \tag{15}$$

Furthermore, if we define $GT_p$ and $PD_p$ as the probability density tensors of GT and PD, respectively, we can define the Dice loss as in (16). The Dice loss in (16) helps to check the convergence of the training procedure for WT, TC and ET vs. epochs by the ResUnet algorithm.

Improvement in Dice scores with mesh refinement and ensemble voting postprocessing. In this subsection, we propose two methods to improve the Dice scores of WT, TC, and ET. One is mesh refinement on the WT region for the OMT map, and the other is ensemble voting postprocessing.
a. Mesh refinement. With the merits of 2P-ResUnet-OMT, the possible tumor regions in a brain image detected in Phase I can be enlarged with finer meshes and better viewed in Phase II for ResUnet training. One of the most important features of the OMT map is that the density can be increased in the region of interest, and then the region can be remeshed by the mesh refinement technique. In this way, due to the mass-preserving property of OMT, the region of interest can be enlarged in the cube, which enables ResUnet to learn more efficiently and achieve high-performance prediction results.

b. Ensemble voting postprocessing. We propose an ensemble voting postprocessing approach to determine the final labels in the brain image for validation. The main purpose of this postprocessing step is to modify the probability $p^t_i$, $t = 1, 2, 3$, in Steps (i)-(iii) of paragraph "Net0 and Net1-Net3 for Validation". We first select the three best models $\{\mathrm{Net1}_\nu, \mathrm{Net2}_\nu, \mathrm{Net3}_\nu\}_{\nu=1}^{3}$ for WT, TC and ET from the training procedure. For each 128 × 128 × 128 brain tensor ($R_0$) for validation, we further build four 128 × 128 × 128 tensors by a 90 degree counterclockwise rotation ($R_1$), mirroring from the left to the right ($R_2$), mirroring from the top to the bottom ($R_3$) and mirroring from the left to the right followed by a 90 degree counterclockwise rotation ($R_4$).

The rotated and mirrored copies $R_1, \ldots, R_4$ of the brain tensor $R_0$ constructed above help to improve the Dice scores with the ensemble voting technique developed in (17) and (18).
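The idea of combining predictions over the transformed copies and over several models can be sketched as follows. This is only an illustration on 2D arrays: plain averaging of the back-transformed probabilities followed by a 0.5 threshold stands in for the voting rule, which may differ from the rule the authors use. The names `ensemble_probability`, `vote` and the model callables are hypothetical.

```python
def identity(a):
    return [row[:] for row in a]


def rot90_ccw(a):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*a)][::-1]


def rot90_cw(a):
    """Rotate a 2D list by 90 degrees clockwise (inverse of rot90_ccw)."""
    return [list(row)[::-1] for row in zip(*a)]


def flip_lr(a):
    return [row[::-1] for row in a]


def flip_ud(a):
    return [row[:] for row in a[::-1]]


# (forward, backward) pairs for R0..R4; backward undoes forward.
TRANSFORMS = [
    (identity, identity),
    (rot90_ccw, rot90_cw),
    (flip_lr, flip_lr),
    (flip_ud, flip_ud),
    (lambda a: rot90_ccw(flip_lr(a)), lambda a: flip_lr(rot90_cw(a))),
]


def ensemble_probability(models, image):
    """Average the back-transformed probability maps over all models and transforms."""
    rows, cols = len(image), len(image[0])
    total = [[0.0] * cols for _ in range(rows)]
    count = 0
    for model in models:
        for forward, backward in TRANSFORMS:
            prob = backward(model(forward(image)))
            for r in range(rows):
                for c in range(cols):
                    total[r][c] += prob[r][c]
            count += 1
    return [[v / count for v in row] for row in total]


def vote(prob, threshold=0.5):
    """Binary label map from averaged probabilities (threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in prob]


# Toy check with a "model" that returns its input unchanged as a probability map.
image = [[0.1, 0.9, 0.3], [0.4, 0.6, 0.8]]
averaged = ensemble_probability([lambda x: x], image)
print(vote(averaged))
```

Each prediction is mapped back to the orientation of the original tensor before averaging, so that probabilities for the same voxel are combined.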

Results and discussions
Based on the CNN technique, the U-net algorithm learns an effective network from training data through an optimization process that decreases the loss function on the training and validation sets. We adopt the ResUnet algorithm 22 and set the hyperparameters as follows: encoder depth: 3, initial learning rate: $\alpha_0 = 1.0 \times 10^{-4}$, learning rate drop factor: $F = 0.95$, learning rate drop period: $P = 10$, $L_2$-regularization: $1.0 \times 10^{-4}$, and mini-batch size: 8.
For the 1251 brain image samples in the BraTS 2021 challenge database 3,4,16 , we randomly fix 1000 samples for training and 251 for validation. Our utilized ResUnet is implemented in PyTorch and the Medical Open Network for AI (MONAI) 34 , and training is carried out on a server equipped with an NVIDIA Tesla V100S PCIe 32 GB×4 GPU.
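The learning rate hyperparameters above suggest a piecewise-constant schedule. The sketch below assumes that the rate is multiplied by the drop factor once every drop period, counting epochs from zero; the exact convention used in the experiments is not stated, and the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=1.0e-4, drop_factor=0.95, drop_period=10):
    """Step decay: multiply the rate by drop_factor every drop_period epochs (assumed rule)."""
    return base_lr * drop_factor ** (epoch // drop_period)


# Rates at a few epochs of a 300-epoch run.
for epoch in (0, 9, 10, 100, 299):
    print(epoch, learning_rate(epoch))
```

In PyTorch, a schedule of this kind corresponds to `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.95`; whether that scheduler was the one used here is an assumption.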

Partition number p in Algorithm 2.
We select BraTS0002 as an $M$ from the BraTS 2021 dataset and compute the cubic V-OMT from $M$ to $C^3$ by Algorithm 2. In Fig. 6, we plot the statistical summary of the local mass distortion $\sum_{\tau \in N(v)} \big|\rho(\tau)|\tau| - |f(\tau)|\big| / 4$, as in (10), and $r_f(v)$ for all $v \in V(M)$ versus the partition number $p$ of homotopy. In each box, the red centerline indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The dotted lines extend to the most extreme data points that are not considered outliers, and the outliers are represented separately with "+" signs. Furthermore, in Fig. 7, we also plot the statistical summary of the total mass distortion $d_M(f)$ and the mean and standard deviation (SD) of $r_f(v)$ vs. the partition number $p$ for the first 1000 brain samples from the BraTS 2021 dataset.
Loss function = Dice loss + Cross-entropy loss

Figure 6 shows that when p = 11, the cubic V-OMT between BraTS0002 and $C^3$ has the smallest local mass distortion and the closest local mass ratio to one. Moreover, in Fig. 7, when p = 11, the first 1000 brain samples of BraTS 2021 have the smallest total mass distortion and the best mean and SD of the local mass ratios. Therefore, we choose p = 11 in Algorithm 2.
Dimension m of blur box tensor. We now discuss the dimension m of the blur box tensor in (13), which covers the WT region by m voxels. To choose a suitable number m for the covering voxels with dilation for WT, we apply the sensitivity and precision metrics defined in (15), in which PD denotes the prediction of {WT covered by m voxels with dilation} by Net0.
In fact, the sensitivity metric in (15) indicates what fraction of the ground-truth voxels lie in the prediction, and the precision metric in (15) indicates what fraction of the predicted voxels are correct. Dilating by more voxels raises the sensitivity but lowers the precision, so we want to choose m such that both remain as large as possible. In Fig. 8, we plot the mean, minimum, median and maximum values of the sensitivity and precision metrics of the WT validation vs. the numbers of covering voxels with dilation. We find that m = 5 is a suitable number to balance the sensitivity and precision values for the validation data.
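The sensitivity and precision discussed above, together with the Dice score and specificity, can be computed from a ground-truth set and a predicted set of voxel indices. The sketch below assumes the standard confusion-matrix definitions of these metrics and does not guard against empty sets; the function names are hypothetical.

```python
def confusion(gt, pd, n_voxels):
    """Counts of true/false positives and negatives for two sets of voxel indices."""
    tp = len(gt & pd)
    fp = len(pd - gt)
    fn = len(gt - pd)
    tn = n_voxels - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard overlap metrics (assumed to match the definitions used in the paper)."""
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Toy example on 20 voxels: ground truth {0..5}, prediction {2..7}.
gt = set(range(0, 6))
pd = set(range(2, 8))
counts = confusion(gt, pd, n_voxels=20)
scores = metrics(*counts)
print(counts, scores)
```

Enlarging `pd` by dilation can only add voxels, so the sensitivity never decreases, while the precision typically falls as more non-tumor voxels are included.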
For a fixed m = 5 , in Table 1, we list the mean, minimum, median, maximum and SD values of the transport costs, folding numbers and enlarged ratios for both the 1000 training samples and the 251 validation samples. The enlarged ratio is defined by (the ratio of WT in the cube)/(the ratio of WT in the raw data).
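The enlarged ratio defined above is a ratio of two volume fractions. The following sketch evaluates it for made-up voxel counts; the numbers are purely hypothetical and are not taken from Table 1, and the choice of denominators (total cube voxels and total brain voxels) is an assumption.

```python
def enlarged_ratio(wt_in_cube, cube_total, wt_in_raw, raw_total):
    """(fraction of WT in the cube) / (fraction of WT in the raw data)."""
    return (wt_in_cube * raw_total) / (cube_total * wt_in_raw)


# Hypothetical counts: WT fills 4% of the raw brain but 20% of the cube.
ratio = enlarged_ratio(wt_in_cube=419_430, cube_total=128 ** 3,
                       wt_in_raw=60_000, raw_total=1_500_000)
print(ratio)
```

A ratio above one means that the OMT map with the Phase II density magnifies the tumor region in the cube relative to its share of the original image.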
In Table 1, we observe that the numerical results of the transport costs, folding numbers and enlarged ratios for the 1000 training and 251 validation samples computed by f * ρ 1 in (13) are in line with what we expected.

Dice scores and loss functions.
We first compare 2P-ResUnet-OMT developed in Section "Two-Phase ResUnet with OMT for Training and Validation" with one-phase ResUnet-OMT (1P-ResUnet-OMT), in which the density functions of (11) with γ = 1.0 and 1.5 are used for training Net1. We train Net1 by using 2P- and 1P-ResUnet-OMT for 300 epochs. In Fig. 9a,b, we plot the Dice scores of WT for training and validation by 2P- and 1P-ResUnet-OMT, respectively. We observe that for both the training and validation scores, 2P-ResUnet-OMT is clearly better than 1P-ResUnet-OMT. Therefore, we adopt 2P-ResUnet-OMT in the following numerical experiments. Furthermore, in Fig. 10, we compare the prediction results of FP (purple area) and FN (blue area) using Phases I and II, respectively, for the worst (BraTS00098) and best (BraTS01321) Dice performance of real brain images. For the worst case, we see from Fig. 10a that Phase II significantly reduces the ratios of FP and FN and yields a considerable improvement in Dice, sensitivity, and precision. For the best case shown in Fig. 10b, Phases I and II differ only slightly in the four metrics in (15). Overall, Phase II mainly improves the cases in which Phase I underperforms. To expand the training data in Phase II, we use three different density functions ρ_1(v), ρ_1.5(v) and ρ_2(v) for v ∈ I_1(i, j, k) ⊆ T, as in (13), to create 3000 augmented brain images for training. We now use 2P-ResUnet-OMT to train Net0 and Net1-Net3 on 3000 training samples. Then, we utilize them to obtain predictions on the 251 validation samples. In Fig. 11, we plot the Dice scores with blue "o" and "x" symbols and the loss functions with red "o" and "x" symbols vs. the epoch numbers for the training and validation sets of WT, TC and ET, respectively. Note that the Dice scores for WT, TC and ET are defined by (15), and the loss function is defined by (16).
The predicted labels of WT, TC and ET in a brain image are evaluated by Steps (i)-(iii), which are determined by the probability value $p^t_j = \sum_{i=1}^{n(j)} p^t_i / n(j)$ ($t = 1, 2, 3$) in each voxel $v_j \in M$. We see that the training and validation Dice scores for WT, TC and ET increase very quickly during the first 50 epochs but then do not increase significantly and reach (0.9720, 0.9673, 0.9330) and (0.9325, 0.8965, 0.8614), respectively, after 300 epochs. On the other hand, the training and validation loss functions for WT, TC and ET decrease very quickly during the first 50 epochs and approach ($7.008 \times 10^{-2}$, $7.067 \times 10^{-2}$, $8.678 \times 10^{-2}$) and ($8.006 \times 10^{-2}$, $9.487 \times 10^{-2}$, $9.957 \times 10^{-2}$), respectively, after 300 epochs. The trends of both the Dice score and loss function value indicate the typical training and validation history. Thus, based on the clear tendency of the curves of the Dice scores and loss functions, in our experiment, we run ResUnet for 300 epochs.

For training, we compute 2P-OMT with the density functions in (13) for 1000 brain samples to obtain 4000 augmented brain cubes and use ResUnet to train Net1-Net3. Furthermore, for validation, we compute 2P-OMT for 251 brain samples with the density function ρ_1.75(v) on the expanded WT region by Phase I with mesh refinement. We train ResUnet for 300 epochs on 4000 augmented brain tensors. From epochs 10 to 300, every 10 epochs, we validate the Dice scores on the 251 samples of validation data for WT, TC and ET. In Table 2, we show the top three validation Dice scores for WT at epochs 150, 170, and 130; for TC at epochs 140, 120, and 170; and for ET at epochs 100, 70, and 80 by Steps (i)-(iii) in the previous section. The corresponding training Dice scores for WT, TC and ET are listed in the first three columns of Table 2. We see that the validation Dice scores for WT, TC, and ET for the brain image reach 0.93469, 0.90251 and 0.86912, respectively, which is a satisfactory result.
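The voxel probability used in Steps (i)-(iii), namely the average of the $n(j)$ cube values that fall into voxel $v_j$, can be sketched as a group-and-average operation. The mapping `cube_to_voxel` from cube grid points to brain voxels is a hypothetical input (in practice it would come from the inverse OMT map), and the 0.5 threshold for assigning a label is an assumption.

```python
from collections import defaultdict


def voxel_probabilities(cube_probs, cube_to_voxel):
    """Average the cube probabilities p_i over all cube points mapped to each voxel j."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cube_index, p in cube_probs.items():
        j = cube_to_voxel[cube_index]
        sums[j] += p
        counts[j] += 1
    return {j: sums[j] / counts[j] for j in sums}


def label_voxels(probs, threshold=0.5):
    """Voxels whose averaged probability reaches the (assumed) threshold."""
    return {j for j, p in probs.items() if p >= threshold}


# Toy example: three cube points, two of which land in the same brain voxel "a".
cube_probs = {0: 0.9, 1: 0.7, 2: 0.2}
cube_to_voxel = {0: "a", 1: "a", 2: "b"}
probs = voxel_probabilities(cube_probs, cube_to_voxel)
print(probs, label_voxels(probs))
```

Because enlarged tumor regions occupy more cube points per brain voxel, $n(j)$ tends to be larger there, so the averaged probability draws on more predictions exactly where precision matters most.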
Dice scores with ensemble voting postprocessing. In this subsection, we show the improvement in Dice scores with mesh refinement and the ensemble voting postprocessing approach to determine the final labels in the brain image for validation. We first select the three best models for WT, TC and ET at epochs (150, 170, 130), (140, 120, 170) and (100, 70, 80), respectively, from the training procedure, as shown in Table 2, and call them Net1 ν, Net2 ν, and Net3 ν for ν = 1, 2, 3. In Fig. 12a-c, we plot the histograms of the Dice scores with and without ensemble voting postprocessing in blue and green lines for WT, TC, and ET, respectively, vs. the epoch number. Furthermore, the associated increments of the Dice scores are plotted with red lines in Fig. 12a-c. We see that the Dice scores for WT, TC, and ET with the ensemble voting technique are higher than those without voting postprocessing. In addition, the Dice score curves for WT, TC and ET have a relatively stable upward trend.
In Table 3, we show the Dice, sensitivity, specificity, and 95th percentile of the Hausdorff distance 35 (HD95) scores of the 251 validation samples for WT, TC, and ET in brain images by Net1 ν -Net3 ν, ν = 1, 2, 3, with the ensemble voting technique. We see that Net1 ν -Net3 ν with mesh refinement and ensemble voting postprocessing, as well as with the precise conversion of Steps (i)-(iii), significantly boosts the Dice scores on the 251 validation samples of BraTS 2021. This result is very promising for brain tumor detection and segmentation.
Based on the description on the BraTS homepage 12 , we believe that the 2021 dataset is rich enough and contains almost all valid brain image data from past datasets (BraTS 2017-2020). To compare our results with related works in the survey paper 12 , we list some comparable results and associated techniques of 12 , which utilize the BraTS 2018 and 2019 datasets, as shown in Table 4. From this comparison, the 2P-ResUnet-OMT is quite satisfactory based on the Dice score performance.
Finally, to present visualization results of brain tumor segmentations, in Fig. 13, we show GT and PD predicted by Net1 ν -Net3 ν , ν = 1, 2, 3 , and the corresponding FP and FN for (a) the worst case (BraTS00098) and (b) the best case (BraTS01321), respectively. In Fig. 13a, we observe that FN is mostly distributed in the area with low FLAIR values (dark gray area). This may be because we use the value of FLAIR as the density function to which OMT refers. Due to the mass-preserving property of OMT and the smaller density function value in the dark gray area, its proportion in the cube by the OMT is also smaller than that in the original image. Therefore, the predictions for this area are likely to be less accurate. The selection of a more effective density function to improve the prediction accuracy of this region is one of our main research topics in the near future.

Conclusions
In this paper, we introduce 2P-ResUnet-OMT with density estimates for 3D brain tumor detection and segmentation. We first propose a cubic volume-measure-preserving OMT algorithm to compute an OMT map for transforming an irregular 3D brain image to a cube while preserving the local mass ratios and maintaining minimal deformation. Furthermore, OMT is bijective and minimizes the transport cost. The concept of expressing an irregular brain image as a cube with minimal distortion is proposed for the first time in this research field, and these cubes are typically adequate for the tensor input format of the ResUnet algorithm that creates validation networks. Representing 3D brain images as cubes significantly reduces the effective brain images from sizes of 240 × 240 × 155 to cubes of sizes 128 × 128 × 128 and preserves the global information of tumor features. This novel OMT preprocessing technique can save a large quantity of input data and reduce the computational time for training. In addition, the ensemble voting technique proposed in (17)-(18) and the robust conversion Steps (i)-(iii) of paragraph "Net0 and Net1-Net3 for Validation" from cubes (128 × 128 × 128) with predicted labels back to brain images (240 × 240 × 155) considerably increase the Dice scores for brain images compared to those for cubes on the 1251 brain image samples. One of the characteristics of the OMT map is that it can control the densities of tumor regions in brain images, and then, by mass-preserving OMT, the high-density areas can be enlarged in the cube so that the ResUnet algorithm can strengthen the cognition and learning in the high-density regions. In fact, 2P-ResUnet-OMT in the paragraph "Two-Phase ResUnet with OMT for Training and Validation" is designed for this purpose. Phase I first captures the possible region of WT and then covers this region with 5 voxels by dilation.
Next, Phase II reconstructs new smooth density functions, as in (13), and performs mesh refinement on the range estimated by Phase I. With the advantage of the mass preservation of OMT, the portion of the possible WT region can be enlarged in the cube. Then, the ResUnet algorithm is utilized to train more effective Net1-Net3 models for tumor prediction and validation.
The Dice scores of WT, TC and ET by Net0 and Net1 ν -Net3 ν for ν = 1, 2, 3 , with mesh refinement and ensemble voting postprocessing reach 0.93705, 0.90617 and 0.87470 for validation, respectively. 2P-ResUnet-OMT with mesh refinement sufficiently utilizes the mass-preserving property to significantly improve the tumor detection and segmentation accuracy.
In future work, because an irregular 3D brain image needs to be represented by only a cube in our approach, we have much room to expand the augmented data with various density settings, such as in (12); these settings include rotating, mirroring, shearing and cropping and will allow for more opportunities to boost the prediction accuracy. In addition, we believe that for a 3D image provided by real 3D scanning instruments that may be developed in the future, the use of OMT to represent an irregular 3D object must retain the structure of the global information. This 3D OMT representation takes advantage of a precise conversion in the three directions in space and is beneficial to the input format of CNN algorithms. We believe this is a cross-trend research direction for medical images in the near future.

Table 4. Comparison results of the preprocessing method, model architecture, and performance in some deep learning-based algorithms and BraTS datasets. Here DSC, SEN, and SPE denote the Dice score, sensitivity, and specificity, respectively.