Abstract
We demonstrate that isomorphically mapping gray-level medical image matrices onto energy spaces underlying the framework of the fast data density functional transform (fDDFT) can achieve unsupervised recognition of lesion morphology. By introducing the architecture of geometric deep learning and the metrics of graph neural networks, the gridized density functionals of the fDDFT establish an unsupervised feature-aware mechanism with global convolutional kernels to extract the most likely lesion boundaries and produce lesion segmentations. An AutoEncoder-assisted module reduces the computational complexity from \(\mathcal{O}\left({N}^{3}\right)\) to \(\mathcal{O}\left(N\mathrm{log}N\right)\), thus efficiently speeding up global convolutional operations. We validate the framework's performance on various open-access datasets and discuss its limitations. The inference time for each object in large three-dimensional datasets is 1.76 s on average. The proposed gridized density functionals have activation capability synergized with gradient ascent operations and hence can be modularized and embedded in pipelines of modern deep neural networks. Algorithms of geometric stability and similarity convergence also raise the accuracy of unsupervised recognition and segmentation of lesion images. Their performance achieves the standard requirement for conventional deep neural networks; the median dice score is higher than 0.75. The experiment shows that the synergy of the fDDFT and a naïve neural network improves the training and inference times by 58% and 51%, respectively, and the dice score rises to 0.9415. This advantage facilitates fast computational modeling in interdisciplinary applications and clinical investigations.
Introduction
The rise of computer vision technology has delineated a macroscopic technical blueprint for biometric identification. Sophisticated deep neural networks for lesion recognition, tracking, and segmentation of various medical images have further promoted the evolution of clinical investigations. Glioma modality identification has attracted significant attention from scientists and engineers and oriented the mainstream development of tumor image segmentation due to its high aggressiveness^{1,2} and infiltrative^{3} properties. Thus, image-vision-based neural networks have achieved fruitful results on high-dimensional and multichannel glioma image recognition and segmentation tasks^{3,4}. However, neural networks are pushed to incorporate increasingly complex structures and require large datasets and time costs to train effectively. Complicated and non-unified deep neural networks also make optimizing the model structure difficult, increase hardware costs, and reduce the execution efficiency of embedded computing units. Fortunately, the emergence of geometric deep learning (GDL) technology is opening up an avenue for addressing these problems.
By considering the intrinsic geometric configuration of image textures, GDL methods map them into a non-Euclidean space for feature searching. Supported by a theoretical framework based on physics, both the structure and the operation of neural networks become explainable^{5}. Adopting transformation symmetry and invariance increases the identification capability of neural network models for graphic representations, thereby significantly reducing the dependency of the deep learning model on large training sets^{6}. However, it turns the problem into one of analyzing complex differential geometry in non-Euclidean spaces^{7}. A solution to this difficulty is to use the geometric properties of the image textures in terms of their physical behavior in Euclidean space. To simultaneously consider medical image textures, pixel intensity distributions within medical images, and the interactions between pixel pairs, introducing the energy-mapping method based on density functional theory (DFT) may be a good starting point^{8}.
The modern DFT and contemporary deep neural networks share the demand for an initial guess and iteration, which speeds up their fusion and synergic effect. The initial guess of electron density functions is essential in the iterations of DFT frameworks. It also benefits the procedure of supervised backpropagation for training physical parameters in deep learning approaches^{9,10}. From DFT's perspective, a random initial parameter would have definite meanings bestowed by the intrinsic essence of many-particle systems. Relevant studies of pattern recognition across scientific applications have exhibited successful combinations of DFT and deep neural networks, including density functional training models^{11,12}, the composition and morphology of nanoparticle surface textures^{13}, configurations of high-dimensional electron density and energy functionals^{9,14,15}, the unsupervised clustering of compounds^{16}, and simulations and modeling of complex systems^{17,18,19,20,21,22}. Although incorporating DFT with these data learning methods has led to a research trend in materials science, the heterogeneity of their data structures still impedes configurational integration.
Machine learning methods such as multilayer perceptrons^{12,14,15,17} and unsupervised clustering^{16,17,18,19,20,21} are still mainstream compared to image-vision-based neural networks, such as convolutional neural networks (CNNs)^{9,11,14,15}, mainly owing to the low-dimensional data structures of the DFT framework. However, valuable information from high-dimensional data structures such as energy landscapes^{20,21} or atomic configurations^{21,22} may be missed. Introducing the GDL concept is also obstructed^{5}. On the other hand, most neural network layers and activation functions have to be manually modulated. Even though the trained initial guess possesses representative physical meanings, it cannot easily reflect the configuration of a many-particle system with complex textures when incorporated with neural networks. For instance, the node numbers in multilayer perceptrons and the sizes of convolutional kernels in CNNs directly affect the construction of the receptive field on the data morphology. Neural networks combine or transform the trained parameters extracted from those receptive fields to establish corresponding feature maps. Since these operations do not consider the long-range dependency between data points and scale the dimensions of the data morphology, the generated feature maps cannot reveal detailed textures of an actual physical system. Hence, energy landscapes or high-dimensional atomic structures reconstructed from these feature maps might lose the original physical essence. In recent investigations, energy-based neural networks in pattern recognition tasks have pursued better frameworks to resolve this predicament by estimating the pairwise dependency of image pixels with a Coulombic-like form^{23,24,25}, optimizing neural network structures^{26}, constructing energy landscapes of medical images^{8}, or connecting data points using density functionals^{27}.
However, these network frameworks still have to bear high computational complexity^{27}, large memory storage^{24}, and heavy data transmission loads^{26}.
To integrate the data structures from DFT and image-vision-based neural networks without bearing these engineering problems, we propose fusing the holographic electron density theorem and the GDL architectures. The holographic electron density theorem^{28} states that once we obtain an electron density function within any finite volume of a many-particle system, we know, in principle, the whole configuration of density functions inside the system. By introducing this concept into the framework of graph neural networks (GNNs), we first assume that the configuration of a local electron density has bijective mappings (isomorphism) in a graph structure \(G=\left({\mathcal{V}}_{G},{\mathcal{E}}_{G}\right)\), with a vertex set \({\mathcal{V}}_{G}\) and an edge set \({\mathcal{E}}_{G}\), and can construct a topological subgraph \({G}_{S}=\left({\mathcal{V}}_{{G}_{S}},{\mathcal{E}}_{{G}_{S}}\right)\) of \(G\), where \({\mathcal{V}}_{{G}_{S}}\subseteq {\mathcal{V}}_{G}\) and \({\mathcal{E}}_{{G}_{S}}\subseteq {\mathcal{E}}_{G}\). We thus infer that the given topological subgraph \({G}_{S}\) can determine all attributes of the topological graph \(G\) when each vertex in \(G\) has one-to-one weighted edges with all other vertices, in analogy with the holographic electron density theorem. In other words, \(G\) is a complete graph, and \({G}_{S}\) is an induced subgraph of \(G\). The one-to-one weighted edges represent the particle-pair interactions in a physical system, while a topological subgraph indicates a local electron density. Embedding the topological graph \(G\) into the two-dimensional Euclidean space \({\mathbb{R}}^{2}\), the graphic structure reduces to a specific matrix, for instance, a medical image in grid space. Thus, we can treat connected pixels of medical images as physical particle clusters and topological subgraphs simultaneously under the GNN metrics and use them to estimate corresponding density functionals through the DFT or to extract feature maps from CNNs.
Figure 1 illustrates their complementary relationship.
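The pixel-to-graph correspondence above can be sketched numerically. The snippet below is an illustration only: it maps nonzero pixels to vertices of a complete graph and assigns a Coulombic-like weight \(1/(2\,d)\) to every vertex pair, an assumed weight choice mirroring the particle-pair interactions described in the text, not the paper's exact construction.

```python
import numpy as np

def image_to_complete_graph(image):
    """Map nonzero pixels to vertices of a complete graph G = (V_G, E_G).

    Every vertex pair receives a one-to-one weighted edge; the Coulombic-like
    weight 1/(2*distance) used here is an illustrative assumption mirroring
    the particle-pair interactions described in the text.
    """
    coords = np.argwhere(image > 0)                # vertex set: pixel coordinates
    n = len(coords)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(coords[i] - coords[j])
            weights[i, j] = weights[j, i] = 1.0 / (2.0 * d)
    return coords, weights

# Three nonzero pixels -> complete graph on 3 vertices with 3 weighted edges
img = np.array([[1.0, 0.0],
                [0.5, 0.8]])
vertices, W = image_to_complete_graph(img)
```

Under this view, any connected subset of pixels is simultaneously a particle cluster and an induced subgraph of the complete graph \(G\).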
On the other hand, the GDL treats these feature maps as geometric priors, and the operations of convolutional kernels and image pooling correspond to translational symmetry and scale separation^{5}, respectively. Since sets of bijective mappings from the feature maps in Euclidean space onto itself form an automorphic group \(\mathrm{Aut}\left(G\right)\), we can always define a symmetry group from it to inspect the properties of transformation invariance or equivariance^{5,29} of the feature maps from CNNs. It means the feature maps and the results obtained from convolutional and pooling operations belong to or are around the same symmetry group. These properties then benefit the theoretical combination of the DFT and CNNs under the GDL architecture and GNN metrics.
In the present work, we rebuilt the frameworks of the DFT and CNNs in the manner mentioned above for unsupervised lesion recognition and segmentation of brain tumor images. We showed that the fast data density functional transform (fDDFT)^{4} achieves the desired theoretical combination and structural integration between the DFT and convolutional operations. The kinetic and potential energy density functionals (KEDF and PEDF) under the fDDFT framework provide the procedures for intensity enhancement and global convolutions of the input medical images, respectively. According to the definition of the automorphic group under the GDL architecture, the input image and the KEDF and PEDF landscapes belong to the same symmetry group since these energy landscapes are equivariant to transformations of the input image^{5}. Disjoint subsets of the pixels, namely orbits, allow us to group them intuitively based on specific structural rules^{29}. The group of these subsets forms the feature maps from the CNN's perspective. The linear combination of the KEDF and PEDF determines the Lagrangian density functional (LDF) in the grid space, and the rules of structural recognition and segmentation rely directly on the geometric stability of the LDF. Thus, these gridized density functionals all have characteristic operations on input medical images in the grid space.
Moreover, we introduced (1) a speed-up scheme that replaces the PEDF integral of the DDFT^{8}, a previous version of the fDDFT, with global convolutional operations, (2) an energy-based loss function based on the gradient ascent of the LDF to resolve the geometric instability that occurs at grid boundaries, (3) an AutoEncoder-assisted module, an unsupervised neural network, built on the smoothness of the PEDF landscape, (4) a framework of feature-aware unsupervised pattern recognition and segmentation on three-dimensional brain tumor image datasets, and (5) the limitation of the method when facing low-featured subgraph representations. To reveal the performance of the unsupervised pattern recognition and segmentation of the proposed framework in grid systems with high-level heterogeneity, we employed gray-level medical image datasets to mimic chaotic physical environments. To verify our method, we estimated the three-dimensional soft dice score for each segmented brain tumor image^{30}. In all study cases, we found that the geometric stability of the LDF landscape enhances the effect of brain tumor image recognition, the similarity convergence assists feature selection on high-dimensional image structures, and the AutoEncoder-assisted module significantly reduces the computational complexity. These results support the successful theoretical combination and structural integration of the DFT and neural networks in this study. However, DFT-based methods have limitations in segmenting low-featured subgraphs embedded in grid space with unstable energy ranges, such as medical image subgraphs with low energy, weak connectivity, and low heterogeneity.
Methods
Isomorphically mapping the data intensity matrices into energy spaces, the DDFT simultaneously reveals data significance and similarity by computing their KEDF and PEDF, respectively^{8,31}. For a \(D\)-dimensional transformation, the relationship between the intensity matrix \(\rho\) and the pseudo-Fermi level \(k_{F}\) has the compact form \(\rho \left[ {k_{F} } \right] = k_{F}^{D} /\left[ {D\left( {2\pi } \right)^{D} } \right] \in {\mathbb{R}}^{D}\)^{8}. Under this relation, the KEDF landscape \(t\left[ \rho \right]\left( {\varvec{r}} \right)\) and the PEDF landscape \(u\left[ \rho \right]\left( {\varvec{r}} \right)\) have the following expressions in a two-dimensional grid space \({\mathbb{R}}^{m \times n}\):
and
The arguments \({\varvec{r}}\in {\mathbb{N}}^{2}\) and \({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\in {\mathbb{N}}^{2}\) represent the position coordinates of an observed point and a source point in the \(m\times n\) grid space, respectively. The KEDF landscape represents an enhanced pixel intensity distribution, whereas the PEDF landscape uses a Coulombic potential to estimate long-distance interactions and measure the data similarity between pixel pairs.
To compute the LDF and resolve the unit mismatch between the KEDF and PEDF, the DDFT constructs the LDF landscape \(\mathcal{L}\left[\rho \right]\left({\varvec{r}}\right)\) in a scale-free manner:
Similarly, the Hamiltonian density functional (HDF) landscape has the following expression:
The adaptive scaling factor \(\gamma\) in the above equations is the core of the DDFT. It prevents an imbalance between the KEDF and PEDF landscapes when they are scaled or normalized:
The measures \(\langle u\left[\rho \right]\rangle \in {\mathbb{R}}\) and \(\langle t\left[\rho \right]\rangle \in {\mathbb{R}}\) are the global means of the PEDF and KEDF landscapes, respectively. The LDF landscape is the difference between the regularized KEDF and PEDF landscapes. Hence, points with zero values (i.e., geometrically stable locations) on the LDF landscape reveal the boundaries where the interactions between inside and outside terminate. Disjoint subgraphs enclosed by the boundaries on the LDF landscape have high data significance and similarity and form an aware feature map. Similar situations occur in medical image matrices. The LDF labels heterogeneous components within the images after a global search of their corresponding boundaries. The components with intensities over the mean level of the LDF then become a partition of the aware feature map.
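A minimal numerical sketch of this scale-free construction follows. Since the full forms of Eqs. (3)-(5) are not reproduced in this excerpt, the balancing choice \(\gamma = \langle u[\rho]\rangle / \langle t[\rho]\rangle\) and the difference form \(\mathcal{L} = \gamma\, t - u\) are assumptions chosen so the two landscapes share a scale and the LDF has zero global mean.

```python
import numpy as np

def lagrangian_density(t, u):
    """Scale-free LDF sketch: the difference between the regularized KEDF
    and PEDF landscapes. ASSUMPTION: gamma = <u> / <t> is used here as the
    adaptive scaling factor, which puts both landscapes on the same scale
    and gives the LDF a zero global mean."""
    gamma = u.mean() / t.mean()
    return gamma * t - u

t = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # toy KEDF landscape
u = np.full((2, 2), 0.5)     # toy PEDF landscape
L = lagrangian_density(t, u)
# Zero crossings of L mark the geometrically stable boundary locations.
```

With this choice, points where \(\mathcal{L}\) crosses zero play the role of the stable boundaries described above.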
The framework of the fast data density functional transform (fDDFT)
The typical procedure for estimating the PEDF of image matrices in Eq. (2) is to numerically transfer the integral to a discrete sum, i.e., \(\frac{1}{2}{\sum }_{i=1}^{m\times n}\rho \left({{\varvec{r}}}_{i}^{\boldsymbol{^{\prime}}}\right)\Delta {{\varvec{r}}}_{i}^{\boldsymbol{^{\prime}}}/{\Vert {\varvec{r}}-{{\varvec{r}}}_{i}^{\boldsymbol{^{\prime}}}\Vert }_{{\varvec{r}}\ne {{\varvec{r}}}_{i}^{\boldsymbol{^{\prime}}}}\), and to calculate the corresponding values pixel by pixel in order. The computational complexity of \(\mathcal{O}\left({N}^{3}\right)\)^{27} for consecutively searching image pixels is so high that parallel computations become inevitable^{8}. To expedite the PEDF estimation without additional hardware costs, the relationship between the pixel intensity matrix \(\rho \left({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right)\) and the kernel \(k\left({{\varvec{r}}}^{\boldsymbol{^{\prime}}};{\varvec{r}}\right)=1/\left(2\Vert {\varvec{r}}-{{\varvec{r}}}^{\boldsymbol{^{\prime}}}\Vert \right)\) in Eq. (2) was redefined and treated as a convolution in this work:
The convolution between \(\rho \left({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\right)\) and \(k\left({{\varvec{r}}}^{\boldsymbol{^{\prime}}};{\varvec{r}}\right)\) in the \({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\) domain equals the product of their Fourier transforms \(P\left( {j\varvec{k^{\prime}}} \right) \cdot K\left( {j\varvec{k^{\prime}}} \right)\) in the \(\varvec{k^{\prime}}\) domain, i.e., \(\rho \left( {\varvec{r^{\prime}}} \right) { \circledast } k\left( {\varvec{r^{\prime}};\varvec{r}} \right) \Leftrightarrow P\left( {j\varvec{k^{\prime}}} \right) \cdot K\left( {j\varvec{k^{\prime}}} \right)\). Hence, we established a new approach for the PEDF estimation that replaces the complicated integrals by calculating this product in the \({{\varvec{k}}}^{\boldsymbol{^{\prime}}}\) domain, employing zero padding^{32} for the product, and then inverting back to the \({{\varvec{r}}}^{\boldsymbol{^{\prime}}}\) domain. Under this framework, the computational complexity reduces to \(\mathcal{O}\left(N\mathrm{log}N\right)\).
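The Fourier-domain speed-up can be sketched as follows. This is a minimal implementation of the stated idea, kernel \(1/(2\Vert \varvec{r}-\varvec{r}^{\prime}\Vert)\), zero-padded FFT product, inverse transform, and is verified against the direct pairwise sum; the cropping convention and exclusion of the singular \(\varvec{r}=\varvec{r}^{\prime}\) term are our implementation choices, not details taken from the paper.

```python
import numpy as np

def pedf_via_fft(rho):
    """PEDF via a global convolution in the k' domain (the fDDFT speed-up).

    Builds the reciprocal distance kernel 1/(2*||r - r'||), zero-pads both
    arrays, multiplies their FFTs, and inverts back to the r' domain, giving
    O(N log N) complexity instead of the O(N^3) pixel-by-pixel sum.
    """
    m, n = rho.shape
    # Reciprocal distance kernel over all pixel offsets, centered at (m-1, n-1);
    # the singular r = r' term is excluded, as in the discrete sum.
    dy, dx = np.mgrid[-(m - 1):m, -(n - 1):n]
    dist = np.hypot(dy, dx)
    kernel = np.where(dist > 0, 1.0 / (2.0 * dist), 0.0)
    # Zero-padded linear convolution through the Fourier domain
    shape = (m + kernel.shape[0] - 1, n + kernel.shape[1] - 1)
    conv = np.fft.irfft2(np.fft.rfft2(rho, shape) * np.fft.rfft2(kernel, shape), shape)
    # Crop the region aligned with the input grid
    return conv[m - 1:2 * m - 1, n - 1:2 * n - 1]

u = pedf_via_fft(np.array([[1.0, 0.0],
                           [0.0, 2.0]]))
```

For the toy input, the entry at \((0,0)\) equals \(2/(2\sqrt{2})\), exactly the direct Coulombic sum over the other pixels.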
Figure 2 displays the flowchart of the fDDFT. In Step 1, we first used Eq. (1) to delineate the KEDF landscape of an input brain tumor image \(I\in {\mathbb{R}}^{H\times W}\) acquired from an open-source dataset^{33}. Then we constructed the kernel \(k\left({{\varvec{r}}}^{\boldsymbol{^{\prime}}};{\varvec{r}}\right)\) based on its spatial characteristics in two-dimensional Euclidean space. The values around the kernel origin, indicated by the red arrow in the inset, are the reciprocals of the Euclidean distance offsets from the origin point. According to this mathematical property, we defined the structure \(k\left({{\varvec{r}}}^{\boldsymbol{^{\prime}}};{\varvec{r}}\right)\) as the reciprocal distance kernel (RDK). The dimensional factors of the RDK, \({H}_{r}\) and \({W}_{r}\), are the products of the image dimensions and a dimension-reduced factor. Owing to the smoothness of the PEDF landscape (see Step 2), the dimensions of the input image and its corresponding RDK can be compressed for the following operations and the PEDF estimation. The PEDF landscape can then be reconstructed back to its original dimensions without losing information. To apply this advantage to the fDDFT framework, we introduced an AutoEncoder-assisted module^{20} (i.e., the blue blocks) to manage this operation. We found that setting the dimension-reduced factor to 12.5%, i.e., \(\left({H}_{r}, {W}_{r}\right)=0.125\times \left(H, W\right)\), did not affect the experimental results.
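The compress-then-reconstruct idea behind the AutoEncoder-assisted module can be illustrated with crude stand-ins: block-mean downsampling for the encoder and nearest-neighbour upsampling for the decoder (the paper's actual AutoEncoder is not reproduced here; these substitutes only demonstrate why smoothness makes a 12.5% working resolution, factor 8, nearly lossless).

```python
import numpy as np

def encode(a, f):
    """Block-mean downsampling by integer factor f: a crude stand-in for the
    module's encoder (the actual AutoEncoder is not reproduced here)."""
    m, n = a.shape
    return a[:m - m % f, :n - n % f].reshape(m // f, f, n // f, f).mean(axis=(1, 3))

def decode(a, f):
    """Nearest-neighbour upsampling by factor f: a stand-in for the decoder."""
    return np.kron(a, np.ones((f, f)))

# Because the PEDF landscape is smooth, computing it at 12.5% resolution
# (factor 8) and expanding back loses little information.
field = np.add.outer(np.arange(16.0), np.arange(16.0)) / 30.0  # smooth toy landscape
restored = decode(encode(field, 8), 8)
```

On this smooth toy field, the round trip preserves the global mean exactly and keeps the pointwise error small, which is the property the module exploits.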
In Step 2, the two-dimensional fast Fourier transforms (2D-FFTs) of the input image and the RDK establish the PEDF landscape by inverting and mapping their element-wise product back onto the gridized energy space. The symbol \(\odot\) denotes the Hadamard product. It should be emphasized that the 2D-FFT embedded in the AutoEncoder-assisted module is equivalent to a global convolutional operation. The module with a given dimension-reduced factor can effectively reduce the computational complexity caused by the convolutional operations of conventional CNNs. The gamma regularization of Eq. (5) balances the unit mismatch between the KEDF and PEDF. On the other hand, since the PEDF landscape represents the similarity between pixel points, it accentuates the subgraphs in grid space that have attributes of high intensity or dense edge connections, as indicated by the red arrow in the inset. Recent research also validates that multiview learning can reinforce the complementary information of different views and that feature searching in a specific latent space raises the accuracy of object recognition^{3,20}. These findings support the mechanism of our proposed AutoEncoder-assisted module and the 2D-FFT operations in the latent space.
The similarity convergence and the geometric stability
Skulls and normal brain tissues with high intensity in the PEDF landscape often affect the capability of tumor recognition and segmentation. To prevent this, in Step 3, we utilized the Fermi normalization to extract only the subgraph \({\mathcal{U}}_{E}\), which has dense edge connections. The Fermi normalization is^{30}
The parameters \({\rho }_{F}\in {\mathbb{R}}\) and \({\rho }_{S}\in {\mathbb{R}}\) are the global mean and standard deviation of the input matrix \(\rho\), respectively. Here, the input matrix was the PEDF landscape. Established on the foundation of the Fermi–Dirac distribution and the concept of unsupervised learning^{30}, the Fermi normalization combines the advantages of z-score normalization and the sigmoid activation function. The former, \(x=\left(\rho -{\rho }_{F}\right)/{\rho }_{S}\), uses \({\rho }_{F}\) as an energy level, i.e., a Fermi-like level of the PEDF landscape, and \({\rho }_{S}\) to modulate the energy pattern of the PEDF landscape. The latter, \(1/\left({e}^{x}+1\right)\), a frequently used activation function in CNNs, squeezes the range into \(\mathcal{F}\mathcal{N}\in \left[0, 1\right]\). Thus, the Fermi normalization can search out the data points whose intrinsic energy exceeds \({\rho }_{F}\). Moreover, magnetic resonance (MR) images generated from different imaging protocols often have different intensity attributes; the Fermi normalization can also offer them the same ground state. It ensures that pixel elements that should be in the same subgraphs have similar intensity attributes. To avoid influences from high-intensity pixel elements, we used the Fermi normalization to preserve the elements satisfying the condition \(\mathcal{F}\mathcal{N}>0.5\). Similar to the concept of a level set, we continued the procedure until the remaining pixel number was less than half of the total data length. We termed this design the similarity convergence, as illustrated in the sequential insets in Step 3 of Fig. 2.
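The normalization and its iterated use can be sketched directly from the definitions above. The loop below is our reading of the similarity-convergence procedure (keep elements with \(\mathcal{F}\mathcal{N}>0.5\), recompute on the survivors, stop once fewer than half of the original pixels remain); the exact bookkeeping in the paper's algorithm may differ.

```python
import numpy as np

def fermi_normalization(values):
    """FN = 1 / (exp((rho - rho_F) / rho_S) + 1), with rho_F and rho_S the
    global mean and standard deviation, squeezing the output into [0, 1]."""
    x = (values - values.mean()) / values.std()
    return 1.0 / (np.exp(x) + 1.0)

def similarity_convergence(landscape, max_iter=50):
    """Sketch of the similarity-convergence loop: repeatedly keep elements
    with FN > 0.5, recomputing FN on the survivors, until fewer than half of
    the original pixels remain. Note that with FN as written in the text,
    FN > 0.5 selects elements below the Fermi-like level rho_F."""
    vals = landscape.ravel().astype(float)
    keep = np.arange(vals.size)                 # indices of surviving pixels
    for _ in range(max_iter):
        if keep.size < vals.size / 2:
            break
        fn = fermi_normalization(vals[keep])
        keep = keep[fn > 0.5]
    return keep

# Toy landscape: each round halves the surviving set
surviving = similarity_convergence(np.arange(16.0).reshape(4, 4))
```

On the toy ramp, two rounds reduce the surviving set from 16 pixels to 4, at which point the half-length stopping criterion fires.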
On the other hand, the border discontinuity of the PEDF landscape results in geometric deformation of the LDF landscape. As shown in the LDF landscape of the DDFT version in Step 3 of Fig. 2, this geometric instability, delineated by the dashed red line with high curvature, would hide elements having high similarity. A suitable adaptive scaling factor is required to alleviate the geometric deformation that occurred in the DDFT version. Inspired by the backpropagation mechanism in neural networks, we introduced a gradient ascent associated with the statistical properties of the LDF into the fDDFT framework to update the adaptive scaling factor. The adaptive scaling factor expressed in Eq. (5) becomes:
The symbols \({\langle \mathcal{L}\left[\rho \right]\rangle }_{new}\in {\mathbb{R}}\) and \({\langle \mathcal{L}\left[\rho \right]\rangle }_{previous}\in {\mathbb{R}}\) represent the global LDF means in the new and previous states, respectively. The parameter \(\eta\) is the learning rate and is set to 0.5 for convenience. The stopping criterion for updating Eq. (8) is \({\langle \mathcal{L}\left[\rho \right]\rangle }_{new}\ge 0\), which embodies the geometric stability of the physical structure. The base curvature of the LDF landscape in the fDDFT version of Step 3 in Fig. 2, delineated by the dashed red line, became flatter. The elements having high similarity also became significant, as indicated by the red arrow.
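A minimal sketch of the gradient-ascent update follows. Since Eq. (8) is not reproduced in this excerpt, we assume the scale-free form \(\mathcal{L} = \gamma\, t - u\), under which \(\partial \langle \mathcal{L}\rangle / \partial \gamma = \langle t\rangle\) and the ascent step \(\gamma \mathrel{+}= \eta\, \langle t\rangle\) raises \(\langle \mathcal{L}\rangle\) monotonically; only the learning rate \(\eta = 0.5\) and the stopping criterion \(\langle \mathcal{L}\rangle_{new} \ge 0\) are taken from the text.

```python
import numpy as np

def stabilize_gamma(t, u, gamma=0.0, eta=0.5, max_iter=1000):
    """Gradient-ascent sketch of the geometric-stability update.

    ASSUMPTION: L = gamma * t - u, so d<L>/dgamma = <t> and each ascent step
    is gamma += eta * <t>. The loop stops at the criterion given in the
    text, <L>_new >= 0, with the stated learning rate eta = 0.5.
    """
    t_mean, u_mean = t.mean(), u.mean()
    for _ in range(max_iter):
        if gamma * t_mean - u_mean >= 0:   # <L>_new >= 0: geometric stability
            break
        gamma += eta * t_mean              # gradient ascent on <L> w.r.t. gamma
    return gamma, gamma * t - u

gamma, L = stabilize_gamma(np.full((2, 2), 2.0), np.full((2, 2), 5.0))
```

On the toy landscapes, three ascent steps lift \(\gamma\) from 0 to 3, at which point \(\langle \mathcal{L}\rangle = 1 \ge 0\) and the loop stops.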
The two-dimensional projection of the LDF landscape, updated by the geometric stability algorithm, forms an aware feature map in the grid space, which includes the subgraphs composed of the tumor, the skull, and some brain tissues. Note that the criterion \({\langle \mathcal{L}\left[\rho \right]\rangle }_{new}\ge 0\) inhibits the elements in the aware feature map with intensity levels lower than the value of \({\langle \mathcal{L}\left[\rho \right]\rangle }_{new}\). Thus, the aware feature map provides information on the most likely locations and the corresponding boundaries of the brain tumor image. To recognize and segment the subgraphs composed of the tumor elements from the aware feature map, its \(N\) observed subgraphs \({\sum }_{i=1}^{N}{\mathcal{T}}_{{S}_{i}}\) must be compared to the subgraph \({\mathcal{U}}_{E}\) of the PEDF landscape. Once elements in \({\mathcal{T}}_{{S}_{i}}\) and \({\mathcal{U}}_{E}\) have the same spatial coordinates, the subgraphs to which they belong become the tumor candidates:
Through the introduction of pixel connectivity, the candidates of segmented tumor elements and the remaining brain tissues belong to the subgraphs \(\left\{I\cap {\mathcal{T}}_{C}\right\}\) and \(\left\{I/{\mathcal{T}}_{C}\right\}\), respectively, as illustrated in Step 4 of Fig. 2. Therefore, brain tumor image elements are automatically recognized and eventually segmented under the proposed fDDFT framework. We provide the relevant algorithms in the Supplementary Code.
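The candidate rule and the pixel-connectivity step can be sketched with standard connected-component labeling. This is an illustrative implementation, assuming scipy's 4-connectivity labeling as the notion of pixel connectivity; the paper's own connectivity rules (e.g., the edge-number threshold used later) are not reproduced here.

```python
import numpy as np
from scipy import ndimage

def tumor_candidates(aware_map, u_e):
    """Sketch of the candidate rule: a connected subgraph of the aware
    feature map becomes a tumor candidate when any of its elements shares
    spatial coordinates with the high-similarity PEDF subgraph U_E."""
    labels, n = ndimage.label(aware_map > 0)     # disjoint subgraphs (4-connectivity)
    candidates = np.zeros(aware_map.shape, dtype=bool)
    for lab in range(1, n + 1):
        region = labels == lab
        if np.any(region & (u_e > 0)):           # same coordinates as elements of U_E
            candidates |= region
    return candidates

# Two disjoint blobs; only the one overlapping U_E survives as a candidate
fm = np.zeros((5, 5)); fm[0:2, 0:2] = 1.0; fm[3:5, 3:5] = 1.0
ue = np.zeros((5, 5)); ue[4, 4] = 1.0
mask = tumor_candidates(fm, ue)
```

In the toy example, the whole lower-right blob is kept because a single shared coordinate with \({\mathcal{U}}_{E}\) promotes its entire connected subgraph.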
Programming implementation environment
In the data preprocessing procedure, we normalized the image intensity into the range [0, 1] to save memory before removing the noise from each original image. Then we recorded the rectangular dimensions of their subcomponents using bounding boxes. In the fDDFT estimation procedure, the dimension-reduced input image and the RDK underwent a global convolutional operation, and the result was scaled back to its original dimensions to construct the PEDF and KEDF landscapes. The adaptive scaling factor was used to balance the unit mismatch between the PEDF and KEDF, and these regularized functionals composed the corresponding LDF and HDF landscapes. In establishing the geometric stability of the LDF landscape, the procedure continuously updated the adaptive scaling factor and the LDF until the global mean of the LDF was higher than zero. In the similarity convergence step, we defined ranges with high similarity by utilizing Fermi normalizations to define the two-dimensional projection areas of the HDF and LDF landscapes. These high-similarity ranges then proposed candidates for the subgraphs of the aware feature maps in the pixel connectivity step. Only subgraphs with more edges than the smallest permissible edge number and with dimensions 75% smaller than those of the bounding boxes could join the aware feature map. We then sequentially arranged these aware feature maps to establish the three-dimensional brain tumor structure by choosing the components with the highest energy level and the largest voxel count. Eventually, we employed the three-dimensional soft dice score metric to calculate the accuracy of the segmented brain tumor images extracted from the aware feature map^{30}:
The symbols \(H\times W\times D\) and \(\epsilon ={10}^{-5}\) are the total grid number in a three-dimensional grid space and a factor that prevents \(DS\) from diverging, respectively. The parameters \({t}_{i}\in \left\{\mathrm{0,1}\right\}\) and \({g}_{i}\in \left\{\mathrm{0,1}\right\}\) are the binary tumor image segmentation prediction and the ground truth label, respectively.
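The metric can be written compactly from these definitions. Since the display equation is not reproduced in this excerpt, the placement of \(\epsilon\) below follows the common soft dice form and is an assumption.

```python
import numpy as np

def soft_dice(t, g, eps=1e-5):
    """Three-dimensional soft dice score over an H x W x D grid; eps keeps
    the ratio finite when both masks are empty. A common form, assumed here:
        DS = (2 * sum(t_i * g_i) + eps) / (sum(t_i) + sum(g_i) + eps)
    """
    t, g = t.ravel().astype(float), g.ravel().astype(float)
    return (2.0 * (t * g).sum() + eps) / (t.sum() + g.sum() + eps)

pred = np.ones((2, 2, 2))          # toy binary prediction volume
truth = np.ones((2, 2, 2))         # matching ground truth -> score near 1
score = soft_dice(pred, truth)
```

A perfect overlap yields a score of 1, and fully disjoint masks yield a score near 0, bounded away from a division by zero by \(\epsilon\).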
Results
To exhibit the ingenuity of the fDDFT and its advantage on low-cost hardware, we deliberately chose a CPU-based operating system with a dual-core CPU @ 3.8 GHz and 32 GB of memory. The CPU times for automatic brain tumor recognition and segmentation (see Step 4 of Fig. 2) under the DDFT and fDDFT frameworks were 10.4 and 0.05 s, respectively. It should be emphasized that the conventional DDFT employed 4-thread parallel computations to overcome the inevitably high computational complexity of its PEDF estimation. The significant reduction in computational time consumption (about 208x) achieved by the fDDFT reveals its superiority over the previous version. We also investigated its capability on other types of images^{33}, and Fig. 3 illustrates the relevant segmentation results. As shown in the upper-left corner of each panel, the input images exhibit different outlooks. Figure 3a and b are coronal and axial views of brain MR images with notations labeled by radiologists, as the red arrows indicate. Figure 3c has a white margin surrounding the MR image. Even with these interferences, the fDDFT still generates precise tumor segmentations. In addition, the fDDFT provides successful segmentation of small tumors, as shown in Fig. 3d.
The feature-aware recognition and segmentation of three-dimensional brain tumor image datasets
To verify the fDDFT's capability for pattern recognition and segmentation of high-dimensional data structures, we exploited the Brain Tumor Segmentation Challenge 2020 (BraTS 2020) dataset^{34,35,36}. The dataset comprises magnetic resonance imaging (MRI) scans with 369 training sets and corresponding labels. Each trial includes 155 slices of 240 pixels \(\times\) 240 pixels and has multimodal images acquired from T1-weighted, T1 contrast-enhanced (T1CE), T2-weighted, and T2 fluid-attenuated inversion recovery (T2 FLAIR) sequences. The image labels comprise the necrotic and non-enhancing tumor core (NCR/NET), the GD-enhancing tumor (ET), the peritumoral edema (ED), and their union, i.e., the whole tumor (WT). Thus, we can treat this circumstance as 369 isolated systems. Each system has dimensions of \(H\times W\times D=\) 240 grids \(\times\) 240 grids \(\times\) 155 grids and has specific compounds embedded in its own soft tissue. Since the task here is to identify the locations and appearances of these compounds under the fDDFT framework, we employed only the T2 FLAIR datasets as inputs and considered the contribution of the WT image labels in this experiment.
Diverse intensity distributions and structural combinations of edema, necrosis, tumor cores, and brain tissues raise the difficulty of pattern recognition. This is like non-negligible defects doped within a compound, which widen the original energy levels. From the GNN's perspective, it is similar to an induced subgraph of a complete graph that misses its edges and then loses the connection or similarity to other subgraphs. Meanwhile, intensity levels are inconsistent between MRI sequential slices or within trial scans due to different imaging protocols and apparatuses, which compelled us to amend the similarity convergence algorithm. Although all cases in Figs. 2 and 3 have obvious brain tumor patterns in their image representations, the search task in three-dimensional brain tumor image datasets must first judge whether an MR slice has elements belonging to the brain tumor. The problems mentioned above would reduce the accuracy of the similarity convergence and then obstruct the operations of pattern recognition and segmentation.
To reinforce the pattern recognition capability, we introduced the LDF and HDF into the similarity convergence algorithm. We replaced the subgraph group of the PEDF with that of the HDF, \({\mathcal{H}}_{E}\), to find the candidate subgraph of tumors:
Figure 4 illustrates the evolution of these functional landscapes under the similarity convergence procedure. We used the area of the two-dimensional projection of the PEDF landscape as the criterion for reaching similarity convergence. The Fermi normalization was employed to update these projection areas of the functionals in the algorithm. The Fermi normalization truncates and saves the areas with the top 50% of energy levels in each round, prompting these areas to move toward the ranges with high similarity. The segmentation results in Fig. 4 reveal that introducing the HDF and LDF projections raises the performance of brain tumor recognition compared to that of the original PEDF projection. The shapes of the PEDF projection in the initial and final states of Fig. 4a are similar, as indicated by the dashed lines, so its landscape suggests the range of high similarity as expected. However, the PEDF projection in the final state of Fig. 4b suffers a significant deformation compared to its shape in the initial state. There are three separate graphic blocks in the input image. Even though the PEDF projection also points to the high-similarity range, it puts too much attention on the largest block. From the perspective of neural networks, the PEDF projection in the similarity convergence causes overfitting on the feature map. Compared to the outcome of the PEDF projection, the use of the HDF and LDF presents a better capability for brain tumor pattern recognition.
Statistical results of feature-aware segmentation and limitations on low-featured subgraph recognition
Figure 5 illustrates the statistical results of dice score estimations and the three-dimensional reconstructions of representative brain tumors. To simplify the procedure of pixel connectivity, we only allowed subgraphs whose smallest edge number exceeds \(3\pi \sqrt{{\Vert I\Vert }_{0}}/2\) to stay on the aware feature map of the LED, where \({\Vert I\Vert }_{0}\) is the \({\ell }_{0}\) norm of an input image \(I\in {\mathbb{R}}^{H\times W}\). The blue bars in Fig. 5a show the statistical distribution of the dice score over the whole BraTS 2020 set. The mean soft dice score of the unsupervised brain tumor image segmentation performed by aware feature maps is 0.6765. The quartiles are 0.5995, 0.7529, and 0.8402; the corresponding case numbers are 277, 185, and 92. The CPU time of the fDDFT execution for each trial was about 1.76 s on average. The median dice score of unsupervised brain tumor recognition and segmentation using the fDDFT thus reaches the standard performance of conventional deep neural networks^{8,30}. To visualize the unsupervised segmentation results and discuss the limitations, we present three representative cases to elaborate on the interactions between low-featured subgraphs and aware features. We also describe how these situations are analogous to the behavior of impurities near the Fermi surface in a chaotic physical system. Figure 5b displays the two-dimensional aware feature maps and low-featured subgraphs of the three cases. The numbers 1, 2, and 4 marked on each ground truth represent the NCR/NET, ED, and ET labels, respectively. With the highest soft dice score, case No. 315 has a low-featured subgraph connected to the tumor body because of the low heterogeneity between the brain tissue and the tumor, resulting in an imperfect dice score. It is similar to a few impurities occupying states near the Fermi surface and slightly perturbing the energy levels. In Fig. 5c, we exhibit the reconstructed three-dimensional brain tumor of No. 315 and the label structure. The red arrow below the tumor indicates an undesired part formed by a low-featured subgraph embedded in the native aware feature map.
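The pixel-connectivity rule above can be sketched as a connected-component filter. In this illustrative Python version (not the released MATLAB code), we label 4-connected components of a binary map and, as a simplifying assumption, use a component's pixel count as a stand-in for its smallest edge number when comparing against the permissible value \(3\pi \sqrt{{\Vert I\Vert }_{0}}/2\):

```python
import math
from collections import deque

def components(mask):
    """4-connected components of a binary image (nested lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                comp, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:
                    r, c = queue.popleft()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w \
                                and mask[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                comps.append(comp)
    return comps

def filter_small(mask):
    """Drop components below the permissible size 3*pi*sqrt(||I||_0)/2,
    using the pixel count as a proxy for the subgraph edge number."""
    n0 = sum(v for row in mask for v in row)  # l0 norm of the binary image
    threshold = 3 * math.pi * math.sqrt(n0) / 2
    out = [[0] * len(mask[0]) for _ in mask]
    for comp in components(mask):
        if len(comp) > threshold:
            for r, c in comp:
                out[r][c] = 1
    return out

# A 10x10 blob (100 px) survives; a 2x2 blob (4 px) is removed,
# since the permissible size here is about 48 pixels.
mask = [[0] * 20 for _ in range(20)]
for i in range(10):
    for j in range(10):
        mask[i][j] = 1
for i, j in ((15, 15), (15, 16), (16, 15), (16, 16)):
    mask[i][j] = 1
cleaned = filter_small(mask)
print(sum(map(sum, cleaned)))  # → 100
```

This is the mechanism by which tiny low-featured subgraphs are pruned from the aware feature map while the tumor body is retained.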
Cases No. 225 and No. 82 represent huge and tiny tumor bodies, respectively. Figure 5d,e illustrates their reconstructed results. The aware feature map of No. 225 shows an unsatisfying result: it misses the desired GD-enhancing tumor, shown in the middle panel of Fig. 5b. We deduce that pixels with low energy and weak pair connection, as seen in the input image, block the feature extraction during similarity convergence. It is like several low-energy impurities filled below the Fermi surface; they obstruct electron transmission along the surface and thus mask the intrinsic material properties. Since giant tumors often produce a wide range of GD-enhancing tumor representations in MRI, low-energy pixels of these representations obstruct unsupervised tumor pattern recognition, as shown in the reconstructed result of Fig. 5d. We also inspected the capability of pattern recognition and segmentation on tiny tumors. The low-featured subgraph of No. 82 displays a low heterogeneity between the tumor body and the thalamus, as shown in the bottom panel of Fig. 5b. Hence, the LDF could hardly distinguish their energy difference in the procedure of similarity convergence. This situation is analogous to impurities occupying states above the Fermi surface, whose induced carriers dominate the electron transmission within the conduction band; signals in the conduction band thus mix the transmission with the induced carriers. In other words, low heterogeneity also significantly limits the capability of brain tumor pattern recognition and segmentation, as shown in Fig. 5e.
To prevent the specific choice of the permissible smallest edge number from interfering with the statistical result of the soft dice score, we also replaced the value with \(\pi \sqrt{{\Vert I\Vert }_{0}}\). The pink bars in Fig. 5a exhibit the corresponding statistical outcomes. It should be noted that we only applied this relaxed value to the trials whose soft dice scores fell below the mean of the original distribution. In this additional experiment, the quartiles are 0.6344, 0.7594, and 0.8414. As shown in Fig. 5a, the number of cases with a dice score above 0.7 increased significantly, but the other cases moved into the range of low scores. Thus, this measure did not lead to a significant increase in the average dice score. In other words, in a system as complex as a chaotic physical one, the diverse modality of brain tumor images is difficult to describe by naively controlling the permissible smallest edge number.
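The selective second pass described above can be sketched as follows; the score values and the `rerun` callback (standing in for re-executing the fDDFT with the relaxed value \(\pi \sqrt{{\Vert I\Vert }_{0}}\)) are hypothetical:

```python
def two_stage_scores(scores, rerun):
    """Re-run only the trials whose first-stage soft dice score falls
    below the mean, keeping all other trials as-is. `rerun` maps a
    trial index to its score under the relaxed edge-number threshold."""
    mean = sum(scores) / len(scores)
    return [rerun(i) if s < mean else s for i, s in enumerate(scores)]

# Toy example: a hypothetical rerun lifts the two below-mean trials.
first = [0.80, 0.55, 0.70, 0.60]        # mean = 0.6625
second = two_stage_scores(first, lambda i: first[i] + 0.05)
print([round(s, 2) for s in second])    # → [0.8, 0.6, 0.7, 0.65]
```

Only the below-mean trials are recomputed, which is why the relaxed threshold reshapes the distribution without guaranteeing a higher mean.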
The synergy between the fDDFT and a deep learning model
To inspect the synergic performance of applying the fDDFT to a conventional deep neural network, we adapted the dimension-fusion U-Net (D-UNet)^{37}. The hardware in this experiment comprised an AMD Ryzen Threadripper PRO 3955WX 16-core CPU and an NVIDIA GeForce RTX 3090 GPU. The initial learning rate was \(1.5\times {10}^{-4}\), and the batch size and number of epochs were 16 and 60, respectively. We randomly picked 85% and 15% of the BraTS 2020 training set as the training and validation sets. We used the soft dice loss^{30} as the loss function and employed Adam as the optimizer in the training procedure. Table 1 lists the performance comparison between the naïve D-UNet and the synergic method, including the training time, inference time, and segmentation capability. In the synergic method, we applied the fDDFT to the whole dataset and fed the transformed results into the D-UNet as inputs. The total preprocessing time of the fDDFT is about 14.4 min. The training and inference times were significantly reduced, by 58% and 51%, respectively. Meanwhile, all segmentation scores of the synergic method are superior to those of the naïve D-UNet. These results also match the GDL’s perspective^{5}.
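For context, a minimal NumPy sketch of one common soft dice loss formulation is given below; the exact variant used in training follows ref. 30 and may differ in details such as squared denominators or per-channel averaging:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - 2|P∩G| / (|P| + |G|), with `pred` holding probabilities
    in [0, 1] and `target` the binary ground-truth mask."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

# Perfect overlap gives ~0 loss; fully disjoint masks give ~1.
a = np.array([[1.0, 1.0], [0.0, 0.0]])
b = np.array([[1, 1], [0, 0]])
print(round(soft_dice_loss(a, b), 6))      # → 0.0
print(round(soft_dice_loss(a, 1 - b), 6))  # → 1.0
```

Because the prediction enters as soft probabilities rather than a hard mask, the loss is differentiable and suitable for gradient-based training.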
Discussion
We propose the fast data density functional transform (fDDFT) for feature-aware unsupervised pattern recognition and segmentation based on the DFT configuration and the GDL architecture. Under the fDDFT framework, we create an AutoEncoder-assisted module to reduce the computational complexity of global convolutional operations, a geometric stability technique to enhance the capability of pattern recognition, and a mechanism of similarity convergence to assist feature selection. We also utilize three-dimensional brain tumor structures to imitate soft matter in a chaotic physical system and inspect the capability of unsupervised soft matter pattern recognition and segmentation. The performance of this framework achieves the standard requirement for conventional deep neural networks. The representative cases reflect ordinary circumstances in actual physical systems and thus point out the limitations of the fDDFT: the procedures of similarity convergence and pixel connectivity have to face the challenges of low intensity, weak pair connection, and low heterogeneity.
Integrating the data structures from DFT and contemporary deep neural networks is another significant contribution of this article. Even though managing data structures under graph architectures is still mainstream, the fDDFT regularizes and transforms these structures into the Euclidean space under the GDL architecture and GNN metrics. This strategy bridges the non-Euclidean and grid spaces and reinforces the capability of data structural visualization for computational modeling applications.
On the other hand, we have to emphasize that using the two-stage calculation to obtain better soft dice score distributions, as shown in Fig. 5a, is not feasible in deep learning. Deep learning methods pursue a direct route to lesion pattern recognition and segmentation because they assume the corresponding labels are not accessible and the training procedures are time-consuming; modifying important parameters in deep learning implies possibly retraining the whole model. Fortunately, the fDDFT can easily accommodate parameter rearrangement in any procedure of the model pipeline because of its short inference time of about 2 s. The fDDFT can thus be a precursor of modern deep neural networks for feature selection and enhancement. The synergic experiment has validated that our proposed method significantly improves the training time, inference time, and segmentation capability of neural networks. This technical advantage could benefit clinical investigations, allowing clinicians to verify previous results, modify parameters, and obtain updated segmented outcomes.
Data availability
The BraTS 2020 dataset is publicly available (https://www.med.upenn.edu/cbica/brats2020/data.html). All other datasets that support this work are also available from the corresponding references. The Matlab code developed in this work is available in the Supplementary file.
References
Bauer, S., Wiest, R., Nolte, L. & Reyes, M. A survey of MRI-based medical image analysis for brain tumor studies. Phys. Med. Biol. 58, R97–R129. https://doi.org/10.1088/0031-9155/58/13/R97 (2013).
Rajinikanth, V., Satapathy, S. C., Fernandes, S. L. & Nachiappan, S. Entropy based segmentation of tumor from brain MR images – A study with teaching learning based optimization. Pattern Recognit. Lett. 94, 87–95. https://doi.org/10.1016/j.patrec.2017.05.028 (2017).
Ning, Z., Tu, C., Di, X., Feng, Q. & Zhang, Y. Deep cross-view co-regularized representation learning for glioma subtype identification. Med. Image Anal. https://doi.org/10.1016/j.media.2021.102160 (2021).
Su, Z.J. et al. Attention U-net with dimension-hybridized fast data density functional theory for automatic brain tumor image segmentation. In Lecture Notes in Computer Science (eds Crimi, A. & Bakas, S.) (Springer Nature Switzerland, 2021).
Bronstein, M. M. et al. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 34, 18–42. https://doi.org/10.1109/MSP.2017.2693418 (2017).
Winkels, M. & Cohen, T. S. 3D G-CNNs for pulmonary nodule detection. Preprint at arXiv:1804.04656 (2018).
Cohen, T. S., Weiler, M., Kicanaoglu, B. & Welling, M. Gauge equivariant convolutional networks and the icosahedral CNN. Preprint at arXiv:1902.04615 (2019).
Chen, C.C., Juan, H.H., Tsai, M.Y. & Lu, H.H.S. Unsupervised learning and pattern recognition of biological data structures with density functional theory and machine learning. Sci. Rep. https://doi.org/10.1038/s41598-017-18931-5 (2018).
Zhou, Y., Wu, J., Chen, S. & Chen, G. H. Toward the exact exchangecorrelation potential: A threedimensional convolutional neural network construct. J. Phys. Chem. Lett. 10, 7264–7269. https://doi.org/10.1021/acs.jpclett.9b02838 (2019).
Manukian, H., Pei, Y. R., Bearden, S. R. B. & Di Ventra, M. Mode-assisted unsupervised learning of restricted Boltzmann machines. Commun. Phys. https://doi.org/10.1038/s42005-020-0373-8 (2020).
Meyer, R., Weichselbaum, M. & Hauser, A. W. Machine learning approaches toward orbital-free density functional theory: Simultaneous training on the kinetic energy density functional and its functional derivative. J. Chem. Theory Comput. 16, 5685–5694. https://doi.org/10.1021/acs.jctc.0c00580 (2020).
Ballard, A. J. et al. Energy landscapes for machine learning. Phys. Chem. Chem. Phys. 19, 12585–12603. https://doi.org/10.1039/c7cp01108c (2017).
Zhou, Q. et al. Real-space imaging with pattern recognition of a ligand-protected Ag_{374} nanocluster at sub-molecular resolution. Nat. Commun. https://doi.org/10.1038/s41467-018-05372-5 (2018).
Wei, J. et al. Machine learning in materials science. InfoMat 1, 338–358. https://doi.org/10.1002/inf2.12028 (2019).
Li, H. et al. A density functional tight binding layer for deep learning of chemical Hamiltonians. J. Chem. Theory Comput. 14, 5764–5776. https://doi.org/10.1021/acs.jctc.8b00873 (2018).
Zhang, Y. et al. Unsupervised discovery of solid-state lithium ion conductors. Nat. Commun. https://doi.org/10.1038/s41467-019-13214-1 (2019).
Santos-Silva, T., Teixeira, P. I. C., Anquetil-Deck, C. & Cleaver, D. J. Neural-network approach to modeling liquid crystals in complex confinement. Phys. Rev. E 89, 053316. https://doi.org/10.1103/PhysRevE.89.053316 (2014).
Sakano, M. N. et al. Unsupervised learning-based multiscale model of thermochemistry in 1,3,5-trinitro-1,3,5-triazinane (RDX). J. Phys. Chem. A 124, 9141–9155. https://doi.org/10.1021/acs.jpca.0c07320 (2020).
Packwood, D. M. Exploring the configuration spaces of surface materials using time-dependent diffraction patterns and unsupervised learning. Sci. Rep. https://doi.org/10.1038/s41598-020-62782-6 (2020).
Verriere, M. et al. Building surrogate models of nuclear density functional theory with Gaussian processes and autoencoders. Front. Phys. https://doi.org/10.3389/fphy.2022.1028370 (2022).
Elbaz, Y., Furman, D. & Toroker, M. C. Modeling diffusion in functional materials: From density functional theory to artificial intelligence. Adv. Funct. Mater. https://doi.org/10.1002/adfm.201900778 (2019).
Kuban, M., Rigamonti, S., Scheidgen, M. & Draxl, C. Density-of-states similarity descriptor for unsupervised learning from materials data. Sci. Data https://doi.org/10.1038/s41597-022-01754-z (2022).
Wang, X., Girshick, R., Gupta, A. & He, K. Non-local neural networks. IEEE/CVF Conf. CVPR, 7794–7803 (2018).
Watters, N. et al. Visual interaction networks: Learning a physics simulator from video. Conf. NIPS (2017).
Chen, C.C., Tsai, M.Y., Kao, M.Z. & Lu, H.H.S. Medical image segmentation with adjustable computational complexity using data density functionals. Appl. Sci. https://doi.org/10.3390/app9081718 (2019).
Hsu, F.S. et al. Lightweight deep neural network embedded with stochastic variational inference loss function for fast detection of human postures. Entropy https://doi.org/10.3390/e25020336 (2023).
Yeo, B. C., Kim, D., Kim, C. & Han, S. S. Pattern learning electronic density of states. Sci. Rep. https://doi.org/10.1038/s41598-019-42277-9 (2019).
Mezey, P. G. The holographic electron density theorem and quantum similarity measures. Mol. Phys. 96, 169–178 (1999).
Bouritsas, G., Frasca, F., Zafeiriou, S. & Bronstein, M. M. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Trans. Pattern Anal. Mach. Intell. 45, 657–668. https://doi.org/10.1109/TPAMI.2022.3154319 (2023).
Tai, Y.L., Huang, S.J., Chen, C.C. & Lu, H.H.S. Computational complexity reduction of neural networks of brain tumor image segmentation by introducing Fermi-Dirac correction functions. Entropy https://doi.org/10.3390/e23020223 (2021).
Zaiser, M. Local density approximation for the energy functional of three-dimensional dislocation systems. Phys. Rev. B 92, 174120. https://doi.org/10.1103/PhysRevB.92.174120 (2015).
Elliott, W. D. & Board, J. A. Jr. Fast Fourier transform accelerated fast multipole algorithm. SIAM J. Sci. Comput. 17(2), 398–415 (1996).
Zhang, Y. & Wu, L. An MR brain images classifier via principal component analysis and kernel support vector machine. Prog. Electromagn. Res. 130, 369–388 (2012).
Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024. https://doi.org/10.1109/TMI.2014.2377694 (2015).
Bakas, S. et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data https://doi.org/10.1038/sdata.2017.117 (2017).
Bakas, S. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. Preprint at arXiv:1811.02629 (2018).
Zhou, Y., Huang, W., Dong, P., Xia, Y. & Wang, S. D-UNet: A dimension-fusion U shape network for chronic stroke lesion segmentation. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 940–950. https://doi.org/10.1109/TCBB.2019.2939522 (2021).
Acknowledgements
This research was partially supported by the National Science and Technology Council, Taiwan, under grant numbers 111-2221-E-008-087, 110-2118-M-A49-002-MY3, and 111-2634-F-A49-014.
Author information
Authors and Affiliations
Contributions
H.H.S.L. supervised and suggested the work. S.J.H. and C.C.C. established the theoretical framework and the corresponding algorithms. S.J.H., C.C.C., and Y.K. designed the pipeline of simulations and arranged the computing machine. All authors have discussed the results and contributed to the article’s writing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, SJ., Chen, CC., Kao, Y. et al. Feature-aware unsupervised lesion segmentation for brain tumor images using fast data density functional transform. Sci Rep 13, 13582 (2023). https://doi.org/10.1038/s41598-023-40848-5