Abstract
Dense disparities among multiple views are essential for estimating the 3D architecture of a scene based on the geometrical relationship between the scene and the views or cameras. Scenes with larger extents of homogeneous textures, differing scene illumination among the multiple views and with occluding objects affect the accuracy of the estimated disparities. Markov random fields based methods for disparity estimation address these limitations using spatial dependencies among the observations and among the disparity estimates. These methods, however, are limited by spatially fixed and smaller neighborhood systems or cliques. Recent learningbased methods generate rich set of stereo features for generating cost volume and estimating disparity. In this work, we present a new factor graphbased probabilistic graphical model for disparity estimation that allows a larger and a spatially variable neighborhood structure determined based on the local scene characteristics. Our algorithm improves the accuracy of disparity estimates in stereo image pairs with varying texture and illumination characteristics by enforcing spatial dependencies among scene characteristics as well as among disparity estimates. We evaluated our method using the Middlebury benchmark stereo datasets and the Middlebury evaluation dataset version 3.0 and compared its performance with recent stateoftheart disparity estimation algorithms. Our factor graphbased algorithm provided disparity estimates with higher accuracy when compared to the recent nonlearning and learningbased disparity estimation algorithms. The factor graph formulation can be used for obtaining maximum a posteriori estimates from models or optimization problems with complex dependency structure among hidden variables. The strategies of using a priori distributions with shorter support and spatial dependencies were useful for reducing the computational cost and improving message convergence in the model. The factorgraph algorithm is also useful for other dense estimation problems such as optical flow estimation.
Similar content being viewed by others
Introduction
The 3D architecture of an object or a scene can be estimated from two or more views of the scene by determining dense correspondences among the multiple views of the scene. A disparity map comprised of dense pixellevel correspondences between images in a stereo pair can be estimated using stereo matching algorithms. Stereo disparity estimation has wideranging applications such as robot navigation^{1}, aerial data analysis^{2}, image sequence analysis^{3}, and 3D surface reconstruction^{4}. The presence of large untextured regions (homogeneous intensity), occluding objects and uneven intensity distributions in the stereo pair of images introduce significant challenges in estimating stereo disparity maps.
Two broad categories of stereo matching methods are windowbased and energybased algorithms^{5}. The energybased algorithms^{6} are global methods with a cost function defined as a function of the entire image extent^{7}. Windowbased or local methods utilize a finite support window to define the cost function and are suitable for realtime applications. However, regions with homogeneous texture and occlusion affect the accuracy of the disparity estimated using local methods.
In general, stereo disparity estimation procedures include an initial cost calculation step, a cost aggregation step, an optimal disparity estimation step, and a disparity refinement step. Some of the successful and commonly used disparity cost measures are normalized crosscorrelation (NCC)^{8}, sum of squared differences (SSD)^{9}, sum of absolute differences (SAD)^{10}, gradientbased measures^{11}, featurebased measures^{12}, rank transform (RT)^{13}, and census transform (CT)^{14}. Cost aggregation is useful for minimizing matching uncertainties and improving the accuracy of the estimates^{15}. During cost aggregation, initial disparity cost measures within the support region of each pixel are aggregated^{16} such as using a low pass filter with a fixed kernel size or a variable support window (VSW)^{17}, using adaptive support weight (ASW)^{18}, and using a crossbased support window^{19}. Using edgepreserving filters such as a bilateral filter (BF)^{18,20} and guided image filter (GIF)^{21} for cost aggregation provides a significant improvement on the final disparity estimates over the initial disparity estimates. An optimal disparity map or disparity estimates are obtained from the aggregated cost volume using optimization procedures such as the winnertakeall (WTA)^{22} or using global optimization procedures such as dynamic programming(DP)^{23}, graph cut (GC)^{24}, intrinsic curves^{25}, and belief propagation^{26} methods.
Recently, learningbased methods have been successful in estimating stereo disparity with higher accuracy. Among the recent learningbased disparity estimation techniques, FCDCNN is a lightweight, fully convolutional and densely connected neural network^{27} based on a densely connected convolution network architecture^{28} and a CNN based patchmatching technique for estimating stereo disparity^{29}. FCDCNN is a fully convolutional siamese neural network with individual network branches with shared weights for simultaneously processing left and right stereo images. With 5 convolutional layers and 64 filters in each layer, the network generated features were used to estimate similarity between a reference patch centered at each pixel in the left image and candidate patches from the right image in the stereo pair. For each pixel location, a cosine similarity measure was estimated using the corresponding feature vectors from the left and right channels of the network. The cost volume was filtered along all three dimensions and an initial disparity map was obtained using a winnertakesall strategy. Accuracy of the initial disparity estimate at each pixel was assessed using a leftright consistency check. Inconsistent locations were identified and removed from the disparity map. Disparity values at these inconsistent locations were estimated based on classification of these locations as either part of homogeneous scene background or textured foreground objects, and using additional morphological processing of the disparity map. When compared to the MCCNNACRT^{29}, GCNet^{30}, and PSMNet^{31}, FCDCNN is a lightweight network with 0.37 M trainable parameters with higher postprocessing computational cost.
A successful strategy for improving accuracy of disparity estimates is to utilize spatial dependencies of the scene characteristics as well as of the disparity estimates. Among the probabilistic inference formulations for estimating dense geometric correspondences^{32}, Markov Random Fields (MRF) based approaches have been successful in modeling spatial geometrical dependencies^{33}. In brief, unknown true disparities among multiple views of a scene are modeled as random variables on the pixel lattice (random field) and dependence among random variables is assumed to follow Markov property. Thus, MRF represents a joint distribution of the random field using conditional distributions of each of the random variables. In learningbased MRF models, parameters of the MRF potential functions are either learned separately from training data^{34,35} or along with the unknown states of the random variables^{36}.
One of the limitations of the MRF models is that the neighborhood system used for enforcing spatial dependencies needs to be maximal. Further, the chosen dependency structure is uniformly enforced for all the random variables. Therefore, pairwise cliques or \(2 \times 2\) cliques are more commonly used in MRF models. While learning methods are available for optimizing MRF parameters and neighborhood structure, they are generally limited to specific tasks.
Establishing dense correspondences among geometrical coordinates of multiple views is an illposed problem due to scene occlusion. Occluded regions are locations that are visible only in one of the images in a stereo image pair. Therefore, it is not possible to directly estimate disparities in these locations. Other primary sources of errors in disparity estimation are presence of larger scene extents with smoother texture characteristics, differences in scene illumination with respect to the multiple views/observers, and presence of sharp depth or disparity boundaries.
In this paper, we present a new factor graphbased probabilistic graphical model (FGS algorithm) that provides more accurate estimates of stereo disparity by using spatial dependency of scene characteristics (image intensity) and of disparity estimates. Figure 1 shows a schematic representation of computational steps in the FGS algorithm for generating a posterior disparity estimates for each stereo image pair. From a reference (left) image in each stereo pair, locations of various scene elements (foreground objects, and background elements) were grouped into image segments. A sparse but highly confident set of disparity estimates within each segmented region were used to build a priori distribution of disparities within each segmented region. In a factor graph model, all pixel locations within an image segment were associated with the respective a priori disparity distribution. For each pixel location, a posteriori disparity was estimated by enforcing spatial dependency among disparity estimates using a larger but spatially variable neighborhood system. A larger neighborhood system improves disparity estimates in regions with smoother texture and occlusion. A spatially variable neighborhood system provides higher disparity contrast along the depth boundaries. A final disparity map was obtained by postprocessing the a posterior disparity map generated by the factor graph.
The main contributions of this study are as follows:

We present a new factor graphbased disparity estimation algorithm. To the best of our knowledge, this is the first application of factor graphs for estimating stereo and multiview disparity.

The probabilistic graphical model allows use of larger and spatially variable neighborhood structure for enforcing belief exchange and dependency among highly related neighboring pixel locations. While larger neighborhood systems are beneficial for improving disparity estimates, the computational cost of belief exchange among neighbors in a graphical model is prohibitively high. Our FGS framework allows use of larger neighborhood systems with reduced computational steps by retaining disparitydependency relationship only among highly related neighboring locations.

The FGS algorithm generates disparity maps with higher contrast along depth boundaries. This is possible because of our use of spatiallyvariable dependency structures while retaining dependency only with highly related neighboring locations (e.g. by discarding dependencies with pixels across an object or depth boundary).

We present strategies for reducing computational cost and for accelerating convergence of belief message exchanges in the factor graph namely 1. by reducing computational steps required for calculating belief measures in the factor graph by using binary potential functions; 2. by reducing number of variables as well as their limits of summation for marginalizing joint distributions during belief propagation in the factor graph; and 3. by generating a priori distributions with shorter support.

The factor graph structure can be used for determining a posteriori estimates in optimization problems with complex dependency structure among hidden variables.
We demonstrate the performance of the proposed probabilistic factor graph model by conducting extensive experiments using the Middlebury benchmark stereo datasets^{37,38,39,40} and compare its performance with other stateoftheart disparity estimation algorithms using Middlebury evaluation dataset version 3.0^{5}.
The remainder of the paper is structured as follows. In Section "Probabilistic factor graph model for disparity estimation", we present a detailed description of the new probabilistic factor graphbased stereo disparity estimation (FGS) algorithm. We present our experimental results in Section "Experimental results and discussion" and conclude this work in Section "Conclusions".
Probabilistic factor graph model for disparity estimation
For notational convenience, a linear index \(i \in \left\{ 1, \ldots , MN \right\}\) was used to identity each of the pixels in images of size \(M \times N\) pixels. Rectified stereo image pairs were corrected for any illumination differences using a homomorphic filter^{41}. In brief, each image I(i) is modeled as an interaction between an illumination component l(i) and a reflectance component r(i) as \(I(i) = l(i) \, r(i)\). Assuming that the illumination varies gradually over the imaging area, the illumination variation is subtracted from each image in the \(\log\) domain using a highpass filter. For each of the rectified and illumination corrected stereo image pairs, disparities between the left and right images were calculated by using the left image as reference.
Graph structure and message passing for approximate inference
A list of symbols used in the remainder of the paper is presented in Table 1. Figure 2 shows the schematic diagram of a factor graph model designed for estimating a disparity map from each stereo image pair. A disparity map is comprised of disparities \(\mathbb {D}= \left\{ d_i: d_i \in [d_{\min }, d_{\max } ] \subset \mathbb {Z} \right\} _{i=1}^{MN}\) at corresponding image pixel locations \(\left\{ 1, \ldots , MN \right\}\) in the reference image. The bipartite graph is comprised of a set of variable nodes \(\mathbb {V}\) and a set of factor nodes \(\mathbb {F}= \mathbb {E}\cup \mathbb {S}\). The variable nodes \(\mathbb {V}= \left\{ 1, \ldots , MN \right\}\) represent disparity labels assigned to each pixel. Evidence factor nodes \(\mathbb {E}= \left\{ 1, \ldots , MN \right\}\) provide prior beliefs or evidences in assigning possible disparity labels to pixel locations. Dependency factor nodes \(\mathbb {S}= \left\{ 1, \ldots , MN \right\}\) are used to model spatial dependencies among the disparity labels assigned to the neighboring pixels.
A random variable \(d_i \in \mathbb {D}\) is assigned to each variable node \(i \in \mathbb {V}\) to represent the disparity label assigned to the \(i\hbox {th}\) pixel location. Each \(j\hbox {th}\) evidence factor node in \(\mathbb {E}\) is connected onetoone with the corresponding \(i\hbox {th}\) variable node in \(\mathbb {V}\) to incorporate prior belief or evidence in determining the disparity labels \(d_i\). To represent the influence of each pixel location on its neighboring pixels, each variable node \(i \in \mathbb {V}\) is connected to one or more dependency factor nodes \(\left\{ k \in \mathbb {S} \right\}\) using localized intensity characteristics in the reference image as described in Section "Determining variable nodes associated with each dependency factor node".
Let, ne(i) represent the set of neighboring nodes connected with any given node i; \(ne(i)\setminus j\) represent a set of all neighboring nodes of i excluding node j, where \(\setminus\) represents the set subtraction operation; \(\mathbb {D}_j = \left\{ d_i, \forall i \in \mathbb {V}: i \in ne\left( j \in \mathbb {F}\right) \right\}\) be a collection of random variables associated with any factor node \(j \in \mathbb {F}\); and let, \(\mathbb {D}_j\setminus i \subseteq \mathbb {D}\) be a collection of random variables in \(\mathbb {D}_j\) except \(d_i\). An evidence potential function \(\psi _j\) associated with each factor node \(j \in \mathbb {E}\) is defined as a function of random variables \(\mathbb {D}_j\) of its neighboring nodes \(\psi _j\left( \mathbb {D}_j\right) = \psi _j\left( d_j\right)\). Similarly, the potential function \(\psi _k(\mathbb {D}_k)\) at the \(k\hbox {th}\) dependency factor node \(k \in \mathbb {S}\) is defined as a function of the random variables \(\mathbb {D}_k\) associated with its neighboring nodes. Therefore, the factor graph represents the joint distribution of disparity labels assigned to each of the pixel locations as
where, Z is the partitioning function. Probability of assigning various disparity labels to each pixel i can be obtained by marginalizing Eq. (1) with respect to \(\mathbb {D}\setminus i\).
This provides a sumproduct formulation^{42,43} for determining likely disparity labels for each of the pixel locations i based on a priori disparity information and spatial dependency characteristics of disparities in a stereo image pair.
For an approximate and efficient computation of marginal beliefs or probabilities in Eq. (2) using loopy belief propagation, local information available in each node is shared with neighboring nodes as variabletodependency factor messages \(\mu _{i\in \mathbb {V}\rightarrow f\in \mathbb {S}}\) and factortovariable messages \(\mu _{f\in \mathbb {F}\rightarrow i\in \mathbb {V}}\) until convergence^{44,45}. Each outgoing message from a node is defined as a function of incoming messages at the given node as follows^{44}.
Probabilistic model
For approximate inference on disparity label assignment, message structures for loopy belief propagation in Eqs. (3) and (4) and relevant potential functions were defined based on the posterior probability of disparity label assignment,
where, \(f_j\left( \mathbb {D}_j\right)\) is a function of the state of all variable nodes neighboring the factor node j. In our proposed method, the most relevant and compact spatial dependencies are defined independently at each pixel i using local image characteristics as described in Section "Determining variable nodes associated with each dependency factor node". Therefore, the joint likelihood term can be simplified as
and the posterior probability updated as
It can be observed that the posterior probability of \(d_i\) without conditioning on a dependency factor node \(k \in ne\left( i\right)\)
resembles the structure of the variabletodependency factor node message \(\mu _{i \rightarrow k}\) in Eq. (3). This further suggests that individual likelihood terms \(p\left( f_j\left( \mathbb {D}_j\right) \mid d_i\right)\) form the dependency factor nodetovariable node message structure in Eq. (4).
Considering the individual likelihood terms in Eq. (6),
and assuming that random variables \(\mathbb {D}_j\) associated with a dependency factor node \(j \in \mathbb {S}\) are independent as in Moon & Gunther^{46},
In the absence of any loops between any two variable nodes k and i (i.e. when there is at most one common dependency factor node \(s \in \mathbb {S}\) between any two variable nodes k and i), the conditional probability \(p\left( d_k \mid d_i\right)\) can be interpreted as the posterior probability of \(d_k\) conditioned on all dependency factor functions associated with the \(k\hbox {th}\) variable node, i.e. \(p\left( d_k \mid d_i\right) \propto p\left( d_k \mid \left\{ f_l\left( D_l\right) , \forall l : l \in ne\left( k\right) \right\} \right)\). By excluding the factor function \(f_s\left( \mathbb {D}_s\right)\) associated with the factor node \(s \in \left\{ ne\left( k\right) \cap ne\left( i\right) \right\}\) in \(p\left( d_k \mid d_i\right)\), the individual likelihood expression in Eq. (7) resembles the dependency factor nodetovariable node message structure \(\mu _{j\rightarrow i}\) in Eq. (4).
Message passing implementation
Based on the messagepassing structures in our factor graph model, the variabletodependency factor node message \(\mu _{i\in \mathbb {V}\rightarrow f\in \mathbb {S}}\) in Eq. (3) was approximated as the posterior probability of \(d_i\) conditioned on (satisfying) all of its neighboring factor nodes except the factor node f to which the message is sent as in Eq. (6). Similarly, the factor nodetovariable node messages \(\mu _{f\in \mathbb {F}\rightarrow i\in \mathbb {V}}\) in Eq. (4) was approximated as the likelihood of satisfying spatial dependency among the random variable states of all variables nodes associated with the factor node f except \(d_i\).
It can be observed that, for evidence factor nodes \(f \in \mathbb {E}\), the factortovariable messages in Eq. (4) simplifies as \(\mu _{{f \in \mathbb {E}} \rightarrow i\in \mathbb {V}} = \psi _f\left( d_i\right)\). This supplies fixed prior information about the state \(d_i\) of the ith variable node by restricting messages from variable nodes to only dependency factor nodes as in Eq. (3). We estimated the a priori distribution \(p\left( d_i\right)\) from a disparity cost volume containing the cost of assigning all possible disparity labels at each pixel location. Details of building the disparity cost volume is presented in Section 2.4. Therefore, the approximate inference obtained using our factor graph model provides updated disparities \(\mathbb {D}\) (an optimal surface within the cost volume) based on their posterior probabilities.
The potential function \(\psi _f\left( \mathbb {D}_f\right)\) in Eq. (4) and its probabilistic representation \(p\left( f_j\left( \mathbb {D}_j\right) \right)\) in Eq. (7) at the spatial dependency factor node j was assigned a value of 1.0 when the states \(\mathbb {D}_j \setminus i\) of the neighboring nodes are same as that of \(d_i\); and was assigned a value of 0.0 when the states \(\mathbb {D}_j \setminus i\) are different from that of \(d_i\). This enforces spatial dependencies among the neighboring pixels in the final disparity map \(\mathbb {D}\). In addition, this further reduces the number of marginalization operations in Eqs. (4) and (7).
For each stereo image pairs, each evidence factor nodes \(e \in \mathbb {E}\) were initialized with a priori probabilities \(p\left( d_i\right)\) of the variable node \(i \in ne\left( e\right)\) as the evidence factor nodetovariable node messages. The initial message from other variable and dependency factor nodes were set to be a uniform probability vector representing equally likely states. After initial message passing from the evidence factor nodes \(\mathbb {E}\), message exchange continues among all graph nodes \(\mathbb {V}\cup \mathbb {F}\) until message convergence. We utilized an \(L_2\) measure of
for assessing message convergence at message passing iteration \(t+1\).
Disparity cost volume and a priori disparity distribution \(p\left( d_i\right)\)
For any given stereo image pair, let, \(C\left( i, d_i\right)\) be a cost volume representing the cost of assigning a disparity label \(d_i\) to location i. Thus the final disparity map for a given stereo image pair will be an optimal surface within the cost volume \(C\left( i, d_i\right)\). Figure 3 shows a schematic representation of algorithmic steps used for disparity cost volume calculation.
For computationally efficient disparity estimation, the reference image was segmented using an unsupervised texture segmentation method^{47}. In brief, sharp image segment boundaries were derived using a Gabor filter bank and image boundaries were aggregated using kmeans clustering to generate an image segmentation map. Within each of the segmented regions, highly confident disparity estimates at several candidate locations were obtained using an eigenbased feature matching method^{48}. Using the zonal/regional disparity distributions, disparity cost at each of the location i was calculated using a normalized crosscorrelation measure in the frequency domain. Within each segmented region, only disparity labels ranging from \(\left[ \mu _d  \sigma _d, \mu _d + \sigma _d\right]\) were considered based on the distribution of the disparity labels within the segmented region as shown in Fig. 3. This results in a sparse disparity cost volume \(C(i, d_i)\) and thus facilitates a faster inference due to reduced marginalization limits in Eq. (2). A detailed description of the initial cost volume computation as part of a hybrid of crosscorrelation and scene segmentation (HCS) algorithm is available elsewhere^{49}.
A priori probability of assigning a disparity label \(d_i\) to the \(i\hbox {th}\) pixel location in the reference image was estimated from the cost volume \(C\left( i, d_i\right)\) as
Determining variable nodes associated with each dependency factor node
We utilized edgepreserving filter kernels, namely the guided image filters (GIF)^{50} and bilateral filters (BF)^{51} to determine neighboring variable nodes with the highest influence on the true state of disparity at each of the \(i\hbox {th}\) pixel locations. Neighborhood dependency information from these kernels were used to define a nonsymmetric, irregular, and higherorder neighborhood system based on the fact that objects at various depths in the scene may exhibit a disparity boundary along the object boundary. Let \(\left( x, y\right)\) represent the 2D coordinates of the \(i\hbox {th}\) pixel. Highly dependent neighbors of location (x, y) were selected using an \(\alpha \hbox {th}\) percentile cutoff of the kernel coefficients. Coordinates of these highly dependent neighbors were used to identify the neighboring variable nodes of each dependency factor node.
In brief, guided image filtering (GIF) is an edgepreserving smoothing algorithm. At each pixel location (x, y) in the reference image \(\hat{r_l}\), a guided filter kernel \(W_{xy}\left( i, j\right)\) with smoothness parameter \(\epsilon\) was estimated as
where \(\omega _{xy}\) is the window size used for estimating local illumination characteristics namely the mean illumination \(\mu _{xy}\) and standard deviation \(\sigma\) at location \(\left( x, y\right)\).
The Bilateral filter (BF)^{51} is an edgepreserving nonlinear Gaussian filter with coefficients defined as a function of spatial and intensity similarities estimated respectively using a localized domain kernel and a range kernel. Bilateral filter kernel coefficients at location \(\left( x, y\right)\) is given as
where the first exponential term represents a domain kernel as a function of pixel distance with respect to the center pixel \(\left( x, y\right)\) and the second term represents a range kernel as a function of regional image intensity with respect to that of the center pixel \(\left( x, y\right)\). Parameters \(\sigma _d\) and \(\sigma _r\) controls the extent of influence neighboring pixels have on the domain and range filters respectively.
Disparity estimates
Upon message passing convergence, an approximate estimate of the posterior probability of assigning a disparity label \(d_i\) is available in each variable node as given in Eq. (5). A maximum a posterior disparity estimate was determined at each pixel location i based on the disparity label \(d_i\) with the maximum posterior probability at the respective pixel i.
Postprocessing the disparity maps
Occluded pixels were identified based on a lack of consistency or agreement between the disparity maps estimated using the leftright order vs rightleft order of each stereo pair^{52}. For each occluded pixel, the disparity estimate from its nearest nonoccluded pixel within the same scanline (row) was assigned. Further, a weighted median filter was used to minimize spurious disparity assignments in the occluded region^{53}.
Experimental results and discussion
Algorithmic steps of the proposed factor graph method are presented in Algorithm 1. We evaluated the proposed method using stereo images from the Middlebury benchmark stereo datasets^{37,38,39}, and^{40}. We present a detailed evaluation of the proposed FGS algorithm followed by a comprehensive comparison of its performance with other stateoftheart disparity estimation algorithms using Middlebury evaluation dataset version 3.0^{5}.
FGS parameters and FGS implementation
The FGS algorithm was implemented in MATLAB 2018b and its performance was evaluated using an Intel(R) Xeon(R) workstation with E31271 v3, 3.6 GHz processor.
For illumination correction, high pass filters of size \(21 \times 21\) pixels were obtained from a lowpass averaging filter. For unsupervised texture segmentation of the reference image, Gabor filters were designed^{47} with filter orientations between \(0^{\circ }150^{\circ }\) degrees in steps of \(30^{\circ }\), and wavelength starting from 2.83 up to the magnitude of hypotenuse of the input image. These filter parameters were previously identified to provide localized frequency features (roughly orthogonal subset) and spatial orientation features from input images^{47}. Kmeans clustering algorithm was initialized with \(K=15\) with 5 replicates and ran for a maximum of 500 iterations. Based on our experiments, we observed that a higher number of clusters K are beneficial in reducing the computational complexity and for faster convergence of the FGS algorithm when there are several foreground objects at multiple depths in the scene. This is because of the fact that disparity values of locations within each distinct scene element are likely to be similar. Therefore, more accurate grouping of these scence elements will likely provide a priori disparity distributions with smaller standard deviations and shorter support leading to faster convergence of the FGS algorithm. While computational cost may be higher with nonoptimal choice of clustering parameters the accuracy of the FGS disparity estimates is not affected.
We evaluated various image features to identify robust features necessary for building a priori disparity distributions. Among the minimum eigenvalue algorithm, speeded up robust features algorithm (SURF), Harris corner and edge detector algorithm, and accelerated segment test algorithm (FAST), we observed that the eigenbased feature matching method^{48} performed well with various types of benchmark images. For cost volume calculations, an optimal template size of \(3 \times 3\) pixels were used^{49}. For identifying variable nodes connected to each dependency factor node, bilateral filter kernel of size \(7 \times 7\) pixels, domain kernel parameter of \(\sigma _d = 3\), range kernel parameter of \(\sigma _r = 0.1\) and coefficient percentile cutoff of \(\alpha =97\) were used. We observed that the smallest kernel sizes along with a higher percentile cutoff \(\alpha\) identified fewer but highly dependent neighboring nodes. Therefore, the computational cost of message passing was significantly reduced with fewer but highly reliable neighboring variable nodes connected to each dependency factor node.
Computational complexity
In the FGS algorithm, message exchanges between the variable nodes \(\mathbb {V}\) and the spatial dependency factor nodes \(\mathbb {S}\) with \(\mathbb {V}  = \mathbb {S}  = MN\) are computationally demanding. Let, D be the cardinality of sample space of the random variable \(d_i\) associated with the ith variable node i.e. maximum number of disparity levels in a stereo pair. Let, Q be the maximum number of variable nodes connected to a dependency factor node. Preparing a message for sending from a factor node f to a variable node i, \(\mu _{f\in \mathbb {F}\rightarrow i\in \mathbb {V}}\) in Eq. (4), requires marginalizing for the variable \(d_i\) from the product form of joint distribution of disparity labels. The joint distribution is comprised of product of a potential function with \((Q1)\) variabletofactor messages. Factor node message related to \(d_i\) is obtained by marginalizing a maximum of \((Q1)\) neighboring variables \(\mathbb {D}_f\setminus d_i\). For each variable being marginalized, there are Q product operations i.e. (\(Q  1) + 1\). Because there are at most D disparity levels, marginalizing each variable requires D summation terms. Therefore, there are \((D^{Q1} \, Q)\) computations for each level of the disparity variable \(d_i\). With D levels of \(d_i\), there are \(D^Q\, Q\) calculations for preparing a dependency factortovariable node message. In contrast, no calculations are required for preparing a message for sending from an evidence factor node to its corresponding variable node. Therefore, the FGS algorithm has a polynomial time complexity of \(O\left( MN\, D^Q\, Q\right)\) per iteration including the computations required for preparing messages from each of the MN dependency factor nodes. It can be observed that far fewer number of calculations are required for preparing a message for sending from a variable node \(\mu _{f\in \mathbb {F}\rightarrow i\in \mathbb {V}}\) in Eq. (4) with a complexity of \(O\left( MN\, D\, (Q1)\right)\) and therefore does not affect the overall complexity of the FGS algorithm.
As described earlier, the following key features were employed to further reduce the computational requirements of the FGS algorithm.

1.
\((D^Q\, Q)\) number of computations are eliminated when the state of the ith variable node \(d_i\) is different from those of its neighboring nodes by using a binary potential function \(\psi _f\left( \mathbb {D}_f\right)\) in the factor nodetovariable node messages \(\mu _{f\in \mathbb {F}\rightarrow i\in \mathbb {V}}\) in Eq. (4). Interaction among neighboring nodes can be further increased with appropriate changes to the potential function.

2.
The number of disparity levels D in each of the \((Q1)\) marginalization operations required for preparing factor node messages are significantly lowered by using a sparse cost volume for estimating a priori disparity in Eq. (9).

3.
The number of factor node messages were reduced by connecting only highly correlated or dependent variable nodes (with a high bilateral filter coefficient cutoff) to each of the spatial dependency factor nodes.
Performance metrics
For performance evaluation, we used the common performance metrics available for assessing the accuracy of the estimated disparity maps namely, the disparity error maps, peak signaltonoise ratio (PSNR), and average absolute error (Avg. err in pixels). Disparity error maps were computed as locationwise difference between the estimated disparity \(\hat{d}(x,y)\) and its groundtruth d(x, y) as \(\hat{d}(x,y)  d(x,y)\). PSNR provides a measure of similarity between an estimated disparity map \(\hat{d}(x,y)\) of size \(M \times N\) pixels and the groundtruth disparity map d(x, y) as follows.
A thresholded average disparity error metric with a disparity threshold of T pixels was defined as
Average disparity errors were assessed at two disparity threshold levels of \(T=2\) pixels (Bad2.0) and \(T=0\) pixels (Avg. err).
In the majority of the stereo pairs tested, the FGS algorithm converged between 25 and 30 iterations based on the convergence measure in Eq. (8).
Filter selection for identifying FGS variable node neighbors
The edgepreserving filters (Sect. ‘Determining variable nodes associated with each dependency factor node’) with the highest accuracy and computational speed were selected for identifying neighboring variable nodes for each of the dependency factor nodes in the FGS algorithm. Figure 4c and e show the disparity maps for the “Teddy” stereo pair (ground truth disparity shown in Fig. 4a) estimated using guided image filters and bilateral filters respectively based on an initial disparity shown in Fig. 4b. Disparity error maps for the guided image filter and bilateral filter are shown in Fig. 4d and f respectively.
Quantitative performance measures of the edgepreserving filters are presented in Table 2. The accuracy of the FGS algorithm with guided filters (based on the Avg. err, PSNR, and Bad2.0 metrics) were slightly better than the bilateral filters. Because the guided filters resulted in a larger number of neighboring variable nodes, the run time of the FGS algorithm, however, was higher with the guided filters when compared to the FGS algorithm with bilateral filters. Therefore, for further evaluation of the FGS algorithm, bilateral filtering was chosen as the optimal choice for identifying neighboring variable nodes in the FGS factor graph.
Detailed assessment of the FGS algorithm using selected stereo pairs
For detailed quantitative and qualitative assessment of the FGS algorithm, we utilized stereo image pairs with differing textures, illuminations, and exposure characteristics from the Middlebury stereo dataset including 2003^{37}, 2005^{38}, 2006^{39}, and 2014^{40} stereo datasets. The stereo pairs selected for assessment were the Teddy and Cones stereo pair (2003), Dolls stereo pair (2005), Rocks1 stereo pair (2006), and the Motorcycle stereo pair (2014). Assessment results based on the estimated disparity maps with and without postprocessing are presented in the following sections.
Assessment results without postprocessing
For the selected stereo pairs of scenes with differing textures and scene illumination, Fig. 5 shows disparity maps estimated by the HCS algorithm without cost aggregation and postprocessing, disparity maps estimated by the FGS algorithm without any postprocessing, and errors in the disparity maps estimated by the FGS algorithm. A summary of the assessment metrics without postprocessing the disparity maps is presented in Table 3.
Without postprocessing, the FGS algorithm provided significantly lower average error (Avg. err) and percentage of bad pixels (Bad2.0) when compared to the initial disparity estimates. Higher PSNR values indicate a higher degree of similarity between the FGS estimated disparity maps and the ground truth disparity maps. Visual inspection of the FGS disparity maps in Fig. 5 revealed that the FGS algorithm is able to estimate more accurate disparities in occluded regions and generate disparity maps with welldefined boundaries. These observations validate our choice of using nonsymmetric / irregular, variable and higherorder neighborhood dependency relationship among neighboring pixel locations in the FGS algorithm.
Assessment results with postprocessing
For selected stereo pairs with varying texture and illumination characteristics, Fig. 6 shows the disparity maps generated after postprocessing the maximum a posteriori disparity estimates (shown in Fig. 5) and their corresponding disparity error maps. Table 4 presents a summary of assessment metrics after postprocessing the disparity estimates. After postprocessing, inconsistencies in stereo disparity estimates were corrected and disparity maps with sharper depth boundaries were generated. The average error (Avg. err) and percentage of bad pixels (Bad2.0) further decreased and the PSNR value increased after postprocessing indicating higher similarity with the ground truth disparity estimates. Though the FGS disparity estimates without postprocessing demonstrated higher accuracy in occluded and homogeneous regions, postprocessing using a weighted median filter further reduced inconsistent disparity estimates and improved the overall quality of the disparity maps.
Performance of the FGS algorithm vs stateoftheart algorithms
To the best of our knowledge, the proposed FGS algorithm is the first disparity estimation technique based on factor graphs. The performance of the nonlearningbased FGS algorithm was compared with recent stateoftheart learningbased and nonlearningbased disparity estimation algorithms. All disparity estimation algorithms were evaluated using stereo pairs from the current Middlebury evaluation dataset version 3.0. In addition to the performance metrics of Avg. err, PSNR and Bad 2.0 %, a weighted average measure was calculated for each performance metric based on the level of difficulty of estimating the disparity map for each stereo pair in the Middlebury evaluation dataset version 3.0. A summary of all the assessment metrics and a weighted performance measure for the FGS algorithm is presented in Table 5.
FGS vs nonlearning based disparity estimation methods
Performance of the FGS algorithm was compared with 12 recently developed nonlearningbased disparity estimation procedures including 7 local methods, 3 global methods, and one fusion method using both local and global approaches for disparity estimation. The local methods were: weighted adaptive crossregionbased guided image filtering method (ACRGIFOW)^{54}, realtime stereo matching algorithm with FPGA architecture (MANE)^{55}, adaptive supportweight approach in pyramid structure (DAWAF)^{56}, encodingbased approaches PPEPGF^{57}, absolute difference (AD) and census transformbased stereo matching with guided image filtering (ADSRGIF)^{58}, the sum of absolute difference (SAD) based stereo matching aggregated with adaptive weighted bilateral filter (SMAWP)^{59}, statistical maximum a posteriori estimation of MRF disparity labels (SRM)^{60}. The global disparity estimation procedures chosen for comparison were: binocular narrowbaseline stereo matching procedure using a maxtree data structure (MTS)^{61} and its improvement (MTS2)^{62}, and an accelerated multiblock matching (MBM) algorithm on GPU^{22}. FASW approach^{63} uses both local and global strategies and is based on a census transform with adaptive support weight.
Avg. err metric and a weighted average of Avg. err for the FGS algorithm and the nonlearningbased disparity estimation procedures are presented in Table 6. Image weight given in Table 6 represents the level of difficulty in estimating the disparity from a given stereo pair. In general, the proposed FGS algorithm provided higher accuracy for all the stereo pairs comparable to that of other nonlearningbased methods. The FGS method provided the lowest weighted average of Avg. err of 6.45 pixels among all the nonlearningbased methods. Further, the FGS method provided the lowest estimation error (Avg. err) for 3 out of the 10 difficult stereo pairs (image weight = 8) and for 4 out of the 5 moderately difficult stereo pairs (image weight = 4). Among the remaining 7 difficult stereo pairs, HCS method provided lower error for 1 pair, ACRGIFOW method for 1 pair, SRM method for 2 pairs, DAWAF method for 1 pair, and FASW for 2 pairs. The DAWAF method provided the lowest estimation error for the remaining 1 moderately difficult stereo pairs.
FGS vs learningbased disparity estimation methods
Performance of the FGS algorithm was also compared with the following recently developed learningbased disparity procedures: a method based on a fusion of convolutional neural networks (CNN) and conditional random fields (LBPS)^{64}, a fully convolutionaldensely connected neural network (FCDCNN)^{27}, a deeplearning assisted method to produce initial estimate with SemiGlobal Block Matching method (SGBMP)^{65}, a deep learningbased selfguided cost aggregation method (DSGCA)^{66}, a stereo matching algorithm with a pretrained network and global energy minimization SIGMRF^{67}, a multidimensional convolutional neural network (MSMDROB)^{68} and a CNNbased network using ResNet (CBMBNet)^{69}.
Avg. err metric for the FGS algorithm and the learningbased disparity estimation procedures are presented in Table 7. Among the recent learningbased disparity estimation procedures, the LBPS method provided the least weighted Avg. err of 5.51 for all stereo pairs, provided the least Avg. error for 4 out of 10 difficult stereo pairs and the least Avg. err for 4 out of 5 moderately difficult stereo pairs. The FGS method provided the secondlowest weighted average of Avg. err of 6.45 for all stereo pairs, provided the least Avg. err for 4 out of 10 difficult stereo pairs and the least Avg. error for 1 out of 5 moderately difficult stereo pairs. The CBMBNet provided the third lowest weighted Avg. err of 6.65 and the least Avg. err for 2 out of 10 difficult stereo pairs.
Conclusions
We have presented a new probabilistic factorgraphbased disparity estimation algorithm that improves the accuracy of disparity estimates in stereo image pairs with varying texture and illumination characteristics by enforcing spatial dependencies among scene characteristics as well as among disparity estimates. In contrast to MRF models, our factor graph formulation allows a larger as well as a spatially variable neighborhood system dependent only on the local scene characteristics. Our factor graph formulation can be used for obtaining maximum a posteriori estimates from models or optimization problems with complex dependency structure among hidden variables. The strategies of using a priori distributions with shorter support and spatial dependencies are useful for improving the computational speed of message convergence in factor graphbased inference problems. We rigorously evaluated the performance of the new factorgraphbased disparity estimation algorithm using Middlebury benchmark stereo datasets^{37,38,39}, and^{40}. Our experimental results indicate that the factorgraph algorithm provides disparity estimates with higher accuracy when compared to recent nonlearning as well as learningbased disparity estimation algorithms using Middlebury evaluation dataset version 3.0^{5}. The factorgraph algorithm may also be useful for other dense estimation problems such as optical flow estimation.
Data availability
Stereo datasets that were used to evaluate the proposed methods are publicly available from the following URLS: (1) https://vision.middlebury.edu/stereo/data/scenes2003/. (2) https://vision.middlebury.edu/stereo/data/scenes2005/. (3) https://vision.middlebury.edu/stereo/data/scenes2006/. (4) https://vision.middlebury.edu/stereo/data/scenes2014/.
References
DeSouza, G. N. & Kak, A. C. Vision for mobile robot navigation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 24, 237–267 (2002).
Svensk, J. Evaluation of aerial image stereo matching methods for forest variable estimation. Master’s thesis, Linköping University (2017).
Huang, Y. & Young, K. Binocular image sequence analysis: Integration of stereo disparity and optic flow for improved obstacle detection and tracking. EURASIP J. Adv. Signal Process. 2008, 1–10 (2008).
Remondino, F., ElHakim, S. F., Gruen, A. & Zhang, L. Turning images into 3d models. IEEE Signal Process. Mag. 25, 55–65 (2008).
Scharstein, D. & Szeliski, R. A taxonomy and evaluation of dense twoframe stereo correspondence algorithms. Int. J. Comput. Vision 47, 7–42 (2002).
Gu, Z., Su, X., Liu, Y. & Zhang, Q. Local stereo matching with adaptive supportweight, rank transform and disparity calibration. Pattern Recogn. Lett. 29, 1230–1235 (2008).
Yang, Q., Wang, L., Yang, R., Stewénius, H. & Nistér, D. Stereo matching with colorweighted correlation, hierarchical belief propagation, and occlusion handling. IEEE Trans. Pattern Anal. Mach. Intell. 31, 492–504 (2008).
Roma, N., SantosVictor, J. & Tomé, J. A comparative analysis of crosscorrelation matching algorithms using a pyramidal resolution approach. In Empirical Evaluation Methods in Computer Vision, 117–142 (World Scientific, 2002).
Marghany, M., Tahar, M. R. B. M. & Hashim, M. 3d stereo reconstruction using sum square of difference matching algorithm. Sci. Res. Essays 6, 6404–6411 (2011).
Parvathy, B. Survey on stereovision based disparity map using sum of absolute difference. Int. J. Innov. Sci. Res. Technol. 3, 218–219 (2018).
DeMaeztu, L., Villanueva, A. & Cabeza, R. Stereo matching using gradient similarity and locally adaptive supportweight. Pattern Recogn. Lett. 32, 1643–1651 (2011).
Wang, L., Liu, Z. & Zhang, Z. Feature based stereo matching using twostep expansion. Math. Probl. Eng. 2014 (2014).
Gac, N., Mancini, S., Desvignes, M. & Houzet, D. High speed 3d tomography on cpu, gpu, and fpga. EURASIP J. Embed. Syst. 2008, 1–12 (2009).
Mei, X. et al. On building an accurate stereo matching system on graphics hardware. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 467–474 (IEEE, 2011).
Fang, J. et al. Accelerating cost aggregation for realtime stereo matching. In 2012 IEEE 18th International Conference on Parallel and Distributed Systems, 472–481 (IEEE, 2012).
Hamid, M. S., Abd Manap, N., Hamzah, R. A. & Kadmin, A. F. Stereo matching algorithm based on deep learning: A survey. J. King Saud Univ. Comput. Inf. Sci. (2020).
Tombari, F., Mattoccia, S., Di Stefano, L. & Addimanda, E. Classification and evaluation of cost aggregation methods for stereo correspondence. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1–8 (IEEE, 2008).
Yoon, K.J. & Kweon, I. S. Adaptive supportweight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28, 650–656 (2006).
Zhang, K., Lu, J. & Lafruit, G. Crossbased local stereo matching using orthogonal integral images. IEEE Trans. Circuits Syst. Video Technol. 19, 1073–1079 (2009).
Zhu, S., Cao, D., Wu, Y. & Jiang, S. Edgeaware dynamic programmingbased cost aggregation for robust stereo matching. J. Electron. Imaging 24, 043016 (2015).
Zhu, S., Wang, Z., Zhang, X. & Li, Y. Edgepreserving guided filtering based cost aggregation for stereo matching. J. Vis. Commun. Image Represent. 39, 107–119 (2016).
Chang, Q. & Maruyama, T. Realtime stereo vision system: A multiblock matching on gpu. IEEE Access 6, 42030–42046 (2018).
Arranz, A., Sánchez, Á. & Alvar, M. Multiresolution energy minimisation framework for stereo matching. IET Comput. Vis. 6, 425–434 (2012).
Wang, H.q., Wu, M., Zhang, Y.b. & Zhang, L. Effective stereo matching using reliable points based graph cut. In 2013 Visual Communications and Image Processing (VCIP), 1–6 (IEEE, 2013).
Tomasi, C. & Manduchi, R. Stereo matching as a nearestneighbor problem. IEEE Trans. Pattern Anal. Mach. Intell. 20, 333–340 (1998).
Liang, C.K., Cheng, C.C., Lai, Y.C., Chen, L.G. & Chen, H. H. Hardwareefficient belief propagation. IEEE Trans. Circuits Syst. Video Technol. 21, 525–537 (2011).
Hirner, D. & Fraundorfer, F. Fcdcnn: A densely connected neural network for stereo estimation. In 2020 25th International Conference on Pattern Recognition (ICPR), 2482–2489 (IEEE, 2021).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 4700–4708 (2017).
Zbontar, J. et al. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17, 2287–2318 (2016).
Kendall, A. et al. Endtoend learning of geometry and context for deep stereo regression. In Proceedings of the IEEE international conference on computer vision, 66–75 (2017).
Chang, J.R. & Chen, Y.S. Pyramid stereo matching network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 5410–5418 (2018).
Sun, J., Zheng, N.N. & Shum, H.Y. Stereo matching using belief propagation. IEEE Trans. Pattern Anal. Mach. Intell. 25, 787–800 (2003).
Kim, E. P., Choi, J., Shanbhag, N. R. & Rutenbar, R. A. Error resilient and energy efficient mrf messagepassingbased stereo matching. IEEE Trans. Very Large Scale Integr. VLSI Syst. 24, 897–908 (2015).
Lan, X., Roth, S., Huttenlocher, D. & Black, M. J. Efficient belief propagation with learned higherorder markov random fields. In European conference on computer vision, 269–282 (Springer, 2006).
Roth, S. & Black, M. J. Fields of experts. Int. J. Comput. Vision 82, 205–229 (2009).
Zhang, L. & Seitz, S. M. Parameter estimation for mrf stereo. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, 288–295 (IEEE, 2005).
Scharstein, D. & Szeliski, R. Highaccuracy stereo depth maps using structured light. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 1, I–I (IEEE, 2003).
Scharstein, D. & Pal, C. Learning conditional random fields for stereo. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, 1–8 (IEEE, 2007).
Hirschmuller, H. & Scharstein, D. Evaluation of cost functions for stereo matching. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, 1–8 (IEEE, 2007).
Scharstein, D. et al. Highresolution stereo datasets with subpixelaccurate ground truth. In German conference on pattern recognition, 31–42 (Springer, 2014).
Oppenheim, A. V. & Schafer, R. W. From frequency to quefrency: A history of the cepstrum. IEEE Signal Process. Mag. 21, 95–106 (2004).
Pearl, J. Reverend bayes on inference engines: A distributed hierarchical approach. In Probabilistic and Causal Inference: The Works of Judea Pearl, 129–138 (2022).
Kschischang, F. R., Frey, B. J. & Loeliger, H.A. Factor graphs and the sumproduct algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001).
Barber, D. Bayesian Reasoning and Machine Learning (Cambridge University Press, 2012).
Murphy, K., Weiss, Y. & Jordan, M. I. Loopy belief propagation for approximate inference: An empirical study. arXiv preprint arXiv:1301.6725 (2013).
Moon, T. K. & Gunther, J. H. Multiple constraint satisfaction by belief propagation: An example using sudoku. In 2006 IEEE mountain workshop on adaptive and learning systems, 122–126 (IEEE, 2006).
Jain, A. K. & Farrokhnia, F. Unsupervised texture segmentation using gabor filters. Pattern Recogn. 24, 1167–1186 (1991).
Shi, J. et al. Good features to track. In 1994 Proceedings of IEEE conference on computer vision and pattern recognition, 593–600 (IEEE, 1994).
Shabanian, H. & Balasubramanian, M. A new hybrid stereo disparity estimation algorithm with guided image filteringbased cost aggregation. In 2021 Electronic Imaging (Society for Imaging Science and Technology, https://doi.org/10.2352/issn.24701173.2021.2.sda059, 2021).
He, K., Sun, J. & Tang, X. Guided image filtering. In European conference on computer vision, 1–14 (Springer, 2010).
Tomasi, C. & Manduchi, R. Bilateral filtering for gray and color images. In Sixth international conference on computer vision (IEEE Cat. No. 98CH36271), 839–846 (IEEE, 1998).
Cochran, S. D. & Medioni, G. 3d surface description from binocular stereo. IEEE Trans. Pattern Anal. Mach. Intell. 14, 981–994 (1992).
Brownrigg, D. R. The weighted median filter. Commun. ACM 27, 807–818 (1984).
Kong, L., Zhu, J. & Ying, S. Local stereo matching using adaptive crossregionbased guided image filtering with orthogonal weights. Math. Probl. Eng. 2021 (2021).
VázquezDelgado, H.D. et al. Realtime multiwindow stereo matching algorithm with fuzzy logic. IET Comput. Vision 15, 208–223 (2021).
Navarro, J. & Buades, A. Semidense and robust image registration by shift adapted weighted aggregation and variational completion. Image Vis. Comput. 89, 258–275 (2019).
Fu, Y., Lai, K., Chen, W. & Xiang, Y. A pixel pairbased encoding pattern for stereo matching via an adaptively weighted cost. IET Image Proc. 15, 908–917 (2021).
Jiangping, K. & Sancong, Y. Stereo matching based on guidance image and adaptive support region. Acta Opt. Sin. 40 (2020).
Razak, S., Othman, M. & Kadmin, A. The effect of adaptive weighted bilateral filter on stereo matching algorithm. IJEAT 8, 284–287 (2019).
Okae, J., Du, J. & Hu, Y. Robust statistical approach to stereo disparity maps denoising and refinement. Control Theory Technol. 18, 348–361 (2020).
Brandt, R., Strisciuglio, N., Petkov, N. & Wilkinson, M. H. Efficient binocular stereo correspondence matching with 1d maxtrees. Pattern Recogn. Lett. 135, 402–408 (2020).
Brandt, R., Strisciuglio, N. & Petkov, N. Mtstereo 2.0: improved accuracy of stereo depth estimation withmaxtrees. arXiv preprint arXiv:2006.15373 (2020).
Wu, W., Zhu, H., Yu, S. & Shi, J. Stereo matching with fusing adaptive support weights. IEEE Access 7, 61960–61974 (2019).
Knobelreiter, P., Sormann, C., Shekhovtsov, A., Fraundorfer, F. & Pock, T. Belief propagation reloaded: Learning bplayers for labeling problems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7900–7909 (2020).
Hu, Y., Zhen, W. & Scherer, S. Deeplearning assisted highresolution binocular stereo depth reconstruction. In 2020 IEEE International Conference on Robotics and Automation (ICRA), 8637–8643 (IEEE, 2020).
Park, I. K. et al. Deep selfguided cost aggregation for stereo matching. Pattern Recogn. Lett. 112, 168–175 (2018).
Nahar, S. & Joshi, M. V. A learned sparseness and igmrfbased regularization framework for dense disparity estimation using unsupervised feature learning. IPSJ Trans. Comput. Vis. Appl. 9, 1–15 (2017).
Lu, H., Xu, H., Zhang, L., Ma, Y. & Zhao, Y. Cascaded multiscale and multidimension convolutional neural network for stereo matching. In 2018 IEEE Visual Communications and Image Processing (VCIP), 1–4 (IEEE, 2018).
Chen, Y., Xia, Y. & Wu, C. A cropbased multibranch network for matching cost computation. In 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISPBMEI), 1–9 (IEEE, 2018).
Acknowledgements
This research was supported in part by an unrestricted startup fund from the Herff College of Engineering and a graduate assistantship from the Department of Electrical and Computer Engineering, The University of Memphis.
Author information
Authors and Affiliations
Contributions
H.S and M.B designed and implemented the algorithm. H.S conducted all experiments. M.B. supervised all experiments. H.S and M.B prepared, revised and reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shabanian, H., Balasubramanian, M. A novel factor graphbased optimization technique for stereo correspondence estimation. Sci Rep 12, 15613 (2022). https://doi.org/10.1038/s41598022193369
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598022193369
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.