A network-based framework for shape analysis enables accurate characterization of leaf epidermal cells

Cell shape is crucial for the function and development of organisms. Yet, versatile frameworks for cell shape quantification, comparison, and classification remain underdeveloped. Here, we introduce a visibility graph representation of shapes that facilitates network-driven characterization and analyses across shapes encountered in different domains. Using the example of complex shape of leaf pavement cells, we show that our framework accurately quantifies cell protrusions and invaginations and provides additional functionality in comparison to the contending approaches. We further show that structural properties of the visibility graphs can be used to quantify pavement cell shape complexity and allow for classification of plants into their respective phylogenetic clades. Therefore, the visibility graphs provide a robust and unique framework to accurately quantify and classify the shape of different objects.

classical contour representation of the distances of the points on the contour from the centroid of the shape. The resulting profile is used as input to the discrete Fourier transforms. Finally, we compare the descriptors of shape resulting from the Fourier transform by using the Euclidean distance and the correlation distance. To apply the shape comparison using the Fourier transform on shapes with different number of nodes, we use the node reducing approach based on modularity clustering to create shapes with equal number of nodes. The resulting distance matrix between all pairwise shape comparisons is used for hierarchical complete-linkage clustering, as shown in Supplementary Fig. 5. Surprisingly, this representation provides a clustering of poor quality, in which rectangular shapes are clustered with triangular shapes and circular shapes are clustered with rectangular shapes (Supplementary Fig. 5b-e).

Supplementary Note 4: Homogeneity of the derived clusterings of shapes.
To compare the quality of clusters derived from the different comparison methods, we calculate the Biological Homogeneity Index (BHI), which provides a measure of homogeneity of the determined clusters 3 . Let ℬ(i) and ℬ(j) be functional classes containing shape i and j, respectively. We assign the indicator function Ι(ℬ(i)= ℬ(j)) the value 1 if ℬ(i) and ℬ(j) match. Thus, for a given statistical clustering partition Ck, the BHI is defined as: where K is the number of clusters and nk is the number of annotated shapes in a cluster. A value of 1 for the BHI denotes perfect clustering, in which all shapes assigned to a cluster belong to the same group (provided as a priori knowledge). Smaller values for the BHI denote larger departure from homogeneity and point at issues of the resulting clusterings. To do so, we use the dendrogram derived from the shape comparisons and cut them into three clusters by using the dendrogram distance, since we compare three different shape types. To prevent clusters with a single shape, we merged single shape clusters with the closest cluster according to the dendrogram distance. The rotational distance and the Fourier transform cannot be used to compare shapes with different number of nodes, so we used the node-reducing approach based on modularity clustering to create shapes with equal number of nodes. As shown in Supplementary Table 1, the values for the BHI for the different algorithms for comparison of visibility graphs outperform the Fourier transform approaches.

Supplementary Note 5: Visibility graph centralities to infer local features of pavement cells.
To show that the closeness centrality is the best graph property to identify lobes as local features of pavement cells, we select and compare the performance of other centrality measures. Let , , ∈ be nodes in an undirected graph = ( , ) with = | | vertices and = | | edges. We define deg ( ) as the degree of node and ( , ) as the length of shortest path connecting nodes and . Let ( , ) be the number of shortest path and J ( , ) the number of shortest paths connecting nodes and through node . The adjacency matrix is denoted by = ( J,M ) where J,M = 1 if an edge connects nodes and , and J,M = 0 otherwise. The effective resistance between the nodes and corresponds to the term MJ ( ) − MJ ( ) and MS ( ) is defined as the information throughput of node 4 . Lastly, we denote as the eigenvalue and M,S ( ) as the overall commodity sent from node to and forwarded by node (11). Following this notation, the used centralities are defined as shown in Supplementary  Table 4.
We use the extracted visibility graphs of the pavement cell gold standard and calculate all of the abovementioned centralities. To detect lobes, we identify the local minima of the centralities and computed the RMSE to measure the difference to the mean of manually detected lobes. The closeness centrality shows the lowest RMSE, closely followed by the local reaching centrality which is very similar to the closeness centrality ( Supplementary Fig. 11).

Supplementary Note 6: Optimal distance between visibility graph nodes.
To find the optimal distance between visibility graph nodes, we extract the visibility graphs of the pavement cell gold standard using various pixel distances. ).
Lobes are detected by identifying the local minima of the closeness centrality for each visibility graph and are afterwards compared to the mean of manually detected lobes by calculating the root mean square error (RMSE). For each pixel distance, we compute the mean RMSE and identified the minimum of the means for the cells of the two different image resolutions. The lowest mean RMSE is detected for a distance of seven pixel (Col-0/oryzalin-treated) and 10 pixel (clasp-1) between visibility graph nodes (see Supplementary Fig. 17). Given the image resolution, we can calculate the optimal distance, or node coverage, as: Using the detected pixel distance minima and the two different image resolutions, we obtain a node coverage of 0.65 .epn bc for both images, thus defining the optimal pixel distance that is used for the visibility graph creation of our approach.

Supplementary Note 7: Prediction of plant clades from PCs.
We create the visibility graphs of 6359 selected, adaxial pavement cells from 213 different plant species using the data set of Vöfely et al. 5 . From the created visibility graphs, we select ten graphs per species and label them according to their plant clade affiliation (eudicots, monocots, ferns, angiosperms, gymnosperms). The distance matrix of the selected graphs is computed and displayed by using PCA (Supplementary Fig. 22). Here, the ferns (blue) are easily distinguishable from the other four clades, while the centroids of eudicots (green), monocots (pink) and angiosperms (light blue) are clustered closely together.
Supplementary Note 8: Accuracy of the pre-processing and segmentation pipeline. The quality of our segmentation pipeline is evaluated by calculating the pixel accuracy, a common validation metric for binary classification, which is calculated as: where TP = True Positive (i.e. pixels correctly detected as pavement cells), TN = True Negative (i.e. pixels correctly detected as background), FP = False Positive (i.e. pixels wrongly detected as pavement cells) and FN = False Negative (i.e. pixels wrongly detected as background).
Supplementary Figure 1: Different types of visibility graphs. The visibility graph represents a set of objects along with a visibility relation between them. One type of visibility graph is formed on the vertices of multiple polygons; it is applied in robotics, where it can be used to find the shortest path among polygonal obstacles. In another type of visibility graphs the nodes correspond to the amplitude (i.e. value) at time points and edges are placed between nodes if they can see each other, and this type of visibility graphs has been used to deduce properties of time series. In addition, visibility graphs can be used for shape decomposition by placing nodes along a piecewise linear approximation of a shape. The here proposed approach GraVis places nodes along the shape contour and connects nodes which are visible to each other.

Supplementary Figure 2: Rotational permutation of a graph. (a)
To circularly permute a graph, the corresponding adjacency matrix is used. The first row and column of the matrix are rotated to the end of the matrix, decreasing the row-and column-indices by one. (b) Shapes are compared by calculating the rotational distance between their corresponding visibility graphs. The resulting distance matrix is used for hiererchical completelinkage clusterting of a set of synthetic shapes with same number of nodes, on which the rotational distance can be readily applied. (c) Clustering of shapes with different number of nodes that were reduced to the same size using modularity clustering. The shapes were separated into three distinct clusters, indicated by different shades of gray. The clusters are determined by using the dendrogram distance, whereby to prevent single shape clusters, we merge clusters with single shapes to the next close cluster. The shapes were separated into three distinct clusters, indicated by different shades of gray. The clusters are determined by using the dendrogram distance, whereby to prevent single shape clusters, we merge clusters with single shapes to the next close cluster.  (20) were reduced stepwise by one node and used to calculate the distance matrix to use for hierarchical clustering. To measure the quality of the clustering, the BHI was calculated for the resulting clusters (see Supplementary Fig. 7).

Supplementary Figure 7: Sensitivity of the number of nodes on the performance of shape comparison.
Visibility graphs with equal number of nodes were compared against each other for the synthetic shapes. The number of nodes were decreased in a stepwise fashion using the node-reducing method. The distance matrix was calculated for the resulting graphs and used for hierarchical clustering whose quality we quantified with the BHI (blue). The highest average BHI was calculated with graphs that have a distance between of 10-14 pixels between nodes (gray). Boxplots are shown with median (horizontal line), 25 th and 75 th percentiles (box edges) and 1.5-fold of the interquartile range (whiskers).    To find the optimal parameters for the lobe detection tools GraVis, PaCeQuant and LOCO-EFA, the cells of the gold standard are randomly split into a training and test set. Lobes are detected for the training set by using different parameters or parameter combinations. The performance of the thus resulting detected lobes is compared by computing the root mean square error (RMSE) based on the manually detected lobes by 20 experts. The parameter or parameter combination resulting in the lowest RMSE is then used for the lobe detection of the test set, where the performance is evaluated by calculating the RMSE. This procedure is repeated 20 to 30 times and the parameter setting with the overall lowest RMSE is chosen for the comparison of lobe detection tools after tuning (see Supplementary Fig.  19). For GraVis, the node distance was tuned, while for LOCO-EFA the threshold between consecutive modes are tuned to determine the number of detected lobes. In PaCeQuant, all combinations of the three different parameters (Gaussian in curvature analysis, minimal length of a protrusion section and the minimal length of an indentation section) were tuned.
Supplementary Figure 19: Recovery of tri-cellular junctions for each cell of the gold standard. The percentage and variance of manually detected tri-cellular junctions for each cell of the gold standard is shown. The n=20 independent experts were provided only with the pavement cell contours for the manual detection and did not have any information about cell wall segments with neighboring cells, thus leading to variances in the detection of all tri-cellular junctions by the experts. Data are presented as mean values +/-standard deviation.    The Biological Homogeneity Index (BHI) is calculated using the clusters shown in Supplementary Fig. 2 and 5. The BHI show that the Laplacian eigenvalues, as implemented in GraVis, cluster the synthetic shapes perfectly if all shapes have the same number of nodes. Nevertheless, the Laplacian eigenvalues also perform very well for the synthetic shapes with different number of nodes and the reduced set of nodes, similar to the rotational distance. In contrast, the Fourier transform using either the Euclidean or correlation distance performs worst. The Biological Homogeneity Index (BHI) is calculated using the clusters shown in Supplementary Fig. 8f. The BHI values show that the Laplacian approach used by GraVis is best to cluster the shapes of pavement cells into distinct groups, followed by the Fourier transform approach using the correlation distance. Each listed centrality measure was used to identify lobes of the 30 manually annotated graphs of the gold standard. The performance of different lobe detection tools is compared by computing the root mean square error (RMSE) based on the manually detected lobes by 20 experts. The comparison is done with and without including tri-cellular junctions. Furthermore, the RMSE is calculated based on either the mean of manually detected lobes using default or tuned parameters, or the median of manually detected lobes using default parameters. The difference in means between the tools is tested using the one-sided Kruskal-Wallis test, while the difference in variance between the tools is tested using the one-sided Bartlett's test. The number of lobes for the different pavement cell genotypes in Fig. 7c are used to do a pairwise comparison using the two-sided Dunn's post-hoc test with Benjamini-Hochberg correction. Non-significant adjusted p-values (p-value > 0.05) are highlighted in gray and are used to create a graph displaying clusters of genotypes with similar phenotypes (Supplementary Fig. 21). The shape metrics listed in Supplementary Fig. 23 are used to do a pairwise comparison using the two-sided Dunn's post-hoc test between the different clades for the solidity, aspect ratio, number of lobes per cell and relative completeness. Non-significant adjusted p-values (p-value < 0.01, Bemjamini-Hochberg correction) are highlighted in gray.  Fig. 24) are selected for manual segmentation of pavement cells and for pre-processing with GraVis. The manual segmented cells are used as ground truth for the calculation of the pixel accuracy of the segmented cells by GraVis. The average pixel accuracy is 95%.