Figure 1: The logic of the treeness test. | Nature Communications


From: The statistical geometry of transcriptome divergence in cell-type evolution and cancer


(a) Discretized expression data are represented in a 0–1 matrix in which rows represent samples and columns represent genes. An entry is 0 if the gene is not expressed in that cell sample and 1 if it is expressed above threshold. (b) The data in the matrix can be summarized as a Hamming distance matrix, in which each entry is the number of genes at which two cell types have different expression values. (c) Any collection of Hamming distances among four samples can be represented in a box graph, which consists of an internal box with dimensions e and f and terminal branches with lengths a, b, c and d. (d) A tree is a special case of a box graph in which one and only one of the inner edges has zero length, for example e=0 and f>0. (e) Probability density of δ for random matrices. This distribution is relatively flat with a maximum at δ=0.5. The integral from δ=0 to δc is the type I error probability α for rejecting the null hypothesis, that is, the probability that a δ value ≤δc arises by chance.
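
Panels (a) to (d) can be illustrated with a minimal Python sketch. It assumes the standard four-point computation on the three pairwise distance sums to obtain the two inner box dimensions; this is not necessarily the authors' exact procedure. The function names `hamming_matrix` and `box_dimensions` and the toy matrices are hypothetical. The statistic δ is not computed, because the caption does not define it.

```python
import numpy as np


def hamming_matrix(X):
    """Pairwise Hamming distances between the rows (samples) of a 0-1 matrix."""
    X = np.asarray(X, dtype=int)
    # Compare every pair of rows gene by gene and count the mismatches.
    return (X[:, None, :] != X[None, :, :]).sum(axis=2)


def box_dimensions(D, quartet=(0, 1, 2, 3)):
    """Inner box dimensions (shorter, longer) for four samples.

    Assumption: standard four-point computation on the three ways of
    pairing the four samples. A tree corresponds to a shorter dimension of 0.
    """
    i, j, k, l = quartet
    s_min, s_mid, s_max = sorted(
        [D[i, j] + D[k, l], D[i, k] + D[j, l], D[i, l] + D[j, k]]
    )
    return (s_max - s_mid) / 2, (s_max - s_min) / 2


# Toy data (hypothetical): 4 samples x 6 genes with two conflicting groupings.
X_box = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 1],
]
# Toy data (hypothetical): 4 samples x 4 genes consistent with a single tree.
X_tree = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(box_dimensions(hamming_matrix(X_box)))   # both dimensions > 0: a box
print(box_dimensions(hamming_matrix(X_tree)))  # shorter dimension is 0: a tree
```

In the first toy matrix, genes support two incompatible groupings of the samples, so both inner dimensions are positive. In the second, only one grouping is supported, so one inner edge collapses to zero as in panel (d).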
