To the Editor:

Recently, Lex and Gehlenborg¹ published an interesting column on set visualization, full of valuable advice for scholars seeking an appropriate representation of intersecting data sets. However, we think that two slightly misleading statements deserve further explanation: “There are $2^n$ possible intersections for $n$ sets” and “Euler diagrams represent intersecting sets as overlapping shapes.”

Strictly speaking, an intersection of two (or more) sets is the set of elements that belong to each of those sets. Thus, for $n$ sets the number of possible intersections is smaller than $2^n$. In fact, $2^n$ is the number of all possible intersections ($2^n - n - 1$), plus the $n$ subsets formed by elements that belong to only one set (that is, disjoint elements), plus the one set formed by elements that belong to none of the $n$ sets.
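The decomposition above can be written as a binomial sum. Each of the $2^n$ regions corresponds to one subset of the $n$ sets to which an element can belong; grouping these subsets by their size $k$ (a standard identity, shown here as a sketch of the counting argument) separates the three kinds of region:

```latex
2^n = (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k}
    = \underbrace{\binom{n}{0}}_{\text{no set}}
    + \underbrace{\binom{n}{1}}_{\text{exactly one set}}
    + \underbrace{\sum_{k=2}^{n} \binom{n}{k}}_{\text{intersections}}
    = 1 + n + \left(2^n - n - 1\right)
```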

Consider an Euler diagram (Fig. 1) that illustrates all possible situations arising from mutations in two genes, A and B. Area 1 is properly defined as the intersection (mutations occurred in both genes A and B, that is, co-mutated genes). Areas 2 and 3 correspond, respectively, to cases in which only A or only B was mutated. Finally, region 4, outside the circles, represents the case in which neither A nor B was mutated. Thus four possible situations are represented by four corresponding regions of the plane ($2^n = 2^2 = 4$): two populated by genes that are not co-mutated ($n = 2$), one that defines the intersection ($2^n - n - 1 = 1$) and one in which no mutation occurred. The same considerations hold for less trivial analyses. For example, five genes generate 32 ($2^5$) possible cases: five corresponding to a mutation in a single gene, 26 corresponding to intersections (two or more genes co-mutated; $2^5 - 5 - 1 = 26$) and one in which no gene is mutated. This is the situation described in Figure 1b of Lex and Gehlenborg's column¹.
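A short Python sketch can confirm these counts by enumerating every region as the tuple of genes mutated in it and then grouping the regions by how many genes they contain. The function names `mutation_regions` and `classify`, and the gene labels C, D and E used for the five-gene case, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def mutation_regions(genes):
    """Return one tuple per diagram region: the genes mutated in that region."""
    regions = []
    # Each True/False pattern marks which genes carry a mutation,
    # so the 2**len(genes) patterns map one-to-one onto the regions.
    for pattern in product([False, True], repeat=len(genes)):
        regions.append(tuple(g for g, mutated in zip(genes, pattern) if mutated))
    return regions


def classify(regions):
    """Count regions with no mutated gene, exactly one, or two or more."""
    def kind(region):
        if len(region) == 0:
            return "none"
        if len(region) == 1:
            return "single"
        return "co-mutated"
    return Counter(kind(r) for r in regions)


two = mutation_regions(["A", "B"])
print(two)  # [(), ('B',), ('A',), ('A', 'B')]
print(classify(two))

five = mutation_regions(["A", "B", "C", "D", "E"])  # C, D, E are hypothetical genes
print(len(five), classify(five))
```

For two genes the empty tuple is region 4, the single-gene tuples are areas 2 and 3, and `('A', 'B')` is area 1; for five genes the classifier reports one region with no mutation, five single-gene regions and 26 intersections, 32 in total.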

Figure 1

Euler diagram displaying the intersection of two genes, A and B.