Figure 1: Histograms displaying the number of atoms and masses of entries in PubChem.

From: Automated evaluation of consistency within the PubChem Compound database


The x-axes of histograms (a) and (b) represent the number of atoms in a compound, and the y-axes indicate the number of compounds with the corresponding number of atoms. (a) Histogram of atom counts for compounds with fewer than 152 atoms: counts for “Current-Full” entries (2D structures) are shown in blue, and counts for the “Compound_3D” entries are shown in green. The 152-atom cutoff was chosen based on the maximum number of atoms in compounds in the “Compound_3D” dataset. (b) Counts for compounds with >152 atoms. PubChem contains no 3D structure information for these compounds. (c) Histogram of masses of compounds as reported in the SDF files of PubChem “Current-Full” entries. Most of the compounds in the database had masses below 1,000 Da; however, 11,550 compounds had masses above 2,000 Da (not shown in (c)). For example, PubChem CID 23393956 has a reported exact mass of 59,745.256 Da.
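The binning described in the caption can be illustrated with a minimal sketch. It is not the authors' code. The function names, the in-memory lists of atom counts and masses, and the choice to place compounds with exactly 152 atoms in the lower panel are all assumptions made for illustration; only the 152-atom cutoff and the 2,000 Da threshold come from the caption.

```python
from collections import Counter

ATOM_CUTOFF = 152        # cutoff quoted in the caption
MASS_THRESHOLD = 2000.0  # Da; masses above this are not shown in panel (c)


def split_atom_histograms(atom_counts, cutoff=ATOM_CUTOFF):
    """Return (low, high) histograms mapping atom count -> number of compounds.

    Assumption: a compound with exactly `cutoff` atoms goes into `low`,
    because the cutoff is described as the maximum seen in "Compound_3D".
    """
    low = Counter(n for n in atom_counts if n <= cutoff)   # panel (a)
    high = Counter(n for n in atom_counts if n > cutoff)   # panel (b)
    return low, high


def count_heavier_than(masses, threshold=MASS_THRESHOLD):
    """Count compounds whose reported mass exceeds `threshold` (in Da)."""
    return sum(1 for m in masses if m > threshold)
```

In practice the atom counts and masses would be parsed from the PubChem SDF files; that parsing step is omitted here.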