Table 1 Compression ratios and times provided by all tested tools

From: DeeZ: reference-based compression by local assembly

| | | Pseudomonas aeruginosa RNA-seq | | | | Human RNA-seq | | | | Human HiSeq | | | |
| Tool | Random access | Size (MB) | Ratio | Compr. time | Dec. time | Size (MB) | Ratio | Compr. time | Dec. time | Size (MB) | Ratio | Compr. time | Dec. time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| None (original data) | | 19,008 | 1.00 | | | 72,398 | 1.00 | | | 437,589 | 1.00 | | |
| Gzip | No | 3,210 | 5.92 | 13:35 | 02:15 | 12,236 | 5.92 | 0:47:24 | 08:35 | 99,180 | 4.41 | 7:31:42 | 2:12:10 |
| SAMtools | Yes | 3,340 | 5.69 | 14:25 | 02:59 | 13,119 | 5.52 | 0:53:13 | 10:59 | 106,596 | 4.11 | 6:49:42 | 2:11:33 |
| Scramble^a | Yes | N/A | N/A | N/A | N/A | 10,063 | 7.19 | 0:36:04 | 11:28 | 75,784 | 5.77 | 6:36:41 | 1:59:39 |
| Quip^b (non-reference based) | No | 2,561 | 7.42 | 14:53 | 16:57 | 10,601 | 6.83 | 1:00:31 | 57:30 | 78,221 | 5.59 | 6:45:12 | 7:24:07 |
| Quip^b (reference based) | No | 2,181 | 8.72 | 14:54 | 17:20 | 8,271 | 8.75 | 0:56:10 | 57:04 | 61,905 | 7.07 | 8:34:38 | 7:27:23 |
| DeeZ | Yes | 1,921 | 9.89 | 12:01 | 10:39 | 8,010 | 9.04 | 1:18:10 | 48:42 | 62,808 | 6.97 | 5:49:48 | 6:34:26 |
| DeeZ (partial random access) | Partial^c | 1,828 | 10.40 | 13:27 | 12:20 | 7,615 | 9.51 | 1:27:22 | 54:51 | 58,879 | 7.43 | 6:50:33 | 7:42:04 |
| Samcomp^d (non-reference based) | No | 1,473* | 12.91* | 13:05* | N/A | 6,781* | 10.68* | 0:51:12* | 56:30* | 52,389 | 8.35 | 6:03:46 | N/A |
| Samcomp^d (reference based) | No | N/A | N/A | N/A | N/A | 6,724* | 10.77* | 0:50:29* | 56:54* | 51,733 | 8.46 | 6:00:33 | N/A |
| DeeZ (Samcomp fields only^e) | Yes | 1,623 | 11.71 | 12:01 | 10:39 | 7,136 | 10.14 | 1:18:10 | 48:42 | 54,435 | 8.04 | 5:49:48 | 6:34:26 |
| DeeZ (partial random access, Samcomp fields only^e) | Partial^c | 1,531 | 12.41 | 13:27 | 12:20 | 6,746 | 10.73 | 1:27:22 | 54:51 | 50,536 | 8.66 | 6:50:33 | 7:42:04 |

  1. File sizes are reported in megabytes (MB). Compression (compr.) and decompression (dec.) times are reported in (H:)MM:SS format.
  2. ^a The Scramble-decompressed SAM file was missing 1 GB in the first data set, 4 GB in the second and 17 GB in the third data set.
  3. ^b The Quip-decompressed SAM file was missing 1 GB in the third data set.
  4. ^c Quality scores were compressed via the Samcomp model and thus not randomly accessible; the other fields were.
  5. ^d Samcomp v.0.7 was not able to decompress any of the data sets above; Samcomp v.0.8 (results marked with an asterisk) succeeded in decompressing the human RNA-seq data set.
  6. ^e DeeZ's compression and decompression times include processing of all fields in the SAM file.
  7. N/A, not applicable (Samcomp crashed during compression or decompression).
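
The Ratio and time columns can be checked with a short calculation. The sketch below assumes that Ratio is the original size divided by the compressed size, rounded to two decimals; the table does not state this, but it matches the reported values, for example Gzip on the first data set. The function names `parse_time` and `compression_ratio` are illustrative and do not come from any of the tested tools.

```python
def parse_time(text):
    """Convert an (H:)MM:SS string such as '13:35' or '0:47:24' to seconds."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 2:
        # No hour field present: treat the value as MM:SS.
        parts = [0] + parts
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def compression_ratio(original_mb, compressed_mb):
    """Assumed definition: original size / compressed size, two decimals."""
    return round(original_mb / compressed_mb, 2)


# Gzip row, Pseudomonas aeruginosa RNA-seq columns from the table.
print(compression_ratio(19008, 3210))  # 5.92
print(parse_time("13:35"))             # 815 seconds
```

Converting the times to seconds makes the tools directly comparable across data sets, since some table entries carry an hour field and others do not.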