Exploring the limit of using a deep neural network on pileup data for germline variant calling

Single-molecule sequencing technologies have emerged in recent years and revolutionized structural variant calling, complex genome assembly and epigenetic mark detection. However, the lack of a highly accurate small variant caller has kept these technologies from being more widely used. Here, we present Clair, the successor to Clairvoyante, a program for fast and accurate germline small variant calling using single-molecule sequencing data. For Oxford Nanopore Technology data, Clair achieves better precision, recall and speed than several competing programs, including Clairvoyante, Longshot and Medaka. By studying the missed variants and benchmarking intentionally overfitted models, we found that Clair may be approaching the limit of possible accuracy for germline small variant calling using pileup data and deep neural networks. Clair requires only a conventional central processing unit (CPU) for variant calling and is an open-source project available at
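
The precision and recall figures mentioned above are derived from counts of true positives (TPs), false positives (FPs) and false negatives (FNs) against a truth variant set. The following is a minimal sketch of that arithmetic, not the benchmarking code used in the study; the function name `precision_recall_f1` and the example counts are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from TP, FP and FN counts.

    Hypothetical helper for illustration; it is not part of Clair.
    """
    # Precision: fraction of called variants that are in the truth set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truth-set variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts, for illustration only (not results from the study).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

A caller can raise precision by emitting fewer, more confident calls at the cost of recall, which is why both values are usually reported together with the F1 score.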

Fig. 1: Clair network architecture and layer details.
Fig. 2: ONT benchmarking results for SNPs and indels.
Fig. 3: The category distribution of FPs and FNs made by Clair in the 1:168×|2:64× experiment on ONT data and six genome browser screen captures showing examples of different categories.

Data availability

The details of and links to the reference genomes, truth variants, ONT data, PacBio CCS data and Illumina data that support the findings of this study are available in the ‘Data sources’ section of the Supplementary Notes. The variant call format files generated by Clair in this study are available at

Code availability

Clair is open source, and available at Clair is licensed under the BSD 3-Clause licence.


Author information

R.L. and T.-W.L. conceived the study. R.L., C.-L.W., Y.-S.W., C.-I.T., C.-M. Liu and C.-M. Leung analysed the data and wrote the paper.

Corresponding authors

Correspondence to Ruibang Luo or Tak-Wah Lam.

Luo, R., Wong, CL., Wong, YS. et al. Exploring the limit of using a deep neural network on pileup data for germline variant calling. Nat Mach Intell 2, 220–227 (2020).

