A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data


UV radiation may lead to melanoma and nonmelanoma skin cancers by causing helix-distorting DNA damage such as cyclobutane pyrimidine dimers (CPDs). These DNA lesions, if located in important genes and not repaired promptly, are mutagenic and may eventually result in carcinogenesis. Examining CPD formation and repair processes across the genome can shed light on the mutagenesis mechanisms associated with UV damage in relevant cancers. We recently developed CPD-Seq, a high-throughput and single-nucleotide resolution sequencing technique that can specifically capture UV-induced CPD lesions across the genome. This novel technique has been increasingly used in studies of UV damage and can be adapted to sequence other clinically relevant DNA lesions. Although the library preparation protocol has been established, a systematic protocol to analyze CPD-Seq data has not been described yet. To streamline the various general or specific analysis steps, we developed a protocol named CPDSeqer to assist researchers with CPD-Seq data processing. CPDSeqer can accommodate both a single- and multiple-sample experimental design, and it allows both genome-wide analyses and regional scrutiny (such as of suspected UV damage hotspots). The runtime of CPDSeqer scales with raw data size and takes roughly 4 h per sample with the possibility of acceleration by parallel computing. Various guiding graphics are generated to help diagnose the performance of the experiment and inform regional enrichment of CPD formation. UV damage comparison analyses are set forth in three analysis scenarios, and the resulting HTML pages report damage directional trends and statistical significance. CPDSeqer can be accessed at

Fig. 1: Graphic illustration of the CPD-Seq methodology.
Fig. 2: The CPDSeqer protocol is composed of 17 steps.
Fig. 3: CPD-Seq–specific QC diagnostic plots (demonstrated on Gene Expression Omnibus dataset GSE119249).
Fig. 4: Graphic output of Step 11 to show genome-wide CPD damage site distribution.
Fig. 5: Example positional CPD damage aggregate figure for a nucleosome region (output of Step 12).

Data availability

All datasets (GSE103487, ref. 5; GSE79977, ref. 3; and GSE119249, ref. 6) used in the demonstration for this protocol are available through the NCBI Short Read Archive. All figures used in this article are original. All preprocessed resource files listed in Supplementary Data 1 are available at

Code availability

This protocol, including all scripts (Shell, Python and R), is hosted at A comprehensive test case involving all 17 steps entails an empirical CPD-Seq dataset and corresponding stepwise testing code scripts, which are available at


  1. Guy, G. P., Machlin, S. R., Ekwueme, D. U. & Yabroff, K. R. Prevalence and costs of skin cancer treatment in the US, 2002–2006 and 2007–2011. Am. J. Prev. Med. 48, 183–187 (2015).

  2. Mouret, S. et al. Cyclobutane pyrimidine dimers are predominant DNA lesions in whole human skin exposed to UVA radiation. Proc. Natl Acad. Sci. USA 103, 13765–13770 (2006).

  3. Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Chromosomal landscape of UV damage formation and repair at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 113, 9057–9062 (2016).

  4. Mao, P., Wyrick, J. J., Roberts, S. A. & Smerdon, M. J. UV-induced DNA damage and mutagenesis in chromatin. Photochem. Photobiol. 93, 216–228 (2017).

  5. Mao, P. et al. ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat. Commun. 9, 2626 (2018).

  6. Elliott, K. et al. Elevated pyrimidine dimer formation at distinct genomic bases underlies promoter mutation hotspots in UV-exposed cancers. PLoS Genet. 14, e1007849 (2018).

  7. Premi, S. et al. Genomic sites hypersensitive to ultraviolet radiation. Proc. Natl Acad. Sci. USA 116, 24196–24205 (2019).

  8. Lindberg, M., Bostrom, M., Elliott, K. & Larsson, E. Intragenomic variability and extended sequence patterns in the mutational signature of ultraviolet light. Proc. Natl Acad. Sci. USA 116, 20411–20417 (2019).

  9. Brown, A. J., Mao, P., Smerdon, M. J., Wyrick, J. J. & Roberts, S. A. Nucleosome positions establish an extended mutation signature in melanoma. PLoS Genet. 14, e1007823 (2018).

  10. Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Asymmetric repair of UV damage in nucleosomes imposes a DNA strand polarity on somatic mutations in skin cancer. Genome Res. 30, 12–21 (2020).

  11. Duan, M., Selvam, K., Wyrick, J. J. & Mao, P. Genome-wide role of Rad26 in promoting transcription-coupled nucleotide excision repair in yeast chromatin. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020).

  12. Mao, P. et al. Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms of mutational heterogeneity. Genome Res. 27, 1674–1684 (2017).

  13. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  14. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  15. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  16. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  17. Ward, C. M., To, T. H. & Pederson, S. M. ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files. Bioinformatics 36, 2587–2588 (2020).

  18. Guo, Y., Ye, F., Sheng, Q. H., Clark, T. & Samuels, D. C. Three-stage quality control strategies for DNA re-sequencing data. Brief. Bioinform. 15, 879–889 (2014).

  19. Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).

  20. Girardot, C., Scholtalbers, J., Sauer, S., Su, S. Y. & Furlong, E. E. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers. BMC Bioinformatics 17, 419 (2016).

  21. Andrews, S. A Quality Control Tool for High Throughput Sequence Data. Available at (2010).

  22. Guo, Y. et al. Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics 103, 323–328 (2014).

  23. Yu, H. et al. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. PLoS Comput. Biol. 16, e1007968 (2020).


This study was supported by a Cancer Center Support Grant (P30CA118100) and R01ES030993-01A1 from the National Cancer Institute, funding from the National Institutes of Health (R21ES029302), a pilot grant from the UNM Center for Metals in Biology and Medicine (P20GM130422), the Bioinformatics Shared Resources and the Biostatistics Shared Resources at The Comprehensive Cancer Center. None of the funding bodies were involved in the study design; data collection, analysis or interpretation; or writing of the manuscript.

Author information




Q.S., H.Y. and L.J. developed code for the protocol. M.D. and J.H. performed protocol testing. H.K. provided statistical support. J.J.W. and P.M. provided knowledge support for CPD-Seq. Y.G., P.M., H.Y. and S.N. wrote the manuscript. Y.G. and P.M. supervised the project.

Corresponding authors

Correspondence to Peng Mao or Yan Guo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Ashby J. Morrison, Anna R. Poetsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Mao, P. et al. Nat. Commun. 9, 2626 (2018):

Mao, P. et al. Genome Res. 30, 12–21 (2020):

Duan, M. et al. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020):

Supplementary information

Supplementary Data 1

Inventory of pre-compiled resource files for CPDSeqer. For each BED file and naked DNA normalization file, it lists the file name, download link, brief description and applicable steps.

About this article

Cite this article

Sheng, Q., Yu, H., Duan, M. et al. A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data. Nat Protoc 16, 2190–2212 (2021).
