  • Protocol
A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data

Abstract

UV radiation may lead to melanoma and nonmelanoma skin cancers by causing helix-distorting DNA damage such as cyclobutane pyrimidine dimers (CPDs). These DNA lesions, if located in important genes and not repaired promptly, are mutagenic and may eventually result in carcinogenesis. Examining CPD formation and repair processes across the genome can shed light on the mutagenesis mechanisms associated with UV damage in relevant cancers. We recently developed CPD-Seq, a high-throughput and single-nucleotide resolution sequencing technique that can specifically capture UV-induced CPD lesions across the genome. This novel technique has been increasingly used in studies of UV damage and can be adapted to sequence other clinically relevant DNA lesions. Although the library preparation protocol has been established, a systematic protocol to analyze CPD-Seq data has not yet been described. To streamline the various general or specific analysis steps, we developed a protocol named CPDSeqer to assist researchers with CPD-Seq data processing. CPDSeqer can accommodate both single- and multiple-sample experimental designs, and it allows both genome-wide analyses and regional scrutiny (such as of suspected UV damage hotspots). The runtime of CPDSeqer scales with raw data size: processing takes roughly 4 h per sample and can be accelerated by parallel computing. Various guiding graphics are generated to help diagnose the performance of the experiment and to reveal regional enrichment of CPD formation. UV damage comparisons are organized into three analysis scenarios, and the resulting HTML pages report damage directional trends and statistical significance. CPDSeqer can be accessed at https://github.com/shengqh/cpdseqer.
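Because CPDs form between adjacent pyrimidines, one simple sanity check on single-nucleotide damage calls is the share of sites that fall on dipyrimidine dinucleotides (TT, TC, CT or CC). The Python sketch below illustrates that idea only. It is not taken from CPDSeqer: the function names, the 0-based site convention and the single-strand treatment (no reverse-complement handling) are all assumptions made for illustration.

```python
from collections import Counter

# Dinucleotides at which a cyclobutane pyrimidine dimer can form.
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}


def dinucleotide_counts(reference, sites):
    """Tally the dinucleotide that starts at each 0-based site.

    Assumption: every site is reported on the same strand as `reference`.
    Sites too close to the end to yield two bases are skipped.
    """
    counts = Counter()
    for pos in sites:
        dinuc = reference[pos:pos + 2].upper()
        if len(dinuc) == 2:
            counts[dinuc] += 1
    return counts


def dipyrimidine_fraction(counts):
    """Return the fraction of tallied sites that lie on a dipyrimidine."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    on_target = sum(n for dinuc, n in counts.items() if dinuc in DIPYRIMIDINES)
    return on_target / total


# Toy example with a made-up sequence and made-up site positions.
toy_reference = "ATTCGCCTA"
toy_sites = [0, 1, 2, 5, 6]
toy_counts = dinucleotide_counts(toy_reference, toy_sites)
print(dict(toy_counts))
print(f"dipyrimidine fraction: {dipyrimidine_fraction(toy_counts):.2f}")
```

In the toy input, four of the five sites start a dipyrimidine and one starts AT, so the fraction is 0.80. A real analysis would also need to account for strand and for the coordinate convention of the aligner output.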

Fig. 1: Graphic illustration of the CPD-Seq methodology.
Fig. 2: The CPDSeqer protocol is composed of 17 steps.
Fig. 3: CPD-Seq–specific QC diagnostic plots (demonstrated on Gene Expression Omnibus dataset GSE119249).
Fig. 4: Graphic output of Step 11 to show genome-wide CPD damage site distribution.
Fig. 5: Example positional CPD damage aggregate figure for a nucleosome region (output of Step 12).

Data availability

All datasets used in the demonstration for this protocol (GSE103487, ref. 5; GSE79977, ref. 3; and GSE119249, ref. 6) are available through the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra). All figures used in this article are original. All preprocessed resource files listed in Supplementary Data 1 are available at https://cqsweb.app.vumc.org/Data/cpdseqer/.

Code availability

This protocol, including all scripts (Shell, Python and R), is hosted at https://github.com/shengqh/cpdseqer. A comprehensive test case covering all 17 steps, consisting of an empirical CPD-Seq dataset and the corresponding stepwise test scripts, is available at https://cqsweb.app.vumc.org/Data/cpdseqer/.

References

  1. Guy, G. P., Machlin, S. R., Ekwueme, D. U. & Yabroff, K. R. Prevalence and costs of skin cancer treatment in the US, 2002–2006 and 2007–2011. Am. J. Prev. Med. 48, 183–187 (2015).

  2. Mouret, S. et al. Cyclobutane pyrimidine dimers are predominant DNA lesions in whole human skin exposed to UVA radiation. Proc. Natl Acad. Sci. USA 103, 13765–13770 (2006).

  3. Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Chromosomal landscape of UV damage formation and repair at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 113, 9057–9062 (2016).

  4. Mao, P., Wyrick, J. J., Roberts, S. A. & Smerdon, M. J. UV-induced DNA damage and mutagenesis in chromatin. Photochem. Photobiol. 93, 216–228 (2017).

  5. Mao, P. et al. ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat. Commun. 9, 2626 (2018).

  6. Elliott, K. et al. Elevated pyrimidine dimer formation at distinct genomic bases underlies promoter mutation hotspots in UV-exposed cancers. PLoS Genet. 14, e1007849 (2018).

  7. Premi, S. et al. Genomic sites hypersensitive to ultraviolet radiation. Proc. Natl Acad. Sci. USA 116, 24196–24205 (2019).

  8. Lindberg, M., Bostrom, M., Elliott, K. & Larsson, E. Intragenomic variability and extended sequence patterns in the mutational signature of ultraviolet light. Proc. Natl Acad. Sci. USA 116, 20411–20417 (2019).

  9. Brown, A. J., Mao, P., Smerdon, M. J., Wyrick, J. J. & Roberts, S. A. Nucleosome positions establish an extended mutation signature in melanoma. PLoS Genet. 14, e1007823 (2018).

  10. Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Asymmetric repair of UV damage in nucleosomes imposes a DNA strand polarity on somatic mutations in skin cancer. Genome Res. 30, 12–21 (2020).

  11. Duan, M., Selvam, K., Wyrick, J. J. & Mao, P. Genome-wide role of Rad26 in promoting transcription-coupled nucleotide excision repair in yeast chromatin. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020).

  12. Mao, P. et al. Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms of mutational heterogeneity. Genome Res. 27, 1674–1684 (2017).

  13. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  14. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  15. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  16. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  17. Ward, C. M., To, T. H. & Pederson, S. M. ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files. Bioinformatics 36, 2587–2588 (2020).

  18. Guo, Y., Ye, F., Sheng, Q. H., Clark, T. & Samuels, D. C. Three-stage quality control strategies for DNA re-sequencing data. Brief. Bioinform. 15, 879–889 (2014).

  19. Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).

  20. Girardot, C., Scholtalbers, J., Sauer, S., Su, S. Y. & Furlong, E. E. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers. BMC Bioinformatics 17, 419 (2016).

  21. Andrews, S. A Quality Control Tool for High Throughput Sequence Data. Available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).

  22. Guo, Y. et al. Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics 103, 323–328 (2014).

  23. Yu, H. et al. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. PLoS Comput. Biol. 16, e1007968 (2020).

Acknowledgements

This study was supported by a Cancer Center Support Grant (P30CA118100) and R01ES030993-01A1 from the National Cancer Institute, funding from the National Institutes of Health (R21ES029302), a pilot grant from the UNM Center for Metals in Biology and Medicine (P20GM130422), the Bioinformatics Shared Resources and the Biostatistics Shared Resources at The Comprehensive Cancer Center. None of the funding bodies were involved in the study design; data collection, analysis or interpretation; or writing of the manuscript.

Author information

Contributions

Q.S., H.Y. and L.J. developed code for the protocol. M.D. and J.H. performed protocol testing. H.K. provided statistical support. J.J.W. and P.M. provided knowledge support for CPD-Seq. Y.G., P.M., H.Y. and S.N. wrote the manuscript. Y.G. and P.M. supervised the project.

Corresponding authors

Correspondence to Peng Mao or Yan Guo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Ashby J. Morrison, Anna R. Poetsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Mao, P. et al. Nat. Commun. 9, 2626 (2018): https://doi.org/10.1038/s41467-018-05064-0

Mao, P. et al. Genome Res. 30, 12–21 (2020): https://doi.org/10.1101/gr.253146.119

Duan, M. et al. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020): https://doi.org/10.1073/pnas.2003868117

Supplementary information

Supplementary Data 1

Inventory of precompiled resource files for CPDSeqer. For the BED files and naked DNA normalization files, it lists the file name, download link, brief description and applicable steps.

About this article

Cite this article

Sheng, Q., Yu, H., Duan, M. et al. A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data. Nat Protoc 16, 2190–2212 (2021). https://doi.org/10.1038/s41596-021-00496-3

  • DOI: https://doi.org/10.1038/s41596-021-00496-3
