UV radiation may lead to melanoma and nonmelanoma skin cancers by causing helix-distorting DNA damage such as cyclobutane pyrimidine dimers (CPDs). These DNA lesions, if located in important genes and not repaired promptly, are mutagenic and may eventually result in carcinogenesis. Examining CPD formation and repair across the genome can shed light on the mutagenesis mechanisms associated with UV damage in relevant cancers. We recently developed CPD-Seq, a high-throughput, single-nucleotide-resolution sequencing technique that specifically captures UV-induced CPD lesions across the genome. This technique has been increasingly used in studies of UV damage and can be adapted to sequence other clinically relevant DNA lesions. Although the library preparation protocol has been established, a systematic protocol for analyzing CPD-Seq data has not yet been described. To streamline the general and specific analysis steps, we developed a protocol named CPDSeqer to assist researchers with CPD-Seq data processing. CPDSeqer accommodates both single- and multiple-sample experimental designs, and it supports both genome-wide analyses and regional scrutiny (such as of suspected UV damage hotspots). Runtime scales with raw data size and is roughly 4 h per sample, which can be shortened by parallel computing. Diagnostic graphics are generated to help assess the performance of the experiment and to reveal regional enrichment of CPD formation. UV damage comparison analyses are organized into three scenarios, and the resulting HTML pages report the direction of damage trends and their statistical significance. CPDSeqer can be accessed at https://github.com/shengqh/cpdseqer.
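As a rough planning aid, the runtime figure quoted above can be turned into a back-of-the-envelope estimate. The sketch below is not part of CPDSeqer: the function name and parameters are hypothetical, and it assumes that samples are processed independently, that each takes about the 4 h stated in the text, and that parallel workers do not compete for shared resources. Actual runtime depends on raw data size and hardware.

```python
import math

# Rough per-sample figure quoted in the text; real runtime scales with raw data size.
HOURS_PER_SAMPLE = 4.0


def estimate_wall_clock_hours(n_samples, n_workers=1, hours_per_sample=HOURS_PER_SAMPLE):
    """Estimate wall-clock hours for a batch of samples (hypothetical helper).

    Assumes each worker handles one sample at a time and that samples are
    independent, so the batch finishes after ceil(n_samples / n_workers) rounds.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    rounds = math.ceil(n_samples / n_workers)
    return rounds * hours_per_sample


if __name__ == "__main__":
    # Compare a six-sample experiment run serially and with more workers.
    for workers in (1, 2, 6):
        hours = estimate_wall_clock_hours(6, n_workers=workers)
        print(f"6 samples, {workers} worker(s): about {hours:.0f} h")
```

Under these assumptions, six samples would need about 24 h when run one after another and about 4 h with six workers. An uneven split such as five samples on two workers still needs three rounds, or about 12 h.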
All datasets used in the demonstration for this protocol (GSE103487 (Mao et al., 2018), GSE79977 (Mao et al., 2016) and GSE119249 (Elliott et al., 2018)) are available through the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra). All figures used in this article are original. All preprocessed resource files listed in Supplementary Data 1 are available at https://cqsweb.app.vumc.org/Data/cpdseqer/.
This protocol, including all scripts (Shell, Python and R), is hosted at https://github.com/shengqh/cpdseqer. A comprehensive test case covering all 17 steps, comprising an empirical CPD-Seq dataset and the corresponding stepwise test scripts, is available at https://cqsweb.app.vumc.org/Data/cpdseqer/.
Guy, G. P., Machlin, S. R., Ekwueme, D. U. & Yabroff, K. R. Prevalence and costs of skin cancer treatment in the US, 2002–2006 and 2007–2011. Am. J. Prev. Med. 48, 183–187 (2015).
Mouret, S. et al. Cyclobutane pyrimidine dimers are predominant DNA lesions in whole human skin exposed to UVA radiation. Proc. Natl Acad. Sci. USA 103, 13765–13770 (2006).
Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Chromosomal landscape of UV damage formation and repair at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 113, 9057–9062 (2016).
Mao, P., Wyrick, J. J., Roberts, S. A. & Smerdon, M. J. UV-induced DNA damage and mutagenesis in chromatin. Photochem. Photobiol. 93, 216–228 (2017).
Mao, P. et al. ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat. Commun. 9, 2626 (2018).
Elliott, K. et al. Elevated pyrimidine dimer formation at distinct genomic bases underlies promoter mutation hotspots in UV-exposed cancers. PLoS Genet. 14, e1007849 (2018).
Premi, S. et al. Genomic sites hypersensitive to ultraviolet radiation. Proc. Natl Acad. Sci. USA 116, 24196–24205 (2019).
Lindberg, M., Bostrom, M., Elliott, K. & Larsson, E. Intragenomic variability and extended sequence patterns in the mutational signature of ultraviolet light. Proc. Natl Acad. Sci. USA 116, 20411–20417 (2019).
Brown, A. J., Mao, P., Smerdon, M. J., Wyrick, J. J. & Roberts, S. A. Nucleosome positions establish an extended mutation signature in melanoma. PLoS Genet. 14, e1007823 (2018).
Mao, P., Smerdon, M. J., Roberts, S. A. & Wyrick, J. J. Asymmetric repair of UV damage in nucleosomes imposes a DNA strand polarity on somatic mutations in skin cancer. Genome Res. 30, 12–21 (2020).
Duan, M., Selvam, K., Wyrick, J. J. & Mao, P. Genome-wide role of Rad26 in promoting transcription-coupled nucleotide excision repair in yeast chromatin. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020).
Mao, P. et al. Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms of mutational heterogeneity. Genome Res. 27, 1674–1684 (2017).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ward, C. M., To, T. H. & Pederson, S. M. ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files. Bioinformatics 36, 2587–2588 (2020).
Guo, Y., Ye, F., Sheng, Q. H., Clark, T. & Samuels, D. C. Three-stage quality control strategies for DNA re-sequencing data. Brief. Bioinform. 15, 879–889 (2014).
Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).
Girardot, C., Scholtalbers, J., Sauer, S., Su, S. Y. & Furlong, E. E. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers. BMC Bioinformatics 17, 419 (2016).
Andrews, S. A Quality Control Tool for High Throughput Sequence Data. Available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
Guo, Y. et al. Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics 103, 323–328 (2014).
Yu, H. et al. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. PLoS Comput. Biol. 16, e1007968 (2020).
This study was supported by a Cancer Center Support Grant (P30CA118100) and R01ES030993-01A1 from the National Cancer Institute, funding from the National Institutes of Health (R21ES029302), a pilot grant from the UNM Center for Metals in Biology and Medicine (P20GM130422), the Bioinformatics Shared Resources and the Biostatistics Shared Resources at The Comprehensive Cancer Center. None of the funding bodies were involved in the study design; data collection, analysis or interpretation; or writing of the manuscript.
The authors declare no competing interests.
Peer review information: Nature Protocols thanks Ashby J. Morrison, Anna R. Poetsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Mao, P. et al. Nat. Commun. 9, 2626 (2018): https://doi.org/10.1038/s41467-018-05064-0
Mao, P. et al. Genome Res. 30, 12–21 (2020): https://doi.org/10.1101/gr.253146.119
Duan, M. et al. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020): https://doi.org/10.1073/pnas.2003868117
About this article
Cite this article
Sheng, Q., Yu, H., Duan, M. et al. A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data. Nat Protoc 16, 2190–2212 (2021). https://doi.org/10.1038/s41596-021-00496-3