Protocol | Published:

RNA sequencing and swarm intelligence–enhanced classification algorithm development for blood-based disease diagnostics using spliced blood platelet RNA

Abstract

Blood-based diagnostic tests, using individual or panels of biomarkers, may revolutionize disease diagnostics and enable minimally invasive therapy monitoring. However, selection of the most relevant biomarkers from liquid biosources remains an immense challenge. We recently presented the thromboSeq pipeline, which enables RNA sequencing and cancer classification via self-learning and swarm intelligence–enhanced bioinformatics algorithms using blood platelet RNA. Here, we provide the wet-lab protocol for the generation of platelet RNA-sequencing libraries and the dry-lab protocol for the development of swarm intelligence–enhanced machine-learning-based classification algorithms. The wet-lab protocol includes platelet RNA isolation, mRNA amplification, and preparation for next-generation sequencing. The dry-lab protocol describes the automated FASTQ file pre-processing to quantified gene counts, quality controls, data normalization and correction, and swarm intelligence–enhanced support vector machine (SVM) algorithm development. This protocol enables platelet RNA profiling from 500 pg of platelet RNA and allows automated and optimized biomarker panel selection. The wet-lab protocol can be performed in 5 d before sequencing, and the algorithm development can be completed in 2 d, depending on computational resources. The protocol requires basic molecular biology skills and a basic understanding of Linux and R. In all, with this protocol, we aim to enable the scientific community to test platelet RNA for diagnostic algorithm development.
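To illustrate what "swarm intelligence–enhanced" parameter selection can look like in general, the following is a minimal particle swarm optimization (PSO) sketch in Python. It is not the thromboSeq implementation, which is distributed as R source code. The function names (`pso_minimize`, `toy_error`), the swarm settings, and the two tuned parameters are assumptions made for illustration, and `toy_error` is a hypothetical stand-in for the validation error of a classifier such as an SVM.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimal particle swarm optimizer that minimizes `objective` within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the bounds, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia plus pulls toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_error(params):
    """Hypothetical stand-in for a classifier's validation error (assumption)."""
    log_cost, log_gamma = params
    return (log_cost - 1.0) ** 2 + (log_gamma + 2.0) ** 2


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_params, best_error = pso_minimize(toy_error, search_bounds)
print(best_params, best_error)
```

In a real setting the objective would train and evaluate a classifier on held-out samples for each candidate parameter set, which is far more expensive than this toy function and is one reason run time depends on computational resources.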


Code availability

The thromboSeq dry-lab source code is available via GitHub (https://github.com/MyronBest/thromboSeq_source_code), and is for research purposes only.

Data availability

The HD-LGG raw sequencing data FASTQ files have been deposited in the NCBI GEO database under accession no. GSE107868. The non-cancer versus NSCLC dataset is available under accession no. GSE89843.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references that use this protocol

Best, M. G. et al. Cancer Cell 28, 666–676 (2015): https://doi.org/10.1016/j.ccell.2015.09.018

Best, M. G. et al. Cancer Cell 32, 238–252.e9 (2017): https://doi.org/10.1016/j.ccell.2017.07.004


Acknowledgements

Financial support was provided by European Research Council grants 713727 and 336540 (T.W.), Dutch Organisation of Scientific Research grant 91711366 (T.W.), the Dutch Cancer Society (T.W.), Horizon 2020 Marie-Curie European Liquid Biopsy Academy grant 765492 (T.W.), and Stichting STOPHersentumoren.nl (M.G.B., N.S., T.W.). We are thankful to F. Rustenburg, H. Verschueren, E. Post, T. Lagerweij, P. Schellen, L.E. Wedekind, I.E. Kooi, D. Vessies, D. van den Broek, B. Ylstra, J.C. Reijneveld, D.P. Noske, W.P. Vandertop, and P. Wesseling for their contributions. We thank R.J.A. Nilsson, L. Köhn, M. Arkani, and C. Oudejans for testing the thromboSeq software.

Author information

M.G.B. and T.W. designed the thromboSeq pipeline and wrote the manuscript. S.G.J.G.I.t.V. and N.S. contributed to the dry- and wet-lab protocol design.

Competing interests

M.G.B. and T.W. are inventors on relevant patent applications. T.W. received funding from Illumina and is a shareholder of GRAIL, Inc.

Correspondence to Myron G. Best or Thomas Wurdinger.

Integrated supplementary information

  1. Supplementary Figure 1 CD45 depletion of platelet preparations processed according to the thromboSeq protocol.

    (A) Summary of a flow cytometry experiment showing the number of nucleated cells and platelets detected in platelet preparations (n = 3 healthy individuals) isolated with and without a CD45-depletion step (EasySep, StemCell Technologies, #29037). Samples were stained with anti-human CD42b-FITC (Beckman Coulter, IM0648U) and the DNA marker TOTO-3 (Thermo Fisher Scientific) and quantified using the BD LSRFortessa X-20. Cellular and platelet fractions were identified on the basis of FSC/SSC gating. Nucleated cells were distinguished by TOTO-3 DNA positivity. Although no nucleated cells were detected after CD45 depletion, the number of identified platelets was significantly reduced. (B) Representative Agilent Bioanalyzer Picochip analysis of platelet preparations without (left) and with (right) an EasySep CD45-depletion step. CD45 depletion results in reduced platelet and platelet total RNA yield. (C) Read counts per one million total spliced reads mapping to the CD45 gene in both the HD-LGG dataset and the publicly available non-cancer versus NSCLC dataset. These data indicate that remaining CD45 transcripts can be detected in the thromboSeq FASTQ files.

  2. Supplementary Figure 2 Examples of Agilent Bioanalyzer traces.

    Shown are representative examples of Agilent Bioanalyzer traces related to the section. (A) Incorrect marker recognized by Agilent Bioanalyzer software. The 5S peak was detected as the reference marker because of total RNA overload (left). Dilution of the total RNA in nuclease-free H2O resulted in a high-quality RNA Bioanalyzer trace (right). (B) Appearance of a degraded RNA profile due to Picochip overload. The Agilent Bioanalyzer RNA Picochip was overloaded with total RNA, resulting in a Bioanalyzer trace similar to that of samples with degraded RNA (left). Note that the reference marker shows a low fluorescence signal. Dilution of the sample in nuclease-free H2O resulted in a high-quality RNA Bioanalyzer trace (right). (C) Skewed Agilent Bioanalyzer profiles. Gel electrophoresis traces of skewed profiles as obtained from the Agilent Bioanalyzer software, likely due to pin contamination (left). Gel electrophoresis traces of a successfully analyzed Bioanalyzer RNA Picochip (right). (D) Incorrect marker recognized by Agilent Bioanalyzer software. An incorrect peak of the Bioanalyzer RNA Picochip trace was selected as the reference marker (left). Selection of the correct peak as the reference marker results in correct quantification of the total RNA isolate (right).
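Panel C of Supplementary Figure 1 reports read counts per one million total spliced reads mapping to the CD45 gene. As a rough illustration of that normalization only, the Python sketch below scales a gene's read count by the total number of spliced reads in a sample. The function name `reads_per_million`, the sample names, and all counts are hypothetical and are not taken from the HD-LGG or NSCLC datasets.

```python
def reads_per_million(gene_reads, total_spliced_reads):
    """Scale a gene's read count to counts per one million total spliced reads."""
    if total_spliced_reads <= 0:
        raise ValueError("total_spliced_reads must be positive")
    return gene_reads * 1_000_000 / total_spliced_reads


# Hypothetical (CD45 reads, total spliced reads) pairs for two samples.
samples = {
    "sample_A": (12, 4_000_000),
    "sample_B": (3, 1_500_000),
}

# Normalizing makes samples with different sequencing depths comparable.
cpm = {name: reads_per_million(gene, total) for name, (gene, total) in samples.items()}
print(cpm)
```

Because the denominator differs per sample, a sample with fewer raw CD45 reads can still show a comparable normalized value, which is why the per-million scale is used when comparing datasets.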

Supplementary information

  1. Supplementary Figures and Text

    Supplementary Figures 1 and 2 and Supplementary Manual

  2. Reporting Summary

  3. Supplementary Tables 1 and 2

Fig. 1: Overview of PSO-enhanced thromboSeq.
Fig. 2: Pre-analytical variables for platelet processing.
Fig. 3: Platelet total RNA and SMARTer cDNA quality assessment.
Fig. 4: SMARTer cDNA amplification and TruSeq labeling.
Fig. 5: Comparison of single-end read versus paired-end read sequencing.
Fig. 6: Progress plots of the PSO-enhanced thromboSeq algorithm.
Fig. 7: PSO-enhanced thromboSeq for lower-grade glioma diagnostics.
