
  • Protocol

RNA sequencing and swarm intelligence–enhanced classification algorithm development for blood-based disease diagnostics using spliced blood platelet RNA

Abstract

Blood-based diagnostic tests, using individual biomarkers or panels of biomarkers, may revolutionize disease diagnostics and enable minimally invasive therapy monitoring. However, selecting the most relevant biomarkers from liquid biosources remains an immense challenge. We recently presented the thromboSeq pipeline, which enables RNA sequencing and cancer classification via self-learning and swarm intelligence–enhanced bioinformatics algorithms using blood platelet RNA. Here, we provide the wet-lab protocol for the generation of platelet RNA-sequencing libraries and the dry-lab protocol for the development of swarm intelligence–enhanced machine-learning-based classification algorithms. The wet-lab protocol includes platelet RNA isolation, mRNA amplification, and preparation for next-generation sequencing. The dry-lab protocol describes the automated pre-processing of FASTQ files to quantified gene counts, quality controls, data normalization and correction, and swarm intelligence–enhanced support vector machine (SVM) algorithm development. This protocol enables platelet RNA profiling from 500 pg of platelet RNA and allows automated and optimized biomarker panel selection. The wet-lab protocol can be performed in 5 d before sequencing, and the algorithm development can be completed in 2 d, depending on computational resources. The protocol requires basic molecular biology skills and a basic understanding of Linux and R. In all, with this protocol, we aim to enable the scientific community to test platelet RNA for diagnostic algorithm development.
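To illustrate the swarm intelligence idea mentioned above, the following is a minimal sketch of generic particle swarm optimization (PSO) in Python. It is not the thromboSeq implementation, which is written in R and is available from the repository listed under Code availability. The function names `pso_minimize` and `toy_error`, the swarm settings, and the two tuned parameters are assumptions made purely for illustration; `toy_error` is a hypothetical stand-in for the cross-validated error of a classifier such as an SVM.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic PSO over a box-bounded continuous search space (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the bounds, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and score.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (swarm_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < swarm_val:
                    swarm_pos, swarm_val = pos[i][:], val
    return swarm_pos, swarm_val


def toy_error(params):
    """Hypothetical stand-in for a classifier's cross-validated error."""
    log_cost, log_gamma = params
    return (log_cost - 2.0) ** 2 + (log_gamma + 3.0) ** 2


# Search two assumed hyperparameters (log-scaled cost and gamma).
best_params, best_error = pso_minimize(
    toy_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print(best_params, best_error)
```

In a real setting the objective would train and evaluate a classifier for each candidate parameter set, so every call is expensive; this is one reason the time needed for algorithm development depends on computational resources.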

Fig. 1: Overview of PSO-enhanced thromboSeq.
Fig. 2: Pre-analytical variables for platelet processing.
Fig. 3: Platelet total RNA and SMARTer cDNA quality assessment.
Fig. 4: SMARTer cDNA amplification and TruSeq labeling.
Fig. 5: Comparison of single-end read versus paired-end read sequencing.
Fig. 6: Progress plots of the PSO-enhanced thromboSeq algorithm.
Fig. 7: PSO-enhanced thromboSeq for lower-grade glioma diagnostics.

Code availability

The thromboSeq dry-lab source code is available via GitHub (https://github.com/MyronBest/thromboSeq_source_code), and is for research purposes only.

Data availability

The raw sequencing FASTQ files of the HD-LGG dataset have been deposited in the NCBI GEO database under accession no. GSE107868. The non-cancer versus NSCLC dataset is available under accession no. GSE89843.


Acknowledgements

Financial support was provided by European Research Council grants 713727 and 336540 (T.W.), Dutch Organisation of Scientific Research grant 91711366 (T.W.), the Dutch Cancer Society (T.W.), Horizon 2020 Marie-Curie European Liquid Biopsy Academy grant 765492 (T.W.), and Stichting STOPHersentumoren.nl (M.G.B., N.S., T.W.). We are thankful to F. Rustenburg, H. Verschueren, E. Post, T. Lagerweij, P. Schellen, L.E. Wedekind, I.E. Kooi, D. Vessies, D. van den Broek, B. Ylstra, J.C. Reijneveld, D.P. Noske, W.P. Vandertop, and P. Wesseling for their contributions. We thank R.J.A. Nilsson, L. Köhn, M. Arkani, and C. Oudejans for testing the thromboSeq software.

Author information

Contributions

M.G.B. and T.W. designed the thromboSeq pipeline and wrote the manuscript. S.G.J.G.I.t.V. and N.S. contributed to the dry- and wet-lab protocol design.

Corresponding authors

Correspondence to Myron G. Best or Thomas Wurdinger.

Ethics declarations

Competing interests

M.G.B. and T.W. are inventors on relevant patent applications. T.W. received funding from Illumina and is a shareholder of GRAIL, Inc.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references that use this protocol

Best, M. G. et al. Cancer Cell 28, 666–676 (2015): https://doi.org/10.1016/j.ccell.2015.09.018

Best, M. G. et al. Cancer Cell 32, 238.e9–252.e9 (2017): https://doi.org/10.1016/j.ccell.2017.07.004

Integrated supplementary information

Supplementary Figure 1 CD45 depletion of platelet preparations processed according to the thromboSeq protocol.

(A) Summary of the flow cytometry experiment indicating the number of nucleated cells and platelets detected in platelet preparations (n=3 healthy individuals) isolated with and without a CD45-depletion step (EasySep, StemCell Technologies, #29037). Samples were stained with anti-human CD42b-FITC (Beckman Coulter, IM0648U) and the DNA marker TOTO-3 (Thermo Fisher Scientific), and quantified using the BD LSRFortessa X-20. Cellular and platelet fractions were identified by FSC/SSC gating. Nucleated cells were distinguished by TOTO-3 DNA positivity. Although no nucleated cells were detected after CD45 depletion, the number of identified platelets was reduced significantly. (B) Representative Agilent Bioanalyzer Picochip analysis of platelet preparations without (left) and with (right) an EasySep CD45-depletion step. CD45 depletion results in reduced platelet and platelet total RNA yield. (C) Read counts per one million total spliced reads mapping to the CD45 gene in both the HD-LGG dataset and the publicly available non-cancer versus NSCLC dataset. These data indicate that remaining CD45 transcripts can be detected in the thromboSeq FASTQ files.
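The normalization used in panel C, read counts per one million total spliced reads, can be sketched as follows. This is an illustrative calculation only: the function name `reads_per_million`, the sample names, and the counts are made up for the example and are not taken from the HD-LGG or NSCLC datasets.

```python
def reads_per_million(gene_count, total_spliced_reads):
    """Scale a gene's read count to counts per one million total spliced reads."""
    if total_spliced_reads <= 0:
        raise ValueError("total_spliced_reads must be positive")
    return gene_count * 1_000_000 / total_spliced_reads


# Hypothetical (gene read count, total spliced reads) pairs per sample.
example_counts = {
    "sample_A": (12, 4_000_000),
    "sample_B": (3, 1_500_000),
}

per_million = {
    name: reads_per_million(gene, total)
    for name, (gene, total) in example_counts.items()
}
print(per_million)
```

Scaling by the total number of spliced reads makes samples sequenced to different depths comparable, so a low residual signal for a leukocyte marker can be judged relative to library size.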

Supplementary Figure 2 Examples of Agilent Bioanalyzer traces.

Shown are representative examples of Agilent Bioanalyzer traces related to the section. (A) Incorrect marker recognized by the Agilent Bioanalyzer software. The 5S peak was detected as the reference marker due to total RNA overload (left). Dilution of the total RNA in nuclease-free H2O resulted in a high-quality RNA Bioanalyzer trace (right). (B) Appearance of a degraded RNA profile due to Picochip overload. The Agilent Bioanalyzer RNA Picochip was overloaded with total RNA, resulting in a Bioanalyzer trace similar to that of samples with degraded RNA (left). Note that the reference marker shows a low fluorescence signal. Dilution of the sample in nuclease-free H2O resulted in a high-quality RNA Bioanalyzer trace (right). (C) Skewed Agilent Bioanalyzer profiles. Gel electrophoresis traces of skewed profiles as obtained from the Agilent Bioanalyzer software, likely due to pin contamination (left). Gel electrophoresis traces of a successfully analyzed Bioanalyzer RNA Picochip (right). (D) Incorrect marker recognized by the Agilent Bioanalyzer software. An incorrect peak of the Bioanalyzer RNA Picochip trace was selected as the reference marker (left). Selection of the correct peak as the reference marker results in correct quantification of the total RNA isolate (right).

Supplementary information

Supplementary Figures and Text

Supplementary Figures 1 and 2 and Supplementary Manual

Reporting Summary

Supplementary Tables 1 and 2

About this article

Cite this article

Best, M.G., In ’t Veld, S.G.J.G., Sol, N. et al. RNA sequencing and swarm intelligence–enhanced classification algorithm development for blood-based disease diagnostics using spliced blood platelet RNA. Nat Protoc 14, 1206–1234 (2019). https://doi.org/10.1038/s41596-019-0139-5
