Primer design is a widely used step in diverse molecular applications including PCR, sequencing and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR and nested PCR, provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features; furthermore, a graphic that gives the user a visual guide to the distribution of designed primers across targets helps in quickly ascertaining primer coverage. To this end, I describe here PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of inputted sequences while providing the user with both graphical maps of primer distribution for each inputted sequence and a global assembled map of all inputted sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction.
PCR is a widely employed and indispensable tool for an extensive and ever-growing number of molecular applications1. These applications include protocols in diverse fields such as biomedical research, forensic science and phylogenetic analysis2,3,4,5. PCR has also been repeatedly modified, resulting in creative and versatile variants. These variants include multiplex PCR, nested PCR, primer walking, DNA cloning, assembly PCR, overlapping PCR, allele-specific PCR, loop-mediated isothermal amplification (LAMP)6 and digital PCR7. In each case, primer design is the first step in building robust and effective PCR-based experiments. Effective primer design requires numerous calculations, including melting temperature (Tm), GC%, self-complementarity and hairpin formation, which are each derived from a candidate primer’s size and sequence. Manual primer design is laborious and time-consuming and, as a result, automated primer design has become a requisite tool in the PCR arsenal8,9,10,11,12,13,14,15,16,17,18,19,20.
The diversification of PCR-based methodologies coupled with a rapid expansion in available genomic data has put further demands on the scale and utility of primer design software. In particular, batch primer design has become an important feature of primer design tools to accommodate screening large numbers of genes and datasets. However, when designing primers for large numbers of sequences, automated sequence retrieval and processing become important; furthermore, a graphical output that maps primer positions onto each sequence is often the best way to rapidly validate primer coverage and distribution. To meet these demands, I describe here a program called PrimerMapper that provides a comprehensive graphical user interface for designing robust primers from any number of inputted sequences, while providing the user with graphical primer maps for each inputted sequence and also a global assembled map of all inputted sequences with designed primers. PrimerMapper also permits the user to visualize each primer sequence map in a browser and allows the user to draw new primers in specific locations directly onto the webpage. Other features of PrimerMapper include primer design tools for SNP genotyping, a remote BLAST facility and also remote sequence retrieval from dbSNP and GenBank.
PrimerMapper will be helpful for researchers working with large datasets where primers must be efficiently designed for many genes or SNPs, as well as for various PCR-based applications including primer walking, nested PCR, sequence-specific probe construction and assembly PCR reactions, where graphical outputs can quickly help users determine primer distribution.
Results and Discussion
Generating graphics that map each designed primer onto the target sequence provides easy and fast validation controls for researchers to examine the distribution and position of each primer. PrimerMapper (http://dohalloran.github.io/PrimerMapper/) can automate sequence retrieval and primer design for any number of sequences while returning maps of primers along each sequence that can be visualized as image files or within a browser (see Fig. 1 for overview).
The user interface (Fig. 2) is divided into three components that can be executed independently. The first section is for general primer design from DNA sequences in fastA format21 that can be retrieved remotely from NCBI’s GenBank22 by clicking the “Get Sequence” button (Fig. 2, circle 1) or loaded locally by clicking the “Load File” button (Fig. 2, circle 2). For remote retrieval, PrimerMapper uses the e-utilities feature from NCBI (Supplementary Table S1). Sample sequence accession numbers are populated in the “Get Sequence” textbox; these accession numbers can be deleted by the user and updated. The primer design criteria are populated in the textboxes within the “PRIMER DESIGN” frame (Fig. 2, circle 3). PrimerMapper will collect the primer design criteria defined by the user such as primer maximum and minimum lengths, GC% and melting temperature (Tm), as well as the five-prime (5′) and three-prime (3′) search windows. The search windows are the areas across which PrimerMapper will scan for appropriate primers that meet the user’s requirements; for example, a five-prime (5′) search area of ‘150’ will result in PrimerMapper searching the first 150 bp of each sequence for an appropriate primer sequence. PrimerMapper will then calculate hairpin and self-complementarity scores for each primer and only return primers whose scores are above a minimum threshold. Thresholds were determined by PCR validation experiments and also in silico testing. By default, repetitive sequences are excluded from primer design; a repetitive sequence is defined as more than 5 mononucleotide repeats or more than 4 dinucleotide repeats. However, if the user wishes to design primers from repetitive sequences, ‘Y’ can be entered in the ‘repetitive sequence’ option box (see Fig. 2). A three-prime (3′) GC clamp can also be specified by the user for each primer (this is not the case for allele-specific primers). PrimerMapper also includes a primer specificity detection feature with mismatch options.
To use this feature, the user can enter “Y” in the “input specificity” textbox and enter the number of allowed mismatches, if any, in the “mis-matches” textbox; this feature will ensure that each primer is specific (apart from permitted mismatches) to the entire input file uploaded by the user. After all the primer design parameters are completed by the user, PrimerMapper is executed by clicking the button within the “RUN” frame (Fig. 2, circle 4). Once all criteria are met, PrimerMapper will print the primer sequence and features to a file and also generate text-based files that mark the position and length of each primer within each sequence. These positional text files are used to generate the graphical maps for each sequence. If the user clicks the “Multiplex PCR dimer scores” button, PrimerMapper also implements a combinations-without-replacement algorithm (n choose k: equation 1) over all primers (both forward and reverse) to calculate cross-complementarity primer-dimer scores. The user must start with “1: Design Primers”, followed by “2: Multiplex PCR dimer scores” or “3: Clean-up”. The “Clean-up” button can be clicked at the end by the user to remove temporary files, used in the generation of graphical outputs, from the current working directory.
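The search-window and filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PrimerMapper's own code (the program itself is distributed as Perl scripts): the function names, the default length and GC% limits, and the reading of the repeat rule as "a run of six or more identical bases, or a two-base unit repeated five or more times" are all hypothetical choices made for the example. Tm, hairpin and self-complementarity checks are left out.

```python
import re

# Assumed reading of the repeat rule in the text: "more than 5" mononucleotide
# repeats = a run of 6+ identical bases; "more than 4" dinucleotide repeats =
# a 2-base unit repeated 5+ times.
MONO_REPEAT = re.compile(r"([ACGT])\1{5,}")
DI_REPEAT = re.compile(r"([ACGT]{2})\1{4,}")


def gc_percent(seq):
    """Return the percentage of G and C bases in seq."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def is_repetitive(seq):
    """True if seq contains a long mono- or dinucleotide repeat."""
    return bool(MONO_REPEAT.search(seq) or DI_REPEAT.search(seq))


def scan_five_prime_window(sequence, window=150, min_len=20, max_len=24,
                           min_gc=40.0, max_gc=60.0, allow_repeats=False):
    """Yield (start, primer) candidates lying wholly inside the first
    `window` bases of `sequence`. All default values are hypothetical."""
    region = sequence.upper()[:window]
    for start in range(len(region)):
        for length in range(min_len, max_len + 1):
            primer = region[start:start + length]
            if len(primer) < length:
                break  # candidate would run past the end of the search window
            if not min_gc <= gc_percent(primer) <= max_gc:
                continue
            if not allow_repeats and is_repetitive(primer):
                continue
            yield start, primer
```

Setting `allow_repeats=True` plays the role of entering ‘Y’ in the ‘repetitive sequence’ option box; a 3′ search window could be handled the same way by slicing the last `window` bases and reverse-complementing them.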
The second section of the interface is the SNP INPUT component. Similar to section 1 above, the data can be uploaded locally in rs_fastA format (Fig. 2, circle 5) or retrieved remotely (Fig. 2, circle 6) from NCBI’s dbSNP23. SNP sequence data is preprocessed by PrimerMapper to collect the SNP position and type from the header field. Next, the primer design criteria are populated within the “PRIMER DESIGN” frame (Fig. 2, circle 7), followed by program execution after clicking buttons 1 to 3 within the “RUN” frame (Fig. 2, circle 8). PrimerMapper’s SNP primer design includes the design of allele-specific primers. Allele-specific PCR is a PCR-based method used to detect known SNPs24,25. In this protocol, the specific primers are designed to permit amplification by DNA polymerase only if the nucleotide at the 3′-end of the primer perfectly complements the base at the wildtype or polymorphic site.
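The allele-specific principle can be illustrated with a minimal Python sketch. It assumes a forward primer read from the same strand as the supplied sequence, a 0-based SNP index and a fixed primer length; the function name and parameters are hypothetical and the sketch does not reproduce PrimerMapper's scoring or filtering.

```python
def allele_specific_primers(sequence, snp_index, alleles, length=20):
    """Return {allele: forward primer} in which the 3'-terminal base of each
    primer is that allele.

    sequence  : template strand containing the SNP site
    snp_index : 0-based index of the SNP within `sequence` (an assumption;
                database headers usually give 1-based positions)
    alleles   : e.g. ("A", "G") for the wildtype and polymorphic bases
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    upstream = sequence[start:snp_index].upper()
    # Each primer shares the upstream bases and differs only at its 3' end.
    return {allele: upstream + allele for allele in alleles}
```

For example, `allele_specific_primers("ACGTACGTACGTACGTACGTRCCCC", 20, ("A", "G"), length=10)` returns two 10-mers that are identical except for the final base, so only the primer whose 3′ end matches the template allele should be extended efficiently.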
The third section is the BLAST26 window to NCBI databases at the bottom of the interface (Fig. 2, circle 9); here the user can input any number of fastA formatted primers and BLAST against a specific database at NCBI (Fig. 2, circle 10). The database is selected by the user and includes the following: nucleotide collection, genomic human, genomic others, EST others, EST human, EST mouse. The results of the BLAST analysis are printed to a text file in the current working directory. The BLAST feature is driven by a separate script called ‘web_blast.pl’ that should be in the same PATH or directory as the driver script mentioned above.
An example of a primer sequence map generated by PrimerMapper for DNA-based sequences in fastA format is shown in Fig. 3. The primers designed for each inputted sequence (in this case only two sequences) are converted to a graphical output in which each primer is represented by a blue glyph arrowed in the direction of synthesis (Fig. 3a,b). Each primer is named by its starting position, and the fastA header from each sequence is at the top of the scaled graphic. The number line represents a scaled version of the input sequence in base pairs. PrimerMapper can also generate a concatenated map that assembles each sequence (adopting the order followed in the inputted file) and its designed primers into a single map so as to quickly and easily validate the distribution of primers across all sequences (Fig. 3c). This approach is also implemented for SNP-based sequences (Fig. 4a–c), where primers spanning the SNP (which is denoted by a green symbol; see Fig. 4a) are mapped onto their derived sequence. A single assembled view of all inputted SNP sequences and their primers can also be generated from SNP data by PrimerMapper (Fig. 4c).
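As a rough text-only stand-in for the graphical maps, the sketch below turns positional records into a scaled character map with arrowed glyphs and a base-pair ruler. It is an illustration only: the record layout `(start, length, strand)`, the function name and the ASCII rendering are assumptions and do not reflect PrimerMapper's positional file format or its image output.

```python
def render_primer_map(header, seq_length, primers, width=60):
    """Return a text map: the header, one scaled line per primer, and a ruler.

    primers is a list of (start, length, strand) tuples with 1-based starts
    and strand "+" (forward) or "-" (reverse). This layout is hypothetical.
    """
    lines = [header]
    for start, length, strand in primers:
        # Integer arithmetic keeps the scaling exact.
        left = (start - 1) * width // seq_length
        span = max(1, length * width // seq_length)
        body = "-" * (span - 1)
        glyph = body + ">" if strand == "+" else "<" + body
        # Label each primer with its starting position, as in the figures.
        lines.append(" " * left + glyph + " " + str(start))
    lines.append("|" + "=" * (width - 2) + "|")
    lines.append("1" + str(seq_length).rjust(width - 1))
    return "\n".join(lines)
```

Calling `print(render_primer_map("seq1", 600, [(1, 60, "+"), (541, 60, "-")]))` draws a forward arrow at the left edge and a reverse arrow near the right edge above a ruler running from 1 to 600.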
PrimerMapper also generates tab-separated value (TSV) files that list the designed primers and their corresponding features from each inputted sequence (Fig. 5a). In the case of SNP data, PrimerMapper preprocesses all input by collecting the SNP location and SNP type from the rs_fastA formatted input. These features are highlighted in Fig. 5b by red rectangles. If the SNP sequence data is not remotely retrieved from dbSNP, the user can locally upload their own data; however, the header for each sequence must be unique and contain the SNP location and sequence length as indicated within the red box i.e. “pos = 501|len1001”. Furthermore, the standard SNP sequence format must be adopted for the SNP type, e.g. “R” or “Y”, inserted within each sequence where standard IUPAC notation27 is applied: R = A/G, Y = C/T, M = A/C, K = G/T, W = A/T, S = C/G, B = C/G/T, D = A/G/T, H = A/C/T, V = A/C/G, N = A/C/G/T. As well as generating a TSV file similar to Fig. 5a, PrimerMapper will also generate a TSV file from SNP data similar to Fig. 5c that lists the allele-specific wildtype and polymorphic primers for each SNP, together with the basic features of each primer, e.g. sequence header, Tm, self-complementarity score (which should be below 10), hairpin score (a ΔG closer to zero is better) and GC%.
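The preprocessing of SNP input can be sketched as follows. The IUPAC table is taken directly from the list above; everything else is an assumption made for illustration, including the function name, the tolerant header patterns (which accept spellings such as “pos=501”, “pos = 501”, “len=1001” and “len1001”), the treatment of the header position as 1-based, and the example header itself.

```python
import re

# IUPAC ambiguity codes as listed in the text.
IUPAC = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Tolerant patterns (an assumption about how headers may be written).
POS_RE = re.compile(r"pos\s*=?\s*(\d+)")
LEN_RE = re.compile(r"len\s*=?\s*(\d+)")


def parse_snp_record(header, sequence):
    """Extract SNP position, declared length, IUPAC code and alleles."""
    pos_match = POS_RE.search(header)
    len_match = LEN_RE.search(header)
    if not pos_match or not len_match:
        raise ValueError("header must contain pos and len fields")
    pos = int(pos_match.group(1))
    # The header position is treated as 1-based here (an assumption).
    code = sequence[pos - 1].upper()
    if code not in IUPAC:
        raise ValueError("no IUPAC ambiguity code at position %d" % pos)
    return {
        "pos": pos,
        "len": int(len_match.group(1)),
        "code": code,
        "alleles": tuple(IUPAC[code]),
    }
```

With the made-up record `parse_snp_record(">rs0000|pos=6|len=11", "ACGTARGTACG")`, the code at position 6 is “R”, so the two alleles are A and G.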
Browser visualization of primer map
Validation tests were performed for PrimerMapper via in silico analysis as well as PCR-based experimentation. PCR-based tests are shown in Fig. 7a and the corresponding primer sequences are listed in Table 1. PCR reactions using primers designed by PrimerMapper that span the C. elegans transcript, ZK5204a, were performed and the resulting products are presented in Fig. 7a. In each case bands of the correct size were obtained from each PCR. Run-time testing was also performed for PrimerMapper (Fig. 7b); files containing various numbers of sequences (2, 10, 20, 50, 100, 150, 200 and 1,000) were provided as input to PrimerMapper and run using default settings to generate primer files and text-based positional files by executing the first step in the “RUN” frame for local fastA formatted DNA sequences (see Fig. 2, yellow circle number 4). The relationship between run-time and sequence number was best fit with a quadratic equation. To compare the primer melting temperatures calculated by PrimerMapper with other algorithms, we generated 100 random primers that varied in size from 18 to 30 bp and plotted their melting temperatures against the melting temperatures obtained using other algorithms (Fig. 7c–f). Firstly, the melting temperature obtained by PrimerMapper was compared with the NEB calculator for NEB Taq DNA polymerase (Fig. 7c; r² = 0.979). The NEB calculator uses the algorithm defined by SantaLucia28 and is salt corrected as described by Owczarzy et al.29 for Taq DNA polymerase buffer. Next, we compared PrimerMapper with the NEB calculator for NEB Phusion® polymerase (Fig. 7d; r² = 0.99); the algorithm for Phusion® polymerase is defined by Breslauer et al.30 and salt corrected to the appropriate Phusion® polymerase buffer conditions as described by Schildkraut31. The melting temperature correlations are also plotted for PrimerMapper versus NEB Q5® Hi-Fi polymerase (Fig. 7e; r² = 0.94), which uses the algorithm by SantaLucia28 and is salt corrected as described by Owczarzy et al.29 for the Q5® buffer system. Finally, the primer melting temperatures for PrimerMapper were compared to those of Primer3 (ref. 16) (Fig. 7f; r² = 0.985), which by default uses the algorithm by Breslauer et al.30. In each case, robust correlations were observed between the melting temperatures calculated by PrimerMapper and each algorithm, with the highest correlation observed between PrimerMapper and the NEB calculator for NEB Phusion® polymerase (Fig. 7d; r² = 0.99).
In order to efficiently design large numbers of primers from different data types, automated sequence retrieval and processing become critical. Of equal importance is an ability to quickly scan and validate the coverage and position of designed primers. Many primer design tools such as Primer3 (ref. 16), BatchPrimer3 (ref. 9), Primer-BLAST (ref. 20), PrimerDesign-M32,33, PrimerView34 and PerlPrimer14 provide effective ways to analyze large datasets. PrimerMapper builds upon these developments to provide a central resource that combines the key features of these tools while also offering a new layer of design and visualization that is not offered by any other primer design tool. To automate the process of bulk primer design, PrimerMapper offers sequence retrieval and processing options from GenBank and dbSNP. In order to quickly validate the density and coverage of designed primers, PrimerMapper returns primer maps for each sequence as an image file and also generates a single concatenated primer map of all sequences with their derived primers; this latter feature is not offered by other software and provides a fast and effective way to examine primer distribution across contigs or linked genes for primer walking or sequencing experiments. Another unique innovation of PrimerMapper is the ability to view these primer maps within a browser, where new primers can quickly and easily be drawn by the user directly onto the webpage. This feature allows the user to design primers that may flank a known SNP or alternatively generate new primers in unique positions or relative to other primers for nested PCR experiments. PrimerMapper’s interface offers numerous other features, including remote BLAST options similar to those of Primer-BLAST (ref. 20), as well as primer-dimer scores for each primer pairing.
Taken together, PrimerMapper attempts to bring together key features from numerous primer design tools into a single program while adding new layers of design that enable primer design en masse from any number of DNA or SNP sequences.
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \qquad (1)$$
where n is the number of primers designed by PrimerMapper and k is the number of primers chosen (i.e. 2) for each k-combination. This implementation slows down substantially as the number of primers designed by PrimerMapper increases (or as the number of sequences increases), because for k = 2 the number of pairs grows as n(n − 1)/2. Features of the basic algorithm for PrimerMapper are shown in detail in Fig. 1, some of which have been described previously17,19,34, including calculations for primer Tm8,19, which were selected based upon validation experiments performed using primers designed using these calculations (Fig. 7c–f).
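The pairwise enumeration behind equation 1 can be sketched with Python's standard library. Only the use of combinations without replacement comes from the text; the scoring function here is a deliberately simple placeholder (the longest stretch of one primer that is perfectly complementary to the other), not PrimerMapper's primer-dimer score, and all names are hypothetical.

```python
from itertools import combinations
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dimer_score(primer_a, primer_b):
    """Placeholder score: length of the longest stretch of primer_a that is
    perfectly complementary (antiparallel) to a stretch of primer_b."""
    a = primer_a.upper()
    target = reverse_complement(primer_b)
    best = 0
    for i in range(len(a)):
        for j in range(len(target)):
            k = 0
            while i + k < len(a) and j + k < len(target) and a[i + k] == target[j + k]:
                k += 1
            best = max(best, k)
    return best


def pair_count(n):
    """Number of unordered primer pairs, i.e. n choose 2."""
    return comb(n, 2)


def all_pair_scores(primers):
    """Score every unordered pair of distinct primers (names -> sequences)."""
    return {
        (a, b): dimer_score(primers[a], primers[b])
        for a, b in combinations(sorted(primers), 2)
    }
```

Because `pair_count(200)` is already 19,900, the quadratic growth in the number of pairs is consistent with the slowdown described above. Note that `combinations` never pairs a primer with itself, so self-complementarity would need a separate check.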
DNA isolation and PCR
Caenorhabditis elegans wildtype (N2) strain was maintained at 20 °C using standard procedures on NGM plates seeded with E. coli strain OP50 (ref. 36). C. elegans DNA was isolated by harvesting a mixed population of animals collected in a 1.5 ml tube. 200 μl of lysis buffer (60 μg/ml proteinase K, 10 mM Tris-Cl, pH 8.3, 50 mM KCl, 2.5 mM MgCl2, 0.45% IGEPAL, 0.45% Tween-20, 0.01% gelatin) was added to the tube, which was then placed in a freezer at −80 °C for 10 min, followed by incubation at 60 °C for 1 h and then 95 °C for 15 min. Tubes were then centrifuged at 13,000 rpm for 1 min in a bench top centrifuge and 50 μl of gDNA supernatant was isolated for PCR. Primer pairs used for C. elegans PCR reactions are displayed in Table 1. PCR reactions were performed with Taq DNA polymerase from NEB (M0273L) using the following cycling conditions: 95 °C for 2 min, followed by 35 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C extension for 1 min. The final PCR products were electrophoresed on 1.5% agarose gels.
How to cite this article: O’Halloran, D. M. PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection. Sci. Rep. 6, 20631; doi: 10.1038/srep20631 (2016).
Saiki, R. K. et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487–491 (1988).
Kamachi, K. et al. Development and evaluation of a loop-mediated isothermal amplification method for rapid diagnosis of Bordetella pertussis infection. J. Clin. Microbiol. 44, 1899–1902 (2006).
Heckman, K. L. & Pease, L. R. Gene splicing and mutagenesis by PCR-driven overlap extension. Nat. Protoc. 2, 924–932 (2007).
Lee, J., Shin, M. K., Ryu, D. K., Kim, S. & Ryu, W. S. Insertion and deletion mutagenesis by overlap extension PCR. Methods Mol. Biol. 634, 137–146 (2010).
Hatano, B. et al. LAMP using a disposable pocket warmer for anthrax detection, a highly mobile and reliable method for anti-bioterrorism. Jpn. J. Infect. Dis. 63, 36–40 (2010).
Notomi, T. et al. Loop-mediated isothermal amplification of DNA. Nucleic Acids Res. 28, E63 (2000).
Vogelstein, B. & Kinzler, K. W. Digital PCR. Proc. Natl. Acad. Sci. USA. 96, 9236–9241 (1999).
Rychlik, W. & Rhoads, R. E. A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Res. 17, 8543–8551 (1989).
You, F. M. et al. BatchPrimer3: a high throughput web application for PCR and sequencing primer design. BMC Bioinformatics 9, 253 (2008).
Marshall, O. Graphical design of primers with PerlPrimer. Methods Mol. Biol. 402, 403–414 (2007).
Torres, C. et al. LAVA: an open-source approach to designing LAMP (loop-mediated isothermal amplification) DNA signatures. BMC Bioinformatics 12, 240 (2011).
Qu, W., Shen, Z., Zhao, D., Yang, Y. & Zhang, C. MFEprimer: multiple factor evaluation of the specificity of PCR primers. Bioinformatics 25, 276–278 (2009).
Qu, W. et al. MFEprimer-2.0: a fast thermodynamics-based program for checking PCR primer specificity. Nucleic Acids Res. 40, W205–8 (2012).
Marshall, O. J. PerlPrimer: cross-platform, graphical primer design for standard, bisulphite and real-time PCR. Bioinformatics 20, 2471–2472 (2004).
Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
Untergasser, A. et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71–4 (2007).
O’Halloran, D. M. STITCHER: a web resource for high-throughput design of primers for overlapping PCR applications. BioTechniques 58, 325 (2015).
Contreras-Moreira, B., Sachman-Ruiz, B., Figueroa-Palacios, I. & Vinuesa, P. primers4clades: a web server that uses phylogenetic trees to design lineage-specific PCR primers for metagenomic and diversity studies. Nucleic Acids Res. 37, W95–W100 (2009).
Li, K. et al. Novel computational methods for increasing PCR primer design effectiveness in directed sequencing. BMC Bioinformatics 9, 191 (2008).
Ye, J. et al. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13, 134 (2012).
Lipman, D. J. & Pearson, W. R. Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985).
Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–42 (2013).
Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).
Newton, C. R. et al. Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Res. 17, 2503–2516 (1989).
Okayama, H., Curiel, D. T., Brantly, M. L., Holmes, M. D. & Crystal, R. G. Rapid, nonradioactive detection of mutations in the human genome by allele-specific amplification. J. Lab. Clin. Med. 114, 105–113 (1989).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Cornish-Bowden, A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13, 3021–3030 (1985).
SantaLucia, J. Jr. A unified view of polymer, dumbbell and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA. 95, 1460–1465 (1998).
Owczarzy, R. et al. Effects of sodium ions on DNA duplex oligomers: improved predictions of melting temperatures. Biochemistry 43, 3537–3554 (2004).
Breslauer, K. J., Frank, R., Blocker, H. & Marky, L. A. Predicting DNA duplex stability from the base sequence. Proc. Natl. Acad. Sci. USA. 83, 3746–3750 (1986).
Schildkraut, C. Dependence of the melting temperature of DNA on salt concentration. Biopolymers 3, 195–208 (1965).
Yoon, H. & Leitner, T. PrimerDesign-M: a multiple-alignment based multiple-primer design tool for walking across variable genomes. Bioinformatics 31, 1472–1474 (2015).
Brodin, J. et al. A multiple-alignment based primer design algorithm for genetically highly variable DNA targets. BMC Bioinformatics 14, 255 (2013).
O’Halloran, D. M. PrimerView: high-throughput primer design and visualization. Source Code Biol. Med. 10, 8 (2015).
Stajich, J. E. et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12, 1611–1618 (2002).
Brenner, S. The genetics of Caenorhabditis elegans. Genetics 77, 71–94 (1974).
I would like to thank members of the O’Halloran lab for proofreading the manuscript and also for detailed discussions. I would like to thank The George Washington University (GWU) Columbian College of Arts and Sciences, GWU Office of the Vice-President for Research and the GWU Department of Biological Sciences for funding.
The author declares no competing financial interests.