PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection

Primer design represents a widely employed gambit in diverse molecular applications including PCR, sequencing, and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR, and nested PCR provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features, and furthermore, a graphic that provides the user with a visual guide to the distribution of designed primers across targets is most helpful in quickly ascertaining primer coverage. To this end, I describe here, PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of inputted sequences while providing the user with both, graphical maps of primer distribution for each inputted sequence, and also a global assembled map of all inputted sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases, and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction.

PCR is a widely employed and indispensable tool for an extensive and ever-growing number of molecular applications 1 . These applications include protocols in diverse fields such as biomedical research, forensic science, as well as phylogenetic analysis [2][3][4][5] . PCR has also been repeatedly tinkered, resulting in creative and versatile modifications. Some of these variants include multiplex PCR, nested PCR, primer walking, DNA cloning, assembly PCR, overlapping PCR, allele-specific PCR, loop-mediated isothermal amplification (LAMP) 6 , and digital PCR 7 . In each case, primer design is the first step in building robust and effective PCR-based experiments. Effective primer design requires numerous calculations, including melting temperature (T m ), GC%, self-complementarity, and hairpin formation, which are each derived from a candidate primer's size and sequence. Manual primer design is laborious and time-consuming and as a result, automated primer design has become a requisite tool in the PCR arsenal [8][9][10][11][12][13][14][15][16][17][18][19][20] .
The diversification of PCR-based methodologies coupled with a rapid expansion in available genomic data has put further demands on the scale and utility of primer design software. In particular, batch primer design has become an important feature of primer design tools to accommodate screening large numbers of genes and datasets. However, when designing primers for large numbers of sequences, automated sequence retrieval and processing becomes important, and furthermore, graphical outputs depicting the map of sequence/primer position is often the best way to rapidly validate primer coverage and distribution. To meets these demands, I describe here a program called PrimerMapper that provides a comprehensive graphical user interface that facilitates the design of robust primers from any number of inputted sequences while providing the user with graphical outputs of primer maps to each inputted sequence, and also a global assembled map of all inputted sequences with designed primers. PrimerMapper also permits the user to visualize each primer sequence map in a browser and allows the user to draw new primers in specific locations directly onto the webpage. Other features of PrimerMapper include pendently. The first section is for general primer design from DNA sequences in fastA format 21 that can be uploaded remotely from NCBI's GenBank 22 by clicking the "Get Sequence" button ( Fig. 2, circle 1) or locally by clicking the "Load File" button ( Fig. 2, circle 2). For remote retrieval, PrimerMapper uses the e-utilities feature from NCBI (Supplementary Table S1). Sample sequence accession numbers are populated in the "Get Sequence" textbox -these accession numbers can be deleted by the user and updated. The Primer design criteria are populated in the textboxes within the "PRIMER DESIGN" frame ( Fig. 2, circle 3). PrimerMapper will collect the primer design criteria defined by the user such as primer maximum and minimum lengths, GC%, and melting temperature (T m ), as well as the five-prime (5′ ) and three-prime (3′ ) search windows. The search windows are the Figure 1. Flowchart of the PrimerMapper process. Data can be uploaded into PrimerMapper from a local file or from NCBI's GenBank 22 or dbSNP 23 databases. SNP sequence data is preprocessed by PrimerMapper prior to execution to collect the SNP position and type from the header field, and also to remove whitespace from the header field so as to preserve the entire sequence name. The user sets the various parameters for primer design for DNA sequence or SNP sequences such as primer maximum and minimum lengths, GC%, melting temperature (T m ), etc. PrimerMapper will then calculate hairpin and self-complementarity scores and only return primers whose scores meet specific thresholds. Repetitive sequence consisting of 5 or more mononucleotide repeats or more than 4 dinucleotide repeats is excluded from primer design by default (this can be changed by the user). Once all criteria are met, PrimerMapper will print the primer sequence and features to a file, and also generate files that mark the position and length of each primer within each sequence. These positional files are used to generate the graphical outputs for each sequence. For multiplex PCR reactions, it is useful to examine potential cross-complementarity between all primer pairs-to achieve this PrimerMapper can also implement a combinations without replacements algorithm (n choose k) for all primers to calculate crosscomplementarity primer-dimer scores. Primer sequence(s) can then be inputted into the BLAST window to generate BLAST reports for each primer at a user-defined NCBI database. areas across which PrimerMapper will scan for appropriate primers that meet the user's requirements; for example, a five-prime (5′ ) search area of '150' , will result in PrimerMapper searching the first 150bp of each sequence for an appropriate primer sequence. PrimerMapper will then calculate hairpin and self-complementarity scores for each primer, and only return primers whose scores are above a minimum threshold. Thresholds were determined by PCR validation experiments and also in silico testing. By default, repetitive sequences are excluded from primer design -this is defined as more than 5 mononucleotide repeats or more than 4 dinucleotide repeats. However, if the user wishes to design primers from repetitive sequences, 'Y' can be entered in the 'repetitive sequence' option box (see Fig. 2). A three-prime (3′ ) GC clamp can also be specified by the user for each primer (this is not the case for allele specific primers). PrimerMapper also includes a primer specificity detection feature with mismatch options. To use this feature the user can enter "Y" in the "input specificity" textbox, and enter the number of allowed mismatches, if any, in the "mis-matches" textbox -this feature will ensure that each primer The PrimerMapper user interface is divided into three sections that can be executed independently. The first section is for general primer design from DNA sequences in fastA format that can be uploaded by clicking the "Get Sequences" button which will collect fastA formatted sequences from NCBI's GenBank (yellow circle, number 1), or locally by clicking the "Load File" button (yellow circle, number 2). The Primer design criteria are populated in the textboxes within the "PRIMER DESIGN" frame (yellow circle, number 3). Primer design then begins by clicking the buttons within the "RUN" frame (yellow circle, number 4). The user must start with "1: Design Primers", followed by "2: Multiplex PCR dimer scores" or "3: Clean-up". The second section of the interface is for "SNP INPUT". Similar to the first section, the data can be uploaded locally (yellow circle, number 5) or remotely from dbSNP by entering appropriate accession number(s) (yellow circle, number 6). Next the primer design criteria are populated within the "PRIMER DESIGN" frame (yellow circle, number 7), followed by program execution by clicking the buttons 1 to 3 in the "RUN" frame (yellow circle, number 8). The third section is the BLAST window at the bottom of the interface (yellow circle, number 9); here the user can input any number of fastA formatted primer sequences and BLAST against a specific database at NCBI (yellow circle, number 10). The results from the BLAST report are printed to a text file in the current working directory.
is specific (apart from permitted mismatches) to the entire input file uploaded by the user. After all the primer design parameters are completed by the user, PrimerMapper is then executed by clicking the button within the "RUN" frame ( Fig. 2, circle 4). Once all criteria are met, PrimerMapper will print the primer sequence and features to a file, and also generate text based files that mark the position and length of each primer within each sequence. These positional text files are used to generate the graphical maps for each sequence. If the user clicks the "Multiplex PCR dimer scores" button, PrimerMapper also implements a combinations without replacements algorithm (n choose k: equation 1) for all primers (both forward and reverse) to calculate cross-complementarity primer-dimer scores. The user must start with "1: Design Primers", followed by "2: Multiplex PCR dimer scores" or "3: Clean-up". The "Clean-up" button can be clicked at the end by the user to remove temporary files from the current working directory used in the generation of graphical outputs.
The second section of the interface is the SNP INPUT component. Similar to section 1 above, the data can be uploaded locally in rs_fastA format (Fig. 2, circle 5) or retrieved remotely (Fig. 2, circle 6) from NCBI's dbSNP 23 . SNP sequence data is preprocessed by PrimerMapper to collect the SNP position and type from the header field. Next, the primer design criteria are populated within the "PRIMER DESIGN" frame ( Fig. 2, circle 7), followed by program execution after clicking buttons 1 to 3 within the "RUN" frame ( Fig. 2, circle 8). PrimerMapper's SNP primer design includes the design of allele-specific primers. Allele-specific PCR is a PCR-based method used to detect known SNPs 24,25 . In this protocol, the specific primers are designed to permit amplification by DNA polymerase only if the nucleotide at the 3′ -end of the primer perfectly complements the base at the wildtype or polymorphic site.
The third section is the BLAST 26 window to NCBI databases at the bottom of the interface (Fig. 2, circle 9); here the user can input any number of fastA formatted primers and BLAST against a specific database at NCBI (Fig. 2, circle 10). The database is selected by the user and includes the following: nucleotide collection, genomic human, genomic others, EST others, EST human, EST mouse. The results of the BLAST analysis are printed to a text file in the current working directory. The BLAST feature is driven by a separate script called 'web_blast.pl' that should be in the same PATH or directory as the driver script mentioned above. Output files. An example of a primer sequence map generated by PrimerMapper for DNA based sequences in fastA format is shown in Fig. 3. The primers designed to each inputted sequence (in this case only two sequences), is converted to a graphical output with each primer represented by a blue glyph arrowed in the direction of synthesis (Fig. 3a,b). Each primer is named by its starting position and the fastA header from each sequence is at the top of the scaled graphic. The number line represents a scaled version of the input sequence in base pairs. PrimerMapper can also generate a concatenated map that assembles each sequence (adopting the order followed in the inputted file) and its designed primers into a single map so as to quickly and easily validate the distribution of primers across all sequences (Fig. 3c). This approach is also implemented for SNP based sequences (Fig. 4a-c) where primers spanning the SNP (which is denoted by a green symbol -see Fig. 4a) are mapped onto their (c) PrimerMapper also generates a concatenated graphic that assembles each sequence (adopting the order followed in the inputted file) and its designed primers into a single view so as to quickly and easily validate the distribution of primers across all sequences.
Scientific RepoRts | 6:20631 | DOI: 10.1038/srep20631 derived sequence. A single assembled view of all inputted SNP sequences and their primers can also be generated from SNP data by PrimerMapper (Fig. 4c).
PrimerMapper also generates tab separated value (TSV) based files that list the designed primers and their corresponding features from each inputted sequence (Fig. 5a). In the case of SNP data, PrimerMapper preprocesses all input by collecting the SNP location and SNP type from the rs_fastA formatted input. These features are highlighted in Fig. 5b by red rectangles. If the SNP sequence data is not remotely retrieved from dbSNP, the user can locally upload their own data; however, the header for each sequence must be unique and contain the SNP location and sequence length as indicated within the red box i.e. "pos = 501|len1001". Furthermore, the standard SNP sequence format must be adopted for the SNP type, e.g. "R" or "Y", inserted within each sequence where standard IUPAC notation 27 Fig. 5a, PrimerMapper will also generate a TSV file from SNP data similar to Fig. 5c depicting the allele specific primers for each SNP and the appropriate wildtype and polymorphic primer specific to each SNP, as well as the basic features for each primer e.g. sequence header, T m , Self-complementarity score (score should be below 10), hairpin score (Δ G score closer to zero is better), and GC%.
Browser visualization of primer map. PrimerMapper will generate positional text files that are configured for browser visualization using the script 'cols_to_rows.pl' . This script must be placed in the current working directory or specified PATH. A folder called "Browser_PrimerMapper" will be generated in the current working directory by PrimerMapper and a file for each primer/sequence map is placed inside this folder. These files can be opened from within "primermapper.html" under 'choose file' (Fig. 6, circle 1). Primers can be visualized as arrows or lines, and also a "magnifying glass" checkbox can be ticked so as to magnify the primer names by moving the mouse over the primers within the primer-map canvas (Fig. 6, circle 2). The user can then click the "submit" button and the primer/sequence map is generated, which allows the user to get the base pair position along the sequence by moving the mouse under the sequence map (Fig. 6, circle 3). The user can also draw new primers directly onto the canvas (Fig. 6, circle 4) which will return a JavaScript popup box with the DNA sequence, GC%, and primer T m values of the sequence that corresponds to the line drawn by the user (Fig. 6, circle 5). If the line is drawn from left to right, a forward primer is returned, and if the line is drawn from right to left, a reverse complemented primer is returned in the popup box. The primers are graphically represented as arrows in Fig. 6 (circle 6). This feature not only provides a graphical output to visualize the primer/sequence map but also allows the user to draw new primers in specific positions, for example, perhaps flanking a SNP, or perhaps relative to others primers within the primer map for nested PCR experiments.
Testing. Validation tests were performed for PrimerMapper via in silico analysis as well as PCR based experimentation. PCR based tests are shown in Fig. 7a, and the corresponding primer sequences are listed in Table 1. PCR reactions using primers designed by PrimerMapper that span the C. elegans transcript, ZK5204a, were performed, and the resulting products are presented in Fig. 7a. In each case bands of the correct size were obtained from each PCR. Run-time testing was also performed for PrimerMapper (Fig. 7b); run-time tests using files containing various numbers of sequences (2,10,20,50,100,150,200, and 1,000) were provided as input to PrimerMapper, and ran using default settings to generate primer files and text based positional files by executing the first step in the "RUN" frame of DNA based local fastA formatted sequences (see Fig. 2, yellow circle number 4). The relationship between run-time and sequence number was best fit with a quadratic equation. To compare the primer melting temperatures calculated by PrimerMapper with other algorithms, we generated 100 random primers that varied in size from 18-30bps and plotted their melting temperature against the melting temperatures obtained using other algorithms (Fig. 7c-f). Firstly, the melting temperature obtained by PrimerMapper was compared with the NEB calculator for NEB Taq DNA polymerase ( Fig. 7c; r 2 = 0.979). The NEB calculator uses the algorithm defined by SantaLucia 28 and is salt corrected as described by Owczarzy et al. 29 for Taq DNA polymerase buffer. Next, we compared PrimerMapper with the NEB calculator for NEB Phusion ® polymerase ( Fig. 7d; r 2 = 0.99); the algorithm for Phusion ® polymerase is defined by Breslauer et al. 30 and salt corrected to In the case of SNP data, PrimerMapper pre-processes all input by collecting the SNP location and SNP type from the rs_fastA formatted input. These features are highlighted by red rectangles. If the SNP sequence data is not remotely retrieved from dbSNP 23 , the user can locally upload data, however, the header for each sequence must be unique and contain the SNP location and sequence length as formatted within the red rectangle i.e. "pos = 501|len1001". Furthermore, the standard SNP sequence format must be adopted with the SNP type e.g. "R" or "Y" etc. inserted within each sequence. (c) PrimerMapper returns a TSV file from SNP data depicting the allele-specific primers for each SNP and the appropriate wildtype and polymorphic primers specific to each SNP, as well as their basic features i.e. sequence header, T m , Selfcomplementarity score, Δ G hairpin score, and GC%. the appropriate Phusion ® polymerase buffer conditions as described by Schildkraut 31 . The melting temperature correlations are also plotted for PrimerMapper versus NEB Q5 ® Hi-Fi polymerase ( Fig. 7e; r 2 = 0.94), which uses the algorithm by SantaLucia 28 and is salt corrected as described by Owczarzy et al. 29 for the Q5 ® buffer system. Finally, the primer melting temperatures for PrimerMapper were compared to that of Primer3 16 ( Fig. 7f; r 2 = 0.985) which by default uses the algorithm by Breslauer et al. 30 . In each case, there were robust correlations observed between the melting temperatures calculated by PrimerMapper and each algorithm, with the highest correlation observed for PrimerMapper to that of the NEB calculator for NEB Phusion ® polymerase ( Fig. 7d;

Conclusions
In order to efficiently design large numbers of primers from different data types, automated sequence retrieval and processing becomes critical. Of equal importance is an ability to quickly scan and validate the coverage and position of designed primers. Many primer design tools such as Primer3 16 , BatchPrimer3 9 , Primer-Blast 20 , PrimerDesign-M 32,33 , PrimerView 34 and PerlPrimer 14 provide effective ways to analyze large datasets. PrimerMapper builds upon these developments to provide a central resource that combines the key features of these tools while also offering a new layer of design and visualization that are not offered by any other primer design tools. To automate the process of bulk primer design, PrimerMapper offers sequence retrieval and processing options from GenBank and dbSNP. In order to quickly validate the density and coverage of designed primers, PrimerMapper returns primer maps for each sequence as an image file, and also generates a single concatenated primer map of all sequences with their derived primers; this latter feature is not offered by other software and provides a fast and effective way to examine primer distribution across contigs or linked genes for primer walking or sequencing experiments. Another unique innovation of PrimerMapper is the ability to view these primer maps within a browser where new primers can quickly and easily be drawn by the user directly onto the webpage. This feature allows the user to design primers that may flank a known SNP or alternatively generate new primers in unique positions or relative to other primers for nested PCR experiments. PrimerMapper's interface offers numerous other features including remote BLAST options similar to the software, Primer-Blast 20 , as Figure 6. Visualization of primer map in a browser. A sample graphical output generated by PrimerMapper for browser visualization is shown. A canvas text file for each sequence is generated by PrimerMapper and these files can then be opened directly in a browser by clicking the "primermapper.HTML" file and loading the data. A sample configured text file is included in the download called "PrimerMapper_Sample.txt" which can be directly loaded into "primermapper.HTML" (circle 1). Primers can be visualized as arrows or lines and also a 'magnifying glass' can be checked so as to magnify the primer names by moving the mouse over the primers within the primermap canvas (circle 2). The user can also get the base pair position along the sequence (circle 3) by moving the mouse under the sequence map. The user can also draw new primers on the canvas (circle 4) which will return a JavaScript popup box with the DNA sequence, GC%, and primer T m values of the sequence corresponding to the line drawn by the user (circles 4 and 5). If the line is drawn from left to right, a forward primer is returned, and if the line is drawn from right to left, a reverse complemented primer is returned in the popup box. The user can also click the 'User Guide' button to get an overview of each feature.  Supplementary Table S1. PrimerMapper is a package with a constructor subroutine called "new" that enables the user to execute the module by instantiating a PrimerMapper object. Subroutines from PrimerMapper are exported from the package into a driver script called 'PrimerMapper_driver.pl' . Browser visualization of the primer map has been validated using Google Chrome and requires the script called 'cols_to_rows.pl' to be in the same PATH or directory as the driver script mentioned above (PrimerMapper_driver.pl). To examine potential cross-complementarity between all primer pairs, PrimerMapper implements a combinations without replacements algorithm (n choose k) for all primers (both