Assessing Sus scrofa diversity among continental United States, and Pacific islands populations using molecular markers from a gene banks collection


Human migration and trade facilitated domesticated livestock movement, gene flow, and the development of the diverse populations upon which agriculture is based. In addition, varying ecological conditions across the USA have led to a diverse set of livestock populations to utilize. Quantification of the genetic diversity of these populations is incomplete. This paper quantifies genetic diversity captured by the National Animal Germplasm Program and explores genetic structure and differences among 19 pig populations (feral populations from Pacific islands, continental US breeds, and Chinese breeds) using 70,231 SNP from 500 animal samples. Among continental US breeds FIS was consistently low, suggesting genetic variability is sufficiently available for breeders to use. Principal component analysis revealed a unique population structure with clear distinctions among Duroc, Yorkshire, Hampshire, breeds of Chinese origin, and feral Pacific Island populations. Five Y chromosome haplotypes were evaluated and demonstrated migration patterns from European, central Asian, and potentially Polynesian waves of gene flow. Quantifying diversity and potential origin of Pacific populations provides insight for future uses and the need for preservation. Viewing gene bank holdings in the context of diversity measures, we found a lack of inbreeding within breeds, suggesting the collection represents a wide sampling of individual breeds.


The objective of this study was to quantify the genetic diversity of a substantial number of pig breeds found in the United States (US), as previous assessments tend to be focused on Europe and Asia1,2,3,4. Specifically, the assessment provides insights into the genetic variability captured in the USDA’s swine germplasm collection5,6. The study evaluated major breeds (often termed international commercial breeds), rare breeds, Chinese breeds imported into the U.S. during the late 1980s, and feral populations from the Pacific islands of Hawaii, Kauai, and Guam. It has been hypothesized that such feral populations may harbor genetic diversity not found in other US populations, given the potential for either Asian or Polynesian ancestry7.

Current feral pig populations on Hawaii and Kauai are the product of a migration wave associated with Polynesian settlers, followed by European explorers during the 1700s7,8 and additional domestic pigs that went feral across the chain of Hawaiian Islands. McCann et al.8 suggested animals sampled in the Hawaiian Islands originated from Europe, China, Papua-New Guinea, and Vanuatu, and shared haplotypes with other domestic-feral pigs and S. verrucosus (from Java). Larson et al.9 placed Hawaiian Islands pigs in the Pacific clade. No analysis has been performed on Guam’s feral population, but as Guam is part of the Mariana island chain, its feral pigs might belong to an East Asia clade9.

Breed development occurred in three geographic areas10: England, China, and the U.S. U.S. breeds were imported or derived from other parts of the world at varying points in time; the imports have been used either to maintain the original breed or to construct new breeds like Duroc, Hereford, Ossabaw Island, Spots/Poland China, and Chester White. The Duroc in particular is an interesting example, as its foundation breeds have not been shown to have strong genetic linkages with other European breeds1,3. These authors have suggested that the breed may have been derived, in part, from Guinea Hog from Africa (among other breeds) due to its red color.

A better understanding of the genetic diversity among these U.S. populations has a number of uses, including: general management of genetic diversity by public and private sectors, targeting future germplasm collections, and providing input into policies concerning feral pig populations on Pacific islands.


Genetic Diversity

Basic marker statistics of the GGP Porcine HD v1 (57,668 SNPs) showed that more than half of the markers were polymorphic for feral and rare breeds/genetic groups (Table S1), suggesting the GGP Porcine HD v1 has utility in evaluating pig populations not considered in its development. The Mangalitsa (MA) and Chinese populations had the lowest number of polymorphic SNPs, but these populations had small sample sizes and/or a genetic bottleneck due to limited importation. Despite the small sample size of the other minor breeds, their average heterozygosity was similar to commercial breed estimates.

The genetic diversity parameters (Table 1) were calculated using a smaller SNP panel (8,764) and were uniform among the commercial breeds. Observed heterozygosity for Yorkshire (YK), Landrace (LA), and Duroc (DU) was 0.375, 0.397 and 0.372, respectively. The minor breeds Hereford and Tamworth had levels of heterozygosity similar to those of the commercial breeds. While other minor breeds may have had higher levels of heterozygosity, their limited sample size makes any conclusion problematic. The Pacific islands populations and Chinese breeds had observed and expected heterozygosity similar to the commercial US breeds.

Table 1 Sampling information and genetic diversity parameters of 19 pig populations.

Overall, the levels of inbreeding in most of the populations evaluated were low, indicating high genetic diversity among the in-situ populations captured in the repository by the germplasm conservation program. The feral population on HI had the highest inbreeding level (0.081), exceeding the levels for KI (−0.017) and GI (0.011), which are geographically smaller islands.

Population differentiation, as measured by pairwise FST, was non-significant for only three comparisons: MA vs LB, MA vs OI, and MA vs GH (Table 2). The highest FST levels were associated with rare breeds, Pacific island, and Chinese populations, and the lowest FST values were found among CW, YK, and LA, along with the HF and DU comparisons (Table 2).

Table 2 Pairwise FST values between 19 pig populationsa.

Genetic Structure

The first three principal components (PC) accounted for 39.6% of the variation (Fig. 1). The first PC explained 20% of total variance and provided a clear separation of the Chinese, Pacific island, and continental breeds. The second and third PCs partitioned the continental breeds, seemingly based upon skin pigmentation (black, white and red) and function (maternal vs sire breeds). Specifically, the DU, YK, and HS breeds were placed at extreme positions of the second and third PC axes, with a large grouping of breeds in between. Interestingly, the Pacific island populations (HI, KI and GI) were positioned closer to Minzhu (MI) and the other Chinese breeds (ME and FE).

Figure 1

Population structure of all US, Pacific Islands, and China populations analyzed, as revealed by principal component analysis. The graph represents a 3D scatterplot of the first three PCs, explaining 20.34, 10.76, and 8.47% of the genetic variance among the populations, respectively. BE – Berkshire, CW – Chester White, DU – Duroc, FE – Fengjing, GI – Guam Island, GH – Guinea Hog, HS – Hampshire, HI – Hawaii island, KI – Kauai island, HF – Hereford, LA – Landrace, LB – Large Black, MA – Mangalitsa, ME – Meishan, MI – Minzhu, OI – Ossabaw Island, SP – Spotted, TA – Tamworth, YK – Yorkshire.

ADMIXTURE was run with K = 2 to 21 and the results were in agreement with the PCA. Cross-validation error was used to assess which K provided insights into population structure. From K = 7 to K = 21 the cross-validation curve was asymptotic, with an approximate 1.5% change per K. Therefore we selected K of 2, 3, 7, 8, and 10 for evaluation. At K = 2 the breeds were partitioned between Meishan (ME), originating in China, and those from the continental US. Admixed were the populations from GI, HI, KI, MI, and FE (Fig. 2). By K = 3 DU and YK emerged as distinct populations along with ME. For K = 7, 8 and 10, structure among the major breeds, plus HF, KI, HI, ME and FE, was evident. At K = 17, ME–U was distinct from the ME–IM and ME–MARC populations. As larger K values were evaluated, additional breeds were identified as their own cluster and several breeds were partitioned into subpopulations, e.g., YK and ME. The separation of the three ME subpopulations was a function of genetic drift11. However, the YK subpopulations are likely due to animals imported during the 1980s which were widely used within that breed.

Figure 2

Population Structure of all US, Pacific Islands and China populations revealed by Admixture assignment proportions, where K is the number of assumed ancestral clusters that ranged from 2 to 21. BE – Berkshire, CW – Chester White, DU – Duroc, FE – Fengjing, GI – Guam Island, GH – Guinea Hog, HS – Hampshire, HI – Hawaii island, KI – Kauai island, HF – Hereford, LA – Landrace, LB – Large Black, MA – Mangalitsa, ME – Meishan, MI – Minzhu, OI – Ossabaw Island, SP – Spotted, TA – Tamworth, YK – Yorkshire.

To refine the admixture assessment of the smaller populations, an ADMIXTURE analysis was performed with nine breeds (Fig. 3). K = 4 (CV = 0.621) was identified as the most probable number of clusters in this subset of populations. The populations of HI, KI, and MI, plus the grouping of SP, OI, GH, LB, and MA, comprised the four clusters. Sample size for the rare breeds was small and may explain the lack of differentiation. However, minor breeds that tend not to be under continuous and breed-wide selection can exhibit substantial amounts of admixture, as the GI population does (Fig. 3), similar to the findings with goats12,13.

Figure 3

Population Structure of minor, feral, Pacific Islands and China populations revealed by Admixture assignment proportions, where K is the number of assumed ancestral clusters that ranged from 2 to 9. GI – Guam island, GH – Guinea Hog, HI – Hawaii island, KI – Kauai island, LB – Large Black, MA – Mangalitsa, MI – Minzhu, OI – Ossabaw island, and SP – Spotted.

Y chromosome

Five different haplotypes were found among the animals sampled (Table 3 and Fig. 4). Haplotype H3 was present in 278 samples (57%). Except for LB and MA, this haplotype was observed in all breeds that came to the US from Europe. Furthermore, H3 was the only haplotype observed in the DU, YK, TA, HS, and GH breeds. The haplotypes H1 and H2 were present in pigs from HI and KI, China, and the US mainland. The H4 haplotype was exclusive to animals from China and the Pacific islands. Pigs from Guam and Kauai islands were the only populations exhibiting the H5 haplotype. Median-joining network analysis (Fig. 4B) showed that haplotypes H4 and H5 (China and Pacific islands) are closer to each other than to the remaining three haplotypes.

Table 3 Absolute haplotype frequency by population for the five haplotypes estimated (H1-H5) using 6 SNPs on the Y chromosome.

Figure 4

(A) Frequency distribution of five Y chromosome haplotypes from sampled pig populations by geographic origin. (B) Median-joining network of Y chromosome haplotypes in all US, Pacific Islands and China populations.

Effective Population Size

As most pig breeds were developed after 1800, we report Ne for the tested groups to generation 60 (Fig. 5). The most recent generation had an Ne ranging from 51 to 112 for the tested populations. The Ne trend was linear across generations and the r² decay rates were similar to other reports on swine14. All breeds were pooled to compute Ne on a species level, where Ne was estimated to be 236.

Figure 5

Ancestral and recent effective population size of swine populations placed in 10 groups for 60 previous generations. BE – Berkshire, CW – Chester White, DU – Duroc, Chinese (ME – Meishan, MI – Minzhu, FE – Fengjing), Pacific islands (GI – Guam Island, HI – Hawaii island, KI – Kauai island), Minor breeds (GH – Guinea Hog, LB – Large Black, MA – Mangalitsa, OI – Ossabaw Island, SP – Spotted, TA – Tamworth), HF – Hereford, HS – Hampshire, LA – Landrace, YK – Yorkshire.


The SNP chip used in this study was sufficient to explore genetic diversity and structure among rare and specialized pig breeds originating from Asia and Europe, even with the expected ascertainment bias. Pigs have higher linkage disequilibrium than, for example, ruminant species, and this was reflected in the final data set of 8,764 markers that remained after filtering. By using the GGP Porcine HD v1 chip we had access to SNP located on the Y chromosome, which enabled a deeper analysis of genetic variability.

Based upon the population parameters evaluated, US breeds, as represented by germplasm samples in the repository, are genetically robust with regard to heterozygosity levels. Genetic diversity of repository samples was similar to European, Asian and previous US values15,16,17,18,19. Inbreeding (FIS) within breed/population was low, suggesting boars sampled for the germplasm collection were not closely related and a broad range of genetic variability has been captured in the gene bank. Furthermore, FIS values were lower than the pedigree-based inbreeding levels previously computed20 and Canadian values15.

Among populations, pairwise FST suggested the breeds evaluated are distinct, similar to a previous report21 using microsatellite markers (FST = 0.10 between DU and YK). As reported in the literature15, the estimates of FST within European and within Asian origin populations are smaller than those between Asian and European breeds, and are in agreement with the ADMIXTURE analysis. Pairwise FST values between feral populations of the Pacific and breeds originating in China were similar and confirm the PC analysis.

The number of polymorphic SNP in the larger panel (Table S1), and within the 8,764 subset, is lower for the Pacific island populations and minor breeds (approximately one-third the number of SNP in commercial breeds). This may be due to ascertainment bias, i.e., the core of the BeadChip was developed based on four commercial breeds (Duroc, Landrace, Large White, and Piétrain) and a wild boar population22. On the other hand, the low number of markers should not be an issue since these markers did not suggest the presence of bottlenecks (data not shown); on the contrary, they highlight the high genetic diversity of these populations (high Ho and low FIS).

The PCA provides a unique view of the relationship among the 19 populations evaluated, in terms of the placement of US mainland breeds (and selection efforts which have differentiated these populations), Pacific islands populations, and breeds from China (Fig. 1). It suggests feral populations in the Pacific are a confluence of east and west origin and the result of various waves of human migration7,23. As is typical of PCA, a relatively small proportion of genetic variance was accounted for; however, previous reports for pigs display similar proportions18,20,24. Our results underscore the divergence of Duroc, which usually appears as a main and separate cluster from other breeds18,20,25,26.

Combining PCA and FST analysis suggests DU, HS, and YK are genetically distinct from one another. While this might be expected when comparing YK to the two sire breeds, we anticipated more similarity between DU and HS. As to the next tier of commercial breeds (Berkshire, Landrace and Chester White), pairwise FST values were similar to the DU – HS value. The three maternal white breeds had significant but smaller pairwise FST values, and the PC analysis indicated that YK was distinct from LA and CW (both of which were shown to be closely associated; Fig. 1). ADMIXTURE results support the PC analysis: at K = 8, CW and LA were members of the same cluster, and by K = 10 they were placed in different clusters, while YK had its own cluster by K = 3.

The expanded genetic structure analysis performed with ADMIXTURE corroborates the common pattern that separates populations of European and Asian origin1, while Pacific island populations and MI were substantially admixed. When K = 3 the DU, YK, and ME had separate clusters and the remaining European breeds were admixed. Interestingly, at K = 3 DU shows little to no remnant of the Chinese origin breeds, underscoring the breed’s unique genetic composition. Through subsequent clusters in the ADMIXTURE run DU showed little to no admixture, from K = 3 to K = 19, while the YK showed higher levels of admixture, from K = 7 to 21, which may be a function of newly imported Yorkshires that were under different selection strategies or separated by genetic drift. By K = 10 KI was segregated into its own population but HI and GI remained admixed; that said, substantial portions of the HI and GI populations were assigned to the KI cluster. Interestingly, the MI, OI, and GH all had proportional assignments to the KI cluster, underscoring the east-west confluence. At K = 10 we noted that MI, TA, SP, and GH were highly admixed, suggesting they are a reservoir of genetic diversity.

Y-chromosome results provide insights into the migration of these swine populations. Interestingly, GI’s population had haplotypes from China (H4) and of unknown origin (H5). We hypothesize that H5 may be part of the Pacific clade as reported earlier9 and observed in other livestock23. If this assumption is correct, it suggests the Guam feral population may be the result of two different migrations, from East Asia and the South Pacific. The identification of the H4 and H5 haplotypes on KI suggests further eastward migration of swine populations with Polynesians and East Asians into the central Pacific. The presence of the H4 haplotype on KI would run counter to an earlier report9. The remaining haplotypes on KI (H2) and HI (H1 and H2) were present among continental US populations, suggesting admixture with later importation of European-derived pigs. The PCA and ADMIXTURE results appear to support the haplotype analysis among the Pacific island populations. In Fig. 1 the Pacific Island populations were intermediate to the Chinese and continental US populations. In the ADMIXTURE analysis the island populations have high proportions of admixture relative to other highly defined clusters.

Haplotypes H1 and H2 are common in the continental US and the Hawaiian Islands. H2 was the most cosmopolitan of the haplotypes, appearing in China, the Hawaiian Islands and the US. Additionally, its internal position in the network analysis (Fig. 4) suggests that H2 might be an ancient haplotype, even though it is more prevalent in US populations. Its presence in minor and rare breeds could support the view of H2 being an older haplotype. If this is correct, its presence on the Hawaiian Islands might be the result of earlier exploration by Europeans in the 18th century. However, because of its presence in the Chinese MI breed, H2 could have originated during the domestication of pigs in China and been imported into Europe during the 18th century.

Haplotype H3 was observed in 11 of the 19 populations, but it was only found in the continental US. It was also the only haplotype among the prominent breeds DU, HS and YK, and the minor breeds TA, HF and GH. The absence of H3 from the Chinese and Pacific populations suggests a European origin, possibly with wild boars found west of the Caucasus Mountains as an original source27.

A major aspect of this study was to provide a first assessment of the within and among breed genetic diversity captured in the national gene bank collection. Effective population sizes for commercial breeds were in agreement with work14 assessing Canadian DU, YK, and LA using LD. This study’s results are also in agreement with the Ne computed with pedigree information19 for DU, YK, LA, HS, and BE. In all instances Ne was >50 exceeding the minimum Ne suggested by FAO28. By pooling study populations we noted substantial variation, as suggested by an Ne = 236 which resides in the gene bank and can be used to advance breeding goals, similar to that reported with sheep29.

For breeds with more than 20 animals genotyped the results tended to be homogeneous, in that levels of heterozygosity and inbreeding were similar and at levels that suggest sufficient genetic variability exists among the populations captured in the repository28. Results for breeds with fewer than 15 animals are viewed as preliminary. That said, genetic parameters for SP, HF and TA were similar to those of breeds with larger sample sizes. Up to K = 10 in the ADMIXTURE analysis the SP and TA were admixed; at K = 10 the HF were placed in a unique cluster, but TA was not. Those early admixture patterns for low-medium K values reflect clusters of breeds that share recent admixture events, have higher genetic diversity, and/or were neither undergoing selection nor had experienced a recent bottleneck13,15.

The samples from Guam, Kauai, Hawaii, and US/Chinese ME provide interesting models for evaluating small isolated populations. The ME included randomly bred lines, while the island populations are all feral but confined to the island sampled. Genetic parameters (Table 1) for GI and KI suggest the populations have levels of heterozygosity similar to the numerically larger commercial breeds in the continental US and greater than the ME samples. As the number of ME imported was small, our results suggest larger numbers of East Asian and Polynesian pigs were part of the migration to those islands. Another possible interpretation is that the admixture of the different importations has broadened the genetic base of the feral populations. Island size also supports our admixture conjecture, as Hawaii is approximately 7 times larger than Kauai and 19 times larger than Guam. With Kauai and Guam having potentially multiple waves of immigration plus a potential sex ratio close to 1:1, inbreeding accumulation would be dampened and a relatively high effective population size maintained, as demonstrated by the computed Ne of 77. As a result, such populations may serve several important functions in the future. For example, they could serve as sentinel populations for monitoring genotype-by-environment interactions in relation to a changing climate, or as a resource to study a wide variety of diseases with what may be naive populations. The feral populations may also serve non-biological functions in relation to cultural practices of indigenous human populations found in the Pacific.


Large and substantial genetic differences in pig populations dispersed across the continental US, Pacific Islands and Chinese breeds were found. Results suggest that feral populations from Pacific islands have a combination of European, East Asian and potentially South East Asian origin. Identifying the diverse origins of feral pigs on the Pacific islands contributes to the policy dialog about the control/eradication and/or conservation of these populations. While there may be environmental issues, the populations in question are unique and may have cultural significance for persons of Polynesian ancestry. Exploring feral genomes further may reveal other genetic attributes of interest for the research community and industry.

This study’s evaluation of commercial breeds acquired for gene banking purposes is encouraging. Samples from commercial breeds suggest a range of genetic variability has been captured by the gene bank. As an indicator of genetic variability among in-situ populations, our results suggest no lack of genetic variability. Some rare breeds showed a lack of genetic distinction in PC analysis and were admixed, which was likely due to limited sample size. However, HF and TA were distinct (particularly in PC analysis) and were shown to have measures of genetic variation in the same ranges as the commercial breeds. As the collection of swine genetic resources in the gene bank continues to grow it will be a resource to further explore various aspects of genetic diversity or gene function.

Material and Methods

IACUC Review

No samples were collected for this study; rather they were collected as part of other studies or program activities not associated with this study. Semen samples were acquired from swine artificial insemination centers by the National Animal Germplasm Program as part of efforts to conserve genetic resources. Blood samples from Pacific island populations were collected by USDA Wildlife Services in their routine efforts to manage feral populations and then provided to the National Animal Germplasm Program.

Breeds and Populations

This study focused on assessing populations found on the continental US and its Pacific islands. We were particularly interested in evaluating the origin of feral Pacific populations and their relationship to rare and commercial breeds. Samples and data originated from 19 pig breeds or genetic groups (N = 500; Table 1): Berkshire, Duroc, Hampshire, Landrace, Yorkshire, Chester White, and Spotted represented commercial breeds; Fengjing, Meishan, and Minzhu represented Chinese breeds originally imported to the US in the late 1980s; Guinea Hog, Hereford, Large Black, Mangalitsa, Ossabaw Island, and Tamworth were among the minor and rare breeds; and feral hogs came from the Pacific islands of Guam, Kauai, and Hawaii. All breeds, with the exception of Mangalitsa, Meishan, Fengjing, and Minzhu, have been established in the US for one century or longer. Total genomic DNA was extracted from sperm, blood or FTA cards (Guam, Kauai and Hawaii). The protocol varied according to the origin of the biological material and was adjusted by Neogen Genomics. All samples and genotypes are deposited with the USDA ARS National Animal Germplasm Program.

The Meishan was composed of three subpopulations11 denoted by Meishan – IM (samples from the original Meishan importation), Meishan – U (randomly mated control herd maintained at Iowa State University) and Meishan – MARC (randomly mated control herd maintained by USDA-ARS).

Substantial distances exist between the geographic areas sampled. From the continental U.S. to the Hawaiian Islands is approximately 3,900 km and from the Hawaiian Islands to Guam is approximately 6,000 km. The distance between the Hawaiian Islands of Hawaii and Kauai is approximately 500 km. Island size varies greatly: Guam, 544 km²; Kauai, 1,440 km²; and Hawaii, 10,432 km². Guam’s samples were from the same area of that island.

Genotyping and quality control

Samples were genotyped by a commercial vendor (Neogen Genomics, Lincoln, NE), using GGP Porcine HD v1 (70,231 SNP). Final Reports generated with Genome Studio Software (Illumina, San Diego, CA, USA) were imported into SNP & Variation Suite v8.6 (Golden Helix, Inc., Bozeman, MT). Because feral and rare breeds were included, we first evaluated how much of the BeadChip’s polymorphism content was transferable to these breeds. SNP with call rate <0.95 or located on sex chromosomes were pruned from the initial dataset, leaving 57,668 SNP. Basic marker statistics such as number of polymorphic SNP, average call rate and average number of alleles were calculated for each breed (Table S1).

The next step was to remove markers not mapped in the Sscrofa10.2 assembly (2,674 SNP). The 54,994 remaining SNP were then filtered on two genetic parameters, removing SNP with Hardy-Weinberg Equilibrium (HWE) p < 0.0001 or Linkage Disequilibrium (LD) r² > 0.3. The final data set had 8,764 SNP, which were used in all autosomal analyses.
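The marker counts in this pipeline can be checked directly (57,668 − 2,674 = 54,994), and the per-SNP call rate and HWE filters can be sketched as below. This is a minimal Python illustration, not the SNP & Variation Suite implementation: the function names, the 0/1/2 genotype coding with None for missing calls, and the one-degree-of-freedom chi-square test for HWE are assumptions made for the example, and LD pruning is omitted.

```python
import math

def call_rate(genotypes):
    """Fraction of animals with a non-missing call at one SNP (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def hwe_chi2_pvalue(genotypes):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium at one SNP.

    Genotypes are coded as 0, 1 or 2 copies of one allele (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the allele counted as "0"
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure from HWE can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def keep_snp(genotypes, min_call_rate=0.95, hwe_alpha=1e-4):
    """Apply the call rate and HWE thresholds described in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and hwe_chi2_pvalue(genotypes) >= hwe_alpha)

# Bookkeeping from the text: unmapped markers removed from the 57,668 SNP set
remaining_after_mapping = 57668 - 2674
```

A SNP called in 96 of 100 animals with genotype counts 24/48/24 passes both filters, whereas a SNP that is heterozygous in every animal fails the HWE test.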

Autosomal SNP Analysis

Within-breed genetic diversity parameters were estimated with 8,764 SNP using Arlequin. Principal Component Analysis (PCA), using SNP & Variation Suite v8.6 (SVS) (Golden Helix, Inc., Bozeman, MT), was performed to evaluate population stratification. The SVS software implements the EIGENSTRAT algorithm31. The results were represented in a 3D scatterplot of the first three principal components using SigmaPlot 13.
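The within-breed parameters reported in Table 1 (observed heterozygosity, expected heterozygosity and FIS) can be illustrated with a short Python sketch. The estimators shown are the textbook definitions (Ho as the proportion of heterozygotes, He as 2pq, and FIS = 1 − Ho/He, each averaged over SNP); Arlequin may apply sample-size corrections that are not reproduced here, and the function name and 0/1/2 genotype coding are assumptions for the example.

```python
def diversity_parameters(snp_genotypes):
    """Return (Ho, He, FIS) for one population.

    snp_genotypes: list of SNP, each a list with one genotype per animal,
    coded as 0, 1 or 2 copies of an allele (assumed coding, no missing data).
    """
    ho_per_snp = []
    he_per_snp = []
    for genotypes in snp_genotypes:
        n = len(genotypes)
        p = sum(genotypes) / (2 * n)               # allele frequency
        ho_per_snp.append(genotypes.count(1) / n)  # observed heterozygotes
        he_per_snp.append(2 * p * (1 - p))         # expected under HWE
    ho = sum(ho_per_snp) / len(ho_per_snp)
    he = sum(he_per_snp) / len(he_per_snp)
    if he == 0:
        raise ValueError("all SNP are monomorphic; FIS is undefined")
    fis = 1 - ho / he
    return ho, he, fis
```

A negative FIS, such as the value reported for KI, arises whenever observed heterozygosity exceeds the Hardy-Weinberg expectation.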

Population structure analysis was carried out using ADMIXTURE 1.2.231 which is based on maximum likelihood. The number of populations or clusters (K) present in the analysis was inferred by the cross-validation procedure, where the smallest cross-validation error rate indicates the most probable K value for the data set33. The default termination criterion for the ADMIXTURE analysis was used, in which the run stops when the log-likelihood increases by less than ϵ = 10⁻⁴ between iterations. A termination criterion of ϵ = 10⁻⁵ between iterations was also tested, without any difference in the cross-validation error values. The routine analysis was carried out for 22 clusters (Ks) with 10 replicates for each one. A second ADMIXTURE analysis with nine populations consisting of minor and rare breeds and feral populations (12 clusters tested with 10 replicates each) was performed. The proportions of assignment of each individual within population were visualized using CLUMPAK.
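The selection of K from cross-validation output can be sketched as follows. The example assumes that the ADMIXTURE logs contain lines of the form "CV error (K=3): 0.66"; the helper names are hypothetical, and any numbers passed in for illustration are not results from this study. The relative change between successive K values is included because the text describes the curve as asymptotic, with an approximately constant change per K.

```python
import re

# Pattern for the cross-validation lines assumed to be present in the log
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9]*\.?[0-9]+)")

def parse_cv_errors(log_text):
    """Map each K to its cross-validation error found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """K with the smallest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

def relative_change(cv_errors):
    """Fractional decrease in CV error from each K to the next larger K."""
    ks = sorted(cv_errors)
    return {
        k: (cv_errors[prev] - cv_errors[k]) / cv_errors[prev]
        for prev, k in zip(ks, ks[1:])
    }
```

When the curve flattens without a clear minimum, as described for K = 7 to 21, the relative change is a more useful guide than the minimum alone.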

Chromosome Y

The GGP Porcine HD v1 contains 14 SNP on the Y chromosome. Analysis of the Y chromosome in mammals reflects the paternal history of migration and evolution of breeds and populations. For this analysis all female samples were eliminated, leaving 475 male samples. The 14 markers were pruned using a call rate <0.95. Of the remaining 11 SNP, two were monomorphic and three more presented heterozygosity ≠ 0. Ultimately, six SNP were used to estimate the number of different haplotypes and the haplotype frequencies, and to reconstruct a network among populations. The haplotype estimates and frequencies were calculated with the software Arlequin, and the relationship between haplotypes was investigated by constructing median networks using Network 5.
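The marker pruning and haplotype counting described above can be sketched in Python. The record layout (a population code plus one two-letter genotype string per Y-chromosome SNP) and the function names are assumptions for illustration; the study itself used Arlequin and Network 5. A SNP is retained only if no male shows a heterozygous call and more than one allele is observed, mirroring the removal of the heterozygous and monomorphic markers.

```python
from collections import Counter, defaultdict

def informative_snps(records):
    """Indices of SNP that are polymorphic and never heterozygous in males.

    records: list of (population, genotypes), where genotypes is a list of
    two-character strings such as "AA" (assumed layout).
    """
    n_snps = len(records[0][1])
    keep = []
    for j in range(n_snps):
        calls = [genotypes[j] for _, genotypes in records]
        has_het = any(call[0] != call[1] for call in calls)
        alleles = {call[0] for call in calls}
        if not has_het and len(alleles) > 1:
            keep.append(j)
    return keep

def haplotype_counts(records):
    """Absolute haplotype frequency per population over the retained SNP."""
    keep = informative_snps(records)
    counts = defaultdict(Counter)
    for population, genotypes in records:
        haplotype = "".join(genotypes[j][0] for j in keep)
        counts[population][haplotype] += 1
    return keep, counts

# Marker bookkeeping from the text: 11 SNP pass call rate, 2 monomorphic,
# 3 with heterozygous calls in males
snps_used = 11 - 2 - 3
```

The per-population counters correspond to the layout of Table 3, with one row per population and one column per haplotype.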

Effective Population Size

To establish baselines and trends of Ne, the SNeP software34 was used with default conditions. Prior to using SNeP, SNP filtering was redone without the LD criterion. As a result, 16,021 SNP were used in this analysis. Due to small sample sizes for some breeds, the LD analysis was performed on 10 groups indicated by the population structure identified when the ADMIXTURE analysis used K = 10. The 10 populations were: BE, CW, DU, a Chinese group (ME, FE, MI), a Pacific island group (GI, HI, KI), a minor breed group (GH, TA, OI, LB, SP, MA), HS, HF, LA, and YK35.
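The relationship between LD and Ne that underlies this type of analysis can be illustrated with Sved's approximation, E[r²] = 1/(1 + 4·Ne·c), where c is the recombination distance between markers in Morgans, together with the common convention that marker pairs at distance c inform on Ne roughly t = 1/(2c) generations ago. The Python sketch below uses only these two expressions; the corrections SNeP applies (for example for sample size) are not reproduced, and the function names are hypothetical.

```python
def ne_from_ld(r2, c):
    """Effective population size from mean r2 at recombination distance c.

    Rearranges Sved's approximation E[r2] = 1 / (1 + 4 * Ne * c),
    with c expressed in Morgans.
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)

def generations_ago(c):
    """Approximate number of generations in the past informed by distance c."""
    return 1.0 / (2.0 * c)

def distance_for_generation(t):
    """Recombination distance (Morgans) informing on generation t."""
    return 1.0 / (2.0 * t)
```

Under these assumptions, generation 60 corresponds to marker pairs about 0.83 cM apart, since 1/(2 × 60) ≈ 0.0083 Morgans.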


USDA, Agricultural Research Service, is an equal opportunity/affirmative action employer. Mention of companies or commercial products does not imply recommendation or endorsement by the USDA over others not mentioned. USDA neither guarantees nor warrants the standard of any product mentioned. Product names are mentioned solely to report factually on available data and to provide specific information.

Data Availability

Molecular data and other information about the animals used in this study are available via the Animal-GRIN database:


References

1. Groenen, M. A. et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491, 393–398 (2012).
2. Larson, G. et al. Patterns of East Asian pig domestication, migration, and turnover revealed by modern and ancient DNA. Proc Natl Acad Sci USA 107, 7686–7691 (2010).
3. Megens, H. J. et al. Biodiversity of pig breeds from China and Europe estimated from pooled DNA samples: differences in microsatellite variation between two areas of domestication. Genet Sel Evol 40, 103–128 (2008).
4. Ramirez, O. et al. Integrating Y-chromosome, mitochondrial, and autosomal data to analyze the origin of pig breeds. Mol Biol Evol 26, 2061–2072 (2009).
5. Blackburn, H. D. Genebank development for the conservation of livestock genetic resources in the United States of America. Livestock Science 120, 196–203 (2009).
6. Blackburn, H. D. Biobanking Genetic Material for Agricultural Animal Species. Annu. Rev. Anim. Biosci. 6, 69–82 (2018).
7. Linderholm, A. et al. A novel MC1R allele for black coat colour reveals the Polynesian ancestry and hybridization patterns of Hawaiian feral pigs. R Soc Open Sci 3, 160304 (2016).
8. McCann, B. E. et al. Mitochondrial diversity Supports Multiple Origins for Invasive Pigs. The Journal of Wildlife Management 78, 202–213 (2014).
9. Larson, G. et al. Phylogeny and ancient DNA of Sus provides insights into neolithic expansion in Island Southeast Asia and Oceania. Proc Natl Acad Sci USA 104, 4834–4839 (2007).
10. Jones, G. F. Genetic Aspects of domestication, common breeds and their origin. In: The genetics of the pig (eds Rothschild, M. F. R. & Wallingford, A.), pp. 17–50. CAB International (1998).
11. Blackburn, H. D., Plante, Y., Rohrer, G., Welch, C. S. & Paiva, S. R. Impact of genetic drift on access and benefit sharing under the Nagoya Protocol: the case of the Meishan pig. J Anim Sci 92, 1405–1411 (2014).
12. Carvalho, G. M., Paiva, S. R., Araujo, A. M., Mariante, A. & Blackburn, H. D. Genetic structure of goat breeds from Brazil and the United States: Implications for conservation and breeding programs. J Anim Sci 93, 4629–4636 (2015).
13. Paim, T. P. et al. Selection Signatures and Genetic Diversity in Specialized and Locally Adapted Goat Breeds in Americas. International Plant & Animal Genome XXVI Conf. 13, P0434 (2018).
14. Grossi, D. A. et al. Genetic diversity, extent of linkage disequilibrium and persistence of gametic phase in Canadian pigs. BMC Genet 18, 6.
15. Zhang, C. & Plastow, G. Genomic Diversity in Pig (Sus scrofa) and its Comparison with Human and other Livestock. Curr Genomics 12, 138–146 (2011).
16. Manunza, A. et al. A high throughput genotyping approach reveals distinctive autosomal genetic signatures for European and Near Eastern wild boar. PLoS One 8, e55891 (2013).
17. Herrero-Medrano, J. M. et al. Whole-genome sequence analysis reveals differences in population management and selection of European low-input pig breeds. BMC Genomics 15, 601 (2014).
18. Mujibi, F. D. et al. Genetic diversity, breed composition and admixture of Kenyan domestic pigs. PLoS One 13, e0190080 (2018).
19. Yang, B. et al. Genome-wide SNP data unveils the globalization of domestic pigs. Genet. Sel. Evol. 49, 71 (2017).
20. Welsh, C. S., Stewart, T. S., Schwab, C. & Blackburn, H. D. Pedigree analysis of 5 swine breeds in the United States and the implications for genetic conservation. J Anim Sci 88, 1610–1618 (2010).
21. Luetkemeier, E. S., Sodhi, M., Schook, L. B. & Malhi, R. S. Multiple Asian pig origins revealed through genomic analyses. Mol Phylogenet Evol 54, 680–686 (2010).
22. Ramos, A. M. et al. Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One 4, e6524 (2009).
23. Thomson, V. A. et al. Using ancient DNA to study the origins and dispersal of ancestral Polynesian chickens across the Pacific. PNAS 111, 4826–4831 (2014).
24. Burgos-Paz, W. et al. Porcine colonization of the Americas: a 60k SNP story. Heredity (Edinb) 110, 321–330 (2013).
25. Ai, H., Huang, L. & Ren, J. Genetic diversity, linkage disequilibrium and selection signatures in Chinese and Western pigs revealed by genome-wide SNP markers. PLoS One 8, e56001 (2013).
26. Roberts, K. S. & Lamberson, W. R. Relationships among and variation within rare breeds of swine. J Anim Sci 93, 3810–3813 (2015).
27. Larson, G. et al. Worldwide phylogeography of wild boar reveals multiple centers of pig domestication. Science 307, 1618–1621 (2005).
28. FAO. Cryoconservation of animal genetic resources. FAO Animal Production and Health Guidelines No. 12. Rome (2012).
29. Blackburn, H. D. et al. Genetic structure of Ovis aries populations near domestication centers and in the New World. Genetica 139, 1169–1178 (2011).
30. Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10, 564–567 (2010).
31. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909 (2006).
32. Alexander, D. H. et al. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19, 1655–1664 (2009).
33. Alexander, D. H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12, 246 (2011).
34. Kopelman, N. M. et al. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol Ecol Resour 15, 1179–1191 (2015).
35. Barbato, M., Orozco-terWengel, P., Tapio, M. & Bruford, M. W. SNeP: a tool to estimate trends in recent effective population size trajectories using genome-wide SNP data. Frontiers Genetics 6, Article 109, 1–6 (2015).


Author information




H.D.B. and S.R.P. designed study; D.A.F. and C.S.W. performed analyses; all authors participated in the interpretation of results; H.D.B., D.A.F. and S.R.P. wrote the manuscript which was reviewed by all authors.

Corresponding author

Correspondence to H. D. Blackburn.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


About this article


Cite this article

Faria, D.A., Wilson, C., Paiva, S. et al. Assessing Sus scrofa diversity among continental United States, and Pacific islands populations using molecular markers from a gene banks collection. Sci Rep 9, 3173 (2019).

