Genomics of the origin and evolution of Citrus

The genus Citrus, comprising some of the most widely cultivated fruit crops worldwide, includes an uncertain number of species. Here we describe ten natural citrus species, using genomic, phylogenetic and biogeographic analyses of 60 accessions representing diverse citrus germplasms, and propose that citrus diversified during the late Miocene epoch through a rapid southeast Asian radiation that correlates with a marked weakening of the monsoons. A second radiation enabled by migration across the Wallace line gave rise to the Australian limes in the early Pliocene epoch. Further identification and analyses of hybrids and admixed genomes provide insights into the genealogy of major commercial cultivars of citrus. Among mandarins and sweet orange, we find an extensive network of relatedness that illuminates the domestication of these groups. Widespread pummelo admixture among these mandarins and its correlation with fruit size and acidity suggest a plausible role of pummelo introgression in the selection of palatable mandarins. This work provides a new evolutionary framework for the genus Citrus.

Guohong Albert Wu 1 , Javier Terol 2 , Victoria Ibanez 2 , Antonio López-García 2 , Estela Pérez-Román 2 , Carles Borredá 2 , Concha Domingo 2 , Francisco R. Tadeo 2 , Jose Carbonell-Caballero 3
The genus Citrus and related genera (Fortunella, Poncirus, Eremocitrus and Microcitrus) belong to the angiosperm subfamily Aurantioideae of the Rutaceae family, which is widely distributed across the monsoon region from west Pakistan to north-central China and south through the East Indian Archipelago to New Guinea and the Bismarck Archipelago, northeastern Australia, New Caledonia, Melanesia and the western Polynesian islands 1 . Native habitats of citrus and related genera roughly extend throughout this broad area (Extended Data Fig. 1a and Supplementary Table 1), although the geographical origin, timing and dispersal of citrus species across southeast Asia remain unclear. A major obstacle to resolving these uncertainties is our poor understanding of the genealogy of complex admixture in cultivated citrus, as has recently been shown 2 . Some citrus are clonally propagated apomictically 3 through nucellar embryony, that is, the development of non-sexual embryos originating in the maternal nucellar tissue of the ovule, and this natural process may have been co-opted during domestication; grafting is a relatively recent phenomenon 4 . Both modes of clonal propagation have led to the domestication of fixed (desirable) genotypes, including interspecific hybrids, such as oranges, limes, lemons, grapefruits and other types.
Under this scenario, it is not surprising that the current chaotic citrus taxonomy (based on long-standing, conflicting proposals 5,6 ) requires a solid reformulation consistent with a full understanding of the hybrid and/or admixture nature of cultivated citrus species. Here we analyse genome sequences of diverse citrus to characterize the diversity and evolution of citrus at the species level and identify citrus admixtures and interspecific hybrids. We further examine the network of relatedness among mandarins and sweet orange, as well as the pattern of the introgression of pummelos among mandarins for clues to the early stages of citrus domestication.

Diversity and evolution of the genus Citrus
To investigate the genetic diversity and evolutionary history of citrus, we analysed the genomes of 58 citrus accessions and two outgroup genera (Poncirus and Severinia) that were sequenced to high coverage, including recently published sequences 2,3,7 as well as 30 new genome sequences described here. For our purpose, we do not include accessions related by somatic mutations. These sequences represent a diverse sampling of citrus species, their admixtures and hybrids (Supplementary Tables 2, 3 and Supplementary Notes 1, 2). Our collection includes accessions from eight previously unsequenced and/or unexamined citrus species, such as pure mandarins (Citrus reticulata), citron (Citrus medica), Citrus micrantha (a wild species from within the subgenus Papeda), Nagami kumquat (Fortunella margarita, also known as Citrus japonica var. margarita), and Citrus ichangensis (also known as Citrus cavaleriei; this species is also considered a Papeda), as well as three Australian citrus species (Supplementary Notes 3,4). For each species, we have sequenced one or more pure accessions without interspecific admixture.
Local segmental ancestry of each accession can be delineated for both admixed and hybrid genotypes, based on genome-wide ancestry-informative single-nucleotide polymorphisms (Supplementary Note 5). Comparative genome analysis further identified shared haplotypes among the accessions (Supplementary Notes 6, 7). In particular, we demonstrate the F1 interspecific hybrid nature of Rangpur lime and red rough lemon (two different mandarin-citron hybrids), Mexican lime (a micrantha-citron hybrid) and calamondin (a kumquat-mandarin hybrid), and confirm, using whole-genome sequence data, the origins of grapefruit (a pummelo-sweet orange hybrid), lemon (a sour orange-citron hybrid) and eremorange (a sweet orange and Eremocitrus glauca (also known as Citrus glauca) hybrid). We also verified the parentage of Cocktail grapefruit, with low-acid pummelo as the seed parent and King and Dancy mandarins as the two grandparents on the paternal side. The origin of the Ambersweet orange is similarly confirmed to be a mandarin-sweet orange hybrid with Clementine as a grandparent. We have previously shown that sour orange (cv. Seville) (Citrus aurantium) is a pummelo-mandarin hybrid, and have analysed the more complex origin of sweet orange (Citrus sinensis) 2 . Re-analysing sequences from ten cultivars of sweet orange 3 shows that they are all derived from the same genome by somatic mutations, and were thus not included in our study.
We identified ten progenitor citrus species (Supplementary Note 4.1) by combining diversity analysis (Extended Data Table 1), multidimensional scaling and chloroplast genome phylogeny (Extended Data Fig. 1b). The first two principal coordinates in the multidimensional scaling (Fig. 1a) separate three ancestral (sometimes called 'fundamental') Citrus species associated with commercially important types 8,9 , namely citrons (C. medica), mandarins (C. reticulata) and pummelos (Citrus maxima), and display lemons, limes, oranges and grapefruits as hybrids involving these three species. The nucleotide diversity distributions (Fig. 1b) show distinct scales for interspecific divergence and intraspecific variation, and reflect the genetic origin of each accession. Hybrid accessions (sour orange, calamondin, lemon and non-Australian limes) with ancestry from two or more citrus species are readily identified on the basis of their higher segmental heterozygosity (1.5-2.4%) relative to intraspecific diversity (0.1-0.6%). Other citrus accessions show bimodal distributions in heterozygosity (sweet orange, grapefruits and some highly heterozygous mandarins) due to interspecific admixture, a process that generally involves complex backcrosses. Among the pure genotypes without interspecific admixture, citrons show significantly lower intraspecific diversity (around 0.1%) than the other species (0.3-0.6%). The reduced heterozygosity of citrons, a mono-embryonic species, is probably due to the cleistogamy of its flowers 10 , a mechanism that promotes pollination and self-fertilization in unopened flower buds, which in turn reduces heterozygosity.
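The separation between intraspecific and interspecific heterozygosity described above can be illustrated with a minimal Python sketch. The two cut-offs come from the ranges quoted in the text; the function names, the per-window input and the `min_fraction` tolerance are assumptions made for illustration and are not the procedure of Supplementary Note 4.

```python
from typing import List

# Ranges quoted in the text, expressed as fractions rather than percentages.
INTRA_MAX = 0.006  # upper end of intraspecific diversity (0.1-0.6%)
INTER_MIN = 0.015  # lower end of interspecific heterozygosity (1.5-2.4%)


def label_window(het: float) -> str:
    """Label one genomic window by its heterozygosity
    (heterozygous sites divided by callable sites)."""
    if het <= INTRA_MAX:
        return "intra"
    if het >= INTER_MIN:
        return "inter"
    return "ambiguous"


def summarize_accession(window_hets: List[float], min_fraction: float = 0.1) -> str:
    """Summarize an accession from its per-window heterozygosity values.

    min_fraction is a hypothetical tolerance: if the minority class makes up
    no more than this share of informative windows, the accession is treated
    as unimodal.
    """
    labels = [label_window(h) for h in window_hets]
    n_intra = labels.count("intra")
    n_inter = labels.count("inter")
    informative = n_intra + n_inter
    if informative == 0:
        return "undetermined"
    frac_inter = n_inter / informative
    if frac_inter <= min_fraction:
        return "pure-like"      # low heterozygosity throughout
    if frac_inter >= 1.0 - min_fraction:
        return "hybrid-like"    # high heterozygosity throughout
    return "admixed-like"       # bimodal mixture of both scales
```

Under these assumptions, an accession whose windows are split between the two scales is reported as admixed-like, which corresponds to the bimodal distributions noted for sweet orange, grapefruits and some mandarins.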
The identification of a set of pure citrus species provides new insights into the phylogeny of citrus, their origins, evolution and dispersal. Citrus phylogeny is controversial 1,5,6,11,12 , in part owing to the difficulty of identifying pure or wild progenitor species, because of substantial interspecific hybridization that has resulted in several clonally propagated and cultivated accessions. Some authors assign separate binomial species designations to clonally propagated genotypes 1,6 . Our nuclear genome-based phylogeny, which is derived from 362,748 single-nucleotide polymorphisms in non-genic and non-pericentromeric genomic regions, reveals that citrus species are a monophyletic group and establishes well-defined relationships among its lineages (Fig. 1c and Supplementary Note 8). Notably, the nuclear genome-derived phylogeny differs in detail from the chloroplast-derived phylogeny (Extended Data Fig. 1). This is not unexpected, as chloroplast DNA is a single, non-recombining unit and is unlikely to show perfect lineage sorting during rapid radiation (Supplementary Note 8.3).
The origin of citrus has generally been considered to be in southeast Asia 1 , a biodiversity hotspot 13 with a climate that has been influenced by both east and south Asian monsoons 14 . The citrus fossil C. linczangensis 16 , from Lincang in Yunnan, China, has traits that are characteristic of current major citrus groups, and provides definite evidence for the existence of a common Citrus ancestor within the Yunnan province approximately 8 million years ago (Ma).
Figure 1 | Genetic structure, heterozygosity and phylogeny of Citrus species. a, Principal coordinate analysis of 58 citrus accessions based on pairwise nuclear genome distances and metric multidimensional scaling. The first two axes separate the three main citrus groups (citrons, pummelos and mandarins) with interspecific hybrids (oranges, grapefruit, lemon and limes) situated at intermediate positions relative to their parental genotypes. b, Violin plots of the heterozygosity distribution in 58 citrus accessions, representing 10 taxonomic groups as well as 2 related genera, Poncirus (Poncirus trifoliata, also known as Citrus trifoliata) and Chinese box orange (Severinia). White dot, median; bar limits, upper and lower quartiles; whiskers, 1.5× interquartile range. The bimodal separation of intraspecies (light blue) and interspecies (light pink) genetic diversity is manifested among the admixed mandarins and across different genotypes including interspecific hybrids. Three-letter codes are listed in parentheses with additional descriptions in Supplementary Table 2. c, Chronogram of citrus speciation. Two distinct and temporally well-separated phases of species radiation are apparent, with the southeast Asian citrus radiation followed by the Australian citrus diversification. Age calibration is based on the citrus fossil C. linczangensis 16 .
Our analysis establishes a relatively rapid Asian radiation of citrus species in the late Miocene (6-8 Ma; Fig. 1c, d), a period coincident with an extensive weakening of monsoons and a pronounced climate transition from wet to drier conditions 17 . In southeast Asia, this marked climate alteration caused major changes in biota, including the migration of mammals 18 and rapid radiation of various plant lineages 19,20 . Australian citrus species form a distinct clade that was proposed to be nested with citrons 12 , although distinct generic names (Eremocitrus and Microcitrus) were assigned in botanical classifications by Swingle 1,5 . Neither molecular dating analysis 21 nor our whole-genome phylogenetic analysis supports an Australian origin for citrus 22 . Rather, citrus species spread from southeast Asia to Australasia, probably via transoceanic dispersals. Our genomic analysis indicates that the Australian radiation occurred during the early Pliocene epoch, around 4 Ma. This is contemporaneous with other west-to-east angiosperm migrations from southeast Asia 23,24 , presumably taking advantage of the elevation of Malesia and Wallacea in the late Miocene and Pliocene 25,26 (Supplementary Note 9).
The nuclear and chloroplast genome phylogenies indicate that there are three Australian species in our collection. One of the two Australian finger limes shows clear signs of admixture with round limes (Supplementary Note 5.4). The closest relative to Australian citrus is Fortunella, a species that has been reported to grow in the wild in southern China 27 . Australian citrus species are diverse, and found natively in both dry and rainforest environments in northeast Australia, depending on the species 28 . Our phylogeny shows that the progenitor citrus probably migrated across the Wallace line, a natural barrier for species dispersal from southeast Asia to Australasia, and later adapted to these diverse climates.
The results also show that the Tachibana mandarin, naturally found in Taiwan, the Ryukyu archipelago and Japan 29 , split from mainland Asian mandarins (Fig. 1c, d) during the early Pleistocene (around 2 Ma), a geological epoch with strong glacial maxima 30 . Tachibana, as did other flora and fauna in the region, very probably arrived in these islands from the adjacent mainland 31 during the drop in the sea level of the South China Sea and the emergence of land bridges 32,33 , a process promoted by the expansion of ice sheets that repetitively occurred during glacial maxima (Supplementary Note 9).
Although Tachibana 5,6 has been assigned its own species (Citrus tachibana), sequence analysis reveals that it has a close affinity to C. reticulata 34,35 and does not support its taxonomic position as a separate species (Supplementary Note 4.1). However, both chloroplast genome phylogeny (Extended Data Fig. 1b) and nuclear genome clustering (Fig. 1a) clearly distinguish Tachibana from the mainland Asian mandarins. This suggests that Tachibana should be designated a subspecies of C. reticulata. By contrast, the wild Mangshan 'mandarin' (Citrus mangshanensis) 7 represents a distinct species, with comparable distances to C. reticulata, pummelo and citron 2 (Extended Data Table 1).

Pattern of pummelo admixture in the mandarins
Using 588,583 ancestry-informative single-nucleotide polymorphisms derived from three species, C. medica, C. maxima and C. reticulata, we delineate the segmental ancestry of 46 citrus accessions (Extended Data Fig. 2 and Supplementary Note 5). Pummelo admixture is found in all but 5 of the 28 sequenced mandarins, and the amount and pattern of pummelo admixture, as identified by phased pummelo haplotypes (Fig. 2a and Supplementary Note 6), suggest the classification of the mandarins into three types.
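As an illustration of what makes a single-nucleotide polymorphism ancestry-informative, the following Python sketch flags an allele as diagnostic for a species when it is fixed among that species' pure accessions and absent from the pure accessions of the other species. This criterion, the function name and the genotype encoding (two-letter strings such as "AA") are assumptions for illustration; the selection rules actually used are given in Supplementary Note 5 and are not reproduced here.

```python
from typing import Dict, List, Tuple


def diagnostic_alleles(site: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (species, allele) pairs that are diagnostic at one site.

    site maps a species name to the genotypes of its pure accessions,
    each genotype being a string of two allele letters, e.g. "AA" or "AG".
    """
    result = []
    for species, genotypes in site.items():
        alleles = set("".join(genotypes))
        if len(alleles) != 1:
            continue  # not fixed within this species (or no data)
        allele = next(iter(alleles))
        # Alleles observed in the pure accessions of all other species.
        others = set(
            "".join(g for s, gs in site.items() if s != species for g in gs)
        )
        if allele not in others:
            result.append((species, allele))
    return result
```

For a hypothetical site at which citrons are all "AA" while pummelos and mandarins are all "GG", only the citron allele "A" is reported, because "G" is shared by two species and so cannot distinguish them.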
Type-1 mandarins represent pure C. reticulata with no evidence of interspecific admixture and include Tachibana, three unnamed Chinese mandarins (M01, M02, M04) 3 and the ancient Chinese cultivar Sun Chu Sha Kat reported here, a small tart mandarin commonly grown in China and Japan, and also found in Assam. This cultivar is likely described in Han Yen-Chih's AD 1178 monograph 'Chü Lu' 36 , which includes references to citrus cultivated during the reign of Emperor Ta Yu (2205-2197 BC). Sixteen of the twenty-eight mandarins are type-2 mandarins, which have a small amount of pummelo admixture (1-10% of the length of the genetic map; Fig. 2a), usually in the form of a few short segments distributed across the genome. Although the lengths and locations of these admixed segments may be distinct in different mandarins, they share one or two common pummelo haplotypes (designated as P1 and P2) (Extended Data Fig. 3). By contrast, the seven remaining mandarins (type-3) contain higher proportions of pummelo alleles (12-38%; Fig. 2a) in longer segments. Although the P1 and P2 pummelo haplotypes are also detectable among type-3 mandarins, other more extensive pummelo haplotypes dominate the pummelo admixture in type-3 mandarins (Fig. 2b and Extended Data Table 2).
[Figure caption fragment; panel columns are labelled Type-1, Type-2 and Type-3, with accession codes in the order TBM, SCM, M01, M02, M04, CLP, SNK, M17, M12, M03, HLM, M10, M15, CSM, M16, M11, DNC, WLM, M14, PKM, M08, CLM, WMM, M19, M21, UNS, KNG, M20, SO5, SWO, SSO, BO2, BO3, GF0, PAR, LIM, LMA, RRL, MXL, CHP, STP, LAP, GXP, BUD, COR, HUM, VEU, CAL, FOR.] ... in the heterogeneous set of mandarins, the degree of pummelo introgression subdivides the group into pure (type-1) and admixed (type-2 and -3) mandarins. Three-letter code as in Fig. 1.
These observations suggest that the initial pummelo introgression into the mandarin gene pool may have involved as few as one pummelo tree (carrying both P1 and P2 haplotypes), the contribution of which was diluted by repeated backcrosses with mandarins (Supplementary Note 6.3). The introgressed pummelo haplotypes became widespread and gave rise to type-2 (early-admixture) mandarins (Fig. 2b). We propose that later, additional pummelo introgressions gave rise to type-3 (late-admixture) mandarins and sweet orange, and that some modern type-3 mandarins were derived from hybridizations among existing mandarins and sweet orange. This late-admixture model for type-3 mandarins is consistent with the historical records for Clementine and Kiyomi (both mandarin-sweet orange hybrids), and for W. Murcott, Wilking and Fallglo (hybrids involving other type-3 mandarins), whereas definitive records for the remaining two late-admixture mandarins (King and Satsuma) are not available.
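The three mandarin types described above are defined partly by the proportion of the genetic map covered by pummelo segments. The following Python sketch shows that arithmetic under stated assumptions: segments are given as non-overlapping (start, end) intervals in genetic-map units, and the `floor` and `type2_max` cut-offs are hypothetical values chosen to fall between the ranges quoted in the text (no admixture, 1-10% and 12-38%). It ignores haplotype identity (P1, P2 and others), which the text also uses to characterize the types.

```python
from typing import List, Tuple


def pummelo_fraction(segments: List[Tuple[float, float]], map_length: float) -> float:
    """Fraction of the genetic map covered by pummelo-derived segments.

    segments: non-overlapping (start, end) intervals, same units as map_length.
    """
    if map_length <= 0:
        raise ValueError("map_length must be positive")
    covered = sum(end - start for start, end in segments)
    return covered / map_length


def mandarin_type(fraction: float, floor: float = 0.01, type2_max: float = 0.11) -> str:
    """Assign a mandarin type from its pummelo admixture fraction.

    floor and type2_max are illustrative cut-offs, not values from the paper.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < floor:
        return "type-1"  # pure C. reticulata
    if fraction <= type2_max:
        return "type-2"  # early admixture, short segments
    return "type-3"      # late admixture, longer segments
```

For example, two hypothetical segments of 10 map units each on a map of 1,000 units give a fraction of 0.02, which this sketch assigns to type-2.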

Domestication of mandarins and sweet orange
Citrus domestication probably began with the identification and asexual propagation of selected, possibly hybrid or admixed individuals, rather than recurrent selection from a breeding population as for annual crops 37,38 . Additional diversity was obtained by capturing somatic mutations that occur within a relatively few basic genotypes. Therefore, conventional approaches to identifying selective pressures under recurrent breeding 39 cannot be applied. We can, however, use genome sequences to infer some features of the early stages of citrus domestication. Here we focus on mandarins, a class of citrus comprising small and easily peeled fruits that are of high commercial value.
All 28 mandarin accessions, except for Tachibana, exhibit an extensive network of relatedness (with a coefficient of relatedness, r > 1/8), and all but four mandarins (three of the four are pure or type-1 mandarins) show second-degree or higher relatedness (r > 1/4) to at least one (mean = 7) other mandarin (Fig. 3a and Supplementary Note 7). By contrast, sequenced pummelos and citrons appear to be independent selections from relatively large populations. In the [...] ([...] Fig. 4a) as a result of shared haplotypes between their parents. The high degree of relatedness among mandarins implies extensive sharing of C. reticulata haplotypes. Sweet orange also shows extensive haplotype sharing at the level of r > 0.1 with 25 of the 28 sequenced mandarins (except for three pure or type-1 mandarins; Fig. 3b and Extended Data Fig. 4b). Two late-admixture mandarins (Clementine and Kiyomi) are direct offspring of sweet orange. Among the early-admixture (type-2) mandarins, Ponkan shows the highest affinity to sweet orange 2 with r ≈ 0.36. Even the pure mandarin, Sun Chu Sha Kat, has r ≈ 0.23, equivalent to second-degree relatedness to sweet orange. We can rule out the scenario that sweet orange is the common ancestor of the mandarins, because of a lack of pummelo haplotypes (derived from sweet orange) among the mandarins. Rather, the extensive C. reticulata haplotype sharing between sweet orange and mandarins suggests that the mandarin parent of sweet orange was part of an expansive network of relatedness among mandarins.
[Figure labels: Ambersweet orange, Kiyomi mandarin, Clementine mandarin, Marsh grapefruit, Eremorange, Fallglo mandarin, Ponkan mandarin, Dancy mandarin, Changsha mandarin, Chinese mandarin.]
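The coefficient of relatedness used above can be related to identical-by-descent (IBD) sharing with a short Python sketch. It uses the standard diploid relation r = k1/2 + k2, where k1 and k2 are the fractions of the genome at which two individuals share one or two haplotypes IBD. Treating every window as equally weighted, and the function names themselves, are assumptions for illustration; the estimator actually used is described in Supplementary Notes 6 and 7.

```python
from typing import List


def coefficient_of_relatedness(ibd_states: List[int]) -> float:
    """Estimate r from per-window IBD states.

    ibd_states: one value per non-overlapping window, giving the number of
    haplotypes (0, 1 or 2) shared IBD by the two accessions in that window.
    Windows are weighted equally (an illustrative simplification).
    """
    if not ibd_states:
        raise ValueError("at least one window is required")
    if any(state not in (0, 1, 2) for state in ibd_states):
        raise ValueError("IBD states must be 0, 1 or 2")
    # r = k1/2 + k2 is the mean number of shared haplotypes divided by two.
    return sum(ibd_states) / (2 * len(ibd_states))


def in_relatedness_network(r: float) -> bool:
    """Threshold quoted in the text for membership of the network (r > 1/8)."""
    return r > 1 / 8


def second_degree_or_higher(r: float) -> bool:
    """Threshold quoted in the text for second-degree or higher (r > 1/4)."""
    return r > 1 / 4
```

A parent and its offspring share exactly one haplotype IBD in every window, so a list of ones gives r = 0.5 under this sketch.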
Because our collection of mandarins represents a diverse set of both ancient and modern varieties, including economically important accessions with mostly unknown parentage, the presence of an extensive relatedness network was not anticipated a priori. The shared C. reticulata haplotypes are suggestive of and consistent with signatures of the human selection process, during which mandarins with desirable traits were necessarily maintained through clonal propagation (nucellar polyembryony or grafting). Although one cannot preclude the possibility that the relatedness network was initiated before domestication from a small number of founder trees, human selection of accessions resulting from natural hybridization probably had a key role in the process of domestication that eventually led to the extensive relatedness network observed today. For example, modern mandarins, such as Clementine and W. Murcott, are known to be selections from chance seedlings found in Algeria 40 and Morocco 2 , at the onset and middle of the last century, respectively.
Pummelo admixture is correlated with fruit size and acidity, suggesting a role for pummelo introgression in citrus domestication. Because neither fruit size nor acidity profile has been described for the most recently sequenced accessions 3 , we used 37 citrus accessions in this analysis. We find that the fruit sizes of mandarins, oranges, grapefruit and pummelos show a strong positive correlation (Pearson correlation coefficient r = 0.88) with pummelo admixture proportion (Extended Data Fig. 5a, b and Supplementary Note 10.1). In addition to fruit size, a pivotal driver of fruit domestication is palatability, a characteristic that in citrus requires low to moderate levels of acidity. In mandarins, palatability appears to be linked to pummelo introgression at a major locus at the start of chromosome 8 (0.3-2.2 Mb), where all nine known palatable mandarins, but none of the four known acidic mandarins, show pummelo admixture in at least part of the genomic region (Extended Data Fig. 3). This locus is also found to be significant in a genome scan for palatability association (Extended Data Fig. 5c, d and Extended Data Table 3) and contains several potentially relevant genes (Supplementary Note 10.2). Among these genes is a gene encoding the mitochondrial NAD+-dependent isocitrate dehydrogenase (IDH), which regulates citric-acid synthesis 41 (Extended Data Table 4).
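The Pearson correlation coefficient quoted above measures the linear association between two paired variables, here fruit size and pummelo admixture proportion. A minimal Python sketch of the calculation follows; the function name and the plain-list inputs are assumptions for illustration, and no data from the study are reproduced.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired observations x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

In use, `x` would hold the pummelo admixture proportion of each accession and `y` the corresponding fruit size, in the same accession order.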
Our study finds that domesticated citrus fruit crops, such as mandarins and sweet orange, experienced a complex history of admixture, conceptually similar to those well-recognized in annual crops, such as rice 42 and maize 43 , and in other fruit trees, such as apple 44 and grape 45 , for which the current genomic diversity is linked to widespread ancient introgression. Other cultivated citrus groups, the interspecific F1 hybrids in particular, originated from hybridizations of two pure parental species. Several of these involve C. medica (citron), including limes and lemons 10 . A unique and critical characteristic of the three pivotal species (C. maxima, C. reticulata and C. medica) that gave rise to most cultivated citrus fruits is the occurrence of a complex floral anatomy (Extended Data Fig. 6), thus leading to the development of more complex fruit. Other species were also involved in hybridizations, including Fortunella and C. micrantha. Distinct from the mandarin lineages, these hybrids are characterized by their acidic fruit, and their selection must have been made on the basis of other characteristics, such as a sweet edible peel and aroma 2 , respectively.

Conclusion
On the basis of genomic, phylogenetic and biogeographic analyses of 60 diverse citrus and related accessions, we propose that the centre of origin of citrus species was the southeast foothills of the Himalayas, in a region that includes the eastern area of Assam, northern Myanmar and western Yunnan. Our analyses suggest that the ancestral citrus species underwent a sudden speciation event during the late Miocene. This radiation coincided with a pronounced transition from wet monsoon conditions to a drier climate, as observed in nearby areas in many other plant and animal lineages. The Australian citrus species and Tachibana, a native Japanese mandarin, split later from mainland citrus during the early Pliocene and Pleistocene, respectively. By distinguishing between pure species, hybrids and admixtures, we could trace the genealogy and genetic origin of the major citrus commercial cultivars. Both the extensive relatedness network among mandarins and sweet orange, and the association of pummelo admixture with desirable fruit traits suggest a complex domestication process.
Our work challenges previous proposals for citrus taxonomy. For example, we find that several named genera (Fortunella, Eremocitrus and Microcitrus) are in fact nested within the citrus clade. These and other distinct clades that we have identified are therefore more appropriately considered species within the genus Citrus, on a par with those that formerly were referred to as the three 'true' or 'biological' species (C. reticulata, C. maxima and C. medica). Additionally, the related genus, Poncirus, a subject of continuous controversy since it was originally proposed to be within the genus Citrus 12,46 , is clearly a distinct clade that is separate from Citrus based on sequence divergence and whole-genome phylogeny.
In summary, this work presents insights into the origin, evolution and domestication of citrus, and the genealogy of the most important wild and cultivated varieties. Taken together, these findings draw a new evolutionary framework for these fruit crops, a scenario that challenges current taxonomic and phylogenetic thoughts, and points towards a reformulation of the genus Citrus.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Methods
Sample collection and sequencing. Whole-genome sequences from a total of 60 accessions were analysed: 58 citrus accessions with different geographical origins and two representative outgroup genera. Twelve of these genomes, including five mandarins, four pummelos, two oranges and a wild Mangshan mandarin (C. mangshanensis) were reanalysed from previous works 2,7 . We also reanalysed 19 genomes from Chinese collections, including 15 unnamed mandarins, 2 Chinese sour oranges, Ambersweet orange and Cocktail grapefruit (a hybrid resembling grapefruit) that have been previously reported 3 . The 30 accessions that were newly sequenced came from citrus germ-plasm banks and collections at IVIA, Valencia, Spain; SRA, Corse, France; UCR, Riverside and FDACS/DPI, Florida and included nine mandarins, two limes, one rough lemon, one grapefruit, one lemon, four citrons, one Australian desert lime, one eremorange, two Australian finger limes, two Australian round limes, one kumquat, one calamondin, one micrantha, one Ichang papeda, one trifoliate orange and one Chinese box orange (Supplementary Note 1).
DNA libraries were constructed using standard protocols with some modifications. Library insert sizes range from 325 to 500 bp. Sequencing was performed on HiSeq2000/2500 instruments using 100-bp paired-end reads. Primary analysis of the data included quality control on the Illumina RTA sequence analysis pipeline (Supplementary Note 2).
Variant calls and Citrus species diversity. Illumina paired-end reads were aligned to the haploid Clementine reference sequence 2 and the sweet orange chloroplast genome assembly 47 using bwa-mem 48 . PCR duplicates were removed using Picard. Raw variants were called using GATK HaplotypeCaller 49 with subsequent filtering based on read map quality score, base quality score, read depth and so on (Supplementary Note 3.1).
Interspecific admixtures versus pure citrus species were distinguished based on sliding window analysis of heterozygosity and pairwise genetic distance D (Supplementary Note 4). Genome-wide ancestry-informative markers for the progenitor species were derived using pure accessions. Admixture analysis was carried out in sliding windows using ancestry-informative markers (Supplementary Note 5).
Citrus relatedness and haplotype sharing. Interspecific phasing was used to extract admixed haplotypes. Identical-by-descent sharing was calculated for each of the non-overlapping sliding windows across the genome and used to estimate the coefficient of relatedness among citrus accessions (Supplementary Notes 6, 7).
Phylogeny and speciation dating. We used Chinese box orange (genus Severinia) as an outgroup. Time calibration is based on the C. linczangensis 16 fossil from Lincang, Yunnan, China. MrBayes 50 was used for whole-genome Bayesian phylogenetic inference, and corroborated with a PhyML 51 reconstructed maximum likelihood tree. A penalized likelihood method 52 as implemented in APE 53 was used to construct the chronogram (Supplementary Note 8).
Genome scan of palatability association. We used a mixed linear model as implemented in GEMMA 54 for a case-control study of citrus acidity and palatability with 37 citrus accessions. A conservative Bonferroni correction was used to select significant genomic loci, with subsequent manual examination of each candidate variant in all accessions to identify the most discriminatory loci for fruit palatability (Supplementary Note 10).
Data availability. Whole-genome shotgun-sequencing data generated in this study have been deposited at NCBI under BioProject PRJNA414519. Prior resequencing data analysed here can be accessed under BioProject accession numbers PRJNA320985 (mandarins) and PRJNA321100 (oranges), and also under the NCBI Sequence Read Archive accession codes SRX372786 (sour orange), SRX372703 (sweet orange), SRX372702 (low-acid pummelo), SRX372688 (Chandler pummelo), SRX372685 (Willowleaf mandarin), SRX372687 (W. Murcott mandarin), SRX372665 (Ponkan mandarin) and SRX371962 (Clementine mandarin). The Clementine reference sequence used here is available at https://phytozome.jgi.doe.gov/.
Figure 3 | Pattern of pummelo introgression in mandarins. a, Distinct admixed pummelo haplotypes among mandarins, oranges and grapefruit are shown in different colours; the C. reticulata haplotypes are masked. The admixture pattern separates the mandarins into three groups, with type-1 representing pure mandarins. Type-2 mandarins contain a small amount of pummelo admixture derived from two C. maxima haplotypes: P1 (light blue colour) and P2 (dark blue), suggesting as few as one common pummelo ancestor in the distant past. Type-3 mandarins are characterized by both marked pummelo admixture and additional pummelo haplotypes besides P1 and P2. b, Haplotype trees for two chromosome segments where pummelo haplotypes of type-2 mandarins are in green. Left, haplotype tree for chr3:3.2-5.2 Mb. Sweet orange, sour orange, and twelve of the sequenced mandarins are interspecific hybrids, and their phased C. maxima and C. reticulata haplotypes are denoted by prepending, respectively, 'p' and 'm' to the corresponding accession codes. The nine type-2 mandarins share the same pummelo haplotype (P1). Right, the haplotype tree for chr2:31.4-33.4 Mb. Two pummelo haplotypes (P1, P2) are shared among seven type-2 mandarins, with Ponkan mandarin containing both P1 and P2. Sweet orange also carries two pummelo haplotypes at this locus, denoted by pC.SWO (shared with Clementine) and pA.SWO (alternate haplotype).