The general organisation of the Asian ichthyofauna has been investigated by various authors (Banarescu and Coad, 1991; Rainboth, 1991; Banarescu, 1992) and the Chinese ichthyofauna has recently been inventoried. The Chang Jiang (Yangtze River) basin harbours 361 species, of which 177 are endemic (Fu et al, 2003), and the Zhu Jiang (Pearl River) harbours 133 freshwater fish species, of which about 33 are endemic (Anonymous, 1981). This ichthyofauna, considered to be very rich (Banarescu, 1992) and underestimated (Perdices et al, 2003), is threatened by hydrological alterations, mainly the Three Gorges Dam and other dams, especially in the upper reaches. Overfishing has led to a decrease in fish size, mainly in lakes (Cao et al, 1991; Xie et al, 2001), and the biodiversity is thought to be decreasing (Fu et al, 2003). In the face of this danger, fish farming is one possible solution for biodiversity conservation (Brown and Day, 2002). This strategy requires detailed methods for discriminating intraspecific geographic diversification. Here, we give an example of a phylogeographic survey of a possible candidate for domestication using methods adapted to difficulties in the field.

The family Cyprinidae is the most widespread and species-rich freshwater fish family in the world. Unfortunately, very little is known about Cyprinidae in China. One of the most common and widespread genera in this family is Zacco, an Asian representative. Together with the genus Opsariichthys, it can be found from the Amur River in Russia to northern Vietnam (Banarescu and Coad, 1991). According to FISHBASE (Froese and Pauly, 2003), the species Z. chengtui, Z. taiwanensis and Z. platypus are also native to China.

The systematics within the genus Zacco is still rapidly changing and may be assisted by genetic studies. For instance, the well-known Japanese dark chub Z. temminckii has been divided into two taxa using genetic data: two mainly sympatric types described using 27 allozyme loci. Since no hybrids have been found at the 13 diagnostic loci, Okazaki et al (1991) proposed that the two types of dark chub are indeed two distinct species, Z. temminckii and Z. sieboldi.

Howes (1980) described significant morphological differences between the genera Zacco and Opsariichthys, and suggested that Z. pachycephalus is a member of Opsariichthys rather than Zacco. Ashiwa and Hosoya (1998) also supported Howes' conclusion. A new key for this genus has recently been produced for Japanese species (Hosoya et al, 2003). Biogeographical research has also been conducted in Taiwan on Z. pachycephalus using 26 allozymic loci. Several highly differentiated groups of populations were discovered, and attributed to separation into different refuges during the most recent glacial period (Wang et al, 1999).

All these examples highlight the necessity for improving the systematics of these species, especially before starting a domestication programme. A multidisciplinary group of Chinese and European scientists, funded by the European Community, have investigated a dozen target species that seem to be promising for fish farming in China. The genus Zacco, and especially the pale chub Z. platypus, is considered to be a target species. It is abundant and widespread in Chinese waters, but otherwise relatively little studied.

On the basis of morphology, Chinese populations of Z. platypus have been considered to form a single species, despite its huge range. However, Perdices et al (2003) presented a different picture in a recent paper written as part of the same scientific network and using most of the range sampled here. The surprising results, based on cytochrome b sequences, indicated that there are at least four undescribed species in this region. Zacco A occupies most of the Hunan-Guangxi regions, in the tributaries of two basins, the Chang Jiang and the Zhu Jiang catchments. Zacco B was recorded in the south-east of the Sichuan province. Zacco C is situated in the Li Shui river, in the Chang Jiang basin (not sampled in this survey). Finally, Zacco D is composed of two samples called here A25 and A37, in the southern tributaries of the Chang Jiang basin.

The aim of this study is the phylogeographic description of Z. platypus in south-west China in order to check this published interpretation, which parallels findings for similar species complexes in Japan and Taiwan. A clear description of the genetic subdivisions will be important for a future programme assessing the comparative performance of populations and for constituting domestic strains. Nuclear markers are used in order to provide estimates of polymorphism, which are not given by Perdices et al (2003) because their study was based on only a few (3–11) individuals in each sample.

Since the nuclear DNA of this genus has never been analysed, introns were chosen, as many universal primers are available in the literature. The EPIC PCR (Exon-Primed Intron-Crossing) technique was used. Otherwise, without any previous knowledge, allozymes would have been the only other markers able to give information on the nuclear genome without long and expensive preparation (ie genomic bank constitution). Unfortunately, keeping samples frozen in the field was impossible because the infrastructure was not available to consistently supply liquid nitrogen in southern China, and allozyme analysis was abandoned.

This survey can also be considered to test the capacities of a little-used nuclear marker class (introns) to describe geographic structure of a nearly unknown species.

Materials and methods


Sampling trips were organised in three Chinese regions: Hunan Province in March–April 2001 (five sites, A25 to A39), Sichuan Province in August 2002 (three sites, B52 to B54) and Guangxi Province in March 2003 (sites C04 to C55b). The location of the sampling sites is given in Figure 1 and the characteristics of all sampling sites in Table 1. A total of 264 specimens of Z. platypus were collected from 15 localities. Another cyprinid species was used as an outgroup for tree polarisation: a sample of 16 specimens of Opsariichthys bidens. The two species share similar external morphological characters, and both have a vast distribution. However, osteological characters separate them well (Howes, 1980).

Figure 1

Map of sampling sites in central west and south China.

Table 1 Sampling details for Z. platypus

The collecting methods are shown in Table 1: when the sample was caught in the wild, it was considered as a part of a natural population; when the sample was bought at a fish market, the exact origin was not certain because the fish could have come from more than one population.

Two main basins were sampled: the Chang Jiang (Yangtze River) in central and east China and the Xi Jiang, the more southerly main tributary of the Zhu Jiang (Pearl River). Specimens were identified morphologically in the field and identification was confirmed at the Swedish Museum of Natural History. All fish analysed were considered to belong to the species Z. platypus (except for the 16 specimens of O. bidens).

Molecular analyses

DNA was extracted from fin tissues using the phenol:chloroform method. PCR reactions were carried out in a total volume of 10 μl, containing 1 μl of 10 × buffer (Promega), 2.5 mM MgCl2, 0.2 mM of each dNTP (Invitrogen), 0.5 μM of each primer (MWG-Biotech AG, labelled with CY5 or Fluorescein), 0.3 U of Taq polymerase (Sigma) and 1 μl of DNA template (at about 150 μg/ml).

Thermo-cycling conditions in an Eppendorf Mastercycler® consisted of an initial denaturation at 94°C for 3 min, followed by 35 cycles of denaturation at 94°C for 1 min, annealing at an appropriate temperature (see Table 2) for 1 min, extension at 72°C for 1 min 20 s, and a final extension at 72°C for 10 min.

Table 2 Informative (polymorphic) intron system characteristics

These intragene sequences are highly polymorphic noncoding regions surrounded by relatively conserved exon sequences from which the primers have been designed. When these universal primers are used, several loci can be amplified, corresponding to duplications of the DNA through ancient polyploidisations, tandem repeats or pseudogenes (see Hassan et al, 2002; Atarhouch et al, 2003). Since several loci can be amplified by a single PCR reaction, it is necessary to verify that the primers actually annealed to their target DNA sequence. The appropriate conditions were assessed in a preliminary experiment using a thermal gradient of annealing temperatures produced by an Eppendorf Mastercycler Gradient® to determine the maximum annealing temperature allowing amplification. When several loci were amplified, we concluded that the targeted sequences were the same and closely matched the complementary universal primers. We call the set of loci amplified by the same pair of primers the ‘system’. Only the length polymorphism of the intron amplification products was analysed (Figure 2), which means that a part of the variation, such as single base-pair substitutions, is not detected.

Figure 2

Mlc2-c intron locus polymorphism showing several diagnostic alleles separating populations of geographic samples of the same nominal Z. platypus species.

In total, 1 μl of PCR mixture from each fish was loaded onto an 8% denaturing polyacrylamide gel (Biorad). The PCR products were visualised with an FMBIO® fluorescent imaging system (Hitachi). Allele sizes were determined using a fluorescently labelled ladder of known size (Promega) with the FMBIO ANALYSIS 8.0 image analyser program.

Statistical analyses

Most statistical analyses were performed using the program GENETIX (Belkhir et al, 1998). This analysis was performed in four steps:

First, a multidimensional analysis was used to obtain the overall structure of the samples. A large part of the distribution range of this species was sampled, so a genetic structure, due at least to isolation between basins, was expected. The best way to determine the overall structure is to keep the individual fish as the unit of information, irrespective of the sample origin. Among the various possible statistical methods, a multidimensional analysis, well suited to binary genotypic data, was used: Factorial Correspondence Analysis (FCA) first proposed by Benzécri (1973). The input matrix must be disjunct and the values discrete. The coding method has been described by She et al (1987). FCA does not make use of data describing the origin of the individuals and can, therefore, demonstrate the existence of unexpected taxa or subgroups, for example, a given sample can consist of several sympatric subgroups. Thus, this stage of analysis is a prerequisite for other calculations.
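The individual-based coding that feeds such an analysis can be sketched as follows. This is only a minimal illustration, assuming the common convention in which each allele at each locus becomes one column and each fish scores 0, 1 or 2 according to the number of copies it carries; the exact coding of She et al (1987) may differ in detail. The function name `code_genotypes` and the fragment lengths are hypothetical.

```python
def code_genotypes(genotypes):
    """Turn per-individual diploid genotypes into an allele-count matrix.

    genotypes: list of dicts mapping locus name -> (allele, allele).
    Returns (columns, matrix): columns is a sorted list of (locus, allele)
    pairs, and matrix[i][j] is the number of copies (0, 1 or 2) of that
    allele carried by individual i.
    """
    columns = sorted({(locus, allele)
                      for ind in genotypes
                      for locus, pair in ind.items()
                      for allele in pair})
    index = {col: j for j, col in enumerate(columns)}
    matrix = []
    for ind in genotypes:
        row = [0] * len(columns)
        for locus, pair in ind.items():
            for allele in pair:
                row[index[(locus, allele)]] += 1
        matrix.append(row)
    return columns, matrix


# Hypothetical example: three fish scored at two intron loci
# (allele names are invented fragment lengths in base pairs).
fish = [
    {"Mlc2-c": (250, 250), "Aldo-B2-A": (310, 320)},
    {"Mlc2-c": (250, 260), "Aldo-B2-A": (310, 310)},
    {"Mlc2-c": (260, 260), "Aldo-B2-A": (320, 320)},
]
columns, matrix = code_genotypes(fish)
```

Note that the sample of origin never enters the matrix: each row describes one fish only, which is why groupings recovered from such a table are independent of the sampling design.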

Second, depending on the samples and possible subsamples, standard parameters were estimated, that is, the allele frequencies, the observed (Ho) and unbiased expected heterozygosity (Hnb) of Nei (1978) and the Fis using the estimator f of Weir and Cockerham (1984).
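For readers unfamiliar with these parameters, the two heterozygosity estimates can be sketched for a single locus as below. The sketch assumes the usual form of Nei's (1978) unbiased estimator, Hnb = 2n/(2n−1) × (1 − Σp²), with n the number of diploid individuals; the function name `heterozygosity` and the genotypes are hypothetical, and the Weir and Cockerham f estimator is not reproduced here.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, Hnb) for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    Ho is the observed proportion of heterozygotes; Hnb is the expected
    heterozygosity corrected for sample size (Nei, 1978).
    """
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    hnb = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return ho, hnb


# Hypothetical sample of four fish at one locus (invented allele sizes).
sample = [(250, 250), (250, 260), (260, 260), (250, 260)]
ho, hnb = heterozygosity(sample)
```

In this invented sample Ho falls below Hnb, which is the direction of departure described as a heterozygote deficit.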

Null alleles (no amplification) were observed at the Aldo-B2-A, Aldo-C1-A and Mlc-2-b loci. They are labelled ‘001’ in Table 3. The first two loci showed intrasample polymorphism. Since heterozygote genotypes cannot be recognised (null/active heterozygotes appear as active/active homozygotes), we made a back-calculation assuming that the population is in Hardy–Weinberg equilibrium, which is not always the case (this must be considered as an approximation). Starting from the only certain genotype frequency (null/null homozygote), we estimated the null allele frequency to be the square root of the genotype frequency. The recalculated frequencies are indicated in bold characters in Table 3. Note that the three recalculated loci were not used in the calculation of the H and Fis estimations.
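The back-calculation can be written out as a short sketch. It relies on the Hardy–Weinberg assumption stated above, and the counts used are hypothetical, as is the function name `null_allele_frequency`.

```python
import math


def null_allele_frequency(n_null_homozygotes, n_individuals):
    """Estimate the null allele frequency r from the null/null genotype.

    Under Hardy-Weinberg equilibrium the null/null frequency equals r**2,
    so r is taken as the square root of the observed genotype frequency.
    """
    return math.sqrt(n_null_homozygotes / n_individuals)


# Hypothetical sample: 4 fish out of 25 gave no amplification at the locus.
r = null_allele_frequency(4, 25)

# Expected share of null/active heterozygotes, which are scored as
# active/active homozygotes on the gel.
hidden_heterozygotes = 2 * r * (1 - r)
```

With these invented counts, nearly half of the sample would be expected to carry an undetected null allele, which illustrates why such loci are unsuitable for heterozygosity and Fis estimation.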

Table 3 Allele frequencies measured in 12 intron loci. Allele 001 is a null allele (no amplification)

Third, to fully investigate the population structure, an AMOVA was performed to test the hierarchical stream structure, using only allele frequencies. For this, the ARLEQUIN software (Schneider et al, 2000) was used to describe the distribution of the genetic diversity among different levels of organisation: the basin (here two basins were sampled, the Chang Jiang basin whose samples have been labelled A, and the Xi Jiang basin labelled C) and the subbasins (two subbasins have been considered in the Chang Jiang basin, constituted, respectively, by the samples A25+A31 and A37+A38+A39; four subbasins were considered in the Xi Jiang basin, constituted by the samples C55b, C04+C52, C09 and C45).

Fst values were estimated between samples to illustrate the third and last level of differentiation. For this, the estimator θ of Weir and Cockerham (1984) was calculated and the significance estimated by resampling (5000 permutations) using GENETIX software. Since they had null alleles, the Aldo-B2-A, Aldo-C1-A and Mlc-2-b loci were not included.
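The logic of the permutation test can be illustrated with the sketch below. It is not the GENETIX implementation: for brevity the statistic is a simple (Ht − Hs)/Ht ratio at one locus, used here as a stand-in for the θ estimator, and the genotypes, function names and random seed are hypothetical.

```python
import random
from collections import Counter


def expected_h(genotypes):
    """Expected heterozygosity (1 - sum of squared allele frequencies)."""
    copies = [allele for pair in genotypes for allele in pair]
    total = len(copies)
    return 1 - sum((c / total) ** 2 for c in Counter(copies).values())


def simple_fst(sample_x, sample_y):
    """(Ht - Hs) / Ht for two samples at one locus; a stand-in for theta."""
    hs = (expected_h(sample_x) + expected_h(sample_y)) / 2
    ht = expected_h(sample_x + sample_y)
    return 0.0 if ht == 0 else (ht - hs) / ht


def permutation_p_value(sample_x, sample_y, n_perm=5000, seed=1):
    """Shuffle individuals between samples and count values >= observed."""
    rng = random.Random(seed)
    observed = simple_fst(sample_x, sample_y)
    pooled = sample_x + sample_y
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_x = pooled[:len(sample_x)]
        perm_y = pooled[len(sample_x):]
        if simple_fst(perm_x, perm_y) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical samples fixed for different alleles (a diagnostic locus).
sample_x = [(250, 250)] * 10
sample_y = [(260, 260)] * 10
observed, p_value = permutation_p_value(sample_x, sample_y)
```

Permuting whole individuals rather than single alleles keeps each genotype intact, so the test does not assume Hardy–Weinberg proportions within samples.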

Fourth, a phylogenetic tree was constructed based on Nei's (1978) genetic distances between samples. This distance calculation takes into account the effects of small sample sizes. The Neighbour Joining (NJ) algorithm of the PHYLIP 3.5c software (Felsenstein, 1993) was implemented and trees were displayed with the TREEVIEW 1.40 program (Page, 1996). The robustness of the branches was tested by the bootstrap method, using a data set built with the SEQBOOT program from PHYLIP.
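The distance underlying the tree can be sketched as follows, assuming the standard form of Nei's (1978) unbiased distance, D = −ln[Jxy / √(Jx Jy)], in which the within-sample identities Jx and Jy carry the small-sample correction. The function name, the allele frequencies and the sample sizes are hypothetical, and a single sample size per population is assumed for all loci.

```python
import math


def nei_unbiased_distance(freqs_x, freqs_y, n_x, n_y):
    """Nei's (1978) unbiased genetic distance between two samples.

    freqs_x, freqs_y: lists (one entry per locus) of dicts mapping
    allele -> frequency. n_x, n_y: numbers of diploid individuals.
    Raises ValueError if the samples share no allele at any locus,
    in which case the distance is undefined (infinite).
    """
    n_loci = len(freqs_x)
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += (2 * n_x * sum(p * p for p in fx.values()) - 1) / (2 * n_x - 1)
        jy += (2 * n_y * sum(p * p for p in fy.values()) - 1) / (2 * n_y - 1)
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    if jxy == 0:
        raise ValueError("no shared alleles: distance is undefined")
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical allele frequencies at two loci in two samples of 20 fish.
sample_x = [{250: 1.0}, {310: 0.5, 320: 0.5}]
sample_y = [{250: 0.5, 260: 0.5}, {310: 0.5, 320: 0.5}]
d_xy = nei_unbiased_distance(sample_x, sample_y, 20, 20)
```

Because of the sample-size correction, the estimate can be slightly negative for two samples drawn from the same population; such values are usually read as zero.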


Subdivisions within the species Z. platypus

A total of 27 intron systems were tested. Table 2 shows the data for the five systems which were informative (ie including one or more polymorphic loci) and that gave easily interpretable and polymorphic patterns (Figure 2). In total, 12 polymorphic loci were scored and their allele frequencies calculated. The heterozygosity parameters given in Table 3 must be interpreted with caution, as they do not represent the estimated polymorphism of the species, but rather the relative polymorphism of samples.

The FCA analyses (Figure 3a and b) clearly discriminate intraspecific subdivisions. The initial overall analysis (14 Z. platypus samples, plus one Opsariichthys bidens outgroup sample) indicates that five discrete genetic groups can be identified, one of them being the outgroup (Figure 3a). The percentages of inertia explained by the first three axes are 19.6, 17.2 and 8.8%, respectively. Within the diversity of Z. platypus, four groups are isolated on the diagram, which correspond to distinct genetic pools, each showing intragroup similarities:

Figure 3

Factorial Correspondence Analysis plot of the first two axes. (a) Analysis of the whole sample; (b) analysis of group 1 only.

Group 1 contains 69% of the total sample. Its geographical distribution ranges from the south of the middle Chang Jiang tributaries up to the middle and lower parts of the Xi Jiang basin (samples A25, A31, A37 to A39, plus C04, C09, C45, C52 and C55a). No allele specificity has been detected for basins.

A close but distinct group (number 2) is constituted by sample C22 alone. This is the most western sample from the Xi Jiang and its limited representation is probably due to the sampling range that was chosen for this project. One individual from group 1 (station A39) was close to this second group.

The third group is constituted by samples B53 and B54, from the upper Chang Jiang basin.

The fourth group is composed of the B52 sample. Its position on axis 3 (not shown) is far towards the positive coordinates, whereas the other groups are near the origin of this third axis. Although genetically clearly distinct, groups 3 and 4 are geographically very close (less than 20 km apart), in the Chishui He subbasin.

Finally, the outgroup sample is clearly distinct.

Figure 3b represents another FCA involving group 1 alone. The first three axes account for 13.6, 9 and 7.8%, respectively, of the total inertia, indicating a weaker structure than in the former FCA. The first axis is preponderant, explaining the main significant structure detected: samples A25 and A37 form a subgroup, which overlaps only slightly with the remaining individuals of this first group. The differentiation within group 1 is at a lower scale than the intergroup differentiation.

Several levels of organisation have been taken into account. (i) For interbasin differentiation, AMOVA shows that the genetic diversity resides mostly within samples (57.7%), rather than between samples of each basin (28.6%) or between basins (13.7%). Between basins, Fct=0.13 (P<0.001 ***) indicates a significant differentiation. (ii) For subbasin structure, no significant Fct values have been observed in either basin (AMOVA). In the Chang Jiang basin, between subbasins, Fct=−0.09 (P=0.21, not significant). In the Xi Jiang basin, between subbasins, Fct=−0.22 (P=0.60, not significant). (iii) Intersample differentiation has been tested through Fst calculations (GENETIX software). High values in pair-wise comparisons probably reflect ascertainment bias due to selecting only polymorphic loci. All these values were considered to be significantly different from zero using permutation tests, except the three comparisons between samples A38, A39 and C55b, representing the most similar samples.
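The link between the variance percentages and the fixation indices of a three-level AMOVA can be made explicit with a short sketch. It assumes the standard relations Fct = σa²/σT², Fsc = σb²/(σb² + σc²) and Fst = (σa² + σb²)/σT², where a, b and c denote the among-basin, among-sample and within-sample components; the function name `fixation_indices` is hypothetical and the input values are the percentages reported in (i).

```python
def fixation_indices(pct_among_groups, pct_among_samples, pct_within_samples):
    """Derive (Fct, Fsc, Fst) from the three AMOVA variance percentages."""
    total = pct_among_groups + pct_among_samples + pct_within_samples
    fct = pct_among_groups / total
    fsc = pct_among_samples / (pct_among_samples + pct_within_samples)
    fst = (pct_among_groups + pct_among_samples) / total
    return fct, fsc, fst


# Percentages reported for the interbasin analysis:
# between basins, between samples within basins, within samples.
fct, fsc, fst = fixation_indices(13.7, 28.6, 57.7)
```

Under these relations the between-basin share of 13.7% corresponds to an Fct of about 0.14, close to the reported value of 0.13, and the three indices satisfy (1 − Fst) = (1 − Fsc)(1 − Fct).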

Phylogenetic structure

Genetic distances summarise the amount of differentiation among samples or taxa. The calculation is not based on all the observed loci but only on polymorphic ones. Consequently, the distances can only be used for this study and not compared to other examples in the literature.

The overall intergeneric Nei (1978) distance is D=1.18. The intergroup distances range from D=0.57 (groups 1 and 2) up to D=1.31 (groups 1 and 4). Note that some intraspecific distances exceed the intergeneric distance.

To visualise the general organisation of the nominal species Z. platypus, an NJ tree was constructed using Nei (1978) distances. Figure 4 is in agreement with the FCA observations. The outgroup is well separated from the genus Zacco. The bootstrap values confirm the significance of most clusters. We consider nodes supported by more than 70% of bootstrap replicates to be highly robust (Zharkikh and Li, 1992; Hillis and Bull, 1993; Lecointre et al, 1994). However, we also consider groups that are supported by more than 50% of bootstrap replicates. In this way, the division Opsariichthys – group 1+2 – group 3 – group 4 – is very robust. The division between groups 1 and 2 is less well supported, which is also reflected in their proximity in the FCA analysis.

Figure 4

NJ tree based on Nei's (1978) distances between Z. platypus samples. O. bidens is used as outgroup. In italics are percentage bootstrap values.

Population characteristics

Table 3 shows the allele frequencies observed at the 12 polymorphic loci. A total of 52 alleles were scored. Null alleles were recorded at the Aldo-B2-A, Aldo-C1-A and Mlc-2-b loci. When recalculated, the frequencies are indicated in bold characters.

Unbiased heterozygosity ranges from 0.07 to 0.25. As only polymorphic loci were taken into account and also because no data are available on intron polymorphism in the literature, we do not know if these values are high. Of the previously defined genetic groups, group 1 is the most polymorphic with a mean Hnb=0.17 (10 samples); group 2 Hnb=0.07 (one sample); group 3 Hnb=0.15 (2 samples) and group 4 Hnb=0.16 (1 sample).

Within group 1, the Hunan samples (‘A’) are more polymorphic with Hnb=0.21 while the Guangxi samples (‘C’) show an Hnb=0.13. Within group 1, the diversity is similar between samples caught by scientists or bought at the market (0.18 and 0.16, respectively).

When analysing heterozygote deficiencies (Fis), which give an idea of population cohesion and panmixia, it is surprising to see that most samples are not in equilibrium. This general observation does not depend on fishing/market origin or the genetic group to which the fish belong.


Phylogeographic structure of Z. platypus

A large part of the Chinese distribution of Z. platypus was sampled and analysed. The nominal species is divided into at least four genetic groups differentiated by several diagnostic alleles from the intron data. The main group, group 1, occurs in the central part of the sampling area, suggesting that the genetic groups considered as marginal in this study are probably well represented in other regions, to the west and north of the area sampled.

The second main finding concerns the distribution of group 1, which is well represented in the middle tributaries of the two sampled basins with little differentiation between them.

Analysis of molecular variance (AMOVA and Fst) indicated that most of the genetic variation in group 1 is within samples, although limited yet significant differentiation was detected between basins. No significant Fct has been observed between subbasins. The extensive range of the group 1 taxon, covering about 800 km and extending over at least two basins, suggests that the speciation occurred in a former riverine network linking the central part of both basins. The present isolation between samples is independent of distance: the differentiation is not significant between subbasins yet is significant between samples of the same subbasin. These patterns may indicate that gene flow was previously high within basins but has become limited, due to climatic change or human influence.

The overall picture does not show a clear differentiation between basins within group 1. It seems probable that differentiation by vicariance is rather slow in this species, unless the limited interbasins differentiation is due to river captures which would have to have occurred recently. The genetic differentiation between the four groups appears to be rather ancient. However, more information on the drainage development of the three sampled provinces would be required for a more detailed explanation of the phylogeography of aquatic organisms in this region.

The geographic proximity of groups 3 and 4 has not led to a corresponding genetic proximity. This pattern suggests reproductive isolation, probably corresponding to species status. Group 1 shows a slight geographic differentiation. Group 2 (sample C22 only), to the west of the group 1 distribution, could also be considered to be a distant subgroup of group 1. The NJ tree (and therefore the genetic distances) confirms the genetic proximity of groups 1 and 2. These two groups are, however, separated by two diagnostic loci, Aldo-B2-A and Aldo-C1-A, which is not the case for the differentiation within group 1.

Perdices et al (2003) confirmed the existence of a main genetic group in the sampling area (called ‘Zacco A’=our ‘group 1’) including, among others, our localities A31 and A39. Samples A25 and A37, which are weakly differentiated from the main group according to our nuclear markers (Figure 3b), were strongly divergent according to mtDNA data, forming the ‘Zacco D’ group (which is part of our ‘group 1’). The more clear-cut results may be explained by the greater sensitivity to population isolation of mtDNA, because of its clonal transmission (Avise, 1994). The isolation of populations B53 and B54 was also confirmed by the mtDNA data, forming ‘Zacco B’ (= ‘group 3’ here). Perdices et al (2003) proposed giving species status to the four taxa identified in their study, given the presence of unique haplotypes in each lineage. We agree with this point of view, mainly because the intertaxon genetic distances (eg groups 1 and 3) are similar to the interspecies distances (Zacco-Opsariichthys), although genetic distance can suggest but not establish species status. A morphological analysis of the same fish is currently underway at the Swedish Museum of Natural History to decide the phylogenetic status of the Zacco lineages.

Evolutionary history of the Z. platypus species complex in China

The geographical distribution pattern of the Z. platypus genetic groups is not consistent with the present-day drainage structure. Moreover, the genetic differentiation does not correspond to the geographic distances between samples: genetically very similar samples have been observed at very long distances (samples A31 and C45 for example) and well-differentiated samples (possibly belonging to different species) have been sampled in closely adjacent localities (samples B52 and B53). These observations indicate that limited migration and, therefore, isolation by distance, is only a partial explanation of the pattern of genetic differentiation.

The genetic structure could also have been shaped by the history of the river network. It is known that the river courses have changed several times in the region (Rainboth, 1991), and especially in the Chang Jiang basin (Xiao et al, 2001). Detailed reconstitution of the changes in surface topology and slopes in the recent and ancient past is necessary to understand the possible connections between basins. This type of reconstruction has been done, for example, to explain the history of a European cyprinid (Persat and Berrebi, 1990), but still needs to be done for China.

By comparing the genetic structure of other fish species in the same region, we can try to corroborate these observations. Another study has been conducted on a closely related cyprinid species, O. bidens (Berrebi et al, in press) sampled in Hunan and Guangxi, using introns. The overall structure of O. bidens is similar to that of Z. platypus, showing several independent lineages (probably species) generally distributed in different rivers. In O. bidens also, genetic structure and river networks are not concordant, indicating that similar forces shaped both species complexes. The O. bidens study detected a main taxon inhabiting the area around the junction between south-middle tributaries of Chang Jiang and north-middle tributaries of Xi Jiang basins. This is probably the mark of an ancient single shared river network, which became divided into two basins after the Z. platypus and O. bidens main group speciation.

The use of introns in phylogeography

Introns are still infrequently used for population genetics and phylogeography. This kind of marker, the subject of recent publications (Dixon et al, 1996; Villablanca et al, 1998; Bierne et al, 2000; Daguin et al, 2001) is promising for intraspecific structure description; however, basic knowledge is still missing and several technical problems must still be solved.

In this study, we propose several improvements: the PCR thermal gradient is a prerequisite for distinguishing priming on target sequences from priming on random incomplete sequences. This method reduces the number of amplified loci, and the remainder are likely to be homologous. Homologous loci should comprise the intron loci included in the active gene and recently derived silent copies.

One of the questions that needs to be solved is the frequent deficit of heterozygotes (end of Table 3). This deficit has been observed in several nuclear markers such as microsatellites (Aurelle and Berrebi, 2002) and several hypotheses can be proposed to explain it. Selection, the Wahlund effect, kin-structure and linkage are classical explanations, but the possibility that the deficit is an artefact must be considered first. One of the possible artefacts is intra-PCR competition. Larger alleles may not be detected because of their relatively inefficient amplification. We have found no published reference demonstrating such a phenomenon on introns, but we investigated this possibility by plotting the Fis values (heterozygote deficiencies) against the range of allele sizes. Two tests were performed by calculating the Pearson correlation coefficient (i) between the Fis estimations and the allele distribution range (ie the difference between the largest and the shortest alleles) and (ii) between the Fis estimation and the number of alleles. Both coefficients were positive (0.263 and 0.274, respectively) but not significant. We, therefore, have no evidence of this kind of artefact.
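The correlation test can be sketched as below, using the textbook Pearson coefficient and its t transformation with n − 2 degrees of freedom. The number of data points is not stated in the text; n = 12 (one point per polymorphic locus) is an assumption made only for illustration, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Reported coefficients; n = 12 is an assumption (one point per locus),
# not a figure given in the text.
t_size_range = t_statistic(0.263, 12)
t_allele_number = t_statistic(0.274, 12)
```

Under that assumed n, both t values are below 1, well short of the two-tailed 5% critical value of 2.228 for 10 degrees of freedom, which is in line with the non-significant result reported.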

Population characteristics and strain constitution

In addition to examining the population structure of Z. platypus, the relationships of lineages and their history, this survey also had an applied objective. The strong structure observed within the nominal Z. platypus species leads us to suggest that at least four strains should be selected for performance comparisons.

Among the sampled populations of each taxon, only group 1 has been sufficiently sampled to provide alternative strains. Samples A31 and A39 are the most polymorphic and give the best security in terms of adaptability of the strain for rearing and stocking. Because the samples of the two sampled basins (Chang Jiang and Xi Jiang) differ slightly (as reflected in their mean polymorphism and the AMOVA), the comparison of performances should also include population C04 or C55b, which are good candidates with a heterozygosity of 0.20.

Sample A25 should also be tested because of its weak differentiation from group 1 and because mtDNA data indicated a strong and ancient divergence from group 1 (Perdices et al, 2003). Groups 2, 3 and 4 should be evaluated after the collection of more samples from neighbouring regions.