Pax3/7 duplicated and diverged independently in amphioxus, the basal chordate lineage

The Pax3/7 transcription factor family is integral to developmental gene networks contributing to important innovations in vertebrate evolution, including the neural crest. The basal chordate lineage of amphioxus is ideally placed to understand the dynamics of the gene regulatory network evolution that produced these novelties. We report here the discovery that the cephalochordate lineage possesses two Pax3/7 genes, Pax3/7a and Pax3/7b. The tandem duplication is ancestral to all extant amphioxus, occurring in both Asymmetron and Branchiostoma, but originated after the split from the lineage leading to vertebrates. The two paralogues are differentially expressed during embryonic development, particularly in neural and somitic tissues, suggesting distinct regulation. Our results have implications for the study of amphioxus regeneration, neural plate and crest evolution, and differential tandem paralogue evolution.

The cephalochordate Pax3/7 locus is highly conserved. We aligned the relevant genomic scaffolds for B. lanceolatum, B. floridae, B. belcheri, and A. lucayanum in mVISTA (Fig. 1a), revealing high levels of conservation of non-coding sequence near the cephalochordate Pax3/7 locus (≥90% identity over the majority of the ~74 kb window shown in Fig. 1). We identified 84 B. floridae/A. lucayanum pairs of CNEs (Conserved Non-coding Elements) previously described by Yue et al., Supplementary File 6 35 within 20 kbs of the B. lanceolatum Pax3/7 locus (Fig. 1a), covering about 12% of the non-coding sequence in this region. No CNE was found to reoccur in this window, implying divergence in the cis-regulatory landscape of the two paralogues.
Scientific REPORTS | (2018) 8:9414 | DOI:10.1038/s41598-018-27700-x Expression of Pax3/7a. In mid-gastrulae (G5, Fig. 3a), Pax3/7a is expressed in a semicircular band in the dorsal endoderm of the blastoporal lip. This expression pattern remains relatively diffuse in the late gastrula (G6/7, Fig. 3b), but by the early neurula (N0, Fig. 3c) the expression domain has become condensed into lines running symmetrically either side of the midline (Fig. 3c, red arrowheads) with enlarged anterior patches, though weak expression persists throughout the posterior. By the hatchling neurula (N1, Fig. 3d), Pax3/7a has diffuse expression with greater concentration in five indistinct, bilaterally symmetrical areas; the anterior mesodermal tissue (black arrow), the anterior end and posterior of the neural tube (white arrows), the postero-lateral somitic tissue (white arrowheads), and the postero-medial notochord tissue (black arrowhead). These expression domains continue with little change through to the mid neurula (N2, Fig. 3e), except for the appearance of a distinct domain of asymmetrical Pax3/7a expression in the anterior (marked throughout by an asterisk placed just posteriorly), which is consistently absent or very weak on the right side. In the late neurula (N3, Fig. 3f), Pax3/7a expression has become condensed into the anterior and posterior mesodermal regions, and into the left anterior somite. Patchy and granular neural regions of expression have also appeared. The asymmetrical domain persists into the early larva (L1, Supplementary Fig. S1a) while the other domains of expression are substantially reduced such that only a few anterior neural and the posterior mesodermal domains are present. This pattern continues in the L2 and L3 larvae ( Supplementary Fig. S1b and c), with faint, patchy neural expression reappearing in the latter stage.
Expression of Pax3/7b. In mid-gastrulae (G5, Fig. 3a), Pax3/7b is expressed in smaller lateral patches in the dorsum in both germ layers. This lateral expression pattern continues in the late gastrula (G6/7, Fig. 3b); by the early neurula (N0, Fig. 3c) the interior lateral borders of the expression domain have become strongly resolved (Fig. 3c, blue arrowheads), though weak medial expression continues. Pax3/7b expression overlaps with Pax3/7a in the posterior regions but with a much weaker signal. By the N1 stage (Fig. 3d), in contrast to Pax3/7a, there are five distinct, symmetrical domains of Pax3/7b expression in the dorsolateral neural tube. These spots are flanked at their anterior and posterior limits by the weaker, more diffuse regions of Pax3/7a expression (white arrows). This expression pattern continues with little change through to the mid neurula (N2, Fig. 3e). By stage N3 (Fig. 3f), the neural regions of expression are reduced in size and number, retaining only the two anterior-most and posterior-most spots, while the strong asymmetrical domain of expression in the left anterior

Discussion
Gene duplication is an important mechanism in evolution, providing a potent source of new genetic material on which evolution can act outside the constraints on single-copy genes. Transcription factors stand out as a particularly important subset of retained and adapted paralogous genes. Paralogue divergence includes subfunctionalisation and neofunctionalisation of binding specificity and motif recognition, upstream regulatory control, and cofactor interaction, which all provide opportunities for more intricate spatiotemporal expression control and the potential for the generation of novel gene regulatory networks and morphology 5 .
The two rounds of whole genome duplication (2R-WGD) at the base of the vertebrate lineage 37 provided an ample source of stoichiometrically-balanced raw genetic material, possibly facilitating the elaboration of vertebrate novelties including the head, neural crest, and neurogenic placodes 6,7 . In contrast, cephalochordate genomes bear no indications of paleopolyploidy events [37][38][39] , and share more similarities in terms of architecture and gene content with the chordate ancestral genome than other extant chordate clades 40 . Cephalochordates therefore have many fewer paralogues than vertebrates, though both RNA-mediated and DNA-mediated duplications have been described. Among the latter, homeobox genes are most numerous; paralogues have been found in Evx 41, 42   small-scale tandem duplications. Of these, only Vent1 and Vent2 have been the subject of detailed functional assays, which established their cis-and trans-regulation in the amphioxus dorsoventral patterning regulatory network 46 and their expression in pharmacologically manipulated embryos 47 .
Our data from three species of Branchiostoma and Asymmetron lucayanum, a representative of the earliest branching of the extant amphioxus genera, support the idea that tandem gene duplication may have been an important mechanism for generating cell type diversity in the cephalochordate ancestor. We report that amphioxus possess two paralogues of Pax3/7, a gene notable for its functions in neural plate border specification, its vertebrate roles in neural crest and placode specification, and for its involvement in somitogenesis, myogenesis and the population of regenerative muscle satellite cells possibly common to all bilaterians 17 . We confirm that this duplication predates the modern cephalochordate radiation but post-dates the divergence from other chordates, implying that the chordate ancestor had a single copy.
One of our key findings is that Pax3/7a and Pax3/7b diverged symmetrically but heterogeneously between duplication and the cephalochordate radiation (Fig. 2). They share very strong nucleotide sequence conservation and 100% amino acid sequence identity in the paired domain, EH1/Octapeptide motif and homeodomain, possibly the result of gene conversion. In contrast, they have diverged substantially in the linker regions, the N-terminus (where Pax3/7b seems to have lost an exon) and the four exons of the C-terminus. The paralogues have changed little since their divergence, both in coding sequence and local CNEs; of the pair, Pax3/7a has changed more since the Asymmetron/Branchiostoma speciation events, indicating it might be under slightly relaxed selection, but has a more prototypical PHT domain 48 (Supplementary Fig. S3; Supplementary File 3, residues 447-467 at positions 801-829), while Pax3/7b is more conserved among species. Pronounced evolutionary asymmetry is common amongst tandem paralogues (reviewed by Holland et al.) 49 , for instance, in AmphiEvx; however, examples in which asymmetry is not observed have also been documented (AmphiEmx).
Although cephalochordates are considered to be slow-evolving, the pattern we observe in paralogue divergence is also consistent with the recent estimate that the crown cephalochordate node dates to only 38.8-46.0 million years ago (MYA), in contrast to previous results placing it ~120-250 MYA (see Igawa et al. 50 and references therein). Based on their calibration date of the cephalochordate/Olfactores split approximately 550 MYA, the duplication, fixation, fate-determination and preservation phases of paralogue evolution (see Innan and Kondrashov) 51 all occurred in the ~500 MYA interval during which no evident radiation occurred. Comparatively rapid change and quicker preservation is considered typical of tandem duplications 5 , although as Pax3/7 genes are transcription factors involved in development, and specifically neurogenesis 52 , their sequence and expression domain change may have been severely constrained.
Symmetry of sequence evolution rate between paralogues is considered indicative of subfunctionalisation 53 . The evolutionary trajectory of cephalochordate Pax3/7 duplicates, based on the symmetry of sequence change evident in Fig. 2, seems to accord with the duplication-degeneration-complementation (DDC) model of Force et al. 3 or the specialisation model of Hughes 2 . According to these models, the duplicated pair, under relaxed purifying selection, accumulates either mutations that complementarily degrade (DDC) or improve (specialisation) their capacity to perform subsets of their pre-duplication function, until the loss of either paralogue is deleterious. The only non-duplicated chordate or deuterostome outgroups for ancestral Pax3/7 function are found in the tunicates and hemichordates. However, both groups have a highly divergent Pax3/7 sequence, and the former of which has a very derived genome and morphology. Consequently, it is difficult to determine the exact set of ancestral functions of the Pax3/7 pro-orthologue in the chordate ancestor. Nevertheless, a conserved role in neural border specification is highly probable, given enrichment of Pax3/7 in lateral neuroblasts in a number of bilaterians 9 .
Although the DNA-binding domains of Pax3/7a and Pax3/7b are identical, it is likely that the differences in the C-terminus and in the linker regions between the conserved domains are sufficient to alter their functionality. Amino-terminal sequence changes have been shown to affect the binding specificity of DNA-binding domains and homeodomains in general 18,54 and Pax genes specifically (reviewed by Mayran et al.) 52 .
Small sequence changes have the potential to differentially modify the binding affinity of the paired domain and homeodomain, the binding modality of the paired subdomains, and subnuclear localisation 56 . The modest differences between Pax3 and Pax7 sequence, located mostly in the C-terminus, are enough to produce substantial differences in target activation in myogenesis 23 (Mayran et al. 55 and references therein). The extent of these substantial functional effects caused by the minor differences in mutants, splice variants and between vertebrate Pax3 and Pax7 is an indication that Pax3/7a and Pax3/7b, which have diverged more than Pax3/Pax7, probably behave differently with regard to target recognition and interaction with cofactors. Such sequence change has been highlighted as an important but under-appreciated mechanism in the evolution of developmental GRNs 57 .
Regardless of putative differences in downstream activity, Pax3/7a and Pax3/7b are expressed differently during gastrulation and neurulation in B. lanceolatum, demonstrating that the paralogues have diverged in their cis-regulation. Pax3/7a and Pax3/7b are expressed in partially overlapping but distinct domains in the neural plate (G5 to N0, Fig. 3a-c, red and blue arrowheads), presumably as the result of modification of an ancestral neural plate domain. Pax3/7a is expressed throughout the dorso-posterior mesoderm prior to neurulation (G5 and G6/7, Fig. 3a,b) while Pax3/7b is restricted to smaller, bilaterally symmetrical dorso-posterior regions in both the mesoderm and ectoderm, consistent with a role in the initial specification of the neural plate border. Distinct lateral lines of expression do appear in Pax3/7a in the late gastrula/early neurula (G7/N0, Fig. 3c), but diffuse expression remains throughout the posterior. By the mid-neurula, the paralogues seem to have switched to a different expression programme, one in which their expression patterns have the least overlap. Particularly notable are the tight, defined neural spots of Pax3/7b and the appearance of the asymmetrical, sinistral domain (the anterior somite 26 ) of expression that first appears in Pax3/7a (left of asterisk throughout, N2, Fig. 3e) and later appears in Pax3/7b (N3, Fig. 3f). As the embryo becomes a larva, the two expression patterns converge until both expression patterns are largely restricted to the asymmetrical domain (L1 and 2, Supplementary Fig. S1a-c). Thus, divergence between duplicate expression patterns increases during gastrulation and early neurulation, peaking at mid-neurula stages, consistent with function partitioning. Although we still know very little about Asymmetron lucayanum developmental gene expression, our data indicate similar results for Pax3/7 paralogues in this species (Supplementary Fig. S2). This is currently the only example in cephalochordates in which a gene duplication event has been shown to predate the divergence of extant lineages and for which expression data exist in more than one genus.
Our results broadly recapitulate previous Pax3/7 expression data from B. lanceolatum (Fig. 3H,I and J of Somorjai et al.) 33 , considering that the latter used a probe with probable cross-reactivity between the 5′ conserved region of Pax3/7a and Pax3/7b. In contrast, the B. lanceolatum expression patterns are not a perfect subset of the Pax3/7(a) domains reported for B. floridae (Fig. 5 of Holland et al.) 26 , who used a similarly cross-reactive probe. Potentially missing from our patterns are the anterior somitic and mesodermal expression (Fig. 5F,G,I and K of Holland et al.) 26 , the distinct anterior neural spot (arrow, Fig. 5K,M,P and Q of Holland et al.) 26 and the larval axial musculature and notochord expression (Fig. 5M,P, and Q of Holland et al.) 26 . Minor discrepancies are not unusual, but significant differences among Branchiostoma species are rare 33 . It is possible that these differences are caused by the general variability between probes for the same target, Pax gene probe cross-reactivity, Scientific REPORTS | (2018) 8:9414 | DOI:10.1038/s41598-018-27700-x or experimental sensitivity. The probes we used were by necessity relatively short in order to limit possible cross-reaction of highly conserved regions, but the expression patterns we observed are highly specific and reproducible, suggesting they reflect the core domains of Pax3/7a and Pax3/7b.
In contrast to what we see in amphioxus, differences between vertebrate Pax3 and Pax7 early developmental expression are much less pronounced, to the extent that they have 'swapped' expression profiles during evolution (see Monsoro-Burq 10 , and references therein). Pax3/Pax7 appear in the neural plate border during neural induction in the early gastrula, and intensify at the lateral edges to mark the dorsal edge of the closing neural tube, a pattern comparable to late gastrula/early neurula expression in amphioxus. Pax3 and/or Pax7 are also expressed throughout the posterior dorsal neuraxis, an approximate analogue of the neural spots in Pax3/7b and later Pax3/7a, though these spots are more spatiotemporally restricted.
While Pax3 and Pax7 appear to play semi-redundant roles in neural development, they diverge in function in vertebrate myogenesis (reviewed by Buckingham & Relaix) 58 . Pax3 acts broadly from the onset of myogenesis in the presomitic mesoderm to the dermomyotome, while Pax7 expression is later and restricted to a dermomyotomal subdomain. These PAX3/PAX7 positive cells form a proliferative muscle progenitor population that eventually positions itself underneath the basal lamina on the muscle fibres. In the adult, these cells become a heterogenous population of quiescent satellite cells; all are maintained by Pax7 expression, but some also expresses Pax3, which is known in this context to be an inadequate substitute, binding 10-fold fewer targets, most of which are also targets of PAX7. During myogenesis, Pax3 and Pax7 seem to be responsible for maintaining the cells in a proliferative/quiescent but undifferentiated state. Lack or cessation of Pax3 or Pax7 expression in a cell can lead to apoptosis or cell cycle exit and muscle differentiation via MyoD, depending on the precise context.
Although the later myogenic roles of amphioxus Pax3/7 genes are yet to be thoroughly characterised, at least one of the paralogues is known to be expressed in adult muscle, as Pax3/7b has been amplified from adult B. belcheri segmental muscle 31 . Whether both paralogues are involved in adult muscle development redundantly, or rather show temporal or tissue-specific patterns of expression (similarly to Pax3 and Pax7 in post-embryonic muscle development and regeneration in mice) is still unclear. Our initial identification of Pax3/7b transcripts in a tail blastema transcriptome clearly identifies a role in the adult regeneration process. However, previous characterization of Pax3/7 in a population of satellite-like cells and the nerve cord during tail regeneration utilized a cross-reactive in situ hybridisation probe 27 . We therefore cannot currently rule out changes in paralogue function during postembryonic processes in amphioxus. Future studies are required to determine to what extent divergence has occurred in expression, downstream targets, and interaction with co-factors in both myogenic and neural contexts.
Amphioxus Pax3/7 has been considered a useful proxy for understanding the properties and deployment of the chordate proto-Pax3/7. Our findings showing independent vertebrate and cephalochordate Pax3/7 duplicationsand the resulting functional and regulatory divergence -offer new insight into genomic constraint/plasticity, and evolvability of gene duplicates and GRNs in different duplication contexts. In amphioxus, tandem duplication and divergence of Pax3/7 has resulted in a subfunctionalisation (and possibly neofunctionalisation) of ancestral neural plate border 9 and muscle-related 17,18 functions, many of which parallel those seen in vertebrate Pax3 and Pax7 following WGD. Dissecting the regulatory landscape of Pax3/7 genes in amphioxus, including the function of the CNEs partitioned between paralogues, should shed further light on genome architecture evolution in chordates.

Conclusions
We show that cephalochordates, which are considered to be a significant outgroup to vertebrates in the study of the evolution of the neural crest GRN, have two Pax3/7 paralogues where it was previously thought that this family was represented by a single-copy gene in these animals. This discovery has implications both for previous and future studies of amphioxus development and regeneration, and for vertebrate studies in which cephalochordates are used as an outgroup. The amphioxus Pax3/7 gene pair also offers a tantalising and tractable example of cis-regulatory and sequence subfunctionalisation after tandem duplication of a developmental transcription factor involved in the development of key chordate features.

Methods
Genomic & transcriptomic analysis. A tBLASTn 59 search of a transcriptome, generated from the pre-amputation and blastemal tissues of a regenerating B. lanceolatum post-anal tail (14 dpa/stage 2 sensu Somorjai et al.) 27 assembled with developmental transcriptomic data from Oulion et al. 60 , for homeodomains selected from HomeoDB 61 retrieved a partial Pax3/7b sequence. Subsequent identification and comparison was done by alignment in Jalview 2.x 62 . The exon structures of Pax3/7a and Pax3/7b were manually predicted with reference to tBLASTn searches of the known sequences against the available genomes: the B. lanceolatum draft assembly (Bl71nemr) (European Amphioxus Genome Consortium), the B. floridae reference genome version 2.0 37 , the B. belcheri draft assembly 38 (HapV2), and the A. lucayanum draft assembly 35 , and used to manually produce diagrams of the gene and protein structure, in reference to domains predicted by the Conserved Domain Database 63  The region of the B. lanceolatum genome represented in the VISTA plot was used as a query for a BLASTn search against the Conserved Non-Coding Elements database presented in Supplementary File 6 of Yue et al. 35 . Matching sequences were retrieved from the CNE database, and the sequences aligned back to the query using MAFFT-addfragments mode 66 . Spreadsheet tools were used to extract positional information and to generate the visualisation of distribution.
Exon positions for various deuterostome Pax3/7 genes were extracted from their NCBI records listed under the accession numbers in Supplementary Table S4

Embryo collection. Adult European amphioxus (Branchiostoma lanceolatum) were collected from
Argeles-sur-mer (France), kept in a semi-closed circulating system at 16.5 °C, and induced to spawn as described previously 72 . Populations of A. lucayanum were collected from Bimini (Bahamas), kept in filtered seawater at 25 °C, and induced to spawn as described previously 35  Cloning and probe synthesis. RNA was extracted from B. lanceolatum embryos fixed at a selection of developmental stages using TRIsure (Bioline) using the supplier's protocol. A. lucayanum embryos fixed in RNAlater (Invitrogen) were transferred to TRIsure and treated similarly. cDNA libraries were produced using the Tetro cDNA Synthesis kit (Bioline). Gene fragments for probe synthesis were amplified by PCR using gene-specific primers (Supplementary Table S2) designed using B. lanceolatum transcriptomic (see above) and genomic sequences from B. lanceolatum and A. lucayanum cDNA. The amplicons were ligated into pGEM-T Easy vector (Promega) and transformed into the XL10-Gold (Stratagene) competent E. coli cell strain using standard heat shock protocols. Selected clones were cultured and extracted using peqGOLD (Peqlab) or Promega plasmid miniprep kits and sequenced for verification using Universal M13F (5′-GTAAACGACGGCCAGT-3′) and M13R (5′-AACAGCTATGACCATG-3′) primers at the University of Oxford Zoology Sequencing Service. Probe template was produced using PCR with M13 primers. Bands were verified using agarose electrophoresis and precipitated using sodium acetate (3 M, pH 5.2) and ethanol. DIG-labelled (Roche) antisense probes were transcribed in vitro using T3 and SP6 enzymes as appropriate, following standard protocols.
In situ hybridisation. Whole mount in situ hybridisation was performed as previously reported 33 . In brief, embryos fixed in 4% paraformaldehyde and stored at −20 °C were rehydrated in NaPBS + 0.1% Tween and permeabilised with proteinase K, followed by post-fixation and acetic anhydride treatment to reduce background. A paralogue-and species-specific DIG-labelled probe was hybridised overnight at 65 °C to target mRNA in the embryos. Excess probe was washed back out through decreasing concentrations of formamide before being treated with RNAses to reduce background. The embryos were blocked against non-specific antibody binding and exposed to alkaline phosphatase-associated anti-DIG antibodies overnight at 4 °C. The embryos were finally stepped into buffer and NBT and BCIP (alkaline phosphatase substrates) introduced.

Equipment and settings.
To capture the images used in Fig. 3 and Supplementary Figs S1 and S2, embryos were mounted in 95% glycerol/5% PBS and examined under a Leitz DMRB microscope (Leica Microsystems) with Nomarski optics. Images were captured using a Retiga 2000R camera in the QCapture software suite (QImaging) and processed in the GNU Image Manipulation Package (GIMP) and Inkscape.