Globin E is a myoglobin-related, respiratory protein highly expressed in lungfish oocytes

Globins are a classical model system for studies of protein evolution and function. Recent studies have shown that – besides the well-known haemoglobin and myoglobin – additional globin types occur in vertebrates that serve different functions. Globin E (GbE) was originally identified as an eye-specific protein of birds that is distantly related to myoglobin. GbE is also present in turtles and the coelacanth but appeared to have been lost in other vertebrates. Here, we show that GbE additionally occurs in lungfish, the closest living relatives of the tetrapods. Each lungfish species harbours multiple (≥5) GbE gene copies. Surprisingly, GbE is exclusively and highly expressed in oocytes, with mRNA levels that exceed those of myoglobin in the heart. Thus, GbE is the first known oocyte-specific globin in vertebrates. No GbE transcripts were found in the ovary or egg transcriptomes of other vertebrates, suggesting a lungfish-specific function. Spectroscopic analysis and kinetic studies of recombinant GbE1 of the South American lungfish Lepidosiren paradoxa revealed a typical pentacoordinate globin with myoglobin-like O2-binding kinetics, indicating similar functions. Our findings suggest that the multiple copies of GbE evolved to enhance O2 supply in the developing embryo of lungfish, analogous to the embryonic and fetal haemoglobins of other vertebrates. In evolution, GbE must have changed its expression site from oocytes to eyes, or vice versa.

Together, the available evidence is consistent with GbE having an Mb-like function in O2 supply to the metabolically highly active avian retina 21. Lungfish (Dipnoi) have received much scientific interest because of their ability to breathe air, their conserved morphology, which has remained largely unchanged since the Devonian, and their phylogenetic position as the closest living relatives of the tetrapods [27][28][29][30]. There are six extant lungfish species that dwell in rivers and (seasonal) freshwater lakes in the tropics 31. Four species of the genus Protopterus live in Africa, Lepidosiren paradoxa in South America and Neoceratodus forsteri in Australia. Nearly all vertebrates have only a single Mb gene, which is expressed in the skeletal and heart muscles. In striking contrast, the West African lungfish P. annectens harbours at least seven distinct Mb genes with tissue-specific expression patterns 32. For example, distinct Mb paralogs occur in the heart and skeletal muscle, and the highest levels of Mb mRNA were found in the brain. Recombinant paralogous Mb proteins of P. annectens display different O2 binding affinities and enzymatic activities (J. Lüdemann, A. Fago, T. Burmester, unpublished data). The data suggest that the lungfish Mb paralogs carry out distinct functions and that the Mb genes evolved by neofunctionalisation and/or subfunctionalisation after multiple gene duplications.
In the present study, we demonstrate the occurrence of multiple GbE genes in three lungfish species, which are expressed almost exclusively in the ovary. On the basis of these expression data and the biochemical analyses of recombinant lungfish GbE, we propose that GbE may have an Mb-like role in O2 supply in the ovary of lungfish during development. Our findings add a novel level of complexity to the studies of globin evolution and function.

Results
Identification of GbE genes in lungfish species. An assembly of transcriptomes of the South American lungfish L. paradoxa (Supplemental Information Table 1) revealed several globin genes. TBLASTN and BLASTN searches identified two Hb α, three Hb β, one GbX, one GbY, five Mb cDNA sequences, as well as multiple contigs that resembled the sauropsid GbE (Supplemental Information Figs 1 and 2). No sequences that matched Ngb, Cygb, or Adgb were found in the available transcriptomes of L. paradoxa. The GbE cDNA sequences were verified by backmapping of the reads and reassembled if required; this finally revealed six distinct GbE transcripts, which were named LpaGbE1 and LpaGbE2a to e (Fig. 1). We should note that the nomenclature is provisional until a full representation of lungfish GbE genes is achieved. LpaGbE1, 2a, and 2c could be further verified by RT-PCR and sequencing. However, some sequences obtained by RT-PCR could not be assembled from the Illumina reads (which derived from a different specimen), indicating either technical issues such as hybrid sequences generated by the PCR step, or biological causes such as different alleles in the population, along with gene conversion or crossing over among the closely related GbE genes. Within the coding region of 456 or 459 bp, respectively, the six LpaGbE sequences differ by 2.6 to 34.8%. After translation, the differences were 4.7 to 41.4% at the amino acid level. The largest difference was found between LpaGbE1 and LpaGbE2b.
We assembled the publicly available transcriptomes of the West African lungfish P. annectens 33 and searched them for GbE genes. We found five distinct GbE cDNA sequences, which differ by 1.8 to 27.2% at the nucleotide level and by 1.3 to 36.2% at the amino acid level. Because the sequences form two clades in phylogenetic analyses (see below), they were named PanGbE1a and b, and PanGbE2a-c, respectively.
Additionally, we generated 60,785,122 Illumina reads (150 nt, paired-end) from total RNA extracted from the ovary of the marbled lungfish P. aethiopicus. The reads were assembled and searched for GbE sequences, of which seven distinct cDNA sequences were identified. The sequences differ by 2.4 to 27.7% at the nucleotide level and by 2.0 to 41.1% at the amino acid level. The sequences were named PaeGbE1a-c, and PaeGbE2a-d, respectively, based on their position in the phylogenetic tree (see below).
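The percentage differences reported above are pairwise comparisons of aligned sequences. The following Python sketch illustrates one way such a value can be computed. It assumes uncorrected distances and skips alignment columns in which either sequence has a gap; the text does not state how gaps were treated, and the function name and toy sequences are hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing positions between two aligned sequences.

    Assumption (not stated in the text): columns in which either
    sequence carries a gap character are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Hypothetical toy alignment, not taken from the lungfish data.
print(percent_difference("ACGT", "ACGA"))
```

The same function applies to nucleotide and amino acid alignments, since it only compares characters column by column.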

Figure 1.
Comparison of the GbE and Mb amino acid sequences. The GbE sequences of the South American lungfish L. paradoxa (LpaGbE1-2e), the chicken (GgaGbE) and sperm whale Mb (PcaMb) were aligned. The secondary structure of sperm whale Mb is superimposed in the upper row, with α-helices designated A through H; the globin consensus numbering is given below the sequences. Strictly conserved residues are shaded in grey. The conserved globin residues, i.e. the proximal His F8 and the distal His E7, as well as the Phe CD1 are in boldface.
Scientific Reports | (2019) 9:280 | DOI:10.1038/s41598-018-36592-w
Conservation and lungfish-specific amplification of GbE genes. Our studies resulted in a total of 18 novel GbE genes from three lungfish species with 151 or 152 amino acids. An alignment with the known GbE amino acid sequences of birds, turtles and coelacanth showed that all lungfish GbEs carry an insertion of six amino acids in the region between helices C and D, a deletion of four amino acids in the GH interhelical region, and are five amino acids shorter (Fig. 1). The maximum divergence within lungfish GbE amino acid sequences was 42.1%. The divergence of lungfish and coelacanth GbE was between 42.1 and 52.8% at the amino acid level. Sauropsid and lungfish GbE amino acid sequences differed by up to 56.4%. No other globin is more closely related to the lungfish GbE sequences. Phylogenetic analyses using GbE and other globin amino acid sequences (Supplemental Information Fig. 3) confirmed previous studies, which found a relationship between GbE and Mb, although the support was not particularly high (0.61 posterior probability) (Fig. 2; Supplemental Information Fig. 4). Within the GbE clade, the overall topology of the tree followed the accepted relationships among gnathostome taxa. The coelacanth GbE formed the sister group of all other GbE sequences. Sauropsid and lungfish GbEs were two separate clades. The lungfish GbEs fell into two clades, of which one was formed by the GbE1 sequences, the other one by the L. paradoxa  Table 3). The cumulative RPKM value of all six GbE genes in the ovaries was 19,942. For comparison, the cumulative RPKM of all five Mb genes was 562 in skeletal muscle and 2,017 in the heart (Supplemental Information Table 3). The total levels of Mb mRNA in ovaries amounted to 5.92 RPKM. To validate the ovary-specific expression of GbE, quantitative real-time RT-PCR was carried out with L. paradoxa ovaries and other selected tissues (Fig. 4).
Again, GbE1 mRNA was exclusively detected in the ovaries, but not in other tissues, including the eye. We further analysed the transcriptomes of the West African lungfish P. annectens 33. These transcriptomes included male and female gonads, along with brain and liver transcriptomes. The five GbE genes were found highly expressed in the female gonads, with cumulative RPKM of 31,916 to 37,275 (Supplemental Information Fig. 8; Supplemental Information Table 4). In the male gonads, expression of GbE was at least 1,800-fold lower with cumulative RPKM of 0.85 and 17.69. In other tissues, cumulative RPKMs of GbE were <4. Apart from the gonads, GbE levels did not differ between male and female organs. In P. annectens, we also found an Adgb cDNA, which is expressed at low levels mainly in the mature male gonads (Supplemental   Table 5). We further identified GbE proteins in the ovaries of L. paradoxa and P. aethiopicus by mass spectrometry. After separation of ovary proteins by SDS-PAGE, prominent bands with the expected GbE mass of ~15 kDa were detected (Supplemental Information Fig. 10). Mass spectrometry identified in these bands the proteins LpaGbE1-2e in the L. paradoxa samples, and PaeGbE1a, b, c, and PaeGbE2a, b in the P. aethiopicus samples, respectively (Supplemental Information Fig. 2).
To check whether the oocyte-specific expression was overlooked in previous studies with birds 20,21, we evaluated the ovary- and egg-specific transcriptomes of chicken (Gallus gallus) (Supplemental Information Table 1). GbE sequences were found in none of these transcriptomes (Supplemental Information Fig. 11; Supplemental Information Table 6). We randomly checked further ovary transcriptomes of other vertebrate species, as available at SRA, for putative GbE sequences via BLAST. We included all vertebrate classes, but GbE sequences were detected in none of these datasets. A comprehensive BLAST search of the available transcriptomes (including the transcriptome shotgun assemblies; TSA) or genomes at GenBank or ENSEMBL identified GbE only in birds, turtles, and the coelacanth.

Localisation of GbE mRNA in previtellogenic oocytes.
To determine the spatial expression of GbE1 of L. paradoxa (LpaGbE1) we performed in situ hybridisations in adult gonads of a female South American lungfish. Hematoxylin-eosin stained sections of the ovary show oocytes in various stages of maturation (Fig. 5A,B). Oocyte development in L. paradoxa consists of an initial stage of previtellogenic oocytes, characterised by a basophilic cytoplasm. Next, continuous yolk deposition results in a rapid increase in cellular volume in vitellogenic oocytes. Flattened follicular cells surround the vitellogenic oocyte, and cortically localised acidophilic yolk granules gradually expand into the cytoplasm. Mature oocytes measure about 2 mm 34. LpaGbE1 antisense probe signal was
Spectroscopic studies and O2 binding equilibria of GbE1 of L. paradoxa. In size-exclusion chromatography, recombinant LpaGbE1 elutes largely as a monomer (data not shown). The absorbance spectrum of purified LpaGbE1 displayed a Soret band at 406 nm, an α band at 533 nm and a β band at 579 nm (Fig. 6A), indicating a mixture of ferric and ferrous oxy forms. For comparison, a pure Mb oxy spectrum displays peaks at 418, 543 and 581 nm 35. After reduction with Na-dithionite under nitrogen, the ferrous deoxy form was obtained, with a large amplitude of the Soret band (427 nm) and a single peak in the visible region (555 nm

Discussion
RNA-seq and qRT-PCR studies showed that GbE is almost exclusively expressed in the lungfish ovary (Figs 3 and 4). More detailed studies by in situ hybridisation found that LpaGbE1 mRNA is restricted to non-mature previtellogenic oocytes, but was not found in other cells of the ovary, such as follicular cells, and was also not detected in vitellogenic oocytes (Fig. 5). The previtellogenic oocytes have not yet commenced accumulating yolk and other material. Thus, GbE mRNA is massively deposited in the oocyte and, as suggested by the mass spectrometric data, translated into GbE protein (Supplemental Information Fig. 10). Here, GbE may carry out its respiratory function either in the oocyte or may be used to support embryonic development.
After external fertilisation, males provide parental care to the fertilised eggs in underground burrows 36. Stagnant waters where Lepidosiren nests have been found are characterised by low oxygen levels, ranging from 0.2 to 1 cm³ of dissolved O2 per litre 37. Fertilised Lepidosiren eggs can reach 7 mm in diameter and, given the low oxygen conditions of the nests, it was unclear how Lepidosiren eggs obtained sufficient oxygen supply to sustain development. Lepidosiren males develop pelvic fin filaments during the breeding season, which break off and degenerate after larvae hatch and leave the nest 36. Initially, it was suggested that these filaments could contribute to copulation as spawning brushes to spread seminal fluid. However, given the overall similarity between pelvic fin filaments and external gills, it was also proposed that pelvic fin filaments were used to aerate the eggs by releasing oxygen from the male's blood into the water 38. Recently, however, it was shown that Lepidosiren pelvic fin filaments have neither a morphology compatible with oxygen diffusion nor the gene expression profile typical of gill filaments 39. Therefore, it is conceivable that GbE may help to extract O2 from the water to support the development of the embryo, which may be considered analogous to the function of the embryonic Hb in tetrapods 3. In contrast to the lungfish Mb genes, which functionally diversified after duplication 32, the multiple GbE genes most likely encode proteins with similar functions. The presence of multiple paralogous GbE genes thus enhances GbE mRNA levels, as is evident from the very high RPKM values between 20,000 and 63,000 (depending on the species). These values exceed the RPKM of Mb in the muscle or heart (Fig. 3; Supplemental Information Figs 5-7; Supplemental Information Table 3) and are even higher than those of Hb in blood.
Although we do not know the protein levels, the data suggest that GbE has an Mb-like role and contributes to the O2 supply of the oocytes. This hypothesis is supported by the notable GbE protein bands in the Coomassie-stained gels (Supplemental Information Fig. 10), and by the O2-binding equilibrium curve, with a P50 similar to that of a typical vertebrate Mb 40, as well as by the low autoxidation rate of GbE. In addition, deoxy LpaGbE1 shows nitrite reductase activity similar to that of vertebrate Mbs 40, indicating that GbE may also contribute to nitrite-dependent NO generation and signalling pathways in the ovary during periods of hypoxia. Other functions of GbE, for example as a vitellogenin-like storage protein, cannot be excluded but are less likely.
The globin family is a classic example of subfunctionalisation and neofunctionalisation of genes after duplication 41. This notion is supported by the existence of eight distinct globin genes in vertebrates, which emerged early in evolution 1. More recent and lineage-specific amplification events of members of the globin family are mirrored by the multiple Mb 32 and GbE genes (this study) of lungfish, which is unusual among vertebrates. The phylogenetic tree (Fig. 2) showed that amplification of GbE genes had already commenced in the lungfish stem lineage (i.e., separation of the GbE1 and GbE2 clades), but then occurred independently within the genera Lepidosiren and Protopterus. Lungfish are well-known for their gigantic genomes, which may, in theory, also result in the amplification of gene numbers. This explanation is unlikely for the globin genes. We only observed multiple copies of Mb and GbE, but of no other globin gene. In fact, lungfish have lost Ngb and Cygb genes. It is possible that their functions have been taken over by Mb copies 32. The GbE genes seem to have an oocyte-specific function in lungfish, which is, to the best of our knowledge, not mirrored by any other globin type in another vertebrate.
Although additive effects of the expression of multiple Mb genes in lungfish may be important to increase O2 supply, it is more likely that the different Mbs carry out distinct functions. This hypothesis is supported by the distinct expression patterns of the P. annectens Mbs, as well as by their different abilities to transfer stress tolerance 32 and O2 binding properties (J. Lüdemann, A. Fago, T. Burmester, unpublished). By contrast, we propose that the multiple GbE genes have additive functions. This is evident from their expression in a single tissue, the ovary, as well as from the high similarity of the sequences. While Mb amino acid sequences may differ by up to 70%, the maximum divergence of the lungfish GbEs is 40%. We further propose that lungfish GbE has a similar function in the ovary
There is little doubt that the globins originated from a single ancestor, and multiple gene duplications have led to their functional diversification 1,22,26. Phylogenetic analyses suggest that all eight globin types were present in the gnathostome ancestor and that subsequent losses in certain clades have shaped the present globin repertoire of the vertebrate taxa. Globin E has been lost in all vertebrate clades except birds, turtles, lungfish, and coelacanth (Fig. 7). Although database searches did not reveal any other GbE sequences, we cannot rule out that additional GbE genes expressed at unexpected sites will be discovered in future.
The present data suggest that GbE has a role similar to that of Mb in cellular O2 supply. Mb is still considered to exert its function mainly in muscle, but at least in non-tetrapods it displays a more widespread expression pattern in many tissues 19,[42][43][44]. GbE and Mb share an ancient common ancestry (Fig. 7). It is possible that GbE originated as a variant of Mb that meets the particular needs (for example, regarding O2 affinity) of a specific tissue. It remains unknown whether the original function was in the oocyte (as in lungfish) or the eye (as in sauropsids). In future, the coelacanth may be helpful to decide on the direction of evolutionary change, but none of the currently available transcriptomes of this taxon derives from eye or ovary. However, it is evident that either in the lungfish or the sauropsid clade, GbE must have changed its expression site. For unknown reasons, GbE was independently lost in at least five vertebrate clades 23 (Fig. 7).

Methods
Lungfish material. An L. paradoxa specimen was collected near Breves, Brazil, and euthanised with a lethal dose of tricaine methanesulfonate. This study was approved by IBAMA/SISBIO under license number 47206-1, and experimental procedures and animal care were conducted in accordance with the Ethics Committee for Animal Research at the Universidade Federal do Pará, under the approved protocol number 037-2015.
West African lungfish (P. annectens) and marbled lungfish (P. aethiopicus) were obtained from a pet shop, euthanised in 1 g/l tricaine methanesulfonate and finally killed by decapitation. Tissues were removed and immediately stored frozen at −80 °C in RNAlater (Qiagen, Hilden, Germany). Animals were treated in accordance with the German Animal Welfare Act.

RNA extraction, library preparation and Illumina sequencing.
Total RNA for transcriptome sequencing was extracted from lung, brain, buffy coat and ovary of the South American lungfish (L. paradoxa) using TRIzol® Reagent (Life Technologies, Cat. 15596-026) according to the manufacturer's protocol. RNA samples were further purified using the RNeasy® Mini Kit (Qiagen) and treated with DNaseI (Qiagen), according to the manufacturer's protocol. A reference transcriptome and transcript abundance estimates were obtained from the libraries, which were sequenced on an Illumina HiSeq platform with 100 bp paired-end reads. Sequencing was carried out by a commercial service at the Instituto Nacional do Câncer, Brazil.
RNA was purified from the ovary of the marbled lungfish (P. aethiopicus) using the CRYSTAL RNA Mini Kit (biolab products). A library for paired-end sequencing was generated from ~1 µg total RNA. Sequencing of 2 × 150 nt was performed with the Illumina NextSeq500 technology (StarSEQ, Mainz, Germany).
Additional transcriptomes from various lungfish and other species were retrieved from the public SRA database at GenBank (for accession numbers, see Supplemental Information Table 1). The transcriptomes from each lungfish species were assembled using either the CLC Genomics Workbench (version 11.0.1) or Trinity v2.6.5. Globin cDNA sequences were identified in the assemblies by TBLASTN searches using the coelacanth globins 24 as queries. The final assignment of a globin to a specific clade was done by phylogenetic analyses (see below). The consensus sequences were verified by backmapping of the reads with the CLC Genomics Workbench. Broken mate-pairs were employed to identify putative misassemblies. If required, the reads were re-assembled with different parameters, and the procedure was repeated until unambiguous GbE cDNA sequences were obtained. Selected GbE sequences were verified by RT-PCR and Sanger sequencing of the cDNA (GATC, Konstanz, Germany). RNA-Seq analyses were performed with the CLC Genomics Workbench. The mRNA levels of the globins were calculated as RPKM.
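For readers unfamiliar with the RPKM unit, the following Python sketch shows the commonly used definition (reads per kilobase of transcript per million mapped reads) and how per-gene values add up to a cumulative value. It is an illustration only: the exact counting rules of the CLC Genomics Workbench are not reproduced here, and the function names are hypothetical.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Equivalent to read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6).
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def cumulative_rpkm(per_gene_rpkm: list[float]) -> float:
    """Sum of RPKM values over a set of paralogous genes."""
    return sum(per_gene_rpkm)


def fold_difference(high: float, low: float) -> float:
    """Ratio of two expression values."""
    return high / low


# Values quoted in the Results for P. annectens: lowest cumulative GbE RPKM
# in female gonads versus highest cumulative GbE RPKM in male gonads.
print(fold_difference(31916, 17.69))
```

With the values quoted in the Results, the ratio is slightly above 1,800, consistent with the statement that GbE expression in male gonads is at least 1,800-fold lower.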

Sequence analyses and phylogenetic inference. The vertebrate genomes available at ENSEMBL (https://www.ensembl.org/) and NCBI (https://www.ncbi.nlm.nih.gov/genome/) were searched for the presence of GbE genes. The lungfish GbE amino acid sequences were included in an alignment that covers the whole range of vertebrate globins and a broad range of classes 25,32,45 (Supplemental Information Table 2). Adgb was excluded from the phylogenetic analyses because of its permutated globin domain 13. A multiple sequence alignment of the amino acid sequences was obtained with the MAFFT online tool with the L-INS-i method 46,47. Phylogenetic analysis was done with MrBayes 3.2.3 48,49 using the LG model of amino acid evolution 50, which was selected using PROTTEST 51, and which was implemented into MrBayes with the general time reversible model as fixed prior and by specifying the aarevmatpr and statefreqpr options 24,25. The program was run for 5,000,000 generations using the standard option (two independent runs with four simultaneous chains). Trees were sampled every 1000th generation, and the posterior probabilities were estimated after discarding the initial 25% of the trees.
, which were grown at 37 °C in 5 ml L-medium (1% bactotryptone, 0.5% yeast extract, 0.5% NaCl, pH 7.5) containing 10 µg/ml ampicillin, 34 µg/ml chloramphenicol overnight. The culture was applied to 500 ml L-medium supplemented with 1 mM δ-aminolevulinic acid. The culture was induced at OD600 = 0.4 to 0.8 by the addition of isopropyl-1-thio-D-galactopyranoside (final concentration 0.4 mM), and expression was continued at 30 °C overnight. Cells were collected by 45 min centrifugation at 4,000 × g and resuspended in 50 mM Tris-HCl, pH 8.0, 1 mM MgCl2, 1 mM dithiothreitol, 10 μg/ml DNase, 5 µg/ml RNase, Complete™ proteinase inhibitor mix (Roche Applied Science) and Pefabloc (Roth). The cells were broken by three freeze-thaw cycles in liquid nitrogen followed by ultrasonication (10 × 30 s). DNA and RNA were digested for 2 h at 37 °C.
The cell debris was removed by centrifugation for 1 h at 4 °C at 4,500 × g. The recombinant GbE1 was purified from the supernatant using His60-Ni columns (Qiagen) according to the manufacturer's instructions. The His-tag was removed by incubation with the Factor Xa protease (20 µg/ml) for 6 h at 37 °C. After inactivation of the protease with 2 µM dansyl-glu-gly-arg-chloromethyl ketone, the recombinant GbE1 protein was brought to 10 mM Hepes, pH 7.8, 0.5 mM EDTA. Gel-filtration was carried out with an ÄktaPure chromatographic system (GE Healthcare, Freiburg, Germany) equipped with a Superdex 75 10/300 column (GE Healthcare). The proteins were eluted with 50 mM potassium phosphate, 0.5 mM EDTA, pH 7.0, 0.15 M NaCl at a flow rate of 1 mL/min. The absorbance was measured at 280 and 415 nm. Recombinant GbE1 was applied at a concentration of 0.2 mM heme; human Hb and horse Mb were employed as references at the same concentration.
Spectroscopic studies and O2 binding curves. Absorbance spectra of purified GbE1 were taken in the range 350-650 nm. The deoxy form was obtained by adding sodium dithionite. O2 equilibrium curves were determined using a modified diffusion chamber technique described previously [54][55][56]. Samples of 5 µL (~0.2 mM heme) in 0.1 M Hepes, 0.5 mM EDTA, pH 7.2 were measured at least in duplicate at 20 °C. Ferric GbE1 was reduced for 5 to 10 min in N2 with the met-Hb reductase system 57.
In the modified diffusion chamber technique, water-saturated gas mixtures of O2 and ultrapure (>99.998%) N2 were generated by a GMS 500 gas mixing system (Loligo System, Denmark) and used to equilibrate the thin smear of 5 µL GbE1 sample with stepwise increments of oxygen tension (PO2). Absorbance traces were sampled at 436 nm by a photomultiplier (model RCA 931-A) and an Eppendorf model 1100 M photometer. The signal was digitised, and saturation values were obtained using in-house software 58. P50 and cooperativity values were calculated from the zero intercept and slope of Hill plots, respectively. Each curve consisted of four to six saturation steps.
adding nitrite, the GbE1 was anaerobically titrated with 1 mM of sodium dithionite in a 1 cm cuvette sealed with a rubber cap. The measurement of the reaction kinetics was started immediately, and absorbance traces were recorded at 435 nm and 20 °C.
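The Hill-plot evaluation mentioned for the O2 equilibrium curves can be sketched in Python as follows. In a Hill plot, log10(Y/(1 − Y)) is plotted against log10(PO2), where Y is the fractional saturation; the slope gives the cooperativity coefficient, and the PO2 at which the fitted line crosses zero gives P50. This is a minimal sketch using an ordinary least-squares fit, not the in-house software used in the study; the function name and the synthetic data are hypothetical.

```python
import math


def hill_fit(po2: list[float], saturation: list[float]) -> tuple[float, float]:
    """Return (P50, cooperativity) from a linear fit of the Hill plot.

    po2        : oxygen tensions (any consistent unit, all > 0)
    saturation : fractional saturations, strictly between 0 and 1
    """
    if len(po2) != len(saturation) or len(po2) < 2:
        raise ValueError("need at least two paired data points")
    if any(p <= 0 for p in po2) or any(not 0 < y < 1 for y in saturation):
        raise ValueError("PO2 must be > 0 and saturation within (0, 1)")

    xs = [math.log10(p) for p in po2]
    ys = [math.log10(y / (1.0 - y)) for y in saturation]

    # Ordinary least-squares line: ys = slope * xs + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The line crosses zero where log10(PO2) = -intercept / slope.
    p50 = 10 ** (-intercept / slope)
    return p50, slope


# Synthetic non-cooperative curve with P50 = 1.0 (arbitrary units),
# five saturation steps, comparable to the four to six steps per curve.
pressures = [0.25, 0.5, 1.0, 2.0, 4.0]
fractions = [p / (p + 1.0) for p in pressures]
print(hill_fit(pressures, fractions))
```

For a monomeric, non-cooperative globin the fitted slope is expected to be close to 1, as in this synthetic example.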