Introduction

A constant supply of oxygen (O2) is essential for aerobic organisms. The transport and storage of O2 in vertebrates are mediated by proteins that are members of the globin superfamily1. Some globins may also have other functions and are, for example, involved in the detoxification of reactive O2 species (ROS), NO metabolism, or signaling1,2. The best-known vertebrate globins are haemoglobin (Hb), which is a heterotetramer that transports O2 in the blood3, and myoglobin (Mb), which is a monomer in the heart and the skeletal muscles, where it facilitates the diffusion of O2 and enhances O2 storage4. Within recent years, six additional globins have been identified in vertebrates1. The function of neuroglobin (Ngb), which resides mainly in the nervous system5, is still uncertain6,7. There is evidence that Ngb plays a role in oxidative metabolism8,9. Cytoglobin (Cygb) is expressed in fibroblast-related cell types and some populations of neurons10,11,12. Cygb may supply O2 to specific enzymes and may detoxify ROS7. Androglobin (Adgb) expression is restricted to the testis13. While Hb, Mb, Ngb, Cygb, and Adgb occur in most vertebrates, the occurrence of the globins E, X, and Y (GbE, GbX, and GbY) is restricted to certain taxa. GbX emerged very early in the evolution of Metazoa but is – in vertebrates – only present in non-tetrapods, amphibians and some reptiles14,15. The GbX protein is bound to the cell membrane via N-terminal acylation16,17, where it may protect the cells from ROS18. GbY has an unknown function in some “basal” ray-finned fishes, amphibians, reptiles, and platypus, where it is broadly expressed at low levels14,19.

GbE was initially found in the eye of chicken20 and was additionally identified in the genomes of other birds21,22,23, the coelacanth24 and turtles25. Gene synteny and phylogenetic analyses suggest that Mb is the globin type most closely related to GbE, although the divergence of these genes must have occurred before the radiation of the gnathostome classes21,22,23,26. Immunohistochemistry and quantitative real-time RT-PCR (qRT-PCR) studies showed that GbE is highly and almost exclusively expressed in the eye (thus its name) of chicken and turtles25. Estimates of total protein levels were ~10 µM GbE in the chicken retina, which is in the range of Mb in striated muscle cells21. Together, the available evidence is consistent with GbE having a Mb-like function in O2 supply to the metabolically highly active avian retina21.

Lungfish (Dipnoi) have received much scientific interest because of their ability to breathe air, their conserved morphology that has remained largely unchanged since the Devonian, and their phylogenetic position as closest living relatives of the tetrapods27,28,29,30. There are six extant lungfish species that dwell in rivers and (seasonal) freshwater lakes in the tropics31. Four species of the genus Protopterus live in Africa, Lepidosiren paradoxa in South America and Neoceratodus forsteri in Australia. Nearly all vertebrates have only a single Mb gene, which is expressed in the skeletal and heart muscles. In striking contrast, the West African lungfish P. annectens harbours at least seven distinct Mb genes with tissue-specific expression patterns32. For example, distinct Mb paralogs occur in the heart and skeletal muscle, and highest levels of Mb mRNA were found in the brain. Recombinant paralogous Mb proteins of P. annectens display different O2 binding affinities and enzymatic activities (J. Lüdemann, A. Fago, T. Burmester, unpublished data). The data suggest that the lungfish Mb paralogs carry out distinct functions and that the Mb genes evolved by neofunctionalisation and/or subfunctionalisation after multiple gene duplications.

In the present study, we demonstrate the occurrence of multiple GbE genes in three lungfish species, which are expressed almost exclusively in the ovary. Together with the biochemical analyses of recombinant lungfish GbE, we propose that GbE may have an Mb-like role in O2 supply in the ovary of lungfish during development. Our findings add a novel level of complexity to the studies of globin evolution and function.

Results

Identification of GbE genes in lungfish species

An assembly of transcriptomes of the South American lungfish L. paradoxa (Supplemental Information Table 1) revealed several globin genes. TBLASTN and BLASTN searches identified two Hb α, three Hb β, one GbX, one GbY, five Mb cDNA sequences, as well as multiple contigs that resembled the sauropsid GbE (Supplemental Information Figs 1 and 2). No sequences that matched Ngb, Cygb, or Adgb were found in the available transcriptomes of L. paradoxa. The GbE cDNA sequences were verified by backmapping of the reads, reassembled if required, and finally revealed six distinct GbE transcripts, which were named LpaGbE1 and LpaGbE2a to e (Fig. 1). We should note that the nomenclature is provisional until a full representation of lungfish GbE genes is achieved. LpaGbE1, 2a, and 2c could be further verified by RT-PCR and sequencing. However, some sequences obtained by RT-PCR could not be assembled from the Illumina reads (which derived from a different specimen), indicating either technical issues such as hybrid sequences generated by the PCR step, or biological causes such as different alleles in the population, along with gene conversion or crossing over among the closely related GbE genes. Within the coding region of 456 or 459 bp, respectively, the six LpaGbE sequences differ by between 2.6 and 34.8%. After translation, the differences were 4.7 to 41.4% on the amino acid level. The largest difference was found between LpaGbE1 and LpaGbE2b.

Figure 1

Comparison of the GbE and Mb amino acid sequences. The GbE sequences of the South American lungfish L. paradoxa (LpaGbE1–2e), the chicken (GgaGbE) and sperm whale Mb (PcaMb) were aligned. The secondary structure of sperm whale Mb is superimposed in the upper row, with α-helices designated A through H; the globin consensus numbering is given below the sequences. Strictly conserved residues are shaded in grey. The conserved globin residues, i.e. the proximal His F8 and the distal His E7, as well as the Phe CD1 are in boldface.

We assembled the publicly available transcriptomes of the West African lungfish P. annectens33 and searched them for GbE genes. We found five distinct GbE cDNA sequences, which differ by between 1.8 and 27.2% on the nucleotide level and between 1.3 and 36.2% on the amino acid level. Because the sequences form two clades in phylogenetic analyses (see below), they were named PanGbE1a and b, and PanGbE2a-c, respectively.

Additionally, we generated 60,785,122 Illumina reads (150 nt, paired-end) from total RNA extracted from the ovary of the marbled lungfish P. aethiopicus. The reads were assembled and searched for GbE sequences, of which seven distinct cDNA sequences were identified. The sequences differ 2.4 to 27.7% on the nucleotide and 2.0 to 41.1% on the amino acid level. The sequences were named PaeGbE1a-c, and PaeGbE2a-d, respectively, based on the position in the phylogenetic tree (see below).
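The percentage differences quoted above can be read as uncorrected pairwise differences between aligned sequences. The following Python sketch shows one way such a value could be computed. The function name, the gap convention (columns with a gap in either sequence are skipped) and the toy peptides are assumptions made for illustration; they are not the procedure or the data of this study.

```python
def percent_difference(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected pairwise difference (%) between two aligned sequences.

    Assumption: alignment columns with a gap in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * mismatches / compared


# Toy aligned peptides (hypothetical, not lungfish GbE sequences).
toy_a = "MGLSDAEWQL"
toy_b = "MGLTDAE-QV"
toy_difference = percent_difference(toy_a, toy_b)
print(f"{toy_difference:.1f}% difference")
```

Other conventions (for example, counting gaps as differences or applying a substitution model) would give different numbers, so the convention should be stated alongside any reported value.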

Conservation and lungfish-specific amplification of GbE genes

Our studies resulted in a total of 18 novel GbE genes from three lungfish species with 151 or 152 amino acids. An alignment with the known GbE amino acid sequences of birds, turtles and coelacanth showed that all lungfish GbEs carry an insertion of six amino acids in the region between helices C and D, a deletion of four amino acids in the GH interhelical region, and are five amino acids shorter (Fig. 1). The maximum divergence within lungfish GbE amino acid sequences was 42.1%. The divergence of lungfish and coelacanth GbE was between 42.1 and 52.8% of the amino acids. Sauropsid and lungfish GbE amino acid sequences differed by up to 56.4%. No other globin is more closely related to the lungfish GbE sequences.

Phylogenetic analyses using GbE and other globin amino acid sequences (Supplemental Information Fig. 3) confirmed previous studies, which found a relationship of GbE and Mb, although the support was not particularly high (0.61 posterior probability) (Fig. 2; Supplemental Information Fig. 4). Within the GbE clade, the overall topology of the tree followed the accepted relationships among gnathostome taxa. The coelacanth GbE formed the sister group of all other GbE sequences. Sauropsid and lungfish GbEs were two separate clades. The lungfish GbEs fell into two clades, of which one was formed by the GbE1 sequences, the other one by the L. paradoxa GbE2a-e along with the GbE2 proteins of the genus Protopterus.

Figure 2

Bayesian phylogenetic tree of vertebrate globins. Tree reconstruction was carried out with the amino acid sequences assuming the LG model. The bar represents 0.1 substitutions per site. The numbers at the nodes are posterior probabilities. The different globin clades – except GbE – have been collapsed; the full tree is given in Supplemental Information Fig. 4. The species abbreviations are: Apl, Anas platyrhynchos; Cli, Columba livia; Cmy, Chelonia mydas; Cpi, Chrysemys picta bellii; Fal, Ficedula albicollis; Fpe, Falco peregrinus; Gfo, Geospiza fortis; Gga, Gallus gallus; Lch, Latimeria chalumnae; Lpa, Lepidosiren paradoxa; Mga, Meleagris gallopavo; Mun, Melopsittacus undulatus; Pae, Protopterus aethiopicus; Pan, Protopterus annectens; Phu, Pseudopodoces humilis; Psi, Pelodiscus sinensis; Tgu, Taeniopygia guttata; Zal, Zonotrichia albicollis.

Expression of GbE in the lungfish ovary

We checked the expression levels of the six GbE genes by RNA-seq employing the transcriptomes of L. paradoxa (Fig. 3; Supplemental Information Figs 5–7). The five Mb genes, as well as GbX and GbY, were included for comparison. All six GbE genes were essentially restricted to the ovary transcriptome. The RPKM (Reads Per Kilobase exon model per Million reads) values were very high and reached 6,364.7 RPKM. In other tissues, the RPKM values were between 0 and 5 (Supplemental Information Table 3). The cumulative RPKM value of all six GbE genes in the ovaries was 19,942. For comparison, the cumulative RPKM of all five Mb genes was 562 in skeletal muscle and 2,017 in the heart (Supplemental Information Table 3). The total levels of Mb mRNA in ovaries amounted to 5.92 RPKM. To validate the ovary-specific expression of GbE, quantitative real-time RT-PCR was carried out with L. paradoxa ovaries and other selected tissues (Fig. 4). Again, GbE1 mRNA was detected in the ovaries, but not in other tissues, including the eye.

Figure 3

Expression of the Mb and GbE genes in selected L. paradoxa tissues. mRNA levels were estimated by RNA-Seq and are displayed as RPKM values. Transcriptome accession numbers are given in Supplemental Information Table 1. Note the dominant expression of LpaGbE1-2e in the ovary. The copy numbers are given in Supplemental Information Table 3. Log-scale data are presented in Supplemental Information Fig. 5. The tissue-specific expression per gene is given in Supplemental Information Figs 6 and 7.

Figure 4

Expression of GbE1 in selected L. paradoxa tissues. mRNA levels were determined by qRT-PCR (A) and RT-PCR (B). The standard deviations in (A) derive from three replicates. GbE1 was almost exclusively detected in the ovary.

We further analysed the transcriptomes of the West African lungfish P. annectens33. These transcriptomes included male and female gonads, along with brain and liver transcriptomes. The five GbE genes were found highly expressed in the female gonads, with cumulative RPKM of 31,916 to 37,275 (Supplemental Information Fig. 8; Supplemental Information Table 4). In the male gonads, expression of GbE was at least 1,800-fold lower with cumulative RPKM of 0.85 and 17.69. In other tissues, cumulative RPKMs of GbE were <4. Except in the gonads, there was no difference in GbE levels between other male and female organs. In P. annectens, we also found an Adgb cDNA, which is expressed at low levels mainly in the mature male gonads (Supplemental Information Fig. 8). RNA-Seq analyses of the three available transcriptomes of the marbled lungfish P. aethiopicus showed the same picture: High expression of GbE in the ovary (cumulative RPKM of the seven GbEs: 53,661), but only traces of GbE in the transcriptomes of the developing jaw/mandible or mixed visceral tissues (Supplemental Information Fig. 9; Supplemental Information Table 5).

We further identified GbE proteins in the ovaries of L. paradoxa and P. aethiopicus by mass spectrometry. After separation of ovary proteins by SDS-PAGE, prominent bands with the expected GbE mass of ~15 kDa were detected (Supplemental Information Fig. 10). In these bands, mass spectrometry identified the proteins LpaGbE1-2e in the L. paradoxa samples, and PaeGbE1a, b, c, and PaeGbE2a, b in the P. aethiopicus samples, respectively (Supplemental Information Fig. 2).

To check whether oocyte-specific expression was overlooked in previous studies with birds20,21, we evaluated the ovary- and egg-specific transcriptomes of chicken (Gallus gallus) (Supplemental Information Table 1). GbE sequences were found in none of these transcriptomes (Supplemental Information Fig. 11; Supplemental Information Table 6). We randomly checked further ovary transcriptomes of other vertebrate species, as available at SRA, for putative GbE sequences via BLAST. We included all vertebrate classes, but GbE sequences were detected in none of these datasets. A comprehensive BLAST search of the available transcriptomes (including the transcriptome shotgun assemblies; TSA) or genomes at GenBank or ENSEMBL identified GbE only in birds, turtles, and the coelacanth.

Localisation of GbE mRNA in previtellogenic oocytes

To determine the spatial expression of GbE1 of L. paradoxa (LpaGbE1), we performed in situ hybridisations in adult gonads of a female South American lungfish. Hematoxylin-eosin stained sections of the ovary show oocytes in various stages of maturation (Fig. 5A,B). Oocyte development in L. paradoxa consists of an initial stage of previtellogenic oocytes, characterised by a basophilic cytoplasm. Next, continuous yolk deposition results in a rapid increase in cellular volume in vitellogenic oocytes. Flattened follicular cells surround the vitellogenic oocyte, and cortically localised acidophilic yolk granules gradually expand into the cytoplasm. Mature oocytes measure about 2 mm34. LpaGbE1 antisense probe signal was observed specifically in the cytoplasm of previtellogenic basophilic oocytes (Fig. 5C). LpaGbE1 expression was not detected in vitellogenic oocytes, follicular cells or other ovarian cells (Fig. 5C,D). No signal was detected by the sense control probe (Supplemental Information Fig. 12).

Figure 5

GbE1 expression in the ovary of L. paradoxa. (A,B) In haematoxylin and eosin stained ovary sections, oocytes at various stages of maturation can be seen. (C,D) Antisense GbE1 riboprobe signal was detected in previtellogenic oocytes. (B,D) Enlarged views of the dashed-line boxes in (A,C), respectively. (B) Arrow indicates follicular cells. Previtellogenic oocytes (PV); vitellogenic oocyte (V); follicular cell (FC). Scale bar: 2 mm (A,C) and 0.5 mm (B,D). The sense controls are given in Supplemental Information Fig. 12.

Spectroscopic studies and O2 binding equilibria to GbE1 of L. paradoxa

In size-exclusion chromatography, recombinant LpaGbE1 elutes largely as a monomer (data not shown). The absorbance spectrum of purified LpaGbE1 displayed a Soret band at 406 nm, an α band at 533 nm and a β-band at 579 nm (Fig. 6A), indicating a mixture of ferric and ferrous oxy forms. For comparison, a pure Mb oxy spectrum displays peaks at 418, 543 and 581 nm35. After reduction with Na-dithionite under nitrogen, the ferrous deoxy-form was obtained, with a large amplitude of the Soret band (427 nm) and a single peak in the visible region (555 nm). The absorption spectrum of deoxy-LpaGbE1 resembled that of Mb and Hb, indicating a pentacoordinate heme. O2 equilibrium curves (pH 7.2, 20 °C) showed that LpaGbE1 reversibly binds O2 (Fig. 6B), with a P50 of 1.2 ± 0.02 torr (0.16 kPa; 1 torr = 0.133 kPa). The O2 binding curve showed some degree of cooperativity (n = 1.19 ± 0.19), suggesting a dimeric assembly. The autoxidation rate of LpaGbE1 (pH 7.13, 20 °C), measured after removing the met reductase enzymatic system by gel filtration, was low (0.004 ± 0.0003 s−1). The nitrite reductase activity (pH 7.13, 20 °C) of deoxy LpaGbE1 was 18.72 ± 0.0022 s−1 M−1.

Figure 6

Heme coordination and O2 equilibria of recombinant GbE1 of L. paradoxa. (A) Absorbance spectra of purified (solid line) and deoxygenated (dotted line) recombinant GbE1 after addition of dithionite, indicating pentacoordinate heme. (B) Representative O2 equilibrium curve, measured at pH 7.2, 20 °C. Fitting of saturation data is indicated (continuous line).

Discussion

RNA-seq and qRT-PCR studies showed that GbE is almost exclusively expressed in the lungfish ovary (Figs 3 and 4). More detailed studies by in situ hybridisation found that LpaGbE1 mRNA is restricted to non-mature previtellogenic oocytes, but was not found in other cells of the ovary, such as follicular cells, and was also not detected in vitellogenic oocytes (Fig. 5). The previtellogenic oocytes have not yet commenced accumulating yolk and other material. Thus, GbE mRNA is massively deposited in the oocyte and, as suggested by the mass spectrometric data, translated into GbE protein (Supplemental Information Fig. 10). Here, GbE may carry out its respiratory function either in the oocyte or may be used to support embryonic development.

After external fertilisation, males provide parental care to the fertilised eggs in underground burrows36. Stagnant waters where Lepidosiren nests have been found are characterised by low oxygen levels, ranging from 0.2 to 1 cm3 per litre of dissolved O237. Fertilized Lepidosiren eggs can reach 7 mm in diameter and, given the low oxygen conditions of the nests, it was unclear how Lepidosiren eggs obtained sufficient oxygen supply to sustain development. Lepidosiren males develop pelvic fin filaments during the breeding season, which break off and degenerate after larvae hatch and leave the nest36. Initially, it was suggested that these filaments could contribute to copulation as spawning brushes to spread seminal fluid. However, given the overall similarity between pelvic fin filaments and external gills, it was also proposed that pelvic fin filaments were used to aerate the eggs by releasing oxygen from the male’s blood into the water38. Recently, however, it was shown that Lepidosiren pelvic fin filaments neither have the morphology compatible for oxygen diffusion nor the gene expression profile typical of gill filaments39. Therefore, it is conceivable that GbE may help to extract O2 from the water to support the development of the embryo, which may be considered analogous to the function of the embryonic Hb in tetrapods3.

In contrast to the lungfish Mb genes, which functionally diversified after duplication32, the multiple GbE genes most likely encode proteins with similar functions. The presence of multiple paralogous GbE genes thus enhances GbE mRNA levels, as is evident from the very high RPKM values between 20,000 and 63,000 (depending on the species). These values exceed the RPKM of Mb in the muscle or heart (Fig. 3; Supplemental Information Figs 5–7; Supplemental Information Table 3) and are even higher than those of Hb in blood. Although we do not know the protein levels, the data suggest that GbE has an Mb-like role and contributes to the O2 supply of the oocytes. This hypothesis is supported by the notable GbE protein bands in the Coomassie-stained gels (Supplemental Information Fig. 10), and by the O2-binding equilibrium curve, with a P50 similar to that of a typical vertebrate Mb40, as well as by the low autoxidation rate of GbE. In addition, deoxy LpaGbE1 shows nitrite reductase activity similar to that of vertebrate Mbs40, indicating that GbE may also contribute to nitrite-dependent NO generation and signalling pathways in the ovary during periods of hypoxia. Other functions of GbE, for example as a vitellogenin-like storage protein, cannot be excluded but are less likely.

The globin family is a classic example of subfunctionalisation and neofunctionalisation of genes after duplication41. This notion is supported by the existence of eight distinct globin genes in vertebrates, which emerged early in evolution1. More recent and lineage-specific amplification events of members of the globin family are mirrored by the multiple Mb32 and GbE genes (this study) of lungfish, which is unusual among vertebrates. The phylogenetic tree (Fig. 2) showed that amplification of GbE genes had already commenced in the lungfish stem lineage (i.e., separation of the GbE1 and GbE2 clades), but then occurred independently within the genera Lepidosiren and Protopterus. Lungfish are well-known for their gigantic genomes, which may, in theory, also result in the amplification of gene numbers. This explanation is unlikely for the globin genes. We only observed multiple copies of Mb and GbE, but of no other globin gene. In fact, lungfish have lost Ngb and Cygb genes. It is possible that their functions have been taken over by Mb copies32. The GbE genes seem to have an oocyte-specific function in lungfish, which is, to the best of our knowledge, not mirrored by any other globin type in another vertebrate.

Although additive effects of the expression of multiple Mb genes in lungfish may be important to increase O2 supply, it is more likely that the different Mbs carry out distinct functions. This hypothesis is supported by the distinct expression patterns of the P. annectens Mbs, as well as by their different abilities to transfer stress tolerance32 and O2 binding properties (J. Lüdemann, A. Fago, T. Burmester, unpublished). By contrast, we propose that the multiple GbE genes have additive functions. This is evident from their expression in a single tissue, the ovary, as well as from the high similarity of the sequences. While Mb amino acid sequences may differ by up to 70%, the maximum divergence of the lungfish GbEs is about 42%. We further propose that lungfish GbE has a similar function in the ovary as Mb in the heart and skeletal muscle and may thus provide additional O2 either by enhancing O2 storage or by facilitating intracellular O2 diffusion.

There is little doubt that the globins originated from a single ancestor, and multiple gene duplications have led to their functional diversification1,22,26. Phylogenetic analyses suggest that all eight globin types were present in the gnathostome ancestor and that subsequent losses in certain clades have shaped the present globin repertoire of the vertebrate taxa. Globin E has been lost in all vertebrate clades except birds, turtles, lungfish, and coelacanth (Fig. 7). Although database searches did not reveal any other GbE sequences, we cannot rule out that in future additional GbE genes will be discovered that are expressed at unexpected sites.

Figure 7

Occurrence of GbE in vertebrates. GbE genes (purple pentagon) are present in the genomes of the coelacanth, lungfishes, turtles, and birds, but have been lost in other vertebrate taxa (as marked in the figure). The arrow indicates the origin of GbE, i.e., its divergence from Mb. Genomes were searched by BLAST at ENSEMBL (https://www.ensembl.org/) and NCBI (https://www.ncbi.nlm.nih.gov/genome/).

The present data suggest that GbE has a similar role as Mb in cellular O2 supply. Mb is still considered to exert its function mainly in muscle, but at least in non-tetrapods it displays a more widespread expression pattern in many tissues19,42,43,44. GbE and Mb share an ancient common ancestry (Fig. 7). It is possible that GbE originated as a variant of Mb that meets the particular needs (for example, regarding O2 affinity) of a specific tissue. It remains unknown whether the original function was in the oocyte (as in lungfish) or the eye (as in sauropsids). In future, the coelacanth may be helpful to decide on the direction of evolutionary change, but none of the currently available transcriptomes of this taxon derives from eye or ovary. However, it is evident that either in the lungfish or the sauropsid clade, GbE must have changed its expression site. For unknown reasons, GbE was independently lost in at least five vertebrate clades23 (Fig. 7).

Methods

Lungfish material

An L. paradoxa specimen was collected near Breves, Brazil, and euthanised with a lethal dose of tricaine methanesulfonate. This study was approved by IBAMA/SISBIO under license number 47206-1, and experimental procedures and animal care were conducted in accordance with the Ethics Committee for Animal Research at the Universidade Federal do Pará, under the approved protocol number 037-2015.

West African lungfish (P. annectens) and marbled lungfish (P. aethiopicus) were obtained from a pet shop, euthanised in 1 g/l tricaine methanesulfonate and finally killed by decapitation. Tissues were removed and immediately stored frozen at −80 °C in RNAlater (Qiagen, Hilden, Germany). Animals were treated in accordance with the German Animal Welfare Act.

RNA extraction, library preparation and Illumina sequencing

Total RNA for transcriptome sequencing was extracted from lung, brain, buffy coat and ovary of the South American lungfish (L. paradoxa) using TRIzol® Reagent (Life Technologies, Cat. 15596-026) according to the manufacturer’s protocol. RNA samples were further purified using the RNeasy® Mini Kit (Qiagen) and treated with DNaseI (Qiagen), according to the manufacturer’s protocol. Each library was sequenced on an Illumina HiSeq platform with 100 bp paired-end reads, and the reads were used to obtain the reference transcriptome and to estimate transcript abundance. Sequencing was carried out by a commercial service from Instituto Nacional do Câncer, Brazil.

RNA was purified from the ovary of the marbled lungfish (P. aethiopicus) using the CRYSTAL RNA Mini Kit (biolab products). A library for paired-end sequencing was generated from ~1 µg total RNA. Sequencing of 2 × 150 nt was performed with the Illumina NextSeq500 technology (StarSEQ, Mainz, Germany).

Additional transcriptomes from various lungfish and other species were retrieved from the public SRA database at GenBank (for accession numbers, see Supplemental Information Table 1). The transcriptomes from each lungfish species were assembled using either the CLC Genomics Workbench (version 11.0.1) or Trinity v2.6.5. Globin cDNA sequences were identified in the assemblies employing TBLASTN using the coelacanth globins24 as queries. The final assignment of a globin to a specific clade was done by phylogenetic analyses (see below). The consensus sequences were verified using a backmapping approach using the CLC Genomics Workbench. Broken mate-pairs were employed to identify putative misassemblies. If required, the reads were re-assembled with different parameters, and the procedure was repeated until unambiguous GbE cDNA sequences were obtained. Selected GbE sequences were verified by RT-PCR and Sanger sequencing of the cDNA (GATC, Konstanz, Germany). RNA-Seq analyses were performed with the CLC Genomics Workbench. The mRNA levels of the globins were calculated as RPKM.
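RPKM normalises the read count of a gene by both transcript length and sequencing depth. The Python sketch below illustrates the standard definition and the summation of several paralogs into a cumulative value. The read counts, lengths and library size are hypothetical, and the exact length model applied by the CLC Genomics Workbench is not reproduced here.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def cumulative_rpkm(genes, total_mapped_reads: int) -> float:
    """Sum the RPKM of several genes given as (reads, length_bp) pairs."""
    return sum(rpkm(reads, length, total_mapped_reads) for reads, length in genes)


# Hypothetical example: 4,560 reads on a 456 bp coding region
# in a library of 20 million mapped reads.
example_rpkm = rpkm(gene_reads=4560, gene_length_bp=456,
                    total_mapped_reads=20_000_000)
print(example_rpkm)

# Two hypothetical paralogs summed into one cumulative value.
example_total = cumulative_rpkm([(4560, 456), (9180, 459)], 20_000_000)
print(example_total)
```

Because RPKM is scaled by library size, values are best compared between tissues only when the libraries were prepared and mapped in the same way.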

Sequence analyses and phylogenetic inference

The vertebrate genomes available at ENSEMBL (https://www.ensembl.org/) and NCBI (https://www.ncbi.nlm.nih.gov/genome/) were searched for the presence of GbE genes. The lungfish GbE amino acid sequences were included in an alignment that covers the whole range of vertebrate globins and a broad range of classes25,32,45 (Supplemental Information Table 2). Adgb was excluded from the phylogenetic analyses because of its permutated globin domain13. A multiple sequence alignment of the amino acid sequences was obtained with the MAFFT online tool with the L-INS-i method46,47. Phylogenetic analysis was done with MrBayes 3.2.348,49 using the LG model of amino acid evolution50, which was selected using ProtTest51, and which was implemented into MrBayes with the general time reversible model as fixed prior and by specifying the aarevmatpr and statefreqpr options24,25. The program was run for 5,000,000 generations using the standard option (two independent runs with four simultaneous chains). Trees were sampled every 1000th generation, and the posterior probabilities were estimated after discarding the initial 25% of the trees.

Quantitative real-time reverse-transcription PCR

Reverse transcription was performed with 1 µg of total RNA, oligo-(dT)18 oligonucleotides (10 µM) and 200 U SuperScript™ II RNase H Reverse Transcriptase (Invitrogen) according to the manufacturer’s protocol. Quantitative real-time RT-PCR (qRT-PCR) experiments were carried out on an ABI 7500 Real-Time PCR system (Applied Biosystems, Darmstadt, Germany) with the “ABI Power SYBR Green Master Mix”. The efficiency of the reaction was measured by the slope of a standard curve, deriving from tenfold dilutions of plasmids. Expression data were normalised according to the amount of total RNA. Further analyses were carried out employing the Microsoft Office Excel spreadsheet program.
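The relationship between the slope of a standard curve and the amplification efficiency can be illustrated with a short Python sketch. It uses the common relation E = 10^(−1/slope) − 1, where the slope comes from a linear regression of threshold cycle (Ct) on log10 template amount. The exact calculation used in this study is not stated, and the dilution series below is idealised, hypothetical data.

```python
import math


def linear_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def amplification_efficiency(slope: float) -> float:
    """Efficiency from a standard-curve slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Idealised tenfold plasmid dilutions (hypothetical): with perfect doubling,
# Ct drops by log2(10), about 3.32 cycles, per tenfold increase in template.
log10_copies = [3, 4, 5, 6, 7]
ct_values = [40.0 - math.log2(10) * v for v in log10_copies]

slope = linear_slope(log10_copies, ct_values)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.3f}, efficiency = {efficiency:.1%}")
```

With real data, the slope usually deviates from the ideal value, and the resulting efficiency can then be used to correct relative expression values.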

In situ hybridisation

Ovary tissue was extracted from L. paradoxa and embedded in Tissue Tek O.C.T compound (Sakura Finetek) on dry ice, and then stored in a −80 °C freezer for subsequent cryosectioning. Then, 20 µm sections were produced on ColorFrost Plus microscope slides (Thermo Fisher Scientific) on a cryostat at −20 °C. Sections were fixed according to a previously established protocol52. After drying at room temperature, slides were stored in a −80 °C ultrafreezer. Hematoxylin (Sigma-Aldrich) and eosin (Sigma-Aldrich) staining were performed according to a standard protocol. A pGEM-T vector (Promega, Mannheim, Germany) containing GbE1 of L. paradoxa served as a template in PCR amplification using forward and reverse M13 primers (reverse: 5′-CAGGAAACAGCTATGAC-3′; forward: 5′-GTAAAACGACGGCCAG-3′). PCR was performed in 50 µL reaction volumes containing 39.2 µL of RNase-free water, 1.5 µL of 10 mM MgCl2, 5 µL of 10x buffer, 1 µL of each primer (0.5 M), 1 µL of dNTP mix (10 mM), 0.3 µL of Taq DNA Polymerase, and 1 µL of DNA template. The temperature profile consisted of preheating at 94 °C for 3 min, 32 cycles of denaturation at 94 °C for 45 s, annealing at 56 °C for 30 s, and extension at 72 °C for 90 s, followed by a final extension step at 72 °C for 10 min. Sense and antisense riboprobes were synthesised using T7 RNA and Sp6 RNA polymerases, respectively, and DIG-labelling mix (Roche). The riboprobe reaction (Life Technologies) was performed in 20 µL reaction volumes containing 0.5 µL of RNase inhibitor, 2 µL of DTT (0.01 M), 2 µL of DIG, 2 µL of 10x reaction buffer, 5 µL of template, 2 µL of Sp6/T7 enzyme mix, and 6.5 µL of nuclease-free water. In situ hybridisation was performed according to a previously established protocol53, using 300 ng/slide of DIG-labelled riboprobe. Slides were imaged using a Nikon SMZ1500 microscope (Nikon Digital Sight DS-Ri1).

Identification of GbE proteins by mass spectrometry

Total proteins were isolated from the ovaries of L. paradoxa and P. aethiopicus. The tissues were homogenised in 10 mM Hepes-buffer with a Minilys Personal Homogeniser (Bertin Instruments, Bretonneux, France) 3 times for 30 s. Then, DNA and RNA were destroyed by sonication (Bandelin Sonopuls, Berlin, Germany). After centrifugation at maximum speed (13,000 × g, 10 min, 4 °C), the supernatant was collected. Total protein from each ovary was separated on a 15% SDS-polyacrylamide gel. The gel was stained with Coomassie brilliant blue. Putative GbE bands were excised and analysed by liquid chromatography-mass spectrometry using a commercial service (Core Facility Mass Spectrometric Proteomics, University Medical Center Hamburg-Eppendorf, Germany).

Preparation of recombinant lungfish GbE protein

The coding sequence of GbE1 of L. paradoxa was amplified by RT-PCR and then cloned into the pET16b expression vector (Novagen - Merck Biosciences, Darmstadt, Germany). Recombinant expression was done in E. coli BL21(DE3)pLysS cells (Promega, Mannheim, Germany), which were grown overnight at 37 °C in 5 ml L-medium (1% bactotryptone, 0.5% yeast extract, 0.5% NaCl, pH 7.5) containing 10 µg/ml ampicillin and 34 µg/ml chloramphenicol. The culture was transferred to 500 ml L-medium supplemented with 1 mM δ-aminolevulinic acid. The culture was induced at OD600 = 0.4 to 0.8 by the addition of isopropyl-1-thio-β-D-galactopyranoside (final concentration 0.4 mM), and expression was continued at 30 °C overnight. Cells were collected by 45 min centrifugation at 4,000 × g and resuspended in 50 mM Tris-HCl, pH 8.0, 1 mM MgCl2, 1 mM dithiothreitol, 10 µg/ml DNase, 5 µg/ml RNase, Complete™ proteinase inhibitor mix (Roche Applied Science) and Pefabloc (Roth). The cells were broken by three freeze-thaw cycles in liquid nitrogen followed by ultrasonication (10 × 30 s). DNA and RNA were digested for 2 h at 37 °C. The cell debris was removed by centrifugation for 1 h at 4 °C at 4,500 × g. The recombinant GbE1 was purified from the supernatant using His60-Ni columns (Qiagen) according to the manufacturer’s instructions. The His-tag was removed by incubation with the Factor Xa protease (20 µg/ml) for 6 h at 37 °C. After inactivation of the protease with 2 µM dansyl-glu-gly-arg-chloromethyl ketone, the recombinant GbE1 protein was brought to 10 mM Hepes, pH 7.8, 0.5 mM EDTA.

Gel-filtration was carried out with an ÄktaPure chromatographic system (GE Healthcare, Freiburg, Germany) equipped with a Superdex 75 10/300 column (GE Healthcare). The proteins were eluted with 50 mM potassium phosphate, 0.5 mM EDTA, pH 7.0, 0.15 M NaCl at a flow rate of 1 mL/min. The absorbance was measured at 280 and 415 nm. Recombinant GbE1 was applied at a concentration of 0.2 mM heme; human Hb and horse Mb were employed as references at the same concentration.

Spectroscopic studies and O2 binding curves

Absorbance spectra of purified GbE1 were taken in the range 350–650 nm. The deoxy-form was obtained by adding sodium dithionite. O2 equilibrium curves were determined using a modified diffusion chamber technique described previously54,55,56. Samples were measured at least in duplicate, using 5 µL aliquots (~0.2 mM heme) in 0.1 M Hepes, 0.5 mM EDTA, pH 7.2 at 20 °C. Ferric GbE1 was reduced for 5 to 10 min in N2 with the met-Hb reductase system57.

In the modified diffusion chamber technique, water-saturated gas mixtures of O2 and ultrapure (>99.998%) N2 were generated by a GMS 500 gas mixing system (Loligo System, Denmark) and used to equilibrate the thin smear of 5 µL GbE1 sample with stepwise increments of oxygen tension (PO2). Absorbance traces were sampled at 436 nm by a photomultiplier (model RCA 931-A) and an Eppendorf model 1100 M photometer. The signal was digitised, and saturation values were obtained using in-house software58. P50 and cooperativity values were calculated from the zero intercept and slope of Hill plots, respectively. Each curve consisted of four to six saturation steps.
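The Hill-plot calculation described above can be sketched in Python as follows. In a Hill plot, log10(Y/(1 − Y)) is regressed on log10(PO2); the slope gives the cooperativity coefficient n, and the PO2 at which the fitted line crosses zero gives P50. The function names are hypothetical, and the saturation data are synthetic values generated from a Hill equation rather than measurements; the in-house software used in this study is not reproduced.

```python
import math


def hill_fit(po2, saturation):
    """Return (P50, n) from a linear fit of the Hill plot.

    po2: oxygen tensions (any consistent unit, > 0)
    saturation: fractional saturations, strictly between 0 and 1
    """
    if any(not 0.0 < s < 1.0 for s in saturation):
        raise ValueError("saturation values must lie strictly between 0 and 1")
    xs = [math.log10(p) for p in po2]
    ys = [math.log10(s / (1.0 - s)) for s in saturation]
    n_pts = len(xs)
    mean_x = sum(xs) / n_pts
    mean_y = sum(ys) / n_pts
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # cooperativity coefficient n
    intercept = mean_y - slope * mean_x
    p50 = 10 ** (-intercept / slope)       # PO2 where log10(Y/(1-Y)) = 0
    return p50, slope


def hill_saturation(p, p50, n):
    """Fractional saturation predicted by the Hill equation."""
    return p ** n / (p50 ** n + p ** n)


# Synthetic five-step curve (not measured data), in torr.
po2_torr = [0.4, 0.8, 1.6, 3.2, 6.4]
synthetic_sat = [hill_saturation(p, 1.2, 1.19) for p in po2_torr]

p50_fit, n_fit = hill_fit(po2_torr, synthetic_sat)
p50_kpa = p50_fit * 0.133  # 1 torr = 0.133 kPa, as used in the Results
print(f"P50 = {p50_fit:.2f} torr ({p50_kpa:.2f} kPa), n = {n_fit:.2f}")
```

With only four to six saturation steps per curve, the fitted slope is sensitive to points near 0 or 100% saturation, which is one reason replicate curves are useful.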

Nitrite reductase activity

The reaction of recombinant deoxy GbE1 (10 µM heme in deoxygenated 50 mM HEPES, pH 7.13) with nitrite (0.1 mM) was carried out under pseudo-first-order conditions at 20 °C59,60. Before adding nitrite, the GbE1 was anaerobically titrated with 1 mM of sodium dithionite in a 1 cm cuvette sealed with a rubber cap. The measurement of the reaction kinetics was started immediately, and absorbance traces were recorded at 435 nm and 20 °C.
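Under pseudo-first-order conditions (nitrite in excess over heme), the observed rate constant divided by the nitrite concentration gives a second-order rate constant in M−1 s−1. The Python sketch below shows this arithmetic, assuming a single-exponential absorbance trace with a known end-point absorbance. The trace, the end point and the rate values are hypothetical, and the actual fitting procedure of this study may differ.

```python
import math


def observed_rate(times_s, absorbance, a_inf):
    """Estimate k_obs (s^-1) from a log-linear fit of |A(t) - A_inf| vs time.

    Assumption: the trace is a single exponential and A_inf is known.
    """
    ys = [math.log(abs(a - a_inf)) for a in absorbance]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in times_s)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, ys))
    return -sty / stt  # decay constant is the negative slope


def second_order_rate(k_obs, nitrite_molar):
    """Second-order rate constant (M^-1 s^-1) under pseudo-first-order conditions."""
    return k_obs / nitrite_molar


# Hypothetical trace: A(t) = A_inf + (A0 - A_inf) * exp(-k_obs * t)
times = [0, 100, 200, 300, 400]            # seconds
a0, a_inf, k_true = 0.80, 0.50, 0.002      # illustrative values only
trace = [a_inf + (a0 - a_inf) * math.exp(-k_true * t) for t in times]

k_obs = observed_rate(times, trace, a_inf)
k2 = second_order_rate(k_obs, nitrite_molar=1e-4)  # 0.1 mM nitrite
print(f"k_obs = {k_obs:.4f} s^-1, k = {k2:.1f} M^-1 s^-1")
```

The division by nitrite concentration is only meaningful while nitrite stays in clear excess, so that its concentration is effectively constant during the reaction.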