Lysyl oxidases (LOX) are a family of copper-dependent amine oxidases for which important roles in cancer and in vascular and fibrotic diseases have been proposed1. Five different LOX enzymes have been identified in mammals (LOX and LOX-like 1 to 4), showing a high degree of homology in the catalytic carboxy-terminal end and more divergence in the rest of the sequence2. While intracellular functions have been reported for LOX proteins, the primary role of this family of enzymes is the remodeling of the extracellular matrix (ECM), owing to their capacity to convert lysine and hydroxylysine residues in collagens and elastin into highly reactive aldehydes, which eventually condense with other oxidized groups or intact lysines to form a variety of inter- and intrachain cross-linkages. The fundamental role of LOX proteins in ECM homeostasis has been demonstrated in experiments with mice lacking the LOX gene, which die just before or soon after birth from severe cardiovascular malformations, most likely involving defective elastogenesis3. Moreover, mice deficient in LOXL1, the closest mammalian paralog of LOX, also exhibit cardiovascular defects, although they are perfectly viable and show a normal life span4. The remaining members (LOXL2-4) share the presence of four scavenger receptor cysteine-rich (SRCR) domains, a unique class of ancient, highly conserved polypeptide modules present in a number of soluble and membrane-bound proteins for which no unifying function has so far been defined5. Recent work has described the capacity of LOXL2 and LOXL4 to enhance collagen IV deposition and assembly6,7. Nevertheless, it remains to be defined how these ECM remodeling capabilities fit together with the intracellular actions described for some of these SRCR-containing LOX members, such as the role of LOXL2 in the regulation of gene transcription8,9.

It is beyond doubt that the numerous evolutionary transitions from unicellular to multicellular organisms that occurred within eukaryotes could never have happened without their organization into extracellular structures. In contrast to sessile algae, fungi and plants, which acquired a comparatively uniform composition in their cell walls, animals exhibit a complex and heterogeneous ECM, with multiple protein families involved in the construction of intricate structural networks, as well as many protein complexes devoted to intercellular adhesion and communication10. Recent genome data have revealed that some of the large, secreted, multidomain ECM components, including basement membrane-forming collagen IV and fibrillar collagens, appear to be specific to the Metazoa11. Nevertheless, important domains from ECM proteins have a pre-metazoan origin. For instance, the filasterean Capsaspora owczarzaki, a close relative of Metazoa, has protein domains related to laminin and fibronectin, as well as a complete integrin adhesome12,13,14. Furthermore, choanoflagellates harbor many collagen motifs and domains otherwise specific to animals, such as the repeated GXY triple helical motif (even though these organisms lack fibrillar collagen)15. Domain shuffling of ancestral, pre-metazoan domains on the metazoan stem lineage has been proposed to give rise to the fibril-forming collagens, which are conserved throughout the metazoan evolutionary tree16,17. The same is true for collagen IV18,19. From these “founder genes”, rounds of gene duplication and domain or exon shuffling have resulted in the formation of different classes, currently comprising 28 collagen genes in vertebrates, which play structural roles in soft tissues or act as templates for biomineralisation in bone or teeth17,20. However, this family expansion has not been universal for all metazoans. For example, Drosophila lacks fibrillar collagens, which were most likely secondarily lost21.
Remarkably, chordates and, specifically, vertebrates have witnessed a significant number of ECM innovations, including not only the duplication of pre-existing deuterostome genes but also the generation of complex forms of collagen (transmembrane collagens, FACIT collagens, among others) or of specific protein innovations22. In particular, elastin is one of the vertebrate-specific ECM novelties and has played a fundamental role in the evolution of a high-pressure, pulsatile blood circulation system23.

Very limited information is available about the existence of LOX isoforms in non-bilaterian animals or other organisms. LOX-generated cross-links have been isolated from a sponge (Haliclona oculata), a sea urchin (Strongylocentrotus droebachensis), a sea cucumber (Thyone briarius), as well as from several annelids, echinoderms and molluscs24,25. Additionally, arthropods like Drosophila have been reported to have two distinct LOX-like genes, whereas some chordates such as the cyprinid Danio rerio (zebrafish) present up to 10 LOX genes26,27,28. A preliminary phylogenetic analysis of LOX genes revealed that human LOX and LOXL1 share a common ancestor and form an independent group from LOXL2, LOXL3 and LOXL4, being likely related to the Ciona intestinalis LOX1 and LOX2, respectively22. However, we lack an understanding of the evolutionary origin of the members of the LOX family and how they relate to the evolution of the main ECM components such as collagens and elastin.

Here we have surveyed a wide selection of genomes representing all the major eukaryotic and prokaryotic clades, aiming to reconstruct the evolutionary history of LOX enzymes. Our phylogenetic analyses, based on the conserved lysyl oxidase domain of LOX enzymes, show that LOX sequences are identifiable not only in animals, but also in many other eukaryotes, as well as in bacteria and archaea. This points to a much older origin than previously thought for LOX enzymes, preceding the origin of animals21. Our phylogenetic analyses show a significant expansion of LOX types during metazoan evolution, giving rise to three LOX families in Porifera (sponges) and two superfamilies in Eumetazoa (bilaterians and cnidarians). The LOXL2/L3/L4 superfamily is typically associated with SRCR domains, whereas the LOX/L1/L5 superfamily displays distinct N-terminal domains and is related to the mammalian LOX and LOXL1. Based on the existing knowledge on the evolution of collagens and elastin, we propose here that LOXL2/L3/L4 members might contribute to the cross-linking of basement membrane collagen IV, whereas LOX/L1/L5 proteins may have evolved to cover the requirements of the more sophisticated ECM in chordate/vertebrate phyla.


The prokaryotic history of LOX enzymes

Figure 1 shows phylogenetic analysis of LOX enzymes in prokaryotes and eukaryotes (panel A, unrooted tree) and Holozoa only (panel B, using ichthyosporean LOX as tree root). The network of reciprocal blast hits with indication of their score is shown in Fig. 2. Complete phylogenies are shown in Supplementary Files S1 to S4, sequences in Files S5 and S6.

Figure 1

Phylogenetic trees of LOX enzymes in eukaryotes and prokaryotes.

A) Unrooted tree of 154 LOX domains from eukaryotic and prokaryotic genomes as inferred by Bayesian inference. B) Rooted tree of 129 LOX domains from an expanded selection of holozoans (animals and their unicellular relatives, see grey-shadowed area of part A), as inferred by Bayesian inference. Nodal support values are shown at key branches (Maximum Likelihood bootstrap support/Bayesian posterior probabilities). Sequences are color-coded according to their taxonomic assignment. The consensus protein domain architectures of each LOX family are shown adjacent to each phylogeny, including Pfam domains (green boxes), proline-rich and propeptide regions (blue), transmembrane regions (pink), signal peptide motifs (orange) and the InterPro 019828 motif (red asterisk). The trees are not to scale. See Supplementary Figures S1, S2, S3 and S4 for detailed versions of these phylogenies, including scaled branches and complete nodal support.

Figure 2

Network of reciprocal BLAST searches for LOX enzymes.

Each node represents a LOX-containing protein. Nodes are connected by edges when they are reciprocal BLAST hits of each other (see Methods). Nodes are color-coded according to their taxonomic assignment (for some clusters of interest, further taxonomic details are also shown). Edges are color-coded according to the E-value of each BLAST hit.

Besides the eukaryotic LOX enzymes, our survey identifies for the first time LOX in both Archaea and Bacteria. In particular, LOX-coding genes are widely distributed in Bacteria, being present in five major clades: Bacteroidetes, Actinobacteria, Proteobacteria, Gemmatimonadetes and Deinococcus-Thermus (Fig. 1A). In contrast, the archaeal LOX homologs cluster into two separate groups of thaumarchaeotes and euryarchaeotes (Fig. 1A). In fact, each of these archaeal groups is associated with bacterial LOX and appears to be composed of sequences from phylogenetically close organisms (Supplementary Fig. S1 and S2). This suggests that thaumarchaeotes and euryarchaeotes could have acquired LOX through two independent horizontal gene transfer (HGT) events from bacteria (Figs. 1A and 2), although identification of the bacterial donors is required to confirm this hypothesis.

Finally, it is interesting to note that, in contrast to eukaryotic LOX, most (except three) of the identified prokaryotic sequences exhibit simple protein domain architectures with just the LOX domain, with or without signal peptide and/or transmembrane region.

LOX in unicellular eukaryotes

Our data also show the presence of LOX enzymes in different eukaryotic non-metazoan lineages (Fig. 1). Specifically, we identified LOX genes in the genomes of some Amorphea/Unikonta taxa (including animals, fungi and a number of unicellular clades), as well as in the Rhodophyta (red algae, from the Diaphoretickes supergroup).

The phylogenetic analysis of LOX recovers a major clade that includes opisthokont LOX homologs (all known animal enzymes, fungi and ichthyosporeans) together with a number of environmental metagenomic sequences (Fig. 1; BS 73%, BPP 0.99). Within fungi, we identify LOX homologs in the chytrid Spizellomyces punctatus and the monoblepharidomycete Gonapodya prolifera. Ichthyosporeans, which are a group of unicellular organisms closely related to animals29, also have the most animal-like LOX genes according to our phylogeny (Fig. 1). They have two sets of LOX, one of which (LOXOb) has acquired C-terminal Kringle, PLAT and Notch protein domains (Fig. 1A). While the function of LOX in ichthyosporeans is at present unknown, the occurrence of the transmembrane region of Notch suggests some membrane-associated role akin to that of the SRCR-containing LOX of animals.

We also identified LOX homologs in the unicellular amoebozoan Acanthamoeba castellanii and the rhodophytes Cyanidioschyzon merolae (unicellular algae) and Pyropia yezoensis (multicellular seaweed). However, they could not be unambiguously classified to any specific group, probably due to either low statistical support (A. castellanii and C. merolae) or insufficient data (P. yezoensis). According to the network of reciprocal BLAST (Fig. 2), the C. merolae LOX and the 4 copies of A. castellanii (BS 98%, BPP 0.99) seem to be related to prokaryotic, environmental or fungal sequences, whereas P. yezoensis’ proteins cluster separately from the rest of the known LOX enzymes.

It is interesting to note that neither A. castellanii nor fungi have collagen-based ECM structures equivalent to those of animals. As for the multicellular seaweeds, they do have complex polysaccharide-based ECM, but do not possess collagen-based structures.

LOX diversification in animals

It is within animals where we found the greatest variety of LOX forms, with many duplications and frequent rearrangements of protein domain architectures (Fig. 1B).

We identified three groups of LOX enzymes specific to Porifera (sponges), termed LOXP1-3 (pink branches in Fig. 1B). Each of them has different protein domain architectures based on transmembrane SRCR domains, both N- and C-terminal. The LOXP1 family is only present in calcareous sponges (Sycon ciliatum and Leucosolenia complicata) and contains proteins with multiple domains, including not only SRCR but also MAM or Sushi. Given that LOXP1 is the earliest family present in animals, this means that the association between LOX and SRCR domains was already present at the origin of animals. The LOXP2 and LOXP3 families, both with the canonical N-terminal SRCR repeats, are present in demosponges (Amphimedon queenslandica), homoscleromorphs (Oscarella carmela) and calcareous sponges.

A duplication event at the origin of eumetazoans gave rise to two animal LOX superfamilies that, although not statistically supported, are recovered by both Maximum Likelihood and Bayesian inference analyses: LOX/L1/L5 (composed of homologs of human canonical LOX and LOXL1, plus the fish-specific LOXL5) and LOXL2/L3/L4 (homologs of human LOXL2, LOXL3 and LOXL4).

The LOX/L1/L5 superfamily (BS 15%, BPP 0.69) is present in cnidarians (dark orange branch in Fig. 1B), which have the ancestral SRCR-containing form, and in chordates (red and dark red branches), which lack SRCR domains (Fig. 1B, see also a cladogram with domain gain/loss in Fig. 3). At the origin of vertebrates, this superfamily gives rise to the LOX, LOXL1 and LOXL5 (exclusive to various fish clades) gene families. LOXL1 enzymes have an N-terminal proline-rich region, also conserved in LOXL5 but lost in canonical LOX. Canonical LOX and LOXL5, in turn, share an exclusive propeptide region (Fig. 1B).

Figure 3

Reconstruction of the evolutionary history of LOX enzymes and ECM across the tree of life.

The cladogram represents a consensus view of the eukaryotic tree of life (see Methods) with bacteria as outgroup. Each bold, colored line represents a LOX family (as indicated in the legend); its route along the tree represents the pattern of appearance and loss of that family in each taxonomic group. Dashed lines represent unclear phylogenetic relationships. Green- and red-colored boxes represent gains and losses of ECM features, respectively. The consensus protein domain architectures of each LOX family are shown adjacent to each taxonomic group, including Pfam domains (green boxes), proline-rich and propeptide regions (blue), transmembrane regions (pink), signal peptide motifs (orange) and the InterPro 019828 motif (red asterisk).

The LOXL2/L3/L4 superfamily (BS 14%, BPP 0.83) was lost in cnidarians and is only present in bilaterian genomes (Figs. 1B and 3). All the families retain the ancestral SRCR-containing form, with variations in the number of repeats (Fig. 1B). This is the only LOX family present in protostomes (arthropods, molluscs, annelids and platyhelminths) and ambulacrarian deuterostomes (hemichordates and echinoderms). It is also present in tunicates and cephalochordates. The vertebrate-specific LOXL2, LOXL3 and LOXL4 families originated after the divergence of Petromyzon marinus (sea lamprey), which retains the ancestral type. All of them have four N-terminal SRCR repeats.

Overall, vertebrates have the highest count of LOX enzyme types among eukaryotes, with five widespread families (canonical LOX, LOXL1, LOXL2, LOXL3 and LOXL4), one family specific to fishes (LOXL5, found in actinopterygian, sarcopterygian and cartilaginous fishes) and one specific to lampreys (LOXL2/L3/L4). These LOX types display five different protein domain architectures (Fig. 1B).

We could not identify any LOX gene in nematodes, nor in the placozoan Trichoplax adhaerens or the ctenophore Mnemiopsis leidyi.

Assessment of the catalytic activity of novel LOX homologs

The presence of LOX domains in previously unreported eukaryotes and prokaryotes raises the question of whether they are enzymatically active proteins. It has been demonstrated that LOX catalytic activity relies on the C-terminal domain of the protein, where two features are needed. The first is the core of histidines forming the copper binding site, the so-called “copper-talon”, which matches the conserved motif InterPro 019828 (WEWHSCHQHYHSMD in human LOX, Hsap_ENSP00000231004)30. The second is the pair of lysine and tyrosine residues involved in the association with the lysyl tyrosyl quinone (LTQ) cofactor (K320 and Y355 in Hsap_ENSP00000231004)31. These key amino acids are widely conserved in all the groups analyzed in our study (Fig. 4, see also Supplementary Files S7 and S8) with the exception of the rhodophyte C. merolae, which lacks the histidine core. This observation predicts that these LOX homologs can be enzymatically competent to oxidize substrates. Interestingly, the first histidine residue within the copper binding site (H289 in Hsap_ENSP00000231004) is conserved in animals and ichthyosporeans, but is not present in bacterial, fungal or amoebozoan sequences. Recent experimental evidence has provided useful information about whether the loss of this histidine residue can compromise the binding of copper and, therefore, the catalytic activity32. These authors sequentially mutated the histidines to alanine (which is incapable of binding copper) and showed that the substitution of the first histidine did not significantly alter the ability of the enzyme to bind copper and oxidize substrates. Based on this report, it can be predicted that the LOX domains identified in our work would display catalytic activity, as they possess the core of the three essential histidines and the residues implicated in the LTQ linkage.

Figure 4

Multiple sequence alignment of catalytic LOX domains.

Sixty LOX proteins representing all of the groups analyzed in our study were aligned to inspect conserved residues involved in catalysis. Conserved residues highlighted in red constitute the core of histidines forming the copper binding site within the InterPro 019828 motif (Lysyl oxidase). Note that the histidine depicted in orange within this motif is conserved in animals and ichthyosporeans, but not present in bacterial, fungal or amoebozoan sequences. Strictly conserved lysine and tyrosine residues involved in LTQ cofactor linkage are highlighted in blue.


Our results provide the most comprehensive up-to-date phylogenetic analysis of the family of LOX enzymes. A main conclusion is that the LOX domains are more widely distributed than previously thought, as we identify clear homologs in animals and other eukaryotes, as well as bacteria and archaea22.

Based on our phylogenetic analyses with a wide taxon sampling, we can reconstruct the evolution and diversification of LOX enzyme families in eukaryotes and prokaryotes. With respect to the eukaryotic LOX enzymes, we identify a group of ichthyosporean and fungal LOX homologs as the closest relatives to the known animal enzymes (Fig. 1). This clearly indicates that this amine oxidase enzyme family was already present in the opisthokont ancestor, thus predating the origin of metazoans. Different scenarios could explain the origin of this opisthokont LOX according to our results. First, it could have been derived from an ancestral eukaryotic homolog from which the A. castellanii and C. merolae copies could have derived as well. Second, it could have been acquired by a horizontal gene transfer (HGT) event from bacteria to an ancestral opisthokont.

In order to understand the evolutionary history of LOX enzymes outside opisthokonts, we need to understand how LOX enzymes first appeared (in eukaryotes or prokaryotes) and whether HGT events took place (and when). However, the distribution of LOX cannot be conclusively explained by our phylogeny, as several non-exclusive scenarios would fit. For example, a potential explanation would be a bacterial origin of LOX, followed by a later transfer to eukaryotes (either by HGT or during the process of eukaryogenesis) and multiple secondary losses. Another possibility would be a later eukaryotic origin followed by a number of HGT events between eukaryotes and prokaryotes and within prokaryotes as well.

In support of the HGT-driven scenarios, the genomes of A. castellanii and C. merolae are both known to have experienced multiple HGTs from bacteria, and the same is true for amoebozoan genes being transferred to prokaryotes33,34,35. It is worth noting that HGT of metabolic genes from prokaryotes is an important factor underlying the diversification of eukaryotes, particularly in the case of amoebas such as A. castellanii or a hypothetical amorphean ancestor12,33,36. If this were the case, the acquisition of LOX by an ancestral microbial eukaryote would have had an important, delayed effect on the evolution of the ECM, as it facilitated the appearance of the current enzyme types essential for its formation.

The presence of LOX enzymes in bacteria raises the question of the function of LOX within these organisms. Several collagen-like proteins have been identified in bacteria and, for some of them, the formation of a stable triple helix has been demonstrated21,37. Some of the best characterized bacterial collagen-like proteins are the streptococcal Scl1 and Scl2, which are expressed on the cell surface of group A Streptococcus and contribute to bacterial pathogenicity through binding to host ECM components including integrins and fibronectin38,39. Our analysis did not identify LOX isoforms in members of the Streptococcus genus, but did identify them, for example, in a number of Streptomyces species, for which collagen-like sequences have also been genome-annotated (see, for instance, UniProt entries D9WI30 or D6B4A5). Nevertheless, a higher order structure reminiscent of intra- or interchain covalent association has not yet been described for bacterial collagen-like proteins, making it unlikely that LOX cross-links bacterial collagenous material. While more studies are needed to elucidate the function of bacterial LOX enzymes, it can be hypothesized that LOX proteins may be a component of the enzymatic repertoire of bacterial metabolism that was transferred to eukaryotes and adapted to new functions, as suggested to have occurred, for instance, with the epigenetic machinery40. Interestingly, collagen-like proteins present in bacteria have also been proposed to originate from an HGT event from metazoans to bacteria41.

Current views of the evolution of the animal ECM envision its constitution as the result of a gradual appearance of specific gene families and domains in pre-metazoan lineages, followed by remarkable expansions in animals. This is best exemplified by the presence of a fully functional integrin adhesome in C. owczarzaki, a unicellular filasterean with aggregative behavior that also has proteins with laminin and fibronectin motifs (although with different domain architectures than their animal counterparts)12,13,14,42,43. This is also the case for the choanoflagellates Monosiga brevicollis and Salpingoeca rosetta, which have proteins with collagen and laminin domains (also without clear homologs in animals)14,15,44. Further refinement of these pre-existing protein families and the appearance of Metazoa-specific innovations provided the chordates and vertebrates with a wider repertoire of ECM proteins to fulfill novel functions in the vasculature or in the nervous system18.

Our phylogenetic analysis of LOX revealed a relatively similar pattern of evolution: LOX domains were already present in unicellular eukaryotes (notably in the ichthyosporeans, that are closely related to Metazoa) and further expanded during metazoan evolution. Interestingly, unicellular organisms such as the ichthyosporeans Sphaeroforma arctica, Creolimax fragrantissima, Pirum gemmata and Abeoforma whisleri or the amoebozoan Acanthamoeba castellanii, display forms of LOX associated with domains thought to serve extracellular protein-protein interactions, for example PKD, Kringle or PLAT (with or without the presence of transmembrane regions), much in the same role that SRCR has been postulated to play in SRCR-containing LOX forms2.

According to our study, SRCR domains first associated with LOX proteins in Metazoa, specifically in sponges (see Fig. 3). The SRCR domains in sponges are present at both the N- and C-terminus, with and without association with other protein domains, such as MAM or Sushi. Adult sponges consist of two layers of cells with epithelial features supported by a central cavity, the mesohyl, consisting of rigid material. Fibrillar and basement membrane collagens have been identified in the mesohyl and in the lamina where the two layers of cells attach, respectively45,46. Therefore, sponges constitute the first class of organisms where LOX enzymatic activities might have begun to sculpt the ECM. Whether LOX may have provided Porifera with novel capabilities such as spicule biomineralization or the body stiffening required for efficient water flow is at present unknown. It is worth mentioning that neither Ctenophora nor Placozoa have LOX genes. The origin of the eumetazoans witnessed the main branching of LOX isoforms, giving rise to the LOXL2/L3/L4 and LOX/L1/L5 superfamilies (Fig. 1B and Fig. 3). The former kept the SRCR-LOX architecture invariably from arthropods to vertebrates, with minimal variations in the number of SRCR domains. The observation that this class of LOX is present in arthropods such as Drosophila melanogaster, which lacks fibrillar collagen, suggests that these LOX isoforms might preferentially (but not exclusively) cross-link basement membrane collagen IV, thereby controlling ECM stiffness, as recently described26,47. In fact, collagen IV cross-linking activities for mammalian LOXL2 and LOXL4 have recently been reported6,7. Nevertheless, intracellular functions beyond matrix cross-linking have also been reported for LOXL enzymes; for instance, LOXL2 has been implicated in transcriptional regulation and in the control of cell cycle and apoptosis8,9.

In contrast to LOXL2/L3/L4, the LOX/L1/L5 superfamily experienced significant changes in domain architecture during evolution. While forms present in cnidarians retain SRCR domains, LOX/L1/L5 from tunicates and cephalochordates show no recognizable associated domains, and chordates and vertebrates display forms with the propeptide and proline-rich regions typical of mammalian LOX and LOXL1 (Fig. 1B). As shown in Fig. 3, the appearance of LOX isoforms with these domain architectures is coincident with a significant expansion of vertebrate-specific ECM innovations, a circumstance reinforcing their widely accepted role as catalyzers of lysine-derived cross-links in fibrillar collagens and elastin. In this respect, LOX and LOXL1 have been reported to interact with tropoelastin through sequences in the N-terminal pro-regions48. Although the specific motifs within the pro-regions of LOX and LOXL1 that drive the association with elastin are not known, significant homology exists at the N-terminal sequence to support this interaction. Additionally, strong binding to fibulin-4 and fibulin-5 has been reported for LOX and LOXL1, respectively4,49. Fibulin-4 and -5 are essential proteins for the assembly of elastic fibers, and their interaction with LOX isoforms seems to facilitate the cross-linking of tropoelastin within elastic fibers50. Based on these observations, it can be inferred that LOX and LOXL1 forms evolved to contribute to elastogenesis, an assumption further reinforced by the result of the inactivation of these genes in mouse models, both giving rise to vascular phenotypes due to impaired elastic fiber formation3,4.

It is interesting to mention that LOX and LOXL1 are proteolytically processed by bone morphogenetic protein 1 (BMP1)/Tolloid-like metalloproteinases51,52,53,54. First identified as pro-collagen C-proteinases, this family of proteolytic enzymes has been described to cleave a wide repertoire of substrates55. It is worth mentioning that, with the exception of apolipoprotein 1 and gliomedin, which play unique roles in lipid metabolism and the peripheral nervous system, respectively, BMP1 substrates belong to the category of ECM proteins or ECM-related factors, including fibrillar procollagens, small leucine-rich proteoglycans, basement membrane components and mineralization factors, among many others55. The fact that LOX and LOXL1 forms are also cleaved by BMP1-related proteases suggests that the primary function of these LOX forms is matrix-oriented. LOX and LOXL1 need to be processed to yield the catalytically active forms. Therefore, it is conceivable that the proteolysis step serves as a quality control step that keeps the LOX enzyme in a latent state until the proper substrate is encountered.

Another important vertebrate ECM innovation is fibronectin, an adhesive protein involved in many cellular responses with a significant role in wound healing56. In this context, the formation of a fibronectin matrix is critical for the subsequent assembly of types I and III collagen fibrils. The canonical LOX has been reported to interact with fibronectin through sequences both in the pro-region and in the C-terminal region57. In fact, fibronectin may also contribute to the processing of the pro-enzyme, as fibronectin scaffolds support BMP1 binding through periostin58,59. Taken together, this evidence points to a significant role for LOX and LOXL1, through their associated domains, in chordate/vertebrate-specific ECM building, particularly in the circulatory system and during tissue repair. Within these functions, it is interesting to note that LOXL5, present in early-branching vertebrate clades of fishes (Actinopterygii, Chondrichthyes and Sarcopterygii), contains both the proline-rich and propeptide regions. Thus, fishes retain both functionalities in the same enzyme, whereas its sister LOX family, present in the other vertebrates, has lost the proline-rich region. This probably reflects the specialization of the canonical LOX in particular functions in non-fish vertebrates.

In conclusion, our phylogenetic analysis of LOX proteins allows us to trace the evolution of this family of enzymes, particularly in the context of the acquisition of the ECM components collagen and elastin. Fig. 3 illustrates the appearance of LOX proteins within the elaboration of ECM components during eukaryotic evolution. Remarkable events include: 1) the presence of LOX forms in unicellular eukaryotes, associated with several domain architectures presumably serving extracellular protein-protein interactions; 2) the acquisition of SRCR domains as a specific feature of animals, presumably coincident with the appearance of true ECM in early metazoans; and 3) the generation of chordate/vertebrate LOX forms possibly supporting novel ECM innovations such as elastin and fibronectin.


Taxon sampling and sequence retrieval

LOX sequences were queried in complete genome or transcriptome sequences of 117 eukaryotic taxa representing all known eukaryotic supergroups, as well as all the major metazoan clades. Taxon sampling includes 37 metazoans, 10 unicellular holozoans, 24 fungi, 2 nucleariids, 1 apusozoan, 4 amoebozoans, 7 plants, 5 chlorophytes, 3 rhodophytes, 1 glaucophyte, 8 heterokonts, 6 alveolates, 1 rhizarian, 1 haptophyte, 1 cryptophyte and 6 excavates (Supplementary Tables S1 and S2, list of sequences in Files S5 and S6). Prokaryotic sequences were queried in the NCBI non-redundant database and the Microbial Dark Matter Project database60. The proteins with LOX domains were retrieved from the complete proteomes with HMMER61, using a hidden Markov model profile of the LOX domain as defined by Pfam (PF01186)62. These proteins were inspected using Pfamscan and manual alignments to assess the presence of protein domains including those found in mammalian LOX, such as the proline-rich and propeptide motifs, or scavenger receptor cysteine-rich domains62.
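As a quick consistency check on the taxon sampling described above, the per-group counts can be tallied against the stated total of 117 eukaryotic taxa. The following is only an illustrative sketch: the dictionary and function names are assumptions introduced here, while the counts themselves are those listed in the text.

```python
# Per-group taxon counts as listed in the text.
# The dictionary layout and names are illustrative, not part of the original analysis.
TAXON_COUNTS = {
    "metazoans": 37,
    "unicellular holozoans": 10,
    "fungi": 24,
    "nucleariids": 2,
    "apusozoans": 1,
    "amoebozoans": 4,
    "plants": 7,
    "chlorophytes": 5,
    "rhodophytes": 3,
    "glaucophytes": 1,
    "heterokonts": 8,
    "alveolates": 6,
    "rhizarians": 1,
    "haptophytes": 1,
    "cryptophytes": 1,
    "excavates": 6,
}


def total_taxa(counts):
    """Return the total number of sampled taxa across all groups."""
    return sum(counts.values())


if __name__ == "__main__":
    print(total_taxa(TAXON_COUNTS))  # prints 117
```

The tally agrees with the 117 eukaryotic taxa stated for the survey.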

Phylogenetic inference

The LOX domains (PF01186) of the retrieved sequences were aligned using the MAFFT 7 L-INS-i algorithm, optimized for local sequence homology63. Two alignments were produced: 1) one containing eukaryotic, bacterial and archaeal proteins (154 sequences, 217 aligned positions; using eukaryotes from Supplementary Table S1); and 2) another one with just animal and ichthyosporean proteins (129 sequences, 283 aligned positions; using animals from Supplementary Table S2). According to ProtTest 3.4 analyses of each alignment64, the most suitable evolutionary models were WAG+Γ+F and LG+Γ+I, respectively (“Γ” stands for a gamma distribution of among-site rate variation with 4 discrete categories; “I” means that a proportion of invariable sites is considered; and “F” means that empirical amino acid frequencies are inferred from the alignment). The phylogenetic trees of each of these alignments were inferred using the corresponding model of evolution, with two independent methods: Maximum Likelihood (ML) and Bayesian Inference (BI). ML trees were estimated with RAxML 8, starting from 100 random trees and selecting the best inference according to the Γ-based likelihood value65. Statistical support for bipartitions was estimated by performing 100 bootstrap replicates, using RAxML with the same evolutionary models. BI trees were estimated with Phylobayes 3.3 (ref. 66), which does not account for empirical amino acid frequencies nor invariable sites, running two parallel chains for each alignment. To decide when to stop the runs, we performed a series of bpcomp tests on each pair of chains every 5,000 generations, testing burn-in values at every 1% of the generations run so far. The final trees were built using the number of generations and burn-in values that yielded the lowest maxdiff statistic (provided it was under the 0.1 threshold recommended by Phylobayes), sampling every 10 trees. This resulted in 30,000 generations and a 5% burn-in for the animal and ichthyosporean alignment, and 60,000 generations and a 7% burn-in for the eukaryotic and prokaryotic alignment. Bayesian posterior probabilities (BPP) were used for assessing the statistical support of each bipartition. Using these phylogenetic trees, the evolution of LOX enzymes across eukaryotes and prokaryotes was reconstructed, based on a consensus tree of life drawn from different studies67,68,69.
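The stopping rule for the Bayesian runs (pick the combination of generations and burn-in with the lowest maxdiff, provided it is under 0.1) can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the authors' script: the `ChainCheck` record, the function name and the example values are hypothetical, and the maxdiff values are assumed to have already been obtained from bpcomp.

```python
from typing import NamedTuple, Optional, Sequence


class ChainCheck(NamedTuple):
    """One convergence check on a pair of chains (hypothetical record layout)."""
    generations: int  # generations run when the check was made
    burnin: float     # fraction of generations discarded as burn-in
    maxdiff: float    # maxdiff statistic obtained for this setting


def pick_stopping_point(checks: Sequence[ChainCheck],
                        threshold: float = 0.1) -> Optional[ChainCheck]:
    """Return the check with the lowest maxdiff, or None if it is not under the threshold."""
    if not checks:
        return None
    best = min(checks, key=lambda c: c.maxdiff)
    return best if best.maxdiff < threshold else None


if __name__ == "__main__":
    # Made-up values for illustration only; they are not results from the study.
    example = [
        ChainCheck(generations=25_000, burnin=0.05, maxdiff=0.21),
        ChainCheck(generations=30_000, burnin=0.04, maxdiff=0.12),
        ChainCheck(generations=30_000, burnin=0.05, maxdiff=0.08),
    ]
    print(pick_stopping_point(example))
```

Returning `None` when no setting is under the threshold makes the "keep running the chains" case explicit rather than silently accepting a poorly converged pair.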

Annotation of molecular features

The protein domain architectures of the retrieved sequences were analyzed using Pfamscan70. The full proteins were also analyzed with SignalP 4.1 (ref. 71) and TMHMM 2.0 (ref. 72) to search for signal peptide cleavage sites and transmembrane helical domains, respectively (default parameters in both cases). To assess whether the identified LOX domains can have catalytic activity, the InterPro IPR019828 conserved site was searched73. Proline-rich and propeptide regions were manually checked in the alignments. Annotations of molecular features are provided in Supplementary Files S7 and S8.
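To make the copper-binding site check concrete, the sketch below scans a protein sequence for a simplified histidine core and reports whether the first histidine is present. The regular expression is a hypothetical simplification derived only from the human LOX motif quoted in the text (WEWHSCHQHYHSMD); it is not the InterPro IPR019828 signature itself, and real homologs may differ at other positions.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern based on the human motif WEWHSCHQHYHSMD.
# Group 1 captures the position of the first histidine (H289 in human LOX);
# the three histidines after the cysteine are required for a match.
COPPER_SITE = re.compile(r"(.).CH.HYH")


def copper_site_status(seq: str) -> Optional[dict]:
    """Locate the simplified histidine core and flag the first histidine.

    Returns None when the core is not found; otherwise a dict with the
    0-based start of the match and whether the first position is a histidine.
    """
    match = COPPER_SITE.search(seq.upper())
    if match is None:
        return None
    return {"start": match.start(), "first_histidine": match.group(1) == "H"}


if __name__ == "__main__":
    print(copper_site_status("WEWHSCHQHYHSMD"))  # human motif from the text
    print(copper_site_status("WEWASCHQHYHSMD"))  # same motif with the first H replaced
```

Separating "core present" from "first histidine present" mirrors the distinction drawn in the text between the three essential histidines and the first histidine, which is absent from bacterial, fungal and amoebozoan sequences.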

Assessment of horizontal gene transfers

In addition to the information provided by phylogenetic inference, the possibility of horizontal gene transfer (HGT) events between taxa was tested using a reciprocal BLAST approach. Two sequences were considered to be connected if they were reciprocal BLAST hits of each other with an e-value < 10⁻¹⁰, when queried against a combined database consisting of the full NCBI non-redundant protein database, the Microbial Dark Matter database and our selected eukaryotic taxon sampling (see above). The network visualizations of the reciprocal BLAST hits were generated using Cytoscape 3.1.1, clustering the nodes using the built-in force-directed algorithm74.
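The reciprocal BLAST criterion can be illustrated with a short sketch that turns a table of directed hits into undirected network edges. Everything here is an assumption made for illustration: the data layout (a dictionary keyed by query and subject identifiers), the function name, the sequence identifiers and the cutoff constant, which is a placeholder to be replaced by the threshold actually used in the analysis.

```python
from typing import Dict, Set, Tuple

# Placeholder cutoff for illustration; substitute the threshold used in the analysis.
EVALUE_CUTOFF = 1e-10


def reciprocal_edges(hits: Dict[Tuple[str, str], float],
                     cutoff: float = EVALUE_CUTOFF) -> Set[Tuple[str, str]]:
    """Return undirected edges for pairs that hit each other below the cutoff.

    `hits` maps (query, subject) to the e-value of that directed hit.
    Each edge is stored once, with the two identifiers in sorted order.
    """
    edges = set()
    for (query, subject), evalue in hits.items():
        if query == subject or evalue >= cutoff:
            continue  # skip self-hits and hits that fail the cutoff
        reverse = hits.get((subject, query))
        if reverse is not None and reverse < cutoff:
            edges.add(tuple(sorted((query, subject))))
    return edges


if __name__ == "__main__":
    # Hypothetical identifiers and e-values, for illustration only.
    example_hits = {
        ("seqA", "seqB"): 1e-50,
        ("seqB", "seqA"): 1e-40,
        ("seqA", "seqC"): 1e-30,
        ("seqC", "seqA"): 1e-5,   # reverse hit too weak, so no edge
    }
    print(reciprocal_edges(example_hits))
```

The resulting edge set could then be exported as a simple two-column table for loading into a network viewer such as Cytoscape.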

Additional Information

How to cite this article: Grau-Bové, X. et al. Origin and evolution of lysyl oxidases. Sci. Rep. 5, 10568; doi: 10.1038/srep10568 (2015).