The evolution of BDNF is defined by strict purifying selection and prodomain spatial coevolution, but what does it mean for human brain disease?

Brain-Derived Neurotrophic Factor (BDNF) is an essential mediator of brain assembly, development, and maturation. BDNF has been implicated in a variety of brain disorders such as neurodevelopmental disorders (e.g., autism spectrum disorder), neuropsychiatric disorders (e.g., anxiety, depression, PTSD, and schizophrenia), and various neurodegenerative disorders (e.g., Parkinson’s, Alzheimer’s, etc.). To better understand the role of BDNF in disease, we sought to define the evolution of BDNF within Mammalia. We conducted sequence alignment and phylogenetic reconstruction of BDNF across a diverse selection of >160 mammalian species spanning ~177 million years of evolution. The selective evolutionary change was examined via several independent computational models of codon evolution including FEL (pervasive diversifying selection), MEME (episodic selection), and BGM (structural coevolution of sites within a single molecule). We report strict purifying selection in the main functional domain of BDNF (NGF domain, essentially comprising the mature BDNF protein). Additionally, we discover six sites in our homologous alignment which are under episodic selection in early regulatory regions (i.e. the prodomain) and 23 pairs of coevolving sites that are distributed across the entirety of BDNF. Coevolving BDNF sites exhibited complex spatial relationships and geometric features including triangular relations, acyclic graph networks, double-linked sites, and triple-linked sites, although the most notable pattern to emerge was that changes in the mature region of BDNF tended to coevolve along with sites in the prodomain. Thus, we propose that the discovery of both local and distal sites of coevolution likely reflects ‘evolutionary fine-tuning’ of BDNF’s underlying regulation and function in mammals. This tracks with the observation that BDNF’s mature domain (which encodes mature BDNF protein) is largely conserved, while the prodomain (which is linked to regulation and its own unique functionality) exhibits more pervasive and diversifying evolutionary selection. That said, the fact that negative purifying selection also occurs in BDNF’s prodomain also highlights that this region also contains critical sites of sensitivity which also partially explains its disease relevance (via Val66Met and other prodomain variants). Taken together, these computational evolutionary analyses provide important context as to the origins and sensitivity of genetic changes within BDNF that may help to deconvolute the role of BDNF polymorphisms in human brain disorders.


A Primer of the Molecular Biology of BDNF and its Functional Topology
BDNF is encoded by the BDNF gene [29], whose expression is regulated in humans by an antisense-gene (BDNF-AS) that can form RNA-duplexes to attenuate translation [30]. Thus, the natural antisense for BDNF is capable of directly downregulating endogenous expression on demand [31]. The BDNF gene in humans comprises 11 exons [30] and can produce at least 17 detectable transcript isoforms [29]. Different transcripts are induced in response to activity and/or cellular states, allowing the BDNF gene to adjust to environmental stimuli and potential selection pressures. However, all transcripts ultimately yield a singular preproBDNF protein that (prior to intracellular processing, cleavage, and transport) can be partitioned into three domains [11,29]: a signal peptide, a prodomain, and the mature domain. The signal domain is only 18 amino acid residues long, possessing ambiguously defined functionality, and the majority of BDNFs functional outputs reflect sequence specificity to the prodomain and mature domain. The BDNF prodomain encodes binding sites for intracellular transport of both BDNF mRNA [32] and BDNF protein [33], and contains numerous post-translational modification sites [29]. The BDNF prodomain is also the resident location of a widely studied Single Nucleotide Polymorphism (SNP) in neuroscience (Val66Met, or rs6265) [1], and the Furin consensus sequence (Arg 125) for cleavage to its mature form (including by plasmin [34]). The prodomain is composed of 110 amino acids within the N-terminus, and must be processed via proteases to generate mature BDNF [5]. The mature domain of BDNF is composed of, almost exclusively, the Nerve growth factor (NGF) domain and is responsible for the canonical trophic actions associated with BDNF, e.g., long-term potentiation, rapid acting antidepressant effects. Following intracellular handling, processing, and transport, the preproBDNF isoform is cleaved to yield the mature BDNF peptide (which only contains the mature NGF domain). For many years the prodomain was thought to be cleaved following transport and thus destined for degradation. However, recent work has shown that the cleaved prodomain can be secreted and bind as a ligand to novel receptors (e.g., SorCS2) [27]. Thus, the BDNF prodomain can accordingly influence brain circuits as well as behavior [7]. For a comprehensive, detailed, analysis of the various intricacies of the BDNF gene, protein, and its regulation, more information is provided in [29].

The Conservation of BDNF & Neurotrophins
One of the interesting curiosities surrounding BDNF is its relationship to other neurotrophic (NT) growth factors, comprising NGF, NT-3, NT-4. Thus, neurotrophins retain some intercalated functionality. Each share some commonalities in structure (pre-, pro-, and mature-domains) [29], post-translational modification potential (e.g. glycosylation [35]), as well as catalytic processing, trafficking, and composition [36]. Specifically, neurotrophins share approximately 50% sequence homology [29], and a comparison of domains and motifs reveals that each comprises a prototypic NGF-domain as the principal component of the mature pro-growth peptide for each factor (see PFAM database [37]). While each neurotrophin elicits functionality via binding to cognate receptors, neurotrophins also exhibit cross-affinity amongst neurotrophin receptors [38] presumably due to their high rates of structural homology. Not surprisingly then, there is some redundancy in the pro-trophic effects of neurotrophins, yet each still maintains nuanced functionality which remains specific to each factor during central nervous system development [39]. Differences in the evolution and temporal dynamics of regulatory sequences, which target gene-products to specific destinations within cell-compartments (e.g. dendrites) [40] or to processing routes (e.g. the activity-dependent release pathway) which alter secretory dynamics and/or bioavailability [41], likely contribute to both similarities and differences between neurotrophins. However, almost nothing is known of how the BDNF prodomain has evolutionarily adapted to specifically regulate BDNF dynamics. While evolution has almost certainly shaped the sequences, structure, and function of BDNF, the modeling of such remains relatively unexplored but could provide important insight into the phylogenetic evolutionary history of BDNF, it's selection pressure sensitivity across lineages, and quantitative metrics of evolutionary change across species.

Purpose of this Study
Here, we use computational methods to explore the comparative evolutionary genomics of the neurotrophic factor BDNF. By reconstructing phylogenetic trees of BDNF in mammals (Mammalia), we utilize sequence alignments of over 160 species to determine unique genomic attributes of BDNF. In specific we investigate which sites in BDNF are subject to pervasive (i.e., consistently across the entire phylogeny) diversifying selection (FEL) or pervasive/episodic (i.e., only on a single lineage or subset of lineages, diversifying selection (MEME). Likewise, utilizing multiple models for the inference of selective pressure and the evaluation of evolutionary change, we identify novel sites within the BDNF prodomain and mature peptide coding regions that are susceptible to synonymous and nonsynonymous changes. Additionally we investigate which sites in BDNF may be coevolving (BGM). Taken together, these computational evolutionary analyses provide an important context as to the origins and sensitivity of genetic changes within the BDNF gene, which may be important for providing insight into genetic risk factors linked to disease in humans.

Predominant Purifying Selection in BDNF
A common approach to gain an increased understanding of the evolutionary forces that have shaped proteins is to measure the omega ratio ω consisting of the non-synonymous (ꞵ or dN) and synonymous (α or dS) substitution rates with ω = ꞵ/α for each site in a particular gene of interest [47]. We define two major substitional changes for the amino acid being coded for at each site: synonymous changes, which keep the same amino acid coded for at a particular site and nonsynonymous changes, which change the amino acid coded for at that particular site. Nonsynonymous changes can have strong influences on the structural, functional, and fitness measures of an organism. This is in contrast to synonymous changes which leave the amino acid at a particular site unchanged, but can confer weak fitness effects through the emergent properties of codon usage bias, mRNA structural stability, translation and tRNA availability. However, synonymous changes are typically understood to represent neutral selection acting on coding sequences, and provides a baseline rate against which non-synonymous evolutionary rates can be compared. The omega ratio ω of relative rates of non-synonymous and synonymous substitutions is a common measure in evolutionary biology of the selective pressure acting on protein coding sequences. These estimates provide increased information availability as to the type of selection (positive, with omega > 1 or negative, with omega < 1, or neutral with omega = 1) that has acted upon any given set of protein-coding sequences.
As FEL analysis is a sensitive measure of negative (purifying) selection, for our FEL analysis, we observe a predominant amount of purifying selection (over 66% of sites, 174 sites out of 261; Table S1) in our recombination free alignment for BDNF. The dN/dS estimates for the entire alignment were plotted including 95% lower-and upper-bound estimates (see Figure 1 or Table S1). Overwhelmingly, the mature NGF-domain of the BDNF exhibited evidence of greater pervasive negative purifying selection relative to the prodomain region of BDNF. Thus over the evolutionary history in mammalia, negative selection has predominantly occurred in the regions of BDNF that encode the functional mature protein that binds TrkB to elicit neurotrophic effects. The mature domain of BDNF has exhibited remarkable conservation across innumerous epochs defined by rapid evolutionary adaptation (see Figure 1) in other genes and species. Additionally, we plot 95% confidence intervals (CI) for each site. These results are also available in Table S1. We observe a high degree of strict purifying selection in the Human NGF region. The region for Human NGF corresponds to alignment sites 144-254 (NP_001700.2, and https://github.com/aglucaci/AnalysisOfOrthologousCollections/blob/main/tables/BDNF/B DNF_AlignmentMap.csv). This alignment of BDNF across all selected species (Mammalia, see Table 1) reveals a site-specific positive/adaptive diversifying selection and negative purifying selection. The thick line represents the point estimate (i.e. the evolutionary pressure) and the shadings reflect 95% confidence intervals which relate to the upper and lower bound of the point estimates. As shown, the prodomain sites exhibit more pervasive/episodic and positive/diversifying evolutionary selection, consistent with the fact that more disease associated SNPs occur in this topological region of the BDNF gene in humans (i.e. early prodomain mapping not further shown due to nuanced variation across mammalian species).

Specific Sites that are Evolving Non-Neutrally
To examine specific sites for episodic adaptive evolutionary selection, we utilized an algorithm known as MEME which is fundamentally similar to our FEL analysis (described above) except that it applies a more sensitive method for the detection of both pervasive (persistent) and episodic selection (transient selection occurring only on one or a subset of branches in the phylogenetic tree) as compared to only pervasive selection which occurs across all branches of the phylogenetic tree. Essentially, only a subset of the lineages (i.e. species) are affected allowing for a more granular/sensitive method of detecting selection (whereas FEL is better geared towards broad changes). This analysis revealed that for all sites, only 2.3% (6 of 261; see Table  2) exhibit evidence for episodic diversifying selection (i.e. positive selection) in at least one branch within the phylogeny. Spatially, these mutations occur outside of the NGF functional region of BDNF. Further, this result is essentially relevant as the MEME analysis is a sensitive measure of episodic selection. The sites we observe as statistically significant are as follows: 26, 27, 30, 38, 249, 254. For comparison, these specific sites were re-aligned to the respective human sites with indel (insertion/deletion) events accounting for any respective discrepancy in specific site numbers. When mapping these sites to the human BDNF coordinate system, they correspond to: 26, 27, 29, 36, 238, 240, respectively. Here, we plot only the statistically significant co-evolving pairs. The number of shared substitutions between pairs of co-evolving sites is visualized by the size of the circle, with larger circles indicating more shared substitutions. Poster probability of the interaction (coevolving pair) corresponds to the color blue, with dark blue indicating higher values. Individual BDNF sites are mapped on both the X and Y axis so that readers can view which sites are coevolving with another. Once more, note that the coevolution tends to be focally constrained to the broader BDNF prodomain region at a topological level, which is once more consistent with the idea that the NGF domain (site >144; see Figure 3) is highly conserved and probably deleteriously impacted by variation. However, we did discover four sites of coevolution in the NGF domain (basically, the mature BDNF protein) that are evolving with early prodomain sites. This highlights that both proximal and distal sites in BDNF can, and indeed are, evolving together over time.

Evidence of Coevolutionary Forces
To examine the coevolution of sites, i.e. if one particular amino acid was evolving intandem with another, we subjected our protein-coding gene sequences to the BGM algorithm which leverages Bayesian graphical models [51]. The BGM algorithm infers substitution history through the use of maximum-likelihood analyses for ancestral sequences and maps these to the phylogenetic tree, which allows for the detection of correlated patterns of substitution [51]. For our BGM analysis, we find evidence for 23 pairs of coevolving sites. This suggests interaction dynamics in tertiary space of the 3D, folded, protein level (see relevant sites in Figure 2) BDNF protein structure. Or, otherwise, is evidence for coevolving sites due to other fitness consequences related to compensation for maladaptive changes in one part of the protein sequence that may have occurred. When we review these sites, we notice that several pairs (see Figure 2) occur within alignment sites which correspond to the Human BDNF coordinate system ( Table 2) . Additionally, threedimensional reconstruction -here focusing on a specific heterodimer configuration of BDNF and NT-4 as an example of a spatial protein-protein interaction -highlights that coevolving sites as well as positively evolving sites are likely to have been "fine tuned" over time to help support BDNF's cognate functionality (see Figure 3). Mapping our FEL purifying sites in a structural configuration was not shown due to the overwhelming nature of negative selection acting on BDNF within mammals.

Figure 3. BDNF NT-4 Heterodimer Structural Analysis to Highlight Coevolving and
Adaptive Sites at the 3D Protein Level. Demonstrating the structural configuration of the BDNF (blue) and NT-4 (pink) heterodimer (see https://www.rcsb.org/structure/1B8M), with rotations (arbitrary degrees) shown to accentuate view of coevolving sites (orange, see also Figure 2 and Table 3) and positively evolving sites (red; see Table 2). The PDB structure is limited largely to the NGF domain which limits our ability to highlight sites of interest (SOI), therefore we have limited our annotation only to the modeled sites in the structure. The relative positioning of coevolving and positively evolving sites in this heterodimer visualization (proximity to looping and other macro tertiary structures of protein). An interactive figure that is rotatable in 3D space, where occupations occur in three-dimensions (i.e. teasing out relative proximity in 2D linear space), is available here https://observablehq.com/@aglucaci/bdnf-structure.

DISCUSSION
In this study, we explore the evolutionary history of the BDNF gene in Mammalia. The BDNF gene is implicated in a number of human diseases including a variety of brain disorders such as neurodevelopmental disorders (e.g. autism spectrum disorder), neuropsychiatric disorders (e.g. depression, PTSD, schizophrenia), and some neurodegenerative disorders [1]. By using orthologous BDNF sequences within the Mammalia taxonomic group, our results indicate that within species, unique evolutionary pressures and site-specific changes within the BDNF gene have evolved across time. We performed a number of comparative evolutionary analyses to tease out signals from our orthologous gene collection in BDNF. Of note, the BDNF gene elicits tight regulation and specific functionality that can be separated from other neurotrophins, yet these growth factors remain closely related in their structure and sequence and conservation of the NGF functional domain. In the NGF domain, we observe a high degree of conservation (via purifying selection) across species, owing to the functional importance of this region in proteinprotein interactions. This work additionally provides broad comparative insights into the evolutionary history of the BDNF gene family. Our MEME method identified novel substitutions (see Table 2) in regions of BDNF that may provide significant areas of interest for designing molecular therapeutic approaches, and their potential broader significance are outlined in further detail below.

Predominant purifying selection across BDNF in Mammalia
Over time, evolution drives the divergence of genetic sequences. What can we learn from the direct comparison of the sequences of the BDNF gene in Mammalia? By comparing the BDNF products of orthologous genes in different species we observe the accumulation of mutations at different sites with varying degrees of insight into both BDNF functionality (see [29] for site annotation) and potential disease [1]. These are summarized in-full within Table S1 and Table 2. Coding sequences with highly constrained structures are expected to fix non-synonymous mutations at a slower rate due to the maladaptive nature of changes such as what we observe with the FEL negatively selected sites across BDNF. Additionally, we observe a high degree of negative (purifying) selection across the main functional domain (NGF) of BDNF. While structures for the NGF domain in most species under analysis do not exist, based on our findings we expect a highly conserved tertiary structure. Based on the high degree of purifying selection observed across BDNF, we hypothesize that BDNF plays a critical role in the underlying network of genes governing homeostasis and normal organismal development. This may have happened because BDNF is particularly useful specifically for the phylogenetic branch in question (i.e. mammals). This interpretation also tracks with the observation that BDNF is essential for normative development and is basically lethal in non-conditional full knock-out mammalian models.

Non-Neutral Positive Diversifying Evolution Sites in the BDNF Gene
It has been described that BDNF particularly plays a role as a foundational gene for brain development [11]. Despite a significant level of purifying selection shaping the evolutionary history of BDNF (Figure 1), we observe several novel statistically significant sites under positive episodic diversifying selection across the BDNF gene (see Table 2). Traditionally, evolution of this variety consists of amino acid diversifying events that may promote phylogenetic adaptation and/or functionality. These results are entirely novel -they have not been previously reported (to the best of our knowledge) and MEME is an established and sensitive method for the analysis of episodic diversifying sites. Thus, the very specific and limited sites within BDNF to exhibit such patterns is a highly promising result from which to further disentangle BDNFs complex functionality and disease linkage. Presumably, we would encourage biologists to consider these sites as those that may contain important adaptive functions within the BDNF gene. However, where our results fall within the context of a core protein-protein interaction network of required genes for neural cellular diversity and development is yet to be determined. We do note that at least one identified site (238) overlaps with potential post-translational modifications to the human BDNF peptide (specifically, a disulfide bridge; see UniProt and [29]). This supports the idea that non-neutral positive diversifying sites within BDNF are not spurious and likely reflect specialized, regulatory, or functional capacities that may have yet to be annotated in-full. Given that this manuscript is devoted to analysis of BDNF's evolution in mammals, we highlight the potential importance of these sites but emphasize that their importance remains a hypothesis that should be tested in well-defined experiments under controlled laboratory conditions.

Discovery of Proximal & Distal Coevolving Site-Pairs in the BDNF Gene
Another novel, and potentially important, series of findings in this manuscript was the presence of numerous sites that exhibit coevolution. In fact, we observe a significant number of coevolving sites within the BDNF gene (see Figure 2, Table 3), and these too reflect an entirely novel aspect of BDNF biology that has not previously been reported. Evidence of coevolving sites are not limited to a particular domain (e.g. prodomain vs mature) nor specific motifs. Instead, coevolving pairs seem to be distributed across the entire BDNF gene with, perhaps unsurprisingly, an increased density of interactions early in the pre-pro domain region. However, we also note that there are coevolving sites in the mature NGF domain which are "linked" to early domain sites. Importantly, these relationships may confer strong epistatic interactions shaping the continued evolution of this critically important gene. The new evidence for coevolution may point to the importance of these sites in shaping the early regulatory or main functional (NGF) domain of BDNF. These residues may form important interactions for the functional integrity of BDNF and, importantly, the highly specific pairs which span the BDNF prodomain and its mature region point to a new mechanism by which the BDNF prodomain may have co-regulated the mature domain (or vice versa). Alternatively, these coevolving pairs may be part of a network of residues occupying a shifted fitness landscape in order to accommodate new or species-specific functional requirements.

Potential Structural Implications of Evolving Sites
In considering our observation of both diversifying selection and coevolving sites in the BDNF gene, we considered the potential implications this may have at a protein structural level in three-dimensional space (see Figure 3). While protein structural impacts from evolution remain poorly understood and cannot be completely experimentally disentangled in a confirmatory sense, the implications fall upon our understanding of basic BDNF neurobiology. Here we note that our BGM and FEL analyses implicate the prodomain -the primary topological region of BDNF known for polymorphic variability (e.g. Val66Met, Gln75His) that is often linked to disease [1,11,29], and our 3D modeling suggests that two of our co-evolving sites appear to be associated with looping structures that could have important yet to be discovered functionality. In this regard, we predict that the evolutionary changes described here are likely to reflect some form of specialization and/or divergence in function and/or interaction partners at different points of BDNF's evolutionary history in mammals. Thus, further work may unveil yet more novel sites that could provide further insight into the origins of BDNF's diverse functionality and its role in disease.

Limitations of our Computational Evolutionary Analysis
This analysis focused on BDNF sequences contained in the taxonomic group Mammalia in lieu of examining a more inclusive dataset for BDNF containing sequences from all of Gnathostomata (jawed vertebrates) or extension into invertebrate clades which may contain BDNF or BDNF-like analogue genes. Our results are applicable to mammals, which are our intentional taxonomic group of study, but we nonetheless emphasize that our results do not capture the entirety of BDNF's evolutionary history (e.g. there could be more to learn about BDNF from birds, lizards, fish, and higher order taxonomic groups which we do not evaluate here). In addition we do not explore the patterns or mutational processes occurring outside of codingsequence evolution which include complex structure and dynamics of non-coding regions in the BDNF gene or across. Therefore, evolutionary temporality is important in the context and interpretation of our results because Mammalia represents only a portion of the long evolutionary history of BDNF. Although we failed to find evidence for recombination in our dataset, species where we may find evidence for recombination may have been precluded from our analysis due to our decision to focus on Mammalian BDNF gene evolution. Further, a limitation of the current analysis is owed to the presence of indel events, especially in the early region of the alignment but which also occur in other spatially distributed regions of the BDNF gene. These indel events are not currently modeled in existing codon substitution models but may represent an additional pathway of evolutionary change. Nonetheless, the prominence of indels in our observations indicate that several regions of BDNF may evolve significantly through indel events across species. Lastly, although there is a risk that the "gappy" nature of the early region of our multiple sequence alignment may be a computational artifact of the alignment procedure, based on all other outputs we believe that our results are reasonably interpreted and have subsequently tolerated these potential effects.

Future Directions: Understanding the Remainder of the Neurotrophin Family
We hypothesize that the similarities between neurotrophins reflects conserved evolutionary selection for motifs and domains for which support common functionality in neurotrophic factors between sites and lineages. While we note significant isotropy in mature peptide sequences for these factors, anisotropic pressures likely influenced the prodomain sequences of neurotrophins leading to alterations in processing, trafficking, regulation, and secretion. As such, we also predict differences in the evolutionary fate of other neurotrophins which also exhibit compartmentalized functionality due to similar alterations within their prodomains (i.e. similar results may be reasonably anticipated not just in BDNF, but also NGF, NT-3, and NT-4).

CONCLUSION
To sum up, our research modeled the natural evolutionary history of changes in the BDNF gene across >160 mammalian genomes. Conservatively, this analysis spans approximately ~177 million years of evolution -and going deeper could yet reveal more information on the ontogenesis of BDNF and its topological structure (and, consequently, function). Notably, we observed strict purifying selection in the main functional domain of the BDNF gene in mammals and discovered 6 specific sites in our homologous alignment that are under episodic selection in the early regulatory region of BDNF (i.e. the prodomain) and in the terminal region of BDNF. We also make the case for spatial coevolution within this gene, with 23 pair-sites that have evolved together. In sum, these data go above and beyond the common trope that "BDNF is highly conserved" by defining exactly where and how the mammalian BDNF has evolved. Thus, we confirm the widespread belief that the BDNF prodomain is more prone to change than the mature BDNF protein, having important implications for how we think about and consider genetic variation in BDNF and its linkage to disease.

Data Retrieval
For this study, we queried the NCBI Ortholog database via https://www.ncbi.nlm.nih.gov/kis/ortholog/627/?scope=7776. For the purpose of this study, as we are interested in mammalian BDNF evolution, we limited our search to only include species from this taxonomic group (mammals, Mammalia). This returned 162 full gene transcripts and protein sequences. We downloaded all available files: RefSeq protein sequences, RefSeq transcript sequences, Tabular data (CSV, metadata). In Table 1, we provide a table of the species included in this analysis but we also make this accessible via GitHub. Furthermore, we also make these species NCBI accessions (see also Table 1) available for download on GitHub: • AnalysisOfOrthologousCollections/BDNF_orthologs.csv at main · aglucaci/AnalysisOfOrthologousCollections · GitHub

Data Cleaning
We used the protein sequence and full gene transcripts to derive coding sequences (CDS) (via a custom script, scripts/codons.py). However, this process was met with errors in 20 "PREDICTED" protein sequences which had invalid characters such as sequences which have incorrect 'X', or unresolved amino acids and these sequences were subsequently exempt from analysis. This process removes low-quality protein sequences from analysis which may inflate rates of nonsynonymous change.

Analysis of Orthologous Collections (AOC): Alignment, Recombination Detection, Tree inference & Selection Algorithms
The Analysis of Orthologous Collections (AOC) application is designed for comprehensive protein-coding molecular sequence analysis (https://github.com/aglucaci/AnalysisOfOrthologousCollections). It accomplishes this through a series of comparative evolutionary methods. AOC allows for the inclusion of recombination detection, a powerful force in shaping gene evolution and interpreting analytic results. As well, it allows for lineage assignment and annotation. This feature (lineage assignment) allows between group comparisons of selective pressures. This application currently accepts two input files: a protein sequence unaligned fasta file, and a transcript sequence unaligned fasta file for the same gene. Typically, this can be retrieved from public databases such as NCBI Orthologs. Although other methods of data compilation are also acceptable. In addition, the application is easily modifiable to accept a single CDS input, if that data is available.
If protein and transcript files are provided, a custom script 'scripts/codons.py' is executed and returns coding sequences where available. Note that this script currently is set to use the standard genetic code, this will need to be modified for alternate codon tables. This script also removes "low-quality" sequences if no match is found, see the above Data cleaning section.
This was performed with a Human BDNF coding sequence NM_001709.5 Homo sapiens brain derived neurotrophic factor (BDNF), transcript variant 4, mRNA as a reference based alignment. Our alignment procedure retained 126 unique in-frame sequences.
Step 2. Recombination detection. Performed manually via RDP v5 [45], see below, the "Recombination detection" section for additional details. A recombination free file is placed in the following folder: results/BDNF/Recombinants. For the purpose of this study, we did not detect recombination in our dataset.
Step 3. Tree inference and selection analyses. For the recombination free fasta file, we perform maximum likelihood phylogenetic inference via IQ-TREE [46]. Next, the recombination free alignment and unrooted phylogenetic tree is evaluated through a standard suite of molecular evolutionary methods. This set of selection analyses include the following but for the sake of brevity some of these results were not shown (essentially, most were not statistically significant or not meaningful as relevant to the evolutionary results presented here).
• MEME: locates codon sites with evidence of episodic positive diversifying selection, [49]. • aBSREL: tests if positive selection has occurred on a proportion of branches, [50].
Step 4A. Lineage assignment and tree annotation. For the unrooted phylogenetic tree, we perform lineage discovery, via NCBI and the python package ete3 toolkit. Assigning lineages to a K (by default, K = 20) number of taxonomic groups. Here, the aim is to have a broad representation of taxonomic groups, rather than the species being heavily clustered into a single group. As a reasonable approximation, we aim for <40% of species to be assigned to any one particular taxonomic group.
Step 4B. We perform tree labeling via the hyphy-analyses/Label-Trees (REF, link) method. Resulting in one annotated tree per lineage designation. For the purpose of this study, we will only consider the following five lineages for additional analyses (Artiodactyla, Carnivora, Chiroptera, Glires, Primates) as they are the most populated lineages.
Step 5. Selection analyses on lineages. Here, the recombination free fasta file and the set of annotated phylogenetic trees (where labeling was performed in Step 4) is provided for analysis with the RELAX and Contrast-FEL methods.

Recombination detection
Manually tested via RDPv5.5 with modified settings as follows: • We also included the following algorithms/analyses: RDP [55], GENECONV [56], Chimaera [57], Recombination was not detected within our Human reference based alignment. Therefore we used the single recombination free alignment for analyses.

Data & Software Availability Statement
The AOC application is freely available via a dedicated GitHub repository at

Conflicts of Interest Statement
The authors are without conflict.     These results are also available at the following link: https://github.com/aglucaci/AnalysisOfOrthologousCollections/blob/main/tables/BDNF/BDNF_F EL_CI.csv