Lotus Base: An integrated information portal for the model legume Lotus japonicus

Lotus japonicus is a well-characterized model legume widely used in the study of plant-microbe interactions. However, datasets from various Lotus studies are poorly integrated and lack interoperability. We recognize the need for a central repository that allows comprehensive and dynamic exploration of Lotus genomic and transcriptomic data. Equally important are user-friendly in-browser tools designed for data visualization and interpretation. Here, we present Lotus Base, which opens to the research community a large, established LORE1 insertion mutant population containing in excess of 120,000 lines, and serves the end-user tightly integrated data from Lotus, such as the reference genome, annotated proteins, and expression profiling data. We report the integration of expression data from the L. japonicus gene expression atlas project, and the development of tools to cluster and export such data, allowing users to construct, visualize, and annotate co-expression gene networks. Lotus Base takes advantage of modern advances in browser technology to deliver powerful data interpretation for biologists. Its modular construction and publicly available application programming interface enable developers to tap into the wealth of integrated Lotus data. Lotus Base is freely accessible at: https://lotus.au.dk.

Genes and predicted proteins. Gene features such as mRNA, alternatively spliced transcripts (also known as isoforms), exons, and coding sequences were made available both in the form of (1) a GFF3 file, used in a customized JBrowse [21] implementation; and (2) individual BLAST databases. Gene and protein predictions were based on Augustus [22], Cufflinks [23], Genemark [24], and Glimmer [25]. Lotus Base currently hosts gene and protein predictions for two versions of the genome assembly: 19,713 predicted genes and 38,482 transcripts for v2.5; 44,483 predicted genes and 98,302 transcripts for v3.0.
LORE1 resource. Lotus Base is an integrated, one-stop platform for the LORE1 insertional mutagenesis population, to date the largest plant mutagenesis population established (Table 2). The LORE1 insertional mutagenesis population and its accompanying data (Table 3), collectively known as the LORE1 resource, have been previously described [8]. Lotus Base hosts 121,531 mutant lines containing 629,631 unique insertions, sourced from 14 Danish (DK01-03, 05, 07-16; 108,133 lines) and 3 Japanese (JPA, JPL, and JPP; 13,398 lines) batches [26]. All LORE1 lines have been sequenced and the ±1000 bp flanking sequences were used for automated primer design using Primer3 [27]. In addition, all LORE1-associated data can be downloaded at https://lotus.au.dk/data/lore1. The LORE1 resource on Lotus Base has so far delivered more than 185,000 seeds from 3,800 unique mutant lines, shipped to 21 countries worldwide. The resource has also been used in several reverse genetics studies [28-32].
With such a large volume of data available, the LORE1 search form is designed to be intuitive and easy to use, allowing users to search for LORE1 lines of interest using a variety of user-defined criteria (Fig. 2). Users may search for LORE1 insertions based on: (1) a LORE1 mutant line identifier; (2) an insertion identifier, also known as a BLAST header, which is an underscore-delimited string containing the chromosome, position and orientation of a LORE1 insert; or (3) the gene(s), if any, that the insertion is located in. Due to spatial constraints, not all fields are displayed on the search results page, although data export options are available on all pages.
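The insertion identifier described above lends itself to simple parsing. The sketch below is only an illustration: the example identifier, the `Insertion` type, the function name, and the `F`/`R` orientation codes are assumptions made here, not the documented Lotus Base format.

```python
from typing import NamedTuple


class Insertion(NamedTuple):
    chromosome: str
    position: int
    orientation: str


def parse_insertion_id(identifier: str) -> Insertion:
    """Split an underscore-delimited insertion identifier into its parts."""
    # Split from the right so that a chromosome name that itself contains
    # underscores is kept in one piece.
    chromosome, position, orientation = identifier.strip().rsplit("_", 2)
    # Assumption: orientation is encoded as F (forward) or R (reverse).
    if orientation not in ("F", "R"):
        raise ValueError(f"unexpected orientation: {orientation!r}")
    return Insertion(chromosome, int(position), orientation)


# Hypothetical identifier, used for illustration only.
insertion = parse_insertion_id("chr4_1234567_F")
print(insertion.chromosome, insertion.position, insertion.orientation)
```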
Orders can be placed on all Danish LORE1 lines (108,133; 89% of listed lines) for which seed stocks are available. Listed Japanese lines (13,398; 11% of listed lines) are included in our database but are not available for ordering; users are instead directed to LegumeBase (https://www.legumebase.brc.miyazaki-u.ac.jp/lore1BrowseAction.do) to order these lines.
Expression data. Lotus Base also offers Lotus-related expression data sourced from various studies. The first dataset was derived from the L. japonicus gene expression atlas (LjGEA) project [10], which combined expression data from additional studies [33-37]. The whole LjGEA dataset consists of 81 conditions sourced from 6 independent published studies [10], covering drought responses [33], the effect of mycorrhizal and symbiont inoculation [34,35], transcriptome changes in symbiosis-defective mutants [35], the effect of salt and nitrate treatment [35-37], and transcriptome regulation in various plant organs [10]. We have mapped probe identifiers from the LjGEA dataset to the L. japonicus genome v3.0 annotation by performing BLAST alignments of the LjGEA probe set against the predicted transcripts of genome v3.0 and selecting the hits with the lowest E-value(s). In addition to the LjGEA dataset, we have also integrated expression data from Lotus roots in response to germinating spore exudates from arbuscular mycorrhiza [5], containing 3 conditions.
Genome browser. The Lotus genome browser is powered by a customized version of JBrowse v1.12.0 [21], with the following tracks publicly available: L. japonicus MG20 reference genome v3.0; predicted protein tracks; LORE1 insertions; genome gaps; repeat masks; and L. japonicus Gifu and MG20 RNAseq reads.
Lotus BLAST and SeqRet, an improved NCBI BLAST and sequence retrieval tool. SequenceServer v1.0.4 [38] was modified according to our needs and serves as the backbone for the Lotus Base Basic Local Alignment Search Tool (BLAST). Lotus BLAST currently runs using NCBI BLAST+ v2.2.31 executables [39], allowing users to execute the full suite of BLAST algorithms: blastn, blastx, tblastn, tblastx, and blastp. Various toolkits on our site are integrated with an in-house developed Sequence Retrieval (SeqRet) tool, which allows real-time retrieval of accession/identifier-based sequence information across all locally hosted Lotus BLAST databases. Users are presented with the option to view retrieved sequences in a modal box, or to download them as FASTA files for storage and/or further processing.
Figure 1. The server-side design behind Lotus Base. The resource consists of several tools deeply integrated with each other: LORE1 search, LORE1 order, Sequence Retrieval (SeqRet), BLAST, CORx toolkit (CORGI and CORNEA) and Expression Atlas (ExpAt). MySQL tables are indicated in blue entity boxes with column names listed. Highlighted column names, in orange, are used as primary indexes. Tables are grouped by the function they serve, in relation to individual tools. Due to space constraints, expression datasets are described in further detail in Fig. 4. An overview of all integrated datasets on Lotus Base is available in Table 1.
Sequence Processor (SeqPro). The traditional wwwblast package from NCBI still outputs BLAST results in a monospaced, plain text format that can be problematic for the end user to parse. Users carrying such data from other sites may encounter difficulty in extracting useful sequence identifiers. The Sequence Processor (SeqPro) tool is designed as a regular expression-based parser that handles wwwblast output and provides a tabular output.
In addition, SeqPro helps to remove line breaks and line numbers from plain text FASTA outputs, which improves the readability of sequences if users simply want to store the nucleotide/amino acid sequences without any accompanying metadata such as row counts, nucleotide position numbers, and unnecessary line breaks.
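The clean-up step described above can be approximated with a single regular expression. This is a minimal sketch, not the SeqPro implementation: the function name is hypothetical, and dropping FASTA header lines is an assumption about what "without accompanying metadata" means.

```python
import re


def clean_sequence(text: str) -> str:
    """Return the bare residues of a numbered, line-wrapped sequence."""
    residues = []
    for line in text.splitlines():
        # Assumption: FASTA header lines (starting with '>') are metadata
        # and are dropped.
        if line.startswith(">"):
            continue
        # Keep letters plus the gap ('-') and stop ('*') symbols; position
        # numbers, spaces and any other characters are removed.
        residues.append(re.sub(r"[^A-Za-z*-]", "", line))
    # Joining without a separator removes the line breaks.
    return "".join(residues)


numbered = ">example\n 1 ATGC GATT\n11 CCGA\n"
print(clean_sequence(numbered))
```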
Transcript Mapper (TRAM). As each Lotus genome assembly comes with its own predicted gene/transcript nomenclature and populations, we have designed a simple tool to aid users in mapping v2.5 to v3.0 transcripts and vice versa. A mapping table with a many-to-many relationship is precomputed by performing BLAST alignments between transcripts from both versions, and storing the highest confidence hits for all transcripts.
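One way such a mapping table could be derived is sketched below. The use of bit score as the measure of confidence, the tuple layout of the hits, and the placeholder identifiers are assumptions for illustration; the source does not state which BLAST statistic is used.

```python
def best_hits(hits):
    """Map each query to its top-scoring subject(s).

    `hits` is an iterable of (query, subject, bit_score) tuples. Ties are
    kept, so one query can map to several subjects and one subject can be
    the best hit of several queries (a many-to-many table).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][0]:
            best[query] = (score, {subject})
        elif score == best[query][0]:
            best[query][1].add(subject)
    return {query: sorted(subjects) for query, (_, subjects) in best.items()}


# Placeholder identifiers: "a", "b" stand for v2.5 transcripts and
# "x", "y", "z" for v3.0 transcripts.
forward = best_hits([
    ("a", "x", 100.0),
    ("a", "y", 100.0),
    ("a", "z", 50.0),
    ("b", "x", 80.0),
])
print(forward)
```

Running the same function on the reverse alignment (v3.0 queries against v2.5 subjects) would give the other direction of the table.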

Transcript Explorer (TREX).
For users to glean quick information about their genes or transcripts of interest, we have designed the Transcript Explorer (TREX) tool, which is simply a full-text search engine that allows users to pull integrated information related to their search candidates. The search result is tabulated and summarized to display the working name (if any), and the function of the candidate gene/transcript, its position in the Lotus genome and any LORE1 lines with exonic insertions in the gene. Further information and deep links to other toolkits on the site, such as to ExpAt, LORE1 search, individual gene pages, are available in a dropdown menu for each candidate.
Expression Atlas (ExpAt). We have developed a data-driven, web-based visualization tool for L. japonicus expression data. Visualization in the L. japonicus Expression Atlas (ExpAt) tool is powered by jQuery and d3.js [40]. The use of client-side JavaScript enables intuitive and dynamic customization, as well as on-the-fly asynchronous search functionality. ExpAt features a simple search form to query the expression levels of candidates (genes or probes, depending on the dataset selected) against a list of published datasets (Fig. 3). The user can subset a dataset by checking individual conditions, which can also be filtered by user-defined keyword(s) using an in-browser full-text search engine implemented using Lunr.js [41].
Each expression dataset is stored in two tables, a "metadata" table and a "data" table (Fig. 4). The "metadata" table contains all metadata associated with each column, such as the age of the plant, the treatment type and/or inoculation pressure. The contents of these metadata fields are fed into Lunr.js [41] for in-browser full-text search. The "data" table contains all the expression data of each dataset. Each row in the "data" table represents a unique gene or probe. Each row is tagged with a unique identifier in the first column, followed by three sets of columns representing the raw data: the "sample values" column, where raw expression levels are delimited with an underscore; the "sample mean" column, where the arithmetic average of raw expression levels is stored; and the "standard deviation" column, where the sample standard deviation of raw expression levels is stored. There is therefore a one-to-three relationship between the "metadata" and "data" tables, as each condition maps to three independent data columns.
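The relationship between the three data columns of one condition can be illustrated as follows. This is a sketch under assumptions: the function name and the example string are hypothetical, and returning 0.0 for a single replicate is a choice made here, not documented behavior.

```python
from statistics import mean, stdev


def summarize(sample_values: str):
    """Derive the mean and sample standard deviation of one condition."""
    # The "sample values" field stores replicates delimited by underscores.
    values = [float(v) for v in sample_values.split("_")]
    # The sample standard deviation (n - 1 denominator) needs at least two
    # replicates; 0.0 is used otherwise (an assumption of this sketch).
    spread = stdev(values) if len(values) > 1 else 0.0
    return values, mean(values), spread


values, sample_mean, sample_sd = summarize("2_4_6")
print(values, sample_mean, sample_sd)
```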

Data transformation.
To ease quick visual comparison across genes with substantially different levels of absolute expression, measured by either (1) reads per kilobase of transcript per million mapped reads (RPKM) for RNAseq datasets, or (2) arbitrary Affymetrix units for Affymetrix microarray datasets, we included two options for transforming the expression levels: normalization or standardization. Data normalization is simply the rescaling of expression values to fit the domain [0, 1]. In order to allow comparison of extreme values, expression values are log10-transformed prior to normalization. The lowest log-transformed expression level, (log10 x)min, is subtracted from the log-transformed sample expression level, log10 x_s, and the result is divided by the difference between the log-transformed maximum and minimum expression levels, as defined in equation (1):

$$x_{\mathrm{norm}} = \frac{\log_{10} x_s - (\log_{10} x)_{\min}}{(\log_{10} x)_{\max} - (\log_{10} x)_{\min}} \qquad (1)$$

Meanwhile, data standardization [10] serves to rescale the expression levels on a per-row basis, across conditions, to have a mean of zero and a standard deviation of one. This is performed by subtracting the average expression level across all samples (μ) from the sample expression level (x_s), and dividing the difference by the sample standard deviation computed across all samples (σ), as defined in equation (2):

$$x_{\mathrm{std}} = \frac{x_s - \mu}{\sigma} \qquad (2)$$

Clustering. Depending on the size of the matrix, we implemented either k-means clustering (for 1-by-n or n-by-1 matrices), or hierarchical agglomerative clustering (for matrices the size of, or larger than, 2-by-2). The clustering is performed asynchronously on the server side using SciPy [42]. As clustering is based on heuristics and is therefore non-deterministic in nature, users are encouraged to export the sorted order of either or both axes, should they want to preserve the exact clustering order. For k-means clustering, the default number of starting clusters is set to the square root of the number of conditions queried, rounded up to the nearest integer.
For hierarchical agglomerative clustering, the cluster cutoff is set by default to 0.25 of the maximum cluster distance for both axes, and can be adjusted between 0 and 1. Complete linkage is used by default, with the option of switching to single, centroid, median, ward, or weighted methods. The default distance metric is Euclidean, with other options available: Bray-Curtis, Canberra, Chebyshev, city block (Manhattan), correlation, cosine, standardized Euclidean, squared Euclidean, normalized Hamming, Jaccard, or Minkowski.
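The two transformations and the default number of k-means clusters described in this section can be sketched in plain Python. This is an illustration rather than the ExpAt code: the function names are hypothetical, the minimum and maximum are assumed to be taken per row, expression values are assumed to be strictly positive, and flat rows are mapped to zeros by choice.

```python
import math
from statistics import mean, stdev


def normalize(row):
    """Rescale log10-transformed values of one row to the domain [0, 1]."""
    logs = [math.log10(x) for x in row]  # requires strictly positive values
    low, high = min(logs), max(logs)
    if high == low:  # flat row: avoid division by zero
        return [0.0] * len(row)
    return [(value - low) / (high - low) for value in logs]


def standardize(row):
    """Rescale one row to a mean of zero and a standard deviation of one."""
    mu, sigma = mean(row), stdev(row)  # sample standard deviation
    if sigma == 0:
        return [0.0] * len(row)
    return [(x - mu) / sigma for x in row]


def default_k(n_conditions: int) -> int:
    """Square root of the number of conditions, rounded up."""
    return math.ceil(math.sqrt(n_conditions))


print(normalize([1, 10, 100]))
print(standardize([2, 4, 6]))
print(default_k(81))
```

For the 81 conditions of the LjGEA dataset, `default_k` gives 9 starting clusters.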

CORNEA and CORGI: co-expression gene network visualization and co-expressed gene list retrieval. The co-expression (CORx) toolkit comprises the Co-Expression Network Analysis (CORNEA) and Co-expressed Gene Identifier (CORGI) tools. ExpAt and the CORx toolkit share the same expression datasets. Co-expression gene networks in CORNEA are generated on the fly by a dedicated virtual server, which returns JSON-formatted data used for asynchronous network visualization with Sigma.js [43] in the web browser. CORGI performs a similar function to CORNEA, but instead of generating a two-dimensional co-expression network, it simply retrieves a one-dimensional slice for a given gene or probe identifier, which in turn yields a list of the entities most highly co-expressed with the gene or probe of interest.
Generating and displaying network jobs. All CORNEA and CORGI requests are handled by a central, threaded co-expression network server implemented using Remote Python Call (RPyC) [44]. Both client-side and server-side logic check the validity of the job request before it is submitted to the server. An entry in a MySQL table is created per job for the purpose of storing user settings and metadata of the specific network. This information is freely accessible to the user and can be exported, if the user intends to recreate the network in the future, or to reuse similar settings for network generation using alternative datasets. The submission of a valid job will trigger a redirection to a job-specific URL, which will poll the server for the job status at a set interval until completion. Once the job is completed, users who opted in prior to job submission will receive an email notification containing links to view their live network in the CORNEA application, and to download all data associated with their network as a gzipped JSON-formatted file. The file contains all the information necessary to display a co-expression network, and also stores network metadata such as the correlation threshold, minimum cluster size, and job runtime. Users may also visualize networks generated by previous jobs by uploading the JSON file, gzipped or decompressed, using a drag-and-drop interface implemented in CORNEA itself. Using client-side JavaScript, the browser will unzip the file if it is gzipped and parse the JSON, which is handed off to Sigma.js to handle the construction of the co-expression network.
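The "gzipped or decompressed" handling described above amounts to checking for the gzip signature before parsing. CORNEA does this in client-side JavaScript; the Python sketch below only illustrates the same idea for users who want to inspect a downloaded network file offline. The function name and the `nodes`/`edges` keys in the example are assumptions, not the documented file layout.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_network(raw: bytes) -> dict:
    """Parse a network file that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Hypothetical, minimal network used for illustration only.
example = {"nodes": [{"id": "n1"}, {"id": "n2"}], "edges": [["n1", "n2"]]}
plain = json.dumps(example).encode("utf-8")
compressed = gzip.compress(plain)

print(load_network(plain) == load_network(compressed))
```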
We anticipate that several basic co-expression network parameters may be heavily utilized. In order to reduce the server load of generating identical or highly similar networks, we have generated static networks that users can utilize for preliminary exploration. An example of a static network is one that was generated from expression data from the LjGEA dataset with an R² threshold of 0.85, and a minimum cluster size of 15. The resulting network was produced in 4 minutes and 48 seconds, with a total of 7,839 nodes connected by 273,018 edges and grouped into 17 mutually exclusive clusters (Fig. 5). As CORNEA relies heavily on client-side JavaScript for parsing and displaying the co-expression network, the use of a modern, standards-compliant browser with an optimized, efficient JavaScript engine is strongly recommended.
Computation of co-expression relationships. Prior to pairwise calculation of correlation scores among genes or probes (collectively termed "candidates" hereafter), the raw dataset is filtered in order to exclude candidates with highly similar expression patterns across conditions. For a dataset containing N candidates, a candidate with a gene expression profile c_i will be removed from analysis if its pattern falls below a dissimilarity threshold compared to another gene expression profile c_j, as seen in equation (3), while making exceptions for highly similar patterns with obvious peaks, as defined in equation (4).
The degree of co-expression of genes is calculated as the squared Pearson's correlation coefficient (R²) between gene and/or probe pairs across conditions. Prior to submission of a CORNEA network generation job, the user is provided with an option to subset their conditions of interest from a list of all conditions available for a given dataset.
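The scoring step can be sketched as follows, covering both a thresholded edge list (as used for a network) and a ranked one-dimensional slice (as returned by CORGI). This is an illustration, not the server code: the function names and example profiles are hypothetical, the pre-filtering of equations (3) and (4) is not included, and treating the threshold as a strict inequality is an assumption.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson's correlation coefficient of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a flat profile has no defined correlation
        return 0.0
    return sxy * sxy / (sxx * syy)


def network_edges(profiles, threshold=0.85):
    """Return candidate pairs whose R² exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if r_squared(profiles[a], profiles[b]) > threshold
    ]


def co_expressed(target, profiles, limit=25):
    """Rank all other candidates by R² with the target, highest first."""
    scores = [
        (name, r_squared(profiles[target], values))
        for name, values in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: -item[1])[:limit]


# Hypothetical expression profiles across four conditions.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 1],
}
print(network_edges(profiles))
print(co_expressed("g1", profiles, limit=2))
```

Because the coefficient is squared, an anti-correlated pair such as `g1` and `g3` scores as highly as a positively correlated one.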
Node highlighting. To allow easy identification of the node(s) of interest, we implemented a highlight feature which allows the end-user to filter the displayed nodes in the network by (1) searching for a specific node, using an appropriate identifier depending on the type of dataset used, such as a gene identifier for the LjGEA dataset; or by (2) highlighting an array of nodes using a CSV file. The CSV file should contain no headers, and two columns: the first column containing the appropriate identifier for the queried dataset, and the second (optional) column containing arbitrary grouping (see supplementary, "File format for advanced node highlighting in CORNEA"). Additional columns in the CSV file will not be parsed, but can be used to store additional metadata.
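A reader for a file in the format just described might look like the sketch below. The function name and the group labels are hypothetical; the authoritative file specification is the supplementary note cited above.

```python
import csv
import io


def read_highlights(csv_text: str) -> dict:
    """Map each identifier to its group label (None if no group is given)."""
    groups = {}
    # No header row is expected, so every row is treated as data.
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        identifier = row[0].strip()
        # The second column is optional; columns beyond it are ignored.
        has_group = len(row) > 1 and row[1].strip()
        groups[identifier] = row[1].strip() if has_group else None
    return groups


# Identifiers taken from this paper; group labels and the third column are
# made up for illustration.
example = "Lj4g3v0281040,defense,free-text note\nLj6g3v1052420\n"
print(read_highlights(example))
```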
Public API. To allow other developers to benefit from the scope of our Lotus data, we have developed a public API as a representational state transfer (REST) service using the Slim framework [45], which is compliant with PHP Standard Recommendation (PSR) 7 [46]. All API calls are to be authenticated with a securely and cryptographically generated JSON Web Token (JWT) known as an API access token. API access tokens are freely available to developers who have signed up for an account with Lotus Base. Because HTTP referer headers can be forged, we do not enforce domain-based restrictions on API access tokens. However, any API access token can be revoked at the discretion of the developer who created it, in the event of suspicious use by unauthorized third parties.
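A token-authenticated request could be prepared along the following lines. This is a sketch under stated assumptions: the base URL is the version 1 address of the Lotus Base API, but the resource path, the placeholder token, and the use of an `Authorization: Bearer` header are assumptions made here; the API documentation is authoritative on how the token must be sent. The request is only constructed, not sent.

```python
from urllib.request import Request

API_BASE = "https://lotus.au.dk/api/v1"  # versioned base URL of the API


def build_request(path: str, token: str) -> Request:
    """Prepare (but do not send) an authenticated API request."""
    url = f"{API_BASE}/{path.lstrip('/')}"
    headers = {
        # Assumption: the access token is passed as a bearer token.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# "example/resource" is a hypothetical path, not a documented endpoint.
request = build_request("example/resource", "YOUR_API_ACCESS_TOKEN")
print(request.full_url)
```

Keeping the version in a single constant means that a future API version would only require changing `API_BASE`.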
The Lotus Base API uses a versioning system in order to maintain compatibility for developers using various versions of the API, and to account for the possibility of major updates and changes. The Lotus Base API is currently at version 1, and is accessible at https://lotus.au.dk/api/v1. Complete documentation of the Lotus Base API v1 is available at https://lotus.au.dk/docs/api/v1.
User accounts. Users may opt to sign up for a new account with Lotus Base for a more personalized experience. We have integrated several popular OAuth 2.0 identity providers (LinkedIn, GitHub, and Google) so that users can sign in through these services without the need to sign up manually. Existing users may also opt to integrate their Lotus Base user accounts with the aforementioned identity providers. Lotus Base adopts an ethical design principle giving users control over their own data and accounts. Private information of users is never shared with unaffiliated third parties, and their login credentials are cryptographically salted and encrypted.

Usage and Application
As a proof-of-concept use of Lotus Base for a typical end user, we chose to work with LjFls2, the Lotus ortholog of Arabidopsis FLS2 (AtFLS2). AtFLS2 encodes a bacterial flagellin receptor and is an important component in the induction of an evolutionarily conserved, first-line defense response in plants against pathogens [47]. The functionality of the Lotus ortholog, LjFls2, has also been previously confirmed [48].

Identification and BLAST search for a Lotus ortholog of AtFLS2. The amino acid sequence of AtFLS2 (AT5G46330) was obtained from Araport [49], and searched against the L. japonicus MG20 v3.0 protein database in Lotus BLAST. The top candidate was Lj4g3v0281040.1 with an E-value of 0 and a matching length of 1157. There were no other candidates with this degree of similarity, and a reverse BLASTp performed using the amino acid sequence of Lj4g3v0281040.1, retrieved using the SeqRet tool, against the Arabidopsis TAIR10 protein database revealed AtFLS2 as the single, high-confidence match. Therefore, Lj4g3v0281040.1 is tentatively named LjFls2 and referred to as such hereafter.

Table 4 (fragment). Drought tolerance | AT3G43790 | AtZIFL2 | Lj6g3v1052420 | LjZifl2

[Figure caption; beginning not recovered] (Table 4) in a standard co-expressed gene network map generated from the LjGEA dataset, using an R² threshold of 0.85 and a minimum cluster size of 25. Some root-based genes (LjCob, LjRhd3, LjSuc2, LjCesA1, and LjCesA2) were not found in the network, due to their expression patterns not meeting the minimum threshold on the squared Pearson's correlation score (R²). Abbreviations: CesA, cellulose synthase family; Cob, COBRA-like extracellular glycosyl-phosphatidyl inositol-anchored protein family; Fls2, flagellin-sensing 2; Rhd3, root hair defective 3; Suc2, sucrose-proton symporter 2.
Scientific Reports | 6:39447 | DOI: 10.1038/srep39447
LjFls2 is strongly expressed in Lotus roots. Next, we checked the expression of LjFls2 and compared it against the closest Lotus homologs of a handpicked subset of genes with distinct expression patterns in plant development using ExpAt (Table 4). We selected homologs of AtEIR1 [50]; AtSUC2, AtCOB, AtRHD3 [51]; and members of the cellulose synthase (CesA) family [52], for their root-restricted expression. We also selected members of the SEPALATA family for their role in flower development [53]; AtZIFL1 and AtZIFL2 for their upregulated expression under drought conditions [54]; and members of the alpha-galactosidase family for their role in seed development in Arabidopsis [55] and tomato [56].
We discovered that LjFls2 has an expression pattern that strongly mirrors that of LjEir1, LjSuc2, LjCob, LjRhd3, and the CesA family members that show root expression in Arabidopsis, but not those of genes involved in other developmental stages and/or organs (Fig. 6). Hierarchical clustering was performed in ExpAt, using a Euclidean distance matrix over complete linkage based on squared Pearson's correlation values (R²). This revealed distinct clusters of genes and conditions, with genes clustering into groups demarcated by developmental stage and organ in Arabidopsis, and conditions clustering into groups defined by organs and treatment conditions (Fig. 6).
LjFls2 is located in the same co-expression cluster as genes with root-based expression. In order to visualize the co-expression network around LjFls2, we loaded the standard network generated from the LjGEA dataset in CORNEA, and highlighted network nodes using the gene list in Table 4 (Fig. 7; see supplementary "Node highlighting in CORNEA with selected genes"). Even when genes strongly expressed in the roots do not show highly correlated expression patterns (R² ≤ 0.85) with LjFls2, they are still found in the same mega cluster, suggesting overall similarities in expression patterns. More importantly, the SEPALATA flower development genes are found in another distinct mega cluster, and so are those involved in drought responses, LjZifl1 and LjZifl2.
Taken together, this suggests that both ExpAt and CORNEA are reliable tools for not only differentiating, but also correctly clustering, distinct gene expression patterns in Lotus. Moreover, the two tools complement each other by providing different perspectives on the relationship of the expression patterns between candidate genes: ExpAt allows inference of relationship(s) among user-defined candidates, while CORNEA provides spatial information on how user-defined candidates fit into the overall expression network generated from a dataset.
Table 5. The top 25 highly co-expressed genes of LjFls2, generated by CORGI. The candidates were pulled from a one-dimensional slice across the co-expression matrix generated by CORNEA, ranked by the squared Pearson's correlation coefficient (R²) in descending order. CORGI returns 25 rows by default, but may be configured to return up to 100 candidates.
Genes that are strongly co-expressed with LjFls2 have been functionally validated. CORGI was used to generate a list of the top 25 highly co-expressed genes of LjFls2 (Table 5), which included putative Lotus orthologs of four candidates whose expression patterns have been verified by published literature to be correlated with, or induced by, flagellin exposure: AtCDR1-like, AtNST1-like, MtCHS1-like, and AtMKS1-like. These genes were found not only in the same co-expression mega cluster, but also directly connected to LjFls2 in the network (Fig. 8).
Lj6g3v1880370 (1st, R² = 0.933) is highly similar to a gene encoding an aspartyl protease-like protein in Arabidopsis. A gene encoding an apoplastic aspartyl protease, AtCDR1, has been found to play an important role in conferring salicylic acid-dependent resistance against Pseudomonas syringae in Arabidopsis [57]. Although the role of proteases in defense responses is yet to be clearly elucidated, it is hypothesized that they either aid in the processing of R proteins, or through enzymatic action generate ligands that are recognized by R proteins [58-60].
Lj4g3v2603590 (2nd, R² = 0.911) encodes an NST1-like protein, a member of a family of genes involved in the regulation of secondary cell wall thickening in Arabidopsis [61] due to its role in lignin biosynthesis [62]. Lignification of plant cell walls may be induced by mechanical, environmental and disease stresses [63,64], and treatment with bacterial flagellin has been shown to induce lignin biosynthesis in plants [65-67].
Lj2g3v1155180 (5th, R² = 0.897) is the closest homolog of Arabidopsis MKS1 (AT3G18690), which encodes a protein that is a substrate of AtMPK4 [70], a kinase involved in the regulation of defense responses in plants [71]. More pertinently, AtMPK4 is activated by exposure to flagellin purified from P. syringae, an adapted pathogen of Arabidopsis, which results in phosphorylation of AtMKS1.
Multiple LORE1 lines with exonic insertions in LjFls2. Next, we retrieved LORE1 mutant lines that contain exonic insertions in the LjFls2 gene using the TREX tool. Out of the 40 LORE1 lines that contain insertions in LjFls2, 31 are exonic, of which 29 originate from the Danish collection and are therefore orderable through Lotus Base (Table 6). These 29 lines can be propagated (as F0 plants) and allowed to self-fertilize in order to generate F1 homozygous mutant lines, whose progenies (F2) will be useful for further phenotyping studies, if desired.

Discussion
In this paper, we introduced Lotus Base, an integrated information portal for genomic and expression data for the model legume L. japonicus. With the utilization of modern browser technology and cryptographically secure information transmission, Lotus Base positions itself at the forefront of accessibility, security, privacy and usability of large-scale scientific data. The lack of a central database for Lotus resources has been a strong driving force behind the creation of Lotus Base. Lotus Base places Lotus japonicus on par with other popular model plants, such as A. thaliana, G. max, and M. truncatula, all of which have dedicated online platforms that serve integrated data, namely Araport [49], the Arabidopsis Information Resource [72], SoyBase [73] and the Medicago truncatula Genome Database [74].
Lotus Base distinguishes itself from other cross-species integration platforms such as the Legume Information System (LIS) [75,76], PlantGDB [77], and Phytozome [78], by offering comprehensive species-specific data.
The modular construction and open-source model of Lotus Base ensure continuity and encourage the expansion and inclusion of additional datasets with relative ease in the future. In addition, the public API of Lotus Base aims to benefit a larger community by making Lotus data available to developers who are deploying applications that pull integrated data from our databases.
The introduction of Lotus BLAST allows deep integration of Lotus BLAST databases with other toolkits specifically designed to tackle data visualization and analysis. The implementation of various toolkits such as ExpAt, CORNEA and CORGI can be extended to datasets unrelated to Lotus, or even to scientific research in general. We demonstrated that ExpAt offers users a powerful way of visualizing co-expression relationships on a subset of user-defined candidates by leveraging k-means or hierarchical clustering, while CORNEA presents users with a two-dimensional, spatial chart of co-expression relationships among all genes from selected datasets. The use of data-driven documents in these toolkits demonstrates their ability to visualize large volumes of data with ease, by combining the computational power of server-side technologies and the efficiency of client-side JavaScript interpreters. Many features on Lotus Base can therefore be adapted by the community as novel ways to represent, investigate, analyze, and visualize biological data. We believe that Lotus Base will not only make comprehensive Lotus data easily accessible to researchers, but also empower them to perform computationally intensive and complex analysis and visualization without the need for extensive technological skills. Taken together, Lotus Base will benefit the legume research community and beyond, by providing a framework for a coherent scientific workflow and powerful tools for raw data interpretation.