The Leipzig Catalogue of Vascular Plants (LCVP) – An improved taxonomic reference list for all known vascular plants

The lack of comprehensive and standardized taxonomic reference information is an impediment for robust plant research, e.g. in systematics, biogeography or macroecology. Here we provide an updated and much improved reference list of 1,315,479 scientific names for all described vascular plant taxa globally. The Leipzig Catalogue of Vascular Plants (LCVP; version 1.0.2) contains 351,176 accepted species (plus 6,160 natural hybrids), within 13,422 genera, 561 families and 84 orders. The LCVP a) contains more information on the taxonomic status of global plant names than any other similar resource and b) significantly improves the reliability of our knowledge by e.g. resolving the taxonomic status of ∼184,000 taxa names compared to The Plant List, to date the most commonly used plant name resource. We used ∼4500 publications, existing relevant databases and available studies on molecular phylogenetics to construct a robust reference backbone. For easy access and integration into automated data processing pipelines, we provide an R package (lcvplants) with the LCVP.

Background and summary
Due to substantial progress in the last decade in improving plant taxonomy with phylogenetic findings, an updated global taxonomic reference list was urgently required. To date, the most commonly used reference list of vascular plant taxa names is The Plant List (TPL 1). [...] (Table 1) and thus is a major improvement for global plant research. It is based on existing databases and an additional [...], which helped to clarify the status of plant names (i.e. accepted, synonymy, taxonomic placement; see Methods). In the end, 4059 publications provided relevant and sufficiently robust additional information, e.g. changes in taxa names and/or their status. A guiding principle during the compilation of the LCVP was to avoid polyphyletic genera, which are frequent in TPL, either by splitting genera (e.g. separating Goeppertia from Calathea) or by fusing them (e.g. Stapelia and Duvalia in Ceropegia). However, we did not recombine any species name in the LCVP, and in cases where the phylogenetic position of a genus was unclear, we used the conservative (i.e. existing) name.
Taxonomists, ecologists and conservation biologists often work with many species (names) and cannot keep pace with the rapid progress in (plant) systematics, boosted by molecular phylogenetic methods 2. These researchers often rely on taxonomic reference lists as tools to translate taxa names to accepted species names via accepted synonyms. [...] ~315,000 vascular plant species; www.gbif.org), the Botanical Information and Ecology Network (BIEN 6 : ~348,000). The Global Inventory of Floras and Traits (GIFT 7 : ~268,000; http://gift.uni-goettingen.de/home) or the inventory of the Global Naturalized Alien Flora (GloNAF 8 : ~14,000; glonaf.org) focus on plant distribution information from regional floras or floristic inventories.
Generally, such databases were compiled from heterogeneous data sources. There is a variety of R packages (e.g. taxonstand 9 ; taxize 10 ; RBIEN 11 ) and online tools (e.g. Global Name Resolver http://resolver.globalnames.org/ or the Taxonomic Name Resolution Service 12 http://tnrs.iplantcollaborative.org/TNRSapp.html) that help researchers check their taxonomic information (see 13 for a review of some of those tools).
Most of these tools rely on TPL as part of their reference lists, which, however, has not been updated for a decade and originated at a time when phylogenetic information on many genera did not exist. [...] (https://github.com/idiv-biodiversity/LCVP) and will ensure a coherent versioning of the list and future updates. Furthermore, we provide a utility function to use LCVP for taxonomic name resolution (lcvplants), which is also available under the same license from GitHub (https://github.com/idiv-biodiversity/lcvplants).
Step 1: Producing the raw data table
TPL provided the core of the basic raw data table for [...]. All additional names and potential synonyms found in those databases were incorporated in the raw data table.
Step 2: Decision making
The extensive raw data table of more than two million entries of plant taxa names contained many orthographic errors, inconsistencies and contradictory opinions concerning the status of the names. A rough guideline for the acceptance of names was a subjective assignment of quality and reliability to the source. Generally, changes were only applied when the authors of the respective publications clearly suggested those changes.
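As an illustration of how contradictory status entries in such a raw table could be flagged for manual review, the following is a minimal sketch in base R. The data frame `raw`, its column names and its three records are hypothetical and only stand in for the real raw data table; the sketch does not reproduce the authors' actual workflow, which relied on expert judgement of the sources.

```r
# Hypothetical raw records (illustration only): the same name is reported
# with different statuses by different sources.
raw <- data.frame(
  name   = c("Calathea zebrina", "Calathea zebrina", "Hibiscus vitifolius"),
  status = c("accepted", "synonym", "accepted"),
  source = c("database 1", "publication 2", "database 1"),
  stringsAsFactors = FALSE
)

# Count the number of distinct statuses reported for each name.
n_status <- tapply(raw$status, raw$name, function(s) length(unique(s)))

# Names with more than one distinct status need a manual decision.
conflicting <- names(n_status)[n_status > 1]
print(conflicting)
```

Names returned in `conflicting` are those for which the sources disagree; deciding which source to follow remains a manual step.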
We usually ascribed a higher reliability rank (e.g. for conflicting information) to the most recent publications. Additionally, when conflicting information appeared, we usually preferred publications with a) a more thorough literature section and b) a more comprehensive synonymy history over those without. A complete synonymy history should include and properly cite not only the latest accepted taxon, but also the taxonomic history of all names connected to this taxon (e.g. if it is a recombined taxon) with all homonymic (i.e. species epithet is the same) and heteronymic (i.e. genus name is the same) synonyms. Since phylogenies based on morphological data alone are prone to homoplasy, only phylogenetic studies whose taxonomic decisions were also based on molecular data were taken into account. We did not create new species name combinations. In case of conflicting evidence on the phylogenetic placement or species name, due to e.g. different methods to build phylogenetic trees, species names were marked "comb. ined." following the basionym author. We also applied changes to the spelling of species names. Generally, we recommend checking the species names prior to automated list treatments, following the guidelines given in 60. In most cases, we adopted the names used by the taxonomic expert (i.e. the reference author, who is usually a person with a publication record within a certain taxonomic group). However, many taxa belong to genera or species that have not been phylogenetically analyzed yet. For those, we adopted the most frequently used taxon name from the recent literature (see Supplementary Files 3-5). Despite a major effort, there are still names that we could not resolve.
Step 3: Implementation in R
Besides providing LCVP as downloadable tables, we also provide it as an R package (LCVP) for easy integration with analysis pipelines. We also provide a tool for fuzzy-matching-based taxonomic name resolution directly linked with LCVP (lcvplants). This fuzzy-matching algorithm is applied at the species, infraspecies and authority level of a plant taxon; it uses the 'max.distance' argument of the agrep() function to assess the comparison between the searched plant name and the closest plant name from the LCVP list (in terms of the number of shared characters and their order). The taxonomic name resolution is implemented in a user-friendly way and can be done with a few lines of code:

```
# install LCVP and lcvplants from GitHub
install.packages("devtools")
library(devtools)
devtools::install_github("idiv-biodiversity/LCVP")
devtools::install_github("idiv-biodiversity/lcvplants")

# load the package
library(lcvplants)

# run analyses
LCVP("Hibiscus vitifolius")
```

We provide a description of the fuzzy matching algorithm and its implementation in Supplementary File 2 and a detailed tutorial on how to use lcvplants online (https://idiv-biodiversity.github.io/lcvplants/).

[...] (5%) taxa (see Table 1). [...] year.
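To make the role of 'max.distance' more concrete, the following minimal sketch shows how base R's agrep() can be combined with adist() to pick the closest reference name for a misspelled query. The four-name reference vector and the helper `closest_name()` are hypothetical and are not part of lcvplants; the package's actual implementation is the one described in Supplementary File 2.

```r
# Hypothetical mini reference list; the real LCVP table is far larger.
reference_names <- c(
  "Hibiscus vitifolius",
  "Hibiscus tiliaceus",
  "Calathea zebrina",
  "Goeppertia zebrina"
)

# Hypothetical helper (not part of lcvplants): return the reference name
# closest to `query`, or NA if none lies within `max_distance` edits.
closest_name <- function(query, reference, max_distance = 2L) {
  # agrep() keeps the reference names that approximately contain the query,
  # allowing at most `max_distance` insertions, deletions or substitutions.
  candidates <- agrep(query, reference,
                      max.distance = max_distance, value = TRUE)
  if (length(candidates) == 0L) {
    return(NA_character_)
  }
  # adist() gives the edit distance between the query and each candidate;
  # the candidate with the smallest distance is returned.
  distances <- utils::adist(query, candidates)[1L, ]
  candidates[which.min(distances)]
}

# The query below is missing one "i" in the species epithet.
print(closest_name("Hibiscus vitifolus", reference_names))
```

A larger 'max_distance' tolerates more spelling errors but increases the risk of matching a different taxon, so a small value is the more cautious choice.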

Technical Validation
We tested whether all synonyms lead to an accepted name or another synonym. One major issue with TPL is the high number of unresolved taxa. There, a linked name is sometimes itself a synonym, which leads to unresolved loops. LCVP only links to accepted names, not to the taxonomic predecessor.
If taxon A is a synonym of taxon B, and taxon B turns out to be a synonym of taxon C, the accepted name given for taxon A is taxon C, not B. We treated invalid names as synonyms and assigned them to their appropriate accepted name.
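The rule that every synonym must point directly to its accepted name can be sketched as follows. The named vector `links` and the helper `resolve_accepted()` are hypothetical and serve only to illustrate the A, B, C example and the detection of unresolved loops; they are not taken from the LCVP code base.

```r
# Hypothetical synonym table: each name points to the name it is a synonym
# of; accepted names point to themselves.
links <- c(
  "Taxon A" = "Taxon B",
  "Taxon B" = "Taxon C",
  "Taxon C" = "Taxon C",
  "Taxon X" = "Taxon Y",  # X and Y point at each other: an unresolved loop
  "Taxon Y" = "Taxon X"
)

# Hypothetical helper: follow the chain of synonyms until an accepted name
# is reached. Returns NA for loops and for names missing from the table.
resolve_accepted <- function(name, links) {
  visited <- character(0)
  current <- name
  while (!is.na(links[current]) && links[[current]] != current) {
    if (current %in% visited) {
      return(NA_character_)  # the chain came back to a visited name
    }
    visited <- c(visited, current)
    current <- links[[current]]
  }
  if (is.na(links[current])) {
    return(NA_character_)  # the chain left the table
  }
  current
}

# Flatten the table so that every name maps directly to its accepted name.
accepted <- vapply(names(links), resolve_accepted, character(1), links = links)
print(accepted)

# Validation check: names that do not lead to an accepted name.
unresolved <- names(accepted)[is.na(accepted)]
print(unresolved)
```

In this toy table, Taxon A resolves to Taxon C rather than to Taxon B, while Taxon X and Taxon Y are reported as unresolved because they only point at each other.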
Most [...] than TPL and POWO. A user is more likely to resolve a given vascular plant taxon name with LCVP than with the given versions of TPL and POWO. LCVP also covers infraspecific taxa names, which are not yet covered in POWO. The information in LCVP on the genus to which a species belongs, and thus on which accepted name should be used, is based on taxonomic as well as the most recent phylogenetic (i.e. mainly genetic) information. TPL has not been updated for many years and is mainly based on taxonomic information (i.e. not molecular phylogenies). With respect to the usability of LCVP, we see advantages compared to POWO, which to our knowledge offers neither an R package nor any other functionality for (semi-)automatic name checking or fuzzy name matching.

Code Availability
The LCVP generally consists of (1) the LCVP itself, available as an R data package (version 1.0.2 as of April 2020) and as a tab-delimited text file, and (2) [...]. Requests for integrating LCVP can be made via the project's GitHub issue tracker (https://github.com/idiv-biodiversity/LCVP/issues).