Diet assessment of two land planarian species using high-throughput sequencing data

Geoplanidae (Platyhelminthes: Tricladida) feed on soil invertebrates. Observations of their predatory behavior in nature are scarce, and most of the information has been obtained from food preference experiments. Although these experiments are based on a wide variety of prey, this catalog is often far from being representative of the fauna present in the natural habitat of planarians. As some geoplanid species have recently become invasive, obtaining accurate knowledge about their feeding habits is crucial for the development of plans to control and prevent their expansion. Using high-throughput sequencing data, we performed a metagenomic analysis to identify the in situ diet of two endemic and codistributed species of geoplanids from the Brazilian Atlantic Forest: Imbira marcusi and Cephaloflexa bergi. We tested four different methods of taxonomic assignment and found that phylogeny-based assignment methods outperform those based on similarity. The results show that the diet of I. marcusi is restricted to earthworms, whereas C. bergi preys on spiders, harvestmen, woodlice, grasshoppers, Hymenoptera, Lepidoptera and possibly other geoplanids. Furthermore, the feeding habits of both species vary among the sampling locations. In conclusion, the integration of metagenomics with phylogenetics should be considered when designing studies on the feeding habits of invertebrates.


ImSantoA
The Glossoscolecidae family is native to South America and well represented in Brazil 1 , while Lumbricidae is native to Europe but invasive and widely distributed in temperate regions. In Brazil, the Lumbricidae family is restricted to the southern and southeastern states 2 , which coincides with the localities of the study, so we accepted these two assignments as reliable.
It is unlikely that the leech Helobdella robusta is a prey of I. marcusi. Although the genus Helobdella occurs worldwide 3 , it is improbable that a land planarian preys on this group because geoplanids avoid flooded environments 4 . Because H. robusta is a model organism for annelid and lophotrochozoan development and its genome is completely sequenced, it is overrepresented in GenBank. Moreover, the sequences assigned to H. robusta matched mRNA sequences annotated as hypothetical proteins, regions that are not usually sequenced outside of genomic studies.
Consequently, the assignment probably corresponds to a genomic region of an earthworm or, less probably, of a land leech.

ImCubatao
I. marcusi appears to feed on two species of Pontoscolex (Rhinodrilidae: Oligochaeta) in Cubatão. Pontoscolex corethrurus is native to the Guiana Shield region 5 , but the species is now broadly distributed in tropical regions, across as many as four continents and sixty countries 6 . Pontoscolex spiralis, on the other hand, has a more restricted distribution. It appears to be originally from Puerto Rico 7,8 but has colonized the Lesser Antilles, as it has been reported in Guadeloupe and Martinique 9,10 , and has reached French Guiana 11 .
The broad distribution of P. corethrurus and the fact that it has been cited within the Cubatão range 12 make our assignment to this species robust. However, for P. spiralis assignments, we must be more cautious as French Guiana is far north of this locality.
On close examination of the sequences that map to 18S ("unplaced_6169", 465 bp) and 28S ("unplaced_7401", 448 bp and "unplaced_11711", 369 bp) of P. spiralis in GenBank, we found that they map onto a region not present in the existing P. corethrurus 18S and 28S sequences because the latter are much shorter. These sequences probably also belong to P. corethrurus, which recent works suggest is actually a species complex 13 . The sequences of the individuals preyed upon by I. marcusi correspond to the P. corethrurus complex sp. L1, the most widespread lineage.

CbSantoA
The Apoecus ramelauensis assignment is unexpected, as this species is endemic to Mount Ramelau on Timor island 14 . The sequence is 461 bp long and maps onto 28S sequences in GenBank. As the 28S locus is widely used in Mollusca, it is unlikely that this exotic assignment was due to poor representation of the locus in the database. Because there are a handful of consensus assignments to Eupulmonata (Gastropoda) in CbSantoA, we treated this assignment as Eupulmonata despite the unanimous A. ramelauensis assignment.

CbCubatao
The spider genus Caayguara is endemic to the Brazilian Atlantic Forest, with many species occurring in the state of São Paulo 15 . The matching locus is 28S, and the assigned sequences are "unplaced_15105" (371 bp) and "unplaced_10078" (749 bp). As Caayguara albus is the only species of the genus in GenBank, we must take this assignment as Caayguara sp.
Pickeliana pickeli is the only species of the genus Pickeliana represented in GenBank. Its distribution is well known and modeled 16 , and our sampling sites are outside its range. Dr. Marcio Bernardino da Silva confirmed that the Stygnidae family does not occur within the Cubatão range (personal communication).
On the other hand, the sequence used for the assignment is rather long (780 bp), the assignment is almost unanimous, the locus is widely sequenced (28S), and Pickeliana pickeli assignments are present more than once in other localities. We therefore consider this a controversial case; for such cases, the LCA assignment method is the most conservative, and we treated this assignment as belonging to Laniatores, a suborder of Opiliones containing, among others, the Stygnidae and Gonyleptidae families. Nonetheless, as we have consensus assignments to this species coming from two different datasets, we did not discard the possibility that these sequences actually belong to a genus of the Stygnidae family.
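As an illustration of why an LCA assignment is conservative, the following minimal sketch returns the deepest taxon shared by all candidate lineages of a query. It is not the implementation used in this study: the function name `lowest_common_ancestor` is hypothetical, and the two lineages are simplified, hand-written examples that omit intermediate ranks.

```python
from typing import Optional, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by all lineages.

    Each lineage is a root-to-tip list of taxon names. Returns None when
    no lineages are given or when they do not even share the root.
    """
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel until the lineages diverge.
    for ranks in zip(*lineages):
        if all(taxon == ranks[0] for taxon in ranks):
            lca = ranks[0]
        else:
            break
    return lca


# Simplified, illustrative lineages (assumption: intermediate ranks omitted).
hits = [
    ["Arthropoda", "Arachnida", "Opiliones", "Laniatores", "Stygnidae", "Pickeliana"],
    ["Arthropoda", "Arachnida", "Opiliones", "Laniatores", "Gonyleptidae", "Promitobates"],
]
print(lowest_common_ancestor(hits))
```

When the candidate hits disagree at the family level, as in this toy input, the assignment falls back to the rank that all hits share, here the suborder.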
The Bombyx mori assignment is made with a short sequence (272 bp) for which all of the homologues also match Bombyx mori at the chorion locus. Given this and the fact that Bombyx mori is broadly represented in GenBank (166927 sequences), the most plausible hypothesis is that the query sequence belongs to an unidentified Lepidoptera. The assignment made to Hemileuca sp. (352 bp) matches the 28S locus, and the Trichoplusia ni assignment (1335 bp) has a very limited set of homologues (7) included in the trees and matches transposable elements. As there are many assignments to Lepidoptera in this dataset, we consider these Hemileuca sp. and Trichoplusia ni sequences as Lepidoptera, despite the LCA assignments being to Arthropoda and Bombyx mori, respectively.
The Dugesiidae assignment is clearly an artifact. It was made with a short sequence (268 bp), and the matching locus is 28S. LCA, our most conservative method, assigned it to Continenticola, a clade that contains land and freshwater planarians. Geoplanoidea (Geoplanidae and Dugesiidae) has two types of 28S 17 , but type-I is predominant in most studies. Here, a previously unsequenced type-II 28S of Cephaloflexa bergi may have been matched to the type-II 28S of Dugesiidae, which is well known and well represented in GenBank.

CbCantareira
Gonyleptidae is a family of Neotropical Opiliones especially rich in the Atlantic Forest of Brazil 18 . The sequence "unplaced_86236" (525 bp) from CbCantareira is assigned to the genus Promitobates (Gonyleptidae) (Figure 4). This harvestman genus is endemic to the area where all the specimens of our study were collected 19 , which comprises the north of the state of Santa Catarina, the states of Paraná and São Paulo and the south of the state of Rio de Janeiro. Thus, this is a robust assignment, as we retrieved from the data a genus endemic to the sampling location of its predator. The case of Pickeliana pickeli has already been analyzed in the CbCubatao dataset.

CbItatiaia
The Tetrigidae assignments were numerous in the CbItatiaia dataset. This family of Orthoptera, also known as pygmy grasshoppers, is cosmopolitan and especially rich in tropical forests 20 , so the assignments to Tetrigidae are consistent with its distribution range.