## Background & Summary

Parasites are ubiquitous in food webs1, but only a few food webs systematically include parasites2,3. Parasites can have strong effects on diversity, biomass, and food-web complexity2,4,5,6. Kelp-forest ecosystems are more complex, dynamic, and open than many ecosystems for which food webs with parasites have been built (salt marsh4,6,7, sand flat2, and lake ecosystems8). The extensive knowledge base and research history at the research site in southern California provided both the necessary foundation for this work and the motivation to build a well-resolved food web. Kelp forests along the coast of southern California (San Diego to Point Conception) have been studied more than anywhere else in the world, with over seven decades of research on feeding interactions9,10,11,12,13,14 and the cascading indirect effects15,16,17 that permeate the kelp-forest food web. Santa Barbara Channel (SBC) kelp forests (Fig. 1) were ideal for this work due to monitoring by the Channel Islands National Park and the SBC Long Term Ecological Research programs.

Previous studies have constructed food webs for kelp forests, but these lack the resolution needed for network analysis, are difficult to compare with other food webs because of their methodology and structure, and do not include parasites. Three food webs and a links database have been published for California kelp forests18,19,20,21. The nodes in these webs are often highly aggregated for certain taxonomic groups (e.g. all amphipods combined) but resolved to the species level for other taxa (notably fishes). Many invertebrate groups are missing altogether from these webs, which were largely constructed from SCUBA diver surveys. A high-resolution food web published for intertidal rocky habitats in Chile22 also included non-trophic interactions but does not include small cryptic organisms, notably parasites, amphipods, isopods, fish, and many other small mobile invertebrates. Further studies have examined kelp-forest food-web structure via stable isotopes (e.g. Chile23, France24, Norway25, and Southern California26,27,28,29) and either aggregated species to functional groups, did not attempt to assign trophic interactions between consumer species, or focused on a subset of interactions within the system. We have added to this extensive knowledge base to build an improved kelp-forest food web that systematically resolves the free-living and parasitic species that dominate biodiversity in this system. The resulting food web is more species rich than any other published food web with parasites, and has a larger proportion of parasitic species than other webs (47.7% in kelp forests, 38% in tropical sand flat2, 30–34.8% in salt marshes4,6,7, 26% in lake ecosystems8). We compare the kelp-forest food web structure in detail with other published food webs of similar construction (salt marsh4,6,7, tropical sand flat2, and lake ecosystems8) in a separate manuscript in preparation.

The resolved free-living web included 492 species (549 life stages), compared with 217 in ref. 20. Parasites added an additional 450 species (549 life stages), comprising 47.7% of all species (Fig. 2). Improving resolution for small crustaceans and other invertebrate taxa added the most to free-living diversity compared to past efforts by others18,19,20,21. Platyhelminthes added the most parasitic species overall, and trematodes were the most diverse group (Fig. 3). Parasitic crustaceans (mostly copepods) were the second most species-rich group of parasites, with more parasitic crustaceans than free-living crustaceans (132 vs. 113, respectively). The predator-prey subweb comprised 42.6% of links, the predator-parasite subweb had 45.0% of links, and the parasite-host subweb had 12.4% of links. The kelp-forest food web taxa became dominated by helminths and crustaceans when parasites were included.

## Methods

### Site description

We define “kelp forest” as rocky-reef habitat within the 5–20 m depth range that supports dense stands of giant kelp, Macrocystis pyrifera. For this study, we considered the Santa Barbara Channel (SBC) to include the mainland region between Point Conception (−120.476° longitude, 34.455° latitude) and Point Mugu (−119.065° longitude, 34.079° latitude), as well as the northern and southern sides of the four northern Channel Islands (Fig. 1). Although the SBC is a subset of the Southern California Bight, its strong west-east gradient in cold to warm temperature means the study system includes many of the kelp-forest species throughout California31. This means the SBC kelp-forest food web is a large “metaweb”, characterizing kelp forest meta-communities, rather than a site-specific web. In other words, the system includes cold water and warm water species that might not necessarily co-occur at a single site. However, there are site-specific food webs embedded in the metaweb at particular locations where a subset of species occur.

### Data sources

Our goal was to assemble the food web using both published and novel empirical observations. To this end, we first used published data sets and species’ range boundaries to create free-living species lists. The initial list of fishes, algae, and free-living invertebrates was assembled from the Channel Islands National Park Kelp Forest Monitoring program (CINP KFM, annual reports available at https://irma.nps.gov/DataStore/SavedSearch/Profile/1508, accessed March 6, 2017, or visit https://www.nps.gov/im/medn/kelp-forest-communities.htm to contact David Kushner or Joshua Sprague) and the SBC Long Term Ecological Research program’s ongoing kelp-forest community timeseries (SBC LTER, https://sbclter.msi.ucsb.edu/data/catalog/, accessed March 12, 2017). We added to these lists using primary literature, technical reports (e.g., NOAA, USFW), direct observations, expert opinion, crowd-sourced observations (e.g., eBird.org), guidebooks, and grey literature. We sampled the local kelp forest zooplankton and the algae-associated small-invertebrate community, because these organisms were not well represented in surveys (see below).

We created initial lists of parasite species using published literature and host-parasite databases. A systematic review was conducted to collect parasite records for each free-living species. We searched the Natural History Museum (NHM) of London host-parasite database (https://www.nhm.ac.uk/research-curation/scientific-resources/taxonomy-systematics/host-parasites/database/search.jsp), the FishPest database32, WoRMS (http://www.marinespecies.org/aphia.php?p=search), BIOSIS citation index (http://webofscience.com), and Google Scholar™ (https://scholar.google.com/) (search terms: Genus + species + parasit*, expanded to Genus + parasit* if no records were found). For each host species, we recorded the number of records found in BIOSIS and NHM as an estimate of study effort. Although parasites are often reported at the host and parasite species level, we were often able to infer parasite and host life stages based on knowledge about life cycles. We added to these lists by sampling local fish and invertebrates, with a focus on hosts that were common in the system and not well-studied (see below). As for any food-web study, we were most interested in including common or important parasites, rather than rarities.

Published diet observations (including in grey literature), direct observations, and inference were used to determine trophic links (see below).

### Free-living species sampling methods

Certain groups of free-living species were under-represented in published survey data, so we conducted sampling to assess species diversity in the following areas.

#### Zooplankton tows

We conducted vertical zooplankton tows within kelp forests at two island locations (on the same date) and two mainland locations (repeated tows, four dates at one site, three of those dates at a second site, including one nighttime sampling date), for eight site by date samples30. While the vessel was at anchor within a kelp forest, a 30 cm diameter, 200 micron plankton net was dropped to the bottom and pulled to the surface at a rate of 0.33 m per second. Care was taken not to scrape the net against kelp plants. The collection jar attached to the net was kept vertical with a small lead weight to ensure that the net did not collect organisms on the way down to the bottom. The depth and time of collection were recorded30. We held collection jars on ice while in the field, then preserved specimens in 95% ethanol when we returned to the lab (within a few hours of collection). All organisms were counted and identified to species when possible, but some groups were identified to Order or Family, and then cross-checked with lists of known local species. If this was not possible, specimens were assigned to morphospecies, indicating they appeared to be a unique species based on morphology. Representative specimens from each species or morphospecies were photographed and measured.

#### Giant kelp holdfasts

Giant kelp holdfasts were sampled for free-living invertebrates. In the field, holdfast circumference and two slant height measures were taken, as well as basal stipe circumference. A subsample of approximately 25% of the holdfast was collected by SCUBA in a large plastic zip bag, and frozen until processing (n = 7). The samples were processed for organisms > 500 microns, and holdfast tissue was weighed after organisms and debris were removed. Organisms were counted, identified to species or morphospecies when possible, and measured30. Some groups were identified to Family, and then matched to lists of known local species.

#### Taxon-specific methods: gastropods

Small gastropods are a diverse but overlooked group that lives in benthic turf algae. Algal clumps were collected haphazardly by either laying down a 7 × 7 cm quadrat and collecting all algae within the quadrat, or by collecting clumps of a particular alga and weighing at the lab. All gastropods were removed by hand under a stereomicroscope, counted, identified to species or morphospecies, measured, and photographed30.

### Parasitological collections

We collected fish and invertebrates and dissected them for parasites, with the goal of identifying the most common parasites in the food web. We targeted host groups that are known to transmit trophically-transmitted parasites in other systems. We collected most organisms from mainland sites, and sampled opportunistically at sites on Anacapa, Santa Cruz, and Santa Rosa islands30 (Fig. 1). A list of all species dissected and sample sizes is provided30.

#### Fish collections

We prioritized collecting the most common and abundant fish species based on survey data from 2000–2014 (SBC LTER), as well as personal observation, expert opinion, and amount of parasite data in the literature. Other species (lower abundance or higher past study effort) were collected opportunistically. Fish were collected primarily by spear on SCUBA. Specific size classes were not targeted and the spear tips used were appropriate for the focal species. Small benthic fish were collected using dip nets. All fish were collected under UCSB IACUC protocol 549.2. Fish were either stored on ice and processed within 24 hours of collection or frozen until processing.

#### Invertebrate collections

Invertebrates are necessary intermediate hosts in many parasite life cycles, but relatively few parasite life cycles have been described in marine environments. We targeted invertebrate species that were abundant and potentially important as intermediate hosts for parasites. We did not collect sessile colonial taxa, such as hydroids, gorgonians, sponges, and tunicates, as they were not expected to be hosts for trophically transmitted parasites (but these hosts do merit further study). Most sampled invertebrates were gastropods and small crustaceans, as they host trophically-transmitted parasites in other food webs. Bivalves, large crustaceans, echinoderms, and polychaetes were also dissected. Large invertebrates were collected by hand or using a rock chisel and scraper when appropriate. Small invertebrates were sampled by collecting benthic substrates in plastic or fine mesh bags and removing organisms in the lab. Invertebrates were held live in flow-through seawater until the time of dissection or frozen until processing.

### Parasitological assessment

For each host dissection, the exterior and all internal soft tissues were examined for parasite life stages. For larger species, entire host organs were usually searched by pressing soft tissues thin between two glass plates (“squashed”) and examining with a stereomicroscope. However, to increase sample size, bilaterally symmetric organs (e.g. gills) were examined from one randomly determined side, and large organs (e.g. muscle, liver) were subsampled in larger fishes. Small crustaceans and soft-bodied invertebrates were squashed whole. We identified gut contents where feasible to improve host diet data and inform parasite life cycles. We recorded host mass, length (or other species-appropriate measurement), collection method, and host condition at time of dissection (e.g. frozen, fresh). We counted and identified all parasites to the lowest possible taxonomic level and assigned a morphospecies code when species-level identification was not possible. Only a few putative parasites were excluded from additional analysis because they had no identifying features. Dissection data30 includes species not included in the full food web (see below for discussion of justifications for node inclusion).

### Node list assembly

Nodes in the web included free-living species that used the water column and benthic zones within kelp forests as feeding habitat (including transient kelp-forest visitors but excluding rare and vagrant species) and parasites of those free-living species. Species was the preferred taxonomic unit, and life stages were included as separate nodes if that life stage was present in the system and had distinct trophic interactions from the adult stage. The fully-resolved free-living food web was constructed with life stage (e.g., larva, adult) nested within species (or morpho-species) (excepting benthic diatoms, planktonic diatoms, dinoflagellates, foraminifera, free-living nematodes, bacteria, free-living ciliates, copepod nauplii, filamentous algae, and invertebrate eggs, which were aggregate nodes). As various forms of detritus are important to energy flow in kelp forests, detritus was broken into four categories based on the typical feeding modes of detritivores and main sources of detritus: carrion, drift macroalgae, small mixed origin (such as would be consumed by a deposit or suspension feeder, with the recognition that this alone is a complex system deserving further resolution) and dissolved organic material. The “drift macroalgae” component was especially important to distinguish, as certain herbivores (sea urchins) are known to prefer drift algae as food but will turn to feeding on live algae when drift algae are sparse. This is a very distinct type of interaction from suspension feeders, which consume small particles of detritus that may be largely bacteria. “Parasites” are consumers which fit the seven types of parasitism defined by Lafferty and Kuris33. Commensal organisms were also recorded. We limited the parasite species list to metazoan species that use kelp-forest species as hosts for at least one stage in their life cycle. Bacterial, viral, fungal, and protozoan pathogens that are important in kelp-forest food webs merit inclusion in further work.

We assigned each node a justification code (see below), confidence level, literature reference, and locality of the reference. Additional node metadata includes site on host (ecto- vs. endoparasite), taxonomic information, and life cycle information30 (see below). The node list contains columns with a species ID and a species-by-stage ID. To work with the life-stage resolution, select the species-by-stage ID as the node identifier in analyses. To work with the species version, select the species ID as the node identifier in analyses. This will collapse all of the interactions to the species, so all of the trophic interactions are preserved and linked to the species node. Network analysis packages in R (such as Cheddar34) will automatically remove duplicate links if they are generated in this process.
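The collapse from life-stage nodes to species nodes can be sketched in a few lines. This is a minimal illustration only: the field names (`stage_id`, `species_id`), the toy nodes, and the toy links are hypothetical and do not reproduce the column names or contents of the published node and link lists.

```python
# Toy node list: each life stage maps to a species ID (hypothetical field names).
nodes = [
    {"stage_id": "1.1", "species_id": 1, "stage": "adult"},     # amphipod, adult
    {"stage_id": "1.2", "species_id": 1, "stage": "juvenile"},  # amphipod, juvenile
    {"stage_id": "2.1", "species_id": 2, "stage": "adult"},     # fish
    {"stage_id": "3.1", "species_id": 3, "stage": "adult"},     # hydroid
]

# Toy stage-level links as (consumer stage ID, resource stage ID).
links = [
    ("2.1", "1.1"),  # fish eats adult amphipod
    ("2.1", "1.2"),  # fish eats juvenile amphipod (illustrative only)
    ("3.1", "1.2"),  # hydroid eats juvenile amphipod
]


def collapse_to_species(nodes, links):
    """Re-key stage-level links by species ID and drop duplicates."""
    stage_to_species = {n["stage_id"]: n["species_id"] for n in nodes}
    # A set removes the duplicate links created when two stages of one
    # species share a consumer or a resource.
    species_links = {
        (stage_to_species[consumer], stage_to_species[resource])
        for consumer, resource in links
    }
    return sorted(species_links)


species_links = collapse_to_species(nodes, links)
print(species_links)
```

In this toy case the two fish-amphipod links merge into one species-level link, while the hydroid-amphipod link is retained.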

#### Life stages as nodes

Species were partitioned into life-stage nodes (e.g., larva, juvenile, adult) if a species changed its trophic position from one stage to the other and multiple stages were present in the system. Whether or not a distinct life stage resided in the kelp forest was indicated by various data sources (e.g. dissections, published records), or inferred from species life history or trophic interactions. For example, amphipods brood offspring and have crawl-away juveniles. These juveniles remain in the kelp forest (rather than having a pelagic phase), and due to their small size are subject to different predators than adults (e.g. adults are eaten by fishes, while juveniles are eaten by hydroids). This was justification for juvenile amphipods being a distinct node from adult amphipods. On the other hand, many species have planktonic larvae that develop outside of the kelp forest, so only the adult stages were included at the species level. Larval stages of parasites were included if there was no feasible alternative for the focal host to become infected. We assumed that kelp-forest resident hosts became infected through life-cycle stages found within the kelp-forest food web, but that transient hosts could have acquired some parasites outside the kelp forest (e.g., if intermediate hosts were not known from the kelp forest). Likewise, presence of larval parasites in dissections was evidence for including adult stages. For some species, there was insufficient data on life history to infer additional stages. Metadata in the node list indicates whether parasites have additional life stages inside the kelp forest, outside, or unknown. When comparing this food web with others (which rarely separate species into life stages), using our data it is easy to collapse life-stage nodes into species nodes.

#### Justifications for node inclusion

We used multiple lines of evidence to justify whether or not to include a node in the food web. Free-living species were included if they were known from the SBC (see site description above) and were indicated by the data sources described above (e.g. reports, surveys, published papers, guidebooks, expert opinion, etc.). Species lists from regional guidebooks included non-kelp-forest species, so these lists were compared with species lists from long-term monitoring surveys. Following the methods of Lafferty et al. 2006, we excluded most rare species (<1% frequency of detection in surveys, or those described as “rare” qualitatively). However, we included species that seemed rare because they were cryptic or not looked for, if the species’ ecological role exceeded abundance (top predators), or if the presence of a final host was inferable based on presence of parasites that require it to complete their life cycle. For instance, a cryptic fish species listed in a guidebook may appear rare in monitoring program surveys, but inclusion might be warranted based on personal observations. For top predators, larval parasites in prey species were evidence for the presence of final-host species (e.g. finding shark tapeworm larvae in a fish indicates a shark is likely present in the system). We also included a few locally extinct or rare species of special conservation or fisheries interest that had a larger historical role (e.g. the sea otter, Enhydra lutris)35 or potential expanded role with global warming. These species are indicated in the node list so they can be excluded or included based on research questions. The justifications for including a node in the food web were included as metadata, as well as the localities of the species observation and references, and then used to determine a categorical confidence score.

Parasites are not as well studied as free-living species, so we used parasite-host records from San Luis Obispo, California to Punta San Hipolito, Baja California, Mexico, corresponding to the dominant biotic province of the SBC. We excluded parasites from outside this range or those known to have freshwater life cycles, as well as ectoparasites of birds. We made exceptions for parasites with additional evidence of presence (such as a larval stage found locally, or a local occurrence in another host species), and for those with transient and wide-ranging hosts30. For example, if an adult trematode was observed in pelicans in Florida, but larval stages of this worm had been observed in the Carpinteria (CA) Salt Marsh adjacent to our system, the worm was included. We extended the northern range of acceptable parasite records to San Francisco Bay, California for hosts that were known to migrate between northern and southern California regularly (several species of elasmobranchs, birds, and mammals). This also helped account for the relatively low study effort for these hosts in southern California.

#### Assignment of node confidence

Depending on the evidence for including a node, we rated confidence from 1–4, with 1 being the most confident. Nodes that were observed by monitoring surveys or this study were assigned a confidence value of 1 (61.2% of free-living nodes, 35.5% of parasite nodes). Nodes that were known from the SBC through other sources (e.g. guide books, published literature), but that were not reported in surveys were included with a confidence value of 2 (28% of free-living nodes, 37.7% of parasite nodes). For example, gammarid amphipods were not monitored at the species level in monitoring surveys, but other studies in the region provide lists of local species. Species known from the broader Southern California Bight and with reported ranges north to Point Conception or beyond were included with a confidence value of 3 if they were from a taxonomic group that may not have been sampled effectively by methods utilized in the SBC (6.7% of free-living nodes, 14.4% of parasite nodes). This included several sponge species that were not monitored at the species level by monitoring programs. Transient species indicated by expert opinion and crowd-sourced observations, as well as some life stages that were inferred to be present (e.g. juvenile gammarid amphipod species) were also assigned confidence values of 3. Some parasite life stages that were inferred to be present, but were observed north of Point Conception or outside the greater southern California region were included with a confidence value of 4 (4% of free-living nodes, 12.4% of parasite nodes). We also assigned a confidence level of 4 to parasite nodes whose presence in the kelp forest was less certain due to host transience (large mobile predators that forage across multiple different habitats, not exclusive to kelp forests). Parasites are sometimes mis-identified in published records, so, to avoid false positives, we excluded some parasites on the basis of questionable identifications. These were typically parasites that were only known from one host specimen in one local study but were known from an entirely different group of host organisms in a distant locality. Readers can use confidence scores to filter their own node list.
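Filtering by confidence score might look like the following sketch. The field names (`id`, `confidence`), the threshold, and the toy data are assumptions made for illustration; they are not the published column names. Links that touch a removed node are dropped so that the filtered web remains internally consistent.

```python
# Toy node list with the 1-4 confidence scale (1 = most confident).
nodes = [
    {"id": "A", "confidence": 1},
    {"id": "B", "confidence": 2},
    {"id": "C", "confidence": 3},
    {"id": "D", "confidence": 4},
]

# Toy links as (consumer ID, resource ID).
links = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "B")]


def filter_by_confidence(nodes, links, max_confidence):
    """Keep nodes scored at or below max_confidence, and only the links
    whose consumer and resource both survive the filter."""
    kept_ids = {n["id"] for n in nodes if n["confidence"] <= max_confidence}
    kept_links = [
        (consumer, resource)
        for consumer, resource in links
        if consumer in kept_ids and resource in kept_ids
    ]
    return kept_ids, kept_links


kept_ids, kept_links = filter_by_confidence(nodes, links, max_confidence=2)
print(sorted(kept_ids), kept_links)
```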

Additional metadata for each node includes species functional group (e.g. predator, herbivore, detritivore, omnivore, autotroph, filter-feeder, ectoparasite, etc.), taxonomic information (phylum, class, order, family), habitat association (e.g. holdfast, water column, rock surface, host), small-scale habitat association (e.g. rock, water-column, macroalgae, etc.), geographic range, thermal association, consumer trophic type (Table 1), and consumer strategy (e.g. autotroph, omnivore, detritivore, filter-feeder, carnivore)30.

Because links in previously published kelp-forest food webs contained errors, we constructed links from scratch using primary sources where possible. Given N nodes in the node list, there are N² potential trophic links (including cannibalism). Many of these potential feeding interactions are easy to exclude based on logic (e.g., giant kelp doesn’t eat animals) and species life history. For example, a subset of free-living species are possible hosts for each life stage and taxonomic group of parasites (e.g. adult tapeworms in the order Trypanorhyncha only infect elasmobranchs). Parasite-host records in the literature are incomplete lists, so we inferred additional links using species life histories and logic. Parasites can also be killed by free-living species when their hosts are eaten (concomitant predation). We used free-living trophic interactions to infer these feeding links between free-living consumer and parasite. Where possible, this food web reports links at the stage level, but these links could be aggregated to the species level, or even the group level for comparison with other food webs. Each link was assigned a literature reference, locality of the observation, justification code, and confidence level30.

#### Justifications for link inclusion

Links were assigned using several data sources and logic. A systematic literature review was conducted in Google Scholar™ to collect diet records for each free-living species (including synonyms) using standardized search terms (“Genus species” [diet* OR feed* OR prey]). If these search terms did not yield results, the search was expanded to records of the species (“Genus species”). We also used direct observations from gut contents. In many cases, diet information was not available at the species level, creating the possibility of false negative links (e.g., failing to report a diet item due to lack of direct observation). To reduce the probability of false negative links, the search was expanded to the next higher taxonomic level where information was available, under the assumption that diets are often taxonomically conserved. Such links were inferred by assessing both the compatibility of the interaction (e.g., body size ratios, diet generality), as well as the probability of encounter between the species. For example, if two species were known to encounter each other through shared habitat and behaviors, and general feeding habits of the consumer were compatible with the resource species, a link was inferred. Certain trophic links may only be present seasonally or may vary through time. Temporal data sets provided by the SBC LTER and CINP KFM programs provide abundances over time for many key species in the food web. These data sets could be used to assess temporal stability of links in future studies.

Links between parasites and hosts were assigned using several data sources, as in the free-living web. Direct observations of parasite-host interactions through our sampling or published studies (as detected through systematic review, see “Data Sources” section above) were assigned. However, direct observation of all possible interactions was infeasible and sampling effort varied among hosts, so parasite-host interactions are often under-sampled. To account for this, links between parasites and hosts were added in stages using the free-living web, host life history, and parasite life history. First, parasite life cycles were inferred based on known hosts and host trophic interactions. Trophic interactions among free-living species were then used to infer either transmission of parasites to additional hosts or concomitant predation (predator-parasite links) if parasites were not ingested by suitable hosts. Each link is identified by a code that indicates whether it was observed directly (and the source), or whether it was inferred (and the method of inference, described below). Users of the food web can choose to filter links by link justification to suit their needs.

#### Life cycle inference

We used several data sources and considered parasite life histories to assign links with likely hosts. If the life cycle was known for the parasite in another system, we inferred links with analogous hosts in the system (a kelp forest species in the same genus or family). For trophically transmitted parasites, we assessed parasite compatibility with potential hosts, and used free-living trophic interactions to determine whether a parasite would encounter a suitable host. For species with unknown life histories, we considered the life history of the next lowest taxonomic grouping and assumed generalism within that level. For example, the trematode Podocotyle californica has an unknown life cycle, but Podocotyle enophrysi is known to infect the snail Lacuna marmorata as its first intermediate host36. Trematodes are host-specific at this stage, and Lacuna unifasciata was the only analogous host species in the kelp-forest food web, so it was assigned as the most likely intermediate host for Podocotyle californica. On the other hand, marine acanthocephalans are thought to be generalists at the ordinal level in the first intermediate host (D. Marcogliese, pers. comm.) and are trophically transmitted. Although a second intermediate host is not necessarily required for development, acanthocephalans of top predators often use fishes as paratenic hosts. In dissections, fishes were often infected with larval acanthocephalans of birds and mammals, so we assigned amphipod species eaten by infected fish as possible first intermediate hosts. For the 15% of the nodes where a parasite from the dissections could not be identified to family, those without a clear possible host in the kelp forest, or those where nothing was known of the parasite’s life history, we did not make any inferences based on life cycle. Such parasites appear as specialists in the data (but see the false-negative assessment below).

#### Parasite-host inference

The number of parasite species detected is often a function of study effort37,38,39. Because study effort varied among hosts, and was sometimes low, we assigned additional parasite-host links based on potential for encounter with infectious stages of parasites and expected host compatibility. Encounter with trophically transmitted parasites occurs through host diet (i.e. through intermediate hosts eaten as prey) and was informed using the free-living food web and life-cycle inferences as described above. Encounter with directly transmitted parasites occurs through shared habitat or contact with other hosts and was informed by other parasite-host records. We based compatibility on the host-specificity, known hosts in the system, as well as the life stage of the parasite (e.g. adult tapeworms do not survive if their host is eaten, whereas juvenile tapeworms can infect repeated paratenic hosts and remain viable). For example, if a monogene was reported from 15 rockfish species in British Columbia and observed in two species locally, it was assumed to infect other rockfish species present in the SBC kelp-forest food web.

#### Predator-parasite interactions

Host death by predation is a major source of parasite mortality and may strongly influence parasite-host dynamics. We inferred these predator-parasite interactions using trophic interactions between free-living species. For each free-living consumer interaction, we assessed whether the parasites of the prey host would be killed or transmitted to the predator. If the predator was not a compatible host (see discussion above), we assigned a consumptive link between the free-living consumer and parasite. Users should consider that although predator-parasite links influence parasite vulnerability, they rarely constitute a significant flow of energy from parasites to predators. Food-web metrics that imply energy transfer (e.g., robustness and other bottom-up effects) should therefore not include predator-parasite links.
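The inference rule described above can be sketched as a simple set operation. This is an illustrative simplification, not the procedure used to build the published link list: the node names are placeholders, and the `compatible` set stands in for the case-by-case compatibility assessment described in the text.

```python
# Toy inputs (all names are placeholders).
predation = {("shark_Y", "fish_X"), ("seal_Z", "fish_X")}  # (predator, prey)
infections = {("larval_tapeworm", "fish_X")}               # (parasite, host)
compatible = {("larval_tapeworm", "shark_Y")}              # (parasite, suitable next host)


def infer_predator_parasite_links(predation, infections, compatible):
    """For each predator-prey link, any parasite of the prey that cannot
    infect the predator is assumed to be consumed along with its host."""
    concomitant = set()
    for predator, prey in predation:
        for parasite, host in infections:
            if host == prey and (parasite, predator) not in compatible:
                concomitant.add((predator, parasite))
    return sorted(concomitant)


predator_parasite_links = infer_predator_parasite_links(
    predation, infections, compatible
)
print(predator_parasite_links)
```

Here the shark is a compatible host, so the interaction counts as transmission, whereas the seal is not, so a predator-parasite link is assigned.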

#### False negative estimation for host-parasite links

Even though many unobserved host-parasite links were inferred to occur based on logic, under-sampling leads to the potential for other false negative links. Such links are particularly likely for generalist parasites that have low prevalence in under-sampled hosts. For instance, if a metacercaria species infects any rockfish species at 5% prevalence, and we sample ten individuals from each of ten rockfish species, the probability of detecting the parasite in a given species is $$1-{0.95}^{10}\approx 0.40$$, so we can expect by chance to observe the parasite in only about four of the ten species. The remaining six rockfish species might appear to be uninfectable by the parasite, but assigning 0s in the bipartite host-parasite network would result in false negative links. False negative links make parasites look more like specialists than they actually are, thereby underestimating their importance in food-web measures such as generality, vulnerability, linkage density, and connectance (the proportion of realized links relative to the number of possible links). We estimated false-negative probabilities for unobserved links at the species level and individual host level (we assumed the probability of a false positive observation was low enough to be ignored unless noted). We applied this approach separately to the following bipartite networks: trophically transmitted parasite-fish, directly transmitted parasite-fish, parasite-shark, parasite-bird, and parasite-mammal.

The first step in estimating a false negative probability is to calculate a prior statistical expectation that a parasite group infects a host group based on previously reported host-parasite links in the literature. At the node level, we used a generalized linear model with observed or inferred link (0,1) as a dependent variable and taxonomic information (host order, host family, parasite order, parasite family, parasite species), host trophic level (calculated from the free-living web), host habitat association, and proportion of the host diet that may contain infective stages as independent variables (JMP Pro V1440). Because false negatives arising from under-sampling are common in the parasitological literature39, we included a square-root transformed sampling effort term (the number of parasite studies on the host in the literature). Model selection was based on the Akaike information criterion (AIC)41 and showed that host and parasite taxonomy and traits helped predict links (see Table 2 for model results of each network). The interaction between host order and parasite family was important in all bipartite networks, indicating parasite specialization at higher taxonomic levels. Study effort was less important in subnetworks with higher sampling effort across hosts. From the best-fitting model, we generated predicted probabilities for each link between species i and j, at existing effort $${\widehat{\psi }}_{ij}$$. We then assumed that with increasing effort, the probability that a link was observed $${\widehat{\psi }}_{ij}$$ approached the probability that the link exists $${\varPsi }_{ij}$$. Then, by parameterizing the prediction equation with a hypothetical “high” effort (see Table 2 for values for each bipartite network), we projected the probability that a link exists $${\widehat{\varPsi }}_{ij}$$. According to Bayes’ Theorem, the probability of a false negative, $${F}_{ij}$$, is:

$${\mathbb{P}}({\varPsi }_{ij}=1\,\& \,{\psi }_{ij}=0)/{\mathbb{P}}({\psi }_{ij}=0)$$

Which translates to:

$${F}_{ij}=({\widehat{\varPsi }}_{ij}-{\widehat{\psi }}_{ij})/(1-{\widehat{\psi }}_{ij})$$

This is a first approximation of the probability of a false negative link based on species-level data. Namely, the more likely a link is to occur based on taxonomy and traits, and the less likely it is to be sampled with existing effort, the more likely an unobserved link is a false negative due to insufficient sampling effort. We therefore estimated $${\widehat{\varPsi }}_{ij}$$ (and its standard error) and $${\widehat{F}}_{ij}$$ from data at the species level.
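As a rough numerical illustration of the species-level calculation, the sketch below predicts a link probability at existing and at "high" effort and then applies the formula for $${F}_{ij}$$. It is not the fitted model: the logit link, the single intercept standing in for all taxonomy and trait terms, the coefficient values, and the effort values are assumptions chosen only for the example.

```python
import math


def predicted_link_probability(intercept, effort_coef, n_studies):
    """Probability that a link is recorded at a given study effort.

    Assumes a logit link and a square-root transformed effort term; the
    intercept is a hypothetical stand-in for all taxonomy and trait terms.
    """
    logit = intercept + effort_coef * math.sqrt(n_studies)
    return 1.0 / (1.0 + math.exp(-logit))


def species_level_false_negative(big_psi_hat, small_psi_hat):
    """F_ij = (Psi_hat - psi_hat) / (1 - psi_hat)."""
    if not 0.0 <= small_psi_hat < 1.0:
        raise ValueError("psi_hat must lie in [0, 1)")
    if not small_psi_hat <= big_psi_hat <= 1.0:
        raise ValueError("Psi_hat must lie in [psi_hat, 1]")
    return (big_psi_hat - small_psi_hat) / (1.0 - small_psi_hat)


# Hypothetical coefficients and effort values, for illustration only.
psi_existing = predicted_link_probability(-2.0, 0.5, n_studies=4)
psi_high = predicted_link_probability(-2.0, 0.5, n_studies=25)
f_ij = species_level_false_negative(psi_high, psi_existing)
print(round(psi_existing, 3), round(psi_high, 3), round(f_ij, 3))
```

The false-negative probability grows as the gap between the high-effort and existing-effort predictions widens, and it is zero when additional effort would not change the prediction.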

We also had individual-level data for many potential links, making it possible to refine the estimate for $${\widehat{F}}_{ij}$$ based on dissections. Now, Bayes’ Theorem translates to:

$${\widehat{F}}_{ij}={\widehat{\varPsi }}_{ij}(1-{\widehat{d}}_{ij})/(1-{\widehat{d}}_{ij}\,{\widehat{\varPsi }}_{ij})$$

Where $${\widehat{\varPsi }}_{ij}$$ is estimated as above from the prior species-level data and $${\widehat{d}}_{ij}$$ is link detectability from dissections (the probability of detecting a link in a sample if that link occurs). $${\widehat{d}}_{ij}$$ can be estimated from individual-level data (e.g., several dissected host individuals). In a host species j that is known to be infected by a parasite species i, the probability $${d}_{ij}$$ of finding at least one infected individual after dissecting K hosts follows from a series of K independent Bernoulli trials, each with a probability of detecting a parasite in a host equal to the parasite’s prevalence in the host population, $${p}_{ij}$$:

$${\widehat{d}}_{ij}=1-{(1-{p}_{ij})}^{{{\rm{K}}}_{j}}$$
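The detectability and the dissection-based false-negative probability can be checked numerically with the short sketch below. The refined formula follows from Bayes' Theorem if a non-detection has probability $$1-{d}_{ij}$$ when the link exists and probability 1 when it does not (i.e. no false positives). The function names are illustrative, and the prior of 0.5 is an arbitrary value chosen for the example; the 5% prevalence and ten dissected hosts are likewise illustrative.

```python
def detectability(prevalence, n_dissected):
    """d = 1 - (1 - p)^K: chance of at least one infected host in K dissections."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return 1.0 - (1.0 - prevalence) ** n_dissected


def dissection_false_negative(big_psi_hat, d_hat):
    """F = Psi (1 - d) / (1 - d Psi): posterior that an undetected link exists."""
    denominator = 1.0 - d_hat * big_psi_hat
    if denominator == 0.0:
        # Prior and detectability both equal 1, so a non-detection is impossible.
        raise ValueError("non-detection is impossible when Psi = d = 1")
    return big_psi_hat * (1.0 - d_hat) / denominator


d = detectability(prevalence=0.05, n_dissected=10)
f_refined = dissection_false_negative(big_psi_hat=0.5, d_hat=d)  # hypothetical prior
print(round(d, 3), round(f_refined, 3))
```

With no dissections (`d_hat = 0`) the function returns the prior unchanged, and as detectability approaches 1 the false-negative probability falls to zero, which matches the intuition that thorough sampling makes a missed link unlikely.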

In the case of a host species where a parasite species i has never been detected, the parasite’s detectability in dissections is also akin to a series of K independent Bernoulli trials, but the parasite’s prevalence in the host population must be estimated from infectable hosts. The simplest assumption is that infectable species do not differ in prevalence, so that $${\widehat{p}}_{ij}$$ is just the proportion of individual hosts (h) that are parasitized, i.e. the number of parasitized hosts $$\left({\sum }_{\left(h=1,j=1\right)}^{\left(h=K,j=m\right)}{\psi }_{hij}\right)$$ found in combined samples from those m host species that are infectable by parasite species i, divided by the total number of hosts dissected from those species. That is,

$${p}_{ij}=\frac{{\varPsi }_{ij}{\sum }_{\left(h=1,j=1\right)}^{\left(h=K,j=m\right)}{\psi }_{hij}}{{\sum }_{j=1}^{m}{K}_{ij}}$$

Which we estimated as

$${\widehat{p}}_{ij}=\frac{{\psi }_{ij}{\sum }_{\left(h=1,j=1\right)}^{\left(h=K,j=m\right)}{\psi }_{hij}}{{\sum }_{j=1}^{m}{K}_{ij}}$$
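A minimal sketch of this pooled prevalence estimate is given below. The data layout (one tuple per host species holding an observed-link flag, the number of infected hosts, and the number dissected) and the counts themselves are invented for illustration and do not come from the dissection files.

```python
def pooled_prevalence(samples):
    """Pooled prevalence across host species with an observed link.

    samples: list of (link_observed, n_infected, n_dissected) tuples, one per
    host species, for a single parasite species. Only host species with an
    observed link (psi_ij = 1) contribute to the pooled estimate.
    """
    infected = sum(n_inf for seen, n_inf, _ in samples if seen)
    dissected = sum(k for seen, _, k in samples if seen)
    if dissected == 0:
        raise ValueError("no dissections from hosts with an observed link")
    return infected / dissected


# Invented counts: two host species with the parasite, one without.
samples = [(True, 2, 20), (True, 1, 10), (False, 0, 15)]
p_hat = pooled_prevalence(samples)
print(p_hat)  # 3 infected out of 30 dissected hosts
```

The pooled value can then be used as the prevalence for the host species in which the parasite was never detected, together with that species' own number of dissections, to obtain its detectability.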

Although there are more complicated ways to estimate prevalence that take into account individual host traits, and biases from excluding infectable hosts where infections have not been detected, the simple method was sufficient to distinguish between likely and unlikely false negatives. Thus, to recap, we estimated $${\widehat{\varPsi }}_{ij}$$ using species-level data as above, then further refined the estimate of $${\widehat{F}}_{ij}$$ from dissection data. We used error propagation to report 95% confidence limits30.

With information about $${\widehat{F}}_{ij}$$, we estimated unseen parasite-host links as probabilities, rather than as 0s (observed links were set to 1, and unobserved links were set to $${\widehat{F}}_{ij}$$). Doing so identified some likely parasite links that were missed. Specifically, when the probability of a false negative was >0.5, we assumed that an unobserved link actually occurred unless otherwise contradicted by species life history. We then also noted the probability of a false positive link (1 - $${\widehat{F}}_{{\rm{ij}}}$$). We further identified those few host and parasite species that generated substantial error in the network. To keep the overall error rate to <4%, we therefore removed error-prone species from the network30. These species were typically rare generalists that were easily missed in dissections. To that extent, the decision to remove them was consistent with our decision to remove rare free-living species from the network. We report these removed species and their known links30 as potentially useful information for other purposes. Finally, we used the false-negative estimates to correct for biases in network and species-level measures like generality, connectance, and linkage density.
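The link-assignment rule in this paragraph can be expressed as a small function. This is a hedged sketch: the function name, the returned dictionary keys, and the `contradicted` flag (standing in for a life-history judgment made by the investigators) are assumptions for illustration.

```python
def assign_link(observed, f_hat, contradicted_by_life_history=False, threshold=0.5):
    """Summarize how one potential parasite-host link is treated.

    Returns a dict with the link probability (1 for observed links, F_hat
    otherwise), whether the link is assumed to occur, and, for links added
    by inference, the probability that the added link is a false positive.
    """
    if observed:
        return {"probability": 1.0, "assumed": True, "false_positive": None}
    assumed = f_hat > threshold and not contradicted_by_life_history
    return {
        "probability": f_hat,
        "assumed": assumed,
        # Only inferred links carry a false-positive probability of 1 - F_hat.
        "false_positive": (1.0 - f_hat) if assumed else None,
    }


likely = assign_link(observed=False, f_hat=0.8)
unlikely = assign_link(observed=False, f_hat=0.2)
vetoed = assign_link(observed=False, f_hat=0.8, contradicted_by_life_history=True)
print(likely, unlikely, vetoed)
```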

In addition to metadata on locality, literature source, justification, and confidence, we categorize links based on different types of trophic interactions. We specified the interaction type for each consumer-resource link following the framework of Lafferty and Kuris33 (Table 1). For instance, links where a consumer kills the resource were coded as predator-prey interactions, while links where a consumer eats a small portion of a resource individual without killing it (e.g. herbivores) were assigned as micropredator/grazer interactions. Thus, the free-living web contained predation and micropredation/grazing links. Some organisms often referred to as “parasites” fit the definition of micropredation (e.g. gnathiid isopods). Several more types of interactions are possible between symbiotic organisms and their hosts, depending on transmission strategy (trophic transmission or direct transmission), effects on host fitness, and reproduction method (within the host or in the environment). Metadata in the node list (such as site of infection) allows investigators to simplify these link types according to research questions of interest.

## Data Records

The data package includes 14 files (.csv), four directories containing several thousand images of organisms, and metadata for the directories30. Each data file is described in Table 3. The food web itself comprises a nodes list (“1_Nodes.csv”) and a links list (“2_Links.csv”). References for these two lists are contained in “13_Ref.ID.csv”. The remaining files provide detailed dissection data, free-living organism sampling data, and information on false negative links (Table 3). The image directories contain photos of parasites, as well as the free-living organisms found in zooplankton sampling, holdfast sampling, and small gastropod collections. “READ ME” documents included within each directory describe the image file naming system and which data file the images correspond to.

Data available from the Dryad Digital Repository30: https://doi.org/10.25349/D9JG70.

## Technical Validation

Multiple lines of evidence were used to justify node and link inclusion. References are included for all nodes, body size estimates, and links. Where links were inferred, the references which provided the logical basis for the inference are listed, along with a code indicating the specific type of inference (e.g. “closely related species in literature”).

## Usage Notes

We organized both the nodes and links data and metadata to facilitate analyses and appropriate usage by other researchers. The food web is resolved to life stage and includes free-living species and parasites, but nodes can be aggregated or separated by taxonomy, lifestyle, or habitat niche. The links list can be filtered by trophic interaction type, justification, or confidence. Code definitions for justification, confidence, and trophic interaction type are described in metadata30. Correct interpretation of food-web structure requires understanding limitations, which are often dissociated from food web data sets in repositories. We describe the limitations of the food web database below.
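As an example of such filtering, the sketch below drops one interaction type from a links table before an energy-flow style analysis. The column names (`consumer`, `resource`, `interaction_type`, `confidence`) and the codes in the inline sample are hypothetical placeholders; the actual headers and code definitions should be taken from the metadata accompanying “2_Links.csv”.

```python
import csv
import io


def filter_links(rows, column, excluded_values):
    """Keep only rows whose value in `column` is not in `excluded_values`."""
    return [row for row in rows if row[column] not in excluded_values]


# Hypothetical miniature links table; real column names and codes may differ.
example_csv = """consumer,resource,interaction_type,confidence
fish_A,crab_B,predation,1
fish_A,parasite_C,predator_parasite,2
parasite_C,crab_B,parasitism,1
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
energy_links = filter_links(rows, "interaction_type", {"predator_parasite"})
print(len(rows), len(energy_links))
```

To work from the published file instead, the `io.StringIO` object would be replaced by an open file handle, and the column name and excluded codes by those defined in the metadata.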

### Food-web boundaries

Food-web research in general may be limited by “soft boundaries” of the food web, as very few food webs are completely isolated from their surroundings. We restricted our definition of kelp forests to rocky-reef habitat and the water column above it, but kelp forests can have sand channels throughout and are often surrounded by sand. For this reason, we included sand-dwelling species that are known to associate with kelp forests specifically; however, we did not include the sand community in general, even though this habitat is often interspersed with and adjacent to the kelp forest. Once a subtidal sandy food web has been created, it should be straightforward to connect kelp forest and sand-associated food webs.

### Life stages as nodes

Although the food web separates distinct life stages into separate nodes, it does not include multiple size classes for each species. Changes in diet associated with size are common across fishes and could alter network structure. Additional resolution could be added to the web by including size classes for species that experience strong ontogenetic shifts in diet.

### Species abundances

Because our web is a meta-web for a larger region, we did not include density estimates for species. However, site-specific densities are available for >200 organisms surveyed by CINP KFM (https://irma.nps.gov/DataStore/SavedSearch/Profile/1508) and the SBC LTER program’s ongoing community time series (https://sbclter.msi.ucsb.edu/data/catalog/). Adding this information, and perhaps inferring densities for other taxa based on allometric scaling, might make it possible to use this food web for dynamic modeling.

### Node inclusion

Although this food web improves resolution for many groups of organisms (including crustaceans, gastropods, invertebrates, birds, cryptic fishes), it does not capture all species or links. This is a commonly cited criticism of food webs, in particular large networks (e.g.42,43,44,45,46). We attempted to minimize this by using information from many sources, inferring links, and constructing a web that was cumulative over space and time. We did not attempt to resolve other potentially important taxa like protozoa (ciliates, flagellates, etc.), diatoms, and other microbes (viruses, bacteria, fungi). The food-web construction allows for additional types of organisms, life stages, and interactions to be added. Nodes such as small particles of detritus represent their own complex systems that surely deserve future study. Although more sampling would lead to a longer and more complete species list, new additions would more likely be uncommon species that contribute less to biomass and energy flow than do the species included here. Additional sampling would be expected to further increase network size and complexity, but adding or removing small numbers of species did not affect overall network structure.

### Link inclusion and weighting

This food web does not include interaction strengths (link weights). In many cases, adaptation is realized in terms of changes in link weights, so topological food webs in general are limited in their ability to detect adaptation to change. Adding link weights to a network of this size would be very complex, and potentially impossible, because interaction strengths may vary seasonally, spatially, and temporally.

The near future will likely see enhanced molecular approaches in food-web research as barcoding and eDNA-based techniques continue to develop. This could allow for inclusion of additional cryptic and transient species, as well as statistical inference of species associations (co-occurrence). At present these techniques would only be possible for a small subset of well-known species (sharks, some fish, marine mammals), as many invertebrates and parasites do not have reliable species-level sequences, but these methods may be useful for detecting top predator presence47.

In addition to missing nodes, there are likely numerous missing (false-negative) links. We focused on missing links between existing nodes, but missing links also occur between existing and missing nodes, and between missing nodes. We attempted to correct for false negative host-parasite links through inference of parasite life stages, additional host interactions, and false negative estimation, but recognize that additional sampling and resolution of cryptic diversity would improve network accuracy. Although missing links bias food-web properties, estimating false-negative probabilities makes it possible to correct for much of this bias simply by replacing 0s with false-negative probabilities when computing network statistics that count observed links.
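The correction described above can be illustrated for connectance in a small bipartite host-parasite matrix. This is a sketch under stated assumptions: connectance is taken here as the sum of link values divided by the number of host-parasite cells, and both the observed matrix and the false-negative probabilities are invented numbers.

```python
def connectance(matrix):
    """Sum of link values divided by the number of possible links (cells)."""
    n_cells = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / n_cells


def fill_false_negatives(observed, f_hat):
    """Replace each unobserved link (0) with its false-negative probability."""
    return [
        [1.0 if seen else f for seen, f in zip(obs_row, f_row)]
        for obs_row, f_row in zip(observed, f_hat)
    ]


# Rows are parasites, columns are hosts; all values are invented.
observed = [[1, 0, 0],
            [0, 1, 0]]
f_hat = [[0.0, 0.6, 0.1],
         [0.2, 0.0, 0.05]]

naive = connectance(observed)
corrected = connectance(fill_false_negatives(observed, f_hat))
print(round(naive, 3), round(corrected, 3))
```

The corrected value can never be lower than the naive one, because each 0 is replaced by a non-negative probability; the same substitution applies to other statistics that count links, such as generality and linkage density.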

Although false negatives are a concern in ecological networks, false positives are possible due to inaccurate life cycle inferences, particularly for parasites with assumed low host specificity. Our assumptions of parasite generalism were supported by literature and expert opinion (Marcogliese pers. comm.,48,49). By assuming generalism at the level indicated by parasite life history, we ensured that at least one correct host (likely more) was included, with reduced chance of false negatives. Parasite species for which generalism in larval stages was assumed (a few nematodes, some tapeworms, and acanthocephalans) were widespread in many second-intermediate and paratenic host species in dissections, suggesting that there should be more than one infection pathway for such a wide range of hosts to become infected. However, use of paratenic hosts makes it more challenging to identify first intermediate hosts by diet alone. We restricted assumptions of generalism to cosmopolitan parasites of wide-ranging hosts that may be less likely to host cryptic species due to increased gene flow among populations50. By including link justification and confidence levels, we allow readers to treat these links as predictions and filter the nodes and links lists to suit their research questions. Despite these limitations, we note that few other studies justify reported food-web links or distinguish between inferred and observed links. The metadata included with the links list makes users of our data aware of limitations and helps ensure that conclusions drawn from the food web are appropriate to the data.