Species richness and taxonomic composition of trawl macrofauna of the North Pacific and its adjacent seas

A checklist is presented of animal species obtained in 68,903 trawl tows during 459 research surveys performed by the Pacific Research Fisheries Center (TINRO-Center) over an area measuring nearly 25 million km2 in the Chukchi and Bering seas, Sea of Okhotsk, Sea of Japan and North Pacific Ocean in 1977–2014 at depths of 5 to 2,200 m. The checklist comprises 949 fish species, 588 invertebrate species, and four cyclostome species (some specimens were identified only to genus or family level). For each species details are given on the type of trawl (benthic and/or pelagic) and basins where the species was found. Comprehensiveness of data, taxonomic composition of catches, dependence of species richness on the survey area, sample size, and habitat, are considered. Ratios of various taxonomic groups of trawl macrofauna in pelagic and benthic zones and in different basins are analysed. Basins are compared based on species composition.


Materials and Methods
Sources of data. Information was obtained mainly from two large databases 25,26, supplemented by materials from trawl surveys conducted until 2014. These surveys were conducted in accordance with programs approved by the TINRO-Center management and agreed with the Russian Federation Ministry of Agriculture Federal Agency for Fisheries. The sampling area (Table 1) […] This study does not include information from commercial fisheries, and is based only on reliable information from 459 selected research cruises, where data were obtained by skilled ichthyologists and hydrobiologists. In these cruises, unidentified specimens were preserved and delivered to onshore laboratories for further identification by experts in zoological taxonomy. However, this does not mean that all species identifications were correct, especially in groups with difficult and complex taxonomy, such as the fish families Myctophidae, Liparidae and Zoarcidae. Therefore, data obtained from the databases were further scrutinised by taxonomic experts.
Verification of data. Where a species occurrence was considered ambiguous, the associated information (coordinates, depth, time, catch size, and size of individuals) was analysed further and compared with published data. If an individual of a species was found too far outside the known species range, the record was excluded from the list and the specimen was referred to a higher taxon (genus or family, as considered appropriate). Conversely, when an animal was identified only to genus or family level, and only a single species of that higher taxon is known to occur in the region (based on published data), the animal was identified as that species. A number of records […] The checklist does not include species that were taken by grabs, hydraulic dredges, longlines, traps, divers, etc., but did not occur in trawl catches. The checklist is therefore named "the trawl macrofauna", since it includes only the animals that were caught by trawl in the surveyed area.
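The two verification rules described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the record fields, the function name and the taxon names are hypothetical, and the judgement of whether a record lies "too far outside the known range" is assumed to be supplied by an expert as a boolean rather than computed.

```python
def resolve_identification(record, regional_species):
    """Apply the two verification rules to one catch record.

    record: dict with keys 'taxon', 'rank' ('species', 'genus' or 'family'),
            'higher_taxon', and 'within_known_range' (bool, decided by an expert).
    regional_species: dict mapping a genus or family name to the list of its
            species known from the region according to published data.
    Returns the name under which the record would enter the checklist.
    """
    if record["rank"] == "species":
        # Rule 1: an out-of-range species record is referred to a higher taxon.
        if record["within_known_range"]:
            return record["taxon"]
        return record["higher_taxon"]
    # Rule 2: a genus- or family-level record becomes a species record only
    # when exactly one species of that taxon is known from the region.
    known = regional_species.get(record["taxon"], [])
    if len(known) == 1:
        return known[0]
    return record["taxon"]


# Hypothetical placeholder names, not entries from the checklist.
regional = {"Genus_a": ["Genus_a alpha"], "Genus_b": ["Genus_b beta", "Genus_b gamma"]}

out_of_range = {"taxon": "Genus_b beta", "rank": "species",
                "higher_taxon": "Genus_b", "within_known_range": False}
single_candidate = {"taxon": "Genus_a", "rank": "genus",
                    "higher_taxon": "Family_a", "within_known_range": True}
several_candidates = {"taxon": "Genus_b", "rank": "genus",
                      "higher_taxon": "Family_b", "within_known_range": True}

print(resolve_identification(out_of_range, regional))
print(resolve_identification(single_candidate, regional))
print(resolve_identification(several_candidates, regional))
```

In this sketch the out-of-range record falls back to its genus, the genus with one regional species is promoted to that species, and the genus with several candidates stays at genus level.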
The minimum depth of bottom hauls in this study ranged from 5 to 13 m, depending on the size of the trawler, its draught, and trawl construction and operational characteristics, such as the smallest vertical opening. Most pelagic hauls (including surface trawls at 0 m depth) were made in areas deeper than 25-30 m, corresponding to the minimum vertical opening of the majority of Russian midwater trawls. Consequently, coastal zone fauna is weakly represented: many inhabitants of the littoral and upper sublittoral zones were not encountered, and only species from the outer periphery of the coastal zone are included. That is why, for example, the common commercial clam Ruditapes philippinarum, which occurs in coastal habitats at depths of 0.5-4 m protected from strong surf, is not included in the checklist. This is a burrowing mollusc with main populations living at 1-3 m depth in sandy or gravel-pebble sediments 27.

[Table fragment; column headings, except zone, years and depth range (m), not recovered]
Pelagic    2003-2014   0-91      239     298    162     40     1,701,314
Benthic    1995-2014   13-222    237     286    118     10     631,531
Combined   1995-2014   0-222     476     298    280     50     2,332,845
Bering Sea
Pelagic    1982-2014   0-920     4,959   1,419  5,939   1,966  68,718,728
Benthic    1977-2014   6-1,400   9,235   1,028  6,608   901    23,978,418
Combined   1977-2014   0-1,400   14,194  2,126  12,547  2,…
Additional data sources. To verify information on geographical distribution, taxonomic status, and accepted scientific species names, we used 54 publications 28-81 and 42 online resources (Table 2). For fishes and cyclostomes, we relied on the following websites: Eschmeyer W.N., Fricke R., van der Laan R. editors (2018) "Catalog of fishes: genera, species, references"; and Froese R., Pauly D. editors (2018) "FishBase" (No. 22 and 28 in Table 2). For invertebrates, we used WoRMS Editorial Board (2018) "World Register of Marine Species" (No. 34 in Table 2). We consider these sources to be the most reliable professional modern knowledge bases. However, in some cases, where other authors convincingly argue in favour of other species names or ranges, such alternatives were accepted.
Statistical methods. Relationships of the ratio of species missed by trawl surveys, and of the number of species discovered, to the size of the surveyed area and other sample parameters were investigated by least-squares regression analysis, with linearizing transformations of variables where necessary 82,83. Comparisons among basins by species composition were made by cluster analysis 84-88 using three different algorithms: (1) single linkage (SL) or nearest neighbour, in which clusters are joined based on the smallest distance between the two groups; (2) unweighted pair-group average (UPA), in which clusters are joined based on the average distance between all members in the two groups; and (3) Ward's method, in which clusters are joined so that the increase in within-group variance is minimized. In addition, 13 measures of similarity based on binary (presence-absence) data were used; they are listed in the section on comparison of basins by species composition.
The following symbols are used traditionally in this approach (ibid): a, the number of species present in both of the compared lists; b, the number of species present in the second list but missing from the first list; c, the number of species present in the first list but missing from the second list; and d, the number of species missing from both lists but present in other lists, where S is the total number of species. Combinations of symbols: a + c, the number of species present in the first list; a + b, the number of species present in the second list; b + d, the number of species missing from the first list; c + d, the number of species missing from the second list; a + b + c + d = S, the total number of species; a + b + c = S − d, the number of species present in at least one of the two lists. When pairs of lists are compared: a corresponds to the number of positive matches; d, the number of negative matches; a + d, the number of positive and negative matches; b + c, the number of mismatches of either kind.
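The four counts defined above can be obtained directly from set operations on two species lists and the pooled list of all S species. The sketch below is a minimal illustration; the function name and the species labels are hypothetical placeholders, not records from the checklist.

```python
def contingency_counts(first, second, all_species):
    """Return (a, b, c, d) for two species lists within a pooled species set."""
    first, second, all_species = set(first), set(second), set(all_species)
    if not (first | second) <= all_species:
        raise ValueError("both lists must be subsets of the pooled species set")
    a = len(first & second)                   # present in both lists
    b = len(second - first)                   # present in the second list only
    c = len(first - second)                   # present in the first list only
    d = len(all_species - (first | second))   # missing from both lists
    return a, b, c, d


# Hypothetical placeholder labels.
pooled = {"sp1", "sp2", "sp3", "sp4", "sp5", "sp6"}
list_1 = {"sp1", "sp2", "sp3"}
list_2 = {"sp2", "sp3", "sp4", "sp5"}

a, b, c, d = contingency_counts(list_1, list_2, pooled)
print(a, b, c, d)                  # 2 2 1 1
print(a + c == len(list_1))        # species in the first list
print(a + b == len(list_2))        # species in the second list
print(a + b + c + d == len(pooled))
```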
Subsequently, measures of similarity were subdivided into two groups: (1) similarity coefficients that treat a and d symmetrically, taking into account both species presence and absence (i.e., the number of positive and negative matches; we used five such measures); and (2) coefficients that take into account only species presence and ignore the number of negative matches (the value of d; we used eight such measures). The same measures of similarity were used for comparison of species lists by alternative non-hierarchical methods of multivariate analysis: metric and non-metric multidimensional scaling 89.
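The practical difference between the two groups of coefficients can be seen with one well-known representative of each. The sketch below uses the simple matching coefficient (symmetric in a and d) and the Jaccard coefficient (ignores d); these are chosen here only as illustrations, and it is an assumption that they are among the 13 measures listed in Table 9. The counts are hypothetical.

```python
def simple_matching(a, b, c, d):
    """Symmetric coefficient: counts both positive (a) and negative (d) matches."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least one species is required")
    return (a + d) / total


def jaccard(a, b, c, d=0):
    """Asymmetric coefficient: ignores negative matches (d is accepted but unused)."""
    present = a + b + c
    if present == 0:
        raise ValueError("at least one species must be present in one of the lists")
    return a / present


# Hypothetical counts: the same pair of lists compared within a small
# and within a much larger pooled species list (only d changes).
small_pool = (2, 2, 1, 1)
large_pool = (2, 2, 1, 95)

print(simple_matching(*small_pool), jaccard(*small_pool))
print(simple_matching(*large_pool), jaccard(*large_pool))
```

With many joint absences the symmetric coefficient approaches 1 while the asymmetric one is unchanged, which is one reason the two groups of measures can rank the same basins differently.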

Results and Discussion
The checklist. The list compiled is presented in Supplementary Table. It includes 1,541 lines (corresponding to our minimum estimate of the trawl macrofauna species richness in the study area) and 10 columns. The first column shows the scientific name of a species (genus, family) in alphabetical order: to simplify the use of the table by non-experts in taxonomy, they are not arranged by taxa. This enables users to quickly find the scientific name of a species of interest without requiring detailed knowledge of its taxonomy.
The third column, "Gear", notes occurrence in midwater (pelagic) and/or bottom trawl catches. Columns four to eight indicate occurrence in each of five basins: the Bering Sea (B), Chukchi Sea (C), Sea of Japan (J), Sea of Okhotsk (O), Pacific Ocean (P). Presence of a species is indicated by "+", absence by "−", and "*" means absence from catches but presence according to previously published data.
Comprehensiveness of data and state of our knowledge of trawl macrofauna. Analysis of the comprehensiveness of databases reveals that the macrofauna is represented unevenly in TINRO-Center trawl surveys (Table 3). In the Sea of Okhotsk and Sea of Japan, 15% and 18% of species, respectively, were absent from trawl surveys and were included in the checklist based on published data. The proportion of absent species is almost a quarter in the Pacific Ocean, slightly less than a third in the Bering Sea, and almost one half in the Chukchi Sea. There is an inverse relationship between these ratios and sample size in each basin (Table 4).
The fauna of the pelagic zone is more completely represented in the surveys than the seafloor fauna. Only 5% of pelagic species were not captured in trawl nets in the Bering Sea, Sea of Okhotsk and Pacific Ocean, 12% in the Sea of Japan, and 30% in the Chukchi Sea. Non-capture proportions are higher for benthic species, ranging from 15% to 44% (Table 3).
As expected, despite the difference in numbers, the inverse relationship between the ratio of species missed by trawl surveys and the survey effort is true for both pelagic and benthic fauna and also for combined fauna (Table 4). Pelagic surveys show better comprehensiveness than benthic surveys at a smaller number of stations. However, other sample size features for the pelagic zone are greater than those for the seafloor (Table 1).
Different taxa are unevenly represented in the databases (Table 5). The best represented are cyclostomes and fishes: species absent from the database (i.e. not captured in at least one basin) account for only 0% and 5% of known species, respectively. Among invertebrates (41% of species absent from catches), the best represented are molluscs (<1/3 of known species absent), and the worst are sponges (>50% of species absent), polychaetes, and some rare benthic invertebrate macrofauna taxa. In part, this reflects failure to obtain them with the survey gear used, because of factors such as their small size, or a sessile or burrowing mode of life.
To reveal the proportion of the trawl macrofauna in the whole marine fauna, total species numbers ("+" and "*" in Table 3) for each basin were compared with data published by Parin et al. 75 for fishes and cyclostomes, including species at depths that were not covered by our surveys, and species that were not caught by trawls; and data published by Sirenko et al. 65 for invertebrates, including littoral, deep-sea species, meso- and microfauna, plankton, infauna, etc. The macrofauna in the checklists corresponds to 10% of all fauna (including fish, cyclostomes and invertebrates) in the Chukchi Sea; 19% in the Bering Sea; 22% in the Sea of Okhotsk; 12% in the Sea of Japan; and 23% in the Pacific Ocean. There is a direct relationship between these proportions and sample sizes (Table 1).
Accepting that, within the entire study area, there exist four species of cyclostome 75 , 1,455 fish species 44 and 6,771 macrobenthic species 65 , it appears that the trawl macrofauna covers all cyclostomes, 65% of fish species and only 11% of invertebrates (Table 5 second column). Some 23% of all macrofauna species (1,541 out of 6,771) can be considered as the "trawl macrofauna".
Another concern related to the comprehensiveness of the checklist is the accuracy of macrofaunal taxonomic identifications. Table 6 indicates that 94% of animals that occurred in trawl hauls were identified to species, 4% to […]

[Table header: Basin; Zone; Present in database, i.e. captured at trawl stations shown in Fig. 1 (symbol "+" in Supplementary Table); Added from publications (symbol "*" in Supplementary Table)]

Table 4. Pearson's correlations between the share of species in a region that were not observed in TINRO-Center trawl surveys (from Table 3) and five sample parameters (from Table 1). Linear, exponential, reciprocal, logarithmic, multiplicative, square root and other simple regression models were tested in the course of correlation analysis. Best models were selected by minimum residual variance and p-values, and maximum correlation coefficients. Coefficients with p-values < 0.05 are shown in bold.

Taxonomic composition of fauna. The following patterns have been revealed from species richness distribution by taxa (Table 7; Figs 2 and 3). First, there are more fish and cephalopod species in the pelagic zone than on the seabed, but all other groups are more speciose at the seabed, and some are completely absent from the pelagic zone. Second, the percentage of invertebrates is much higher in the Chukchi Sea and lower in the Pacific Ocean, compared to other basins. The first pattern is commonplace; the second stems from the fact that, in the relatively shallow Chukchi Sea, the pelagic zone and the seabed were almost equally studied. On the other hand, in the Pacific Ocean, mainly narrow shelf and seamount summits have been surveyed using bottom trawls, whereas pelagic hauls were much more numerous and covered a much larger area (Fig. 1).
Another peculiarity of our checklist, pointed out in the previous section, is related to the selectivity of trawls. The number of fish species in the list is similar to, or somewhat higher than, that of invertebrates (Figs 2 and 3), whereas total species richness of invertebrates is much higher than that of fishes, based on data from different sampling gear 90. This is not unexpected, since fewer fish species can be sampled, for example, by a benthic grab sampler than by a trawl, and vice versa for small and burrowing forms. As a result, polychaetes and bivalves often dominate the grab benthos, whereas large gastropods such as the Buccinidae dominate the trawl benthos.

Comparison of basins by species richness. Species richness is expected to increase from the Chukchi Sea to the Sea of Japan and further into the Pacific Ocean, following the Humboldt-Wallace rule 91-98, that is, from the poles towards the equator. Indeed, it increases from north to south, following the decrease in latitude and increase in water temperature (Table 7, Fig. 4). However, for all macrofauna and many taxa, this generalization fails for the Sea of Japan, where species richness is significantly lower than expected: for most higher taxa (55%) it is lower than in the Sea of Okhotsk and for some taxa (40%) even lower than in the Bering Sea.
This phenomenon may have several explanations, which are not mutually exclusive. First, our data come from only the subarctic northwestern Sea of Japan (Fig. 1): there were no trawl surveys in the eastern and southern parts, where water temperature is higher and species richness is significantly higher. Second, for an extended geological period, the Sea of Japan was a shallow-water isolated basin. At present this basin is deep, with a narrow shelf and low temperature in deep-sea areas, isolated from deep-sea areas of adjacent basins by relatively narrow and shallow straits. Therefore the species richness of deep-sea fauna in the Sea of Japan is lower than in adjacent seas and the Pacific Ocean 1,99. This is also true for fishes: there are 50 macrourid species known in the Pacific Ocean off Japan, and only one or two in the Sea of Japan; 33 myctophid species on the Pacific Ocean side and two in the Sea of Japan 99. Third, the area surveyed in the Sea of Japan is much smaller than that in the Bering Sea, the Sea of Okhotsk and the Pacific Ocean (Table 1).
Species richness appears to be positively and statistically significantly related to all five of the sample size characteristics by which it was estimated (Fig. 5). All these relationships are satisfactorily described by […]

(1) As the area surveyed increases, new types of habitats with their inherent species are included, so species richness grows according to the long-standing and well-known "species-area" law 113,114.

(2) At low levels of species evenness, as with the fauna under consideration here 115, many species occur rarely or very rarely. Therefore, no matter how large a sample is taken, there is always a chance that one more captured individual will belong to a rare species still absent from the sample, even if the survey area does not expand and samples are taken in the same places.

As a consequence, the more samples taken, the more time spent on sampling, the larger the total sample size (the area and volume surveyed by trawls) and the greater the number of individuals in the samples used for estimation of species richness, the higher the resulting species richness. It is worth noting that the effect of the listed sample characteristics on species richness decreases in reverse order (see Fig. 5). The number of individuals directly affects species richness (the strongest relationship). The influence of the total sample size is weakened by the unequal number of individuals in each sample. The survey time is not directly proportional to the size of a sample, still less to the number of individuals sampled. Finally, the number of samples (the weakest relationship) is the sample characteristic with the maximum uncertainty, since trawl hauls vary greatly in duration, speed, opening of the trawl mouth and catch size.
In particular, this explains the above-mentioned phenomenon: fewer pelagic stations provide better comprehensiveness for pelagic surveys, and more stations are required for bottom surveys, since all other sample size features in the water column are larger than at the seafloor (see Table 1).
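The effect of rare species on richness estimates can be illustrated with a small simulation. The sketch below is not the analysis used in this study: it assumes a hypothetical community of 200 species whose relative abundances fall off steeply with rank, draws individuals at random with a fixed seed, and records how many distinct species have been seen after each individual.

```python
import random


def accumulation_curve(weights, n_individuals, seed=0):
    """Cumulative number of distinct species after each sampled individual."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    species = range(len(weights))
    draws = rng.choices(species, weights=weights, k=n_individuals)
    seen, curve = set(), []
    for sp in draws:
        seen.add(sp)
        curve.append(len(seen))
    return curve


# Hypothetical, strongly uneven community: a few common and many rare species.
weights = [1.0 / rank ** 2 for rank in range(1, 201)]
curve = accumulation_curve(weights, 5000)

for n in (10, 100, 1000, 5000):
    print(n, curve[n - 1])
```

The curve can only rise or stay level, and in a community this uneven it typically keeps rising slowly even at large sample sizes, without the sampled "area" changing at all.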
The same analysis was repeated taking the pelagic and seafloor data separately and similar results were obtained (Fig. 6). It was also found that species richness correlates with all sample size features more strongly in the pelagic zone than on the seabed. In general, there are more species at the seafloor than in the water column. In almost all cases, the relationships among variables are satisfactorily described by the multiplicative model and, in all equations, the value of the slope is close to 0.4. The exception is the "species-area" relationship in the benthic zone, which is better approximated by a simple linear model.
We recalculated five pairs of equations (Fig. 6) in the form of a multiplicative model with degree (slope) b the same for every pair "water column/bottom" (since in each pair the differences between these values are statistically insignificant) and factor (intercept) a different. On logarithmic scales, the regression lines of each pair were parallel straight lines, the values of b for different pairs (the upper half of Table 8) varied from 0.361 to 0.429 and did not significantly differ from 0.4. The values of a for each pair differed by a factor of 1.6-3.0.
A series of calculations was also made (the lower half of Table 8) in which b in all equations was a constant of 0.4 (i.e. all regression lines are parallel on logarithmic scales), and only the value of a was estimated (distance along the ordinate between parallel lines). At the same time, in each pair "water column/bottom", the values of a differed by a factor of 1.5-3.3. The results of these calculations show that estimates for species richness will increase with increasing extent and intensity of surveys, but with equal effort will yield 2-3 times more species on the seafloor than in the water column.
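The two fitting steps described above can be sketched as follows, assuming the multiplicative model has the form S = a·x^b, so that ln S = ln a + b·ln x is a straight line on logarithmic scales. The function names are hypothetical, and the data are synthetic, noise-free values generated for illustration; they are not taken from Table 1 or Table 8.

```python
import math


def fit_power_law(x, s):
    """Least-squares fit of ln S = ln a + b * ln x; returns (a, b)."""
    lx = [math.log(v) for v in x]
    ls = [math.log(v) for v in s]
    n = len(lx)
    mean_x, mean_s = sum(lx) / n, sum(ls) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_s) for u, v in zip(lx, ls))
    b = sxy / sxx
    a = math.exp(mean_s - b * mean_x)
    return a, b


def fit_factor_with_fixed_exponent(x, s, b=0.4):
    """With b held constant, the least-squares ln a is the mean of ln S - b * ln x."""
    n = len(x)
    ln_a = sum(math.log(v) - b * math.log(u) for u, v in zip(x, s)) / n
    return math.exp(ln_a)


# Synthetic effort and richness values (hypothetical factors 3.0 and 1.5).
effort = [10.0, 100.0, 1000.0, 10000.0]
bottom = [3.0 * v ** 0.4 for v in effort]
pelagic = [1.5 * v ** 0.4 for v in effort]

a_bottom, b_bottom = fit_power_law(effort, bottom)
a_fixed_bottom = fit_factor_with_fixed_exponent(effort, bottom)
a_fixed_pelagic = fit_factor_with_fixed_exponent(effort, pelagic)
print(round(b_bottom, 3), round(a_fixed_bottom / a_fixed_pelagic, 2))
```

With a common exponent, the ratio of the two factors is the constant vertical offset between the parallel lines, i.e. how many times more species equal effort yields in one zone than in the other.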

Comparison of basins by species composition.
Cluster analyses of the species lists using various similarity measures and algorithms for constructing dendrograms are summarized in Table 9 and Fig. 7. The results are inconclusive. First, the SL and UPA algorithms often yielded different results. Second, the outcome depends on the type of similarity measure. Measures of the first type (Nos 1-5 in Table 9) take into account both presence and absence of a species, so the Pacific Ocean, with the longest species list, differs most strongly from all the other seas investigated (Fig. 7A-E). In the most common scenarios of clustering (A and B), the Bering and Okhotsk seas are most similar. The difference between A and B lies in which basin is most similar to this pair: the Chukchi Sea (A) or the Sea of Japan (B). Measures of the second type (Nos 6-13 in Table 9) are characterized by separation not of the Pacific Ocean but of the Chukchi Sea (Fig. 7F-H) or the Sea of Japan (I). In the most common cases (F and G), the Bering and Okhotsk seas are also the closest in species composition. The difference between F and G lies in which basin is most similar to this pair: the Sea of Japan or the Pacific Ocean. As a result, taking into account rare cases (C-E, H, I), we obtained nine different scenarios. We tried to reduce uncertainty in the results by analysing bottom and pelagic fauna separately, or the fish fauna alone, using the same methods, but the same scenarios plus a few more variants were obtained.
Theoretically, any given method (the combination of a distance measure and a clustering algorithm) is no better than any other: none of them can test statistical hypotheses about the adequacy of the resulting classification. Different methods of clustering give the same (or very similar) results where the analysed data sets are clearly divided into natural groups. The less clear the differences between the groups, the larger the number of specific clustering results that need to be checked in order to identify meaningful and predictive patterns in them 116.
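Why single linkage and unweighted pair-group average can disagree on the same dissimilarities is easy to show on a toy case. The sketch below is a naive agglomerative procedure written for illustration only, not the software used in this study; the four items and their pairwise distances are hypothetical and do not represent the basins.

```python
from itertools import combinations


def agglomerate(labels, dist, linkage="single"):
    """Naive agglomerative clustering; returns the merges in order.

    dist maps frozenset({x, y}) to the distance between items x and y.
    linkage: 'single' (smallest pairwise distance between two clusters) or
             'average' (mean of all pairwise distances between their members).
    """
    def between(c1, c2):
        pairs = [dist[frozenset((x, y))] for x in c1 for y in c2]
        return min(pairs) if linkage == "single" else sum(pairs) / len(pairs)

    clusters = [frozenset([lab]) for lab in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: between(*p))
        merges.append((sorted(c1), sorted(c2), between(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical pairwise distances between four items s1..s4.
pairs = [("s1", "s2", 1.0), ("s2", "s3", 1.1), ("s1", "s3", 2.0),
         ("s3", "s4", 1.5), ("s2", "s4", 2.4), ("s1", "s4", 3.0)]
dist = {frozenset((x, y)): v for x, y, v in pairs}
labels = ["s1", "s2", "s3", "s4"]

single = agglomerate(labels, dist, "single")
average = agglomerate(labels, dist, "average")
for name, merges in (("single", single), ("average", average)):
    print(name, merges)
```

Both runs first join s1 and s2. Single linkage then chains s3 onto that pair through its one short link to s2, whereas the average criterion joins s3 with s4, so the two dendrograms differ even though the input is identical.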
Clustering methods that take into account species occurrence or abundance (e.g. 117) might have yielded less ambiguous results, but this is a task for a separate study, the initial data for which go beyond the simple species list presented herein. We can only use as a basis the most frequent variants A, B, F and G, since the differences between them are relatively small and easily understood (see above). Beyond these, the results should be checked using other non-hierarchical methods of multivariate analysis (cf. the recommendation of Kafanov et al. 116).
As a check, we used non-metric multidimensional scaling (MDS), with an algorithm based on the approach developed by Taguchi and Oono 118, and principal coordinates analysis (PCO), also known as metric multidimensional scaling (MMS), with the algorithm from Davis 119. The results from these methods coincided almost completely, so only the results from MDS are shown (Fig. 8) and discussed here.
Measures of the first group (Nos 1-3) yielded results similar to those of measures 6-9 and 12, if the y-axis is reversed. Along this axis, B, O and J are separated by similar distances. The only noticeable difference is that, for measures 1-3, this group is located closer to C, whereas for measures 6-9 and 12 it is shifted along the x-axis towards P. Measure 11 yielded results similar to those of measures 6-9 and 12, but rotated counter-clockwise. Applying measure 10 gave a similar result, the only difference being a slightly larger angle of rotation. The results of measures 4, 5 and 13 differed from all the others. Therefore, in ten out of thirteen measures, the points B, O and P form an almost equilateral triangle, with J and C outside it and farther from P; C closer to B; and J closer to O. This configuration corresponds in general to the most frequent dendrograms (Fig. 7A, B, F and G), and also to the remoteness of these basins from each other (Fig. 9), the water exchange between them, animal migration possibilities, faunal mixing, etc.

Conclusions
The present paper provides examples of the analysis of information present in the checklist compiled, revealing the following points of interest:
(1) Trawls catch approximately 23% of all species of macrofauna ("the trawl macrofauna" in the presented checklist). These include all Cyclostomata species, 65% of fish species and not more than 11% of invertebrate species from the examined area.
(2) These percentages vary among basins and taxa. They are positively related to sample size (i.e. effort spent to examine a particular area) and negatively related to catching efficiency of a given trawl for a particular taxon.
(3) Despite the enormous amount of material collected, the compiled list of 1,541 species is not complete. It will grow both with expansion of the study area and with continuing research in the area already examined, owing to future addition of rare species and/or species with low catching efficiency.
(4) Such an increase in the number of species will be largely due to near-bottom species, since their number is 2-3 times higher than that of pelagic species, and the pelagic zone is better studied than the benthic. Among all basins, the greatest increase in the species number can be expected in the Sea of Japan, since the trawl macrofauna of that region in Russia is inadequately studied.
In the future, more valuable information can be obtained from the checklist presented herein using other methods of data processing and/or addition of data (such as abundance, occurrence and catches). Comparisons with similar lists from other areas, or with lists from the same area obtained using different techniques, may also be of interest. The list published here should be of interest to ichthyologists, hydrobiologists, ecologists, biogeographers, conservation biologists and fishery managers, as well as teachers and students of relevant specialties.

Table 9. Classification of nine outputs (A-I) for comparison of trawl macrofauna composition among basins according to different approaches: 13 measures of similarity and 2 clustering algorithms (SL - single linkage, UPA - unweighted pair-group average). Resulting dendrograms (A-I) are shown in Fig. 7. Results based on binary similarity coefficients No. 1-3 were also obtained using five distances (dissimilarity coefficients) commonly used for abundance data: Euclidean, Gower, Hamming, Manhattan, and Jukes-Cantor. Measure 4 produced the same results as 1-Pearson's and 1-Spearman's correlations. Measures 6-11 yielded the same results as five distances: Chord, Bray-Curtis, Cosine, Morisita, and Horn. Ward's clustering algorithm was also applied but the output was similar to UPA and so is omitted here.

[Figure caption fragment] … Table 9 are shown at the base of each graph; letters in parentheses indicate corresponding clustering variants as in Table 9 and Fig. 7. The closest points are connected by lines on similar graphs.