Background & Summary

Collecting historical data on species diversity and occurrence from periods predating major human impacts (e.g., damming, the Industrial Revolution, modern fishing, and river channelisation) may improve knowledge of species ecology. However, historical documents have limitations that need to be understood to avoid incorrect interpretations1 and/or extrapolations2. Cultural filtering affects not only the spatial and temporal availability, completeness, and reliability of documentary records but also their quantity and quality2. Most historical records rely on questionnaires/interviews; hence, if answers were erroneous, the resulting inventories or scientific surveys will present incorrect data3. Nevertheless, the utility of historical insight should not be underestimated, and using historical data for ecological studies is valid2. Using an interdisciplinary approach4, combining information from several independent spatial and temporal sources1,2,4,5, cross-checking lines of evidence with independent datasets6 and blending different methods2 can help mitigate the limitations and lead to more accurate knowledge about past ecosystem conditions.

For some organisms and specific purposes, historical data might be essential to model the potential species distribution (e.g., Lassalle and Rochard7, Clavero and Hermoso8) because current distributions are often highly constrained by anthropogenic pressures that alter the natural realised ecological niche. A typical example is that of diadromous fish species, whose inland progression has been gradually constrained by artificial barriers8,9. Consequently, current occurrence data will only cover a restricted range of the full conditions tolerated by species. To create the Portuguese Historical Fish Database (PHish–DB) we surveyed 194 historical documents, resulting in 2214 records from 30 basins, 280 sub-basins and 490 segments. Data collection started in 2007 and was performed by researchers in history and ecology. Despite some underrepresentation of coastal areas, the spatial distribution of the historical records is homogeneous throughout the country and covers all the major river basins (Fig. 1). Three international river basins stand out (Douro, Minho and Guadiana) with a high number of records (Fig. 1a). The sub-basins with the highest number of records are those of the River Tâmega (Douro) and River Zêzere (Tagus) (Fig. 1b). The spatial precision of the records depended on the information present in the historical source. Thus, we opted for a three-scale approach to maximise the collected information. This resulted in 557 records limited to the basin scale, 184 reaching the sub-basin scale, and 1473 records identified down to the most precise spatial scale, the river segment. The PHish database covers a time span of one millennium, from the 11th to the 20th century, with more records for the second half of the millennium and particularly for the 18th and 19th centuries (Fig. 1c). The interpretation of historical data can be very subjective2, and matching ancient fish common names with current taxonomy was challenging. To minimise uncertainty in the taxonomical classification of a fish record, a conservative approach was followed to establish adequate taxonomical groups. Of the 25 group names defined from the records gathered, three stood out: Petromyzontidae, Chondrostoma sp. and Salmo trutta (Fig. 1d).

Figure 1: Summary results of the PHish database.

(a)–Number of Records per river basin; (b)–Number of Records per river sub-basin; (c)–Number of Records per century; (d)–Number of Records per taxonomical group present in the field “Group Name”.

The information present in the database has been partially used in the work of Segurado, et al.10, and was also incorporated in a relevant European project, the European Fish Index–Plus (EFI+). This database can nevertheless be useful to: improve existing scientific knowledge in the Iberian context (e.g., Clavero and Hermoso8, Clavero, et al.11); expand scientific knowledge in the European context via an Iberian occurrence scenario of a species with a broad European distribution (e.g., Filipe, et al.12); and support research in which historical interactions between human activities and riverine fish communities and populations are relevant.


Methods

These methods are expanded and updated versions of descriptions in our related work Segurado, et al.10.

Historical sources

The present compilation of historical records of riverine fish distribution was based on geographical dictionaries and other published information for Portugal, dated between the 11th and early 20th centuries. Portugal is the most south-western part of the European continent, representing 15% of the Iberian Peninsula. There are four major international rivers (Douro, Guadiana, Minho, and Tagus) and numerous other relevant national rivers. The available historical information on fish populations for this period was almost exclusively based on qualitative data of species occurrence. Available sources dated before the 16th century included charters, inquiries, donations, and monastic chronicles. From the 16th century onwards, more thorough recordings of the patrimony of the Portuguese kingdom were available, with the emergence of chorographies, historical-geographical memos, parish inquiries and dictionaries that recorded the Portuguese landscape historically and geographically. In addition to these sources, information from private libraries was also included. A total of 194 documents were consulted (Table 1 (available online only)). These historical sources contain information ranging from aspects of the Portuguese physical territory to records about the natural resources of rivers and the cultural context of fisheries exploitation. Most of these data were compiled in the context of the EU projects EFI+ and DURERO (Douro River Basin: Water Resources, Water Accounts and Target Sustainability Indices), with the main purpose of providing data on reference conditions to compute biotic indicators based on diadromous species. Many regions of Europe have been shaped for centuries by human activities, leading to an absence of natural reference conditions for many water body types13. Hence, the definition of benchmark conditions may depend on the availability of historical sources of information on species occurrence14. This is especially relevant in the context of the Water Framework Directive of the European Union (WFD)15, which involves the assessment of the ecological quality of water bodies using the reference condition approach, in which quality classes are defined according to deviations from benchmark conditions.

Table 1 List of historical sources surveyed to create the PHish database.

Taxonomical precision

Taxonomic precision is critical to obtain the best possible taxonomic insights from historical records and to derive reliable databases that can be used to test scientific or management hypotheses. However, this condition is rather challenging to attain when looking at large spatial scales. Indeed, in historical texts, the norm is to use local common names, mostly because many of the records predate the scientific description of the species. Therefore, the first step to produce this database was to establish a reference list by collecting and compiling ancient and current common names with their correspondence to scientific nomenclature. The second step was to attribute a valid species to each record, with an extra challenge when distinct common names are attributed to the same species in different regions. However, the most challenging issues arise when several species share common names in certain regions or when very similar and even congeneric species are sympatric. Despite these caveats, because of the known present distribution of species, the reduced sympatry of similar species and the fact that most shared common names are of similar allopatric species, it is possible to attribute valid scientific identities to each record without errors. Whenever this attribution was impossible or uncertain, the genus, family or order was attributed to the record, instead of the specific epithet. This was the case for the genera Alosa, Luciobarbus, Lampetra, Salmo and Squalius, for the families Petromyzontidae and Mugilidae, and for the order Pleuronectiformes. For the nases, it was decided to use an older genus name, Chondrostoma (valid in Europe, but with no current taxonomic validity in the Iberian Peninsula), that currently represents seven species in this database. This older genus aggregates three recently described genera (Achondrostoma, Iberochondrostoma and Pseudochondrostoma)16 that are, basin-wise, sympatric, with at least two of these genera coexisting in each basin with historical records. For Pleuronectiformes, the decision was made because it was unclear whether the species record corresponded to a freshwater species (Platichthys flesus) or to a marine fish, of which there are several species. A conservative approach was followed and whenever a possibility for misinterpretation existed, the species were aggregated to the corresponding upper taxonomic level under the column “Group Name” (the field we recommend using, as it carries no uncertainty). Whenever there were plausible reasons to believe that the record belonged to a given species, the full scientific binomial nomenclature was attributed to the column “Sub-group Name”. This has some associated uncertainty as the decision was made by expert judgement based on the available information. Whenever no plausibility existed, the higher taxonomic group (genus or family) was maintained without the attribution of a “Sub-group Name”. If there was a possibility of confusion between species that did not fit the higher taxonomical groups defined, NA was attributed to “Group Name”. If plausible, an educated guess, for a species or a taxonomical group, was entered in the “Sub-group Name”, based on the interpretation of the historical text extract. All the species and species groups considered are detailed in Table 2. To add value to the database, whenever available, information about the phenology and conservation status (national and international) was included.
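The conservative rule described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the database, which relied on expert judgement and a predefined list of groups. The mini-taxonomy, the function names and the idea of passing a set of candidate species are assumptions made only for illustration: the “Group Name” is taken as the lowest level shared by all candidate species (None standing for NA), and the “Sub-group Name” is whatever an expert considered plausible, if anything.

```python
from typing import Optional

# Hypothetical mini-taxonomy for illustration: species -> (genus, family).
TAXONOMY = {
    "Salmo trutta": ("Salmo", "Salmonidae"),
    "Salmo salar": ("Salmo", "Salmonidae"),
    "Petromyzon marinus": ("Petromyzon", "Petromyzontidae"),
    "Lampetra fluviatilis": ("Lampetra", "Petromyzontidae"),
    "Anguilla anguilla": ("Anguilla", "Anguillidae"),
}


def group_name(candidates: set) -> Optional[str]:
    """Return the lowest taxonomic level shared by all candidate species.

    None stands for NA: the candidates share no genus or family.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))
    genera = {TAXONOMY[species][0] for species in candidates}
    if len(genera) == 1:
        return f"{genera.pop()} sp."
    families = {TAXONOMY[species][1] for species in candidates}
    if len(families) == 1:
        return families.pop()
    return None


def classify(candidates: set, expert_pick: Optional[str] = None):
    """Return the pair (Group Name, Sub-group Name) for one record.

    expert_pick is the plausible species or group chosen by expert
    judgement, or None when no plausible choice exists.
    """
    return group_name(candidates), expert_pick
```

For example, a common name that could refer to either Salmo species would receive the group "Salmo sp.", with the sub-group left empty unless the text extract makes one species plausible.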

Table 2 Combination of “Group Name” field and “Sub-group Name” field occurring in the historical records table of the PHish database.


To create a spatial representation of the historical data we used the Catchment Characterisation and Modelling–River and Catchment database v2.1 (CCM2). Advantages of this pan-European database are its hierarchical structure and its full integration of rivers and drainage catchments17. Using three spatial scales (basin, sub-basin and segment) allowed storing historical records with distinct spatial accuracy. Even though finer scales are more informative, historical data at a coarser scale are not irrelevant. For the basin scale, we used the identification code that CCM2 gives to each basin (WSO_ID) to link a historical record to this scale level. The same procedure was followed at the segment scale, using the ID code that CCM2 assigns to each segment (WSO1_ID). Since CCM2 does not have any identification or spatial representation of the sub-basins within each sea outlet basin, we used free software, the River Network Toolkit (RivTool), to create this information. This software uses integrated data about river networks and landscape/environmental datasets to produce new or aggregated data via calculations that consider the directional hierarchical network nature of rivers. The set of natural sub-basins of all sea outlet basins of the study area was created using the “sub-basin ID” function of RivTool.

The descriptions found in the historical sources varied greatly in their geographical precision. Most presence records referred to a given river or stream within a restricted region, usually described as being near a given village, township or city. When the geographic location was available, the record was georeferenced in a Geographical Information System (GIS) using CCM2. These were the most spatially precise records, at the segment scale, where a segment corresponds to a river reach between two consecutive tributaries. In some cases, region or town names were obsolete and further investigation was needed to clarify the current location and/or designation associated with the former name. Nevertheless, for some historical records, the former names could not be related to current designations or were not precise enough to be linked to a river segment. In those cases, the record was attributed to a higher spatial scale (sub-basin or basin). Data entries that could only be related to a major river or stream that flows to the Atlantic Ocean, coastal lagoons or estuaries were considered low-precision records and spatially defined at the basin scale. When the watercourse was identified as a tributary (i.e., a smaller river or stream not flowing to the ocean, a coastal lagoon or an estuary), the precision was considered higher and the record was spatially defined at the sub-basin scale.
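The three-scale decision described above can be summarised in a short Python sketch. The function and parameter names are hypothetical, and the real assignment was made manually in a GIS; the sketch only restates the rule as written, under the assumption that a record whose place and river both remain unidentified is discarded.

```python
from enum import Enum
from typing import Optional


class Scale(Enum):
    SEGMENT = "segment"
    SUB_BASIN = "sub-basin"
    BASIN = "basin"


def spatial_scale(locality_resolved: bool,
                  flows_to_sea: Optional[bool]) -> Optional[Scale]:
    """Choose the spatial scale at which a record is stored.

    locality_resolved: the described place could be georeferenced to a
        river segment (hypothetical flag).
    flows_to_sea: True if the named watercourse drains to the ocean, a
        coastal lagoon or an estuary; False if it is a tributary; None if
        the watercourse itself could not be identified.
    Returns None when the record cannot be placed at any scale.
    """
    if locality_resolved:
        return Scale.SEGMENT
    if flows_to_sea is None:
        return None  # assumption: unplaceable records are discarded
    return Scale.BASIN if flows_to_sea else Scale.SUB_BASIN
```

Under this rule, a record naming only a tributary is stored at the sub-basin scale, while a record naming only a river that reaches the sea is stored at the basin scale.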

Data Records

A relational database structure (Fig. 2) was created in Microsoft Access® (.accdb file format) to adequately organise and store the historical data collected with their spatial and temporal dependencies, and also to maintain their link to the historical sources (Table 1 (available online only)). The PHish database is publicly available at the Open Science Framework (Data Citation 1) and at the University of Lisbon, School of Agriculture repository. The database contains six tables: three related to spatial organisation, “Basins”, “Sub-basins”, “Segments”; one with taxonomical identification, “Taxonomical Groups”; one establishing the details of historical sources, “Historical Documents”; and finally, one aggregating historical record information with the respective spatial, taxonomical and historical source information, “Historical Records” (Table 3 (available online only)). The latter table is the core of the relational database structure: it relates to all other tables (Fig. 2) and is where the resulting historical data are stored (Table 3 (available online only)).
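As a rough illustration of this six-table layout, the sketch below rebuilds a comparable structure in SQLite through Python's standard sqlite3 module. It is not the actual Access schema: apart from the six table names and the CCM2 codes WSO_ID and WSO1_ID, every field name is an assumption (the real fields are listed in Table 3), and the rows inserted are placeholders, not data from the database. It also assumes that every record carries a basin code regardless of its spatial scale.

```python
import sqlite3

# Hypothetical schema: field names are illustrative, not those of Table 3.
SCHEMA = """
CREATE TABLE basins (wso_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sub_basins (
    sub_basin_id INTEGER PRIMARY KEY,
    wso_id INTEGER REFERENCES basins(wso_id),
    name TEXT
);
CREATE TABLE segments (
    wso1_id INTEGER PRIMARY KEY,
    sub_basin_id INTEGER REFERENCES sub_basins(sub_basin_id)
);
CREATE TABLE taxonomical_groups (
    taxon_id INTEGER PRIMARY KEY, group_name TEXT, sub_group_name TEXT
);
CREATE TABLE historical_documents (
    document_id INTEGER PRIMARY KEY, title TEXT, year INTEGER
);
CREATE TABLE historical_records (
    record_id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES historical_documents(document_id),
    taxon_id INTEGER NOT NULL REFERENCES taxonomical_groups(taxon_id),
    wso_id INTEGER NOT NULL REFERENCES basins(wso_id),
    sub_basin_id INTEGER REFERENCES sub_basins(sub_basin_id),  -- NULL at basin scale
    wso1_id INTEGER REFERENCES segments(wso1_id),              -- NULL above segment scale
    century INTEGER
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO basins VALUES (?, ?)",
                     [(1, "Basin A"), (2, "Basin B")])
    conn.execute("INSERT INTO taxonomical_groups VALUES (1, 'Salmo trutta', NULL)")
    conn.execute("INSERT INTO historical_documents VALUES (1, 'Placeholder document', 1850)")
    conn.executemany(
        "INSERT INTO historical_records (document_id, taxon_id, wso_id, century) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 19), (1, 1, 1, 19), (1, 1, 2, 19)],
    )
    return conn


def records_per_basin(conn: sqlite3.Connection) -> list:
    """Count historical records per basin by joining the core table to Basins."""
    query = """
        SELECT b.name, COUNT(*)
        FROM historical_records AS r
        JOIN basins AS b ON b.wso_id = r.wso_id
        GROUP BY b.name
        ORDER BY b.name
    """
    return conn.execute(query).fetchall()
```

Keeping the spatial, taxonomical and source identifiers as foreign keys in the core table is what allows summaries such as records per basin or per century to be obtained with a single join.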

Figure 2: The relational structure of the Portuguese Historical Fish Database.

Each box represents one table, with the header of a box indicating the name of the table, followed by the list of fields included in the table. The red lines indicate the primary relationship between tables; red fields are the primary key of each table; fields in blue indicate secondary relationships between tables.

Table 3 Fields and respective description contained in each table of the Portuguese Historical Fish Database.

Technical Validation

Interpretation of historical data can be very subjective and historical science is mostly inductive2. To increase objectivity and guarantee a correct assessment of historical information it is necessary to perform a critical evaluation of sources3, while comparing and combining multiple and independent sources and methods18. Special care was taken to verify that authors were not simply replicating information from other sources, which would duplicate results in the database. This was done not only while researchers were reading and surveying the historical documents and sources, but also by analysing, comparing and searching within the final set of historical records for similarities. For example, combined similarities in taxonomical groups and spatial references, together with similarities in paragraphs, sentences or parts of sentences, were normally an indication that the author was citing text from another document without acknowledging it explicitly. Whenever there was reasonable doubt about the originality of the information present in the historical source, or of the historical record, only the oldest one was included in the database.
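The screening for replicated records was carried out by the researchers themselves, but its logic can be approximated with a small Python sketch. The record fields, the similarity threshold of 0.8 and the use of difflib.SequenceMatcher are all assumptions chosen for illustration: two records are treated as probable copies when they share taxonomical group and location and their text extracts are highly similar, and only the oldest record of each such set is kept.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Record(NamedTuple):
    year: int       # year of the historical source
    taxon: str      # taxonomical group of the record
    location: str   # spatial reference of the record
    excerpt: str    # text extract from the source


def is_probable_copy(a: Record, b: Record, threshold: float = 0.8) -> bool:
    """Flag two records as likely copies of each other.

    The threshold is an arbitrary illustrative value, not one used by the
    authors of the database.
    """
    if a.taxon != b.taxon or a.location != b.location:
        return False
    similarity = SequenceMatcher(None, a.excerpt.lower(), b.excerpt.lower()).ratio()
    return similarity >= threshold


def keep_oldest(records: list, threshold: float = 0.8) -> list:
    """Keep only the oldest record of each set of probable copies."""
    kept = []
    for record in sorted(records, key=lambda r: r.year):
        if not any(is_probable_copy(record, older, threshold) for older in kept):
            kept.append(record)
    return kept
```

An automatic filter of this kind could only flag candidates; deciding whether a later text truly replicates an earlier one would still require reading both sources.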

Despite the numerous hurdles, taxonomical identification of ancient species common names followed a conservative approach that guaranteed no uncertainty for the “Group Name” field. Concerning the “Sub-group Name” field, the integration of information between spatial location and taxonomical identification of a record, the reliable considerations based on literature and the knowledge of experienced ecologists assured low levels of uncertainty. Moreover, when there was reasonable doubt or lack of plausibility, no consideration was made.

Spatial information for record locations was primarily assessed in a GIS environment using three official Portuguese water management and administrative sources: 1) a rivers map (Shapefile); 2) an administrative regions and municipalities map (Shapefile); and 3) online orthophotomaps (WMS). When the city/council name or region of a record was not easily connected to the information available in maps, numerous municipality and parish websites were consulted, along with other websites from relevant local or regional associations, to help identify the more site-specific or outdated spatial references. The connection with the CCM2 database was performed only after this thorough process. Records for places or historical locations that were not geographically identifiable were handled conservatively, either by discarding them or by including them in upper spatial scales (sub-basin or basin scale) when the river name was objectively identified.

Usage Notes

Just like every database of historical records, the PHish database is neither definitive nor complete. All reasonable and feasible updates will be made, although they depend on resources and future funding. Methods will be maintained to avoid usage biases and/or interpretation issues. Future surveys for historical data should focus on reducing spatial and temporal data heterogeneity. Obtaining information for older periods (before the 16th century) and focusing on the coastal areas of Portugal should narrow the spatial and temporal knowledge disparity.

Despite our best efforts, and even considering future updates to this database, true species occurrence will inevitably be broader than what historians and chroniclers may have reported. Cultural filtering2, accidental or intentional destruction of documents, doubtful sources used by historians/chroniclers and bias towards certain species3 affect the spatial and temporal availability, completeness, and reliability of documentary records2. In England, copyright law was limited to a special group of people until the 18th century; indeed, document availability was still censored and limited to printers’ and publishers’ rights rather than to authors’ property19. The concept of an author’s intellectual property over their work only spread during the 19th century, particularly in culturally developed countries such as France and Great Britain20. This means that at least until the 18th century, and in Portugal most likely until the 19th century, authors could quote other works without acknowledging them. Thus, data duplication is a possibility within this database, though we consider its probability very low given our cautious approach to this problem. Also, users must be aware that the PHish database is a presence-only compilation of historical records. Without careful systematic sampling, absence data are inevitably prone to a high degree of uncertainty, and to the best of our knowledge, no modern-day systematic survey of fish assemblages across the country was undertaken in Portugal until the end of the 20th century.

The lack of data for coastal areas, and more specifically for the southern coastal regions of Portugal, may result from several particular circumstances. Smaller basins are composed of smaller rivers and inevitably have fewer human settlements. In addition, southern Iberian rivers are Mediterranean-type freshwater ecosystems strongly shaped by autumn/winter flooding and summer drought events21. This seasonal instability has implications for the structure of freshwater communities22, meaning that these rivers probably tend to support less attractive fishing areas and species. The database is also temporally unbalanced, i.e., although it covers a vast time-scale, it does not represent a consistent time-series because records are lacking or scarce for the first half of the millennium. Future updates to this database will probably not fully overcome this, as it is also the result of temporal filtering3 of historical sources. Another relevant issue is the heterogeneity of the established taxonomical groups in the field recommended to users (“Group Name”). The conservative approach followed herein avoids uncertainty in this classification, translating into correct and objective taxonomical information. However, some taxonomical groups (e.g., “Mugilidae” or “Chondrostoma”) may be limiting when the objective is, for example, to perform species environmental niche modelling.

All the issues, biases and imbalances mentioned are normal for historical databases, and none of them hampers the usage of the database. To our knowledge, this database is the first public compilation of historical distribution data on freshwater fish species in Portuguese rivers and South-western Europe. Nevertheless, users should keep in mind all these features and caveats whenever making any considerations or extrapolations based on this database. The PHish database is geographically limited since it is restricted to inland Portugal. However, the current database contains valuable information by covering data for the Portuguese areas, including sea outlets, of four Iberian international rivers and most major Portuguese watercourses. Moreover, because Portuguese data were not compiled and made available until now, researchers have been using only Spanish records to study fish species distribution for the whole Iberian Peninsula, minimising its importance as a meaningful biogeographical entity23. By using only Spanish data, authors accept uncertain premises (e.g., Clavero and Villero24) and/or extrapolate when predicting for the entire Iberian Peninsula (e.g., Clavero and Hermoso8). Hence, this database will fill an important gap in current knowledge and contribute to the development of new studies covering the whole Iberian Peninsula without being hindered by political borders.

Additional information

How to cite this article: Duarte, G. et al. One millennium of historical freshwater fish occurrence data for Portuguese rivers and streams. Sci. Data 5:180163 doi: 10.1038/sdata.2018.163 (2018).
