Introduction

Many man-made structures have been installed throughout the world’s oceans, where they inevitably interact with the environment in which they are situated. These structures affect many aspects of the physics and biology of the ecosystem in which they are placed, e.g. altering water flow1,2,3 and providing hard substrate on which sessile organisms can settle and grow4,5,6,7,8,9. The complex 3D structure can also act as a nursery ground10,11 and attract more mobile species from the surrounding area12,13,14,15. As such they can provide essential habitat, leading to areas of elevated local biodiversity16 and animal density14,17,18,19. This in turn can result in increased feeding opportunities for predators at higher trophic levels9,20,21,22. The interactions between a structure and its environment will vary with many aspects of the structure’s physical properties, including the material from which it is constructed4,23, its size and shape24. They may also be affected by the properties of the location in which it is situated, including water movement patterns, water depth, and the availability of similar alternative natural (or unnatural) structures or substrates25.

Due to both the proximity to shore and the relatively shallow depth of the seabed, the majority of human activity in the oceans is concentrated on the continental shelves. These areas are also where most man-made structures have been installed26,27,28, in particular in those accessible areas with reserves of valuable natural mineral resources. The northern Gulf of Mexico, the Persian Gulf, and the North Sea, for example, contain very high densities of man-made structures, installed as part of a network of offshore oil and gas infrastructure. These structures show extreme diversity in their size, form and function, including pipelines, subsea manifolds, wellheads and templates, and surface piercing platforms. Even within these broad classes of infrastructure type, there can be a great variety in the size and form of these structures; for example, pipelines can be exposed, buried, or covered, and can form spans. Other subsea infrastructure such as manifolds and anchor moorings can range greatly in size and complexity. However, it is the surface piercing platforms which perhaps show the greatest diversity in size and design. The platforms themselves can be fixed (drilled into the seabed), floating (kept in place by a system of mooring lines), or gravity-based (resting on the seabed, held in place simply by the weight of the structure itself). They can be made from either concrete or steel. Even platforms of the same style and material can be very different, with, for example, fixed steel platforms ranging from small, normally-unmanned monopod platforms to large complexes consisting of several large, bridge-linked, 4-, 6- or 8-legged platforms.

In the North Sea, as elsewhere, much of the current oil and gas infrastructure has been in place for several decades. It is now at, or nearing the end of, its operational life, such that it will soon require decommissioning29,30. The North Sea is currently covered by legislation which states that (with some exceptions) ‘the dumping, and leaving wholly or partly in place, of disused offshore installations within the maritime area is prohibited’31,32, and so decommissioning will involve the complete removal of these installations and associated infrastructure. As well as the large financial costs associated with the breakdown and removal of these structures, there will also be potentially significant environmental impacts of the decommissioning process, through seabed disturbance33,34, potential contamination risk33,35,36, and indeed, through the removal of the habitat and opportunities afforded to the local flora and fauna by the structures’ physical presence37.

This has led to renewed interest in studying the ecological and environmental impacts of these structures. For regulators to make informed decisions about decommissioning options, more information is needed about the roles these structures play within the North Sea ecosystem, and about the potential impacts of their decommissioning and removal. To this end, many environmental studies have sought to describe or investigate the biology and ecology of these systems, with many more currently underway (e.g. INSITE, www.insitenorthsea.org). However, one issue that is rarely considered, and that can limit the broader utility of a study’s findings, is the selection of a representative sample of structures at which to collect data.

With such a large number of offshore oil and gas assets in the North Sea, it is practically (and, normally, financially) impossible to conduct in-depth sampling at all locations, e.g. sampling at every single surface piercing platform throughout the region. The environmental interactions and ecological impacts of two different structures may, however, be vastly different, due to differences in their physical shape, size and design. As such, it may be impossible to extrapolate the findings of a study conducted on only a small number of platforms of a single type to other platform types and locations, and so the conclusions may not be useful for ecosystem-wide management planning.

To enhance the applicability of the findings of such studies, and to ensure efficient use of the limited funding available (by eliminating the need to repeat the work for a different type of structure), it is essential that a representative sample of structures be selected. Alternatively, if a subsample has already been selected and surveyed, it is important to understand how representative the subsample is of the wider population, so that any limitations of the conclusions can be acknowledged.

To a) select a representative subsample before data collection, or b) assess the representativeness of a previously selected subsample of the population of North Sea oil and gas platforms, a formal typology is required, whereby platforms are classified into clusters based on common characteristics. With respect to the variables chosen as the basis for clustering, variability within clusters will then be much lower than variability between clusters. The relative split of the population between clusters can then be used either to select a representative sample, or to assess the representativeness of a previously selected sample.

Methods

To create a formal typology, a comprehensive list of the items to be clustered (platforms in this case) is required, along with a corresponding complete dataset of the variables on which the clustering will be based. Here, the OSPAR inventory of offshore installations38 was used both for the list of the ‘population’ of offshore platforms (n = 552) and for the variables used in the clustering. The variables selected were: hydrocarbon product, platform type, operational status, and substructure weight. Other variables were considered (e.g. water depth, whether the platform is manned or unmanned, latitude and longitude, and produced water disposal method) but were not included, for reasons given later (see “Discussion”).

These variables include both categorical and continuous data, and so it was necessary to select a clustering methodology that performs effectively with mixed datasets. Partitioning around medoids39 (PAM) has previously been used for clustering mixed categorical and continuous data in a wide variety of applications, including identifying the psychological effects of COVID-1940, clustering fishing vessels into discrete fleets41, grouping Indonesian districts by priority for interventions to address stunting42, grouping estuaries by a range of biotic and abiotic factors43, grouping similar patients presenting with back pain44, and identifying different fishing tactics from catch composition45, among others46,47,48.

Prior to running a clustering algorithm, a measure of the distance between individuals is required, based on the selected variables. Here, a Gower distance matrix was used49, due to its utility with mixed categorical and continuous data. Gower distance is the average, across all variables considered, of the per-variable distances between two individuals. For a continuous variable, a standardised difference is used (absolute difference divided by the variable’s range); for a categorical variable, the distance is 1 if the individuals differ and 0 if they are the same. One drawback of the Gower distance metric is that it is sensitive to outliers and non-normality of continuous variables. Consequently, due to the pronounced right-skewness of substructure weight, this variable was log-transformed to approximate normality; a log(x + 1) transformation was used due to the presence of zeroes in the data (e.g. from the floating structures).
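As an illustration of the distance calculation just described, the following Python sketch computes a Gower distance matrix for a handful of hypothetical platform records, applying the log(x + 1) transform to the continuous variable before scaling it by its range. The field names (`product`, `type`, `status`, `weight_t`) and all values are invented for illustration, and weight is assumed to be in tonnes. It is a minimal sketch of the Gower formula, not the authors' code; in R a comparable matrix is commonly produced with `cluster::daisy(x, metric = "gower")`.

```python
import math
from typing import Dict, List, Sequence, Union

Record = Dict[str, Union[str, float]]


def gower_distance_matrix(
    records: Sequence[Record],
    categorical: Sequence[str],
    continuous: Sequence[str],
) -> List[List[float]]:
    """Pairwise Gower distances for records with mixed variable types.

    Continuous variables are log(x + 1)-transformed first (as described for
    substructure weight), then scaled by their range.
    """
    # log(x + 1) keeps the zero weights of floating structures finite
    logged = {v: [math.log1p(float(r[v])) for r in records] for v in continuous}
    ranges = {v: max(vals) - min(vals) for v, vals in logged.items()}
    n_vars = len(categorical) + len(continuous)
    n = len(records)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for v in categorical:
                # Simple matching: 0 if the categories agree, 1 otherwise
                total += 0.0 if records[i][v] == records[j][v] else 1.0
            for v in continuous:
                # Absolute difference divided by the range (skipped if no spread)
                if ranges[v] > 0:
                    total += abs(logged[v][i] - logged[v][j]) / ranges[v]
            dist[i][j] = dist[j][i] = total / n_vars  # average over variables
    return dist


# Hypothetical example records (not taken from the OSPAR inventory)
platforms = [
    {"product": "Oil", "type": "Fixed steel", "status": "Operational", "weight_t": 12000.0},
    {"product": "Gas", "type": "Fixed steel", "status": "Operational", "weight_t": 800.0},
    {"product": "Oil", "type": "Floating steel", "status": "Operational", "weight_t": 0.0},
    {"product": "Oil", "type": "Concrete gravity base", "status": "Closed down", "weight_t": 300000.0},
]
D = gower_distance_matrix(platforms, ["product", "type", "status"], ["weight_t"])
print(round(D[2][3], 2))  # 0.75: type, status and (log) weight all differ maximally
```

Because every per-variable distance lies between 0 and 1, the averaged distance does too, which is what allows categorical and continuous variables to be combined on an equal footing.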

The PAM algorithm applies the following steps, based on the Gower distance matrix, to assign a population of n individuals to k clusters:

  1. Assign k randomly selected individuals as cluster medoids.
  2. Assign all remaining n-k individuals to the cluster with the most proximate medoid.
  3. Reassign as medoid the individual in each cluster which would yield the lowest average distance for that cluster.
  4. If a change is made at step 3, return to step 2.
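The four steps listed above can be sketched in Python as follows, operating on any precomputed dissimilarity matrix (such as a Gower matrix). The sketch follows the listed steps literally (random initial medoids, then alternating assignment and medoid update). Note that the `pam()` function in R's 'cluster' package uses a deterministic BUILD initialisation followed by a SWAP search, so this sketch may not reproduce its output exactly. The `seed` and `max_iter` parameters and the toy data are assumptions added for illustration.

```python
import random
from typing import List, Sequence, Tuple


def assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> List[int]:
    """Step 2: label each individual with the index of its nearest medoid."""
    labels = []
    for i in range(len(dist)):
        if i in medoids:
            # A medoid always belongs to its own cluster (guards against ties)
            labels.append(medoids.index(i))
        else:
            labels.append(min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]))
    return labels


def k_medoids(
    dist: Sequence[Sequence[float]], k: int, seed: int = 0, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # Step 1: k random individuals as medoids
    for _ in range(max_iter):
        labels = assign(dist, medoids)  # Step 2
        new_medoids = []
        for c in range(k):
            # Step 3: member with the lowest total (equivalently, average) distance
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # Step 4: stop once no medoid changes
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy one-dimensional data with two obvious groups (hypothetical)
points = [0, 1, 2, 50, 51, 52]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = k_medoids(dist, k=2, seed=1)
print(sorted(medoids))  # [1, 4]: the middle point of each group
```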

To select the optimum number of clusters, the average silhouette width of the population was calculated for partitions into 2–25 clusters. Silhouette width measures how close each individual is to the other individuals in its own cluster, relative to how close it is to the individuals of the nearest neighbouring cluster. It is calculated as:

$$s\left(i\right)= \frac{b\left(i\right)-a\left(i\right)}{\max\left\{a\left(i\right),\, b\left(i\right)\right\}}$$

where for individual i, s(i) is the silhouette width, a(i) is the average dissimilarity from other members of i’s assigned cluster, and b(i) is the average dissimilarity from the members of the nearest neighbouring cluster, i.e. the minimum average dissimilarity between i and the members of each of the other clusters to which i was not assigned. The algorithm was applied using the ‘cluster’ package in the R statistical programming language50.
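A minimal Python sketch of the silhouette calculation and the choice of k follows. It uses the standard silhouette definition, which divides by max{a(i), b(i)} so that s(i) lies between −1 and 1, and assigns s(i) = 0 to members of single-member clusters (the usual convention). The `labelings` dictionary is a hypothetical stand-in for the output of running the clustering once for each candidate k (2–25 in this study), and the toy data are invented. In R, the average width is available from a `pam()` result as `silinfo$avg.width`.

```python
from typing import Dict, List, Sequence


def silhouette_widths(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every individual i."""
    n = len(dist)
    clusters = sorted(set(labels))
    sizes = {c: list(labels).count(c) for c in clusters}
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a(i): average dissimilarity to the other members of i's cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b(i): smallest average dissimilarity to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / sizes[c]
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def average_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    w = silhouette_widths(dist, labels)
    return sum(w) / len(w)


def best_k(dist: Sequence[Sequence[float]], labelings: Dict[int, Sequence[int]]) -> int:
    """Return the k whose clustering has the highest average silhouette width."""
    return max(labelings, key=lambda k: average_silhouette(dist, labelings[k]))


# Hypothetical data and candidate clusterings for k = 2 and k = 3
points = [0, 1, 2, 50, 51, 52]
dist = [[abs(p - q) for q in points] for p in points]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
print(best_k(dist, candidates))  # 2
```

The k = 3 partition places points 2 and 50 together, giving both strongly negative widths, which is why its average falls well below that of the k = 2 partition.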

Results and discussion

The average silhouette width was highest for 13 clusters, which was therefore taken as the optimum number of clusters. These clusters, as assigned by the PAM algorithm, can be characterised by their medoids, each an exemplar individual from its group (Table 1), similar in interpretation to the median of the group.

Table 1 Characteristics of the medoids of 13 clusters classifying the 552 oil and gas structures in the North Sea.

Using complete-linkage clustering, a dendrogram can be built hierarchically from the Gower distances between the medoids, showing how the clusters relate to one another (Fig. 1). The most important variable for differentiating clusters was structure type (fixed steel, floating steel or concrete); the two largest splits separate out first the floating platforms, then the concrete platforms. In the spatial distribution of the clusters, the most obvious trend is a north–south split, with oil platforms in the north and gas platforms in the south (Fig. 2).
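For readers wishing to reproduce this kind of dendrogram, the sketch below implements naive complete-linkage agglomeration in plain Python: at each step the two groups whose most distant members are closest are merged, and the merge height is recorded. The medoid labels follow the Fig. 1 naming style, but the distances between them are hypothetical values chosen for illustration, not results from this study. In practice a library routine such as R's `hclust(d, method = "complete")` or SciPy's `scipy.cluster.hierarchy.linkage(..., method="complete")` would normally be used.

```python
from typing import List, Sequence, Tuple


def complete_linkage(
    dist: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Tuple[str, str, float]]:
    """Naive agglomerative clustering with complete linkage.

    Returns the merges in order as (left label, right label, merge height).
    """
    groups = {i: [i] for i in range(len(names))}  # active group id -> member indices
    labels = {i: names[i] for i in range(len(names))}
    merges = []
    next_id = len(names)
    while len(groups) > 1:
        best = None
        ids = sorted(groups)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                # Complete linkage: distance between the two most distant members
                height = max(dist[i][j] for i in groups[a] for j in groups[b])
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((labels[a], labels[b], height))
        groups[next_id] = groups.pop(a) + groups.pop(b)
        labels[next_id] = f"({labels[a]}, {labels[b]})"
        next_id += 1
    return merges


# Hypothetical distances between four medoids (illustrative values only)
names = ["Fi_Op_Oil", "Fi_Op_Gas", "Co_Op_Oil", "Fl_Op_Oil"]
dist = [
    [0.00, 0.25, 0.40, 0.55],
    [0.25, 0.00, 0.60, 0.75],
    [0.40, 0.60, 0.00, 0.70],
    [0.55, 0.75, 0.70, 0.00],
]
for left, right, h in complete_linkage(dist, names):
    print(f"{h:.2f}: {left} + {right}")
```

With these invented distances the floating medoid joins last, at the greatest height, which mirrors the kind of top-level split described above.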

Figure 1

Dendrogram displaying the distances between the medoids of the clusters, as arranged by the clustering algorithm. The properties of the medoids are designated as structure type_status_product, using the following abbreviations: Fi, Fl and Co for Fixed steel, Floating steel and Concrete gravity base; Op, Cl and Deco for Operational, Closed down and Decommissioned; and Oil, Gas and Con for Oil, Gas and Condensate. The number of structures in each cluster (n) is given in brackets.

Figure 2

Map of oil and gas platforms in the North Sea. Symbols denote the cluster to which each platform was assigned during the clustering process, and are coloured by hydrocarbon product. The clusters are designated as structure type_status_product, using the following abbreviations: Fi, Fl and Co for Fixed steel, Floating steel and Concrete gravity base; Op, Cl and Deco for Operational, Closed down and Decommissioned; and Oil, Gas and Con for Oil, Gas and Condensate. The map was generated using the ‘maps’ and ‘mapdata’ packages in R (v4.1.2; https://www.r-project.org).

A formal typology of the oil and gas platforms of the North Sea was created, classifying the 552 individual platforms into 13 clusters. With this typology, and the relative numbers of platforms in each cluster (Table 1), it is possible to select a representative subsample of structures as part of the survey design process for a study which is unable to visit the entire population of platforms. Alternatively, if a subsample has already been selected or sampled, or a survey designer does not have complete freedom to choose which platforms can be surveyed, the representativeness of the sample can be assessed, and any limits on the applicability of the results to the wider population can be made clear.
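One way to turn cluster sizes into a survey design, and to check an existing sample against them, is sketched below, assuming proportional allocation. The allocation step gives each cluster a share of the sample proportional to its size, using largest-remainder rounding so that the shares sum to the requested sample size (this rounding rule is an assumption, not something specified here). The representativeness check computes Pearson's chi-square statistic comparing observed sample counts with those expected under proportional sampling; a p-value would require the chi-square distribution with (number of clusters − 1) degrees of freedom (e.g. via `scipy.stats.chisquare`), and the approximation is poor when expected counts are small, as they often will be with 13 clusters and a modest sample. Cluster names and sizes below are hypothetical, not the Table 1 counts.

```python
import math
from typing import Dict


def proportional_allocation(cluster_sizes: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Allocate sample_size across clusters in proportion to cluster size.

    Uses largest-remainder rounding so the allocations sum to sample_size.
    """
    total = sum(cluster_sizes.values())
    quotas = {c: sample_size * n / total for c, n in cluster_sizes.items()}
    alloc = {c: math.floor(q) for c, q in quotas.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand out the remaining places to clusters with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


def chi_square_statistic(cluster_sizes: Dict[str, int], sample_counts: Dict[str, int]) -> float:
    """Pearson chi-square of observed sample counts vs. proportional expectation.

    Assumes every cluster size and the total sample size are positive.
    """
    total = sum(cluster_sizes.values())
    m = sum(sample_counts.get(c, 0) for c in cluster_sizes)
    stat = 0.0
    for c, n in cluster_sizes.items():
        expected = m * n / total
        observed = sample_counts.get(c, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical cluster sizes (not the Table 1 counts)
sizes = {"A": 60, "B": 30, "C": 10}
print(proportional_allocation(sizes, 7))  # {'A': 4, 'B': 2, 'C': 1}
print(chi_square_statistic(sizes, {"A": 10}))  # about 6.67: cluster A over-represented
```

A statistic of zero means the sample mirrors the population split exactly; larger values indicate that some clusters are over- or under-represented relative to their share of the population.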

The variables selected here were relatively basic, dealing only with some aspects of the platforms’ physical size and structure, as well as the hydrocarbon product. For each specific application, a set of variables which are likely to be important in the context of the ecological question being asked should be selected, where available. One difficulty in this is that the currently available, publicly accessible databases (e.g. the OSPAR and OGA databases) lack information on some important variables, are incomplete in their records of others, and are inaccurate in yet others.

For example, for a study of fish around oil and gas platforms, there are factors relating to substances discharged from the platform which may affect the fish populations below them. These include whether the structures are normally manned or unmanned (and so whether they discharge organic matter in the form of kitchen waste and black- and grey-water), and whether the platform is permitted to discharge produced water (formation water extracted along with the hydrocarbon product, together with process chemicals) or reinjects it into the reservoir. These data, however, are not included in any public database, and so would require a significant data-mining effort to collect for the entire population, something which was beyond the scope of this study. It may be possible to gather data on these variables for a small selection of platforms (e.g. by contacting the operators directly), so that they could at least be reported as factors which may affect the ecology of the platforms, even if their comparability with the wider population of unsampled platforms is unknown.

There are also transient variables which can change over time at any given platform and may affect the surrounding environment, particularly mobile species, which can change their distribution over short timescales (whereas sessile organisms cannot). For example, drilling emits noise and vibration into the surrounding water, but only while it is actively occurring. Such activities vary over a range of timescales, with periods of activity or inactivity lasting up to several months. While these variables might be impossible to include in a typology (due to both their highly transient nature and the amount of data gathering their inclusion would require, as mentioned above), it is essential that they be considered as important contextual information which may bias the data collected at any given time and location.

Some variables were deliberately omitted following consideration of their relative importance to the ‘definition’ of each cluster. Water depth, for example, could have been included because of its potential influence on the ecology of the system around a platform51,52,53,54,55,56. Similarly, platform location (latitude, longitude, or both) could have been included in the clustering process, as it will affect the ecology of the site57,58. These variables were omitted, however, because they describe the environment rather than the platform itself. It was therefore decided that only information about the platform itself would be used for clustering, and that environmental variables could be controlled for (or investigated) as part of the survey design or data analysis of the environmental study. For example, it will be important to examine the distribution of water depths in each cluster post hoc, and to ensure that representative samples are selected (in particular where the depth data of a given cluster show a bimodal distribution).
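The post-hoc depth check suggested above could be sketched as follows, assuming water depths are available for the platforms within one cluster. The platforms are split into depth bands at quantile cut points (using `statistics.quantiles` from the Python standard library), and the platform nearest the median depth of each band is chosen, so that both modes of a bimodal depth distribution are represented. The banding rule, the number of bands, the platform names and the depth values (in metres) are all hypothetical choices for illustration only.

```python
import statistics
from typing import Dict, List


def depth_bands(depths: Dict[str, float], n_bands: int = 3) -> List[List[str]]:
    """Split platforms into n_bands groups at depth quantile cut points.

    Requires at least two platforms (a limitation of statistics.quantiles).
    """
    cuts = statistics.quantiles(sorted(depths.values()), n=n_bands)
    bands: List[List[str]] = [[] for _ in range(n_bands)]
    for name, depth in depths.items():
        # Band index = number of cut points the depth lies above
        bands[sum(depth > c for c in cuts)].append(name)
    return bands


def pick_one_per_band(depths: Dict[str, float], n_bands: int = 3) -> List[str]:
    """Choose the platform closest to the median depth of each non-empty band."""
    picks = []
    for band in depth_bands(depths, n_bands):
        if not band:
            continue
        median_depth = statistics.median(depths[p] for p in band)
        picks.append(min(band, key=lambda p: abs(depths[p] - median_depth)))
    return picks


# Hypothetical cluster with a bimodal depth distribution (metres)
cluster_depths = {"P1": 30, "P2": 35, "P3": 40, "P4": 140, "P5": 150, "P6": 160}
print(pick_one_per_band(cluster_depths, n_bands=2))  # ['P2', 'P5']
```

Choosing within-band medians, rather than the cluster-wide median, avoids selecting a platform from the sparsely populated depths between the two modes.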

One thing that became apparent over the course of this study is the need for high quality, accurate, publicly accessible databases to be maintained, so that the sort of analysis carried out here can be conducted for future studies using case-appropriate variables for clustering. Much of the information resulting from ecological studies of oil and gas infrastructure may be limited by the number of platforms sampled and by a lack of clarity over the representativeness of the subsample selected. The current readily accessible databases, while a useful starting point, contain a limited number of potentially ecologically relevant variables, and there are issues with the accuracy and maintenance of some of the datasets they contain (e.g. the location data in the OSPAR inventory of offshore installations contain numerous inaccuracies).

Conclusions

A typology of oil and gas structures in a given study area (here, the North Sea) is essential for selecting a subsample which is suitably representative of the wider ‘population’. This will increase the extent to which the conclusions drawn from a study can be generalised, allowing more efficient use of the limited resources available for such studies. The work highlights the need for high quality, accurate databases of information about offshore oil and gas infrastructure to be maintained (including a range of relevant variables), so that a similar typology can be created using any and all characteristics deemed important to a new study.