Population biobanks meet GIS

Population biobanks are designed to aid studies into the causes of health and disease by linking nature (genotype), nurture (phenotype) and exposure (environment). To that end, they collect data about the phenotype, genotype, lifestyle and family medical history from large numbers of participants, with longitudinal follow-up.1 To link individual phenotypes and genotypes to past, present and future environmental exposures, biobanks are increasingly incorporating multiple environmental variables, such as air and noise pollution exposure data from traffic-related sources, electromagnetic field exposure data from mobile phone masts or overhead powerlines, and exposures related to industrial point sources. A few of these environmental data are available from public sources, but many are estimated for specific research projects. Such data can be linked to recorded place of residence, school or workplace through the use of Geographical Information Systems (GIS).

GIS have been defined as computer systems designed to capture, store, process and display spatial or geographical data, helping users to understand patterns and relationships. GIS were used to link population data to exposure data in the recent ESCAPE study, in which air pollution exposures were combined with population data from over 30 cohorts/biobanks2 to investigate the effects on a range of health end points.3, 4 Currently, the BioSHaRE project is investigating the link between air pollution and noise exposures and health effects within UK Biobank, the Norwegian HUNT Study, the Dutch population biobank LifeLines and EPIC. Within these studies, environmental data were assigned to place of residence, which could help identify the local health hazards of the biobank participants concerned. In addition, in some studies such as LifeLines, occupational addresses of participants are also being collected and geocoded. Address history available from the Municipal Personal Record Database provides insight into residential mobility and length of exposure to different environments, enabling cradle-to-grave exposure assessment.

Exposing participants by publishing their exposure

As science dictates, any associations observed should be published for peer review and verification. As an upside, publication of local health hazards can inform public health policies and perhaps help advance personal geo-medicine.5 As a downside, these publications might expose the pertinent biobank participants (and non-participants living in the same area) to social and economic risks. If published at a high spatial resolution, using modern-day visualization techniques, the findings would enable the rest of the world to, literally, zoom in and out of their local health hazards: simply imagine Google Street View enriched with scientifically published exposure studies. More specifically, published biobank exposure studies could provide information that could be used to deny or condition the access of both participants and their non-participant neighbors to private or social services, or that could affect the value of their private property. Just as the publication of two avian (H5N1) influenza studies recently re-ignited concerns over misuse that could undermine ‘bio-security’,6 the publication of biobank geo-exposure studies and the underlying health data creates concern over possible misuse that could undermine ‘social, private and economic security’.

Risks real or perceived?

Presently, it is unknown whether these risks are real or merely perceived. And even if they are found to be real, these risks should override neither the requirements of science, that findings be published, nor the interests of public health, that epidemiological outbreaks be reported. However, the situation we describe is somewhat different from public health studies of epidemiological outbreaks. Outbreaks of infectious disease are (certainly in the UK and in other European countries) covered by specific legislation that protects public health and under which the rights of the individual may be outweighed by the need to protect the health of the population as a whole. Moreover, if registry data (eg, cancer registration) are being explored by public health professionals to consider, for example, reporting clusters of a particular cancer, there will be conditions of use in place that protect against inadvertent disclosure, whether through small numbers in a cell of a table or through mapping. Also, access to the data on both infectious disease outbreaks and clusters of cancers/chronic disease will usually be restricted to approved individuals, usually with some health care remit (eg, public health doctors, infection control nurses, etc), that is, individuals who have a 'duty of care' and who are familiar with dealing with identifiable (confidential) patient data.

The situation we describe is also different from that of government agencies that make their spatial data available to the general public.7 It is generally presumed that publicly available exposure data could be used by anyone with internet access to link exposure to location. But while this is true for some exposures, our own experience suggests that many publicly available environmental data are at a rather coarse spatial resolution, are available only in certain locations or have other limitations that can hinder estimation of community- and individual-level exposures.8 In contrast, biobank geo-exposure studies actually link and combine genotype, phenotype and exposure data to establish relationships and associations. Moreover, the exposure data are collected over the participant’s lifespan, from before birth (for three-generation studies) until death. Publication of these findings could expose both participants and non-participants to the risk that the public and/or the private sector misuse the refined biobank data as scientifically validated estimates of exposures in the local community concerned.

Assessing the risks

As population biobanks have a legal obligation to inform their participants about all risks of their participation as well as to promote responsible publishing, we recommend that they identify, assess and mitigate the risks posed by the performance and/or the publication of GIS-enabled exposure studies to the legitimate interests of their participants and the public. Although we did not perform an exhaustive survey, it seems that policies to deal with these issues vary considerably. In the ESCAPE study, for example, some cohorts have very strict policies where no outputs with mapping below region level are permitted and geocodes should be separated from health data, whereas some other cohorts have no restrictions at all.

To that end, we submit that population biobanks and/or their supervising review boards subject the collection and/or publication of biobank geo-exposure data and studies to a social and economic human rights impact assessment to:

  • Analyze whether geo-referencing of address data of biobank participants (even when encrypted, with restricted researcher access) is covered by their consent or by IRB approval.

  • Examine whether statutory protections against misapplication of (personal) health and genetic data extend to biobank exposure findings and data.

  • Determine whether the risks described above are real and, if so, to what extent these studies indeed qualify as ‘dual use research of concern’, that is, research conducted for legitimate purposes that can be utilized for benevolent or harmful purposes.6

  • Undertake a risk–benefit assessment to balance the arguments in favor of (unfettered) publication (progress of science and promotion of public health) against the arguments against publication (the potential to damage the social and economic human rights and interests of both biobank participants and non-participants).

  • If the arguments in favor of publication prevail, determine whether the published matter needs to be sensitive to non-identifiability concerns (eg, publish an overall dose–response association but not maps of rates for areas with small numbers of individuals).
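
The non-identifiability safeguard in the last point can be illustrated with a minimal sketch of small-number suppression, in which an area-level rate is withheld when too few participants live in the area. The function name, the record layout and the threshold of 10 are assumptions chosen purely for illustration; they are not drawn from any biobank's actual policy.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Tuple

# Hypothetical threshold: areas with fewer participants are not reported.
MIN_PARTICIPANTS = 10


def area_rates(
    records: Iterable[Tuple[Hashable, bool]],
    min_count: int = MIN_PARTICIPANTS,
) -> Dict[Hashable, Optional[float]]:
    """Return the outcome rate per area, or None where the area is suppressed.

    Each record is an (area_id, has_outcome) pair for one participant.
    """
    totals: Counter = Counter()
    cases: Counter = Counter()
    for area, has_outcome in records:
        totals[area] += 1
        if has_outcome:
            cases[area] += 1

    rates: Dict[Hashable, Optional[float]] = {}
    for area, n in totals.items():
        # Suppress the rate when the area holds too few participants,
        # so that a published map cannot single out individuals.
        rates[area] = cases[area] / n if n >= min_count else None
    return rates
```

In such a scheme the overall dose–response association would still be estimated on the full data set; only the area-level outputs intended for mapping would pass through the suppression step.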

As an additional benefit, such an impact assessment would help cultivate a sense of responsibility for potential misapplications of population biobank GIS publications among population biobankers and scientists.

At present, while individual scientists might consider the social and economic human rights implications of their publications as a matter of responsible reporting, there seems to be no systematic approach for a thorough assessment of impact, at least not for population biobanks. This gap should be filled by a set of ‘points to consider’ as part of the review of research protocols and publications, by a Code of Conduct or by an institutional oversight model, as appropriate. For the latter, the US Government Policy for Institutional Oversight of Life Sciences Dual Use Research of Concern to Bio Security could serve as an example.9