Building an international consortium for tracking coronavirus health status

Segal, Eran; Zhang, Feng; Lin, Xihong; King, Gary; Shalem, Ophir; Shilo, Smadar; Allen, William E.; Alquaddoomi, Faisal; Altae-Tran, Han; Anders, Simon; Balicer, Ran; Bauman, Tal; Bonilla, Ximena; Booman, Gisel; Chan, Andrew T.; Cohen, Ori; Coletti, Silvano; Davidson, Natalie; Dor, Yuval; Drew, David A.; Elemento, Olivier; Evans, Georgina; Ewels, Phil; Gale, Joshua; Gavrieli, Amir; Geiger, Benjamin; Grad, Yonatan H.; Greene, Casey S.; Hajirasouliha, Iman; Jerala, Roman; Kahles, Andre; Kallioniemi, Olli; Keshet, Ayya; Kocarev, Ljupco; Landua, Gregory; Meir, Tomer; Muller, Aline; Nguyen, Long H.; Oresic, Matej; Ovchinnikova, Svetlana; Peterson, Hedi; Prodanova, Jana; Rajagopal, Jay; Rätsch, Gunnar; Rossman, Hagai; Rung, Johan; Sboner, Andrea; Sigaras, Alexandros; Spector, Tim; Steinherz, Ron; Stevens, Irene; Vilo, Jaak; Wilmes, Paul

doi:10.1038/s41591-020-0929-x


Comment
Published: 02 June 2020


Nature Medicine volume 26, pages 1161–1165 (2020)


A Publisher Correction to this article was published on 26 June 2020

This article has been updated

We call upon the research community to standardize efforts to use daily self-reported data about COVID-19 symptoms in the response to the pandemic and to form a collaborative consortium to maximize global gain while protecting participant privacy.

The rapid and global spread of COVID-19 led the World Health Organization to declare it a pandemic on 11 March 2020. One factor contributing to the spread of the pandemic is the lack of information about who is infected, in large part because of the lack of testing. This facilitated the silent spread of the causative coronavirus (SARS-CoV-2), which led to delays in public-health and government responses and an explosion in cases. In countries that have tested more aggressively and that had the capacity to transparently share this data, such as South Korea and Singapore, the spread of disease has been greatly slowed¹.

Although efforts are underway around the world to substantially ramp up testing capacity, technology-driven approaches to collecting self-reported information can fill an immediate need and complement official diagnostic results. This type of approach has been used for tracking other diseases, notably influenza². The information collected may include health status that is self-reported through surveys, including those from mobile apps; results of diagnostic laboratory tests; and other static and real-time geospatial data. The collection of privacy-protected information from volunteers about health status over time may enable researchers to leverage these data to predict, respond to and learn about the spread of COVID-19. Given the global nature of the disease, we aim to form an international consortium, tentatively named the ‘Coronavirus Census Collective’, to serve as a hub for amassing this type of data and to create a unified platform for global epidemiological data collection and analysis.

The mission of the Coronavirus Census Collective

The Coronavirus Census Collective (CCC) will be committed to a mission of saving lives through the open sharing of information covering all aspects of the COVID-19 pandemic, to the greatest extent possible, while simultaneously ensuring privacy. This infrastructure could immediately help in the current COVID-19 pandemic, and its consolidation will also facilitate the global response to other communicable diseases that may emerge in the future, as well as those that are currently present. Although it is our hope that the capacity for diagnostic testing will rapidly increase, testing will probably never provide global population-wide coverage, and there is thus a critical and immediate need for collecting additional data on self-reported symptoms and health status at a population level. Moreover, our plan is to integrate the growing official diagnostic testing data from reverse-transcriptase PCR results and serology results with real-time informative data, such as information on self-reported symptoms, to better estimate the incidence of symptoms experienced by patients diagnosed with COVID-19 and to improve the epidemiological and predictive models that we will develop.

The CCC will integrate these data streams and thereby increase the statistical power of analyses carried out on these data, and provide a framework for data collection and a single, central data bank that researchers from around the world can query securely. In addition to ensuring comparability in the data, the CCC will serve as a resource for entities in other countries that are developing surveys for self-reporting, with the goal of facilitating their rapid deployment around the world. The CCC will seek individual grants in the respective countries as well as international funding for its central activities. As many countries are now struggling to find the best strategy to handle the pandemic, we believe that a global collaborative effort to obtain data that can be used to predict outbreaks of COVID-19 is urgently needed. In the long term, this will serve as a rich source of information for understanding disease outbreaks that could facilitate policy decision-making and ensure that society is better positioned to respond to and prepare for future pandemics.

In carrying out this global endeavor, it is very important to engage low- and middle-income countries and to make efforts to include populations that are under-represented or of lower socioeconomic status, such as by distributing the survey in several languages, engaging leaders in local communities and promoting the survey through several diverse channels to increase compliance across all sectors of the population.

Current status of data collection

Early epidemiological studies of the COVID-19 outbreak in China, where COVID-19 was first documented, demonstrated the overwhelming importance of slowing the rate of transmission to reduce the spread of COVID-19³,⁴,⁵. Slowing the rate of transmission requires information about who is infected and where they are located. The Chinese government achieved this goal in part through testing large numbers of people thought to be infected and moving people who had positive test results into isolation³,⁴,⁵. In South Korea, government officials took an approach of combining large-scale testing with transparent data sharing on the whereabouts of affected people to contain the outbreak. Although the success of these approaches is clear, there are two major limitations that prevent their application in the vast majority of other countries: diagnostic testing capacity, and personal and health privacy laws. For example, in the USA, currently the world epicenter of the pandemic, large-scale testing for SARS-CoV-2 infection is still not available in the vast majority of states. For several weeks after community spreading had been documented in multiple US states, testing was still limited to people with severe symptoms and with international travel history to places with early outbreaks (such as China)⁶. With limited capacity for testing, the numbers reported as ‘confirmed cases’ in these countries (e.g., https://coronavirus.jhu.edu/) probably do not reflect the true numbers or the actual rate of COVID-19 spread, as screening tests reveal a substantial number of asymptomatic people infected with SARS-CoV-2⁷,⁸,⁹.

Technology-driven approaches, such as designated apps or online surveys for collecting voluntarily self-reported data on health status, can overcome these limitations. These data can be further integrated with other relevant real-time data resources, including meteorological data and population density at a given time and place, as well as other dynamic data sources. Together, these can provide crucial information that can be immediately leveraged for early identification of disease clusters, with the goal of slowing the spread of disease.

Several such efforts have been started independently. For example, in Israel, a daily one-minute online survey, called Predict-Corona (https://coronaisrael.org/) (Fig. 1a), has been developed, in which respondents are asked to report their daily body temperature and whether they are experiencing any of the symptoms found to be common in patients with COVID-19, according to the existing literature. Within approximately 2 months of its launch, there have been over 2.5 million responses, and initial analyses demonstrate the potential of this approach for detecting future outbreak regions¹⁰. In the USA, the app HowWeFeel (http://www.howwefeel.org), which administers a 30-second survey of the person’s well-being to collect epidemiological data, has been developed (Fig. 1b). Similarly, CovidNearYou (https://covidnearyou.org/) and covid19 Risk Survey, a web app with open-access visualization tools collecting the same information in a fully anonymous way (https://covid19.eipm-research.org), were launched. In the USA and the UK, the Coronavirus Pandemic Consortium was established (https://www.monganinstitute.org/cope-consortium) to use the Covid Symptom Tracker (https://covid.joinzoe.com) and join the efforts of international prospective observational cohorts (e.g., population based, clinical data) and clinical trials¹¹. This has led to a model that predicted COVID-19 incidence in over 2.7 million users¹².

**Fig. 1: Examples of questionnaires for symptom tracking.**

This work parallels efforts in other countries to develop similar surveys, such as the COVID-19:CH Survey Project, which was developed in Switzerland (https://covidtracker.ch), and COVID-19 self-reporting in Slovenia (https://covid-19-stats.si/) and Estonia (koroona.ut.ee). These apps are currently managed and distributed by researchers in their respective research institutions.

A framework for collecting and sharing data

Survey responses containing symptom information relevant to COVID-19, along with geospatial location, time, and demographic information on age and pre-existing and comorbid medical conditions, will be collected. Once the data have been collected, they will be de-identified and made further differentially private¹³ before researchers obtain access. Indeed, where feasible, we are implementing differential privacy at the data-collection source — the cellphone, website or app — so that any information that leaves the control of the research subjects cannot be used to learn about any person or whether a person is part of our dataset. We thus plan to provide specific mathematical assurances for the privacy of research subjects while still enabling scholars to make statistically valid inferences from these data. Because the privacy of our participants is essential to our mission, we commit to remaining at the cutting edge of privacy-protective technologies, developing novel methods where needed and implementing improvements over time wherever possible. Given differences in privacy regulations from region to region, individual-level data from the surveys from some countries may not be accessible, but we will aim to make the results of differentially private statistical analyses available to all researchers. The mechanism of data sharing has not been determined yet and may be provided by an independent third party.
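As an illustration of what differential privacy at the data-collection source can look like, the sketch below uses randomized response, a classic local technique; the text does not say which mechanism the consortium uses, so this is an assumption. The function names, the single yes/no symptom question and the privacy parameter `epsilon` are all hypothetical.

```python
import math
import random


def randomize_report(has_symptom: bool, epsilon: float, rng: random.Random) -> bool:
    """Run on the respondent's device: send the true answer with probability
    e^eps / (1 + e^eps), otherwise send the opposite answer."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return has_symptom if rng.random() < p_truth else not has_symptom


def estimate_prevalence(reports: list[bool], epsilon: float) -> float:
    """Run by the analyst: debias the observed share of 'yes' reports.

    observed = p_truth * true + (1 - p_truth) * (1 - true), solved for true.
    Requires epsilon > 0 and at least one report.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)
```

No single randomized report reveals a respondent's true answer with certainty, yet the population-level share can still be estimated; smaller values of `epsilon` give stronger privacy at the cost of noisier estimates.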

One of the main goals of the CCC is to create a federated common data model to facilitate data sharing while ensuring data security and privacy across different countries. To achieve this goal, we will define guidelines to ensure that the underlying data model collected from all contributors can be easily amalgamated and harmonized, and a consortium data-sharing policy will be developed. We will also apply different methods, such as differential privacy¹³, that will enable researchers to share and analyze the data while preserving the confidentiality of the participants. To maximize the impact of the data collected, we will provide an application programming interface to allow accredited researchers to access the data to perform additional statistical analyses (Fig. 2).
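One way a query interface could return differentially private statistics is the Laplace mechanism applied to counting queries. The sketch below is a minimal illustration under that assumption; `private_count`, the record layout and `epsilon` are hypothetical, and a production system would rely on a vetted privacy library rather than this floating-point sampler.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise as the difference of two
    independent exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one respondent changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A researcher would see only the noisy value, for example `private_count(records, lambda r: r["fever"], 0.5, random.Random())`, and repeated queries would need to be charged against an overall privacy budget.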

Our technologies and the code for our app will be open source so that others can adapt them for their particular situations, find bugs and help us improve them. We envision that all members will use surveys with a common set of ‘core’ questions, but additional region-specific questions may be added, in accordance with local regulations, researcher interest and community need. This set of core questions may grow over time as more is learned about COVID-19.
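To make the idea of shared core questions plus region-specific additions concrete, a common record format might look like the sketch below. Every field name, the `CORE_SYMPTOMS` set and the raw input keys are hypothetical; the consortium's actual data model is not specified in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical core symptom vocabulary shared by all member surveys.
CORE_SYMPTOMS = frozenset({"fever", "cough", "shortness_of_breath", "loss_of_smell"})


@dataclass
class SurveyResponse:
    report_date: date
    country: str                      # two-letter country code
    region: str                       # coarse area, not an exact location
    age_group: str                    # e.g. "30-39"
    symptoms: frozenset               # subset of CORE_SYMPTOMS
    body_temperature_c: Optional[float] = None
    regional_extras: dict = field(default_factory=dict)


def harmonize(raw: dict) -> SurveyResponse:
    """Map one site's raw answer into the common model, keeping
    anything outside the core vocabulary as a regional extra."""
    reported = {s.strip().lower().replace(" ", "_") for s in raw.get("symptoms", [])}
    extras = dict(raw.get("extras", {}))
    non_core = sorted(reported - CORE_SYMPTOMS)
    if non_core:
        extras["non_core_symptoms"] = non_core
    return SurveyResponse(
        report_date=date.fromisoformat(raw["date"]),
        country=raw["country"].upper(),
        region=raw["region"],
        age_group=raw["age_group"],
        symptoms=frozenset(reported & CORE_SYMPTOMS),
        body_temperature_c=raw.get("temperature_c"),
        regional_extras=extras,
    )
```

Keeping the core fields fixed and routing everything else into `regional_extras` lets the core set grow over time without breaking records collected earlier.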

These data will allow researchers to perform several immediately useful analyses. These include the following: monitoring of the health status of the respondent population, using statistics on self-reported symptoms and/or official COVID-19 test results within particular geographic regions at any given time; analysis of the epidemiological factors associated with symptoms and testing results; and identification of the areas likely to have COVID-19 outbreaks on the basis of the co-occurrence of many people’s reporting similar patterns of symptoms at the same time. This will be potentially useful for identifying areas to which additional testing or medical resources should be allocated.
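The outbreak-identification analysis can be sketched very roughly as a comparison between a region's current share of symptomatic respondents and its own recent baseline. The rule below is a deliberately simple stand-in, not the consortium's model; the thresholds `min_responses`, `ratio` and `floor` are hypothetical and would need calibration against test results.

```python
from collections import defaultdict


def daily_counts(responses):
    """responses: iterable of (region, day, is_symptomatic) tuples.
    Returns {(region, day): [symptomatic_count, total_count]}."""
    counts = defaultdict(lambda: [0, 0])
    for region, day, is_symptomatic in responses:
        counts[(region, day)][0] += int(is_symptomatic)
        counts[(region, day)][1] += 1
    return counts


def flag_regions(responses, day, baseline_days,
                 min_responses=50, ratio=2.0, floor=0.01):
    """List regions whose symptomatic share on `day` is at least `ratio`
    times their pooled share over `baseline_days`."""
    counts = daily_counts(responses)
    flagged = []
    for region in sorted({r for r, _ in counts}):
        today = counts.get((region, day))
        past = [counts[(region, d)] for d in baseline_days if (region, d) in counts]
        base_n = sum(n for _, n in past)
        if today is None or today[1] < min_responses or base_n == 0:
            continue  # too little data to say anything about this region
        # The floor keeps a zero baseline from flagging a single report.
        baseline = max(sum(pos for pos, _ in past) / base_n, floor)
        if today[0] / today[1] >= ratio * baseline:
            flagged.append(region)
    return flagged
```

Requiring a minimum number of responses guards against flagging a region on the basis of a handful of reports.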

Additional applications include using methods of quantification, distinct from those of classification, to produce accurate estimates of the population prevalence of COVID-19, based on participants who have positive or negative test results, even when people cannot be reliably classified from their symptoms alone¹⁴, and evaluating the effectiveness of the various social-distancing measures taken and their contribution to reducing the number of symptomatic people.
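The distinction between quantification and classification can be illustrated with a simplified estimator in the spirit of ref. 14; it is not the authors' implementation. It treats the distribution of symptom profiles in the untested population as a mixture of the distributions seen among people with positive and negative tests, and solves for the mixing weight by least squares. The function names are hypothetical, and the sketch assumes that, given infection status, symptom profiles are distributed the same way in tested and untested people.

```python
from collections import Counter


def profile_frequencies(profiles):
    """Share of each symptom profile (a tuple of 0/1 flags) in a sample."""
    counts = Counter(profiles)
    total = sum(counts.values())
    return {profile: c / total for profile, c in counts.items()}


def quantify_prevalence(tested_positive, tested_negative, untested):
    """Estimate prevalence pi from P(S) = pi * P(S | +) + (1 - pi) * P(S | -)
    without classifying any individual respondent."""
    q1 = profile_frequencies(tested_positive)   # P(S | infected)
    q0 = profile_frequencies(tested_negative)   # P(S | not infected)
    p = profile_frequencies(untested)           # P(S) in the target population
    profiles = set(q1) | set(q0) | set(p)
    diff = {s: q1.get(s, 0.0) - q0.get(s, 0.0) for s in profiles}
    denominator = sum(d * d for d in diff.values())
    if denominator == 0.0:
        raise ValueError("symptom profiles do not distinguish the two groups")
    numerator = sum((p.get(s, 0.0) - q0.get(s, 0.0)) * diff[s] for s in profiles)
    # Clip to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, numerator / denominator))
```

Even when symptoms are too weak a signal to label any one person reliably, the aggregate mixture weight can still be estimated, which is the point the paragraph above makes.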

Data coordination by a consortium

We envision that the CCC will be coordinated by a board of directors that will be agreed upon by all consortium members and who will vet new members, maintain a secured centralized data repository or federated multisite data repositories and develop mechanisms to enable researchers from around the world to query the data. The selection of the board of directors will take into account issues of expertise in organizing consortia, data privacy, epidemiology and technology. Individual members will be responsible for ensuring adherence to local regulations.

Conclusions

In summary, we call for participation in an international consortium, the CCC, that will serve as a hub for the integration of COVID-19-related information. This collective effort to track and share information will be invaluable in predicting hotspots of disease outbreak; identifying which factors control the rate of spreading; informing immediate policy decisions; evaluating the effectiveness of measures taken by health organizations on pandemic control; and providing critical insights on the etiology of COVID-19. It will also help people stay informed on this rapidly evolving situation and contribute to other global efforts to slow the spread of disease.

Change history

26 June 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41591-020-0983-4

References

1. Tariq, A. et al. medRxiv https://doi.org/10.1101/2020.02.21.20026435 (2020).
2. Smolinski, M. S. et al. Am. J. Public Health 105, 2124–2130 (2015).
3. Tian, H. et al. Science 368, 638–642 (2020).
4. Guan, W.-J. et al. N. Engl. J. Med. 382, 1708–1720 (2020).
5. Wu, Z. & McGoogan, J. M. JAMA 323, 1239–1242 (2020).
6. Adalja, A. A., Toner, E. & Inglesby, T. V. JAMA 323, 1343–1344 (2020).
7. Gudbjartsson, D. F. et al. N. Engl. J. Med. https://doi.org/10.1056/NEJMoa2006100 (2020).
8. Mizumoto, K., Kagaya, K., Zarebski, A. & Chowell, G. Eurosurveillance https://doi.org/10.2807/1560-7917.ES.2020.25.10.2000180 (2020).
9. Sutton, D., Fuchs, K., D’Alton, M. & Goffman, D. N. Engl. J. Med. https://doi.org/10.1056/NEJMc2009316 (2020).
10. Rossman, H. et al. Nat. Med. 26, 634–638 (2020).
11. Drew, D. A. et al. Science https://doi.org/10.1126/science.abc0473 (2020).
12. Menni, C. et al. Nat. Med. https://doi.org/10.1038/s41591-020-0916-2 (2020).
13. Dankar, F. K. & El Emam, K. Trans. Data Priv. 6, 35–67 (2013).
14. King, G. & Lu, Y. Stat. Sci. 23, 78–91 (2008).


Acknowledgements

The CCC is a nonprofit consortium open to anyone who shares the vision of making data available to help the public good and fight COVID-19; as of May 2020 participating countries are Argentina, Canada, Estonia, Germany, Israel, Luxembourg, Macedonia, Slovenia, Sweden, Switzerland, UK and USA. There are no membership fees. Please contact us at info@coronaviruscensuscollective.org if you are interested in joining.

Author information

Authors and Affiliations

Department of Computer Science and Applied Mathematics, and Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
Eran Segal, Smadar Shilo, Ori Cohen, Amir Gavrieli, Ayya Keshet, Tomer Meir & Hagai Rossman
Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
Feng Zhang
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Feng Zhang & Han Altae-Tran
Departments of Biostatistics and Statistics, Harvard T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
Xihong Lin
Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA
Gary King
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Ophir Shalem
Pediatric Diabetes Unit, Ruth Rappaport Children’s Hospital, Rambam Healthcare Campus, Haifa, Israel
Smadar Shilo
Harvard Society of Fellows, Harvard University, Cambridge, MA, USA
William E. Allen
ETH Zurich, NEXUS Personalized Health Technologies, Zurich, Switzerland
Faisal Alquaddoomi
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Han Altae-Tran
Center for Molecular Biology (ZMBH), University of Heidelberg, Heidelberg, Germany
Simon Anders & Svetlana Ovchinnikova
Clalit Research Institute, Clalit Health Services, Ramat Gan, Israel
Ran Balicer
Mapping and Geo-Information Engineering, Civil and Environmental Engineering Faculty, The Technion, Haifa, Israel
Tal Bauman
ETH Zurich, Department for Computer Science, Zurich, University Hospital Zurich, Medical Informatics, Zurich and SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Ximena Bonilla, Natalie Davidson & Andre Kahles
Regen Network, Mar del Plata, Argentina
Gisel Booman
Massachusetts General Hospital (MGH), Boston, MA, USA
Andrew T. Chan, David A. Drew & Long H. Nguyen
Chelonia Applied Science, Allschwil, Switzerland
Silvano Coletti
IMRIC Developmental Biology and Cancer Research, School of Medicine, The Hebrew University, Jerusalem, Israel
Yuval Dor
Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
Olivier Elemento, Iman Hajirasouliha, Andrea Sboner & Alexandros Sigaras
Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
Olivier Elemento, Iman Hajirasouliha & Alexandros Sigaras
Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA
Georgina Evans
Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
Phil Ewels
symptometrics.org (https://symptometrics.org), Canada
Joshua Gale
Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
Benjamin Geiger
Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
Yonatan H. Grad
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Casey S. Greene
Department of Synthetic Biology and Immunology, National Institute of Chemistry, Ljubljana, Slovenia
Roman Jerala
Science for Life Laboratory (SciLifeLab), Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
Olli Kallioniemi
Macedonian Academy of Sciences and Arts, Skopje, Macedonia
Ljupco Kocarev & Jana Prodanova
Regen Network, Great Barrington, MA, USA
Gregory Landua & Ron Steinherz
Luxembourg Institute of Socio-Economic Research and University of Luxembourg, Esch-sur-Alzette, Luxembourg
Aline Muller
School of Medical Sciences, Örebro University, Örebro, Sweden
Matej Oresic
Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
Matej Oresic
Institute of Computer Science, University of Tartu, Tartu, Estonia
Hedi Peterson & Jaak Vilo
Internal Medicine, Harvard Medical School, Boston, MA, USA
Jay Rajagopal
Department of Pulmonary Medicine and Critical Care, Massachusetts General Hospital (MGH), Boston, MA, USA
Jay Rajagopal
ETH Zurich, Department for Computer Science, Zurich, University Hospital Zurich, Medical Informatics, Zurich and SIB Swiss Institute of Bioinformatics, Zurich and ELLIS Unit, ETH Zurich, Switzerland
Gunnar Rätsch
Science for Life Laboratory (SciLifeLab), Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
Johan Rung
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
Andrea Sboner
King’s College, London, UK
Tim Spector
Science for Life Laboratory (SciLifeLab), Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
Irene Stevens
Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
Paul Wilmes

Corresponding author

Correspondence to Eran Segal.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Segal, E., Zhang, F., Lin, X. et al. Building an international consortium for tracking coronavirus health status. Nat Med 26, 1161–1165 (2020). https://doi.org/10.1038/s41591-020-0929-x


Published: 02 June 2020
Issue Date: 01 August 2020
DOI: https://doi.org/10.1038/s41591-020-0929-x
