The emergence of coronavirus disease 2019 (COVID-19) prompted a surge in research activity. Funding bodies swiftly allocated resources to establish research infrastructures and partnerships to study the novel virus. The scientific community realigned existing research and launched new studies to define the clinical course of COVID-19 and identify therapeutic candidates. Overall, fewer studies were initiated in children than in adults,1 in part due to the lower prevalence and disease morbidity recorded in pediatric populations. However, characterizing the disease in pediatric patients is critical to elucidate transmission dynamics, inform public health measures, and generate evidence on best practices for clinical care and therapeutic interventions. The life-threatening multisystem inflammatory syndrome further underscores the need for natural history studies and drug development in pediatric populations.2

Numerous observational studies have been established to collect information on pediatric patients with laboratory-confirmed COVID-19, including children in inpatient and outpatient settings, and those with specific underlying conditions (Table 1). These efforts have been driven largely by single institutions and research organizations. As would be expected during a pandemic response, there was initially limited opportunity to establish extensive regional or international collaborations, and there has been little coordination across existing initiatives. However, as we approach a steady state in morbidity and gain a better view of the long-term trajectory of the pandemic, it is important that we reassess our methods and take a deliberate, strategic approach to further resource investments. If the current fragmented approach is maintained, many studies will not yield the sample sizes needed to support robust analyses, especially as infection rates decline in certain areas, and resources will be wasted on inefficient enrollment and duplicative efforts. For example, as illustrated in Table 1, we identified at least three unique non-interventional studies focused on pediatric patients requiring critical care in hospitals in North America.

Table 1 Sample of observational pediatric-specific registries and studies on COVID-19.

Building on what has been accomplished to date, pediatric clinical data collected in studies could be aggregated to develop a larger, representative dataset drawn from single countries as well as global patient populations. Sharing pediatric COVID-19 data and resources across existing studies would strengthen our research capacity during the current pandemic, including long-term study of natural history and identification of rare complications. This resource would not replace any individual studies but would provide a collaborative entity to complement local efforts. We have adopted this approach with the launch of the Repository of Aggregated Pediatric International Data on COVID-19 (RAPID-19).3 This infrastructure leverages contemporary informatics methodologies that efficiently pool data, as has been previously accomplished with routine electronic health record and patient cohort data.4 RAPID-19 aggregates de-identified clinical data on pediatric patients with laboratory-confirmed severe acute respiratory syndrome coronavirus 2 infection, curated across existing research activities. All data contributors are able to access the integrated dataset, which is maintained on a secure, HIPAA- and HITRUST-compliant, cloud-based platform released as open source under the Apache 2.0 license (https://github.com/hms-dbmi/pic-sure-documentation). The infrastructure includes embedded analytic tools and export functionality, which are managed by an API-enabled user interface with access controls for individual investigations. A steering group with representation from contributing sites oversees access and export of patient-level data. To ensure efficient and equitable use of the dataset, a number of mechanisms are in place to provide full transparency on site participation, research activities, data use, and opportunities for collaboration and authorship.

One fundamental challenge inherent to collaborative networks is the need for data standardization across sites. We have encountered this limitation directly: certain studies participating in the Repository collect different variables and use distinct definitions for data elements, which limits options for combining datasets or variables across studies. While not all variables will be amenable to harmonization, a core set of data elements can be standardized to support queries and analyses across healthcare sites and study networks. We also aim to promote standardization of new data collection and have made study documents and tools available at www.RAPID-19.org to support COVID-19 research by other investigator teams and to facilitate collaborative work.
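To make the idea of a standardized core set of data elements concrete, the following is a minimal sketch in Python. It is an illustration only and does not describe how RAPID-19 is implemented. The site names, variable names, core elements, and conversion rules are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical core data elements agreed on by all contributing studies.
CORE_ELEMENTS = ("age_months", "sex", "icu_admission")

# Hypothetical per-site mappings: local variable -> (core element, converter).
SITE_MAPPINGS = {
    "site_a": {
        "age_yrs": ("age_months", lambda v: round(float(v) * 12)),
        "gender": ("sex", lambda v: {"M": "male", "F": "female"}.get(v, "unknown")),
        "picu": ("icu_admission", lambda v: v == "Y"),
    },
    "site_b": {
        "age_months": ("age_months", int),
        "sex": ("sex", str.lower),
        "icu_flag": ("icu_admission", lambda v: bool(int(v))),
    },
}


def harmonize(site: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map one site-specific record onto the core data elements.

    Local variables without a mapping are dropped. Core elements the
    site does not collect are set to None so that gaps stay visible.
    """
    out: dict[str, Any] = {"source_site": site}
    out.update({name: None for name in CORE_ELEMENTS})
    for local_name, value in record.items():
        mapping = SITE_MAPPINGS[site].get(local_name)
        if mapping is not None:
            core_name, convert = mapping
            out[core_name] = convert(value)
    return out


# Pool records from two sites that record the same concepts differently.
pooled = [
    harmonize("site_a", {"age_yrs": "2.5", "gender": "F", "picu": "Y", "ward": "3B"}),
    harmonize("site_b", {"age_months": "14", "sex": "Male"}),
]
```

In this sketch the second site does not report intensive care status, so that element remains None in the pooled record rather than being silently imputed. Variables outside the core set, such as the hypothetical "ward" field, stay with the originating study.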

Our experiences building the RAPID-19 Repository highlight key principles the scientific community should consider in the study of any pediatric condition or population.

First, as a research community, we must build consensus on the importance of sharing pediatric data and apply multi-institutional approaches as the default design. Research infrastructures and networks must implement contemporary informatics methodologies that support systematic computational approaches across research entities. At a minimum, all data should be FAIR: Findable, Accessible, Interoperable, and Reusable.5 Whenever possible, new studies should deploy standard definitions that have been established for many pediatric specialties, such as the Pediatric Terminology sets developed by the National Institute of Child Health and Human Development.6 Further, harmonized approaches to structuring datasets are essential to data sharing and reuse, including common formats and representations (e.g., terminologies and coding schemes). Pediatric observational studies need to use existing data models, such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model, the i2b2 framework, or the PCORnet Common Data Model, to support transformation to a shared format amenable to reproducible analyses.
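As a rough illustration of transformation to a shared format, the sketch below converts site-local codes into one row per observation, each carrying a shared concept label plus its provenance. It is loosely inspired by the row-per-observation layout used in common data models but does not reproduce the schema of any of the models named above. The codes, concept labels, and field names are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from (site, local code) to a shared concept label.
# A real study would map to a published terminology instead.
LOCAL_TO_SHARED = {
    ("site_a", "FEV"): "fever",
    ("site_a", "CGH"): "cough",
    ("site_b", "pyrexia"): "fever",
}


@dataclass(frozen=True)
class Observation:
    patient_id: str
    concept: str      # shared concept label
    source_site: str  # provenance: contributing site
    source_code: str  # provenance: code as originally recorded


def to_observations(
    site: str, patient_id: str, local_codes: list[str]
) -> tuple[list[Observation], list[str]]:
    """Return (mapped observations, unmapped local codes) for one patient."""
    observations: list[Observation] = []
    unmapped: list[str] = []
    for code in local_codes:
        concept = LOCAL_TO_SHARED.get((site, code))
        if concept is None:
            # Keep unmapped codes for manual review instead of guessing.
            unmapped.append(code)
        else:
            observations.append(Observation(patient_id, concept, site, code))
    return observations, unmapped


rows_a, unmapped_a = to_observations("site_a", "A-001", ["FEV", "CGH", "XYZ"])
rows_b, unmapped_b = to_observations("site_b", "B-001", ["pyrexia"])
```

Retaining the original site and code on every row means the mapping can be audited or revised later, which supports the reuse that the FAIR principles call for.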

Second, pediatric studies must be developed with data sharing and combined analyses in mind, even if a prospective network is not possible or not all relevant collaborations are established at the outset of a study. Specifically, collaborative work must be considered in the study organization, design, and documentation to enable future partnerships. This includes using informed consent documents that specify sharing of de-identified patient information with external collaborators and allow secondary data use beyond primary research questions.7 Data use agreements for registries and other multi-institutional studies should be written to allow for additional sharing and research analyses outside of the original study, building in measures for embargo periods, country-specific laws for protection of personal and medical information, and appropriate authorship attributions. Study governance must also allow for flexibility to enable onboarding of new co-investigators and expansion of study infrastructure and oversight.

Third, participation in collaborative studies should be incentivized. This can be achieved, for example, by providing access to integrated datasets for approved investigators and by offering authorship opportunities, such as group authorship by default for all data contributors. Study committees and working groups should include representation across sites and networks, including steering committees governing policies and procedures for research activities. More broadly, academic institutions need to recognize data sharing and input to patient-level datasets as important contributions to the research enterprise and endorse them, for example, in promotion and tenure criteria.8 Finally, resources for data sharing need to be built into grants and institutional research infrastructure to make such sharing feasible and sustainable.

The COVID-19 pandemic has highlighted the fragmentation of pediatric research. Ideally, research involves prospective formation of multi-institutional networks, but absent such infrastructure, it is essential that pediatric data be collected with the potential for collaboration and reuse. The principles suggested in this Commentary are being applied already in other scientific domains, placing pediatric research at risk of being left behind. Insights gained from the pandemic should serve as a stimulus to advance pediatric research programs involving other conditions as well and pave the way for routine use of coordinated approaches and application of sophisticated methodologies for efficient and scalable data integration.