Credit: Karina de Souza Costa

After my PhD defense, one of my thesis reviewers offered me the opportunity to join a project in epidemiology, asking: “Would you like to contribute to the eradication of infectious diseases through mathematical modeling?”

At that time, I was deeply immersed in bifurcation theory, where my research focused on finding solutions of the partial differential equations that govern physical systems, particularly those that generate intricate symmetrical patterns. Transitioning to epidemiological modeling was a risk, and the prospect of playing a role in eradicating infectious diseases felt both daunting and exhilarating. What motivated me was the potential impact of developing mathematical tools for this purpose. Moreover, as a Black woman from an upper-middle-income country, navigating the male-dominated field of mathematics while balancing motherhood, I understood the importance of this opportunity. I accepted the invitation.

Mathematical models offer a versatile toolkit for understanding how pathogens emerge and spread within populations. Built on epidemiological and clinical assumptions, these models provide insight into both the quantitative and qualitative behavior of disease spread. They help to identify targeted interventions that can reduce the disease burden on populations. Their utility is widely recognized in the health sector, where they help public health authorities and policymakers to make informed decisions.

Despite this, mathematical models can yield unsatisfactory results when they fail to consider population heterogeneity. Common shortcomings include overlooking differences in disease acquisition risk among individuals, parameterizing a model without accounting for local characteristics and uncertainties, and relying on insufficient local data. An ongoing challenge is how to use a model to generate solutions that are tailored to local environmental and socioeconomic health conditions, which is required to deliver precision health solutions.

To overcome this barrier, new methodologies were needed, including innovative data analysis techniques and model metrics that account for both observed and unobserved heterogeneity among individuals. Mathematical modelers expect these approaches to improve the accuracy and predictive capability of models.
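To make the point about heterogeneity concrete, here is a minimal sketch of my own, not one of the models developed at CIDACS. It uses the textbook final-size relation of a simple susceptible-infected-recovered (SIR) epidemic and compares a population in which everyone has the same acquisition risk with one split into two hypothetical risk groups that have the same average risk. The basic reproduction number of 2, the group risks and the group sizes are illustrative assumptions, not estimates from any dataset.

```python
import math


def final_size(r0, risks, fractions, iterations=500):
    """Fraction of the population ever infected in a simple SIR epidemic.

    risks[i] is the relative acquisition risk of group i and fractions[i]
    is that group's share of the population (shares sum to 1, mean risk 1).
    Solves z = sum_i p_i * (1 - exp(-x_i * r0 * z)) by fixed-point iteration.
    """
    z = 1.0  # start from "everyone infected" and iterate down to the solution
    for _ in range(iterations):
        z = sum(p * (1.0 - math.exp(-x * r0 * z))
                for x, p in zip(risks, fractions))
    return z


R0 = 2.0  # illustrative value, not taken from any real outbreak

# Everyone shares the same acquisition risk.
homogeneous = final_size(R0, [1.0], [1.0])

# Two hypothetical groups of equal size: half the average risk and
# one and a half times the average risk (the mean risk is still 1.0).
heterogeneous = final_size(R0, [0.5, 1.5], [0.5, 0.5])

print(f"homogeneous population:   {homogeneous:.3f}")
print(f"heterogeneous population: {heterogeneous:.3f}")
```

Under these assumptions, the heterogeneous population ends up with a smaller share of people ever infected, even though the average risk is identical. A model that ignores the spread of risk would therefore overestimate the size of the epidemic in this toy setting.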

Developing these new models led to my second career turning point: joining the Centre for Data and Knowledge Integration for Health (CIDACS) research team. Located in Salvador, Bahia, Brazil, CIDACS happens to be just a 20-minute walk from Bairro da Paz, the ‘favela’ (slum) where I was raised. At CIDACS, I gained valuable insight into the development of the 100 Million Brazilians Cohort, a groundbreaking dataset derived from administrative records. This dataset, when integrated with clinical patient data, has the potential to reveal factors influencing how often individuals are infected, and offers the opportunity to explore conditions (such as socioeconomic status, climate and nutrition) that may shape an individual’s risk of specific diseases.

The 100 Million Brazilians Cohort was established to evaluate the effects of governmental policies, especially social protection policies, on health outcomes. This cohort comprises individuals eligible for governmental benefits registered in the Unified Registry for Social Programs (CadUnico) database. With over 131 million people registered as of 2018, the cohort encompasses approximately half of Brazil’s population, making it one of the largest population cohorts globally. Several administrative databases are linked to this baseline to facilitate studies, providing comprehensive individual information on health, education, housing programs, bioclimatic conditions and other aspects of the Brazilian population.

Numerous studies have used data from the entire cohort or sub-cohorts within it. Examples include research on the impact of Zika virus circulation in Brazil, which provided insights into improving diagnostics for newborns affected by congenital Zika syndrome; the effects of governmental policies on the detection and control of tuberculosis and leprosy; and the impact of suicide on vulnerable populations. More recently, studies have explored possible relationships between extreme climate events and health outcomes in this population.

I still face challenges in fully harnessing the potential of the cohort. First, I aim to properly characterize the patterns that emerge from individual behaviors and interactions, which is crucial for obtaining the metrics and parameter values needed to accurately reconstruct the dynamics of individuals in this population. Second, financial support is essential for advancing this field, a need I hope to highlight by publishing conceptual findings for the scientific community.

Nevertheless, the potential for reusing cohort data offers many research opportunities across diverse fields. This wealth of data allows researchers to explore complex relationships and uncover previously unseen patterns within the population, shedding light on how social determinants affect vulnerable populations. It also provides a unique opportunity to advance precision health initiatives through the development of sophisticated mathematical models, which was my primary aim upon joining CIDACS. By incorporating a comprehensive understanding of population heterogeneity, these models can provide more accurate predictions and more reliably targeted interventions, ultimately contributing to disease prevention and elimination.