The hottest news in psychiatric epidemiology is that large, well-characterized, diverse samples with clinical and biological assessments have recently become available from repositories to qualified scientists. Big Data has come to psychiatry [1]. For years, we have studied families at high and low risk for depression in samples of several hundred offspring and grandchildren [2]. These studies include carefully documented clinical assessments and, later, MRI, EEG, and DNA. They clearly showed the early onset of symptoms, often before puberty, challenging the notion that major depression was a disorder of menopausal women and addressing the question, open when the study began, of whether children had sufficient ego development to become depressed. The follow-up over the years showed the enduring nature of depression in high-risk offspring and grandchildren, particularly grandchildren with two previous generations affected.

As time went on, new imaging tools became available, and potential mechanisms of transmission were identified. For example, we found that at-risk offspring had thinner cortices, particularly over the bilateral frontal and left parietal lobes. This thinning was stable across two scans 7.8 years apart and was observed even among offspring who had no lifetime symptoms, suggesting that it represented a trait marker rather than a consequence of depression [3]. We subsequently showed that the high-risk offspring also had impaired hippocampal microstructure and hippocampal-prefrontal connectivity; these predicted future, but not current, depressive symptoms [4]. Finally, we showed that concordance between parent and child cortico-subcortical connectivity was disrupted if the parent had depression, but this could be reversed in the presence of high parental nurturing [5]. Together, these findings speak to the ability of high-risk family designs to identify disease biomarkers that are not simply correlates of illness.

There were limitations to these studies. The sample, considered large in 1982 when the study began, became insufficiently powered to leverage new tools for analyzing large-scale brain connectivity. The sample was also homogeneous, as was the norm when it began. The arrival of Big Data resources like the UK Biobank and the Adolescent Brain Cognitive Development (ABCD) Study has changed this landscape. Big Data is not without flaws either: diagnostic assessments and family history are often not obtained through direct clinical interviews, but there are ways around that. For example, we were able to replicate in the ABCD study clinical findings derived from our smaller studies on the effects on grandchildren of two previously affected generations [6]. Through ABCD, we could also show that the findings held across racial/ethnic and socioeconomic groups. This opens the way for testing familial brain findings from boutique studies such as ours in these large, diverse samples.

While some exploration is inevitable, a danger with large, well-powered studies is that fishing without a hypothesis may lead to findings that are highly statistically significant yet inconsequential. However, testing hypotheses generated and replicated through well-designed, albeit smaller, studies may be a powerful direction. Nothing is perfect in either Big or Little Data studies, but their marriage provides great opportunities for a productive next generation.

Funding and disclosure

This work was funded by NIMH R01MH036197 (“Children at high and low risk for depression”, Weissman, Posner, P.I.s). In the last 3 years, MMW has received research funding from NIMH, Brain and Behavior Foundation, and Templeton Foundation; has received book royalties from Perseus Press, Oxford Press, and APA Publishing; and receives royalties on the Social Adjustment Scale from Multi-Health Systems. None of these presents a conflict of interest. AT does not have any conflicts to disclose.