Mister President, members, and guests.

It is a great privilege for me to be asked to give this year's Honorary Lectureship at the Opening Ceremony of the 39th Annual Conference of our Society. I would like to thank the Council and particularly the President, Professor Harry Halliday, for this honor.

As I understand it, the Council plans to establish a traditional lecture at our annual Conference in which one senior speaker is invited to relate his or her personal experience, in the hope that it will allow younger members to understand where our Society stands today and help define priorities for the future.

As most of you know, I have followed and participated in the development of neonatal medicine for more than 35 years. During this time, I have witnessed an explosion of new knowledge in physiology and pathology that has been exploited in a profusion of powerful interventions in neonatal medicine. The new technology has been adapted to the needs of the smallest patients, and this has eased the work of doctors and nurses considerably.

When I started my career in the neonatal ward in the mid-1960s, the methods of intensive care were not yet applied to babies. Artificial ventilation was not used, blood gases were not available, and measurements of electrolytes and blood glucose were the exception. Neonatal care was dominated by experienced nurses who often “protected” the babies against innovative doctors.

In “looking back” today, I ask myself this question: “Amid all the innovations I witnessed during these last 35 years, which turned out to be the most important for the growth of my professional life?”

I could cite a number of breakthroughs in neonatal care that increased my enthusiasm as a clinician and contributed to the spectacular achievements we have seen in the last few years: for example, the ability to prevent and treat the respiratory distress syndrome, the availability of machines to sustain respiration, the development of new drugs to fight infections, the capacity to monitor blood gases continuously, and the techniques for assessing perfusion, oxygenation, metabolism, and morphology of the brain. All these developments and others provided very rewarding experiences in the course of my professional life. Rewarding and even enjoyable, yes, but not the most important!

The crucial experience (and not a particularly enjoyable one) was my confrontation in the late 1960s with a new species of doctor who seemed to take pleasure in constantly asking for proof every time I suggested a therapeutic intervention.

“How do you know it works?”

When I answered that I had found the proposed treatment in a classical handbook written by an authority in neonatal medicine, the sceptical doctor was not at all impressed. He kept asking: “How does the expert know? What method did he or she use as proof for the recommendation?”

“What method?” That disturbing question made me increasingly insecure about everything I had learned in the past. This painful experience happened in 1967, at Babies Hospital, Columbia Medical Center in New York, where I had been appointed for a 2-year fellowship in neonatology in the famous unit of Professor William Silverman. The first contact with this mecca of pediatrics was particularly disturbing for a freshly disembarked Swiss pediatrician. At the entrance of the intensive care unit, I found this bizarre inscription: “All who drink of this remedy recover in a short time, except those whom it does not help, who all die. It is obvious, therefore, that it fails only in incurable cases” (Galen, 2nd century).

The following recommendation was addressed to the attending neonatologists making rounds: “Teach thy tongue to say, I do not know, and thou shalt progress” (Maimonides, 12th century).

I had chosen the Babies Hospital because I considered Professor Silverman an authority in neonatal medicine, and I hoped he would teach me “the truth.” But instead, he continually questioned what I had learned as established truth, and even what he had written himself. For example, in his classical book “Premature Infants,” which I read with great care before coming to New York, he, the authority, often said: “It is dangerous to trust authority in medicine! Look for the proof!” Quite disturbing advice for a young novice looking for a guru!

When I was a medical student, the facts in clinical medicine seemed to be very well established because they were based, we were taught, on clear physiologic laws, morphologic states and biochemical pathways, leading to a model description for each disease. A firm understanding of the cause of the disturbance would therefore lead to a rational choice of a therapeutic intervention. An observation showing a positive correlation between treatment and outcome was considered to be a sufficient test. In those days, my primary interest was to learn the opinion of experts. When the authority spoke, my uncertainty disappeared: I knew what to do! During my training I never heard anything about methodology in clinical research. The medical books we consulted described the pathogenesis and the semiology of the diseases in great detail. The paragraph on treatment was usually very short. Names of drugs and doses were indicated, without quantitative information about the size of positive and negative effects.

My teachers in internal medicine and pediatrics were remarkable clinicians, impressive in the analysis of medical history and precise in clinical examination. They had vast knowledge of the medical literature. They were excellent teachers, hard workers and dedicated doctors, but they never spoke about a “methods” approach to “knowing.” They usually expressed great confidence about what they knew, and we were told to do what they recommended, and we did it. What they knew was important to us, not how they knew!

In New York, I was exposed to very different types of teachers, to a new world, a world of sceptics, constantly questioning and using a vocabulary that was new to me: biases, cohorts, sample-size estimation, power, peeking, randomized allotment, and other equally strange concepts. It was hard to understand this new jargon and to learn the new discipline.

Fortunately, this teaching of epidemiology was not delivered in abstract lectures but at the cotside, using clinical examples during daily rounds. Professor Silverman used examples from the past, particularly the errors made by his predecessors and by himself. “We learn more from mistakes than from success!” he said.

Among the various errors about therapeutic interventions he witnessed during his professional life, the history of the use of supplemental oxygen for premature infants was the one experience he referred to most frequently. He pointed out that the history of oxygen treatment summarizes the hierarchy of research designs for evaluating treatments.

The progression goes from noncontrolled individual observations or case series, to nonrandomized studies using historical or concurrent controls, and finally to randomized controlled trials at the top (most rigorous level) of the hierarchy. He used the oxygen story to illustrate how our predecessors became progressively aware of the relative weakness of inductive reasoning based on a collection of observations, compared with the strength of evidence based on experimentation. Let us first summarize the story briefly, before illustrating the point he was making.

The first epoch of the use of oxygen in newborns began at the end of the 19th century. It was characterized by the faith that oxygen could only be beneficial. This optimistic era ended abruptly half a century later (1954) with the formal demonstration, in a multicenter randomized trial, that premature infants had a lower incidence of retinopathy when they were cared for in an environment in which the oxygen concentration was lower than the one recommended by prior authorities. The second epoch, which lasted until the 1960s, was characterized by the untested conviction that retinopathy could be eliminated if the inspired oxygen concentration did not exceed 40%, and that this restriction had no detrimental effects on the babies. This short-lived epoch ended with the observation that the “under-40%-is-safe” policy was associated with an increase in early neonatal mortality and an increase in neurologic morbidity among survivors. The most recent epoch began in the mid-1960s, when it became practical to measure the PO2 in small samples of arterial blood. This epoch (our own) is characterized by the postulate (again not formally tested) that the opposing risks of retinopathy, neurologic damage and death can be minimized if the target of measurement is arterial oxygen pressure or saturation, rather than the ambient oxygen.

I will limit myself to the first epoch of the story. I use it to illustrate how our predecessors walked through each step of quality control in clinical research before they became aware of the weakness of inductive reasoning based on a collection of observations. They were then forced to accept the concept of experimentation using patients (the randomized controlled trial). The optimism of the first epoch (oxygen is “beneficial”) was justified by isolated uncontrolled observations (case series) of short-term improvement in the clinical status of cyanotic or asphyxiated babies exposed to supplemental oxygen. The observed benefit was judged by pediatric authorities to be sufficient evidence to justify the continuous administration of oxygen in concentrations of more than 50% for periods as long as 6 weeks.

Next step: nonrandomized controls.

Hess in Chicago (1934) deserves the credit for performing the first retrospective cohort study using an historical control in an attempt to estimate the size of the effect of oxygen treatment on survival rate (1). Comparing the years before and after the use of oxygen, he observed an increase in survival rate that he attributed to the new treatment (79% versus 52%). We have learned, since Hess's day, that a retrospective cohort study without concurrent controls is a weak design for establishing a trustworthy relationship between intervention and outcome. Our uncertainty is based on the fact that this design fails to control for confounding variables, that is to say, variables that are directly or indirectly associated with both the predictor and the outcome of interest.
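To make that weakness concrete, here is a minimal sketch in Python, with invented numbers that are not Hess's data: if baseline survival improves over calendar time for reasons unrelated to the treatment, a before/after comparison will credit the new treatment with the entire gain.

```python
import random

random.seed(42)

def simulate_cohort(n, baseline_survival):
    """Return the observed survival proportion in a cohort of n infants."""
    return sum(random.random() < baseline_survival for _ in range(n)) / n

# Hypothetical numbers, chosen only for illustration.
# Suppose general care (nursing, warmth, feeding) improves over time,
# raising baseline survival from 50% to 75% -- independently of oxygen.
before = simulate_cohort(200, baseline_survival=0.50)   # era without oxygen
after  = simulate_cohort(200, baseline_survival=0.75)   # era with oxygen

# A before/after comparison attributes the whole gain to the new treatment,
# even though, in this simulation, the treatment has no effect at all:
# calendar time is the hidden confounder.
print(f"Survival before oxygen: {before:.0%}")
print(f"Survival after oxygen:  {after:.0%}")
```

In this simulation the “treatment” does nothing at all, yet the historical comparison shows a gain comparable in size to the one Hess reported; concurrent controls would have exposed the fallacy.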

Despite this reservation, we must recognize that this early study did have some merit. For the first time in the history of oxygen treatment, an attempt was made to measure the size of the effect of a treatment on an exposed group of prematures compared with others who were not exposed. The results are expressed in the form of a proportion of survival (not as a simple dichotomy of “yes” or “no”). The outcome of interest is important and the study includes a relatively well-defined population of at-risk patients. This step is an improvement in the quality of the evidence compared with the case series strategy discussed earlier.

Confidence in the benefits of the oxygen treatment was reinforced by an observation made in 1942 by Wilson et al.: “Periodic breathing in small prematures can be converted to a regular rhythm in a high oxygen environment” (2). Instead of using this observation as an additional argument for oxygen treatment, the authors demonstrated a remarkably cautious attitude when they wrote:

“We have no proof that the regular type of respiration, which we are accustomed to consider ‘normal,’ is ‘better’ for a premature infant than the observed periodic breathing. Likewise, we have no convincing evidence that an increased oxygen content of arterial blood is beneficial or necessarily of importance.”

This remarkable disclaimer represents the first warning of the tragedy that was to follow. It is particularly impressive because it was made in the same year that the first cases of retrolental fibroplasia (RLF, today named retinopathy of prematurity, ROP) were published (3). The frequency of occurrence of the disease rose sharply throughout the world, most dramatically in the “developed” countries. By mid-century the disease was found to be the main cause of blindness in infants. It was estimated that by the year 1954 approximately 10,000 children had been blinded by the disorder. In the search for an explanation for the new affliction, more than 50 separate causes, mostly based on retrospective observations, were proposed between 1942 and 1952.

A question from Professor Silverman to the Swiss fellow at morning rounds: “What is the strength of a retrospective correlation between a putative factor and an outcome of interest in demonstrating a causal relationship between the two events?”

Figure 1 illustrates an example of such a retrospective association (4). The increased frequency of RLF after 1942 was associated with the administration of iron, the administration of a water-soluble vitamin preparation, and the more frequent use of oxygen. These associations were found in a retrospective analysis of 361 children born prematurely, 53 of them affected with severe retinopathy.

Figure 1. Correlation between the incidence of retrolental fibroplasia in premature infants and the administration of oxygen, ferrous sulfate, and drops of water-soluble vitamins (4). An example of positive correlations based on noncontrolled, passive observations.

To evaluate the strength of a correlation, we need to examine the methodology used in the study and ask first whether the observation was “passive” or “active.” This important and clear distinction was proposed in the 19th century by the great physiologist Claude Bernard in his famous monograph published in 1865, “Introduction à l'étude de la médecine expérimentale” (“An Introduction to the Study of Experimental Medicine”) (5).

“A spontaneous or passive observation (is one) which the physician makes by chance and without being led to it by any preconceived idea… an active observation (is) made with a preconceived idea, i.e., with the intention to verify the accuracy of a mental conception.” This distinction is important because a passive association between unpredicted events and outcome is not as strong as an active association, the latter being based on a preconceived hypothesis. In our example, the observation was passive because the authors did not have a specific hypothesis in mind. They were looking at the trends of 47 factors related to mother and infant, associated with an outcome over an interval of time before and after the rise in the frequency of the disease. It should be obvious that this type of question-seeking observation is weaker than it would have been if the authors, before starting their survey, had formulated the hypothesis that oxygen treatment may be the cause of the retinopathy. In other words, when we sift through a long list of possible predictors, the laws of probability guarantee that we will eventually find one that correlates with the outcome. If we set the significance level of the association at 5%, we can expect, in repeated searches, to find one statistically significant association for every 20 predictors examined. It has been said that “if the data are tortured sufficiently, they will confess to something!”
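The arithmetic behind this warning is easy to check. The short simulation below, a minimal sketch using only Python's standard library, repeatedly sifts 20 pure-noise “predictors”; at the 5% significance level, about one spurious “significant” association turns up per search, just as stated.

```python
import random

random.seed(1)

N_PREDICTORS = 20      # factors sifted for an association, none truly related
ALPHA = 0.05           # conventional significance level
N_SURVEYS = 10_000     # repeated retrospective "searches"

false_hits = 0
for _ in range(N_SURVEYS):
    # Under a true null hypothesis (no real association), each p-value
    # is uniformly distributed on [0, 1].
    p_values = [random.random() for _ in range(N_PREDICTORS)]
    false_hits += sum(p < ALPHA for p in p_values)

# Expected spurious findings per search: 20 * 0.05 = 1.0
print(f"Spurious 'significant' findings per survey: "
      f"{false_hits / N_SURVEYS:.2f}")
```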

Following this nonrandomized passive observation with historical controls, four active observations using nonrandomized historical or concurrent controls were published between 1949 and 1954 (Table 1). Three studies showed a direct association between restriction of oxygen use and a reduction in the incidence of ROP. However, one (Lelong et al.) showed the opposite effect. These observations were “active”: before each study began, there was a formal hypothesis, “oxygen could be the cause of ROP!” This narrows the probability that an association would occur purely by chance, since the observer was not searching through a body of data looking for any and all associated events. The association found by this restricted observation is more credible than the correlation noted in the earlier passive observational study. The quality of the methodology had improved, but there were still conflicting data (the observation of Lelong et al. and others not discussed here).

Table 1. The effect of unrestricted oxygen therapy on the incidence of retinopathy of prematurity. The results indicate the percentage of infants developing ROP after exposure to restricted or unrestricted oxygen therapy; the figures in parentheses represent the numbers of cases.

One of the reasons for this kind of discrepancy is that observational studies are prone to serious and uncontrolled confounding variables, and these factors can influence the results in unrecognized ways. Strictly speaking, one can never prove causality on the basis of association, no matter how perfect such an association appears to be. Figure 2 was published in the highly respected journal Nature (10). It demonstrated a clear association over time (1965 to 1980) between two events observed in Germany, suggesting that they were causally related. The upper line represents the annual birth rate; the lower one, the number of storks nesting in the same region.

Figure 2. Example of a positive correlation based on noncontrolled observation: a positive association over time (1965 to 1980) between two events observed in Germany, suggesting that they are causally related. The upper line represents the annual birth rate; the lower one, the number of storks nesting in the same region (10).

Before going on, it is worth pausing for a moment to consider the frustrations and the dilemmas that our colleagues of the 1950s faced. More than 10 years after the description of a dramatic new eye disease of the premature infant, sometimes leading to blindness, the etiology of the disease was still unknown. Careful clinical studies based on observation led to the suspicion that the causal agent was one of the most popular treatments of the time: oxygen. But the data were conflicting: some studies even suggested a protective effect of oxygen on the retina (8). Moreover, extensive studies of high oxygen exposure in several species of newborn animals were able to reproduce only the early vascular stages of the human disease, but not the final, cicatricial lesion. The experimental animals did not become blind! Thus, doubt persisted concerning the role of oxygen as the prime cause of the blinding complication in babies.

Finally, the dilemma was complicated by the conviction of most physicians that oxygen was protective to the brain and favorable for survival. It is easy to imagine how passionate the debates were when, by the beginning of 1953, some clinicians proposed that the only way to advance was to conduct a formal experiment in which babies chosen by random allotment would be exposed to one of two treatments: a high or a low concentration of supplemental oxygen. To move from clinical observations to this kind of “experiment” (reminiscent of laboratory methods and using babies as research subjects) was a difficult new step for most physicians, who were used to accepting conclusions based on observations with nonrandomized controls as clinical “truth.” Moreover, some felt that experimental studies involving babies should be condemned because the method was basically immoral.

The idea of using experimental methodology in clinical research had already been proposed by Claude Bernard (5) when he wrote: “Medicine is by nature an experimental science but must apply the experimental method systematically.” But, as he noted, “many physicians attack experimentation believing that medicine should be a science of observation.” He added: “physicians make therapeutic experiments daily on their patients, so this inconsistency cannot stand careful thought.… Ideas born from observations must be controlled by experiments.” Despite this clear statement, physicians long remained reluctant to submit their treatments to experimental tests.

The writings of A. Bradford Hill in England and the publication in 1937 of his book “Principles of Medical Statistics” (11) played a major role in slowly overcoming the aversion of physicians to the use of experimental methods in clinical research.

The first courageous doctors to apply the new method in the hope of settling the oxygen question were two young physicians, Arnall Patz, an ophthalmologist, and Leroy Hoeck, a pediatrician. They conducted the first experimental trial in 1952 in a Washington, D.C. hospital (12). The results (Table 2) seemed to indict high oxygen. But there were doubts about the quality of the trial: infants were assigned in alternate, not random, order (so the effect of potential biases simply cannot be estimated), and compliance with the prescribed treatment was uncertain (some nurses were so convinced that oxygen was lifesaving that they turned the oxygen on at night for babies who were not receiving the gas, and stopped the flow in the morning before the arrival of the doctors). The risks of death and brain damage possibly associated with oxygen restriction were not evaluated. At this point, courageous voices (among them Richard Day and William Silverman) argued that the only way to clear the confusion was to use the newly devised investigative plan proposed and tested in England by Bradford Hill to evaluate the effect of streptomycin on pulmonary tuberculosis, the first rigorous randomized clinical trial in the history of medicine (13).

Table 2. The effect of high (65–70%) vs low (under 40%) ambient oxygen therapy on the percentage of infants developing retinopathy. The first single-center controlled trial of oxygen treatment (12).

I remember Professor Silverman's vivid descriptions on rounds of the debate over the protocol of the RLF study between those responsible for the care of premature infants and the ophthalmologists. Although most felt that a controlled trial must be undertaken immediately, in which random allotment would create two groups of prematures exposed either to high or to low oxygen concentrations, others argued that the proposed approach was immoral.

Some of them felt that excess oxygen had already been demonstrated to be the cause of RLF; others contended that high oxygen treatment was absolutely necessary for survival and the prevention of brain damage. It is easy to understand that the planners of the study protocol had a difficult time finding a compromise. They needed a design that would allow them to determine whether the administration of substantially less oxygen might increase mortality and brain damage, or whether continued use of the routine amounts might increase the frequency of retinopathy. This seemed to be an insoluble dilemma. But the planners found a strategy that satisfied those in charge of the 18 hospitals in the United States that agreed to participate.

This experience should be a lesson for us today! The highly complex problems we now have to deal with should not preclude the use of the experimental method in clinical research. The method has a number of flexible formats that allow its use in apparently insoluble dilemmas. The major advantage of the “experiment” over “observation” is that it acts to neutralize the effects of unidentified factors on outcome. Since these potentially confounding influences are distributed by chance between the compared groups, we can use the laws of probability to make numerical judgements.
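A toy illustration of this point, sketched in Python with invented numbers: suppose 30% of infants carry some unmeasured vulnerability that no investigator can record, let alone adjust for. Random allotment tends to distribute it almost equally between the arms, so any remaining imbalance is governed by chance and can be quantified.

```python
import random

random.seed(7)

N = 200  # infants enrolled in a hypothetical trial

# Each infant may carry an unmeasured prognostic factor (an unknown
# vulnerability present in 30% of the population).
infants = [random.random() < 0.30 for _ in range(N)]

# Random allotment: a coin flip assigns each infant to one arm.
treated, control = [], []
for vulnerable in infants:
    (treated if random.random() < 0.5 else control).append(vulnerable)

# Chance distributes the hidden factor similarly across both arms,
# which is what licenses a probabilistic comparison of the outcomes.
print(f"Hidden factor in treated arm: {sum(treated) / len(treated):.0%}")
print(f"Hidden factor in control arm: {sum(control) / len(control):.0%}")
```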

In the historic multicenter cooperative study of 1953–54 (14), a complex strategy was used to minimize the number of infants exposed to each of the two risks, blindness and death. In an initial phase (part I), a formal randomized controlled trial was performed; in the remaining period (part II), a nonrandomized observation using historical controls. In the initial phase, the focus was on the question: “Is oxygen restriction associated with an increase in mortality?”

During a period of 3 months, two thirds of the infants were managed according to a “curtailed oxygen policy” (less than 50%) and one third according to the accepted policy of continuous high concentrations of the gas. The results seemed clear: oxygen restriction was associated with a decrease in the incidence of RLF without an appreciable decrease in survival rate. For the remaining 9 months of the trial (part II), all enrolled babies were managed under the new curtailed-oxygen regimen (less than 50%). These observational results confirmed those of the experimental phase of the study. The trial (Table 3) demonstrated that the relative risk of cicatricial lesions could be reduced by about two thirds by using the relatively low gas concentration, without changes in survival rate. But it is important to emphasize that this policy did not totally eliminate the risk of blindness, and that the effect of oxygen restriction on neurologic outcome was not measured, despite the original intention of following survivors for 5 years.

Table 3. The effect of high (more than 50%) vs low (under 50%) ambient oxygen on retinopathy and mortality. Part I: multicenter randomized controlled trial on mortality (53 infants in each group). Part II: prospective study in which all infants were treated with low-dose ambient oxygen; the results for these 480 additional infants were added to the group receiving low-dose therapy in Part I of the study.

It is not my intention to discuss here the whole story of oxygen treatment, which Professor Silverman has described in detail in his monograph “Retrolental Fibroplasia: A Modern Parable” (15), which I highly recommend that the members of our Society read.

My goal was only to use the first part of the story to illustrate the steps in methodology that our predecessors had to climb to learn how to approach evidence and to better estimate the respective sizes of the beneficial and the undesirable effects of their interventions. The relative importance of each of these steps remains valid today.

Clinical research starts with individual observations collected by the physician. This step is essential to formulate an hypothesis to be tested. In most clinical situations, the effects of a treatment for an individual patient are uncertain and therefore must be expressed in terms of probability. The numerical chance of success or failure of a treatment for an individual patient is usually estimated by referring to past experience with groups of similar patients. Because these kinds of observations are made under different conditions by clinicians with different skills and prejudices, they may be influenced by a variety of systematic errors that distort the true nature of the observation and thereby mislead us. To deal with these confounding influences, clinical observations should be made under the formal conditions of an experiment: the randomized controlled trial.

Learning the relative value of these steps and understanding their relative importance when choosing a treatment for my patients turned out to be the most important event in my professional life. As I said before, this experience took place at the Babies Hospital in New York at the end of the 1960s, long before the term “Evidence-Based Medicine” was coined. I did not know at the time that the sceptics who gave me such a hard time were contributing to a revolution, setting the foundation of a new discipline, clinical epidemiology, which opened entirely new dimensions in my activity as a neonatologist.

To illustrate the difference in the way I handled medical information before and after my own “scepticemia,” let us imagine the same situation in 1965 and in 1980. I read in an abstract that a newly discovered substance, “Miraculix,” gives better results than the classic “Spectaculix” in preventing cerebral hemorrhages in the premature. Studying the original paper in 1965, I would usually have followed this strategy. First, I would look at the institution where the study was done, to see whether one of the experts in the field of cerebral hemorrhage was among the authors. I would then look carefully at the results: the number of patients involved, the difference between the two groups, and particularly whether significance (p less than 0.05) was reached. I would read the discussion and look particularly for recommendations by experts or committees. We would then review the literature in the intensive care unit and make a decision.

Let us now suppose that I read the same abstract in 1980. Consulting the original paper, I must confess that I would probably still look first at the name of the institution and the list of authors. If I wanted to be the perfect sceptic, I should not do that, because I have learned that it is dangerous to trust authorities. I would then strictly follow the advice of the sceptics and read first the most important part of the paper (paradoxically written in small print): the methods. I do not think I would read the entire paper if the study were not a prospective randomized controlled trial, because I have learned not to trust conclusions about treatment effects based on observations alone. If the study were a randomized controlled trial, I would now pay particular attention to the following questions:

  • Was the question formulated before the study began?

  • Was the primary outcome clearly defined?

  • What was the smallest difference between treatments that the authors, before the study, considered important enough to detect?

  • What was the power of the study?

  • What was the probability that the effect observed is due to chance alone?

  • What was the size of the difference, with its confidence interval (and not merely the p value)?

  • Was the difference big enough to have practical implications for my clinical decision-making? (See the sketch after this list.)
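To make the last of these questions concrete, here is a minimal sketch of the appraisal arithmetic in Python, using invented counts for the hypothetical “Miraculix” versus “Spectaculix” comparison (not data from any study cited here). It computes the measures discussed next: the risk ratio, the risk difference with an approximate 95% confidence interval, and the number needed to treat.

```python
import math

# Hypothetical counts for a two-arm trial; these numbers are invented
# for illustration and are not taken from any study cited in this lecture.
events_treated, n_treated = 15, 100   # e.g. hemorrhages on "Miraculix"
events_control, n_control = 30, 100   # e.g. hemorrhages on "Spectaculix"

p_t = events_treated / n_treated      # event rate, treated group
p_c = events_control / n_control      # event rate, control group

risk_ratio = p_t / p_c                # relative risk
risk_difference = p_c - p_t           # absolute risk reduction
nnt = 1 / risk_difference             # number needed to treat

# Approximate 95% confidence interval for the risk difference
# (normal approximation; adequate for a quick bedside appraisal).
se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
lo, hi = risk_difference - 1.96 * se, risk_difference + 1.96 * se

print(f"Risk ratio:      {risk_ratio:.2f}")
print(f"Risk difference: {risk_difference:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"Number needed to treat: {nnt:.0f}")
```

With these invented counts, the event rate falls from 30% to 15%: a risk ratio of 0.50, a risk difference of 0.15, and a number needed to treat of about 7.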

This list does not reach the standard of a master in epidemiology, but it does help the busy clinician make a quick critical appraisal. If you want to be more sophisticated, consult the CONSORT statement on randomized controlled trials published in the British Medical Journal in 1996 (16).

Various measures of treatment effect used in epidemiology help to evaluate the clinical importance of the results. The risk ratio, or relative risk, provides an answer to the question: “What is the event rate among treated patients, relative to that among control patients?” The risk difference provides an answer to a second question: “What is the absolute difference in event rates between the treated and the control groups?” The corollary of this is particularly important for the clinician because it answers the practical question: “How many patients does one need to treat to benefit one patient?”

Among the various important concepts I have learned from the sceptics, this last one, the number needed to treat, I consider to be one of the nicest presents the epidemiologists have given to clinicians. The concept opened my eyes to the social and ethical dimensions of my individual interventions. “How many patients will I expose to a treatment (with its possible side effects and costs) to prevent one single pathologic event?” In other words: “How many patients will receive no advantage (or even suffer harm as a result of my intervention) to allow one patient to benefit from it?” During my medical training, I was barely aware of the importance of this information and was unable to calculate an estimate of this important number. Now, with a simple computation, I am confronted with new and important information that allows me to estimate the broad consequences of my decision and makes me better able to weigh the advantages and disadvantages of my intervention for a patient population. At this road crossing, “clinical epidemiology” meets “social ethics.”

I have tried today to show why my painful encounter in 1967 with the sceptics at the Babies Hospital in New York was the most important event in my professional life. I am thankful to those who made me doubt what I knew and who, at the same time, taught me a method for approaching “knowing,” a methodology that allowed me to test the quality of the evidence acquired during my training or claimed in the medical literature. I am thankful to them for giving me the tools to better weigh the advantages and disadvantages of my interventions for the sake of my patients. Thank you!