Cross-study comparisons of health care interventions are difficult because of varying patient populations, comparators, concomitant treatments and outcomes, which all affect the absolute risk of events.
The 1-year-death number needed to treat (NNT) standardizes comparisons of treatment impact by focusing on all-cause mortality 1 year from treatment initiation using NNT methodology to account for distinct baseline risks.
It is interpreted as the annual number of people requiring treatment and observation to avoid 1 death, with most beneficial treatments having positive values closest to 0 and most harmful treatments having negative values closest to 0.
The 1-year-death NNT permits the standardized comparison of diverse health care technologies, which could allow physicians to prioritize distinct treatment options for an individual patient or health care organizations to determine which new technologies to introduce first.
After concluding that a randomized controlled trial (RCT) is both internally and externally valid, consumers of evidence may want to consider an intervention’s overall importance compared with other interventions or health technologies. Such an exercise contributes to our perception of an intervention’s importance to health. This process could also help health care organizations determine which of an assortment of new technologies should be introduced first, or identify processes of care that should be subjected to quality-of-care reviews and optimization. Finally, it might also help with treatment selection for an individual patient; for example, in a patient with newly diagnosed left ventricular systolic dysfunction, should one start treatment with angiotensin-converting-enzyme (ACE) inhibition or β-blockade?
However, making comparisons across trials and between different technologies can be difficult because studies differ with respect to patient populations, comparators (placebo v. active control interventions or active controls of differing efficacy), concomitant treatments, follow-up times and outcomes. Comparisons across trials from different eras are even more difficult given secular trends in health outcomes over time. Each of these factors can influence the baseline absolute risk of an event and, therefore, interstudy comparisons.
To help with this, we propose here a method to compare the impact of diverse interventions applied to different populations over varying observation times and with distinct outcomes using a summary statistic we call the 1-year-death number needed to treat (NNT). We illustrate the application of this method using a convenience sample of landmark RCTs. We then present the studied interventions in a league table of 1-year-death NNTs to show how the statistic can be used to gauge the impact of interventions on patient outcomes.
What is the 1-year-death NNT?
Although reported outcomes vary extensively between studies, all-cause mortality is commonly reported and is an outcome whose clinical importance is arguably independent of both the patient population and treatment type. We therefore used all-cause mortality as the basis for the 1-year-death NNT. This statistic summarizes the influence of treatment on patient survival by accounting for between-study variations in baseline death risk and follow-up duration. It is calculated as
\[
\text{1-year-death NNT} = \frac{1}{\text{ARR} / \text{outcome time}} = \frac{\text{outcome time}}{\text{ARR}}
\]
in which ARR is the absolute risk reduction (calculated as death risk in the control group − death risk in the intervention group, with death risks in each treatment arm expressed as proportions from 0 to 1 and calculated as the number dying divided by the number randomly assigned). Outcome time is the number of years after randomization when death status was measured in the study. If the study did not follow all patients for the same amount of time, the outcome time is the average observation time. Some interventions have a finite treatment duration, with death status measured later on; in such cases, outcome time occurs at the end of the “treatment cycle,” which includes both treatment time and the subsequent lag time before measurement of death status.
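As a minimal sketch of this calculation, the function below assumes the statistic is the outcome time divided by the ARR (the reciprocal of the annualized ARR), which is the reading consistent with the definitions above. The function name, its parameters and the trial counts are hypothetical and do not come from any study in our sample.

```python
def one_year_death_nnt(deaths_control, n_control,
                       deaths_treated, n_treated,
                       outcome_time_years):
    """Return outcome time / ARR (assumed form of the 1-year-death NNT)."""
    # Death risks are simple proportions: number dying / number randomized.
    risk_control = deaths_control / n_control
    risk_treated = deaths_treated / n_treated
    arr = risk_control - risk_treated  # absolute risk reduction
    if arr == 0:
        # No difference in death risk: no finite number needed to treat.
        return float("inf")
    return outcome_time_years / arr


# Hypothetical trial: 200/1000 control deaths and 150/1000 treated deaths,
# with death status measured 2 years after randomization.
benefit = one_year_death_nnt(200, 1000, 150, 1000, 2.0)

# Swapping the arms gives a harmful intervention and a negative value.
harm = one_year_death_nnt(150, 1000, 200, 1000, 2.0)

print(f"beneficial: {benefit:.1f}, harmful: {harm:.1f}")
```

In this invented example the ARR is 0.05 over 2 years, so the conventional NNT would be 20 and the 1-year-death NNT is about 40.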
The 1-year-death NNT can have values that range from negative to positive infinity. The most beneficial treatments will have positive values closest to zero. Negative 1-year-death NNT values indicate interventions that increase death risk (with the absolute value of the 1-year-death NNT being interpreted as the “number needed to treat to harm”). Interpretation of the 1-year-death NNT depends on the treatment cycle duration (Figure 1). If the treatment cycle is 1 year or longer, the 1-year-death NNT is the annual number of people requiring treatment and observation to avoid 1 death. If the treatment cycle is less than a year, the 1-year-death NNT is the number of people required per treatment cycle (with cycles repeated sequentially for 1 year) to avoid 1 death. For example, consider a 1-year-death NNT of 5. If the treatment cycle is 2 years, this means that 5 people need treatment and observation for 1 year to avoid a single annual death; if the treatment cycle is 6 months, this means that 5 people need treatment and observation for 6 months twice (for a total of 10 treated people or 5 patient-years) to avoid a single annual death.
Figure 1: Interpretation of the 1-year-death number needed to treat.
Note: treatment cycle duration = time from start of treatment to measurement of death status.
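The interpretation rule for long and short treatment cycles can be sketched as follows. The function name and return format are hypothetical; the sketch handles positive (beneficial) values only and reproduces the example of a 1-year-death NNT of 5.

```python
def people_treated_per_year(nnt, cycle_years):
    """Return (people treated, patient-years) needed over 1 year to avoid 1 death.

    Assumes a positive 1-year-death NNT; cycle_years is the treatment
    cycle duration (start of treatment to measurement of death status).
    """
    if cycle_years >= 1:
        # Cycle of 1 year or longer: the NNT is the annual number of people.
        people = nnt
        patient_years = nnt * 1.0
    else:
        # Shorter cycle: the NNT applies per cycle, and cycles are
        # repeated sequentially until 1 year has elapsed.
        cycles_per_year = 1 / cycle_years
        people = nnt * cycles_per_year
        patient_years = people * cycle_years
    return people, patient_years


long_cycle = people_treated_per_year(5, 2.0)    # 2-year treatment cycle
short_cycle = people_treated_per_year(5, 0.5)   # 6-month treatment cycle
print(long_cycle, short_cycle)
```

With a 2-year cycle, 5 people are treated and observed for 1 year; with a 6-month cycle, 10 people are treated in total, which still amounts to 5 patient-years.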
How can the 1-year-death NNT be applied?
To illustrate application of the 1-year-death NNT, we used as our sampling frame all RCTs included in 2 Minute Medicine’s The Classics in Medicine: Summaries of the Landmark Trials.1 This is a collation of 190 “landmark clinical studies,” defined by the authors as those identified by academic clinicians in general internal medicine and its subspecialties as “practice-changing,” with a focus on common medical conditions (Dr. Andrew Cheung, St. Joseph’s Healthcare Hamilton, Hamilton, Ont.: personal communication, 2019). We limited our convenience sample to RCTs (since these provide the least biased evaluations of treatment effects) and those reporting all-cause mortality (regardless of whether it was the study’s primary outcome). Further methods and results are presented in Appendix 1 (available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.190367/-/DC1).
Table 1 summarizes the 31 RCTs reporting a significant difference in all-cause death risk between treatment groups. These trials included patients with congestive heart failure (n = 8), coronary artery disease (n = 5), intensive care issues (n = 4), established cardiovascular disease or a high risk of developing it (n = 3), infectious diseases (n = 3), cirrhosis and its complications (n = 3), arrhythmia (n = 2), and 1 each of diabetes, premature labour and cancer. Publication dates ranged from 1972 to 2013.
Table 1: Summary of landmark randomized controlled trials in which treatments significantly reduced all-cause mortality
Five trials (16.1%) involved a procedural intervention, 13 trials (41.9%) involved continuous treatments (i.e., daily medication), and all other studies (n = 13, 41.9%) involved finite treatment durations ranging from 6 hours to 3 months (Table 2). Studies randomly assigned a mean of 5912 patients (range 61 to 45 852). The mean outcome time was 1.7 years after randomization (range 1 wk to 7.6 yr). Patient loss to follow-up was minimal in all studies except one.10 Death risks in control groups varied extensively between studies (range 2.8% to 88.5%, median 22.7%, mean 26.7%). A total of 27 studies reported on interventions that decreased death risk, and the remaining 4 identified interventions that increased death risk.
Table 2: All-cause mortality outcomes in landmark randomized controlled trials with statistically significant results
The median 1-year-death NNT (in absolute values) was 18. The intervention saving the greatest number of lives per year (in this case, neonatal lives) was betamethasone during premature labour, with a 1-year-death NNT of 0.16 (95% confidence interval [CI] 0.09 to 0.64).2 Given this study’s treatment cycle duration of less than 1 year, this means (Figure 1) that 0.16 mothers in premature labour required 1 day of betamethasone every week for a year (for a total of 0.16 × 52 = 8.3 people treated and observed) to avoid 1 death. Eleven interventions were beneficial and associated with a 1-year-death NNT of less than 10 (betamethasone in premature labour, prone ventilation in severe adult respiratory distress syndrome, prednisolone in alcoholic hepatitis, goal-directed resuscitation in sepsis, albumin in spontaneous bacterial peritonitis, voriconazole instead of amphotericin B in immunocompromised invasive aspergillosis, dexamethasone in acute meningitis, hypothermia after resuscitation, transjugular intrahepatic portosystemic shunting in acute variceal bleeding, tranexamic acid in trauma, and enalapril in severe congestive heart failure).2–12 The largest 1-year-death NNT (for prolonged tamoxifen after initial 5 years of treatment in early breast cancer) was 556 (95% CI 272 to ∞).
Four trials (12.9%) found a significantly increased death risk with the study’s intervention (including strict glucose control for critical care patients [1-year-death NNT −9.4, 95% CI −63.8 to −5.1],13 encainide or flecainide for frequent premature ventricular contractions after myocardial infarction [1-year-death NNT −21, 95% CI −47 to −13],18 warfarin for symptomatic intracranial arterial stenosis [1-year-death NNT −34, 95% CI −148 to −19],21 and intensive glucose control in high-risk patients with type 2 diabetes [1-year-death NNT −334, 95% CI −∞ to −54]30).
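To show how such values might be ordered in a league table, the sketch below ranks the six 1-year-death NNTs quoted in the two preceding paragraphs. Sorting by the reciprocal of the statistic (annual deaths avoided per person treated) is an assumption made here for illustration, chosen because it places positive values closest to 0 first and negative values closest to 0 last; the shortened intervention labels and the helper name are also hypothetical.

```python
# 1-year-death NNTs quoted in the text above (point estimates only).
reported = {
    "betamethasone in premature labour": 0.16,
    "prolonged tamoxifen in early breast cancer": 556,
    "strict glucose control in critical care": -9.4,
    "encainide or flecainide after myocardial infarction": -21,
    "warfarin for intracranial arterial stenosis": -34,
    "intensive glucose control in type 2 diabetes": -334,
}


def annual_deaths_avoided_per_person(nnt):
    """Reciprocal of the 1-year-death NNT (negative when treatment harms)."""
    return 1 / nnt


# Most beneficial first, most harmful last.
league_table = sorted(
    reported.items(),
    key=lambda item: annual_deaths_avoided_per_person(item[1]),
    reverse=True,
)

for rank, (intervention, nnt) in enumerate(league_table, start=1):
    print(f"{rank}. {intervention}: {nnt}")
```

A plain numeric sort of the NNT values would not work here, because it would place the most harmful intervention (−9.4) next to the most beneficial one rather than at the opposite end of the table.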
What can we learn from estimating the 1-year-death NNT?
By calculating the 1-year-death NNT, we were able to standardize both changes in death risk between treatment groups and the time after randomization when death risk was calculated. This let us directly compare the impact of very distinct health care technologies on survival (Table 2). These benefits ranged from a life “being saved” annually after treatment of 8.3 mothers in premature labour with corticosteroids34 to continuing 556 women with early breast cancer on tamoxifen (after an initial 5 years of treatment) for another year.32 Table 2 shows that, even after limiting ourselves to the “best of the best” landmark studies (i.e., recognized landmark RCTs of interventions that significantly influenced all-cause mortality), large variations remained in the interventions’ influence on patient survival. These results highlight that the critical review of RCTs should not stop with “significantly decreased all-cause mortality.”
The 1-year-death NNT and Table 2 help put RCT outcomes into perspective and better gauge the importance of interventions on patient survival. However, the interventions presented in these 31 studies should all be considered special. All-cause mortality is critical to patients and is almost invariably measured accurately by researchers, making it a key RCT outcome. Interventions that significantly change the likelihood of death beyond that expected by chance alone are important. If desired, the formula presented above could be modified to calculate 1-year standardized NNTs for any outcome that one cares to compare between studies.
What are the potential limitations of our example?
Any method used to sample all published studies evaluating treatments has its drawbacks. We used a collection of studies subjectively identified by experts to be important to and influential on clinical practice.1 However, there is considerable overlap between studies included in The Classics1 and other collections of key studies; almost a third of the trials published between 1991 and 2003 in The Classics were included in Ioannidis’ seminal report of landmark RCTs,35 and more than 40% of RCTs in Hochman’s collection of key studies36 were in The Classics.1 Our study may have excluded key RCTs reporting important treatment effects on all-cause survival. Further work is required to systematically identify all other methodologically robust RCTs that report a significant influence of treatment on all-cause survival and to add them to our table.
Two issues arise from our results being based on RCT outcomes. First, with the exception of ACE inhibitors12,23,25 and implantable cardioverter-defibrillators20,22 for left ventricular systolic dysfunction, each result in our league table is based on a single RCT of the intervention. Single studies, even those that are large and internally valid, can return exaggerated effect estimates that prove smaller (or sometimes even absent) when trials are replicated or when the intervention is applied in clinical practice.35 This issue could be addressed using data from meta-analyses rather than single studies. Second, the intervention effect measured in RCTs may generalize poorly.37
Other issues regarding our example should be kept in mind. First, some of the health care technologies compared in our study are quite distinct, making their direct comparison relevant only in a numerical sense regarding the 1-year-death NNT. Second, we reported risk using a simple proportion because this was reported in all studies. Risk-measurement methods that account for observation time will more accurately measure risk; such measurements can be used instead of proportions if calculable in the RCTs of all technologies that are to be compared. Finally, we used the p value for the intertreatment difference in death risk to select treatments that were highlighted in Table 2. This dichotomization of a continuous measure (i.e., the likelihood that differences seen exceed that arising by chance) will unfairly exclude some studies (such as those with a p value of 0.05) whose importance is not materially distinct from those presented here.
What are the potential limitations of using the 1-year-death NNT statistic?
Several potential limitations of the number needed to treat (NNT) will also apply to the 1-year-death NNT.38,39 First, both statistics will tend to increase markedly when baseline risks decrease.38 Second, baseline death risks tend to exhibit secular decreases over time as concomitant therapies and other aspects of medical care improve; for example, 1-year mortality for adults admitted to hospital in Ontario decreased by 20% between 1994 and 2009. This makes comparisons of NNT and 1-year-death NNT between years problematic.40 Third, death risk estimates, and differences between treatment groups, get larger the longer patients are observed. Therefore, the NNT (the reciprocal of the absolute risk reduction) will decrease as time from randomization to outcome measurement increases.38 The 1-year-death NNT avoids this limitation by standardizing all NNT time frames to 1 year after randomization. However, this standardization assumes that the relative influence of treatment on death risk is the same 1 year after randomization as at the study’s reported outcome time. If treatment effect changes significantly over time, the 1-year-death NNT may return a biased estimate. The outcome times of almost half of the studies we analyzed (n = 15) were very close to 1 year (between 6 mo and 2 yr), thereby reducing the influence of this potential bias. Fourth, the 1-year-death NNT simplifies the comparison of absolute death risks at the cost of a potentially important loss of information. Therefore, one should always examine the absolute death risks in each patient group to truly appreciate treatment effects. Fifth, the 1-year-death NNTs reported in Table 2 (just as with all NNT measures) represent the average across patients in those trials; values could vary widely in particular subgroups.
Finally, in RCTs with variable observation times for patients, death risks are most accurate when censoring is accounted for using survival analysis techniques.41 Since most RCTs in our worked example did not use survival analysis or provide the necessary data to permit recalculation, death risks were summarized using proportions. This made interstudy comparisons possible but could underestimate death risk in studies with long observation times and large treatment effects.
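The third limitation above (the conventional NNT falling as follow-up lengthens) can be illustrated with a small sketch. It assumes, purely for illustration, that the ARR accrues linearly at a hypothetical 0.02 per year, which corresponds to the constant-effect assumption behind the time standardization; the function names and all numbers are invented, and the 1-year-death NNT is taken to be outcome time divided by ARR.

```python
def conventional_nnt(arr):
    """Conventional NNT: reciprocal of the absolute risk reduction."""
    return 1 / arr


def one_year_death_nnt(arr, outcome_time_years):
    """Assumed form of the 1-year-death NNT: outcome time / ARR."""
    return outcome_time_years / arr


ANNUAL_ARR = 0.02  # hypothetical ARR gained per year of observation

rows = []
for years in (1, 2, 5):
    arr = ANNUAL_ARR * years  # assumption: ARR accrues linearly over time
    rows.append((years, conventional_nnt(arr), one_year_death_nnt(arr, years)))

for years, nnt, nnt_1yr in rows:
    print(f"follow-up {years} yr: NNT {nnt:.0f}, 1-year-death NNT {nnt_1yr:.0f}")
```

Under this assumption the conventional NNT drops from 50 to 25 to 10 as follow-up extends from 1 to 2 to 5 years, whereas the 1-year-death NNT stays at 50. If the treatment effect did not accrue evenly, the standardized value would shift with the chosen outcome time.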
Several limitations are particular to the 1-year-death NNT. First, in using all-cause mortality to permit cross-study and cross-technology comparisons, we assumed that all deaths are equal. This is arguably not true since some deaths are symptomatically worse than others (e.g., contrast a person dying in their sleep to someone dying with progressive, incapacitating respiratory failure). Further research might explore the possibility of weighting deaths using health utilities and health-related quality of life measures.42 Second, the 1-year-death NNT does not account for life years lost owing to the death. Neonatal death during premature labour is more consequential with regard to population health than death in an older adult. Further work could explore weighting the 1-year-death NNT by expected life years lost. Third, interventions might provide important benefits to patients without influencing the likelihood of death. Fourth, the 1-year-death NNT is but one statistic that could be used to compare health care technologies. Other factors, such as cost and disease prevalence, should also be considered. Finally, the 1-year-death NNT will vary with the outcome time chosen in the RCT. Consider a study in which deaths in both treatment groups are primarily clustered close to the start of observation; the corresponding survival curves will be increasingly flat over time. If such a study chose an observation time that notably exceeded the period during which deaths occurred, the 1-year-death NNT would return an inappropriately large value. This possibility supports the point made above: to truly understand the 1-year-death NNT (as with any summary statistic), one should always examine its individual data components.
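The final limitation can be made concrete with a hypothetical trial in which the entire difference in death risk (an invented ARR of 0.05) has accrued within the first few months, so the ARR is the same whenever death status is measured afterward. The sketch again takes the 1-year-death NNT to be outcome time divided by ARR; the function name and numbers are illustrative only.

```python
def one_year_death_nnt(arr, outcome_time_years):
    """Assumed form of the 1-year-death NNT: outcome time / ARR."""
    return outcome_time_years / arr


# Hypothetical: all excess deaths occur early, so the ARR no longer
# changes once the survival curves have flattened.
FINAL_ARR = 0.05

nnt_by_outcome_time = {
    years: one_year_death_nnt(FINAL_ARR, years)
    for years in (0.5, 1.0, 5.0)
}

for years, nnt in nnt_by_outcome_time.items():
    print(f"outcome time {years} yr: 1-year-death NNT {nnt:.0f}")
```

The same treatment effect yields values of about 10, 20 and 100 depending only on when death status was measured, which is why the outcome time and the absolute death risks should be inspected alongside the statistic.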
Conclusion
Despite its limitations, the 1-year-death NNT represents a useful statistic to help consumers of medical evidence contextualize potential impacts of health care interventions. It could also help health care organizations prioritize technologies for evaluation or funding and help individual physicians discriminate between distinct treatment options.
Footnotes
Competing interests: None declared.
This article has been peer reviewed.
Contributors: Brett Hryciw and Carl van Walraven contributed to the conception and data collection. Brett Hryciw, Carl van Walraven and Meltem Tuna contributed to the data analysis. All of the authors contributed to the data interpretation. Carl van Walraven and Finlay McAlister drafted the article, which all of the authors reviewed. All of the authors gave final approval of the version to be published and agreed to be accountable for all aspects of the work.