Review Article
An unadjusted NNT was a moderately good predictor of health benefit

https://doi.org/10.1016/j.jclinepi.2005.08.005

Abstract

Background and Objective

Whether the number needed to treat (NNT) is sufficiently precise to use in clinical practice remains unclear. We compared unadjusted NNTs to quality-adjusted life years (QALYs) gained, a more comprehensive measure of health benefit.

Study Design and Setting

From a subset (n = 65) of a dataset of 228 cost-effectiveness analyses, we compared how well NNTs predicted clinically important QALY gains using correlation analysis, multivariable models, and receiver operating characteristic (ROC) curve analysis.

Results

NNT was inversely correlated with QALY gains (P < .001); this relationship was affected by quality of life and life-expectancy gains of treatment (P ≤ .04). The NNT was a moderately accurate predictor of treatments that provide large health benefits (area under the ROC curve, 0.74–0.81). For ruling out therapies with low QALY gains (threshold ≤0.125 to ≤0.5 QALYs), an NNT >15 had a sensitivity of 82% to 100%. For ruling in therapies with high QALY gains (threshold ≥0.125 to ≥0.5 QALYs), an NNT ≤5 had a specificity of 77%.

Conclusion

Using NNT thresholds of ≤5 and >15 to rule in and out therapies with large QALY gains may provide general guidance regarding the magnitude of health benefit.

Introduction

Introduced in the late 1980s, the number needed to treat (NNT) has become a widely used method of interpreting the magnitude of treatment benefit [1]. The NNT is calculated as the reciprocal of the absolute risk reduction (ARR) and is commonly defined as how many patients must undergo a therapy to prevent one adverse outcome. For example, if a randomized controlled trial (RCT) demonstrates that patients taking a placebo have a 30% chance of death but those taking a drug have a 10% chance, the drug reduces the absolute risk of death by (30% − 10%) = 20%. The NNT = 1/0.20 = 5. Five patients “need to be treated” to prevent one death.
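The arithmetic in this example can be written as a minimal Python sketch. The function name `number_needed_to_treat` and its parameter names are illustrative assumptions, not part of the study, and the sketch handles only the case in which the treatment lowers the event rate.

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Return the NNT as the reciprocal of the absolute risk reduction (ARR).

    Both rates are proportions between 0 and 1. The function name and
    signature are hypothetical, chosen only for this illustration.
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        # With no risk reduction the reciprocal is undefined or negative;
        # this sketch does not cover that case.
        raise ValueError("treatment does not reduce risk; NNT is not defined here")
    return 1.0 / arr


# Worked example from the text: 30% mortality on placebo, 10% on the drug.
nnt = number_needed_to_treat(0.30, 0.10)
print(round(nnt, 2))  # 5.0
```

The result is rounded only for display, because the subtraction is carried out in floating-point arithmetic.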

Texts in evidence-based medicine (EBM) advocate using the NNT to report and interpret RCT results [2], [3], [4]. In searching for the best measure of clinical benefit, it is argued that, in comparison with other outcome measures such as the odds ratio (OR) and relative risk reduction (RRR), the NNT offers clinicians a simpler, more intuitive, and yet more accurate method of understanding the magnitude of health benefit associated with a treatment [5], [6]. EBM advocates promote the NNT as “the most useful measure of clinical effort” [2]. In five general medical journals, the frequency with which RCTs included an NNT (and/or ARR) increased from 4.4% in 1992 to 16.7% in 1998 [7].

Despite widespread use, the NNT does have several well-described limitations. First, users of the NNT must supply their own implicit adjustment for the severity of the illness under consideration [8]. For example, preventing a stroke is of greater value than preventing a headache. Thus, treatments with dramatically different overall benefit may have similar NNTs. Also, the NNT does not incorporate the type of treatment or treatment adverse effects. A different statistic, the “number needed to harm” (NNH) [2] must be calculated to capture side-effect risks.

Similarly, the NNT does not explicitly incorporate the duration of therapy [1]. For chronic conditions, the NNT measures less the avoidance of an adverse outcome than it does the postponement of one, and depends greatly on the point of time in the disease process at which the statistic is measured [9]. Taking the importance of this time component into account, the simple definition of the NNT as “the number of patients who must undergo a therapy to prevent one adverse event” might be more accurately expressed as “the average number of patients who must undergo a therapy over a specified time period to observe one less adverse event at the end of the same or different time period.”

Furthermore, there is a growing body of evidence that despite its apparent simplicity, the NNT is frequently misinterpreted by both lay people and health professionals [9], [10], [11]. Other limitations of the NNT include its failure to take baseline risks into account [8], its limitation to only dichotomous outcomes (it is not possible to calculate an NNT for outcomes measured on continuous scales), and potentially undesirable statistical properties [12].

Our clinical experience has suggested that the NNT does appeal to physicians and trainees. Unfortunately, in contrast to EBM recommendations that the NNT be interpreted in the context of the above limitations and personal patient values [3], we have also observed, particularly in journal club and critical appraisal settings, that clinicians seem to have implicit thresholds, such that a value of <10–15 is considered to be an attractive ratio irrespective of differences in clinical conditions, side effects, length of treatment, and other factors. We are not aware of studies exploring how the NNT is used in clinical settings, but a review of the literature does provide some support for our hypothesis. An informal poll of respiratory health professionals suggested that an NNT of 20 or less was considered “clinically worthwhile” [13], and other tutorials teaching the NNT concept suggest that a value of ≤10 indicates a clinically significant effect [14]. Published statements such as “These small NNTs suggest that . . . the cholinesterase inhibitors have a valuable place in the current clinical management of [Alzheimer's disease]” [15] suggest an implicit belief that an unadjusted NNT value adequately captures the overall worth of a treatment. The use of NNT league tables [2], [16] may further add to the impression that NNT values in and of themselves are broadly comparable. Indeed, studies have shown that clinicians tend to misinterpret the NNT and use it incorrectly [9], [11].

A number of authors have proposed modifications to the NNT to improve the fidelity of its representation of overall health benefit. Dividing the NNT by the length of the study has been proposed as a way to adjust for variable observation time [2]. Calculating a “number needed to harm” has been recommended to adjust for the risk of side effects [2]. Dividing the NNT by an individual-to-study-population risk ratio has been offered as a way to adjust for baseline risk [17]. A formula has been derived to calculate an “NNT threshold,” the point at which clinical benefit equals clinical risk [18]. Such modifications address some of the potential limitations of the NNT as an outcome measure, but diminish the simplicity and intuitive appeal that make the NNT so attractive to clinicians. The present study addresses the question of whether an unadjusted NNT is a useful clinical tool despite its potential for oversimplifying the expression of potential benefits.
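As an illustration of one of these modifications, the baseline-risk adjustment can be sketched in Python as a single division. The function name `adjust_nnt_for_baseline_risk` and the parameter `risk_ratio` (the individual patient's baseline risk divided by that of the study population) are hypothetical labels introduced here, and the sketch assumes that the relative treatment effect is the same for the individual as for the study population.

```python
def adjust_nnt_for_baseline_risk(study_nnt, risk_ratio):
    """Divide a study NNT by an individual-to-study-population risk ratio.

    study_nnt:  NNT reported for the study population.
    risk_ratio: individual baseline risk / study-population baseline risk
                (hypothetical parameter name; must be positive).
    Assumes the relative risk reduction is constant across baseline risks.
    """
    if study_nnt <= 0 or risk_ratio <= 0:
        raise ValueError("study_nnt and risk_ratio must both be positive")
    return study_nnt / risk_ratio


# A patient at half the study population's baseline risk: the NNT doubles.
print(adjust_nnt_for_baseline_risk(5, 0.5))  # 10.0
# A patient at twice the baseline risk: the NNT halves.
print(adjust_nnt_for_baseline_risk(5, 2.0))  # 2.5
```

Even this one-line adjustment requires an estimate of the patient's risk relative to the trial population, which illustrates the loss of simplicity noted above.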

To evaluate the clinical utility of the NNT, a reference standard of health benefit is required. An alternative outcome measure that, like the NNT, represents the magnitude of clinical benefit is the quality-adjusted life year (QALY). Developed in the 1970s, the QALY represents health using two attributes, length of life and quality of life (QOL). The key idea underlying the QALY is that the gains associated with any health intervention or program can be accurately represented and expressed using these two dimensions. The QOL changes are usually represented using utility, a quality of life scale from 0 (imminent death) to 1 (full health). Thus, a treatment that is estimated to extend a patient's life for 5 years but at a utility that is 50% of full health would gain (5 years × 0.5 utility) = 2.5 QALYs [19].
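The QALY arithmetic in this example can be expressed as a short Python sketch. The function name `qalys_gained` is an illustrative assumption, and the sketch covers only the simplest case of a single health state held at constant utility, without discounting.

```python
def qalys_gained(years, utility):
    """Return QALYs for time spent in one health state at constant utility.

    years:   length of life gained, in years (non-negative).
    utility: quality-of-life weight on the 0-to-1 scale described in the text.
    Hypothetical helper for illustration; no discounting is applied.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    return years * utility


# Worked example from the text: 5 additional years at 50% of full health.
print(qalys_gained(5, 0.5))  # 2.5
```

Published cost-effectiveness analyses generally sum such terms over several health states and time periods, so this should be read as the building block rather than a full calculation.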

QALYs are not often compared to NNTs. The latter are usually used to interpret empirical research (e.g., clinical trials), whereas the former are mostly used in decision and cost-effectiveness analyses (CEAs). Nonetheless, both concepts represent alternative ways of representing the net health benefit associated with a program or intervention. As an outcome measure, the QALY offers some theoretical advantages over the NNT. For example, the QALY more explicitly incorporates benefits, side effects, and length of therapy. Also, QALYs are designed to facilitate comparison across different conditions and interventions [19]. The QALY does, however, have its own, well-described limitations [20]. Furthermore, the QALY is typically too complex to be used in everyday clinical practice. Nonetheless, it is arguably one of the most comprehensive health outcome measures available, and more completely captures health benefit than the NNT can.

Our objective was to determine how well the NNT, with its theoretical limitations, predicts the net health benefit of interventions, using the QALY as a reference standard. We did not primarily focus on the relative utility of other outcome measures (such as RRR or OR) as compared to the NNT.

Section snippets

Article selection

We used a set of 228 CEAs performed between 1976 and 1997 inclusive as our primary data set [21]. This database represents the results of a systematic search for original CEAs up to 1997 that expressed health benefit in QALYs and were published in English. It was compiled through extensive electronic database searches [21] and a review of 6,500 titles in two paper-based bibliographies [22], [23]. From these searches, >1,500 candidate articles were extracted. Based on a reading of the study

Characteristics of selected articles

The characteristics of the 65 articles selected are shown in Table 1. Most of the articles examined drug interventions (60.5%) for cardiovascular, neoplastic, or infectious conditions (60%) at the tertiary prevention stage (73%). The majority of the probabilities used to calculate the NNTs were based on RCTs or formal meta-analyses of RCTs (71%).

Empiric exploration of theoretical limitations of NNT

When comparing NNTs to the change in QALYs across all treatments, we observed that as NNTs fell (i.e., increasing health benefit) the QALY gain

Discussion

The NNT has become a popular method of evaluating the magnitude of treatment benefit. Here, we show that, despite its theoretical limitations, the unadjusted NNT is a moderately good predictor of interventions that are associated with clinically significant health benefit, defined as high QALY gains. We provide evidence that an unadjusted NNT may be sufficiently accurate to be used as an easily calculated shorthand measure of clinical benefit. Our ROC analysis suggests that for identifying

Acknowledgments

M.D.K. is supported by the F. Norman Hughes Chair in Pharmacoeconomics, Faculty of Pharmacy, University of Toronto, and an Investigator Award from the Canadian Institutes for Health Research. G.N. is supported by the Mary Trimmer Chair in Geriatric Medicine Research, University of Toronto.

References (122)

  • A. Ortega et al. Cost-utility analysis of paclitaxel in combination with cisplatin for patients with advanced ovarian cancer. Gynecol Oncol (1997)
  • T. Schneider et al. Economic analysis of an immunosuppressive strategy in renal transplantation. Health Policy (1988)
  • J.C. Teran et al. Primary prophylaxis of variceal bleeding in cirrhosis: a cost-effectiveness analysis. Gastroenterology (1997)
  • J. Tsevat et al. Cost-effectiveness of captopril therapy after myocardial infarction. J Am Coll Cardiol (1995)
  • Early Breast Cancer Trialists' Collaborative Group. Systemic treatment of early breast cancer by hormonal, cytotoxic, or immune therapy: 133 randomised trials involving 31,000 recurrences and 24,000 deaths among 75,000 women. Lancet (1992)
  • A. Laupacis et al. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med (1988)
  • D.L. Sackett et al. Evidence-based medicine: how to practise and teach EBM (2000)
  • F.A. McAlister et al. Evidence-Based Medicine Working Group. Users' guides to the medical literature: XX. Integrating research evidence with the care of the individual patient. JAMA (2000)
  • C.D. Naylor et al. Measured enthusiasm: does the method of reporting trial results alter perceptions of therapeutic effectiveness? Ann Intern Med (1992)
  • A. Laupacis et al. Therapeutic priorities of Canadian internists. CMAJ (1990)
  • J. Nuovo et al. Reporting number needed to treat and absolute risk reduction in randomized controlled trials. JAMA (2002)
  • H.J. McQuay et al. Using numerical results from systematic reviews in clinical practice. Ann Intern Med (1997)
  • S.L. Sheridan et al. A randomized comparison of patients' understanding of number needed to treat and other common risk reduction formats. J Gen Intern Med (2003)
  • S.L. Sheridan et al. Numeracy and the medical student's ability to interpret data. Eff Clin Pract (2002)
  • R.J. Flaherty. A simple method for evaluating the clinical literature. Fam Pract Manag (2004)
  • G.K. Livingston et al. How useful are cholinesterase inhibitors in the treatment of Alzheimer's disease? A number needed to treat analysis. Int J Geriatr Psychiatry (2000)
  • NNT Tables [database on the Internet]. Toronto (ON): Centre for Evidence Based Medicine, University Health Network,...
  • R.J.S. Cook et al. The number needed to treat: a clinically useful measure of treatment effect. BMJ (1995)
  • G.W. Torrance et al. Utilities and quality-adjusted life years. Int J Technol Assess Health Care (1989)
  • M. McGregor. Cost-utility analysis: use QALYs only with great caution. CMAJ (2003)
  • P.J. Neumann et al. The quality of reporting in published cost-utility analyses, 1976–1997. Ann Intern Med (2000)
  • A. Elixhauser et al. Health care CBA/CEA: an update on the growth and composition of the literature. Med Care (1993)
  • A. Elixhauser et al. Health care CBA and CEA from 1991 to 1996: an updated bibliography. Med Care (1998)
  • Victorian Infant Collaborative Study Group. Economic outcome for intensive care of infants of birthweight 500–999 g born in Victoria in the post surfactant era. J Paediatr Child Health (1997)
  • C.L. Bennett et al. Cost-effective models for flutamide for prostate carcinoma patients: are they helpful to policy makers? Cancer (1996)
  • W.G. Bennett et al. Estimates of the cost-effectiveness of a single course of interferon-alpha 2b in patients with histologically mild chronic hepatitis C. Ann Intern Med (1997)
  • M.H. Boyle et al. Economic evaluation of neonatal intensive care of very-low-birth-weight infants. N Engl J Med (1983)
  • M.L. Brown et al. Adjuvant therapy for stage III colon cancer: economics returns to research and cost-effectiveness of treatment. J Natl Cancer Inst (1994)
  • G. Chouinard et al. Economic and health state utility determinations for schizophrenic patients treated with risperidone or haloperidol. J Clin Psychopharmacol (1997)
  • D.J. Cohen et al. Evaluating the potential cost-effectiveness of stenting as a treatment for symptomatic single-vessel coronary disease: use of a decision-analytic model. Circulation (1994)
  • J.L. Cronenwett et al. Cost-effectiveness of carotid endarterectomy in asymptomatic patients. J Vasc Surg (1997)
  • C.E. Desch et al. Should the elderly receive chemotherapy for node-negative breast cancer? A cost-effectiveness analysis examining total and active life-expectancy outcomes. J Clin Oncol (1993)
  • D.N. Rose et al. Cost effectiveness of isoniazid chemoprophylaxis
  • R.C. Eastman et al. Model of complications of NIDDM. II. Analysis of the health benefits and cost-effectiveness of treating NIDDM with the goal of normoglycemia. Diabetes Care (1997)
  • K. Fiscella et al. Cost-effectiveness of the transdermal nicotine patch as an adjunct to physicians' smoking cessation counseling. JAMA (1996)
  • B.F. Gage et al. Cost-effectiveness of warfarin and aspirin for prophylaxis of stroke in patients with nonvalvular atrial fibrillation. JAMA (1995)
  • D.E. Glotzer et al. Management of childhood lead poisoning: clinical impact and cost-effectiveness. Med Decis Making (1995)
  • L.M. Goodnough et al. Efficacy and cost-effectiveness of autologous blood predeposit in patients undergoing radical prostatectomy procedures. Urology (1994)
  • P.J. Goodwin et al. Cost-effectiveness of cancer chemotherapy: an economic evaluation of a randomized trial in small-cell lung cancer. J Clin Oncol (1988)
  • M.E. Goossens et al. Cognitive-educational treatment of fibromyalgia: a randomized clinical trial. II. Economic evaluation. J Rheumatol (1996)