From the Department of Medicine, University of British Columbia, Vancouver, BC (Hatala); Durham Veterans Affairs Medical Center and Duke University Medical Center, Durham, NC (Keitz); the Columbia University College of Physicians and Surgeons, New York, NY (Wyer); and the Departments of Medicine and of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ont. (Guyatt)
Members of the Evidence-Based Medicine Teaching Tips Working Group: Peter C. Wyer (project director), College of Physicians and Surgeons, Columbia University, New York, NY; Deborah Cook, Gordon Guyatt (general editor), Ted Haines, Roman Jaeschke, McMaster University, Hamilton, Ont.; Rose Hatala (internal review coordinator), University of British Columbia, Vancouver, BC; Robert Hayward (editor, online version), Bruce Fisher, University of Alberta, Edmonton, Alta.; Sheri Keitz (field test coordinator), Durham Veterans Affairs Medical Center and Duke University Medical Center, Durham, NC; Alexandra Barratt, University of Sydney, Sydney, Australia; Pamela Charney, Albert Einstein College of Medicine, Bronx, NY; Antonio L. Dans, University of the Philippines College of Medicine, Manila, The Philippines; Barnet Eskin, Morristown Memorial Hospital, Morristown, NJ; Jennifer Kleinbart, Emory University School of Medicine, Atlanta, Ga.; Hui Lee, formerly Group Health Centre, Sault Ste. Marie, Ont. (deceased); Rosanne Leipzig, Thomas McGinn, Mount Sinai Medical Center, New York, NY; Victor M. Montori, Mayo Clinic College of Medicine, Rochester, Minn.; Virginia Moyer, University of Texas, Houston, Tex.; Thomas B. Newman, University of California, San Francisco, San Francisco, Calif.; Jim Nishikawa, University of Ottawa, Ottawa, Ont.; Kameshwar Prasad, Arabian Gulf University, Manama, Bahrain; W. Scott Richardson, Wright State University, Dayton, Ohio; Mark C. Wilson, University of Iowa, Iowa City, Iowa
Correspondence to: Dr. Peter C. Wyer, 446 Pelhamdale Ave., Pelham NY 10804; fax 914 738-9368; pwyer{at}att.net
Clinicians wishing to quickly answer a clinical question may seek a systematic review, rather than searching for primary articles. Such a review is also called a meta-analysis when the investigators have used statistical techniques to combine results across studies. Databases useful for this purpose include the Cochrane Library (www.thecochranelibrary.com) and the ACP Journal Club (www.acpjc.org; use the search term "review"), both of which are available through personal or institutional subscription. Clinicians can use systematic reviews to guide clinical practice if they are able to understand and interpret the results.
Systematic reviews differ from traditional reviews in that they are usually confined to a single focused question, which serves as the basis for systematic searching, selection and critical evaluation of the relevant research.1 Authors of systematic reviews use explicit methods to minimize bias and consider using statistical techniques to combine the results of individual studies. When appropriate, such pooling allows a more precise estimate of the magnitude of benefit or harm of a therapy. It may also increase the applicability of the result to a broader range of patient populations.
Clinicians encountering a meta-analysis frequently find the pooling process mysterious. Specifically, they wonder how authors decide whether the ranges of patients, interventions and outcomes are too broad to sensibly pool the results of the primary studies.
In this article we present an approach to evaluating potentially important differences in the results of individual studies being considered for a meta-analysis. These differences are frequently referred to as heterogeneity.1 Our discussion focuses on the qualitative, rather than the statistical, assessment of heterogeneity (see Box 1).
Two concepts are commonly implied in the assessment of heterogeneity. The first is an assessment for heterogeneity within 4 key elements of the design of the original studies: the patients, interventions, outcomes and methods. This assessment bears on the question of whether pooling the results is at all sensible. The second concept relates to assessing heterogeneity among the results of the original studies. Even if the study designs are similar, the researchers must decide whether it is useful to combine the primary studies' results. Our discussion assumes a basic familiarity with how investigators present the magnitude2,3 and precision4 of treatment effects in individual randomized trials.
The tips in this article are adapted from approaches developed by educators with experience in teaching evidence-based medicine skills to clinicians.1,5,6 A related article, intended for people who teach these concepts to clinicians, is available online at www.cmaj.ca/cgi/content/full/172/5/661/DC1.
Clinician learners' objectives
Tip 1: Qualitative assessment of the design of primary studies
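Consider the following 3 hypothetical systematic reviews: first, a review of all therapies for all cancers; second, a review of antibiotics for acute exacerbations of obstructive lung disease; and third, a review of tPA for myocardial infarction. For which of these systematic reviews does it make sense to combine the primary studies?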
Most clinicians would instinctively reject the first of these proposed reviews as overly broad but would be comfortable with the idea of combining the results of trials relevant to the third question. What about the second review? What aspects of the primary studies must be similar to justify combining their results in this systematic review?
Table 1 lists features that would be relevant to the question considered in the second review and categorizes them according to the 4 key elements of study design: the patients, interventions, outcomes and methods of the primary studies. Combining results is appropriate when the biology is such that across the range of patients, interventions, outcomes and study methods, one can anticipate more or less the same magnitude of treatment effect.
In other words, the judgement as to whether the primary studies are similar enough to be combined in a systematic review is based on whether the underlying pathophysiology would predict a similar treatment effect across the range of patients, interventions, outcomes and study methods of the primary studies. If you think back to the first systematic review (all therapies for all cancers), you probably recognize that there is significant variability in the pathophysiology of different cancers ("patients" in Table 1) and in the mechanisms of action of different cancer therapies ("interventions" in Table 1).
If you were inclined to reject pooling the results of the studies to be considered in the second systematic review, you might have reasoned that we would expect substantially different effects with different antibiotics, different infecting agents or different underlying lung pathology. If you were inclined to accept pooling of results in this review, you might argue that the antibiotics used in the different studies are all effective against the most common organisms underlying pulmonary exacerbations. You might also assert that the biology of an acute exacerbation of an obstructive lung disease (e.g., inflammation) is similar, despite variability in the underlying pathology. In other words, we would expect more or less the same effect across agents and across patients.
Finally, you probably accepted the validity of pooling results for the third systematic review (tPA for myocardial infarction) because you consider that the mechanism of myocardial infarction is relatively constant across a broad range of patients.
Tip 2: Qualitative assessment of the results of primary studies
You should now understand that combining the results of different studies is sensible only when we expect more or less the same magnitude of treatment effects across the range of patients, interventions and outcomes that the investigators have included in their systematic review. However, even when we are confident of the similarity in design among the individual studies, we may still wonder whether the results of the studies should be pooled. The following graphic demonstration shows how to qualitatively assess the results of the primary studies to decide if meta-analysis (i.e., statistical pooling) is appropriate. You can find discussions of quantitative, or statistical, approaches to the assessment of heterogeneity elsewhere (see Box 1 or Higgins and associates9).
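For readers curious about the statistical side mentioned above, here is a minimal sketch of 2 commonly used heterogeneity statistics, Cochran's Q and I². It is an aside rather than part of the qualitative approach in this article. The function name and the numbers are hypothetical, and we assume inverse-variance weights and effect estimates that are roughly normally distributed on the scale being analysed.

```python
def cochran_q_and_i2(estimates, standard_errors):
    """Return (Q, I2 in percent) for a set of study estimates.

    Assumes inverse-variance weights; this is an illustrative sketch,
    not the method of any particular review.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I2: share of total variation attributed to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Invented estimates (0 = no difference) with equal standard errors
q_disparate, i2_disparate = cochran_q_and_i2([-0.4, -0.3, 0.3, 0.4], [0.05] * 4)
q_identical, i2_identical = cochran_q_and_i2([-0.3, -0.3, -0.3, -0.3], [0.05] * 4)
print(q_disparate, i2_disparate)
print(q_identical, i2_identical)
```

A large I² for the disparate set and a value of 0 for the identical set mirror what the eye picks up in the graphic demonstration that follows.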
Consider the results of the studies in 2 hypothetical systematic reviews (Fig. 1A and Fig. 1B). The central vertical line, labelled "no difference," represents a treatment effect of 0. This is equivalent to a risk ratio (relative risk) of 1 or to an absolute or relative risk reduction of 0, measures that are described elsewhere.2 Values to the left of the "no difference" line indicate that the treatment is superior to the control, whereas those to the right of the line indicate that the control is superior to the treatment. For each of the 4 studies represented in the figures, the dot represents the point estimate of the treatment effect (the value observed in the study), and the horizontal line represents the confidence interval around that observed effect. For which systematic review does it make sense to combine results? Decide on the answer to this question before you read on.
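As a brief aside on the "no difference" line, the following sketch shows why a relative risk of 1 and a risk reduction of 0 describe the same result. The function name and the trial counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return (relative risk, absolute risk reduction, relative risk reduction)."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    absolute_risk_reduction = risk_control - risk_treated
    relative_risk_reduction = 1 - relative_risk
    return relative_risk, absolute_risk_reduction, relative_risk_reduction


# Invented trial with the same event rate in both arms: "no difference"
no_difference = risk_measures(20, 100, 20, 100)

# Invented trial in which the treatment halves the event rate
halved = risk_measures(10, 100, 20, 100)

print(no_difference)
print(halved)
```

In the first trial the relative risk is 1 and both risk reductions are 0, which is the position of the central line in the figures.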
You have probably concluded that pooling is appropriate for the studies represented in Fig. 1B but not for those represented in Fig. 1A. Can you explain why? Is it because the point estimates for the studies in Fig. 1A lie on opposite sides of the "no difference" line, whereas those for the studies in Fig. 1B lie on the same side of the "no difference" line?
Before you answer this question, consider the studies represented in Fig. 2. Here, the point estimates of 2 studies are on the "favours new treatment" side of the "no difference" line, and the point estimates of 2 other studies are on the "favours control" side. However, all 4 point estimates are very close to the "no difference" line, and, in this case, investigators doing a systematic review will be satisfied that it is appropriate to pool the results. Therefore, it is not the position of the point estimates relative to the "no difference" line that determines the appropriateness of pooling.
There are 2 criteria for not combining the results of studies in a meta-analysis: highly disparate point estimates and confidence intervals with little overlap, both of which are exemplified by Fig. 1A. When pooling is appropriate on the basis of these criteria, where is the best estimate of the underlying magnitude of effect likely to be? Look again at Fig. 1B and make a guess. Now look at Fig. 3.
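The 2 criteria above can be sketched in a few lines of code. This is only one simple way of putting numbers on a judgement that the article treats as visual and qualitative: we report the spread of the point estimates and look for a region shared by every confidence interval. The data, names and scale are all assumptions made for illustration, and no cut-off values are implied.

```python
# Hypothetical data: (point estimate, (CI lower, CI upper)) on a scale where
# 0 means "no difference" and negative values favour the new treatment.
# These numbers are invented; they are not read from Fig. 1A or Fig. 1B.
DISPARATE = [
    (-0.40, (-0.55, -0.25)),
    (-0.30, (-0.45, -0.15)),
    (0.30, (0.15, 0.45)),
    (0.40, (0.25, 0.55)),
]
SIMILAR = [
    (-0.35, (-0.55, -0.15)),
    (-0.30, (-0.50, -0.10)),
    (-0.25, (-0.45, -0.05)),
    (-0.20, (-0.40, 0.00)),
]


def estimate_spread(studies):
    """Distance between the smallest and largest point estimates."""
    estimates = [estimate for estimate, _ in studies]
    return max(estimates) - min(estimates)


def shared_interval(studies):
    """Region covered by every confidence interval, or None if there is none."""
    lower = max(ci[0] for _, ci in studies)
    upper = min(ci[1] for _, ci in studies)
    return (lower, upper) if lower <= upper else None


for label, studies in (("disparate", DISPARATE), ("similar", SIMILAR)):
    print(label, round(estimate_spread(studies), 2), shared_interval(studies))
```

The first set has widely separated point estimates and no region common to all 4 intervals, whereas the second set has closely grouped estimates and a shared region, which is the pattern that supports pooling.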
The pooled estimate at the bottom of Fig. 3 is centred on the midpoint of the area of overlap of the confidence intervals around the estimates of the individual trials. It provides our best guess as to the underlying treatment effect. Of course, we cannot actually know the "truth" and must be content with potentially misleading estimates. The intent of a meta-analysis is to include enough studies to narrow the confidence interval around the resulting pooled estimate sufficiently to provide estimates of benefit for our patients in which we can be confident. Thus, our best estimate of the truth will lie in the area of overlap among the confidence intervals around the point estimates of treatment effect presented in the primary studies.
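To make the idea of a pooled estimate concrete, here is a minimal sketch of one common pooling technique, fixed-effect inverse-variance weighting. The article does not say which method produced Fig. 3, so this choice is an assumption, as are the function name and the invented study data. We also assume 95% confidence intervals that are symmetric on the scale being analysed (ratio measures would normally be pooled on a log scale).

```python
import math

Z_95 = 1.96  # normal quantile used for 95% confidence intervals


def pool_fixed_effect(studies):
    """Pool (estimate, ci_lower, ci_upper) tuples by inverse-variance weighting.

    Returns (pooled estimate, pooled CI lower, pooled CI upper).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for estimate, lower, upper in studies:
        # Recover the standard error from the width of the 95% CI
        standard_error = (upper - lower) / (2 * Z_95)
        weight = 1 / standard_error ** 2
        total_weight += weight
        weighted_sum += weight * estimate
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled - Z_95 * pooled_se, pooled + Z_95 * pooled_se


# Invented studies with similar results (0 = no difference)
studies = [
    (-0.35, -0.55, -0.15),
    (-0.30, -0.50, -0.10),
    (-0.25, -0.45, -0.05),
    (-0.20, -0.40, 0.00),
]
pooled, pooled_lower, pooled_upper = pool_fixed_effect(studies)
print(round(pooled, 3), round(pooled_lower, 3), round(pooled_upper, 3))
```

With these invented numbers the pooled confidence interval is narrower than that of any single study and falls within the region where the individual intervals overlap, which is the behaviour described above.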
What is the clinician to do when presented with results such as those in Fig. 1A? If the investigators have done a good job of planning and executing the meta-analysis, they will provide some assistance.6 Before examining the study results in detail, they will have generated a priori hypotheses to explain the heterogeneity in magnitude of effect across studies that they are liable to encounter. These hypotheses will include differences in patients (effects may be larger in sicker patients), in interventions (larger doses may result in larger effects), in outcomes (longer follow-up may diminish the magnitude of effect) and in study design (methodologically weaker studies may generate larger effects).
The investigators will then have examined the extent to which these hypotheses can explain the differences in magnitude of effect across studies. These subgroup analyses may be misleading, but if they meet 7 criteria suggested elsewhere10 (see Box 2), they may provide credible and satisfying explanations for the variability in results.
Conclusions
Understanding the concept of heterogeneity in a systematic review or meta-analysis is central to a full appreciation of the implications of such reviews for clinical practice. We have presented 2 tips aimed at helping clinical readers overcome commonly encountered difficulties in understanding this concept.
Footnotes
This article has been peer reviewed.
Contributors: Rose Hatala modified the original ideas for tips 1 and 2, drafted the manuscript, coordinated input from reviewers and field-testing, and revised all drafts. Sheri Keitz used all of the tips as part of a live teaching exercise and submitted comments, suggestions and the possible variations that are described in the article. Peter Wyer reviewed and revised the final draft of the manuscript to achieve uniform adherence with format specifications. Gordon Guyatt developed the original ideas for tips 1 and 2, reviewed the manuscript at all phases of development, contributed to the writing as a coauthor, and, as general editor, reviewed and revised the final draft of the manuscript to achieve accuracy and consistency of content.
Competing interests: None declared.
References