Abstract
Background: It is unclear whether participation in a randomized controlled trial (RCT), irrespective of assigned treatment, is harmful or beneficial to participants. We compared outcomes for patients with the same diagnoses who did (“insiders”) and did not (“outsiders”) enter RCTs, without regard to the specific therapies received for their respective diagnoses.
Methods: By searching the MEDLINE (1966–2010), Embase (1980–2010), CENTRAL (1960–2010) and PsycINFO (1880–2010) databases, we identified 147 studies that reported the health outcomes of “insiders” and a group of parallel or consecutive “outsiders” within the same time period. We prepared a narrative review and, as appropriate, meta-analyses of patients’ outcomes.
Results: We found no clinically or statistically significant differences in outcomes between “insiders” and “outsiders” in the 23 studies in which the experimental intervention was ineffective (standardized mean difference in continuous outcomes −0.03, 95% confidence interval [CI] −0.1 to 0.04) or in the 7 studies in which the experimental intervention was effective and was received by both “insiders” and “outsiders” (standardized mean difference 0.04, 95% CI −0.04 to 0.13). However, in 9 studies in which an effective intervention was received only by “insiders,” the “outsiders” experienced significantly worse health outcomes (standardized mean difference −0.36, 95% CI −0.61 to −0.12).
Interpretation: We found no evidence to support clinically important overall harm or benefit arising from participation in RCTs. This conclusion refutes earlier claims that trial participants are at increased risk of harm.
When people are asked to participate in a randomized controlled trial (RCT), it is natural for them to ask several questions in return. How safe are these treatments? How many extra visits and tests must I undergo? Will the researchers keep my family doctor informed about what’s going on? What outcomes are to be measured, and do they include ones that are of interest to me as a patient?
These multiple questions can be summarized as follows: Would I fare better being treated within the trial (as an “insider”) or in routine clinical care outside it (as an “outsider”)? Patients may ask this question in 1 of 2 ways. The first is highly specific: “Am I better off receiving this specific treatment as an insider or as an outsider?” Alternatively, they might ask a more general question: “Am I better off having my illness managed, regardless of the specific treatment I would receive, as an insider or as an outsider?” These questions are highly appropriate, and both deserve to be asked and answered,1,2 especially given that nonsystematic reviews have suggested a possible “inclusion benefit” from participating in trials.3
These 2 specific patient questions are analogous to those posed by researchers asking whether treatments do more good than harm when applied under “ideal” circumstances (in explanatory trials) or in the “real world” of routine health care (in pragmatic trials). Vist and colleagues answered the explanatory question when their earlier review4 found no advantage or disadvantage from receiving the same treatment inside or outside an RCT. Left unanswered, however, was the broader, more pragmatic question. In our experience, trial participants are often offered new, as-yet-untested treatments that would not be available to them outside the trial. This review looks at the dilemma faced by these patients, which needs to be addressed before general conclusions can be drawn about trial safety.
Methods
Data sources and searches
We searched the following databases: MEDLINE (1966 to November 2010), Embase (1980 to November 2010), Cochrane Central Register of Controlled Trials (CENTRAL; 1960 to last quarter of 2010) and PsycINFO (1880 to November 2010). The search strategy for each database is available upon request to the corresponding author. Studies were eligible for inclusion if they reported the same set of outcomes for “insiders” and “outsiders,” either simultaneously or within 2 months, where “insiders” were patients with a particular diagnosis who entered an RCT (whether treated with the intervention or a comparator) and “outsiders” were patients with the same diagnosis who did not enter the RCT. To validate our search, we compared our yield with the list of articles reviewed by Vist and colleagues.4
Study selection
Working in pairs, we reviewed the resulting titles and abstracts to screen for eligibility. Two reviewers independently screened the full text of eligible articles, with an independent third adjudicator resolving disagreements. Agreement was summarized with a weighted kappa coefficient.
Data extraction
Our primary outcome was mortality, and secondary outcomes included patient-reported or other clinically important outcomes. We calculated the relative risk (RR), unless count data were not reported, in which case we extracted the authors’ RR. We used adjusted RRs whenever they were reported.5 When RRs could not be calculated, we assumed that the reported odds ratios (ORs) approximated the RR for low event-rate outcomes.
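The assumption that an OR approximates the RR when events are rare can be illustrated with a short sketch. The counts below are hypothetical and are not taken from any included study; the function names are ours, chosen for illustration only.

```python
def relative_risk(events_in, total_in, events_out, total_out):
    """Risk among insiders divided by risk among outsiders."""
    return (events_in / total_in) / (events_out / total_out)


def odds_ratio(events_in, total_in, events_out, total_out):
    """Odds among insiders divided by odds among outsiders."""
    odds_in = events_in / (total_in - events_in)
    odds_out = events_out / (total_out - events_out)
    return odds_in / odds_out


# Hypothetical counts (events, group size), not from any included study.
rare = (10, 1000, 20, 1000)      # event rates of 1% and 2%
common = (400, 1000, 600, 1000)  # event rates of 40% and 60%

for label, counts in (("rare", rare), ("common", common)):
    rr = relative_risk(*counts)
    or_ = odds_ratio(*counts)
    print(f"{label}: RR = {rr:.3f}, OR = {or_:.3f}")
```

With the rare-event counts the two measures nearly coincide (RR 0.500, OR 0.495), whereas with the common-event counts they diverge (RR 0.667, OR 0.444), which is why the approximation was restricted to low event-rate outcomes.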
For continuous outcomes, we extracted mean between-group differences and their standard deviations. We created rules for calculating missing outcomes using various statistical measures that were reported (Table 1).
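As an illustration of how a standardized mean difference can be derived from extracted means and standard deviations, the following sketch computes Hedges' adjusted g with a common large-sample standard error. This is an assumption on our part: the text does not state which estimator or variance formula was applied, and the input values are hypothetical.

```python
import math


def hedges_g(mean_in, sd_in, n_in, mean_out, sd_out, n_out):
    """Return (g, standard error) for insiders versus outsiders.

    Uses the pooled standard deviation, the usual small-sample
    correction factor, and a common large-sample variance approximation.
    """
    df = n_in + n_out - 2
    pooled_sd = math.sqrt(
        ((n_in - 1) * sd_in**2 + (n_out - 1) * sd_out**2) / df
    )
    d = (mean_in - mean_out) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # small-sample correction
    g = correction * d
    var_d = (n_in + n_out) / (n_in * n_out) + d**2 / (2 * (n_in + n_out))
    se_g = correction * math.sqrt(var_d)
    return g, se_g


def confidence_interval(estimate, se, z=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    return estimate - z * se, estimate + z * se


# Hypothetical summary data, for illustration only.
g, se = hedges_g(10.0, 2.0, 50, 9.0, 2.0, 50)
low, high = confidence_interval(g, se)
print(f"SMD = {g:.2f} (95% CI {low:.2f} to {high:.2f})")
```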
Table 1: Assumptions and imputations used to calculate data if missing from published report
Prespecified causes of heterogeneity
We used the I² statistic to measure the extent of heterogeneity between studies, where I² values of 25%, 50% and 75% indicated low, medium and high heterogeneity, respectively.6 In addition, we constructed a priori hypotheses to potentially explain between-study heterogeneity, based on differences in types of outcomes, methodologic quality, types of care provided, potential for detection bias (due to differential follow-up or use of better diagnostic tools), potential for exclusion bias (if patients were excluded after enrolment because of characteristics related to outcome), potential for selection bias (due to imbalance of baseline characteristics), medical specialty and treatments provided.
In particular, we proposed 6 subgroups to explain observed heterogeneity due to treatment effect:
when the randomized experimental intervention given to “insiders” was effective (i.e., the outcome was statistically significantly superior to the comparator), and “outsiders” received that same intervention or comparator
when the randomized experimental intervention was effective, and “outsiders” received that same effective intervention only (without the comparator that was provided within the RCT)
when the randomized experimental intervention was effective, and “outsiders” received the less effective comparator intervention only (without the experimental intervention provided within the RCT)
when the randomized experimental intervention was effective, and “outsiders” received a different intervention (this subgroup acted as a positive control for the current analysis, since we anticipated better outcomes in the RCT group)
when the randomized experimental and comparator interventions generated equivalent outcomes, with no further subdivision of this group (because any differences in outcomes between those treated inside and outside the RCT could be attributed to a trial effect)
when insufficient information was provided about the effectiveness of the treatment in the trial and/or insufficient details were provided about the interventions received by “outsiders”
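The heterogeneity statistic used to compare these subgroupings can be sketched as follows. The code computes Cochran's Q from inverse-variance weights and converts it to the percentage of variability beyond chance, which is the standard definition of this statistic; the effect estimates and standard errors shown are hypothetical.

```python
def heterogeneity(effects, std_errors):
    """Return (Q, I-squared as a percentage) for a set of study estimates."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return q, 0.0
    # Negative values are truncated at zero.
    i_squared = max(0.0, (q - df) / q) * 100
    return q, i_squared


# Hypothetical standardized mean differences and standard errors.
effects = [0.10, -0.05, 0.30, 0.00]
std_errors = [0.10, 0.12, 0.15, 0.08]
q, i2 = heterogeneity(effects, std_errors)
print(f"Q = {q:.2f}, I-squared = {i2:.0f}%")
```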
Data synthesis and analysis
Statistical calculations were performed with SPSS (version 20).7 Forest plots and funnel plots were created using Review Manager (version 5.1).8 When event counts were available, we used the Mantel–Haenszel method to estimate overall RR.9 If a study had a zero event rate in one group, we added a 0.5 correction to all cells. If only estimates of effect size and standard errors were provided, we used the generic inverse-variance meta-analysis function of Review Manager 5.1. We used the random-effects model to summarize outcomes.9
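The pooling steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of Review Manager's internals: the zero-event rule follows the text (0.5 added to all four cells when one group has no events), the Mantel–Haenszel formula is the standard one for risk ratios, and the random-effects step uses the DerSimonian–Laird estimator, which is a common choice that the text does not name explicitly. All function names and inputs are hypothetical.

```python
def corrected_cells(events_1, total_1, events_2, total_2):
    """Return the 2x2 cells (a, b, c, d), adding 0.5 to every cell
    when either group has zero events."""
    a, b = events_1, total_1 - events_1
    c, d = events_2, total_2 - events_2
    if a == 0 or c == 0:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return a, b, c, d


def mantel_haenszel_rr(studies):
    """Pooled risk ratio from (events_1, total_1, events_2, total_2) tuples."""
    numerator = denominator = 0.0
    for study in studies:
        a, b, c, d = corrected_cells(*study)
        n1, n2 = a + b, c + d
        n = n1 + n2
        numerator += a * n2 / n
        denominator += c * n1 / n
    return numerator / denominator


def dersimonian_laird(effects, std_errors):
    """Random-effects pooled estimate from effects and standard errors
    (generic inverse-variance input). Returns (pooled, se, tau_squared)."""
    w = [1 / se**2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, (1 / sum(w_re)) ** 0.5, tau2


# Hypothetical mortality counts: (deaths, n) for insiders, then outsiders.
studies = [(10, 100, 20, 100), (0, 10, 5, 10)]
print(f"Mantel-Haenszel RR = {mantel_haenszel_rr(studies):.2f}")
```

For ratio measures supplied only as estimates with standard errors, the inverse-variance function would be applied on the log scale and the result exponentiated.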
We first separated the studies into 2 groups according to whether randomization was applied in determining whether potential participants would be “insiders” or “outsiders.” Next, we separated studies by type of outcome: continuous or dichotomous, with the latter being further subdivided as nonmortality or mortality.
We created a funnel plot and conducted a sensitivity analysis to determine the stability of our conclusions.
Results
Summary of evidence
Following elimination of duplicate records and exclusions on the basis of initial screening and full-text review, 147 articles met our eligibility criteria and provided sufficient information to be included in our analysis (Figure 1).10–156 Details for the 576 articles excluded after full-text review, including reasons for exclusion, are available upon request. The eligibility of the remaining 74 articles was uncertain, and they were not included in the analysis.
For full-text screening, the average weighted kappa for eligibility was 0.68. In the data-extraction phase, raw agreement between reviewers on outcomes was 83%.
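For readers unfamiliar with these agreement measures, the following sketch shows one way a weighted kappa and raw agreement can be computed for two reviewers. The linear weighting scheme and the three ordered eligibility categories are assumptions made for illustration; the text does not specify the weights or rating scale that were used, and the ratings below are hypothetical.

```python
def raw_agreement(ratings_a, ratings_b):
    """Proportion of items on which the two reviewers gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def weighted_kappa(ratings_a, ratings_b, categories):
    """Weighted kappa with linear disagreement weights.

    `categories` lists the ordered rating levels (assumed scale).
    """
    k = len(categories)
    index = {cat: i for i, cat in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions for each pair of ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement grows linearly with the distance between categories.
        return abs(i - j) / (k - 1)

    observed_dis = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_dis = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    return 1 - observed_dis / expected_dis


# Hypothetical eligibility ratings from two reviewers.
levels = ["exclude", "unsure", "include"]
reviewer_1 = ["include", "exclude", "unsure", "include", "exclude", "include"]
reviewer_2 = ["include", "exclude", "include", "include", "unsure", "include"]
print(f"raw agreement = {raw_agreement(reviewer_1, reviewer_2):.2f}")
print(f"weighted kappa = {weighted_kappa(reviewer_1, reviewer_2, levels):.2f}")
```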
In 5 of the 147 eligible studies, patients were randomly assigned to become “insiders” and “outsiders.”38,41,86,87,141 In the remaining 142 studies, patients became part of the “outsiders” group for a variety of reasons. Table 2 presents the details about each included study.
Characteristics of included studies
We analyzed a total of 48 continuous outcomes and 99 dichotomous outcomes; of the dichotomous outcomes, 74 were nonmortality outcomes, 4 were recurring outcomes (such as relapse rates), and 21 were mortality outcomes.
Risk of bias
Sources of risk of bias are detailed by individual study in Appendix 1 (available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.131693/-/DC1). In terms of detection bias, about two-thirds of the studies (n = 100) employed identical follow-up strategies for “insiders” and “outsiders.” In terms of exclusion bias affecting “insiders,” 67 studies had no exclusions, 1 study employed a deliberate but appropriate exclusion, and 74 studies inappropriately excluded “insiders” unequally between treatment groups; for the remaining 5 studies, the details were unclear. Forest plots based on subgroups created for each of these sources of bias did not change the results described below.
Replication of earlier studies
As a method of calibrating our search strategies and statistical methods, we carried out analyses of our dataset that were restricted to “insiders” and “outsiders” receiving identical treatments. These restricted analyses replicated the results of previous studies by Vist and colleagues4 and Gross and associates.157
Outcomes for studies with participants not randomized as “insiders” or “outsiders”
Our initial pooled analyses revealed a high degree of between-study heterogeneity (p < 0.001, I² = 84% for studies with dichotomous mortality outcomes; p < 0.001, I² = 70% for studies with dichotomous nonmortality outcomes; p < 0.001, I² = 88% for studies with continuous outcomes). In total, mortality was determined for 53 714 “insiders” and 25 817 “outsiders” (see Table 3 and Appendix 2, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.131693/-/DC1). Dichotomous nonmortality outcomes were reported for 30 253 “insiders” and 30 000 “outsiders” (see Table 4 and Appendix 3, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.131693/-/DC1). We present the results for nonrandomized continuous outcomes and for randomized comparisons according to treatment effect, using the subgrouping that left the least residual heterogeneity. All other forest plots are available upon request.
Table 3: Summary of meta-analyses for studies with mortality as an outcome, without randomization of potential participants as “insiders” v. “outsiders” (subgroups based on effectiveness of trial treatment)
Table 4: Summary of meta-analyses for studies with dichotomous nonmortality outcomes, without randomization of potential participants as “insiders” v. “outsiders” (subgroups based on effectiveness of trial treatment)
Results for clinically relevant subgroups
The results for continuous outcomes are summarized by subgroup in Table 5 (see also Appendix 4, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.131693/-/DC1).
There were 7 studies in which the randomized experimental intervention given to “insiders” (n = 6626) was effective, and “outsiders” (n = 2293) received that same intervention or the comparator. The heterogeneity was low to moderate (p = 0.2, I² = 37%), and the pooled result indicated neither significant harm nor significant benefit attributable to being an “insider” or an “outsider” (standardized mean difference 0.04, 95% confidence interval [CI] −0.04 to 0.13).
There were 3 studies in which the randomized experimental intervention (given to 1391 “insiders”) was effective, and the 5072 “outsiders” received only that same effective intervention. In this subgroup, there was a high degree of heterogeneity (p < 0.001, I² = 95%).
There were 4 studies in which the randomized experimental intervention was effective, and “outsiders” received only the less effective comparator. In these studies, the 5794 “insiders” (those assigned to receive the active intervention or comparator) experienced a positive effect of the intervention, but the 9035 “outsiders” were offered only the less effective comparator. In this subgroup, there was also a high degree of heterogeneity (p = 0.01, I² = 74%).
There were 9 studies in which the randomized experimental intervention had a positive effect inside the RCT, but “outsiders” received a completely different intervention or comparator. For these studies, results could be pooled for the 649 “insiders” and 188 “outsiders” (standardized mean difference −0.36, 95% CI −0.61 to −0.12; heterogeneity p = 0.08, I² = 43%). In this subgroup, “insiders” fared statistically significantly better than “outsiders.”
The largest subgroup consisted of 23 studies in which the randomized experimental and comparator interventions generated equivalent outcomes. In this subgroup, the 5940 “insiders” and 11 927 “outsiders” were given both treatments, only the control or only the experimental treatment, or completely different interventions. Heterogeneity among these studies was low to moderate (p = 0.10, I² = 29%). The pooled result revealed neither net harm nor net benefit for “insiders” compared with “outsiders” (standardized mean difference −0.03, 95% CI −0.1 to 0.04).
For the final subgroup of 2 studies, it was unclear whether there was a treatment effect or which interventions the “outsiders” received. We requested additional information from the study authors but, as of the date of publication, we were still awaiting this clarification.
Outcomes for studies with participants randomized as “insiders” or “outsiders”
In 5 studies, potential participants were randomly assigned to become “insiders” or “outsiders.” One of these studies used a continuous outcome, with no reported difference between the 180 “insiders” and 97 “outsiders” (95% CI −0.22 to 0.27). The remaining 4 studies reported dichotomous nonmortality outcomes, with a moderate degree of heterogeneity (p = 0.06, I² = 60%). Their overall pooled effect indicated neither harm nor benefit when patients were treated inside or outside a trial (RR 0.94, 95% CI 0.56 to 1.57).
Additional analyses
Our investigation into publication bias, based on the funnel plot, showed a lack of smaller studies (both positive and negative) among the included studies. Because the included studies were distributed symmetrically around the pooled estimate, we are confident that our estimates are valid.
Our sensitivity analysis confirmed the robust nature of our imputations. Removing the studies with imputed outcomes had no significant effect on our results. Similarly, the results were not affected by clinical specialty.
Interpretation
Our study has confirmed the earlier findings of Vist and colleagues4 and Gross and associates,157 who reported that when trial participants (“insiders”) and nonparticipants (“outsiders”) receive the same treatments, they experience similar outcomes. As such, there is neither a “trial advantage” nor a “guinea pig disadvantage” of participating in an RCT. Furthermore, we have shown that even when “insiders” and “outsiders” are offered different interventions, there is no disadvantage to trial participation.
Our findings do not support the theory of “inclusion benefits,” “protocol effects” or “care effects” proposed by other authors.3,158 We found no differences in outcomes that could be attributed to health care workers providing additional care to “insiders,” the setting in which “insiders” were treated or the closer follow-up and attention that “insiders” receive. Had there been better care because physicians were following strict study protocol, a difference would have been detected between the groups for whom treatments were identical and would have been amplified within the subgroup of studies in which detection bias and expertise bias were most probable.
As expected, our subanalysis of “insiders” and “outsiders” who received the same treatments confirmed the results of the Vist and Gross reviews.4,157 However, we suggest that their insistence on identical interventions for patients inside and outside of an RCT answered only a narrow, explanatory question. For our review, we posed a more pragmatic question: Will patients fare better being treated within a trial (as “insiders”) or in routine clinical care outside it (as “outsiders”), regardless of the treatment received? In other words, will they be “sacrificial guinea pigs,” or, conversely, will they enjoy an “inclusion benefit”? Or will they fare the same inside the RCT or outside it? Our pragmatic study supports the last of these options, that patients will, in general, fare just as well regardless of whether they are “insiders” or “outsiders.”
Stiller159 reported a beneficial effect on mortality for “insiders.” However, that conclusion was based on simply counting the number of studies in which “insiders” had lower mortality than “outsiders,” ignoring the size of each study. As such, smaller studies (which are more prone to type II error) were weighted the same as much larger studies. Our random-effects meta-analysis took into account the size and weight of each study, and we found no such benefit from trial participation.
Limitations
Although 68% of the studies included here employed identical follow-up protocols for both “insiders” and “outsiders,” some studies did not explicitly state whether “outsiders” included all eligible patients or only those for whom data could be obtained. If “outsiders” are more likely to become lost to follow-up, in part because they have died or suffered other adverse events, true trial advantages might be missed.
Conclusion
We found no evidence to support either clinically important harm or clinically important benefit when patients’ illnesses were managed inside or outside an RCT. These results can inform discussions between clinicians and the patients to whom they are offering entry into peer-reviewed, ethically conducted RCTs. These results are also relevant to the policies, procedures and actions of institutions, ethics committees and granting agencies that permit and support the execution of RCTs.
Our findings and conclusions are only as good as the publication base of relevant RCTs, and we look forward to the day when the proposals of Vickers160 and Altman and Cates161 are fully realized, with all trials registered and reported and with raw trial data made readily available. When that day arrives, our study should be repeated to determine the validity of the conclusions reached here.
Acknowledgments
The authors would like to thank Dr. David Sackett for initiating this project. His insight and guidance throughout development of the manuscript were invaluable resources.
Footnotes
Competing interests: John Riva has received an NCMIC Foundation grant for work unrelated to the study reported here. He is also a board member of the Ontario Chiropractic Association. Lyndsay Somerville receives salary support (through her institution) from Smith & Nephew Canada. No other competing interests declared.
This article has been peer reviewed.
Contributors: Neera Bhatnagar designed and carried out the search. Natasha Fernandes, Dianne Bryant, Mohamed El-Rabbany, Nisha Fernandes, Crystal Kean, Jacquelyn Marsh, Siddhi Mathur, Rebecca Moyer, Clare Reade, John Riva and Lyndsay Somerville chose the included studies and extracted data. Natasha Fernandes analyzed the data with supervision from Lauren Griffith. Natasha Fernandes wrote the primary draft of the protocol and manuscript, and all other authors edited and further developed these components. All authors approved the final version. Dianne Bryant supervised this project. All authors agree to act as guarantors of this paper.
Funding: This study was supported by an internal grant from the University of Western Ontario to Dianne Bryant; no external funding was received. Natasha Fernandes was supported by McMaster University, the Canadian Institutes of Health Research Frederick Banting and Charles Best Canadian Graduate Scholarship and an Ontario Graduate Scholarship.
Data sharing: The dataset is available from the corresponding author.