Observer bias in randomized clinical trials with measurement scale outcomes: a systematic review of trials with both blinded and nonblinded assessors
=====================================================================================================================================================

* Asbjørn Hróbjartsson
* Ann Sofia Skou Thomsen
* Frida Emanuelsson
* Britta Tendal
* Jørgen Hilden
* Isabelle Boutron
* Philippe Ravaud
* Stig Brorson

## Abstract

**Background:** Clinical trials are commonly done without blinded outcome assessors despite the risk of bias. We wanted to evaluate the effect of nonblinded outcome assessment on estimated effects in randomized clinical trials with outcomes that involved subjective measurement scales.

**Methods:** We conducted a systematic review of randomized clinical trials with both blinded and nonblinded assessment of the same measurement scale outcome. We searched PubMed, EMBASE, PsycINFO, CINAHL, Cochrane Central Register of Controlled Trials, HighWire Press and Google Scholar for relevant studies. Two investigators agreed on the inclusion of trials and the outcome scale. For each trial, we calculated the difference in effect size (i.e., the standardized mean difference based on nonblinded assessments minus that based on blinded assessments). A difference in effect size of less than 0 suggested that nonblinded assessors generated more optimistic estimates of effect. We pooled the differences in effect size using inverse variance random-effects meta-analysis and used metaregression to identify potential reasons for variation.

**Results:** We included 24 trials in our review. The main meta-analysis included 16 trials (involving 2854 patients) with subjective outcomes. The estimated treatment effect was more beneficial when based on nonblinded assessors (pooled difference in effect size −0.23 [95% confidence interval (CI) −0.40 to −0.06]). In relative terms, nonblinded assessors exaggerated the pooled effect size by 68% (95% CI 14% to 230%). Heterogeneity was moderate (*I*² = 46%, *p* = 0.02) and unexplained by metaregression.

**Interpretation:** We provide empirical evidence for observer bias in randomized clinical trials with subjective measurement scale outcomes. A failure to blind assessors of outcomes in such trials results in a high risk of substantial bias.

A failure to blind assessors of outcomes in randomized clinical trials may result in bias. Observer bias, sometimes called “detection bias” or “ascertainment bias,” occurs when outcome assessments are systematically influenced by the assessors’ conscious or unconscious predispositions — for example, because of hope or expectations, often favouring the experimental intervention.1

Blinded outcome assessors are used in many trials to avoid such bias. However, the use of nonblinded assessors remains common,2–4 especially in nonpharmacological trials; for example, nonblinded outcome assessment was used in 90% of trials involving orthopedic traumatology3 and 74% of trials of progressive resistance strength training.4

Unfortunately, the empirical evidence on observer bias in randomized clinical trials has been incomplete. Meta-epidemiological studies have compared double-blind trials with similar trials that were not double-blind.5,6 However, such studies address blinding crudely because “double-blind” is an ambiguous term.3,7 Furthermore, the risk of confounding is considerable in indirect between-trial analyses, as “double-blind” trials may have better overall methods and larger sample sizes than trials that are not reported as “double-blind.”

A more reliable approach involves analyses of trials that use both blinded and nonblinded outcome assessors, because such a within-trial design provides a direct comparison between blinded and nonblinded assessments of the same outcome in the same patients. Our previous analysis of such trials with binary outcomes found substantial observer bias.8

Subjective measurement scales, such as illness severity scores, are popular outcomes but may be susceptible to observer bias. They are frequently used in clinical scenarios with no naturally distinct categories, and adjacent subcategories on a scale typically differ only slightly and in vaguely defined ways.

We decided to systematically review trials with both blinded and nonblinded assessment of outcomes using the same measurement scales. Our primary objective was to evaluate the impact of nonblinded outcome assessment on estimated treatment effects in randomized clinical trials. Our secondary objective was to examine reasons for variation in observer bias.

## Methods

### Eligibility criteria

We included randomized clinical trials with blinded and nonblinded assessment of the same measurement scale outcome. We excluded trials for which the distinction between the experimental and control groups was unclear, because such trials would not allow us to determine the direction of any bias; trials for which only a subgroup of patients were evaluated by blinded and nonblinded assessors, unless selected at random; trials in which blinded and nonblinded assessors had access to each other’s results; and trials in which initially blinded assessors became unblinded (e.g., when radiographs showed ceramic material indicative of the experimental intervention).

### Search strategy

We searched the following databases from their inception onwards without language restrictions: PubMed, EMBASE, PsycINFO, CINAHL, Cochrane Central Register of Controlled Trials, HighWire Press and Google Scholar. Our core search string was random* AND (“blind* and unblind*” OR “masked and unmasked”) with variations according to the specific database (Appendix 1, available at [www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1](http://www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1)). We performed the last search on Jan. 26, 2010. To identify further eligible studies, we read the reference lists of all included trials and asked the authors of all included trials whether they knew of additional trials.

### Data abstraction

One investigator read all abstracts from standard databases and all text fragments from full-text databases. If a study was identified as potentially eligible for inclusion, we retrieved a full study report, which was read by an investigator who excluded all clearly ineligible studies. Two investigators read all other study reports and decided on eligibility. Disagreements were resolved by discussion.

We selected a single measurement scale from each trial. If several outcomes had been assessed under both blinded and nonblinded conditions, we preferred the primary outcome of the trial and the first assessment after the end of treatment (unless the primary outcome prescribed a different time point). Two investigators selected the outcomes independently. Again, disagreements were resolved by discussion. For trials with more than 2 groups, we combined the results of multiple experimental groups (or of multiple control groups) into a single group.1

From each trial we extracted the following data: posttreatment mean, standard deviation and the numbers of patients in the experimental and control groups in the blinded assessments, and the corresponding data from the nonblinded assessments. For crossover and split-body trials, we extracted the standard deviation of the paired difference between treatments. If possible, we also extracted data on the correlation between blinded and nonblinded assessment (e.g., Spearman rank correlation coefficient) and data on interobserver variation between assessors (blinded or nonblinded).

If data were incomplete, we contacted the authors of the trial by email or telephone. We also searched the US Food and Drug Administration (FDA) website for trial outcome data. If standard deviations were not reported, we used standard deviations from a comparable trial that used the same measurement scale. If interobserver data were not available, we tried to obtain them from independent scale-validation studies.

For each trial, we evaluated 5 prespecified potential confounders in the comparison between blinded and nonblinded outcome assessments: a considerable time lapse between the 2 assessments, different types of assessors (e.g., nurses v. physicians), different assessment procedures (e.g., direct visual assessment of a wound v. a photograph of a wound), a substantial risk of ineffective blinding and different patients being assessed (i.e., some patients who had been evaluated blindly had not been evaluated nonblindly and vice versa). The first 4 items were evaluated by 2 investigators masked to any information relating to the comparison between blinded and nonblinded assessors. The masking was done by manipulating PDF versions of the trial reports so that tables, graphs or text describing the results of any comparison between blinded and nonblinded assessors were blanked out. There were no cases of accidental unmasking.

In addition, for each trial, we evaluated 3 characteristics of the outcomes that could possibly explain variations in observer bias. Two masked investigators independently evaluated the following 3 factors on a scale from 1 to 5 (1 = low, 5 = high): the degree of subjectivity of the outcome (i.e., the degree to which the assessors’ judgment affected the outcome; high in global assessment of patient improvement and low in reading a laboratory sheet); the nonblinded assessor’s overall involvement in the trial (i.e., a proxy for the degree of personal preference for a result favourable to the experimental intervention); and the vulnerability of the outcome to nonblinded patients (high in outcomes based on interviews with nonblinded patients and low in outcomes involving pure observation, such as the inspection of photographs). Disagreements were resolved by discussion.

### Statistical analysis

For each trial, we calculated the effect size (i.e., standardized mean difference) based on the blinded and nonblinded assessments using the pooled standard deviation of the blinded assessments as the common standardizing unit. An effect size of less than 0 suggests a beneficial effect of the experimental intervention. We subsequently summarized the impact of nonblinded outcome assessment as the difference between the 2 effect sizes. A difference in effect size of less than 0 suggests that the nonblinded assessments generate more optimistic estimates of effect than do the blinded assessments.
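The effect size calculation described above can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes a scale on which lower scores are better, so that a negative standardized mean difference favours the experimental intervention. It uses the plain standardized mean difference without a small-sample correction. The function names, dictionary keys and trial numbers are all hypothetical.

```python
import math


def pooled_sd(sd_e: float, n_e: int, sd_c: float, n_c: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2))


def smd(mean_e: float, mean_c: float, sd_unit: float) -> float:
    """Standardized mean difference (experimental minus control).

    Assumes lower scores are better, so a negative value favours the
    experimental intervention.
    """
    return (mean_e - mean_c) / sd_unit


def effect_size_difference(blinded: dict, nonblinded: dict) -> tuple:
    """Return (SMD_blinded, SMD_nonblinded, difference) for one trial.

    Both SMDs share one standardizing unit: the pooled SD of the
    blinded assessments. A negative difference means the nonblinded
    assessments look more optimistic.
    """
    unit = pooled_sd(blinded["sd_e"], blinded["n_e"], blinded["sd_c"], blinded["n_c"])
    d_b = smd(blinded["mean_e"], blinded["mean_c"], unit)
    d_nb = smd(nonblinded["mean_e"], nonblinded["mean_c"], unit)
    return d_b, d_nb, d_nb - d_b


# Hypothetical trial: same patients, scored by blinded and nonblinded assessors.
blinded = {"mean_e": 10.0, "sd_e": 4.0, "n_e": 50, "mean_c": 12.0, "sd_c": 4.0, "n_c": 50}
nonblinded = {"mean_e": 9.0, "sd_e": 4.5, "n_e": 50, "mean_c": 12.0, "sd_c": 4.2, "n_c": 50}
d_b, d_nb, diff = effect_size_difference(blinded, nonblinded)
print(round(d_b, 2), round(d_nb, 2), round(diff, 2))  # -0.5 -0.75 -0.25
```

Standardizing both estimates by the blinded SD keeps the nonblinded assessors' own variability out of the denominator. Any difference then reflects shifted means rather than a different scale unit.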

We pooled the differences in effect size from individual trials by meta-analysis using random-effects models and inverse variance weights.9 The standard error of the difference in effect size used for the main analysis disregarded the correlation between blinded and nonblinded assessments (Appendix 2, available at [www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1](http://www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1)).
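A minimal sketch of the pooling step follows, assuming the DerSimonian and Laird estimator of between-trial variance cited above. The function names are hypothetical. The handling of the correlation is also an assumption. With `r = 0`, the variance of the difference in effect size is the sum of the 2 variances, which is one way to disregard the correlation. Appendix 2 describes the approach actually used. The heterogeneity *p* value (a χ² test of Cochran's *Q*) is not computed here.

```python
import math


def se_of_difference(se_nb: float, se_b: float, r: float = 0.0) -> float:
    """SE of (SMD_nonblinded - SMD_blinded).

    r = 0 ignores the correlation between the two assessments; a positive
    r (an assumed correlation between the two estimates) shrinks the SE.
    """
    return math.sqrt(se_nb**2 + se_b**2 - 2.0 * r * se_nb * se_b)


def dersimonian_laird(y, se, z=1.96):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2."""
    k = len(y)
    w = [1.0 / s**2 for s in se]                              # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0      # between-trial variance
    w_re = [1.0 / (s**2 + tau2) for s in se]                  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0        # I^2 as a proportion
    return {"estimate": mu, "ci": (mu - z * se_mu, mu + z * se_mu),
            "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical differences in effect size and per-trial SEs of the two SMDs.
diffs = [-0.5, -0.1, -0.3, 0.05]
ses = [se_of_difference(a, b) for a, b in [(0.2, 0.2), (0.15, 0.15), (0.25, 0.22), (0.18, 0.2)]]
print(dersimonian_laird(diffs, ses))
```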

We tested the robustness of our main analysis with secondary analyses addressing the type of analysis (e.g., incorporating the correlation between blinded and nonblinded assessments), type of data, clinical condition, trial characteristics, risk of confounding and trial size. In addition, we examined the percentage by which the nonblinded effect estimate exceeded the blinded effect estimate (effect size difference/blinded effect size), approximating the confidence interval for the percentage according to Fieller.10
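Fieller's interval for a ratio can be sketched as below. This illustrates the method under stated assumptions and does not reproduce the published analysis. The function name is hypothetical. The standard errors are backed out from the rounded 95% CIs reported in the Results. The covariance between the pooled difference and the pooled blinded effect size is set to zero because it is not reported. The resulting interval will therefore not match the published 14% to 230%.

```python
import math


def fieller_ratio_ci(a, b, var_a, var_b, cov_ab=0.0, z=1.96):
    """Fieller confidence interval for the ratio a/b.

    Solves (a - t*b)^2 <= z^2 * (var_a - 2*t*cov_ab + t^2*var_b) for t,
    a quadratic inequality in t. The interval is bounded only when b is
    clearly nonzero, i.e. b^2 > z^2 * var_b.
    """
    qa = b * b - z * z * var_b
    qb = a * b - z * z * cov_ab
    qc = a * a - z * z * var_a
    if qa <= 0:
        raise ValueError("denominator not significantly different from 0; interval unbounded")
    root = math.sqrt(qb * qb - qa * qc)
    return a / b, (qb - root) / qa, (qb + root) / qa


# Illustrative inputs: pooled effect size difference (a) and pooled blinded
# effect size (b). SEs are approximated from the reported 95% CIs as
# (upper - lower) / (2 * 1.96); the covariance is assumed to be zero.
a, b = -0.23, -0.34
se_a = (-0.06 - -0.40) / (2 * 1.96)
se_b = (-0.14 - -0.55) / (2 * 1.96)
ratio, lo, hi = fieller_ratio_ci(a, b, se_a**2, se_b**2)
print(f"{ratio:.0%} ({lo:.0%} to {hi:.0%})")
```

Unlike a symmetric interval around the ratio, the Fieller interval is asymmetric and widens sharply as the denominator approaches zero. This is consistent with the wide upper limit reported for the percentage.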

Finally, we used univariable random-effects metaregression to determine whether variations in effect size differences were associated with the 3 prespecified outcome characteristics we described earlier.
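A univariable random-effects metaregression can be sketched in plain Python, for example regressing the difference in effect size on the 1 to 5 subjectivity score. The text does not state which estimator of residual between-trial variance was used. This sketch assumes a method-of-moments estimator and a normal approximation for the *p* value of the slope. All names and data are hypothetical.

```python
import math


def _wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x.

    Returns (b0, b1, var_b1, (sw, swx, swxx, det)), treating weights as known.
    """
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx**2
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swxx * swy - swx * swxy) / det
    return b0, b1, sw / det, (sw, swx, swxx, det)


def metaregression(x, y, se):
    """Univariable random-effects metaregression (method-of-moments tau^2)."""
    k = len(y)
    v = [s**2 for s in se]
    w = [1.0 / vi for vi in v]
    b0, b1, _, (sw, swx, swxx, det) = _wls(x, y, w)
    q_res = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # trace of P = W - W X (X'WX)^-1 X'W, written out for 2 coefficients
    s2 = sum(wi**2 for wi in w)
    s2x = sum(wi**2 * xi for wi, xi in zip(w, x))
    s2xx = sum(wi**2 * xi * xi for wi, xi in zip(w, x))
    trace_p = sw - (swxx * s2 - 2 * swx * s2x + sw * s2xx) / det
    tau2 = max(0.0, (q_res - (k - 2)) / trace_p)       # residual between-trial variance
    w_re = [1.0 / (vi + tau2) for vi in v]             # random-effects weights
    b0, b1, var_b1, _ = _wls(x, y, w_re)
    z = b1 / math.sqrt(var_b1)
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p value
    return {"intercept": b0, "slope": b1, "tau2": tau2, "p": p}


# Hypothetical subjectivity scores, effect size differences and SEs.
scores = [5, 4, 5, 3, 4, 5]
diffs = [-0.6, -0.2, -0.4, 0.0, -0.1, -0.3]
ses = [0.2, 0.15, 0.25, 0.18, 0.2, 0.22]
print(metaregression(scores, diffs, ses))
```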

## Results

We identified 537 publications from 1835 hits in standard databases and 2200 hits in full-text databases. We excluded 513 studies, mostly because they were not randomized clinical trials or because they lacked blinded or nonblinded outcome assessment (Figure 1). Thus, 24 trials were included in our qualitative synthesis.11–36

![Figure 1:](https://www.cmaj.ca/content/cmaj/185/4/E201/F1.medium.gif)


Figure 1: Flow diagram for identification of eligible trials. *A standard database (e.g., Medline) indexes publications that are searchable by title, keywords and abstract, but does not contain the full text of the publication; a full-text database (e.g., Google Scholar) indexes the searchable full text of publications. †Other reasons for exclusion include different interventions for patients assessed by blinded and nonblinded assessors, retrospective analysis of a risk factor, inclusion of the same patients as in another included trial, lack of clarity as to whether nonblinded clinicians formally assessed outcomes, or the use of blinded versus nonblinded assessment in only 1 arm of the trial.

Of these 24 trials, 16 (involving 2854 patients) provided outcome data for both the blinded and nonblinded assessors. The characteristics of the trials are described in Table 1. The clinical specialties represented were neurology, cosmetic surgery, cardiology, psychiatry, otolaryngology, dermatology, gynecology and infectious diseases.

[Table 1:](http://www.cmaj.ca/content/185/4/E201/T1) Characteristics of randomized clinical trials included in our meta-analysis

The outcomes of the trials were generally subjective; 13 of the 16 trials (81%) scored 4 or 5 on our scale of subjectivity (Table 2). The median Spearman rank correlation coefficient between blinded and nonblinded assessments in the 7 trials with such data was 0.67 (Appendix 3, available at [www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1](http://www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.120744/-/DC1)). We identified validation studies for scales used in 10 of the included trials, which generally reported good interobserver agreement (median weighted κ 0.64 [5 trials]; median intraclass correlation coefficient 0.82 [5 trials]) (Appendix 3).

[Table 2:](http://www.cmaj.ca/content/185/4/E201/T2) Characteristics of the outcome assessments in the trials included in our meta-analysis

In 10 trials (63%), the point estimate of the effect size was more optimistic when based on the nonblinded assessments (Figure 2). Among all 16 trials, the difference in effect size ranged from −1.10 to 0.14. The pooled difference in effect size was −0.23 (95% confidence interval [CI] −0.40 to −0.06), with moderate heterogeneity (*I*² = 46%, *p* = 0.02) (Figure 3). Thus, the estimated treatment effect based on the nonblinded assessments was exaggerated by about one-quarter of the standard deviation of the measurement scale used.

![Figure 2:](https://www.cmaj.ca/content/cmaj/185/4/E201/F2.medium.gif)


Figure 2: Estimated treatment effect as determined by blinded or nonblinded assessors of outcome. CI = confidence interval, SMD = standardized mean difference, US FDA = US Food and Drug Administration.

![Figure 3:](https://www.cmaj.ca/content/cmaj/185/4/E201/F3.medium.gif)


Figure 3: The effect of nonblinded assessors on estimated treatment effects in randomized clinical trials with subjective measurement scale outcomes. Weights were calculated using random-effects analysis. CI = confidence interval, SMD = standardized mean difference, US FDA = US Food and Drug Administration.

The pooled effect size based on the assessments of the blinded assessors was −0.34 (95% CI −0.55 to −0.14). Thus, the nonblinded assessors exaggerated the estimated effect size by about 68% (95% CI 14% to 230%) (i.e., −0.23/−0.34 = 0.68).

Our main result was robust, although CIs in our secondary analyses were wide (Table 3). One trial was free from all 5 prespecified potential confounders (effect size difference −0.22 [95% CI −0.61 to 0.16]).15 The difference in effect size did not appear to be influenced by any of the suspected confounders (Table 3) or by trial size (data not shown).

[Table 3:](http://www.cmaj.ca/content/185/4/E201/T3) Sensitivity and subgroup analyses

Eight trials (involving 980 patients) were included in our review but not in our main meta-analysis because of incomplete or inconsistent data. Qualitative information, or results from other similar trials, suggested notable observer bias in 3 of these trials and no or little bias in 2 trials (Appendix 3).

Using univariable metaregression, we found no statistically significant associations between differences in effect size and high scores for outcome subjectivity (*p* = 0.29), the degree to which the nonblinded assessors were involved in the trials (*p* = 0.64) or the vulnerability of the outcome to nonblinded patients (*p* = 0.80). However, the slope of the regression line between differences in effect size and scores for outcome subjectivity was in the expected direction, and the 13 trials with clearly subjective outcomes had a pooled effect size difference of −0.29 (95% CI −0.50 to −0.08), compared with −0.04 (95% CI −0.32 to 0.25) for the 3 trials with moderately subjective outcomes (data not shown).

## Interpretation

Nonblinded assessors of subjective measurement scale outcomes in randomized clinical trials tended to generate substantially biased effect sizes. Standardized mean differences were exaggerated by 0.23 standard deviations (95% CI 0.06 to 0.40) or, in relative terms, by 68% (95% CI 14% to 230%).

Observer bias can be perceived as the result of the interaction between observers’ predispositions and the subjectivity of the outcome. Predispositions are likely to differ substantially from observer to observer and from trial to trial. In some trials, conscientious nonblinded assessors may overcompensate for an expected bias in favour of the experimental intervention and paradoxically induce a bias favouring the control, whereas other trials will have fairly neutral assessors with no important bias. Thus, the degree of observer bias in trials with clearly predisposed outcome assessors is likely to be considerably higher than the mean we see here, which is based on all of the included trials. When determining the risk of bias attributable to nonblinded assessors in a randomized trial, we suggest being mindful of the range of observer bias we have found, and not only the pooled mean.

Based largely on convention, standardized mean differences of −0.2, −0.5 and −0.8 are considered small, medium and large effects, respectively.37 By such standards, our result constitutes a small to moderate difference. However, it seems inappropriate to interpret a degree of bias in the same way as we would interpret a treatment effect. The relevant question is how much bias can be expected when using a nonblinded assessor, not whether that degree of bias represents a clinically worthwhile effect. If the true treatment effect is large, with a standardized mean difference of −0.8, the average degree of observer bias when using nonblinded assessors (−0.23) would exaggerate the treatment effect estimate by 29%. This percentage increases to 115% if the true effect is small (i.e., a standardized mean difference of −0.2). In the 16 trials we analyzed, the pooled estimated treatment effect was exaggerated by 68% (95% CI 14% to 230%) when based on data from nonblinded assessors. Thus, we interpret our result as evidence for a substantial degree of observer bias.
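The relative exaggeration figures above come from dividing the additive bias by the true effect, both expressed as standardized mean differences. The following is a minimal sketch. The function name is hypothetical, and the code assumes that observer bias simply adds to the true effect.

```python
def relative_exaggeration(bias_smd: float, true_smd: float) -> float:
    """Proportional exaggeration of a treatment effect by an additive bias.

    Both arguments are standardized mean differences on the same scale
    (negative = benefit), so the ratio is positive when the bias points
    in the same direction as the true effect.
    """
    if true_smd == 0:
        raise ValueError("relative exaggeration is undefined for a null effect")
    return bias_smd / true_smd


# Pooled observer bias of -0.23 against large, observed and small true effects.
for true_effect in (-0.8, -0.34, -0.2):
    print(f"true SMD {true_effect}: {relative_exaggeration(-0.23, true_effect):.0%}")
```

Because the bias is fixed in standard deviation units, the same absolute bias produces a larger relative distortion as the true effect shrinks.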

In a Cochrane review of the effect of progressive resistance strength training, Liu and colleagues compared pooled standardized mean differences in a subgroup of 54 randomized trials using nonblinded assessors (−0.88 [95% CI −0.99 to −0.77]) with that of 19 trials using blinded assessors (−0.23 [95% CI −0.34 to −0.13]).4,38 The result of this indirect comparison is within the range of our findings. Meta-epidemiological studies of trials with binary outcomes have reported inconsistent estimates of the effect of a lack of double-blinding.5 However, our result is consistent with that of Savovic and colleagues,6 and with our previous study of observer bias in trials with binary outcomes.8

It may be tempting to use measures of interobserver agreement (e.g., weighted κ, intraclass correlation coefficients) as surrogate markers for risk of observer bias. Similarly, training nonblinded observers to reduce interobserver variation39 could be seen as an appealing alternative to blinding in a situation where blinding is challenging. However, good interobserver agreement does not prevent observer bias. For example, the trial with the largest degree of observer bias11 used a scale reported to have an intraclass correlation coefficient as high as 0.87.40

Some researchers consider the blinding of outcome assessors too resource-demanding, superfluous, or misconceived.41,42 However, planning and running a randomized clinical trial is already a logistically demanding undertaking, and the comparatively minor additional investment of using blinded outcome assessors reduces the risk of bias considerably. Blinding outcome assessors is possible in most trials.43,44

### Limitations

The trials we included in our analysis are contemporary and represent a variety of clinical specialties, and their design implies a low risk of confounding. However, these trials are not representative of medical trials in general. We included no trials with clearly objective measurement scale outcomes, such as nonrepeatable automated laboratory measures. The included trials had subjective outcomes, and our results apply only to similar trials. Furthermore, extrapolating our results to all trials with subjective measurement scale outcomes assumes that trials with both blinded and nonblinded assessors are comparable with trials with only nonblinded assessors.

Our preplanned main analysis disregarded the correlation between blinded and nonblinded assessments, so its confidence interval may be somewhat too wide. The correlation was available for only 7 trials, but secondary analyses incorporating it provided results similar to those of the main analysis.

Because searching for trials with both blinded and nonblinded assessors is challenging, some such studies may not have been identified by our literature search. However, it is unclear whether such trials would report substantially different results. Publication bias is normally driven by the effect of a treatment45 and may have a limited, yet unpredictable, effect on our comparison between types of assessments.

### Conclusion

We provide empirical evidence for observer bias in randomized clinical trials with subjective measurement scale outcomes. Failure to blind outcome assessors in such trials results in a high risk of substantial bias.

## Acknowledgements

The authors thank the following trial authors for sharing unpublished outcome data: Peggy Vandervoort, George C. Ebers, Daniel Burkhoff, Cheryl Iglesia, Borwin Bandelow and Dina S. Reddihough, and Frances S. Weaver and the US Department of Veterans Affairs (VA) Cooperative Study Program, as well as the VA CSP study #468 “A comparison of best medical therapy and deep brain stimulation of subthalamic nucleus and globus pallidus for the treatment of Parkinson’s disease.” The authors also thank Peter C. Gøtzsche and Andreas Lundh for valuable comments on previous versions of the manuscript.

## Footnotes

*   **Competing interests:** Frida Emanuelsson and Ann Sofia Skou Thomsen have received grants from the Danish Council for Independent Research. No other competing interests were declared.

*   This article has been peer reviewed.

*   **Contributors:** Asbjørn Hróbjartsson conceived the idea and design of the study, organized the study and wrote the first draft of the manuscript. Ann Thomsen and Asbjørn Hróbjartsson developed the search strategy. Ann Thomsen, Frida Emanuelsson, Britta Tendal, Stig Brorson and Asbjørn Hróbjartsson did the nonmasked data collection. Isabelle Boutron, Philippe Ravaud, Stig Brorson, Britta Tendal and Asbjørn Hróbjartsson did the masked data collection. Asbjørn Hróbjartsson and Jørgen Hilden did the statistical analyses. All of the authors revised the manuscript for important intellectual content and approved the final version submitted for publication.

*   **Funding:** The study was partially funded by the Danish Council for Independent Research: Medical Sciences. The funder had no influence on the study’s design, the collection, analysis, and interpretation of data, or the writing of the article and the decision to submit it for publication.

## References

1.  Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Oxford (UK): The Cochrane Collaboration; 2011.

2.  Haahr MT, Hróbjartsson A. Who is blind in randomised clinical trials? An analysis of 200 trials and a survey of authors. Clin Trials 2006;3:360–5.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1177/1740774506069153&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=17060210&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000242185400003&link_type=ISI) 

3.  Poolman RW, Struijs PA, Krips R, et al. Reporting of outcomes in orthopaedic randomized trials: Does blinding of outcome assessors matter? J Bone Joint Surg Am 2007;89:550–8.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.2106/JBJS.F.00683&link_type=DOI) 

4.  Liu CJ, LaValley M, Latham NK. Do unblinded assessors bias muscle strength outcomes in randomized controlled trials of progressive resistance strength training in older adults? Am J Phys Med Rehabil 2011;90:190–6.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1097/PHM.0b013e31820174b3&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=21173683&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 

5.  Pildal J, Hróbjartsson A, Jørgensen KJ, et al. Impact of allocation concealment on conclusions drawn from meta-analyses of randomised trials. Int J Epidemiol 2007;36:847–57.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1093/ije/dym087&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=17517809&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000250050300028&link_type=ISI) 

6.  Savovic J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012;157:429–38.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.7326/0003-4819-157-6-201209180-00537&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=22945832&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000308912800017&link_type=ISI) 

7.  Devereaux PJ, Manns BJ, Ghali WA, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 2001;285:2000–3.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1001/jama.285.15.2000&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=11308438&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000167995900030&link_type=ISI) 

8.  Hróbjartsson A, Thomsen AS, Emanuelsson F, et al. Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ 2012;344:e1119.
    

9.  DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177–88.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1016/0197-2456(86)90046-2&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=3802833&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=A1986F013900001&link_type=ISI) 

10. Fieller EC. Some problems in interval estimation. J R Stat Soc [Ser B] 1954;16:175–85.
    
    

11. Cohen SR, Holmes RE. Artecoll: a long-lasting injectable wrinkle filler material: report of a controlled, randomized, multicenter clinical trial of 251 subjects. Plast Reconstr Surg 2004;114:964–76, discussion 977–9.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1097/01.PRS.0000133169.16467.5F&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=15468406&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 

12. Cohen SR. FDA summary of safety and effectiveness data. Silver Spring (MD): US Food and Drug Administration; 2004. Available: [www.accessdata.fda.gov/cdrh_docs/pdf2/P020012b.pdf](http://www.accessdata.fda.gov/cdrh_docs/pdf2/P020012b.pdf) (accessed 2011 Aug. 5).
    
    

13. Oesterle SN, Sanborn TA, Ali N, et al. Percutaneous transmyocardial laser revascularisation for severe angina: the PACIFIC randomised trial. Potential class improvement from intramyocardial channels. Lancet 2000;356:1705–10.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1016/S0140-6736(00)03203-7&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=11095257&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000165462300008&link_type=ISI) 

14. Powell NB, Zonato AI, Weaver EM, et al. Radiofrequency treatment of turbinate hypertrophy in subjects using continuous positive airway pressure: a randomized, double-blind, placebo-controlled clinical pilot trial. Laryngoscope 2001;111:1783–90.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1097/00005537-200110000-00023&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=11801946&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000171422900023&link_type=ISI) 

15. Burkhoff D, Schmidt S, Schulman SP, et al. Transmyocardial laser revascularisation compared with continued medical therapy for treatment of refractory angina pectoris: a prospective randomised trial. ATLANTIC Investigators. Angina treatments — lasers and normal therapies in comparison. Lancet 1999;354: 885–90.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.1016/S0140-6736(99)08113-1&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=10489946&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000082511800009&link_type=ISI) 

16. Wedekind D, Broocks A, Weiss N, et al. A randomized, controlled trial of aerobic exercise in combination with paroxetine in the treatment of panic disorder. World J Biol Psychiatry 2010;11:904–13.
    
    [CrossRef](http://www.cmaj.ca/lookup/external-ref?access_num=10.3109/15622975.2010.489620&link_type=DOI) 
    
    [PubMed](http://www.cmaj.ca/lookup/external-ref?access_num=20602575&link_type=MED&atom=%2Fcmaj%2F185%2F4%2FE201.atom) 
    
    [Web of Science](http://www.cmaj.ca/lookup/external-ref?access_num=000282191400008&link_type=ISI) 

17. Weaver FM, Follett K, Stern M, et al.; CSP468 Study Group. Bilateral deep brain stimulation vs best medical therapy for patients with advanced Parkinson disease: a randomized controlled trial. JAMA 2009;301:63–73.

18. Noseworthy JH, Vandervoort MK, Penman M, et al. Cyclophosphamide and plasma exchange in multiple sclerosis. Lancet 1991;337:1540–1.

19. Narins RS, Coleman W, Donofrio L, et al. Nonanimal sourced hyaluronic acid–based dermal filler using a cohesive polydensified matrix technology is superior to bovine collagen in the correction of moderate to severe nasolabial folds: results from a 6-month, randomized, blinded, controlled, multicenter study. Dermatol Surg 2010;36:730–40.

20. Ulm G, Schüler P. Cabergolin versus pergolid: a video-blinded, randomised multicenter cross-over study. Akt Neurol 1999;26:360–5.

21. Meltzer HY, Alphs L, Green AI, et al. Clozapine treatment for suicidality in schizophrenia: International Suicide Prevention Trial (InterSePT). Arch Gen Psychiatry 2003;60:82–91.

22. Meltzer HY. FDA statistical review and evaluation. Silver Spring (MD): US Food and Drug Administration; 2003. Available: [www.fda.gov/ohrms/dockets/ac/02/briefing/3908B1\_02_E-%20Statistical%20Review.pdf](http://www.fda.gov/ohrms/dockets/ac/02/briefing/3908B1_02_E-%20Statistical%20Review.pdf) (accessed 2013 Jan. 8).

23. Miller RS, Steward DL, Tami TA, et al. The clinical effects of hyaluronic acid ester nasal dressing (Merogel) on intranasal wound healing after functional endoscopic sinus surgery. Otolaryngol Head Neck Surg 2003;128:862–9.

24. Taber LH, Knight V, Gilbert BE, et al. Ribavirin aerosol treatment of bronchiolitis associated with respiratory syncytial virus infection in infants. Pediatrics 1983;72:613–8.

25. US Food and Drug Administration. FDA summary of safety and effectiveness data: Hylaform. Silver Spring (MD): The Administration; 2004. Available: [www.accessdata.fda.gov/cdrh_docs/pdf3/P030032b.pdf](http://www.accessdata.fda.gov/cdrh_docs/pdf3/P030032b.pdf) (accessed 2011 Aug. 5).

26. Landsman AS, Robbins AH, Angelini PF, et al. Treatment of mild, moderate, and severe onychomycosis using 870- and 930-nm light exposure. J Am Podiatr Med Assoc 2010;100:166–77.

27. Iglesia CB, Sokol AI, Sokol ER, et al. Vaginal mesh for prolapse: a randomized controlled trial. Obstet Gynecol 2010;116:293–303.

28. Reddihough DS, King JA, Coleman GJ, et al. Functional outcome of botulinum toxin A injections to the lower limbs in cerebral palsy. Dev Med Child Neurol 2002;44:820–7.

29. Baumann LS, Shamban AT, Lupo MP, et al.; JUVEDERM vs. ZYPLAST Nasolabial Fold Study Group. Comparison of smooth-gel hyaluronic acid dermal fillers with cross-linked bovine collagen: a multicenter, double-masked, randomized, within-subject study. Dermatol Surg 2007;33(Suppl 2):S128–35.

30. Purdue GF, Hunt JL, Still JM Jr., et al. A multicenter clinical trial of a biosynthetic skin replacement, Dermagraft-TC, compared with cryopreserved human cadaver skin for temporary coverage of excised burn wounds. J Burn Care Rehabil 1997;18:52–7.

31. Herberger K, Franzke N, Blome C, et al. Efficacy, tolerability and patient benefit of ultrasound-assisted wound treatment versus surgical debridement: a randomized clinical study. Dermatology 2011;222:244–9.

32. Alam M, Pon K, Van Laborde S, et al. Clinical effect of a single pulsed dye laser treatment of fresh surgical scars: randomized controlled trial. Dermatol Surg 2006;32:21–5.

33. Ash K, Lord J, Zukowski M, et al. Comparison of topical therapy for striae alba (20% glycolic acid/0.05% tretinoin versus 20% glycolic acid/10% l-ascorbic acid). Dermatol Surg 1998;24:849–56.

34. Realmuto GM, Erickson WD, Yellin AM, et al. Clinical comparison of thiothixene and thioridazine in schizophrenic adolescents. Am J Psychiatry 1984;141:440–2.

35. Havel CJ Jr., Strait RT, Hennes H. A clinical trial of propofol vs. midazolam for procedural sedation in a pediatric emergency department. Acad Emerg Med 1999;6:989–97.

36. Kadish A, Nademanee K, Volosin K, et al. A randomized controlled trial evaluating the safety and efficacy of cardiac contractility modulation in advanced heart failure. Am Heart J 2011;161:329–337.e1–2.

37. Cohen J. Statistical power analysis for the behavioral sciences. New York (NY): Academic Press; 1977.

38. Liu CJ, Latham NK. Progressive resistance strength training for improving physical function in older adults. Cochrane Database Syst Rev 2009;(3):CD002759.

39. Brorson S, Bagger J, Sylvest A, et al. Improved interobserver variation after training of doctors in the Neer system. A randomised trial [published erratum in *J Bone Joint Surg Br* 2003;85:153]. J Bone Joint Surg Br 2002;84:950–4.

40. Lemperle G, Holmes RE, Cohen SR, Lemperle SM. A classification of facial wrinkles. Plast Reconstr Surg 2001;108:1735–50; discussion 1751–2.

41. Dodd DC. Blind slide reading or the uninformed versus the informed pathologist. Comments Toxicol 1988;2:88–91.

42. Burkhardt JE, Ennulat D, Pandher K, et al. Topic of histopathology blinding in nonclinical safety biomarker qualification studies. Toxicol Pathol 2010;38:666–7.

43. Boutron I, Guittet L, Estellat C, et al. Reporting methods of blinding in randomized controlled trials assessing non-pharmacological treatments. A systematic review. PLoS Med 2007;4:e61.

44. Karanicolas PJ, Bhandari M, Walter SD, et al.; Collaboration for Outcomes Assessment in Surgical Trials (COAST) Musculoskeletal Group. Radiographs of hip fractures were digitally altered to mask surgeons to the type of implant without compromising the reliability of quality ratings or making the rating process more difficult. J Clin Epidemiol 2009;62:214–223.e1.

45. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640–5.