In the last decade, guidelines for the presentation of statistical results in medical journals have emphasized confidence intervals (CIs) as an adjunct to, or even a replacement for, statistical tests and p values. Because of the intimate links between the 2 concepts, authors now use statements like “the 95% CI overlaps 0” where they would formerly have stated “the difference is not statistically significant at the 5% level.” Although this interchangeability is technically correct in 1-sample situations, it does not carry over fully to comparisons involving 2 samples. A frequently encountered misconception is that if 2 independent 95% CIs overlap each other, as they do in Fig. 1, then a statistical test of the difference will not be statistically significant at the 5% level.
Why is this not necessarily so? Consider the means in 2 independent groups, mean_{A} and mean_{B}, with, for simplicity, mean_{A} being the smaller of the 2. The 95% CI for the mean in group A is approximately given by mean_{A} plus or minus twice the standard error of the mean for that group, SE_{A}, and correspondingly for group B. A mathematical check for whether these CIs overlap is given by adding the distance 2SE_{A} (from mean_{A} to the upper bound of its CI) to the distance 2SE_{B} (from the lower bound of the CI for group B to mean_{B}) and comparing this sum with the distance between the 2 means, that is, mean_{B} minus mean_{A} (Fig. 2). The CIs overlap when
$$\mathrm{mean}_{B} - \mathrm{mean}_{A} < 2\,\mathrm{SE}_{A} + 2\,\mathrm{SE}_{B}.$$
But overlapping confidence intervals do not demonstrate that group means are not statistically significantly different from each other. In a 2-sample t-test to compare 2 means, significance is attained at the 0.05 level if the t statistic exceeds the critical value of about 2, which occurs when the difference between the means exceeds twice its standard error, namely, if
$$\mathrm{mean}_{B} - \mathrm{mean}_{A} > 2\sqrt{\mathrm{SE}_{A}^{2} + \mathrm{SE}_{B}^{2}}.$$
This standard error reflects the fact that the standard error of a difference involves summing the standard error of each estimate, but doing so by “adding in quadrature,” that is,
$$\mathrm{SE}_{\mathrm{difference}} = \sqrt{\mathrm{SE}_{A}^{2} + \mathrm{SE}_{B}^{2}}.$$
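Thus, to evaluate the overlap of 2 95% CIs and to determine whether at the same time the difference between the means is significant at the 0.05 level, the following rough rule can be used: the CIs overlap and yet the difference is significant when
$$2\sqrt{\mathrm{SE}_{A}^{2} + \mathrm{SE}_{B}^{2}} < \mathrm{mean}_{B} - \mathrm{mean}_{A} < 2\,(\mathrm{SE}_{A} + \mathrm{SE}_{B}).$$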
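If SE_{A} and SE_{B} are equal, with common value SE, the condition is as follows:
$$2\sqrt{2}\,\mathrm{SE} \approx 2.83\,\mathrm{SE} < \mathrm{mean}_{B} - \mathrm{mean}_{A} < 4\,\mathrm{SE}.$$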
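When one SE is 25% larger than the other, the boundaries are 3.2 and 4.5 times the smaller SE. Because the lower boundary remains close to 3 times the SE, Moses (reference 1) was prompted to display group means with error bars extending 1.5 SE on either side of the mean. Two such bars fail to overlap when the means differ by more than 1.5(SE_{A} + SE_{B}), which is 3 SE when the SEs are equal, so the display provides a “by eye” test of significance between the 2 group means while still presenting the information for the 2 groups separately.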
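The arithmetic behind these boundaries can be checked with a short Python sketch. It is an illustration only, not part of the original analysis: the function names and the example means are hypothetical, and it uses the same rounded critical value of 2 (rather than 1.96 or a t quantile) that the text uses.

```python
import math

# Approximate critical value used in the text (1.96 rounded to 2).
Z = 2.0


def overlap_threshold(se_a, se_b, z=Z):
    """Difference in means below which the two approximate 95% CIs overlap."""
    return z * (se_a + se_b)


def significance_threshold(se_a, se_b, z=Z):
    """Difference in means above which the 2-sample test is significant
    at about the 0.05 level; the SEs are added in quadrature."""
    return z * math.hypot(se_a, se_b)


def classify(mean_a, mean_b, se_a, se_b, z=Z):
    """Report whether the CIs overlap and whether the difference is significant."""
    diff = abs(mean_b - mean_a)
    return {
        "cis_overlap": diff < overlap_threshold(se_a, se_b, z),
        "significant": diff > significance_threshold(se_a, se_b, z),
    }


# Boundaries in units of the smaller SE.
print(f"{significance_threshold(1.0, 1.0):.2f} {overlap_threshold(1.0, 1.0):.2f}")    # 2.83 4.00
print(f"{significance_threshold(1.0, 1.25):.2f} {overlap_threshold(1.0, 1.25):.2f}")  # 3.20 4.50

# Hypothetical example: means 3.5 SE apart with equal SEs.
# The CIs overlap, yet the difference is significant.
print(classify(10.0, 13.5, 1.0, 1.0))
```

Any difference that falls between the two thresholds is a case in which the intervals overlap even though the test of the difference is significant at about the 5% level.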
Footnotes

This article has been peer reviewed.
Contributors: Both authors independently conceived of the material for this article. Both were involved in writing the article, and both have seen and approved the final version.
Competing interests: None declared.
Reference
 1.