STATISTICS
Multiple comparisons

https://doi.org/10.1016/j.cacc.2006.03.005

Abstract

This article discusses the sometimes vexed issues relating to multiple comparisons and multiple endpoints, which are commonplace in reports of biomedical research. Some pragmatic guidance is offered on minimizing the problem and on the procedures available.

Introduction

Multiple comparison testing is defined as the process of testing more than one hypothesis, difference or effect on the same set of data. If we go on testing long enough, we can expect to find something that is significant. Examples of multiple comparisons include multiple endpoints, multiple groups or subgroup analyses, and repeated measurements taken over time. In this article, for simplicity, we discuss these issues in relation to P-values only. However, it is important to remember that similar considerations apply to the adjustment of confidence intervals.


The problem

If a trial involves three or more groups, obviously there may be a need to compare these groups. Methods such as one-way analysis of variance (ANOVA) will examine the question: Is there any significant difference among the groups? This, however, does not address the issue of which groups are significantly different when compared pair-wise. If, say, you have four groups A, B, C and D, then the maximum number of possible comparisons is six: A→B, A→C, A→D, B→C, B→D, C→D.

If we have m groups, then the maximum number of pair-wise comparisons is m(m−1)/2.
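The count of pair-wise comparisons can be enumerated directly. The following is a minimal sketch in Python; the helper name `n_pairwise` is ours, not the article's:

```python
from itertools import combinations

def n_pairwise(m):
    """Maximum number of pair-wise comparisons among m groups: m*(m-1)/2."""
    return m * (m - 1) // 2

groups = ["A", "B", "C", "D"]
pairs = list(combinations(groups, 2))  # the six comparisons listed above
print(pairs)
print(n_pairwise(len(groups)))  # 6
```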

Adjusting the P-value

In general, when comparing several groups, one should only investigate differences among individual groups when the overall analysis is significant, unless certain comparisons were intended in advance.1 For example, if we have four groups, which gives us six (4×3/2) possible comparisons, then the probability that we will have at least one false positive significant result is 1 − 0.95^6 ≈ 0.26. Table 1 shows that as the number of tests increases, so also does the risk of a type I error. One approach to controlling this risk is to adjust the P-value for the number of comparisons made.
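The growth of this false positive risk with the number of tests can be sketched as follows (illustrative Python; the figures it prints are computed from the formula above, not copied from Table 1):

```python
def fwer(m, alpha=0.05):
    """Probability of at least one false positive across m independent
    tests, each performed at significance level alpha: 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m

for m in (1, 3, 6, 10, 20):
    print(f"{m:>2} tests: risk of at least one false positive = {fwer(m):.2f}")
```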

Controlling type I error rate

The challenge for any MCP is to balance the conflicting requirements of reducing the risk of rejecting a true null hypothesis whilst maintaining the likelihood that an experimental effect is detected. In general, the emphasis on the issue of multiple comparisons is heavily biased towards controlling type I error rate. MCPs can be classified on the basis of control of the type I error rate into two categories—weak and strong.

  • Weak: these control the type I error rate only when all null hypotheses are true (the complete null hypothesis).
  • Strong: these control the type I error rate under any combination of true and false null hypotheses.

One-step (simultaneous inference) procedures

These are often based on the Bonferroni method,2 which guarantees complete control of the error rate. The formula is Pb = mP, where m is the number of hypotheses tested, P is the value resulting from the statistical procedure used to test the hypothesis, and Pb is the adjusted or Bonferroni-corrected value. A minor limitation of this method is that the corrected Pb value can exceed 1.0, thus returning implausible estimates. To overcome this, the Šidák2 inequality was described, which adjusts the P-value as 1 − (1 − P)^m and can never exceed 1.0.
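A minimal sketch of these two one-step adjustments in Python (the function names are ours; the formulas are those given above):

```python
def bonferroni(p, m):
    """Bonferroni-adjusted P-value Pb = m * P, capped at 1.0 so the
    corrected value stays a plausible probability."""
    return min(m * p, 1.0)

def sidak(p, m):
    """Sidak-adjusted P-value 1 - (1 - P)^m, which never exceeds 1.0."""
    return 1 - (1 - p) ** m

p, m = 0.02, 6
print(bonferroni(p, m))       # 0.12
print(round(sidak(p, m), 4))  # 0.1142, slightly less conservative
```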

Step-wise (sequentially rejective) procedures

Here the P-values resulting from testing the hypotheses are placed in rank order and the corrections are progressively more or less severe, depending on whether the ranking is in descending (step-up) or ascending (step-down) order. Because of the simplicity of the Holm step-down procedures, they are regarded as all-purpose MCPs for biomedical investigators.2 Their advantage is the combination of simplicity, accuracy, power and versatility.
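The Holm step-down procedure described above can be sketched as follows (a simplified illustration in Python; the function name and return convention are ours):

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: rank the P-values in ascending order and
    test the k-th smallest (k = 0, 1, ...) against alpha / (m - k).
    Stop at the first non-significant test; all larger P-values are
    retained as well. Returns a reject/retain flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # step-down stops: retain this and all larger P-values
    return reject

print(holm([0.001, 0.04, 0.03]))  # [True, False, False]
```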

Discussion

The issue of MCPs is important because type I errors may result in promoting ineffective treatments, whilst type II errors can lead to effective treatments being discarded. The choice between controlling the type I and the type II error rate depends upon whether it is more costly to allow false positive or false negative results. It is often argued that it is better to punish the truth than to let falsehood gain respectability, and therefore to opt for strict type I error control.

A sensible approach!

So, what should our strategy be regarding MCPs? Clearly we should not get lost in the maze of statistical significance (adjusted or not) but should try to assess the overall quality of the research and of the reporting of the methodology and analyses.3 There are two approaches to reducing the use and misuse of MCPs: design and minimization.

Design—Here we define contrasts or effects of interest where, in general, MCPs are simply not required:

  • (1) Define, preferably, a single primary outcome measure. Secondary outcome measures should be identified in advance and treated as supportive.

Conclusion

Clearly it is important to be aware of the issues of multiple comparison testing. As suggested, the scientific literature is perhaps too preoccupied by this issue, and a more balanced approach would not be unreasonable. Many of the issues can usually be resolved by researchers presenting a more balanced attitude to the importance of their results; or perhaps, at least, helped to this end by a more constructive peer review process! Attention to study design, minimization and clear descriptions of the analyses performed will address most of these concerns.

References (3)

  • D.G. Altman, Practical statistics for medical research (1991)