P-values and confidence intervals: two sides of the same unsatisfactory coin

J Clin Epidemiol. 1998 Apr;51(4):355-60. doi: 10.1016/s0895-4356(97)00295-3.

Abstract

For both P-values and confidence intervals, an alpha level is chosen to set limits of acceptable probability for the role of chance in the observed distinctions. The level of alpha is used either for direct comparison with a single P-value, or for determining the extent of a confidence interval. "Statistical significance" is proclaimed if the calculations yield a P-value that is below alpha, or a (1 - alpha) confidence interval whose range excludes the null result of "no difference." The P-value and confidence-interval methods are essentially reciprocal, since they use the same principles of probabilistic calculation; and both can yield distorted or misleading results if the data do not adequately conform to the underlying mathematical requirements. The major scientific disadvantage of both methods is that their "significance" is merely an inference derived from principles of mathematical probability, not an evaluation of the substantive importance of the "big" or "small" magnitude of the observed distinction. The latter evaluation has received inadequate attention amid the emphasis on probabilistic decisions; and careful principles have not been developed either for the substantive reasoning or for setting appropriate boundaries for "big" or "small." After a century of "significance" inferred exclusively from probabilities, a basic scientific challenge is to develop methods for deciding what is substantively impressive or trivial.
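
The reciprocity described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes a two-sided z-test on an observed difference with a known standard error, and the function names (`z_test_and_ci`, `significant_by_p`, `significant_by_ci`) and the numeric inputs are hypothetical. Under those assumptions, the P-value falls below alpha exactly when the (1 - alpha) confidence interval excludes the null value of zero, and neither verdict says anything about whether the difference is large enough to matter.

```python
from statistics import NormalDist


def z_test_and_ci(diff, se, alpha=0.05):
    """Return the two-sided P-value and the (1 - alpha) CI for a difference.

    Assumes the difference is approximately normal with known standard
    error `se`; the null hypothesis is "no difference" (diff = 0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = diff / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci


def significant_by_p(p_value, alpha=0.05):
    """'Significant' when the P-value is below alpha."""
    return p_value < alpha


def significant_by_ci(ci, null_value=0.0):
    """'Significant' when the interval excludes the null value."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


if __name__ == "__main__":
    # Hypothetical (difference, standard error) pairs, for illustration only.
    cases = [
        (2.0, 1.0),     # moderate difference, moderate precision
        (1.5, 1.0),     # same precision, smaller difference
        (0.01, 0.002),  # tiny difference estimated very precisely
    ]
    for diff, se in cases:
        p, ci = z_test_and_ci(diff, se)
        print(
            f"diff={diff}, se={se}: P={p:.4g}, "
            f"95% CI=({ci[0]:.4g}, {ci[1]:.4g}), "
            f"by P: {significant_by_p(p)}, by CI: {significant_by_ci(ci)}"
        )
```

In the third hypothetical case a difference of 0.01 is declared "significant" by both criteria purely because it is estimated precisely; judging whether 0.01 is substantively "big" or "small" requires a boundary that neither calculation supplies.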

Publication types

  • Comparative Study

MeSH terms

  • Confidence Intervals*
  • Humans
  • Probability*