Some issues in resolution of diagnostic tests using an imperfect gold standard

D M Hawkins; J A Garrett; B Stephenson

doi:10.1002/sim.819

Stat Med. 2001 Jul 15;20(13):1987-2001. doi: 10.1002/sim.819.

Authors

D M Hawkins¹, J A Garrett, B Stephenson

Affiliation

¹ School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street SE, Minneapolis, MN 55455-0493, USA. doug@stat.umn.edu

PMID: 11427955
DOI: 10.1002/sim.819

Abstract

As a subject's true disease status is seldom known with certainty, it is necessary to compare the performance of new diagnostic tests with that of a currently accepted but imperfect 'gold standard'. Errors made by the gold standard mean that the sensitivity and specificity calculated for the new test are biased, and do not correctly estimate the new method's sensitivity and specificity. The traditional approach to this problem was 'discrepant resolution', in which the subjects for whom the two methods disagreed were subjected to a third 'resolver' test. Recent work has pointed out that this does not automatically solve the problem. A sounder approach goes beyond the discordant test results and also applies the resolver to at least some of the subjects with concordant results. This leaves some issues unresolved. One is the basic question of the direction of biases in various estimators. We point out that this question does not have a simple universal answer. Another issue, if one is to test a sample of the subjects with concordant results rather than all cases, is how to compute estimates and standard errors of the measures of test performance, notably sensitivity and specificity of the test method relative to the resolver. Expressions for these standard errors are given and illustrated with a numeric example. It is shown that using just a sample of subjects with concordant results may lead to great savings in assays. The design issue of how many concordant cases to test depends on the numbers of concordants and discordants. The formulae given show how to evaluate the impact of different choices for these numbers and hence settle on a design that gives the required precision of estimates.
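
The idea of resolving all discordant results but only a sample of the concordant ones can be illustrated with a short sketch. It is not the authors' procedure and does not reproduce their standard error expressions. It assumes a simple estimator in which each cell of the new-test versus gold-standard table is scaled up by the inverse of its sampling fraction, and that the resolved subjects in each cell are a random sample of that cell. The names `Cell` and `sens_spec`, and all counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One cell of the new-test vs gold-standard 2x2 table (hypothetical layout)."""
    total: int         # subjects falling in this cell
    resolved: int      # subjects in this cell given the resolver test
    resolver_pos: int  # resolved subjects that the resolver called positive

    def est_pos(self) -> float:
        # Scale the resolved sample up to the whole cell (assumes random sampling).
        if self.resolved <= 0:
            raise ValueError("at least one subject per cell must be resolved")
        return self.total * self.resolver_pos / self.resolved

    def est_neg(self) -> float:
        return self.total - self.est_pos()


def sens_spec(pp: Cell, pn: Cell, np_: Cell, nn: Cell) -> tuple[float, float]:
    """Estimate sensitivity and specificity of the new test relative to the resolver.

    Cell names: first letter is the new test result, second is the gold standard
    (p = positive, n = negative).
    """
    test_pos, test_neg = (pp, pn), (np_, nn)
    tp = sum(c.est_pos() for c in test_pos)  # resolver-positive, new test positive
    fn = sum(c.est_pos() for c in test_neg)  # resolver-positive, new test negative
    tn = sum(c.est_neg() for c in test_neg)  # resolver-negative, new test negative
    fp = sum(c.est_neg() for c in test_pos)  # resolver-negative, new test positive
    return tp / (tp + fn), tn / (tn + fp)


# Made-up counts: both discordant cells fully resolved, concordant cells sampled.
sens, spec = sens_spec(
    pp=Cell(total=100, resolved=20, resolver_pos=19),
    pn=Cell(total=10, resolved=10, resolver_pos=4),
    np_=Cell(total=8, resolved=8, resolver_pos=5),
    nn=Cell(total=900, resolved=50, resolver_pos=1),
)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this made-up example only 88 of 1018 subjects receive the resolver test, which illustrates the kind of saving in assays the abstract refers to. The precision of such estimates depends on how many concordant subjects are resolved, and the paper's formulae would be needed to quantify it.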

MeSH terms

Chlamydia Infections / diagnosis
Chlamydia trachomatis / isolation & purification
Diagnostic Tests, Routine / standards*
False Negative Reactions
False Positive Reactions
Humans
Numerical Analysis, Computer-Assisted
Reference Standards
Sensitivity and Specificity
Statistics as Topic / methods*