Abstract
Background
Empirical studies in psychiatric research and other fields often have high refusal and drop-out rates. Non-participation and drop-out may introduce a bias whose magnitude depends on how strongly their determinants are related to the parameter of interest.
Methods
When most information is missing, the standard approach is to estimate each respondent’s probability of participating and assign each respondent a weight that is inversely proportional to this probability. This paper reviews the major ideas and principles regarding the computation of statistical weights and the analysis of weighted data.
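The weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the paper: participation probabilities are estimated as simple response rates within weighting classes, and the function names (`response_weights`, `weighted_mean`), the strata labels, and all numbers are hypothetical.

```python
from collections import Counter


def response_weights(strata_sampled, strata_responded):
    """Return {stratum: weight}, with weight = 1 / estimated response probability.

    strata_sampled: stratum label of every sampled person.
    strata_responded: stratum label of every person who took part.
    The response probability in a stratum is estimated as respondents / sampled.
    """
    n_sampled = Counter(strata_sampled)
    n_responded = Counter(strata_responded)
    weights = {}
    for stratum, n in n_sampled.items():
        r = n_responded.get(stratum, 0)
        if r == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        weights[stratum] = n / r  # inverse of the response rate r / n
    return weights


def weighted_mean(values, weights):
    """Weighted mean of values, normalised by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: 100 people sampled per stratum, unequal response.
sampled = ["young"] * 100 + ["old"] * 100
responded = ["young"] * 50 + ["old"] * 80
class_weights = response_weights(sampled, responded)

# Hypothetical binary outcome among respondents: 20/50 young, 16/80 old.
outcome = [1] * 20 + [0] * 30 + [1] * 16 + [0] * 64
unit_weights = [class_weights[s] for s in responded]

unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, unit_weights)
```

In this toy example the stratum with the lower response rate receives the larger weight, so the weighted prevalence moves toward the value that would have been observed had both strata responded at the same rate.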
Results
A short software review for weighted data is provided and the use of statistical weights is illustrated through data from the EDSP (Early Developmental Stages of Psychopathology) Study. The results show that disregarding different sampling and response probabilities can have a major impact on estimated odds ratios.
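How weights can change an odds ratio may be illustrated with a small sketch. The data below are invented for illustration and are not taken from the EDSP Study; the function name `weighted_odds_ratio` is likewise hypothetical. The sketch computes a crude odds ratio from weighted cell totals of a 2x2 table and gives point estimates only, with no standard errors.

```python
def weighted_odds_ratio(records):
    """Crude odds ratio from weighted 2x2 cell totals.

    records: iterable of (exposed, case, weight), with exposed and case as bools.
    """
    cells = {(True, True): 0.0, (True, False): 0.0,
             (False, True): 0.0, (False, False): 0.0}
    for exposed, case, weight in records:
        cells[(exposed, case)] += weight
    a = cells[(True, True)]    # exposed cases
    b = cells[(True, False)]   # exposed non-cases
    c = cells[(False, True)]   # unexposed cases
    d = cells[(False, False)]  # unexposed non-cases
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio undefined: empty off-diagonal cell")
    return (a * d) / (b * c)


def build_records(w_exposed_cases, w_other):
    """Hypothetical respondents: 30 / 20 / 20 / 30 persons in cells a / b / c / d."""
    return ([(True, True, w_exposed_cases)] * 30
            + [(True, False, w_other)] * 20
            + [(False, True, w_other)] * 20
            + [(False, False, w_other)] * 30)


# Ignoring weights is the same as giving everyone weight 1.
or_unweighted = weighted_odds_ratio(build_records(1.0, 1.0))

# Assumption: exposed cases took part twice as often as everyone else,
# so all other respondents receive twice their weight.
or_weighted = weighted_odds_ratio(build_records(1.0, 2.0))
```

Here the unweighted odds ratio is 2.25 and the weighted one is 1.125. The shift arises because, in this invented example, participation depends jointly on exposure and outcome; weights that factor into an exposure part and an outcome part would leave the crude odds ratio unchanged.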
Conclusions
The benefit of using statistical weights in reducing sampling bias should be balanced against increased variances in the weighted parameter estimates.
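One common rule of thumb for gauging the variance cost of weighting is Kish's approximate design effect, which depends only on the spread of the weights. The sketch below is offered as an assumption about what might be useful here, not as the variance method discussed in the paper; the function name `kish_design_effect` and the example weights are hypothetical. The approximation treats the weights as unrelated to the outcome, so it gives a rough guide rather than an exact variance.

```python
def kish_design_effect(weights):
    """Kish's approximation for the variance inflation caused by unequal weights.

    Returns (deff, n_eff), where
        n_eff = (sum of w)^2 / (sum of w^2)   # effective sample size
        deff  = n / n_eff                     # approximate variance inflation
    """
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    n_eff = sum_w ** 2 / sum_w2
    return n / n_eff, n_eff


# Hypothetical weights: 50 respondents with weight 2.0, 80 with weight 1.25.
weights = [2.0] * 50 + [1.25] * 80
deff, n_eff = kish_design_effect(weights)
```

With equal weights the design effect is 1 and nothing is lost. The more the weights vary, the smaller the effective sample size becomes, which is the price paid for the bias reduction.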
Cite this article
Höfler, M., Pfister, H., Lieb, R. et al. The use of weights to account for non-response and drop-out. Soc Psychiat Epidemiol 40, 291–299 (2005). https://doi.org/10.1007/s00127-005-0882-5