A guide for multilevel modeling of dyadic data with binary outcomes using SAS PROC NLMIXED

https://doi.org/10.1016/j.csda.2005.08.008

Abstract

In the social and health sciences, data are often structured hierarchically, with individuals nested within groups. Dyads constitute a special case of hierarchically structured data with variation at both the individual and dyadic level. Analyses of data from dyads pose several challenges due to the interdependence between members within dyads and issues related to small group sizes. Multilevel analytic techniques have been developed and applied to dyadic data in an attempt to resolve these issues. In this article, we describe a set of analyses for modeling individual- and dyad-level influences on binary outcomes using SAS statistical software, and we discuss the benefits and limitations of such an approach. For illustrative purposes, we apply these techniques to estimate individual- and dyad-level predictors of viral hepatitis C infection among heterosexual couples in East Harlem, New York City.

Introduction

Many processes under study in the health sciences, such as treatment delivery, childcare, and disease transmission, involve interpersonal relationships and mutual influence between two persons (e.g., physician–patient, parent–child, wife–husband). Conventional methods for inferential data analysis, including analysis of variance (ANOVA) and general linear regression, assume that observations obtained from each individual are independent. When such analyses are applied to data obtained from interacting dyads, the assumption of independent observations may be violated, leading to underestimated standard errors and invalid inferences (i.e., an inflated Type I error rate).

To overcome the problem of nonindependence in the case of distinguishable dyad members, such as female–male couples, researchers often conduct separate analyses for each member class. For example, in a study of the effects of spousal support on health indicators, Heffner et al. (2004) performed separate analyses for females and males. Although this approach maintains independence of observations, it can obscure class interactions that may be of theoretical interest. In the example cited, although theory suggested that couple-level effects might moderate health status through interactions with spousal support and other variables, such effects could not be assessed in separate analyses. Several techniques have been developed to circumvent the problem of nonindependence while permitting estimation of dyad-level effects, including interactions (e.g., Kenny, 1996; Gonzalez and Griffin, 1999; Newsom, 2002).

Multilevel linear modeling (MLM) is one such technique developed and applied to dyadic data analysis (Barnett et al., 1993; Raudenbush et al., 1995; Windle and Dumenci, 1997; Kenny and Cook, 1999; Kashy and Kenny, 2000; Hoff, 2005). To make multilevel modeling techniques more accessible to data analysts, Campbell and Kashy (2002) have provided a practical guide for MLM analysis of dyadic data with continuous outcomes using two commercial software programs, SAS PROC MIXED and HLM. Here, we extend the work of these authors by providing a guide for nonlinear multilevel modeling of dyadic data with binary outcomes using NLMIXED and other procedures in SAS.

In Section 2, we briefly introduce multilevel modeling techniques and discuss limitations of this approach to analysis of dyadic data with binary outcomes. We also present model equations for both conditional and unconditional multilevel models and discuss methods for determining the appropriate use of the multilevel approach with dyadic binary response data. Section 3 lists the statistical assumptions of multilevel modeling with binary outcomes. In Section 4, we identify different types of dyadic data and discuss the implications of data properties and structure for multilevel modeling. An illustrative analysis involving risk factors for viral hepatitis C within heterosexual couples is introduced in Section 5. In Section 6, we provide practical guidelines for data preparation, using examples from the couples’ risk study. In Section 7, we describe SAS PROC NLMIXED and provide a step-by-step guide for performing multilevel modeling analysis and interpretation, using data from the couples’ risk study as an exemplar. We conclude with a discussion, including limitations of the approach, in Section 8.

Section snippets

Multilevel modeling approaches to dyadic analysis with binary outcomes

Multilevel linear modeling refers to a family of regression estimation techniques applied to data organized into hierarchically structured clusters, such as students (level-1) nested within classrooms (level-2) (Raudenbush and Bryk, 2002). Dyadic data represent a special case of hierarchically clustered data, with individuals nested within dyads. Multilevel analysis combines the effects of variables at different levels into a single model, while accounting for the interdependence among...

Statistical assumptions

There is no single definitive set of assumptions that applies to all multilevel logistic models. The primary assumptions that are relevant to multilevel models involving binary outcomes using the logit link function (as shown in Eq. (10)) are that (a) the probability of success (y_ij = 1) is identical for individuals within clusters, (b) observations between clusters are independent, whereas pairs of observations within clusters have a common correlation, (c) each random effect is independent and...
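For orientation, a generic random-intercept logistic model with the logit link can be written as below. This is a textbook form offered as a sketch, not the article's Eq. (10), which is not reproduced in this excerpt. The symbols are assumptions: i indexes individuals, j indexes dyads, x_ij is an individual-level predictor, w_j is a dyad-level predictor, and u_j is a dyad-level random intercept.

```latex
\begin{aligned}
\operatorname{logit}\bigl[\Pr(y_{ij} = 1 \mid u_j)\bigr]
  &= \beta_0 + \beta_1 x_{ij} + \beta_2 w_j + u_j, \\
u_j &\sim N(0, \sigma_u^2).
\end{aligned}
```

Dropping the predictor terms leaves an unconditional model containing only the intercept and the dyad-level random effect.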

Types of dyadic variables and data structure

In multilevel models, dyadic interdependence is accounted for by modeling variance and covariance within and across dyads. Variance constraints on particular types of dyadic variables are thus an important issue in multilevel analysis of dyadic data. Kashy and Kenny (2000) have identified three types of dyadic variables based on their locus of variance: between-dyads, within-dyads, and mixed. Between-dyad variables measure shared experiences or behavior and do not vary within the dyad, but do...
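The three variable types can be told apart mechanically by checking where a variable varies. The following is a minimal Python sketch, not taken from the article. It assumes one row per individual, numeric coding, and the usual reading of the typology in which a within-dyads variable has the same dyad total in every dyad. The field names (`dyad_id`, `female`, `years_together`, `age`) and the data are hypothetical.

```python
from collections import defaultdict

def classify_dyadic_variable(rows, var, dyad_key="dyad_id"):
    """Classify a variable by its locus of variance.

    Returns 'between-dyads', 'within-dyads', 'mixed', or 'constant'.
    Assumes numeric values that can be compared exactly (e.g., integer codes).
    """
    values_by_dyad = defaultdict(list)
    for row in rows:
        values_by_dyad[row[dyad_key]].append(row[var])

    # Does any dyad contain members with different values?
    varies_within = any(len(set(vals)) > 1 for vals in values_by_dyad.values())
    # Do dyad totals differ from one dyad to the next?
    varies_between = len({sum(vals) for vals in values_by_dyad.values()}) > 1

    if varies_within and varies_between:
        return "mixed"
    if varies_within:
        return "within-dyads"
    if varies_between:
        return "between-dyads"
    return "constant"

# Hypothetical data: two couples, one row per person.
rows = [
    {"dyad_id": 1, "female": 1, "years_together": 3, "age": 30},
    {"dyad_id": 1, "female": 0, "years_together": 3, "age": 35},
    {"dyad_id": 2, "female": 1, "years_together": 7, "age": 41},
    {"dyad_id": 2, "female": 0, "years_together": 7, "age": 38},
]

for name in ("female", "years_together", "age"):
    print(name, classify_dyadic_variable(rows, name))
```

In this toy data, the gender indicator varies only within couples, relationship length varies only between couples, and age varies at both levels.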

Example: viral hepatitis C infection among heterosexual couples

To illustrate the application of multilevel modeling to dyadic data with binary outcomes, we employed data from a recent epidemiological study conducted to examine risk for hepatitis C and other viral infections among drug-using couples in East Harlem, New York City (Tortu et al., 2003; McMahon et al., 2003). Heterosexual couples reporting recent substance use were recruited from East Harlem, administered risk assessment surveys, and screened for viral hepatitis C antibodies. Protocols for...

Variable coding and centering

Between-dyad variables (such as frequency of intercourse reported by each couple) require that a single value be entered in the dataset for each dyad. However, in cases in which data are collected from both members of each dyad, responses can disagree (i.e., measurement error). One way to resolve this problem is to employ the response of one dyad member only (e.g., the female member of each couple); a second method is to use within-dyad means. The latter method was used in the current analysis, ...
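The within-dyad mean approach can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' SAS data step: the field names (`dyad_id`, `member`, `sex_freq`) and the values are hypothetical, and each dyad is assumed to contribute at least one report.

```python
from collections import defaultdict

def add_within_dyad_mean(rows, var, dyad_key="dyad_id"):
    """Return copies of rows with an added '<var>_dyad_mean' field.

    Both members of a dyad receive the same value, so the new
    variable behaves as a between-dyad variable.
    """
    reports = defaultdict(list)
    for row in rows:
        reports[row[dyad_key]].append(row[var])
    means = {dyad: sum(vals) / len(vals) for dyad, vals in reports.items()}
    return [dict(row, **{var + "_dyad_mean": means[row[dyad_key]]}) for row in rows]

# Hypothetical reports: partners in couple 1 disagree, partners in couple 2 agree.
reports = [
    {"dyad_id": 1, "member": "F", "sex_freq": 4},
    {"dyad_id": 1, "member": "M", "sex_freq": 6},
    {"dyad_id": 2, "member": "F", "sex_freq": 2},
    {"dyad_id": 2, "member": "M", "sex_freq": 2},
]

averaged = add_within_dyad_mean(reports, "sex_freq")
```

Using the mean keeps information from both reporters, whereas taking one member's response discards the other report entirely.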

SAS PROC NLMIXED

Previous versions of SAS software have provided a variety of procedures for constructing multilevel mixed models. The MIXED procedure was developed to handle linear multilevel random effects models with continuous outcomes. Two subsequent SAS macros, GLIMMIX and NLINMIX, were written to extend the capabilities of PROC MIXED to include nonlinear mixed models (Littell et al., 1996; Wolfinger, 1997). Although these macros provide several different estimation options, most are iteratively fit to a...
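PROC NLMIXED fits models of this kind by maximizing a marginal likelihood in which the random effects are integrated out numerically. The Python sketch below illustrates that idea for a single dyad under a random-intercept logistic model. It is not NLMIXED's algorithm: a fixed-grid midpoint rule stands in for the adaptive Gaussian quadrature that NLMIXED uses by default, and the function names, grid settings, and parameter values are all assumptions chosen for illustration.

```python
import math

def inv_logit(eta):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def dyad_marginal_likelihood(y, eta_fixed, sigma_u, n_points=2001, width=8.0):
    """Marginal probability of a dyad's binary outcomes.

    y          -- outcomes (0/1) for the dyad members
    eta_fixed  -- fixed-effect linear predictors, one per member
    sigma_u    -- standard deviation (> 0) of the dyad random intercept u
    Integrates prod_i P(y_i | u) * Normal(u; 0, sigma_u^2) over u
    with a midpoint rule on [-width*sigma_u, +width*sigma_u].
    """
    lower = -width * sigma_u
    step = 2.0 * width * sigma_u / n_points
    total = 0.0
    for k in range(n_points):
        u = lower + (k + 0.5) * step
        density = math.exp(-0.5 * (u / sigma_u) ** 2) / (sigma_u * math.sqrt(2.0 * math.pi))
        conditional = 1.0
        for y_i, eta_i in zip(y, eta_fixed):
            p = inv_logit(eta_i + u)
            conditional *= p if y_i == 1 else 1.0 - p
        total += conditional * density * step
    return total

# Hypothetical setting: both members have linear predictor 0, random-intercept SD of 2.
p_both_positive = dyad_marginal_likelihood((1, 1), (0.0, 0.0), 2.0)
```

With a positive random-intercept variance, concordant outcomes are more probable than they would be under independence, which is how the model represents interdependence within dyads.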

Discussion and conclusions

The SAS procedures outlined in this paper provide a practical guide for evaluating multilevel mixed models with binary outcomes using data from distinguishable dyads. One of the strengths of the SAS NLMIXED procedure is the flexibility it allows for specifying a variety of models, which may include any combination of actor, partner, and dyad-level effects, within-level and cross-level interaction terms, and random components. Interaction terms may be added to NLMIXED models directly using the...
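Estimating actor and partner effects requires that each individual's record carry both that person's own predictor and the partner's value of the same predictor. A minimal Python sketch of that data arrangement follows. It assumes exactly two members per dyad, and the field names (`dyad_id`, `member`, `risk`) and values are hypothetical rather than drawn from the couples' risk study.

```python
from collections import defaultdict

def add_partner_variable(rows, var, dyad_key="dyad_id"):
    """Return copies of rows with an added 'partner_<var>' field.

    The original field serves as the actor predictor; the new field
    holds the other dyad member's value and serves as the partner predictor.
    """
    members = defaultdict(list)
    for row in rows:
        members[row[dyad_key]].append(row)

    result = []
    for row in rows:
        pair = members[row[dyad_key]]
        if len(pair) != 2:
            raise ValueError("dyad %r does not have exactly two members" % row[dyad_key])
        partner = pair[1] if pair[0] is row else pair[0]
        result.append(dict(row, **{"partner_" + var: partner[var]}))
    return result

# Hypothetical data: one row per person, a binary risk indicator.
people = [
    {"dyad_id": 1, "member": "F", "risk": 1},
    {"dyad_id": 1, "member": "M", "risk": 0},
    {"dyad_id": 2, "member": "F", "risk": 0},
    {"dyad_id": 2, "member": "M", "risk": 1},
]

pairwise = add_partner_variable(people, "risk")
```

An actor-by-partner interaction term could then be formed as the product of `risk` and `partner_risk` in each row.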

Acknowledgements

Support for this work was provided by grants from the National Institutes of Health (NIH), National Institute on Drug Abuse (NIDA) to Dr. James McMahon (R01 DA15641) and Dr. Stephanie Tortu (R01 DA12805). The authors thank Dr. Janet Rice, Department of Biostatistics, Tulane University, for her helpful remarks on the manuscript. Dr. Peter Flom and the NDRI Statistics Support Group provided insightful comments on an earlier version of the paper. Jeanine Botta provided technical assistance with...

References (47)

  • A. Agresti et al., 2000. Random effects modeling of categorical response data. Sociological Methodology.
  • R.C. Barnett et al., 1993. Gender and the relationship between job experiences and psychological distress: a study of dual-earner couples. J. Personality Social Psychology.
  • N. Breslow et al., 1993. Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc.
  • L. Campbell et al., 2002. Estimating actor, partner, and interaction effects for dyadic data using PROC MIXED and HLM: a user-friendly guide. Personal Relationships.
  • J.B. Carlin et al., 2001. A case study on the choice, interpretation and checking of multilevel models for longitudinal binary outcomes. Biostatistics.
  • Centers for Disease Control and Prevention, 2004. Hepatitis Surveillance Report No. 59. Atlanta, GA: U.S. Department of...
  • V. Cherkassky et al., 2003. Comparison of model selection for regression. Neural Computation.
  • A. Donner et al., 1980. The estimation of intraclass correlation in the analysis of family data. Biometrics.
  • A. Donner et al., 2002. Interval estimation for a difference between intraclass kappa statistics. Biometrics.
  • J.L. Fleiss et al., 1979. The reliability of dichotomous judgments: unequal numbers of judges per subject. Appl. Psychological Measurement.
  • R. Gonzalez et al., 1999. The correlational analysis of dyad-level data in the distinguishable case. Personal Relationships.
  • K.L. Heffner et al., 2004. Spousal support satisfaction as a modifier of physiological responses to marital conflict in younger and older couples. J. Behavioral Medicine.
  • P.D. Hoff, 2005. Bilinear mixed-effects models for dyadic data. J. Amer. Statist. Assoc.
  • J.J. Hox, 1998. Multilevel Modeling: When and Why? Classification, Data Analysis, and Data Highways.
  • Hox, J.J., Maas, C.J.M., 2002. Sample sizes for multilevel modeling. Social science methodology in the new millennium....
  • D.A. Kashy et al., 2000. The analysis of data from dyads and groups. Handbook of Research Methods in Social and Personality Psychology.
  • D.A. Kenny, 1996. Models of non-independence in dyadic research. J. Social Personal Relationships.
  • D.A. Kenny et al., 1999. Partner effects in relationship research: conceptual issues, analytic difficulties, and illustrations. Personal Relationships.
  • D.A. Kenny et al., 1991. Analyzing interdependence in dyads. Studying Interpersonal Interaction.
  • D.A. Kenny et al., 1998. Data analysis in social psychology. The Handbook of Social Psychology.
  • D.A. Kenny et al., 2002. The statistical analysis of data from small groups. J. Personality Social Psychology.
  • I.G.G. Kreft et al., 1995. The effect of different forms of centering in hierarchical linear modeling. Multivariate Behavioral Research.
  • Kuss, O., 2002. How to use SAS for logistic regression with correlated data. SAS Users Group—27th Annual SAS Users...