Interval-Censored Survival Analysis

Primary Author: John Kornak

Description

Survival analysis methods play an important role in the statistical analysis of medical data. In particular, the advent of the Cox proportional hazards model, coupled with the increasing computational power of the standard desktop computer, has led to widespread (and virtually automated) use of the Cox model for a wide range of survival data problems.

One notable case in which the Cox model cannot be applied "off-the-shelf" occurs when the data are interval-censored: the usual quick-and-dirty approach of fitting the Cox model to the mid-points of the intervals that bracket the time of event leads to conservative results (and referees are objecting more frequently to the use of mid-points). With interval-censored data, not only are many observations right-censored as in conventional survival data (that is, the event, e.g. death, has not necessarily occurred by the time the subject is lost to follow-up), but for events that have occurred we also lack precise information about when they occurred; we know only that the event occurred between two consecutive follow-up times: the last visit at which it had not yet happened and the first visit at which it had.
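To make the data structure concrete, the likelihood for interval-censored data can be written as below. The notation is introduced here for illustration and is not part of the original text: subject i's event is known only to lie in the interval (L_i, R_i], S is the survival function with parameters θ, and right-censored subjects are handled by setting R_i to infinity.

```latex
L(\theta) \;=\; \prod_{i=1}^{n} \bigl[\, S(L_i;\theta) - S(R_i;\theta) \,\bigr],
\qquad
S(R_i;\theta) = 0 \ \text{ when } R_i = \infty \ \text{(right-censored)} .
```

Mid-point imputation replaces each factor with a density evaluated at (L_i + R_i)/2, which treats the event time as if it had been observed exactly.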

The objectives of this document are to a) describe the tools available to the statistical consultant for analyzing interval-censored data; and b) provide guidance on how to approach interval-censored analysis efficiently (considering both the stability of implementation and the ease of interpretation when disseminating results to medical researchers).

There are three primary approaches to dealing with interval-censored data: a) parametric modeling (accelerated failure time); b) non-parametric maximum likelihood estimation (NPMLE) via Kaplan-Meier-Turnbull interval-censored methods; and c) proportional hazards modeling via ordered logistic regression with a complementary log-log link. This last approach requires that the intervals are reasonably consistent across subjects: e.g., patients might be followed up at pre-defined times post-treatment, such as 1 month, 3 months and 6 months post-surgery.
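As an illustration of approach b), the following is a minimal sketch of a Turnbull-style self-consistency (EM) iteration in plain Python. It is a simplified teaching version under stated assumptions, not the routine used by any R or SAS procedure: the function name turnbull_em, the tolerance settings and the example data are all hypothetical, every interval is assumed to satisfy left < right, and right-censoring is encoded with an infinite right endpoint.

```python
import math

def turnbull_em(intervals, tol=1e-10, max_iter=10_000):
    """Estimate probability masses for interval-censored data.

    intervals: list of (left, right] pairs with left < right;
               use right = math.inf for right-censored subjects.
    Returns (cells, p): candidate support cells and their estimated masses.
    """
    # Candidate support: cells between consecutive distinct endpoints.
    cuts = sorted({t for lr in intervals for t in lr})
    cells = list(zip(cuts[:-1], cuts[1:]))

    # alpha[i][j] = 1 if cell j lies inside subject i's observed interval.
    alpha = [
        [1.0 if (left <= a and b <= right) else 0.0 for (a, b) in cells]
        for (left, right) in intervals
    ]

    n = len(intervals)
    p = [1.0 / len(cells)] * len(cells)  # start from uniform masses

    for _ in range(max_iter):
        new_p = [0.0] * len(cells)
        for row in alpha:
            # Probability the current estimate assigns to this subject's interval.
            denom = sum(a * q for a, q in zip(row, p))
            # Share this subject's unit weight across the cells it could occupy.
            for j, a in enumerate(row):
                new_p[j] += a * p[j] / denom / n
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return cells, p

# Hypothetical data: two interval-censored subjects and one right-censored subject.
example = [(0, 2), (1, 3), (2, math.inf)]
cells, p = turnbull_em(example)
for (a, b), mass in zip(cells, p):
    print(f"({a}, {b}]: {mass:.4f}")
```

The estimated survival curve at a cut point is one minus the cumulative sum of the masses up to that point; within a cell that carries mass the curve is not identified, which is why Turnbull-type plots are often drawn with shaded rectangles rather than a single step.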

We now discuss options available in the two major statistical packages for each of these interval-censored analysis methods and how to implement the analyses:

R

The symbol > denotes the R prompt; the text that follows it on a line is a command given to R.

Non-parametric:

Parametric:

Ordered logistic regression:

SAS

Non-parametric:

Parametric:

Ordered logistic regression:

General Proposed Analysis Strategy For Interval-Censored Data

  1. Are the intervals defined by a small discrete set of times that are consistent for all subjects? If so, the logistic regression based approach should be used.
  2. Is there a primary group comparison to be performed? If so, implement a group test based on Kaplan-Meier-Turnbull estimates.
  3. Perform a parametric accelerated failure time interval-censored analysis with Weibull-modeled survival times, mirroring the (unadjusted) primary group comparison performed in step 2. Note that the Weibull model is recommended here primarily for ease of interpretation. The Weibull (with the exponential as a special case) is the only accelerated failure time distribution that also corresponds to a proportional hazards model, and is therefore the only one for which results can be interpreted in terms of hazard ratios. Interpretation via hazard ratios is desirable because clinicians tend to have acquired some understanding of their meaning. Of course, if the proportional hazards assumption is violated, then either alternative survival distributions or time-varying covariate methods must be used.
  4. Generate hazard ratios, confidence intervals and p-values from the fitted Weibull model parameter estimates.
  5. Compare parametric model fitted group survival curves to corresponding Kaplan-Meier-Turnbull curve estimates.
  6. Check whether the proportional hazards or parametric modeling assumptions are reasonable via diagnostic probability and log-log plots. If the assumptions are violated, consider alternative parametric survival distributions, modeling non-proportionality with time-varying covariates, or semi-parametric baseline hazard models (see methods developed for this article for a template for how to include time-varying covariates in models with interval-censoring, late entry, and clustering).
  7. Repeat the parametric interval-censored analysis with any additional covariates and potential confounding variables included in the model. Compare the estimates with those from the unadjusted models.
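Step 4 of the strategy above can be sketched as follows. The sketch assumes the Weibull model was fitted in the log-linear accelerated failure time form log T = mu + beta*x + scale*W, with W following a standard extreme value distribution, in which case the hazard ratio for a one-unit increase in x is exp(-beta/scale). The function name, argument names and numbers are hypothetical, and the variances and covariance are assumed to be expressed for (beta, scale); if the software reports them for log(scale), they would need converting first. The confidence interval uses a delta-method approximation.

```python
import math

def weibull_aft_to_hr(beta, scale, var_beta, var_scale, cov_beta_scale, z=1.959964):
    """Convert a Weibull AFT coefficient to a hazard ratio with a Wald CI and p-value.

    Assumes the log-linear parameterization log T = mu + beta*x + scale*W.
    Returns (hr, ci_lower, ci_upper, p_value).
    """
    log_hr = -beta / scale

    # Delta method: gradient of -beta/scale with respect to (beta, scale).
    g_beta = -1.0 / scale
    g_scale = beta / scale ** 2
    var_log_hr = (
        g_beta ** 2 * var_beta
        + g_scale ** 2 * var_scale
        + 2.0 * g_beta * g_scale * cov_beta_scale
    )
    se = math.sqrt(var_log_hr)

    # Wald interval on the log scale, then exponentiate.
    ci_lower = math.exp(log_hr - z * se)
    ci_upper = math.exp(log_hr + z * se)

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(log_hr / se) / math.sqrt(2.0))
    return math.exp(log_hr), ci_lower, ci_upper, p_value

# Hypothetical fitted values, for illustration only.
hr, lo, hi, pval = weibull_aft_to_hr(
    beta=0.5, scale=0.5, var_beta=0.04, var_scale=0.0, cov_beta_scale=0.0
)
print(f"HR = {hr:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), p = {pval:.4f}")
```

Note the sign change: a positive accelerated failure time coefficient (longer survival) corresponds to a hazard ratio below one.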