| Title: | Time-Dependent ROC Curve Estimation for Correlated Right-Censored Survival Data |
|---|---|
| Description: | This contains functions that can be used to estimate a smoothed and a non-smoothed (empirical) time-dependent receiver operating characteristic curve and the corresponding area under the receiver operating characteristic curve for correlated right-censored time-to-event data. See Beyene and Chen (2024) <doi:10.1177/09622802231220496>. |
| Authors: | Kassu Mehari Beyene [aut, cre] (ORCID: <https://orcid.org/0000-0002-2067-6054>), Ding-Geng Chen [ctb] |
| Maintainer: | Kassu Mehari Beyene <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0.0 |
| Built: | 2026-05-13 06:58:43 UTC |
| Source: | https://github.com/cran/frailtyROC |
frailtyROC is a tool for estimating and visualizing time-dependent receiver operating characteristic (ROC) curves, as well as the corresponding time-dependent area under the curve (AUC), in the context of correlated right-censored time-to-event data. Confidence bands for the ROC curve and confidence intervals for the AUC can be constructed using bootstrap-derived standard errors.
frailtyROC is a comprehensive tool for estimating and visualizing time-dependent receiver operating characteristic (ROC) curves and their corresponding area under the curve (AUC) in the context of correlated right-censored time-to-event data. The ROC curves can be estimated either empirically (non-smoothed) or smoothed with or without boundary correction. For the latter case, the data-driven smoothing parameter selection methods introduced by Beyene and El Ghouch (2020) for smoothed ROC curves are implemented, offering an automatic approach to bandwidth selection during the smoothing process.
The package enables the estimation of time-dependent ROC curves at specific time points to evaluate the discriminatory ability of prognostic models or biomarkers. The time-dependent AUC serves as a summary measure of predictive accuracy over time. Confidence bands for the ROC curves and confidence intervals for AUC estimates are obtained via non-parametric bootstrap sampling based on the percentile method. This approach yields standard error estimates, which serve as the basis for bootstrap-based hypothesis testing of AUC at a given time point.
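The percentile bootstrap described above can be illustrated with a minimal base R sketch. The helper `percentile_ci` is hypothetical and is not part of frailtyROC; it resamples individual observations, whereas the resampling scheme the package uses for clustered data is not described here.

```r
# Percentile bootstrap confidence interval for a statistic of a numeric vector.
# `percentile_ci` is a hypothetical helper, not a frailtyROC function.
percentile_ci <- function(x, stat, B = 500, alpha = 0.05) {
  est <- stat(x)
  # Recompute the statistic on B resamples drawn with replacement.
  boot <- replicate(B, stat(sample(x, replace = TRUE)))
  # The interval limits are the alpha/2 and 1 - alpha/2 bootstrap quantiles.
  ci <- quantile(boot, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(estimate = est, se = sd(boot), lower = ci[1], upper = ci[2])
}

set.seed(1)
x <- rnorm(200, mean = 0.7)
res <- percentile_ci(x, mean, B = 500)
res
```

The standard deviation of the bootstrap replicates plays the role of the bootstrap standard error, and the two quantiles give the percentile interval.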
An essential component of the methodology involves estimating the survival function conditional on the observed data. This is accomplished through the use of shared frailty models specifically developed for correlated censored data. To this end, both semi-parametric and parametric frailty models are estimated using the penalized likelihood estimation framework implemented in frailtypack (Rondeau et al., 2012). Users may specify either a gamma or a log-normal frailty distribution.
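As a rough illustration of what a shared gamma frailty implies for the survival function, the sketch below contrasts the survival conditional on the frailty with the marginal survival obtained by integrating the frailty out. The Weibull parameterisation, the function names, and all parameter values are assumptions chosen for illustration; they are not taken from frailtyROC or frailtypack.

```r
# Weibull cumulative baseline hazard H0(t) = (t / scale)^shape (assumed form).
H0 <- function(t, shape, scale) (t / scale)^shape

# Conditional survival given the frailty u and the linear predictor lp = x'beta.
surv_cond <- function(t, u, lp, shape, scale) {
  exp(-u * H0(t, shape, scale) * exp(lp))
}

# Marginal survival after integrating out a gamma frailty with mean 1 and
# variance theta (closed form from the gamma Laplace transform).
surv_marg <- function(t, lp, shape, scale, theta) {
  (1 + theta * H0(t, shape, scale) * exp(lp))^(-1 / theta)
}

# Numerical check: averaging the conditional survival over the frailty
# density reproduces the closed-form marginal survival.
theta <- 0.5
num <- integrate(function(u) {
  surv_cond(120, u, lp = 0.3, shape = 1.2, scale = 100) *
    dgamma(u, shape = 1 / theta, rate = 1 / theta)
}, lower = 0, upper = Inf)$value
closed <- surv_marg(120, lp = 0.3, shape = 1.2, scale = 100, theta = theta)
all.equal(num, closed, tolerance = 1e-4)
```

The conditional form depends on the cluster-specific frailty, while the marginal form averages over it.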
In this package, the following abbreviations are commonly used:

| ROC | Receiver Operating Characteristic curve. |
|---|---|
| AUC | Area under the ROC curve at a given time horizon t. |
This package comes with two correlated right-censored marker data sets. For details, see kidney and LungCancer.
Ensure that your system has an active internet connection, then execute the following command in the R console to install the package:
install.packages("frailtyROC")
To load the package after installation, use the following command:
library(frailtyROC)
Kassu Mehari Beyene and Ding-Geng Chen
Maintainer: Kassu Mehari Beyene <[email protected]>
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
Beyene, K. M., and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine, 39, 3373-3396.
Rondeau, V., Mazroui, Y., and Gonzalez, J. R. (2012). frailtypack: an R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. Journal of Statistical Software, 47, 1-28.
This function computes a time-dependent ROC curve for correlated right-censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curve can be either empirical (non-smoothed) or smoothed with or without boundary correction. The function also calculates the time-dependent area under the ROC curve (AUC).
frailtyROC(Y, M, censor, group = NULL, w, t = 1e-04, U = NULL, bw = "NR", len = 151, method = "tra", method1 = "marg", ktype = "normal", knots = 10, kappa = 10000, RandDist = "Gamma", hazard = "Splines", maxit = 300, B = 0, alpha = 0.05, plot = "TRUE")
| Y | a numeric vector of event times or observed times. |
|---|---|
| M | a numeric vector of (bio)marker or risk score values. |
| censor | a vector of censoring indicators. |
| group | a categorical vector of group/cluster. |
| w | a scalar window for prediction. |
| t | a scalar time for prediction. The default value is 1e-04. |
| U | a vector of grid points where the ROC curve is estimated. The default is NULL, in which case a sequence of grid points is used. |
| bw | a character string specifying the bandwidth estimation method for the ROC itself. The possible options are "NR" (normal reference), "PI" (plug-in), and "CV" (cross-validation). The default is "NR". |
| len | a scalar value specifying the length of the grid vector. The default is 151. |
| method | a character string specifying the method of ROC curve estimation. The default is "tra". |
| method1 | a character string specifying the prediction method applied to the model. The default is "marg". |
| ktype | a character string specifying the type of kernel distribution to be used for smoothing the ROC curve. The default is "normal". |
| knots | a scalar specifying the number of knots to use. This value is required in the penalized likelihood estimation. It corresponds to the (knots+2) spline functions for the approximation of the hazard or the survival functions. Rondeau et al. (2012) suggested that the number of knots must be between 4 and 20. The default is 10. |
| kappa | a positive smoothing parameter value for the penalized likelihood estimation. The default is 10000. |
| RandDist | a character string stating the distribution of the random effect (gamma or log-normal frailty). The default is "Gamma". |
| hazard | the type of hazard function. The default is "Splines". |
| maxit | maximum number of iterations. The default is 300. |
| B | the number of bootstrap samples to be used for variance estimation. The default is 0. |
| alpha | the significance level. The default is 0.05. |
| plot | a logical parameter to see the ROC curve plot. The default is "TRUE". |
This function takes correlated right-censored survival data and returns an empirical (non-smoothed) ROC estimate and the smoothed time-dependent ROC estimate with/without boundary correction and the corresponding time-dependent AUC estimates.
For the smoothing parameter estimation, three data-driven methods introduced in Beyene and El Ghouch (2020) are implemented: the normal reference "NR", the plug-in "PI", and the cross-validation "CV". See Beyene and El Ghouch (2020) for details.
The conditional survival function estimation can be done by using semi-parametric or fully parametric shared frailty models.
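The weighting idea behind an empirical time-dependent ROC estimate can be sketched in base R. This is a simplified illustration and not the package's implementation: the helper `weighted_roc` is hypothetical, and the weights `p` stand in for an estimated probability that each subject has experienced the event by the prediction time (here they are simply known 0/1 statuses from simulated data).

```r
# Weighted empirical ROC on a grid U of false-positive rates.
# `weighted_roc` is a hypothetical helper, not a frailtyROC function.
# M: marker values; p: estimated probability of an event by the prediction time.
weighted_roc <- function(M, p, U = seq(0, 1, length.out = 151)) {
  cuts <- c(-Inf, sort(unique(M)), Inf)
  # Sensitivity: weighted share of events with marker above the cut-off.
  tpr <- sapply(cuts, function(k) sum(p * (M > k)) / sum(p))
  # 1 - specificity: weighted share of non-events with marker above the cut-off.
  fpr <- sapply(cuts, function(k) sum((1 - p) * (M > k)) / sum(1 - p))
  # ROC(u): highest sensitivity among cut-offs with false-positive rate <= u.
  roc <- sapply(U, function(u) max(tpr[fpr <= u]))
  # AUC by the trapezoidal rule over the grid.
  auc <- sum(diff(U) * (head(roc, -1) + tail(roc, -1)) / 2)
  list(U = U, ROC = roc, AUC = auc)
}

set.seed(2)
n <- 300
event <- rbinom(n, 1, 0.4)
M <- rnorm(n, mean = event)  # markers tend to be higher among events
out <- weighted_roc(M, p = event)
out$AUC
```

With censored data the 0/1 statuses would be replaced by model-based probabilities, which is where the conditional survival function enters.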
Returns the following items:
| ROC | vector of estimated ROC values. These will be numeric values between zero and one. |
|---|---|
| AUC | data frame containing the estimated AUC. |
| U | vector of grid points used. |
| bw | computed value of the bandwidth parameter. The empirical method involves no smoothing, so no bandwidth is estimated for it. |
| Dt | vector of estimated event status. |
| M | vector of (bio)marker values. |
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
Beyene, K. M., and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine, 39, 3373-3396.
Rondeau, V., Mazroui, Y., and Gonzalez, J. R. (2012). frailtypack: an R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. Journal of Statistical Software, 47, 1-28.
library(frailtyROC)
data(kidney)
out1 <- frailtyROC(Y = kidney$time, M = kidney$Marker, censor = kidney$status, group = kidney$id, w = 120, method = "emp", method1 = "cond", hazard = "Weibull")
out1$AUC
This dataset contains four columns: id, patient code; marker, the risk score; time, the observed follow-up time; and status, the event indicator for subjects in the kidney marker dataset.
data(kidney)
This is a data frame with 76 observations and the following 4 variables.

| id | patient code |
|---|---|
| time | time to recurrence of infection |
| status | censoring indicator; 0=censored, 1=recurrence |
| marker | risk score derived from observed data using frailty model |
This dataset pertains to the recurrence of infection in patients with kidney disease who use portable dialysis equipment. Recurrent infection is the primary complication in these patients, typically occurring at the catheter insertion site. When an infection occurs, the catheter is removed and reinserted after successful treatment. In some cases, the catheter is removed for reasons unrelated to infection; these observations are treated as censored. A total of 38 patients were followed to assess the time to recurrence of infection. Each patient contributes exactly two observations.
The dataset includes the following covariates: sex (1 = Male, 2 = Female), age, and disease (disease type; a factor with four levels: "GN", "AN", "PKD", and "Other"). The marker is derived from a gamma frailty model fitted to these covariates: it combines the estimated frailty term with the estimated regression coefficients (for i = 1, 2, 3) from the frailty model.
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
This dataset contains four columns: id, the identifier for health institutions (clusters); marker, the risk score; time, the observed follow-up time; and status, the event indicator for subjects in the NCCTG lung cancer marker dataset.
data(LungCancer)
This is a data frame with 238 observations and the following 4 variables.

| id | health institutions code |
|---|---|
| time | time to death in days |
| status | censoring indicator; 1=censored, 2=dead |
| marker | risk score derived from the observed data using frailty model |
The NCCTG lung cancer dataset was collected from 228 patients across 18 different healthcare institutions. The number of subjects per institution ranged from 2 to 36. For the final analysis, only 226 patients with complete records were included. The dataset contains survival times along with several important predictor variables, including: sex (coded as Male = 1, Female = 2), age (in years), ph.ecog (Eastern Cooperative Oncology Group performance status, assessed by a physician on a scale from 0 [asymptomatic] to 5 [dead]), and pat.karno (Karnofsky performance status, assessed by the patient).
The marker (risk score) was derived from three predictor variables: sex, age, and ph.ecog. To this end, a frailty model with gamma-distributed frailty was fitted. As in the kidney example, the prognostic marker combines the estimated frailty term with the estimated regression coefficients (for i = 1, 2, 3) from the frailty model.
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
This function computes a data-driven bandwidth for smoothing the ROC curve, supporting three methods: the normal reference method, the plug-in method, and the cross-validation method introduced in Beyene and El Ghouch (2020). It is particularly important for estimating the bandwidth in the presence of weighted data.
wbw(X, wt, bw = "NR", ktype = "normal")
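| X | numeric data vector. |
|---|---|
| wt | non-negative weight vector. |
| bw | a character string specifying the bandwidth selection method. The possible options are "NR" (normal reference), "PI" (plug-in), and "CV" (cross-validation). The default is "NR". |
| ktype | a character string indicating the type of kernel function. The default is "normal". |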
Returns the estimated value for the bandwidth parameter.
Beyene, K. M., and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine, 39, 3373-3396.
library(frailtyROC)
X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector
# Normal reference bandwidth selection
wbw(X = X, wt = wt)$bw
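To give a feel for what a normal reference bandwidth looks like with weighted data, the sketch below applies Silverman's rule of thumb for kernel density estimation to weighted moments. This is a swapped-in textbook rule and not necessarily the formula used by wbw, whose constant and rate for ROC smoothing may differ; the helper name `weighted_nr_bw` and the use of the Kish effective sample size are assumptions.

```r
# Weighted normal reference (Silverman-type) bandwidth.
# `weighted_nr_bw` is a hypothetical helper, not the wbw implementation.
weighted_nr_bw <- function(X, wt) {
  w <- wt / sum(wt)               # normalise the weights to sum to one
  mu <- sum(w * X)                # weighted mean
  s <- sqrt(sum(w * (X - mu)^2))  # weighted standard deviation
  n_eff <- 1 / sum(w^2)           # Kish effective sample size
  1.06 * s * n_eff^(-1 / 5)       # Silverman's rule of thumb
}

set.seed(3)
X <- rnorm(100)
wt <- runif(100)
h <- weighted_nr_bw(X, wt)
h
```

With equal weights the effective sample size equals the number of observations, so the rule reduces to its usual unweighted form (using the population standard deviation).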