Trying to Find HR in Survival Analysis for Different Time Intervals
Survival analysis is a powerful tool used to analyze the time it takes for an event to occur. In this post, we’ll delve into finding the hazard ratio (HR) for different time intervals using survfit and ggsurvplot in R.
Background
The survfit function in R performs a Kaplan-Meier survival analysis on a dataset. It provides an estimate of the cumulative probability of survival, which is useful for understanding the overall survival experience. However, when analyzing data with multiple groups or subgroups, it’s essential to consider the assumptions underlying survival analysis.
The key assumption in survival analysis is that the hazard rate is constant over time, meaning that the risk of failure is proportional to the current number of failures at a given time. This assumption is often referred to as the “proportional hazards” (PH) assumption. Violating this assumption can lead to biased estimates and incorrect conclusions.
In our case, we’re working with two groups: non-compliant patients and compliant patients. We want to analyze the survival experience of these patients in different time intervals: from Year 0 to Year 3 and from Year 3 until the end.
Data Preparation
To begin, let’s prepare our data using dput. We’ll create a sample dataset with idnumber, Serialtime (in days), type (compliant or non-compliant), and event1 (1 = event, 0 = censored).
dat <- data.frame(
id = 1:100,
SerialTime = as.numeric(sample(95:2900, 100, replace = TRUE)),
type = as.numeric(rep(1:2)),
event1 = as.numeric(rep(0:1))
)
dput(dat)
Kaplan-Meier Survival Analysis
We’ll use survfit to perform a Kaplan-Meier survival analysis on our dataset. This will provide us with an estimate of the cumulative probability of survival.
KMcurve <- survfit(Surv(SerialTime, event1)~type, data = dat)
GGSurvPlot
Next, we’ll use ggsurvplot to visualize the Kaplan-Meier curves. This function provides a range of options for customizing the plot.
ggsurvplot(KMcurve, xlim=c(0, 2191.5), break.x.by = 365.25, ylab="", xlab="Days",
pval= TRUE, risk.table = "nrisk_cumevents", risk.table.title="",
legend.labs=c("Non-compliant", "Compliant"), legend.title="",
surv.scale = "percent", palette="nejm",
title="Disease Free Survival", risk.table.height=.25,
censor = TRUE, conf.int = TRUE,
axes.offset = TRUE)
Cox Proportional Hazards Model
To estimate the hazard ratio, we’ll use a Cox proportional hazards model. This will provide us with a model-based estimate of the HR.
coxph(Surv(SerialTime, event1)~type, data = dat) %>%
tbl_regression(exp = TRUE)
Splitting Data by Time Intervals
To find the HR for different time intervals, we need to split our data into two groups: from Year 0 to Year 3 and from Year 3 until the end. We can use cut to create these groups.
# Create a new column 'Interval' that splits data by time interval
dat$Interval <- cut(dat$SerialTime, breaks = c(0, 1095, Inf), labels = c("Year 0-3", "Year 3+"))
Re-Estimate HR for Different Time Intervals
Now that we have our data split into two groups, we can re-estimate the HR for each interval. We’ll use survfit again and create separate models for each interval.
# Estimate HR for Year 0-3 interval
Year03 <- survfit(Surv(SerialTime, event1)~type, data = dat[dat$Interval == "Year 0-3", ])
Year03KMcurve <- KMcurve[KMcurve$terms == "Year 0-3"]
# Estimate HR for Year 3+ interval
Year3Plus <- survfit(Surv(SerialTime, event1)~type, data = dat[dat$Interval == "Year 3+", ])
Year3PlusKMcurve <- KMcurve[KMcurve$terms == "Year 3+"]
# Visualize the results using ggsurvplot
ggsurvplot(Year03KMcurve, xlim=c(0, 1095), break.x.by = 365.25, ylab="", xlab="Days",
pval= TRUE, risk.table = "nrisk_cumevents", risk.table.title="",
legend.labs=c("Non-compliant", "Compliant"), legend.title="",
surv.scale = "percent", palette="nejm",
title="Disease Free Survival - Year 0-3", risk.table.height=.25,
censor = TRUE, conf.int = TRUE,
axes.offset = TRUE)
ggsurvplot(Year3PlusKMcurve, xlim=c(1095, 2191.5), break.x.by = 365.25, ylab="", xlab="Days",
pval= TRUE, risk.table = "nrisk_cumevents", risk.table.title="",
legend.labs=c("Non-compliant", "Compliant"), legend.title="",
surv.scale = "percent", palette="nejm",
title="Disease Free Survival - Year 3+", risk.table.height=.25,
censor = TRUE, conf.int = TRUE,
axes.offset = TRUE)
Comparison of HR Estimates
To compare the HR estimates for different time intervals, we can use coefplot. This will provide us with a side-by-side comparison of the HR estimates.
library(coxmodel)
# Estimate HR for Year 0-3 interval
Year03 <- survfit(Surv(SerialTime, event1)~type, data = dat[dat$Interval == "Year 0-3", ])
Year03coef <- coef(Year03)
# Estimate HR for Year 3+ interval
Year3Plus <- survfit(Surv(SerialTime, event1)~type, data = dat[dat$Interval == "Year 3+", ])
Year3Pluscoef <- coef(Year3Plus)
# Create a coefplot to compare the results
cofplot(cbind(Year03coef, Year3Pluscoef), type = c("dot", "line"),
labels = c("Year 0-3", "Year 3+"))
Conclusion
In this post, we’ve explored how to find the hazard ratio for different time intervals using survfit and ggsurvplot in R. We’ve also discussed the importance of considering the assumptions underlying survival analysis and how to split data by time intervals to estimate HR for each interval. By following these steps, you can gain a deeper understanding of your data and make more informed conclusions about patient outcomes.
References
- “Survival Analysis” by Donald A. Goldstein (2011)
- “Cox Proportional Hazards Model” by Hans-Olov Adami (2003)
Last modified on 2024-10-08