Original link:tecdat.cn/?p=2858

Original source:Tuo End number according to the tribe public number

The target

The purpose of this paper is to provide a brief and comprehensive assessment of how to perform survival analysis in R. The literature on the subject is extensive and covers only a limited number of (common) issues. The number of R packages available reflects the scope of research on the topic.

R package

Various R packages can be used to solve specific problems. Here are the packages used to read, manage, analyze and display the data. Run the following lines to install and load the required packages.

if (! require(pacman)) install.packages("pacman") pacman::p_load(tidyverse, survival )Copy the code

六四运动

data

The evaluation will be based on an ORCA dataset containing a subset of 338 patients diagnosed with oral squamous cell carcinoma (OSCC) in the northernmost provinces of Finland between 1 January 1985 and 31 December 2005. Follow-up began on the date of cancer diagnosis and ended on December 31, 2008, when the patient died and the migration or follow-up deadline ended. The causes of death were divided into two categories :(1) OSCC death; (2) Death due to other causes. The dataset contains the following variables: ID = serial number, sex= gender, category 1 = “female”, 2 = “male”, Age = age (year) at the date of diagnosis of cancer, stage= TNM stage of tumor (factor) : 1 = “I”… , 4 = “IV”, 5 = “UNKN” time= follow-up time from diagnosis to death or review (measured in years), Event = events ending follow-up (factor) : 1 = normal, 2 = death from oral cancer, 3 = death from other causes.

Load data from the URL into R.

head(orca)
Copy the code

Id sex age stage time Event 1 1 Male 65.42274 UNKN 5.081 Alive 22 Female 83.08783 III 0.419 Oral ca. Death 3 3 Male 52.59008 II 7.915 Other death 4 4 Male 77.08630 I 2.480 Other death 5 5 Male 80.33622 IV 2.500 Oral ca. Death 6 6 Female 82.58132 IV 0.167 Other DeathCopy the code

 
Copy the code

summary(orca)
Copy the code

Id sex age Stage time event Min. : 1.00 Female:152 Min. :15.15 I :50 Min. : 0.085 Alive :109 1st Qu.: 85.25 Male :186 1st Qu.:53.24 II :77 1st Qu.: 1.333 Oral ca. Death :122 Median :169.50 Median :64.86 III :72 Median: 3.869 Other Death :107 Mean :169.50 Mean :63.51 IV :68 Mean: 5.662 3rd Qu.:253.75 3rd Qu.:74.29 UNKN :71 3rd Qu.: 8.417 Max. :338.00 Max. :92.24 Max. :23.258Copy the code

Survival data analysis

Survival analysis focuses on the timing of event data. In our case, it’s the time of death after diagnosis.

To define the failure-time random variable, we need: 1. Time origin (diagnostic OSCC), 2. Time scale (years after diagnosis, age), 3. Definition of an event. We will first consider the overall mortality rate.

Figure 1: Block diagram of the transformation.

 
Copy the code

         Alive Oral ca. death    Other death 
           109            122            107 
Copy the code


FALSE  TRUE 
  109   229 
Copy the code

Graphically displaying the observed duration of follow-up is very helpful in the analysis of survival data.

OSCC deaths are more likely to occur early after diagnosis than deaths from other causes. What about the type?

'Surv' num [1:338, 1:2] 5.081+ 0.419 7.915 2.480 2.500 0.167 5.925+ 1.503 13.333 7.666+... - attr(*, "dimnames")=List of 2 .. $ : NULL .. $ : chr [1:2] "time" "status" - attr(*, "type")= chr "right"Copy the code

The created survival objects are then used as dependent variables in other specific functions for survival analysis.

Estimated survival function

Nonparametric estimation

We will first introduce a class of nonparametric estimators.

Kaplan Meier –

The survival curve is based on the number of risks and events at each time of death. Package survFit () creates (estimates) survival curves.

Call: survfit(formula = Surv(time, all) ~ 1, Data = ORca) n events * Rmean * SE (Rmean) median 0.95LCL 0.95UCL 338.000 229.000 8.060 0.465 5.418 4.331 6.916 * Restricted mean with upper limit = 23.3Copy the code

The function returns a summary of the estimated survival curve.

Time n.risk n.vent n.ensor surv STD. Err Upper Lower 1 0.085 338 2 0 0.9940828 0.004196498 1.0000000 0.9859401 2 0.162 336 2 0 0.9881657 0.005952486 0.9997618 0.9767041 3 0.167 334 4 0 0.9763314 0.008468952 0.9926726 0.9602592 4 0.170 330 2 0 0.9704142 0.009497400 0.9886472 0.9525175 5 0.246 328 1 0 0.9674556 0.009976176 0.9865584 0.9487228 6 0.249 327 1 0 0.9644970 0.010435745 0.9844277 0.9449699Copy the code

Survminer of GGSURvPlot () provides an informative description of the estimated survival curve.

The default KM diagram shows the survival function.

   
Copy the code

Survival curve estimation

Survival curves are common among actuaries and demographers. It is particularly useful for grouping data.

To show this approach in a real example, we first need to create the aggregate data, which is then grouped and the risk calculated in each layer.

Based on grouped data, we expect to use a survival curve.

 
Copy the code

Nsubs nlost nrisk nevent surv PDF hazard se.surv se.pdf se.hazard 0-1 338 0 338.0 64 1.0000 0.1893 0.2092 0.0000 0.0213 0.0260 1-2 274 4 272.0 41 0.8107 0.1222 0.1630 0.0213 0.0179 0.0254 2-3 229 9 224.5 21 0.6885 0.0644 0.0981 0.0252 0.0136 0.0214 3-4 199 12 193.0 20 0.6241 0.0647 0.1093 0.0265 0.0140 0.0244 4-5 167 9 162.5 13 0.5594 0.0448 0.0833 0.0274 0.0121 0.0231 5-6 145 14 138.0 13 0.5146 0.0485 0.0989 0.0279 0.0131 0.0274 6-7 118 5 115.5 8 0.4662 0.0323 0.0717 0.0283 0.0112 0.0254 7-8 105 8 101.0 9 0.4339 0.0387 0.0933 0.0286 0.0126 0.0311 8-9 88 7 84.5 1 0.3952 0.0047 0.0119 0.0288 0.0047 0.0119 9-10 80 4 78.0 8 0.3905 0.0401 0.1081 0.0288 0.0137 0.0382 10-11 68 4 66.0 5 0.3505 0.0266 0.0787 0.0291 0.0116 0.0352Copy the code

Nelson – Aalen estimates

Graphical comparison

Different estimates of the survival function can be plotted to assess potential differences.

 
Copy the code

Measures such as the central tendency of quantiles can be derived from the estimated survival curve.

 
Copy the code

Quantile km.lower km.upper fh.quantile fh.lower fH. upper 25 0.25 1.333 1.084 1.834 1.333 1.084 1.747 50 0.50 5.418 4.331 6.916 5.418 4.244 6.913 75 0.75 13.673 11.748 16.580 13.673 11.748 15.833Copy the code

Half are estimated to live for more than 5.4 years. The first quarter died within 1.3 years, while the first three quarters lived to more than 1.3 years. The first three thirds died within 13.7 years, while the first quarter died longer than 13.7 years.

Graphical representation of estimators (based on survival curves using KM)

 
Copy the code

Parameter estimation

We will consider three common options: exponential, Weibull, and log-Logistic models.

flexsurvreg(formula = su_obj ~ 1, data = orca, dist = "exponential") Estimates: Est L95% U95% se rate 0.11967 0.10513 0.13621 0.00791 N = 338, Events: 229, uncensored: 109 Total time at risk: 1913.673 log-likelihood = -715.1802, DF = 1 AIC = 1432.36Copy the code

 
Copy the code

Similarly, nonparametric estimates can be used to graphically compare different methods

Comparison of survival curves

For example, tumor stage is an important prognostic factor in cancer survival studies. We can estimate and plot survival curves for different groups (stages) of different colors.

 
Copy the code

Level 1 I 25 336.776 25 336.776 0.07423332 0.04513439 0.1033322 0.95 2 II 51 stage D Y x pt rate lower upper conf.level 1 I 25 336.776 25 336.776 0.07423332 0.04513439 0.1033322 0.95 2 II 51 556.700 51 556.700 0.09161128 0.06646858 0.1167540 0.95 3 III 51 464.836 51 464.836 0.10971611 0.07960454 0.1398277 0.95 4 IV 57 262.552 57 262.552 0.21709985 0.16073995 0.2734597 0.95 5 UNKN 45 292.809 45 292.809 0.15368380 0.10878136 0.1985862 0.95Copy the code

In general, patients diagnosed with lower-stage tumors have a lower mortality rate than patients with higher-stage tumors. You can use the survFit () function to perform an overall comparison of survival functions.

 
Copy the code

Call: survfit(formula = su_obj ~ stage, Data = ORca) N events 0.95LCL 0.95UCL stage=I 50 25 10.56 6.17 NA stage=II 77 51 7.92 4.92 13.34 stage=III 72 51 7.41 3.92 9.90 stage=IV 68 57 2.00 1.08 4.82 stage=unkn 71 45 3.67 2.83 8.17Copy the code

Since the incidence of low tumor stage is lower, the median survival time for tumor stage increase is also reduced. The same behavior was observed, and KM survival curves were plotted for different tumor stages.

You can also build the entire survival table for each stage level. Here are the first three rows of the survival table for each tumor stage.

# Groups:   strata [5]
Copy the code

time n.risk n.event n.censor surv std.err upper lower strata <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> 1 50 1 0 0.98 0.0202 0.942 0.17 I 49 1 0 2, 0.498 0.96 0.0289 0.907 0.665 I 3 48 1 0 0.94 0.0357 0.876 I 4 0.419 77 1 0 0.987 0.0131 1 0.962 II 5 0.498 76 1 0 0.974 0.0186 1 0.939 II 6 0.665 75 1 0 0.961 0.0229 1 0.919 II 7 0.167 72 1 0 0.986 0.0140 1 0.959 III 8 0.249 71 10 0.972 0.0199 1 0.935 III 9 0.413 70 10 0.958 0.0246 1 0.913 III 10 0.085 68 2 0 0.971 0.0211 1 0.931 IV 11 0.162 66 1 0 0.956 0.0261 1 0.908 IV 12 0.167 65 1 0 0.941 0.0303 0.999 0.887 IV 13 0.162 71 1 0 0.986 0.0142 1 unkn 14 70 2 0 0.958 0.0249 0.167 0.959 0.912 unkn 15 0.17 68 1 0 0.944 0.0290 0.999 0.892 unknCopy the code

 arrange_ggsurvplots(glist, print = TRUE, ncol = 2, nrow = 1)
Copy the code

Mantel – Haenszel logrank test

The default parameter rho = 0 implements log-rank or Mantel-Haenszel tests.

Call:
Copy the code

survdiff(formula = su_obj ~ stage, Data = Orca) N Observed Expected (O-E)^2/E (O-E)^2/V stage=I 50 25 39.9 5.573 6.813 stage=II 77 51 63.9 2.606 3.662 Stage =III 72 51 54.1 0.174 0.231 stage=IV 68 57 33.2 16.966 20.103 stage=unkn 71 45 37.9 1.346 1.642 Chisq= 27.2 on 4 degrees of freedom, p= 2e-05Copy the code

Peto&Peto Gehan – Wilcoxon test

 
Copy the code

survdiff(formula = su_obj ~ stage, data = orca, Rho = 1) N Observed Expected (O-E)^2/E (O-E)^2/V stage=I 50 14.5 25.2 4.500 7.653 stage=II 77 29.3 39.3 2.549 4.954 Stage =III 72 30.7 33.8 0.284 0.521 stage=IV 68 40.3 22.7 13.738 21.887 stage= unKN 71 32.0 25.9 1.438 2.359 Chisq= 30.9 on 4 degrees of freedom, p= 3e-06Copy the code

Different tests use different weights to compare survival functions. In practical examples, they give comparable results, showing that survival functions are different at different tumor stages.

Modeling survival data

Nonparametric tests are particularly feasible when comparing survival functions at factor levels. They are very powerful, efficient, and often simple/intuitive. However, as the number of factors of interest increases, nonparametric tests become difficult to conduct and interpret. In contrast, regression models are more flexible for exploring the relationship between survival and predictors.

We will introduce two different broad models: semi-parametric (that is, proportional risk) and parametric models.

CoxPH model

In our example, we will consider modeling time of death as a function of gender, age, and tumor stage. The Cox proportional risk model survival can be established by using the CoxPH () function.

 summary(m1)
Copy the code

Call: coxph(formula = su_obj ~ sex + I((age - 65)/10) + stage, data = orca) n= 338, The number of events = 229 coef exp (coef) se (coef) z Pr (> | z |) sexMale 0.35139 1.42104 0.14139 2.485 0.012947 I ((age - 65) / 10) 0.41603 1.51593 0.05641 7.375 1.65E-13 stageII 0.03492 1.03554 0.24667 0.142 0.887421 stageIII 0.34545 1.41262 0.24568 1.406 0.159708 stageIV 0.88542 2.42399 0.24273 3.648 0.000265 stageunKN 0.58441 1.79393 0.25125 2.326 0.020016 exp(coef) Exp (-coef) lower.95 upper.95 sexMale 1.421 0.7037 1.0771 1.875 I((age-65)/10) 1.516 0.6597 1.3573 1.693 stageII 1.036 0.9657 0.6386 1.679 stageIII 1.413 0.7079 0.8728 2.286 stageIV 2.424 0.4125 1.5063 3.901 stageunKN 1.794 0.5574 2.935 Concordance= 0.674 (se = 0.02) Rsquare= 0.226 (Max possible= 0.999) Likelihood ratio test= 86.76 on 6 df, P =< 2E-16 Wald test = 80.5 on 6 DF, P = 3E-15 Score (logrank) test = 82.86 on 6 DF, P = 9E-16Copy the code

We can check that the data are consistent with the proportional risk assumptions for each variable and globally, respectively.

                      rho    chisq     p
Copy the code

SexMale -0.00137 0.000439 0.983 I((age-65)/10) 0.07539 1.393597 0.238 stageII -0.04208 0.411652 0.521 stageIII -0.06915 1.083755 0.298 stageIV -0.10044 2.301780 0.129 stagEUNKN-0.09663 2.082042 0.149 GLOBAL NA 4.895492 0.557Copy the code

Apparently no evidence of violation of the proportionality hypothesis was found.

Cox model results showed significant effects of gender, age and stage. In particular, for every additional 10 years, the death rate increases by 50 per cent. The HR for all-cause mortality was 1.42 compared with men and women. In addition, no differences were detected in the estimates between phases I and II. Therefore, it is prudent to exclude these topics from the data and combine the first two phases into one.

 
round(ci.exp(m2), 4)
Copy the code

Exp (Est.) 2.5% 97.5% sexMale 1.3284 0.9763 1.8074 I(age-65)/10) 1.4624 1.2947 1.6519 st3III 1.3620 0.9521 1.9482 St3IV 2.3828 1.6789 3.3818Copy the code

A convenient way to display and graphically compare the results of a multivariate Cox model is through a forest map.

 
Copy the code

Let’s plot the predicted survival curve step by step, determining values for sex and age based on the fitted model

 
newd
Copy the code

sex age st3 id 1 Male 40 I+II 1 2 Female 40 I+II 2 3 Male 80 I+II 3 4 Female 80 I+II 4 5 Male 40 III 5 6 Female 40 III 6  7 Male 80 III 7 8 Female 80 III 8 9 Male 40 IV 9 10 Female 40 IV 10 11 Male 80 IV 11 12 Female 80 IV 12Copy the code

 
Copy the code

AFT model

Parametric models assume distributions of lifetime.

Call: flexsurvreg(formula = Surv(time, all) ~ sex + I((age - 65)/10) + st3, data = orca2, dist = "weibull") Estimates: Data mean EST l95%U95%SE exp(EST) L95%Shape NA 0.93268 0.82957 1.04861 0.05575 NA NA scale NA 13.53151 9.97582 18.35456 2.10472 NA NA sexMale 0.53184-0.33905-0.66858-0.00951 0.16813 0.71245 0.51243 I((age-65)/10) -0.15979 -0.41836-0.54898-0.28773 0.06665 0.65813 0.57754 st3III 0.26966-0.32567-0.70973 0.05839 0.19595 0.72204 0.49178 St3IV 0.25468-0.95656-1.33281-0.58030 0.19197 0.38421 0.26374 U95% Shape NA scale NA sexMale 0.99053 I(age-65)/10 0.74996 st3III st3IV 1.06012 0.55973 N = 267, Events: 184, Censored: 83 Total time at risk: 1620.864 log-likelihood = -545.858, DF = 6 AIC = 1103.716Copy the code

It can be proved that AFT models assuming exponential or Weibull distributions can be reparameterized into proportional risk models. Eha.

 
Copy the code

Call: weibreg(formula = Surv(time, all) ~ sex + I((age - 65)/10) + st3, Data = OrCA2) Coef Exp(Coef) se(Coef) Wald P sex Female 0.490 0 1 (reference) Male 0.510 0.316 1.372 0.156 0.0434 I((age-65)/10) -0.522 0.390 1.477 0.062 0.000 ST3 I+II 0.551 0 1 (Reference) III 0.287 0.304 1.355 0.182 0.095IV 0.162 0.892 2.440 0.178 0.000 log(scale) 2.605 13.532 0.156 0.000 log(shape) -0.070 0.933 0.060 0.244 Events 181 Total time at risk 1620.9 max.log.likelihood -545.86 LR test statistic 68.7 Degrees of freedom 4 Overall p-value 4.30767 e-14Copy the code

The coefficient (exponent) has an equivalent interpretation to the coefficient in the Cox proportional model.

Any function that fits the parameters of the model can be summarized or plotted by providing fn for the parameters to the summary or plot methods. For example, the median survival rate under Weibull model can be summarized as

 
newd <- data.frame(sex = c("Male", "Female"), age = 65, st3 = "I+II")
summary(m2w, newdata = newd, fn = median.weibull, t = 1, B = 10000)
Copy the code

Sex =Male, I(age-65)/10)=0, st3=I+II time est LCL UCl 1 6.507834 4.898889 8.631952 sex=Female, I(age-65)/10)=0, St3 =I+II time est LCL UCL 1 1 9.134466 6.801322 12.33771Copy the code

The results were compared with those of Cox model.

survfit(m2, newdata = newd)
Copy the code

Call: Survfit (formula = m2, newdata = newd) n events median 0.95LCL 0.95UCL 1 267 184 7.00 5.25 10.6 2 267 184 9.92 7.33 13.8Copy the code

Poisson regression

It can be proved that Cox model is mathematically equivalent to poisson regression model for specific transformation of data. We first define a unique time to observe the event (all == 1) and use the survSplit() function survival in the package to split the data.

 
head(orca_splitted, 15)
Copy the code

The fitting condition is Poisson regression, in which the effect of time (as a factor variable) can be marginalised (not estimated to improve computational efficiency).

mod_poi <- gnm(all ~ sex + I((age-65)/10) + st3, data = orca_splitted, 
               family = poisson, eliminate = factor(time))
summary(mod_poi)
Copy the code

The estimates obtained from conditional Poisson were compared with cox proportional risk models.

round(data.frame(cox = ci.exp(m2), poisson = ci.exp(mod_poi)), 4)
Copy the code

cox.exp.Est.. Cox. 2.5. Cox. 97.5. The poisson. Exp. Est... Poisson.2.5.poisson.97.5. SexMale 1.3284 0.9763 1.8074 1.3284 0.9763 1.8074 I((age-65)/10) 1.4624 1.2947 1.6519 1.4624 1.2947 1.6519 st3III 1.3620 0.9521 1.9482 1.3620 0.9521 1.9482 st3IV 2.3828 1.6789 3.3818 2.3828 1.6789 3.3818Copy the code

If we want to estimate the baseline risk, we also need to estimate the effect of time in the Poisson model.

orca_splitted$dur <- with(orca_splitted, time - tstart)
mod_poi2 <- glm(all ~ -1 + factor(time) + sex + I((age-65)/10) + st3, 
                data = orca_splitted, family = poisson, offset = log(dur))
Copy the code

Baseline risk includes a step function where the rate is constant over each time interval.

newd <- data.frame(time = cuts, dur = 1,
                   sex = "Female", age = 65, st3 = "I+II")
blhaz <- data.frame(ci.pred(mod_poi2, newdata = newd))
ggplot(blhaz, aes(x = c(0, cuts[-138]), y = Estimate, xend = cuts, yend = Estimate)) + geom_segment() +
  scale_y_continuous(trans = "log", limits = c(.05, 5), breaks = pretty_breaks()) +
  theme_classic() + labs(x = "Time (years)", y = "Baseline hazard")
Copy the code

A better approach is to simulate the baseline risk flexibly by using splines such as those with nodes \ (k \).

 
Copy the code

Exp (Est.) 0.074 0.040 0.135 ns(time, knots = k)1 0.402 0.177 0.912 ns(time, knots = k) Knots = k (time, knots = k)2 0.280 0.359 ns(time, knots = k)3 0.220 0.359 ns(time, knots = k) Knots = k) 5285285ns (time, Knots = k) 5 1.040 0.178knots 6 1.040 0.178knots sexMale 1.325 0.975 0.178i ((age-65)/10) 1.469 1.300 1.659 st3III 1.360 0.952 1.942 St3IV 2.361 1.665 3.347Copy the code

 
Copy the code

Compare different strategies

We can compare previous strategies based on the predicted survival curve of a specific covariate model, such as a 65-year-old woman with stage I or II cancer.

newd <- data.frame(sex = "Female", age = 65, st3 = "I+II")
 
Copy the code

The graphical representation of the survival function facilitates comparison.

 
Copy the code

Other analysis

nonlinear

We assume that the effect of age on (log) mortality is linear. A possible strategy to relax this assumption is to fit a Cox model in which age is modulated by a quadratic effect.

 
Copy the code

Call: coxph(formula = Surv(time, all) ~ sex + I(age - 65) + I((age - 65)^2) + st3, data = orca2) n= 267, The number of events = 184 coef exp (coef) se (coef) z Pr (> | z |) sexMale e-01 e+00 e-01 1.591 1.337 2.903 1.825 0.0681 I (age - 65) I((age-65)^2) 3.443e-05 1.000e+00 3.576e-04 0.264 0.7917st3iii 3.168e-01 2.385e+00 1.787e-01 4.863 1.16e-06 exp(coef) exp(-coef) lower.95 Upper.95 sexMale 1.337 0.7481 0.9787 1.826 I(age-65) 1.039 0.9621 1.0262 1.053 I((age-65)^2) 1.000 0.9999 0.9994 1.001ST3III 1.373 0.7284 0.9576 1.968ST3IV 2.385 0.4193 1.6801 3.385 Concordance= 0.674 (se = 0.022) Rsquare= 0.216 (Max possible= 0.999) Likelihood ratio test= 64.89 on 5 df, P = 1E-12 Wald test= 63.11 on 5 df, P = 3E-12 Score (logrank) test = 67.64 on 5 df, P = 3E-13Copy the code

The value of the nonlinearity (that is, the quadratic term) is high, so there is no evidence to reject the null hypothesis (that is, the linear hypothesis is appropriate).

If the relationship is non-linear, the age coefficient can no longer be explained directly. We can graphically present HR as a function of age. We need to specify an indicator; We chose a median age of 65.

age <- seq(20, 80, 1) - 65
 
  geom_vline(xintercept = 65, lty = 2) + geom_hline(yintercept = 1, lty = 2)
Copy the code

Time dependent coefficient

The cox.zph() function can be used to plot the effects of individual predictors over time, and thus can be used to diagnose and understand non-proportional risks.

 
Copy the code

We can relax the proportional risk assumption by fitting the step function, which means that there are different at different time intervals

Survival, the survSplit() function in the package, divides the data set.

Id sex age stage event ST3 tstart time All tgroup 1 2 Female 83.08783 III Oral ca. Death III 0 0.419 1 1 2 3 Male 52.59008 II Other death I+II 0 5.000 0 1 3 3 Male 52.59008 II Other death I+II 5 7.915 1 2 4 4 Male 77.08630 I Other Death I+II 0 2.480 1 1 5 5 Male 80.33622 IV Oral ca. Death IV 0 2.500 1 16 6 Female 82.58132 IV Other death IV 0 0.167 1 1Copy the code

 
Copy the code

I((age - 65)/10) + st3, Data = ORCA3) COEF exp(COEF) se(COEF) z p I((AGE-65)/10) 0.38184 1.46498 0.06255 6.104 1.03E-09 st3III 0.28857 1.33451 0.18393 1.569 0.1167 st3IV 0.87579 2.40076 0.17963 4.876 1.09e-06 relevel(sex, 2)Male:strata(tgroup)tgroup=1 0.42076 1.52312 0.19052 2.209 0.0272 relevel(sex, 2) strata(tgroup)tgroup=1 NA NA 0.00000 NA NA relevel(sex, 2)Male:strata(tgroup)tgroup= 2-0.10270 0.90240 0.28120-0.365 0.7149 relevel(sex, 2) strata(tgroup)tgroup=2 NA NA 0.00000 NA NA relevel(sex, 2)Male:strata(tgroup)tgroup=3 1.13186 3.10142 1.09435 1.034 0.3010 relevel(sex, 2)Female:strata(Tgroup) Tgroup =3 NA NA 0.00000 NA NA Likelihood ratio test=68.06 on 6 DF, P =1.023e-12 n= 416, number of events= 184Copy the code

Although not significant, the risk ratio for male and female comparisons was less than 1 in the second period (5 to 15 years) and higher than 1 in the other two periods.

Simulates percentile survival

A different but interesting approach involves simulating percentiles of survival time.

 
Copy the code

Call: ctqr(formula = Surv(time, all) ~ st3, data = orca2, p = p) Coefficients: P = 0.25 (Intercept) 2.665 ST3III-1.369 ST3IV-1.877 Degrees of Freedom: 267 total; 225 residualsCopy the code

β0= 2.665 was the time when the probability of death in the reference group was equal to 0.25. The other is interpreted as relative measure.

This information allows intuitive comparison of survival curves estimated separately at the tumor stage level.

 
  p = c(p, p - .005, p + .005)
)[-1, ]

  = 1 - p, xend = time_ref, 
                                 yend = 1 - p))
Copy the code

Possible differences in percentiles of time to survival assessed in the Cox model as a function of gender at diagnosis and age at tumor stage.

 
Copy the code

CTQR (formula = Surv(time, all) ~ sex + I((age-65)/10) + ST3, data = ORCa2, P = SEq (0.1, 0.7, 0.1)) Coefficients: P = 0.1p = 0.2p = 0.3p = 0.4p = 0.5p = 0.6p = 0.7 15.19359 sexMale -0.09218-0.27385-2.49580-3.27962-2.81428-4.01656 I((age-65)/10) -0.19026-0.39819 St3III - 0.60994-1.08534-1.89357-2.23741-3.10478-2.00037-1.59213 st3IV - 1.20278-1.93144-2.39229-3.03915-3.52711 st3III - 0.60994-1.08534-1.89357-2.10478-2.00037-1.59213 st3IV -1.07679-1.59566-2.92700-3.16652-4.74759-4.80838-5.25810 Degrees of Freedom: 267 Total; 220 residualsCopy the code

The results included differences in survival time for each covariable at different percentiles.

coef_q <- data.frame(coef(fit_q)) %>%
    .96 * se
    )
 
Copy the code

Alternatively, percentiles of survival time can be predicted for a particular set of covariance patterns.

 
Copy the code

CIF cumulative incidence function

In competitive risk scenarios, Kaplan-Meier estimates of survival for a specific cause are generally inappropriate. We will consider the cumulative incidence function (CIF) of events

CIF

Mstate calculates the nonparametric CIF (also known as the Aalen-Johansen estimate) and the associated standard errors for competitive events.

 head(cif)
Copy the code

Time Surv ci.oral ca. Death CI.Other death seSurv seci.oral ca. Death 1 0.085 0.9925094 0.007490637 0.000000000 0.005276805 0.005276805 2 0.162 0.9887640 0.011235955 0.000000000 0.006450534 0.006450534 3 0.167 0.9812734 0.011235955 0.007490637 0.008296000 0.006450534 4 0.170 0.9775281 0.011235955 0.011235955 0.009070453 0.006450534 5 0.249 0.9737828 0.011235955 0.014981273 0.009778423 0.006450534 6 0.252 0.9662921 0.014981273 0.018726592 0.011044962 0.007434315 SeCI.Other death 1 0.000000000 2 0.000000000 3 0.005276805 4 0.006450534 5 0.007434315 6 0.008296000Copy the code

We can plot the CIF and the survival function.

 
Copy the code

The cumulative incidence function was estimated by the level of factor variables.

grid.arrange(
Copy the code

 
  ncol = 2
)
Copy the code

We can see that the CIF of oral cancer deaths in stage IV is higher than that in III and even higher than that in I + II. In contrast, for mortality from other causes, the curve did not seem to vary with tumor stage.

When we want to model survival data in competitive risk Settings, there are two common strategies that can address different problems: – Cox models for event-specific risk, e.g., interest in predictors of biological effects on mortality very disease. – When we want to evaluate the effect of factors on the overall cumulative incidence of events.

CIF Cox model

 
Copy the code

 round(ci.exp(m2haz2), 4)
Copy the code

Exp (Est.) 2.5% 97.5% sexMale 1.8103 1.1528 2.8431 I((age-65)/10) 1.4876 1.2491 1.7715 st3III 1.2300 0.7488 2.0206 St3IV 1.6407 0.9522 2.8270Copy the code

The results of the cause-specific Cox model were consistent with the graphical representation of cause-specific CIF, namely that stage IV tumor was only an important risk factor for oral cancer mortality. Increased age was associated with increased mortality from both causes (HR = 1.42 for oral cancer and 1.48 for other causes). A sex difference was observed only for mortality from other causes (HR = 1.8).

CRR model

CRR () In the case of CMPRSK competition risk, the functions in the package can be used for regression modeling of sub-distribution functions.

Call: crr(ftime = time, fstatus = event, cov1 = model.matrix(m2), Failcode = "Oral ca. death") coef exp(coef) se(coef) z p-value sexMale -0.0953 0.909 0.213-0.447 6.5e-01 I((age - 65)/10) 0.2814 1.325 0.093 3.024 2.5e-03 st3III 0.3924 1.481 0.258 1.519 1.3e-01 st3IV 1.0208 2.775 0.233 4.374 1.2e-05 Exp (coef) exp(-coef) 2.5% sexMale 0.909 1.100 0.599 1.38i (age-65)/10) 1.325 0.755 1.104 1.59 st3III 1.481 0.675 0.892 2.46ST3iv 2.775 0.360 1.757 4.39 Num. Cases = 267 Pseudo log-likelihood = -501 Pseudo likelihood ratio test = 31.4 on 4 df,Copy the code

m2fg2 <- with(orca2, crr(time, event, cov1 = model.matrix(m2), failcode = "Other death"))
summary(m2fg2, Exp = T)
Copy the code

Competing Risks Regression Call: crr(ftime = time, fstatus = event, cov1 = model.matrix(m2), Failcode = "Other death") coef exp(coef) se(coef) z p-value sexMale 0.544 1.723 0.2342 2.324 0.020 I((age-65)/10) 0.197 1.218 0.0807 2.444 0.015 st3III 0.130 1.139 0.2502 0.521 0.600 st3iv-0.212 0.2839-0.748 0.450 exp(coef) Exp (-coef) 2.5% 97.5% sexMale 1.723 0.580 1.089 2.73 I(age-65)/10) 1.218 0.821 1.040 1.43 st3III 1.139 0.878 0.698 1.86ST3IV 0.809 1.237 0.464 1.41 Num. Cases = 267 Pseudo log-likelihood = -471 Pseudo likelihood ratio test = 9.43 on 4  df,Copy the code

 
Copy the code

More questions, contact us!Copy the code

Most welcome insight

1. R language survival curve drawing estimate | | how to survive R for survival analysis graph

2. Visual analysis of R language survival analysis

3. How does R language calculate IDI and NRI indicators in survival analysis and Cox regression

4. Use Bioconductor to analyze chip data in R language

5. Visualization case of R language survival analysis data analysis

6. R language GGploT2 error bar diagram quick guide

7. Draw functional enrichment bubble map in R language

8. How does R language find indicators with differences in patient data? (PLS-DA analysis)

9. Survival analysis of R language in 4 patients with advanced lung cancer

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

R language survival analysis data analysis visualization case

Original link:tecdat.cn/?p=2858

Original source:Tuo End number according to the tribe public number

The target

R package

data

Survival data analysis

Estimated survival function

Nonparametric estimation

Kaplan Meier –

Survival curve estimation

Nelson – Aalen estimates

Graphical representation of estimators (based on survival curves using KM)

Parameter estimation

Comparison of survival curves

Mantel – Haenszel logrank test

Peto&Peto Gehan – Wilcoxon test

Modeling survival data

CoxPH model

AFT model

Poisson regression

Compare different strategies

Other analysis

nonlinear

Time dependent coefficient

Simulates percentile survival

CIF cumulative incidence function

CIF

CIF Cox model

CRR model

R language survival analysis data analysis visualization case

Original link:tecdat.cn/?p=2858

Original source:Tuo End number according to the tribe public number

The target

R package

data

Survival data analysis

Estimated survival function

Nonparametric estimation

Kaplan Meier –

Survival curve estimation

Nelson – Aalen estimates

Graphical representation of estimators (based on survival curves using KM)

Parameter estimation

Comparison of survival curves

Mantel – Haenszel logrank test

Peto&Peto Gehan – Wilcoxon test

Modeling survival data

CoxPH model

AFT model

Poisson regression

Compare different strategies

Other analysis

nonlinear

Time dependent coefficient

Simulates percentile survival

CIF cumulative incidence function

CIF

CIF Cox model

CRR model

Related Posts

Time Management, Brainstorming, Reading, Meeting taking – Avi

MySQL series relearn: 9 minutes to understand the underlying principles of MySQL, deadlock exceptions, SQL performance

Create a free static Blog with Hexo+ Code cloud + Git