4.1 Survival Analysis
1) Background
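In 1958, Edward L. Kaplan and Paul Meier introduced the product-limit estimate of the survival curve, now known as the Kaplan-Meier curve. In clinical research it is mainly used to describe the survival of each group of patients. The main reason for drawing survival curves is to perform survival analysis. This analyzes the endpoint event together with the time elapsed until that endpoint occurs, so that the prognosis of groups of patients can be compared.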
2) Pipeline
Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. (https://en.wikipedia.org/wiki/Survival_analysis)
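Survival analysis is a statistical method that studies two things: whether survival time is associated with related factors, and how the survival times of a sample are distributed.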
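The formulas below state the quantities behind the methods listed next, in standard notation. Here T is the time to the event and t_i are the distinct event times. d_i is the number of events at t_i, and n_i is the number of subjects at risk just before t_i. A subscript 1 (d_{1i}, n_{1i}) marks the counts in group 1 of a two-group comparison. h_0(t) is an unspecified baseline hazard, and exp(β_j) is the hazard ratio for a one-unit increase in x_j.

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Kaplan-Meier (product-limit) estimator
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)

% Two-group log-rank statistic (chi-square with 1 degree of freedom)
\chi^2 = \frac{\left(\sum_i \left(d_{1i} - d_i \frac{n_{1i}}{n_i}\right)\right)^2}
              {\sum_i d_i \frac{n_{1i}}{n_i}\left(1 - \frac{n_{1i}}{n_i}\right)\frac{n_i - d_i}{n_i - 1}}

% Cox proportional hazards model
h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)
```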
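Methods used in survival analysis:
Describing the survival process: Kaplan-Meier plots to visualize survival curves. Survival probabilities and their standard errors are estimated from the distribution of survival times and plotted as survival curves. The Kaplan-Meier method is the most common, and the life-table method is an alternative.
Comparing survival processes: Log-rank test to compare the survival curves of two or more groups. The curves are compared over the whole follow-up period by contrasting the observed and expected numbers of events in each group at every event time. The log-rank test is the usual choice.
Analyzing factors that affect survival time: Cox proportional hazards regression to describe the effect of variables on survival. The Cox proportional hazards model estimates the effect of each variable on survival and can include two or more factors at once.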
Reference: http://www.sthda.com/english/wiki/cox-proportional-hazards-model
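The sthda page above works in R with the survival package's `coxph()`. To show what the fit does internally, here is a minimal Python sketch that fits a Cox model with one covariate. It maximizes the partial likelihood by Newton-Raphson. It assumes Breslow's approximation for tied event times, whereas `coxph()` defaults to Efron's, so estimates can differ slightly when ties exist. The function name is hypothetical.

```python
import math

def cox_single_covariate(times, events, x, n_iter=25, tol=1e-9):
    """Hypothetical helper: fit h(t|x) = h0(t) * exp(beta * x) for ONE covariate
    by Newton-Raphson on the Cox partial log-likelihood (Breslow ties).
    Returns (beta, hazard_ratio)."""
    beta = 0.0
    # Distinct times at which at least one event (e.g. death) was observed
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(n_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (minus the second derivative)
        for et in event_times:
            # Risk set: covariates of subjects still under observation at et
            risk = [xi for ti, xi in zip(times, x) if ti >= et]
            # Covariates of subjects whose event happened exactly at et
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == et and ei]
            w = [math.exp(beta * xi) for xi in risk]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, risk))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
            d = len(dead)
            score += sum(dead) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        step = score / info  # Newton step
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)
```

The data in the following call are made up. Subjects with x = 1 tend to fail earlier, so the estimated hazard ratio comes out above 1: `cox_single_covariate([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 1, 0, 1, 0, 0])`.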
3) Data structure
| File name | Description |
| --- | --- |
| rna.rds | Matrix of RSEM-normalized count values |
| clinical_info.rds, LIHC.merged_only_clinical_clin_format.txt, all_clin.rds | Clinical information for TCGA samples |
3a) Input data
Import data
Data characteristics
TCGA barcode information: https://docs.gdc.cancer.gov/Encyclopedia/pages/images/TCGA-TCGAbarcode-080518-1750-4378.pdf
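Following the barcode document above, the first three fields (project, tissue source site, participant) identify the patient. The first two digits of the fourth field give the sample type: 01-09 are tumor samples, 10-19 normal samples and 20-29 controls. The minimal Python sketch below splits a barcode along those lines. The helper name and the example barcode are hypothetical. The input is upper-cased so that patient IDs compare consistently regardless of case.

```python
def parse_tcga_barcode(barcode):
    """Hypothetical helper: split a TCGA barcode into the fields needed
    to match expression columns to clinical rows."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    sample_code = int(parts[3][:2])  # e.g. "01A" -> 1
    if 1 <= sample_code <= 9:
        sample_class = "tumor"
    elif 10 <= sample_code <= 19:
        sample_class = "normal"
    elif 20 <= sample_code <= 29:
        sample_class = "control"
    else:
        sample_class = "other"
    return {
        "project": parts[0],
        "tss": parts[1],                     # tissue source site
        "participant": parts[2],
        "patient_id": "-".join(parts[:3]),   # key for joining with clinical data
        "sample_type": parts[3][:2],
        "vial": parts[3][2:3],
        "sample_class": sample_class,
    }
```

For example, with a made-up barcode, `parse_tcga_barcode("TCGA-AB-1234-01A-11R-A123-07")["sample_class"]` is `"tumor"`.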
3b) Data preprocessing
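The preprocessing code for this step is not shown here. The minimal Python sketch below illustrates one common approach, and every choice in it is an assumption. Genes with zero counts in more than half of the samples are dropped, and the remaining counts are log2(x + 1)-transformed. Each tumor value is then scored as (tumor value - mean of normals) / SD of normals. The function name is hypothetical, and at least two normal samples are needed for the standard deviation.

```python
import math
from statistics import mean, stdev

def zscores_vs_normal(expr, tumor_cols, normal_cols, max_zero_frac=0.5):
    """Hypothetical helper. expr: dict gene -> list of normalized counts,
    one value per sample column. Returns dict gene -> tumor z-scores."""
    z = {}
    for gene, values in expr.items():
        zero_frac = sum(v == 0 for v in values) / len(values)
        if zero_frac > max_zero_frac:
            continue  # too sparse to be informative
        logged = [math.log2(v + 1) for v in values]
        normals = [logged[i] for i in normal_cols]
        mu, sd = mean(normals), stdev(normals)
        if sd == 0:
            continue  # z-score undefined when normal samples do not vary
        z[gene] = [(logged[i] - mu) / sd for i in tumor_cols]
    return z
```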
4) Running steps
4a) Install packages
4b) Load packages
4c) Create event vectors for the RNA-seq data
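The minimal Python sketch below shows the two vectors a survival fit needs. The first is a per-sample expression status: 1 for up-regulated, -1 for down-regulated and 0 for not altered. The ±1.96 z-score cutoff (roughly p < 0.05, two-sided) is an assumption, not a value taken from this page. The second is a per-patient survival time and event status. The time is the days to death if the patient died, otherwise the days to last follow-up, which marks a censored observation. The exact column names in the clinical file are not reproduced here, and both helper names are hypothetical.

```python
def expression_event(z_values, threshold=1.96):
    """Hypothetical helper: map z-scores to 1 (up), -1 (down) or 0 (not altered).
    The 1.96 cutoff is an assumed default."""
    return [1 if z > threshold else -1 if z < -threshold else 0 for z in z_values]

def survival_time_and_status(vital_status, days_to_death, days_to_last_followup):
    """Hypothetical helper. Per patient: time = days to death if dead, otherwise
    days to last follow-up; status = 1 if death was observed, 0 if censored.
    Missing values are given as None."""
    times, status = [], []
    for vs, dd, df in zip(vital_status, days_to_death, days_to_last_followup):
        dead = vs is not None and vs.lower() == "dead"
        times.append(dd if dead else df)
        status.append(1 if dead else 0)
    return times, status
```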
4d) Fit survival curves
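Fitting a survival curve means computing the Kaplan-Meier product-limit estimate. At every event time, the current survival probability is multiplied by (1 - deaths / number at risk). In R, `survfit(Surv(time, status) ~ group)` from the survival package produces these estimates. The minimal, dependency-free Python sketch below shows the same calculation, with one curve per expression group; the helper names are hypothetical.

```python
from collections import defaultdict

def kaplan_meier(times, events):
    """Hypothetical helper: product-limit estimate, returned as
    [(t, S(t))] at each distinct event time."""
    deaths = defaultdict(int)
    for t, e in zip(times, events):
        if e:
            deaths[t] += 1
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for ti in times if ti >= t)  # censored at t still count
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

def kaplan_meier_by_group(times, events, groups):
    """Hypothetical helper: one curve per group label,
    e.g. 1 = up-regulated, 0 = not altered."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return out
```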
4e) Draw survival curves

*: Each '+' represents a censored sample.
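Survival plots comparing groups are often annotated with a log-rank p-value; in R this comes from `survdiff()` in the survival package. The minimal Python sketch below computes the two-group log-rank statistic. At each event time it compares the observed events in group 1 with the events expected if both groups shared one survival curve. It then converts the chi-square statistic (1 degree of freedom) to a p-value. The helper name is hypothetical, and groups must be labeled 0 and 1.

```python
import math

def logrank_two_groups(times, events, groups):
    """Hypothetical helper: two-group log-rank test, groups labeled 0/1.
    Returns (chi-square statistic with 1 df, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                       # at risk
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # events
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:  # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / var
    # For 1 df: P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```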
5) Appendix
5a) Download TCGA RNAseq data and clinical data
The TCGA liver hepatocellular carcinoma (LIHC) data can be downloaded as follows:
go to FireBrowse (http://gdac.broadinstitute.org/), select "LIHC" -> "Browse"
from "mRNASeq" select "illuminahiseq_rnaseqv2-RSEM_genes_normalized" and save it
from "Clinical" select "Merge_Clinical" and download it
unzip the files
rename the folders as "RNA" and "Clinical"
5b) Data preprocessing
6) Homework
Using the TCGA LIHC data, plot survival curves comparing two groups of patients: those whose AFP gene is up-regulated (differentially expressed) and those whose AFP expression is not altered.
7) References