7.1.Translation: Ribo-seq

1) Background

Ribo-seq(有时又称为ribosome profiling)是2009年Weissman课题组首次发表的研究细胞内蛋白翻译组的二代测序技术。这种技术选择性的对有RNA上有核糖体结合的片段进行测序,这样就能获得很多翻译组的信息。

Ribo-seq大致步骤为:

  • 裂解细胞,富集结合着核糖体的mRNA

  • 用核酸酶消化掉mRNA上没有核糖体保护的片段

  • 富集核糖体,进而纯化得到有核糖体所保护的片段(ribosome protected fragment, RPF)

  • 对RPF建库测序

  • 生物信息学的分析

Ribo-seq数据测得的RNA片段长短与small RNA-seq相似,大约分布在25~35nt区间。由于Ribo-seq选择性的捕获正在翻译的序列,所以其数据的测序片段大多比对到基因组的CDS(coding sequence)。此外,Ribo-seq还有一个明显区别于其他RNA-seq的特点,即Ribo-seq的信号在CDS区域往往呈现3-nt的周期性(图1)。这主要因为一个codon是3 nt长,翻译过程中核糖体的移动模式也具有3 nt的周期性。

图1

Ribo-seq数据可以用来做的分析主要有以下两种:

  • 对翻译组进行定量。Ribo-seq数据可以像RNA-seq数据一样得到表达矩阵用作差异分析。如果同样的样本还有RNA-seq的数据,那么还可以进行翻译效率的分析

  • ORF calling。基于ribo-seq数据,我们可以注释一些novel的ORF。

2) Workflow

本文将介绍两个Ribo-seq处理软件(Ribocode和Ribowave),以及一个专门用来对翻译效率进行差异分析的R包(Xtail)。我们主要介绍Ribocode和Xtail的使用,Ribowave作为补充学习。

2a)Ribocode

Ribocode是一种包含Ribo-seq数据预处理、可视化、识别翻译ORF的的Ribo-seq全流程分析工具。Ribocode的分析流程如下:

2b)Xtail

Xtail是一个用于计算翻译效率(TE)和差异翻译效率的R package。计算流程如下:

2c)Ribowave

RiboWave是一种基于小波变换对Ribo-seq数据的进行预处理,从而识别翻译ORF,计算翻译效率的Ribo-seq全流程分析工具。RiboWave的分析流程如下:

3) RiboCode

3a) 准备工作环境

  • 使用docker 启动ribo-seq所用的 Docker,进入工作目录

  • 下载软件和文件 1.Install software:Ribocode 2.Download data: link

3b) 准备注释文件

产生transcripts annotation文件,用于后续分析。

3c) 确定P-site的位置

  • 我们知道,一个read对应一个RPF,一个RPF对应一个核糖体,而一个核糖体有A(aminoacyl-site),P(peptidyl-site)和E(exit-site)三个位点。我们通常用P位点的位置来代表翻译的信号。

  • 推测P site相对于RPF的位置有不同的策略。RiboCode的做法是,对于长度相同的RPF统一设定一个offset作为P site的位置,使得这个位置可以产生周期性最强的翻译信号。这样我们就能根据一个read的长度直接推测出P site的位置。

  • 请大家注意,这里的bam文件是位于转录本的坐标上的

Output:

  • wtuvb1.wtuvb1.Aligned.toTranscriptome.out.sorted.pdf:a PDF file plots the aggregate profiles of the distance from the 5'-end of reads to the annotated start codons (or stop codons), which is used for examining the P-site periodicity of RPF reads on CDS regions.

  • wtuvb1._pre_config.txt:The P-site config file, which defines the read lengths with strong 3-nt periodicity and the associated P-site locations for each length.

3d) ORFs calling

这一步可以获取有翻译数据支持的ORF。 Output:

  • ORF.gtf/ORF.txt:contains the information of all predicted ORFs in each transcript.

  • ORF_collapsed.gtf /RF_collapsed.txt :file combines the ORFs having the same stop codon in different transcript isoforms: the one harboring the most upstream in-frame ATG will be kept.

Some column names of the result file:

3e) 计算ORF的count matrix

"ORFcount" command which will call the HTSeq-count program. Only the reads of a given length will be counted. For those ORF with length longer than a specified value (set by "-e"), the RPF reads located in first few and last few codons can be excluded by adjusting the parameters "-f" and "-l".

The reads with length between 24-35 nt aligned on predicted ORF.The reads aligned to first 15 codons and last 5 codons of ORFs and had the length longer than 100 nt will be excluded.

4) Xtail

Xtail是对翻译效率进行差异检验的R包,它内部调用了DESeq2来实现负二项分布检验。

4a) Pre-processing

  • 自行下载和安装

1.下载RNA-seq表达矩阵(RNA_count.txt)和Ribo-seq表达矩阵(Ribo_count.txt),在这里

2.Xtail

  • Users should install DESeq2 before using Xtail: start R and enter:

4b) Calculate differential TE

5) RiboWave

5a) Pre-processing

  • 使用docker 启动ribo-seq所用的 Docker,进入工作目录

  • 下载软件和文件 1.Install software:Ribowave 2.Download data: link

5b) Create annotation

input files

  • annotation.gtf: the annotation gtf should contain start_codon and stop_codon information, eg: dmel-all-r6.18.gtf

  • genome.fasta: genome fasta, eg: dmel-all-chromosome-r6.18.fasta

  • annotation_dir: the directory for all the annotation output

  • scripts_dir: the directory of all the scripts in the package

output files

annotation directory, including :

  • start_codon.bed: the bed file annotating start codon

  • final.ORFs: all identified ORFs, eg: FBtr0300105_0_31_546 where FBtr0300105 refers to the transcript, 0 refers to the reading frame relative to the start of transcript, 31 refers to the start site, 546 refers to the stop codon.

5c) P-site determination

This step determines the P-site position for each Ribo-seq reads length by overlapping with the annotated start codons from previous step

input files

  • Ribo_bam: secondary alignment removed to ensure one genomic position per aligned read and sorted

  • start_codon.bed: annotated start site start_codon.bed. It is generated in the create_annotation.sh step

  • out_dir: the directory of the output result, eg: GSE52799

  • study_name: the name of all the output file, default: test. eg: SRR1039770

  • scripts_dir: the directory of all the scripts in the package

查看输出

output files

P-site directory, including :

  • name.psite1nt.txt: the Ribo-seq reads length and its corresponding P-sites position(= offset + 1). It may look this this :

  • name.psite.pdf: the PDF displaying the histogram of aggregated reads

    我们收集了所有已被注释的起始密码子并将 这些起始密码子和Ribo-seq 比对上的序列进行重合,分别计算Ribo-seq序列的5’端偏离起始密码子的第一个碱基A的距离(offset)。根据 Ribo-seq测序片段长度的不同,我们进一步将Ribo-seq片段分成多个组分。在每个长度对应的组分里,作出Ribo-seq片段5’端偏离起始密码子A的距离(offset)的直方图。

每一行代表不同长度的 Ribo-seq 测序片段的直方图。该数据中,30nt的reads数目最多,在30nt长度的Ribo-seq片段中,我们可以明显的看到在距离为13nt的位点含有一个峰值(peak)。鉴于大部分核糖体会在翻译起始位点停滞较多的时间,因此对于30nt长的Ribo-seq片段,其P-site位点的定义应该代表直方图中绝大多数的核糖体,因此我们将P-site位点应该定义为峰值最高的第13个碱基(13nt)的位置。

5d) Generating P-site track

基于Ribo-seq序列及其确定的P-site位点,将规律推广到所有Ribo-seq的片段中,直接根据Ribo-seq序列的长度推断其对应的P-site位点。根据这种方法,我们可以将每一条转录本上所有的Ribo-seq片段转化为对应的P-site位点的信号点并获得转录组水平的Ribo-seq信号轨迹(signal track)。由于是由P-site位点定义出的信号轨迹,通常也被叫做P-site信号轨迹(P-sites track)。转录本上每一个位点的信号丰度代表了有多少Ribo-seq片段对应的P-site位点落在该位置上。

This step creats the P-site track for transcripts of interests using determined P-sites position from previous step. look at transcripts from chromosome X :

查看输出

input files

  • Ribo_bam

  • exons.gtf: a gtf file for only the exons from transcripts of interest, eg: X.exons.gtf

  • chromosome_size: the file including all the chromosomes and its genome size. Noted: genome can be obtained by using samtools faidx function with the input of fasta file. genome may look like this:

  • P-site_position: the file listing the P-site position for each read length. This file can be found in the output of previous step, eg: name.psite1nt.txt

  • out_dir: the directory of the output result, eg: GSE52799

  • study_name: the name of all the output file, default: test. eg: SRR1039770

  • scripts_dir: the directory of all the scripts in the package

查看生成文件

output files

  • bedgraph/name directory, including :

    final.psite : P-site track at transcriptome wide. It may look like this :

5e) RiboWave main function

This step can achieve multiple functions :

  • denoising [denoise]

降噪:在降噪过程运用了小波变换来去除原始信号中非3-nt周期性的信号使得保留的信号均具有翻译的3-nt周期性,即P-site信号。

  • providing predicted p.value for each given ORF to identify its translation status [pvalue,-P]

鉴定ORF的翻译潜能:根据该转录本上经降噪后的P-sites是否富集在该ORF所在的阅读框内,判断ORF的翻译潜能。P<0.05即具有翻译活性。

  • providing reads density (P-site/PF P-site) for each given ORF [density,-D]

reads density = ORF上reads数/ORF长度,反映翻译水平的abundance。

  • providing translation efficiency (TE) estimation for each given ORF [TE,-T]

TE = 翻译水平的abundance/转录水平的abundance,反映翻译效率。

  • providing frameshift potential (CRF score) for each given ORF [CRF,-F]

鉴定潜在核糖体移码现象。

It might take hours to perform the analysis if the input is large. It is recommended to specify the number of CPU cores through the -p option.

Run Ribowave on example:

Denoise the P-site track

Identifying translated ORF

Estimating abundance

Estimating TE

IMPORTANT : when estimating TE, user should input the sequenced depth of Ribo-seq and the FPKM value from paired RNA-seq

Calculating frameshift potential

on annotated ORFs

Multiple functions

input files

  • bedgraph/name:

    P-site track: output from the previous step, containing the P-site track of transcripts of interest, eg: final.psite

  • ORF_list: ORFs of interest ,eg : final.ORFs. It is generated in the step of create_annotation.sh

  • Ribo-seq sequenced depth: the sequenced depth of Ribo-seq to calculate FPKM , eg: 9012445

  • RNA FPKM: FPKM table. It may look like this :

output files

  • name.PF_psite: the denoised signal track(PF P-sites signal track) at transcriptome wide. It looks similar as the input final psite.

  • including chi-square P-value information. It may look like this :

    result directory, including :

  • name.95%.mx: RiboWave makes the prediction on the translation initiation sites and gives the final translated product output (p.value < 0.05) . It may look like this :

  • name.density: reads density ( PF P-site ) of given ORFs. It may look like this :

  • name.TE: TE of given ORFs. It may look like this :

  • name.CRF.final: ORFs that might experience reading frame translocation. It may look like this :

6) More Software

Preprocessing
ORF Iden.
Quantification TE
Translation elongation
Integrative & visualization

Rfoot

DeepRibo

scikit-ribo

diricore

Plastid

RiboWaltz

FLOSS

RiboDiff

Iχnos

Rfeet

RiboWave

ORF-RATER

riborex

PausePred

RiboGalaxy

PRICE

Xtail

ROSE

RiboProfiling

RiboCode

RiboMiner

riboSeqR

RiboHMM

RiboTools

RibORF

RIVIT

RiboTaper

Shoelaces

RP-BP

tRanslatome

SPECtre

TITER

7) Homework

1)解释翻译效率(translation efficiency,TE)的含义。

2)利用RiboWave过程中示例文件算出TE,并画出TE的分布。

3)我们给出作业文件wtuvb2(在/home/test/rna_regulation/ribo-code/wtuvb2,或者在这里下载),利用Ribocode,画出该样本的3nt周期性示意图,计算翻译ORF,并计算每类ORF的个数,尝试解释ORF不同类别的含义。

4)我们给出作业文件RNA_count.txt和Ribo_count_2.txt(在这里下载)利用Xtail过程计算差异翻译,并绘制火山图。

Last updated

Was this helpful?