Bioinformatics Tutorial

6.4.RNA Editing

Last updated 3 years ago

1) Background

RNA editing may play diverse regulatory roles.

There are four main classes of RNA editing events in mRNA.

A-to-I editing is the most common class of editing event. During RNA-seq library preparation, reverse transcription reads I (inosine) as G, so A-to-I editing shows up as A-to-G conversions in the reads.
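As a minimal sketch of what "A-to-G conversion in the reads" means in practice: variant callers report alleles on the forward genome strand, so an edit in a transcript from the minus strand appears as T-to-C rather than A-to-G. The function name and the strand convention (`"+"` / `"-"`) below are assumptions made for illustration and are not part of RNAeditor.

```python
# Complement table used to move an allele from the genome forward strand
# to the transcript strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_a_to_i_candidate(ref, alt, strand):
    """Return True if a mismatch is compatible with A-to-I editing.

    ref, alt: alleles as reported on the forward genome strand.
    strand:   strand of the transcript, "+" or "-" (assumed convention).
    """
    ref, alt = ref.upper(), alt.upper()
    if strand == "-":
        # Express the mismatch on the transcript strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return ref == "A" and alt == "G"


if __name__ == "__main__":
    print(is_a_to_i_candidate("A", "G", "+"))
    print(is_a_to_i_candidate("T", "C", "-"))
```

The check only flags candidates; a genomic SNP with the same alleles would pass it too, which is why the filtering steps matter.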

2) Tools

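  • DNA also carries many single-nucleotide polymorphisms (SNPs), some of which happen to be A-to-G conversions relative to the reference genome. These SNPs must be distinguished from RNA editing events. Ideally, if matched DNA-seq data are available, A-to-G conversions that are present at the DNA level can be filtered out, and the remaining ones are taken to be RNA editing events.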

3) Pipeline

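The pipeline figure shows the analysis workflow implemented inside RNAeditor. RNAeditor maps RNA-seq reads with bwa rather than with a spliced aligner such as STAR or hisat2, possibly because the authors considered splicing to have little effect on the results.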

4) Running steps (RNAEditor)

4a) input files

RNAeditor reads the locations of its input files from a configuration file. Here is a brief look at one:

# This file is used to configure the behaviour of RNAeditor
# Standard input files
refGenome = /home/test/data/Homo_sapiens.GRCh38.ch1.fa
gtfFile = /home/test/data/Homo_sapiens.GRCh38.chr1.gtf
dbSNP = /home/test/data/dbSNP.vcf.new
hapmap = /home/test/data/HAPMAP.vcf
omni = /home/test/data/1000GenomeProject.vcf
esp = /home/test/data/ESP.chr1.vcf
aluRegions = /home/test/data/Repeats.chr1.bed
output = /apps/RNAEditor/output/chr1
sourceDir = /usr/local/bin/
maxDiff = 0.04
seedDiff = 2
standCall = 0
standEmit = 0
edgeDistance = 3
intronDistance = 5
minPts = 5
eps = 50
paired = False
keepTemp = True
overwrite = False
threads = 1

Note: when using singularity, change refGenome, gtfFile, dbSNP, hapmap, omni, esp, aluRegions, output, and similar entries to the corresponding paths.
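The path changes described in the note can be sketched in a few lines of Python. This is only an illustration: it assumes the configuration file uses plain `key = value` lines with `#` comments, as in the listing above, and the prefix `/path/to/data` is a hypothetical placeholder, not a real location.

```python
# Keys from the configuration listing that hold file or directory paths.
PATH_KEYS = ("refGenome", "gtfFile", "dbSNP", "hapmap",
             "omni", "esp", "aluRegions", "output")


def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def remap_paths(settings, old_prefix, new_prefix):
    """Return a copy of settings with old_prefix replaced in path entries."""
    remapped = dict(settings)
    for key in PATH_KEYS:
        value = remapped.get(key, "")
        if value.startswith(old_prefix):
            remapped[key] = new_prefix + value[len(old_prefix):]
    return remapped


def format_config(settings):
    """Serialize settings back to 'key = value' lines."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


if __name__ == "__main__":
    example = """# Standard input files
refGenome = /home/test/data/Homo_sapiens.GRCh38.ch1.fa
threads = 1
"""
    # "/path/to/data" is a hypothetical target directory.
    print(format_config(remap_paths(parse_config(example),
                                    "/home/test/data", "/path/to/data")))
```

Non-path settings such as `threads` pass through unchanged, so the rewritten file keeps the same analysis parameters.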

4b) starting analysis

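# Start from an empty output directory
rm -rf /apps/RNAEditor/output
mkdir /apps/RNAEditor/output
cd /apps/RNAEditor
# -i: input FASTQ file, -c: configuration file
RNAEditor.py -i /home/test/chr1.fq -c /home/test/config_new
# Move the results out of the application directory
mv /apps/RNAEditor/output/ /home/test/out_new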

5) Homework

6) References

  • A-to-I RNA editing — immune protector and transcriptome diversifier. Eli Eisenberg, et al. Nature Reviews, 2018.

  • RNAEditor: easy detection of RNA editing events and the introduction of editing islands. David John, et al. Briefings in Bioinformatics, 2017.

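After mapping reads back to the reference genome, for known RNA editing sites we can count how many of the reads covering each site are edited and how many are not. The samtools mpileup command combined with a few simple scripts is enough for this. ASEReadCounter, provided by GATK, can also do this easily.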
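One possible form of such a "simple script" is sketched below. It assumes the pileup was produced with a reference FASTA supplied to samtools mpileup, so that matches appear as `.` and `,` in the read-bases column, and it considers only plus-strand A-to-G sites unless told otherwise. The function names and the example line are made up for illustration.

```python
import re


def count_pileup_bases(ref_base, bases):
    """Count A/C/G/T calls in the read-bases column of one pileup line."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    ref_base = ref_base.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read-start marker, followed by one mapping-quality character.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if ch in ".,":
            # Match to the reference on the forward or reverse strand.
            if ref_base in counts:
                counts[ref_base] += 1
        elif ch.upper() in counts:
            counts[ch.upper()] += 1
        # '$', '*', '<', '>' and N are ignored.
        i += 1
    return counts


def editing_level(pileup_line, ref="A", alt="G"):
    """Return alt / (ref + alt) at a site, or None if not applicable."""
    fields = pileup_line.rstrip("\n").split("\t")
    ref_base, bases = fields[2], fields[4]
    if ref_base.upper() != ref:
        return None
    counts = count_pileup_bases(ref_base, bases)
    total = counts[ref] + counts[alt]
    return counts[alt] / total if total else None


if __name__ == "__main__":
    # Made-up pileup line: 5 reads support A, 3 support G.
    line = "chr1\t100\tA\t9\t..,G^Fg$,+2ATg-1aA*\tIIIIIIIII"
    print(editing_level(line))
```

For sites in minus-strand transcripts the same function can be called with `ref="T", alt="C"`.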

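Mapping results can also be used to discover RNA editing events de novo. RNAeditor, the tool introduced here, is designed for this kind of de novo discovery. The basic principle is outlined below.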

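The A-to-G conversion caused by RNA editing is a single-nucleotide variant (SNV) at the RNA level, so variant-calling tools such as HaplotypeCaller in GATK can in principle detect these editing events (GATK usage is covered in detail in the snv_rna-seq section).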

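Without matched DNA-seq data, a common approach for human data, where many SNPs are already known, is to remove known A-to-G SNPs from the A-to-G conversions called by GATK and treat the remainder as genuine editing events. RNAeditor adopts this strategy.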
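The filtering idea can be illustrated with a short sketch. This is not RNAeditor's actual implementation; it only assumes that known SNPs come as tab-separated VCF records (CHROM, POS, ID, REF, ALT in the first five columns) and that candidates are `(chrom, pos, ref, alt)` tuples using the same chromosome naming. All records in the example are made up.

```python
def load_known_snps(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF data lines."""
    known = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A record may list several alternative alleles separated by commas.
        for allele in alt.split(","):
            known.add((chrom, int(pos), ref.upper(), allele.upper()))
    return known


def filter_known(candidates, known):
    """Keep only candidates that are not known SNPs."""
    return [c for c in candidates if c not in known]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\trsX\tA\tG\t.\t.\t.",
        "chr1\t2000\trsY\tA\tC,G\t.\t.\t.",
    ]
    candidates = [("chr1", 1000, "A", "G"),
                  ("chr1", 1500, "A", "G"),
                  ("chr1", 2000, "A", "G")]
    print(filter_known(candidates, load_known_snps(vcf)))
```

Sites that survive the filter are still only candidates: rare or private SNPs absent from the databases would pass as well.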

Before running RNAEditor, first start the corresponding Docker container and enter the working directory.

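As homework, follow the Documentation page on the RNAEditor website to work out what chr1.editingSites.vcf and chr1.editingSites.gvf in the example output mean. Using chr1.editingSites.gvf, tally how the RNA editing sites are distributed across the genome (how many sites fall in the 3'UTR, introns, and other regions), and present the result as a bar chart or a table.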

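Interested students can also look at rMATS-DVR. It was developed by the same lab as rMATS, the alternative-splicing tool introduced earlier, and uses the same statistical test to assess the significance of between-group differences.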
