Bioinformatics Tutorial

6.3.Chimeric RNA


This chapter describes how to identify candidate chimeric RNAs from RNA-seq data.

1) Background

A chimeric RNA, sometimes called a fusion transcript, is composed of exons from two or more different genes and has the potential to encode a novel protein. Chimeric RNAs differ from the products of conventional splicing because they derive from two or more gene loci rather than one.

Chimeric RNAs can arise in two ways: 1) fusion of two DNA segments (gene fusion); 2) splicing together of two separate RNA molecules (trans-splicing).

2) Software

2a) Install STAR-Fusion

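# Load the Docker image from the local archive
docker load -i ~/Desktop/bioinfo_chimeric.tar.gz

# Move the STAR index into the CTAT genome library directory
mv ~/Downloads/ref_genome.fa.star.idx ~/Downloads/ctat_genome_lib_build_X_docker

# Start a detached container with the library directory mounted at /data
docker run -dt -v ~/Downloads/ctat_genome_lib_build_X_docker:/data --name=bioinfo_starfusion gangxu/starfusion:latest

# Open a shell inside the running container
docker exec -it bioinfo_starfusion bash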

The files to mount, ctat_genome_lib_build_X_docker.zip and ref_genome.fa.star.idx.zip, must be downloaded from Tsinghua Cloud and unzipped before running the commands above.
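
Before starting the container, it can help to confirm that the directory to be mounted at /data has the layout the later commands expect. The sketch below is a hypothetical helper (check_ctat_lib is not part of STAR-Fusion or Docker). It only checks for the ref_genome.fa.star.idx subdirectory that the mv command puts in place, and makes no claim about the other files the library should contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of STAR-Fusion or Docker.
# Checks that the library directory exists and contains the STAR index
# subdirectory (ref_genome.fa.star.idx) moved into it during setup.
check_ctat_lib() {
    local lib_dir="$1"
    if [ ! -d "$lib_dir" ]; then
        echo "missing library directory: $lib_dir" >&2
        return 1
    fi
    if [ ! -d "$lib_dir/ref_genome.fa.star.idx" ]; then
        echo "missing STAR index: $lib_dir/ref_genome.fa.star.idx" >&2
        return 1
    fi
    echo "ok: $lib_dir"
}

# Example (path taken from the setup commands in this section):
# check_ctat_lib ~/Downloads/ctat_genome_lib_build_X_docker
```

The function returns a non-zero status on failure, so it can guard the docker run step in a script.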

3) Running STAR-Fusion

STAR-Fusion can detect fusion genes directly from FASTQ files, or it can take the Chimeric.out.junction file produced by STAR as input.

The two methods are described in turn below.
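
The choice between the two input types can be sketched as a small shell function. This is an illustration only: starfusion_input_args is a hypothetical helper, and it assumes the file naming used in this chapter (<sample>.Chimeric.out.junction, <sample>_1.fastq.gz, <sample>_2.fastq.gz in one directory). The -J, --left_fq and --right_fq options are the ones used in the commands of this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the STAR-Fusion input arguments for one sample.
# Assumes the naming convention used in this chapter (an assumption):
#   <dir>/<sample>.Chimeric.out.junction
#   <dir>/<sample>_1.fastq.gz and <dir>/<sample>_2.fastq.gz
starfusion_input_args() {
    local data_dir="$1" sample="$2"
    local junction="$data_dir/${sample}.Chimeric.out.junction"
    if [ -s "$junction" ]; then
        # Method 1: reuse an existing, non-empty STAR junction file.
        echo "-J $junction"
    else
        # Method 2: let STAR-Fusion run STAR itself from the FASTQ files.
        echo "--left_fq $data_dir/${sample}_1.fastq.gz --right_fq $data_dir/${sample}_2.fastq.gz"
    fi
}

# Example:
# starfusion_input_args /data SRR5712523
```

Preferring the junction file when it exists avoids repeating the memory-intensive alignment step.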

3a) Method 1. Input junction file

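  • Use STAR to align the FASTQ reads to the reference genome and write the Chimeric.out.junction file:

    This step needs a large amount of memory. We recommend skipping it on a personal machine; it can be run on a cluster instead.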

# Align paired-end reads; chimeric junctions go to /data/SRR5712523.Chimeric.out.junction
# (the STAR binary sits inside bin/Linux_x86_64/ in the STAR release layout)
echo "STAR start $(date)"
/usr/local/src/STAR-2.7.2b/bin/Linux_x86_64/STAR \
 --runThreadN 2 \
 --genomeDir /data/ref_genome.fa.star.idx \
 --readFilesIn /data/SRR5712523_1.fastq.gz /data/SRR5712523_2.fastq.gz \
 --outFileNamePrefix /data/SRR5712523. \
 --outReadsUnmapped None \
 --readFilesCommand "gunzip -c" \
 --outSAMstrandField intronMotif \
 --outSAMunmapped Within \
 --chimSegmentMin 12 \
 --chimJunctionOverhangMin 12 \
 --chimOutJunctionFormat 1 \
 --alignSJDBoverhangMin 10 \
 --alignMatesGapMax 100000 \
 --alignIntronMax 100000 \
 --alignSJstitchMismatchNmax 5 -1 5 5 \
 --outSAMattrRGline ID:SRR5712523 \
 --chimMultimapScoreRange 3 \
 --chimScoreJunctionNonGTAG -4 \
 --chimMultimapNmax 20 \
 --chimNonchimScoreDropMin 10 \
 --peOverlapNbasesMin 12 \
 --peOverlapMMp 0.1

echo "STAR end $(date)"
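
After STAR finishes, a quick sanity check is to count the chimeric records in the junction file before passing it to STAR-Fusion. The sketch below uses a hypothetical helper, count_chimeric_records. It assumes that lines starting with # are comments and that a header line, if present, has chr_donorA as its first field; both are assumptions about the file layout that should be checked against the STAR version in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data records in a Chimeric.out.junction file.
# Assumptions (verify for your STAR version):
#   - lines starting with '#' are comments
#   - an optional header line has "chr_donorA" as its first field
count_chimeric_records() {
    local junction_file="$1"
    awk '!/^#/ && $1 != "chr_donorA" && NF > 0 { n++ } END { print n + 0 }' "$junction_file"
}

# Example:
# count_chimeric_records /data/SRR5712523.Chimeric.out.junction
```

A count of zero suggests that chimeric detection was not enabled or that the alignment did not complete.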

  • Run STAR-Fusion with Chimeric.out.junction as the input file to detect fusion genes:

/usr/local/src/STAR-Fusion/STAR-Fusion --CPU 2 \
--genome_lib_dir /data \
-J /data/SRR5712523.Chimeric.out.junction \
--output_dir /data/SRR5712523_fusion_X_docker
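
Once STAR-Fusion finishes, its predictions can be screened by read support. The sketch below is a hypothetical helper (filter_fusions) that assumes a tab-separated predictions table whose header row contains the columns #FusionName and JunctionReadCount. STAR-Fusion typically writes such a table as star-fusion.fusion_predictions.abridged.tsv in the output directory, but the file name and column names are assumptions here and should be verified against your version. Columns are located by name rather than position, so the sketch does not depend on column order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list fusions supported by at least MIN junction reads.
# Assumes a tab-separated file whose header row contains the columns
# "#FusionName" and "JunctionReadCount" (assumed names; check your output).
filter_fusions() {
    local tsv="$1" min_reads="$2"
    awk -F'\t' -v min="$min_reads" '
        NR == 1 {
            # Locate the required columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "#FusionName") name = i
                if ($i == "JunctionReadCount") jrc = i
            }
            if (!name || !jrc) {
                print "required columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $jrc + 0 >= min { print $name "\t" $jrc }
    ' "$tsv"
}

# Example (output file name is an assumption):
# filter_fusions /data/SRR5712523_fusion_X_docker/star-fusion.fusion_predictions.abridged.tsv 5
```

The threshold is a user choice; the sketch does not suggest a recommended value.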

3b) Method 2. Input fastq file

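STAR uses a large amount of memory (RAM) while running, roughly 20-30 GB, and adding the --FusionInspector validate option to STAR-Fusion can push total usage to about 40 GB. When running STAR-Fusion from FASTQ files, therefore, limit the number of STAR-Fusion jobs that run in parallel.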

# Run STAR-Fusion directly from paired-end FASTQ files (it runs STAR internally)
/usr/local/src/STAR-Fusion/STAR-Fusion \
    --left_fq /data/SRR5712523_1.fastq.gz \
    --right_fq /data/SRR5712523_2.fastq.gz \
    --genome_lib_dir /data/ \
    --output_dir /data/StarFusionOut
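
The memory constraint for Method 2 can be turned into a rough upper bound on concurrent jobs by dividing total RAM by the expected per-job usage. The sketch below is illustrative only: max_parallel_jobs is a hypothetical helper, and the per-job figure must come from your own measurements or from the estimates given in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many STAR-Fusion jobs fit in memory at once.
# Arguments are whole numbers of GB; integer division rounds down.
max_parallel_jobs() {
    local total_gb="$1" per_job_gb="$2"
    if [ "$per_job_gb" -le 0 ]; then
        echo "per-job memory must be positive" >&2
        return 1
    fi
    echo $(( total_gb / per_job_gb ))
}

# On Linux, total RAM in GB can be read from /proc/meminfo (MemTotal is in kB):
# total_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
# max_parallel_jobs "$total_gb" 40
```

For example, a machine with 128 GB of RAM and an estimated 40 GB per job gives 3 concurrent jobs, and a 32 GB machine gives 0, meaning the job should be sent to a larger machine or cluster.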

4) Utility

This example uses STAR-Fusion, a tool that detects human fusion genes from RNA-Seq data. STAR-Fusion is also distributed as a Docker image for convenience.

Download reference files for STAR-Fusion

To search for chimeric RNAs, we also need the reference genome and annotation files that STAR-Fusion requires. Download the "plug-n-play" archive from the Broad Institute data site:

https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/

After downloading, rename the archive to CTAT_resource_lib.tar.gz and extract it.

If you install STAR-Fusion yourself instead of using the Docker image, see the STAR-Fusion installation guide.

The STAR-Fusion GitHub home page documents the software's usage in detail.

Brian J. Haas, et al. STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. bioRxiv, 2017.

Other tools for fusion gene analysis include Prada, FusionCatcher, SoapFuse, TophatFusion, and DISCASM/GMAP-Fusion.
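
The rename-and-extract step for the reference archive described in this section can be sketched as follows. unpack_ctat_lib is a hypothetical helper. It takes the path of an archive that has already been downloaded, because the exact archive name on the Broad Institute site depends on the release you choose and is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename a downloaded plug-n-play archive to
# CTAT_resource_lib.tar.gz inside DEST_DIR and extract it there.
unpack_ctat_lib() {
    local downloaded="$1" dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    mv "$downloaded" "$dest_dir/CTAT_resource_lib.tar.gz" || return 1
    tar -xzf "$dest_dir/CTAT_resource_lib.tar.gz" -C "$dest_dir"
}

# Example (the downloaded file name is a placeholder, not a real release name):
# unpack_ctat_lib ~/Downloads/your_plug-n-play_archive.tar.gz ~/Downloads/ctat
```

The extracted directory is the one to pass to STAR-Fusion with --genome_lib_dir when it is not run from the prepared Docker setup.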