Bioinformatics Tutorial
6.2.APA (Alternative Polyadenylation)


Last updated 3 years ago


1) Background

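Alternative polyadenylation (APA) means that different sites can be chosen when the poly(A) tail is added to an mRNA. This produces different isoforms, each with a different 3' UTR sequence. APA is a widespread mechanism that regulates mRNA diversity, stability, and translation.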

2) Workflow

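  • Several sequencing methods have been developed specifically for studying APA (for example, PAS-seq sequences the junction between the genome-encoded part of a transcript and its poly(A) tail), but some APA analyses can also be performed on conventional RNA-seq data.

  • DaPars, the tool introduced here, performs APA analysis starting from conventional RNA-seq data.

  • DaPars assumes that each transcript has one proximal poly(A) site and one distal poly(A) site, which give rise to a short and a long isoform.

  • DaPars assumes that the long isoform ends at the annotated transcript end. It then infers the APA site from the pattern of read coverage across the 3' UTR and uses it to estimate the relative proportions of the long and short isoforms.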
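The coverage-based idea in the list above can be illustrated with a toy sketch. This is not DaPars' actual implementation: it assumes a simplified two-level model in which coverage upstream of the proximal site is the sum of both isoforms and coverage downstream reflects only the long isoform. The function name `estimate_pdui` and the parameter `min_flank` are hypothetical. PDUI is taken here as long / (long + short).

```python
def estimate_pdui(coverage, min_flank=2):
    """Toy two-segment fit over per-base 3' UTR coverage (5' -> 3' order).

    Assumption (simplified, not DaPars' exact model):
      upstream of the breakpoint   -> long + short isoform coverage
      downstream of the breakpoint -> long isoform coverage only
    Returns (breakpoint_index, long_exp, short_exp, pdui).
    """
    n = len(coverage)
    if n < 2 * min_flank:
        raise ValueError("coverage vector is too short for the chosen min_flank")

    best = None
    for b in range(min_flank, n - min_flank + 1):
        up, down = coverage[:b], coverage[b:]
        up_mean = sum(up) / len(up)
        down_mean = sum(down) / len(down)
        # Squared deviation from a flat level on each side of the breakpoint.
        resid = (sum((x - up_mean) ** 2 for x in up)
                 + sum((x - down_mean) ** 2 for x in down))
        if best is None or resid < best[0]:
            best = (resid, b, up_mean, down_mean)

    _, b, up_mean, down_mean = best
    long_exp = down_mean
    short_exp = max(up_mean - down_mean, 0.0)  # clamp: abundance cannot be negative
    total = long_exp + short_exp
    pdui = long_exp / total if total > 0 else float("nan")
    return b, long_exp, short_exp, pdui


if __name__ == "__main__":
    # Synthetic coverage: 5 bases at depth 30, then 5 bases at depth 10.
    print(estimate_pdui([30] * 5 + [10] * 5))
```

A PDUI near 1 would mean that most transcripts use the distal site (long 3' UTR), and a value near 0 that most use the proximal site.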

3) Running steps (DaPars)

cd /home/test/rna_regulation/apa

3a) Generate region annotation

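In this step, the script DaPars_Extract_Anno.py extracts the 3' UTRs from the bed file supplied by the user and treats the annotated transcript end as the distal poly(A) site. This can be done with a single command: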

/home/test/software/dapars-0.9.1/src/DaPars_Extract_Anno.py -b hg19_refseq_whole_gene.bed -s hg19_4_19_2012_Refseq_id_from_UCSC.txt -o hg19_refseq_extracted_3UTR.bed
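  • As in a plain bed file, each line of a bed12 file describes one genomic interval. The difference is that columns 10-12 (blockCount, blockSizes, blockStarts) also annotate a set of non-overlapping sub-regions within that interval. This layout is well suited to describing which genomic exons are spliced together to form a transcript.

  • In this example, each line of hg19_refseq_whole_gene.bed corresponds to one transcript, and the information it carries is very similar to that of a conventional gtf/gff annotation file.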
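To make the block columns concrete, the following is a minimal sketch that converts one bed12 line into absolute exon coordinates. It assumes the standard 12-column UCSC layout (itemRgb in column 9, blockCount in column 10); if a file omits a column, the indices would need adjusting. The function name `bed12_exons` and the example record are hypothetical.

```python
def bed12_exons(line):
    """Return (name, strand, exons) for one BED12 line.

    exons is a list of (start, end) pairs in 0-based, half-open genomic
    coordinates, derived from blockSizes and blockStarts.
    """
    fields = line.split()
    if len(fields) < 12:
        raise ValueError("expected 12 whitespace-separated columns")

    chrom_start = int(fields[1])
    name, strand = fields[3], fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].strip(",").split(",")]
    starts = [int(x) for x in fields[11].strip(",").split(",")]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")

    # blockStarts are offsets relative to chromStart.
    exons = [(chrom_start + s, chrom_start + s + size)
             for s, size in zip(starts, sizes)]
    return name, strand, exons


if __name__ == "__main__":
    # Hypothetical two-exon transcript, not taken from the tutorial data.
    record = "chrT\t100\t200\tTX1\t0\t+\t110\t190\t0\t2\t20,30,\t0,70,"
    print(bed12_exons(record))
```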

input

hg19_refseq_whole_gene.bed (bed12 format)

   chr1    66999824    67210768    NM_032291    0    +    67000041    67208778    25    227,64,25,72,57,55,176,12,12,25,52,86,93,75,501,128,127,60,112,156,133,203,65,165,2013,    0,91705,98928,101802,105635,108668,109402,126371,133388,136853,137802,139139,142862,145536,147727,155006,156048,161292,185152,195122,199606,205193,206516,207130,208931,
   chr1    33546713    33585995    NM_052998    0    +    33547850    33585783    12    182,121,212,177,174,173,135,166,163,113,215,351,    0,275,488,1065,2841,10937,12169,13435,15594,16954,36789,38931,
   chr1    16767166    16786584    NM_001145278    0    +    16767256    16785385    8    104,101,105,82,109,178,76,1248,    0,2960,7198,7388,8421,11166,15146,18170,

hg19_4_19_2012_Refseq_id_from_UCSC.txt

   #name    name2
   NM_032291    SGIP1
   NM_052998    ADC

output

hg19_refseq_extracted_3UTR.bed

   chr14    50792327    50792946    NM_001003805|ATP5S|chr14|+    0    +
   chr9    95473645    95477745    NM_001003800|BICD2|chr9|-    0    -
   chr11    92623657    92629635    NM_001008781|FAT3|chr11|+    0    +
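The name column of the extracted file packs several values into one field separated by `|`. A small sketch for unpacking one record follows; it assumes the field order transcript|gene|chromosome|strand, as the sample rows suggest. The function name `parse_utr_record` is hypothetical.

```python
def parse_utr_record(line):
    """Split one line of the extracted 3' UTR bed file into a dict.

    Assumes the 4th column looks like 'NM_001003805|ATP5S|chr14|+'.
    """
    chrom, start, end, name, _score, strand = line.split()[:6]
    transcript, gene, _name_chrom, _name_strand = name.split("|")
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "transcript": transcript,
        "gene": gene,
        "strand": strand,
        "utr_length": int(end) - int(start),  # BED intervals are half-open
    }


if __name__ == "__main__":
    row = "chr14\t50792327\t50792946\tNM_001003805|ATP5S|chr14|+\t0\t+"
    print(parse_utr_record(row))
```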

3b) Main function to get final result

Start the analysis:

/home/test/software/dapars-0.9.1/src/DaPars_main.py configure_file

DaPars requires a configuration file that specifies the input files, the output location, and the parameter settings.

input

configure_file

The format of the configure file is:

#The following file is the result of step 1.

Annotated_3UTR=hg19_refseq_extracted_3UTR.bed

#A comma-separated list of BedGraph files of samples from condition 1

Group1_Tophat_aligned_Wig=Condition_A_chrX.wig
#Group1_Tophat_aligned_Wig=Condition_A_chrX_r1.wig,Condition_A_chrX_r2.wig if multiple files in one group

#A comma-separated list of BedGraph files of samples from condition 2

Group2_Tophat_aligned_Wig=Condition_B_chrX.wig

Output_directory=DaPars_Test_data/

Output_result_file=DaPars_Test_data

#At least how many samples passing the coverage threshold in two conditions
Num_least_in_group1=1

Num_least_in_group2=1

Coverage_cutoff=30

#Cutoff for FDR of P-values from Fisher exact test.

FDR_cutoff=0.05


PDUI_cutoff=0.5

Fold_change_cutoff=0.59
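Before launching a run, it can help to check that the configure file parses as intended. The sketch below reads the key=value layout shown above, skipping blank lines and `#` comments. It is an independent illustration, not the parser that DaPars itself uses, and the function names `parse_dapars_config` and `split_file_list` are hypothetical.

```python
def parse_dapars_config(text):
    """Parse 'key=value' lines into a dict, ignoring blanks and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key=value line: %r" % raw)
        config[key.strip()] = value.strip()
    return config


def split_file_list(value):
    """Split a comma-separated list of file names, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    example = """
#The following file is the result of step 1.
Annotated_3UTR=hg19_refseq_extracted_3UTR.bed
Group1_Tophat_aligned_Wig=Condition_A_chrX.wig
Group2_Tophat_aligned_Wig=Condition_B_chrX.wig
Coverage_cutoff=30
FDR_cutoff=0.05
"""
    cfg = parse_dapars_config(example)
    print(split_file_list(cfg["Group1_Tophat_aligned_Wig"]))
    print(float(cfg["FDR_cutoff"]))
```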

output

3c) Filter diff-APA events

FDR_cutoff, PDUI_cutoff, Fold_change_cutoff → Pass_filter (Y or N)
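The way the three cutoffs might combine into a single Y/N decision is sketched below. Two points are assumptions rather than confirmed DaPars behaviour: the PDUI difference is taken as the absolute difference between the two group means, and the fold change as the absolute log2 ratio of those means (0.59 is approximately log2(1.5)). The function name `passes_filter` is hypothetical.

```python
import math


def passes_filter(adj_p, pdui_a, pdui_b,
                  fdr_cutoff=0.05, pdui_cutoff=0.5, fold_change_cutoff=0.59):
    """Return True if an event meets all three cutoffs.

    adj_p  : adjusted P-value of the event
    pdui_a : mean PDUI in condition 1
    pdui_b : mean PDUI in condition 2
    Assumes |pdui_a - pdui_b| and |log2(pdui_a / pdui_b)| as the two effect sizes.
    """
    if pdui_a <= 0 or pdui_b <= 0:
        return False  # log2 ratio is undefined for non-positive values
    diff = abs(pdui_a - pdui_b)
    log2_fc = abs(math.log2(pdui_a / pdui_b))
    return (adj_p <= fdr_cutoff
            and diff >= pdui_cutoff
            and log2_fc >= fold_change_cutoff)


if __name__ == "__main__":
    print(passes_filter(0.01, 0.9, 0.3))  # strong, significant shift
    print(passes_filter(0.20, 0.9, 0.3))  # same shift, not significant
```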

4) Homework

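Run the example files and work out the meaning of each column in the output file "DaPars_Test_data_All_Prediction_Results.txt". (1) Explain what PDUI means. (2) Write a script that selects the events with adjusted.P_val<=0.05, PDUI_Group_diff>=0.5, and PDUI_fold_change>=0.59 as diff-APA events, and compare them with the diff-APA events for which Pass_filter is "Y".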

5) Tips

If you use singularity, you need to install scipy and singledispatch first. The commands are:

source /WORK/Samples/singularity.sh
singularity run /data/images/bioinfo_tsinghua_6.2_apa_6.3_ribo_6.4_structure.simg

pip2 install scipy
pip2 install singledispatch

Then run the software with the following commands:

cp -r /home/test/rna_regulation/apa apa
cd apa

/home/test/software/dapars-0.9.1/src/DaPars_Extract_Anno.py -b hg19_refseq_whole_gene.bed -s hg19_4_19_2012_Refseq_id_from_UCSC.txt -o hg19_refseq_extracted_3UTR.bed

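Start the Docker container for 6.2 APA, 6.3 Ribo-seq, and 6.4 Structure-seq, then enter the working directory.

Note that the bed file used here differs from the bed files mentioned earlier; strictly speaking, it is a bed12 file. See the explanation at:

http://genome.ucsc.edu/FAQ/FAQformat#format1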