# 6.2.APA (Alternative Polyadenylation)

## 1) Background

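Alternative polyadenylation (APA) refers to the use of different poly(A) sites when an mRNA is polyadenylated. This produces different isoforms whose 3' UTR sequences differ. APA is a widespread mechanism that regulates mRNA diversity, stability, and translation.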

![](/files/-LfOLvVhz38N_xsgLLO6)

## 2) Workflow

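* Several sequencing methods have been developed specifically for studying APA (for example, PAS-seq sequences the junction between the genome-encoded part of a transcript and its poly(A) tail), but some APA analyses can also be done with conventional RNA-seq data.
* DaPars, the tool introduced here, performs APA analysis starting from conventional RNA-seq data.
* DaPars assumes that each transcript has one proximal poly(A) site and one distal poly(A) site, which give rise to a short and a long isoform.
* DaPars assumes that the long isoform ends at the annotated transcript end. It then infers the APA site from the pattern of read coverage along the 3' UTR and estimates the relative proportions of the long and short isoforms.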

![](/files/-LfOLvVffnuyIS0k--Ns)
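
The last step of the two-isoform model can be illustrated numerically. The snippet below is a minimal sketch, not DaPars code: the function name `long_isoform_fraction` and all expression values are made up for the example. It only shows how a relative proportion follows once abundances of the long and short isoforms have been estimated.

```python
def long_isoform_fraction(long_exp, short_exp):
    """Return the share of the long (distal-site) isoform among both isoforms."""
    total = long_exp + short_exp
    if total <= 0:
        raise ValueError("total expression must be positive")
    return long_exp / total


# Hypothetical abundance estimates for one gene in two conditions.
cond_a = long_isoform_fraction(long_exp=30.0, short_exp=10.0)
cond_b = long_isoform_fraction(long_exp=10.0, short_exp=30.0)

# A large difference between conditions suggests a shift in poly(A) site usage.
print(cond_a, cond_b, cond_a - cond_b)
```

In this toy case the long isoform dominates in the first condition and the short isoform dominates in the second, which is the kind of shift a differential APA analysis looks for.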

## 3) Running steps (DaPars)

Start the [Docker](/teaching/part-iii.-ngs-data-analyses/6.rna-regulation.md#files) container for 6.2 APA, 6.3 Ribo-seq, and 6.4 Structure-seq, then change to the working directory:

```bash
cd /home/test/rna_regulation/apa
```

### 3a) Generate region annotation

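In this step, the `DaPars_Extract_Anno.py` script extracts the 3' UTRs from the bed file supplied by the user and treats each annotated transcript end as the distal poly(A) site. A single command does this: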

```bash
/home/test/software/dapars-0.9.1/src/DaPars_Extract_Anno.py -b hg19_refseq_whole_gene.bed -s hg19_4_19_2012_Refseq_id_from_UCSC.txt -o hg19_refseq_extracted_3UTR.bed
```

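* Note that this bed file differs from the bed files mentioned earlier; strictly speaking, it is a bed12 file. See the explanation at <http://genome.ucsc.edu/FAQ/FAQformat#format1>.
* As in a plain bed file, each line of a bed12 file corresponds to one genomic interval. What is special is that columns 10-12 also annotate a set of non-overlapping sub-regions within that interval. This layout is well suited to describing which genomic exons are spliced together to form a transcript.
* In this example, each line of `hg19_refseq_whole_gene.bed` corresponds to one transcript, and the file carries much the same information as a conventional gtf/gff annotation file.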
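
To make the role of columns 10-12 concrete, here is a small Python 3 sketch that turns one bed12 line into absolute exon coordinates. It is an illustration, not part of DaPars, and the function name `bed12_exons` is hypothetical. The coordinates come from the NM_052998 row of the input sample on this page; the `0` in column 9 (itemRgb) is an assumption added so that the line has the 12 columns described in the UCSC format page.

```python
def bed12_exons(line):
    """Return (transcript_name, [(exon_start, exon_end), ...]) for one bed12 line."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError("expected 12 columns, got %d" % len(fields))
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("block lists do not match blockCount")
    # blockStarts are offsets relative to chromStart.
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return fields[3], exons


# Column 9 (itemRgb = 0) is assumed here; see the note above.
line = ("chr1 33546713 33585995 NM_052998 0 + 33547850 33585783 0 12 "
        "182,121,212,177,174,173,135,166,163,113,215,351, "
        "0,275,488,1065,2841,10937,12169,13435,15594,16954,36789,38931,")
name, exons = bed12_exons(line)
print(name, len(exons), exons[0], exons[-1])
```

The last exon ends exactly at the interval end (33585995), which is a quick consistency check for any bed12 record.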

#### input

hg19\_refseq\_whole\_gene.bed (bed12 format)

```
   chr1    66999824    67210768    NM_032291    0    +    67000041    67208778    25    227,64,25,72,57,55,176,12,12,25,52,86,93,75,501,128,127,60,112,156,133,203,65,165,2013,    0,91705,98928,101802,105635,108668,109402,126371,133388,136853,137802,139139,142862,145536,147727,155006,156048,161292,185152,195122,199606,205193,206516,207130,208931,
   chr1    33546713    33585995    NM_052998    0    +    33547850    33585783    12    182,121,212,177,174,173,135,166,163,113,215,351,    0,275,488,1065,2841,10937,12169,13435,15594,16954,36789,38931,
   chr1    16767166    16786584    NM_001145278    0    +    16767256    16785385    8    104,101,105,82,109,178,76,1248,    0,2960,7198,7388,8421,11166,15146,18170,
```

hg19\_4\_19\_2012\_Refseq\_id\_from\_UCSC.txt

```
   #name    name2
   NM_032291    SGIP1
   NM_052998    ADC
```

#### output

hg19\_refseq\_extracted\_3UTR.bed

```
   chr14    50792327    50792946    NM_001003805|ATP5S|chr14|+    0    +
   chr9    95473645    95477745    NM_001003800|BICD2|chr9|-    0    -
   chr11    92623657    92629635    NM_001008781|FAT3|chr11|+    0    +
```
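
Each output line packs four values into the name column, separated by `|`. The Python 3 sketch below unpacks one record and computes the length of the annotated 3' UTR. It is only an illustration of the file layout shown above; `Utr` and `parse_utr_line` are hypothetical names, not DaPars functions.

```python
from collections import namedtuple

Utr = namedtuple("Utr", "transcript gene chrom strand start end")


def parse_utr_line(line):
    """Split one line of the extracted 3' UTR bed file into its parts."""
    fields = line.split()
    # Name column layout as seen in the sample: transcript|gene|chrom|strand
    transcript, gene, chrom, strand = fields[3].split("|")
    return Utr(transcript, gene, chrom, strand, int(fields[1]), int(fields[2]))


utr = parse_utr_line("chr14 50792327 50792946 NM_001003805|ATP5S|chr14|+ 0 +")
print(utr.gene, utr.strand, utr.end - utr.start)
```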

### 3b) Main function to get final result

#### starting analysis

```bash
/home/test/software/dapars-0.9.1/src/DaPars_main.py configure_file
```

DaPars requires a configuration file that specifies the input files, the output location, and the parameter settings.

#### input

configure\_file

The format of the configure file is:

```
#The following file is the result of step 1.

Annotated_3UTR=hg19_refseq_extracted_3UTR.bed

#A comma-separated list of BedGraph files of samples from condition 1

Group1_Tophat_aligned_Wig=Condition_A_chrX.wig
#Group1_Tophat_aligned_Wig=Condition_A_chrX_r1.wig,Condition_A_chrX_r2.wig if multiple files in one group

#A comma-separated list of BedGraph files of samples from condition 2

Group2_Tophat_aligned_Wig=Condition_B_chrX.wig

Output_directory=DaPars_Test_data/

Output_result_file=DaPars_Test_data

#At least how many samples passing the coverage threshold in two conditions
Num_least_in_group1=1

Num_least_in_group2=1

Coverage_cutoff=30

#Cutoff for FDR of P-values from Fisher exact test.

FDR_cutoff=0.05


PDUI_cutoff=0.5

Fold_change_cutoff=0.59
```
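
Because the configure file is plain `key=value` text with `#` comments, it is easy to check before a run. The Python 3 sketch below is a hypothetical helper (`read_config` is not part of DaPars, and this is not necessarily how DaPars reads the file). It parses an abbreviated copy of the example above into a dictionary so that file names and cutoffs can be inspected.

```python
def read_config(text):
    """Parse 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Abbreviated copy of the configure file shown above.
example = """\
#Annotation produced in step 3a
Annotated_3UTR=hg19_refseq_extracted_3UTR.bed
Group1_Tophat_aligned_Wig=Condition_A_chrX.wig
Group2_Tophat_aligned_Wig=Condition_B_chrX.wig
Coverage_cutoff=30
FDR_cutoff=0.05
"""
cfg = read_config(example)
# A group may list several comma-separated files.
group1 = cfg["Group1_Tophat_aligned_Wig"].split(",")
print(group1, float(cfg["FDR_cutoff"]))
```

A check like this catches a misspelled key or a missing coverage file before the main script is started.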

#### output

![](/files/-LfOLvVpvsAJxqtfDU0i)

### 3c) Filter diff-APA events

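An event that meets the FDR\_cutoff, PDUI\_cutoff, and Fold\_change\_cutoff thresholds set in the configure file is marked as passing the filter (Y); otherwise it is marked N.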
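
The three-threshold rule can be written as a small Python 3 function. This is a sketch under stated assumptions, not the DaPars source: the name `passes_filter` is hypothetical, the defaults are the cutoffs from the example configure file, the fold change is assumed to be on a log2 scale (0.59 is roughly log2 of 1.5), and absolute values are assumed so that shifts in either direction count.

```python
def passes_filter(adjusted_p, pdui_diff, log2_fold_change,
                  fdr_cutoff=0.05, pdui_cutoff=0.5, fold_change_cutoff=0.59):
    """Return 'Y' if all three thresholds are met, otherwise 'N'.

    Assumptions: log2-scale fold change and absolute values for both
    PDUI measures, so lengthening and shortening are treated alike.
    """
    ok = (adjusted_p <= fdr_cutoff
          and abs(pdui_diff) >= pdui_cutoff
          and abs(log2_fold_change) >= fold_change_cutoff)
    return "Y" if ok else "N"


# Made-up values for four events.
print(passes_filter(0.01, 0.62, 1.8))    # all thresholds met
print(passes_filter(0.01, -0.62, -1.8))  # same shift, opposite direction
print(passes_filter(0.20, 0.62, 1.8))    # fails the FDR cutoff
print(passes_filter(0.01, 0.30, 1.8))    # fails the PDUI difference cutoff
```

Whether absolute values are used matters: without them, events that shift toward the proximal site in the second group would be dropped.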

## 4) Homework

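Run the example files and work out the meaning of each column in the output file "DaPars\_Test\_data\_All\_Prediction\_Results.txt". (1) Explain what PDUI means. (2) Write a script that keeps events with adjusted.P\_val<=0.05, PDUI\_Group\_diff>=0.5, and PDUI\_fold\_change>=0.59 as diff-APA events, and compare them with the diff-APA events selected by Pass\_filter = "Y".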

## 5) Tips

If you use singularity, you need to install scipy and singledispatch first:

```bash
source /WORK/Samples/singularity.sh
singularity run /data/images/bioinfo_tsinghua_6.2_apa_6.3_ribo_6.4_structure.simg

pip2 install scipy
pip2 install singledispatch
```

Then run the software:

```bash
cp -r /home/test/rna_regulation/apa apa
cd apa

/home/test/software/dapars-0.9.1/src/DaPars_Extract_Anno.py -b hg19_refseq_whole_gene.bed -s hg19_4_19_2012_Refseq_id_from_UCSC.txt -o hg19_refseq_extracted_3UTR.bed
```

