3.1.GO
Download GO_gene.txt as described in the file access instructions. The file contains the Ensembl IDs of a set of human genes:
| File format | Information contained in file | File description | Notes |
| --- | --- | --- | --- |
| txt | Ensembl gene IDs | The file contains the Ensembl ID of each gene | - |
| File format | Information contained in file | File description | Notes |
| --- | --- | --- | --- |
| txt | Output information | The gene ontology of each gene | - |
| | Reference list | User upload |
| --- | --- | --- |
| Mapped IDs | 21042 out of 21042 | 50 out of 50 |
| Unmapped IDs | 0 | 1 |
| Multiple mapping information | 0 | 0 |
We only display results with False Discovery Rate (FDR) < 0.05.
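| GO term | Reference list (21042) | User upload (50) | expected | Fold Enrichment | +/- | raw P value | FDR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DNA replication | 208 | 6 | 0.49 | 12.14 | + | 1.11E-05 | 1.25E-02 |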
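Comparing against the database shows that, of the 21042 genes in the database's reference genome, 208 are annotated to DNA replication, and of the 50 recognized genes uploaded by the user, 6 are annotated to DNA replication.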
expected: 0.4942 = 208 × 50 / 21042
Fold Enrichment: 12.14 = 6 / 0.4942
+/-: enrichment (over-representation) is indicated by "+"
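The arithmetic above can be reproduced with a short Python sketch. The variable and function names are illustrative and not part of any tool; the four counts are the ones from the DNA replication example.

```python
# Counts taken from the DNA replication example above.
reference_total = 21042    # mapped genes in the reference list
reference_in_term = 208    # reference genes annotated to DNA replication
upload_total = 50          # mapped genes in the uploaded list
upload_in_term = 6         # uploaded genes annotated to DNA replication


def expected_count(ref_in_term, ref_total, query_total):
    """Number of query genes expected in the term if there were no enrichment."""
    return ref_in_term * query_total / ref_total


def fold_enrichment(observed, expected):
    """Ratio of the observed count to the expected count."""
    return observed / expected


expected = expected_count(reference_in_term, reference_total, upload_total)
fold = fold_enrichment(upload_in_term, expected)
sign = "+" if fold > 1 else "-"

print(f"expected = {expected:.4f}")       # expected = 0.4942
print(f"fold enrichment = {fold:.2f}")    # fold enrichment = 12.14
print(f"+/- = {sign}")                    # +/- = +
```

A fold enrichment above 1 means the term appears in the uploaded list more often than expected by chance, which is why the sign is "+".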
The raw P value can be calculated from the following quantities:
N: the number of the organism's genes annotated with GO, or the number of genes in the user-provided background. Here N = 21042.
n: the number of genes in the query list that map to the background. Here n = 50.
K: the number of genes annotated to the GO term. Here K = 208.
k: the number of genes in the query list that map to the GO term. Here k = 6.
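The definitions of N, n, K and k match the standard hypergeometric over-representation test, so, assuming that is the intended formula, the raw P value is the probability of seeing at least k annotated genes when n genes are drawn without replacement from N:

$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

A minimal Python sketch of this calculation follows. The function name is hypothetical, and the code sums the upper tail directly, which is mathematically equivalent to the expression above. The web tool that produced the table may use a related test (for example Fisher's exact test), so the result should be of the same order of magnitude as the reported 1.11E-05 but need not match it exactly.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated to the GO term (assumed hypergeometric model)."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the exact integer counts first, then divide once to limit rounding error.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


p_value = hypergeom_upper_tail(N=21042, K=208, n=50, k=6)
print(f"raw P value (hypergeometric, assumed) = {p_value:.3g}")
```

Using exact integer binomial coefficients avoids the cancellation that can occur when a value very close to 1 is subtracted from 1.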
Several other web tools and R packages are also commonly used for enrichment analysis; interested readers can explore them on their own:
R package
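From wt.light.vs.dark.all.txt (the wild-type result obtained in the differential expression section), select the significantly up-regulated genes (FDR<0.05, logFC>1) and run a GO analysis on them.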
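In the example above, how are Fold Enrichment and the P value calculated? Write out the formulas and explain the underlying principle. In addition, when defining significantly enriched GO terms, why is an FDR usually computed and used as the criterion, rather than the P value itself?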
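For the first exercise, the gene selection step (FDR<0.05, logFC>1) might be sketched as below. This assumes wt.light.vs.dark.all.txt is a tab-separated table with the gene ID in the first column and columns named `logFC` and `FDR`; these column names, the function name and the toy data are assumptions, so adjust them to the actual file.

```python
import csv
import io


def select_upregulated(handle, fdr_cutoff=0.05, logfc_cutoff=1.0):
    """Return IDs of genes with FDR < fdr_cutoff and logFC > logfc_cutoff."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, records = rows[0], rows[1:]
    # R's write.table omits the header cell for the row-name column,
    # so the header may be one field shorter than the data rows.
    if records and len(header) == len(records[0]) - 1:
        header = ["gene_id"] + header
    fdr_col = header.index("FDR")
    logfc_col = header.index("logFC")
    selected = []
    for rec in records:
        try:
            fdr, logfc = float(rec[fdr_col]), float(rec[logfc_col])
        except ValueError:
            continue  # skip rows with missing values such as "NA"
        if fdr < fdr_cutoff and logfc > logfc_cutoff:
            selected.append(rec[0])
    return selected


# Toy input standing in for wt.light.vs.dark.all.txt (values are made up).
demo = io.StringIO(
    "logFC\tlogCPM\tPValue\tFDR\n"
    "geneA\t2.5\t5.0\t1e-6\t1e-4\n"
    "geneB\t0.4\t6.1\t1e-5\t1e-3\n"
    "geneC\t3.1\t4.2\t0.2\t0.6\n"
)
print(select_upregulated(demo))  # ['geneA']
```

With the real file, the same function can be called as `select_upregulated(open("wt.light.vs.dark.all.txt"))`, and the returned IDs can be pasted into the GO analysis tool.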