Bioinformatics Tutorial
4.2.Structure Motif

Last updated 3 years ago

1) workflow

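  • The input sequences for RNA structure motif discovery are selected in exactly the same way as for RNA sequence motif discovery.

  • RNA structure motif discovery has a narrower range of applications, and no single tool currently dominates the field the way the MEME suite does for sequence motifs. We therefore list several tools here.

  • This tutorial uses BEAM. For the other tools we list the corresponding papers and documentation; interested students can explore them on their own.

  • BEAM first predicts the RNA secondary structure with a structure prediction program (RNAfold, RNAstructure, etc.). It then defines, by its own set of rules, a new alphabet that incorporates structural information, and encodes each original four-nucleotide string as a string of the same length over this new alphabet. This turns structure motif discovery into a sequence motif discovery problem, for which many well-established algorithms exist.

  • The advantage of this approach is its simplicity, but it has a clear limitation: converting a two-dimensional structure into a one-dimensional sequence over a small alphabet using hand-defined rules always loses a lot of information.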
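To make the encoding idea concrete, here is a minimal sketch that maps a dot-bracket string to a string of the same length over a toy three-letter alphabet. It is not the BEAR alphabet that BEAM uses. The function name `toy_encode` and the letters S (paired), H (hairpin loop) and U (other unpaired) are assumptions made purely for illustration.

```shell
# Toy structure encoder (NOT the real BEAR encoding used by BEAM).
# Hypothetical alphabet: S = paired base, H = hairpin-loop base, U = other unpaired base.
toy_encode() {
  awk '{
    s = $0; out = ""
    # A hairpin loop is a run of dots enclosed directly by "(" and ")".
    while (match(s, /\(\.+\)/)) {
      loop = substr(s, RSTART + 1, RLENGTH - 2)
      gsub(/\./, "H", loop)
      out = out substr(s, 1, RSTART) loop   # text up to and including "(", then the marked loop
      s = substr(s, RSTART + RLENGTH - 1)   # continue from the closing ")"
    }
    out = out s
    gsub(/[()]/, "S", out)                  # every paired position
    gsub(/\./, "U", out)                    # remaining unpaired positions
    print out
  }'
}

# Example: a hairpin followed by a 2-nt unpaired tail
printf '%s\n' '((((...))))..' | toy_encode
```

The example prints `SSSSHHHSSSSUU`. Even this toy version shows the information loss mentioned above: the output no longer records which paired positions pair with each other.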

2) running steps

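Runtime environment

  • If you use Docker, this section uses the same Docker container as the sequence motif section:
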
docker exec -it -u test motif /bin/bash
cd /home/test/motif/structure_motif/BEAM

  • If you use Singularity on the P cluster:

source /WORK/Samples/singularity.sh
export SINGULARITY_BINDPATH='/data:/data'
singularity shell /data/images/bioinfo_motif_2.0.simg
source /home/test/.bashrc
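# ... run the analysis steps below inside the container ...
exit

  • If you want to use the P cluster without Singularity:

    • Use the tools provided in /data/2022-bioinfo-shared/softwares

    • The input sequences are in /data/2022-bioinfo-shared/data/motif-analysis/test.fa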

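Predict RNA secondary structure

  • Predict the RNA secondary structure with RNAfold. The ".dbn" suffix stands for "dot-bracket notation": a dot (".") marks an unpaired (single-stranded) position, and brackets ("(" and ")") mark base-paired positions.

  • In a ".dbn" file, every three lines form one record: the sequence ID starting with ">", the sequence, and the secondary structure in dot-bracket notation. The structure is followed by a space and the computed free energy.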

cd /home/test/motif/structure_motif/BEAM
RNAfold --noPS <test.fa > test.dbn
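The three-line record layout can be checked with a short awk pass. The sketch below runs on a made-up record (the ID, sequence, structure and energy are hypothetical, not real RNAfold output) and prints the ID, the sequence length and the free energy. The function name `dbn_summary` is illustrative.

```shell
# Summarize .dbn records as: ID <tab> sequence length <tab> free energy.
# Assumes exactly three lines per record, as described above.
dbn_summary() {
  awk '
    NR % 3 == 1 { id = substr($0, 2) }   # header line: drop the leading ">"
    NR % 3 == 2 { len = length($0) }     # sequence line
    NR % 3 == 0 {                        # structure line: "<structure> (<energy>)"
      energy = $0
      sub(/^[^ ]+ +/, "", energy)        # remove the structure field
      gsub(/[() ]/, "", energy)          # remove parentheses and any padding spaces
      printf "%s\t%d\t%s\n", id, len, energy
    }'
}

# Hypothetical record, for illustration only
printf '%s\n' '>toy_rna' 'GGGGAAACCCC' '((((...)))) (-3.20)' | dbn_summary
```

On the real output you would run `dbn_summary < test.dbn`. A sequence length that looks wrong usually means the file does not follow the three-lines-per-record layout.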

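Encode the predicted structure in the new alphabet

BearEncoder.new.jar is a Java program that encodes the secondary structure in the new alphabet. The program is rigid: if the third line of any record in the .dbn file contains anything besides the secondary structure, it cannot parse the file correctly. The first command below therefore discards the free energy field: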

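# Keep only the structure (field 1) on every third line; copy all other lines unchanged
awk 'NR % 3 == 0 { print $1; next } { print }' test.dbn > test.fixed.dbn
# Encode the cleaned structures in the BEAR alphabet
java -jar /home/test/software/BEAM/beam-2.0/BearEncoder.new.jar test.fixed.dbn test.bear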
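Because the encoder is strict about its input, it can help to validate the cleaned file before running it. The following sketch reuses the tutorial's awk filter as `strip_energy` and adds a hypothetical checker, `check_dbn`. Both function names and the sample record are assumptions for illustration. The checker fails if a structure line contains anything other than dots and brackets, or if its length differs from the sequence length.

```shell
# Strip the energy field exactly as in the tutorial command, reading stdin.
strip_energy() {
  awk 'NR % 3 == 0 { print $1; next } { print }'
}

# Exit non-zero if any structure line has characters other than "(", ")", "."
# or differs in length from its sequence line.
check_dbn() {
  awk '
    NR % 3 == 2 { seqlen = length($0) }
    NR % 3 == 0 && ($0 !~ /^[().]+$/ || length($0) != seqlen) {
      printf "bad record ending at line %d\n", NR
      bad = 1
    }
    END { exit bad + 0 }'
}

# Hypothetical record, for illustration only
printf '%s\n' '>toy_rna' 'GGGGAAACCCC' '((((...)))) (-3.20)' \
  | strip_energy | check_dbn && echo "records look clean"
```

With the tutorial files, the equivalent check would be `check_dbn < test.fixed.dbn`.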

motif discovery

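# -f: BEAR-encoded input file; -w / -W: minimum / maximum motif width; -M: number of masks (motifs) to search for
java -jar /home/test/software/beam-2.0/BEAM_release_1.5.1.jar -f test.bear -w 10 -W 40 -M 3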

Motif visualization

cd risultati/test/webLogoOut/motifs
weblogo -a 'ZAQXSWCDEVFRBGTNHY' -f test_m1_run1_wl.fa \
-o out.jpeg -F jpeg --composition="none" \
-C red ZAQ 'Stem' -C blue XSW 'Loop' -C forestgreen CDE 'InternalLoop' \
-C orange VFR 'StemBranch' -C DarkOrange B 'Bulge' \
-C lime G 'BulgeBranch' -C purple T 'Branching' \
-C limegreen NHY 'InternalLoopBranch'
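The `-C` options above group the characters of the structure alphabet into named classes. As a rough aid for reading a motif, the sketch below counts how many positions of an encoded string fall into each class. The grouping is copied from the weblogo command. The function name `bear_classes` and the input string are hypothetical.

```shell
# Count positions per structural class, using the grouping from the weblogo legend above.
# Characters outside the listed alphabet are counted as "Unknown".
bear_classes() {
  awk '{
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if      (index("ZAQ", c)) cls = "Stem"
      else if (index("XSW", c)) cls = "Loop"
      else if (index("CDE", c)) cls = "InternalLoop"
      else if (index("VFR", c)) cls = "StemBranch"
      else if (c == "B")        cls = "Bulge"
      else if (c == "G")        cls = "BulgeBranch"
      else if (c == "T")        cls = "Branching"
      else if (index("NHY", c)) cls = "InternalLoopBranch"
      else                      cls = "Unknown"
      count[cls]++
    }
  }
  END { for (k in count) printf "%s\t%d\n", k, count[k] }' | LC_ALL=C sort
}

# Hypothetical encoded string, for illustration only
printf '%s\n' 'ZZAXSB' | bear_classes
```

For the example string this reports one Bulge, two Loop and three Stem positions.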

3) other tools

  • RNApromo

    • publication: 2008, PNAS, Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes

  • GraphProt

    • software: https://github.com/dmaticzka/GraphProt

    • publication: 2014, Genome Biology, GraphProt: modeling binding preferences of RNA-binding proteins

  • RNAcontext

    • publication: 2010, PLoS Computational Biology, RNAcontext: A New Method for Learning the Sequence and Structure Binding Preferences of RNA-Binding Proteins

4) Homework

  • Read the 2014 NAR paper "A novel approach to represent and compare RNA secondary structures" and briefly describe how the BEAR encoder encodes a two-dimensional structure as a one-dimensional sequence.

  • RNApromo was published in the 2008 PNAS paper "Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes". Read the paper and briefly explain how RNApromo differs from MEME.