Bioinformatics Tutorial

Appendix V. Software and Tools

0) Process files in different format

0.1) sequence

  • seqtk

  • bbmap

0.2) alignment

  • gffread

  • samtools

  • bamtools

0.3) interval

  • bedtools

  • bedtk
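As a rough illustration of what interval tools compute, the sketch below intersects two lists of BED-style intervals (0-based, half-open). It is a naive all-against-all comparison written for clarity, not the algorithm used by bedtools or bedtk; the function name `intersect` and the example coordinates are made up for this example.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based and half-open as in the BED format
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segments between two interval lists."""
    hits = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            # Half-open intervals that only touch (end == start) do not overlap
            if start < end:
                hits.append((chrom_a, start, end))
    return sorted(hits)


# Hypothetical example data
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 550), ("chr2", 80, 120)]
overlaps = intersect(peaks, genes)
print(overlaps)
```

Real tools avoid the quadratic comparison by sorting or indexing the intervals first, which matters once the inputs contain millions of records.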

1) Homolog analysis

1.1) Sequence based search

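  • blast: a convenient web-based tool

  • blat: a BLAST-like tool

  • mmseqs: a more modern homology search tool than blast; recommended for large-scale local computation

  • diamond: a homology search tool for protein sequences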

1.2) Profile based search

  • hmmer: profile HMM based search for protein and nucleotide sequences

  • infernal: profile SCFG based search for structured noncoding RNA

  • hh-suite: profile HMM to profile HMM alignment

1.3) Multiple sequence alignment

  • MAFFT

  • clustal

  • T-Coffee

2) Genome Browsers

  • UCSC Genome Browser (@youtube @bilibili)

  • IGV (@youtube @bilibili)

See more in our Tutorial.

3) DNA-seq

(3.1) Mapping and QC

  • Remove adaptor

    • cutadapt

    • TrimGalore: a wrapper around cutadapt that automatically detects common adaptors

    • Trimmomatic

    • fastp

  • Mapping

    • bowtie2

    • bowtie

    • bwa

  • QC

    • fastqc
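The adaptor removal step listed above can be pictured with a minimal sketch: find the adaptor at the 3' end of a read, including the case where only its first bases were sequenced, and cut the read there. This version uses exact matching only, whereas the tools above also tolerate mismatches and handle quality trimming. The function name, the `min_overlap` parameter and the example reads are assumptions for illustration.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adaptor from a read using exact matching only."""
    # Case 1: the full adaptor occurs somewhere inside the read
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a truncated adaptor (a prefix of the adaptor)
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adaptor found: return the read unchanged
    return read


# Example adaptor sequence and made-up reads
ADAPTER = "AGATCGGAAGAGC"
full = trim_adapter("ACGTACGTTT" + ADAPTER + "TTT", ADAPTER)
partial = trim_adapter("ACGTACGTTTAGATC", ADAPTER)
clean = trim_adapter("ACGTACGT", ADAPTER)
print(full, partial, clean)
```

The `min_overlap` threshold is a trade-off: a small value removes more partial adaptors but also trims reads whose last few bases match the adaptor by chance.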

(3.2) Variant Calling

  • Mutation discovery

    • GATK

    • Varscan

  • Mutation annotation

    • ANNOVAR

(3.3) Assembly

De novo assembly software

  • SPAdes

    • the sub-utility metaSPAdes is designed for metagenome assembly

  • megahit: designed for metagenome assembly

(3.4) CNV

  • Whole genome Seq

    • Control-FREEC

  • Whole exome Seq

    • CONTRA

    • ExomeCNV

(3.5) SV (structural variation)

  • structural variation

    • lumpy

    • Breakdancer

4) RNA-seq

(4.1) RNA-seq

  • Mapping

    • STAR

      • The sub-utility STARsolo is designed for mapping of single cell RNA-seq data

    • hisat2

  • Expression Quantification

    • featureCounts

    • htseq-count

    • salmon

    • kallisto

  • Differential Analysis

    • deseq2

    • edgeR

    • limma

  • Alternative Splicing Analysis:

    • rMATS

    • MAJIQ

    • SUPPA

    • DEXSeq

  • RNA Editing

    • RNAEditor

    • REDItools
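For the expression quantification step above, the sketch below shows one common normalization, TPM (transcripts per million), computed from raw read counts and feature lengths. It assumes per-gene counts of the kind produced by counting tools and a single fixed length per gene; salmon and kallisto report TPM themselves using effective lengths, which this sketch does not model. The gene names and numbers are made up.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts to TPM given feature lengths in bases."""
    # Step 1: length-normalize, giving reads per base for each gene
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that the values sum to one million
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical example data
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
tpm = counts_to_tpm(counts, lengths)
for gene, value in tpm.items():
    print(f"{gene}\t{value:.1f}")
```

In this example geneA and geneB receive the same TPM although geneB has twice the reads, because geneB is also twice as long. Note that differential analysis tools such as DESeq2 and edgeR expect raw counts as input, not TPM.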

(4.2) Single Cell RNA-seq (scRNA-seq)

  • awesome-single-cell: a collection of single cell analysis tools

  • seurat: a widely used R package

  • scanpy: a widely used python package

  • monocle: Trajectory analysis

  • cellphonedb: Cell-cell interaction analysis

  • scenic: Transcriptional regulatory network

  • Tutorials

    • https://bioconductor.org/books/release/OSCA/

    • https://github.com/theislab/single-cell-tutorial

Nature Biotechnology 2020 38(3):254-257

| Software name | Developer | Price structure | Platform-specific | Relevant stages of experiment |
| --- | --- | --- | --- | --- |
| Cell Ranger | 10X Genomics | Free download | 10X Chromium | Raw read alignment, QC and matrix generation for scRNA-seq and ATAC-seq; data normalization; dimensionality reduction and clustering |
| Loupe Cell Browser | 10X Genomics | Free download | 10X Chromium | Visualization and analysis |
| Partek Flow | Partek | License | No | Complete data analysis and visualization pipeline for scRNA-seq data |
| Qlucore Omics Explorer | Qlucore | License | No | scRNA-seq data filtering, dimensionality reduction and clustering, visualization |
| mappa Analysis Pipeline | Takara Bio | Free download | Takara ICell8 | Raw read alignment and matrix generation for scRNA-seq |
| hanta R kit | Takara Bio | Free download | Takara ICell8 | Clustering and analysis of mappa data |
| Singular Analysis Toolset | Fluidigm | Free download | Fluidigm C1 or Biomark | Analysis and visualization of differential gene expression data for scRNA-seq |
| SeqGeq | FlowJo/BD Biosciences | License | No | Data normalization and QC, dimensionality reduction and clustering, analysis and visualization |
| Seven Bridges | Seven Bridges/BD Biosciences | License | BD Rhapsody and Precise | Cloud-based raw read alignment, QC and matrix generation |
| Tapestri Pipeline/Insights | Mission Bio | Free download | Mission Bio Tapestri | Analysis of single-cell genomics data |
| BaseSpace SureCell | Illumina | License | Illumina SureCell libraries | Raw read alignment and matrix generation |
| OmicSoft Array Studio | Qiagen | License | No | Raw read alignment, QC and matrix generation, dimensionality reduction and clustering |

(4.3) Assembly

  • Trinity: transcript assembly from RNA-seq data

5) Interactome

(5.1) ChIP-seq

  • MACS: peak calling

  • homer: peak calling, motif finding, etc.

  • ChIPseeker: visualization and annotation

(5.2) CLIP-seq

  • CTK

  • Piranha

  • PARalyzer

  • clipper

(5.3) Motif analysis

sequence

  1. MEME: motif-based sequence analysis tools http://meme-suite.org/

  2. HOMER: software for motif discovery and next-gen sequencing analysis http://homer.ucsd.edu/homer/motif/

structure

  1. RNApromo: computational prediction of RNA structural motifs involved in post-transcriptional regulatory processes https://genie.weizmann.ac.il/pubs/rnamotifs08/

  2. GraphProt: modeling binding preferences of RNA-binding proteins http://www.bioinf.uni-freiburg.de/Software/GraphProt/

6) Epigenetic Data

(6.1) ChIP-seq

  • Bisulfite sequencing:

    • Review: Katarzyna Wreczycka, et al. Strategies for analyzing bisulfite sequencing data. Journal of Biotechnology. 2017.

    • Mapping:

      • Bismark

      • BSMAP

    • Differentially Methylated Regions (DMRs) detection

      • methylkit

      • ComMet

    • Segmentation of the methylome, Classification of Fully Methylated Regions (FMRs), Unmethylated Regions (UMRs) and Low-Methylated Regions (LMRs)

      • MethylSeekR

    • Annotation of DMRs

      • genomation

      • ChIPpeakAnno

    • Web-based service

      • WBSA

  • IP data:

    • Overview of ChIP-seq: https://github.com/crazyhottommy/ChIP-seq-analysis

    • peak calling: MACS2

    • Peak annotation and visualization

      • HOMER annotatePeaks.pl

      • ChIPseeker

    • Gene set enrichment analysis for ChIP-seq peaks

      • GREAT

(6.2) DNase-seq

  • Review: Yongjing Liu, et al. Briefings in Bioinformatics, 2019.

  • Peak calling: F-Seq

  • Peak annotation: ChIPpeakAnno

  • Motif analysis: MEME-ChIP

(6.3) ATAC-seq

  • Pipeline recommended by Harvard Informatics

7) Microbe data analysis

  • kraken2: k-mer based fast metagenome reads classification

  • metaphlan: marker gene based microbe taxonomy abundance estimation

  • motu: marker gene based microbe taxonomy abundance estimation

  • maxbin: binning contigs into metagenome-assembled genomes (MAGs)

  • mash: rapid estimation of distance between genomes

  • drep: pick representative genomes from sample-wise assemblies

  • prodigal: prokaryote gene prediction

  • prokka: pipeline for prokaryote genome annotation

  • qiime2: 16S amplicon sequencing data analysis
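To give a feel for how a tool like mash can estimate genome distance quickly, here is a minimal MinHash sketch: each sequence is reduced to its smallest k-mer hashes, the Jaccard index is estimated from those sketches, and the estimate is converted to a distance with the formula D = -(1/k) ln(2j / (1 + j)). This is a simplified illustration under stated assumptions: it hashes with SHA-1 from the standard library, ignores reverse complements (canonical k-mers), and returns 1.0 when no hashes are shared. None of the function names come from mash itself.

```python
import hashlib
import math
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int, size: int) -> List[int]:
    """Bottom-s sketch: the `size` smallest k-mer hash values."""
    hashes = sorted(
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    )
    return hashes[:size]


def jaccard_estimate(sk_a: List[int], sk_b: List[int], size: int) -> float:
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:size]
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard estimate to a distance; 1.0 if nothing is shared."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Made-up toy sequences; real genomes use larger k and sketch sizes
K, SIZE = 4, 8
seq_a = "ACGTACGTGACCTGATTGCA"
seq_b = "ACGTACGTGACCTGATTGCA"
seq_c = "CCCCCCCCCCCC"
j_same = jaccard_estimate(sketch(seq_a, K, SIZE), sketch(seq_b, K, SIZE), SIZE)
j_diff = jaccard_estimate(sketch(seq_a, K, SIZE), sketch(seq_c, K, SIZE), SIZE)
print(mash_distance(j_same, K), mash_distance(j_diff, K))
```

The point of sketching is that only a fixed number of hashes per genome is compared, so the cost of a comparison does not grow with genome size.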

More: Shared tools and scripts

  • Scripts: Lu Lab | Zhi J. Lu

  • Plots: Lu Lab | Zhi J. Lu

More: Software for the ages

| Software | Purpose | Creators | Key capabilities | Year released | Citations^a |
| --- | --- | --- | --- | --- | --- |
| BLAST | Sequence alignment | Stephen Altschul, Warren Gish, Gene Myers, Webb Miller, David Lipman | First program to provide statistics for sequence alignment, combination of sensitivity and speed | 1990 | 35,617 |
| R | Statistical analyses | Robert Gentleman, Ross Ihaka | Interactive statistical analysis, extendable by packages | 1996 | N/A |
| ImageJ | Image analysis | Wayne Rasband | Flexibility and extensibility | 1997 | N/A |
| Cytoscape | Network visualization and analysis | Trey Ideker et al. | Extendable by plugins | 2003 | 2,374 |
| Bioconductor | Analysis of genomic data | Robert Gentleman et al. | Built on R, provides tools to enhance reproducibility of research | 2004 | 3,517 |
| Galaxy | Web-based analysis platform | Anton Nekrutenko, James Taylor | Provides easy access to high-performance computing | 2005 | 309^b |
| MAQ | Short-read mapping | Heng Li, Richard Durbin | Integrated read mapping and SNP calling, introduced mapping quality scores | 2008 | 1,027 |
| Bowtie | Short-read mapping | Ben Langmead, Cole Trapnell, Mihai Pop, Steven Salzberg | Fast alignment allowing gaps and mismatches based on Burrows-Wheeler Transform | 2009 | 1,871 |
| Tophat | RNA-seq read mapping | Cole Trapnell, Lior Pachter, Steven Salzberg | Discovery of novel splice sites | 2009 | 817 |
| BWA | Short-read mapping | Heng Li, Richard Durbin | Fast alignment allowing gaps and mismatches based on Burrows-Wheeler Transform | 2009 | 1,556 |
| Circos | Data visualization | Martin Krzywinski et al. | Compact representation of similarities and differences arising from comparison between genomes | 2009 | 431 |
| SAMtools | Short-read data format and utilities | Heng Li, Richard Durbin | Storage of large nucleotide sequence alignments | 2009 | 1,551 |
| Cufflinks | RNA-seq analysis | Cole Trapnell, Steven Salzberg, Barbara Wold, Lior Pachter | Transcript assembly and quantification | 2010 | 710 |
| IGV | Short-read data visualization | James Robinson et al. | Scalability, real-time data exploration | 2011 | 335 |

N/A, paper not available in Web of Science.

From: The anatomy of successful computational biology software
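The table credits Bowtie and BWA with alignment based on the Burrows-Wheeler Transform. As a hedged illustration of the transform itself, the sketch below builds it by sorting all rotations of the text and then inverts it again. This naive version is for teaching only: real aligners build the transform from a suffix array and search it through an FM-index, neither of which is shown here. The function names are made up for this example.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive, for small inputs)."""
    # "$" marks the end of the text and sorts before A, C, G, T and letters
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row ending with "$" is the original text followed by the marker
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]


transformed = bwt("GATTACA")
restored = inverse_bwt(transformed)
print(transformed, restored)
```

For the classic example, `bwt("banana")` gives `annb$aa`: identical characters tend to cluster together, which is what makes the transform compressible and, combined with an index, searchable.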
