# Appendix V. Software and Tools

## 0) Processing files in different formats

### 0.1) sequence

* [seqtk](https://github.com/lh3/seqtk)
* [bbmap](https://sourceforge.net/projects/bbmap/)

### 0.2) alignment

* [gffread](https://github.com/gpertea/gffread)
* [samtools](http://www.htslib.org/)
* [bamtools](https://github.com/pezmaster31/bamtools)

### 0.3) interval

* [bedtools](https://bedtools.readthedocs.io/en/latest/)
* [bedtk](https://github.com/lh3/bedtk)

## 1) Homolog analysis

### 1.1) Sequence based search

* [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi): a convenient web tool
* [blat](https://genome.ucsc.edu/cgi-bin/hgBlat): a BLAST-like tool
* [mmseqs](https://github.com/soedinglab/MMseqs2): a more modern homology search tool than BLAST; recommended for large-scale local computation
* [diamond](https://github.com/bbuchfink/diamond): a homology search tool for protein sequences

### 1.2) Profile based search

* [hmmer](http://hmmer.org/): profile HMM-based search for protein and nucleotide sequences
* [infernal](http://eddylab.org/infernal/): profile SCFG-based search for structured noncoding RNAs
* [hh-suite](https://github.com/soedinglab/hh-suite): profile HMM to profile HMM alignment

### 1.3) Multiple sequence alignment

* [MAFFT](https://mafft.cbrc.jp/alignment/software/)
* [clustal](http://www.clustal.org/)
* [T-Coffee](https://tcoffee.crg.eu/)

## 2) Genome Browsers

* [UCSC Genome Browser](https://genome.ucsc.edu/) ([@youtube](https://youtu.be/eTgEtfI65hA) [@bilibili](https://player.bilibili.com/player.html?aid=30448417\&cid=53132461\&page=1))
* [IGV](http://software.broadinstitute.org/software/igv/) ([@youtube](https://youtu.be/6_1ZcVw7ptU) [@bilibili](https://player.bilibili.com/player.html?aid=30448472\&cid=53133093\&page=1))

> see more in [our Tutorial](/teaching/part-iii.-ngs-data-analyses/1.mapping/1.1-genome-browser.md)

## 3) DNA-seq

### (3.1) Mapping and QC

* **Remove adaptor**
  * [cutadapt](https://cutadapt.readthedocs.io/en/stable/)
  * [TrimGalore](https://github.com/FelixKrueger/TrimGalore): a wrapper around cutadapt that automatically detects common adaptors
  * [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
  * [fastp](https://github.com/OpenGene/fastp)
* **Mapping**
  * [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
  * [bowtie](https://bowtie-bio.sourceforge.net/manual.shtml)
  * [bwa](https://bio-bwa.sourceforge.net/)
* **QC**
  * [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

### (3.2) Variant Calling

* **Mutation discovery**
  * [GATK](https://gatk.broadinstitute.org/hc/en-us)
  * [Varscan](http://dkoboldt.github.io/varscan/)
* **Mutation annotation**
  * [ANNOVAR](http://annovar.openbioinformatics.org/en/latest/user-guide/download/)

### (3.3) Assembly

**De novo assembly software**

* [SPAdes](https://github.com/ablab/spades)
  * the sub-utility metaSPAdes is designed for metagenome assembly
* [megahit](https://github.com/voutcn/megahit): designed for metagenome assembly

### (3.4) CNV

* **Whole genome Seq**
  * [Control-FREEC](http://boevalab.inf.ethz.ch/FREEC/)
* **Whole exome Seq**
  * [CONTRA](http://contra-cnv.sourceforge.net/)
  * [ExomeCNV](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179661/)

### (3.5) SV (structural variation)

* **structural variation**
  * [lumpy](https://github.com/arq5x/lumpy-sv)
  * [Breakdancer](http://breakdancer.sourceforge.net/)

## 4) RNA-seq

### (4.1) RNA-seq

* **Mapping**
  * [STAR](https://github.com/alexdobin/STAR)
    * The sub-utility STARsolo is designed for mapping of single cell RNA-seq data
  * [hisat2](http://daehwankimlab.github.io/hisat2/)
* **Expression Quantification**
  * [featureCounts](http://subread.sourceforge.net/)
  * [htseq-count](https://htseq.readthedocs.io/en/master/)
  * [salmon](https://combine-lab.github.io/salmon/)
  * [kallisto](https://pachterlab.github.io/kallisto/)
* **Differential Analysis**
  * [deseq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
  * [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html)
  * [limma](https://bioconductor.org/packages/release/bioc/html/limma.html)
* **Alternative Splicing Analysis**:
  * [rMATS](http://rnaseq-mats.sourceforge.net/)
  * [MAJIQ](https://majiq.biociphers.org/)
  * [SUPPA](https://github.com/comprna/SUPPA)
  * [DEXSeq](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html)
* **RNA Editing**
  * [RNAEditor](http://rnaeditor.uni-frankfurt.de/)
  * [REDItools](http://code.google.com/p/reditools/)

### (4.2) Single Cell RNA-seq (scRNA-seq)

* [awesome-single-cell](https://github.com/seandavi/awesome-single-cell): a collection of single cell analysis tools
* [seurat](https://satijalab.org/seurat/): a widely used R package
* [scanpy](https://scanpy.readthedocs.io/en/stable/): a widely used Python package
* [monocle](http://cole-trapnell-lab.github.io/monocle-release/): Trajectory analysis
* [cellphonedb](https://www.cellphonedb.org/): Cell-cell interaction analysis
* [scenic](https://scenic.aertslab.org/): Transcriptional regulatory network
* Tutorials
  * <https://bioconductor.org/books/release/OSCA/>
  * <https://github.com/theislab/single-cell-tutorial>

> [Nature Biotechnology 2020 38(3):254-257](https://www.nature.com/articles/s41587-020-0449-8)

| Software name                                                                                                                                            | Developer                    | Price structure | Platform-specific           | Relevant stages of experiment                                                                                                        |
| -------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- | --------------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| [Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)                                 | 10X Genomics                 | Free download   | 10X Chromium                | Raw read alignment, QC and matrix generation for scRNA-seq and ATAC-seq; data normalization; dimensionality reduction and clustering |
| [Loupe Cell Browser](https://support.10xgenomics.com/single-cell-gene-expression/software/visualization/latest/what-is-loupe-cell-browser)               | 10X Genomics                 | Free download   | 10X Chromium                | Visualization and analysis                                                                                                           |
| [Partek Flow](https://www.partek.com/application-page/single-cell-gene-expression/)                                                                      | Partek                       | License         | No                          | Complete data analysis and visualization pipeline for scRNA-seq data                                                                 |
| [Qlucore Omics Explorer](https://www.qlucore.com/single-cell-rnaseq)                                                                                     | Qlucore                      | License         | No                          | scRNA-seq data filtering, dimensionality reduction and clustering, visualization                                                     |
| [mappa Analysis Pipeline](https://www.takarabio.com/products/automation-systems/icell8-system-and-software/bioinformatics-tools/mappa-analysis-pipeline) | Takara Bio                   | Free download   | Takara ICell8               | Raw read alignment and matrix generation for scRNA-seq                                                                               |
| [hanta R kit](https://www.takarabio.com/products/automation-systems/icell8-system-and-software/bioinformatics-tools/hanta-r-kit)                         | Takara Bio                   | Free download   | Takara ICell8               | Clustering and analysis of mappa data                                                                                                |
| [Singular Analysis Toolset](https://www.fluidigm.com/software)                                                                                           | Fluidigm                     | Free download   | Fluidigm C1 or Biomark      | Analysis and visualization of differential gene expression data for scRNA-seq                                                        |
| [SeqGeq](https://www.flowjo.com/solutions/seqgeq)                                                                                                        | FlowJo/BD Biosciences        | License         | No                          | Data normalization and QC, dimensionality reduction and clustering, analysis and visualization                                       |
| [Seven Bridges](https://www.sevenbridges.com/bdgenomics/)                                                                                                | Seven Bridges/BD Biosciences | License         | BD Rhapsody and Precise     | Cloud-based raw read alignment, QC and matrix generation                                                                             |
| [Tapestri Pipeline/Insights](https://missionbio.com/panels/software/)                                                                                    | Mission Bio                  | Free download   | Mission Bio Tapestri        | Analysis of single-cell genomics data                                                                                                |
| [BaseSpace SureCell](https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub.html)                                         | Illumina                     | License         | Illumina SureCell libraries | Raw read alignment and matrix generation                                                                                             |
| [OmicSoft Array Studio](https://omicsoftdocs.github.io/ArraySuiteDoc/tutorials/scRNAseq/Introduction/)                                                   | Qiagen                       | License         | No                          | Raw read alignment, QC and matrix generation, dimensionality reduction and clustering                                                |

### (4.3) Assembly

* [Trinity](https://github.com/trinityrnaseq/trinityrnaseq/wiki): transcript assembly from RNA-seq data

## 5) Interactome

### **(5.1) ChIP-seq**

* [MACS](https://github.com/macs3-project/MACS): peak calling
* [homer](http://homer.ucsd.edu/homer/): peak calling, motif finding, etc
* [ChIPseeker](https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html): visualization and annotation

### **(5.2) CLIP-seq**

* [CTK](https://zhanglab.c2b2.columbia.edu/index.php/CTK_Documentation)
* [Piranha](http://smithlabresearch.org/software/piranha/)
* [PARalyzer](https://ohlerlab.mdc-berlin.de/software/PARalyzer_85/)
* [clipper](https://github.com/YeoLab/clipper)

### **(5.3) Motif analysis**

**sequence**

1. MEME: motif-based sequence analysis tools <http://meme-suite.org/>
2. HOMER: software for motif discovery and next-gen sequencing analysis <http://homer.ucsd.edu/homer/motif/>

**structure**

1. RNApromo: computational prediction of RNA structural motifs involved in post-transcriptional regulatory processes <https://genie.weizmann.ac.il/pubs/rnamotifs08/>
2. GraphProt: modeling binding preferences of RNA-binding proteins <http://www.bioinf.uni-freiburg.de/Software/GraphProt/>

## 6) Epigenetic Data

### **(6.1) Bisulfite sequencing and ChIP-seq**

* **Bisulfite sequencing**:
  * Review: [Katarzyna Wreczycka, et al. Strategies for analyzing bisulfite sequencing data. Journal of Biotechnology. 2017.](https://www.sciencedirect.com/science/article/pii/S0168165617315936)
  * Mapping:
    * [Bismark](http://www.bioinformatics.babraham.ac.uk/projects/bismark/)
    * [BSMAP](https://github.com/zyndagj/BSMAPz)
  * Detection of differentially methylated regions (DMRs)
    * [methylKit](https://bioconductor.org/packages/release/bioc/html/methylKit.html)
    * [ComMet](https://github.com/yutaka-saito/ComMet)
  * Segmentation of the methylome, Classification of Fully Methylated Regions (FMRs), Unmethylated Regions (UMRs) and Low-Methylated Regions (LMRs)
    * [MethylSeekR](http://www.bioconductor.org/packages/release/bioc/html/MethylSeekR.html)
  * Annotation of DMRs
    * [genomation](https://bioconductor.org/packages/release/bioc/html/genomation.html)
    * [ChIPpeakAnno](https://www.bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html)
  * Web-based service
    * [WBSA](http://wbsa.big.ac.cn/)
* **IP data**:
  * Overview of ChIP-seq: <https://github.com/crazyhottommy/ChIP-seq-analysis>
  * Peak calling: [MACS2](https://github.com/taoliu/MACS/wiki/Advanced:-Call-peaks-using-MACS2-subcommands)
  * Peak annotation and visualization
    * [HOMER annotatePeak](http://homer.ucsd.edu/homer/ngs/annotation.html)
    * [ChIPseeker](http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html)
  * Gene set enrichment analysis for ChIP-seq peaks
    * [GREAT](http://bejerano.stanford.edu/great/public/html/)

### **(6.2) DNase-seq**

* Review: [Yongjing Liu, et al. Briefings in Bioinformatics, 2019.](https://academic.oup.com/bib/article-abstract/20/5/1865/5053117?redirectedFrom=fulltext)
* Peak calling: [F-Seq](http://fureylab.web.unc.edu/software/fseq/)
* Peak annotation: [ChIPpeakAnno](https://www.bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html)
* Motif analysis: [MEME-ChIP](http://meme-suite.org/doc/meme-chip.html?man_type=web)

### **(6.3) ATAC-seq**

* Pipeline recommended by [Harvard Informatics](https://github.com/harvardinformatics/ATAC-seq)

## 7) Microbe data analysis

* [kraken2](https://ccb.jhu.edu/software/kraken2/): fast k-mer-based classification of metagenomic reads
* [metaphlan](https://huttenhower.sph.harvard.edu/metaphlan/): marker-gene-based estimation of microbial taxonomic abundance
* [motu](https://motu-tool.org/): marker-gene-based estimation of microbial taxonomic abundance
* [maxbin](https://sourceforge.net/projects/maxbin/): binning contigs into metagenome-assembled genomes (MAGs)
* [mash](https://mash.readthedocs.io/en/latest/distances.html): rapid estimation of distances between genomes
* [drep](https://drep.readthedocs.io/en/latest/): picks representative genomes from sample-wise assemblies
* [prodigal](https://github.com/hyattpd/Prodigal): prokaryote gene prediction
* [prokka](https://github.com/tseemann/prokka): pipeline for prokaryote genome annotation
* [qiime2](https://qiime2.org/): 16S amplicon sequencing data analysis
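
The genome distance estimation listed for mash above rests on MinHash sketches of k-mer sets. The following is a minimal Python sketch of that idea, not mash's actual implementation: the function names are illustrative, SHA-1 from `hashlib` stands in for the hash function the real tool uses, and reverse-complement (canonical) k-mers are ignored. It keeps the smallest hashes of each sequence's k-mers, estimates the Jaccard index `j` from the merged sketches, and converts it to the Mash distance `-ln(2j / (1 + j)) / k`.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k=5, size=50):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    hashes = {
        # Assumption: first 8 bytes of SHA-1 as a stand-in hash function.
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=50):
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate j into the Mash distance for k-mer size k."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy sequences (hypothetical data) that differ only in the last base.
genome_a = "ACGTACGTTGCAACGTAGCTAGCTAGGATCCGATTACA"
genome_b = "ACGTACGTTGCAACGTAGCTAGCTAGGATCCGATTACC"
j = jaccard_estimate(sketch(genome_a), sketch(genome_b))
print(f"Jaccard ~ {j:.3f}, Mash distance ~ {mash_distance(j, k=5):.3f}")
```

Real genomes call for much larger values of `k` and `size` than this toy example uses; for actual analyses, use the mash command-line tool itself.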

## More: Shared tools and scripts

* Scripts: [Lu Lab](https://github.com/lulab/shared_scripts) | [Zhi J. Lu](https://github.com/urluzhi/scripts)
* Plots: [Lu Lab](/teaching/part-i.-programming-skills/2.r/2.2.plots-with-r.md) | [Zhi J. Lu](https://github.com/urluzhi/scripts/tree/master/Rscript/R_plot)

## More: Software for the ages

| Software                                    | Purpose                              | Creators                                                             | Key capabilities                                                                                 | Year released | Citations<sup>a</sup> |
| ------------------------------------------- | ------------------------------------ | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | :-----------: | :--------: |
| BLAST                                       | Sequence alignment                   | Stephen Altschul, Warren Gish, Gene Myers, Webb Miller, David Lipman | First program to provide statistics for sequence alignment, combination of sensitivity and speed |      1990     |   35,617   |
| R                                           | Statistical analyses                 | Robert Gentleman, Ross Ihaka                                         | Interactive statistical analysis, extendable by packages                                         |      1996     |     N/A    |
| ImageJ                                      | Image analysis                       | Wayne Rasband                                                        | Flexibility and extensibility                                                                    |      1997     |     N/A    |
| Cytoscape                                   | Network visualization and analysis   | Trey Ideker *et al*.                                                 | Extendable by plugins                                                                            |      2003     |    2,374   |
| Bioconductor                                | Analysis of genomic data             | Robert Gentleman *et al*.                                            | Built on R, provides tools to enhance reproducibility of research                                |      2004     |    3,517   |
| Galaxy                                      | Web-based analysis platform          | Anton Nekrutenko, James Taylor                                       | Provides easy access to high-performance computing                                               |      2005     |    309<sup>b</sup>    |
| MAQ                                         | Short-read mapping                   | Heng Li, Richard Durbin                                              | Integrated read mapping and SNP calling, introduced mapping quality scores                       |      2008     |    1,027   |
| Bowtie                                      | Short-read mapping                   | Ben Langmead, Cole Trapnell, Mihai Pop, Steven Salzberg              | Fast alignment allowing gaps and mismatches based on Burrows-Wheeler Transform                   |      2009     |    1,871   |
| Tophat                                      | RNA-seq read mapping                 | Cole Trapnell, Lior Pachter, Steven Salzberg                         | Discovery of novel splice sites                                                                  |      2009     |     817    |
| BWA                                         | Short-read mapping                   | Heng Li, Richard Durbin                                              | Fast alignment allowing gaps and mismatches based on Burrows-Wheeler Transform                   |      2009     |    1,556   |
| Circos                                      | Data visualization                   | Martin Krzywinski *et al*.                                           | Compact representation of similarities and differences arising from comparison between genomes   |      2009     |     431    |
| SAMtools                                    | Short-read data format and utilities | Heng Li, Richard Durbin                                              | Storage of large nucleotide sequence alignments                                                  |      2009     |    1,551   |
| Cufflinks                                   | RNA-seq analysis                     | Cole Trapnell, Steven Salzberg, Barbara Wold, Lior Pachter           | Transcript assembly and quantification                                                           |      2010     |     710    |
| IGV                                         | Short-read data visualization        | James Robinson *et al*.                                              | Scalability, real-time data exploration                                                          |      2011     |     335    |

N/A: paper not available in Web of Science.

> From: [The anatomy of successful computational biology software](https://www.nature.com/articles/nbt.2721)

