Bioinformatics Tutorial
Appendix I. Keep Learning


Last updated 8 months ago


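In learning and applying bioinformatics, the most important and most useful basic tools and skills have always been, and I believe will remain for a long time to come:

  1. Google

  2. Wikipedia

  3. Forums (知乎, Seqanswers, Biostars, etc.)

⭐: must read ✨: recommended

(for textbooks and education papers)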

1) Recommended Books

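(1) Reference Books - General

Desk references to read selectively

  • ✨ 《生物信息学》 (101 textbook)

  • 《生物信息学》, edited by 樊龙江

  • 《生物信息学》, compiled by 李霞, 雷健波, 李亦学 et al.

(2) Reference Books - Tools

Read and practice as needed

It is best to learn and practice three basic skills (meeting either requirement is enough: 1. write a program of more than 1,000 lines; 2. earn a recognized certificate, such as an official one from an online course)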

  1. R (or MATLAB)

  2. Python (or Perl)

  3. Linux (Editor (e.g. VIM) and Shell Script (e.g. bash))

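  1. ⭐ 《笨办法学 Python》 (《Learn Python the Hard Way》) OR 《Beginning Perl for Bioinformatics》

Recommended Linux chapters (from 《鸟哥的Linux私房菜-基础学习篇》):

  • Chapter 5: 5.3.1 man page; Chapter 6: 6.1 Users and groups; 6.2 Linux file permission concepts; 6.3 Linux directory layout

  • Chapter 7: 7.1 Directories and paths; 7.2 File and directory management; 7.3 Viewing file contents; 7.5 Searching for commands and files; 7.6 How permissions relate to commands; Chapter 8: 8.2 Simple file system operations

  • Chapter 9: 9.1 Purpose and techniques of file compression; 9.2 Common compression commands on Linux; 9.3 Archiving command: tar

  • Chapter 10: The vim editor

  • Chapter 11: Getting to know and learning bash; Chapter 12: Regular expressions and file formatting; Chapter 13: Learning shell scripts

  • Chapter 25: Linux backup strategies: 25.2.2 Full backup: differential backup; 25.3 鸟哥's backup strategy; 25.4 Disaster recovery considerations; 25.5 Key points review

Linux focus areas:

  1. Editor (e.g. VIM)

  2. Shell Script (e.g. bash)

(3) Reference Books - Statistics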

  • 《Principles of Biostatistics》 by Marcello Pagano, Kimberlee Gauvreau

2) Recommended on-line Courses

3) Recommended Tips

4) ✨ [Education Papers] Computational Biology Primers

This is a list of explanatory papers that have appeared as primers in the Computational Biology section of the journal Nature Biotechnology, in reverse chronological order. (Last addition November 2013 / checked March 2016).

— Nature Biotechnology

(1) Basics

The anatomy of successful computational biology software

(Stephen Altschul, Barry Demchak, Richard Durbin, Robert Gentleman, Martin Krzywinski, Heng Li, Anton Nekrutenko, James Robinson, Wayne Rasband, James Taylor & Cole Trapnell)

October 2013, Vol 31, No 10; pp 894 - 897

Understanding genome browsing

(Melissa S Cline & W James Kent)

February 2009, Vol 27, No 2; pp 153 - 155

(2) Basic Statistics

How does multiple testing correction work?

(William S Noble)

December 2009, Vol 27, No 12; pp 1135 - 1137

What is Bayesian statistics?

(Sean R Eddy)

September 2004, Volume 22, No 9; pp 1177 - 1178

(3) Basic Algorithms

How to map billions of short reads onto genomes

(Cole Trapnell & Steven L Salzberg)

May 2009, Vol 27, No 5; pp 455 - 457

Where did the BLOSUM62 alignment score matrix come from?

(Sean R Eddy)

August 2004, Volume 22, No 8; pp 1035 - 1036

What is dynamic programming?

(Sean R Eddy)

July 2004, Volume 22, No 7; pp 909 - 910

How do RNA folding algorithms work?

(Sean R Eddy)

November 2004, Volume 22, No 11; pp 1457 - 1458

(4) Machine Learning

What is a hidden Markov model?

(Sean R Eddy)

October 2004, Volume 22, No 10; pp 1315 - 1316

What is the expectation maximization algorithm?

(Chuong B Do & Serafim Batzoglou)

August 2008, Volume 26, No 8; pp 897 - 899

What are decision trees?

(Carl Kingsford & Steven L Salzberg)

September 2008, Volume 26, No 9; pp 1011 - 1013

What is a support vector machine?

(William S Noble)

December 2006, Volume 24, No 12; pp 1565 - 1567

Inference in Bayesian networks

(Chris J Needham, James R Bradford, Andrew J Bulpitt & David R Westhead)

January 2006, Volume 24, No 1; pp 51 - 53

What are artificial neural networks?

(Anders Krogh)

February 2008, Volume 26, No 2; pp 195 - 197

How does gene expression clustering work?

(Patrik D'haeseleer)

December 2005, Volume 23, No 12; pp 1499 - 1501

What is principal component analysis?

(Markus Ringnér)

March 2008, Volume 26, No 3; pp 303 - 304

(5) Others

What are DNA sequence motifs?

(Patrik D'haeseleer)

April 2006, Volume 24, No 4; pp 423 - 425

How does DNA sequence motif discovery work?

(Patrik D'haeseleer)

August 2006, Volume 24, No 8; pp 959 - 961

How to apply de Bruijn graphs to genome assembly

(Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler)

November 2011, Vol 29, No 11; pp 987 - 991

How does eukaryotic gene prediction work?

(Michael R Brent)

August 2007, Volume 25, No 8; pp 883 - 885

Analyzing 'omics data using hierarchical models

(Hongkai Ji & X Shirley Liu)

April 2010, Vol 28, No 4; pp 337 - 340

What is flux balance analysis?

(Jeffrey D Orth, Ines Thiele & Bernhard Ø Palsson)

March 2010, Vol 28, No 3; pp 245 - 248

How to visually interpret biological data using networks

(Daniele Merico, David Gfeller & Gary D Bader)

October 2009, Vol 27, No 10; pp 921 - 924

SNP imputation in association studies

(Eran Halperin & Dietrich A Stephan)

April 2009, Vol 27, No 4; pp 349 - 351

Maximizing power in association studies

(Eran Halperin & Dietrich A Stephan)

March 2009, Vol 27, No 3; pp 255 - 256

How do shotgun proteomics algorithms identify proteins?

(Edward M Marcotte)

July 2007, Volume 25, No 7; pp 755 - 757

5) ✨[Education Papers] Getting Started in Something

Several captions have been used to indicate educationally relevant papers in PLoS Computational Biology. Here we have collected some other papers. (Source: PLoS Computational Biology)

Getting Started in Computational Immunology

(Kleinstein SH)

PLoS Comput Biol (2008) 4(8): e1000128

(1) Basics

Getting Started in Gene Orthology and Functional Analysis

(Fang G, Bhardwaj N, Robilotto R, Gerstein MB)

PLoS Comput Biol (2010) 6(3): e1000703

Getting Started in Biological Pathway Construction and Analysis

(Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC)

PLoS Comput Biol (2008) 4(2): e16

Getting Started in Structural Phylogenomics

(Sjölander K)

PLoS Comput Biol (2010) 6(1): e1000621

(2) Advanced

Getting Started in Text Mining

(Cohen KB, Hunter L)

PLoS Comput Biol (2008) 4(1): e20

Getting Started in Text Mining: Part Two

(Rzhetsky A, Seringhaus M, Gerstein MB)

PLoS Comput Biol (2009) 5(7): e1000411

Getting Started in Probabilistic Graphical Models

(Airoldi EM)

PLoS Comput Biol (2007) 3(12): e252

(3) MS and Array

Getting Started in Computational Mass Spectrometry-Based Proteomics

(Vitek O)

PLoS Comput Biol (2009) 5(5): e1000366

Getting Started in Gene Expression Microarray Analysis

(Slonim DK, Yanai I)

PLoS Comput Biol (2009) 5(10): e1000543

Getting Started in Tiling Microarray Analysis

(Liu XS)

PLoS Comput Biol (2007) 3(10): e183

6) Advanced for AI

⭐: must read ✨: recommended

(1) Recommended Books

(2) Recommended On-line Courses

(4) Recommended Educational Papers

(5) More Books

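Edited based on Xiaofan Liu's list

  1. Mathematical foundations (review according to your own background)

    1. 《高等数学》 (Advanced Mathematics)

    2. 《线性代数》 (Linear Algebra)

    3. 《数理统计与概率论》 (Mathematical Statistics and Probability Theory)

  2. Introductory books (pick one of 1 and 2 to read closely; 2 is recommended if your math background is strong)

    1. 《机器学习》 by 周志华 (★★★ recommended)

    2. 《统计学习方法》 by 李航 (★★★ recommended)

    3. 《多元统计分析》 by 何晓群

  3. Python programming books

    1. 《Python机器学习基础教程》 by Andreas C. Müller [Germany] and Sarah Guido [USA], translated by 张亮 (hysic) (★★★ recommended)

    2. 《python高性能编程》 by Micha Gorelick, Ian Ozsvald ...

  4. Deep learning books (optional reading for those who want a stronger grasp of the mathematics behind the models and plan to study deep learning further)

    1. 《深度学习》 (Deep Learning) by Ian Goodfellow [USA], Yoshua Bengio [Canada], Aaron ... [Canada] (★★★ recommended)

    2. 《模式识别与机器学习》 (Pattern Recognition and Machine Learning) by Christopher M. Bishop

    3. 《机器学习:从概率的视角分析》 (Machine Learning: A Probabilistic Perspective) by Kevin P. Murphy

    Note: PRML and MLAPP are both fairly difficult.

  5. Deep learning programming and practice books (tool references, not required reading)

    1. 《Keras深度学习实战》 by Antonio Gulli [Italy]

    2. 《深度学习入门之PyTorch》 by 廖星宇

    3. 《深度学习框架PyTorch快速开发与实战》 by 邢梦来, 王硕, 孙洋洋

    4. 《TensorFlow实战》 by 黄文坚, 唐源

(6) More Online Resources

Edited based on Xiaofan Liu's list

  1. Introductory machine learning courses

  2. Deep learning courses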

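Linked resources for the sections above

1) Recommended Books

  • ⭐ Quick R (online) OR 《R语言实战》 (《R in Action》)
  • ⭐ 《鸟哥的Linux私房菜-基础学习篇》 (recommended chapters)
  • Statistics for biologist, by Nature

2) Recommended on-line Courses

  • 生物信息导论和方法 (Peking University @MOOC)
  • Bioinformatics Specialization (UC San Diego @coursera)
  • Genomics of Human Diseases (DragonStar Course @github)

3) Recommended Tips

  • ✨ One Tip Per Day (e.g. How to tell which library type to use)

4) Computational Biology Primers (DOIs)

  • The anatomy of successful computational biology software. doi: 10.1038/nbt.2721
  • Understanding genome browsing. doi: 10.1038/nbt0209-153
  • How does multiple testing correction work? doi: 10.1038/nbt1209-1135
  • What is Bayesian statistics? doi: 10.1038/nbt0904-1177
  • How to map billions of short reads onto genomes. doi: 10.1038/nbt0509-455
  • Where did the BLOSUM62 alignment score matrix come from? doi: 10.1038/nbt0804-1035
  • What is dynamic programming? doi: 10.1038/nbt0704-909
  • How do RNA folding algorithms work? doi: 10.1038/nbt1104-1457
  • What is a hidden Markov model? doi: 10.1038/nbt1004-1315
  • What is the expectation maximization algorithm? doi: 10.1038/nbt1406
  • What are decision trees? doi: 10.1038/nbt0908-1011
  • What is a support vector machine? doi: 10.1038/nbt1206-1565
  • Inference in Bayesian networks. doi: 10.1038/nbt0106-51
  • What are artificial neural networks? doi: 10.1038/nbt1386
  • How does gene expression clustering work? doi: 10.1038/nbt1205-1499
  • What is principal component analysis? doi: 10.1038/nbt0308-303
  • What are DNA sequence motifs? doi: 10.1038/nbt0406-423
  • How does DNA sequence motif discovery work? doi: 10.1038/nbt0806-959
  • How to apply de Bruijn graphs to genome assembly. doi: 10.1038/nbt.2023
  • How does eukaryotic gene prediction work? doi: 10.1038/nbt0807-883
  • Analyzing 'omics data using hierarchical models. doi: 10.1038/nbt.1619
  • What is flux balance analysis? doi: 10.1038/nbt.1614
  • How to visually interpret biological data using networks. doi: 10.1038/nbt.1567
  • SNP imputation in association studies. doi: 10.1038/nbt0409-349
  • Maximizing power in association studies. doi: 10.1038/nbt0309-255
  • How do shotgun proteomics algorithms identify proteins? doi: 10.1038/nbt0707-755

5) Getting Started in Something (DOIs)

  • Getting Started in Computational Immunology. doi: 10.1371/journal.pcbi.1000128
  • Getting Started in Gene Orthology and Functional Analysis. doi: 10.1371/journal.pcbi.1000703
  • Getting Started in Biological Pathway Construction and Analysis. doi: 10.1371/journal.pcbi.0040016
  • Getting Started in Structural Phylogenomics. doi: 10.1371/journal.pcbi.1000621
  • Getting Started in Text Mining. doi: 10.1371/journal.pcbi.0040020
  • Getting Started in Text Mining: Part Two. doi: 10.1371/journal.pcbi.1000411
  • Getting Started in Probabilistic Graphical Models. doi: 10.1371/journal.pcbi.0030252
  • Getting Started in Computational Mass Spectrometry-Based Proteomics. doi: 10.1371/journal.pcbi.1000366
  • Getting Started in Gene Expression Microarray Analysis. doi: 10.1371/journal.pcbi.1000543
  • Getting Started in Tiling Microarray Analysis. doi: 10.1371/journal.pcbi.0030183

6) Advanced for AI

(1) Recommended Books

  • ⭐ 《Bioinformatics Data Skills》 by Vince Buffalo
  • ✨ 《Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids》 (English | 中文) by Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison
  • ✨ 《机器学习》 by 周志华

(2) Recommended On-line Courses

  • ⭐ StatQuest Video List (StatQuest: Machine Learning @Youtube)
  • ✨ Machine Learning by Andrew Ng 吴恩达 (CS229): @coursera
  • ✨ 李沐: Practical Machine Learning (李沐: 实用机器学习-斯坦福2021秋@Bilibili)
  • ✨ 李沐-动手学深度学习 PyTorch版@Bilibili

(4) Recommended Educational Papers

  • 李沐AI论文解读@Bilibili
  • Statistics for biologist, by Nature

(6) More Online Resources

  • 浙江大学公开课:概率论与数理统计 (review according to your own background)
  • Machine Learning by Andrew Ng 吴恩达 (CS229): @coursera (★★★ recommended)
  • Deep Learning by Andrew Ng 吴恩达 (CS230): @coursera | @bilibili (★★★ recommended)
  • Keras快速搭建神经网络
  • 李宏毅深度学习2017
  • 不用博士学位玩转Tensorflow深度学习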