# Appendix I. Keep Learning

{% hint style="info" %}
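In learning and applying bioinformatics, the most important and most useful basic tools and skills have always been, and I believe will remain for a long time to come:

1. Google
2. Wikipedia
3. Forums (Zhihu, [Seqanswers](http://seqanswers.com/forums/index.php), [Biostars](https://www.biostars.org/), etc.)
{% endhint %}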

> ⭐: **Must-read**\
> ✨: **Recommended**
>
> [**PDFs for textbooks and education papers**](https://cloud.tsinghua.edu.cn/d/ad22768345664924b202/?p=%2FBooks%20and%20Education%20Papers\&mode=list)

## 1) Recommended Books <a href="#self-study" id="self-study"></a>

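### **(1) Reference Books - General**

> **Desk references; read selectively**

* ✨ 《生物信息学》 (*Bioinformatics*), the 101 textbook
* 《生物信息学》 (*Bioinformatics*), edited by 樊龙江 (Fan Longjiang)
* 《生物信息学》 (*Bioinformatics*), edited by 李霞 (Li Xia), 雷健波 (Lei Jianbo), 李亦学 (Li Yixue), et al.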

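### **(2) Reference Books - Practical Tools**

> **Read and practice as needed**
>
> You should learn and practice **3** basic skills. For each, meeting either criterion counts as completion: (1) having written a program of more than 1,000 lines, or (2) holding a recognized certificate, e.g., an official certificate from an online course.
>
> 1. R (or MATLAB)
> 2. Python (or Perl)
> 3. Linux (an editor such as Vim, plus shell scripting such as bash)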

1. ⭐ Quick R ([online](http://www.statmethods.net/)) *OR* 《R语言实战》 (*R in Action*)
2. ⭐ 《笨办法学 Python》 (*Learn Python the Hard Way*) *OR* *Beginning Perl for Bioinformatics*
3. ⭐ 《[鸟哥的Linux私房菜-基础学习篇](https://www.ctolib.com/docs/sfile/vbird-linux-basic-4e)》 (*VBird's Linux Private Kitchen: Basics*; read the recommended chapters)

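> **Recommended Linux chapters:**
>
> * Chapter 5: 5.3.1 man pages; Chapter 6: 6.1 Users and groups; 6.2 Linux file permission concepts; 6.3 Linux directory layout
> * Chapter 7: 7.1 Directories and paths; 7.2 File and directory management; 7.3 Viewing file contents; 7.5 Searching for commands and files; 7.6 How permissions relate to commands; Chapter 8: 8.2 Basic filesystem operations
> * Chapter 9: 9.1 Purposes and techniques of file compression; 9.2 Common compression commands on Linux; 9.3 The archiving command: tar
> * Chapter 10: The vim editor
> * Chapter 11: Understanding and learning bash; Chapter 12: Regular expressions and file formatting; Chapter 13: Learning shell scripts
> * Chapter 25: Linux backup strategies: 25.2.2 Differential backups relative to a full backup; 25.3 VBird's backup strategy; 25.4 Disaster recovery considerations; 25.5 Key points review
>
> **Linux focus areas:**
>
> 1. Editor (e.g. Vim)
> 2. Shell scripting (e.g. bash)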

### **(3) Reference Books - Statistics**

* 《Principles of Biostatistics》 by *Marcello Pagano, Kimberlee Gauvreau*
* [Statistics for Biologists](http://www.nature.com/collections/qghhqm/) by *Nature*

## 2) Recommended on-line Courses

* [生物信息导论和方法](https://www.coursera.org/course/pkubioinfo) (*Bioinformatics: Introduction and Methods*, Peking University @Coursera)
* [Bioinformatics Specialization](https://www.coursera.org/specializations/bioinformatics?utm_medium=courseDescripTop) (UC San Diego @Coursera)
* [Genomics of Human Diseases](https://github.com/wglab/dragonstar2019) (DragonStar Course @github)

## 3) Recommended Tips <a href="#share-script" id="share-script"></a>

* ✨ [One Tip Per Day](http://onetipperday.sterding.com/) (e.g., [How to tell which library type to use](http://onetipperday.sterding.com/2012/07/how-to-tell-which-library-type-to-use.html))

## 4) ✨ \[Education Papers] Computational Biology Primers

> This is a list of explanatory papers that have appeared as primers in the Computational Biology section of the journal *Nature Biotechnology*, in reverse chronological order (last addition November 2013; checked March 2016).
>
> — *Nature Biotechnology*

### (1) Basics

**The anatomy of successful computational biology software**

(Stephen Altschul, Barry Demchak, Richard Durbin, Robert Gentleman, Martin Krzywinski, Heng Li, Anton Nekrutenko, James Robinson, Wayne Rasband, James Taylor & Cole Trapnell)

October 2013, Vol 31, No 10; pp 894 - 897

doi: [10.1038/nbt.2721](http://dx.doi.org/10.1038/nbt.2721) ([google](https://www.google.com/search?as_q=+The+anatomy+of+successful+computational+biology+software))

**Understanding genome browsing**

(Melissa S Cline & W James Kent)

February 2009, Vol 27, No 2; pp 153 - 155

doi: [10.1038/nbt0209-153](http://dx.doi.org/10.1038/nbt0209-153) ([google](http://www.google.com/search?as_q=Understanding+genome+browsing\&as_filetype=pdf))

### (2) Basic Statistics

**How does multiple testing correction work?**

(William S Noble)

December 2009, Vol 27, No 12; pp 1135 - 1137

doi: [10.1038/nbt1209-1135](http://dx.doi.org/10.1038/nbt1209-1135) ([google](http://www.google.com/search?as_q=How+does+multiple+testing+correction+work?\&as_filetype=pdf))
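Among the corrections this primer discusses are the Bonferroni adjustment and false discovery rate (FDR) control. The sketch below (not code from the paper) contrasts Bonferroni with the Benjamini-Hochberg step-up procedure; the function names and the toy p-values are hypothetical, chosen only for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests rejected by Bonferroni: p <= alpha / m (controls the FWER)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up rule (controls the FDR)."""
    m = len(pvalues)
    # Rank the tests by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every test ranked at or below k_max, even if its own p-value
    # missed its individual threshold (this is what "step-up" means).
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejects:", bonferroni(pvals))
print("BH rejects:", benjamini_hochberg(pvals))
```

On these toy p-values Bonferroni rejects only test 0, while Benjamini-Hochberg also rejects test 1, which illustrates why FDR control is the less conservative choice when many tests are run.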

**What is Bayesian statistics?**

(Sean R Eddy)

September 2004, Volume 22, No 9; pp 1177 - 1178

doi: [10.1038/nbt0904-1177](http://dx.doi.org/10.1038/nbt0904-1177) ([google](http://www.google.com/search?as_q=What+is+Bayesian+statistics?+\&as_filetype=pdf))

### (3) Basic Algorithms

**How to map billions of short reads onto genomes**

(Cole Trapnell & Steven L Salzberg)

May 2009, Vol 27, No 5; pp 455 - 457

doi: [10.1038/nbt0509-455](http://dx.doi.org/10.1038/nbt0509-455) ([google](http://www.google.com/search?as_q=How+to+map+billions+of+short+reads+onto+genomes\&as_filetype=pdf))

**Where did the BLOSUM62 alignment score matrix come from?**

(Sean R Eddy)

August 2004, Volume 22, No 8; pp 1035 - 1036

doi: [10.1038/nbt0804-1035](http://dx.doi.org/10.1038/nbt0804-1035) ([google](http://www.google.com/search?as_q=Where+did+the+BLOSUM62+alignment+score+matrix+come+from?+\&as_filetype=pdf))

**What is dynamic programming?**

(Sean R Eddy)

July 2004, Volume 22, No 7; pp 909 - 910

doi: [10.1038/nbt0704-909](http://dx.doi.org/10.1038/nbt0704-909) ([google](http://www.google.com/search?as_q=What+is+dynamic+programming?+\&as_filetype=pdf))
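Eddy's primer explains dynamic programming through pairwise sequence alignment. As a minimal sketch, here is Needleman-Wunsch global alignment with a table fill and traceback; the scoring scheme (+1 match, -1 mismatch, -2 per gap) and the function name `global_align` are assumptions for illustration, not the values used in the paper.

```python
def global_align(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_x, aligned_y)."""
    def sub(a, b):
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # S[i][j] holds the best score for aligning the prefixes x[:i] and y[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap          # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        S[0][j] = j * gap          # y[:j] aligned entirely against gaps
    # Each cell reuses three already-solved subproblems: the core DP idea.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Traceback: walk back from S[n][m] to recover one optimal alignment.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return S[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = global_align("GATTACA", "GCATGCA")
print(score)
print(top)
print(bottom)
```

The table has (n+1)(m+1) cells and each is filled in constant time, so the cost is O(nm) instead of enumerating the exponentially many possible alignments.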

**How do RNA folding algorithms work?**

(Sean R Eddy)

November 2004, Volume 22, No 11; pp 1457 - 1458

doi: [10.1038/nbt1104-1457](http://dx.doi.org/10.1038/nbt1104-1457) ([google](http://www.google.com/search?as_q=How+do+RNA+folding+algorithms+work?+\&as_filetype=pdf))

### (4) Machine Learning

**What is a hidden Markov model?**

(Sean R Eddy)

October 2004, Volume 22, No 10; pp 1315 - 1316

doi: [10.1038/nbt1004-1315](http://dx.doi.org/10.1038/nbt1004-1315) ([google](http://www.google.com/search?as_q=What+is+a+hidden+Markov+model?+\&as_filetype=pdf))
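A central task in the HMM primer is decoding the most probable hidden-state path for an observed sequence, which the Viterbi algorithm solves by dynamic programming. The sketch below is a generic Viterbi in log space; the two-state "AT-rich"/"GC-rich" model and all of its probabilities are hypothetical and are not the model used in the paper.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq; returns (log-probability, path)."""
    # V[t][s]: best log-probability of any path that ends in state s at position t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at position t
    for t in range(1, len(seq)):
        V.append({})
        back.append({})
        for s in states:
            prev, best = max(((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                             key=lambda item: item[1])
            V[t][s] = best + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return V[-1][last], path[::-1]


# Hypothetical two-state model: all numbers are made up for illustration.
STATES = ("AT-rich", "GC-rich")
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {"AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
         "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9}}
EMIT = {"AT-rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC-rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

logp, path = viterbi("ATTAGCGCGCAT", STATES, START, TRANS, EMIT)
print(round(logp, 3), path)
```

Working in log space avoids numerical underflow on long sequences, because products of many small probabilities become sums of logarithms.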

**What is the expectation maximization algorithm?**

(Chuong B Do & Serafim Batzoglou)

August 2008, Volume 26, No 8; pp 897 - 899

doi: [10.1038/nbt1406](http://dx.doi.org/10.1038/nbt1406) ([google](http://www.google.com/search?as_q=What+is+the+expectation+maximization+algorithm?\&as_filetype=pdf))
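Do & Batzoglou motivate EM with a coin-tossing puzzle: sets of tosses come from one of two biased coins, but which coin was used for each set is hidden. The sketch below follows that setup as we recall it (five sets of ten tosses, starting guesses 0.6 and 0.5); treat the toss strings and function names as illustrative rather than a transcription of the paper's figure.

```python
import math

# Five sets of ten tosses; which coin (A or B) produced each set is hidden.
SETS = ["HTTTHHTHTH", "HHHHTHHHHH", "HTHHHHHTHH", "HTHTTTHHTT", "THHHTHHHTH"]


def log_likelihood(sets, theta_a, theta_b):
    """Marginal log-likelihood, assuming each set picks coin A or B with probability 1/2."""
    total = 0.0
    for s in sets:
        h, t = s.count("H"), s.count("T")
        total += math.log(0.5 * theta_a ** h * (1 - theta_a) ** t
                          + 0.5 * theta_b ** h * (1 - theta_b) ** t)
    return total


def em_two_coins(sets, theta_a=0.6, theta_b=0.5, n_iter=10):
    """EM estimates of the heads probabilities of coins A and B; returns the history."""
    history = [(theta_a, theta_b)]
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for s in sets:
            h, t = s.count("H"), s.count("T")
            # E-step: posterior probability that coin A produced this set
            # (the equal 1/2 priors cancel in the ratio).
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            w_a = like_a / (like_a + like_b)
            # Split the observed counts between the coins by that posterior.
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += (1 - w_a) * h
            tails_b += (1 - w_a) * t
        # M-step: maximum-likelihood estimates from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
        history.append((theta_a, theta_b))
    return history


history = em_two_coins(SETS)
for step, (ta, tb) in enumerate(history):
    print(step, round(ta, 3), round(tb, 3), round(log_likelihood(SETS, ta, tb), 4))
```

After one iteration the estimates move to roughly 0.71 and 0.58, and the printed log-likelihood never decreases from one step to the next, which is the monotonicity property that makes EM attractive.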

**What are decision trees?**

(Carl Kingsford & Steven L Salzberg)

September 2008, Volume 26, No 9; pp 1011 - 1013

doi: [10.1038/nbt0908-1011](http://dx.doi.org/10.1038/nbt0908-1011) ([google](http://www.google.com/search?as_q=What+are+decision+trees?\&as_filetype=pdf))

**What is a support vector machine?**

(William S Noble)

December 2006, Volume 24, No 12; pp 1565 - 1567

doi: [10.1038/nbt1206-1565](http://dx.doi.org/10.1038/nbt1206-1565) ([google](http://www.google.com/search?as_q=What+is+a+support+vector+machine?\&as_filetype=pdf))

**Inference in Bayesian networks**

(Chris J Needham, James R Bradford, Andrew J Bulpitt & David R Westhead)

January 2006, Volume 24, No 1; pp 51 - 53

doi: [10.1038/nbt0106-51](http://dx.doi.org/10.1038/nbt0106-51) ([google](http://www.google.com/search?as_q=Inference+in+Bayesian+networks\&as_filetype=pdf))

**What are artificial neural networks?**

(Anders Krogh)

February 2008, Volume 26, No 2; pp 195 - 197

doi: [10.1038/nbt1386](http://dx.doi.org/10.1038/nbt1386) ([google](http://www.google.com/search?as_q=What+are+artificial+neural+networks?\&as_filetype=pdf))

**How does gene expression clustering work?**

(Patrik D'haeseleer)

December 2005, Volume 23, No 12; pp 1499 - 1501

doi: [10.1038/nbt1205-1499](http://dx.doi.org/10.1038/nbt1205-1499) ([google](http://www.google.com/search?as_q=How+does+gene+expression+clustering+work?+\&as_filetype=pdf))

**What is principal component analysis?**

(Markus Ringnér)

March 2008, Volume 26, No 3; pp 303 - 304

doi: [10.1038/nbt0308-303](http://dx.doi.org/10.1038/nbt0308-303) ([google](http://www.google.com/search?as_q=What+is+principal+component+analysis?+\&as_filetype=pdf))

### (5) Others

**What are DNA sequence motifs?**

(Patrik D'haeseleer)

April 2006, Volume 24, No 4; pp 423 - 425

doi: [10.1038/nbt0406-423](http://dx.doi.org/10.1038/nbt0406-423) ([google](http://www.google.com/search?as_q=What+are+DNA+sequence+motifs?\&as_filetype=pdf))

**How does DNA sequence motif discovery work?**

(Patrik D'haeseleer)

August 2006, Volume 24, No 8; pp 959 - 961

doi: [10.1038/nbt0806-959](http://dx.doi.org/10.1038/nbt0806-959) ([google](http://www.google.com/search?as_q=How+does+DNA+sequence+motif+discovery+work?\&as_filetype=pdf))

**How to apply de Bruijn graphs to genome assembly**

(Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler)

November 2011, Vol 29, No 11; pp 987 - 991

doi: [10.1038/nbt.2023](http://dx.doi.org/10.1038/nbt.2023) ([google](http://www.google.com/search?as_q=How+to+apply+de+Bruijn+graphs+to+genome+assembly\&as_filetype=pdf))
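Compeau, Pevzner & Tesler frame assembly as finding an Eulerian path through a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers. The sketch below works under strong assumptions (error-free reads and no repeats of length k-1 or longer, so the path is unique); the toy reads and the function names are hypothetical.

```python
from collections import defaultdict


def kmers(reads, k):
    """Distinct k-mers across all reads (assumes error-free reads, so duplicates are collapsed)."""
    return sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def de_bruijn(kmer_list):
    """Nodes are (k-1)-mers; each k-mer becomes an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for km in kmer_list:
        graph[km[:-1]].append(km[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one (the path's source), if such a node exists.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: this node is finished
    return path[::-1]


def spell(path):
    """Glue consecutive (k-1)-mers that overlap by k-2 characters into one string."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "GCGTGCA"]          # toy overlapping reads
graph = de_bruijn(kmers(reads, 4))
print(spell(eulerian_path(graph)))
```

Real assemblers must also handle sequencing errors, repeats, and uneven coverage, which this sketch deliberately ignores.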

**How does eukaryotic gene prediction work?**

(Michael R Brent)

August 2007, Volume 25, No 8; pp 883 - 885

doi: [10.1038/nbt0807-883](http://dx.doi.org/10.1038/nbt0807-883) ([google](http://www.google.com/search?as_q=How+does+eukaryotic+gene+prediction+work?+\&as_filetype=pdf))

**Analyzing 'omics data using hierarchical models**

(Hongkai Ji & X Shirley Liu)

April 2010, Vol 28, No 4; pp 337 - 340

doi: [10.1038/nbt.1619](http://dx.doi.org/10.1038/nbt.1619) ([google](https://www.google.com/search?as_q=Analyzing+%27omics+data+using+hierarchical+models\&as_filetype=pdf))

**What is flux balance analysis?**

(Jeffrey D Orth, Ines Thiele & Bernhard Ø Palsson)

March 2010, Vol 28, No 3; pp 245 - 248

doi: [10.1038/nbt.1614](http://dx.doi.org/10.1038/nbt.1614) ([google](http://www.google.com/search?as_q=What+is+flux+balance+analysis?\&as_filetype=pdf))

**How to visually interpret biological data using networks**

(Daniele Merico, David Gfeller & Gary D Bader)

October 2009, Vol 27, No 10; pp 921 - 924

doi: [10.1038/nbt.1567](http://dx.doi.org/10.1038/nbt.1567) ([google](http://www.google.com/search?as_q=How+to+visually+interpret+biological+data+using+networks\&as_filetype=pdf))

**SNP imputation in association studies**

(Eran Halperin & Dietrich A Stephan)

April 2009, Vol 27, No 4; pp 349 - 351

doi: [10.1038/nbt0409-349](http://dx.doi.org/10.1038/nbt0409-349) ([google](http://www.google.com/search?as_q=SNP+imputation+in+association+studies\&as_filetype=pdf))

**Maximizing power in association studies**

(Eran Halperin & Dietrich A Stephan)

March 2009, Vol 27, No 3; pp 255 - 256

doi: [10.1038/nbt0309-255](http://dx.doi.org/10.1038/nbt0309-255) ([google](http://www.google.com/search?as_q=Maximizing+power+in+association+studies\&as_filetype=pdf))

**How do shotgun proteomics algorithms identify proteins?**

(Edward M Marcotte)

July 2007, Volume 25, No 7; pp 755 - 757

doi: [10.1038/nbt0707-755](http://dx.doi.org/10.1038/nbt0707-755) ([google](http://www.google.com/search?as_q=How+do+shotgun+proteomics+algorithms+identify+proteins?\&as_filetype=pdf))

## 5) ✨ \[Education Papers] Getting Started in Something

> Several captions (series titles) have been used to mark educationally relevant papers in *PLoS Computational Biology*. Here we have collected some of the other papers. — *PLoS Computational Biology*

**Getting Started in Computational Immunology.**

(Kleinstein SH)

PLoS Comput Biol (2008) 4(8): e1000128;

doi: [10.1371/journal.pcbi.1000128](http://dx.doi.org/10.1371/journal.pcbi.1000128) ([google](http://www.google.com/search?as_q=Getting+Started+in+Computational+Immunology.+\&as_filetype=pdf))

### (1) Basics

**Getting Started in Gene Orthology and Functional Analysis**

(Fang G, Bhardwaj N, Robilotto R, Gerstein MB)

PLoS Comput Biol (2010) 6(3): e1000703;

doi: [10.1371/journal.pcbi.1000703](http://dx.doi.org/10.1371/journal.pcbi.1000703) ([google](http://www.google.com/search?as_q=Getting+Started+in+Gene+Orthology+and+Functional+Analysis\&as_filetype=pdf))

**Getting Started in Biological Pathway Construction and Analysis.**

(Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC)

PLoS Comput Biol (2008) 4(2): e16;

doi: [10.1371/journal.pcbi.0040016](http://dx.doi.org/10.1371/journal.pcbi.0040016) ([google](http://www.google.com/search?as_q=Getting+Started+in+Biological+Pathway+Construction+and+Analysis.+\&as_filetype=pdf))

**Getting Started in Structural Phylogenomics**

(Sjölander K)

PLoS Comput Biol (2010) 6(1): e1000621;

doi: [10.1371/journal.pcbi.1000621](http://dx.doi.org/10.1371/journal.pcbi.1000621) ([google](http://www.google.com/search?as_q=Getting+Started+in+Structural+Phylogenomics\&as_filetype=pdf))

### (2) Advanced

**Getting Started in Text Mining**

(Cohen KB, Hunter L)

PLoS Comput Biol (2008) 4(1): e20;

doi: [10.1371/journal.pcbi.0040020](http://dx.doi.org/10.1371/journal.pcbi.0040020) ([google](http://www.google.com/search?as_q=Getting+Started+in+Text+Mining\&as_filetype=pdf))

**Getting Started in Text Mining: Part Two.**

(Rzhetsky A, Seringhaus M, Gerstein MB)

PLoS Comput Biol (2009) 5(7): e1000411;

doi: [10.1371/journal.pcbi.1000411](http://dx.doi.org/10.1371/journal.pcbi.1000411) ([google](http://www.google.com/search?as_q=Getting+Started+in+Text+Mining:+Part+Two.+\&as_filetype=pdf))

**Getting Started in Probabilistic Graphical Models.**

(Airoldi EM)

PLoS Comput Biol (2007) 3(12): e252;

doi: [10.1371/journal.pcbi.0030252](http://dx.doi.org/10.1371/journal.pcbi.0030252) ([google](http://www.google.com/search?as_q=Getting+Started+in+Probabilistic+Graphical+Models.+\&as_filetype=pdf))

### (3) MS and Array

**Getting Started in Computational Mass Spectrometry-Based Proteomics.**

(Vitek O)

PLoS Comput Biol (2009) 5(5): e1000366;

doi: [10.1371/journal.pcbi.1000366](http://dx.doi.org/10.1371/journal.pcbi.1000366) ([google](http://www.google.com/search?as_q=Getting+Started+in+Computational+Mass+Spectrometry-Based+Proteomics.+\&as_filetype=pdf))

**Getting Started in Gene Expression Microarray Analysis**

(Slonim DK, Yanai I)

PLoS Comput Biol (2009) 5(10): e1000543;

doi: [10.1371/journal.pcbi.1000543](http://dx.doi.org/10.1371/journal.pcbi.1000543) ([google](http://www.google.com/search?as_q=Getting+Started+in+Gene+Expression+Microarray+Analysis\&as_filetype=pdf))

**Getting Started in Tiling Microarray Analysis**

(Liu XS)

PLoS Comput Biol (2007) 3(10): e183;

doi: [10.1371/journal.pcbi.0030183](http://dx.doi.org/10.1371/journal.pcbi.0030183) ([google](http://www.google.com/search?as_q=Getting+Started+in+Tiling+Microarray+Analysis\&as_filetype=pdf))

## 6) Advanced Topics: AI

> ⭐: **Must-read**\
> ✨: **Recommended**

### (1) Recommended Books

* ⭐ 《[Bioinformatics Data Skills](http://a.co/1wYbUB5)》 by *Vince Buffalo*
* ✨ 《[Biological Sequence Analysis](http://www.amazon.com/Biological-Sequence-Analysis-Probabilistic-Proteins/dp/0521629713/): Probabilistic Models of Proteins and Nucleic Acids》 ([English](http://www.amazon.com/Biological-Sequence-Analysis-Probabilistic-Proteins/dp/0521629713) | [Chinese](http://www.amazon.cn/dp/B003ZUIRZ2)) by *Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison*
* ✨ [《机器学习》](https://book.douban.com/subject/26708119/) (*Machine Learning*) by *周志华 (Zhou Zhihua)*

### (2) Recommended On-line Courses

* ⭐ [StatQuest Video List](https://statquest.org/video-index/) ([StatQuest: Machine Learning @YouTube](https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF))
* ✨ Machine Learning *by Andrew Ng 吴恩达* (CS229): @[coursera](https://www.coursera.org/learn/machine-learning)
* ✨ [Mu Li (李沐): Practical Machine Learning](https://c.d2l.ai/stanford-cs329p/) ([Mu Li: Practical Machine Learning, Stanford Fall 2021 @Bilibili](https://space.bilibili.com/1567748478/channel/seriesdetail?sid=358496))
* ✨ [Mu Li (李沐): Dive into Deep Learning (动手学深度学习), PyTorch edition @Bilibili](https://space.bilibili.com/1567748478/channel/seriesdetail?sid=358497)
* [Mu Li (李沐): AI paper walkthroughs @Bilibili](https://space.bilibili.com/1567748478/channel/collectiondetail?sid=32744)

### (3) Recommended Educational Papers

* [Statistics for Biologists](http://www.nature.com/collections/qghhqm/) by *Nature*

### **(4) More Books**

> Adapted from Xiaofan Liu's list

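1. **Math foundations** (review according to your own background)
   1. 《高等数学》 (*Advanced Mathematics*)
   2. 《线性代数》 (*Linear Algebra*)
   3. 《数理统计与概率论》 (*Mathematical Statistics and Probability Theory*)
2. **Introductory books** (pick one of 1 and 2 to study closely; 2 is recommended if your math background is strong)
   1. 《机器学习》 (*Machine Learning*), by 周志华 (Zhou Zhihua) (★★★ recommended)
   2. 《统计学习方法》 (*Statistical Learning Methods*), by 李航 (Li Hang) (★★★ recommended)
   3. 《多元统计分析》 (*Multivariate Statistical Analysis*), by 何晓群 (He Xiaoqun)
3. **Python programming books**
   1. 《Python机器学习基础教程》 (Chinese edition of *Introduction to Machine Learning with Python*), by Andreas C. Müller (Germany) and Sarah Guido (USA), translated by 张亮 (hysic) (★★★ recommended)
   2. 《Python高性能编程》 (Chinese edition of *High Performance Python*), by Micha Gorelick, Ian Ozsvald ...
4. **Deep learning books** (optional, for readers who want a firmer grasp of the mathematics behind the models and want to go further into deep learning)

   1. 《深度学习》 (*Deep Learning*), by Ian Goodfellow (USA), Yoshua Bengio (Canada), Aaron ... (Canada) (★★★ recommended)
   2. 《模式识别与机器学习》 (*Pattern Recognition and Machine Learning*, PRML), by Christopher M. Bishop
   3. 《机器学习：从概率的视角分析》 (*Machine Learning: A Probabilistic Perspective*, MLAPP), by Kevin P. Murphy

   *Note: **PRML** and **MLAPP** are both fairly demanding.*
5. **Deep learning programming and practice books** (hands-on reference books; not required)
   1. 《Keras深度学习实战》, by 安东尼奥·古利 (Antonio Gulli, Italy)
   2. 《深度学习入门之PyTorch》 (*Getting Started with Deep Learning in PyTorch*), by 廖星宇 (Liao Xingyu)
   3. 《深度学习框架PyTorch快速开发与实战》 (*Rapid Development and Practice with the PyTorch Deep Learning Framework*), by 邢梦来 (Xing Menglai), 王硕 (Wang Shuo), 孙洋洋 (Sun Yangyang)
   4. 《TensorFlow实战》 (*TensorFlow in Practice*), by 黄文坚 (Huang Wenjian), 唐源 (Tang Yuan)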

### **(5) More Online Resources**

> Adapted from Xiaofan Liu's list

1. **Introductory machine learning courses**
   1. [Zhejiang University open course: Probability Theory and Mathematical Statistics (概率论与数理统计)](http://open.163.com/movie/2019/4/R/6/MEC1U20OT_MEC1U8MR6.html) (review according to your background)
   2. **Machine Learning** by *Andrew Ng 吴恩达* (CS229): @[coursera](https://www.coursera.org/learn/machine-learning) (★★★ recommended)
2. **Deep learning courses**
   1. **Deep Learning** by *Andrew Ng 吴恩达* (CS230): @[coursera](https://www.coursera.org/specializations/deep-learning) | @[bilibili](https://www.bilibili.com/video/av47055599/) (★★★ recommended)
   2. [Building neural networks quickly with Keras (Keras快速搭建神经网络)](http://t.cn/RTuDLKD) (★★★ recommended)
   3. [Hung-yi Lee (李宏毅): Deep Learning 2017](http://t.cn/RpO3VJK)
   4. [TensorFlow and deep learning without a PhD (不用博士学位玩转Tensorflow深度学习)](http://t.cn/RTuemTK)
