Bioinformatics Tutorial

1.Machine Learning Basics


Last updated 3 years ago


Brief Introduction

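Depending on whether the target (the variable to be predicted) is known for the training samples, machine learning problems usually fall into two categories: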

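  • Supervised Learning: the model has explicit inputs (independent variables, or features) and outputs (dependent variables, or response variables). If the target variable (the variable to be predicted) is categorical (e.g., positive/negative), the problem is called a classification problem; if the target variable is continuous (e.g., height), it is a regression problem.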

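  • Unsupervised Learning: no target variable is specified. The model aims to discover relationships among the samples. Common unsupervised learning tasks include dimensionality reduction and clustering.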
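To make the distinction concrete, here is a minimal sketch in plain Python (standard library only) on a hypothetical six-sample toy dataset. The same feature is used three ways: for classification (a known categorical target), for regression (a known continuous target), and for clustering (no target at all). The data, the function names (`nearest_centroid`, `fit_line`, `two_means`) and the choice of these three simple algorithms are illustrative assumptions. They are not the methods used in the later sections.

```python
"""Supervised (classification, regression) vs. unsupervised (clustering) on toy data."""
from statistics import mean

# Hypothetical single feature (e.g. a normalized expression value) for 6 samples.
x = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]

# Supervised, classification: a categorical target is known for the training samples.
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]


def nearest_centroid(x_train, y_train, x_new):
    """Predict the class whose mean feature value is closest to x_new."""
    centroids = {c: mean(v for v, y in zip(x_train, y_train) if y == c)
                 for c in sorted(set(y_train))}
    return min(centroids, key=lambda c: abs(x_new - centroids[c]))


# Supervised, regression: a continuous target (hypothetical heights in cm) is known.
height = [150.0, 152.0, 148.0, 179.0, 181.0, 180.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
         / sum((xi - mx) ** 2 for xi in xs))
    return my - b * mx, b


# Unsupervised, clustering: no target at all; samples are grouped by similarity.
def two_means(xs, n_iter=10):
    """1-D k-means with k = 2, started from the min and max values.

    Assumes xs contains at least two distinct values, so neither cluster empties.
    """
    centers = [min(xs), max(xs)]
    for _ in range(n_iter):
        assign = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in xs]
        centers = [mean(v for v, a in zip(xs, assign) if a == k) for k in (0, 1)]
    return assign


print(nearest_centroid(x, labels, 3.5))   # pos
a, b = fit_line(x, height)
print(round(a + b * 2.0, 1))              # 160.0
print(two_means(x))                       # [0, 0, 0, 1, 1, 1]
```

Only the supervised functions receive `labels` or `height`. `two_means` sees the feature values alone, and its cluster numbers (0/1) carry no meaning beyond "same group".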

Here we focus mainly on supervised learning. The following five subsections show how to solve a classification problem in practice:

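  • 1.1 Data pre-processing

  • 1.2 Dimension reduction and data visualization

  • 1.3 Feature extraction and selection

  • 1.4 Model training

  • 1.5 Model evaluation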
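The five-step workflow can be sketched end to end on a toy two-class problem. This is a minimal, standard-library-only illustration built on several assumptions:

- The data matrix is hypothetical.
- Pre-processing is z-score standardization.
- Feature selection is a simple filter that ranks features by the absolute difference of class means.
- The model is a nearest-centroid classifier.
- Evaluation reports accuracy, sensitivity and specificity on a held-out test set.

Step 2 (dimension reduction and visualization) is omitted because it needs a plotting library. All function names are hypothetical. The subsections themselves rely on existing tools rather than hand-written code like this.

```python
"""Toy end-to-end classification workflow (hypothetical data, stdlib only)."""
from statistics import mean, pstdev

# Hypothetical data: rows = samples, columns = 3 features; labels 0 = negative, 1 = positive.
X_train = [[5.0, 2.1, 0.3], [5.5, 2.0, 0.9], [4.8, 2.3, 0.5],
           [8.9, 0.9, 0.4], [9.2, 1.1, 0.8], [8.7, 1.0, 0.6]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[5.1, 2.2, 0.7], [9.0, 1.0, 0.5], [8.5, 1.2, 0.2], [5.3, 1.9, 0.6]]
y_test = [0, 1, 1, 0]


def standardize(train, test):
    """Step 1 (pre-processing): z-score each feature with training-set mean/SD only."""
    cols = list(zip(*train))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # use 1.0 for constant features to avoid /0

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

    return scale(train), scale(test)


def select_features(X, y, k):
    """Step 3 (feature selection): rank features by |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [r[j] for r, t in zip(X, y) if t == 1]
        neg = [r[j] for r, t in zip(X, y) if t == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def train_centroids(X, y):
    """Step 4 (training): nearest-centroid model = one mean vector per class."""
    return {c: [mean(col) for col in zip(*[r for r, t in zip(X, y) if t == c])]
            for c in sorted(set(y))}


def predict(centroids, X):
    """Assign each sample to the class with the closest centroid (squared Euclidean)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist(r, centroids[c])) for r in X]


def evaluate(y_true, y_pred):
    """Step 5 (evaluation): accuracy, sensitivity (class-1 recall), specificity."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = len(y_true) - n_pos
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / n_pos,
            "specificity": tn / n_neg}


Xtr, Xte = standardize(X_train, X_test)
keep = select_features(Xtr, y_train, k=2)  # selected on training data only


def subset(rows):
    """Keep only the selected feature columns."""
    return [[r[j] for j in keep] for r in rows]


model = train_centroids(subset(Xtr), y_train)
y_pred = predict(model, subset(Xte))
print(keep)                        # [0, 1]
print(y_pred)                      # [0, 1, 1, 0]
print(evaluate(y_test, y_pred))    # {'accuracy': 1.0, 'sensitivity': 1.0, 'specificity': 1.0}
```

Both the standardization statistics and the selected features come from the training samples only and are then applied unchanged to the test samples. Fitting them on all samples would leak test information into the model and inflate the evaluation.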

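Keep in mind that machine learning is a large field. What we cover here is only how to apply traditional classification models, already implemented in existing tools, to classification problems that may arise in bioinformatics analyses. For a deeper understanding of machine learning theory, consult the corresponding textbooks and documentation. The same applies to other machine learning problems, such as unsupervised pattern mining and clustering algorithms, reinforcement learning, applications of neural networks/deep learning, and probabilistic graphical models.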

More Reading

Recommended

  • @ coursera

  • 《机器学习》(Machine Learning) - Zhou Zhihua (周志华)

Others

  • by Lu Lab (Machine Learning, Feature Selection, Deep Learning)

  • An Introduction to Machine Learning with R
  • 机器学习算法Python实现 (Python implementations of machine learning algorithms)
  • 《机器学习(推导版)》(Machine Learning, derivations edition)
  • Advanced Tutorial
    • 1.1 Data Pre-processing
    • 1.2 Dimension Reduction and Visualization
    • 1.3 Feature Extraction and Selection
    • 1.4 Model Training
    • 1.5 Model Evaluation
  • Machine Learning by Andrew Ng
  • Education Papers - Machine Learning