.fasta or .fa)
The word following the > symbol is the identifier of the sequence, and the rest of the line is the description (optional). Normally, identifiers are simply protein accessions, names, or Entrez GIs (e.g., Q5I7T1, AG10B_HUMAN, 129295), but a bar-separated NCBI sequence identifier (e.g., gi|129295) will also be accepted. Any arbitrary user-specified sequence identifier can also be used (e.g., CLONE00073452).
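The header rule described above can be sketched in shell as follows. This is a minimal illustration, not part of any BLAST tool: the function name parse_fasta_header is hypothetical, and it assumes the identifier and description are separated by whitespace.

```shell
# Split one FASTA header line into identifier and optional description.
# parse_fasta_header is a hypothetical helper name used only in this sketch.
parse_fasta_header() {
    line=${1#>}                             # drop the leading ">"
    id=${line%%[[:space:]]*}                # first word is the identifier
    desc=${line#"$id"}                      # everything after the identifier
    desc=${desc#"${desc%%[![:space:]]*}"}   # trim leading whitespace
    printf '%s\t%s\n' "$id" "$desc"         # tab-separated: id, description
}

# Example call (the description text here is made up for illustration):
parse_fasta_header ">gi|129295 example description"

# To list every header of a file, something like this could be used:
#   grep '^>' VIM.fasta | while IFS= read -r h; do parse_fasta_header "$h"; done
```

The identifier is printed first and the description second, separated by a tab; a header with no description yields an empty second field.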
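Carry out the following steps under /home/test/blast/.
blastp performs protein sequence alignment. VIM.fasta and NMD.fasta contain the sequences of two subtypes of enzyme from the metallo-beta-lactamase family.
blastn performs DNA sequence alignment. H1N1-HA.fasta and H7N9-HA.fasta are influenza virus sequence files.
-dbtype: type of database to build (nucl or prot)
-in: sequence file to build the database from
-out: prefix for the database name
Use commands such as more or less, or a text editor such as vi, to view the result file.
Note: BLAST is already installed in the Docker image.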
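As a rough sketch of how the tools and options above fit together, the commands below build a database with makeblastdb and then search it with blastp or blastn. Several details are assumptions made for illustration: which file serves as the database and which as the query, the database prefixes, and the output file names blastp_result.txt and blastn_result.txt. The run helper is also hypothetical; it only prints each command unless RUN=1 is set, so the commands can be reviewed before anything is executed.

```shell
# Dry-run helper (hypothetical): print the command; execute it only if RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Assumed to be run from /home/test/blast/, where the FASTA files live.

# Protein example: build a protein database from VIM.fasta, then align
# NMD.fasta against it (the choice of query vs. database is an assumption).
run makeblastdb -in VIM.fasta -dbtype prot -out VIM
run blastp -query NMD.fasta -db VIM -out blastp_result.txt

# DNA example: build a nucleotide database from H1N1-HA.fasta, then align
# H7N9-HA.fasta against it.
run makeblastdb -in H1N1-HA.fasta -dbtype nucl -out H1N1-HA
run blastn -query H7N9-HA.fasta -db H1N1-HA -out blastn_result.txt
```

With RUN=1 exported, the same script executes the commands; the result files can then be opened with less or vi.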
sudo apt-get install ncbi-blast+
Here, blast is provided by the ncbi-blast+ package.
ncbi-blast-2.2.28+-ia32-linux.tar.gz
ncbi-blast-2.2.28+-x64-linux.tar.gz
Run uname -a to check whether the machine is 64-bit or 32-bit.
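The choice between the two archives can be sketched as a small shell function. This is only an illustration: pick_blast_tarball is a hypothetical name, the sketch uses uname -m (which prints just the machine hardware name) rather than the full uname -a output, and the list of machine strings it recognizes is an assumption that may not cover every system.

```shell
# Map a machine type, as printed by `uname -m`, to one of the two archives.
# pick_blast_tarball is a hypothetical helper name used only in this sketch.
pick_blast_tarball() {
    case "$1" in
        x86_64|amd64)
            echo "ncbi-blast-2.2.28+-x64-linux.tar.gz" ;;
        i386|i486|i586|i686)
            echo "ncbi-blast-2.2.28+-ia32-linux.tar.gz" ;;
        *)
            echo "unsupported machine type: $1" >&2
            return 1 ;;
    esac
}

# Print the archive name matching the current machine, if it is recognized.
pick_blast_tarball "$(uname -m)" || true
```

On a 64-bit x86 Linux machine, uname -m typically prints x86_64, which selects the x64 archive.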
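Note: the BLAST website offers several mouse databases, and you may pick any one of them for the alignment. You can also repeat the search several times, choosing a different database each time, to see how the output differs; in your assignment, compare the results and discuss why they differ.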
PAM matrices are also used as a scoring matrix when comparing DNA sequences or protein sequences to judge the quality of the alignment. This form of scoring system is utilized by a wide range of alignment software including BLAST. (Wikipedia)
Figure 83. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979