How do I build a local BLAST database?

Posted by a site user, 2022-04-10 03:34

2 answers

懂视网, 2022-04-10 07:56

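From now on I plan to do all the BLAST operations I need at work with BLAST+.

As with the legacy BLAST, we start by formatting a database and then move on to alignment.

Usually we have a FASTA file that we format into a database. The old command for this was formatdb; in BLAST+ it is makeblastdb.

The typical form is: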

makeblastdb -in input_file -dbtype molecule_type -title database_title -parse_seqids -out database_name -logfile File_Name

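-in the input file: the FASTA sequences you want to format

-dbtype the sequence type: nucl for nucleotide, prot for protein

-title a title for the database, for display only (it cannot be used as the -db argument in later searches)

-parse_seqids recommended. I have not fully worked out why yet; what it does is parse the sequence IDs in the FASTA headers, so that sequences can later be looked up by ID

-out the database name. Pick something meaningful, because this is the value you pass to -db in later BLAST+ searches

-logfile the log file; if omitted, the log is written to the screen

This is quite different from the old formatdb.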
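
To make the options above concrete, here is a minimal sketch that writes a tiny FASTA file, formats it with makeblastdb, and then searches it with blastn. The file names, the database name demo_db and the helper function run_or_show are made up for this example; the helper only prints the commands when BLAST+ is not installed.

```shell
#!/bin/sh
set -eu

# Run a command if its program is installed; otherwise just print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical working directory and names, used only for this example.
workdir=$(mktemp -d)
fasta="$workdir/demo.fa"
db="$workdir/demo_db"

# A two-sequence nucleotide FASTA file.
cat > "$fasta" <<'EOF'
>seq1 first demo sequence
ACGTACGTACGTACGTACGTACGTACGTACGT
>seq2 second demo sequence
TTGACCGGTAACGTTAGGCCATTGACCGGTAA
EOF

# Format the database with the options described above.
run_or_show makeblastdb -in "$fasta" -dbtype nucl -title "Demo nucleotide DB" \
    -parse_seqids -out "$db" -logfile "$workdir/makeblastdb.log"

# Search it: -db takes the name given to -out, not the -title.
run_or_show blastn -query "$fasta" -db "$db" -outfmt 6 -out "$workdir/hits.tsv"
```

The search step passes "$db", the value given to -out, which is the point made in the option notes: the -title string is never used to locate the database.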

Running makeblastdb with the -help argument prints the following:

makeblastdb -help
USAGE
makeblastdb [-h] [-help] [-in input_file] [-dbtype molecule_type]
    [-title database_title] [-parse_seqids] [-hash_index]
    [-mask_data mask_data_files] [-out database_name]
    [-max_file_sz number_of_bytes] [-taxid TaxID] [-taxid_map TaxIDMapFile]
    [-logfile File_Name] [-version]

DESCRIPTION
Application to create BLAST databases, version 2.2.23+

OPTIONAL ARGUMENTS
-h
   Print USAGE and DESCRIPTION; ignore other arguments
-help
   Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
-version
   Print version number; ignore other arguments

*** Input options
-in <File_In>
   Input file/database name; the data type is automatically detected, it may
   be any of the following:
        FASTA file(s) and/or
        BLAST database(s)
   Default = `-'
-dbtype <String, `nucl', `prot'>
   Molecule type of input
   Default = `prot'

*** Configuration options
-title <String>
   Title for BLAST database
   Default = input file name provided to -in argument
-parse_seqids
   Parse Seq-ids in FASTA input
-hash_index
   Create index of sequence hash values.

*** Sequence masking options
-mask_data <String>
   Comma-separated list of input files containing masking data as produced by
   NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)

*** Output options
-out <String>
   Name of BLAST database to be created
   Default = input file name provided to -in argument
   Required if multiple file(s)/database(s) are provided as input
-max_file_sz <String>
   Maximum file size for BLAST database files
   Default = `1GB'

*** Taxonomy options
-taxid <Integer, >=0>
   Taxonomy ID to assign to all sequences
    * Incompatible with: taxid_map
-taxid_map <File_In>
   Text file mapping sequence IDs to taxonomy IDs.
   Format:<SequenceId> <TaxonomyId><newline>
    * Incompatible with: taxid
-logfile <File_Out>
   File to which the program log should be redirected
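
The -taxid_map format described in the help text can be illustrated with a small sketch. The sequence IDs seqA and seqB, the file names and the database name are hypothetical; 9606 and 10090 are the NCBI taxonomy IDs for human and mouse. The makeblastdb call is only printed here, not executed, and it includes -parse_seqids on the assumption that the IDs in the map must match parsed sequence IDs.

```shell
#!/bin/sh
set -eu

# Hypothetical working directory and file names for this example.
workdir=$(mktemp -d)
fasta="$workdir/demo_prot.fa"
map="$workdir/taxid_map.txt"

# Two short protein records with simple local IDs.
cat > "$fasta" <<'EOF'
>seqA
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>seqB
MKVLAAGIVGLLLAQPSFAADSGKPLEERLGLI
EOF

# One "<SequenceId> <TaxonomyId>" pair per line, as the help text specifies.
printf '%s %s\n' seqA 9606 seqB 10090 > "$map"
cat "$map"

# Dry run: show the command that would build the database with this map.
echo "makeblastdb -in $fasta -dbtype prot -parse_seqids -taxid_map $map -out $workdir/demo_prot"
```

When every sequence belongs to the same organism, -taxid with a single ID is the simpler choice; the help text marks the two options as incompatible with each other.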

Detailed guide to makeblastdb parameters in BLAST+


Answer from a site user, 2022-04-10 05:04

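Suppose you have a set of sequences (sequence.fa, multiple sequences, FASTA format) and want to turn it into your own BLAST database. With the legacy formatdb program, the typical commands are as follows (-o takes either T or F and controls whether sequence IDs are parsed and indexed).

Nucleotide sequences:
$ ./formatdb -i sequence.fa -p F -o T/F

Protein sequences:
$ ./formatdb -i sequence.fa -p T -o T/F

Running BLAST: once you have downloaded and unpacked the standalone BLAST package and have a suitable database (db), you can start running BLAST analyses. The standalone package bundles the basic BLAST analyses, including blastn, blastp and blastx, into a single program, blastall. A typical blastn command (query sequences in seq.fa, database nt_db) looks like this:

$ ./blastall -p blastn -i seq.fa -d nt_db -W 7 -e 10 -o seq.blastn.out

(This runs a blastn search of the nucleotide sequences in seq.fa against the nt_db database, with a word size of 7 and an E-value threshold of 10, and saves the results to seq.blastn.out.)

Common blastall parameters:
-p program name; must be one of blastn, blastp, blastx, tblastn, tblastx
-d database name, default nr
-i query sequence file, default stdin
-e E-value threshold, default 10
-o output file, default stdout
-F filter option, default T
-a number of CPUs to use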
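
For readers on BLAST+, the legacy commands in this answer map roughly onto makeblastdb and blastn. The sketch below only prints the BLAST+ counterparts, because sequence.fa, seq.fa and nt_db are placeholder names from the answer. The database name prot_db and the thread count 4 are assumptions added for illustration, and -task blastn is included because BLAST+ blastn defaults to megablast rather than the traditional blastn algorithm.

```shell
#!/bin/sh
set -eu

# formatdb -i sequence.fa -p F -o T  ->  nucleotide database
format_nucl='makeblastdb -in sequence.fa -dbtype nucl -parse_seqids -out nt_db'

# formatdb -i sequence.fa -p T -o T  ->  protein database (prot_db is a made-up name)
format_prot='makeblastdb -in sequence.fa -dbtype prot -parse_seqids -out prot_db'

# blastall -p blastn -i seq.fa -d nt_db -W 7 -e 10 -o seq.blastn.out
# -num_threads plays the role of the legacy -a option (4 is an arbitrary value).
search='blastn -task blastn -query seq.fa -db nt_db -word_size 7 -evalue 10 -num_threads 4 -out seq.blastn.out'

# Print the three commands, one per line.
printf '%s\n' "$format_nucl" "$format_prot" "$search"
```

In BLAST+ each search program (blastn, blastp, blastx, tblastn, tblastx) is its own executable, so the -p switch of blastall has no direct counterpart.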