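Rapid Prokaryotic Genome Annotation: Prokka
Prokka is an automated genome annotation tool for prokaryotes, developed by Torsten Seemann, a bioinformatician at the University of Melbourne. It coordinates a suite of existing software tools to annotate prokaryotic genomes and metagenomes quickly and efficiently.
Two tools are commonly used for gene annotation today: Prokka and RAST. On Web of Science, RAST has 6,280 citations and Prokka has 3,177. However, Prodigal, the CDS prediction tool built into Prokka, has over a thousand more citations than Glimmer, the CDS prediction tool built into RAST. RAST's higher citation count may also partly reflect its online version, which lowers the barrier to use. On balance, I chose Prokka.
Prokka is fast: it can annotate a draft bacterial genome in under 10 minutes.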
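Installing with conda
I assumed installing with conda would be easy, but it turned out to be full of pitfalls.
First, I tried installing directly with conda: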
conda install prokka
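The command hung at "Solving environment". After some searching on Google, I found two possible causes:
1. No domestic (China) mirror was configured, so downloads were very slow.
2. The installed conda version was too old and needed updating.
I tried both fixes, and neither worked.
What finally worked was creating a dedicated environment named prokka, activating it, and then installing Prokka inside it: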
conda create -n prokka
conda activate prokka
conda install -c bioconda prokka
## Activating and deactivating the environment
To activate this environment, use
$ conda activate prokka
To deactivate an active environment, use
$ conda deactivate
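With the environment active, it can be worth confirming that the prokka executable is actually visible on PATH before going further. The sketch below is a minimal check and not part of the original post: check_tool is a hypothetical helper name, while --version and --listdb are options that appear in Prokka's own help text.

```shell
# check_tool NAME: report whether NAME can be found on PATH.
# (check_tool is a hypothetical helper, introduced only for this sketch.)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Only query Prokka when it is actually available in the current environment.
if check_tool prokka; then
  prokka --version   # print the installed version
  prokka --listdb    # list the databases Prokka has configured
fi
```

If the check reports "prokka: missing", the most likely cause is that the prokka environment is not the active one.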
BUG
Running conda install -c bioconda prokka installed many dependency packages, but it still ended with errors:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/bioconda/noarch/perl-bio-tools-phylo-paml-1.7.3-pl5321hdfd78af_3.tar.bz2>
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/bioconda/noarch/perl-devel-stacktrace-2.04-pl5321hdfd78af_1.tar.bz2>
Elapsed: -
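The fix was simply to run conda install -c bioconda prokka again so that the dependency packages that had failed to download were fetched a second time. This time the installation succeeded, ending with: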
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Retrieving notices: ...working... done
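Since rerunning the same command was what resolved the intermittent HTTP errors, the retry can be automated. The following is only a sketch under that assumption: retry is a hypothetical helper written for this example, and the attempt count and delay are arbitrary choices, not values from the original post.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# (retry is a hypothetical helper, not a conda feature.)
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    # Success: stop retrying immediately.
    if "$@"; then
      return 0
    fi
    # Out of attempts: report and fail.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

With the prokka environment active, it could be used as: retry 5 10 conda install -y -c bioconda prokka. This only helps with transient download failures; a persistent network or channel problem will still fail after the last attempt.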
Once Prokka is installed, a basic annotation run looks like this:
prokka genomic.fna --outdir annotation --prefix test --kingdom Bacteria
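When there are several assemblies to annotate, the same command can be wrapped in a loop. The sketch below assumes the input files end in .fna and uses each file name (without its extension) as the prefix; sample_prefix, annotate_all, and the annotation_ output directory naming are hypothetical choices made for this example.

```shell
# sample_prefix PATH: strip the directory and the final extension,
# e.g. /data/strainA.fna -> strainA. (Hypothetical helper.)
sample_prefix() {
  base=$(basename "$1")
  echo "${base%.*}"
}

# annotate_all DIR: run Prokka once per .fna file found in DIR.
# (Hypothetical helper; output directory naming is an assumption.)
annotate_all() {
  for fasta in "$1"/*.fna; do
    # Skip the unexpanded pattern when DIR contains no .fna files.
    [ -e "$fasta" ] || continue
    prefix=$(sample_prefix "$fasta")
    prokka "$fasta" --outdir "annotation_$prefix" --prefix "$prefix" --kingdom Bacteria
  done
}
```

For example, annotate_all genomes would annotate every .fna file in a directory named genomes (a placeholder name), writing each result to its own output directory so that runs do not overwrite one another.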
Parameter description:
#usage
Usage:
prokka [options] <contigs.fasta>
General:
--help This help
--version Print version and exit
--citation Print citation for referencing Prokka
--quiet No screen output (default OFF)
--debug Debug mode: keep all temporary files (default OFF)
Setup:
--dbdir [X] Prokka database root folders (default '/home6/trainees/miniconda3/db')
--listdb List all configured databases
--setupdb Index all installed databases
--cleandb Remove all database indices
--depends List all software dependencies
Outputs:
--outdir [X] Output folder [auto] (default '')
--force Force overwriting existing output folder (default OFF)
--prefix [X] Filename output prefix [auto] (default '') # prefix for output file names
--addgenes Add 'gene' features for each 'CDS' feature (default OFF)
--addmrna Add 'mRNA' features for each 'CDS' feature (default OFF)
--locustag [X] Locus tag prefix [auto] (default '')
--increment [N] Locus tag counter increment (default '1')
--gffver [N] GFF version (default '3')
--compliant Force Genbank/ENA/DDJB compliance: --addgenes --mincontiglen 200 --centre XXX (default OFF)
--centre [X] Sequencing centre ID. (default '')
--accver [N] Version to put in Genbank file (default '1')
Organism details:
--genus [X] Genus name (default 'Genus')
--species [X] Species name (default 'species')
--strain [X] Strain name (default 'strain')
--plasmid [X] Plasmid name or identifier (default '')
Annotations:
--kingdom [X] Annotation mode: Archaea|Bacteria|Mitochondria|Viruses (default 'Bacteria')
--gcode [N] Genetic code / Translation table (set if --kingdom is set) (default '0')
--prodigaltf [X] Prodigal training file (default '')
--gram [X] Gram: -/neg +/pos (default '')
--usegenus Use genus-specific BLAST databases (needs --genus) (default OFF)
--proteins [X] FASTA or GBK file to use as 1st priority (default '')
--hmms [X] Trusted HMM to first annotate from (default '')
--metagenome Improve gene predictions for highly fragmented genomes (default OFF)
--rawproduct Do not clean up /product annotation (default OFF)
--cdsrnaolap Allow [tr]RNA to overlap CDS (default OFF)
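After a run finishes, a quick way to see what was annotated is to tally the feature types in the GFF3 file that Prokka writes to the output folder. This is a sketch rather than anything from the original post: count_features is a hypothetical helper, and it assumes a tab-separated GFF3 file in which the feature type is the third column and any embedded sequence follows a ##FASTA line.

```shell
# count_features GFF: print "type<TAB>count" for each feature type.
# (count_features is a hypothetical helper, not part of Prokka.)
count_features() {
  awk -F '\t' '
    /^##FASTA/ { exit }          # stop before the embedded sequence section
    /^#/       { next }          # skip header and comment lines
    NF >= 9    { count[$3]++ }   # column 3 holds the feature type
    END { for (type in count) print type "\t" count[type] }
  ' "$1" | sort
}
```

For the example command with --outdir annotation --prefix test, this would be called as count_features annotation/test.gff, giving one line per feature type such as CDS or tRNA.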
Original: https://blog.csdn.net/xq_ing/article/details/127271457
Author: xq_ing
Title: Prokka Installation and Usage