Reference protein database for MMseqs2
======

MMseqs2 is a bioinformatics tool for searching and clustering large numbers of protein and nucleotide sequences.

- Website: https://github.com/soedinglab/MMseqs2
- Citation: Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology. 2017 Nov;35(11):1026-8.

Built using MMseqs2 v13-45111 with the following commands. Here and below, replace `N` with the number of CPU threads to use.

```
mmseqs createdb db.faa WoLr2
mmseqs createindex WoLr2 tmp --threads N
```

Typical search commands:

Search protein sequences:

```
mmseqs easy-search input.fa WoLr2 output.m8 tmp --threads N
```

Search whole-metagenome sequencing data:

```
mmseqs easy-search input.R1.fq.gz input.R2.fq.gz WoLr2 output.m8 tmp \
  --threads N -s 7.5 --translation-table 11
```

This analysis requires at least 150 GB and up to 364 GB of memory.

Indexing: This directory contains both the original database (WoLr2.* without "idx") and the indexed database (WoLr2.idx.*). Either set of files is sufficient for searching. The indexed database occupies more disk space but may speed up batch searches. To use it, replace "WoLr2" with "WoLr2.idx" in the search commands.
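If you script searches against this directory, the choice between the two sets of files can be automated. Below is a minimal sketch of a Bash helper that returns the indexed database when any `WoLr2.idx*` file is present and falls back to the original database otherwise. The function name `pick_wolr2_db` and its directory argument are hypothetical and not part of MMseqs2; the check only looks for file names and does not validate the database.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MMseqs2): print the path of the database
# to pass to "mmseqs easy-search".
#   $1: directory holding the WoLr2 files (defaults to the current directory)
pick_wolr2_db() {
  local dir="${1:-.}"
  # compgen -G succeeds if at least one path matches the glob, so this
  # detects the indexed files (WoLr2.idx, WoLr2.idx.*).
  if compgen -G "$dir/WoLr2.idx*" > /dev/null; then
    printf '%s\n' "$dir/WoLr2.idx"
  else
    # Otherwise use the original database (WoLr2.* without "idx").
    printf '%s\n' "$dir/WoLr2"
  fi
}
```

For example: `mmseqs easy-search input.fa "$(pick_wolr2_db /path/to/db)" output.m8 tmp --threads N`, where `/path/to/db` is a placeholder for this directory.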