Reference protein database for MMseqs2
======

MMseqs2 is a bioinformatics tool for searching and clustering large numbers of
protein and nucleotide sequences.

 - Website: https://github.com/soedinglab/MMseqs2
 - Citation: Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence
   searching for the analysis of massive data sets. Nature Biotechnology. 2017
   Nov;35(11):1026-1028.

Built using MMseqs2 v13-45111. In all commands in this document, replace `#`
with the number of threads to use. Build commands:

```
mmseqs createdb db.faa WoLr2
mmseqs createindex WoLr2 tmp --threads #
```

Typical search commands:

Search protein sequences:
```
mmseqs easy-search input.fa WoLr2 output.m8 tmp --threads #
```
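The search writes `output.m8` as a tab-separated table. As a minimal sketch of post-processing, assuming the default 12-column layout (column 1 = query, column 2 = target, column 11 = E-value, column 12 = bit score), the helper below keeps the highest-scoring hit per query under an E-value cutoff. The function name `best_hits` and the default cutoff of 1e-5 are illustrative choices, not part of MMseqs2 or of this database:

```shell
# Keep, for each query, the hit with the highest bit score among hits
# whose E-value is at or below a cutoff. Assumes the default 12-column
# tab-separated output (1 = query, 11 = E-value, 12 = bit score).
best_hits() {
  # $1: m8 file; $2: maximum E-value (hypothetical default: 1e-5)
  awk -F'\t' -v max_e="${2:-1e-5}" '
    ($11 + 0) <= (max_e + 0) && (!($1 in best) || ($12 + 0) > score[$1]) {
      best[$1] = $0
      score[$1] = $12 + 0
    }
    END { for (q in best) print best[q] }
  ' "$1" | sort -k1,1
}
```

For example, `best_hits output.m8 1e-10 > best.m8` would write one line per query. If a custom `--format-output` was used during the search, the column numbers need to be adjusted accordingly.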

Search whole-metagenome sequencing data:
```
mmseqs easy-search input.R1.fq.gz input.R2.fq.gz WoLr2 output.m8 tmp \
  -s 7.5 --translation-table 11 --threads #
```

This analysis requires at least 150 GB, and up to 364 GB, of memory.
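Given these requirements, it can be worth checking the machine before launching a search. The following is a rough sketch for Linux only: it reads `MemTotal` from `/proc/meminfo` and treats 1 GB as 1024 × 1024 kB, which is an approximation. The function name `has_enough_memory` is hypothetical:

```shell
# Succeed if MemTotal (reported in kB) is at least the given number of GB.
has_enough_memory() {
  # $1: required memory in GB; $2: meminfo file (defaults to /proc/meminfo)
  awk -v need_gb="$1" '
    $1 == "MemTotal:" { found = 1; ok = ($2 / 1024 / 1024 >= need_gb) }
    END { exit ((found && ok) ? 0 : 1) }
  ' "${2:-/proc/meminfo}"
}

if has_enough_memory 150; then
  echo "At least 150 GB of RAM detected."
else
  echo "Less than 150 GB of RAM detected (or /proc/meminfo is unavailable)." >&2
fi
```

This only reports installed memory, not memory that is currently free, so a passing check does not guarantee that the search will fit.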

Indexing:

This directory contains both the original database (WoLr2.* files without
"idx" in the name) and the indexed database (WoLr2.idx.*). Either set of files
is sufficient for searching. The indexed database occupies more disk space but
may speed up batch searches. To use it, replace "WoLr2" with "WoLr2.idx" in the
search commands.
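
In a script, the choice between the two database names can be made automatically. This is a small sketch that assumes the file naming described above. The function name `pick_target` is hypothetical:

```shell
# Print the indexed database name if index files exist in the given
# directory, otherwise the plain database name.
pick_target() {
  # $1: directory holding the database (defaults to the current directory)
  dir="${1:-.}"
  if ls "$dir"/WoLr2.idx* >/dev/null 2>&1; then
    echo "$dir/WoLr2.idx"
  else
    echo "$dir/WoLr2"
  fi
}
```

The result can then be substituted into a search command, for example `mmseqs easy-search input.fa "$(pick_target)" output.m8 tmp --threads 8`, where the thread count of 8 is only an example value.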