MetaCyc hierarchies
======

Annotation of WoL proteins (see "proteins/") using MetaCyc release 23.0.

The usage of these mappings is explained at:

- https://github.com/qiyunzhu/woltka/blob/master/doc/metacyc.md

## Structure

The entry level is "protein". Specifically, "protein.map" is a mapping of
WoL protein IDs to MetaCyc protein IDs.
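As a minimal sketch, assuming each line of "protein.map" is tab-delimited with a WoL protein ID in the first column followed by one or more MetaCyc protein IDs (an assumption; check the actual files before relying on it), the mapping can be loaded into a Python dictionary. The function name `load_map` and all IDs in the sample are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def load_map(fh: TextIO) -> Dict[str, List[str]]:
    """Load a one-to-many mapping from tab-delimited text.

    Assumed layout (not confirmed by this README): key<TAB>value[<TAB>value...].
    If a key appears on several lines, its values are appended in order.
    """
    mapping: Dict[str, List[str]] = {}
    for line in fh:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        key, *values = line.split("\t")
        mapping.setdefault(key, []).extend(v for v in values if v)
    return mapping


# Hypothetical sample standing in for "protein.map"; all IDs are made up.
sample = io.StringIO("wolA\tMC-PROT-1\nwolB\tMC-PROT-2\tMC-PROT-3\n")
protein_map = load_map(sample)
print(protein_map)  # {'wolA': ['MC-PROT-1'], 'wolB': ['MC-PROT-2', 'MC-PROT-3']}
```

With a real file, pass `open("protein.map")` in place of the in-memory sample.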

Starting from proteins, the following hierarchies are built:

               v
       go < protein > gene > pathway
               v
regulation < enzrxn
               v
       ec < reaction > compound (left / right) > type
               v
     type < pathway > taxonomic range
               v
         super pathway
               v
              type

For example, "protein-to-enzrxn.txt" is a mapping of protein IDs to enzymatic
reaction IDs.
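Because each file maps one level to the next, IDs can be carried up the hierarchy (for example protein > enzrxn > reaction > pathway) by composing one-to-many mappings. The sketch below assumes the files have already been loaded as dictionaries from an ID to a list of IDs; the `compose` helper and every ID shown are hypothetical, and it is not a description of how Woltka itself consumes these files (see the linked documentation for that).

```python
from typing import Dict, List

Mapping = Dict[str, List[str]]


def compose(first: Mapping, second: Mapping) -> Mapping:
    """Compose two one-to-many mappings: a -> b and b -> c gives a -> c.

    Targets are de-duplicated while preserving first-seen order; keys whose
    intermediate IDs have no entry in `second` are dropped.
    """
    result: Mapping = {}
    for key, mids in first.items():
        targets: List[str] = []
        for mid in mids:
            for target in second.get(mid, []):
                if target not in targets:
                    targets.append(target)
        if targets:
            result[key] = targets
    return result


# Hypothetical IDs: protein -> enzrxn -> reaction -> pathway.
protein_to_enzrxn = {"prot1": ["enzrxn1", "enzrxn2"], "prot2": ["enzrxn3"]}
enzrxn_to_reaction = {"enzrxn1": ["rxnA"], "enzrxn2": ["rxnA", "rxnB"]}
reaction_to_pathway = {"rxnA": ["pwyX"], "rxnB": ["pwyX", "pwyY"]}

protein_to_pathway = compose(
    compose(protein_to_enzrxn, enzrxn_to_reaction), reaction_to_pathway
)
print(protein_to_pathway)  # {'prot1': ['pwyX', 'pwyY']}
```

De-duplication means a protein that reaches the same pathway through two reactions is listed once; whether that is appropriate, and how a protein mapped to several targets should be weighted, depends on the analysis.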

## Alignment

WoL proteins were aligned against MetaCyc reference proteins using DIAMOND
v0.9.25 with the following command:

```
diamond blastp --index-chunks 1 --evalue 1.0 --id 50 --subject-cover 50 \
  --query-cover 90 --max-target-seqs 1 --threads 16 --db $db --query $input \
  --out $output
```

An alternative release, generated with `--id 80` instead of `--id 50` (minimum
percent sequence identity), is provided under "strict/".
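The command above does not set `--outfmt`, so DIAMOND writes its default BLAST tabular output (format 6) with 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal sketch for turning that output into a query-to-subject table follows; the function `best_hits`, its `min_pident` parameter and the sample rows are hypothetical.

```python
import io
from typing import Dict, TextIO, Tuple


def best_hits(fh: TextIO, min_pident: float = 0.0) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, pident)} from DIAMOND tabular output.

    Assumes the default 12-column BLAST tabular layout (--outfmt 6).
    Keeps the highest-bitscore hit per query whose percent identity
    is at least `min_pident`.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in fh:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if pident < min_pident:
            continue  # below the identity cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return {q: (s, p) for q, (s, p, _) in best.items()}


# Hypothetical output rows; IDs and scores are made up for illustration.
sample = io.StringIO(
    "wolA\tMC-PROT-1\t92.5\t300\t22\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "wolB\tMC-PROT-2\t61.0\t250\t97\t1\t1\t250\t5\t254\t1e-80\t300\n"
)
print(best_hits(sample))  # {'wolA': ('MC-PROT-1', 92.5), 'wolB': ('MC-PROT-2', 61.0)}
```

Passing `min_pident=80` only approximates the "strict/" release and is not guaranteed to reproduce it: a separate `--id 80` run applies the cutoff inside DIAMOND and, combined with `--max-target-seqs 1`, may report a different subject for some queries.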

## UniRef

In addition, "uniref/" hosts the mapping from UniRef entries to MetaCyc
proteins, available from the UniRef data release.