Taxonomic classification of genomes
======

Default taxonomy:

The default taxonomy (this directory level) is based on GTDB R207, curated to match the WoL2 phylogeny. Both Greengenes-style lineage strings and an NCBI-style taxdump (with dummy TaxIDs) are provided.

- Website: https://gtdb.ecogenomic.org/

Statistics:

Numbers of taxonomic units:

- Domains: 2
- Phyla: 124
- Classes: 321
- Orders: 914
- Families: 2,057
- Genera: 6,811
- Species: 12,258

Database files:

- taxid.map: Genome ID to TaxID mapping.
- nodes.dmp: NCBI taxdump-style node mapping.
- names.dmp: NCBI taxdump-style name mapping.
- lineages.txt: Lineage strings with taxon names.
- linetids.txt: Lineage strings with TaxIDs.

Taxonomy systems:

Taxonomic annotation of the WoL2 genomes was performed based on:

- Greengenes2 release 2022.10
  - Based on WoLr2 phylogeny and GTDB R207 taxonomy.
  - Source: http://ftp.microbio.me/greengenes_release/2022.10/
- GTDB RS207 (2022-04-08)
  - Source: https://data.gtdb.ecogenomic.org/releases/release207/207.0/
- NCBI taxdump 2022-01-01
  - Source: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp_2022-01-01.zip

See: gg2/, gtdb/ and ncbi/, respectively.

Curation of taxonomy:

The original GTDB and NCBI taxonomies were automatically curated according to the WoL2 phylogeny using Tax2Tree.

- Website: https://github.com/biocore/tax2tree
- Citation: McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, Andersen GL, Knight R, Hugenholtz P. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. The ISME Journal. 2012 Mar;6(3):610-8.

See: gtdb/tax2tree and ncbi/tax2tree, respectively.

Used Tax2Tree 1.0 commit 36856f0 (updated on 2022-05-31) with Python 3.10.4.

Installation:

```
conda create -n tax2tree python=3
conda activate tax2tree
pip install tax2tree
```

Analysis:

```
t2t decorate -m linetids.txt -t tree.nwk -o output \
  --no-suffix --min-count 1 --add-nameholder
```

Note: For NCBI, pre- and post-processing steps were necessary. See ncbi/README for details.

Further curation:

The Tax2Tree-curated GTDB taxonomy was further curated to ensure that the hierarchical relationships among taxa are consistent with the original GTDB taxonomy (species to domain), while retaining consistency with the WoL2 phylogeny (genome tree topology). The outcome is the default taxonomy (in the current directory).

Specifically, the following three genomes were edited:

Genome: G000441555
  Original: d__3; p__24; c__246; o__813; f__4072; g__13316; s__44083
  Tax2Tree: d__3; p__24; c__246; o__813; f__2735; g__13316; s__44083
  Final:    d__3; p__24; c__246; o__813; f__2735; g__; s__

Genome: G000817735
  Original: d__3; p__33; c__257; o__857; f__3208; g__8970; s__73328
  Tax2Tree: d__3; p__33; c__257; o__857; f__2817; g__8970; s__73328
  Final:    d__3; p__33; c__257; o__857; f__2817; g__; s__

Genome: G003265155
  Original: d__3; p__26; c__248; o__820; f__2787; g__8227; s__40950
  Tax2Tree: d__3; p__25; c__247; o__829; f__2757; g__12453; s__40950
  Final:    d__3; p__25; c__247; o__829; f__2757; g__12453; s__

The corresponding lineage string files are:

- Original: gtdb/linetids.txt
- Tax2Tree: gtdb/tax2tree/linetids.txt
- Final: linetids.txt
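
Example: comparing lineage strings

To see which ranks were changed for a genome between two of the lineage files, the lineage strings can be compared rank by rank. The following Python code is a minimal sketch and not part of the original workflow. The function names (parse_lineage, changed_ranks, read_lineages) are hypothetical, and read_lineages assumes that each line of a linetids.txt file holds a genome ID and a lineage string separated by a tab, which should be checked against the actual files.

```python
RANKS = "dpcofgs"  # domain, phylum, class, order, family, genus, species


def parse_lineage(lineage):
    """Split 'd__3; p__24; ...' into {rank code: taxon}; an empty rank maps to ''."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        code, _, taxon = field.partition("__")
        ranks[code] = taxon
    return ranks


def changed_ranks(before, after):
    """Return the rank codes at which two lineage strings differ."""
    a, b = parse_lineage(before), parse_lineage(after)
    return [r for r in RANKS if a.get(r, "") != b.get(r, "")]


def read_lineages(path):
    """Read genome ID -> lineage string (assumed layout: two tab-separated columns)."""
    table = {}
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line:
                genome, lineage = line.split("\t", 1)
                table[genome] = lineage
    return table


# Genome G000441555, using the lineage strings listed above.
original = "d__3; p__24; c__246; o__813; f__4072; g__13316; s__44083"
final = "d__3; p__24; c__246; o__813; f__2735; g__; s__"
print(changed_ranks(original, final))  # ['f', 'g', 's']
```

To compare whole files, read_lineages can be called on gtdb/linetids.txt and linetids.txt, and changed_ranks applied to each genome ID present in both tables. If the file layout assumption holds, only the three genomes listed above should differ between gtdb/tax2tree/linetids.txt and linetids.txt.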