GTDB taxonomy
======

Taxonomic annotation of WoL2 genomes based on GTDB R207.

- Source: https://data.gtdb.ecogenomic.org/releases/release207/207.0/

Database files:

- taxid.map: Genome ID to TaxID mapping.
- nodes.dmp: NCBI taxdump-style node mapping.
- names.dmp: NCBI taxdump-style name mapping.
- lineages.txt: Lineage strings with taxon names.
- linetids.txt: Lineage strings with TaxIDs.
- ncbi2gtdb.tsv: Translation into GTDB taxonomy of the 1,525 genomes that are absent from GTDB R207.
- tax2tree/: Tax2Tree-curated taxonomy.
- r207.0/: The entire GTDB R207 taxonomy.

Note: The current directory hosts the original (uncurated) GTDB annotations.

Dummy taxdump:

NCBI-style taxdump files (taxid.map, nodes.dmp, and names.dmp) were generated from the GTDB lineages using gtdb_to_taxdump.py so that they can be used by a wider variety of downstream applications. The dummy TaxIDs were assigned through a level-order traversal of the taxonomy tree: 1 - root, 2 - domain Archaea, 3 - domain Bacteria, then phyla, then classes, and so on.

Note: They should not be confused with NCBI TaxIDs.

Translation:

GTDB R207 contains a total of 317,542 genomes. Among the 15,953 WoL2 genomes, 14,428 (90.4%) were found in this pool, and their taxonomic assignments were adopted directly. The other 1,525 genomes were not present. They were classified by translating their NCBI taxonomic assignments to GTDB using a translation table provided in the GTDB data release. Because each NCBI taxon may be mapped to multiple GTDB taxa, the translation process only considers mappings where at least 95% of the members of an NCBI taxon are assigned to one GTDB taxon.

For example (see r207.0/ncbi2gtdb.map):

- NCBI taxon: c__Acidimicrobiia
- GTDB taxa: c__Acidimicrobiia(p__Actinobacteriota) 98.9%, c__Vicinamibacteria(p__Acidobacteriota) 0.55%, c__Acidobacteriae(p__Acidobacteriota) 0.22%, c__Alphaproteobacteria(p__Proteobacteria) 0.22%, c__Actinomycetia(p__Actinobacteriota) 0.11%

In this scenario, because 98.9% > 95%, NCBI taxon c__Acidimicrobiia is translated into GTDB taxon c__Acidimicrobiia.

Each of the 1,525 genomes was assigned a GTDB taxonomy based on the lowest rank that can be translated.
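
The translation rule can be illustrated with a minimal Python sketch. This is not the script that was actually used. The function names (translate_taxon, translate_lineage), the in-memory table layout, and the o__HypotheticalOrder entry are assumptions made for illustration; only the c__Acidimicrobiia percentages and the 95% threshold come from the text above.

```python
THRESHOLD = 95.0  # minimum percentage of members, as stated in this README


def translate_taxon(ncbi_taxon, table, threshold=THRESHOLD):
    """Return the GTDB taxon that holds at least `threshold` percent of the
    members of `ncbi_taxon`, or None if no single GTDB taxon qualifies."""
    candidates = table.get(ncbi_taxon)
    if not candidates:
        return None
    # Pick the GTDB taxon with the largest share of members.
    best, pct = max(candidates.items(), key=lambda item: item[1])
    return best if pct >= threshold else None


def translate_lineage(ncbi_lineage, table, threshold=THRESHOLD):
    """Walk an NCBI lineage (ordered from highest to lowest rank) from the
    bottom up and return the first translation found, that is, the
    translation at the lowest rank that can be translated."""
    for taxon in reversed(ncbi_lineage):
        gtdb_taxon = translate_taxon(taxon, table, threshold)
        if gtdb_taxon is not None:
            return gtdb_taxon
    return None


# Hypothetical in-memory table: NCBI taxon -> {GTDB taxon: percentage}.
table = {
    # Percentages taken from the example in this README.
    "c__Acidimicrobiia": {
        "c__Acidimicrobiia(p__Actinobacteriota)": 98.9,
        "c__Vicinamibacteria(p__Acidobacteriota)": 0.55,
        "c__Acidobacteriae(p__Acidobacteriota)": 0.22,
        "c__Alphaproteobacteria(p__Proteobacteria)": 0.22,
        "c__Actinomycetia(p__Actinobacteriota)": 0.11,
    },
    # Made-up ambiguous taxon: no GTDB taxon reaches 95%.
    "o__HypotheticalOrder": {
        "o__GtdbOrderA(c__Acidimicrobiia)": 60.0,
        "o__GtdbOrderB(c__Acidimicrobiia)": 40.0,
    },
}

lineage = ["d__Bacteria", "p__Actinobacteriota",
           "c__Acidimicrobiia", "o__HypotheticalOrder"]
result = translate_lineage(lineage, table)
```

Here the made-up order is skipped because neither of its GTDB candidates reaches 95%, so the genome falls back to the class-level translation, c__Acidimicrobiia(p__Actinobacteriota).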