Tax2Tree-curated NCBI taxonomy
======

The original NCBI taxonomy was curated using Tax2Tree such that the hierarchy is consistent with the WoL2 phylogeny.

Database files:

- lineages.txt: Consensus lineage strings with taxon names.
- linetids.txt: Consensus lineage strings with TaxIDs.
- decorated.nwk: Phylogeny with nodes decorated with consensus lineages.
- fmeasures.txt: F-measures of taxa.
- raw/: Raw Tax2Tree output.

Note: This process does not create new taxa (TaxIDs or names). All taxa can be found in the original NCBI taxonomy. However, some hierarchical relationships among taxa (species to domain) may be modified in order to be consistent with the phylogeny.

## Protocol

See ../../README for the Tax2Tree command.

However, pre- and post-processing were needed because the NCBI taxonomy contains gaps in the lineages (i.e., not all ranks from species to domain are filled with taxa). For example:

    d__Bacteria; p__Candidatus Tectomicrobia; c__; o__; f__; g__Candidatus Entotheonella; s__Candidatus Entotheonella factor

This violates the design of Tax2Tree. To proceed, the empty ranks were filled with dummy codes based on their parents before running Tax2Tree. These dummy codes were removed after running Tax2Tree. These operations were performed using tax2tree_input.py and tax2tree_output.py.
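
The gap-filling and clean-up steps described above can be illustrated with a minimal sketch. It is not the code in tax2tree_input.py or tax2tree_output.py. The `DUMMY_` prefix, the placeholder format, and the function names `fill_gaps` and `strip_dummies` are assumptions made for illustration, and the actual scripts may encode dummy codes differently.

```python
# Hypothetical placeholder scheme; the real scripts may use another format.
DUMMY_PREFIX = "DUMMY_"


def fill_gaps(lineage: str) -> str:
    """Fill empty ranks with placeholders derived from the nearest named ancestor."""
    filled = []
    parent = "root"  # used only if the top rank itself is empty
    for field in lineage.split("; "):
        code, _, name = field.partition("__")
        if name:
            parent = name  # remember the most recent real taxon
        else:
            # Combine rank code and ancestor so each placeholder is unique per rank.
            name = f"{DUMMY_PREFIX}{code}_{parent}"
        filled.append(f"{code}__{name}")
    return "; ".join(filled)


def strip_dummies(lineage: str) -> str:
    """Blank out placeholder names, restoring the original gaps."""
    cleaned = []
    for field in lineage.split("; "):
        code, _, name = field.partition("__")
        if name.startswith(DUMMY_PREFIX):
            name = ""
        cleaned.append(f"{code}__{name}")
    return "; ".join(cleaned)


example = (
    "d__Bacteria; p__Candidatus Tectomicrobia; c__; o__; f__; "
    "g__Candidatus Entotheonella; s__Candidatus Entotheonella factor"
)
filled = fill_gaps(example)
restored = strip_dummies(filled)
```

In this sketch, the empty class, order, and family ranks all receive placeholders tied to the phylum name, so every rank is populated for Tax2Tree, and `strip_dummies` returns the lineage to its original gapped form afterwards.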