Greengenes2 taxonomy
====================

Taxonomic annotation of WoL2 genomes based on Greengenes2 release 2022.10.

- Citation: McDonald D, Jiang Y, Balaban M, Cantrell K, Zhu Q, Gonzalez A, Morton JT, Nicolaou G, Parks D, Karst SM, Albertsen M. Greengenes2 enables a shared data universe for microbiome studies. bioRxiv. 2022:2022-12.
- Source: http://ftp.microbio.me/greengenes_release/2022.10/

Summary:

Greengenes2 is a reference database of 16S rRNA gene sequences with taxonomy and phylogeny. It was reconstructed by placing 16S sequences from various sources into the WoLr2 phylogeny using uDance (see ../../phylogeny), followed by phylogeny-consistent annotation using the GTDB R207 taxonomy. As a result, Greengenes2 and WoLr2 share largely consistent taxonomy and phylogeny, which maximizes the consistency between analyses of 16S rRNA gene amplicon data and analyses of shotgun metagenomic data.

Some WoLr2 genomes lack identifiable 16S rRNA genes and therefore cannot be mapped precisely to Greengenes2 16S rRNA gene sequences. For these genomes, species-level taxon name matching was performed to associate them with Greengenes2 taxonomic lineages where applicable. In total, 88.7% of the genomes have a corresponding Greengenes2 taxonomy.

Statistics:

- Number of WoLr2 genomes with matches in Greengenes2: 14,149 (i.e., 88.7% of the 15,959 genomes). Of these:
  - Number of genomes with exact matches (i.e., the same "G"-number): 12,283
  - Number of genomes with matching species names: 1,866

Database files:

- lineages.txt: Greengenes2 lineage strings of WoLr2 genomes (n = 14,149).
- wol2gg.tsv: Mapping of WoLr2 genome IDs to Greengenes2 sequence IDs by matching species names (n = 1,866).
  - Only matches that do not have the same "G"-number are included.
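
Example:

The following is a minimal sketch of how lineages.txt might be read in Python. It rests on assumptions that this README does not confirm: that each row is "genome ID<TAB>lineage string", and that lineage strings use rank prefixes separated by semicolons (e.g., "d__Bacteria; p__..."). The function names and the sample rows are hypothetical and serve only as an illustration; check the actual file before relying on this.

```python
import csv
import io

# Assumed rank prefixes (GTDB-style); not confirmed by this README.
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_lineage(lineage):
    """Split a 'd__...; p__...' string into a {rank: name} dict.

    Ranks with an empty name (e.g., 'g__') are skipped.
    """
    ranks = {}
    for field in lineage.split(";"):
        code, sep, name = field.strip().partition("__")
        if sep and code in RANKS and name:
            ranks[RANKS[code]] = name
    return ranks


def read_lineages(handle):
    """Read 'genome_id<TAB>lineage' rows into {genome_id: {rank: name}}."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: parse_lineage(row[1]) for row in reader if len(row) >= 2}


# Made-up rows for illustration; they are not taken from lineages.txt.
sample = io.StringIO(
    "G000000001\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; "
    "s__Escherichia coli\n"
    "G000000002\td__Archaea; p__Halobacteriota; c__; o__; f__; g__; s__\n"
)
lineages = read_lineages(sample)
```

To read the real file, pass an open file handle instead of the sample, e.g. `read_lineages(open("lineages.txt", newline=""))`. If the assumed layout holds, the resulting dictionary should contain 14,149 entries.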