American Gut Manuscript File Sets
=================================

For the American Gut manuscript, we have prepared a data set containing a single fecal sample from each participant who had at least one fecal sample with at least 1250 sequences after the sequences were deblurred.

Many analyses may benefit from focusing on a healthy subset. The samples which are part of this set are described by the "healthy_subset" column of the mapping file. The criteria for a participant to be included in this group are a self-reported age between 20 and 69, a BMI between 18.5 and 30, and no reported history of Inflammatory Bowel Disease, Diabetes, or antibiotic use in the past year.

Data Dictionary
---------------

A data dictionary describing all the base columns in the mapping file is provided as `data_dictionary.csv` in the parent partition directory. A data dictionary describing the "vioscreen_" categories is provided in a separate document titled `vioscreen_data_dictionary.pdf`.

Phylogenetic Tree
-----------------

An insertion tree was generated using SEPP with the deblur data against Greengenes 13_8. It can be found in the parent directory.

Files
-----

Within each dataset, the following files are provided:

Metadata file
+++++++++++++

The sample and prep metadata downloaded from Qiita, along with appended Vioscreen results. Alpha diversity values (PD whole tree, Shannon, and observed OTUs) at the rarefaction depth and at every lower depth have been added.

OTU table
+++++++++

Both a rarefied and an unrarefied biom table are provided. The rarefied table is denoted by the rarefaction depth. The unrarefied table is filtered to remove samples with fewer reads than the rarefaction level (e.g., all samples with fewer than 1250 sequences are removed from the unrarefied 1250 biom table).

Distance Matrices
+++++++++++++++++

Weighted (normalized and unnormalized) UniFrac, unweighted UniFrac, and Bray-Curtis distance matrices are provided.

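
The distance matrices are shipped as gzip-compressed text files. The following is a minimal sketch for reading one with the Python standard library. It assumes a tab-delimited square layout (an empty first header cell followed by sample IDs, then one row per sample), which is not stated in this document, and `load_distance_matrix` is a hypothetical helper name.

```python
import csv
import gzip


def load_distance_matrix(path):
    """Read a gzipped, tab-delimited square distance matrix.

    Assumed layout (not confirmed by this README): the first row holds an
    empty cell followed by the sample IDs, and each later row holds a
    sample ID followed by its distances to every sample.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        sample_ids = header[1:]
        distances = {}
        for row in reader:
            if not row:
                continue  # skip blank trailing lines
            row_id, values = row[0], row[1:]
            if len(values) != len(sample_ids):
                raise ValueError(
                    f"row {row_id!r} has {len(values)} values, "
                    f"expected {len(sample_ids)}"
                )
            # Map each column's sample ID to the parsed distance.
            distances[row_id] = dict(zip(sample_ids, map(float, values)))
    return sample_ids, distances
```

For example, `ids, dm = load_distance_matrix("1250/distance/unweighted.txt.gz")` would return the sample IDs and a nested dictionary in which `dm[a][b]` is the distance between samples `a` and `b`, provided the file matches the assumed layout.
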

PICRUSt
+++++++

PICRUSt predictions were generated by clustering the deblur sequences against the Greengenes 13_8 OTU database at 99% and then performing PICRUSt prediction. Tables were filtered to remove samples with fewer than 1250 sequences before normalization for 16S copy number. Tables collapsed at L1, L2, and L3 are also included.

Directory Structure
-------------------

::

    data_dictionary.csv
    ag_tree.tre
    1250/
        ag_map_with_alpha.txt
        sample_id.txt
        deblur_125nt_no_blooms_rare.biom
        deblur_125nt_no_blooms.biom
        collated_alpha.txt
        distance/
            bray_curtis.txt.gz
            unweighted.txt.gz
            weighted-normalized.txt.gz
            weighted-unnormalized.txt.gz
        picrust/
            deblur_no_blooms_125nt_min1250_gg99_normed_pred.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L1.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L2.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L3.biom
    2500/
        ag_map_with_alpha.txt
        sample_id.txt
        deblur_125nt_no_blooms_rare.biom
        deblur_125nt_no_blooms.biom
        collated_alpha.txt
        distance/
            bray_curtis.txt.gz
            unweighted.txt.gz
            weighted-normalized.txt.gz
            weighted-unnormalized.txt.gz
        picrust/
            deblur_no_blooms_125nt_min1250_gg99_normed_pred.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L1.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L2.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L3.biom
    5000/
        ag_map_with_alpha.txt
        sample_id.txt
        deblur_125nt_no_blooms_rare.biom
        deblur_125nt_no_blooms.biom
        collated_alpha.txt
        distance/
            bray_curtis.txt.gz
            unweighted.txt.gz
            weighted-normalized.txt.gz
            weighted-unnormalized.txt.gz
        picrust/
            deblur_no_blooms_125nt_min1250_gg99_normed_pred.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L1.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L2.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L3.biom
    10000/
        ag_map_with_alpha.txt
        sample_id.txt
        deblur_125nt_no_blooms_rare.biom
        deblur_125nt_no_blooms.biom
        collated_alpha.txt
        distance/
            bray_curtis.txt.gz
            unweighted.txt.gz
            weighted-normalized.txt.gz
            weighted-unnormalized.txt.gz
        picrust/
            deblur_no_blooms_125nt_min1250_gg99_normed_pred.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L1.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L2.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L3.biom
    50000/
        ag_map_with_alpha.txt
        sample_id.txt
        deblur_125nt_no_blooms_rare.biom
        deblur_125nt_no_blooms.biom
        collated_alpha.txt
        distance/
            bray_curtis.txt.gz
            unweighted.txt.gz
            weighted-normalized.txt.gz
            weighted-unnormalized.txt.gz
        picrust/
            deblur_no_blooms_125nt_min1250_gg99_normed_pred.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L1.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L2.biom
            deblur_no_blooms_125nt_min1250_gg99_normed_pred_L3.biom
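
Because every rarefaction depth directory repeats the same file names, paths can be built from a depth and a short key. The sketch below illustrates this with the standard library only. The keys in `FILES` and the helper names `dataset_path` and `missing_files` are hypothetical, chosen for illustration; the file and directory names are copied from the listing above.

```python
from pathlib import Path

# Rarefaction depths that appear as directory names in the listing above.
DEPTHS = (1250, 2500, 5000, 10000, 50000)

# Files listed inside every depth directory, keyed by hypothetical short names.
FILES = {
    "mapping": "ag_map_with_alpha.txt",
    "sample_ids": "sample_id.txt",
    "rarefied_table": "deblur_125nt_no_blooms_rare.biom",
    "unrarefied_table": "deblur_125nt_no_blooms.biom",
    "collated_alpha": "collated_alpha.txt",
    "bray_curtis": "distance/bray_curtis.txt.gz",
    "unweighted_unifrac": "distance/unweighted.txt.gz",
    "weighted_normalized_unifrac": "distance/weighted-normalized.txt.gz",
    "weighted_unnormalized_unifrac": "distance/weighted-unnormalized.txt.gz",
}


def dataset_path(root, depth, name):
    """Return the path of one listed file for a given rarefaction depth."""
    if depth not in DEPTHS:
        raise ValueError(f"unknown rarefaction depth: {depth}")
    return Path(root) / str(depth) / FILES[name]


def missing_files(root, depth):
    """Return the keys of listed files that are absent under root/depth."""
    return [
        name for name in FILES
        if not dataset_path(root, depth, name).is_file()
    ]
```

Calling `missing_files(root, 1250)` on a local copy is a quick way to check that a download is complete before starting an analysis; here `root` stands for wherever the parent directory was unpacked.
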