Species tree reconstruction using the uDance workflow
======

The uDance workflow employs a divide-and-conquer strategy to reconstruct large phylogenomic trees: it partitions the taxon set, infers a species tree from each subset, then assembles the results into one tree containing all taxa.

In this analysis, the taxon set was divided into 20 partitions. For each partition, one tree for each of the 380 marker genes was inferred using RAxML-NG with the LG+G4 model, and the gene trees were then summarized into one species tree using ASTRAL-constrained. Finally, uDance combined all species trees into one tree (final.nwk).

- Note: Partition 12 was merged into partition 13 by uDance. As a consequence, the species tree for partition 12 does not exist, and that for partition 13 includes the genomes from both partitions 12 and 13.

The sequence alignments, gene trees, and species trees (i.e., genome trees) were released at:

- https://github.com/balabanmetin/wol2-16k-data (commit 57c48cd)

In final.nwk, branch lengths are in coalescent units. Conventional branch lengths (i.e., in units of substitutions per site) were then re-estimated (brlen.nwk) using RAxML, based on the 100 most conserved amino acid sites per marker gene as selected by the PhyloPhlAn3 workflow. The species tree with re-estimated branch lengths, together with other variants and intermediate data, was released at:

- https://github.com/yueyujiang/WoL_GG2_trees (commit 8088c8c)

Database files:

- final.nwk: Final species tree of all taxa, generated by assembling 19 species trees. This is the raw output of ASTRAL. Node labels contain the following fields:
  - 1, 2, 3 (suffixes): main topology (this tree), 1st alternative, and 2nd alternative.
  - q: quartet support.
  - f: number of supporting quartets in gene trees.
  - pp: local posterior probability.
  - QC: number of quartets defined.
  - EN: effective number of genes.
  * Among these metrics, pp1 and EN are the most relevant ones to measure the confidence of each branch.
- brlen.nwk: Species tree with conventional branch lengths (number of amino acid substitutions per site).
- unplaced.lst: A list of 56 genomes that were excluded from the final species tree because they could not be confidently placed.
- #/species.nwk: Species tree of partition # (# = 00..19).
- #/genes/: Gene trees of partition #.
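
Example: reading node labels and partition directories

The node-label fields and the partition layout described above can be handled with a few lines of Python. This is a minimal sketch, not part of the released data or of uDance itself. It assumes that each node label in final.nwk is a bracketed, semicolon-separated list of key=value pairs (the usual form of ASTRAL's full annotation). The function names and the cutoffs min_pp1 and min_en are hypothetical and chosen only for illustration; check the label syntax against the actual file before relying on it.

```python
import re

# ASSUMPTION: a node label looks like
# [q1=0.9;q2=0.05;q3=0.05;f1=90.0;f2=5.0;f3=5.0;pp1=1.0;pp2=0.0;pp3=0.0;QC=100;EN=100.0]
# Verify this against final.nwk; the README only names the fields.
FIELD = re.compile(r"([A-Za-z]+[0-9]*)=([^;\]']+)")


def parse_label(label):
    """Return a {field: float} dict for one ASTRAL-style node label."""
    return {key: float(value) for key, value in FIELD.findall(label)}


def is_confident(label, min_pp1=0.95, min_en=10.0):
    """Flag a branch using pp1 and EN, the two metrics the README highlights.

    The cutoffs are hypothetical defaults, not values from the source.
    """
    fields = parse_label(label)
    return fields.get("pp1", 0.0) >= min_pp1 and fields.get("EN", 0.0) >= min_en


def partition_ids(total=20, merged=(12,)):
    """List partition directory names (00..19), skipping merged partitions.

    Partition 12 was merged into partition 13, so it has no species tree.
    """
    return [f"{i:02d}" for i in range(total) if i not in merged]


if __name__ == "__main__":
    example = "[q1=0.9;q2=0.05;q3=0.05;pp1=1.0;pp2=0.0;pp3=0.0;QC=100;EN=100.0]"
    print(parse_label(example)["pp1"], is_confident(example))
    print([f"{p}/species.nwk" for p in partition_ids()])
```

Skipping partition 12 yields 19 species-tree paths, which matches the 19 species trees assembled into final.nwk.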