Software | Class of method | Purpose | Specifics | Issues and warnings | Link | Reference |
---|---|---|---|---|---|---|
Dsuite | ABBA-BABA | Identifying past events of admixture between populations | Fast, handles VCF format. Suited for low-depth sequencing (handles uncertainties on genotypes). Provides a set of summary statistics that are useful to investigate complex admixture events | Requires an outgroup sequence. The methods cannot estimate the direction of gene flow. | https://github.com/millanek/Dsuite | (Malinsky et al., 2020) |
RENT+ | Ancestral Recombination Graphs/coalescence | Retracing the whole process of recombination and coalescence along a genome | Faster than first version of ARGWeaver. | Requires phased haplotypes. Specific input format. No built-in functions to extract information from genealogies. | https://github.com/SajadMirzaei/RentPlus | (Mirzaei and Wu, 2017) |
TREEMIX | Clustering and characterizing admixture | Admixture graph, infers most likely admixture events in a tree | Based on allele frequencies and can be used for pooled data. | Requires multiple runs to properly assess the likelihood of each model | https://bitbucket.org/nygcresearch/treemix/src | (Pickrell and Pritchard, 2012) |
G-PhoCS | Coalescence/Bayesian | Estimating population divergence and migration parameters using a coalescent framework | Bayesian + MCMC, handles ancient samples | Parameters scaled by mutation rate, no admixture | http://compgen.cshl.edu/GPhoCS/ | (Gronau et al., 2011) |
IMa3 | Coalescence/Bayesian | Inferring parameters from an isolation with migration (IM) model | Fully Bayesian approach; can perform joint estimates of parameters in L-mode and test for nested models. Can estimate phylogenetic relationships and migration rates | IM model is the only one available. Discrete admixture cannot be tested. Can only use subsets of whole-genome resequencing data. Recent splits can lead to overestimated migration rates | https://github.com/jodyhey/IMa3 | (Hey and Nielsen, 2007) |
ABLE | Coalescence/Composite Likelihood | Model comparison and parameter estimation | Uses both allele frequency spectrum and linkage disequilibrium within blocks of a pre-specified size. | Relies on ms syntax. Determining the most informative size for blocks requires performing pilot runs. | https://github.com/champost/ABLE | (Beeravolu et al., 2016) |
Stairway2 | Coalescence/Composite Likelihood | Inferring change in Ne with time | User-friendly. Fast. Suitable for pools or low-depth sequencing. | Cannot handle migration or population splits. | https://github.com/xiaoming-liu/stairway-plot-v2 | (Liu and Fu, 2020) |
fastsimcoal2 | Coalescence/Likelihood | Model comparison and parameter estimation | Performs coalescent simulations, parameter estimation and model testing using a fast likelihood method. Can handle arbitrarily complex scenarios for any type of marker | The maximum-likelihood method only uses the allele frequency spectrum. Several runs (20-100) are needed to explore the likelihood space. | http://cmpg.unibe.ch/software/fastsimcoal2/ | (Excoffier et al., 2013) |
∂a∂i | Diffusion approximation of the AFS | Model comparison and parameter estimation | Run time does not depend on the number of SNPs included, does not require coalescent simulations, handles arbitrarily complex scenarios. Fast estimation of confidence intervals around parameter estimates (Godambe method). Suitable for pools/low-depth sequencing | Requires some knowledge of Python. Limited to 3 populations. Several runs (20-100) are needed to explore the likelihood space. | https://bitbucket.org/gutenkunstlab/dadi | (Gutenkunst et al., 2009) |
moments | Diffusion approximation of the AFS | Model comparison and parameter estimation | Based on Python, syntax similar to ∂a∂i. Can handle selection. Can use VCF files as input. | Requires some knowledge of Python. Limited to 5 populations. Several runs (20-100) are needed to explore the likelihood space. | https://bitbucket.org/simongravel/moments/src/master/ | (Jouganous et al., 2017) |
momi2 | Diffusion approximation of the AFS | Model comparison and parameter estimation | Can scale to ten populations. Can simulate and read data in the VCF format. Detailed tutorials available | Does not handle continuous gene flow | https://github.com/popgenmethods/momi2 | (Kamm et al., 2020) |
KIMTree | Diffusion approximation/Bayesian | Estimating divergence times and past effective sex ratios along the branches of a population tree, and testing topologies. | Fast and user-friendly. R scripts to obtain plots are available. Suitable for pools/low-depth sequencing. The method is conditional on a prior topology provided by the user. It computes the DIC for a given topology, so that candidate topologies can be compared. | Strong selection on the sex chromosome can produce male-biased sex ratios. Times are given on the diffusion time scale and can be converted to demographic times using independent estimates of Ne. | http://www1.montpellier.inra.fr/CBGP/software/kimtree/download.html | (Clemente et al., 2018) |
GADMA | Genetic algorithm | Model comparison and parameter estimation | Based on moments and ∂a∂i. Automates the search for the best set of models explaining a given frequency spectrum. | Limited to three populations at the moment. | https://github.com/ctlab/GADMA | (Noskova et al., 2020) |
DoRIS | Identity by Descent (IBD) tract | Testing various demographic scenarios | Uses variation in IBD tract length to test for various demographic models. | IBD must be inferred first with, e.g., BEAGLE. Handles a limited set of demographic scenarios. The code must be modified for more complex scenarios | https://github.com/pierpal/DoRIS | (Palamara and Pe’er, 2013) |
Unnamed | Identity by state (IBS) tract | Predicting observed patterns of identity by state along a genome by fitting an appropriate, arbitrarily complex demographic model | Allows bootstrapping and estimating confidence over parameter estimates with ms | Specific input format (similar to MSMC or ARGWeaver) | https://github.com/kelleyharris/Inferring-demography-from-IBS | (Harris and Nielsen, 2013) |
ASTRAL-2 | Phylogeny | Builds species trees using short non-recombining sequences | Coalescence-based. Suitable for short loci (e.g. RAD-seq and GBS) | More reliable under high incomplete lineage sorting than SVDQuartets and NJst (Chou et al. 2015) | https://github.com/smirarab/ASTRAL | (Mirarab and Warnow, 2015) |
BEAST2 | Phylogeny | Network reconstruction and phylogenetic relationships | User friendly. Can be used to track changes in effective population sizes (Bayesian Skyline Plots). Possible to estimate divergence times | Slow for large datasets. Requires sequence data that can be produced by, e.g., Stacks for RAD-seq data | http://beast2.org/ | (Drummond and Rambaut, 2007; Bouckaert et al., 2014) |
IQ-Tree 2 | Phylogeny | Divergence time estimation and phylogenetic relationships | User-friendly, can be run locally or on a webserver, very detailed tutorials. Fast and accurate. | Still no tutorial for analyzing big data (last checked December 2020). | http://www.iqtree.org/ | (Minh et al., 2020) |
MCMCTree and MCMCTreeR | Phylogeny | Divergence time estimation and phylogenetic relationships | Included in PAML. An R package is available to help choose relevant priors and interpret results: https://github.com/PuttickMacroevolution/MCMCtreeR | Bayesian, sensitive to priors. Requires a resolved phylogeny and an alignment. Slow for large datasets. Not suited for recent divergence and high gene flow. | http://abacus.gene.ucl.ac.uk/software/paml.html | (Yang, 2007; Puttick, 2019) |
NJst | Phylogeny | Builds species trees using short non-recombining sequences | Available in the R package phybase. Estimates populations/species tree from gene trees | Requires splitting part of the genome into non-recombining "loci". | https://github.com/bomeara/phybase/ | (Liu and Yu, 2010, 2011) |
PHRAPL | Phylogeny | Admixture graph, reticulated evolution | Uses trees in the NEWICK format as an input to infer topology, migration rates, divergence times. Similar to ABC in spirit, using tree topology as a summary statistic. | Cannot handle more than 16 taxa at a time, and requires subsetting larger datasets | http://www.phrapl.org/ | (Jackson et al., 2017) |
PhyML | Phylogeny | Phylogenetic relationships | Maximum Likelihood inference of phylogenetic relationships. An online version is available | Should be used on species complexes or divergent populations with little migration. Can be run on genomic windows to detect introgression (with e.g. TWISST, Dsuite) | http://www.atgc-montpellier.fr/phyml/binaries.php | (Guindon et al., 2010) |
RAxML | Phylogeny | Network reconstruction and phylogenetic relationships | Maximum Likelihood inference of phylogenetic relationships | Should be used on species complexes or divergent populations with little migration | http://sco.h-its.org/exelixis/web/software/raxml/index.html | (Stamatakis, 2014) |
SNAPP | Phylogeny | Phylogenetic relationships | Handles SNP data | Remains slow for medium to large datasets (>1,000 SNPs) | http://beast2.org/snapp/ | (Bryant et al., 2012) |
SNPhylo | Phylogeny | Network reconstruction and phylogenetic relationships | Complete pipeline from SNP filtering to tree reconstruction | Should be used on species complexes or divergent populations with little migration | http://chibba.pgml.uga.edu/snphylo/ | (Lee et al., 2014) |
SVDQuartets | Phylogeny | Phylogenetic relationships | Estimates populations/species tree from gene trees | Remains slow for large datasets. Requires PAUP*. | https://www.asc.ohio-state.edu/kubatko.2/software/SVDquartets/ | (Chifman and Kubatko, 2014) |
SVDQuest | Phylogeny | Phylogenetic relationships | Estimates populations/species tree from gene trees | Faster than SVDQuartets | https://github.com/pranjalv123/SVDquest | (Vachaspati and Warnow, 2018) |
*BEAST | Phylogeny and species tree inference | Divergence time estimation and phylogenetic relationships | Outputs a species tree instead of concatenated gene tree. Allows for testing consistency between phylogenetic signals at different loci | Slow for large datasets. Requires sequence data. Not suited for situations where gene flow/admixture is important | http://beast2.org/ | (Heled and Drummond, 2010) |
Splitstree | Phylogeny/Network | Network reconstruction and phylogenetic relationships | User-friendly interface, offers a variety of methods for network reconstruction | Mostly descriptive | http://www.splitstree.org/ | (Huson and Bryant, 2006) |
diCal2 | Sequentially Markovian coalescent | Testing any arbitrary demographic scenario | Works with smaller, more fragmented datasets than PSMC. Handles more complex demographic models than MSMC (including admixture). | Requires phased whole genome data and a model to be defined | https://sourceforge.net/projects/dical2/ | (Sheehan et al., 2013) |
MSMC and MSMC-IM | Sequentially Markovian coalescent | Inferring change in Ne and migration rates with time between two populations | Tracks population size changes through time without a priori assumptions about the demographic model. Allows estimating variation in cross-coalescence rate between two populations | Limited to the study of 8 diploid individuals from 2 populations at once. Requires whole genome phased data and masking regions with insufficient sequencing depth | https://github.com/stschiff/msmc and https://github.com/wangke16/MSMC-IM | (Schiffels and Durbin, 2014) |
PSMC | Sequentially Markovian coalescent | Inferring change in effective population sizes (Ne) with time using a single diploid genome | Tracks population size changes through time without a priori assumptions about the demographic model. | Limited to one population and one diploid individual. Better used within MSMC. Requires phased whole genome data and masking regions with insufficient sequencing depth | https://github.com/lh3/psmc | (Li and Durbin, 2011) |
SMC++ | Sequentially Markovian coalescent | Inferring change in Ne with time and splitting time between two populations | Can analyze hundreds of individuals at a time and does not require phasing | Masking regions as in MSMC. The ancestral allele is assumed to be the reference allele by default. Assumes a clean split for population divergence. Future versions should allow gene flow inference. | https://github.com/popgenmethods/smcpp | (Terhorst et al., 2016) |
TWISST | Topology weighting | Chromosome painting, clustering and branching between populations | Retrieves the most likely coalescence pattern between several taxa along the genome. Can be seen as an extension of the ABBA/BABA test | Needs a priori grouping of individuals into taxa. Requires at least 4 taxa. Impractical for more than 6 taxa. Windows must include enough SNPs to retrieve the correct topology, at the risk of spanning regions with different histories | https://github.com/simonhmartin/twisst | (Martin and Van Belleghem, 2016) |
BAYPASS/Bayenv | Variance/covariance matrix | Building a population covariance matrix across population allele frequencies, similar to TREEMIX | Can handle pooled data | Matrices are mostly designed to provide a neutral model for assessing selection, but can be used to infer population structure | http://www1.montpellier.inra.fr/CBGP/software/baypass/ ; https://bitbucket.org/tguenther/bayenv2_public/src | (Günther and Coop, 2013; Gautier, 2015) |
ETEToolkit | Phylogeny and species tree inference | Phylogenetic relationships | Well-documented suite of Python commands to perform phylogenetic analyses | Species trees can be biased by substantial gene flow or admixture (general issue, not specific to ETEToolkit) | http://etetoolkit.org/ | (Huerta-Cepas et al., 2016) |
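
As a point of orientation for the ABBA-BABA class listed in the table, the sketch below computes Patterson's D from per-site derived-allele frequencies in three populations (P1, P2, P3), assuming the outgroup is fixed for the ancestral allele at every site. It is a minimal illustration and not the Dsuite implementation; the function name `patterson_d` and the input layout are assumptions made for this example.

```python
def patterson_d(p1, p2, p3):
    """Patterson's D from derived-allele frequencies at each site.

    p1, p2, p3: equal-length sequences of derived-allele frequencies in
    populations P1, P2 and P3 (hypothetical input layout). The outgroup is
    assumed to carry only the ancestral allele.
    """
    if not (len(p1) == len(p2) == len(p3)):
        raise ValueError("frequency vectors must have the same length")

    # Expected ABBA and BABA pattern counts, summed over sites.
    abba = sum((1 - a) * b * c for a, b, c in zip(p1, p2, p3))
    baba = sum(a * (1 - b) * c for a, b, c in zip(p1, p2, p3))

    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites (ABBA + BABA = 0)")
    return (abba - baba) / total
```

A positive value indicates an excess of allele sharing between P2 and P3, a negative value an excess between P1 and P3. As noted in the table, the sign alone does not give the direction of gene flow, and significance is usually assessed separately (for example with a block jackknife), which this sketch omits.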
References
Beeravolu, C. R., Hickerson, M. J., Frantz, L. A. F., & Lohse, K. (2016). Approximate Likelihood Inference of Complex Population Histories and Recombination from Multiple Genomes. bioRxiv, 1–31. doi: 10.1101/077958
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C. H., Xie, D., … Drummond, A. J. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, 10(4), 1–6. doi: 10.1371/journal.pcbi.1003537
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & Roychoudhury, A. (2012). Inferring species trees directly from biallelic genetic markers: Bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29(8), 1917–1932. doi: 10.1093/molbev/mss086
Chifman, J., & Kubatko, L. (2014). Quartet inference from SNP data under the coalescent model. Bioinformatics, 30(23), 3317–3324. doi: 10.1093/bioinformatics/btu530
Clemente, F., Gautier, M., & Vitalis, R. (2018). Inferring sex-specific demographic history from SNP data. PLoS Genetics, 14(1), 1–32. doi: 10.1371/journal.pgen.1007191
Drummond, A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7, 214. doi: 10.1186/1471-2148-7-214
Excoffier, L., Dupanloup, I., Huerta-Sanchez, E., Sousa, V. C., & Foll, M. (2013). Robust Demographic Inference from Genomic and SNP Data. PLoS Genetics, 9(10). doi: 10.1371/journal.pgen.1003905
Gautier, M. (2015). Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates. Genetics, 201(September), 1555–1579. doi: 10.1534/genetics.115.181453
Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G., & Siepel, A. (2011). Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43(10), 1031–1034. doi: 10.1038/ng.937
Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W., & Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321. doi: 10.1093/sysbio/syq010
Günther, T., & Coop, G. (2013). Robust identification of local adaptation from allele frequencies. Genetics, 195(1), 205–220. doi: 10.1534/genetics.113.152462
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics, 5(10). doi: 10.1371/journal.pgen.1000695
Harris, K., & Nielsen, R. (2013). Inferring Demographic History from a Spectrum of Shared Haplotype Lengths. PLoS Genetics, 9(6). doi: 10.1371/journal.pgen.1003521
Heled, J., & Drummond, A. J. (2010). Bayesian Inference of Species Trees from Multilocus Data. Molecular Biology and Evolution, 27(3), 570–580. doi: 10.1093/molbev/msp274
Hey, J., & Nielsen, R. (2007). Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences of the United States of America, 104(8), 2785–2790. doi: 10.1073/pnas.0611164104
Huson, D. H., & Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2), 254–267. doi: 10.1093/molbev/msj030
Jackson, N. D., Morales, A. E., Carstens, B. C., & O’Meara, B. C. (2017). PHRAPL: Phylogeographic Inference Using Approximate Likelihoods. Systematic Biology, 66(6), 1045–1053. doi: 10.1093/sysbio/syx001
Jouganous, J., Long, W., Ragsdale, A. P., & Gravel, S. (2017). Inferring the joint demographic history of multiple populations: Beyond the diffusion approximation. Genetics, 206(3), 1549–1567. doi: 10.1534/genetics.117.200493
Kamm, J., Terhorst, J., Durbin, R., & Song, Y. S. (2020). Efficiently Inferring the Demographic History of Many Populations With Allele Count Data. Journal of the American Statistical Association, 115(531), 1472–1487. doi: 10.1080/01621459.2019.1635482
Lee, T.-H., Guo, H., Wang, X., Kim, C., & Paterson, A. H. (2014). SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics, 15(1), 162. doi: 10.1186/1471-2164-15-162
Li, H., & Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature, 475(7357), 493–496. doi: 10.1038/nature10231
Liu, L., & Yu, L. (2010). Phybase: An R package for species tree analysis. Bioinformatics, 26(7), 962–963. doi: 10.1093/bioinformatics/btq062
Liu, L., & Yu, L. (2011). Estimating species trees from unrooted gene trees. Systematic Biology, 60(5), 661–667. doi: 10.1093/sysbio/syr027
Liu, X., & Fu, Y. X. (2020). Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biology, 21(1), 1–9. doi: 10.1186/s13059-020-02196-9
Malinsky, M., Matschiner, M., & Svardal, H. (2020). Dsuite – Fast D-statistics and related admixture evidence from VCF files. Molecular Ecology Resources. doi: 10.1111/1755-0998.13265
Martin, S. H., & Van Belleghem, S. M. (2016). Exploring evolutionary relationships across the genome using topology weighting. bioRxiv, 069112. doi: 10.1101/069112
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler, A., … Teeling, E. (2020). IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37(5), 1530–1534. doi: 10.1093/molbev/msaa015
Mirarab, S., & Warnow, T. (2015). ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44–i52. doi: 10.1093/bioinformatics/btv234
Mirzaei, S., & Wu, Y. (2017). RENT+: An improved method for inferring local genealogical trees from haplotypes with recombination. Bioinformatics, 33(7), 1021–1030. doi: 10.1093/bioinformatics/btw735
Noskova, E., Ulyantsev, V., Koepfli, K. P., O’Brien, S. J., & Dobrynin, P. (2020). GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data. GigaScience, 9(3), 1–18. doi: 10.1093/gigascience/giaa005
Palamara, P. F., & Pe’er, I. (2013). Inference of historical migration rates via haplotype sharing. Bioinformatics, 29(13), 180–188. doi: 10.1093/bioinformatics/btt239
Pickrell, J. K., & Pritchard, J. K. (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics, 8(11), e1002967. doi: 10.1371/journal.pgen.1002967
Puttick, M. N. (2019). MCMCtreeR: Functions to prepare MCMCtree analyses and visualize posterior ages on trees. Bioinformatics, 35(24), 5321–5322. doi: 10.1093/bioinformatics/btz554
Schiffels, S., & Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nature Genetics, 46(8), 919–925. doi: 10.1038/ng.3015
Sheehan, S., Harris, K., & Song, Y. S. (2013). Estimating Variable Effective Population Sizes from Multiple Genomes: A Sequentially Markov Conditional Sampling Distribution Approach. Genetics, 194, 647–662. doi: 10.1534/genetics.112.149096
Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313. doi: 10.1093/bioinformatics/btu033
Terhorst, J., Kamm, J. A., & Song, Y. S. (2016). Robust and scalable inference of population history from hundreds of unphased whole genomes. Nature Genetics, 49(2), 303–309. doi: 10.1038/ng.3748
Vachaspati, P., & Warnow, T. (2018). SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. Molecular Phylogenetics and Evolution, 124, 122–136. doi: 10.1016/j.ympev.2018.03.006
Yang, Z. (2007). PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24(8), 1586–1591. doi: 10.1093/molbev/msm088