Methods to infer population history

| Methods | Software | Purpose | Specifics | Issues and warnings | Link | Reference |
|---|---|---|---|---|---|---|
| Identity by state tract | Unnamed | Predict observed patterns of identity by state along a genome by fitting an appropriate, arbitrarily complex demographic model | Allows bootstrapping and estimation of confidence intervals for parameter estimates using ms | Specific input format (similar to MSMC or ARGWeaver) | | (Harris and Nielsen, 2013) |
| IBD tract | DoRIS | Testing various demographic scenarios | Uses variation in IBD tract length to test various demographic models. | IBD must be inferred first with, e.g., BEAGLE. Handles a limited set of demographic scenarios. The code must be modified for more complex scenarios | | (Palamara and Pe’er, 2013) |
| Sequentially Markovian coalescent | diCal2 | Testing any arbitrary demographic scenario | Works with smaller, more fragmented datasets than PSMC. Handles more complex demographic models than MSMC (including admixture). | Requires phased whole-genome data and a model to be defined | | (Sheehan et al., 2013) |
| Sequentially Markovian coalescent | PSMC | Inferring change in Ne with time using a single diploid genome | Tracks population size changes through time without an a priori demographic model. | Limited to one population and one diploid individual. Better used within MSMC. Requires whole-genome data and masking of regions with insufficient sequencing depth | | (Li and Durbin, 2011) |
| Sequentially Markovian coalescent | MSMC | Inferring change in Ne and migration rates with time between two populations | Tracks population size changes through time without an a priori demographic model. Estimates variation in cross-coalescence rate between two populations | Limited to the study of 8 diploid individuals from 2 populations at once. Requires phased whole-genome data and masking of regions with insufficient sequencing depth | | (Schiffels and Durbin, 2014) |
| Bayesian | Migrate-n | Inferring migration rates | Both ML and Bayesian methods can be used to estimate parameters | Only estimates population sizes and migration rates. Not suited for large datasets. Program-specific input format | | (Beerli and Palczewski, 2010) |
| Bayesian | IMa2 | Inferring parameters from an isolation-with-migration model | Fully Bayesian approach; can perform joint estimates of parameters in L-mode and test nested models | The IM model is the only one available. Discrete admixture cannot be tested. Long computation times. Recent splits lead to overestimated migration rates | | (Hey and Nielsen, 2007) |
| Bayesian | G-PhoCS | Estimating population divergence and migration parameters using a coalescent framework | Bayesian + MCMC, handles ancient samples | Parameters scaled by mutation rate, no admixture | | (Gronau et al., 2011) |
| Ancestral recombination graphs/coalescence | ARGWeaver | Retracing the whole process of recombination and coalescence along a genome | Provides quantitative estimates of TMRCA and topologies at each locus. Estimates effective population size. Provides tools to extract summary statistics for the topologies retrieved. | High computing cost. Requires phased whole-genome data. | | (Rasmussen et al., 2014) |
| Coalescent simulations | ms, msms, msABC, msnsam | Building any arbitrary scenario using a coalescent framework | Any arbitrary scenario can be implemented. Handles SNP, microsatellite and sequence data. msms can include selection in the model. | Can be difficult to handle for the naive user (but see coala) | Scripts for implementing model comparison with ABC, with heterogeneous gene flow across loci: | (Hudson, 2002; Ewing and Hermisson, 2010; Pavlidis et al., 2010) |
| Coalescent simulations | scrm | Fast simulation of chromosome-scale sequences | Syntax similar to ms, handles any arbitrary scenario | Does not handle gene conversion or a fixed number of segregating sites (unlike ms) | | (Staab et al., 2015) |
| Coalescent simulations | fastsimcoal2 | Building any arbitrary scenario using a coalescent framework | Any arbitrary scenario can be implemented. Handles SNP, microsatellite and sequence data. | Does not handle selection | | (Excoffier and Foll, 2011) |
| ABC/Composite likelihood | fastsimcoal2 | Model comparison and parameter estimation | Performs coalescent simulations, parameter estimation and model testing using a fast likelihood method. Can handle arbitrarily complex scenarios for any type of marker | Summary statistics need to be calculated through Arlequin, which slows down their computation | | (Excoffier et al., 2013) |
| ABC/coalescent simulations | coala | Combining coalescent simulators within a single framework | Facilitates the building of scenarios and computes summary statistics for simulations | So far includes ms, msms and scrm | | (Staab and Metzler, 2016) |
| ABC | abc | Performs all steps of model checking and parameter estimation for ABC analyses | Informative vignette, allows graphical representation, complete and robust | Does not perform coalescent simulations (but can be used in combination with coala) | | (Csilléry et al., 2012) |
| ABC | DIYABC | Complete ABC analysis, from simulations to model checking and parameter estimation | User-friendly | Does not allow modeling of continuous gene flow | | (Cornuet et al., 2008) |
| ABC | ABCToolbox | Complete ABC analysis, from simulations to model checking and parameter estimation | Modular, facilitates the computation of summary statistics | Current version is Beta (15/01/2016) | | (Wegmann et al., 2010) |
| ABC | PopSizeABC | Inferring change in Ne using whole-genome data | Designed to better assess recent events. Uses a set of summary statistics for the AFS and LD between markers. Handles multiple individuals | Approximate Bayesian approaches rely on summary statistics and so do not use the full information in the observations. | | (Boistard et al., 2016) |
| Diffusion approximation of the AFS | ∂a∂i | Model comparison and parameter estimation | Run time does not depend on the number of SNPs included, does not require coalescent simulations, handles arbitrarily complex scenarios | Requires some knowledge of Python. Limited to 3 populations | Scripts for implementing complex demographic histories with heterogeneous gene flow: | (Gutenkunst et al., 2009) |
| Coalescence/Composite likelihood | ABLE | Model comparison and parameter estimation | Uses both the allele frequency spectrum and linkage disequilibrium within blocks of a pre-specified size. Handles whole-genome data and RAD-seq. | Relies on ms syntax. Determining the most informative block size requires pilot runs. | | et al., 2016) |
| Sequentially Markovian coalescent | SMC++ | Inferring change in Ne with time and splitting time between two populations | Can analyze hundreds of individuals at a time and does not require phasing | Requires masking regions as in MSMC. The ancestral allele is assumed to be the reference allele by default. Assumes a clean split for population divergence. Future versions should allow gene flow inference. | | (Terhorst et al., 2016) |
| Composite likelihood | Stairway | Inferring change in Ne with time for whole genomes and reduced-representation datasets | Does not require phasing. Fast. | Requires an estimate of the number of invariant sites. Not reliable for ancient history. | | and Fu, 2015) |
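
Several entries in the table (∂a∂i, PopSizeABC, ABLE) summarise the data through the allele frequency spectrum (AFS). As an illustration only, and not the code of any of these programs, the Python sketch below computes an unfolded AFS from a small matrix of 0/1 haplotypes and then folds it, which is the usual fallback when the ancestral allele is unknown. The function names and the toy haplotypes are assumptions made for this example.

```python
from collections import Counter


def unfolded_sfs(haplotypes):
    """Unfolded AFS from a list of haplotypes coded 0 (ancestral) / 1 (derived).

    Entry i-1 of the result is the number of sites at which the derived
    allele is carried by exactly i of the n haplotypes, for i = 1..n-1.
    Monomorphic sites (0 or n derived copies) are ignored.
    """
    n = len(haplotypes)
    # zip(*haplotypes) iterates over sites (columns) instead of haplotypes (rows).
    derived_counts = Counter(sum(site) for site in zip(*haplotypes))
    return [derived_counts.get(i, 0) for i in range(1, n)]


def fold_sfs(sfs):
    """Fold an unfolded AFS: classes i and n-i are merged into minor-allele class i."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        # When i == n - i the class is its own mirror image and is counted once.
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


if __name__ == "__main__":
    # Hypothetical data: 4 haplotypes (rows) typed at 4 sites (columns).
    toy = [
        [0, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 1],
        [1, 1, 0, 1],
    ]
    sfs = unfolded_sfs(toy)
    print(sfs)            # [2, 0, 2]
    print(fold_sfs(sfs))  # [4, 0]
```

Folding discards the polarity of each site, so it avoids errors from a misassigned ancestral allele, at the cost of some information.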

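The ABC entries in the table all rest on the same loop: draw parameters from a prior, simulate data under the coalescent, and keep the draws whose summary statistics fall close to the observed ones. The Python sketch below shows that loop in its simplest rejection form for a single parameter, θ = 4Nμ, under a constant-size neutral model without recombination. It is a toy under stated assumptions, not the algorithm of abc, DIYABC, ABCToolbox or ms: the function names, the uniform prior, the tolerance and the observed value of 14.0 are all hypothetical choices for this example.

```python
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate by counting unit-rate arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(rng, n, theta):
    """Segregating sites at one locus for n sampled haplotypes.

    Time is measured in units of 2N generations, so the waiting time while
    k lineages remain is exponential with rate k(k-1)/2, and mutations fall
    on the tree at rate theta/2 per unit of branch length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    return poisson(rng, theta / 2 * total_length)


def mean_segregating_sites(rng, n, theta, loci):
    """Summary statistic: mean number of segregating sites over independent loci."""
    return sum(simulate_segregating_sites(rng, n, theta) for _ in range(loci)) / loci


def abc_rejection(observed, n, loci, prior, draws, tol, seed=1):
    """Return the prior draws of theta whose simulated summary is within tol of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        theta = rng.uniform(*prior)
        if abs(mean_segregating_sites(rng, n, theta, loci) - observed) <= tol:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    # Hypothetical observation: a mean of 14 segregating sites over 10 loci, 10 haplotypes.
    posterior = abc_rejection(observed=14.0, n=10, loci=10,
                              prior=(0.1, 20.0), draws=2000, tol=1.5)
    if posterior:
        print(len(posterior), "accepted; posterior mean theta =",
              sum(posterior) / len(posterior))
```

Dedicated packages replace the plain accept/reject step with regression adjustment, model choice and posterior checks, and replace the toy simulator with ms, scrm or fastsimcoal2, but the overall structure stays the same.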

Beerli P, Palczewski M (2010). Unified framework to evaluate panmixia and migration direction among multiple sampling locations. Genetics 185: 313–26.

Boistard S, Rodriguez W, Jay F, Mona S, Austerlitz F (2016). Inferring Population Size History from Large Samples of Genome-Wide Molecular Data – An Approximate Bayesian Computation Approach. PLoS Genet: 858–865.

Cornuet J-M, Santos F, Beaumont MA, Robert CP, Marin J-M, Balding DJ, et al. (2008). Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713–9.

Csilléry K, François O, Blum MGB (2012). abc: an R package for approximate Bayesian computation (ABC). Methods Ecol Evol 3: 475–479.

Ewing G, Hermisson J (2010). MSMS: A coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–2065.

Excoffier L, Dupanloup I, Huerta-Sanchez E, Sousa VC, Foll M (2013). Robust Demographic Inference from Genomic and SNP Data. PLoS Genet 9.

Excoffier L, Foll M (2011). Fastsimcoal: a Continuous-Time Coalescent Simulator of Genomic Diversity Under Arbitrarily Complex Evolutionary Scenarios. Bioinformatics 27: 1332–4.

Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A (2011). Bayesian inference of ancient human demography from individual genome sequences. Nat Genet 43: 1031–1034.

Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 5.

Harris K, Nielsen R (2013). Inferring Demographic History from a Spectrum of Shared Haplotype Lengths. PLoS Genet 9.

Hey J, Nielsen R (2007). Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci U S A 104: 2785–90.

Hudson RR (2002). Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.

Li H, Durbin R (2011). Inference of human population history from individual whole-genome sequences. Nature 475: 493–496.

Palamara PF, Pe’er I (2013). Inference of historical migration rates via haplotype sharing. Bioinformatics 29: 180–188.

Pavlidis P, Laurent S, Stephan W (2010). MsABC: A modification of Hudson’s ms to facilitate multi-locus ABC analysis. Mol Ecol Resour 10: 723–727.

Rasmussen MD, Hubisz MJ, Gronau I, Siepel A (2014). Genome-Wide Inference of Ancestral Recombination Graphs. PLoS Genet 10.

Schiffels S, Durbin R (2014). Inferring human population size and separation history from multiple genome sequences. Nat Genet 46: 919–25.

Sheehan S, Harris K, Song YS (2013). Estimating Variable Effective Population Sizes from Multiple Genomes: A Sequentially Markov Conditional Sampling Distribution Approach. Genetics 194: 647–662.

Staab PR, Metzler D (2016). Coala: An R framework for coalescent simulation. Bioinformatics 32: 1903–1904.

Staab PR, Zhu S, Metzler D, Lunter G (2015). Scrm: Efficiently simulating long sequences using the approximated coalescent with recombination. Bioinformatics 31: 1680–1682.

Terhorst J, Kamm JA, Song YS (2016). Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49: 303–309.

Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010). ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.