New methods added: KimTree, selEstim, ABLE

Today I am adding a few methods that aim at resolving relationships between populations, estimating selection coefficients, and estimating parameters for arbitrarily complex scenarios.

SelEstim and Kimtree both rely on diffusion approximations, which model how alleles diffuse in a set of populations forward in time (whereas coalescent simulations work backward in time, something that puzzled many of us at first). They do not take mutation into account and are therefore suited to the study of adaptation in recently diverged populations.

Kimtree (Gautier and Vitalis, 2013) estimates the divergence times between several populations, given a prior tree. It is robust to gene flow and variation in population size. Divergence time is estimated as tau, a scaled quantity that depends on both the effective population size N and the time t. An independent estimate of N, obtained with other methods, is therefore needed to convert tau into a time in generations or years. An interesting aspect of this software is that it lets you assess the support for the topology supplied as a prior, using the deviance information criterion (DIC). This is useful for working out the order in which populations diverged before fitting demographic models in, e.g., fastsimcoal or IMa2, which require prior information about the topology.
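To make the conversion from tau to real time concrete, here is a minimal Python sketch. It assumes the usual diffusion scaling tau = t / (2N) for diploids, with t in generations; please check the Kimtree documentation for the exact scaling before relying on it. The function names and the toy values for N and generation time are purely illustrative.

```python
def tau_to_generations(tau, n_e, ploidy=2):
    """Convert a scaled branch length tau into generations.

    ASSUMPTION: tau = t / (ploidy * N_e), i.e. t / (2N) for diploids.
    """
    if tau < 0 or n_e <= 0:
        raise ValueError("tau must be >= 0 and n_e must be > 0")
    return tau * ploidy * n_e


def generations_to_years(t_generations, generation_time_years):
    """Convert generations into years, given a generation time in years."""
    if generation_time_years <= 0:
        raise ValueError("generation_time_years must be > 0")
    return t_generations * generation_time_years


if __name__ == "__main__":
    # Toy values, not taken from any real dataset.
    tau = 0.05       # scaled divergence time on one branch
    n_e = 10_000     # effective size estimated with another method
    t_gen = tau_to_generations(tau, n_e)
    t_years = generations_to_years(t_gen, generation_time_years=2.5)
    print(f"{t_gen:.0f} generations, about {t_years:.0f} years")
```

Because t scales linearly with N, any uncertainty in the effective population size carries over directly to the divergence time, so it is worth repeating the conversion over a plausible range of N.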

selEstim (Vitalis et al., 2014) is a method to detect Fst outliers that has been shown to perform better than BayeScan. It also provides an estimate of the selection coefficient, which I think is its big strength. Note, however, that it assumes an island model, although it appears rather robust to false positives when a proper calibration procedure is used. I would suggest using the R function simulate.baypass() in BayPass (Gautier, 2015) to easily produce pseudo-observed datasets in the right format. This way you can also run a BayPass analysis and compare the results.
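The logic of such a calibration can be sketched in a few lines of Python. This is only an illustration of the general idea, not selEstim's own procedure: it assumes you have already extracted one per-locus statistic (for example a Kullback-Leibler divergence) from the analysis of the pseudo-observed dataset and from the real data. All names and the toy numbers below are hypothetical.

```python
import random


def empirical_quantile(values, q):
    """Empirical quantile (0 <= q <= 1) with linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def flag_outliers(observed, pod_values, q=0.99):
    """Return (threshold, indices of observed loci above the threshold).

    The threshold is the q-th quantile of the statistic computed on the
    pseudo-observed (neutral) dataset.
    """
    threshold = empirical_quantile(pod_values, q)
    outliers = [i for i, value in enumerate(observed) if value > threshold]
    return threshold, outliers


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for the statistic obtained from a pseudo-observed dataset.
    pod_stat = [rng.expovariate(1.0) for _ in range(10_000)]
    # Stand-in for the statistic obtained from the real data (5 loci).
    observed_stat = [0.3, 1.2, 9.5, 0.8, 7.1]
    threshold, outliers = flag_outliers(observed_stat, pod_stat, q=0.99)
    print(f"threshold = {threshold:.2f}, outlier loci = {outliers}")
```

The quantile you choose sets the expected false positive rate under the neutral model used to simulate the pseudo-observed data, so a stricter quantile (e.g. 0.999) trades power for fewer false positives.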

These two methods were developed by the same people as BayPass (Gautier, 2015), and I would advise using the three of them in a single pipeline, since they take the same allele count input. Note that they handle pooled data. Together they make a nice and powerful toolbox to make sense of new datasets, especially for pooled data and recent divergence. Note also that a much faster implementation of rehh is now available in R to perform haplotype-based tests of selection (Gautier et al., 2017).

The last method is ABLE (Beeravolu et al., 2016), currently under review but extremely promising: it estimates parameters for arbitrarily complex scenarios from whole-genome data, in a way similar to fastsimcoal, except that it takes into account linkage disequilibrium within blocks of a given size. This provides more information and can help discriminate between demographic scenarios. Long story short, it summarizes the data using a blockwise site frequency spectrum (SFS), counting the configurations of variants found within blocks instead of using single SNPs. Since it models non-recombining blocks of a size specified by the user, it can also be used to analyze RAD-seq data. Note that it does NOT require phasing. It computes an approximate composite likelihood that can be used to estimate model support. Personally, I would be careful when computing AIC from this likelihood when the differences between models are small, since it can overestimate the support for the best model (see the discussion in Excoffier et al., 2013). One should repeat the likelihood estimation 10-100 times and compare the distributions for each model, as is done in the paper. Note that the syntax is mostly based on ms, and that it is apparently rather sensitive to small mistakes in data format, such as blank lines or white spaces…
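To give a feel for what a blockwise SFS looks like, here is a simplified Python sketch for a single population of diploid individuals. It is not ABLE's implementation or input format: the data layout (blocks as lists of sites, each site a list of per-individual derived allele counts) and the function names are assumptions made for illustration. It does show why phasing is not needed, since only the number of derived alleles at each site matters.

```python
from collections import Counter


def block_config(block_sites, n_chromosomes):
    """Summarize one block as its within-block SFS.

    block_sites: list of sites; each site is a list of per-individual
    derived allele counts (0, 1 or 2 for diploids), so no phasing is used.
    Returns a tuple whose entry k-1 is the number of sites in the block
    carrying k derived alleles (k = 1 .. n_chromosomes - 1).
    """
    counts = [0] * (n_chromosomes - 1)
    for site in block_sites:
        k = sum(site)
        if 0 < k < n_chromosomes:  # ignore sites that are not polymorphic
            counts[k - 1] += 1
    return tuple(counts)


def blockwise_sfs(blocks, n_individuals, ploidy=2):
    """Tally how many blocks show each within-block SFS configuration."""
    n_chromosomes = n_individuals * ploidy
    return Counter(block_config(block, n_chromosomes) for block in blocks)


if __name__ == "__main__":
    # Toy data: 2 diploid individuals, 5 blocks.
    blocks = [
        [[1, 0], [2, 1]],  # one singleton and one site with 3 derived copies
        [[0, 1], [1, 2]],  # same configuration as the first block
        [[1, 1]],          # one site with 2 derived copies
        [],                # block without any variant
        [[2, 2]],          # fixed derived site, not polymorphic
    ]
    for config, n_blocks in blockwise_sfs(blocks, n_individuals=2).items():
        print(config, n_blocks)
```

With several populations, each block would be summarized by a joint configuration across populations rather than a single vector, but the tallying step stays the same.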

The tables have been updated to take these additions into account. Do not hesitate to contact me with any comment or suggestion!

Thanks to Renaud Vitalis (selEstim and Kimtree) and Isaac Overcast (ABLE) for pointing these tools out to me!
Enjoy playing around with these new toys…

References

Beeravolu CR, Hickerson MJ, Frantz LAF, Lohse K (2016). Approximate Likelihood Inference of Complex Population Histories and Recombination from Multiple Genomes. bioRxiv: 1–31.

Excoffier L, Dupanloup I, Huerta-Sanchez E, Sousa VC, Foll M (2013). Robust Demographic Inference from Genomic and SNP Data. PLoS Genet 9.

Gautier M (2015). Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates. Genetics 201: 1555–1579.

Gautier M, Klassmann A, Vitalis R (2017). rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure. Mol Ecol Resour 17: 78–90.

Gautier M, Vitalis R (2013). Inferring population histories using genome-wide allele frequency data. Mol Biol Evol 30: 654–68.

Vitalis R, Gautier M, Dawson KJ, Beaumont MA (2014). Detecting and measuring selection from gene frequency data. Genetics 196: 799–817.
