Today I am updating the tables by adding two new methods, BetaScan and Stairway.
- BetaScan: long-term balancing selection
There are not many methods to detect ancient balancing selection. You can combine summary statistics such as Tajima’s D or nucleotide diversity and check for outliers, but these statistics are prone to both false negatives and false positives. Besides, ancient balancing selection can be hard to detect because its effects do not extend far from the site under selection: recombination has had much more time to break associations between alleles. ARGWeaver is an interesting option to detect alleles that are extremely old, but it is time-consuming at genomic scales.
BetaScan (Siewert and Voight 2017) is an interesting new method that uses predictions about the frequency spectrum of alleles linked to balanced polymorphisms (see Figure 1 in the paper for more details). In my experience it works rather well (I find many peaks at immune genes in multiple species, which makes sense from a biological perspective). It seems rather robust to demography. If you have a good idea of the demographic history of your species, it may be worth performing coalescent simulations and estimating the rate of false positives. Note that the method has been used on panmictic populations.
Quick tip: If you have an estimate of the recombination rate per bp per generation (rho), then for a time of T generations since the balanced polymorphism arose, the 95th percentile of the length distribution of ancestral fragments on each side of the selected site is estimated by L = -log(0.05)/(T*rho), where log is the natural logarithm and L is in bp. This value corresponds to half the size of the window that should be used by the algorithm (see Sup. Mat. in the original paper). Adjust the size depending on how old you expect the balanced polymorphisms to be: the older the polymorphism, the smaller the window.
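The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name and the example values for T and rho are hypothetical, and the calculation assumes fragment lengths are exponentially distributed with rate T*rho, which is what the formula implies.

```python
import math

def half_window_bp(t_generations, rho, quantile=0.95):
    """Return the given quantile of the ancestral fragment length (in bp)
    on one side of the selected site, assuming an exponential length
    distribution with rate T * rho (assumption implied by the formula)."""
    if t_generations <= 0 or rho <= 0:
        raise ValueError("T and rho must both be positive")
    return -math.log(1.0 - quantile) / (t_generations * rho)

# Hypothetical example values, not taken from the paper:
# T = 250,000 generations, rho = 1e-8 per bp per generation.
half = half_window_bp(250_000, 1e-8)
window = 2 * half  # the formula gives half of the window size
print(f"half-window ~ {half:.0f} bp, full window ~ {window:.0f} bp")
```

With these made-up values the half-window comes out at roughly 1.2 kb, so the full window would be about 2.4 kb. Doubling T halves the window.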
- Stairway: Multi-epoch model of population size changes
Methods that use the allele frequency spectrum to infer demography often require that the number of population size changes be defined a priori. Stairway (Liu and Fu 2015) provides a way to use SNP frequency data to produce plots that resemble skyline plots from, e.g., BEAST. You will need an estimate of the number of invariant sites (e.g. the number of bases covered by RAD-sequencing or the number of genomic regions with high enough depth of coverage) and the allele frequency spectrum of (filtered) SNPs. It is fast and does not require phasing, but it should not be trusted for really old events. In my experience it seems to work on RAD-seq data, giving results that were consistent with the ML method implemented in fastsimcoal.
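For readers unsure what the allele frequency spectrum input looks like, here is a minimal sketch of how a folded spectrum can be tallied from per-site alternate allele counts. The function name and the toy data are hypothetical, the sketch assumes no missing genotypes (every site has the same number of sampled chromosomes), and it does not produce the Stairway input file itself, only the counts that go into it.

```python
from collections import Counter

def folded_sfs(alt_counts, n_chrom):
    """Tally the folded site frequency spectrum.

    alt_counts: alternate allele count at each site (0..n_chrom).
    n_chrom: number of sampled chromosomes (2 x diploid individuals).
    Returns a list where entry i-1 is the number of SNPs whose minor
    allele is carried by i chromosomes, for i = 1..n_chrom // 2.
    """
    counts = Counter()
    for k in alt_counts:
        if not 0 <= k <= n_chrom:
            raise ValueError(f"allele count {k} outside 0..{n_chrom}")
        minor = min(k, n_chrom - k)  # fold: count the rarer allele
        if minor > 0:                # skip monomorphic sites
            counts[minor] += 1
    return [counts[i] for i in range(1, n_chrom // 2 + 1)]

# Toy data: 7 sites scored in 5 diploid individuals (10 chromosomes).
alt_counts = [1, 9, 5, 0, 10, 2, 8]
sfs = folded_sfs(alt_counts, n_chrom=10)
print(sfs)
```

The two monomorphic sites (counts 0 and 10) are dropped here; they belong with the invariant sites, which is why the total number of callable bases has to be supplied separately.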
- New version of fastsimcoal
Note that there is a new version of fastsimcoal (v 2.6). Two major changes include:
1) the possibility to exclude singletons when optimizing demographic parameters (dadi did that already), which should limit the impact of badly called SNPs on inference.
2) the possibility to specify an average inbreeding rate for each population. This should be useful for scientists working on organisms that like selfing…
Liu X, Fu Y-X (2015). Exploring population size changes using SNP frequency spectra. Nat Genet 47: 555–559.
Siewert KM, Voight BF (2017). Detecting Long-Term Balancing Selection Using Allele Frequency Correlation. Mol Biol Evol 34: 2996–3005.