Here you will find a summary of methods for identifying population structure.
|Software||Type of method||Purpose||Specifics||Issues and warnings||Link||Reference|
|SNPRelate||Multivariate analysis||Summarizing variance across loci and visualizing inter-individual genetic distance||Fast. Can use VCF files as an input||Requires careful interpretation (Jombart et al. 2009)||https://bioconductor.org/packages/release/bioc/html/SNPRelate.html||(Zheng et al. , 2012)|
|Eigenstrat/smartpca||Multivariate analysis||Summarizing variance across loci and visualizing inter-individual genetic distance||Fast. Can use VCF files as an input||Requires careful interpretation (Jombart et al. 2009)||https://github.com/DReichLab/EIG/tree/master/EIGENSTRAT||(Price et al. , 2006)|
|DAPC (adegenet)||Multivariate analysis/Clustering||Maximizes divergence between groups identified by PCA||Fast. Less sensitive to HWE assumptions. Claims to be more efficient than Structure||Requires careful interpretation (Jombart et al. 2009)||http://adegenet.r-forge.r-project.org/||(Jombart et al. , 2010)|
|sPCA (adegenet)||Multivariate analysis/Clustering||Spatially explicit model to assess population structure||Spatially explicit and able to detect cryptic structure. Fast.||Does not take into account HW equilibrium or LD||http://adegenet.r-forge.r-project.org/||(Jombart et al. , 2008)|
|BEDASSLE||Differentiation and MCMC model testing||Identifies contribution of environment and geographical distance to population differentiation||Less biased than Mantel tests, provides tools for model testing||Uses population-level data.||https://cran.r-project.org/web/packages/BEDASSLE/index.html||(Bradburd et al. , 2013)|
|GENELAND||Clustering and characterizing admixture||Grouping individuals in spatially consistent clusters maximizing HW equilibrium||Takes into account spatial variation, supposed to detect weak structure, framed in R||Immigrant alleles are assumed to be found only in new immigrants||https://cran.r-project.org/web/packages/Geneland/||(Guillot et al. , 2012)|
|sNMF||Clustering and characterizing admixture||Grouping individuals in clusters maximizing HW equilibrium and LD between loci||Fast (30X faster than ADMIXTURE)||Still slow computation time for large datasets||http://membres-timc.imag.fr/Olivier.Francois/snmf/index.htm||(Frichot et al. , 2014)|
|STRUCTURE||Clustering and characterizing admixture||Grouping individuals in clusters maximizing HW equilibrium and LD between loci||User friendly interface. Bayesian inference.||Slow for large datasets. Requires specific input format||http://pritchardlab.stanford.edu/structure.html||(Pritchard et al. , 2000)|
|FastSTRUCTURE||Clustering and characterizing admixture||Grouping individuals in clusters maximizing HW equilibrium and LD between loci||~100X faster than Structure||Approximate inference of the original Structure model||http://rajanil.github.io/fastStructure/||(Raj et al. , 2014)|
|ADMIXTURE||Clustering and characterizing admixture||Grouping individuals in clusters maximizing HW equilibrium and LD between loci||Maximum Likelihood, claimed to be faster than Structure. Note that it allows mixed ploidy (e.g. individuals that are haploids or diploids at a chromosome/locus depending on their sex can be analyzed jointly).||Often slower than its counterparts||https://www.genetics.ucla.edu/software/admixture/index.html||(Alexander and Novembre, 2009)|
|FineStructure/GlobeTrotter||Clustering and characterizing admixture||Chromosome painting, admixture and clustering||Estimates time since admixture, fast, specific tools for RAD-seq, set of scripts to facilitate analysis||Relies on Structure and fastStructure assumptions. Requires phased data.||http://paintmychromosomes.com/||(Hellenthal et al. , 2014)|
|PCAdmix||Clustering and characterizing admixture||Chromosome painting||Fast, uses HMM to smooth out windows and limit noise due to low confidence ancestry||Requires a priori definition of ancestral populations and phased haplotypes||https://sites.google.com/site/pcadmix/||(Brisbin et al. , 2012)|
|Splitstree||Phylogeny/Network||Network reconstruction and phylogenetic relationships||User friendly interface, proposes a variety of methods for network reconstruction||Mostly descriptive||http://www.splitstree.org/||(Huson and Bryant, 2006)|
|SNPhylo||Phylogeny||Network reconstruction and phylogenetic relationships||Complete pipeline from SNP filtering to tree reconstruction||Should be used on species complexes or divergent populations with little migration||http://chibba.pgml.uga.edu/snphylo/||(Lee et al. , 2014)|
|RAxML||Phylogeny||Network reconstruction and phylogenetic relationships||Maximum Likelihood inference of phylogenetic relationships||Should be used on species complexes or divergent populations with little migration||http://sco.h-its.org/exelixis/web/software/raxml/index.html||(Stamatakis, 2014)|
|BEAST2||Phylogeny||Network reconstruction and phylogenetic relationships||User friendly. Can be used to track changes in effective population sizes (Bayesian Skyline Plots). Possible to estimate divergence times||Slow for large datasets. Requires sequence data that can be produced by, e.g., Stacks for RAD-seq data||http://beast2.org/||(Drummond and Rambaut, 2007; Bouckaert et al. , 2014)|
|PhyML||Phylogeny||Phylogenetic relationships||Maximum Likelihood inference of phylogenetic relationships. An online version is available||Should be used on species complexes or divergent populations with little migration||http://www.atgc-montpellier.fr/phyml/binaries.php||(Guindon et al. , 2010)|
|SNAPP||Phylogeny||Phylogenetic relationships||Handles SNP data||Remains slow for medium to large datasets (>1,000 SNPs)||http://beast2.org/snapp/||(Bryant et al. , 2012)|
|*BEAST||Phylogeny and species tree inference||Divergence time estimation and phylogenetic relationships||Outputs a species tree instead of a concatenated gene tree. Allows for testing consistency between phylogenetic signals at different loci||Slow for large datasets. Requires sequence data. Not suited for situations where gene flow/admixture occurs||http://beast2.org/||(Heled and Drummond, 2010)|
|TREEMIX||Clustering and characterizing admixture||Admixture graph, infers most likely admixture events in a tree||Based on allele frequencies and can be used for pooled data.||Requires multiple runs to properly assess the likelihood of each model||https://bitbucket.org/nygcresearch/treemix/src||(Pickrell and Pritchard, 2012)|
|TWISST||Topology weighting||Chromosome painting, clustering and branching between populations||Retrieves the most likely coalescence pattern between several taxa along the genome. Can be seen as an extension of the ABBA/BABA test||Needs a priori grouping of individuals into taxa. Requires at least 4 taxa. Impractical for more than 6 taxa. Window size must be large enough to include sufficient SNPs to retrieve the correct topology, at the risk of including regions with different histories||https://github.com/simonhmartin/twisst||(Martin and Van Belleghem, 2016)|
|LAMP||Pedigree, Identity by descent/state||Chromosome painting, relatedness||LAMP also allows for association and pedigree analyses||Identifies local ancestry in windows (source of noise), requires phased data||http://lamp.icsi.berkeley.edu/lamp/||(Baran et al. , 2012)|
|PLINK||Pedigree, Identity by descent/state||Estimating inbreeding and relatedness||Allows studying identity by descent and by state. PLINK is a multi-purpose tool, facilitating data analysis within the same software||NA||http://pngu.mgh.harvard.edu/~purcell/plink/||(Purcell et al. , 2007)|
|VCFTOOLS||Pedigree, Identity by descent/state||Estimating inbreeding and relatedness||Computes unadjusted Ajk and kinship coefficient||NA||https://vcftools.github.io/man_latest.html||(Danecek et al. , 2011)|
|KING||Pedigree, Identity by descent/state||Estimating inbreeding and relatedness, multivariate analysis||Mendelian error checking, testing family structure, highly accurate kinship coefficient, association analysis, population structure inference||Kinship coefficient also computed in VCFTOOLS||http://people.virginia.edu/~wc9c/KING/Download.htm||(Manichaikul et al. , 2010)|
|BAYPASS/Bayenv||Variance/covariance matrix||Building a population covariance matrix across population allele frequencies, similar to TREEMIX||Can handle pooled data||Matrices are mostly designed to provide a neutral model for assessing selection, but can be used to infer population structure||http://www1.montpellier.inra.fr/CBGP/software/baypass/ ; https://bitbucket.org/tguenther/bayenv2_public/src||(Günther and Coop, 2013; Gautier, 2015)|
|Arlequin||AMOVA (Analysis of MOlecular VAriance)||Characterizing hierarchical population structure||Arlequin allows for a variety of other analyses of diversity||Requires a priori assignment of individuals to populations, data formatting is required prior to analysis||http://cmpg.unibe.ch/software/arlequin35/Arl35Downloads.html||(Excoffier and Lischer, 2010)|
|POPTREE2||Genetic distance||Visualizing a matrix of pairwise differentiation statistics as a tree||Can be used for pooled datasets, several statistics can be used||Differentiation measures alone do not necessarily retrieve the actual history of populations||http://www.med.kagawa-u.ac.jp/~genomelb/takezaki/poptree2/index.html||(Takezaki et al. , 2010)|
|Stacks||Differentiation/Diversity/Phylogeny||Processing RAD-seq data and facilitating their analysis||Designed for RAD-seq data, variety of output formats for downstream analyses. Allows retrieving DNA sequences for each locus||NA||http://catchenlab.life.illinois.edu/stacks/||(Catchen et al. , 2011)|
|Popoolation/Popoolation2/Popoolation TE||Differentiation/Diversity||Extracting summary statistics from pooled data||Explicitly corrects for sampling bias in pooled data||Mostly limited to a few summary statistics. A pipeline dedicated to TE detection is also available||https://sourceforge.net/p/popoolation/wiki/Main/||(Kofler, Orozco-terWengel, et al. , 2011; Kofler, Pandey, et al. , 2011)|
|POPGenome||Differentiation/Diversity/Recombination||Computing summary statistics based on AFS and LD along genomes||Accepts VCF and GFF/GTF files, efficient and fast. Tests for admixture available (ABBA BABA test). Includes basic coalescence simulations (ms and msms)||Mostly limited to summary statistics (but coalescent simulations are possible). No built-in SNP calling module||https://cran.r-project.org/web/packages/PopGenome/index.html||(Pfeifer et al. , 2014)|
|ANGSD||Differentiation/Diversity/Recombination||Computing summary statistics based on AFS and LD along genomes||Able to process BAM files, built-in procedures for data filtering, admixture analysis||Mostly limited to summary statistics||https://github.com/ANGSD/angsd||(Korneliussen et al. , 2014)|
|Arlequin||Differentiation/Diversity/Recombination||Computing summary statistics based on AFS and LD along genomes||Can output AFS for further analysis in fastsimcoal2||Slower than PopGenome, requires a private format||http://cmpg.unibe.ch/software/arlequin35/Arl35Downloads.html||(Excoffier and Lischer, 2010)|
|VCFTOOLS||Differentiation/Diversity/Recombination||Computing summary statistics based on AFS and LD along genomes||Fast. VCFTOOLS can also be used for SNP filtering||Fewer summary statistics than POPGenome||https://vcftools.github.io/man_latest.html||(Danecek et al. , 2011)|
|LDHat||Recombination||Estimating variation in recombination rates along a genome||Handles unphased and missing data, underlying model can be used for organisms such as viruses or bacteria||Limited to 300 sequences, private format, model for recombination hotspots based on human data||http://ldhat.sourceforge.net/||(McVean et al. , 2002)|
|LDHot||Recombination||Identifying recombination hotspots||Specifically designed for detecting recombination hotspots||Requires data to be phased, working with LDHat||https://github.com/auton1/LDhot||(Myers, 2005)|
|Kimtree||Genetic distance||Estimating divergence time between populations and testing for topologies||The method is conditional on a prior topology provided by the user. It computes DIC for a given topology, which allows comparing candidate topologies and selecting the best one.||Times are given on a diffusion time scale, and can be converted to demographic times using independent estimates of Ne.||http://www1.montpellier.inra.fr/CBGP/software/kimtree/index.html||(Gautier and Vitalis, 2013)|
|npstat||Differentiation/Diversity||Extracting summary statistics from pooled data||Explicitly corrects for sampling bias in pooled data. Allows computing tests using an outgroup (MK test, Fay and Wu's H) and characterizing coding mutations.||Mostly limited to summary statistics, but more complete than Popoolation.||https://github.com/lucaferretti/npstat||(Ferretti et al. 2013)|
|SVDQuartets||Phylogeny||Builds species trees using short non-recombining sequences||Coalescence-based. Suitable for short loci (e.g. RAD-seq and GBS)||See ASTRAL-2 and Chou et al. 2015||http://www.stat.osu.edu/~lkubatko/software/SVDquartets/ ||(Chifman and Kubatko, 2014)|
|ASTRAL-2||Phylogeny||Builds species trees using short non-recombining sequences||Coalescence-based. Suitable for short loci (e.g. RAD-seq and GBS)||More reliable under high incomplete lineage sorting than SVDQuartets and NJst (Chou et al. 2015)||https://github.com/smirarab/ASTRAL||(Mirarab and Warnow, 2015)|
|NJst (in phybase)||Phylogeny||Builds species trees using short non-recombining sequences||Coalescence-based. Suitable for short loci (e.g. RAD-seq and GBS) ||See ASTRAL-2 and Chou et al. 2015||https://code.google.com/archive/p/phybase/downloads ||(Liu and Yu, 2011)|
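Several entries above refer to the ABBA/BABA test (available in POPGenome, and extended by TWISST). As a rough illustration of what that test computes, here is a minimal Python sketch of Patterson's D from per-site derived-allele frequencies in three populations (P1, P2, P3) plus an outgroup. The function name `patterson_d` and its arguments are hypothetical and do not reproduce the interface of any package listed in the table. The sketch assumes biallelic sites that have already been polarized, and by default it assumes the outgroup is fixed for the ancestral allele.

```python
def patterson_d(p1, p2, p3, p_out=None):
    """Patterson's D from per-site derived-allele frequencies.

    p1, p2, p3: sequences of derived-allele frequencies in P1, P2 and P3.
    p_out: derived-allele frequencies in the outgroup. If omitted, the
           outgroup is ASSUMED to be fixed for the ancestral allele (0.0).
    """
    if p_out is None:
        p_out = [0.0] * len(p1)
    if not (len(p1) == len(p2) == len(p3) == len(p_out)):
        raise ValueError("all populations need the same number of sites")

    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, p_out):
        # ABBA: derived allele shared by P2 and P3, ancestral in P1 and outgroup
        abba += (1.0 - f1) * f2 * f3 * (1.0 - fo)
        # BABA: derived allele shared by P1 and P3, ancestral in P2 and outgroup
        baba += f1 * (1.0 - f2) * f3 * (1.0 - fo)

    total = abba + baba
    if total == 0.0:
        raise ValueError("no informative sites")
    return (abba - baba) / total


# Toy frequencies, made up for illustration only
d = patterson_d(
    p1=[0.0, 0.1, 0.0],
    p2=[1.0, 0.2, 0.5],
    p3=[1.0, 0.9, 1.0],
)
print(f"D = {d:.3f}")
```

A positive D indicates an excess of derived alleles shared between P2 and P3, a negative D an excess shared between P1 and P3. In practice significance is usually assessed with a block jackknife along the genome, which this sketch does not include.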
Alexander DH, Novembre J (2009). Fast Model-Based Estimation of Ancestry in Unrelated Individuals. Genome Res 19: 1655–1664.
Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, et al. (2012). Fast and accurate inference of local ancestry in Latino populations. Bioinformatics 28: 1359–1367.
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, et al. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Comput Biol 10: 1–6.
Bradburd GS, Ralph PL, Coop GM (2013). Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution 67: 3258–3273.
Brisbin A, Bryc K, Byrnes J, Zakharia F, Omberg L, Degenhardt J, et al. (2012). PCAdmix: Principal Components-Based Assignment of Ancestry along Each Chromosome in Individuals with Admixed Ancestry from Two or More Populations. Hum Biol 84: 343–364.
Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, Roychoudhury A (2012). Inferring species trees directly from biallelic genetic markers: Bypassing gene trees in a full coalescent analysis. Mol Biol Evol 29: 1917–1932.
Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011). Stacks: building and genotyping Loci de novo from short-read sequences. G3 (Bethesda) 1: 171–82.
Chifman J, Kubatko L (2014). Quartet inference from SNP data under the coalescent model. Bioinformatics 30: 3317–3324.
Chou J, Gupta A, Yaduvanshi S, Davidson R, Nute M, Mirarab S, et al. (2015). A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genomics 16: S2.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. (2011). The variant call format and VCFtools. Bioinformatics 27: 2156–2158.
Drummond AJ, Rambaut A (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7: 214.
Excoffier L, Lischer HEL (2010). Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564–7.
Ferretti L, Ramos-Onsins SE, Perez-Enciso M (2013). Population genomics from pool sequencing. Mol Ecol 22: 5561–5576.
Frichot E, Mathieu F, Trouillon T, Bouchard G, François O (2014). Fast and efficient estimation of individual ancestry coefficients. Genetics 196: 973–983.
Gautier M, Vitalis R (2013). Inferring population histories using genome-wide allele frequency data. Mol Biol Evol 30: 654–68.
Gautier M (2015). Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates. Genetics 201: 1555–1579.
Guillot G, Renaud S, Ledevin R, Michaux J, Claude J (2012). A unifying model for the analysis of phenotypic, genetic, and geographic data. Syst Biol 61: 897–911.
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 59: 307–321.
Günther T, Coop G (2013). Robust identification of local adaptation from allele frequencies. Genetics 195: 205–220.
Heled J, Drummond AJ (2010). Bayesian Inference of Species Trees from Multilocus Data. Mol Biol Evol 27: 570–580.
Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, et al. (2014). A Genetic Atlas of Human Admixture History. Science 343: 747–751.
Huson DH, Bryant D (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23: 254–267.
Jombart T, Devillard S, Balloux F (2010). Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet 11: 94.
Jombart T, Devillard S, Dufour A-B, Pontier D (2008). Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity 101: 92–103.
Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A, et al. (2011). PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6: e15925.
Kofler R, Pandey RV, Schlötterer C (2011). PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27: 3435–6.
Korneliussen TS, Albrechtsen A, Nielsen R (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15: 356.
Lee T-H, Guo H, Wang X, Kim C, Paterson AH (2014). SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics 15: 162.
Liu L, Yu L (2011). Estimating species trees from unrooted gene trees. Syst Biol 60: 661–667.
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M (2010). Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873.
Martin SH, Van Belleghem SM (2016). Exploring evolutionary relationships across the genome using topology weighting. bioRxiv: 69112.
McVean G, Awadalla P, Fearnhead P (2002). A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160: 1231–1241.
Mirarab S, Warnow T (2015). ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31: i44–i52.
Myers S (2005). A Fine-Scale Map of Recombination Rates and Hotspots Across the Human Genome. Science 310: 321–324.
Pfeifer B, Wittelsburger U, Ramos-Onsins SE, Lercher MJ (2014). PopGenome: An efficient swiss army knife for population genomic analyses in R. Mol Biol Evol 31: 1929–1936.
Pickrell JK, Pritchard JK (2012). Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet 8: e1002967.
Price A, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–9.
Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. (2007). PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81: 559–575.
Raj A, Stephens M, Pritchard JK (2014). FastSTRUCTURE: Variational inference of population structure in large SNP data sets. Genetics 197: 573–589.
Stamatakis A (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313.
Takezaki N, Nei M, Tamura K (2010). POPTREE2: Software for constructing population trees from allele frequency data and computing other population statistics with windows interface. Mol Biol Evol 27: 747–752.
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28: 3326–3328.