3 Identifying Population Structure

Session Presenters

Required packages

As always, we need to have dartRverse installed and loaded. In addition, you need to have dartR.popgen installed.

library(dartRverse)

Uncovering Population Structure

Introduction

Why population structure matters

What is a “population” in population genetics?
Genetic structure defined – how allele frequencies vary among groups.
Key drivers: effective population size, gene flow/immigration, natural selection, drift.
Isolation-by-distance & sampling design – avoiding spurious structure.
Why we care: management units, inbreeding, local adaptation, introgression.
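To make the idea of "allele frequencies varying among groups" concrete, here is a minimal base R sketch. Everything in it is an assumption for illustration only: the two populations, their allele frequencies (0.2 and 0.8), the sample sizes, and the simple uncorrected Fst estimate. None of it comes from the workshop data.

```r
# Minimal sketch: allele frequencies differ among groups (simulated data).
set.seed(42)

n_ind  <- 50    # individuals per population (assumed)
p_pop1 <- 0.2   # assumed alternate-allele frequency in pop1
p_pop2 <- 0.8   # assumed alternate-allele frequency in pop2

# Genotypes coded as the number of alternate alleles (0, 1 or 2)
geno <- c(rbinom(n_ind, size = 2, prob = p_pop1),
          rbinom(n_ind, size = 2, prob = p_pop2))
pop  <- factor(rep(c("pop1", "pop2"), each = n_ind))

# Allele frequency per population = mean genotype / 2
freq_by_pop <- tapply(geno, pop, mean) / 2
freq_total  <- mean(geno) / 2

# Expected heterozygosity within populations (Hs) and overall (Ht)
Hs  <- mean(2 * freq_by_pop * (1 - freq_by_pop))
Ht  <- 2 * freq_total * (1 - freq_total)
Fst <- (Ht - Hs) / Ht   # simple uncorrected estimate for a single locus

print(round(freq_by_pop, 3))
print(round(Fst, 3))
```

The larger the difference in allele frequencies between the groups, the larger Fst becomes. With identical frequencies it would be close to zero.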

Analytical toolkit

Approach	Purpose	dartR entry point
PCoA / PCA	Quick multivariate overview	`gl.pcoa()`, `gl.pcoa.plot()`
STRUCTURE	Bayesian clustering	`gl.run.structure()`, `gl.plot.structure()`, `gl.map.structure()`, `gl.read.structure()`
fastSTRUCTURE	Fast variational STRUCTURE (Mac/Linux)	`gl.run.faststructure()`, `gl.plot.faststructure()`
sNMF	Sparse non-negative matrix factorisation	`gl.run.snmf()`, `gl.plot.snmf()`, `gl.map.snmf()`
POPCLUSTER	Ultra-fast admixture inference	`gl.run.popcluster()`, `gl.plot.popcluster()`, `gl.map.popcluster()`

Example

To help us understand our analytical toolkit better let’s start with a simple example using a simulated dataset.

The dataset contains 10 populations of 30 individuals each and 1000 loci and is part of the dartRverse package.

To get an overview of the dataset, we use the function gl.map.interactive, which plots the individuals on a map. Note that the genlight/dartR object needs valid latitude and longitude coordinates for each individual for this to work.

table(pop(possums.gl)) #check the individuals and the populations


 A  B  C  D  E  F  G  H  I  J 
30 30 30 30 30 30 30 30 30 30

gl.map.interactive(possums.gl)

Starting gl.map.interactive 
  Processing genlight object with SNP data
Completed: gl.map.interactive

The populations are fairly independent but are linked by some immigration, a typical metapopulation scenario. The populations are named A to J and roughly follow an isolation-by-distance pattern: populations next to each other (e.g. B and C) are fairly well mixed, while populations further away from the rest (e.g. D) are more isolated.
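The isolation-by-distance pattern described above can be sketched with a toy simulation in base R. This is an illustration under stated assumptions, not how possums.gl was simulated: six hypothetical populations sit along a line, and allele frequencies drift a little between neighbours. The simple correlation used here is not a formal Mantel test.

```r
# Toy isolation-by-distance sketch (all values are assumptions).
set.seed(3)

n_pop  <- 6
n_loci <- 300
coords <- 1:n_pop   # hypothetical positions of populations along a line

# Allele frequencies change by a small random step from one population
# to its neighbour, so nearby populations stay similar
freq <- matrix(NA_real_, nrow = n_pop, ncol = n_loci)
freq[1, ] <- runif(n_loci, 0.3, 0.7)
for (i in 2:n_pop) {
  step <- rnorm(n_loci, sd = 0.05)
  freq[i, ] <- pmin(pmax(freq[i - 1, ] + step, 0.01), 0.99)
}

geo_dist <- dist(coords)   # pairwise geographic distance
gen_dist <- dist(freq)     # pairwise Euclidean distance of allele frequencies

# Correlation between the two sets of pairwise distances
r_ibd <- cor(as.vector(geo_dist), as.vector(gen_dist))
print(round(r_ibd, 2))
```

A strong positive correlation means that populations further apart are also genetically more different, which is the pattern to keep in mind when interpreting the clustering results later on.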

PCA

Principal Component Analysis (PCA) is a powerful statistical technique used for dimensionality reduction, data visualization, and feature extraction. In many real-world datasets, like our SNP datasets, data can have a very high number of features or dimensions. PCA helps by transforming the data into a new coordinate system where most of the variability in the data can be captured in fewer dimensions.

At its core, PCA identifies the directions (called principal components) along which the variation in the data is highest. These directions are orthogonal (perpendicular) to each other and are ranked by how much variance they capture from the original data. By projecting data onto the top few principal components, we can often retain the most important information while discarding noise or less useful details.
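The following base R sketch shows the same idea on a small simulated genotype matrix using prcomp. It is not what gl.pcoa does internally. The two populations, the number of loci and the allele frequencies are assumptions chosen only for illustration.

```r
# Minimal PCA sketch on simulated SNP genotypes (0/1/2 coding).
set.seed(1)

n_per_pop <- 30
n_loci    <- 200

# Two hypothetical populations with somewhat different allele frequencies
p1 <- runif(n_loci, 0.1, 0.9)
p2 <- pmin(pmax(p1 + rnorm(n_loci, sd = 0.2), 0.05), 0.95)

# Simulate genotypes: rows = individuals, columns = loci
sim_pop <- function(p, n) {
  sapply(p, function(pl) rbinom(n, size = 2, prob = pl))
}
geno <- rbind(sim_pop(p1, n_per_pop), sim_pop(p2, n_per_pop))
pop  <- rep(c("pop1", "pop2"), each = n_per_pop)

# Drop monomorphic loci, which carry no variance
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

# Principal components are ordered by the variance they capture
pca <- prcomp(geno, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(100 * var_explained[1:3], 1))   # percent variance, axes 1 to 3
print(tapply(pca$x[, 1], pop, mean))        # mean PC1 score per population
```

Plotting `pca$x[, 1]` against `pca$x[, 2]`, coloured by `pop`, gives the kind of scatter plot that gl.pcoa.plot produces for a genlight object.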

Great paper on PCA: Distances and their visualization in studies of spatial-temporal genetic variation using single nucleotide polymorphisms (SNPs)

# Undertake a PCA on the raw data
pc <- gl.pcoa(possums.gl, verbose = 3)

Starting gl.pcoa 
  Processing genlight object with SNP data
  Warning: Number of loci is less than the number of individuals to be represented
  Performing a PCA, individuals as entities, loci as attributes, SNP genotype as state

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

  Ordination yielded 9 informative dimensions( broken-stick criterion) from 299 original dimensions
    PCA Axis 1 explains 10.5 % of the total variance
    PCA Axis 1 and 2 combined explain 19.1 % of the total variance
    PCA Axis 1-3 combined explain 26.8 % of the total variance
Starting gl.colors 
Selected color type 2 
Completed: gl.colors

Completed: gl.pcoa

# Plot the first two dimensions of the PCA
pc_a1a2 <- gl.pcoa.plot(glPca = pc, x = possums.gl)

# Plot the first and third dimensions of the PCA
pc_a1a3 <- gl.pcoa.plot(glPca = pc, x = possums.gl, xaxis = 1, yaxis = 3)

# Plot the first three dimensions of the PCA
pc_a1a2a3 <- gl.pcoa.plot(glPca = pc, x = possums.gl, xaxis = 1, yaxis = 2, zaxis = 3)

Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors

Select one cluster

# Select only the data from one cluster in the primary PCA
temp <- gl.drop.pop(x = possums.gl, pop.list = c('D', 'A', 'E', 'F', 'H'))

Starting gl.drop.pop 
  Processing genlight object with SNP data
  Checking for presence of nominated populations, deleting them
  Warning: Resultant dataset may contain monomorphic loci
  Locus metrics not recalculated
Completed: gl.drop.pop

# Run a secondary PCA on the retained populations
pc <- gl.pcoa(temp, verbose = 3)

Starting gl.pcoa 
  Processing genlight object with SNP data
  Performing a PCA, individuals as entities, loci as attributes, SNP genotype as state

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

  Ordination yielded 13 informative dimensions( broken-stick criterion) from 149 original dimensions
    PCA Axis 1 explains 12.8 % of the total variance
    PCA Axis 1 and 2 combined explain 20.9 % of the total variance
    PCA Axis 1-3 combined explain 25.7 % of the total variance
Starting gl.colors 
Selected color type 2 
Completed: gl.colors

Completed: gl.pcoa

# Plot the first two dimensions of the secondary PCA
pc_plot <- gl.pcoa.plot(glPca = pc, x = temp)

Starting gl.pcoa.plot 
  Processing an ordination file (glPca)
  Processing genlight object with SNP data
  Plotting populations in a space defined by the SNPs
  Preparing plot .... please wait

Completed: gl.pcoa.plot

Canberra grassland earless dragon

The Canberra Grassland Earless Dragon (Tympanocryptis lineata) is a small lizard that is usually less than 150 mm long and weighs 5–9 g. Each one has distinctive markings on its back that can be used to identify individuals.

Canberra Grassland Earless Dragons are grassland specialists, preferring natural temperate grasslands. They are found in ACT grasslands and adjacent grasslands in NSW.

Let's take a look at the PCA of the Canberra Grassland Earless Dragon (CGED).

Some grasslands seem to be more isolated than others.

What can go wrong

PCA is a hypothesis-generating tool, not a tool for making definitive decisions about the structure of populations.

Missing data causes distortion, which can lead to misinterpretation.

dartR, which uses the adegenet package for its PCA, fills missing data with the global average.

You can choose alternative methods of filling in the missing data prior to running your PCA using the gl.impute function.
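To show what filling with an average means in practice, here is a small base R sketch on a toy genotype matrix. It assumes that the "global average" is the mean genotype of each locus across all individuals. The helper name `impute_locus_mean` is hypothetical and is not part of dartR or adegenet.

```r
# Toy genotype matrix: rows = individuals, columns = loci, NA = missing call
geno <- matrix(c(0,  1,  2, NA,
                 1, NA,  2,  0,
                 2,  1, NA,  0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

# Hypothetical helper: replace each NA with the mean of its locus (column),
# computed over all individuals regardless of population
impute_locus_mean <- function(x) {
  loc_means <- colMeans(x, na.rm = TRUE)
  miss <- which(is.na(x), arr.ind = TRUE)
  x[miss] <- loc_means[miss[, "col"]]
  x
}

geno_imputed <- impute_locus_mean(geno)
print(geno_imputed)
```

Because every missing value is pulled towards the overall mean, individuals with a lot of missing data tend to drift towards the centre of the ordination, which is one way the distortion mentioned above can arise.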

Structural variants, such as inversions, can also show up in a PCA.

PCA results depend on sample size, although this mainly affects the top two dimensions rather than all the informative dimensions.

Structure and FastStructure (Bayesian clustering models)

STRUCTURE attempts to find the number of populations or sources (K) at which agreement with population genetic expectations (i.e. Hardy–Weinberg equilibrium within populations and linkage equilibrium between loci) is maximized.
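Why Hardy–Weinberg equilibrium is informative about K can be illustrated with a small base R sketch. It simulates one locus in two hypothetical populations that are each in equilibrium and then pools them. The allele frequencies (0.1 and 0.9) and sample sizes are assumptions for illustration, and the helper `het_summary` is not a dartR function.

```r
# Sketch: pooling differentiated populations creates a heterozygote deficit.
set.seed(7)

n  <- 1000
g1 <- rbinom(n, size = 2, prob = 0.1)   # population 1, in HWE by construction
g2 <- rbinom(n, size = 2, prob = 0.9)   # population 2, in HWE by construction

# Observed vs expected heterozygosity and the inbreeding coefficient Fis
het_summary <- function(g) {
  p  <- mean(g) / 2
  He <- 2 * p * (1 - p)   # expected under Hardy-Weinberg
  Ho <- mean(g == 1)      # observed proportion of heterozygotes
  c(Ho = Ho, He = He, Fis = 1 - Ho / He)
}

res <- rbind(pop1   = het_summary(g1),
             pop2   = het_summary(g2),
             pooled = het_summary(c(g1, g2)))
print(round(res, 3))
```

Within each population, observed and expected heterozygosity agree closely. Treating both as one group (K = 1) produces a strong deficit of heterozygotes, which splitting the sample into two clusters removes.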

Admixture Model

Definition: The admixture model assumes that individuals can have ancestry from multiple populations. This means that the genetic makeup of an individual can be a mixture of two or more populations. This model is particularly useful for analyzing genetic data from populations that are known to have mixed or that are connected by gene flow.
Application: It is applied when there is historical or recent admixture between populations, and it allows for the estimation of individual ancestry proportions from each of the inferred clusters. For example, an individual might be 50% from population A, 30% from population B, and 20% from population C under the admixture model.
Utility: The admixture model can uncover complex patterns of genetic structure that are not apparent under the assumption of discrete, non-overlapping populations.

No-Admixture Model

Definition: The no-admixture model assumes that individuals have ancestry from only one population. This model is particularly useful for analyzing genetic data from populations that are known to be isolated from one another.
Application: This model is used in situations where populations are relatively well-defined and isolated, with little to no gene flow between them. It simplifies the analysis by considering that an individual’s entire genetic information originates from one of the K clusters without any mixture.
Utility: The no-admixture model is useful for identifying distinct populations and their members, especially in cases where populations are clearly separated geographically or temporally.

To run STRUCTURE from within R, you need to download the non-GUI executable (the version without a front end) for your operating system (e.g. Windows, Mac or Linux). You can download STRUCTURE for your OS from http://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html.

The possible arguments are listed below:

parameter	description
k.range	vector of values for maxpops in multiple runs. If set to NULL, a single STRUCTURE run is conducted with maxpops groups. If specified, do not also specify maxpops.
num.k.rep	number of replicates for each value in k.range.
label	label to use for input and output files
delete.files	logical. Delete all files when STRUCTURE is finished?
exec	name of executable for STRUCTURE. Defaults to “structure”.
burnin	number of iterations for MCMC burnin.
numreps	number of MCMC replicates.
noadmix	logical. No admixture?
freqscorr	logical. Correlated frequencies?
randomize	randomize.
pop.prior	a character specifying which population prior model to use: “locprior” or “usepopinfo”.
locpriorinit	parameterizes locprior parameter r - how informative the populations are. Only used when pop.prior = “locprior”.
maxlocprior	specifies range of locprior parameter r. Only used when pop.prior = “locprior”.
gensback	integer defining the number of generations back to test for immigrant ancestry. Only used when pop.prior = “usepopinfo”.
migrprior	numeric between 0 and 1 listing migration prior. Only used when pop.prior = “usepopinfo”.
popflag	a vector of integers (0, 1) or logicals identifying whether or not to use strata information. Only used when pop.prior = “usepopinfo”.
pops	vector of population labels to be used in place of numbers in STRUCTURE file.

Running STRUCTURE

structure_file <- ifelse('structure.exe' %in% list.files('./binaries/'), 
                         './binaries/structure.exe', './binaries/structure')
srnoad <- gl.run.structure(possums.gl, k.range = 2:7, num.k.rep = 2, 
                           exec = structure_file,plot.out = FALSE,
                           burnin=500, numreps=1000, 
                           noadmix=TRUE)

Starting gl.run.structure 
  Processing genlight object with SNP data

Completed: gl.run.structure

Structure Results

Now that we have that out of the way, let's see how to interpret the results of the STRUCTURE run. To really trust our results, we would want to run gl.run.structure with a longer burn-in and more replicates, more like burnin=50000 and numreps=100000. This takes a while, so we will not be doing that today.

Evanno Plots

The Evanno method is a statistical approach used to determine the most likely number of genetic clusters (K) present in a dataset analyzed by STRUCTURE software. STRUCTURE is a computational tool used for inferring population structure using genetic data. Identifying the correct number of clusters (K) is crucial for accurately interpreting genetic data in the context of population structure, evolutionary biology, and conservation genetics. The Evanno method specifically addresses the challenge of choosing the optimal K by analyzing the rate of change in the likelihood of data between successive K values, rather than just relying on the maximum likelihood. This is done through the calculation of ΔK, a quantity based on the second order rate of change of the likelihood function with respect to K. The method suggests that the value of K corresponding to the highest ΔK should be considered the most likely number of clusters present in the dataset.

The method is implemented in the gl.evanno function.

ev <- gl.evanno(srnoad)
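The ΔK calculation itself is short enough to sketch in base R. The log-likelihood values below are made up for illustration and are not output from the run above. The helper `evanno_delta_k` is hypothetical and works from the mean log-likelihood per K, which may differ in detail from what gl.evanno does internally.

```r
# Hypothetical log-likelihoods: rows = replicate runs, columns = K
lnPD <- matrix(c(-5200, -4800, -4300, -4250, -4230, -4225,
                 -5210, -4790, -4305, -4240, -4235, -4215,
                 -5190, -4810, -4295, -4260, -4220, -4230),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("rep", 1:3), paste0("K", 2:7)))

# Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)),
# using the mean log-likelihood across replicates for L
evanno_delta_k <- function(lnPD) {
  mean_L <- colMeans(lnPD)
  sd_L   <- apply(lnPD, 2, sd)
  n_k    <- length(mean_L)
  interior <- 2:(n_k - 1)   # Delta K is undefined for the smallest and largest K
  second_diff <- abs(diff(mean_L, differences = 2))
  delta_k <- second_diff / sd_L[interior]
  names(delta_k) <- names(mean_L)[interior]
  delta_k
}

delta_k <- evanno_delta_k(lnPD)
print(round(delta_k, 2))
print(names(which.max(delta_k)))   # K with the highest Delta K
```

Note that ΔK cannot be computed for the smallest and largest K tested, and that it needs several replicates per K because it divides by the standard deviation across replicates.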

Plotting the results (Q matrix)

The Q matrix represents the estimated ancestry proportions of individuals across different inferred genetic clusters. STRUCTURE aims to identify K clusters (populations) that best explain the patterns of genetic variation observed in the data, with K either being predefined by the user or determined using methods like the Evanno method.

The Q matrix is essentially a matrix where each row corresponds to an individual in the dataset, and each column represents one of the K inferred genetic clusters. The entries in the matrix are the estimated proportions (ranging from 0 to 1) of each individual’s genome that originated from each cluster. The sum of an individual’s ancestry proportions across all K clusters equals 1.

The values in the Q matrix can be interpreted as the fraction of an individual’s ancestry that comes from each of the K clusters. The Q matrix is often visualized using bar plots or stacked bar graphs, where each individual’s ancestry proportions are shown as segments of a bar colored differently for each cluster.
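As a small illustration of this structure, the base R sketch below builds a hypothetical Q matrix for five individuals and K = 3. The values and the 0.8 threshold used to flag admixed individuals are arbitrary assumptions and are not taken from the run above.

```r
# Hypothetical Q matrix: rows = individuals, columns = inferred clusters
Q <- matrix(c(0.98, 0.01, 0.01,
              0.50, 0.30, 0.20,
              0.05, 0.90, 0.05,
              0.00, 0.10, 0.90,
              0.34, 0.33, 0.33),
            ncol = 3, byrow = TRUE,
            dimnames = list(paste0("ind", 1:5), paste0("cluster", 1:3)))

# Each individual's ancestry proportions must sum to 1
stopifnot(all(abs(rowSums(Q) - 1) < 1e-8))

# Cluster with the largest ancestry proportion for each individual
assigned <- colnames(Q)[max.col(Q, ties.method = "first")]

# Flag individuals whose largest proportion is below an arbitrary threshold
max_q   <- apply(Q, 1, max)
admixed <- max_q < 0.8

summary_tab <- data.frame(assigned = assigned, max_q = max_q, admixed = admixed)
print(summary_tab)
```

Calling `barplot(t(Q), col = 1:3)` on this matrix draws the familiar stacked bar plot, with one bar per individual.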

To get a plot for a certain level you need to specify K or at least a range of K.

qmatnoad <- gl.plot.structure(srnoad, K=3:5)

Starting gl.plot.structure

Completed: gl.plot.structure

head(qmatnoad[[1]])

    Label cluster1 cluster2 cluster3      K orig.pop   ord
   <char>    <num>    <num>    <num> <char>   <fctr> <int>
1:      1        0        1        0      3        A     1
2:     10        0        1        0      3        A     2
3:    100        1        0        0      3        D    91
4:    101        1        0        0      3        D    92
5:    102        1        0        0      3        D    93
6:    103        1        0        0      3        D    94

A “spatial” structure plot

gm <- gl.map.structure(qmat = qmatnoad, x = possums.gl,K=5, scalex=1, scaley=0.5 )

Exercise: Study the results using different Ks and with/without admixture

The above run was a STRUCTURE run using the no-admixture model (noadmix=TRUE) for a range of K=2:7. Study the results by plotting them for various values of K, then rerun with noadmix=FALSE to compare against the admixture model. Try to interpret the results in the light of the simulated population structure.

How about trying it on your own data or the example data provided?

Running fastStructure

Faststructure

fastStructure is a faster implementation of the STRUCTURE algorithm. Be aware that, although it is named fast"Structure", it is a fairly different implementation of the original approach, so the results might differ from those of the original STRUCTURE. The method is based on a variational Bayesian framework and is designed to be faster and more scalable than the original STRUCTURE software. It is particularly useful for analyzing large datasets with many individuals and/or many SNPs (>5000). One of the most important differences is that there is no no-admixture model in fastStructure, but you can run two models: one that allows for "complex" situations (logistic prior) and one for situations where the ancestry is more evenly distributed (simple prior). The way K is identified also differs between the methods. We will run the previous examples with both settings and compare the results.

The method is implemented in the gl.run.faststructure function. Unfortunately, to my knowledge no one has compiled fastStructure for Windows, so it is only available for Linux and Mac. We also need to have PLINK installed, because fastStructure requires its input in PLINK format.

Faststructure simple prior

platypus.gl

 ********************
 *** DARTR OBJECT ***
 ********************

 ** 81 genotypes,  1,000 SNPs , size: 7.4 Mb

    missing data: 5902 (=7.29 %) scored as NA

 ** Genetic data
   @gen: list of 81 SNPbin
   @ploidy: ploidy of each individual  (range: 2-2)

 ** Additional data
   @ind.names:  81 individual labels
   @loc.names:  1000 locus labels
   @loc.all:  1000 allele labels
   @position: integer storing positions of the SNPs [within 69 base sequence]
   @pop: population of each individual (group size range: 17-41)
   @other: a list containing: loc.metrics, ind.metrics, latlon, loc.metrics.flags, verbose, history 
    @other$ind.metrics: id, pop, lat, lon, Microchip, Weight, Sex, AgeClass, concentration, PlateID, Row, Column, order_dart, id_b, service, plate_location, sex 
    @other$loc.metrics: AlleleID, CloneID, AlleleSequence, TrimmedSequence, Chrom_Platypus_Chrom_NCBIv1, ChromPos_Platypus_Chrom_NCBIv1, AlnCnt_Platypus_Chrom_NCBIv1, AlnEvalue_Platypus_Chrom_NCBIv1, SNP, SnpPosition, CallRate, OneRatioRef, OneRatioSnp, FreqHomRef, FreqHomSnp, FreqHets, PICRef, PICSnp, AvgPIC, AvgCountRef, AvgCountSnp, RepAvg, clone, uid, rdepth, monomorphs, maf, OneRatio, PIC 
   @other$latlon[g]: coordinates for all individuals are attached

gl.map.interactive(platypus.gl)

Starting gl.map.interactive 
  Processing genlight object with SNP data
  Warning: data include loci that are scored NA across all individuals.
  Consider filtering using gl <- gl.filter.allna(gl)
Completed: gl.map.interactive

# my_fast <- gl.run.faststructure(platypus.gl,  
#                                 k.range = 2:4,
#                                 num.k.rep = 1,
#                                 exec = "./binaries/fastStructure",
#                                 exec.plink = "./binaries/", output = tempdir())
# 
# gl.plot.faststructure(sr = my_fast,k.range = 3, border_ind = 0)

This plot came from running gl.run.faststructure with only one replicate. Try increasing the number of replicates (num.k.rep = x) and see how the plot changes.

Here we can check the marginal likelihoods for the different K values. The recommended K is then the one with the highest marginal likelihood at the lowest K possible. So here we would decide on K=4. As before we can plot the Q matrix and the spatial structure plot.

Running snmf

my_snmf <- gl.run.snmf(possums.gl, minK = 2, maxK = 7, rep = 2, regularization = 10)

gl.plot.snmf(snmf_result = my_snmf,plot.K = 3, border_ind = 0)

Popcluster

my_popcluster <- gl.run.popcluster(x = possums.gl, minK = 2, maxK = 7, rep = 2, popcluster.path = './binaries/')

Starting gl.run.popcluster 
  Processing genlight object with SNP data

Completed: gl.run.popcluster

my_plot_popcluster <- gl.plot.popcluster(my_popcluster,plot.K = 3)

Starting gl.plot.popcluster

Completed: gl.plot.popcluster

gl.map.popcluster(x = possums.gl, qmat =  my_plot_popcluster)

Joining with `by = join_by(Label)`

$Q_name
$Q_name$A
   Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
9      9     9     9         0.0       1 0.811 0.143 0.046   A        A
3      3     3     3         0.0       1 0.840 0.000 0.160   A        A
5      5     5     5         0.0       1 0.844 0.000 0.156   A        A
8      8     8     8         0.0       1 0.845 0.051 0.104   A        A
4      4     4     4         0.0       1 0.887 0.000 0.113   A        A
2      2     2     2         0.0       1 0.896 0.104 0.000   A        A
17    17    17    17         0.0       1 0.901 0.099 0.000   A        A
19    19    19    19         0.0       1 0.908 0.059 0.033   A        A
27    27    27    27         0.0       1 0.931 0.000 0.069   A        A
7      7     7     7         0.0       1 0.949 0.000 0.051   A        A
1      1     1     1         0.0       1 0.970 0.030 0.000   A        A
25    25    25    25         0.0       1 0.974 0.026 0.000   A        A
15    15    15    15         0.0       1 0.979 0.000 0.021   A        A
23    23    23    23         0.0       1 0.979 0.000 0.021   A        A
16    16    16    16         0.0       1 0.998 0.001 0.001   A        A
6      6     6     6         0.0       1 0.999 0.000 0.001   A        A
13    13    13    13         0.0       1 0.999 0.000 0.001   A        A
20    20    20    20         0.0       1 0.999 0.000 0.001   A        A
28    28    28    28         0.0       1 0.999 0.000 0.001   A        A
10    10    10    10         0.0       1 0.999 0.001 0.000   A        A
11    11    11    11         0.0       1 0.999 0.001 0.000   A        A
12    12    12    12         0.0       1 0.999 0.001 0.000   A        A
14    14    14    14         0.0       1 0.999 0.001 0.000   A        A
18    18    18    18         0.0       1 0.999 0.001 0.000   A        A
21    21    21    21         0.0       1 0.999 0.001 0.000   A        A
22    22    22    22         0.0       1 0.999 0.001 0.000   A        A
24    24    24    24         0.0       1 0.999 0.001 0.000   A        A
26    26    26    26         0.0       1 0.999 0.001 0.000   A        A
29    29    29    29         0.0       1 0.999 0.001 0.000   A        A
30    30    30    30         0.0       1 0.999 0.001 0.000   A        A

$Q_name$B
   Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
48    48    48    48         0.0       2 0.000 0.762 0.238   B        B
38    38    38    38         0.0       2 0.000 0.799 0.201   B        B
31    31    31    31         0.0       2 0.000 0.822 0.178   B        B
56    56    56    56         0.0       2 0.000 0.845 0.155   B        B
45    45    45    45         0.0       2 0.000 0.848 0.152   B        B
54    54    54    54         0.0       2 0.000 0.848 0.152   B        B
59    59    59    59         0.0       2 0.000 0.892 0.108   B        B
57    57    57    57         0.0       2 0.000 0.894 0.106   B        B
32    32    32    32         0.0       2 0.000 0.903 0.097   B        B
34    34    34    34         0.0       2 0.000 0.915 0.085   B        B
50    50    50    50         0.0       2 0.000 0.926 0.074   B        B
39    39    39    39         0.0       2 0.000 0.930 0.070   B        B
36    36    36    36         0.0       2 0.000 0.999 0.001   B        B
37    37    37    37         0.0       2 0.000 0.999 0.001   B        B
51    51    51    51         0.0       2 0.001 0.997 0.002   B        B
33    33    33    33         0.0       2 0.001 0.999 0.000   B        B
60    60    60    60         0.0       2 0.001 0.999 0.000   B        B
43    43    43    43         0.0       2 0.004 0.828 0.168   B        B
58    58    58    58         0.0       2 0.004 0.857 0.139   B        B
52    52    52    52         0.0       2 0.005 0.787 0.208   B        B
41    41    41    41         0.0       2 0.025 0.975 0.000   B        B
55    55    55    55         0.0       2 0.038 0.850 0.112   B        B
44    44    44    44         0.0       2 0.054 0.860 0.086   B        B
46    46    46    46         0.0       2 0.071 0.929 0.000   B        B
35    35    35    35         0.0       2 0.076 0.924 0.000   B        B
47    47    47    47         0.0       2 0.106 0.791 0.103   B        B
49    49    49    49         0.0       2 0.151 0.849 0.000   B        B
53    53    53    53         0.0       2 0.173 0.827 0.000   B        B
40    40    40    40         0.0       2 0.175 0.825 0.000   B        B
42    42    42    42         0.0       2 0.228 0.715 0.057   B        B

$Q_name$C
   Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
78    78    78    78         0.0       2 0.000 0.508 0.492   C        C
80    80    80    80         0.0       2 0.000 0.513 0.487   C        C
83    83    83    83         0.0       2 0.000 0.515 0.485   C        C
67    67    67    67         0.0       2 0.000 0.593 0.407   C        C
74    74    74    74         0.0       2 0.000 0.718 0.282   C        C
63    63    63    63         0.0       2 0.000 0.736 0.264   C        C
72    72    72    72         0.0       2 0.000 0.792 0.208   C        C
75    75    75    75         0.0       2 0.000 0.806 0.194   C        C
73    73    73    73         0.0       2 0.000 0.808 0.192   C        C
84    84    84    84         0.0       2 0.000 0.830 0.170   C        C
62    62    62    62         0.0       2 0.000 0.863 0.137   C        C
65    65    65    65         0.0       2 0.000 0.874 0.126   C        C
89    89    89    89         0.0       2 0.000 0.887 0.113   C        C
70    70    70    70         0.0       2 0.000 0.900 0.100   C        C
77    77    77    77         0.0       2 0.000 0.905 0.095   C        C
68    68    68    68         0.0       2 0.000 0.910 0.090   C        C
71    71    71    71         0.0       2 0.000 0.911 0.089   C        C
69    69    69    69         0.0       2 0.000 0.915 0.085   C        C
82    82    82    82         0.0       2 0.000 0.973 0.027   C        C
87    87    87    87         0.0       2 0.000 0.999 0.001   C        C
61    61    61    61         0.0       2 0.001 0.844 0.155   C        C
76    76    76    76         0.0       2 0.001 0.951 0.048   C        C
90    90    90    90         0.0       2 0.002 0.799 0.199   C        C
88    88    88    88         0.0       2 0.002 0.820 0.178   C        C
66    66    66    66         0.0       2 0.003 0.589 0.408   C        C
85    85    85    85         0.0       2 0.006 0.732 0.262   C        C
86    86    86    86         0.0       2 0.006 0.739 0.255   C        C
81    81    81    81         0.0       2 0.037 0.751 0.212   C        C
64    64    64    64         0.0       2 0.037 0.826 0.137   C        C
79    79    79    79         0.0       2 0.165 0.772 0.063   C        C

$Q_name$D
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
97     97    97    97         0.0       3 0.000 0.001 0.999   D        D
99     99    99    99         0.0       3 0.000 0.001 0.999   D        D
104   104   104   104         0.0       3 0.000 0.001 0.999   D        D
105   105   105   105         0.0       3 0.000 0.001 0.999   D        D
110   110   110   110         0.0       3 0.000 0.001 0.999   D        D
120   120   120   120         0.0       3 0.000 0.001 0.999   D        D
92     92    92    92         0.0       3 0.001 0.000 0.999   D        D
94     94    94    94         0.0       3 0.001 0.000 0.999   D        D
98     98    98    98         0.0       3 0.001 0.000 0.999   D        D
100   100   100   100         0.0       3 0.001 0.000 0.999   D        D
102   102   102   102         0.0       3 0.001 0.000 0.999   D        D
107   107   107   107         0.0       3 0.001 0.000 0.999   D        D
109   109   109   109         0.0       3 0.001 0.000 0.999   D        D
111   111   111   111         0.0       3 0.001 0.000 0.999   D        D
113   113   113   113         0.0       3 0.001 0.000 0.999   D        D
114   114   114   114         0.0       3 0.001 0.000 0.999   D        D
115   115   115   115         0.0       3 0.001 0.000 0.999   D        D
116   116   116   116         0.0       3 0.001 0.000 0.999   D        D
117   117   117   117         0.0       3 0.001 0.000 0.999   D        D
118   118   118   118         0.0       3 0.001 0.000 0.999   D        D
95     95    95    95         0.0       3 0.001 0.001 0.998   D        D
96     96    96    96         0.0       3 0.001 0.001 0.998   D        D
103   103   103   103         0.0       3 0.001 0.001 0.998   D        D
119   119   119   119         0.0       3 0.001 0.001 0.998   D        D
93     93    93    93         0.0       3 0.023 0.000 0.977   D        D
106   106   106   106         0.0       3 0.031 0.000 0.969   D        D
91     91    91    91         0.0       3 0.034 0.000 0.966   D        D
101   101   101   101         0.0       3 0.061 0.000 0.939   D        D
108   108   108   108         0.0       3 0.075 0.000 0.925   D        D
112   112   112   112         0.0       3 0.089 0.000 0.911   D        D

$Q_name$E
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
121   121   121   121         0.0       3 0.000 0.001 0.999   E        E
122   122   122   122         0.0       3 0.000 0.001 0.999   E        E
123   123   123   123         0.0       3 0.000 0.001 0.999   E        E
126   126   126   126         0.0       3 0.000 0.001 0.999   E        E
129   129   129   129         0.0       3 0.000 0.001 0.999   E        E
131   131   131   131         0.0       3 0.000 0.001 0.999   E        E
135   135   135   135         0.0       3 0.000 0.001 0.999   E        E
137   137   137   137         0.0       3 0.000 0.001 0.999   E        E
140   140   140   140         0.0       3 0.000 0.001 0.999   E        E
142   142   142   142         0.0       3 0.000 0.001 0.999   E        E
143   143   143   143         0.0       3 0.000 0.001 0.999   E        E
144   144   144   144         0.0       3 0.000 0.001 0.999   E        E
146   146   146   146         0.0       3 0.000 0.001 0.999   E        E
148   148   148   148         0.0       3 0.000 0.001 0.999   E        E
150   150   150   150         0.0       3 0.000 0.028 0.972   E        E
138   138   138   138         0.0       3 0.000 0.048 0.952   E        E
139   139   139   139         0.0       3 0.000 0.054 0.946   E        E
147   147   147   147         0.0       3 0.000 0.066 0.934   E        E
149   149   149   149         0.0       3 0.000 0.079 0.921   E        E
141   141   141   141         0.0       3 0.000 0.089 0.911   E        E
136   136   136   136         0.0       3 0.000 0.138 0.862   E        E
128   128   128   128         0.0       3 0.001 0.000 0.999   E        E
130   130   130   130         0.0       3 0.001 0.000 0.999   E        E
124   124   124   124         0.0       3 0.001 0.001 0.998   E        E
145   145   145   145         0.0       3 0.032 0.000 0.968   E        E
125   125   125   125         0.0       3 0.044 0.000 0.956   E        E
127   127   127   127         0.0       3 0.063 0.000 0.937   E        E
132   132   132   132         0.0       3 0.066 0.000 0.934   E        E
133   133   133   133         0.0       3 0.073 0.000 0.927   E        E
134   134   134   134         0.0       3 0.075 0.000 0.925   E        E

$Q_name$F
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
151   151   151   151         0.0       2 0.001 0.999     0   F        F
152   152   152   152         0.0       2 0.001 0.999     0   F        F
153   153   153   153         0.0       2 0.001 0.999     0   F        F
154   154   154   154         0.0       2 0.001 0.999     0   F        F
155   155   155   155         0.0       2 0.001 0.999     0   F        F
156   156   156   156         0.0       2 0.001 0.999     0   F        F
157   157   157   157         0.0       2 0.001 0.999     0   F        F
158   158   158   158         0.0       2 0.001 0.999     0   F        F
159   159   159   159         0.0       2 0.001 0.999     0   F        F
160   160   160   160         0.0       2 0.001 0.999     0   F        F
161   161   161   161         0.0       2 0.001 0.999     0   F        F
162   162   162   162         0.0       2 0.001 0.999     0   F        F
163   163   163   163         0.0       2 0.001 0.999     0   F        F
164   164   164   164         0.0       2 0.001 0.999     0   F        F
165   165   165   165         0.0       2 0.001 0.999     0   F        F
167   167   167   167         0.0       2 0.001 0.999     0   F        F
168   168   168   168         0.0       2 0.001 0.999     0   F        F
169   169   169   169         0.0       2 0.001 0.999     0   F        F
170   170   170   170         0.0       2 0.001 0.999     0   F        F
171   171   171   171         0.0       2 0.001 0.999     0   F        F
172   172   172   172         0.0       2 0.001 0.999     0   F        F
173   173   173   173         0.0       2 0.001 0.999     0   F        F
174   174   174   174         0.0       2 0.001 0.999     0   F        F
176   176   176   176         0.0       2 0.001 0.999     0   F        F
177   177   177   177         0.0       2 0.001 0.999     0   F        F
178   178   178   178         0.0       2 0.001 0.999     0   F        F
179   179   179   179         0.0       2 0.001 0.999     0   F        F
180   180   180   180         0.0       2 0.001 0.999     0   F        F
175   175   175   175         0.0       2 0.024 0.976     0   F        F
166   166   166   166         0.0       2 0.052 0.948     0   F        F

$Q_name$G
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
194   194   194   194         0.0       2 0.000 0.975 0.025   G        G
189   189   189   189         0.0       2 0.001 0.999 0.000   G        G
190   190   190   190         0.0       2 0.001 0.999 0.000   G        G
196   196   196   196         0.0       2 0.001 0.999 0.000   G        G
197   197   197   197         0.0       2 0.001 0.999 0.000   G        G
204   204   204   204         0.0       2 0.001 0.999 0.000   G        G
206   206   206   206         0.0       2 0.001 0.999 0.000   G        G
209   209   209   209         0.0       2 0.001 0.999 0.000   G        G
184   184   184   184         0.0       2 0.018 0.982 0.000   G        G
193   193   193   193         0.0       2 0.025 0.975 0.000   G        G
202   202   202   202         0.0       2 0.052 0.948 0.000   G        G
187   187   187   187         0.0       2 0.057 0.943 0.000   G        G
207   207   207   207         0.0       2 0.066 0.934 0.000   G        G
208   208   208   208         0.0       2 0.068 0.932 0.000   G        G
203   203   203   203         0.0       2 0.077 0.923 0.000   G        G
205   205   205   205         0.0       2 0.079 0.921 0.000   G        G
198   198   198   198         0.0       2 0.080 0.920 0.000   G        G
210   210   210   210         0.0       2 0.082 0.918 0.000   G        G
195   195   195   195         0.0       2 0.089 0.911 0.000   G        G
200   200   200   200         0.0       2 0.094 0.906 0.000   G        G
201   201   201   201         0.0       2 0.098 0.902 0.000   G        G
183   183   183   183         0.0       2 0.107 0.893 0.000   G        G
182   182   182   182         0.0       2 0.121 0.879 0.000   G        G
185   185   185   185         0.0       2 0.123 0.877 0.000   G        G
188   188   188   188         0.0       2 0.132 0.868 0.000   G        G
191   191   191   191         0.0       2 0.132 0.868 0.000   G        G
186   186   186   186         0.0       2 0.133 0.867 0.000   G        G
181   181   181   181         0.0       2 0.153 0.847 0.000   G        G
192   192   192   192         0.0       2 0.166 0.834 0.000   G        G
199   199   199   199         0.0       2 0.178 0.822 0.000   G        G

$Q_name$H
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
232   232   232   232         0.0       1 0.929 0.000 0.071   H        H
230   230   230   230         0.0       1 0.932 0.068 0.000   H        H
212   212   212   212         0.0       1 0.998 0.001 0.001   H        H
214   214   214   214         0.0       1 0.998 0.001 0.001   H        H
217   217   217   217         0.0       1 0.998 0.001 0.001   H        H
222   222   222   222         0.0       1 0.998 0.001 0.001   H        H
223   223   223   223         0.0       1 0.998 0.001 0.001   H        H
237   237   237   237         0.0       1 0.998 0.001 0.001   H        H
238   238   238   238         0.0       1 0.998 0.001 0.001   H        H
240   240   240   240         0.0       1 0.998 0.001 0.001   H        H
220   220   220   220         0.0       1 0.999 0.000 0.001   H        H
221   221   221   221         0.0       1 0.999 0.000 0.001   H        H
224   224   224   224         0.0       1 0.999 0.000 0.001   H        H
226   226   226   226         0.0       1 0.999 0.000 0.001   H        H
227   227   227   227         0.0       1 0.999 0.000 0.001   H        H
228   228   228   228         0.0       1 0.999 0.000 0.001   H        H
229   229   229   229         0.0       1 0.999 0.000 0.001   H        H
236   236   236   236         0.0       1 0.999 0.000 0.001   H        H
239   239   239   239         0.0       1 0.999 0.000 0.001   H        H
211   211   211   211         0.0       1 0.999 0.001 0.000   H        H
213   213   213   213         0.0       1 0.999 0.001 0.000   H        H
215   215   215   215         0.0       1 0.999 0.001 0.000   H        H
216   216   216   216         0.0       1 0.999 0.001 0.000   H        H
218   218   218   218         0.0       1 0.999 0.001 0.000   H        H
219   219   219   219         0.0       1 0.999 0.001 0.000   H        H
225   225   225   225         0.0       1 0.999 0.001 0.000   H        H
231   231   231   231         0.0       1 0.999 0.001 0.000   H        H
233   233   233   233         0.0       1 0.999 0.001 0.000   H        H
234   234   234   234         0.0       1 0.999 0.001 0.000   H        H
235   235   235   235         0.0       1 0.999 0.001 0.000   H        H

$Q_name$I
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
263   263   263   263         0.0       2 0.000 0.878 0.122   I        I
259   259   259   259         0.0       2 0.000 0.905 0.095   I        I
254   254   254   254         0.0       2 0.000 0.923 0.077   I        I
253   253   253   253         0.0       2 0.000 0.932 0.068   I        I
255   255   255   255         0.0       2 0.000 0.962 0.038   I        I
243   243   243   243         0.0       2 0.000 0.981 0.019   I        I
242   242   242   242         0.0       2 0.000 0.999 0.001   I        I
245   245   245   245         0.0       2 0.000 0.999 0.001   I        I
249   249   249   249         0.0       2 0.000 0.999 0.001   I        I
250   250   250   250         0.0       2 0.000 0.999 0.001   I        I
256   256   256   256         0.0       2 0.000 0.999 0.001   I        I
258   258   258   258         0.0       2 0.000 0.999 0.001   I        I
261   261   261   261         0.0       2 0.000 0.999 0.001   I        I
262   262   262   262         0.0       2 0.000 0.999 0.001   I        I
268   268   268   268         0.0       2 0.000 0.999 0.001   I        I
269   269   269   269         0.0       2 0.000 0.999 0.001   I        I
252   252   252   252         0.0       2 0.001 0.999 0.000   I        I
260   260   260   260         0.0       2 0.001 0.999 0.000   I        I
265   265   265   265         0.0       2 0.001 0.999 0.000   I        I
266   266   266   266         0.0       2 0.001 0.999 0.000   I        I
267   267   267   267         0.0       2 0.001 0.999 0.000   I        I
244   244   244   244         0.0       2 0.002 0.894 0.104   I        I
270   270   270   270         0.0       2 0.005 0.886 0.109   I        I
248   248   248   248         0.0       2 0.005 0.944 0.051   I        I
264   264   264   264         0.0       2 0.012 0.797 0.191   I        I
247   247   247   247         0.0       2 0.014 0.844 0.142   I        I
241   241   241   241         0.0       2 0.039 0.895 0.066   I        I
246   246   246   246         0.0       2 0.049 0.799 0.152   I        I
251   251   251   251         0.0       2 0.066 0.756 0.178   I        I
257   257   257   257         0.0       2 0.076 0.886 0.038   I        I

$Q_name$J
    Index Order Label PercentMiss Cluster Pop_1 Pop_2 Pop_3 Pop Pop_name
300   300   300   300         0.0       2 0.000 0.699 0.301   J        J
284   284   284   284         0.0       2 0.000 0.792 0.208   J        J
282   282   282   282         0.0       2 0.000 0.828 0.172   J        J
297   297   297   297         0.0       2 0.000 0.842 0.158   J        J
285   285   285   285         0.0       2 0.000 0.887 0.113   J        J
292   292   292   292         0.0       2 0.000 0.893 0.107   J        J
279   279   279   279         0.0       2 0.000 0.896 0.104   J        J
275   275   275   275         0.0       2 0.000 0.897 0.103   J        J
278   278   278   278         0.0       2 0.000 0.908 0.092   J        J
298   298   298   298         0.0       2 0.000 0.913 0.087   J        J
289   289   289   289         0.0       2 0.000 0.957 0.043   J        J
271   271   271   271         0.0       2 0.000 0.967 0.033   J        J
295   295   295   295         0.0       2 0.000 0.998 0.002   J        J
272   272   272   272         0.0       2 0.000 0.999 0.001   J        J
273   273   273   273         0.0       2 0.000 0.999 0.001   J        J
276   276   276   276         0.0       2 0.000 0.999 0.001   J        J
277   277   277   277         0.0       2 0.000 0.999 0.001   J        J
280   280   280   280         0.0       2 0.000 0.999 0.001   J        J
283   283   283   283         0.0       2 0.000 0.999 0.001   J        J
287   287   287   287         0.0       2 0.000 0.999 0.001   J        J
288   288   288   288         0.0       2 0.000 0.999 0.001   J        J
290   290   290   290         0.0       2 0.000 0.999 0.001   J        J
291   291   291   291         0.0       2 0.000 0.999 0.001   J        J
293   293   293   293         0.0       2 0.000 0.999 0.001   J        J
294   294   294   294         0.0       2 0.000 0.999 0.001   J        J
296   296   296   296         0.0       2 0.000 0.999 0.001   J        J
274   274   274   274         0.0       2 0.001 0.945 0.054   J        J
281   281   281   281         0.0       2 0.001 0.999 0.000   J        J
286   286   286   286         0.0       2 0.001 0.999 0.000   J        J
299   299   299   299         0.0       2 0.057 0.855 0.088   J        J


$map
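
The per-population tables printed above are easier to compare once they are condensed. The following is a minimal sketch in base R, not part of the dartR workflow itself: it copies two individuals each from populations E, H and J out of the output above into a toy data frame (the name `q_toy` and the 0.9 threshold are illustrative assumptions), then computes the mean membership per population and flags individuals whose largest membership coefficient falls below the threshold.

```r
# Toy subset of the Q-table printed above (two individuals each from E, H, J).
# `q_toy` is a hypothetical name; with real results you would first bind the
# per-population tables into a single data frame with these columns.
q_toy <- data.frame(
  Label = c(141, 128, 232, 212, 300, 284),
  Pop   = c("E", "E", "H", "H", "J", "J"),
  Pop_1 = c(0.000, 0.001, 0.929, 0.998, 0.000, 0.000),
  Pop_2 = c(0.089, 0.000, 0.000, 0.001, 0.699, 0.792),
  Pop_3 = c(0.911, 0.999, 0.071, 0.001, 0.301, 0.208)
)

q_cols <- c("Pop_1", "Pop_2", "Pop_3")

# Mean membership coefficient per sampling population and inferred cluster
mean_q <- aggregate(q_toy[, q_cols], by = list(Pop = q_toy$Pop), FUN = mean)

# Most likely cluster for each individual (column index of the largest Q value)
q_toy$cluster <- max.col(as.matrix(q_toy[, q_cols]), ties.method = "first")

# Largest membership per individual; values below the (arbitrary, assumed)
# 0.9 threshold are flagged as potentially admixed
q_toy$max_q   <- apply(q_toy[, q_cols], 1, max)
q_toy$admixed <- q_toy$max_q < 0.9

print(mean_q)
print(q_toy[, c("Label", "Pop", "cluster", "max_q", "admixed")])
```

In this toy subset the two individuals from J are flagged, which matches the impression from the full table that J carries a noticeable share of the third cluster, whereas H and E are assigned almost entirely to a single cluster. The threshold is a judgment call and should be chosen with the study question in mind.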

Further Study

PDFs/links

Readings

• Evanno et al. 2005 – Detecting the number of clusters (ΔK).

• Lawson et al. 2018 – How not to over-interpret STRUCTURE/ADMIXTURE plots.

• Wang 2017 – Common pitfalls when using STRUCTURE.

• Raj et al. 2014 – fastSTRUCTURE.

• Frichot et al. 2014 – sNMF.

• Wang 2022 – POPCLUSTER.

• Kopelman et al. 2015 – CLUMPAK.

• Jakobsson & Rosenberg 2007 – CLUMPP.