library(dartRverse)
library(ggplot2)
library(knitr)
library(ggpmisc)
W15 SNP that Matter
Learning outcomes
In this session we discuss what SNP panels are, how they work, and when they offer a practical alternative to genome-wide SNP datasets. We also discuss how the criteria used to select SNPs determine which population genetic metrics a panel can estimate reliably and, consequently, which conservation questions it can address.
By the end of this session you should be able to:
- Explain why we use reduced SNP panels in conservation genetics (cost, logistics, speed, monitoring)
- Define a panel goal (e.g. population assignment / structure, relatedness / parentage, individual ID, sex-linked, hybrid detection, adaptive markers)
- Apply a quality-control workflow for panel design (missingness, call rate, MAF, sequence length, alignment quality)
- Use gl.select.panel() to create panels using alternative strategies (e.g. dapc, random, pic)
- Use gl.check.panel() to evaluate how well a panel recovers a target parameter (e.g. FST)
- Design and compare replicate panels (stability / robustness)
Context: Why SNP panels?
Genome-wide SNP datasets are fantastic for discovery and deep inference, but monitoring programs often need something else:
- cheaper per sample
- robust on low-quality DNA (sometimes)
- repeatable across years
- scalable (lots of samples, many timepoints)
- quick turnaround for management decisions
Typical conservation use-cases
- assigning individuals to management units
- detecting illegal translocations / admixture
- parentage and pedigree tracking (captive or reintroduction programs)
- long-term genetic monitoring (He, FIS, FST trends)
- hybridisation detection
- sex-linked markers for skewed sex ratios / demographic monitoring
Prerequisites
You should be comfortable with:
- reading a genlight object
- basic filtering in the dartRverse (gl.filter.*)
- interpreting simple reports (call rate, missingness, MAF)
No prior experience with panel design is assumed.
Setup
Data for this tutorial
We use an example SNP dataset (pre-filtered genome-wide SNPs), plus a version that already includes BLAST alignment metrics.
rfbe <- readRDS(file.path(path.data, "rfbe.rds"))
rfbe20_5 <- readRDS(file.path(path.data, "rfbe20_5_blast.rds"))
Interpretation questions
- How many loci / individuals do we start with?
- How many populations?
- Are some populations very small? (this matters for the panel goal)
Part 1 — Decide what the panel is for
Panel goal → design choices
A panel for population structure (FST / assignment) often benefits from:
- loci that differentiate populations (high between-pop variance)
- moderate allele frequencies (informative, not singletons)
- broad coverage across the genome (avoid tight linkage when possible)
A panel for relatedness / parentage often benefits from:
- high heterozygosity loci (high information per locus)
- more loci than structure panels (often)
- careful QC to minimise genotyping error
A panel for sex-linked SNPs benefits from:
- strong sex association
- robust genotype calls
- validation on independent samples
In this hands-on we focus on a structure / assignment panel of 50 SNPs, and evaluate it via how well it recovers pairwise FST.
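The preference for "moderate allele frequencies" and "high heterozygosity loci" can be made concrete with expected heterozygosity. The sketch below is base R only and assumes a biallelic locus in Hardy-Weinberg proportions; the function name expected_het is illustrative and not part of the dartRverse.

```r
# Expected heterozygosity of a biallelic locus under Hardy-Weinberg: He = 2p(1 - p)
expected_het <- function(p) 2 * p * (1 - p)

# Allele frequencies ranging from near-singleton to perfectly balanced
p <- c(0.01, 0.05, 0.10, 0.25, 0.50)
he <- expected_het(p)

# He rises with p and peaks at p = 0.5
data.frame(p = p, He = round(he, 3))
```

A locus at p = 0.01 carries very little information per individual, which is why rare variants are usually poor panel candidates for structure or parentage work.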
Part 2 — Quality filtering workflow (worked example)
Step 1 — Ensure adequate sample sizes per population
Many panel methods implicitly assume you have enough individuals in each population to estimate allele frequencies reliably.
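To see why sample size matters, the sketch below computes the binomial standard error of an allele frequency estimate. It assumes alleles are sampled at random (2n allele copies from n diploid individuals); the function name se_freq is illustrative only.

```r
# Binomial standard error of an allele frequency estimated from n diploid
# individuals (2n sampled allele copies), assuming random sampling
se_freq <- function(p, n) sqrt(p * (1 - p) / (2 * n))

n <- c(5, 10, 20, 50)
se <- se_freq(p = 0.5, n = n)

# Standard error shrinks with the square root of the sample size
data.frame(n_individuals = n, se = round(se, 3))
```

With 10 individuals the standard error at p = 0.5 is about 0.11, so allele frequencies from very small populations are too noisy to rank loci reliably.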
We keep populations with > 10 individuals:
tt <- table(pop(rfbe))
pop20 <- names(tt)[tt > 10]
rfbe20 <- gl.keep.pop(rfbe, pop.list = pop20)
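table(pop(rfbe20))
Step 2 — Remove loci with population-specific all-NA
Loci that are entirely missing in any one population cannot contribute allele-frequency estimates for that population, so we remove them first.
rfbe20_1 <- gl.filter.allna(rfbe20, by.pop = TRUE)
Step 3 — Call-rate filtering
Call rate is the proportion of individuals with a non-missing genotype at a locus. Here we keep loci scored in at least 99% of individuals.
rfbe20_2 <- gl.filter.callrate(rfbe20_1, threshold = 0.99)
Step 4 — Minor allele count / frequency filtering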
For panel design, rare alleles can be:
- uninformative for many tasks
- more sensitive to genotyping error
- unstable across timepoints
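Here we use a minor allele count (MAC) threshold of 5. In gl.filter.maf(), a threshold greater than 1 is interpreted as a count rather than a frequency: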
rfbe20_3 <- gl.filter.maf(rfbe20_2, threshold = 5, by.pop = FALSE)
nLoc(rfbe20_3)
Step 5 (optional but recommended) — Sequence length
Panel development often requires flanking sequence (primer/probe design).
We keep loci with trimmed sequences of length ≥ 30.
index <- nchar(as.character(rfbe20_3@other$loc.metrics$TrimmedSequence)) > 29
rfbe20_4 <- rfbe20_3[, index]
nLoc(rfbe20_4)
Step 6 (optional) — Alignment quality to a reference genome
If you have a reference genome, you can BLAST trimmed sequences to check for:
- strong matches (bitscore)
- uniqueness (avoid multi-mapping)
- alignment length / mismatches
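As a rough illustration of the "strong and unique" criteria, the base R sketch below works on a hypothetical table of BLAST hits with one row per alignment. The column names (locus, bitscore) and the values are invented for the example and do not reflect the structure of the tutorial's loc.metrics.

```r
# Hypothetical BLAST hits: one row per alignment (names and values are illustrative)
hits <- data.frame(
  locus    = c("L1", "L2", "L2", "L3", "L4"),
  bitscore = c(120, 110, 108, 60, 130)
)

min_bitscore <- 100

# Keep only strong alignments
strong <- hits[hits$bitscore >= min_bitscore, ]

# Count strong alignments per locus; more than one suggests multi-mapping
n_strong <- table(strong$locus)

# Retain loci with exactly one strong alignment
keep <- names(n_strong)[n_strong == 1]
keep
```

Here L2 is dropped because it aligns strongly to two places, and L3 is dropped because its only alignment is weak.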
In this tutorial we load a pre-computed BLAST-annotated dataset:
rfbe20_5 <- readRDS(file.path(path.data, "rfbe20_5_blast.rds"))
Filter by a bitscore threshold (example: ≥ 100):
index <- rfbe20_5@other$loc.metrics$bitscore >= 100
index <- ifelse(is.na(index), FALSE, index)
rfbe20_6 <- rfbe20_5[, index]
nLoc(rfbe20_6)
# Order individuals by population (some panel methods benefit from this)
rfbe20_6 <- rfbe20_6[order(pop(rfbe20_6)), ]
Exercise 2 — Explore the impact of QC thresholds
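Before running the exercise on the real data, it may help to see what the two thresholds measure. The sketch below is base R only and uses a simulated 0/1/2 genotype matrix rather than a genlight object; all object names and simulation settings are assumptions made for illustration, and the counts it produces say nothing about rfbe.

```r
set.seed(1)

# Toy genotype matrix: 100 individuals x 200 loci, coded as 0/1/2 copies
# of the alternate allele
n_ind <- 100
n_loc <- 200
p <- runif(n_loc, 0.01, 0.5)
geno <- sapply(p, function(pj) rbinom(n_ind, 2, pj))

# Knock out genotypes at a locus-specific missing rate
miss_rate <- runif(n_loc, 0, 0.08)
for (j in seq_len(n_loc)) {
  geno[runif(n_ind) < miss_rate[j], j] <- NA
}

# Per-locus call rate: proportion of individuals with a genotype
call_rate <- colMeans(!is.na(geno))

# Per-locus minor allele count: copies of the rarer allele among scored alleles
alt_count <- colSums(geno, na.rm = TRUE)
n_alleles <- 2 * colSums(!is.na(geno))
mac <- pmin(alt_count, n_alleles - alt_count)

# Number of loci retained for each combination of thresholds
grid <- expand.grid(cr = c(0.95, 0.98, 0.99), mac = c(2, 5, 10))
grid$n_loci <- mapply(
  function(cr, m) sum(call_rate >= cr & mac >= m),
  grid$cr, grid$mac
)
grid
```

The same table layout (one row per threshold combination, one column for loci retained) is a convenient way to record your results from the real filters.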
# Goal: Explore how QC thresholds change the number of loci you have available for panel design.
#
# 1) Try call-rate thresholds of 0.95, 0.98, 0.99.
# 2) Try minor allele count thresholds of 2, 5, 10.
# 3) Record how many loci remain after each combination.
#
# Tip: Start from rfbe20_1 each time so comparisons are fair.
# Example template:
# cr <- 0.99
# mac <- 5
# tmp <- gl.filter.callrate(rfbe20_1, threshold = cr)
# tmp <- gl.filter.maf(tmp, threshold = mac, by.pop = FALSE)
# nLoc(tmp)
Part 3 — Selecting informative SNPs
Panel selection strategies in gl.select.panel()
Different methods target different signals. For example:
- dapc: picks loci most informative for discriminating groups (good for structure / assignment)
- pic / picdart: picks loci with high information content (often useful broadly)
- hafall / hafpop: targets high allele frequency loci (sometimes helps genotyping stability)
- pahigh: private allele patterns
- random: a baseline comparison
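To give a feel for what "high information content" means, the sketch below ranks loci by the standard biallelic polymorphic information content (PIC). It is base R only, the allele frequencies are invented, and whether gl.select.panel() computes PIC in exactly this way is an assumption that should be checked against the package documentation.

```r
# Standard PIC for a biallelic locus with allele frequencies p and q = 1 - p:
# PIC = 1 - (p^2 + q^2) - 2 * p^2 * q^2
pic_biallelic <- function(p) {
  q <- 1 - p
  1 - (p^2 + q^2) - 2 * p^2 * q^2
}

# Hypothetical allele frequencies for six loci
freq <- c(loc1 = 0.02, loc2 = 0.48, loc3 = 0.15,
          loc4 = 0.35, loc5 = 0.50, loc6 = 0.08)
pic <- pic_biallelic(freq)

# Keep the three loci with the highest PIC
panel <- names(sort(pic, decreasing = TRUE))[1:3]
panel
```

For a biallelic locus PIC peaks at 0.375 when p = 0.5, so ranking by PIC favours loci with balanced allele frequencies. The method names themselves are listed next.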
methods <- c(
"dapc", "random", "pahigh", "monopop", "stratified",
"hafall", "hafpop", "pic", "picdart"
)
methods
Worked example — Select a 50 SNP panel using DAPC
panel_dapc <- gl.select.panel(rfbe20_6, method = "dapc", nl = 50)
nLoc(panel_dapc)
Evaluate the panel — does it recover FST?
We compare the panel-derived FST to the full dataset FST.
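The idea behind this comparison can be sketched in base R on simulated data. The example below assumes known population allele frequencies (no sampling of individuals), uses a simple FST defined as sum(HT - HS) / sum(HT) across loci, and draws a random 50-locus panel. The function pairwise_fst and all simulation settings are illustrative assumptions and are not how gl.check.panel() is implemented.

```r
set.seed(7)

# Simulate allele frequencies for 4 populations at 1000 loci (toy data, not rfbe)
n_loc <- 1000
anc   <- runif(n_loc, 0.1, 0.9)          # ancestral allele frequencies
drift <- c(0.02, 0.05, 0.10, 0.20)       # population-specific drift
freq  <- sapply(drift, function(f) {
  rbeta(n_loc, anc * (1 - f) / f, (1 - anc) * (1 - f) / f)
})                                       # n_loc x 4 matrix

# Pairwise FST from allele frequencies: sum(HT - HS) / sum(HT) across loci
pairwise_fst <- function(freq) {
  pairs <- combn(ncol(freq), 2)
  apply(pairs, 2, function(ix) {
    p1 <- freq[, ix[1]]
    p2 <- freq[, ix[2]]
    pbar <- (p1 + p2) / 2
    ht <- 2 * pbar * (1 - pbar)              # total expected heterozygosity
    hs <- p1 * (1 - p1) + p2 * (1 - p2)      # mean within-population heterozygosity
    sum(ht - hs) / sum(ht)
  })
}

fst_full <- pairwise_fst(freq)

# A random 50-locus panel
panel <- sample(n_loc, 50)
fst_panel <- pairwise_fst(freq[panel, , drop = FALSE])

# Agreement between panel and full-data estimates, one row per population pair
data.frame(full = round(fst_full, 3), panel = round(fst_panel, 3))
cor(fst_full, fst_panel)
```

A panel "recovers" FST well when the panel column tracks the full column closely across all population pairs. On the real data, the comparison is run with gl.check.panel():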
out_dapc <- gl.check.panel(panel_dapc, rfbe20_6, parameter = "Fst")
out_dapc
Now compare to a random panel:
panel_random <- gl.select.panel(rfbe20_6, method = "random", nl = 50)
out_random <- gl.check.panel(panel_random, rfbe20_6, parameter = "Fst")
out_random
Exercise 3 — Build your own SNP panel and evaluate it
# Your task:
# 1) Choose a panel method (e.g. "dapc", "pic", "picdart", "hafpop", "random")
# 2) Choose a panel size (nl), e.g. 25, 50, 100
# 3) Create the panel
# 4) Check how well it recovers FST relative to the full dataset
#
# Record:
# - method
# - nl
# - your interpretation of the output (good / ok / poor)
my_method <- "pic" # change me
my_nl <- 50 # change me
my_panel <- gl.select.panel(rfbe20_6, method = my_method, nl = my_nl)
nLoc(my_panel)
my_eval <- gl.check.panel(my_panel, rfbe20_6, parameter = "Fst")
my_eval
Part 4 — Replicate panels and robustness
Why replicate?
Some selection methods can be sensitive to randomness and sampling variation.
To avoid overconfidence from a single “lucky” panel, we generate replicate panels and compare performance.
A simple replicate loop (example)
set.seed(42)
B <- 20 # number of replicate panels
nl <- 50
method <- "dapc"
evals <- vector("list", B)
for (b in seq_len(B)) {
  p <- gl.select.panel(rfbe20_6, method = method, nl = nl)
  ev <- gl.check.panel(p, rfbe20_6, parameter = "Fst")
  evals[[b]] <- ev
}
# You can now summarise evals (depends on what gl.check.panel returns in your version)
Exercise 4 — Panel stability
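Because the return value of gl.check.panel() may differ between versions, the sketch below shows one possible way to organise and summarise replicate results without it. It is base R only, uses simulated allele frequencies and a hand-written FST (both assumptions, not the tutorial's data or the package's estimator), and compares random panels of two sizes using the mean absolute error in pairwise FST as the metric.

```r
set.seed(42)

# Toy allele frequencies: 4 populations x 1000 loci (simulated, not rfbe)
n_loc <- 1000
anc   <- runif(n_loc, 0.1, 0.9)
drift <- c(0.02, 0.05, 0.10, 0.20)
freq  <- sapply(drift, function(f) {
  rbeta(n_loc, anc * (1 - f) / f, (1 - anc) * (1 - f) / f)
})

# Pairwise FST as sum(HT - HS) / sum(HT) across loci
pairwise_fst <- function(freq) {
  apply(combn(ncol(freq), 2), 2, function(ix) {
    p1 <- freq[, ix[1]]
    p2 <- freq[, ix[2]]
    pbar <- (p1 + p2) / 2
    ht <- 2 * pbar * (1 - pbar)
    hs <- p1 * (1 - p1) + p2 * (1 - p2)
    sum(ht - hs) / sum(ht)
  })
}
fst_full <- pairwise_fst(freq)

# B replicate random panels per panel size; metric = mean absolute error in FST
B <- 20
sizes <- c(25, 100)
results <- do.call(rbind, lapply(sizes, function(nl) {
  err <- vapply(seq_len(B), function(b) {
    panel <- sample(n_loc, nl)
    mean(abs(pairwise_fst(freq[panel, , drop = FALSE]) - fst_full))
  }, numeric(1))
  data.frame(method = "random", nl = nl, replicate = seq_len(B), mae = err)
}))

# Mean and spread of the error for each panel size
aggregate(mae ~ nl, data = results,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
```

The long-format data frame (one row per method, panel size and replicate) makes it straightforward to add further methods or metrics. One would generally expect larger panels to show both a smaller mean error and less spread between replicates, but that should be checked rather than assumed.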
# Goal: Compare stability of TWO methods.
#
# 1) Pick two methods (e.g. "dapc" vs "pic" OR "dapc" vs "random")
# 2) Generate B replicate panels for each method
# 3) Evaluate each with gl.check.panel(parameter="Fst")
# 4) Decide: which method is more stable and why?
set.seed(1)
B <- 10
nl <- 50
m1 <- "dapc"
m2 <- "random"
# Write your replicate code here (use the template above).
# Tip: store results in a data.frame with columns: method, replicate, metric(s)
Part 5 — Wrap-up: practical checklist
A pragmatic SNP panel checklist
- Define the primary purpose (structure vs parentage vs ID vs sex-linked…)
- QC hard (missingness, call rate, MAC/MAF)
- Ensure adequate sample sizes for allele frequency estimation
- Consider lab constraints (sequence length, alignment uniqueness, flanking sequence)
- Select candidate panels (multiple methods + sizes)
- Evaluate against the full dataset (target parameter)
- Test stability (replicate panels)
- Validate on independent samples (ideally)
Further reading
Armstrong, E. E., Li, C., Campana, M. G., Ferrari, T., Kelley, J. L., Petrov, D. A., . . . Mooney, J. A. (2025). A Pipeline and Recommendations for Population and Individual Diagnostic SNP Selection in Non-Model Species. Molecular Ecology Resources, 25, e14048. https://doi.org/10.1111/1755-0998.14048
Beacham, T. D., Wallace, C., MacConnachie, C., Jonsen, K., McIntosh, B., Candy, J. R. & Withler, R. E. (2018). Population and individual identification of Chinook salmon in British Columbia through parentage-based tagging and genetic stock identification with single nucleotide polymorphisms. Canadian Journal of Fisheries and Aquatic Sciences, 75, 1096-1105.
Bertola, L. D., Vermaat, M., Lesilau, F., Chege, M., Tumenta, P. N., Sogbohossou, E. A., . . . Vrieling, K. (2022). Whole genome sequencing and the application of a SNP panel reveal primary evolutionary lineages and genomic variation in the lion (Panthera leo). BMC Genomics, 23, 321. https://doi.org/10.1186/s12864-022-08510-y
Furlan, E. M., Gruber, B., Attard, C. R. M., Wager, R. N. E., Kerezsy, A., Faulks, L. K., . . . Unmack, P. J. (2020). Assessing the benefits and risks of translocations in depauperate species: A theoretical framework with an empirical validation. Journal of Applied Ecology, 57, 831-841. https://doi.org/10.1111/1365-2664.13581
Jasielczuk, I., Gurgul, A., Szmatoła, T., Radko, A., Majewska, A., Sosin, E., . . . Ząbek, T. (2024). The use of SNP markers for cattle breed identification. Journal of Applied Genetics, 65, 575-589. https://doi.org/10.1007/s13353-024-00857-0
Kerezsy, A. & Fensham, R. (2013). Conservation of the endangered red-finned blue-eye, Scaturiginichthys vermeilipinnis, and control of alien eastern gambusia, Gambusia holbrooki, in a spring wetland complex. Marine and Freshwater Research, 64, 851-863.
Magliolo, M., Prost, S., Orozco-Terwengel, P., Burger, P., Kropff, A. S., Kotze, A., . . . Dalton, D. L. (2021). Unlocking the potential of a validated single nucleotide polymorphism array for genomic monitoring of trade in cheetahs (Acinonyx jubatus). Molecular Biology Reports, 48, 171-181. https://doi.org/10.1007/s11033-020-06030-0
Ohm, H., Åstrand, J., Ceplitis, A., Bengtsson, D., Hammenhag, C., Chawade, A. & Grimberg, Å. (2024). Novel SNP markers for flowering and seed quality traits in faba bean (Vicia faba L.): characterization and GWAS of a diversity panel. Frontiers in Plant Science, 15, 1348014.
Quinto-Cortés, C. D., Woerner, A. E., Watkins, J. C. & Hammer, M. F. (2018). Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference. Scientific Reports, 8, 10209. https://doi.org/10.1038/s41598-018-28539-y
Stronen, A. V., Mattucci, F., Fabbri, E., Galaverni, M., Cocchiararo, B., Nowak, C., . . . Caniglia, R. (2022). A reduced SNP panel to trace gene flow across southern European wolf populations and detect hybridization with other Canis taxa. Scientific Reports, 12, 4195. https://doi.org/10.1038/s41598-022-08132-0
Terrado-Ortuño, N. & May, P. (2025). Forensic DNA phenotyping: a review on SNP panels, genotyping techniques, and prediction models. Forensic Sciences Research, 10, owae013. https://doi.org/10.1093/fsr/owae013
Von Thaden, A., Nowak, C., Tiesmeyer, A., Reiners, T. E., Alves, P. C., Lyons, L. A., . . . Galián, J. (2020). Applying genomic data in wildlife monitoring: Development guidelines for genotyping degraded samples with reduced single nucleotide polymorphism panels. Molecular Ecology Resources, 20, 662-680.
Wehrenberg, G., Tokarska, M., Cocchiararo, B. & Nowak, C. (2024). A reduced SNP panel optimised for non-invasive genetic assessment of a genetically impoverished conservation icon, the European bison. Scientific Reports, 14, 1875. https://doi.org/10.1038/s41598-024-51495-9
Zenato-Lazzari, G., Figueiró, H. V., Sartor, C. C., Donadio, E., Di Martino, S., Draheim, H. M. & Eizirik, E. (2025). Development of a SNP Panel for Geographic Assignment and Population Monitoring of Jaguars (Panthera onca). Ecology and Evolution, 15, e71465. https://doi.org/10.1002/ece3.71465
