5  Assigning Individuals to Populations

Session Presenter

Required packages

library(dartRverse)

Introduction

Emydura River Turtle

We will explore four analyses for assigning an individual of unknown provenance to a source population.

  • Genotype Likelihood: The likelihood of drawing the unknown from a population with the observed allele frequencies is calculated assuming Hardy-Weinberg equilibrium.

  • Private Alleles: A focal unknown individual is likely to have fewer private alleles in comparison with its source population than in comparison with other putative source populations.

  • PCA: The genotype of a focal unknown individual is more likely to lie within the confidence envelope of its source population than within the confidence envelope of other putative source populations.

  • Mahalanobis Distance: The distances of the focal unknown individual from the centroids of the standardized confidence envelopes of its putative source populations are used to calculate z-scores and associated probabilities of assignment.
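The genotype-likelihood idea in the first item of the list can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the internals of gl.assign.on.genotype: the function name hwe_loglik, the genotype coding (0, 1, 2 copies of the alternate allele) and the allele frequencies for popA and popB are all hypothetical.

```r
# Log-likelihood of one diploid genotype given a population's allele
# frequencies, assuming Hardy-Weinberg equilibrium and independent loci.
# genotype: count of the alternate allele per locus (0, 1, 2; NA = missing)
# q:        alternate-allele frequency per locus in the candidate population
hwe_loglik <- function(genotype, q) {
  p <- 1 - q
  prob <- ifelse(genotype == 0, p^2,
          ifelse(genotype == 1, 2 * p * q, q^2))
  sum(log(prob), na.rm = TRUE)  # loci with missing genotypes are skipped
}

# Hypothetical unknown individual scored at five loci (last one missing)
unknown <- c(0, 1, 2, 1, NA)

# Hypothetical allele frequencies for two candidate source populations
freqs <- list(
  popA = c(0.10, 0.50, 0.90, 0.40, 0.30),
  popB = c(0.60, 0.10, 0.20, 0.90, 0.30)
)

loglik <- sapply(freqs, hwe_loglik, genotype = unknown)
loglik
```

The population with the largest (least negative) log-likelihood is the best-supported source. Note that a frequency of exactly 0 or 1 can give a log-likelihood of -Inf in this sketch, so a practical implementation needs some way of handling alleles not observed in a reference population.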

Emydura population assignment

For context on our turtle example, below are two maps. The first is a map of eastern mainland Australia showing the distribution of the Emydura macquarii complex, the river drainage basins in which it occurs, and the broad regions. The second is of northern Australia and Papua New Guinea.

Georges et al. 2025, in review

Load data

# We will first set the verbosity globally to level 3
gl.set.verbosity(3)
Starting gl.set.verbosity 
  Global verbosity set to: 3 
Completed: gl.set.verbosity 
# Read in the data set for the worked example
gl <- readRDS("./data/assignment_example1.Rdata")
# Familiarize yourself with its contents
gl  
 ********************
 *** DARTR OBJECT ***
 ********************

 ** 835 genotypes,  20,688 SNPs , size: 57.3 Mb

    missing data: 289548 (=1.68 %) scored as NA

 ** Genetic data
   @gen: list of 835 SNPbin
   @ploidy: ploidy of each individual  (range: 2-2)

 ** Additional data
   @ind.names:  835 individual labels
   @loc.names:  20688 locus labels
   @loc.all:  20688 allele labels
   @position: integer storing positions of the SNPs [within 69 base sequence]
   @pop: population of each individual (group size range: 3-30)
   @other: a list containing: loc.metrics, ind.metrics, latlon, loc.metrics.flags, verbose, history 
    @other$ind.metrics: id, pop, lat, lon, sex, maturity, collector, location, basin, drainage, service, plate_location 
    @other$loc.metrics: AlleleID, CloneID, AlleleSequence, SNP, SnpPosition, CallRate, OneRatioRef, OneRatioSnp, FreqHomRef, FreqHomSnp, FreqHets, PICRef, PICSnp, AvgPIC, AvgCountRef, AvgCountSnp, RepAvg, clone, uid, rdepth, monomorphs, maf, OneRatio, PIC, TrimmedSequence 
   @other$latlon[g]: coordinates for all individuals are attached
nLoc(gl)
[1] 20688
nInd(gl)
[1] 835
nPop(gl)
[1] 81
# Display a list of populations and sample sizes
table(pop(gl))

         Brisbane          Burdekin           Burnett          Clarence 
               10                10                11                10 
     Cooper_Alvin      Cooper_Cully  Cooper_Eulbertie        Dumaresque 
               10                10                10                10 
Fitzroy_Alligator  Fitzroy_Carnavan  Fitzroy_Fairburn     Fraser_Island 
               10                10                10                10 
           Hunter     EmmacJohnWari     EmmacMaclGeor              Mary 
               10                10                11                10 
     EmmacMDBBarr      EmmacMDBBarw     EmmacMDBBooth      EmmacMDBBowm 
               10                10                 9                10 
     EmmacMDBBurr      EmmacMDBCond      EmmacMDBCudg  EmmacMDBDarlBour 
               10                10                10                10 
 EmmacMDBDarlWeth      EmmacMDBDart      EmmacMDBEulo      EmmacMDBForb 
               10                10                10                10 
     EmmacMDBGoul        GurraGurra      EmmacMDBGwyd      EmmacMDBLach 
               10                10                10                10 
     EmmacMDBLodd      EmmacMDBMaci      EmmacMDBMoon  EmmacMDBMurrGunb 
               10                10                10                10 
 EmmacMDBMurrLock  EmmacMDBMurrMorg  EmmacMDBMurrMung  EmmacMDBMurrMurr 
               10                10                10                10 
 EmmacMDBMurrTink EmmacMDBMurrYarra      EmmacMDBOven  EmmacMDBParoBiny 
               10                10                10                10 
     EmmacMDBPind      EmmacMDBSanf      EmmacMDBToon          Normanby 
               10                10                11                11 
             Pine     EmmacRichCasi         EmmacRoss      EmmacTweeUki 
               10                10                10                10 
     EmsubBamuAli     EmsubBamuAwab     EmsubMorehead      EmsubFlyGuka 
               10                 9                16                10 
     EmsubFlyJikw      EmsubJardine       EmsubKerema       EmsubKikori 
               30                16                10                 4 
       EmworRoper        EmtanBlyth      EmtanFinniss     EmtanHolrChai 
               11                10                 7                10 
    EmtanMitchell     EmtanMitcMitc     EmtanPascFarm      EmtanWenlock 
                9                 3                 9                10 
        EmvicDaly     EmvicDrysdale        Fitzroy_WA     EmvicIsdeBell 
               10                10                10                12 
    EmvicKingMool          EmvicOrd     EmworClavPung         EmworDaly 
               10                18                10                10 
    EmworDalySlei     EmworLeicAlex     EmworLimmNath     EmworLiveMann 
                7                10                10                 9 
    EmworNichGreg 
               12 
Note

Several populations have fewer than 10 individuals and will be discarded during the analysis, because the assignment functions below are called with nmin=10.
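A quick way to see which populations fall below the threshold is to filter the table of sample sizes. The sketch below is self-contained and uses a small hypothetical label vector (four populations with sizes taken from the table above); the helper name small_pops is an assumption for illustration.

```r
# Return the populations whose sample size is below nmin
small_pops <- function(pop_factor, nmin = 10) {
  counts <- table(pop_factor)
  counts[counts < nmin]
}

# Toy population labels: 11, 4, 3 and 10 individuals respectively
labels <- factor(rep(c("Burnett", "EmsubKikori", "EmtanMitcMitc", "Mary"),
                     times = c(11, 4, 3, 10)))

small_pops(labels)
```

With the tutorial data loaded, the equivalent call would be small_pops(pop(gl)).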

Assignment by genotype likelihood

gen.result <- gl.assign.on.genotype(gl, unknown="AA011731", nmin=10)
Starting gl.assign.on.genotype 
  Processing genlight object with SNP data
          population Log Likelihood        AIC        dAIC        AIC.wt assign
3            Burnett      -4926.957   9853.914      0.0000  1.000000e+00    yes
16              Mary      -5341.050  10682.101    828.1863 1.450906e-180     no
1           Brisbane     -19251.444  38502.888  28648.9733  0.000000e+00     no
2           Burdekin     -32844.476  65688.953  55835.0384  0.000000e+00     no
4           Clarence     -31620.048  63240.095  53386.1808  0.000000e+00     no
5       Cooper_Alvin     -42008.293  84016.586  74162.6716  0.000000e+00     no
6       Cooper_Cully     -42849.639  85699.278  75845.3633  0.000000e+00     no
7   Cooper_Eulbertie     -42636.382  85272.764  75418.8497  0.000000e+00     no
8         Dumaresque     -28852.254  57704.509  47850.5946  0.000000e+00     no
9  Fitzroy_Alligator     -12133.240  24266.480  14412.5655  0.000000e+00     no
10  Fitzroy_Carnavan     -13118.904  26237.808  16383.8939  0.000000e+00     no
11  Fitzroy_Fairburn     -12281.682  24563.364  14709.4500  0.000000e+00     no
12     Fraser_Island     -17364.115  34728.231  24874.3162  0.000000e+00     no
13            Hunter     -49665.375  99330.751  89476.8365  0.000000e+00     no
14     EmmacJohnWari     -35711.561  71423.121  61569.2070  0.000000e+00     no
15     EmmacMaclGeor     -37389.885  74779.769  64925.8549  0.000000e+00     no
17      EmmacMDBBarr     -29473.892  58947.783  49093.8690  0.000000e+00     no
18      EmmacMDBBarw     -29276.257  58552.514  48698.5994  0.000000e+00     no
19      EmmacMDBBowm     -30859.543  61719.085  51865.1709  0.000000e+00     no
20      EmmacMDBBurr     -32296.475  64592.951  54739.0361  0.000000e+00     no
21      EmmacMDBCond     -25487.072  50974.145  41120.2302  0.000000e+00     no
22      EmmacMDBCudg     -29695.945  59391.891  49537.9762  0.000000e+00     no
23  EmmacMDBDarlBour     -28964.950  57929.900  48075.9852  0.000000e+00     no
24  EmmacMDBDarlWeth     -27302.477  54604.954  44751.0400  0.000000e+00     no
25      EmmacMDBDart     -33067.630  66135.261  56281.3466  0.000000e+00     no
26      EmmacMDBEulo     -20835.127  41670.254  31816.3394  0.000000e+00     no
27      EmmacMDBForb     -32680.705  65361.409  55507.4948  0.000000e+00     no
28      EmmacMDBGoul     -29137.664  58275.328  48421.4131  0.000000e+00     no
29        GurraGurra     -29715.298  59430.595  49576.6806  0.000000e+00     no
30      EmmacMDBGwyd     -29369.851  58739.701  48885.7868  0.000000e+00     no
31      EmmacMDBLach     -32476.968  64953.935  55100.0211  0.000000e+00     no
32      EmmacMDBLodd     -29746.861  59493.723  49639.8084  0.000000e+00     no
33      EmmacMDBMaci     -28944.433  57888.866  48034.9512  0.000000e+00     no
34      EmmacMDBMoon     -29257.025  58514.050  48660.1356  0.000000e+00     no
35  EmmacMDBMurrGunb     -28266.313  56532.626  46678.7116  0.000000e+00     no
36  EmmacMDBMurrLock     -29738.313  59476.625  49622.7109  0.000000e+00     no
37  EmmacMDBMurrMorg     -29058.787  58117.573  48263.6589  0.000000e+00     no
38  EmmacMDBMurrMung     -29471.357  58942.715  49088.8005  0.000000e+00     no
39  EmmacMDBMurrMurr     -29727.568  59455.137  49601.2226  0.000000e+00     no
40  EmmacMDBMurrTink     -28706.653  57413.305  47559.3909  0.000000e+00     no
41 EmmacMDBMurrYarra     -29584.123  59168.246  49314.3315  0.000000e+00     no
42      EmmacMDBOven     -29769.909  59539.819  49685.9045  0.000000e+00     no
43  EmmacMDBParoBiny     -30378.843  60757.686  50903.7713  0.000000e+00     no
44      EmmacMDBPind     -32423.222  64846.444  54992.5292  0.000000e+00     no
45      EmmacMDBSanf     -30946.657  61893.315  52039.4005  0.000000e+00     no
46      EmmacMDBToon     -21983.158  43966.316  34112.4020  0.000000e+00     no
47          Normanby     -42201.524  84403.048  74549.1332  0.000000e+00     no
48              Pine      -8578.105  17156.211   7302.2964  0.000000e+00     no
49     EmmacRichCasi     -25491.051  50982.103  41128.1884  0.000000e+00     no
50         EmmacRoss     -34003.894  68007.788  58153.8734  0.000000e+00     no
51      EmmacTweeUki     -20216.624  40433.249  30579.3343  0.000000e+00     no
52      EmsubBamuAli     -84564.461 169128.921 159275.0068  0.000000e+00     no
53     EmsubMorehead     -81201.565 162403.130 152549.2157  0.000000e+00     no
54      EmsubFlyGuka     -83272.339 166544.677 156690.7630  0.000000e+00     no
55      EmsubFlyJikw     -79486.100 158972.200 149118.2860  0.000000e+00     no
56      EmsubJardine     -86359.491 172718.983 162865.0685  0.000000e+00     no
57       EmsubKerema     -94433.639 188867.278 179013.3636  0.000000e+00     no
58        EmworRoper     -84920.595 169841.191 159987.2763  0.000000e+00     no
59        EmtanBlyth    -103876.148 207752.297 197898.3826  0.000000e+00     no
60     EmtanHolrChai    -101420.838 202841.676 192987.7618  0.000000e+00     no
61      EmtanWenlock    -101328.752 202657.504 192803.5897  0.000000e+00     no
62         EmvicDaly     -85769.402 171538.805 161684.8903  0.000000e+00     no
63     EmvicDrysdale     -97248.693 194497.386 184643.4716  0.000000e+00     no
64        Fitzroy_WA     -96787.705 193575.409 183721.4946  0.000000e+00     no
65     EmvicIsdeBell     -95745.919 191491.838 181637.9234  0.000000e+00     no
66     EmvicKingMool     -96702.092 193404.185 183550.2704  0.000000e+00     no
67          EmvicOrd     -93520.359 187040.717 177186.8030  0.000000e+00     no
68     EmworClavPung     -88270.073 176540.145 166686.2306  0.000000e+00     no
69         EmworDaly     -90298.465 180596.931 170743.0165  0.000000e+00     no
70     EmworLeicAlex     -91795.194 183590.389 173736.4745  0.000000e+00     no
71     EmworLimmNath     -91979.158 183958.316 174104.4020  0.000000e+00     no
72     EmworNichGreg     -85210.317 170420.635 160566.7205  0.000000e+00     no
  Warning: parameter by must be either 'join.by.ind' or 'join.by.loc', set to default 'join.by.loc'
Completed: gl.assign.on.genotype 
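To help read the table, the AIC, dAIC and AIC.wt columns can be reproduced from the log-likelihoods using the standard AIC-weight formulas. The sketch below copies the log-likelihoods for Burnett, Mary and Pine from the output above. It is an illustration of how the columns relate to each other, not the function's own code. In this output the AIC column equals -2 times the log-likelihood.

```r
# Log-likelihoods copied from three rows of the output above
loglik <- c(Burnett = -4926.957, Mary = -5341.050, Pine = -8578.105)

aic    <- -2 * loglik        # matches the AIC column in the output
daic   <- aic - min(aic)     # dAIC: difference from the best-supported population
rel    <- exp(-daic / 2)     # relative likelihood of each candidate
aic_wt <- rel / sum(rel)     # AIC weights sum to 1 across the candidates

signif(aic_wt, 3)
```

The weights here are normalised over three populations rather than all of those retained, but because every other population has a weight that is effectively zero, the result for Mary agrees with the AIC.wt column to rounding. A dAIC of more than 800 between the best and second-best population is why Burnett is the only population flagged "yes".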

Assignment by Private Alleles

pa.result <- gl.assign.pa(gl, unknown="AA011731", nmin=10, alpha=0.05)
Starting gl.assign.pa 
  Processing genlight object with SNP data
  Discarding 9 populations with sample size < 10 :
                 pop count    Z-score  p-value assign
16              Mary    81 -0.1692350 0.567194    yes
3            Burnett    77  0.2743299 0.391916    yes
49              Pine   167  1.1555039 0.123942    yes
22      EmmacMDBCond   785  2.0204271 0.021670     no
47      EmmacMDBToon   668  2.7347470 0.003121     no
15     EmmacMaclGeor  1040  3.4791497 0.000252     no
69         EmvicDaly  1284  3.5437788 0.000197     no
20      EmmacMDBBowm   992  3.6051586 0.000156     no
81     EmworNichGreg  1260  3.8784997 0.000053     no
61        EmworRoper  1273  4.1008215 0.000021     no
25  EmmacMDBDarlWeth   865  4.8762430 0.000001     no
1           Brisbane   523 17.1445337 0.000000     no
2           Burdekin   821 15.7683126 0.000000     no
4           Clarence   915 14.1798931 0.000000     no
5       Cooper_Alvin   992 15.6406240 0.000000     no
6       Cooper_Cully  1008 18.1191005 0.000000     no
7   Cooper_Eulbertie  1001  9.1930282 0.000000     no
8         Dumaresque   929 31.3868949 0.000000     no
9  Fitzroy_Alligator   306 11.4289937 0.000000     no
10  Fitzroy_Carnavan   339 10.2951686 0.000000     no
11  Fitzroy_Fairburn   303  7.6467045 0.000000     no
12     Fraser_Island   457  5.0064421 0.000000     no
13            Hunter  1340 12.4578521 0.000000     no
14     EmmacJohnWari   893 11.1739703 0.000000     no
17      EmmacMDBBarr   940 26.7349588 0.000000     no
18      EmmacMDBBarw   937 26.1751262 0.000000     no
19     EmmacMDBBooth  1035 15.7478733 0.000000     no
21      EmmacMDBBurr  1025 15.2447327 0.000000     no
23      EmmacMDBCudg   952 18.4925912 0.000000     no
24  EmmacMDBDarlBour   916 13.1037035 0.000000     no
26      EmmacMDBDart  1079 14.9852921 0.000000     no
27      EmmacMDBEulo   639  6.1130987 0.000000     no
28      EmmacMDBForb  1051  5.2168208 0.000000     no
29      EmmacMDBGoul   922 12.4532508 0.000000     no
30        GurraGurra   957 13.0155533 0.000000     no
31      EmmacMDBGwyd   940 23.3109909 0.000000     no
32      EmmacMDBLach  1053 17.7866486 0.000000     no
33      EmmacMDBLodd   950 16.0172441 0.000000     no
34      EmmacMDBMaci   925 15.1899478 0.000000     no
35      EmmacMDBMoon   928 21.5894040 0.000000     no
36  EmmacMDBMurrGunb   898  7.6464411 0.000000     no
37  EmmacMDBMurrLock   959  8.1186128 0.000000     no
38  EmmacMDBMurrMorg   922 15.3803498 0.000000     no
39  EmmacMDBMurrMung   946 15.3622303 0.000000     no
40  EmmacMDBMurrMurr   958 27.7218281 0.000000     no
41  EmmacMDBMurrTink   912 11.1714406 0.000000     no
42 EmmacMDBMurrYarra   950 27.2732611 0.000000     no
43      EmmacMDBOven   949 27.2094137 0.000000     no
44  EmmacMDBParoBiny   975 11.8093091 0.000000     no
45      EmmacMDBPind  1037 25.1472989 0.000000     no
46      EmmacMDBSanf   995 20.6254532 0.000000     no
48          Normanby  1014  8.9353965 0.000000     no
50     EmmacRichCasi   727 23.4264098 0.000000     no
51         EmmacRoss   853 16.4775772 0.000000     no
52      EmmacTweeUki   591 10.7132631 0.000000     no
53      EmsubBamuAli  1286 23.2269725 0.000000     no
54     EmsubBamuAwab  1285 16.8725666 0.000000     no
55     EmsubMorehead  1238 21.7595831 0.000000     no
56      EmsubFlyGuka  1268 25.9689306 0.000000     no
57      EmsubFlyJikw  1226 17.9295179 0.000000     no
58      EmsubJardine  1287 10.9701686 0.000000     no
59       EmsubKerema  1370  5.1264437 0.000000     no
60       EmsubKikori  1326 20.7904053 0.000000     no
62        EmtanBlyth  1396  8.9673202 0.000000     no
63      EmtanFinniss  1382  7.6892448 0.000000     no
64     EmtanHolrChai  1361 10.7419476 0.000000     no
65     EmtanMitchell  1343  5.4248309 0.000000     no
66     EmtanMitcMitc  1369 12.8371209 0.000000     no
67     EmtanPascFarm  1365 17.4254784 0.000000     no
68      EmtanWenlock  1351 10.6423539 0.000000     no
70     EmvicDrysdale  1365 13.7459176 0.000000     no
71        Fitzroy_WA  1372 10.2841962 0.000000     no
72     EmvicIsdeBell  1355 14.7314585 0.000000     no
73     EmvicKingMool  1363 24.4944007 0.000000     no
74          EmvicOrd  1333 12.5867638 0.000000     no
75     EmworClavPung  1299 22.5017244 0.000000     no
76         EmworDaly  1307  5.2935238 0.000000     no
77     EmworDalySlei  1321  4.8949185 0.000000     no
78     EmworLeicAlex  1324 15.9637009 0.000000     no
79     EmworLimmNath  1322  5.7857267 0.000000     no
80     EmworLiveMann  1331 19.8407465 0.000000     no
  Warning: parameter by must be either 'join.by.ind' or 'join.by.loc', set to default 'join.by.loc'
Completed: gl.assign.pa 
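The counting step behind a private-allele comparison can be illustrated with toy data. The sketch below counts the loci at which the unknown carries an allele that is absent from a candidate population. The function name count_private, the genotype coding (0, 1, 2 copies of the alternate allele) and both toy populations are assumptions. It does not reproduce the Z-score or p-value that gl.assign.pa reports.

```r
# Count loci at which the unknown carries an allele not seen in the population.
# unknown: vector of alternate-allele counts (0, 1, 2; NA = missing)
# pop:     matrix of alternate-allele counts, individuals x loci
count_private <- function(unknown, pop) {
  alt_in_pop  <- colSums(pop, na.rm = TRUE) > 0       # alternate allele observed
  ref_in_pop  <- colSums(2 - pop, na.rm = TRUE) > 0   # reference allele observed
  unk_has_alt <- unknown > 0
  unk_has_ref <- unknown < 2
  private <- (unk_has_alt & !alt_in_pop) | (unk_has_ref & !ref_in_pop)
  sum(private, na.rm = TRUE)
}

unknown <- c(0, 1, 2, 0)

# Hypothetical source population: carries every allele found in the unknown
pop_a <- rbind(c(0, 1, 2, 0),
               c(0, 0, 1, 0),
               c(1, 1, 2, 0))

# Hypothetical non-source population: fixed for different alleles at three loci
pop_b <- rbind(c(2, 0, 0, 0),
               c(2, 0, 0, 1),
               c(2, 0, 0, 0))

count_private(unknown, pop_a)
count_private(unknown, pop_b)
```

The unknown has no private alleles relative to pop_a and three relative to pop_b, which is the pattern seen in the output above: low counts for Mary, Burnett and Pine, and counts in the hundreds or more for the remaining populations.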

Assignment by PCA

pca_pa_result <- gl.assign.pca(pa.result, unknown="AA011731")
Starting gl.assign.pca 
Starting gl.keep.pop 
  Processing genlight object with SNP data
  Checking for presence of nominated populations
  Retaining only populations Unknown 
  Locus metrics not recalculated
Completed: gl.keep.pop 
  Discarding 0 populations with sample size < nmin = 10 :
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by 'spam'
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by 'spam'
  Calculating a PCA to represent the unknown in the context
                   of putative sources

Starting gl.colors 
Selected color type 2 
Completed: gl.colors 

  Eliminating populations for which the unknown is outside
                   their confidence envelope
  Returning a genlight object with remaining putative source
                   populations plus the unknown
Completed: gl.assign.pca 
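The idea of placing an unknown in the ordination space of its putative sources can be sketched with base R's prcomp. Everything in this example is simulated and hypothetical (two populations, 50 loci, allele frequencies of 0.1 and 0.9), and building the PCA from the reference individuals only and then projecting the unknown is an assumption of the sketch, not a description of how gl.assign.pca works internally.

```r
set.seed(42)
n_loci <- 50

# Simulate diploid genotypes (0, 1, 2) for n individuals at allele frequency q
sim_pop <- function(n, q) {
  matrix(rbinom(n * n_loci, size = 2, prob = q), nrow = n)
}

# Two hypothetical reference populations with contrasting allele frequencies
ref <- rbind(sim_pop(10, 0.1), sim_pop(10, 0.9))
colnames(ref) <- paste0("L", seq_len(n_loci))
pops <- rep(c("popA", "popB"), each = 10)

# An unknown individual drawn from the same frequencies as popA
unknown <- sim_pop(1, 0.1)
colnames(unknown) <- colnames(ref)

# PCA on the reference individuals, then project the unknown onto the axes
pca <- prcomp(ref, center = TRUE, scale. = FALSE)
unk_scores <- predict(pca, newdata = unknown)[, 1:2]

# Population centroids on PC1 and PC2, and the unknown's distance to each
centroids <- apply(pca$x[, 1:2], 2, function(s) tapply(s, pops, mean))
d <- sqrt(rowSums(sweep(centroids, 2, unk_scores)^2))
d
```

The unknown falls much closer to the popA centroid. A confidence envelope adds a formal cut-off to this picture by asking whether the unknown lies inside the ellipse fitted to each population's scores.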

Assignment by Mahalanobis Distances

mahal_result <- gl.assign.mahal(pa.result, unknown="AA011731", verbose=3)
HELLO HELLO HELLO
Starting gl.assign.mahal 
  Discarding 0 populations with sample size < 10 :
 
  Rendering the data matrix dense by imputation
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by 'spam'
Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by 'spam'
  Undertaking a PCA

Starting gl.colors 
Selected color type 2 
Completed: gl.colors 
  Dimensions retained: 1 
  Number of dimensions with substantial eigenvalues (Broken-Stick Criterion): 1 . Hardwired limit 3 
    Selecting the smallest of the two
    Dimension of confidence envelope set at 1 
Assignment of unknown individual: AA011731 
Alpha level of significance: 0.001 
      pop     MahalD      pval assign
1    Mary 0.09623112 1.0000000    yes
2    Pine 0.34572127 0.9999989    yes
3 Burnett 3.32087394 0.9728334    yes
  Best assignment is the population with the largest probability
                of assignment, in this case Mary 
  Returning a genlight object with the putative source populations and the unknown
Completed: gl.assign.mahal 
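A Mahalanobis distance measures how far a point lies from a centroid in units that account for the spread and correlation of the reference cloud. The sketch below uses base R's mahalanobis on simulated ordination scores. The scores, the two test points and the chi-square reference distribution used for the p-value are all assumptions of this illustration, so it may differ from the calculation in gl.assign.mahal.

```r
set.seed(1)

# Hypothetical ordination scores for 30 individuals of one population
scores <- cbind(PC1 = rnorm(30, mean = 0, sd = 1),
                PC2 = rnorm(30, mean = 0, sd = 2))

centroid <- colMeans(scores)
covmat   <- cov(scores)

# Two hypothetical unknowns: one near the centroid, one far from it
unknown_near <- centroid + c(0.1, 0.1)
unknown_far  <- centroid + c(10, 10)

# stats::mahalanobis returns the SQUARED distance
d2 <- mahalanobis(rbind(unknown_near, unknown_far),
                  center = centroid, cov = covmat)

# One common choice of reference distribution: chi-square with
# degrees of freedom equal to the number of dimensions retained
pval <- pchisq(d2, df = ncol(scores), lower.tail = FALSE)

data.frame(MahalD = sqrt(d2), pval = pval)
```

A large p-value means the unknown is consistent with membership of that population. A very small one, below the chosen alpha, means the population can be excluded as a source.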

Scenario

The authorities recently raided premises in Brisbane and found a number of reptiles held without a permit. One of these is the painted turtle Emydura subglobosa. This species is widespread and common in southern New Guinea, but in Australia it is restricted to the Jardine River at the tip of Cape York. The Australian population is considered critically endangered under the EPBC Act. The question is: was the animal sourced from Cape York or imported from New Guinea? The specimen was genotyped and run in a service with the other available specimens from the localities shown in Figure 1. The data file is assignment_example1.Rdata. The SpecimenID is “AA046092”. Before you begin the analysis, restrict the populations under consideration to Emydura subglobosa.

Exercise

Can you confidently decide if the animal was sourced from Cape York or New Guinea using the tools we have provided you via dartR?

The data

gl
# The unknown
Unknown <- "AA046092"
# Preliminaries
popNames(gl)

gl2 <- gl.keep.pop(gl, pop.list=c("EmsubBamuAli", "EmsubFlyGuka", "EmsubFlyJikw",
                                  "EmsubJardine", "EmsubKerema", "EmsubMorehead"))

# Knock yourself out

Further Study

Tutorial yet to come…

Readings