Assign an individual of unknown provenance to a population based on Mahalanobis Distance

This script assigns an individual of unknown provenance to one or more target populations based on the unknown individual's proximity to population centroids; proximity is estimated using Mahalanobis Distance.

The following process is followed:

An ordination is undertaken on the populations to yield a series of orthogonal (independent) axes.
A workable subset of dimensions is chosen: either the number specified by dim.limit or the number of dimensions with substantive eigenvalues, whichever is smaller.
The Mahalanobis Distance is calculated for the unknown against each population, and the probability of membership of each population is calculated. The assignment probabilities are listed in support of a decision.
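
The three steps above can be sketched in base R. This is a minimal illustration on simulated data, not the code used inside gl.assign.mahalanobis(): the simulated matrix, the object names (dat, pop, scores, assign_one) and the use of prcomp() as the ordination are all assumptions made for the example.

```r
set.seed(42)

# Simulated data (hypothetical): two populations of 30 individuals, 20 variables,
# plus one unknown individual drawn from the same distribution as population A
n_per_pop <- 30
n_var <- 20
popA <- matrix(rnorm(n_per_pop * n_var, mean = 0), ncol = n_var)
popB <- matrix(rnorm(n_per_pop * n_var, mean = 1.5), ncol = n_var)
unknown <- matrix(rnorm(n_var, mean = 0), nrow = 1)
dat <- rbind(popA, popB, unknown)
pop <- c(rep("A", n_per_pop), rep("B", n_per_pop), "unknown")

# Step 1: ordination yielding orthogonal axes (PCA stands in for the ordination)
ord <- prcomp(dat, center = TRUE, scale. = FALSE)

# Step 2: retain a workable subset of dimensions
dim_limit <- 2
scores <- ord$x[, seq_len(dim_limit), drop = FALSE]
unknown_scores <- scores[pop == "unknown", ]

# Step 3: squared Mahalanobis distance of the unknown from each population
# centroid, converted to a probability with the chi-square approximation
assign_one <- function(p) {
  s <- scores[pop == p, , drop = FALSE]
  d2 <- mahalanobis(unknown_scores, center = colMeans(s), cov = cov(s))
  data.frame(pop = p, d2 = d2,
             prob = pchisq(d2, df = dim_limit, lower.tail = FALSE))
}
result <- do.call(rbind, lapply(c("A", "B"), assign_one))

# Candidate populations ranked by assignment probability
result[order(-result$prob), ]
```

In this sketch each population uses its own covariance matrix; whether the package pools covariances across populations is not stated here, so treat that choice as an assumption.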

gl.assign.mahalanobis(
  x,
  dim.limit = 2,
  plevel = 0.999,
  plot.out = TRUE,
  unknown,
  verbose = NULL
)

Arguments

x: Name of the input genlight object [required].
dim.limit: Maximum number of dimensions to consider for the confidence ellipses [default 2].
plevel: Probability level for bounding ellipses [default 0.999].
plot.out: If TRUE, produces a plot showing the position of the unknown in relation to putative source populations [default TRUE].
unknown: Identity label of the focal individual whose provenance is unknown [required].
verbose: Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Value

A data frame with the results of the assignment analysis.

Details

There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().

A next step is to consider the PCoA plot for populations where no private alleles have been detected. The position of the unknown in relation to the confidence ellipses is plotted by this script as a basis for narrowing down the list of putative source populations. This can be evaluated with gl.assign.pca().

The third step (delivered by this script) is to consider the assignment probabilities based on the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, then to consider the probability associated with its quantile using the Chi-square approximation. In effect, this index takes into account the position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination. The larger the assignment probability, the greater the confidence in the assignment.
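
Written out, the calculation described in this step takes roughly the following form. The symbols are introduced here for illustration and do not come from the package: x is the vector of ordination scores of the unknown, mu_j and S_j are the centroid and covariance matrix of population j in the retained dimensions, and k is the number of retained dimensions. Using a separate covariance matrix per population is an assumption.

```latex
\[
D_j^2 = (\mathbf{x} - \boldsymbol{\mu}_j)^{\top} \mathbf{S}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j),
\qquad
P_j = \Pr\left(\chi^2_k \ge D_j^2\right)
\]
```

The chi-square approximation holds when the scores within a population are roughly multivariate normal, so P_j is best read as an index of relative support rather than an exact probability.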

If dim.limit is set to 2, to correspond with the dimensions used in gl.assign.pca(), then the output provides a ranking of the final set of putative source populations.

If dim.limit is set to be > 2, then this script provides a basis for further narrowing the set of putative populations. If the unknown individual is an extreme outlier, say at less than 0.001 probability of population membership (0.999 confidence envelope), then the associated population can be eliminated from further consideration.

Warning: gl.assign.mahalanobis() treats each specified dimension equally, without regard to the percentage variation explained after ordination. If the unknown is an outlier in a lower dimension with an explanatory variance of, say, 0.1%, that dimension still carries the same weight as the first, so choose the number of dimensions retained from the ordination with care.
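
The equal weighting described in the warning can be seen in a small base R experiment. The scores below are simulated and the object names are hypothetical; the point is only that a displacement of the same number of standard deviations produces the same squared Mahalanobis distance whether it occurs on a high-variance or a low-variance axis.

```r
set.seed(1)

# Hypothetical population scores on two ordination axes:
# axis 1 carries almost all of the variation, axis 2 almost none
scores <- cbind(axis1 = rnorm(200, sd = 10), axis2 = rnorm(200, sd = 0.1))
centre <- colMeans(scores)
covmat <- cov(scores)
sds <- sqrt(diag(covmat))

# Share of total variance carried by each axis
var_share <- diag(covmat) / sum(diag(covmat))

# Two unknowns, each displaced by 4 standard deviations along a single axis
out_axis1 <- centre + c(4 * sds[1], 0)
out_axis2 <- centre + c(0, 4 * sds[2])

# Squared Mahalanobis distances and their chi-square tail probabilities
d2 <- c(axis1 = mahalanobis(out_axis1, centre, covmat),
        axis2 = mahalanobis(out_axis2, centre, covmat))
prob <- pchisq(d2, df = 2, lower.tail = FALSE)

var_share
d2
prob
```

Both unknowns receive the same distance and the same probability, even though the second is displaced on an axis that accounts for a tiny fraction of the total variation.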

Each of the above approaches provides evidence; none is 100% certain. They need to be interpreted cautiously.

In deciding the assignment, the script by default considers an individual to be an outlier with respect to a particular population at alpha = 0.001.
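
As a rough illustration of this decision rule, the sketch below flags populations whose assignment probability falls below alpha = 1 - plevel. The data frame, its column names (pop, prob, outlier) and the probabilities in it are invented for the example and are not the columns or values returned by gl.assign.mahalanobis().

```r
plevel <- 0.999
alpha <- 1 - plevel

# Hypothetical assignment probabilities for three candidate populations
assignment <- data.frame(
  pop  = c("popA", "popB", "popC"),
  prob = c(0.42, 0.0004, 0.03)
)

# A population is set aside when the unknown is an outlier at the chosen alpha
assignment$outlier <- assignment$prob < alpha

# The same threshold expressed as a squared Mahalanobis distance in 2 dimensions
d2_crit <- qchisq(plevel, df = 2)

assignment
d2_crit
```

With two dimensions the critical squared distance is about 13.8; the critical value grows as more dimensions are retained, because the chi-square degrees of freedom equal the number of dimensions.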

Author

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

Examples

if (FALSE) { # \dontrun{
# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown='UC_01044', nmin=10, threshold=1,
                     verbose=3)
test_2 <- gl.assign.pca(test, unknown='UC_01044', plevel=0.95, verbose=3)
df <- gl.assign.mahalanobis(test_2, unknown='UC_01044', verbose=3)
} # }