Reads FASTA files and converts them to a genlight object

The following IUPAC Ambiguity Codes are taken as heterozygotes:

M is heterozygote for AC and CA
R is heterozygote for AG and GA
W is heterozygote for AT and TA
S is heterozygote for CG and GC
Y is heterozygote for CT and TC
K is heterozygote for GT and TG
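
The mapping above can be written as a small lookup table. The following is a minimal sketch in base R, not dartR's internal code; the names iupac_het and decode_site are hypothetical and used only for illustration. In this sketch, any character that is neither an unambiguous base nor one of the six codes above is returned as missing.

```r
# Hypothetical lookup table (not part of dartR): each two-base IUPAC
# ambiguity code mapped to the pair of alleles it represents.
iupac_het <- list(
  M = c("A", "C"),
  R = c("A", "G"),
  W = c("A", "T"),
  S = c("C", "G"),
  Y = c("C", "T"),
  K = c("G", "T")
)

# Hypothetical helper: return the two alleles carried at one site.
# Unambiguous bases give a homozygote; any other character gives NA.
decode_site <- function(code) {
  code <- toupper(code)
  if (code %in% c("A", "C", "G", "T")) {
    return(c(code, code))
  }
  if (code %in% names(iupac_het)) {
    return(iupac_het[[code]])
  }
  c(NA_character_, NA_character_)
}

decode_site("R")  # heterozygote A/G
decode_site("c")  # homozygote C/C (lower case is accepted)
decode_site("N")  # treated as missing in this sketch
```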

The remaining IUPAC Ambiguity Codes (B, D, H, V and N) are taken as missing data.

The function can deal with individuals that are missing from some files, e.g. when FASTA files contain different numbers of individuals due to missing data.
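
One way to picture this is to align every locus on the union of individual names and fill the gaps with NA. The sketch below uses base R only and made-up data; the object names (locus_1, locus_2, loci, all_ind, calls) are hypothetical, and the approach is an assumption about how such a merge could work rather than a description of the function's internals.

```r
# Hypothetical per-locus data: named character vectors, one base call per
# individual. ind_3 is absent from the second locus.
locus_1 <- c(ind_1 = "A", ind_2 = "R", ind_3 = "G")
locus_2 <- c(ind_1 = "C", ind_2 = "C")
loci <- list(locus_1 = locus_1, locus_2 = locus_2)

# All individuals seen in at least one locus.
all_ind <- sort(unique(unlist(lapply(loci, names))))

# Indexing a named vector by a name it lacks returns NA, so absent
# individuals become missing data for that locus.
calls <- sapply(loci, function(x) unname(x[all_ind]))
rownames(calls) <- all_ind

calls  # individuals in rows, loci in columns
```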

The allele with the highest frequency is taken as the reference allele.

SNPs with more than two alleles are skipped.
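
The two rules above can be illustrated with a short base R sketch. It is not dartR's implementation; score_site is a hypothetical helper that takes a two-row character matrix of alleles (one column per individual), picks the most frequent allele as the reference, and counts non-reference alleles per individual. Ties are resolved by taking the first allele in sorted order, which is an assumption of this sketch.

```r
# Hypothetical helper (not dartR's implementation): score one site from a
# two-row matrix of alleles, one column per individual.
score_site <- function(alleles) {
  counts <- table(alleles)  # NA calls are not counted
  n_alleles <- length(counts)
  # Skip sites with no called alleles or with more than two alleles.
  if (n_alleles == 0 || n_alleles > 2) {
    return(NULL)
  }
  # The allele with the highest frequency is the reference allele.
  ref <- names(counts)[which.max(counts)]
  # Number of non-reference alleles per individual: 0, 1 or 2.
  # A column containing NA sums to NA, so missing calls stay missing.
  colSums(alleles != ref)
}

site <- cbind(ind_1 = c("A", "A"), ind_2 = c("A", "G"),
              ind_3 = c("G", "G"), ind_4 = c("A", "A"),
              ind_5 = c(NA, NA))
score_site(site)

# A site with three alleles is skipped.
triallelic <- cbind(ind_1 = c("A", "C"), ind_2 = c("G", "G"))
score_site(triallelic)
```

In this example A is the reference allele (five copies against three of G), so the individuals are scored 0, 1, 2, 0 and NA.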

Usage

gl.read.fasta(fasta_files, parallel = FALSE, n_cores = NULL, verbose = NULL)

Arguments

fasta_files: Paths of the FASTA files to read [required].
parallel: A logical indicating whether multiple cores, if available, should be used for the computations (TRUE) or not (FALSE); requires the package parallel to be installed [default FALSE].
n_cores: If parallel is TRUE, the number of cores to be used in the computations; if NULL, then the maximum number of cores available on the computer is used [default NULL].
verbose: Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].
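
The interaction between parallel and n_cores can be sketched as follows. This is an illustration of the documented behaviour only; resolve_cores is a hypothetical helper, and capping the request at the number of detected cores is an assumption of the sketch, not something the description above states.

```r
library(parallel)

# Hypothetical helper: how many cores would be used for a given pair of
# parallel / n_cores values, following the argument descriptions above.
resolve_cores <- function(parallel = FALSE, n_cores = NULL) {
  if (!parallel) {
    return(1L)  # serial computation
  }
  available <- detectCores()
  if (is.na(available)) {
    available <- 1L  # detectCores() can return NA on some systems
  }
  if (is.null(n_cores)) {
    return(available)  # NULL means all available cores
  }
  # Assumption: never request more cores than the machine reports.
  min(as.integer(n_cores), available)
}

resolve_cores()                               # parallel = FALSE
resolve_cores(parallel = TRUE)                # all available cores
resolve_cores(parallel = TRUE, n_cores = 1)   # a single core
```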

Value

A genlight object.

Details

Ambiguity characters are often used to code heterozygotes. However, coding heterozygotes as ambiguity characters may bias many estimates. For more information, see: https://evodify.com/heterozygotes-ambiguity-characters/

Author

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

Examples

library(dartR)
# Folder where the FASTA files are located.
folder_samples <- system.file('extdata', package = 'dartR')
# List the FASTA files, including their path. Files have an extension
# that contains "fas". The pattern is a regular expression, not a glob.
file_names <- list.files(path = folder_samples, pattern = "\\.fas",
                         full.names = TRUE)
# Read the FASTA files.
obj <- gl.read.fasta(file_names)
#> Starting gl.read.fasta 
#>   Reading locus_1.fas 
#>   Merging files...
#> Starting gl.compliance.check 
#>   Processing genlight object with SNP data
#>   Checking coding of SNPs
#>     SNP data scored NA, 0, 1 or 2 confirmed
#>   Checking locus metrics and flags
#>   Recalculating locus metrics
#>   Checking for monomorphic loci
#>     No monomorphic loci detected
#>   Checking for loci with all missing data
#>     No loci with all missing data detected
#>   Checking whether individual names are unique.
#>   Checking for individual metrics
#>   Warning: Creating a slot for individual metrics
#>   Checking for population assignments
#>   Population assignments not detected, individuals assigned
#>                     to a single population labelled 'pop1'
#>   Spelling of coordinates checked and changed if necessary to 
#>             lat/lon
#> Completed: gl.compliance.check 
#>   Genlight object does not have individual metrics. You need to add them 'manually' to the @other$ind.metrics slot.
#> Completed: gl.read.fasta 
#>