Title: | Genetic Distance Segmentation for Population Genetics |
---|---|
Description: | Provides efficient methods to compute local and genome wide genetic distances (corresponding to the so called Hudson Fst parameters) through moment method, perform chromosome segmentation into homogeneous Fst genomic regions, and selection sweep detection for multi-population comparison. When multiple profile segmentation is required, the procedure can be parallelized using the future package. |
Authors: | Tristan Mary-Huard [aut, cre]
|
Maintainer: | Tristan Mary-Huard <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2025-03-09 05:57:17 UTC |
Source: | https://github.com/cran/fst4pg |
The function builds a list where each element corresponds to a population present in both Freq and NbGametes (all other populations being discarded). Each element consists of a data.frame with 2 columns, Freq and NbGamete.
BuildFreqNbG(Freq, NbGamete)
BuildFreqNbG(Freq, NbGamete)
Freq |
A data.frame or matrix of frequencies where each row corresponds to a marker, each column corresponds to a population, |
NbGamete |
A data.frame or matrix of number of gametes where each row corresponds to a marker, and each column corresponds to a population |
a list of data.frames, each corresponding to a population.
## Load the HGDP data data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete)
## Load the HGDP data data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete)
Computation of the numerator of the moment estimator
Compute_Denominator(p1, p2)
Compute_Denominator(p1, p2)
p1 |
numeric, frequencies in population 1 |
p2 |
numeric, frequencies in population 2 |
a vector with the denominators of the Fst moment estimator
Computation of the numerator of the moment estimator
Compute_Nominator(p1, p2, n1, n2)
Compute_Nominator(p1, p2, n1, n2)
p1 |
numeric, frequencies in population 1 |
p2 |
numeric, frequencies in population 2 |
n1 |
numeric, number of gametes in population 1 |
n2 |
numeric, number of gametes in population 2 |
a vector with the numerators of the Fst moment estimator
Display mean ratio and/or number of selection graphs
ContrastGraphSummary( CS, Info, Ratio.thres, Coef = 1, CutNbSel = NULL, CutMeanRatio = NULL )
ContrastGraphSummary( CS, Info, Ratio.thres, Coef = 1, CutNbSel = NULL, CutMeanRatio = NULL )
CS |
a contrast summary, as provided by function |
Info |
a data.frame providing information about markers |
Ratio.thres |
a numeric value, regions exhibiting Fst levels whose ratio with the reference level is higher than Ratio.thres will be highlighted. |
Coef |
a scalar, controling font sizes for the graph, optional |
CutNbSel |
a scalar, providing a y-value for an horizontal line on the NbSel graph |
CutMeanRatio |
a scalar, providing a y-value for an horizontal line on the MeanRatio graph |
two ggplots objects, called NbSel and MeanRatio, respectively.
Summarize multiple Fst profiles
ContrastSummary(PS, RefLevel, Ratio.thres = 3, NbSnp.min = 1)
ContrastSummary(PS, RefLevel, Ratio.thres = 3, NbSnp.min = 1)
PS |
a list of profile summaries, as provided by the |
RefLevel |
a list of reference (i.e. baseline) Fst levels |
Ratio.thres |
a numeric value, regions exhibiting Fst levels whose ratio with the reference level is higher than Ratio.thres will be highlighted. |
NbSnp.min |
an integer. The minimum number of markers required to highlight a region |
a tibble
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min)
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min)
ContrastTopRegions
ContrastTopRegions(CS, Crit, Info, Thres, Simplify = FALSE)
ContrastTopRegions(CS, Crit, Info, Thres, Simplify = FALSE)
CS |
a list of contrast summaries as obtained from function |
Crit |
a string providing the name of the variable to use to select regions |
Info |
a data.frame providing information about markers |
Thres |
the threshold to be used on the Crit variable |
Simplify |
a boolean specifying whether the results should be displayed as a list (by-default option) or as a single data.frame |
a data.frame or a list of data.frames
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min) # NbSel.thres <- 2 # TopRegions <- ContrastTopRegions(CS = CS,Crit = 'NbSel',Info = Info, # Thres = NbSel.thres, Simplify=TRUE)
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min) # NbSel.thres <- 2 # TopRegions <- ContrastTopRegions(CS = CS,Crit = 'NbSel',Info = Info, # Thres = NbSel.thres, Simplify=TRUE)
Shape the data for Fst profile representation
DF4Plot1Prof(Info, HF, FstProf, Coord = NULL, Threshold = NULL)
DF4Plot1Prof(Info, HF, FstProf, Coord = NULL, Threshold = NULL)
Info |
a data.frame providing information about markers |
HF |
a data.frame with 2 columns Fst and Weight, as obtained from the |
FstProf |
an Fst profile, as obtained from the |
Coord |
a vector with the minimum and maximum coordinates (i.e. positions along the genome), providing the range of the genomic region that will be plotted, optional. |
Threshold |
a numeric value. Markers belonging to regions whose Fst profile is higher than threshold will be highlighted. Optional. |
a data.frame that can be used as an input for function Plot1Prof
Freq
is a data.frame containing the frequencies of 49,636 markers located on chromosome 1,
for the 13 American and European populations described in the Stanford HGDP dataset.
Freq
Freq
A data.frame
Filtering markers based on allelic frequencies
Freq.filt(FreqNbG, Maf = 0)
Freq.filt(FreqNbG, Maf = 0)
FreqNbG |
a list of data.frames (one per population) with 2 columns: Freq and NbGamete |
Maf |
a numerci value for the thresholding of minor allelic frequencies |
a vector of positions to be removed
Make a frequency heatmap
HeatMap( Min, Max, chr = NULL, Info, FreqNbG, Dir = NULL, Weights = NULL, Weight.thres = 0.05, NbAdjM = 0, Subsets = NULL )
HeatMap( Min, Max, chr = NULL, Info, FreqNbG, Dir = NULL, Weights = NULL, Weight.thres = 0.05, NbAdjM = 0, Subsets = NULL )
Min |
the starting position value of the region |
Max |
the end position value of the region |
chr |
a string providing the chromosome name, optional |
Info |
a data.frame providing information about markers |
FreqNbG |
a list of data.frames (one per population) with two columns: Freq and NbGamete |
Dir |
a string providing the name of the directory where the graph should be saved, optional |
Weights |
a vector of weights associated with each marker, optional |
Weight.thres |
a numeric value. Markers with weights lower than this threshold will be discarded from the graphical representation. Optional. |
NbAdjM |
an integer providing the number of markers before and after the highlighted regions that should be added to the graphical representation, optional. |
Subsets |
a list of character vectors with the population names, optional. |
A heatmap where rows correspond to markers, and columns to populations.
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min) # NbSel.thres <- 2 # TopRegions <- ContrastTopRegions(CS = CS,Crit = 'NbSel',Info = Info, # Thres = NbSel.thres, Simplify=TRUE) # HeatMap(Min = TopRegions[1,]$Start, # Max = TopRegions[1,]$End, # chr = TopRegions[1,]$Chromosome, # Info = Info, # FreqNbG = FreqNbG, # Subsets = Contrast)
## The full example execution takes a few seconds. # data(Freq);data(NbGamete) # FreqNbG <- BuildFreqNbG(Freq,NbGamete) # HFst.m <- HudsonFst.m(FreqNbG) ## Two sets of populations to contrast # Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) # Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) # PS <- ProfilingSummary(Profiles,Info) # RefLevel <- rapply(Profiles,median,classes = "numeric",how='list') # Ratio.thres <- 3 # NbSnp.min <- 1 # CS <- ContrastSummary(PS, RefLevel, # Ratio.thres=Ratio.thres, # NbSnp.min=NbSnp.min) # NbSel.thres <- 2 # TopRegions <- ContrastTopRegions(CS = CS,Crit = 'NbSel',Info = Info, # Thres = NbSel.thres, Simplify=TRUE) # HeatMap(Min = TopRegions[1,]$Start, # Max = TopRegions[1,]$End, # chr = TopRegions[1,]$Chromosome, # Info = Info, # FreqNbG = FreqNbG, # Subsets = Contrast)
Compute genome-wide Hudson Fst moment estimator
HudsonFst.gw(HFst.m, Mat = TRUE)
HudsonFst.gw(HFst.m, Mat = TRUE)
HFst.m |
a list of data.frames as obtained with function |
Mat |
boolean, should the result be output as a matrix. |
By default a matrix of Hudson Fst coefficients, a vector otherwise.
data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) HFst.chr <- HudsonFst.gw(HFst.m)
data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) HFst.chr <- HudsonFst.gw(HFst.m)
Compute Hudson Fst moment estimator at marker level
HudsonFst.m(FreqNbG)
HudsonFst.m(FreqNbG)
FreqNbG |
a list of data.frames (one per population) with 2 columns: Freq and NbGamete |
a list of data.frames with 2 columns: Fst and Weight.
## Load the FreqNbG object build from the HGDP data data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG)
## Load the FreqNbG object build from the HGDP data data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG)
Plot Fst values along chromosomes
HudsonFst.plot( Info, HFst.m, HFst.prof = NULL, Coord = NULL, Ref = NULL, Threshold = NULL )
HudsonFst.plot( Info, HFst.m, HFst.prof = NULL, Coord = NULL, Ref = NULL, Threshold = NULL )
Info |
a data.frame providing information about markers |
HFst.m |
a data.frame with 2 columns, Fst and Weight, as provided by the |
HFst.prof |
a data.frame corresponding to one item of the output of the |
Coord |
a vector with the minimum and maximum coordinates (i.e. positions along the genome) providing the range of the genomic region that will be plotted. |
Ref |
a value to plot a reference line |
Threshold |
a value to plot a threshold line |
a ggplot object
## The full example execution takes a few seconds. data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) ## Plot the raw data HudsonFst.plot(Info,HFst.m$Colombian_Tuscan) ## Plot the raw data and the segmentation HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan) ## Add a background/reference level RefLevel <- median(HFst.prof$Colombian_Tuscan) HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan, Ref=RefLevel) ## Add a threshold Threshold <- 3*RefLevel HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan, Ref=RefLevel,Threshold = Threshold)
## The full example execution takes a few seconds. data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) ## Plot the raw data HudsonFst.plot(Info,HFst.m$Colombian_Tuscan) ## Plot the raw data and the segmentation HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan) ## Add a background/reference level RefLevel <- median(HFst.prof$Colombian_Tuscan) HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan, Ref=RefLevel) ## Add a threshold Threshold <- 3*RefLevel HudsonFst.plot(Info,HFst.m$Colombian_Tuscan,HFst.prof$Colombian_Tuscan, Ref=RefLevel,Threshold = Threshold)
Perform FST profiling between pairs of pops, as requested by Contrast. If no contrast is provided, all pairs are considered
HudsonFst.prof( HFst.m, Contrast = NULL, Kmax = 100, NbSegCrit = "biggest.S3IB", parallel = TRUE )
HudsonFst.prof( HFst.m, Contrast = NULL, Kmax = 100, NbSegCrit = "biggest.S3IB", parallel = TRUE )
HFst.m |
A list of data.frame with two columns each, Fst and Weight,
as provided by the |
Contrast |
a list of two vectors with the names of the populations to be contrasted |
Kmax |
maximum number of breakpoints to be considered |
NbSegCrit |
the criterion used for the choice of the number of segments |
parallel |
a boolean, should the profiling be parallelized (using future) or not |
a smoothed profile
data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) ## Two population analysis TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) ## The full example execution takes a few seconds. ## Two sets of populations to contrast Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) ## For larger lists and/or larger marker sets, ## use the future package for parallel computation: future::plan("multisession",workers=4) Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) future::plan("default")
data(Freq);data(NbGamete) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) ## Two population analysis TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) ## The full example execution takes a few seconds. ## Two sets of populations to contrast Contrast <- list(America=c("Colombian","Maya"),Europe=c("Tuscan","Italian")) Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) ## For larger lists and/or larger marker sets, ## use the future package for parallel computation: future::plan("multisession",workers=4) Profiles <- HudsonFst.prof(HFst.m,Contrast=Contrast) future::plan("default")
Info
is a data.frame describing the markers located on chromosome 1 in the Stanford
HGDP dataset. Each row corresponds to a marker, and the 5 columns provide information about
the marker name, its chromosome membership, its position, and its reference and alternative alleles.
Info
Info
A data.frame
Perform segmentation on a given dataset and returns the segmented profile
MakeProfile.op(DF, coef.pen.value = 1, sd.y = NULL)
MakeProfile.op(DF, coef.pen.value = 1, sd.y = NULL)
DF |
a data.frame with two columns, Fst and Weights, as provided by the |
coef.pen.value |
coef to use for penaly 2*coef.pen.value*log(n) |
sd.y |
a numeric value corresponding to the (estimated) standard deviation of the signal. If NULL (default) the value is automatically estimated. |
a smoothed profile
Perform segmentation on a given dataset and returns the segmented profile
MakeProfile.sn_nomemory(DF, Kmax, method = "biggest.S3IB", sd.y = NULL)
MakeProfile.sn_nomemory(DF, Kmax, method = "biggest.S3IB", sd.y = NULL)
DF |
a data.frame with two columns, Fst and Weights, as provided by the |
Kmax |
max number of changes used for model selection (check with crops) |
method |
a string, the name of the criterion used for model selection: (1) "givenVariance" = using the penalty of Lebarbier 2005 given a estimator of the variance, (2) "biggest.S3IB" = biggest=TRUE in saut taken from S3IB, (3) "notbiggest.S3IB" To be chosen amongs3ib.jump, s3ib.nojump, pre, ddse (capushe) or jump (capushe) |
sd.y |
a numeric value corresponding to the (estimated) standard deviation of the signal. If NULL (default) the value is automatically estimated. |
a smoothed profile
Merge adjacent top regions
MergeRegion(DT, Crit)
MergeRegion(DT, Crit)
DT |
a data.frame |
Crit |
a string corresponding to the name of the criterion used for selecting the top regions |
a simplified data.frame (tibble)
NbGamete
is a data.frame containing the number of gametes collected
for the 13 American and European populations described in the Stanford HGDP dataset.
NbGamete
NbGamete
A data.frame
Display the graphical representation of an Fst profile
Plot1Prof(DF, Title = "", Range = NULL)
Plot1Prof(DF, Title = "", Range = NULL)
DF |
a data.frame, as provided by function |
Title |
a string providing a title for the graph, optional. |
Range |
a vector with the minimum and maximum values for the y-axis |
a ggplot object
Summary of Fst profiles
ProfilingSummary(FstProfiles, SnpInfo)
ProfilingSummary(FstProfiles, SnpInfo)
FstProfiles |
a list of Fst profiles as obtained from function |
SnpInfo |
a data.frame providing information about markers |
a list of data.frame. Each data.frame summarizes a Fst profile, in terms of number of segments, Start and End positions, length (i.e. number of markers) and Fst level of each segment.
data(Freq);data(NbGamete);data(Info) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) PS <- ProfilingSummary(HFst.prof,Info)
data(Freq);data(NbGamete);data(Info) FreqNbG <- BuildFreqNbG(Freq,NbGamete) HFst.m <- HudsonFst.m(FreqNbG) TwoPops <- list(First="Colombian",Second="Tuscan") HFst.prof <- HudsonFst.prof(HFst.m,Contrast=TwoPops) PS <- ProfilingSummary(HFst.prof,Info)
Computation of the Fst moment estimator
Ratio_Average(Nominator, Denominator)
Ratio_Average(Nominator, Denominator)
Nominator |
numeric, numerators of the Fst moment estimator |
Denominator |
numeric, denominators of the Fst moment estimator |
a vector with the global Fst estimator
Plot the Fst estimates along a (portion of) chromosome
RawPlot(Info, HF, Coord = NULL, Title = "")
RawPlot(Info, HF, Coord = NULL, Title = "")
Info |
a data.frame providing information about markers |
HF |
a data.frame with 2 columns, Fst and Weight, as provided by the |
Coord |
a vector with the minimum and maximum coordinates (i.e. positions along the genome), providing the range of the genomic region that will be plotted. |
Title |
a string providing a title for the graph. |
a ggplot object.
Summarise1Profile
Summarise1Profile(profile, snpinfo)
Summarise1Profile(profile, snpinfo)
profile |
a vector, corresponding to the Fst profile of a pair of populations |
snpinfo |
a data.frame, providing information about markers |
a data.frame that combines both objects