Spectrum#
Classes for working with the site-frequency spectrum (SFS) and 2-SFS.
- class SFS(data: Sequence[float])[source]#
Bases:
SpectrumA site-frequency spectrum.
- property Theta: float#
Calculate genome-wide population mutation rate using Watterson’s estimator.
- __init__(data: Sequence[float])#
Initialize spectrum.
- Parameters:
data (
Sequence[float]) – SFS counts
- copy()#
Copy the spectrum.
- Return type:
Spectrum- Returns:
Copy of the spectrum
- fold()#
Fold the site-frequency spectrum.
- Return type:
Spectrum- Returns:
Folded spectrum
- static from_file(file: str)#
Load object from file.
- Parameters:
file (
str) – File name- Return type:
Spectrum- Returns:
Spectrum object
- static from_list(data: Sequence)#
Create Spectrum from list.
- Parameters:
data (
Sequence) – SFS counts- Return type:
Spectrum- Returns:
Spectrum
- static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)#
Create Spectra from polyDFE specification which treats the number of mutational target sites and the divergence counts separately.
- Parameters:
polymorphic (
Sequence) – Polymorphic countsn_sites (
float) – Total number of sitesn_div (
float) – Number of divergence counts
- Return type:
Spectrum- Returns:
Spectrum
- static from_polymorphic(data: Sequence)#
Create Spectrum from polymorphic counts only.
- Parameters:
data (
Sequence) – Polymorphic counts- Return type:
Spectrum- Returns:
Spectrum
- static get_neutral(theta: float, n_sites: float, n: int, r: Sequence[float] = None)#
Obtain a standard neutral SFS for a given theta and number of sites.
- Parameters:
theta (
float) – Population mutation raten_sites (
float) – Number of total sitesn (
int) – Number of frequency classesr (
Sequence[float]) – Nuisance parameters that account for demography. An array of lengthn-1whose elements are multiplied element-wise with the polymorphic counts of the Kingman SFS. By default, no demography effects are considered which is equivalent tor = [1] * (n-1). Note that non-default values ofrwill also affect estimates of the population mutation rate.
- Return type:
Spectrum- Returns:
Neutral SFS
- property has_div: bool#
Whether n_div was specified.
- Returns:
Whether n_div was specified
- is_folded()#
Check if the site-frequency spectrum is folded.
- Return type:
bool- Returns:
True if folded, False otherwise
- misidentify(epsilon: float)#
Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.
- Parameters:
epsilon (
float) – Misidentification rate (0 <= epsilon <= 1)- Return type:
Spectrum- Returns:
Spectrum with misidentification applied
- Raises:
ValueError – If epsilon is not between 0 and 1
- property n: int#
The sample size.
- Returns:
Sample size
- property n_div: float#
Number of divergence counts.
- Returns:
Number of divergence counts
- property n_monomorphic: float#
Number of monomorphic sites.
- Returns:
Number of monomorphic sites
- property n_polymorphic: ndarray#
Get the polymorphic counts.
- Returns:
Polymorphic counts
- property n_sites: float#
The total number of sites.
- Returns:
Total number of sites
- normalize()#
Normalize SFS so that all non-monomorphic counts add up to 1.
- Return type:
Spectrum- Returns:
Normalized spectrum
- plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)#
Plot spectrum.
- Parameters:
show (
bool) – Whether to show plot.file (
str) – File to save plot to.title (
str) – Title of plot.log_scale (
bool) – Whether to use log scale on y-axis.show_monomorphic (
bool) – Whether to show monomorphic counts.kwargs_legend (
dict) – Keyword arguments passed toplt.legend(). Only for Python visualization backend.ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes
- property polymorphic: ndarray#
Get the polymorphic counts.
- Returns:
Polymorphic counts
- resample(seed: int | Generator = None)#
Resample SFS assuming independent Poisson counts.
- Parameters:
seed (
int|Generator) – Random state or seed- Return type:
Spectrum- Returns:
Resampled spectrum.
- scale_theta(theta: float)#
Scale the spectrum to a different theta value by
- Parameters:
theta (
float) – New theta value- Return type:
Spectrum- Returns:
Scaled spectrum
- static standard_kingman(n: int, n_monomorphic: int = 0)#
Get standard Kingman SFS.
- Parameters:
n (
int) – sample sizen_monomorphic (
int) – Number of monomorphic sites.
- Return type:
Spectrum- Returns:
Standard Kingman SFS
- subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | Generator = None)#
Subsample spectrum to a given sample size.
Warning
If using the ‘random’ mode, The SFS counts are cast to integers before subsampling so this will only provide sensible results if the SFS counts are integers or if they are large enough to be approximated well by integers. The ‘probabilistic’ mode does not have this limitation.
- Parameters:
n (
int) – Sample sizemode (
Literal['random','probabilistic']) – Subsampling mode. Either ‘random’ or ‘probabilistic’.seed (
int|Generator) – Random state or seed. Only for ‘random’ mode.
- Return type:
Spectrum- Returns:
Subsampled spectrum
- property theta: float#
Calculate site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (
n_sites).
- to_file(file: str)#
Save object to file.
- Parameters:
file (
str) – File name
- to_list()#
Convert to list.
- Return type:
list- Returns:
SFS counts
- to_numpy()#
Convert to array.
- Return type:
ndarray- Returns:
SFS counts
- to_spectra()#
Convert to Spectra object.
- Return type:
Spectra- Returns:
Spectra object
- class SFS2(data: ndarray | list)[source]#
Bases:
IterableA 2-dimensional site-frequency spectrum.
- __init__(data: ndarray | list)[source]#
Construct from data matrix.
- Parameters:
data (
ndarray|list)
- static from_file(file: str)[source]#
Load from file.
- Parameters:
file (
str) – File path.- Return type:
- Returns:
SFS2
- static from_json(json: str)[source]#
Load from JSON string.
- Parameters:
json (
str) – JSON string.- Return type:
- Returns:
SFS2
- is_folded()[source]#
Check if the 2-SFS is folded.
- Return type:
bool- Returns:
Whether the 2-SFS is folded.
- fold()[source]#
Fold 2-SFS by adding up
iandn - ifor both axes. Node that this only make sense for counts or frequencies.- Return type:
- Returns:
Folded 2-SFS.
- symmetrize()[source]#
Symmetric SFS so that
i, jandj, iare the same.- Return type:
- Returns:
Symmetric 2-SFS.
- fill_monomorphic(fill_value=nan)[source]#
Remote the diagonal entries of the given array.
- Parameters:
fill_value – Value to fill diagonal entries with.
- Return type:
- Returns:
2-SFS
- plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)[source]#
Plot as a heatmap.
- Parameters:
title (
str) – Title of the plot.ax (plt.Axes) – Axes to plot on.
max_abs (
float) – Maximum absolute value to plot.log_scale (
bool) – Use log scale.cbar_kws (
Dict) – Keyword arguments for color bar.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)[source]#
Plot as a surface.
- Parameters:
title (
str)ax (plt.Axes) – Axes to plot on.
max_abs (
float) – Maximum absolute value to plot.vmin (
float) – Minimum value to plot.vmax (
float) – Maximum value to plot.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- mask_diagonal(fill_value=nan)[source]#
Mask both the primary and secondary diagonal entries of the 2-SFS matrix.
The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.
- Parameters:
fill_value – The value to fill the diagonal entries with.
- Return type:
- Returns:
A new SFS2 object with both diagonals masked.
- class TwoLocusSFS(data: ndarray | list)[source]#
Bases:
SFS2The two-locus site-frequency spectrum under recombination: a square matrix whose entry
(i, j)is the expected product of the branch length subtendingisamples at locus 0 andjsamples at locus 1, for two loci separated by recombination rater. It interpolates between the within-tree cross-moment of the SFS atr = 0(fully linked, equal toCoalescent.sfs.covplus the outer product of the marginal means) and the outer product of the marginal SFS asr → ∞(independent loci). It shares the container machinery ofSFS2(plotting, folding, arithmetic, serialization).- __init__(data: ndarray | list)#
Construct from data matrix.
- Parameters:
data (
ndarray|list)
- fill_monomorphic(fill_value=nan)#
Remote the diagonal entries of the given array.
- Parameters:
fill_value – Value to fill diagonal entries with.
- Return type:
- Returns:
2-SFS
- fold()#
Fold 2-SFS by adding up
iandn - ifor both axes. Node that this only make sense for counts or frequencies.- Return type:
- Returns:
Folded 2-SFS.
- static from_file(file: str)#
Load from file.
- Parameters:
file (
str) – File path.- Return type:
- Returns:
SFS2
- static from_json(json: str)#
Load from JSON string.
- Parameters:
json (
str) – JSON string.- Return type:
- Returns:
SFS2
- get_max_abs()#
Get the maximum absolute entry of the 2-SFS matrix.
- Return type:
float- Returns:
The maximum absolute entry.
- is_folded()#
Check if the 2-SFS is folded.
- Return type:
bool- Returns:
Whether the 2-SFS is folded.
- mask_diagonal(fill_value=nan)#
Mask both the primary and secondary diagonal entries of the 2-SFS matrix.
The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.
- Parameters:
fill_value – The value to fill the diagonal entries with.
- Return type:
- Returns:
A new SFS2 object with both diagonals masked.
- mask_upper(fill_value=nan)#
Mask the upper triangular entries of the 2-SFS matrix.
- Parameters:
fill_value – The value to fill the upper triangular entries with.
- Return type:
- Returns:
A new SFS2 object with upper triangular entries masked.
- plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)#
Plot as a heatmap.
- Parameters:
title (
str) – Title of the plot.ax (plt.Axes) – Axes to plot on.
max_abs (
float) – Maximum absolute value to plot.log_scale (
bool) – Use log scale.cbar_kws (
Dict) – Keyword arguments for color bar.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)#
Plot as a surface.
- Parameters:
title (
str)ax (plt.Axes) – Axes to plot on.
max_abs (
float) – Maximum absolute value to plot.vmin (
float) – Minimum value to plot.vmax (
float) – Maximum value to plot.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- symmetrize()#
Symmetric SFS so that
i, jandj, iare the same.- Return type:
- Returns:
Symmetric 2-SFS.
- to_file(file)#
Save to file (in JSON format).
- Parameters:
file – File path.
- to_json()#
Convert data to JSON string.
- Return type:
str- Returns:
JSON string
- class JointSFS(data: ndarray | list)[source]#
Bases:
IterableA joint (multi-population) site-frequency spectrum.
The data is a
P-dimensional array of shape(n_0 + 1, ..., n_{P-1} + 1)wherePis the number of populations and entry(k_0, ..., k_{P-1})corresponds to branches subtendingk_psamples from populationp. For two populations this is a 2-dimensional array (analogous to but generally rectangular, unlike the squareSFS2); for three populations it is a 3-dimensional array, and so on.- __init__(data: ndarray | list)[source]#
Construct from a data array.
- Parameters:
data (
ndarray|list) – AP-dimensional array.
-
data:
ndarray# The joint SFS array.
- property n_pops: int#
Number of populations (dimensions of the joint SFS).
- property shape: Tuple[int, ...]#
Shape of the joint SFS array.
- marginalize(pops: Sequence[int])[source]#
Marginalize the joint SFS onto a subset of populations by summing over the other populations. This is useful for example to obtain a 2-dimensional view of a higher-dimensional joint SFS.
- Parameters:
pops (
Sequence[int]) – The population indices to keep, in the desired axis order.- Return type:
- Returns:
A joint SFS over the specified populations.
- plot(pops: Tuple[int, int] = (0, 1), ax: plt.Axes = None, title: str = None, log_scale: bool = False, mask_monomorphic: bool = True, cbar_kws: Dict = None, show: bool = True)[source]#
Plot the joint SFS as a 2-dimensional heatmap. For more than two populations, the joint SFS is first marginalized onto the two requested populations (summing over the others).
- Parameters:
pops (
Tuple[int,int]) – The two population indices to plot (y-axis, x-axis).ax (plt.Axes) – Axes to plot on.
title (
str) – Title of the plot.log_scale (
bool) – Whether to use a logarithmic color scale.mask_monomorphic (
bool) – Whether to mask the monomorphic corners (all-zero and all-derived).cbar_kws (
Dict) – Keyword arguments for the color bar.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- plot_surface(pops: Tuple[int, int] = (0, 1), ax: plt.Axes = None, title: str = None, log_scale: bool = False, mask_monomorphic: bool = True, cmap: str = 'viridis', show: bool = True)[source]#
Plot the joint SFS as a 3-dimensional surface. For more than two populations, the joint SFS is first marginalized onto the two requested populations (summing over the others).
- Parameters:
pops (
Tuple[int,int]) – The two population indices to plot (y-axis, x-axis).ax (plt.Axes) – Axes to plot on.
title (
str) – Title of the plot.log_scale (
bool) – Whether to use a logarithmic color scale.mask_monomorphic (
bool) – Whether to mask the monomorphic corners (all-zero and all-derived).cmap (
str) – The colormap.show (
bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.