Spectrum
Classes for working with the site-frequency spectrum (SFS) and 2-SFS.
- class SFS(data: Sequence[float])
Bases: Spectrum
A site-frequency spectrum.
- property Theta: float
Calculate the population mutation rate using Watterson’s estimator.
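Watterson’s estimator divides the number of segregating sites S by the harmonic number a_n = 1 + 1/2 + ... + 1/(n - 1). The minimal sketch below computes it from a plain list of SFS entries. It assumes, hypothetically, that the list has n + 1 entries whose first and last entries hold the monomorphic and divergence counts, so only entries 1 to n - 1 are segregating sites; `watterson_theta` is an illustrative helper, not part of this class.

```python
from typing import Sequence


def watterson_theta(sfs: Sequence[float]) -> float:
    """Watterson's estimator from an SFS with n + 1 entries.

    Assumed (hypothetical) layout: sfs[0] and sfs[n] are monomorphic and
    divergence counts; sfs[1:n] are the polymorphic counts.
    """
    n = len(sfs) - 1  # sample size
    if n < 2:
        raise ValueError("need a sample size of at least 2")
    segregating = sum(sfs[1:n])  # S: number of polymorphic sites
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number a_n
    return segregating / a_n


# S = 6 + 3 + 2 = 11 and a_4 = 11/6, so theta is 6 (up to rounding)
print(watterson_theta([100, 6, 3, 2, 5]))
```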
- __init__(data: Sequence[float])
Initialize spectrum.
- Parameters:
data (Sequence[float]) – SFS counts
- copy()
Copy the spectrum.
- Return type:
Spectrum
- Returns:
Copy of the spectrum
- fold()
Fold the site-frequency spectrum.
- Return type:
Spectrum
- Returns:
Folded spectrum
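Folding collapses derived-allele counts i and n - i into a single minor-allele class, which is what is needed when the ancestral state is unknown. A minimal sketch on a plain list, assuming (hypothetically) n + 1 entries indexed by the number of derived alleles, with folded counts stored in the lower half and the upper half set to zero. `fold_sfs` and `is_folded_sfs` are illustrative names, and the "upper half is zero" test is a simple heuristic rather than this class's documented rule.

```python
from typing import List, Sequence


def fold_sfs(sfs: Sequence[float]) -> List[float]:
    """Fold an SFS with n + 1 entries onto minor-allele counts."""
    n = len(sfs) - 1
    folded = [0.0] * (n + 1)
    for i, count in enumerate(sfs):
        # Map derived-allele count i to its minor-allele count.
        folded[min(i, n - i)] += count
    return folded


def is_folded_sfs(sfs: Sequence[float]) -> bool:
    """Treat an SFS as folded if every entry above n / 2 is zero."""
    n = len(sfs) - 1
    return all(count == 0 for count in sfs[n // 2 + 1:])


folded = fold_sfs([100, 6, 3, 2, 5])
print(folded)                 # [105.0, 8.0, 3.0, 0.0, 0.0]
print(is_folded_sfs(folded))  # True
```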
- static from_file(file: str)
Load object from file.
- Parameters:
file (str) – File name
- Return type:
Spectrum
- Returns:
Spectrum object
- static from_list(data: Sequence)
Create Spectrum from list.
- Parameters:
data (Sequence) – SFS counts
- Return type:
Spectrum
- Returns:
Spectrum
- static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)
Create a Spectrum from a polyDFE specification, which treats the number of mutational target sites and the divergence counts separately.
- Parameters:
polymorphic (Sequence) – Polymorphic counts
n_sites (float) – Total number of sites
n_div (float) – Number of divergence counts
- Return type:
Spectrum
- Returns:
Spectrum
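polyDFE keeps the polymorphic counts, the total number of sites and the divergence count as separate numbers. One way to assemble them into a single spectrum of n + 1 entries is sketched below. It assumes (this page does not confirm it) that the first entry holds the remaining monomorphic ancestral sites and the last entry holds the divergence counts, so that all entries sum to n_sites; `spectrum_from_polydfe` is a hypothetical helper.

```python
from typing import List, Sequence


def spectrum_from_polydfe(polymorphic: Sequence[float], n_sites: float, n_div: float) -> List[float]:
    """Build an n + 1 entry SFS from polyDFE-style inputs (assumed layout)."""
    n_poly = sum(polymorphic)
    n_ancestral = n_sites - n_poly - n_div  # sites neither polymorphic nor diverged
    if n_ancestral < 0:
        raise ValueError("n_sites must be at least the polymorphic plus divergence counts")
    return [n_ancestral, *polymorphic, n_div]


sfs = spectrum_from_polydfe([6, 3, 2], n_sites=116, n_div=5)
print(sfs)       # [100, 6, 3, 2, 5]
print(sum(sfs))  # 116
```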
- static from_polymorphic(data: Sequence)
Create Spectrum from polymorphic counts only.
- Parameters:
data (Sequence) – Polymorphic counts
- Return type:
Spectrum
- Returns:
Spectrum
- property has_div: bool
Whether n_div was specified.
- Returns:
Whether n_div was specified
- is_folded()
Check if the site-frequency spectrum is folded.
- Return type:
bool
- Returns:
True if folded, False otherwise
- misidentify(epsilon: float)
Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.
- Parameters:
epsilon (float) – Misidentification rate (0 <= epsilon <= 1)
- Return type:
Spectrum
- Returns:
Spectrum with misidentification applied
- Raises:
ValueError – If epsilon is not between 0 and 1
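A common model of ancestral misidentification assumes that, with probability epsilon, a site carrying i derived alleles is recorded as carrying n - i. The sketch below applies that model to the polymorphic entries only and leaves the monomorphic entries untouched, as the description above states. Whether the class uses exactly this formula is an assumption, and `misidentify_sfs` is an illustrative name.

```python
from typing import List, Sequence


def misidentify_sfs(sfs: Sequence[float], epsilon: float) -> List[float]:
    """Mix each polymorphic count i with its mirror n - i at rate epsilon."""
    if not 0 <= epsilon <= 1:
        raise ValueError("epsilon must be between 0 and 1")
    n = len(sfs) - 1
    out = list(sfs)  # entries 0 and n (monomorphic) are copied unchanged
    for i in range(1, n):
        out[i] = (1 - epsilon) * sfs[i] + epsilon * sfs[n - i]
    return out


print(misidentify_sfs([100, 6, 3, 2, 5], epsilon=0.5))  # [100, 4.0, 3.0, 4.0, 5]
```

With epsilon = 1 the polymorphic entries are simply reversed, and the total number of polymorphic sites is preserved for any epsilon.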
- property n: int
The sample size.
- Returns:
Sample size
- property n_div: float
Number of divergence counts.
- Returns:
Number of divergence counts
- property n_monomorphic: float
Number of monomorphic sites.
- Returns:
Number of monomorphic sites
- property n_polymorphic: ndarray
Get the polymorphic counts.
- Returns:
Polymorphic counts
- property n_sites: float
The total number of sites.
- Returns:
Total number of sites
- normalize()
Normalize the SFS so that all non-monomorphic counts sum to 1.
- Return type:
Spectrum
- Returns:
Normalized spectrum
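Normalizing divides every polymorphic entry by the polymorphic total. The sketch below assumes the same hypothetical layout of n + 1 entries and returns only the normalized polymorphic entries, so it does not assume anything about how the class treats the monomorphic entries; `normalized_polymorphic` is an illustrative name.

```python
from typing import List, Sequence


def normalized_polymorphic(sfs: Sequence[float]) -> List[float]:
    """Return the polymorphic entries of an n + 1 entry SFS scaled to sum to 1."""
    polymorphic = list(sfs[1:-1])  # drop entries 0 and n (monomorphic)
    total = sum(polymorphic)
    if total == 0:
        raise ValueError("cannot normalize an SFS without polymorphic counts")
    return [count / total for count in polymorphic]


print(normalized_polymorphic([100, 5, 3, 2, 5]))  # [0.5, 0.3, 0.2]
```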
- plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)
Plot the spectrum.
- Parameters:
show (bool) – Whether to show the plot.
file (str) – File to save the plot to.
title (str) – Title of the plot.
log_scale (bool) – Whether to use a log scale on the y-axis.
show_monomorphic (bool) – Whether to show monomorphic counts.
kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for the Python visualization backend.
ax (plt.Axes) – Axes to plot on. Only for the Python visualization backend.
- Return type:
plt.Axes
- Returns:
Axes
- property polymorphic: ndarray
Get the polymorphic counts.
- Returns:
Polymorphic counts
- resample(seed: int = None)
Resample the SFS assuming independent Poisson counts.
- Parameters:
seed (int) – Seed for the random number generator.
- Return type:
Spectrum
- Returns:
Resampled spectrum.
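Parametric resampling under independent Poisson counts draws each new entry from a Poisson distribution whose mean is the observed entry. A minimal sketch using NumPy’s `Generator.poisson`; the plain-list input and the name `resample_sfs` are assumptions, not this class’s implementation.

```python
from typing import Optional, Sequence

import numpy as np


def resample_sfs(sfs: Sequence[float], seed: Optional[int] = None) -> np.ndarray:
    """Draw one Poisson replicate of an SFS, entry by entry."""
    rng = np.random.default_rng(seed)
    means = np.asarray(sfs, dtype=float)
    # Each entry is an independent Poisson variable with mean equal to the observed count.
    return rng.poisson(means).astype(float)


replicate = resample_sfs([100, 6, 3, 2, 5], seed=42)
print(replicate)
```

Passing the same seed reproduces the same replicate, which is useful when bootstrapping confidence intervals.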
- static standard_kingman(n: int, n_monomorphic: int = 0)
Get the standard Kingman SFS.
- Parameters:
n (int) – Sample size
n_monomorphic (int) – Number of monomorphic sites.
- Return type:
Spectrum
- Returns:
Standard Kingman SFS
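Under the standard Kingman coalescent (neutrality, constant population size), the expected number of sites with i derived alleles in a sample of size n is theta / i for i = 1, ..., n - 1. The sketch below builds that shape with theta = 1, putting n_monomorphic in the first entry and zero in the last. The layout and the scaling the class actually uses are assumptions, and `kingman_sfs` is a hypothetical name.

```python
from typing import List


def kingman_sfs(n: int, n_monomorphic: float = 0) -> List[float]:
    """Expected neutral SFS (theta = 1) under the standard Kingman coalescent."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    polymorphic = [1.0 / i for i in range(1, n)]  # E[xi_i] = theta / i with theta = 1
    return [float(n_monomorphic), *polymorphic, 0.0]


print(kingman_sfs(4))  # [0.0, 1.0, 0.5, 0.3333333333333333, 0.0]
```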
- subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | None = None)
Subsample the spectrum to a given sample size.
Warning
If using the ‘random’ mode, the SFS counts are cast to integers before subsampling, so this only gives sensible results if the SFS counts are integers or large enough to be well approximated by integers. The ‘probabilistic’ mode does not have this limitation.
- Parameters:
n (int) – Sample size
mode (Literal['random', 'probabilistic']) – Subsampling mode, either ‘random’ or ‘probabilistic’.
seed (int | None) – Seed for the random number generator. Only used in ‘random’ mode.
- Return type:
Spectrum
- Returns:
Subsampled spectrum
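Probabilistic subsampling (often called projection) spreads each count over the smaller sample using the hypergeometric probability that a site with i derived alleles among N sampled genomes shows j derived alleles in a random subsample of size n. The sketch below implements that projection with `math.comb`; that it matches the ‘probabilistic’ mode exactly is an assumption, and `project_sfs` is a hypothetical name.

```python
from math import comb
from typing import List, Sequence


def project_sfs(sfs: Sequence[float], n: int) -> List[float]:
    """Project an SFS from sample size N = len(sfs) - 1 down to n (hypergeometric)."""
    big_n = len(sfs) - 1
    if not 1 <= n <= big_n:
        raise ValueError("target sample size must be between 1 and the current sample size")
    total = comb(big_n, n)  # number of ways to draw the subsample
    projected = [0.0] * (n + 1)
    for i, count in enumerate(sfs):
        for j in range(max(0, n - (big_n - i)), min(i, n) + 1):
            # P(j derived in subsample | i derived in full sample)
            prob = comb(i, j) * comb(big_n - i, n - j) / total
            projected[j] += count * prob
    return projected


print(project_sfs([0, 4, 0, 0, 0], n=2))  # [2.0, 2.0, 0.0]
```

Because the probabilities for each i sum to 1, the projection preserves the total count and works with non-integer counts, unlike casting to integers and drawing random subsamples.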
- property theta: float
Calculate the site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (n_sites).
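Written out, with S the number of segregating (polymorphic) sites, n the sample size and L the total number of sites (n_sites), the per-site estimator described above is:

```latex
\hat{\theta}_W = \frac{S}{a_n}, \qquad
a_n = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
\hat{\theta} = \frac{\hat{\theta}_W}{L}
```

For example, with S = 11, n = 4 and L = 116, a_4 = 11/6, so the total estimate is 6 and the per-site estimate is 6/116, roughly 0.0517.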
- to_file(file: str)
Save object to file.
- Parameters:
file (str) – File name
- to_list()
Convert to list.
- Return type:
list
- Returns:
SFS counts
- to_numpy()
Convert to array.
- Return type:
ndarray
- Returns:
SFS counts
- to_spectra()
Convert to Spectra object.
- Return type:
Spectra
- Returns:
Spectra object
- class SFS2(data: ndarray | list)
Bases: Iterable
A 2-dimensional site-frequency spectrum.
- __init__(data: ndarray | list)
Construct from a data matrix.
- Parameters:
data (ndarray | list) – 2-SFS data matrix.
- static from_file(file: str)
Load from file.
- Parameters:
file (str) – File path.
- Return type:
SFS2
- Returns:
SFS2
- static from_json(json: str)
Load from a JSON string.
- Parameters:
json (str) – JSON string.
- Return type:
SFS2
- Returns:
SFS2
- is_folded()
Check if the 2-SFS is folded.
- Return type:
bool
- Returns:
Whether the 2-SFS is folded.
- fold()
Fold the 2-SFS by adding up entries i and n - i along both axes. Note that this only makes sense for counts or frequencies.
- Return type:
SFS2
- Returns:
Folded 2-SFS.
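For a 2-SFS indexed by derived-allele counts (i, j), folding along both axes adds entry (i, j) into (min(i, n - i), min(j, n - j)). A pure-Python sketch on a square list-of-lists matrix; the square shape with a common n on both axes and the name `fold_2sfs` are assumptions.

```python
from typing import List, Sequence


def fold_2sfs(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fold a square (n + 1) x (n + 1) 2-SFS along both axes."""
    n = len(matrix) - 1
    folded = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Collapse i with n - i and j with n - j.
            folded[min(i, n - i)][min(j, n - j)] += value
    return folded


m = [
    [0, 1, 0],
    [0, 0, 2],
    [3, 0, 0],
]
print(fold_2sfs(m))  # [[3.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```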
- symmetrize()
Symmetrize the 2-SFS so that entries (i, j) and (j, i) are the same.
- Return type:
SFS2
- Returns:
Symmetric 2-SFS.
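One way to make entries (i, j) and (j, i) equal is to replace the matrix by the average of itself and its transpose, which also keeps the grand total unchanged. The description above does not say whether the class averages or uses another rule, so treat this as an assumption; `symmetrize_2sfs` is a hypothetical name.

```python
from typing import List, Sequence


def symmetrize_2sfs(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return (M + M^T) / 2 for a square matrix M."""
    size = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(size)] for i in range(size)]


print(symmetrize_2sfs([[1, 4], [2, 3]]))  # [[1.0, 3.0], [3.0, 3.0]]
```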
- fill_monomorphic(fill_value=nan)
Remove the diagonal entries of the 2-SFS by filling them with fill_value.
- Parameters:
fill_value – Value to fill the diagonal entries with.
- Return type:
SFS2
- Returns:
2-SFS
- plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)
Plot as a heatmap.
- Parameters:
title (str) – Title of the plot.
ax (plt.Axes) – Axes to plot on.
max_abs (float) – Maximum absolute value to plot.
log_scale (bool) – Whether to use a log scale.
cbar_kws (Dict) – Keyword arguments for the color bar.
show (bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)
Plot as a surface.
- Parameters:
title (str) – Title of the plot.
ax (plt.Axes) – Axes to plot on.
max_abs (float) – Maximum absolute value to plot.
vmin (float) – Minimum value to plot.
vmax (float) – Maximum value to plot.
show (bool) – Whether to show the plot.
- Return type:
plt.Axes
- Returns:
Axes.
- mask_diagonal(fill_value=nan)
Mask both the primary and secondary diagonal entries of the 2-SFS matrix.
The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.
- Parameters:
fill_value – The value to fill the diagonal entries with.
- Return type:
SFS2
- Returns:
A new SFS2 object with both diagonals masked.
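In an (n + 1) x (n + 1) matrix the primary diagonal holds entries (i, i) and the secondary diagonal holds entries (i, n - i). The sketch below returns a copy with both diagonals replaced by fill_value (NaN by default), following the description above; the list-of-lists representation and the name `mask_diagonals` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def mask_diagonals(matrix: Sequence[Sequence[float]], fill_value: float = math.nan) -> List[List[float]]:
    """Copy a square matrix and fill its primary and secondary diagonals."""
    masked = [list(row) for row in matrix]  # copy so the input stays unchanged
    n = len(masked) - 1
    for i in range(n + 1):
        masked[i][i] = fill_value      # primary: top-left to bottom-right
        masked[i][n - i] = fill_value  # secondary: top-right to bottom-left
    return masked


print(mask_diagonals([[1, 2, 3], [4, 5, 6], [7, 8, 9]], fill_value=0))
# [[0, 2, 0], [4, 0, 6], [0, 8, 0]]
```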