Spectrum#

Classes for working with the site-frequency spectrum (SFS) and 2-SFS.

class SFS(data: Sequence[float])[source]#

Bases: Spectrum

A site-frequency spectrum.

property Theta: float#

Calculate genome-wide population mutation rate using Watterson’s estimator.

Note

Property Theta is not normalized by the total number of sites, unlike theta.

__init__(data: Sequence[float])#

Initialize spectrum.

Parameters:

data (Sequence[float]) – SFS counts

copy()#

Copy the spectrum.

Return type:

Spectrum

Returns:

Copy of the spectrum

fold()#

Fold the site-frequency spectrum.

Return type:

Spectrum

Returns:

Folded spectrum

static from_file(file: str)#

Load object from file.

Parameters:

file (str) – File name

Return type:

Spectrum

Returns:

Spectrum object

static from_list(data: Sequence)#

Create Spectrum from list.

Parameters:

data (Sequence) – SFS counts

Return type:

Spectrum

Returns:

Spectrum

static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)#

Create Spectra from polyDFE specification which treats the number of mutational target sites and the divergence counts separately.

Parameters:
  • polymorphic (Sequence) – Polymorphic counts

  • n_sites (float) – Total number of sites

  • n_div (float) – Number of divergence counts

Return type:

Spectrum

Returns:

Spectrum

static from_polymorphic(data: Sequence)#

Create Spectrum from polymorphic counts only.

Parameters:

data (Sequence) – Polymorphic counts

Return type:

Spectrum

Returns:

Spectrum

static get_neutral(theta: float, n_sites: float, n: int, r: Sequence[float] = None)#

Obtain a standard neutral SFS for a given theta and number of sites.

Parameters:
  • theta (float) – Population mutation rate

  • n_sites (float) – Number of total sites

  • n (int) – Number of frequency classes

  • r (Sequence[float]) – Nuisance parameters that account for demography. An array of length n-1 whose elements are multiplied element-wise with the polymorphic counts of the Kingman SFS. By default, no demography effects are considered which is equivalent to r = [1] * (n-1). Note that non-default values of r will also affect estimates of the population mutation rate.

Return type:

Spectrum

Returns:

Neutral SFS

property has_div: bool#

Whether n_div was specified.

Returns:

Whether n_div was specified

is_folded()#

Check if the site-frequency spectrum is folded.

Return type:

bool

Returns:

True if folded, False otherwise

misidentify(epsilon: float)#

Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.

Parameters:

epsilon (float) – Misidentification rate (0 <= epsilon <= 1)

Return type:

Spectrum

Returns:

Spectrum with misidentification applied

Raises:

ValueError – If epsilon is not between 0 and 1

property n: int#

The sample size.

Returns:

Sample size

property n_div: float#

Number of divergence counts.

Returns:

Number of divergence counts

property n_monomorphic: float#

Number of monomorphic sites.

Returns:

Number of monomorphic sites

property n_polymorphic: ndarray#

Get the polymorphic counts.

Returns:

Polymorphic counts

property n_sites: float#

The total number of sites.

Returns:

Total number of sites

normalize()#

Normalize SFS so that all non-monomorphic counts add up to 1.

Return type:

Spectrum

Returns:

Normalized spectrum

plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)#

Plot spectrum.

Parameters:
  • show (bool) – Whether to show plot.

  • file (str) – File to save plot to.

  • title (str) – Title of plot.

  • log_scale (bool) – Whether to use log scale on y-axis.

  • show_monomorphic (bool) – Whether to show monomorphic counts.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

property polymorphic: ndarray#

Get the polymorphic counts.

Returns:

Polymorphic counts

resample(seed: int | Generator = None)#

Resample SFS assuming independent Poisson counts.

Parameters:

seed (int | Generator) – Random state or seed

Return type:

Spectrum

Returns:

Resampled spectrum.

scale_theta(theta: float)#

Scale the spectrum to a different theta value by

Parameters:

theta (float) – New theta value

Return type:

Spectrum

Returns:

Scaled spectrum

static standard_kingman(n: int, n_monomorphic: int = 0)#

Get standard Kingman SFS.

Parameters:
  • n (int) – sample size

  • n_monomorphic (int) – Number of monomorphic sites.

Return type:

Spectrum

Returns:

Standard Kingman SFS

subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | Generator = None)#

Subsample spectrum to a given sample size.

Warning

If using the ‘random’ mode, The SFS counts are cast to integers before subsampling so this will only provide sensible results if the SFS counts are integers or if they are large enough to be approximated well by integers. The ‘probabilistic’ mode does not have this limitation.

Parameters:
  • n (int) – Sample size

  • mode (Literal['random', 'probabilistic']) – Subsampling mode. Either ‘random’ or ‘probabilistic’.

  • seed (int | Generator) – Random state or seed. Only for ‘random’ mode.

Return type:

Spectrum

Returns:

Subsampled spectrum

property theta: float#

Calculate site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (n_sites).

to_file(file: str)#

Save object to file.

Parameters:

file (str) – File name

to_list()#

Convert to list.

Return type:

list

Returns:

SFS counts

to_numpy()#

Convert to array.

Return type:

ndarray

Returns:

SFS counts

to_spectra()#

Convert to Spectra object.

Return type:

Spectra

Returns:

Spectra object

class SFS2(data: ndarray | list)[source]#

Bases: Iterable

A 2-dimensional site-frequency spectrum.

__init__(data: ndarray | list)[source]#

Construct from data matrix.

Parameters:

data (ndarray | list)

to_file(file)[source]#

Save to file (in JSON format).

Parameters:

file – File path.

to_json()[source]#

Convert data to JSON string.

Return type:

str

Returns:

JSON string

static from_file(file: str)[source]#

Load from file.

Parameters:

file (str) – File path.

Return type:

SFS2

Returns:

SFS2

static from_json(json: str)[source]#

Load from JSON string.

Parameters:

json (str) – JSON string.

Return type:

SFS2

Returns:

SFS2

is_folded()[source]#

Check if the 2-SFS is folded.

Return type:

bool

Returns:

Whether the 2-SFS is folded.

fold()[source]#

Fold 2-SFS by adding up i and n - i for both axes. Node that this only make sense for counts or frequencies.

Return type:

SFS2

Returns:

Folded 2-SFS.

copy()[source]#

Create deep copy.

Return type:

SFS2

Returns:

Deep copy.

symmetrize()[source]#

Symmetric SFS so that i, j and j, i are the same.

Return type:

SFS2

Returns:

Symmetric 2-SFS.

fill_monomorphic(fill_value=nan)[source]#

Remote the diagonal entries of the given array.

Parameters:

fill_value – Value to fill diagonal entries with.

Return type:

SFS2

Returns:

2-SFS

plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)[source]#

Plot as a heatmap.

Parameters:
  • title (str) – Title of the plot.

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • log_scale (bool) – Use log scale.

  • cbar_kws (Dict) – Keyword arguments for color bar.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)[source]#

Plot as a surface.

Parameters:
  • title (str)

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • vmin (float) – Minimum value to plot.

  • vmax (float) – Maximum value to plot.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

mask_diagonal(fill_value=nan)[source]#

Mask both the primary and secondary diagonal entries of the 2-SFS matrix.

The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.

Parameters:

fill_value – The value to fill the diagonal entries with.

Return type:

SFS2

Returns:

A new SFS2 object with both diagonals masked.

get_max_abs()[source]#

Get the maximum absolute entry of the 2-SFS matrix.

Return type:

float

Returns:

The maximum absolute entry.

mask_upper(fill_value=nan)[source]#

Mask the upper triangular entries of the 2-SFS matrix.

Parameters:

fill_value – The value to fill the upper triangular entries with.

Return type:

SFS2

Returns:

A new SFS2 object with upper triangular entries masked.

class TwoLocusSFS(data: ndarray | list)[source]#

Bases: SFS2

The two-locus site-frequency spectrum under recombination: a square matrix whose entry (i, j) is the expected product of the branch length subtending i samples at locus 0 and j samples at locus 1, for two loci separated by recombination rate r. It interpolates between the within-tree cross-moment of the SFS at r = 0 (fully linked, equal to Coalescent.sfs.cov plus the outer product of the marginal means) and the outer product of the marginal SFS as r (independent loci). It shares the container machinery of SFS2 (plotting, folding, arithmetic, serialization).

__init__(data: ndarray | list)#

Construct from data matrix.

Parameters:

data (ndarray | list)

copy()#

Create deep copy.

Return type:

SFS2

Returns:

Deep copy.

fill_monomorphic(fill_value=nan)#

Remote the diagonal entries of the given array.

Parameters:

fill_value – Value to fill diagonal entries with.

Return type:

SFS2

Returns:

2-SFS

fold()#

Fold 2-SFS by adding up i and n - i for both axes. Node that this only make sense for counts or frequencies.

Return type:

SFS2

Returns:

Folded 2-SFS.

static from_file(file: str)#

Load from file.

Parameters:

file (str) – File path.

Return type:

SFS2

Returns:

SFS2

static from_json(json: str)#

Load from JSON string.

Parameters:

json (str) – JSON string.

Return type:

SFS2

Returns:

SFS2

get_max_abs()#

Get the maximum absolute entry of the 2-SFS matrix.

Return type:

float

Returns:

The maximum absolute entry.

is_folded()#

Check if the 2-SFS is folded.

Return type:

bool

Returns:

Whether the 2-SFS is folded.

mask_diagonal(fill_value=nan)#

Mask both the primary and secondary diagonal entries of the 2-SFS matrix.

The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.

Parameters:

fill_value – The value to fill the diagonal entries with.

Return type:

SFS2

Returns:

A new SFS2 object with both diagonals masked.

mask_upper(fill_value=nan)#

Mask the upper triangular entries of the 2-SFS matrix.

Parameters:

fill_value – The value to fill the upper triangular entries with.

Return type:

SFS2

Returns:

A new SFS2 object with upper triangular entries masked.

plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)#

Plot as a heatmap.

Parameters:
  • title (str) – Title of the plot.

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • log_scale (bool) – Use log scale.

  • cbar_kws (Dict) – Keyword arguments for color bar.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)#

Plot as a surface.

Parameters:
  • title (str)

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • vmin (float) – Minimum value to plot.

  • vmax (float) – Maximum value to plot.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

symmetrize()#

Symmetric SFS so that i, j and j, i are the same.

Return type:

SFS2

Returns:

Symmetric 2-SFS.

to_file(file)#

Save to file (in JSON format).

Parameters:

file – File path.

to_json()#

Convert data to JSON string.

Return type:

str

Returns:

JSON string

class JointSFS(data: ndarray | list)[source]#

Bases: Iterable

A joint (multi-population) site-frequency spectrum.

The data is a P-dimensional array of shape (n_0 + 1, ..., n_{P-1} + 1) where P is the number of populations and entry (k_0, ..., k_{P-1}) corresponds to branches subtending k_p samples from population p. For two populations this is a 2-dimensional array (analogous to but generally rectangular, unlike the square SFS2); for three populations it is a 3-dimensional array, and so on.

__init__(data: ndarray | list)[source]#

Construct from a data array.

Parameters:

data (ndarray | list) – A P-dimensional array.

data: ndarray#

The joint SFS array.

property n_pops: int#

Number of populations (dimensions of the joint SFS).

property shape: Tuple[int, ...]#

Shape of the joint SFS array.

copy()[source]#

Create a deep copy.

Return type:

JointSFS

Returns:

Deep copy.

marginalize(pops: Sequence[int])[source]#

Marginalize the joint SFS onto a subset of populations by summing over the other populations. This is useful for example to obtain a 2-dimensional view of a higher-dimensional joint SFS.

Parameters:

pops (Sequence[int]) – The population indices to keep, in the desired axis order.

Return type:

JointSFS

Returns:

A joint SFS over the specified populations.

plot(pops: Tuple[int, int] = (0, 1), ax: plt.Axes = None, title: str = None, log_scale: bool = False, mask_monomorphic: bool = True, cbar_kws: Dict = None, show: bool = True)[source]#

Plot the joint SFS as a 2-dimensional heatmap. For more than two populations, the joint SFS is first marginalized onto the two requested populations (summing over the others).

Parameters:
  • pops (Tuple[int, int]) – The two population indices to plot (y-axis, x-axis).

  • ax (plt.Axes) – Axes to plot on.

  • title (str) – Title of the plot.

  • log_scale (bool) – Whether to use a logarithmic color scale.

  • mask_monomorphic (bool) – Whether to mask the monomorphic corners (all-zero and all-derived).

  • cbar_kws (Dict) – Keyword arguments for the color bar.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_surface(pops: Tuple[int, int] = (0, 1), ax: plt.Axes = None, title: str = None, log_scale: bool = False, mask_monomorphic: bool = True, cmap: str = 'viridis', show: bool = True)[source]#

Plot the joint SFS as a 3-dimensional surface. For more than two populations, the joint SFS is first marginalized onto the two requested populations (summing over the others).

Parameters:
  • pops (Tuple[int, int]) – The two population indices to plot (y-axis, x-axis).

  • ax (plt.Axes) – Axes to plot on.

  • title (str) – Title of the plot.

  • log_scale (bool) – Whether to use a logarithmic color scale.

  • mask_monomorphic (bool) – Whether to mask the monomorphic corners (all-zero and all-derived).

  • cmap (str) – The colormap.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

to_file(file: str)[source]#

Save to file (in JSON format).

Parameters:

file (str) – File path.

to_json()[source]#

Convert to a JSON string.

Return type:

str

Returns:

JSON string.

static from_file(file: str)[source]#

Load from file.

Parameters:

file (str) – File path.

Return type:

JointSFS

Returns:

JointSFS

static from_json(json: str)[source]#

Load from a JSON string.

Parameters:

json (str) – JSON string.

Return type:

JointSFS

Returns:

JointSFS