Spectrum#

Classes for working with the site-frequency spectrum (SFS) and 2-SFS.

class SFS(data: Sequence[float])[source]#

Bases: Spectrum

A site-frequency spectrum.

property Theta: float#

Calculate population mutation rate using Watterson’s estimator.

Note

Property Theta is not normalized by the total number of sites, unlike theta.

__init__(data: Sequence[float])#

Initialize spectrum.

Parameters:

data (Sequence[float]) – SFS counts

copy()#

Copy the spectrum.

Return type:

Spectrum

Returns:

Copy of the spectrum

fold()#

Fold the site-frequency spectrum.

Return type:

Spectrum

Returns:

Folded spectrum

static from_file(file: str)#

Load object from file.

Parameters:

file (str) – File name

Return type:

Spectrum

Returns:

Spectrum object

static from_list(data: Sequence)#

Create Spectrum from list.

Parameters:

data (Sequence) – SFS counts

Return type:

Spectrum

Returns:

Spectrum

static from_polydfe(polymorphic: Sequence, n_sites: float, n_div: float)#

Create Spectra from polyDFE specification which treats the number of mutational target sites and the divergence counts separately.

Parameters:
  • polymorphic (Sequence) – Polymorphic counts

  • n_sites (float) – Total number of sites

  • n_div (float) – Number of divergence counts

Return type:

Spectrum

Returns:

Spectrum

static from_polymorphic(data: Sequence)#

Create Spectrum from polymorphic counts only.

Parameters:

data (Sequence) – Polymorphic counts

Return type:

Spectrum

Returns:

Spectrum

property has_div: bool#

Whether n_div was specified.

Returns:

Whether n_div was specified

is_folded()#

Check if the site-frequency spectrum is folded.

Return type:

bool

Returns:

True if folded, False otherwise

misidentify(epsilon: float)#

Introduce ancestral misidentification at rate epsilon. Note that monomorphic counts won’t be affected.

Parameters:

epsilon (float) – Misidentification rate (0 <= epsilon <= 1)

Return type:

Spectrum

Returns:

Spectrum with misidentification applied

Raises:

ValueError – If epsilon is not between 0 and 1

property n: int#

The sample size.

Returns:

Sample size

property n_div: float#

Number of divergence counts.

Returns:

Number of divergence counts

property n_monomorphic: float#

Number of monomorphic sites.

Returns:

Number of monomorphic sites

property n_polymorphic: ndarray#

Get the polymorphic counts.

Returns:

Polymorphic counts

property n_sites: float#

The total number of sites.

Returns:

Total number of sites

normalize()#

Normalize SFS so that all non-monomorphic counts add up to 1.

Return type:

Spectrum

Returns:

Normalized spectrum

plot(show: bool = True, file: str = None, title: str = None, log_scale: bool = False, show_monomorphic: bool = False, kwargs_legend: dict = {'prop': {'size': 8}}, ax: plt.Axes = None)#

Plot spectrum.

Parameters:
  • show (bool) – Whether to show plot.

  • file (str) – File to save plot to.

  • title (str) – Title of plot.

  • log_scale (bool) – Whether to use log scale on y-axis.

  • show_monomorphic (bool) – Whether to show monomorphic counts.

  • kwargs_legend (dict) – Keyword arguments passed to plt.legend(). Only for Python visualization backend.

  • ax (plt.Axes) – Axes to plot on. Only for Python visualization backend.

Return type:

plt.Axes

Returns:

Axes

property polymorphic: ndarray#

Get the polymorphic counts.

Returns:

Polymorphic counts

resample(seed: int = None)#

Resample SFS assuming independent Poisson counts.

Parameters:

seed (int) – Seed for random number generator.

Return type:

Spectrum

Returns:

Resampled spectrum.

static standard_kingman(n: int, n_monomorphic: int = 0)#

Get standard Kingman SFS.

Parameters:
  • n (int) – sample size

  • n_monomorphic (int) – Number of monomorphic sites.

Return type:

Spectrum

Returns:

Standard Kingman SFS

subsample(n: int, mode: Literal['random', 'probabilistic'] = 'probabilistic', seed: int | None = None)#

Subsample spectrum to a given sample size.

Warning

If using the ‘random’ mode, The SFS counts are cast to integers before subsampling so this will only provide sensible results if the SFS counts are integers or if they are large enough to be approximated well by integers. The ‘probabilistic’ mode does not have this limitation.

Parameters:
  • n (int) – Sample size

  • mode (Literal['random', 'probabilistic']) – Subsampling mode. Either ‘random’ or ‘probabilistic’.

  • seed (int | None) – Seed for random number generator. Only for ‘random’ mode.

Return type:

Spectrum

Returns:

Subsampled spectrum

property theta: float#

Calculate site-wise population mutation rate using Watterson’s estimator. Note that theta is given per site, i.e. Watterson’s estimator is divided by the total number of sites (n_sites).

to_file(file: str)#

Save object to file.

Parameters:

file (str) – File name

to_list()#

Convert to list.

Return type:

list

Returns:

SFS counts

to_numpy()#

Convert to array.

Return type:

ndarray

Returns:

SFS counts

to_spectra()#

Convert to Spectra object.

Return type:

Spectra

Returns:

Spectra object

class SFS2(data: ndarray | list)[source]#

Bases: Iterable

A 2-dimensional site-frequency spectrum.

__init__(data: ndarray | list)[source]#

Construct from data matrix.

Parameters:

data (ndarray | list)

to_file(file)[source]#

Save to file (in JSON format).

Parameters:

file – File path.

to_json()[source]#

Convert data to JSON string.

Return type:

str

Returns:

JSON string

static from_file(file: str)[source]#

Load from file.

Parameters:

file (str) – File path.

Return type:

SFS2

Returns:

SFS2

static from_json(json: str)[source]#

Load from JSON string.

Parameters:

json (str) – JSON string.

Return type:

SFS2

Returns:

SFS2

is_folded()[source]#

Check if the 2-SFS is folded.

Return type:

bool

Returns:

Whether the 2-SFS is folded.

fold()[source]#

Fold 2-SFS by adding up i and n - i for both axes. Node that this only make sense for counts or frequencies.

Return type:

SFS2

Returns:

Folded 2-SFS.

copy()[source]#

Create deep copy.

Return type:

SFS2

Returns:

Deep copy.

symmetrize()[source]#

Symmetric SFS so that i, j and j, i are the same.

Return type:

SFS2

Returns:

Symmetric 2-SFS.

fill_monomorphic(fill_value=nan)[source]#

Remote the diagonal entries of the given array.

Parameters:

fill_value – Value to fill diagonal entries with.

Return type:

SFS2

Returns:

2-SFS

plot(ax: plt.Axes = None, title: str = None, max_abs: float = None, log_scale: bool = False, cbar_kws: Dict = None, show: bool = True)[source]#

Plot as a heatmap.

Parameters:
  • title (str) – Title of the plot.

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • log_scale (bool) – Use log scale.

  • cbar_kws (Dict) – Keyword arguments for color bar.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_surface(ax: plt.Axes = None, title: str = None, max_abs: float = None, vmin: float = None, vmax: float = None, show: bool = True)[source]#

Plot as a surface.

Parameters:
  • title (str)

  • ax (plt.Axes) – Axes to plot on.

  • max_abs (float) – Maximum absolute value to plot.

  • vmin (float) – Minimum value to plot.

  • vmax (float) – Maximum value to plot.

  • show (bool) – Whether to show the plot.

Return type:

plt.Axes

Returns:

Axes.

mask_diagonal(fill_value=nan)[source]#

Mask both the primary and secondary diagonal entries of the 2-SFS matrix.

The primary diagonal runs from the top-left to the bottom-right, and the secondary diagonal runs from the top-right to the bottom-left.

Parameters:

fill_value – The value to fill the diagonal entries with.

Return type:

SFS2

Returns:

A new SFS2 object with both diagonals masked.

get_max_abs()[source]#

Get the maximum absolute entry of the 2-SFS matrix.

Return type:

float

Returns:

The maximum absolute entry.

mask_upper(fill_value=nan)[source]#

Mask the upper triangular entries of the 2-SFS matrix.

Parameters:

fill_value – The value to fill the upper triangular entries with.

Return type:

SFS2

Returns:

A new SFS2 object with upper triangular entries masked.