Distributions

Contents

Distributions#

class Coalescent(n: int | Dict[str, int] | List[int] | LineageConfig, model: CoalescentModel = None, demography: Demography = None, loci: int | LocusConfig = 1, recombination_rate: float = None, pbar: bool = False, parallelize: bool = True, start_time: float = 0, end_time: float = None, regularize: bool = True)[source]#

Bases: AbstractCoalescent, Serializable

Coalescent distribution.

__init__(n: int | Dict[str, int] | List[int] | LineageConfig, model: CoalescentModel = None, demography: Demography = None, loci: int | LocusConfig = 1, recombination_rate: float = None, pbar: bool = False, parallelize: bool = True, start_time: float = 0, end_time: float = None, regularize: bool = True)[source]#

Create object.

Parameters:
  • n (Union[int, Dict[str, int], List[int], LineageConfig]) –

    Number of lineages. Either a single integer if only one population, or a list of integers or dictionary with population names as keys and number of lineages as values for multiple populations.

    Alternatively, a LineageConfig object can be passed.

  • model (CoalescentModel) – Coalescent model. Default is the standard coalescent.

  • demography (Demography) – Demography.

  • loci (int | LocusConfig) – Number of loci or locus configuration.

  • recombination_rate (float) – Recombination rate.

  • pbar (bool) – Whether to show a progress bar when constructing the state space.

  • parallelize (bool) – Whether to parallelize computations.

  • start_time (float) – Time when to start accumulating moments. By default, this is 0.

  • end_time (float) – Time when to end the accumulating moments. If None, the end time is taken to be the time of almost sure absorption. Note that unnecessarily large end times can lead to numerical errors.

  • regularize (bool) – Whether to regularize the intensity matrix for numerical stability.

start_time: float#

Time when to start accumulating moments

pbar: bool#

Whether to show a progress bar

parallelize: bool#

Whether to parallelize computations

regularize: bool#

Whether to regularize the intensity matrix for numerical stability

property lineage_counting_state_space: LineageCountingStateSpace#

The lineage-counting state space.

property block_counting_state_space: BlockCountingStateSpace#

The block-counting state space.

property tree_height: TreeHeightDistribution#

Tree height distribution.

property total_branch_length: PhaseTypeDistribution#

Total branch length distribution.

property sfs: UnfoldedSFSDistribution#

Unfolded site-frequency spectrum distribution.

property fsfs: FoldedSFSDistribution#

Folded site-frequency spectrum distribution.

moment(k: int = 1, rewards: Sequence[Reward] = None, start_time: float = None, end_time: float = None, center: bool = True, permute: bool = True)[source]#

Get the kth (non-central) moment using the specified rewards and state space.

Parameters:
  • k (int) – The order of the moment

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, tree height rewards are used.

  • start_time (float) – Time when to start accumulation of moments. By default, the start time specified when initializing the distribution.

  • end_time (float) – Time when to end accumulation of moments. By default, either the end time specified when initializing the distribution or the time until almost sure absorption.

  • center (bool) – Whether to center the moment.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

float

Returns:

The kth moment

accumulate(k: int, end_times: Iterable[float], rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True)[source]#

Accumulate moments at different times.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray

Returns:

Accumulation of moments.

plot_accumulation(k: int = 1, end_times: Iterable[float] = None, rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True, ax: plt.Axes = None, show: bool = True, file: str = None, clear: bool = False, label: str = None, title: str = None)[source]#

Plot the accumulation of moments.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

  • ax (plt.Axes) – Axes to plot on.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

drop_cache()[source]#

Drop state space cache.

to_json()[source]#

Serialize to JSON. Drop cache before serializing.

Return type:

str

Returns:

JSON string.

to_msprime(num_replicates: int = 10000, n_threads: int = 10, parallelize: bool = True, record_migration: bool = False, simulate_mutations: bool = False, mutation_rate: float = None, seed: int = None)[source]#

Convert to msprime coalescent.

Parameters:
  • num_replicates (int) – Number of replicates.

  • n_threads (int) – Number of threads.

  • parallelize (bool) – Whether to parallelize.

  • record_migration (bool) – Whether to record migrations which is necessary to calculate statistics per deme.

  • simulate_mutations (bool) – Whether to simulate mutations.

  • mutation_rate (float) – Mutation rate.

Return type:

MsprimeCoalescent

Returns:

msprime coalescent.

classmethod from_file(file: str, classes=None)#

Load object from file.

Parameters:
  • classes – Classes to be used for unserialization

  • file (str) – File to load from

Return type:

Self

classmethod from_json(json: str, classes=None)#

Unserialize object.

Parameters:
  • classes – Classes to be used for deserialization.

  • json (str) – JSON string

Return type:

Self

to_file(file: str)#

Save object to file (JSON).

Parameters:

file (str) – File path.

lineage_config: LineageConfig#

Population configuration

locus_config: LocusConfig#

Locus configuration

model: CoalescentModel#

Coalescent model

demography: Demography#

Demography

end_time: float#

End time

class PhaseTypeDistribution(state_space: StateSpace, tree_height: TreeHeightDistribution, demography: Demography = None, reward: Reward = None, regularize: bool = True)[source]#

Bases: MomentAwareDistribution

Phase-type distribution for a piecewise time-homogeneous process.

__init__(state_space: StateSpace, tree_height: TreeHeightDistribution, demography: Demography = None, reward: Reward = None, regularize: bool = True)[source]#

Initialize the distribution.

Parameters:
  • state_space (StateSpace) – The state space.

  • tree_height (TreeHeightDistribution) – The tree height distribution.

  • demography (Demography) – The demography.

  • reward (Reward) – The reward. By default, the tree height reward.

  • regularize (bool) – Whether to regularize the intensity matrix for numerical stability.

lineage_config: LineageConfig#

Population configuration

locus_config: LocusConfig#

Locus configuration

reward: Reward#

Reward

regularize: bool#

Whether to regularize the intensity matrix for numerical stability

state_space: StateSpace#

State space

demography: Demography#

Demography

tree_height: TreeHeightDistribution#

Tree height distribution

property mean: float | SFS#

First moment / mean.

property var: float | SFS#

Second central moment / variance.

property std: float | SFS#

Standard deviation.

property m2: float | SFS#

Second (non-central) moment.

property demes: MarginalDemeDistributions#

Marginal distributions over each deme.

property loci: MarginalLocusDistributions#

Marginal distributions over each locus.

moment(k: int, rewards: Sequence[Reward] = None, start_time: float = None, end_time: float = None, center: bool = True, permute: bool = True)[source]#

Get the kth (non-central) (cross-)moment.

Parameters:
  • k (int) – The order of the moment.

  • rewards (Sequence[Reward]) – Iterable of k rewards. By default, the reward of the underlying distribution.

  • start_time (float) – Time when to start accumulation of moments. By default, the start time specified when initializing the distribution.

  • end_time (float) – Time when to end accumulation of moments. By default, either the end time specified when initializing the distribution or the time until almost sure absorption.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

float

Returns:

The kth moment

accumulate(k: int, end_times: Iterable[float], rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True)[source]#

Evaluate the kth (non-central) moment at different end times.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – List of ends times or end time when to evaluate the moment.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray

Returns:

The moment accumulated at the specified times or time.

plot_accumulation(k: int = 1, end_times: Iterable[float] = None, rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True, ax: plt.Axes = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = None)[source]#

Plot accumulation of (non-central) moments at different times.

Note

This is different from a CDF, as it shows the accumulation of moments rather than the probability of having reached absorption at a certain time.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

  • ax (plt.Axes) – The axes to plot on.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

touch(**kwargs)#

Touch all cached properties.

Parameters:

kwargs – Additional keyword arguments.

class TreeHeightDistribution(state_space: LineageCountingStateSpace, demography: Demography = None, start_time: float = 0, end_time: float = None, regularize: bool = True)[source]#

Bases: PhaseTypeDistribution, DensityAwareDistribution

Phase-type distribution for a piecewise time-homogeneous process that allows the computation of the density function. This is currently only possible with default rewards.

max_epochs: int = 10000#

Maximum number of epochs to consider when determining time to almost sure absorption.

max_iter: int = 20#

Maximum number of time we double the end time when determining time to almost sure absorption.

p_absorption: float = 0.999999999999999#

Probability of almost sure absorption.

__init__(state_space: LineageCountingStateSpace, demography: Demography = None, start_time: float = 0, end_time: float = None, regularize: bool = True)[source]#

Initialize the distribution.

Parameters:
  • state_space (LineageCountingStateSpace) – The state space.

  • demography (Demography) – The demography.

  • start_time (float) – Time when to start accumulating moments.

  • end_time (float) – Time when to end accumulation of moments. By default, the time until almost sure absorption.

  • regularize (bool) – Whether to regularize the intensity matrix for numerical stability.

state_space: LineageCountingStateSpace#

State space

start_time: float#

Start time

end_time: float | None#

End time

cdf(t: float | Sequence[float])[source]#

Cumulative distribution function.

Parameters:

t (Union[float, Sequence[float]]) – Value or values to evaluate the CDF at.

Return type:

float | ndarray

Returns:

Cumulative probability

Raises:

NotImplementedError – if rewards are not default

quantile(q: float, expansion_factor: float = 2, precision: float = 1e-05, max_iter: int = 1000)[source]#

Find the specified quantile of a CDF using an adaptive bisection method.

Parameters:
  • q (float) – The desired quantile (between 0 and 1).

  • expansion_factor (float) – Factor by which to multiply the upper bound that does not yet contain the quantile.

  • precision (float) – Precision for quantile, i.e. F(b) - F(a) < precision.

  • max_iter (int) – Maximum number of iterations.

Returns:

The quantile.

pdf(t: float | Sequence[float], dx: float = None)[source]#

Density function. We use numerical differentiation of the CDF to calculate the density. This provides good results as the CDF is exact and continuous.

Parameters:
  • t (Union[float, Sequence[float]]) – Value or values to evaluate the density function at.

  • dx (float) – Step size for numerical differentiation. By default, the 99th percentile divided by 1e10.

Return type:

float | ndarray

Returns:

Density

property t_max: float#

Time until which computations are performed. This is either the end time specified when initializing the distribution or the time until almost sure absorption.

accumulate(k: int, end_times: Iterable[float], rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True)#

Evaluate the kth (non-central) moment at different end times.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – List of ends times or end time when to evaluate the moment.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray

Returns:

The moment accumulated at the specified times or time.

property demes: MarginalDemeDistributions#

Marginal distributions over each deme.

property loci: MarginalLocusDistributions#

Marginal distributions over each locus.

property m2: float | SFS#

Second (non-central) moment.

property mean: float | SFS#

First moment / mean.

moment(k: int, rewards: Sequence[Reward] = None, start_time: float = None, end_time: float = None, center: bool = True, permute: bool = True)#

Get the kth (non-central) (cross-)moment.

Parameters:
  • k (int) – The order of the moment.

  • rewards (Sequence[Reward]) – Iterable of k rewards. By default, the reward of the underlying distribution.

  • start_time (float) – Time when to start accumulation of moments. By default, the start time specified when initializing the distribution.

  • end_time (float) – Time when to end accumulation of moments. By default, either the end time specified when initializing the distribution or the time until almost sure absorption.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

float

Returns:

The kth moment

plot_accumulation(k: int = 1, end_times: Iterable[float] = None, rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True, ax: plt.Axes = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = None)#

Plot accumulation of (non-central) moments at different times.

Note

This is different from a CDF, as it shows the accumulation of moments rather than the probability of having reached absorption at a certain time.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

  • ax (plt.Axes) – The axes to plot on.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_cdf(ax: plt.Axes = None, t: ndarray = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = 'Tree height CDF')#

Plot cumulative distribution function.

Parameters:
  • ax (plt.Axes) – Axes to plot on.

  • t (ndarray) – Values to evaluate the CDF at. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

plot_pdf(ax: plt.Axes = None, t: ndarray = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = 'Tree height PDF', dx: float = None)#

Plot density function.

Parameters:
  • ax (plt.Axes) – The axes to plot on.

  • t (ndarray) – Values to evaluate the density function at. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

  • dx (float) – Step size for numerical differentiation. By default, the 99th percentile divided by 1e10.

Return type:

plt.Axes

Returns:

Axes.

property std: float | SFS#

Standard deviation.

touch(**kwargs)#

Touch all cached properties.

Parameters:

kwargs – Additional keyword arguments.

property var: float | SFS#

Second central moment / variance.

lineage_config: LineageConfig#

Population configuration

locus_config: LocusConfig#

Locus configuration

reward: Reward#

Reward

regularize: bool#

Whether to regularize the intensity matrix for numerical stability

demography: Demography#

Demography

tree_height: TreeHeightDistribution#

Tree height distribution

class FoldedSFSDistribution(state_space: BlockCountingStateSpace, tree_height: TreeHeightDistribution, demography: Demography, pbar: bool = False, parallelize: bool = False, reward: Reward = None, regularize: bool = True)[source]#

Bases: SFSDistribution

Folded site-frequency spectrum distribution.

__init__(state_space: BlockCountingStateSpace, tree_height: TreeHeightDistribution, demography: Demography, pbar: bool = False, parallelize: bool = False, reward: Reward = None, regularize: bool = True)#

Initialize the distribution.

Parameters:
  • state_space (BlockCountingStateSpace) – Block-counting state space.

  • tree_height (TreeHeightDistribution) – The tree height distribution.

  • demography (Demography) – The demography.

  • pbar (bool) – Whether to show a progress bar.

  • parallelize (bool) – Use parallelization.

  • reward (Reward) – The reward to multiply the SFS reward with. By default, the unit reward is used, which has no effect.

  • regularize (bool) – Whether to regularize the intensity matrix for numerical stability.

accumulate(k: int, end_times: Iterable[float], rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True)#

Evaluate the kth (non-central) moments for site-frequency spectrum at different end times.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times or time when to evaluate the moment.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray

Returns:

Array of moments accumulated at the specified times, one for each site-frequency count.

property corr: SFS2#

Correlation matrix across site-frequency counts.

property cov: SFS2#

Covariance matrix across site-frequency counts.

property demes: MarginalDemeDistributions#

Marginal distributions over each deme.

get_accumulation(k: int, i: int, end_times: Iterable[float] | float, rewards: Sequence[SFSReward] = None, center: bool = True, permute: bool = True)#

Get accumulation of moments for the ith site-frequency count.

Parameters:
  • k (int) – The order of the moment

  • i (int) – The ith site-frequency count.

  • end_times (Union[Iterable[float], float]) – Times or time when to evaluate the moment.

  • rewards (Sequence[SFSReward]) – Sequence of k rewards.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray | float

Returns:

The kth SFS (cross)-moment accumulations at the ith site-frequency count

get_corr(i: int, j: int)#

Get the correlation coefficient between the ith and jth site-frequency.

Parameters:
  • i (int) – The ith frequency count

  • j (int) – The jth frequency count

Return type:

float

Returns:

Correlation coefficient

get_cov(i: int, j: int)#

Get the covariance between the ith and jth site-frequency.

Parameters:
  • i (int) – The ith frequency count

  • j (int) – The jth frequency count

Return type:

float

Returns:

covariance

get_mutation_config(config: Sequence[int], theta: float)#

Get the probabilities of observing the given mutational configurations according to the infinite sites model.

Note

This currently only works for a single epoch, i.e. a time-homogeneous demography, and recombination is not supported.

Parameters:
  • config (Sequence[int]) – The mutational configuration. A sequence of integers of length n - 1 for unfolded configurations and n // 2 for folded configurations, where n is the number of lineages. Each element in the sequence is an integer representing the number of mutations at each frequency count starting from 1. For example, the unfolded configuration [2, 1, 0] represents two singleton, one doubleton and zero tripleton mutations for a sample size of 4 lineages. Similarly, the folded configuration [2, 1] represents two singleton or tripleton and one doubleton mutation for the same number of lineages.

  • theta (float) – The mutation rate.

Return type:

float

Returns:

The probability of observing the given mutational configuration.

get_mutation_configs(theta: float)#

An iterator over the probabilities of observing mutational configurations according to the infinite sites model. The order of the mutational configurations generated ascends in the number of mutations observed. See get_mutation_config() for more information on mutational configurations.

Note

This currently only works for a single epoch, i.e. a time-homogeneous demography, and recombination is not supported. Also note that the number of configurations is infinite, so this iterator will never stop. However, depending on the mutation rate, the probability of observing configurations of higher mutation counts will decrease over time. You can keep track of the generated probability mass by checking the generated_mass attribute, which is reset every time this method is called. A good approach is thus to keep generating configurations until the generated mass is above a certain threshold. More complex demographic models and larger sample sizes increase the number of configurations and higher mutation rates, the number of generated configurations necessary to reach a certain mass.

Code example:

coal = pg.Coalescent(n=5)

it = coal.sfs.get_mutation_configs(theta=1)

# continue until generated mass is above 0.8
samples = list(pg.takewhile_inclusive(lambda _: coal.sfs.generated_mass < 0.8, it))
Parameters:

theta (float) – The mutation rate.

Return type:

Iterator[Tuple[Tuple[float, ...], float]]

Returns:

An iterator over the probabilities of observing mutational configurations.

property loci: MarginalLocusDistributions#

Marginal distributions over each locus.

property m2: float | SFS#

Second (non-central) moment.

property mean: float | SFS#

First moment / mean.

moment(k: int, rewards: Sequence[SFSReward] = None, start_time: float = None, end_time: float = None, center: bool = True, permute: bool = True)#

Get the kth moments of the site-frequency spectrum.

Parameters:
  • k (int) – The order of the moment

  • rewards (Sequence[SFSReward]) – Sequence of k rewards

  • start_time (float) – Time when to start accumulation of moments. By default, the start time specified when initializing the distribution.

  • end_time (float) – Time when to end accumulation of moments. By default, either the end time specified when initializing the distribution or the time until almost sure absorption.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

SFS

Returns:

A site-frequency spectrum of kth order moments.

plot_accumulation(k: int = 1, end_times: Iterable[float] = None, rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True, ax: plt.Axes = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = None)#

Plot accumulation of (non-central) SFS moments at different times.

Note

This is different from a CDF, as it shows the accumulation of moments rather than the probability of having reached absorption at a certain time.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

  • ax (plt.Axes) – The axes to plot on.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

property std: float | SFS#

Standard deviation.

touch(**kwargs)#

Touch all cached properties.

Parameters:

kwargs – Additional keyword arguments.

property var: float | SFS#

Second central moment / variance.

pbar: bool#

Whether to show a progress bar.

parallelize: bool#

Whether to parallelize computations.

generated_mass#

Generated probability mass by iterator returned from get_mutation_configs().

lineage_config: LineageConfig#

Population configuration

locus_config: LocusConfig#

Locus configuration

reward: Reward#

Reward

regularize: bool#

Whether to regularize the intensity matrix for numerical stability

state_space: StateSpace#

State space

demography: Demography#

Demography

tree_height: TreeHeightDistribution#

Tree height distribution

class UnfoldedSFSDistribution(state_space: BlockCountingStateSpace, tree_height: TreeHeightDistribution, demography: Demography, pbar: bool = False, parallelize: bool = False, reward: Reward = None, regularize: bool = True)[source]#

Bases: SFSDistribution

Unfolded site-frequency spectrum distribution.

__init__(state_space: BlockCountingStateSpace, tree_height: TreeHeightDistribution, demography: Demography, pbar: bool = False, parallelize: bool = False, reward: Reward = None, regularize: bool = True)#

Initialize the distribution.

Parameters:
  • state_space (BlockCountingStateSpace) – Block-counting state space.

  • tree_height (TreeHeightDistribution) – The tree height distribution.

  • demography (Demography) – The demography.

  • pbar (bool) – Whether to show a progress bar.

  • parallelize (bool) – Use parallelization.

  • reward (Reward) – The reward to multiply the SFS reward with. By default, the unit reward is used, which has no effect.

  • regularize (bool) – Whether to regularize the intensity matrix for numerical stability.

accumulate(k: int, end_times: Iterable[float], rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True)#

Evaluate the kth (non-central) moments for site-frequency spectrum at different end times.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times or time when to evaluate the moment.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray

Returns:

Array of moments accumulated at the specified times, one for each site-frequency count.

property corr: SFS2#

Correlation matrix across site-frequency counts.

property cov: SFS2#

Covariance matrix across site-frequency counts.

property demes: MarginalDemeDistributions#

Marginal distributions over each deme.

get_accumulation(k: int, i: int, end_times: Iterable[float] | float, rewards: Sequence[SFSReward] = None, center: bool = True, permute: bool = True)#

Get accumulation of moments for the ith site-frequency count.

Parameters:
  • k (int) – The order of the moment

  • i (int) – The ith site-frequency count.

  • end_times (Union[Iterable[float], float]) – Times or time when to evaluate the moment.

  • rewards (Sequence[SFSReward]) – Sequence of k rewards.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

ndarray | float

Returns:

The kth SFS (cross)-moment accumulations at the ith site-frequency count

get_corr(i: int, j: int)#

Get the correlation coefficient between the ith and jth site-frequency.

Parameters:
  • i (int) – The ith frequency count

  • j (int) – The jth frequency count

Return type:

float

Returns:

Correlation coefficient

get_cov(i: int, j: int)#

Get the covariance between the ith and jth site-frequency.

Parameters:
  • i (int) – The ith frequency count

  • j (int) – The jth frequency count

Return type:

float

Returns:

covariance

get_mutation_config(config: Sequence[int], theta: float)#

Get the probabilities of observing the given mutational configurations according to the infinite sites model.

Note

This currently only works for a single epoch, i.e. a time-homogeneous demography, and recombination is not supported.

Parameters:
  • config (Sequence[int]) – The mutational configuration. A sequence of integers of length n - 1 for unfolded configurations and n // 2 for folded configurations, where n is the number of lineages. Each element in the sequence is an integer representing the number of mutations at each frequency count starting from 1. For example, the unfolded configuration [2, 1, 0] represents two singleton, one doubleton and zero tripleton mutations for a sample size of 4 lineages. Similarly, the folded configuration [2, 1] represents two singleton or tripleton and one doubleton mutation for the same number of lineages.

  • theta (float) – The mutation rate.

Return type:

float

Returns:

The probability of observing the given mutational configuration.

get_mutation_configs(theta: float)#

An iterator over the probabilities of observing mutational configurations according to the infinite sites model. The order of the mutational configurations generated ascends in the number of mutations observed. See get_mutation_config() for more information on mutational configurations.

Note

This currently only works for a single epoch, i.e. a time-homogeneous demography, and recombination is not supported. Also note that the number of configurations is infinite, so this iterator will never stop. However, depending on the mutation rate, the probability of observing configurations of higher mutation counts will decrease over time. You can keep track of the generated probability mass by checking the generated_mass attribute, which is reset every time this method is called. A good approach is thus to keep generating configurations until the generated mass is above a certain threshold. More complex demographic models and larger sample sizes increase the number of configurations and higher mutation rates, the number of generated configurations necessary to reach a certain mass.

Code example:

coal = pg.Coalescent(n=5)

it = coal.sfs.get_mutation_configs(theta=1)

# continue until generated mass is above 0.8
samples = list(pg.takewhile_inclusive(lambda _: coal.sfs.generated_mass < 0.8, it))
Parameters:

theta (float) – The mutation rate.

Return type:

Iterator[Tuple[Tuple[float, ...], float]]

Returns:

An iterator over the probabilities of observing mutational configurations.

property loci: MarginalLocusDistributions#

Marginal distributions over each locus.

property m2: float | SFS#

Second (non-central) moment.

property mean: float | SFS#

First moment / mean.

moment(k: int, rewards: Sequence[SFSReward] = None, start_time: float = None, end_time: float = None, center: bool = True, permute: bool = True)#

Get the kth moments of the site-frequency spectrum.

Parameters:
  • k (int) – The order of the moment

  • rewards (Sequence[SFSReward]) – Sequence of k rewards

  • start_time (float) – Time when to start accumulation of moments. By default, the start time specified when initializing the distribution.

  • end_time (float) – Time when to end accumulation of moments. By default, either the end time specified when initializing the distribution or the time until almost sure absorption.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

Return type:

SFS

Returns:

A site-frequency spectrum of kth order moments.

plot_accumulation(k: int = 1, end_times: Iterable[float] = None, rewards: Sequence[Reward] = None, center: bool = True, permute: bool = True, ax: plt.Axes = None, show: bool = True, file: str = None, clear: bool = True, label: str = None, title: str = None)#

Plot accumulation of (non-central) SFS moments at different times.

Note

This is different from a CDF, as it shows the accumulation of moments rather than the probability of having reached absorption at a certain time.

Parameters:
  • k (int) – The order of the moment.

  • end_times (Iterable[float]) – Times when to evaluate the moment. By default, 200 evenly spaced values between 0 and the 99th percentile.

  • rewards (Sequence[Reward]) – Sequence of k rewards. By default, the reward of the underlying distribution.

  • center (bool) – Whether to center the moment around the mean.

  • permute (bool) – For cross-moments, whether to average over all permutations of rewards. Default is True, which will provide the correct cross-moment. If set to False, the cross-moment will be conditioned on the order of rewards.

  • ax (plt.Axes) – The axes to plot on.

  • show (bool) – Whether to show the plot.

  • file (str) – File to save the plot to.

  • clear (bool) – Whether to clear the plot before plotting.

  • label (str) – Label for the plot.

  • title (str) – Title of the plot.

Return type:

plt.Axes

Returns:

Axes.

property std: float | SFS#

Standard deviation.

touch(**kwargs)#

Touch all cached properties.

Parameters:

kwargs – Additional keyword arguments.

property var: float | SFS#

Second central moment / variance.

pbar: bool#

Whether to show a progress bar.

parallelize: bool#

Whether to parallelize computations.

generated_mass#

Generated probability mass by iterator returned from get_mutation_configs().

lineage_config: LineageConfig#

Population configuration

locus_config: LocusConfig#

Locus configuration

reward: Reward#

Reward

regularize: bool#

Whether to regularize the intensity matrix for numerical stability

state_space: StateSpace#

State space

demography: Demography#

Demography

tree_height: TreeHeightDistribution#

Tree height distribution

class MarginalLocusDistributions(dist: PhaseTypeDistribution)[source]#

Bases: MarginalDistributions

Marginal locus distributions.

__init__(dist: PhaseTypeDistribution)[source]#

Initialize the distributions.

Parameters:

dist (PhaseTypeDistribution) – The distribution.

property loci: MarginalLocusDistributions#

Distributions marginalized over loci.

get_cov(locus1: int, locus2: int)[source]#

Get the covariance between two loci.

Parameters:
  • locus1 (int) – The first locus.

  • locus2 (int) – The second locus.

Return type:

float

Returns:

The covariance.

property cov: ndarray#

Covariance matrix across loci.

get_corr(locus1: int, locus2: int)[source]#

Get the correlation coefficient between two loci.

Parameters:
  • locus1 (int) – The first locus.

  • locus2 (int) – The second locus.

Return type:

float

Returns:

The correlation coefficient.

property corr: ndarray#

Correlation matrix across loci.

get(k[, d]) D[k] if k in D, else d.  d defaults to None.#
items() a set-like object providing a view on D's items#
keys() a set-like object providing a view on D's keys#
values() an object providing a view on D's values#
class MarginalDemeDistributions(dist: PhaseTypeDistribution)[source]#

Bases: MarginalDistributions

Marginal deme distributions.

__init__(dist: PhaseTypeDistribution)[source]#

Initialize the distributions.

Parameters:

dist (PhaseTypeDistribution) – The distribution.

property demes: MarginalDemeDistributions#

Distributions marginalized over demes.

get_cov(pop1: str, pop2: str)[source]#

Get the covariance between two demes.

Parameters:
  • pop1 (str) – The first deme.

  • pop2 (str) – The second deme.

Return type:

float

Returns:

The covariance.

property cov: ndarray#

Covariance matrix across demes.

get_corr(pop1: str, pop2: str)[source]#

Get the correlation coefficient between two demes.

Parameters:
  • pop1 (str) – The first deme.

  • pop2 (str) – The second deme.

Return type:

float

Returns:

The correlation coefficient.

property corr: ndarray#

Correlation matrix across demes.

get(k[, d]) D[k] if k in D, else d.  d defaults to None.#
items() a set-like object providing a view on D's items#
keys() a set-like object providing a view on D's keys#
values() an object providing a view on D's values#