Inference#
An introduction to inference with phasegen can be found in the Quickstart guide.
Distributed bootstrapping#
Whenever running inference with long runtimes, you might want to distribute the bootstrapping process. This can be done by creating bootstrap samples which are Inference objects themselves (create_bootstrap()). These bootstraps can be run in parallel and the results combined afterwards (add_bootstrap()). Below an example Snakemake workflow for distributed bootstrapping:
Snakefile:
rule all:
input:
"results/graphs/inference/demography.png",
# setup inference and perform initial run
rule setup_inference:
output:
"results/inference/inference.json"
conda:
"phasegen.yaml"
script:
"setup_inference.py"
# create and run bootstrap
rule run_bootstrap:
input:
"results/inference/inference.json"
output:
"results/inference/bootstraps/{i}/inference.json"
conda:
"phasegen.yaml"
script:
"run_bootstrap.py"
# merge bootstraps and visualize
rule merge_bootstraps:
input:
inference="results/inference/inference.json",
bootstraps=expand("results/inference/bootstraps/{i}/inference.json",i=range(100))
output:
inference="results/inference/inference.bootstrapped.json",
demography="results/graphs/inference/demography.png",
bootstraps="results/graphs/inference/bootstraps.png",
conda:
"phasegen.yaml"
script:
"merge_bootstraps.py"
where phasegen.yaml is a conda environment file with the following content:
name: phasegen
channels:
- defaults
dependencies:
- python>=3.10,<3.13
- pip
- pip:
- phasegen
In setup_inference.py, we set up the inference object, perform the initial run and save the results to a file. Note that bootstrapping is disabled here as we are doing it manually.
import phasegen as pg
out = snakemake.output[0]
inf = pg.Inference(
bounds=dict(t=(0, 4), Ne=(0.1, 1)),
observation=pg.SFS(
[177130, 997, 441, 228, 156, 117, 114, 83, 105, 109, 652]
),
coal=lambda t, Ne: pg.Coalescent(
n=10,
demography=pg.Demography(
pop_sizes={'pop_0': {0: 1, t: Ne}}
)
),
loss=lambda coal, obs: pg.PoissonLikelihood().compute(
observed=obs.normalize().polymorphic,
modelled=coal.sfs.mean.normalize().polymorphic
),
resample=lambda sfs, _: sfs.resample(),
do_bootstrap=False
)
inf.run()
inf.to_file(out)
In run_bootstrap.py, we load the inference object from the file, create a bootstrap sample and run the inference for it. This will use the specified resampling function to resample the SFS.
import phasegen as pg
file = snakemake.input[0]
out = snakemake.output[0]
inf = pg.Inference.from_file(file)
bootstrap = inf.create_bootstrap()
bootstrap.run()
bootstrap.to_file(out)
In merge_bootstraps.py, we load the inference object and all bootstraps, merge the bootstraps with the main inference object and visualize the results.
import phasegen as pg
inf_file = snakemake.input.inference
bootstraps_file = snakemake.input.bootstraps
out = snakemake.output.inference
out_demography = snakemake.output.demography
out_bootstraps = snakemake.output.bootstraps
inf = pg.Inference.from_file(inf_file)
inf.add_bootstraps([pg.Inference.from_file(f) for f in bootstraps_file])
inf.plot_demography(file=out_demography)
inf.plot_bootstraps(file=out_bootstraps)
inf.to_file(out)