Hit and run sampling
Sampling the feasible space of the model allows you to gain a realistic insight into the distribution of the flow and its probabilistic nature, often better describing the variance and correlations of the possible fluxes better (but more approximately) than e.g. flux variability analysis.
COBREXA supports a variant of hit-and-run sampling adjusted to the complexities of metabolic models; in particular, it implements a version where the next sample (and indirectly, the next run direction) is generated as an affine combination of the samples in the current sample set. This gives a result similar to the artificially centered hit-and-run sampling, with a slightly (unsubstantially) different biases in the generation of the next hits.
As always, we start by loading everything that is needed:
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")
using COBREXA, GLPK
model = load_model("e_coli_core.xml")
Metabolic model of type SBMLModel sparse([8, 10, 21, 43, 50, 51, 8, 9, 6, 12 … 33, 66, 68, 72, 23, 26, 33, 72, 22, 33], [1, 1, 1, 1, 1, 1, 2, 2, 3, 3 … 93, 93, 93, 93, 94, 94, 94, 94, 95, 95], [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0 … 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0], 72, 95) Number of reactions: 95 Number of metabolites: 72
The sampling procedure requires a set of "seed" points that will form the basis for the first iteration of runs. This is commonly called a warm-up. You can generate these points manually with any method of choice, or you can use the COBREXA implementation of warmup that uses the extreme edges of the polytope, similarly to the way used by flux variability analysis:
warmup_points = warmup_from_variability(model, GLPK.Optimizer)
95×190 Matrix{Float64}: -20.0 -20.0 … 0.0 0.0 -20.0 -20.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.55771e-14 -1.55771e-14 0.0 10.0 -1.82795e-14 -1.82795e-14 0.0 10.0 0.0 0.0 … 0.0 0.0 5.07605e-15 5.07605e-15 10.0 -1.20387e-12 5.83169e-15 5.83169e-15 0.0 0.0 0.0 0.0 0.0 -10.0 2.63641e-15 2.63641e-15 0.0 0.0 ⋮ ⋱ 0.0 0.0 -4.57823e-13 -3.44052e-13 3.16102e-15 3.16102e-15 -4.57823e-13 -3.44052e-13 -1.04907e-14 -1.04907e-14 1000.0 1000.0 -4.77335e-15 -4.77335e-15 0.0 0.0 -3.16854e-15 -3.16854e-15 … 20.0 -1.95399e-13 -9.32587e-14 -9.32587e-14 243.22 98.22 0.0 0.0 20.0 -1.95399e-13 1.48086e-15 1.48086e-15 20.0 -1.95399e-13 10.0 10.0 -10.0 10.0
This generates a matrix of fluxes, where each column is one sample point, and rows correspond to the reactions in the model.
With this, starting the sampling procedure is straightforward:
samples = affine_hit_and_run(model, warmup_points)
95×950 Matrix{Float64}: -1.66909 -2.18493 -2.26181 … -2.03885 -1.8475 -2.46558 -0.470124 -0.430387 -0.617061 -0.553997 -0.6049 -0.802518 -1.57443 -2.32797 -1.03189 -1.45541 -1.8394 -1.57826 8.72508 7.92127 7.66237 9.05437 8.46192 6.27915 8.72508 7.92127 7.66237 9.05437 8.46192 6.27915 -1.57443 -2.32797 -1.03189 … -1.45541 -1.8394 -1.57826 12.7365 6.94142 16.8438 10.248 9.4414 3.07769 4.55669 4.5086 3.61501 4.47861 4.48705 3.54988 -0.706878 -0.454369 -0.942265 -1.08006 -0.924169 -0.724058 -1.19896 -1.75454 -1.64475 -1.48485 -1.2426 -1.66306 ⋮ ⋱ 11.1703 14.2109 5.96447 13.0883 10.0192 6.78829 12.0925 14.8723 7.45343 13.5965 10.7236 7.40941 360.742 254.428 346.45 332.564 308.67 405.867 -4.55669 -4.5086 -3.61501 -4.47861 -4.48705 -3.54988 1.67396 1.47322 2.39118 … 1.55207 1.6677 2.92716 8.72465 15.1174 13.7178 9.13832 11.2013 6.78598 1.67396 1.47322 2.39118 1.55207 1.6677 2.92716 1.65308 1.44574 2.36979 1.5258 1.65268 2.90355 7.78632 7.96199 7.0524 8.04223 8.00896 6.28687
As seen above, the result again contains 1 sample per column, with reactions in the same order with rows as before. To get a different sample, you can tune the parameters the function. sample_iters
allows you to specify the iterations at which you want the current sample to be collected and reported – by default, that is done 5 times on each 100th iteration. In the example below, we catch the samples in 10 iterations right after the 200th iteration passed. Similarly, to avoid possible degeneracy, you can choose to run more hit-and-run batch chains than 1, using the chains
parameters. The total number of points collected is the number of points in warmup times the number of sample-collecting iterations times the number of chains.
samples = affine_hit_and_run(model, warmup_points, sample_iters = 201:210, chains = 2)
95×3800 Matrix{Float64}: -1.83943 -2.35747 -1.27553 … -2.30409 -1.79585 -2.11809 -0.884734 -1.18704 -0.0266316 -0.92447 -0.899217 -0.633379 -0.900053 -0.914011 -0.816364 -1.36337 -1.23206 -1.42901 9.67229 8.86613 12.029 7.39692 7.87746 7.01543 9.67229 8.86613 12.029 7.39692 7.87746 7.01543 -0.900053 -0.914011 -0.816364 … -1.36337 -1.23206 -1.42901 10.7787 8.69312 6.35625 11.2565 14.7761 7.13841 6.15562 5.92554 6.82139 4.51902 5.26378 3.73543 -0.552501 -0.749031 -0.296984 -0.889356 -0.817638 -0.95307 -0.954697 -1.17043 -1.2489 -1.37962 -0.896631 -1.48471 ⋮ ⋱ 13.5435 13.547 13.4649 9.40663 10.1687 11.3463 14.1685 14.217 14.149 10.7795 11.4346 12.8174 441.007 412.603 346.423 349.752 371.053 376.361 -6.15562 -5.92554 -6.82139 -4.51902 -5.26378 -3.73543 2.02034 2.22495 0.166716 … 2.62083 2.80645 2.36805 17.4667 16.8022 4.95115 13.9146 14.5093 9.66194 2.02034 2.22495 0.166716 2.62083 2.80645 2.36805 1.99605 2.20138 0.148363 2.60289 2.79257 2.34518 7.36334 6.97874 8.76302 6.81786 6.48962 6.93277
Both procedures used for sampling in this example (warmup_from_variability
, affine_hit_and_run
) can be effectively parallelized by adding workers=
parameter, as summarized in the documentation. Due to the nature of the algorithm, parallelization of the sampling requires at least 1 chain per worker.
Visualizing the samples
Samples can be displayed very efficiently in a scatterplot or a density plot, which naturally show correlations and distributions of the fluxes:
using CairoMakie
o2, co2 = indexin(["R_EX_o2_e", "R_EX_co2_e"], reactions(model))
scatter(
samples[o2, :],
samples[co2, :];
axis = (; xlabel = "Oxygen exchange", ylabel = "Carbon dioxide exchange"),
)
This page was generated using Literate.jl.