# Gene knockouts

Here we will use the knockout function to modify the optimization model before solving, in order to simulate genes knocked out. We can pass knockout to many analysis functions that support parameter modifications, including flux_balance_analysis, flux_variability_analysis, and others.

## Deleting a single gene

!isfile("e_coli_core.xml") &&

using COBREXA, GLPK

model = load_model("e_coli_core.xml")
Metabolic model of type SBMLModel
sparse([8, 10, 21, 43, 50, 51, 8, 9, 6, 12  …  33, 66, 68, 72, 23, 26, 33, 72, 22, 33], [1, 1, 1, 1, 1, 1, 2, 2, 3, 3  …  93, 93, 93, 93, 94, 94, 94, 94, 95, 95], [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0], 72, 95)
Number of reactions: 95
Number of metabolites: 72


First, let's compute the "original" flux, with no knockouts.

original_flux = flux_balance_analysis_dict(model, GLPK.Optimizer);

One can find gene IDs that we can knock out using genes and gene_name functions:

genes(model)
137-element Vector{String}:
"G_b0008"
"G_b0114"
"G_b0115"
"G_b0116"
"G_b0118"
"G_b0351"
"G_b0356"
"G_b0451"
"G_b0474"
"G_b0485"
⋮
"G_b4122"
"G_b4151"
"G_b4152"
"G_b4153"
"G_b4154"
"G_b4232"
"G_b4301"
"G_b4395"
"G_s0001"

It is possible to sort the genes by gene name to allow easier lookups:

sort(gene_name.(Ref(model), genes(model)) .=> genes(model))
137-element Vector{Pair{String, String}}:
"aceA" => "G_b4015"
"aceB" => "G_b4014"
"aceE" => "G_b0114"
"aceF" => "G_b0115"
"ackA" => "G_b2296"
"acnA" => "G_b1276"
"acnB" => "G_b0118"
⋮
"talB" => "G_b0008"
"tdcD" => "G_b3115"
"tdcE" => "G_b3114"
"tktA" => "G_b2935"
"tktB" => "G_b2465"
"tpiA" => "G_b3919"
"ydjI" => "G_b1773"
"ytjC" => "G_b4395"
"zwf" => "G_b1852"

Compute the flux with a genes knocked out:

flux_with_knockout =
flux_balance_analysis_dict(model, GLPK.Optimizer, modifications = [knockout("G_b3236")])
Dict{String, Float64} with 95 entries:
"R_EX_fum_e"    => 0.0
"R_ACONTb"      => 8.33976
"R_GLNS"        => 0.211162
"R_SUCOAS"      => -7.44879
"R_TPI"         => 9.17856
"R_EX_pi_e"     => -3.03794
"R_PPC"         => 9.81525
"R_O2t"         => 23.9021
"R_G6PDH2r"     => 1.53023e-15
"R_TALA"        => -0.147739
"R_PPCK"        => -1.24923e-14
"R_EX_lac__D_e" => 2.65889e-14
"R_PGL"         => 1.45805e-15
"R_H2Ot"        => -30.8724
"R_GLNabc"      => 0.0
"R_EX_co2_e"    => 24.8568
"R_EX_gln__L_e" => 0.0
"R_EX_nh4_e"    => -4.50303
"R_MALt2_2"     => 0.0
⋮               => ⋮

We can see there is a small decrease in production upon knocking out the gene:

biomass_id = "R_BIOMASS_Ecoli_core_w_GAM"
flux_with_knockout[biomass_id] / original_flux[biomass_id]
0.9449581959159197

Similarly, we can explore how the flux variability has changed once the gene is knocked out:

variability_with_knockout =
flux_variability_analysis(model, GLPK.Optimizer, modifications = [knockout("G_b3236")])
95×2 Matrix{Float64}:
2.9513e-13     -2.67554e-15
2.97805e-13     0.0
6.12843e-13     4.26326e-14
8.33976         8.33976
8.33976         8.33976
6.12843e-13     4.26326e-14
3.67464         3.67464
7.44879         7.44879
1.53157e-13     0.0
2.40799e-13     5.35109e-15
⋮
2.08722e-13     6.57758e-11
0.0             6.57758e-11
7.44879      1000.0
-7.44879        -7.44879
-0.147739       -0.147739
0.0             9.86631e-11
-0.147739       -0.147739
-0.44586        -0.44586
9.17856         9.17856

## Knocking out multiple genes

Multiple genes can be knocked out by simply passing a vector of genes to the knockout modification. This knocks out all genes that can run the FBA reaction:

reaction_gene_association(model, "R_FBA")
3-element Vector{Vector{String}}:
["G_b1773"]
["G_b2097"]
["G_b2925"]
flux_with_double_knockout = flux_balance_analysis_dict(
model,
GLPK.Optimizer,
modifications = [knockout(["G_b2097", "G_b1773", "G_b2925"])],
)
Dict{String, Float64} with 95 entries:
"R_EX_fum_e"    => 0.0
"R_ACONTb"      => 0.759585
"R_GLNS"        => 0.180022
"R_SUCOAS"      => -1.36381e-15
"R_TPI"         => 6.75636e-16
"R_EX_pi_e"     => -2.58994
"R_PPC"         => 2.01749
"R_O2t"         => 27.5263
"R_G6PDH2r"     => 27.8991
"R_TALA"        => 9.17374
"R_PPCK"        => 4.66459e-15
"R_EX_lac__D_e" => 1.39942e-15
"R_PGL"         => 27.8991
"R_H2Ot"        => -31.7697
"R_GLNabc"      => 0.0
"R_EX_co2_e"    => 26.6412
"R_EX_gln__L_e" => 0.0
"R_EX_nh4_e"    => -3.83897
"R_MALt2_2"     => 0.0
⋮               => ⋮
flux_with_double_knockout[biomass_id] / original_flux[biomass_id]
0.8056066159777874

## Processing all single gene knockouts

Function screen provides a parallelizable and extensible way to run the flux balance analysis with the knockout over all genes:

knockout_fluxes = screen(
model,
args = tuple.(genes(model)),
analysis = (m, gene) -> begin
res = flux_balance_analysis_dict(m, GLPK.Optimizer, modifications = [knockout(gene)])
if !isnothing(res)
res[biomass_id]
end
end,
)
137-element Vector{Union{Nothing, Float64}}:
0.8739215069684118
0.7966959254309448
0.7966959254309448
0.7823510529477573
0.8739215069684118
0.8739215069684118
0.8739215069684118
0.8739215069684118
0.8739215069684118
0.8739215069684118
⋮
0.8739215069684118
0.8739215069684213
0.8739215069684213
0.8739215069684213
0.8739215069684213
0.8739215069684118
0.8739215069684118
0.8739215069684118
0.21114065257211534

It is useful to display the biomass growth rates of the knockout models together with the gene name:

sort(gene_name.(Ref(model), genes(model)) .=> knockout_fluxes, by = first)
137-element Vector{Pair{String}}:
"aceA" => 0.8739215069684099
"aceB" => 0.8739215069684118
"aceE" => 0.7966959254309448
"aceF" => 0.7966959254309448
"ackA" => 0.8739215069684118
"acnA" => 0.8739215069684118
"acnB" => 0.8739215069684118
⋮
"talB" => 0.8739215069684118
"tdcD" => 0.8739215069684118
"tdcE" => 0.8739215069684118
"tktA" => 0.8739215069684118
"tktB" => 0.8739215069684118
"tpiA" => 0.7040369478590172
"ydjI" => 0.8739215069684118
"ytjC" => 0.8739215069684118
"zwf" => 0.8638133095039895

## Processing all multiple-gene deletions

### Double gene knockouts

Since we can generate any kind of argument matrix for screen to process, it is straightforward to generate the matrix of all double gene knockouts and let the function process it. This computes the biomass production of all double-gene knockouts:

gene_groups = [[g1, g2] for g1 in genes(model), g2 in genes(model)];
double_knockout_fluxes = screen(
model,
args = tuple.(gene_groups),
analysis = (m, gene_groups) -> begin
res = flux_balance_analysis_dict(
m,
GLPK.Optimizer,
modifications = [knockout(gene_groups)],
)
if !isnothing(res)
res[biomass_id]
end
end,
)
137×137 Matrix{Union{Nothing, Float64}}:
0.873922  0.796696  0.796696  0.782351  …  0.873922  0.873922   0.211141
0.796696  0.796696  0.796696  0.782351     0.796696  0.796696   0.210794
0.796696  0.796696  0.796696  0.782351     0.796696  0.796696   0.210794
0.782351  0.782351  0.782351  0.782351     0.782351  0.782351   0.210794
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351  …  0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922  -6.45106e-16
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
⋮                                       ⋱            ⋮
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351  …  0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351     0.873922  0.873922   0.211141
0.873922  0.796696  0.796696  0.782351  …  0.873922  0.873922   0.211141
0.211141  0.210794  0.210794  0.210794     0.211141  0.211141   0.211141

The results can be converted to an easily scrutinizable form as follows:

reshape([gene_name.(Ref(model), p) for p in gene_groups] .=> double_knockout_fluxes, :)
18769-element Vector{Pair{Vector{String}}}:
["talB", "talB"] => 0.8739215069684118
["aceE", "talB"] => 0.7966959254309448
["aceF", "talB"] => 0.7966959254309448
["lpd", "talB"] => 0.7823510529477573
["acnB", "talB"] => 0.8739215069684118
["mhpF", "talB"] => 0.8739215069684118
["frmA", "talB"] => 0.8739215069684118
["amtB", "talB"] => 0.8739215069684118
["glsA", "talB"] => 0.8739215069684118
⋮
["fumB", "s0001"] => 0.21114065257211534
["frdD", "s0001"] => 0.21114065257211645
["frdC", "s0001"] => 0.21114065257211645
["frdB", "s0001"] => 0.21114065257211645
["frdA", "s0001"] => 0.21114065257211645
["fbp", "s0001"] => 0.21114065257211534
["sgcE", "s0001"] => 0.21114065257211534
["ytjC", "s0001"] => 0.21114065257211534
["s0001", "s0001"] => 0.21114065257211534

### Triple gene knockouts (and others)

We can extend the same analysis to triple or other gene knockouts by generating a different array of gene pairs. For example, one can generate gene_groups for triple gene deletion screening:

gene_groups = [[g1, g2, g3] for g1 in genes(model), g2 in genes(model), g3 in genes(model)];

!!! warning Full triple gene deletion analysis may take a long time to compute. We may use parallel processing with screen to speed up the analysis. Alternatively, process only a subset of the genes triples.