Basic usage of StandardModel
In this tutorial we will use COBREXA
's StandardModel
and functions that specifically operate on it. As usual we will use the toy model of E. coli for demonstration.
!isfile("e_coli_core.json") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")
using COBREXA
Loading a model in the StandardModel format
model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel from the model file
Metabolic model of type StandardModel sparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64 … 20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3 … 93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0 … 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95) Number of reactions: 95 Number of metabolites: 72
When using load_model(StandardModel, file_location)
the model at file_location
is first loaded into its inferred format and is then converted to a StandardModel
using the generic accessor interface. Thus, data loss may occur. Always check your model to ensure that nothing important has been lost.
Internals of StandardModel
A benefit of StandardModel
is that it supports a richer internal infrastructure that can be used to manipulate internal model attributes in a systematic way. Specifically, the genes, reactions, and metabolites with of a model each have a type. This is particularly useful when modifying or even constructing a model from scratch.
Gene
s, Reaction
s, and Metabolite
s
StandardModel
is composed of ordered dictionaries of Gene
s, Metabolite
s and Reaction
s. Ordered dictionaries are used because the order of the reactions and metabolites are important for constructing a stoichiometric matrix since the rows and columns should correspond to the order of the metabolites and reactions returned by calling the accessors metabolites
and reactions
.
Each StandardModel
is composed of the following fields:
fieldnames(StandardModel) # fields of a StandardModel
(:id, :reactions, :metabolites, :genes)
The :genes
field of a StandardModel
contains an ordered dictionary of gene ids mapped to Gene
s.
model.genes # the keys of this dictionary are the same as genes(model)
OrderedCollections.OrderedDict{String, Gene} with 137 entries: "b1241" => Gene("b1241", "adhE", Dict("original_bigg_ids"=>["b1241"]), Dict("… "b0351" => Gene("b0351", "mhpF", Dict("original_bigg_ids"=>["b0351"]), Dict("… "s0001" => Gene("s0001", "", Dict("original_bigg_ids"=>["s0001"]), Dict("sbo"… "b1849" => Gene("b1849", "purT", Dict("original_bigg_ids"=>["b1849"]), Dict("… "b3115" => Gene("b3115", "tdcD", Dict("original_bigg_ids"=>["b3115"]), Dict("… "b2296" => Gene("b2296", "ackA", Dict("original_bigg_ids"=>["b2296"]), Dict("… "b1276" => Gene("b1276", "acnA", Dict("original_bigg_ids"=>["b1276"]), Dict("… "b0118" => Gene("b0118", "acnB", Dict("original_bigg_ids"=>["b0118"]), Dict("… "b0474" => Gene("b0474", "adk", Dict("original_bigg_ids"=>["b0474"]), Dict("s… "b0116" => Gene("b0116", "lpd", Dict("original_bigg_ids"=>["b0116"]), Dict("s… "b0727" => Gene("b0727", "sucB", Dict("original_bigg_ids"=>["b0727"]), Dict("… "b0726" => Gene("b0726", "sucA", Dict("original_bigg_ids"=>["b0726"]), Dict("… "b2587" => Gene("b2587", "kgtP", Dict("original_bigg_ids"=>["b2587"]), Dict("… "b0356" => Gene("b0356", "frmA", Dict("original_bigg_ids"=>["b0356"]), Dict("… "b1478" => Gene("b1478", "adhP", Dict("original_bigg_ids"=>["b1478"]), Dict("… "b3734" => Gene("b3734", "atpA", Dict("original_bigg_ids"=>["b3734"]), Dict("… "b3733" => Gene("b3733", "atpG", Dict("original_bigg_ids"=>["b3733"]), Dict("… "b3736" => Gene("b3736", "atpF", Dict("original_bigg_ids"=>["b3736"]), Dict("… "b3737" => Gene("b3737", "atpE", Dict("original_bigg_ids"=>["b3737"]), Dict("… ⋮ => ⋮
The Gene
type is a struct that can be used to store information about genes in a StandardModel
. Each Gene
is composed of the following fields:
fieldnames(Gene)
(:id, :name, :notes, :annotations)
Use <tab> to quickly explore the fields of a struct. For example, Gene.<tab> will list all the fields shown above.
The keys used in the ordered dictionaries in model.genes
are the ids returned using the generic accessor genes
. Gene
s have pretty printing, as demonstrated below for a random gene drawn from the model:
random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]
Gene.id: b2278 Gene.name: nuoL Gene.notes: original_bigg_ids: ["b2278"] Gene.annotations: sbo: ["SBO:0000243"] uniprot: ["P33607"] ecogene: ["EG12092"] ncbigene: ["945540"] ncbigi: ["16130213"] refseq_locus_tag: ["b2278"] refseq_name: ["nuoL"] asap: ["ABE-0007532"] refseq_synonym: ["ECK2272", "JW2273"]
The same idea holds for both metabolites (stored as Metabolite
s) and reactions (stored as Reaction
s). This is demonstrated below.
random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]
Metabolite.id: nadh_c Metabolite.name: Nicotinamide adenine dinucleotide - reduced Metabolite.formula: C21N7P2H27O14 Metabolite.charge: -2 Metabolite.compartment: c Metabolite.notes: original_bigg_ids: ["nadh_c"] Metabolite.annotations: sabiork: ["38"] kegg.compound: ["C00004"] sbo: ["SBO:0000247"] biocyc: ["META:NADH"] chebi: CHEBI:13395, ..., CHEBI:13396 metanetx.chemical: ["MNXM10"] inchi_key: BOPGDPNILDQYTO-NNYOX... hmdb: ["HMDB01487"] bigg.metabolite: ["nadh"] seed.compound: ["cpd00004"] reactome.compound: 192305, ..., 194697
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]
Reaction.id: CYTBD Reaction.name: Cytochrome oxidase bd (ubiquinol-8: 2 protons) Reaction.metabolites: 0.5 o2_c + 1.0 q8h2_c + 2.0 h_c → 1.0 h2o_c + 2.0 h_e + 1.0 q8_c Reaction.lb: 0.0 Reaction.ub: 1000.0 Reaction.grr: (b0733 && b0734) || (b0978 && b0979) Reaction.subsystem: Oxidative Phosphorylation Reaction.notes: original_bigg_ids: ["CYTBD"] Reaction.annotations: metanetx.reaction: ["MNXR97031"] sbo: ["SBO:0000185"] seed.reaction: rxn12494, ..., rxn08288 bigg.reaction: ["CYTBD"] Reaction.objective_coefficient: 0.0
StandardModel
can be used to build your own metabolic model or modify an existing one. One of the main use cases for StandardModel
is that it can be used to merge multiple models or parts of multiple models together. Since the internals are uniform inside each StandardModel
, attributes of other model types are squashed into the required format (using the generic accessors). This ensures that the internals of all StandardModel
s are the same - allowing easy systematic evaluation.
Combining models that use different namespaces requires care. For example, in some models the water exchange reaction is called EX_h2o_e
, while in others it is called R_EX_h2o_s
. This needs to manually addressed to prevent duplicates, e.g. reactions, from being added.
Checking the internals of StandardModel
s: annotation_index
Often when models are automatically reconstructed duplicate genes, reactions or metabolites end up in a model. COBREXA
exports annotation_index
to check for cases where the id of a struct may be different, but the annotations the same (possibly suggesting a duplication). annotation_index
builds a dictionary mapping annotation features to the ids of whatever struct you are inspecting. This makes it easy to find structs that share certain annotation features.
rxn_annotations = annotation_index(model.reactions)
Dict{String, Dict{String, Set{String}}} with 10 entries: "ec-code" => Dict("3.6.3.37"=>Set(["ATPM"]), "3.6.3.42"=>Set(["ATPM… "sabiork" => Dict("109"=>Set(["PGL"]), "762"=>Set(["GLUN"]), "155"=… "metanetx.reaction" => Dict("MNXR104869"=>Set(["TKT2"]), "MNXR99715"=>Set(["E… "rhea" => Dict("27626"=>Set(["TKT2"]), "10229"=>Set(["ACONTa"]),… "sbo" => Dict("SBO:0000627"=>Set(["EX_for_e", "EX_nh4_e", "EX_p… "seed.reaction" => Dict("rxn05297"=>Set(["GLUt2r"]), "rxn09717"=>Set(["PY… "kegg.reaction" => Dict("R00114"=>Set(["GLUSy"]), "R00199"=>Set(["PPS"]),… "biocyc" => Dict("META:TRANS-RXN-121B"=>Set(["FUMt2_2"]), "META:PE… "reactome.reaction" => Dict("R-TGU-71397"=>Set(["PDH"]), "R-XTR-70449"=>Set([… "bigg.reaction" => Dict("ACALD"=>Set(["ACALD"]), "PTAr"=>Set(["PTAr"]), "…
rxn_annotations["ec-code"]
Dict{String, Set{String}} with 141 entries: "3.6.3.37" => Set(["ATPM"]) "3.6.3.42" => Set(["ATPM"]) "3.6.3.38" => Set(["ATPM"]) "3.6.3.19" => Set(["ATPM"]) "2.3.3.1" => Set(["CS"]) "1.6.1.2" => Set(["NADTRHD"]) "3.6.3.35" => Set(["ATPM"]) "6.2.1.5" => Set(["SUCOAS"]) "6.3.5.4" => Set(["GLUN"]) "3.6.3.49" => Set(["ATPM"]) "3.6.3.51" => Set(["ATPM"]) "1.2.1.12" => Set(["GAPD"]) "3.6.3.32" => Set(["ATPM"]) "2.3.3.3" => Set(["CS"]) "2.7.4.3" => Set(["ADK1"]) "6.3.5.5" => Set(["GLUN"]) "3.5.1.2" => Set(["GLUN"]) "1.1.1.49" => Set(["G6PDH2r"]) "5.3.1.9" => Set(["PGI"]) ⋮ => ⋮
The annotation_index
function can also be used on Reaction
s and Gene
s in the same way.
Checking the internals of StandardModel
s: check_duplicate_reaction
Another useful function is check_duplicate_reaction
, which checks for reactions that have duplicate (or similar) reaction equations.
pgm_duplicate = Reaction()
pgm_duplicate.id = "pgm2" # Phosphoglycerate mutase
pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1)
pgm_duplicate
Reaction.id: pgm2 Reaction.name: --- Reaction.metabolites: 1.0 2pg_c ↔ 1.0 3pg_c Reaction.lb: -1000.0 Reaction.ub: 1000.0 Reaction.grr: --- Reaction.subsystem: --- Reaction.notes: --- Reaction.annotations: --- Reaction.objective_coefficient: 0.0
check_duplicate_reaction(pgm_duplicate, model.reactions; only_metabolites = false) # can also just check if only the metabolites are the same but different stoichiometry is used
"PGM"
Checking the internals of StandardModel
s: reaction_mass_balanced
Finally, reaction_mass_balanced
can be used to check if a reaction is mass balanced based on the formulas of the reaction equation.
rxn_dict = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1, "h2o_c" => 1)
reaction_mass_balanced(model, rxn_dict)
false
Now to determine which atoms are unbalanced, you can use reaction_atom_balance
reaction_atom_balance(model, rxn_dict)
Dict{String, Float64} with 4 entries: "C" => 0.0 "P" => 0.0 "H" => 2.0 "O" => 1.0
Note, since pgm_duplicate
is not in the model, we cannot use the other variants of this function because they find the reaction equation stored inside the model
.
This page was generated using Literate.jl.