{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Constraints-based Flux analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Mass Balance-constrained Analysis (FBA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ETFBA package is capable of performing constraints-based flux analysis, incorporating constraints on enzyme protein allocation and/or reaction thermodynamics. Let's start with the fundamental [flux balance analysis](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3108565/) (FBA). FBA aims to solve the following problem:\n", "
\n", " \n", "
\n", "Here, ${\\bf{S}}$ is the stoichiometric matrix, and ${\\bf{v}}$ denotes the ${\\bf{total}}$ ${\\bf{fluxes}}$, i.e., the net flux through a reversible reaction is split into forward and backward (reverse) fluxes, both maintaining a non-negative value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a demonstration, we utilize the [translated](https://github.com/Chaowu88/etfba/blob/main/models/e_coli/etfba_iML1515.bin) *E. coli* model iML1515 to conduct basic FBA." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "model iML1515 with 2712 reactions and 1877 metabolites\n" ] } ], "source": [ "from etfba import Model\n", "\n", "model_file = '../../models/e_coli/etfba_iML1515.bin'\n", "model = Model.load(model_file)\n", "print(model)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "objective = {'BIOMASS_Ec_iML1515_core_75p37M': 1}\n", "flux_bound = (0, 1000)\n", "spec_flux_bound = {'ATPM': (6.86, 1000)}\n", "preset_flux = {'EX_glc__D_e_b': 10, 'FHL': 0}\n", "\n", "res = model.optimize(\n", " 'fba', \n", " objective=objective,\n", " flux_bound=flux_bound,\n", " spec_flux_bound=spec_flux_bound,\n", " preset_flux=preset_flux\n", ").solve(solver='gurobi')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objective of our flux balance analysis can be set through the \"objective\" argument, which accepts a dictionary defining the expression of the objective function which is defined as the linear combination of fluxes. For example, {'v1': 2, 'v2': -1} defines the expression \"2\\*v1 - v2\". \"flux_bound\" is used to set the bounds for all fluxes, while it is prioritized by \"spec_flux_bound,\" which can be used to set the bounds of specific fluxes. \"preset_flux\" is used to set fixed values of fluxes such as measured exchange rates or silenced reactions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note:

1. Fluxes are in the unit of mmol gCDW$^{-1}$ $h^{-1}$;\n", "

2. Since total fluxes are used as decision variables in the problem, a suffix of '_f' or '_b' is required to indicate forward or backward flux for reversible reactions in the above setting of arguments.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objecitve at solution can be accessed by the `opt_objective` attributute of the optimization result:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal growth rate: 0.877\n" ] } ], "source": [ "print('Optimal growth rate:', round(res.opt_objective, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The net reaction fluxes derived from the solution can be accessed by the `opt_fluxes` attribute, which supports a `save` method for direct result saving." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 10 reactions with the highest fluxes:\n", " HPYRRx 1000.000\n", " FE2tex 1000.000\n", " CRNt8pp 1000.000\n", " CRNDt2rpp 1000.000\n", " GLBRAN2 1000.000\n", " GLUt4pp 1000.000\n", " MN2tpp 1000.000\n", " GLDBRAN2 1000.000\n", " FUMt1pp 1000.000\n", " R1PK 999.999\n" ] } ], "source": [ "print('Top 10 reactions with the highest fluxes:')\n", "for rxnid, flux in sorted(res.opt_fluxes.items(), \n", " key=lambda item: item[1], reverse=True)[:10]:\n", " print(f'{rxnid:>10} {flux:6.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In naive FBA, some reactions may receive unseasonably high fluxes (as shown in the reactions above with the highest fluxes). This can be impractical in terms of resource economy during cell growth. [parsimonious FBA](https://pubmed.ncbi.nlm.nih.gov/20664636/) can be used to address the problem by minimizing the overall fluxes while maintaining the optimality of the objective to some extent. Parsimonious FBA can be formulated as:\n", "
\n", " \n", "
\n", "Here, a small constant $slack$ is provided to relax the objective constraint in case it encounts difficiulties in finding feasible solutions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the \"parsimonious\" argument to True will prune the obtained fluxes." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: estimating parsimonious fluxes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimal growth rate: 0.877\n" ] } ], "source": [ "res = model.optimize(\n", " 'fba', \n", " objective=objective,\n", " flux_bound=flux_bound,\n", " spec_flux_bound=spec_flux_bound,\n", " preset_flux=preset_flux,\n", " slack=0,\n", " parsimonious=True\n", ").solve(solver='gurobi')\n", "print('Optimal growth rate:', round(res.opt_objective, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In estimating the parsimonious fluxes, basic FBA will be performed first, followed by minimizing overall fluxes to obtain parsimonious fluxes. Now we can have a more reasonable intracellular fluxes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 10 reactions with the highest fluxes:\n", " ATPS4rpp 70.432\n", " EX_h2o_e 47.162\n", "CYTBO3_4pp 44.256\n", " NADH16pp 37.997\n", " EX_co2_e 24.003\n", " O2tpp 22.132\n", " O2tex 22.132\n", " GAPD 17.105\n", " ENO 15.599\n", " GLCptspp 10.000\n" ] } ], "source": [ "print('Top 10 reactions with the highest fluxes:')\n", "for rxnid, flux in sorted(res.opt_fluxes.items(), \n", " key=lambda item: item[1], reverse=True)[:10]:\n", " print(f'{rxnid:>10} {flux:6.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To check the balance of a specific metabolite, one can use the `statement` method, which summarizes the overall production and consumption reaction fluxes to and from that metabolite." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "productions\n", " ICDHyr 6.913\n", " GND 2.355\n", " G6PDH2r 2.355\n", " MTHFD 0.867\n", "\n", "consumptions\n", " SHK3Dr 0.334\n", " G5SD 0.194\n", " KARA1 -0.767\n", " KARA2 0.255\n", " GLUTRR 0.003\n", " AGPR -0.259\n", " DXPRIi 0.002\n", " UAPGR 0.024\n", " DHDPRy 0.325\n", " TRDR 0.217\n", " G3PD2 -0.122\n", " P5CR 0.194\n", " ASAD -0.938\n", " SULR 0.217\n", " HSDy -0.612\n", " GLUDy -7.499\n", " DHFR 0.024\n", " 3OAR140 0.068\n" ] } ], "source": [ "for key, rxn_flux in res.statement('nadph_c').items():\n", " print(key)\n", " for rxnid, flux in rxn_flux.items():\n", " if abs(flux) > 0.001:\n", " print(f'{rxnid:>10} {flux:6.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note:

1. The \"statement\" method summarizes fluxes of the reactions in which a metabolite is involved. To obtain a metabolite-based mass balance, the stoichiometric coefficient in corresponding reactions needs to be considered;\n", "

2. While fluxes are aggregated based on the production or consumption of a metabolite, the sign of the flux value depends on the directionality of the reaction as defined in the model.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constrained by Both Enzyme Protein Allocation and Mass Balance (EFBA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Metabolic reactions rely on enzymes to proceed. Higher fluxes usually require more enzyme amounts to catalyze the chemical process. Therefore, it is essential to incorporate enzyme protein constraints to optimize flux distributions under resource-limited conditions, which are common in realistic scenarios. EFBA can be formulated as:\n", "
\n", " \n", "
\n", "Here, $n_{irr}$ and $n_{rev}$ are the number of irreversible and reversible reactions, respectively. The superscipts \"+\" and \"-\" denote the forward and reverse reactions, respectively. $\\eta$ represents the enzyme efficiency affected by substrate saturation and thermodynamic feasibility. The product of $\\eta$ and $k_{cat}$ forms the apparent enzyme catalytic constant, $k_{app}$. $Q$ denotes the total available protein resource fraction for metabolic enzymes in biomass (g gCDW$^{-1}$)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's continue with the *E. coli* model with the enzyme protein allocation constraint added:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: estimating parsimonious fluxes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimal growth rate: 0.866\n" ] } ], "source": [ "import pandas as pd\n", "\n", "kcat_file = '../../models/e_coli/kcats.xlsx'\n", "mw_file = '../../models/e_coli/mws.xlsx'\n", "kcats = pd.read_excel(kcat_file, header=None, index_col=0).squeeze()\n", "mws = pd.read_excel(mw_file, header=None, index_col=0).squeeze()\n", "\n", "# set reactions with available kcats and MWs\n", "eff_rxns = []\n", "for rxnid, rxn in model.reactions.items():\n", " if rxn.rev:\n", " if rxnid+'_f' in kcats.index and rxnid+'_b' in kcats.index and rxnid in mws.index:\n", " rxn.forward_kcat = kcats[rxnid+'_f']\n", " rxn.backward_kcat = kcats[rxnid+'_b']\n", " rxn.molecular_weight = mws[rxnid]\n", " eff_rxns.append(rxnid)\n", " else:\n", " if rxnid in kcats.index and rxnid in mws.index:\n", " rxn.forward_kcat = kcats[rxnid]\n", " eff_rxns.append(rxnid)\n", "enz_ub = 0.15\n", "\n", "preset_flux = {'FHL': 0}\n", "res = model.optimize(\n", " 'efba', \n", " objective=objective, \n", " flux_bound=flux_bound, \n", " spec_flux_bound=spec_flux_bound,\n", " preset_flux=preset_flux,\n", " inc_enz_cons=eff_rxns,\n", " enz_prot_lb=enz_ub,\n", " parsimonious=True # used to obtain parsimonious fluxes\n", ").solve(solver='gurobi')\n", "print('Optimal growth rate:', round(res.opt_objective, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"inc_enz_cons\" argument specifies reactions subject to the enzyme protein constraint." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the substrate uptake is not set during optimization; the model will obtain the metabolic capability completely depending on the protein resource of catalytic enzymes." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Glucose uptake rate: 12.887\n" ] } ], "source": [ "print('Glucose uptake rate:', -round(res.opt_fluxes['EX_glc__D_e'], 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The optimal protein fractions allocated for each enzyme subjected to the constraint can be accessed by the `opt_enzyme_costs` attribute of the optimization result, which also has a `save` method for direct saving." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 10 enyzme with the most abundant protein allocations:\n", " ENO 0.01082\n", " GAPD 0.00746\n", " FBA 0.00623\n", " PGK 0.00596\n", " KARA2 0.00503\n", " KARA1 0.00494\n", " METS 0.00479\n", " GHMT2r 0.00361\n", " PGM 0.00357\n", " GND 0.0027\n" ] } ], "source": [ "print('Top 10 reactions with the most abundant enzyme protein allocations:')\n", "for rxnid, pro in sorted(res.opt_enzyme_costs.items(), \n", " key=lambda item: item[1], reverse=True)[:10]:\n", " print(f'{rxnid:>10} {round(pro, 5):>10}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constrained by Both Thermodynamics and Mass Balance (TFBA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To enforce the second law of thermodynamics, constraints on the reaction Gibbs energy change can be added to the model, ensuring reactions proceed in the thermodynamically favorable direction. The TFBA problem can be formulated as:\n", "
\n", " \n", "
\n", "Here, metabolite concentrations ${\\bf{c}}$ (log transformed) are introduced as new varibles, as well as the binary ${\\bf{x}}$ (1 indicates nonzero flux, and 0 otherwise), making the problem a mixed-integer linear programming (MILP). $K$ is big enough constant ensuring the thermodynamic constraints are applied only to nonzero fluxes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We continue to use the *E. coli* model for demonstration." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: estimating parsimonious fluxes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimal growth rate: 0.877\n" ] } ], "source": [ "dgpm_file = '../../models/e_coli/dgpms.xlsx'\n", "dgpms = pd.read_excel(dgpm_file, header=None, index_col=0).squeeze()\n", "\n", "preset_flux = {'EX_glc__D_e_b': 10, 'FHL': 0}\n", "conc_bound = (0.0001, 100)\n", "spec_conc_bound = {\n", " 'o2_c': (0.0001, 0.0082), # should be less than in the media\n", " 'co2_c': (0.1, 100) # should be greater than in the media\n", "}\n", "preset_conc = {\n", " 'glc__D_p': 20,\n", " 'pi_p': 56,\n", " 'so4_p': 3,\n", " 'nh4_p': 19,\n", " 'na1_p': 160,\n", " 'k_p': 22,\n", " 'fe2_p': 62\n", "}\n", "\n", "# set reactions with available ΔG'm\n", "ex_rxns = []\n", "for rxnid, rxn in model.reactions.items():\n", " if rxnid in dgpms.index:\n", " rxn.standard_gibbs_energy = dgpms[rxnid]\n", " else:\n", " ex_rxns.append(rxnid)\n", "\n", "res = model.optimize(\n", " 'tfba', \n", " objective=objective,\n", " flux_bound=flux_bound,\n", " conc_bound=conc_bound,\n", " spec_flux_bound=spec_flux_bound,\n", " spec_conc_bound=spec_conc_bound,\n", " preset_flux=preset_flux,\n", " preset_conc=preset_conc,\n", " ex_thermo_cons=ex_rxns,\n", " parsimonious=True # used to obtain parsimonious fluxes\n", ").solve(solver='gurobi')\n", "print('Optimal growth rate:', round(res.opt_objective, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obtaining the standard reaction Gibbs energy is a prerequisite to perform TFBA. The batch estimation of $\\Delta G'^0$ can be achieved with [equilibrator-api](https://equilibrator.readthedocs.io/en/latest/index.html) which applies a component contribution method with improved accucary compared to canonical methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note:

Metabolite concentrations are provided in the unit of mM, thus the standard reaction Gibbs energ is actually $\\Delta G'^m$ which can be transformed from $\\Delta G'^0$ (see the calculation of $\\Delta G'^m$ in the Building a Model section).\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimated reaction Gibbs energy, reaction directionality, and metabolite concentrations at the solution can be accessed with `opt_gibbs_energy`, `opt_directions`, and `opt_concentrations` attributes of the optimization result, which also have the `save` method." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " SHK3Dr -46.227 forward\n", " DHORTS 0.001 reverse\n", " OMPDC -25.614 forward\n", " G5SD -45.45 forward\n", " CS -80.096 forward\n", " ICDHyr -0.001 forward\n", " PPA -47.042 forward\n", " APRAUR -65.126 forward\n", " TRPAS2 0.001 reverse\n", " DB4PS -157.053 forward\n", " ALAR -0.041 forward\n", " RBFK -21.895 forward\n", " ALATA_L 24.236 reverse\n", " PPM -13.807 reverse\n", " ASPTA 11.768 reverse\n", " RBFSb -93.656 forward\n" ] } ], "source": [ "count = 0\n", "for rxnid, dgp in res.opt_gibbs_energy.items():\n", " if count <= 15:\n", " rxn_dir = res.opt_directions[rxnid]\n", " if rxn_dir != 'zero flux':\n", " print(f'{rxnid:>10} {round(dgp,3):>10} {rxn_dir:>10}')\n", " count += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constrained by Enzyme Protein Allocation, Thermodynamics and Mass Balance (ETFBA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Integrating all the above constraints of enzyme protein allocation, thermodynamics, and mass balance yields ETFBA, which allows a more comprehensive understanding of cell metabolism at the genome scale. As an optimization problem, it is defined as:\n", "
\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are still using the *E. coli* model, but this time the subtrate uptake rate is not set." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: estimating parsimonious fluxes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimal growth rate: 0.864\n" ] } ], "source": [ "preset_flux = {'FHL': 0}\n", "res = model.optimize(\n", " 'etfba', \n", " objective=objective,\n", " flux_bound=flux_bound,\n", " conc_bound=conc_bound,\n", " spec_flux_bound=spec_flux_bound,\n", " spec_conc_bound=spec_conc_bound,\n", " preset_flux=preset_flux,\n", " preset_conc=preset_conc,\n", " ex_thermo_cons=ex_rxns,\n", " inc_enz_cons=eff_rxns,\n", " enz_prot_lb=enz_ub,\n", " parsimonious=True # used to obtain parsimonious fluxes\n", ").solve(solver='gurobi')\n", "print('Optimal growth rate:', round(res.opt_objective, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result we obtained will have all the attributes mentioned above to access the fluxes, metabolite concentrations, enzyme protein costs, reaction Gibbs energies, reaction directionality, etc., as well as the `statement` method to inspect the balance of some metabolite at the solution." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top 10 reactions with the highest fluxes:\n", " ATPS4rpp 59.383\n", " EX_h2o_e 42.636\n", "CYTBO3_4pp 35.362\n", " NADH16pp 35.075\n", " GAPD 21.880\n", " ENO 20.444\n", " EX_co2_e 19.832\n", " EX_h_e 18.897\n", " O2tpp 17.990\n", " O2tex 17.990\n" ] } ], "source": [ "print('Top 10 reactions with the highest fluxes:')\n", "for rxnid, flux in sorted(res.opt_fluxes.items(), \n", " key=lambda item: item[1], reverse=True)[:10]:\n", " print(f'{rxnid:>10} {flux:6.3f}')" ] } ], "metadata": { "kernelspec": { "display_name": "etfba-py38", "language": "python", "name": "etfba-py38" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.19" } }, "nbformat": 4, "nbformat_minor": 5 }