Modélisation Formelle de Réseaux de Régulation Biologique 2023

Posters presented at the School

Most permissive semantics as a tool to refine Boolean models

Boolean models are increasingly used to study the dynamical behavior of biological systems [1]. In a Boolean model, the activity of each component is binary and is regulated by a logical function that represents the conditions under which the component is activated. These models are built from published data describing the pairwise interactions between components. In the presence of complex regulations, Boolean models can be extended to multivalued models to recover some dynamical properties of the biological system being modeled (e.g., the nature of the model's attractors and/or their reachability). However, identifying the nodes that need to be multivalued and setting their parameters can be difficult, because it requires more precise knowledge of the regulation activities. Moreover, to simulate the dynamics of a Boolean model, we need to choose an updating semantics, i.e., a rule describing how the states of the model evolve. The synchronous semantics, in which all components are updated simultaneously, is attractive because of its simplicity but is far from reality. Indeed, biological processes within a system often operate on different time scales, ranging from fractions of a second to hours [2]. Hence, as an alternative to the synchronous semantics, we consider the asynchronous semantics, which is non-deterministic and allows only one component to be updated at a time.
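To make the difference between the two updating semantics concrete, here is a minimal sketch in Python. The two-component network, in which each component inhibits the other, and the names FUNCTIONS, synchronous_successor and asynchronous_successors are assumptions introduced for illustration; they do not come from the poster.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]

# Hypothetical two-component network: each component inhibits the other.
FUNCTIONS: List[Callable[[State], int]] = [
    lambda x: 1 - x[1],  # component 0 is active when component 1 is inactive
    lambda x: 1 - x[0],  # component 1 is active when component 0 is inactive
]


def synchronous_successor(x: State) -> State:
    """Update every component at once: each state has exactly one successor."""
    return tuple(f(x) for f in FUNCTIONS)


def asynchronous_successors(x: State) -> List[State]:
    """Update one component at a time: one successor per unstable component."""
    successors = []
    for i, f in enumerate(FUNCTIONS):
        value = f(x)
        if value != x[i]:
            successors.append(x[:i] + (value,) + x[i + 1:])
    return successors


print(synchronous_successor((0, 0)))
print(asynchronous_successors((0, 0)))
```

In this toy network, the synchronous update alternates between (0, 0) and (1, 1), whereas the asynchronous update leaves (0, 0) towards either (1, 0) or (0, 1), both of which are stable.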
The most-permissive semantics, recently proposed in [3], is also non-deterministic. It adds intermediate levels of activity to each node, which introduces ambiguity into the model: a node in an intermediate level can be seen as being in state 0 or in state 1 by other nodes. This adds a degree of uncertainty to the dynamics, which allows different behaviors to emerge. For a given Boolean model, the most-permissive semantics generates more trajectories than the asynchronous semantics and captures relevant dynamical behaviors that the asynchronous semantics does not represent. Given these properties, it can serve as an alternative to multivalued models when detailed knowledge about the regulation activities in the biological system is lacking. Nevertheless, some of the resulting dynamical behaviors have no biological relevance (trajectories between states that cannot occur in the biological system) and can impede the analysis of the model's dynamics. In this work, we propose to use the most-permissive semantics as a tool to guide the parametrization of a Boolean model. More precisely, we present a methodology that compares the asynchronous and most-permissive semantics to identify the nodes that should be multivalued, and thus obtain a multilevel model that captures the required dynamical property under the asynchronous semantics.
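The comparison between the two semantics can be sketched on a small example. The code below is a simplified reading of the most-permissive semantics, not the authors' implementation: each node may take an intermediate "inc" or "dec" level, and a node in such a level may be read as 0 or 1 by the others. The three-node incoherent feed-forward loop and all function names are assumptions chosen for illustration.

```python
from itertools import product

INC, DEC = "inc", "dec"  # intermediate levels: increasing / decreasing

# Hypothetical incoherent feed-forward loop, chosen for illustration.
FUNCTIONS = [
    lambda x: 1,                       # x1: constant activation
    lambda x: x[0],                    # x2: activated by x1
    lambda x: int(x[1] and not x[0]),  # x3: needs x2 active and x1 inactive
]


def explore(start, successors):
    """Return every state reachable from start under a successor function."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def async_successors(state):
    """Asynchronous semantics: flip one unstable component at a time."""
    for i, f in enumerate(FUNCTIONS):
        value = f(state)
        if value != state[i]:
            yield state[:i] + (value,) + state[i + 1:]


def mp_successors(state):
    """Simplified most-permissive step over levels 0, inc, dec, 1."""
    # A changing component may be read as 0 or as 1 by the other components.
    choices = [(0, 1) if v in (INC, DEC) else (v,) for v in state]
    readings = list(product(*choices))
    for i, f in enumerate(FUNCTIONS):
        possible = {f(r) for r in readings}
        level = state[i]
        targets = []
        if level in (0, DEC) and 1 in possible:
            targets.append(INC)   # start (or resume) increasing
        if level in (1, INC) and 0 in possible:
            targets.append(DEC)   # start (or resume) decreasing
        if level == INC:
            targets.append(1)     # an increase can always complete
        if level == DEC:
            targets.append(0)     # a decrease can always complete
        for t in targets:
            yield state[:i] + (t,) + state[i + 1:]


def binary(states):
    """Keep only the states without intermediate levels."""
    return {s for s in states if all(v in (0, 1) for v in s)}


start = (0, 0, 0)
async_states = explore(start, async_successors)
mp_states = binary(explore(start, mp_successors))
print(sorted(async_states))
print(sorted(mp_states - async_states))
```

In this toy network, x3 is never activated under the asynchronous semantics, whereas the most-permissive exploration reaches Boolean states in which x3 is active. The nodes responsible for such differences are natural candidates for becoming multivalued.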

[1] Schwab JD, Kühlwein SD, Ikonomi N, Kühl M, Kestler HA. Concepts in Boolean network modeling: What do they all mean? Comput Struct Biotechnol J. 2020 Mar 10;18:571-582. doi: 10.1016/j.csbj.2020.03.001. PMID: 32257043; PMCID: PMC7096748.
[2] Chaves M, Albert R, Sontag ED. Robustness and fragility of Boolean models for genetic regulatory networks. J Theor Biol. 2005 Aug 7;235(3):431-49. doi: 10.1016/j.jtbi.2005.01.023. Epub 2005 Mar 19. PMID: 15882705.
[3] Paulevé L, Kolčák J, Chatain T, Haar S. Reconciling qualitative, abstract, and scalable modeling of biological networks. Nat Commun. 2020 Aug 26;11(1):4256. doi: 10.1038/s41467-020-18112-5. Erratum in: Nat Commun. 2020 Sep 24;11(1):4900. PMID: 32848126; PMCID: PMC7450094.

Logic and linear programming for seed identification in metabolic networks

A genome-scale metabolic network (GSMN) describes the metabolic reactions of a species. It can be built from genomic information based on functional annotations of genes [1]. By combining environmental response data and mathematical modeling, GSMNs can be used to predict the behavior of an organism in a particular environment. Widely used models rely on solving linear programming problems, as in Flux Balance Analysis (FBA), but discrete dynamical models have also been shown to provide pertinent predictions. The former are tied to a steady-state assumption, whereas the latter consider transient dynamics from an initial state, using the notions of network expansion and scope [2].
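The notion of scope can be illustrated with a short sketch of network expansion: starting from a set of seed compounds, reactions whose reactants are all available fire and add their products, until nothing new can be produced. The toy network, compound names and the scope function below are hypothetical and serve only as an illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)

# Hypothetical toy network; compound and reaction names are illustrative.
REACTIONS: Dict[str, Reaction] = {
    "r1": (frozenset({"A"}), frozenset({"B"})),
    "r2": (frozenset({"B", "C"}), frozenset({"D"})),
    "r3": (frozenset({"D"}), frozenset({"biomass"})),
}


def scope(seeds: Iterable[str]) -> Set[str]:
    """Return all compounds producible from the seeds by network expansion."""
    produced = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in REACTIONS.values():
            # A reaction fires once all of its reactants are available.
            if reactants <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


print(sorted(scope({"A"})))
print(sorted(scope({"A", "C"})))
```

With the seed A alone, the expansion stops at B because r2 also needs C; adding C to the seeds makes the biomass compound reachable.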
We are interested here in the reverse problem: the identification of the environmental nutrients, which we refer to as seeds, that are necessary to produce essential metabolites. These precursor compounds may, for example, be needed in the environment to ensure the growth of a bacterial species, represented by a biomass reaction. This problem belongs to the field of reverse ecology, presented in [3] as an important analysis for understanding the link between a system and its environment [4]. Applications include the design of culture media for uncultivated species through the prediction of optimal environmental compositions.
The identification of seeds has been addressed by various methods over the years, following either the steady-state or the transient-dynamics assumption. In this work, we aim to unify both perspectives and provide a new hybrid resolution method for identifying the nutrients necessary both to activate the reaction network and to maintain steady growth of the cells. We use Answer Set Programming (ASP), a logic programming paradigm, to define the minimal or subset-minimal sets of seeds that can be selected starting from the initial state. Since FBA is a gold standard for checking the activation of the biomass reaction, we use it either as a control or directly in our seed inference. We compared two approaches: using solely the discrete, scope-based approach, or combining it with linear programming (LP) through a constraint propagator (LP-ASP). In the first approach, the sets of seeds are then tested with FBA to check that the biomass reaction is activated. In the hybrid approach, FBA is used directly in the second step of seed identification to eliminate the solutions that do not activate the biomass reaction.
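The discrete part of the seed search can be sketched as follows. This is not the ASP encoding used in the work: brute-force enumeration stands in for the ASP solver, and the FBA check is omitted entirely. The toy network, the list of candidate seeds and the function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (reactants, products).
REACTIONS = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B", "C"}, {"D"}),
    "r3": ({"E"}, {"D"}),
    "r4": ({"D"}, {"biomass"}),
}
CANDIDATE_SEEDS = ["A", "C", "E"]  # assumed importable nutrients
TARGET = "biomass"


def scope(seeds):
    """Compounds producible from the seeds by iterative network expansion."""
    produced = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in REACTIONS.values():
            if reactants <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


def subset_minimal_seed_sets():
    """Enumerate seed sets by increasing size, keeping subset-minimal ones."""
    solutions = []
    for size in range(1, len(CANDIDATE_SEEDS) + 1):
        for combo in combinations(CANDIDATE_SEEDS, size):
            candidate = frozenset(combo)
            # Skip supersets of an already found (smaller) solution.
            if any(found <= candidate for found in solutions):
                continue
            if TARGET in scope(candidate):
                solutions.append(candidate)
    return solutions


print(subset_minimal_seed_sets())
```

Because the scope can only grow when seeds are added, checking candidates by increasing size and discarding supersets of earlier solutions is enough to obtain the subset-minimal sets. Exhaustive enumeration is exponential in the number of candidate seeds, which is one reason a dedicated solver is preferable on genome-scale networks.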
Only a few solutions of the first method turned out to satisfy the FBA constraint. With our hybrid resolution for the detection of seeds, the FBA constraint is guaranteed. Moreover, we demonstrated the scalability of our hybrid implementations on a set of 100 GSMNs from the BiGG database, comprising metabolic networks of up to thousands of reactions.
Applications of this work are numerous, including facilitating the search for seeds from metabolic networks obtained from microbiotas in which the high proportion of non-cultivated species impedes the understanding of species' roles and interactions.

[1] C. Francke, R. J. Siezen, and B. Teusink, Reconstructing the metabolic network of a bacterium from its genome, Trends in Microbiology, vol. 13, no. 11. pp. 550–558, Nov. 2005. doi: 10.1016/j.tim.2005.09.001
[2] T. Handorf, O. Ebenhöh, R. Heinrich, Expanding Metabolic Networks: Scopes of Compounds, Robustness, and Evolution, Journal of Molecular Evolution, vol. 61, no. 4. pp. 498–512, Jan. 2005. doi: 10.1007/s00239-005-0027-1
[3] Levy, R., Borenstein, E.: Reverse Ecology: From Systems to Environments and Back. pp. 329–345. Springer, New York, NY (2012). 4614-3567-9_15
[4] i, Y.F., Costello, J.C., Holloway, A.K., Hahn, M.W.: Reverse ecology and the power of population genomics. Evolution; international journal of organic evolution 62(12), 2984–94 (dec 2008)., http://www.

Inferring Boolean networks from single-cell human embryo datasets: proof of concept with trophectoderm maturation

A better understanding of human embryonic development and cell fate decision is needed to improve the success rate of assisted reproductive technologies, such as in vitro fertilization (IVF). Fortunately, with novel transcriptomics technologies, vast amounts of data can now be generated, allowing the characterization of individual human embryos at a single-cell level. However, despite the potential of IVF, the current embryo culture systems and assessment methods result in a success rate of only 25%, causing emotional, social, and medical distress for both couples and the infertility medical team. Hence, novel approaches are needed to address this issue.
One such approach is the computational modeling of the human preimplantation embryo, which allows prediction of how embryos respond to specific system perturbations, such as changes in the composition of the culture media. This study aims to develop a computational model to discriminate between different developmental stages during trophectoderm (TE) maturation using scRNAseq data. The proposed method involves selecting pseudo-perturbations specific to each developmental stage, which makes it possible to learn Boolean network models. These models are inferred from the pseudo-perturbations and prior regulatory networks, and they optimally fit the scRNAseq data for each developmental stage.
Here, we introduce a general framework for inferring Boolean networks from scRNAseq data. Our method allows us to identify a family of Boolean networks specific to medium and late TE developmental stages, revealing opposite regulation pathways and supporting biological hypotheses in this domain. By providing a more comprehensive understanding of human embryonic development, this research has the potential to improve the success rate of assisted reproductive technologies, such as IVF.

Laetitia GIBART
Regulation inference with symbolic AI in the René Thomas modelling framework

Modelling a biological system aims at understanding the underlying chains of causality that lead the system to behave as observed. Biological systems are called complex because these underlying causalities are difficult to extract from global observations. Systems biology can thus be seen as the study of the interactions between the components of biological systems, and of the consequences of these interactions on the functions and behaviours of these systems. To complicate the picture further, observations are often made under experimental conditions in which the cooperation between entities is not completely known.
One of the main challenges in developing such a model is the formalisation of the associated regulation graph. In some cases, biologists observe global properties without knowing the cooperation between biological components that leads to these properties. In such cases, modellers designing the regulation graph need to test several regulation hypotheses.
Here we propose an automated method to infer regulations, also called multiplexes in the René Thomas framework, while preserving the dynamical properties required to model the biological system.

Detection of evolving communities in node-weighted graphs.

Thanks to molecular biology tools, it is now possible to measure, at the scale of an entire genome, the changes that occur at the cellular level under different experimental conditions. Depending on when these measurements are made, the results may vary, reflecting the different roles that each gene can play at different stages of the observed biological process.
Classical methods for analysing these data define the importance of the genes studied on the basis of their individual variation, without taking into account the temporal aspect of the data or the combined roles of the genes.
By representing biological data as graphs, we aim to develop methods based on Artificial Intelligence to detect communities of genes of interest and to follow their evolution in a given biological context.

Gustavo Magaña López
scBoolSeq: scRNA-Seq Data Generation and Binarization from Boolean Dynamics.

Boolean networks are used to model the dynamics of cell fate and differentiation processes by describing the qualitative (binary) evolution of the activation states of the components under consideration (generally genes and transcription factors). For data-driven model construction, it is essential to be able to link these binary qualitative states to quantitative measurements of gene expression within cells (such as scRNA-Seq data).
On the one hand, the binarization of scRNA-Seq data is a necessary step for inferring and validating Boolean models. On the other hand, generating synthetic scRNA-Seq data from reference Boolean models makes it possible to evaluate the performance of different inference methods. However, linking the statistical properties of scRNA-Seq data, such as dropout events, to Boolean activation states remains a challenge. We present scBoolSeq, a method for the bidirectional linking of scRNA-Seq data and activation states from Boolean models. From a reference scRNA-Seq dataset, scBoolSeq computes a set of statistical criteria to classify the empirical distributions of pseudocounts into three categories: Unimodal, Bimodal, and Zero-Inflated (over-representation of zeros). For each gene in the reference dataset, scBoolSeq also fits a probabilistic model to simulate the dropout phenomenon that characterizes scRNA-Seq data. From these distributions, scBoolSeq can discretize experimental scRNA-Seq data for the inference and validation of Boolean models, as well as generate synthetic datasets from Boolean trajectories. This is done by biased sampling of the parametric distributions and simulation of dropout events. Our method reproduces the statistics of experimental data, such as the mean-variance and mean-zero-rate relationships, as well as the correlations between higher-order moments. The synthetic scRNA-Seq data produced by scBoolSeq can, for example, be used as input for trajectory reconstruction methods in order to visualize these evolutions.
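The two directions of this link can be sketched in a few lines of Python. This is not the scBoolSeq API: the per-gene parameters, the Gaussian sampling, the fixed dropout rate and the threshold-with-margin rule are all simplifying assumptions introduced for illustration, and every name below is hypothetical.

```python
import random

# Hypothetical per-gene parameters; in practice they would be estimated from a
# reference dataset. None of these names come from scBoolSeq itself.
GENE_PARAMS = {
    "geneA": {"mean_off": 1.0, "mean_on": 6.0, "sd": 0.5, "dropout_rate": 0.2},
}


def sample_expression(gene, active, rng):
    """Boolean state -> synthetic expression value, with simulated dropout."""
    p = GENE_PARAMS[gene]
    mean = p["mean_on"] if active else p["mean_off"]
    # Sampling is biased towards the mode that matches the Boolean state.
    value = max(0.0, rng.gauss(mean, p["sd"]))
    # Dropout: the measurement is lost and a zero is observed instead.
    if rng.random() < p["dropout_rate"]:
        return 0.0
    return value


def binarize(gene, value):
    """Expression value -> 1, 0, or None when the value is too ambiguous."""
    p = GENE_PARAMS[gene]
    threshold = (p["mean_off"] + p["mean_on"]) / 2
    margin = p["sd"]
    if value > threshold + margin:
        return 1
    if value < threshold - margin:
        return 0
    return None


rng = random.Random(0)
trajectory = [False, False, True, True]  # Boolean states of geneA over time
synthetic = [sample_expression("geneA", s, rng) for s in trajectory]
print(synthetic)
print([binarize("geneA", v) for v in synthetic])
```

In this sketch, a zero produced by dropout is binarized as inactive even when the underlying Boolean state was active, which illustrates why relating dropout events to activation states is difficult.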