Methods in Molecular Biology 1883
Guido Sanguinetti, Vân Anh Huynh-Thu (Editors)
Gene Regulatory Networks
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Edited by
Guido Sanguinetti
School of Informatics, University of Edinburgh, Edinburgh, UK
This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of
Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Contents
Preface
Contributors
Index
Contributors
PATRICK E. MEYER • Bioinformatics and Systems Biology (BioSys) Unit, Université de Liège,
Liège, Belgium
TOM MICHOEL • Division of Genetics and Genomics, The Roslin Institute, The University
of Edinburgh, Midlothian, Scotland, UK; Computational Biology Unit, Department of
Informatics, University of Bergen, Bergen, Norway
SACH MUKHERJEE • German Center for Neurodegenerative Diseases (DZNE), Bonn,
Germany
CHRISTOPHER A. PENFOLD • Wellcome/CRUK Gurdon Institute, University of Cambridge,
Cambridge, UK
NGOC C. PHAM • Bioinformatics and Systems Biology (BioSys) Unit, Université de Liège,
Liège, Belgium
SAMANTHA RICCADONNA • Fondazione Edmund Mach, San Michele all’Adige, Italy
GUILLEM RIGAILL • Institute of Plant Sciences Paris-Saclay, UMR 9213/UMR1403,
CNRS, INRA, Université Paris-Sud, Université d’Evry, Université Paris-Diderot,
Sorbonne Paris-Cité, Paris, France; Laboratoire de Mathématiques et Modélisation d’Evry
(LaMME), Université d’Evry, Val d’Essonne, UMR CNRS 8071, ENSIIE, USC INRA,
Paris, France
SUSHMITA ROY • Wisconsin Institute for Discovery, University of Wisconsin-Madison,
Madison, WI, USA; Department of Biostatistics and Medical Informatics, University
of Wisconsin-Madison, Madison, WI, USA
WOUTER SAELENS • Data Mining and Modelling for Biomedicine, VIB Center for
Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer
Science and Statistics, Ghent University, Ghent, Belgium
YVAN SAEYS • Data Mining and Modelling for Biomedicine, VIB Center for Inflammation
Research, Ghent, Belgium; Department of Applied Mathematics, Computer Science and
Statistics, Ghent University, Ghent, Belgium
PHILIPPE SALEMBIER • Universitat Politecnica de Catalunya, Barcelona, Spain
GUIDO SANGUINETTI • School of Informatics, University of Edinburgh, Edinburgh, UK
ALIREZA FOTUHI SIAHPIRANI • Wisconsin Institute for Discovery, University of
Wisconsin-Madison, Madison, WI, USA; Department of Computer Sciences, University of
Wisconsin-Madison, Madison, WI, USA
MARTINA SUNDQVIST • UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay,
Paris, France
ANASTASIYA SYBIRNA • Wellcome/CRUK Gurdon Institute, University of Cambridge,
Cambridge, UK; Wellcome/MRC Cambridge Stem Cell Institute, University of Cambridge,
Cambridge, UK; Physiology, Development and Neuroscience Department, University
of Cambridge, Cambridge, UK
HELENA TODOROV • Data Mining and Modelling for Biomedicine, VIB Center for
Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer
Science and Statistics, Ghent University, Ghent, Belgium; Centre International de
Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS,
UMR5308, École Normale Supérieure de Lyon, Univ Lyon, Lyon, France
MATTHIEU VIGNES • Institute of Fundamental Sciences, Massey University, Palmerston
North, New Zealand
ROBERTO VISINTAINER • CIBIO, University of Trento, Trento, Italy
LINGFEI WANG • Division of Genetics and Genomics, The Roslin Institute, The University
of Edinburgh, Midlothian, Scotland, UK
Abstract
Gene regulatory networks are powerful abstractions of biological systems. Since the advent of high-
throughput measurement technologies in biology in the late 1990s, reconstructing the structure of such
networks has been a central computational problem in systems biology. While the problem is certainly
not solved in its entirety, considerable progress has been made in the last two decades, with mature tools
now available. This chapter aims to provide an introduction to the basic concepts underpinning network
inference tools, attempting a categorization which highlights commonalities and relative strengths. While
the chapter is meant to be self-contained, the material presented should provide a useful background to
the later, more specialized chapters of this book.
Key words Gene regulatory networks, Network inference, Network reverse-engineering,
Unsupervised inference, Data-driven methods, Probabilistic models, Dynamical models
Guido Sanguinetti and Vân Anh Huynh-Thu (eds.), Gene Regulatory Networks: Methods and Protocols,
Methods in Molecular Biology, vol. 1883, https://doi.org/10.1007/978-1-4939-8882-2_1,
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Gene Regulatory Network Inference: An Introductory Survey
Vân Anh Huynh-Thu and Guido Sanguinetti
1.1 Mechanisms of Gene Regulation
The molecular bases of the transcription process have been
intensely studied over the last 60 years. Many excellent
monographs are available on the subject; we refer the reader in
particular to the classic books by Ptashne and collaborators [2, 3]
(see also this recent review [4] for a historical perspective). Here
we give a brief intuitive description of the process, taking, as an
illustrative example, the transcriptional response of the bacterium
Escherichia coli to changes in oxygen availability (see
ref. [5] for a modern review of this field). Transcription is carried
out by the enzyme RNA polymerase (RNAP), which slides along the
DNA, opening the double strand and producing a faithful RNA
copy of the gene. The rate of recruitment of RNAP at a gene can
be modulated by the presence or absence of specific transcription
factor (TF) proteins, which contain a DNA-binding module that
enables them to recognize specific DNA-sequence signals near
the start of genes (promoter regions). The classical view of gene
regulation holds that changes in cellular state are orchestrated by
changes in binding by TFs.
For example, in E. coli, oxygen withdrawal leads to dimerization
of the master regulator protein Fumarate Nitrate Reductase
(FNR); FNR dimers (but not monomers) can bind specifically to
DNA, and change the rate of recruitment of RNAP at the FNR
target genes, thereby changing their levels of expression to enable
the cell to adapt to the changed conditions. However, FNR is not
the only regulator responding to changes in oxygen availability:
another master regulator, the two-component system ArcAB, also
senses oxygen changes, albeit through a different mechanism, and
changes its binding to hundreds of genes as a result. FNR and
ArcAB share many targets, and through their combined action they
can give rise to highly complex dynamics [6, 7].
Two important observations can be made from the previous
discussion. Firstly, the regulation of gene expression levels is
enacted through the action of gene products themselves: therefore,
in principle, one may hope to be able to describe the dynamics of
gene expression as an autonomous system. Secondly, even in the
simple case of the bacterium Escherichia coli, regulation of gene
expression is a complex process, likely to involve the interactions of
several molecular players.
In higher organisms, the basic components of the transcriptional
regulatory machinery are remarkably similar. However, many
more levels of regulatory control are present: chemical
modifications of the DNA itself (in particular methylation of C
nucleotides) and of the structural histone proteins, around which
DNA is wound, can affect the structural properties of the DNA,
and hence the local accessibility to the transcriptional machinery.
Such effects, collectively known as epigenetic modifications, have
strong associations with transcription [8–11], and are generally
thought to encode processes of cellular memory associated with
long-term adaptation or cell-type differentiation.
Finally, while we have primarily focused on transcription,
subsequent steps of gene expression are also tightly regulated: RNA
processing, translation, and RNA and protein degradation all pro-
Fig. 1 A cartoon schematic of a gene regulatory network. A complex biophysical model describes the
interaction between three genes, involving both direct regulation (gene 2 by gene 1) and combinatorial
regulation via complex formation (gene 3 by genes 1 and 2). The abstracted structure of the system is given
in the (directed) network on the right
Fig. 2 Examples of network types: directed (a), undirected (b), and weighted (c), where the weights are
represented by edge thickness. Note that a weighted network can be directed or undirected
3 Data-Driven Methods
3.1 Correlation Networks
The simplest score that one may associate to a pair of vector-valued
measurements is their correlation. This is computed in
the following way: given two zero-mean vectors $v_i$ and $v_j$, the
(Pearson) correlation between the vectors is given by
\[ \mathrm{corr}(v_i, v_j) = \rho_{ij} = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|} \tag{1} \]
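As an illustration of Eq. (1), the following minimal sketch computes all pairwise Pearson correlations for a small expression matrix and thresholds their absolute values to obtain an undirected network. The function name `correlation_network`, the threshold value, and the toy data are assumptions made for this example and do not come from the chapter.

```python
import numpy as np

def correlation_network(expr, threshold=0.8):
    """expr: array of shape (genes, samples).

    Returns the matrix of Pearson correlations rho_ij and a boolean
    adjacency matrix obtained by thresholding |rho_ij|.
    """
    # Eq. (1) assumes zero-mean vectors, so centre each gene profile first.
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    rho = (centered @ centered.T) / np.outer(norms, norms)
    adjacency = np.abs(rho) >= threshold
    np.fill_diagonal(adjacency, False)  # ignore self-correlations
    return rho, adjacency

# Toy data (hypothetical): gene 1 is a noisy linear function of gene 0,
# gene 2 is independent of both.
rng = np.random.default_rng(0)
g0 = rng.normal(size=50)
expr = np.vstack([g0, 2.0 * g0 + 0.1 * rng.normal(size=50), rng.normal(size=50)])
rho, adjacency = correlation_network(expr)
```

Because the score is symmetric in i and j, the resulting adjacency matrix is symmetric and the network is undirected.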
3.2 Information Theoretic Scores
As we have seen before, the linearity of Pearson correlation may
limit its suitability to capture complex regulatory relations. To
obviate this problem, several groups have considered alternative
scores based on information theory. The main mathematical concept
is the mutual information, defined as follows. Let X and Y
be two discrete random variables, and let P(X, Y) be their joint
probability distribution. The mutual information between the two
random variables is then defined as
\[ \mathrm{MI}[X, Y] = \sum_{x_i, y_j} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)} = \sum_{x_i, y_j} P(x_i, y_j) \log \frac{P(x_i \mid y_j)}{P(x_i)} \tag{2} \]
where xi and yj are the values the two random variables can take,
and P(X ) (resp. P(Y )) is the marginal distribution obtained by
summing out the values of Y (resp. X ) in the joint distribu-
tion. Intuitively, the mutual information quantifies the degree of
dependence of the two random variables: it is zero when the two
random variables are independent (as is clear from the second
formulation in Eq. (2)), and, when the two variables are determin-
istically linked, it returns the entropy of the marginal distribution.
The mutual information is still a symmetric score, so mutual
information networks are naturally undirected. Nevertheless, it can
accommodate more subtle dependencies than the linear correlation
score in (1), therefore potentially catering for a broader class of
regulatory interactions.
In the GRN context, the idea is to replace the probability
distributions in (2) with empirical distributions (estimated from the
samples) of gene expression levels for each pair of genes. This gives
a weight to each possible edge within a fully connected, weighted
undirected network; thresholding at a user-defined parameter
then returns a network topology called a relevance network [21]. A
number of methods have been proposed to filter out indirect or
spurious links in relevance networks, the most popular methods
being ARACNE [22], CLR [23], and MRNET [24].
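The relevance-network recipe described above can be sketched as follows, assuming the empirical distributions in Eq. (2) are estimated with a simple equal-width histogram. The binning scheme, the number of bins, the threshold, and the function names are illustrative assumptions; this is not the implementation of ARACNE, CLR, or MRNET, and it performs no filtering of indirect links.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate of MI[X, Y] (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, bins)
    mask = p_xy > 0                        # 0 * log(0) is taken to be 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def relevance_network(expr, threshold, bins=8):
    """expr: (genes, samples). Returns the MI matrix and thresholded edges."""
    n_genes = expr.shape[0]
    mi = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    return mi, mi >= threshold

# Toy data (hypothetical): gene 1 depends on gene 0 through a symmetric,
# nonlinear relation that a linear correlation score would largely miss.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=2000)
expr = np.vstack([x, x**2, rng.uniform(-1, 1, size=2000)])
mi, edges = relevance_network(expr, threshold=0.3)
```

The double loop over gene pairs makes the quadratic cost in the number of genes explicit. The plug-in estimator is biased upward for small samples, which is one reason the choice of threshold matters in practice.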
Mutual information networks are among the most widely used
GRN inference methods. They scale to genome-wide networks,
even if they are slightly more computationally intensive than
correlation-based methods, as their computational complexity is
quadratic in the number of genes and samples. However, they
also stop short of providing a predictive framework. Furthermore,
estimation of the joint probabilities in Eq. (2) might be highly
sensitive to noise when the sample size is small or moderate.
with εi a noise term, and use the resulting weight wj as the weight
associated with the network edge between gene j and gene g.
Notice that in this case the regression formulation naturally gives
a direction to the network (even though bidirectional edges are
clearly possible).
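The passage above assigns the fitted regression weight of gene j to the directed edge from gene j to gene g. The regression equation itself is not shown in this excerpt, so the sketch below assumes a plain linear model in which each target gene is regressed on all other genes, with a small ridge penalty for numerical stability. The function name `regression_network`, the `ridge` parameter, and the toy data are assumptions for illustration only.

```python
import numpy as np

def regression_network(expr, ridge=1e-3):
    """expr: (genes, samples). Returns W with W[j, g] = weight of edge j -> g."""
    n_genes = expr.shape[0]
    X = (expr - expr.mean(axis=1, keepdims=True)).T  # (samples, genes), centred
    W = np.zeros((n_genes, n_genes))
    for g in range(n_genes):
        others = [j for j in range(n_genes) if j != g]
        A = X[:, others]
        # Ridge-regularised least squares for target gene g (assumed model).
        w = np.linalg.solve(A.T @ A + ridge * np.eye(len(others)), A.T @ X[:, g])
        W[others, g] = w
    return W

# Toy data (hypothetical): gene 2 is driven by genes 0 and 1.
rng = np.random.default_rng(2)
n = 200
g0, g1 = rng.normal(size=n), rng.normal(size=n)
g2 = 1.5 * g0 - 0.5 * g1 + 0.05 * rng.normal(size=n)
W = regression_network(np.vstack([g0, g1, g2]))
```

Unlike the correlation and mutual information scores, W is generally not symmetric, which is how the regression formulation gives a direction to each edge.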
4 Probabilistic Models
4.1 Gaussian Graphical Models
The simplest probabilistic model one may wish to consider is
a multivariate normal distribution. The probability density for a
multivariate normal vector $x \in \mathbb{R}^G$ is given by
\[ p(x \mid m, \Sigma) = \frac{1}{\sqrt{(2\pi)^G |\Sigma|}} \exp\left[ -\frac{1}{2} (x - m)^T \Sigma^{-1} (x - m) \right] \tag{4} \]
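To make the multivariate normal density of Eq. (4) concrete, the sketch below evaluates its logarithm, using the standard normalization constant $(2\pi)^G |\Sigma|$ under the square root, and derives partial correlations from the precision matrix $\Sigma^{-1}$. The link between zeros of the precision matrix and conditional independence is a standard property of Gaussian models; the function names and the three-gene chain example are assumptions made for illustration.

```python
import numpy as np

def gaussian_log_density(x, m, sigma):
    """Log of the multivariate normal density for a vector x of length G."""
    G = x.shape[0]
    diff = x - m
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - m)^T Sigma^{-1} (x - m)
    return -0.5 * (G * np.log(2 * np.pi) + logdet + quad)

def partial_correlations(sigma):
    """Partial correlations computed from the precision matrix."""
    precision = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain gene0 -> gene1 -> gene2 with unit-variance noise:
# x0 ~ N(0, 1), x1 = x0 + e1, x2 = x1 + e2.
sigma = np.array([[1.0, 1.0, 1.0],
                  [1.0, 2.0, 2.0],
                  [1.0, 2.0, 3.0]])
pcor = partial_correlations(sigma)
log_p_at_mean = gaussian_log_density(np.zeros(3), np.zeros(3), sigma)
```

In this example genes 0 and 2 are marginally correlated, yet their partial correlation vanishes because gene 1 mediates the dependence, which is why a sparse precision matrix can be read as a network.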
4.2 Bayesian Networks
All methods described so far address the problem of network
reconstruction from a top-down approach: start with a fully
connected network, compute pairwise scores (or estimate jointly
a precision matrix in the case of Gaussian Graphical Models), and
\[ P(X_1, \ldots, X_G) = P(X_1) \prod_{i=2}^{G} P(X_i \mid X_1, \ldots, X_{i-1}) \tag{5} \]
\[ P(X_1, \ldots, X_G \mid \mathcal{G}) = \prod_{i=1}^{G} P(X_i \mid X_{\pi_i}) \tag{6} \]
Fig. 3 Example of a valid Bayesian Network with four nodes and four edges.
Given this structure G , the joint distribution P(A, B, C, D|G ) factorizes as
P(A)P(B|A)P(C|A)P(D|B, C)
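The factorization in the caption of Fig. 3 can be checked numerically. The sketch below assumes binary variables and made-up conditional probability tables (all numerical values are hypothetical) and verifies that the product P(A)P(B|A)P(C|A)P(D|B, C) defines a valid joint distribution.

```python
from itertools import product

# Hypothetical probabilities of each variable being 1, given its parents.
P_A = 0.3                                   # P(A = 1)
P_B_GIVEN_A = {0: 0.2, 1: 0.9}              # P(B = 1 | A = a)
P_C_GIVEN_A = {0: 0.5, 1: 0.1}              # P(C = 1 | A = a)
P_D_GIVEN_BC = {(0, 0): 0.05, (0, 1): 0.6,  # P(D = 1 | B = b, C = c)
                (1, 0): 0.7, (1, 1): 0.95}

def bernoulli(p_one, value):
    """Probability of a binary value when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(a, b, c, d):
    """P(A, B, C, D | G) = P(A) P(B|A) P(C|A) P(D|B, C)."""
    return (bernoulli(P_A, a)
            * bernoulli(P_B_GIVEN_A[a], b)
            * bernoulli(P_C_GIVEN_A[a], c)
            * bernoulli(P_D_GIVEN_BC[(b, c)], d))

# Summing over all 16 configurations should give 1.
total = sum(joint(*state) for state in product([0, 1], repeat=4))
```

With binary variables the full joint would need 15 free parameters, whereas this structure needs only 1 + 2 + 2 + 4 = 9, which illustrates how the graph constrains the model.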
5 Dynamical Models
Fig. 4 Example of a Dynamic Bayesian Network with four nodes: static
representation (with cycles) on the left, and unrolled dynamic
representation on the right
5.2 Differential Equation Methods
Differential equations represent perhaps the best studied and most
widely used class of dynamical models in science and engineering.
They provide an infinitesimal description of the system dynamics
by relating the rate of change (time derivative) of a variable to its
value,
\[ \frac{dx}{dt} = f(x, \Theta, u(t), t). \tag{7} \]
\[ \frac{dx}{dt} = Ax \tag{8} \]
6 Multi-Network Models
All of our previous discussion has assumed that all the data can
be explained by a single network structure. While this may be
reasonable when all data comes from similar conditions, it is a
very strong assumption when one is trying to jointly model data
from heterogeneous scenarios, as different biological conditions
may lead to different pathways being activated, so that effectively
different network structures may be more appropriate.
This idea has been fruitfully exploited in two main directions.
Several papers have considered the scenario where data (e.g., time
series) is available from different, but related conditions. Therefore,
one may reasonably assume some commonalities between the
underlying network structures, so that methods that can transfer
information across conditions are needed. This transfer can be
achieved via introducing a shared diversity penalty within different
optimization problems [47, 48]. Equivalently but more flexibly,
the joint reconstruction of the different networks can be achieved
by adopting a hierarchical Bayesian approach [49, 50].
7 Evaluation
[Figure 5 panels. Left: edge ranking versus edge weight. Center: true positive rate versus false positive rate. Right: precision versus recall.]
Fig. 5 Evaluation of inferred networks: an algorithm typically outputs a ranked list of edges, with the weight
of each edge being given by either a score or a posterior probability (left panel, where true and false edges
are colored in yellow and red, respectively). By progressively lowering the threshold for acceptance, one can
construct either a ROC curve (central panel) or a precision-recall curve (right panel)
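The construction described in the caption of Fig. 5 can be sketched as follows: edges are sorted by decreasing weight, and each position in the ranking corresponds to one acceptance threshold. The function names and the toy scores and labels are hypothetical, and the sketch assumes there are no tied scores.

```python
import numpy as np

def roc_pr_curves(scores, labels):
    """scores: edge weights; labels: 1 for true edges, 0 for false edges.

    Returns false positive rate, true positive rate (recall) and precision
    as the acceptance threshold is progressively lowered.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    ranked = np.asarray(labels)[order]
    tp = np.cumsum(ranked)        # true edges accepted so far
    fp = np.cumsum(1 - ranked)    # false edges accepted so far
    tpr = tp / ranked.sum()
    fpr = fp / (len(ranked) - ranked.sum())
    precision = tp / (tp + fp)
    return fpr, tpr, precision

def area_under_roc(fpr, tpr):
    """Trapezoidal area under the ROC curve, starting from the point (0, 0)."""
    x = np.concatenate([[0.0], fpr])
    y = np.concatenate([[0.0], tpr])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical ranked list of six candidate edges.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr, precision = roc_pr_curves(scores, labels)
auroc = area_under_roc(fpr, tpr)
```

In this toy example 8 of the 9 (true edge, false edge) pairs are ranked correctly, so the area under the ROC curve is 8/9.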
8 Software Tools
In the morning
In the morning with the black cart…
We'll go back up to Vine Street, in the morning!
"It's Bow Street you'll be coming to, you are," said the constable
sourly.
"This man is dying." (He lay groaning on the pavement.)
"Fetch the ambulance," I said.
There is an ambulance behind St. Clement Danes, a point on which I
am better informed than most. The constable, it seemed, held the
keys of the kiosk where it was kept. We brought it out (it was a
three-wheeled contraption fitted with a hood) and threw the man's
body onto it.
Laid in an ambulance cart, a body looks as dead as can be. At the
sight of the stiff boot soles, the constables softened.
"Let's go, then," they said.
I imagined they were still talking of Bow Street.
"Let me see Dempsey for three minutes, if he is on duty,"
I replied.
"All right. He is."
I knew then that all would be well, but before we set off I put my
head under the hood of the ambulance to see whether the man was
still alive. My ear caught a discreet whisper.
"Laddie, you'll have to pay me for a new hat. They've stove mine
in. Don't you desert me now, laddie. With my gray hairs I'm too old
to go to prison on your account. Don't desert me, laddie."
"You'll be lucky if you get off with less than seven years," I said
to the constable.
Moved by a very lively fear of having exceeded their duty, the
two constables left their beats, and the mournful procession wound
along the deserted Strand. I knew that once I was west of the
Adelphi I would be in friendly country. The constables also had
cause to know it, for as I walked proudly a few paces ahead of the
catafalque, another constable called to me in passing:
"Good evening, sir."
"There, you see," I said loftily. "I would not be in your shoes for
anything in the world. Upon my word, I have a good mind to march
you both to police headquarters."
"If this gentleman is a friend of yours, perhaps…" said the
constable who had struck the blow and was thinking of the
consequences of his act.
"Perhaps you would like to see me go away and say nothing about
the affair," I finished for him.
Then there appeared before us the figure of Sergeant Dempsey, whose
waterproof made him look to me like an angel of light. I had known
him for months; he was one of my best friends, and we would
sometimes chat together in the early morning. Fools seek to win the
favor of princes and ministers, and courts and ministries leave
them to perish miserably. The wise man makes allies among the
police and the cab drivers, so that his friends spring up from the
kiosk and the cab rank, and even his misdeeds end in triumphal
processions.
"Dempsey," I said, "has there been another strike in the police?
They have posted at St. Clement Danes some creatures who want to
take me to Bow Street as a garrotter."
"Good Lord, sir!" said Dempsey, indignant.
"Tell them I am neither a garrotter nor a thief. It is simply
disgraceful that an honest man cannot walk down the Strand without
being manhandled by these louts. One of them did his best to kill
my friend here; and I am taking the body home. Speak up for me,
Dempsey."
The constables of whom I painted this sorry picture had no time to
get a word in. Dempsey addressed them in terms well calculated to
frighten them. They tried to justify themselves, but Dempsey
launched into a glorious catalogue of my virtues, as they had
appeared to him by gaslight in the early hours.
"And besides," he concluded vehemently, "he writes for the
newspapers. How would you like him to write about you in the
papers, eh… and in verse, too, as is his habit? Leave him be.
He and I have been pals for months."
"And the dead man, what do we do with him?" said the constable who
had not struck the blow.
"I will tell you," I replied, relenting.
And to the three constables gathered under the lights of Charing
Cross, I gave a faithful and detailed account of my adventures of
the night, beginning with the Breslau and ending at St. Clement
Danes. I described the old rascal lying in the ambulance in terms
that made him squirm, and never since the founding of the
Metropolitan Police did three constables laugh as those three did.
The Strand rang with it, and the shady night birds were left
dumbfounded.
"Oh Lord!" said Dempsey, wiping his eyes, "I'd have given a great
deal to see that old fellow galloping about with his wet blanket
and all the rest. Excuse me, sir, but you ought to get yourself
picked up every night to give us a good time."
And he went off into fresh guffaws.
Silver coins clinked, and the two constables from St. Clement
Danes hurried back to their beats, laughing as they ran.
"Take him to Charing Cross," Dempsey told me between bursts of
laughter. "We'll send the ambulance back in the morning."
"Laddie, you've called me ugly names, but I'm too old to go to the
hospital. Don't desert me, laddie. Take me home to my wife," said
the voice from the ambulance.
"He's not so very ill. His wife will give him a proper
dressing-down," said Dempsey, who was married.
"Where do you live?" I asked.
"At Brugglesmith," came the answer.
"What is that?" I asked Dempsey, who was better versed than I in
compound words of that kind.
"Brook Green district, borough of Hammersmith," Dempsey translated
at once.
"Of course," I went on. "He could not possibly live anywhere else.
I only wonder it isn't Kew [22]."
[22] Brook Green lies at the far western edge of London, six and a
half kilometers from Charing Cross. Kew is farther still, in the
same direction.