Genome Organization in E. Coli

GENOME ORGANIZATION IN E. coli

• Escherichia coli, also known as E. coli, is a Gram-negative, facultatively anaerobic,
rod-shaped, coliform bacterium of the genus Escherichia that is commonly found in the colon
of endothermic organisms.
• E. coli is an extremely important organism. It is found in the lower intestines of animals,
including humans, and survives well when introduced into the environment.
• Pathogenic E. coli strains make the news all too frequently as humans develop
sometimes deadly enteric and other infections after contacting the bacterium at
restaurants (e.g., in tainted meat or on vegetables exposed to raw sewage) or in the
environment (e.g., in lakes with contamination).
• In the laboratory, nonpathogenic E. coli has been an extremely important model
system for molecular biology, genetics, and biotechnology. Thus, the complete
genome sequence of this bacterium was awaited eagerly.
• E. coli cells can survive outside the body for a limited time, which makes them
potential indicator organisms for testing environmental samples.
• Most E. coli strains are harmless, but some strains, such as O157:H7, and some
pathotypes, such as EPEC, can cause serious food poisoning in their hosts (humans
and other animals), which may be deadly. For example, E. coli O157:H7 causes about
60 deaths per year in the United States. The most common sources of such infections
are tainted meat and raw vegetables exposed to sewage.
• In the 1950s and 1960s, this bacterium became the model organism of choice for
prokaryotic research when a group of scientists used phase-contrast microscopy and
autoradiography to show that the essential genes of E. coli are encoded on a single
circular chromosome packaged within the cell nucleoid (Mason & Powelson, 1956;
Cairns, 1963).
• When fully extended, the E. coli chromosome is nearly three orders of magnitude longer
than the cell itself, so it must be tightly compacted to fit inside the nucleoid.
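
The size mismatch between chromosome and cell can be checked with a rough calculation. The sketch below is only an estimate: it uses the K-12 genome size quoted later in these notes, and it assumes a rise of 0.34 nm per base pair (standard B-DNA) and a cell length of about 2 micrometres, neither of which comes from the notes themselves.

```python
# Rough estimate of how much longer the extended chromosome is than the cell.
GENOME_BP = 4_639_221      # E. coli K-12 genome size quoted in these notes
RISE_PER_BP_NM = 0.34      # assumption: rise per base pair of B-form DNA
CELL_LENGTH_UM = 2.0       # assumption: typical length of a rod-shaped cell


def contour_length_um(genome_bp: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Length of the fully extended DNA, in micrometres."""
    return genome_bp * rise_nm / 1000.0


length_um = contour_length_um(GENOME_BP)
fold_excess = length_um / CELL_LENGTH_UM

print(f"contour length: {length_um / 1000:.2f} mm")
print(f"ratio of contour length to cell length: {fold_excess:.0f}")
```

Under these assumptions the extended DNA is on the order of a millimetre long, which is why folding and supercoiling are needed to fit it into a cell a few micrometres long.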

Figure: E. coli cell, a prokaryotic model (figure not to scale)


• During the 1980s and 1990s, researchers discovered that multiple proteins act together to
fold and condense the E. coli DNA. In particular, one protein called HU, which is the most
abundant protein in the nucleoid, works with an enzyme called topoisomerase I to bind
DNA and introduce sharp bends in the chromosome, generating the tension necessary
for negative supercoiling. Recent studies have also shown that other proteins, including
integration host factor (IHF), can bind to specific sequences within the genome and
introduce additional bends (Rice et al., 1996). The folded DNA is then organized into a
variety of conformations (Sinden & Pettijohn, 1981) that are supercoiled and wound
around tetramers of the HU protein, much like eukaryotic chromosomes are wrapped
around histones (Murphy & Zimmerman, 1997). Once the genome has been condensed, DNA topoisomerase I,
DNA gyrase, and other proteins help maintain the supercoils. One of these
maintenance proteins, H-NS, plays an active role in transcription by modulating the
expression of the genes involved in the response to environmental stimuli. Another
maintenance protein, factor for inversion stimulation (FIS), is abundant during
exponential growth and regulates the expression of more than 231 genes, including DNA
topoisomerase I (Bradley et al., 2007).
• In 1997, the annotated genome sequence of the laboratory strain E. coli K-12 was reported by
researchers at the E. coli Genome Center at the University of Wisconsin, Madison.
• It was the first genomic sequence of a cellular organism that had undergone extensive
genetic analysis. An unannotated sequence of the E. coli genome made up of sequence
segments from more than one strain was reported at the same time by Takashi Horiuchi
of Japan.
• Subsequently, several other E. coli strains have been sequenced. One of the strains
sequenced by Horiuchi was O157:H7, the strain that is responsible for approximately
70,000 cases of foodborne illness, and about 60 deaths, per year in the United States.
• The circular genome of strain K-12 was sequenced using the whole-genome shotgun
approach. The genome is 4.64 Mb (4,639,221 bp). Its 4,288 ORFs (protein-coding genes)
account for 87.8% of the genome; 38% of these ORFs had no known function at the time of
sequencing. A further 0.8% of the genome encodes stable RNAs and 0.7% consists of
noncoding repeats, leaving ∼11% for regulatory and other functions.
• Physical characteristics of the E. coli genome:

o Single chromosome per cell (haploid)
o 4.6 × 10^6 bp (4,600 kilobases)
o About 4,300 potential coding sequences
o Only about 1,800 known E. coli proteins
o 70% is composed of single (monocistronic) genes
o 6% is polycistronic
o Roughly equal numbers of genes on each strand
o Average gene size is 1 kb; no introns
o Transposons: strain-specific, 60 copies per genome
o 8% human homologs
• The average distance between E. coli genes is 118 bp. The 70 intergenic regions
larger than 600 bp were reevaluated for the presence of ORFs and searched for DNA
sequence and protein coding features. Closer inspection revealed that 15 of these
regions contain previously unannotated ORFs, which in most cases were overlooked
because of their small size. An additional 11 intergenic regions contain
sequence features such as long untranslated leader sequences [for example, oppA
messenger RNA (mRNA) extends ∼500 bp upstream of the start codon] or well-
characterized control regions [for example, the araFGH operon control region].
• The origin and terminus of replication divide the genome into oppositely replicated
halves, termed replichores. Replichore 1, which is replicated clockwise, has the
presented strand of E. coli as its leading strand; in replichore 2 the complementary
strand is the leading one. Many features of E. coli are oriented with respect to
replication. All seven ribosomal RNA (rRNA) operons, and 53 of 86 tRNA genes, are
expressed in the direction of replication. Approximately 55% of protein-coding genes are
also aligned with the direction of replication.
• E. coli has an array of 14 flagellar synthesis genes (b1070 to b1083), only two
of which had been previously reported: flgM and flgL. One additional gene, flgN, is
involved in the initiation of filament assembly; it precedes flgM, a negative
regulator of flagellin synthesis.
• A number of repeated sequences have been characterized in the E. coli genome.
• The largest repeated sequences in E. coli K-12 are the five Rhs elements (all previously
described), which are 5.7 to 9.6 kb in length and together comprise 0.8% of the genome.
They have no known function, although strain comparisons suggest they may be mobile
elements.
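
The composition figures quoted in the list above can be cross-checked with simple arithmetic. The sketch below only uses numbers stated in these notes (genome size, ORF count, percentage breakdown, and the 53 of 86 tRNA genes); the function and variable names are illustrative.

```python
# Cross-check of the genome composition figures quoted in these notes.
GENOME_BP = 4_639_221
N_ORFS = 4_288
FRACTIONS = {
    "protein-coding": 0.878,
    "stable RNA": 0.008,
    "noncoding repeats": 0.007,
}


def composition(genome_bp: int, fractions: dict) -> dict:
    """Base pairs per category, plus whatever is left unassigned."""
    table = {name: round(genome_bp * frac) for name, frac in fractions.items()}
    table["regulatory and other"] = genome_bp - sum(table.values())
    return table


table = composition(GENOME_BP, FRACTIONS)
remainder_pct = 100 * table["regulatory and other"] / GENOME_BP
mean_orf_bp = table["protein-coding"] / N_ORFS
trna_codirectional_pct = 100 * 53 / 86

for name, bp in table.items():
    print(f"{name:>22}: {bp:>9,} bp")
print(f"unassigned remainder: {remainder_pct:.1f}% of the genome")
print(f"mean ORF length: {mean_orf_bp:.0f} bp")
print(f"tRNA genes transcribed in the direction of replication: {trna_codirectional_pct:.0f}%")
```

The remainder comes out close to the ∼11% quoted for regulatory and other functions, and the mean ORF length is close to the 1 kb average gene size given in the list of physical characteristics.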

The E. coli genome is organized as plectonemic supercoils.

The circular nature of the E. coli chromosome makes it a topologically constrained
molecule that is mostly negatively supercoiled, with an estimated average supercoiling
density (σ) of -0.05. In the E. coli nucleoid, about half of the chromosomal DNA is organized
in the form of free, plectonemic supercoils. The remaining DNA is restrained, either in the
plectonemic form or in alternative forms including, but not limited to, the toroidal form, by
interaction with proteins such as nucleoid-associated proteins (NAPs). Thus, free plectonemic
supercoils represent the effective supercoiling of the E. coli genome that is responsible for
its condensation and organization. Both plectonemic and toroidal supercoiling aid in DNA
condensation, although plectonemic supercoiling, because its structures branch, provides less
condensation than toroidal supercoiling. In addition to condensing DNA, supercoiling aids in
DNA organization: it promotes DNA disentanglement by reducing the probability of catenation,
and it helps bring two distant sites of DNA into proximity, thereby promoting potential
functional interactions between different segments of DNA.
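
To give a sense of scale for σ = -0.05, the sketch below converts the supercoiling density into a linking-number deficit using the usual definition σ = (Lk − Lk0) / Lk0, where Lk0 is the linking number of the relaxed molecule. The helical repeat of 10.5 bp per turn is an assumption (a standard value for B-DNA in solution) and does not come from these notes.

```python
# Linking-number deficit implied by an average supercoiling density of -0.05.
GENOME_BP = 4_639_221
HELICAL_REPEAT_BP = 10.5   # assumption: bp per helical turn of relaxed B-DNA
SIGMA = -0.05              # average supercoiling density quoted in these notes


def relaxed_linking_number(n_bp: int, helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Lk0: number of helical turns in the relaxed circular molecule."""
    return n_bp / helical_repeat


def linking_deficit(n_bp: int, sigma: float,
                    helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Delta Lk = sigma * Lk0 (negative values mean underwinding)."""
    return sigma * relaxed_linking_number(n_bp, helical_repeat)


lk0 = relaxed_linking_number(GENOME_BP)
delta_lk = linking_deficit(GENOME_BP, SIGMA)

print(f"Lk0 of the relaxed chromosome: {lk0:,.0f} turns")
print(f"linking-number deficit at sigma = {SIGMA}: {delta_lk:,.0f} turns")
```

Under this assumption the chromosome is underwound by roughly one turn in twenty, that is, on the order of twenty thousand turns over the whole genome.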

• The E. coli chromosome was found to consist of 31 chromosomal interaction
domains (CIDs) in the growth phase. The size of the CIDs ranged from 40
to ~300 kb. It appears that a supercoiling-diffusion barrier responsible for
segregating plectonemic DNA loops into topological domains functions as a
CID boundary in E. coli. In other words, the presence of a supercoiling-diffusion
barrier defines the formation of CIDs.
• HU (histone-like protein from E. coli strain U93) is an evolutionarily conserved
protein in bacteria. HU exists in E. coli as homo- and heterodimers of two
subunits, HUα and HUβ, which share 69% amino acid identity.

• Although bacteria do not have histones, they possess a group of DNA binding
proteins referred to as nucleoid-associated proteins (NAPs) that are
functionally analogous to histones in a broad sense. NAPs are highly abundant
and constitute a significant proportion of the protein component of the
nucleoid. A distinctive characteristic of NAPs is their ability to bind DNA in both
a specific (either sequence- or structure-specific) and non-sequence specific
manner. As a result, NAPs are dual-function proteins.
• At least 12 NAPs have been identified in E. coli; the most extensively studied
are HU, IHF, H-NS, and Fis.

Genome-wide occupancy of nucleoid-associated proteins (NAPs) of E. coli.

A. The circular layout of the E. coli genome (as shown in Fig 1A) additionally depicting the
genome occupancy of indicated NAPs in the growth phase. B. The genome occupancy of
indicated NAPs in the stationary phase. The genome layout is the same as in A. The genome
occupancy of each NAP, determined by ChIP-Seq, is plotted as a histogram (bin size 300 bp)
in which the bar height is indicative of relative binding enrichment.
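
The binning described in the caption can be sketched as follows. This is an illustration only, not the pipeline used to produce the figure: the read positions are made-up toy data, and the enrichment measure (ChIP count over control count, with a pseudocount) is an assumed, simplified stand-in for whatever normalization was actually applied.

```python
# Toy sketch: count positions in fixed 300 bp bins along the genome and
# compare a ChIP sample with a control sample bin by bin.
GENOME_BP = 4_639_221
BIN_SIZE_BP = 300          # bin size stated in the caption


def bin_counts(positions, bin_size=BIN_SIZE_BP, genome_bp=GENOME_BP):
    """Count 1-based genome positions falling in each fixed-width bin."""
    n_bins = -(-genome_bp // bin_size)   # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= genome_bp:
            raise ValueError(f"position {pos} lies outside the genome")
        counts[(pos - 1) // bin_size] += 1
    return counts


def relative_enrichment(chip, control, pseudocount=1.0):
    """Assumed enrichment measure: (ChIP + pseudocount) / (control + pseudocount)."""
    return [(c + pseudocount) / (k + pseudocount) for c, k in zip(chip, control)]


# Hypothetical read start positions, for illustration only.
chip_counts = bin_counts([10, 20, 250, 301, 305, 4_639_221])
control_counts = bin_counts([5, 400])
enrichment = relative_enrichment(chip_counts, control_counts)

print(len(chip_counts), "bins;", "first two enrichment values:", enrichment[:2])
```

Each value in the resulting list corresponds to one bar of a histogram like the one described in the caption.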

The “100 min” map of the E. coli genome

• The “100 minute map” is a time-based genetic map of the E. coli genome. It is based on
the observation that transferring the entire chromosome from an Hfr donor cell to a
recipient cell by conjugation takes about 100 minutes; the map lists the point in time
at which a particular gene is transferred. In this case, the map highlights clusters of
genes (it is important to note that most genes are not clustered).
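
Because the map divides the circular chromosome into 100 minutes, a map position can be converted to an approximate physical coordinate. The sketch below assumes a strictly linear relationship between minutes and base pairs and measures both from the same zero point; real genetic and physical maps agree only approximately, so the result is a rough guide rather than an exact coordinate.

```python
# Approximate conversion between map minutes and base-pair coordinates,
# assuming a linear relationship and a shared zero point.
GENOME_BP = 4_639_221
MAP_MINUTES = 100.0
BP_PER_MINUTE = GENOME_BP / MAP_MINUTES


def minutes_to_bp(minute: float, genome_bp: int = GENOME_BP) -> int:
    """Approximate bp offset from the 0-minute point (wraps around the circle)."""
    return round((minute % MAP_MINUTES) / MAP_MINUTES * genome_bp)


def bp_to_minutes(position_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Approximate map position, in minutes, of a bp offset from the zero point."""
    return (position_bp % genome_bp) / genome_bp * MAP_MINUTES


print(f"one map minute corresponds to about {BP_PER_MINUTE / 1000:.1f} kb")
print(f"25 min corresponds to roughly bp {minutes_to_bp(25):,}")
```

On this scale one minute spans about 46 kb, which corresponds to roughly 46 genes at the average gene size of 1 kb quoted earlier.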

Applications

• E. coli plays an important role in genetic engineering and industrial microbiology. Researchers can
introduce genes into the bacterium using plasmids that permit high-level expression of protein, and
such proteins may be mass-produced in industrial fermentation processes.
• The use of plasmids and restriction enzymes to create recombinant DNA in E. coli was the basis of
biotechnology. Because it can be grown and modified quickly and at low cost in laboratory
settings, E. coli is a popular expression platform for production of the recombinant proteins used in
therapeutics.

REFERENCES
1. Russell, P. J. (2010). iGenetics: A molecular approach. 3rd edition. San Francisco:
Benjamin Cummings
2. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, et al. The complete
genome sequence of Escherichia coli K-12. Science. 1997 Sep 5; 277(5331): 1453–
62.
3. Griswold, A. (2008) Genome packaging in prokaryotes: the circular chromosome of E.
coli. Nature Education 1(1):57
4. Verma SC, Qian Z, Adhya SL (2019) Architecture of the Escherichia coli
nucleoid. PLOS Genetics 15(12): e1008456.
https://doi.org/10.1371/journal.pgen.1008456
