Precise and reliable gene expression via standard transcription and translation initiation elements

ARTICLES Precise and reliable gene expression via standard transcription and translation initiation elements npg © 2013 Nature America, Inc. All rights reserved. Vivek K Mutalik1–3, Joao C Guimaraes1,3,4, Guillaume Cambray1,3, Colin Lam1,3, Marc Juul Christoffersen1,3, Quynh-Anh Mai1,3, Andrew B Tran1,3, Morgan Paull1, Jay D Keasling1–3,5,6, Adam P Arkin1–3,8 & Drew Endy1,7,8 An inability to reliably predict quantitative behaviors for novel combinations of genetic elements limits the rational engineering of biological systems. We developed an expression cassette architecture for genetic elements controlling transcription and translation initiation in Escherichia coli: transcription elements encode a common mRNA start, and translation elements use an overlapping genetic motif found in many natural systems. We engineered libraries of constitutive and repressor-regulated promoters along with translation initiation elements following these definitions. We measured activity distributions for each library and selected elements that collectively resulted in expression across a 1,000-fold observed dynamic range. We studied all combinations of curated elements, demonstrating that arbitrary genes are reliably expressed to within twofold relative target expression windows with ~93% reliability. We expect the genetic element definitions validated here can be collectively expanded to create collections of public-domain standard biological parts that support reliable forward engineering of gene expression at genome scales. One main goal of synthetic biology is to make the engineering of biology easier1,2. DNA synthesis and assembly has progressed to the point where entire metabolic pathways, chromosomes and genomes can now be synthesized and transplanted3–5. However, our capacity to rationally design increasingly complicated genetic systems as enabled by improvements in DNA construction methods has not kept pace2,6. One of the greatest claimed barriers to efficient and scalable genetic design is the lack of standard parts that can be reused reliably in novel combinations6,7. Many examples instead highlight, even within well-studied organisms such as E. coli, how seemingly simple genetic functions behave differently in different settings8,9. For example, a prokaryotic ribosome-binding site (RBS) element that initiates translation for one coding sequence might not function at all with another coding sequence10. If the genetic elements that encode control of central cellular processes such as transcription and translation cannot be reliably reused, then there is little chance that higherorder objects encoded from such basic elements will be reliable in larger-scale systems6,11. Standard biological parts could, in theory, enable hierarchical abstraction of biological functions1,2,12,13. The behavior of integrated genetic systems could then be represented via simpler models of individual elements and ultimately mapped to underlying genetic sequences whose encoded functions are dependent on a limited number of measurable or calculable intrinsic variables. Such abstraction of function seems necessary to manage biological complexity and to allow the engineering of increasingly sophisticated genetic systems6,12,14. We engineered ~500 transcription and translation initiation elements that are compatible within a standardized genetic context, or expression operating unit (EOU), that enables predictable forward engineering of gene expression over a wide dynamic range. We characterized representative parts for each type by testing more than 1,200 part-part combinations to establish and validate functional composition rules while quantifying scores for part activity. From this data we also estimated the ‘quality’ of each part, a second-order statistic that represents the extent to which the activity of a part varies across changes in context15. Our results demonstrate how, when combined with standardized transcription control elements, a more physically complex design for the control of translation initiation creates simply modeled parts enabling reliable forward engineering of gene expression. RESULTS Prioritizing part composition puzzles In related work, we systematically assembled and tested all combinations of frequently used prokaryotic transcription and translation control elements to quantify average part activities and also variation in activities as parts are reused in novel combinations15. Here we focus on developing rules for a genetic layout architecture underlying gene expression cassettes that eliminate 1BIOFAB International Open Facility Advancing Biotechnology, Emeryville, California, USA. 2Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, USA. 3Department of Bioengineering, University of California, Berkeley, Berkeley, California, USA. 4Department of Informatics, Computer Science and Technology Center, University of Minho, Campus de Gualtar, Braga, Portugal. 5Department of Chemical & Biomolecular Engineering, University of California, Berkeley, Berkeley, California, USA. 6Joint BioEnergy Institute, Emeryville, California, USA. 7Department of Bioengineering, Stanford University, Stanford, California, USA. 8These authors contributed equally to this work. Correspondence should be addressed to D.E. (endy@stanford.edu) or A.P.A. (aparkin@lbl.gov). RECEIVED 30 AUGUST 2012; ACCEPTED 14 FEBRUARY 2013; PUBLISHED ONLINE 10 MARCH 2013; DOI:10.1038/NMETH.2404 354 | VOL.10 NO.4 | APRIL 2013 | NATURE METHODS ARTICLES npg © 2013 Nature America, Inc. All rights reserved. a BIOFAB expression operating unit (EOU) Upstream insulator Transcription element Translation initiation element Gene of 3′ UTR interest (GOI) RBS1 RBS2 c 800 900 700 600 500 400 300 GFP fluorescence (a.u.) b 1,000 700 Downstream insulator Translationally coupled BCD:GOI junction Standard +1 promoter:5′ UTR junction 800 Transcription terminator Standard BCD variant Standard promoter variant GFP fluorescence (a.u.) Figure 1 | Rules for regularizing gene expression. (a) We defined an expression operating unit (EOU) to set boundaries and junctions of functional genetic elements underlying the expression of heterologous genes (Supplementary Note). The variable regions within each element type (wider icons) and the standard junctions (labeled lines) between elements that best enable reliable reuse of elements in novel combinations are detailed. The bicistronic design (BCD) with its two Shine-Dalgarno motifs (SD1 and SD2) is shown. (b) Rank-ordered library of constitutive promoters that encode an expected common +1 mRNA boundary and 5′ UTR leader sequence. a.u., arbitrary units. (c) Rank-ordered library of SD2 sites that adhere to the BCD and resulting BCD:GOI junction as established here. Error bars, s.d. (n = 3). 600 500 400 300 200 200 functional uncertainty arising from the 100 100 reuse of transcription and translation ini0 0 300 rank-ordered promoters 168 SD2 rank-ordered randomized BCD variants tiation elements with any gene of interest (GOI) (Fig. 1). Although we herein conWe instead sought an architecture for 5′ UTR:GOI junctions sider only three elements—promoters, 5′ UTRs and GOIs—and two element-element junctions—promoters:5′ UTRs and 5′ UTRs: that would allow an RBS to more reliably encode a distinct and GOIs (Fig. 1)—subsequent work can expand the EOU architecture sequence-specific translation initiation rate without sensitivand variants thereof in a distributed and asynchronous fashion15. ity to variation in the coding sequence of downstream GOIs. Recent studies have focused on regularizing a few examples of We reconsidered past work with difficult-to-express propromoter:5′ UTR junctions via active enzymatic processing of teins and also reexamined the detailed architecture of natural mRNA16,17. However, from our prior systematic study of many polycistronic operons 21–27. Of particular interest were promoter:5′ UTR and 5′ UTR:GOI combinations, we found that past examples in which a second, independently translated variation in translation initiation rates arising from irregular 5′ coding sequence is positioned immediately upstream of or UTR:GOI junctions produced most of observed expression irreg- slightly overlapping with the coding sequence of any given ularities (14% of 17% total)15. Given this information and fur- GOI22,26. In such arrangements, the RBS for the GOI is entirely ther noting that, in prokaryotes, irregularities arising specifically embedded in the coding sequence of the upstream gene, and across 5′ UTR:GOI junctions cannot be eliminated by enzymatic translation of the downstream cistron might thus be coupled cleavage between a Shine-Dalgarno (SD) sequence and translation to translation of the upstream cistron21–26. More specifically, start codon, we decided to first pursue the reliable initiation of the intrinsic helicase activity of ribosomes arriving at the stop codon of an upstream cistron might eliminate inhibitory RNA translation for any gene coding sequence. Differential formation of mRNA secondary structures span- structures that would otherwise disrupt translation initiation of the downstream GOI21–26,28,29. ning 5′ UTR:GOI junctions that then influence ribosome binding To explore whether overlapping genetic elements and active or initiation has long been recognized as a major determinant of variation in translation initiation rates10,18 (Supplementary translation coupling might reliably improve translation initiation, Fig. 1). Given the absence of reliably reusable translation initia- we considered genetic designs that encode short leader peptides tion elements, current engineering methods require construction followed by a downstream GOI25,26. One design encodes a of multiple variant RBSs or recoded coding sequences followed by 16-amino-acid leader peptide in a first cistron that overlaps by 1 base experimental screening to obtain desired expression levels, pre- pair with a variable downstream coding sequence, encoding both sumably through changes in translation initiation efficiency7,10,19. a stop and start codon via a −1 frame shift (Fig. 1a)26. The leader For example, the best available computational tool for designing peptide is synthesized by ribosomes that bind to an upstream context-optimized translation control elements for use in E. coli SD core sequence (SD1); translation of the downstream GOI is gives an ~47% chance to design elements that express proteins to thought to result, primarily, from SD1-directed ribosomes that within twofold of a target expression level10; we note that such recognize and reinitiate translation via a second SD site (SD2) quantitative precision in detailing the compositional reliability that is encoded entirely within the coding sequence of the leader of designer genetic elements is rare yet necessary to evaluate peptide21,22,24,26. We termed this translational coupling archiand improve current engineering practice. However, given cur- tecture a ‘bicistronic design’ (BCD) to acknowledge the major rent forward-engineering design capacities, if a specific protein difference from conventional ‘monocistronic designs’ (MCDs), expression level is required, then repeated design attempts must in which translation of coding sequences initiates from an SD be synthesized and tested experimentally, thereby often resulting site that does not overlap with other functional sequences 25,26. in combinatorial increases in required design attempts as system We found that, unlike SD motifs encoded within MCDs, those encoded within BCDs could initiate protein synthesis even if the complexity increases2,20. NATURE METHODS | VOL.10 NO.4 | APRIL 2013 | 355 npg MCD 5′ UTR variant GOI-36 gfp or rfp MCD2 MCD1 MCD5 MCD7 MCD6 MCD11 MCD10 MCD9 MCD13 MCD15 MCD12 0 MCD17 MCD19 MCD14 MCD18 MCD20 MCD21 MCD16 MCD23 MCD24 MCD8 –5.0 MCD22 0.93 0.65 1.05 0.91 1.60 2.59 1.19 1.19 1.85 1.01 1.57 1.00 8.10 1.53 2.00 1.50 3.52 3.23 0.94 2.52 0.95 0.51 Translationally coupled BCD:GOI junction 0.11 0.11 0.05 0.04 0.16 0.13 0.15 0.13 0.07 0.25 0.17 0.11 0.13 0.14 0.07 0.14 0.71 0.33 0.20 0.35 0.38 0.38 0. 7 0. 8 8 0. 6 8 0. 8 8 0. 8 8 0. 9 8 0. 8 8 0. 5 7 0. 3 8 0. 4 8 0. 1 8 0. 5 8 0. 4 8 0. 7 89 BCD2 BCD1 BCD5 BCD7 BCD6 BCD11 BCD10 BCD9 BCD13 BCD15 BCD12 BCD17 BCD19 BCD14 BCD18 BCD20 BCD21 BCD16 BCD23 BCD24 BCD8 BCD22 0. 4 0. 9 2 0. 1 2 0. 4 4 0. 4 5 0. 2 5 0. 6 4 0. 3 0 0. 3 5 0. 7 5 0. 4 4 0. 5 5 0. 2 5 0. 1 51 3.0 c MCD:GOI Standard +1 promoter:5′ UTR junction Variance Lacl-36-GFP AraC-36-GFP RFP-36-GFP RFP Cell-36-GFP Cell-36-RFP TetR-36-GFP TetR-Full-GFP PMK-36-GFP PMK-36-RFP PA-36-GFP PA-36-RFP GFP GFP-36-RFP Standard +1 promoter:5′ UTR junction gfp or Standard BCD GOI-36 rfp variant b Standard Ptrc* Average Spearman rank correlation (rho) Average Spearman rank correlation (rho) e Variance Standard Ptrc* Lacl-36-GFP AraC-36-GFP RFP-36-GFP RFP Cell-36-GFP Cell-36-RFP TetR-36-GFP TetR-Full-GFP PMK-36-GFP PMK-36-RFP PA-36-GFP PA-36-RFP GFP GFP-36-RFP a Fluorescence (mean centered, a.u., log2) f 16% d BCD:GOI 1.5% 10 BCD max 102 BCD min 101 MCD min 100 SD2 variants (strongest to weakest) BCD 98.4% coding sequence for the GOI contained a perfect reverse complement to the cognate SD site (Supplementary Fig. 1), implying that translation from SD1 disrupts mRNA structure spanning the junction between cistrons such that translation initiation from SD2 is restored. Precise and reliable translation initiation We then sought to establish whether the BCD could be generalized so as to initiate synthesis of many proteins across a wide range of translation initiation rates generated by varying the SD sequence to modify differential ribosome-binding affinities30. Though the significance of specific SD2 sequence elements has been recognized in a few naturally coupled cistrons22–24, there are no reports of engineering a library of SD2 variants to finetune expression of a downstream GOI. We hypothesized that, for a given SD1 sequence element, a wide range of translation initiation rates could be obtained within a BCD by varying the embedded SD2 sequence. We randomized an SD2 motif, preserving a 3-nucleotide (nt) consensus core, and obtained several hundred sequence-distinct clones encoding a ~600-fold range of reporter-protein expression (Fig. 1c, Supplementary Fig. 2 and Online Methods). From this BCD library, we chose 22 SD2 candidates of different strengths to test whether each retained its relative encoded strength when used to express sequence-distinct genes (Supplementary Table 1). Also, to directly compare the performance of BCDs to conventional MCDs, we used the same SD2 sequences in MCDs. We then assembled a test panel of 14 chimeric reporter GOIs by 356 | VOL.10 NO.4 | APRIL 2013 | NATURE METHODS 0.2 MCD max 3 Correlation coefficient (r) MCD 84% 4 0 –0.2 BCD MCD SD2 AUG ∆G 16S rRNA–SD2 pairing –0.4 –0.6 –0.8 –1.0 Lacl-36-GFP AraC-36-GFP RFP-36-GFP RFP Cell-36-GFP Cell-36-RFP TetR-36-GFP PMK-36-GFP PMK-36-RFP PA-36-GFP PA-36-RFP GFP GFP-36-RFP 10 2 1 5 7 6 11 10 9 13 15 12 17 19 14 18 20 21 16 23 24 8 22 Figure 2 | Standard translation initiation elements using a bicistronic design are reliably reusable. (a) Gene expression via a regularized mediumstrength promoter (Ptrc; asterisk indicates an absent operator sequence) and 22 monocistronic design (MCD) 5′ UTRs of varying expression strength. Eight GOIs coding for a total of 14 chimeric reporter fusions with either gfp or rfp (columns) are shown. The 14 chimeric reporter GOIs are encoded via the first 36 nt of the N-terminal coding sequences of lacI, araC, rfp, gfp, tetR and genes encoding putative cellulase (Cell), phosphomevalonate kinase (PMK) and penicillin acylase (PA) and via the full-length coding sequence of tetR (Online Methods). Variance in mean-centered log2 expression (left) from each MCD across all GOIs sequences (right) and average Spearman rank correlations (bottom) as given (Supplementary Fig. 8). a.u., arbitrary units. (b) The same SD sequences used in a encoded within bicistronic designs (BCDs). Rank orderings for a and b were established via data of b. Variance in mean-centered log2 expression from each BCD across all GOIs (right) and average Spearman rank correlations (bottom) as given (Supplementary Fig. 6). (c,d) Analysis of variance (Online Methods) in total protein synthesis levels realized using the MCDs (c) or BCDs (d). (e) Comparison of absolute GFP synthesis ranges produced using MCDs or BCDs across all tested GOIs. (f) Predicted hybridization free energies between 16S rRNA and SD sequences are better correlated to expression for BCDs than that for MCDs (Supplementary Figs. 11 and 12). Expression range for all GFP fusions © 2013 Nature America, Inc. All rights reserved. ARTICLES fusing the first 36 nt (a length thought sufficient to encompass effects of ribosome footprint and mRNA secondary-structure formation on translation initiation 31–33) from the coding sequences of eight transcription factors or enzymes in-frame to the second codon of a gene encoding GFP or RFP (Online Methods). For added controls, we included a chimeric reporter protein encoded by a full-length tetR coding sequence that is fused in-frame to gfp in addition to the full-length gfp and rfp reporter genes (Online Methods). RNA free-energy (∆G) predictions indicated that our GOI set was expected to form a range of stable mRNA secondary structures spanning BCD:GOI junctions (∆G from −7 to −24 kcal mol−1; Supplementary Fig. 3). We assembled two full combinatorial test libraries in which 22 MCDs or 22 BCDs were used to translate the 14 chimeric reporter GOIs (Online Methods, Supplementary Table 1). We quantified absolute and mean-normalized expression levels by measuring single-cell fluorescence from all 308 MCD:GOI and 308 BCD:GOI combinations (Online Methods, Fig. 2 and Supplementary Figs. 4–8). We observed, as expected, that the synthesis of proteins from conventional MCDs was highly sensitive to changes in the coding sequences of genes (~0.4 average Spearman rank correlation (rho) between any two GOIs; Fig. 2a and Supplementary Figs. 7 and 8). For example, in a direct comparison of absolute expression, MCD10 driving the lacI-36gfp fusion produced ~142-fold more fluorescence than MCD10 driving the araC-36-gfp fusion, whereas MCD24 driving the ARTICLES Standard Ptrc* or T7 promoter Standard BCD variant c b T7Standard promoter Translationally coupled BCD:GOI junction Standard +1 promoter:5′ UTR junction 2,000 1,800 1,600 1,400 1,200 1,000 800 600 400 200 0 gfp rho = 0.93 0 100 200 300 400 500 600 700 (Ptrc*:BCD) GFP fluorescence (a.u.) Standard Ptrc* Standard +1 promoter:5′ UTR junction Standard BCD variant WT-SD1 Null-SD1 gfp rho = 0.85 350 300 250 200 150 100 50 0 0 500 1,000 1,500 2,000 (T7:BCD) GFP fluorescence (a.u.) Rare or early stop codons in leader peptide of standard BCD variant gfp or rfp Translationally coupled BCD:GOI junction 700 WT-SD1-BCD Null-SD1-BCD 140 GFP RFP 600 WT l4F5R6 R4F5G6 L4F5R6 I4F5*6 WT l4F5R6 R4F5G6 L4F5R6 I4F5*6 WT l4F5R6 R4F5G6 L4F5R6 I4F5*6 GFP fluorescence (a.u.) WT l4F5R6 R4F5G6 L4F5R6 I4F5*6 2 1 7 6 5 10 11 12 13 19 17 14 18 15 20 21 8 23 16 24 22 GFP fluorescence (a.u.) 120 araC-36-gfp fusion produced ~32-fold 500 100 500 more fluorescence than MCD24 driving the 400 80 400 lacI-36-gfp fusion (Supplementary Fig. 7). 300 60 300 In contrast, we observed that the same 22 200 40 200 SD2 motifs, when used within BCDs, main100 20 100 tained their relative fluorescence regard0 0 0 less of the downstream GOI (Fig. 2b). For SD2 variants (strongest to weakest) example, BCD10 led to only ~1.5-fold BCD12 BCD19 BCD2 BCD6 more lacI-36-gfp than araC-36-gfp expression, which was achieved by both reducing MCD-mediated lacI-36-gfp overexpression (~63% decrease) and activities of different strength BCDs are best mapped to a relaincreasing araC-36-gfp underexpression (~34-fold increase), as tively simpler core SD2 sequence motif. calculated by comparing absolute MCD10- and BCD10-mediated We explored whether BCD performance is limited to particular expression levels (Supplementary Figs. 5 and 7). Within the transcription systems or specific internal sequences. First, we used BCDs, each SD2 reliably encoded a distinct translation initia- a consensus bacteriophage T7 promoter and polymerase to trantion rate across both a wide SD2 activity range and changing GOI scribe BCDs and GOIs. T7 RNA polymerase synthesizes mRNA contexts (average rho ≈ 0.9 between any two GOIs; Fig. 2b and at a rate up to about eightfold faster than native E. coli transcripSupplementary Figs. 5 and 6). Overall, the BCDs reduced varia- tion and translation rates and thus likely results in ribosome-free tion in gene expression levels arising from irregularities spanning 5′ mRNA before ribosome loading and translation initiation34, 5′ UTR:GOI junctions from 16% to 1.5% of the total dynamic potentially leading to changes in mRNA folding or processrange for gene expression (Fig. 2c,d and Supplementary Figs. 9 ing. We found that the T7 expression system increased average and 10). These improvements were achieved through the system- expression levels about fourfold, as expected, and the activities of atic increase of protein synthesis for 5′ UTR:GOI junctions that BCDs remained well correlated to those obtained with a medium encoded below-average synthesis levels within an MCD context strength E. coli promoter (rho ≈ 0.9; Fig. 3a). We confirmed that and the decrease of protein synthesis for 5′ UTR:GOI junctions the T7 transcription system did not significantly disrupt the reliencoding above-average levels within an MCD context (Fig. 2e ability of BCDs across changing GOI contexts (rho ≈ 0.9; Fig. 3b), and Supplementary Figs. 5 and 7). whereas the MCDs showed limited reliability in comparison We determined that an equilibrium thermodynamic model (rho ≈ 0.5; Supplementary Fig. 16). We also demonstrated that based solely on the predicted free energies of binding between 16S an active SD1 motif is required to enable reliable initiation at SD2 rRNA and SD2 sequences is well correlated with observed BCD- motifs of different strengths and to translate downstream GOIs mediated protein synthesis (BCD average Pearson correlation (Fig. 3c). Such results are in agreement with earlier studies on coefficient (r) ≈ −0.8 versus MCD average r ≈ −0.4; Fig. 2f and naturally coupled cistrons in which the significance of varying Supplementary Figs. 11 and 12), further suggesting that the BCD SD1 has been explored to a limited extent within the context of isolates translation initiation activity from variation in downstream a stronger and unchanging SD2 sequence21,24. We determined gene context. Composite free-energy calculations from a statistical that the introduction of rare codons into the leader cistron of thermodynamic model10 that considers intermolecular 16S rRNA the BCD consistently reduced expression levels without major and mRNA base-pairing as well as other sequence features were disruptions to the reliable performance of BCDs, and the addition less well correlated for BCDs but better correlated for MCDs (aver- of a stop codon to the leader cistron nearly eliminated expression age r ≈ −0.6 for both BCD- and MCD-directed protein synthesis; (Fig. 3d, Supplementary Fig. 17 and Online Methods). Finally, Supplementary Figs. 13–15), indicating that the encoded we designed 21 sequence-independent BCDs and confirmed 600 RFP fluorescence (a.u.) © 2013 Nature America, Inc. All rights reserved. Translationally coupled BCD:GOI junction Standard +1 promoter:5′ UTR junction Translationally coupled BCD:GOI junction gfp or rfp 400 d Standard Ptrc* 700 npg Standard BCD variant Standard +1 promoter:5′ UTR junction (T7:BCD) RFP fluorescence (a.u.) a (T7:BCD) GFP fluorescence (a.u.) Figure 3 | Bicistronic designs (BCDs) retain functional reliability with alternate transcription systems and different leader cistrons. (a) Correlated gene expression levels from BCDs with an E. coli Ptrc* promoter (x axis) or bacteriophage T7 (y axis) promoter and RNA polymerase. The asterisk indicates that the promoter has no operator sequence and hence is constitutive in expression. a.u., arbitrary units. (b) Correlated gene expression levels from a phage T7 transcription system but with two GOIs. (c) Rank-ordered GFP expression for BCDs (WT-SD1-BCD) compared to expression for those in which SD1 is disrupted (Null-SD1BCD, schematic). (d) Correlated expression levels from an E. coli promoter but with stop or rare codons inserted in the BCD leader cistron (schematic) across SD2 elements of different expression strengths (x axis, clustered groupings). Error bars, s.d. (n = 3). NATURE METHODS | VOL.10 NO.4 | APRIL 2013 | 357 ARTICLES 22 standard BCD variants c npg 0 P4 P13 P9 P12 P10 P11 P14 P2 P6 BCD2 BCD1 BCD7 BCD6 BCD5 BCD9 BCD11 BCD10 BCD12 BCD14 BCD13 BCD18 BCD15 BCD17 BCD19 BCD20 BCD21 BCD16 BCD23 BCD24 BCD8 BCD22 0.1% Promoter:BCD 3 1% BCD:GOI 2 1 42% BIOFAB BCDs 0 –1 –2 –3 –4 56% BIOFAB promoters Observed fluorescence across all GOIs (mean centered, a.u., log2) f e Standardized junctions R2 = 0.9 2 Irregular junctions R = 0.4 4 P3 BCD2 BCD1 BCD7 BCD6 BCD5 BCD9 BCD11 BCD10 BCD12 BCD14 BCD13 BCD18 BCD15 BCD17 BCD19 BCD20 BCD21 BCD16 BCD23 BCD24 BCD8 BCD22 –4.0 RFP fluorescence (mean centered, a.u., log2) © 2013 Nature America, Inc. All rights reserved. Fluorescence (mean centered, a.u., log2) 3.0 d RFP P4 P13 P9 P12 P10 P11 P14 P8 P2 P6 P1 P5 P7 P3 GFP P1 b Translationally coupled BCD:GOI junction P8 Standard +1 promoter:5′ UTR junction gfp or rfp P5 14 standard promoter variants P7 a 3 R2 = 0.9 2 1 0 –1 RFP PMK PA Cellulase TetR Lacl AraC Linear (RFP) –2 –3 –4 –5 –5 –4 –3 –2 –1 0 1 2 3 4 GFP fluorescence (mean centered, a.u., log2) –6 –5 –4 –3 –2 –1 0 1 2 3 Fluorescence predicted from GFP data (mean centered, a.u., log2) Figure 4 | Precise and reliable gene expression via standard transcription-control and translation-initiation elements. (a) Standard promoters produce mRNA from a common +1 nucleotide position. Translation initiation is entirely encoded by a separate and independent bicistronic design (BCD). (b,c) Mean-centered log2 expression for green (b) and red (c) fluorescent proteins via a full combinatorial library of standardized promoters (14) and BCDs (22). a.u., arbitrary units. (d) Direct correlation of expression from b and c (red circles) against those generated by use of irregular transcriptionand translation-control elements (blue diamonds, data from ref. 15). (e) Factorial analysis of variance for mean-normalized expression from the standard promoter and BCD combinatorial library, with element- and junction-specific contributions to total expression as noted (Online Methods). (f) Correlation of observed versus predicted protein expression for sequence-distinct GOIs, as predicted using expression data from a single GOI (GFP) to estimate activity scores for promoters and BCDs adhering to method for forward-engineering gene expression developed here. Error bars: y axis, s.d. (n = 3); x axis, deviations in predicted values derived from the cross-validated model (Online Methods). Cellulase, putative cellulase; PMK, phosphomevalonate kinase; PA, penicillin acylase. reliable translation initiation across sequence-distinct GOIs (Supplementary Figs. 18 and 19 and Supplementary Table 1). Functional composition and reliable gene expression Building from prior promoter engineering projects35,36 and transcription initiation studies37, we chose to regularize promoter:5′ UTR junctions by using promoters that encode a common +1 mRNA start, thereby hoping to avoid complicating requirements such as post-transcriptional mRNA processing16,17. We developed a library of variable-strength constitutive promoters with consistent putative mRNA start sites that collectively encoded an ~900-fold dynamic range of expressed reporter 358 | VOL.10 NO.4 | APRIL 2013 | NATURE METHODS levels (Fig. 1b and Supplementary Figs. 20–22). We selected 14 sequence- and activity-distinct promoters for further study (Supplementary Table 1 and Supplementary Figs. 22 and 23). We assembled each promoter with all 22 BCDs, and we tested expression using two sequence-distinct GOIs (gfp and rfp; Fig. 4a and Online Methods). We found that the individual rank orderings for promoters and BCDs and resulting GFP or RFP expression levels were systematically maintained and well correlated across a 1,000-fold range for observed protein fluorescence (coefficient of determination (R2) = 0.9; Fig. 4b–d and Supplementary Figs. 24 and 25). An analysis of variance of observed fluorescence indicated that ARTICLES npg © 2013 Nature America, Inc. All rights reserved. 98% of the total dynamic protein expression range was due to encoded differences in the intrinsic activities of individual promoters and BCDs, and not to unknown effects arising from the reuse of these expression control elements in novel combinations (Fig. 4e, Supplementary Fig. 26 and Online Methods). Moreover, a quantitative model for gene expression based only on observed GFP fluorescence levels allowed us to predict observed fluorescence for RFP and other GOIs (R2 = 0.9; Fig. 4f and Online Methods). We also tested the performance of one BCD with variable-strength promoters regulated by one of two popular transcription repressors (Supplementary Fig. 27 and Supplementary Table 1). These results confirmed that BCDs can be used in conjunction with inducible promoters. DISCUSSION Users of the genetic elements described above should achieve an ~93% chance to obtain expected GOI-normalized relative expression for a given gene to within twofold of a target level, which represents an ~87% reduction in forward-engineering expression error compared to the error rates of previously best available methods10 (Online Methods). Our results illustrate that it is possible to overcome many of the challenges thought to limit the engineering of synthetic biological systems via standard biological parts: (i) lack of systematic part characterization, (ii) incompatibility of performance within part collections, (iii) variable part performance across changing genetic contexts and (iv) lack of precise and predictable behavior when used38. However, just as one early, reliable screw-thread standard39 did not itself enable all of mechanical engineering, much work remains in, for example, expanding EOU architectures to incorporate and validate additional genetic functions in E. coli and across many organisms. In establishing reliable promoter:5′ UTR and 5′ UTR:GOI junctions, we used two distinct strategies. The promoter:5′ UTR junction was simply regularized by ensuring that promoters do not contribute mRNA sequence to a standardized 5′ UTR sequence, thereby providing simple functional decoupling. However, rendering a standard and predictably functioning 5′ UTR:GOI junction required a genetic layout in which genetic elements were nested, overlapping and functionally coupled as is common to many natural genetic systems (microbes, phages, viruses and some eukaryotes)22,23,40,41. In contrast, designers of early and ongoing synthetic biology ‘refactoring’ projects have purposefully removed such complexity to enhance physical layout simplicity and presumed functional independence for individual genetic elements42–44. We suspect that natural genetic systems might provide further lessons for how more complicated physical couplings can encode simpler and more reliable functional composition schemes. The BCD could likely be used in combination with other gene expression regulatory elements and designs45–47 to engineer synthetic polycistronic expression cassettes48 or to reduce library sizes in directed-evolution efforts by allowing rational choice of a few sequences that cover a desired expression parameter space49. Sequence-distinct BCDs are available for engineering multigene systems if genetic instability arising from direct repeats of DNA elements were undesirable (Online Methods, Supplementary Figs. 18 and 19 and Supplementary Table 1). Though we did not observe growth defects or other deleterious phenotypes due to expression of BCD-encoded leader peptides, further studies should consider potential impacts arising from their repeated overexpression. Finally, although research to understand translation initiation in MCD contexts is relatively well established50, direct observation of how ribosomes reinitiate translation and overcome inhibitory mRNA structures in BCDs, in polycistronic operons and across varying coding sequence contexts would be helpful. DNA sequence data and functional information detailing the performance of the standard promoters and BCDs established here have been contributed to the public domain and are freely available for use via human- and machine-readable interfaces (http://biofab.org/data/). Potential variation in specific sequencedistinct protein levels due to mechanisms that act downstream of translation initiation must still be accounted for to obtain absolute target protein concentrations19 (Online Methods). Given an expected 93% reliability rate (7% failure rate) for precision expression engineering, designers of heterologous genetic systems and tool developers working to support the engineering design process2,6,7,43 might further explore how to best practically enable a priori quantitative specification of desired protein synthesis levels within systems encoding up to about ten genes. METHODS Methods and any associated references are available in the online version of the paper. Note: Supplementary information is available in the online version of the paper. ACKNOWLEDGMENTS We thank C. Smolke for discussions. We acknowledge support from a US National Science Foundation grant to the BIOFAB (EEC 0946510) and unrestricted gifts from Genencor, Agilent and DSM. J.C.G. acknowledges financial support from the Portuguese Fundação para a Ciência e a Tecnologia (FCT) (SFRH/ BD/47819/2008); G.C. acknowledges the Human Frontier Science Program (LT000873/2011-l) and Bettencourt Schueller Foundation; A.P.A. and D.E. acknowledge the Synthetic Biology Engineering Research Center under National Science Foundation grant 04-570/0540879. This work was conducted at the Joint BioEnergy Institute supported by the Office of Science, Office of Biological and Environmental Research, US Department of Energy, contract DE-AC02-05CH11231. AUTHOR CONTRIBUTIONS V.K.M., A.P.A. and D.E. conceived the study and designed the experiments. V.K.M., C.L., Q.-A.M., A.B.T. and M.P. performed the experiments. V.K.M., J.C.G., G.C., M.J.C., A.P.A. and D.E. analyzed the data. V.K.M., J.C.G., G.C., J.D.K., A.P.A. and D.E. wrote the manuscript. All authors discussed and commented on the manuscript. COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests. Reprints and permissions information is available online at http://www.nature. com/reprints/index.html. 1. 2. 3. 4. 5. 6. 7. 8. Endy, D. Foundations for engineering biology. Nature 438, 449–453 (2005). Purnick, P.E. & Weiss, R. The second wave of synthetic biology: from modules to systems. Nat. Rev. Mol. Cell Biol. 10, 410–422 (2009). Ellis, T., Adie, T. & Baldwin, G.S. DNA assembly for synthetic biology: from parts to pathways and beyond. Integr. Biol. (Camb.) 3, 109–118 (2011). Carr, P.A. & Church, G.M. Genome engineering. Nat. Biotechnol. 27, 1151–1162 (2009). Gibson, D.G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010). Lu, T.K., Khalil, A.S. & Collins, J.J. Next-generation synthetic gene networks. Nat. Biotechnol. 27, 1139–1150 (2009). Keasling, J.D. Manufacturing molecules through metabolic engineering. Science 330, 1355–1358 (2010). Cardinale, S. & Arkin, A.P. Contextualizing context for synthetic biology— identifying causes of failure of synthetic biological systems. Biotechnol. J. 7, 856–866 (2012). NATURE METHODS | VOL.10 NO.4 | APRIL 2013 | 359 ARTICLES 9. 10. 11. 12. 13. 14. 15. 16. 17. © 2013 Nature America, Inc. All rights reserved. 18. 19. 20. 21. 22. 23. 24. 25. npg 26. 27. 28. Kittleson, J.T., Wu, G.C. & Anderson, J.C. Successes and failures in modular genetic engineering. Curr. Opin. Chem. Biol. 16, 329–336 (2012). Salis, H.M., Mirsky, E.A. & Voigt, C.A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009). Cambray, G., Mutalik, V.K. & Arkin, A.P. Toward rational design of bacterial genomes. Curr. Opin. Microbiol. 14, 624–630 (2011). Canton, B., Labno, A. & Endy, D. Refinement and standardization of synthetic biological parts and devices. Nat. Biotechnol. 26, 787–793 (2008). Rosenfeld, N., Young, J.W., Alon, U., Swain, P.S. & Elowitz, M.B. Accurate prediction of gene feedback circuit behavior from component properties. Mol. Syst. Biol. 3, 143 (2007). Smolke, C.D. Building outside of the box: iGEM and the BioBricks Foundation. Nat. Biotechnol. 27, 1099–1102 (2009). Mutalik, V.K. et al. Quantitative estimation of activity and quality for collections of functional genetic elements. Nat. Methods advance online publication, doi:10.1038/nmeth.2403 (10 March 2013). Lou, C., Stanton, B., Chen, Y.J., Munsky, B. & Voigt, C.A. Ribozyme-based insulator parts buffer synthetic circuits from genetic context. Nat. Biotechnol. 30, 1137–1142 (2012). Qi, L., Haurwitz, R.E., Shao, W., Doudna, J.A. & Arkin, A.P. RNA processing enables predictable programming of gene expression. Nat. Biotechnol. 30, 1002–1006 (2012). Dreyfus, M. What constitutes the signal for the initiation of protein synthesis on Escherichia coli mRNAs? J. Mol. Biol. 204, 79–94 (1988). Welch, M., Villalobos, A., Gustafsson, C. & Minshull, J. You’re one in a googol: optimizing genes for protein expression. J. R. Soc. Interface 6 (suppl. 4), S467–S476 (2009). Bonnet, J., Subsoontorn, P. & Endy, D. Rewritable digital data storage in live cells via engineered control of recombination directionality. Proc. Natl. Acad. Sci. USA 109, 8884–8889 (2012). Spanjaard, R.A. & Vanduin, J. Translational reinitiation in the presence and absence of a Shine and Dalgarno sequence. Nucleic Acids Res. 17, 5501–5507 (1989). Oppenheim, D.S. & Yanofsky, C. Translational coupling during expression of the tryptophan operon of Escherichia coli. Genetics 95, 785–795 (1980). Schümperli, D., McKenney, K., Sobieski, D.A. & Rosenberg, M. Translational coupling at an intercistronic boundary of the Escherichia coli galactose operon. Cell 30, 865–871 (1982). Das, A. & Yanofsky, C. A ribosome binding site sequence is necessary for efficient expression of the distal gene of a translationally-coupled gene pair. Nucleic Acids Res. 12, 4757–4768 (1984). Schoner, B.E., Belagaje, R.M. & Schoner, R.G. Translation of a synthetic twocistron mRNA in Escherichia coli. Proc. Natl. Acad. Sci. USA 83, 8506–8510 (1986). Makoff, A.J. & Smallwood, A.E. The use of two-cistron constructions in improving the expression of a heterologous gene in E. coli. Nucleic Acids Res. 18, 1711–1718 (1990). Mendez-Perez, D., Gunasekaran, S., Orler, V.J. & Pfleger, B.F. A translationcoupling DNA cassette for monitoring protein translation in Escherichia coli. Metab. Eng. 14, 298–305 (2012). Takyar, S., Hickerson, R.P. & Noller, H.F. mRNA helicase activity of the ribosome. Cell 120, 49–58 (2005). 360 | VOL.10 NO.4 | APRIL 2013 | NATURE METHODS 29. Qu, X. et al. The ribosome uses two active mechanisms to unwind messenger RNA during translation. Nature 475, 118–121 (2011). 30. Barrick, D. et al. Quantitative analysis of ribosome binding sites in E.coli. Nucleic Acids Res. 22, 1287–1295 (1994). 31. Steitz, J.A. Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature 224, 957–964 (1969). 32. Yusupova, G.Z., Yusupov, M.M., Cate, J.H. & Noller, H.F. The path of messenger RNA through the ribosome. Cell 106, 233–241 (2001). 33. Kudla, G., Murray, A.W., Tollervey, D. & Plotkin, J.B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009). 34. Iost, I., Guillerez, J. & Dreyfus, M. Bacteriophage T7 RNA polymerase travels far ahead of ribosomes in vivo. J. Bacteriol. 174, 619–622 (1992). 35. Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102, 12678–12683 (2005). 36. Cox, R.S. III, Surette, M.G. & Elowitz, M.B. Programming gene expression with combinatorial promoters. Mol. Syst. Biol. 3, 145 (2007). 37. Hook-Barnard, I.G. & Hinton, D.M. Transcription initiation by mix and match elements: flexibility for polymerase binding to bacterial promoters. Gene Regul. Syst. Bio. 1, 275–293 (2007). 38. Kwok, R. Five hard truths for synthetic biology. Nature 463, 288–290 (2010). 39. Sellers, W. A system of screw threads and nuts. J. Franklin Inst. 77, 344–350 (1864). 40. Kozak, M. Initiation of translation in prokaryotes and eukaryotes. Gene 234, 187–208 (1999). 41. Scherbakov, D.V. & Garber, M.B. Overlapping genes in bacterial and phage genomes. Mol. Biol. 34, 485–495 (2000). 42. Chan, L.Y., Kosuri, S. & Endy, D. Refactoring bacteriophage T7. Mol. Syst. Biol. 1, 2005.0018 (2005). 43. Temme, K., Zhao, D. & Voigt, C.A. Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc. Natl. Acad. Sci. USA 109, 7085–7090 (2012). 44. Jaschke, P.R., Lieberman, E.K., Rodriguez, J., Sierra, A. & Endy, D. A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast. Virology (2012). 45. Mutalik, V.K., Qi, L., Guimaraes, J.C., Lucks, J.B. & Arkin, A.P. Rationally designed families of orthogonal RNA regulators of translation. Nat. Chem. Biol. 8, 447–454, 434, 278–284 (2012). 46. Liu, C.C., Qi, L., Yanofsky, C. & Arkin, A.P. Regulation of transcription by unnatural amino acids. Nat. Biotechnol. 29, 164–168 (2011). 47. Chang, A.L., Wolf, J.J. & Smolke, C.D. Synthetic RNA switches as a tool for temporal and spatial control over gene expression. Curr. Opin. Biotechnol. 23, 679–688 (2012). 48. Pfleger, B.F., Pitera, D.J., Smolke, C.D. & Keasling, J.D. Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nat. Biotechnol. 24, 1027–1032 (2006). 49. Cobb, R.E., Si, T. & Zhao, H. Directed evolution: an evolving and enabling synthetic biology tool. Curr. Opin. Chem. Biol. 16, 285–291 (2012). 50. Aitken, C.E., Petrov, A. & Puglisi, J.D. Single ribosome dynamics and the mechanism of translation. Annu. Rev. Biophys. 39, 491–513 (2010). npg © 2013 Nature America, Inc. All rights reserved. ONLINE METHODS Bacterial strains, plasmids and growth conditions. Strains and plasmids used in this study are listed in Supplementary Data 1, and oligonucleotides are listed in Supplementary Data 2. Detailed information on part design, plasmid maps and corresponding experimental data for each construct are available via http://biofab.org/data/. All plasmid manipulations were performed using standard molecular biology techniques 51. All enzymes used for plasmid manipulations were obtained from New England Biolabs (NEB), and oligonucleotides were received from Integrated DNA Technologies (IDT). E. coli strain BW25113 was used for plasmid construction purposes and for fluorescence measurements (unless specified). All strains were grown in MOPS EZ Rich Medium (Teknova) supplemented with 50 µg/ml kanamycin (kan) at 37 °C, shaken at 900 r.p.m. All of the experiments were conducted in triplicate (biological replicates). Plasmid library construction. The randomized bicistronic design (BCD) library, randomized promoter library (RPL), modular promoter library (MPL), combinatorial monocistronic design (MCD)–gene of interest (GOI) library, BCD-GOI library and promoter-BCD library were assembled on medium-copy vectors derived from pFAB217 (with the reporter sfgfp52, termed gfp hereafter) and pFAB216 (with the reporter mrfp1 (ref. 53), termed rfp hereafter). Both pFAB217 and pFAB216 were derived from the same backbone vector pBbA2k-RFP54 (p15A replication origin, kan resistance) by replacing the Ptet promoter and tetR gene with a defined sequence context including the Ptrc* promoter and Bujard RBS region55 (for further details on the neighboring sequence context, see “Design of an expression operating unit” below and the plasmid maps at http://biofab.org/data/) preceding either the reporter gene gfp (in pFAB217) or rfp (in pFAB216) (Supplementary Figs. 28–31). All PCR amplifications were carried out with high-fidelity Phusion DNA polymerase (NEB, manufacturer’s instructions). The primers used for vector amplification or for preparing an annealed product were phosphorylated using polynucleotide kinase in T4 DNA ligase buffer at 37 °C for 1 h and heat inactivated at 65 °C for 30 min. Design of an expression operating unit. Both vectors pFAB217 and pFAB216 used to construct various backbone vectors (Supplementary Figs. 28–31) for the combinatorial libraries presented in this work have a defined microcontext, which we term as an ‘expression operating unit’ (EOU). The EOU comprises a minimal unit of genetic expression (expression cassette) and an additional flanking region that may play a role of insulation to EOU parts (Supplementary Table 1). The minimal unit of genetic expression is made up of a promoter with a defined transcription start site (Ptrc*, a constitutive promoter, −35 to +1), 5′ UTR55, translation initiation element (BCD context, this work), a proteincoding region (for example, a reporter such as GFP or RFP) and a terminator (3′ UTR, dbi terminator54). To provide functional insulation to the EOU from cryptic promoters, RBS-like regions, intrinsic terminators and AT-rich UP element–like features, we have introduced an additional upstream region composed of three-frame stop codons, an intrinsic terminator56, a transcriptional pause site57 and an insulator region58 doi:10.1038/nmeth.2404 (see Supplementary Table 1 for the entire EOU sequence). Here, the upstream and downstream terminators are designed and positioned to reduce the interactions between the EOU and the immediate genetic context. The EOU thus provides a standardized and well-defined context that insulates functional parts within the EOU from neighboring genetic contexts and provides a more reliable platform for characterization of parts. The use of standardized context thus helps in understanding and describing part performance relative to that of other parts. To facilitate joining of multiple EOUs (for example, to yield an expression operating system), these vectors have EcoRI-BglII sites upstream of the EOU and a XhoI-BamHI site downstream of the stop codon of a reporter, a configuration based on Bgl-Brick design54. The contribution and significance of the EOU design in insulating the functionality and functional composition of parts needs systematic characterization studies and has not been explored further. Design and construction of the randomized BCD library. To generate the randomized BCD library, we first made the plasmid pFAB866 by amplifying the backbone vector pFAB217 (encoding the reporter GFP) using phosphorylated primers oFAB470 and oFAB472. These primers replace the 5′ UTR of pFAB217 with a bicistronic design with a translationally coupled second cistron encoding reporter GFP (Supplementary Fig. 2). The PCRamplified vector backbone products were purified using Qiagen PCR purification kits, digested with DpnI (to remove the intact backbone vector), self-ligated using T4 DNA ligase enzyme and transformed into chemically competent BW25113 E. coli cells. Positive clones were then confirmed by sequencing, stored as glycerol stocks and used for preparing plasmid minipreps for further BCD library construction purposes. For generating the randomized BCD library, pFAB866 was amplified using phosphorylated primers oFAB785 and oFAB786. The forward primer oFAB785 creates variants of the second SD of bicistronic design such that 3 nt upstream and downstream of the GGA motif of SD2 are randomized (NNNGGANNN; Supplementary Fig. 2). The PCR products were purified using Qiagen PCR purification kit, digested with DpnI, ligated using T4 DNA ligase, transformed into chemically competent BW25113 E. coli cells and grown overnight in selective LB agar medium (with kan). The next day, about 200 colonies were picked, and positive clones were confirmed by sequencing. We discarded mutants with STOP codons within cistron 1 in addition to deletion and insertion mutants within the leader peptide library to keep intact the −1 frame shift comprising the coupled BCD core. Positive clones were stored as glycerol stocks and assayed for bulk fluorescence on the plate reader (below). Design and construction of the synthetic constitutive promoter library. We used two distinct approaches to engineer a diverse library of constitutive promoters for engineering gene expression in E. coli. In the first approach, we randomized the −10 and −35 motifs of a strong Ptrc* promoter (the asterisk indicates a promoter with no operator sites downstream of the transcription start site, −35 to +1 of the promoter) to generate an RPL, whereas in the second approach, an MPL was created by the combinatorial assembly of three modules of five well-characterized promoters of different strengths (Fig. 1 and Supplementary Figs. 20–22). The sequences and plasmid maps for RPL and MPL members are listed with their corresponding promoter strengths at http://biofab.org/data/. NATURE METHODS © 2013 Nature America, Inc. All rights reserved. npg RPL. The RPL was created by randomizing the −10 motif (NTANNNTN) or the −35 motif (NTTNNNN) or both the −10 and −35 motifs of a strong Ptrc* constitutive promoter (TTGA CAATTAATCATCCGGCTCGTATAATGTGTGGA; consensus motifs are italicized, and bold ‘A’ is the transcription start site54,59) (Supplementary Fig. 20). This randomization strategy retains the most conserved and functionally important bases37,60–62, with the expectation that it may alleviate the bias toward generating too many weak promoters. To generate the RPL, we used plasmid pFAB217, which comprises the Ptrc* promoter and the Bujard RBS (ACAATTCATTA AAGAGGAGAAAGGTACC)55 to drive the expression of the GFP reporter within EOU architecture. To randomize the −10 motif, we amplified pFAB217 using phosphorylated primers oFAB178 and oFAB177. The forward primer oFAB178 creates variants of the −10 motif (NTANNNTN), and the reverse primer oFAB177 retains the consensus −35 motif of the Ptrc* promoter. To randomize the −35, the plasmid pFAB217 was amplified using phosphorylated primers oFAB176 and oFAB179. The forward primer oFAB176 retains the consensus −10 motif of the Ptrc* promoter, and the reverse primer oFAB179 creates variants of the −35 motif (NTTNNNN). The phosphorylated primers oFAB178 and oFAB179 were used to randomize both the –10 motif and the –35 motif. The PCR products were purified using Qiagen PCR purification kit and digested with DpnI to remove the intact backbone vector. PCR products were then ligated using T4 DNA ligase, transformed into chemically competent BW25113 E. coli cells and grown overnight in selective LB agar medium (with kan) on three large QTray plates. The next day, about 2,000 colonies were picked from all three transformation plates (in total) and grown overnight (~16 h, at 37 °C, 900 r.p.m. on an Inforys shaker) in 500 µl MOPS EZ Rich +kan medium in 96–deep-well plates sealed with a breathable membrane. The following day, 250 µl of overnight culture was stored as a presequencing glycerol stock (250 µl overnight culture + 250 µl of 30% sterile glycerol), and the remaining 150 µl of the overnight culture was subjected to the microplate end-point assay to measure growth (optical density, OD600 nm) and fluorescence (relative fluorescence units or RFU) at an excitation of 481 nm and emission of 507 nm for GFP in a multimode microplate reader-incubator-shaker Synergy-2 (BioTek Instruments). With these preliminary promoter activity results, all promoters were grouped into ten bins of different strengths, and about 300 overnight cultures (showing a wide range of activity) were sent for sequencing. The sequencing was performed on PCR product (using primers soFAB1 and soFAB8) comprising the cloning region using primers soFAB36 and soFAB37. Constructs with mutations in −10 and/or −35 motifs and single or double base deletion or addition in the spacer region were considered as positive clones, and constructs with mutations elsewhere on the plasmid were discarded from the library. The positive clones were stored as glycerol stocks and assayed for growth, bulk and single-cell fluorescence (see below). MPL. The MPL was engineered by combinatorial assembly of three modules originating from five promoters of various strengths (with a known +1 transcription start site) to yield a total of 125 modular promoters. Here, one of the main objectives was to construct a synthetic promoter library made up of modules and key elements from different-strength promoters such that we obtain insight NATURE METHODS on how variation of different promoter elements (UP elements, −35 motif, spacer, −10 motif, discriminator region downstream of −10 motif to +1) affects promoter strength37,63–65. We used the strong T7A1 promoter37, Ptrc promoter54,59 and T5N25 promoter66 and the weaker NM535 series67 and U56D46 version of the pRM promoter series68 as parental sequences for the MPL (Supplementary Fig. 21). The sequence of these five promoters was divided into three modules comprising (i) UP element and −35 motif, (ii) spacer region and (iii) −10 and spacer of −10 to +1. We then determined the promoter sequence of all 125 sequence combinations using an in-house–written Python script. We used a modified Golden Gate method15,69 to assemble the promoters, using annealed oligonucleotides, into a restrictiondigested plasmid. To build the MPL, we first made plasmid pFAB517 by amplifying the backbone vector pFAB217 (encoding the reporter GFP) using phosphorylated primers oFAB124 and oFAB125. These primers replace the promoter Ptrc* in pFAB217 with type II restriction enzyme BsaI recognition sites on either strand of the vector such that after a post-restriction digestion of the ligated PCR products, we obtain appropriately compatible overhangs to clone promoter inserts. The PCR-amplified vector backbone products were purified using Qiagen PCR purification kits, digested with DpnI, self-ligated using T4 DNA ligase enzyme and transformed into chemically competent BW25113 E. coli cells. Positive clones were then confirmed by sequencing (using primers soFAB1 and soFAB8) and stored as glycerol stocks. Plasmid minipreps were prepared and used for further MPL construction purposes. Minipreps of these backbone vectors were then digested with BsaI enzyme (37 °C, overnight (> 16 h)), dephosphorylated, gel-purified (Qiagen) and used for assembling the promoter library. To prepare the promoter elements as inserts for building the MPL, we designed 125 forward and 125 reverse oligonucleotides such that they can be annealed together and their overhangs are compatible with the restriction-digested backbone vector pFAB517. The forward and reverse oligonucleotides used for annealing the promoter parts are listed in Supplementary Data 2. For further details on assembling annealed parts in restriction digested vector see “Assembling the combinatorial libraries” (below). The positive clones were stored as glycerol stocks and assayed for growth, bulk and single-cell fluorescence (Supplementary Fig. 21; below). Combinatorial libraries. For constructing the BCD:GOI, MCD: GOI and promoter:BCD combinatorial libraries, a modified Golden Gate method15,69 was used to comply with the assembly of smaller parts or inserts (promoter, MCD and BCD). This type II endonuclease–mediated assembly method allows a scare-less and multipart assembly. Construction of backbone vectors. To prepare the backbone vector for cloning of combinatorial libraries, phosphorylated forward and reverse oligonucleotides were used to PCR-amplify vectors pFAB217 and pFAB216. The forward and reverse primers introduce type II restriction enzyme BsaI recognition sites on either strand of the vector such that after a post-restriction digestion of the ligated PCR products, we obtain appropriately compatible overhangs to clone inserts (promoter, BCD, MCD, GOI or linkers). The PCR-amplified and purified vector products were then ligated, doi:10.1038/nmeth.2404 © 2013 Nature America, Inc. All rights reserved. npg transformed into chemically competent E. coli DH10B cells and grown overnight on selective medium. Positive clones were then confirmed by PCR-amplifying and sequencing of the ligated region using sequencing primers soFAB1 and soFAB8. The overnight cultures of positive clones were stored in glycerol stocks as explained in the above section. The minipreps of these backbone vectors were then digested with BsaI enzyme (37 °C, overnight (>16 h)), dephosphorylated, gel-purified (Qiagen) and used for assembling combinatorial libraries. The six main backbone vectors pFAB870, pFAB871, pFAB1177, pFAB1178, pFAB1781 and pFAB1782 were constructed as described below for building combinatorial libraries reported in this work (Supplementary Fig. 28). The backbone vector pFAB870 was constructed by amplifying pFAB217 using phosphorylated oFAB625 and oFAB584 primers and ligating the PCR product. The vector pFAB870 was used for cloning the combinatorial library of BCD and various GOI (either 36-nt or full-length) contexts fused to GFP. The backbone vector pFAB871 was constructed by amplifying pFAB216 using phosphorylated oFAB626 and oFAB584 primers and ligating the PCR product. The vector pFAB871 was used for cloning the combinatorial library of BCD and various GOI (either 36-nt or full-length) contexts fused to RFP. The backbone vector pFAB1177 was constructed by amplifying pFAB217 using phosphorylated oFAB950 and oFAB584 primers and ligating the PCR product. The vector pFAB1177 was used for cloning the combinatorial library of BCD fused to GFP. The backbone vector pFAB1178 was constructed by amplifying pFAB216 using phosphorylated oFAB951 and oFAB584 primers and ligating the PCR product. The vector pFAB1178 was used for cloning the combinatorial library of BCD fused to RFP. The backbone vector pFAB1782 was constructed by amplifying pFAB217 using phosphorylated oFAB950 and oFAB125 primers and ligating the PCR product. The vector pFAB1782 was used for cloning the combinatorial library of promoters and BCDs translationally fused to GFP. The backbone vector pFAB1781 was constructed by amplifying pFAB216 using phosphorylated oFAB951 and oFAB125 primers and ligating the PCR product. The vector pFAB1781 was used for cloning the combinatorial library of promoters and BCDs translationally fused to RFP. Preparation of inserts for constructing combinatorial libraries. To prepare the basic transcription and translation elements as inserts for building combinatorial libraries, we first phosphorylated and then annealed the forward and reverse oligonucleotides (by mixing 5 µl of 100 µM of forward and reverse primers with 90 µl of sterile water, incubating at 95 °C for 3 min and cooling at room temperature for 30 min). These annealed inserts were then diluted with sterile water such that the concentration was equivalent to that of the digested and purified vector. The sequences for a subset of all BCD variants, MCD variants, constitutive and inducible promoters and GOI regions and for a linker region and an EOU sequence are listed in Supplementary Table 1. Additional plasmid sequence and activity details are presented at http://biofab.org/data/. The forward and reverse oligonucleotides used for annealing the parts are listed in Supplementary Data 2. Constitutive and inducible promoters. The 14 constitutive promoters used in promoter:BCD combinatorial library were chosen doi:10.1038/nmeth.2404 from a collection of synthetic constitutive promoters (see above: “Design and construction of the synthetic constitutive promoter library”). These promoters are variable in length (though they maintain a defined putative +1 mRNA start site) and have a wide range of promoter activities. To make the combinatorial assembly of these promoter parts with BCD parts easy to scale, we chose the promoters with the same spacer region between the −10 motif and the putative transcription start site (all promoter sequences used here are given in Supplementary Table 1). In addition to 14 constitutive promoters, a consensus 23-base-pair phage T7 promoter (TAATACGACTCACTATAGGGAGA) was chosen to test whether BCDs retain their functional reliability with T7 RNA Polymerase. To test the functional reliability of BCDs with regulated promoters, we chose constitutive promoters of different strengths from the promoter libraries and replaced the promoter spacer region between the −35 and −10 motifs with LacI or TetR operator sequences55 (Supplementary Table 1). Performance of one BCD (BCD2, apFAB682) with ten LacI- and nine TetR-regulated different-strength promoters (Supplementary Table 1) is shown in Supplementary Figure 27. These results demonstrate that inducible promoters retain their function when used in combination with BCD elements. Bicistronic designs. Twenty-two BCDs having a wide dynamic range of translation initiation activity (used in the BCD: GOI combinatorial library and promoter:BCD combinatorial library) were all derived from the randomized BCD library presented in Supplementary Figure 2 and are given in detail via Supplementary Table 1. Because all BCDs used in this work are ~80 nt in length, for easy handling, lower cost and improved quality of oligo synthesis, we decided to separate the BCDs into two parts. Part 1 is the invariable region of the BCD (surrounding RBS1), and part 2 is the variable region of BCD (surrounding RBS2). This design permits the use of the same part 1 for all assemblies in BCD:GOI combinations and P:BCD combinations except for a few special control cases, which use a different part 1 for assembling combinatorial libraries with the 22 sequence- and activity-distinct part 2 components (Supplementary Data 2). These are for (i) BCD:GOI and P:BCD combinatorial libraries (part 1 oligos oFAB979 and oFAB980); (ii) BCDs with early stop codons in the first cistron—in this case, we replaced the GUA6 (valine) codon, a sixth codon of the first cistron with a UAA stop codon (part 1 oligos oFAB1638 and oFAB1639); (iii) BCDs with rare codons in the first cistron—three part 1 variants were designed by inserting different rare codons in the first cistron: (a) AGG6 (arginine codon) replacing GUA6 (valine) codon, a sixth codon of the first cistron (part 1 oligos oFAB1632 and oFAB1635); (b) AGG4 (arginine codon) replacing ATT4 (isoleucine) codon, a fourth codon of the first cistron and GGA6 (glycine codon) replacing GUA6 (valine) codon, a sixth codon of the first cistron (part 1 oligos oFAB1633 and oFAB1636); and (c) CGG6 (arginine codon) replacing GUA6 (valine) codon, a sixth codon of first cistron, and CTA4 (leucine codon) replacing ATT4 (isoleucine) codon, a fourth codon of the first cistron (part 1 oligos oFAB1634 and oFAB1637); (iv) BCD backbones with an inactive first SD (Null-SD1) motif (part 1 oligos oFAB981 and oFAB982)—to inactivate the SD site upstream of the first cistron, we replaced the native AAAGGAGAU motif with AACCUCCAU; NATURE METHODS and (v) promoter T7:BCD combinations (part 1 oligos oFAB1361 and oFAB980)—as the sequence around the T7 transcription start site is different from the synthetic promoters used in this work, we designed compatible part1 for cloning part 2 BCDs. npg © 2013 Nature America, Inc. All rights reserved. Monocistronic designs. Twenty-two MCDs were assembled by annealing phosphorylated forward and reverse oligonucleotides (Supplementary Data 2). These MCDs have the same context around the RBS2 (that is, SD2) region as BCDs and yield a direct comparison of translation initiation around this RBS in the absence of translation from upstream RBS1 (that is, SD1). Genes of interest. To test the reliability of BCDs, as compared to the standard 5′ UTRs (that is, MCDs), in initiating the translation of a sequence-independent coding region, we chose eight sequence-independent GOIs (Supplementary Fig. 3). These include lacI (EG10525, E. coli K-12), araC (EG10054, E. coli K-12), gfp52, rfp53, tetR70 and a penicillin (cephalosporin) acylase gene (M18278)71 from Pseudomonas sp. strain SE83, a codonoptimized putative cellulase (AAY81158)72 gene from Sulfolobus acidocaldarius DSM 639 and a codon-optimized phosphomevalonate kinase gene from Saccharomyces cerevisiae73,74. The choice of these candidate genes was based on their sequence independence with each other, utility and importance in the ongoing in-house projects. The sequence context of 36 nt from each N terminus of the various GOIs fused to the second codon of either gfp or rfp reporter gene (yielding total 14 chimeric reporter GOIs) is listed in Supplementary Table 1 and was assembled by annealing phosphorylated forward and reverse oligonucleotides (Supplementary Data 2). For preparation of the full-length tetR gene as a GOI insert, we used primers oFAB1347 and oFAB1239 to PCR-amplify the tetR gene from the vector VKM81. The forward and reverse primers introduce BsaI recognition sites onto the N- and C-terminal ends of the PCR product such that a post-restriction digestion of the PCR product gives appropriate overhangs to clone into the digested backbone vector along with additional inserts, such as a linker region (see Supplementary Figs. 29 and 30). To examine a BCD’s capacity to overcome the impact of hairpin formation spanning the junction of SD2 and the GOI initiation codon on translation initiation, we designed two special GOIs. These GOIs have sequence complementarity to a strong SD2 motif of BCDs (UAAGGAGGU) such that the mRNA structure predictions indicated a stronger hairpin formation around the translation initiation region (Supplementary Fig. 1). These two GOIs have the same 36-nt tetR gene as the backbone with 9 nt downstream of the start codons mutated such that there is potential (variable-strength) hairpin formation between SD2 and the GOI start codon region (Supplementary Table 1 and Supplementary Fig. 1). Linker region. The full-length tetR-gfp fusion includes a linker region between the TetR and GFP coding regions. This glycinerich linker region also includes a Tev protease site, which can be cleaved if needed (Supplementary Table 1). Sequence-independent BCDs. To test the generality of the BCD across different GOIs, we assembled sequence-independent BCDs as listed in Supplementary Table 1 (Supplementary Figs. 18 and 19). NATURE METHODS In these constructs, we used seven RBS2 regions from intercistronic regions of operons (BCD2 (this work), LeuL, HisB-H junction, TrpB-A junction, LeuA-B junction, HisH-A junction and HisC-B junction)21–26,75,76, whose junctions have overlapping stop-start codon motifs (TAATG) and have SD2 motifs upstream of the stop-start junction (Supplementary Fig. 18). The six RBS1 regions were chosen from different 5′ UTRs45,75,77,78, and the SD1 motif of the RBS1 region was mutated to a consensus SD motif so that translation initiation of the first cistron was not limiting. These sequence-independent BCDs were cloned upstream of the gfp reporter as explained earlier (on restriction-digested vector pFAB1177) and characterized by measuring fluorescence. Several representative BCD candidates were chosen for further characterization by replacing the SD2 motif with different strength variants presented in Supplementary Table 1 and cloned upstream of gfp and rfp reporter genes. The data shown in Supplementary Figures 18 and 19 demonstrated that BCD variants retain their functionality across different GOIs and are generalizable. These sequence-independent BCDs are useful in constructing heterologous pathways or genome-scale engineering efforts. Further studies on sequence-independent BCDs are essential to understand any impacts of overproduction of different peptides on cellular factors or growth. Assembling the combinatorial libraries. The general process for assembling the combinatorial libraries is shown as a schematic in Supplementary Figure 32. All of the cloning steps including phosphorylation of oligonucleotides, annealing of phosphorylated oligonucleotides, dilution of annealed products, ligation of annealed products (inserts) with cut vector backbone, incubation of ligation reactions and transformation were carried out in 96-well PCR plates. All ligation reactions were 10 µl in total volume and made up of 1 µl of each of the annealed parts (~10 ng/µl), 1 µl of digested and pure vector backbone (~10 ng/µl), 1 µl of ligase enzyme, and appropriate volumes of ligase buffer and sterile water to make up the total volume. The ligation reaction was run for 30 min at room temperature (20–22 °C) using concentrated T4 ligase enzyme and then moved onto ice. The ligation reaction was then incubated with 50 µl of chemically competent E. coli cells (BW25113 (ref. 79) and DH5αZ1 (ref. 55) for E. coli RNAP or BL21(DE3) for T7 RNAP) in 96-well plates (in-house–prepared BW25113; DH5αZ1cells and BL21 from NEB) for 30 min on ice. The transformation step was performed with heat shock at 42 °C for 90 s in a PCR machine, and then the plates were moved onto ice for 2 min before 100 µl of sterile SOC medium was added. The transformation reaction in 96-well plates was then incubated at 37 °C for 1 h with 900-r.p.m. shaking. We used the vented QTray with 48-well dividers (Genetix, cat. no. X6029) for plating 35 µl of transformation reaction (leftover reaction mix was stored at 4 °C overnight and discarded the next day for transformations that worked) on solid LB agar plates with kan. Contents of each 96-well plate transformation reaction were plated out on two 48-well QTrays with LB agar +kan. To spread 35 µl of transformant reaction evenly across each of 48 wells, we used 10–15 sterile glass beads per well and stirred gently (with the lid on) to avoid the mix-up of beads between wells. After we removed the beads (by quickly turning the plates upside down and collecting beads on plate lids), the plates were allowed to dry doi:10.1038/nmeth.2404 npg © 2013 Nature America, Inc. All rights reserved. and were incubated overnight at 37 °C. The next day, individual transformant colonies were picked for sequence confirmation and for preparing glycerol stocks. Two colonies per transformant were suspended in 50 µl EB buffer (pH = 8) in a 96-well plate. From this colony suspension, 25 µl was sent to a sequencing service, and the leftover suspension was used to inoculate 250 µl of LB +kan medium in a 96-well plate. The next day, the overnight culture plate was stored as a presequencing glycerol stock (250 µl overnight culture + 250 µl of 30% sterile glycerol) until the sequencing results were obtained and analyzed. The sequencing was performed by PCR-amplifying the cloning region using primers (soFAB1 and soFAB8), and sequencing was done using primers soFAB36 and soFAB37. Once the sequencing results were obtained, the correct clones (from the presequencing glycerol stocks) were used to inoculate fresh LB +kan medium in 96–deep-well plates and were grown overnight and stored as main glycerol stocks. Construction of tRNA complementation plasmid. To study the impact of rare codons in the leader cistron of the BCD on downstream gene expression, we constructed 12 plasmids with three different rare codons (at the fourth and sixth codons within the leader cistron) in the context of four different-strength BCDs (Fig. 3d). As a control, we also inserted early stop codons (at the sixth codon of the leader cistron) in the context of four differentstrength BCDs. To study the impact of complementing the tRNA for rare codons on the gene expression and rank order of BCDs, we chose the plasmid pRARE2 (Novagen), which contains various tRNA genes for the following rare codons in E. coli: AGA, AGG (Arg), GGA (Gly), AUA (Ile), CUA (Leu), CCC (Pro). This plasmid has a chloramphenicol (Cam) resistance cassette and P15A replication origin, and all of the tRNA genes have their endogenous promoters. Because of the incompatibility between the plasmid pRARE2 and all the constructs reported in the present work (both have P15A replication origins), we decided to replace the replication origin of pRARE2 with a ColE1 origin. As the plasmid sequence of pRARE2 is proprietary and unavailable to users, we designed various primers to sequence the region around the P15A replication origin (oFAB1611, oFAB1612 and oFAB1613) and found specific restriction digestion sites for NheI and XbaI enzymes around the replication origin. We prepared plasmid DNA for pRARE2 from E. coli Rosetta 2 (DE3), digested it with NheI and XbaI (NEB) enzymes and gel-purified the digested plasmid. The replication origin ColE1 was PCR-amplified from the plasmid VKM74 using primers oFAB1624 and oFAB1625. The forward and reverse primers introduce NheI and XbaI digestion sites such that after digestion with both enzymes, the PCR product is compatible for ligation with the NheI-XbaI–digested pRARE2. The ligation of the NheI-XbaI–digested PCR product and pRARE2 vector and the transformation of ligation reaction were done according to the standard procedure. The positive clones were confirmed by sequencing with primers oFAB1611 and oFAB1612. Once the sequence was confirmed, we performed miniprep on the pRARE2-ColE1 plasmid (pFAB4526), transformed it into assay strain BW25113 and subsequently stored it as a glycerol stock. To study the impact of overexpression of tRNA genes for rare codons and its effect on the rank order of BCDs (with rare codons and an early stop codon in the leader cistron), we cotransformed pFAB4526 with BCD constructs having rare codons in the leader doi:10.1038/nmeth.2404 cistron (Supplementary Data 1) driving the expression of GFP and RFP. Transformants were then selected and grown on kan and cam selection medium for assay purposes and for storing the glycerol stocks. In vivo assays using the flow cytometer. Assay strains were stored as main glycerol stocks in 96–deep-well plates (2 ml) and in smaller aliquots of 50 µl in 96-well sterile PCR plates as working stocks. Cultures were grown in 2 ml 96–deep-well plates containing 400 µl of MOPS EZ Rich Medium (Teknova, cat. no. M2105) with appropriate antibiotics and inoculated with 3 µl from thawed glycerol stocks. Cultures were grown overnight (~16 h) in 96-, U-shaped-, 2-ml-well plates covered with sterile breathable sealing film at 37 °C with shaking at 900 r.p.m. on a Multitron shaker (Inforys-HT). For microplate end-point assays (to measure the optical density and fluorescence), the overnight cultures were diluted 1:50 into a final volume of 400 µl fresh MOPS EZ Rich Medium with appropriate antibiotics in 1-ml-deep–well plates and grown for 2 h at 37 °C with shaking at 900 r.p.m. on a Multitron shaker. Samples were collected (150 µl in clear-bottom black plates) to measure growth (optical density, OD600 nm) and fluorescence (RFU; excitation at 481 nm and emission at 507 nm for GFP; excitation at 560 nm and emission at 650 nm for RFP) in a multimode microplate readerincubator-shaker Synergy-2 (BioTek Instruments). Repeated assays showed that we were sampling the cultures at OD600 of 0.3–0.5 and that these cultures were in the exponential growth phase. All experiments were repeated at least three times. Gen5 software for the BioTek plate reader was used for data acquisition, and further data analysis was performed using MATLAB software (MathWorks) with in-house–developed scripts. For the flow cytometer assays, the overnight cultures of BW25113 cells with plasmid libraries were diluted 1:50 into a final volume of 200 µl fresh MOPS EZ Rich Medium with appropriate antibiotics in 1-ml-deep–well plates and grown for 2 h (to exponential phase with OD600 in the range of 0.3–0.5 in the microplate reader) at 37 °C with shaking at 900 r.p.m. on a Multitron shaker. For constructs encoding a T7 promoter, the overnight cultures of BL21 (DE3) with plasmid libraries were diluted 1:50 in to a final volume of 200 µl fresh MOPS EZ Rich Medium with appropriate antibiotics and 0.4 mM IPTG (to induce T7 RNAP expression from wild-type lac promoter on chromosome) in 1-ml-deep–well plates and grown for 2 h (to exponential phase with OD600 in the range of 0.3–0.5 in the microplate reader) at 37 °C with shaking at 900 r.p.m. on a Multitron shaker. For inducible promoter:BCD combinations, the overnight cultures of DH5αZ1 (wherein LacI and TetR were constitutively expressed from the bacterial chromosome) with plasmid libraries were diluted 1:50 in to a final volume of 200 µl fresh MOPS EZ Rich Medium with appropriate antibiotics and 1 mM IPTG or 100 ng/ml aTC in 1-ml-deep–well plates and grown for 2 h (to exponential phase with OD600 in the range of 0.3–0.5 in the microplate reader) at 37 °C with shaking at 900 r.p.m. on a Multitron shaker. Cultures at exponential phase were diluted 1:2,000 in chilled and filtered PBS (Gibco, pH 7.4) containing 500 µg/ml streptomycin in chilled 96-well clear plates (Costar) and immediately subjected to flow cytometer analysis. We used a Guava EasyCyte flow cytometer (EMD Millipore) equipped with autosampling capabilities NATURE METHODS npg © 2013 Nature America, Inc. All rights reserved. and paired dual blue (488-nm, 75-mW) and green (532-nm, 40-mW) laser excitation with two customized filter options for emission detection of 510/20 for GFP and 610/20 for RFP, respectively. During the assay, the sample concentration was kept below 500 cells per µl, and samples were run on a high flow rate (1.18 µl/s) until 2,000 cells (with a range of 60–300 events per µl) had been collected within small forward- and side-scatter gates. Guavasoft software was used for data acquisition, and the resulting FCS files were further analyzed using in-house–developed R scripts15. The fluorescence-per-cell values for each GOI construct were log 2transformed and then mean-normalized for comparative analysis of fluorescence from sequence-distinct GOI fusions. Absolute and mean-normalized expression. Absolute observed fluorescence values for all genes tested depended on the selected fluorophore and the specific 36-nt coding-sequence leader. To visually compare the rank-ordered activities of 5′ UTRs encoding MCDs and BCDs across various GOIs, we estimated meannormalized expression levels from absolute expression data, wherein we divided absolute expression values for any given 5′ UTR:GOI combination by the average of all absolute expression values for a given GOI and 5′ UTR design (for example, the average for a given gene across all MCD:GOI absolute expression levels) (Fig. 2a,b,e and Supplementary Figs. 5 and 7). Sequence-identity calculations. Global pairwise alignment of the 36-nt sequences were computed using the emboss implementation of the Needleman and Wunch algorithm80,81 with default parameters. Percentage identities were calculated from these alignments as the number of matching nucleotides divided by 36. The average identity between GOIs was 27% with an s.d. of 23%. These values are comparable to what one would expect by chance. Supplementary Figure 3 shows the percentage identities for different GOIs used in this work. Free-energy calculations of mRNA folding at the MCD:GOI or BCD:GOI junction. To understand the potential for forming stable inhibitory structures between different chosen GOIs with 5′ UTRs, we used UNAfold software82 to predict the minimumfolding-energy structure conformation. We considered the junction region to comprise between the positions −26 and +37 with respect to the translation start site. These boundaries were selected on the basis of the size of the monocistronic 5′ UTR (MCD) and 36-nt region of the GOI, respectively. The predicted minimum free-energy calculations depicted a wide diversity in the stability of mRNA structures formed at the translation initiation region of GOIs (Supplementary Fig. 3). Hybridization-energy calculations for the SD2 variant–16S RNA duplexes. To evaluate the affinity between SD2 sequences (in BCD and in MCD) and the SD-complementary region from the 16S rRNA (ACCTCCTTA), we used UNAfold software82 to calculate the hybridization energy for each resulting RNA duplex. We considered the region spanning from positions −26 to −1 with respect to the translation start site. As this region is the same for both BCD and MCD constructs, we can use these free-energy calculations and correlate with the fluorescence measurements from fusion reporters for both MCD and BCD constructs (Fig. 2f and Supplementary Figs. 11, 12 and 15). NATURE METHODS Use of RBS Calculator to predict the ∆Gtotal. The current version of the RBS Calculator software10 was downloaded from https:// github.com/hsalis/ribosome-binding-site-calculator/ (download date: 3 June 2012). We wrote a script in the Ruby programming language to automate the analysis and used the calculator to estimate the total ∆G as defined in ref. 10. We used 5′ UTR sequences spanning from 27 nt upstream to 33 nt downstream of the start codons of gfp, rfp, lacI, tetR, araC and PMK, PA or cellulase gene fusions (Supplementary Figs. 13–15). ANOVA models for MCD:GOI and BCD:GOI combinatorial data sets. To understand the contribution and coupling between translation elements (i.e., MCD and BCD) and the GOI on the overall gene expression, we performed ANOVA as reported in ref. 15. Briefly, we performed ANOVA on the following linear model using fluorescence data from chimeric GFP fusions log(Fluorescenceij ) = a + U i + GOI j + (U : GOI)ij + e ijk for i = (1 − 22); j = (1 − 8) (1) where Fluorescenceij is the fluorescent output signal measured from a genetic construct comprising a translation element, Ui, and a gene of interest, GOIj. U:GOIij represents any interaction between the ith translational element and jth gene of interest, α is the overall average signal, and the term εijk represents the error term for each particular U:GOI combination. In this approach, we assume that log(gene expression) is a linear function of different factors and their interactions, whereas each factor is an abstraction of the complex biophysical functions encoded at the sequence level. For example, Ui captures contributions due to ribosome binding and mRNA stabilization (codon usage and translation elongation in the case of BCD), which results in differential rates of translation initiation and transcript degradation, whereas its interaction term with GOI, (U:GOI)ij describes the impact of the GOI on each U’s translation initiation rate (for example, due to inhibitory mRNA structures or modification of transcript stability). The factor GOIj defines the intrinsic differences in translation elongation property of codons that are coding this region (translation pause, codon effects and folding of polypeptide), protein degradation and fluorescence intensity itself. The analysis outputs are presented in Supplementary Figures 9 and 10. ANOVA models for promoter:BCD:GOI combinatorial data sets. To understand the contribution and coupling between a transcriptional (P) element, translation (U) element and the fluorescent reporter on overall gene expression, we performed ANOVA on the following linear model log(Fluorescenceijk ) = a + Pi + U j + GOIk + (P : U )ij + (U : GOI)ik + (P : GOI) jk + (P : U : GOI)ijk + e ijk for i = (1 − 14); j = (1 − 22); k = (1, 2) (2) where Fluorescenceijk is the fluorescent output signal measured from a genetic construct comprising a transcriptional element i, a translation element j and a reporter k. (P:U)ij represents the effect of any interaction between the ith transcriptional element and jth translational element; (P:GOI)ik represents the effect of interaction between the ith transcriptional element and kth doi:10.1038/nmeth.2404 npg © 2013 Nature America, Inc. All rights reserved. reporter; (U:GOI)jk represents effects of any interactions between the jth translational element and kth reporter; (P:U:GOI)ijk represents the interaction between the ith transcriptional element, jth translational element and kth reporter; α is the overall average signal; and the term εijk represents the error term for each particular combination. The analysis outputs are presented in Supplementary Figure 26. ANOVA models: sum of squares and score calculations. The models described in equations (1) and (2) relate the Fluorescence (proxy for protein abundance) to the transcriptional and translational elements that comprise each genetic construct. Using three replicates of fluorescence, we performed ANOVA83 on the linear models described above using the “anova” routine in R software (http://www.r-project.org/). ANOVA results are presented in Figure 2c,d and Supplementary Figures 9, 10 and 26. To account for the differences in fluorescence intensities of reporter fusions, we normalized the data sets with their respective mean fluorescence for each GOI, thus disregarding the part of the variance arising from the GOI factor. The main effects (the primary scores for promoters, BCDs and GOI reporters) were directly retrieved from the ANOVA table of effects (accessed using the “model.tables” function in R) as explained elsewhere15. The integrated deviation of the main effect (secondary scores) for each element, resulting from its composition with different parts, was calculated as the s.e.m. of the appropriate interaction term effects as described in ref. 15 and is shown in Supplementary Figures 9, 10 and 26. Predictive regression model of promoter:BCD combinatorial library. A full-factorial ANOVA (linear model, equation (2)) modeling on the observed fluorescence from members of the promoter:BCD combinatorial library showed 98% of the total dynamic fluorescence range was due to encoded differences in the intrinsic activity of individual promoters and BCDs (Fig. 4e), and ~1% of the variance was explained by the element-element interaction (promoter:BCD and BCD:GOI). Given the high degree of explanatory power, and the independence of the elementary parts, we hypothesized that a regression model for predicting expression from the identity of a particular promoter and BCD trained on expression measurements of any given reporter could be used to predict the expression of another GOI using the same translation control elements. To do this we considered a simplified linear model with the GOI held constant (equation below). log 2 (Expression)ij = bi (Pi ) + g j (U j ) (3) where βi and γj are the strengths for the ith and jth promoter and translation element, respectively. In this categorical regression model, each promoter and BCD is a separate object/variable that can be recoded within a matrix of 1s (for presence) and 0s (for absence) that serve as predictors, with log-transformed fluorescence values serving as the response variable and betas representing regression weights, where i varies from promoter 1 to 14 and j varies from BCD 1 to 22. To build a predictive regression model based on recoded predictors and experimentally observed GFP fluorescence values, we used the partial least-squares regression (PLSR) approach84. We used the Unscrambler X10 (CAMO software) for PLSR model (PLSR1) building and calculation of regression coefficients. doi:10.1038/nmeth.2404 All models were built by applying the standard data preprocessing procedures. To test whether the model was overfitting the data, tenfold cross-validation was performed. This cross-validated model explained ~96% of the variance in the fluorescence data (cross-validated R2 of 0.96, r.m.s. error of 0.25 with two principal components). We used the cross-validated model trained on the experimental data set from the promoter:BCD:GFP combinatorial library (Fig. 4b) to predict the RFP fluorescence from the same combination of transcription and translation elements (Fig. 4c) as well as the expression of other GOI fusions from the BCD: GOI combinatorial library driven by promoter P14 and 22 BCDs (Fig. 2b). The model successfully predicted the RFP (with R2 of 0.9) and other GOI expression data sets (with R2 of 0.89; combined RFP and GOIs yielded an R2 of 0.9 and r.m.s. error for prediction of 0.48). Note that the GOIs are expressed on a different vector series than the vector used for the promoter:BCD combinatorial library (Supplementary Fig. 28), which demonstrates the prediction reliability across DNA contexts. The predicted output results with deviations are shown in Figure 4f. Expression probability calculations. The probability of observed expression falling within a factor of 2 of the predicted expression was determined using two separate methods. For each of the strains, the means of the mean-normalized log2 fluorescence values (observed) and the predicted values were calculated for a total of 440 pairs of observed and predicted values (Fig. 4f, see above). The absolute difference between the observed and predicted values was calculated. For the first method, the percentage of absolute difference values <1 was empirically determined to be 93.86%. For the second method, a histogram was generated using bin sizes (W) calculated according to Freedman and Diaconis85 using the formula W = 2 × (IQR ) × N −1/ 3 (4) where N is the number of samples and IQR is the interquartile range, defined as the 75th percentile minus the 25th percentile. A Gaussian was fitted to the histogram, and the probability of the error between observed and predicted being less than or equal to a factor of 2 was determined using the formula erf(log 2 (2)/(s × √ 2)) (5) with σ = 0.5437 from the fitted Gaussian and ‘erf ’ is shorthand for the error function. The result using this method is 93.41% of observations falling within a factor of 2 of the predicted values. The estimated ~87% error reduction reflects a decrease in reported expression level errors from 53% (ref. 10) to 7% (this work). Data representation. The heat map representations and hierarchical clustering of combinatorial data sets were performed using Multiexperiment Viewer (MeV) software86. The sequence logos were generated using the WebLogo web-based application87. Selected statistics83. Coefficient of determination (R2). We used R2 to represent how well simple linear regression models fit various data sets and, thus, to what extent models can be used to predict future outcomes. The value of R2 can range from 0 (poor fit) to 1 (perfect fit). NATURE METHODS For example, we found that standardized promoter and BCD elements could be used to express GFP across a range of levels and developed a model predicting expression levels for other genes from the GFP data. We then found that the observed expression levels for other genes were well correlated to predictions made using only the GFP data (R2 = 0.9, Fig. 4f). npg © 2013 Nature America, Inc. All rights reserved. Pearson’s correlation coefficient (r). Also known as the ‘sample correlation coefficient’, we used r to represent the covariance of two variables divided by the product of each variable’s s.d. The value of r can range from −1 to 1 and can thus be used to communicate the ‘direction’ of a correlation. For example, we observed a negative correlation between 16S rRNA + SD mRNA binding free energies and resulting protein expression levels (r = various values, Fig. 2f). Spearman’s rank correlation coefficient (rho). Also known as ‘Spearman’s rho’, we used this nonparametric statistic measure to assess the extent to which the relationship between two variables can be represented via a monotonic function. The value of rho can range from −1 to 1 in representing inverse to direct correlation of rank orderings, respectively. For example, we found that the rank correlations for the activities of BCDs, when used across multiple GOIs, was much higher than when the same SD sequences were used within MCDs (Fig. 2a,b and Supplementary Figs. 6 and 8). Stated differently, we used rho to quantify nonparametrically to what extent BCDs improved preservation of rank ordering for translation initiation elements as compared to MCDs. Variance. We used this statistic to quantify to what extent the intrinsic activities encoded by various genetic elements lead to unexpected differences in observed protein expression. For example, we found that MCDs led to much more widely varying expressed protein levels relative to the levels realized using BCDs (Fig. 2a,b). 51. Ausubel, F.M. Short Protocols in Molecular Biology 5th edn. (Wiley, New York, 2002). 52. Pédelacq, J.D., Cabantous, S., Tran, T., Terwilliger, T.C. & Waldo, G.S. Engineering and characterization of a superfolder green fluorescent protein. Nat. Biotechnol. 24, 79–88 (2006). 53. Campbell, R.E. et al. A monomeric red fluorescent protein. Proc. Natl. Acad. Sci. USA 99, 7877–7882 (2002). 54. Lee, T.S. et al. BglBrick vectors and datasheets: a synthetic biology platform for gene expression. J. Biol. Eng. 5, 12 (2011). 55. Lutz, R. & Bujard, H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1–I2 regulatory elements. Nucleic Acids Res. 25, 1203–1210 (1997). 56. McDowell, J.C., Roberts, J.W., Jin, D.J. & Gross, C. Determination of intrinsic transcription termination efficiency by RNA polymerase elongation rate. Science 266, 822–825 (1994). 57. Kireeva, M.L. & Kashlev, M. Mechanism of sequence-specific pausing of bacterial RNA polymerase. Proc. Natl. Acad. Sci. USA 106, 8900–8905 (2009). 58. Davis, J.H., Rubin, A.J. & Sauer, R.T. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic Acids Res. 39, 1131–1141 (2011). 59. Brosius, J., Erfle, M. & Storella, J. Spacing of the −10 and −35 regions in the tac promoter. J. Biol. Chem. 260, 3539–3541 (1985). 60. Saecker, R.M., Record, M.T. Jr. & Dehaseth, P.L. Mechanism of bacterial transcription initiation: RNA polymerase - promoter binding, isomerization to initiation-competent open complexes, and initiation of RNA synthesis. J. Mol. Biol. 412, 754–771 (2011). NATURE METHODS 61. Gross, C.A. et al. The functional and regulatory roles of sigma factors in transcription. Cold Spring Harb. Symp. Quant. Biol. 63, 141–155 (1998). 62. Shultzaberger, R.K., Malashock, D.S., Kirsch, J.F. & Eisen, M.B. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet. 6, e1001042 (2010). 63. Rhodius, V.A., Mutalik, V.K. & Gross, C.A. Predicting the strength of UP-elements and full-length E. coli σE promoters. Nucleic Acids Res. 40, 2907–2924 (2012). 64. Rhodius, V.A. & Mutalik, V.K. Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, σE. Proc. Natl. Acad. Sci. USA 107, 2854–2859 (2010). 65. Mutalik, V.K., Nonaka, G., Ades, S.E., Rhodius, V.A. & Gross, C.A. Promoter strength properties of the complete sigma E regulon of Escherichia coli and Salmonella enterica. J. Bacteriol. 191, 7279–7287 (2009). 66. Bujard, H. et al. A T5 promoter-based transcription-translation system for the analysis of proteins in vitro and in vivo. Methods Enzymol. 155, 416–433 (1987). 67. Miroslavova, N.S. & Busby, S.J. Investigations of the modular structure of bacterial promoters. Biochem. Soc. Symp. 73, 1–10 (2006). 68. Szoke, P.A., Allen, T.L. & deHaseth, P.L. Promoter recognition by Escherichia coli RNA polymerase: effects of base substitutions in the -10 and -35 regions. Biochemistry 26, 6188–6194 (1987). 69. Engler, C., Kandzia, R. & Marillonnet, S. A one pot, one step, precision cloning method with high throughput capability. PLoS ONE 3, e3647 (2008). 70. Postle, K., Nguyen, T.T. & Bertrand, K.P. Nucleotide sequence of the repressor gene of the TN10 tetracycline resistance determinant. Nucleic Acids Res. 12, 4849–4863 (1984). 71. Matsuda, A., Toma, K. & Komatsu, K. Nucleotide sequences of the genes for two distinct cephalosporin acylases from a Pseudomonas strain. J. Bacteriol. 169, 5821–5826 (1987). 72. Master, E.R., Zheng, Y., Storms, R., Tsang, A. & Powlowski, J. A xyloglucan-specific family 12 glycosyl hydrolase from Aspergillus niger: recombinant expression, purification and characterization. Biochem. J. 411, 161–170 (2008). 73. Redding-Johanson, A.M. et al. Targeted proteomics for metabolic pathway optimization: application to terpene production. Metab. Eng. 13, 194–203 (2011). 74. Martin, V.J., Pitera, D.J., Withers, S.T., Newman, J.D. & Keasling, J.D. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol. 21, 796–802 (2003). 75. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997). 76. Gulevich, A.Y. et al. A new method for the construction of translationally coupled operons in a bacterial chromosome. Mol. Biol. 43, 505–514 (2009). 77. Olins, P.O., Devine, C.S., Rangwala, S.H. & Kavka, K.S. The T7 phage gene 10 leader RNA, a ribosome-binding site that dramatically enhances the expression of foreign genes in Escherichia coli. Gene 73, 227–235 (1988). 78. Olins, P.O. & Rangwala, S.H. A novel sequence element derived from bacteriophage T7 mRNA acts as an enhancer of translation of the lacZ gene in Escherichia coli. J. Biol. Chem. 264, 16973–16976 (1989). 79. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006). 80. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276–277 (2000). 81. Needleman, S.B. & Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970). 82. Markham, N.R. & Zuker, M. UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol. 453, 3–31 (2008). 83. Wu, C.F.J. & Hamada, M.S. Experiments: Planning, Analysis, and Optimization 2nd edn. (Wiley, Hoboken, New Jersey, USA, 2009). 84. Wold, S., Sjöström, M. & Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130 (2001). 85. Freedman, D. & Diaconis, P. On the histogram as a density estimator: L2 theory. Z Wahrscheinlichkeit 57, 453–476 (1981). 86. Saeed, A.I. et al. TM4 microarray software suite. Methods Enzymol. 411, 134–193 (2006). 87. Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004). doi:10.1038/nmeth.2404

Log In

Precise and reliable gene expression via standard transcription and translation initiation elements

Free related PDFsRelated papers

Free related PDFsRelated papers