Methods 55 (2011) 65–72

Contents lists available at SciVerse ScienceDirect

Methods
journal homepage: www.elsevier.com/locate/ymeth

High-throughput protein expression screening and purification in Escherichia coli


Renaud Vincentelli a,⇑, Agnès Cimino a, Arie Geerlof b, Atsushi Kubo c, Yutaka Satou c, Christian Cambillau a
a Architecture et Fonction des Macromolécules Biologiques (A.F.M.B.), UMR6098 CNRS, Université de Provence and Université de la Méditerranée, Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France
b Helmholtz Center Munich, Institute of Structural Biology, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany
c Department of Zoology, Graduate School of Science, Kyoto University, Sakyo-ku, Kyoto 606-8502, Japan

Article info

Article history: Available online 7 September 2011

Keywords: High-throughput; Solubility screening; Escherichia coli; Thioredoxin; Ciona intestinalis

Abstract

Escherichia coli (E. coli) is the most widely used expression system for the production of recombinant proteins for structural and functional studies. However, to obtain milligrams of soluble proteins is still challenging since many proteins are expressed in an insoluble form without optimization. Therefore when working with tens of proteins or protein domains it is recommended that high-throughput expression screening at a small scale (1–4 ml of culture) is carried out to identify the optimal conditions for soluble protein production. Once determined, these culture conditions can be applied at a large scale to produce sufficient protein for structural or functional studies.

We describe a procedure that has enabled the systematic screening of culture conditions or fusion-tags on hundreds of cultures per week. The analysis of the optimal conditions for the soluble production of these proteins helped us to design a simple and efficient protocol for soluble protein expression screening. This protocol has since been used on hundreds of proteins and is illustrated with the genome wide scale production of proteins containing the DNA binding domains of Ciona intestinalis.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

The production of proteins in sufficient quantity and of appropriate quality remains a major bottleneck in structural biology. In response to the explosion in the amount of genome sequencing data, many structural biology laboratories worldwide have implemented production pipelines to accelerate the expression and purification of proteins. A common feature of these pipelines is the use of expression screening to identify quickly the conditions that will allow sufficient protein production upon scaling up for structural (or functional) analysis [1–13]. Escherichia coli is the most widely used expression host and screening procedures are mainly based on decreasing culture volume and increasing parallelization, making it possible to work with 96 cultures, or more, simultaneously. This process enables more proteins or protein domains to be tested and several expression condition parameters to be optimized. Soluble proteins obtained at the analytical scale are selected for scaled-up production and purification.

From the late 1990s, expression screening protocols have been developed independently by structural genomics groups reflecting each laboratory’s needs in terms of throughput and protein source (e.g. prokaryotic, eukaryotic and viral) and also budget [4,14]. These protocols were initially very diverse and not always transferable among teams, but they allowed screening for soluble expression at a genome-wide scale and participated in the structure determination of thousands of proteins by the structural genomics community [15]. With experience, screening methods have been refined and optimized, and have converged on a simpler and relatively universal protocol that has reached maturity [4]. This consensus approach gives reliable qualitative and quantitative information that can be reproduced easily in larger-scale experiments [14]. Expression screening protocols can now be used in any laboratory without special equipment and with maximum efficiency.

Before setting up an expression screening procedure, the results generated in the last decade ([4] and this article) can provide guidance for initial culture tests. To improve the probability of obtaining sufficient soluble protein we suggest that the starting point for expression trials of any protein should be the simple protocol defined in this paper. There is a reasonable chance that the majority of the proteins of interest will express sufficient amounts of soluble proteins directly in these initial cultures. In the case of the protocol(s) failing, it is possible to perform a second round of expression screening based on diverse and well-documented strategies ([4,14] and in this article). Since 2001, the AFMB laboratory has cloned and checked for expression more than three thousand proteins for several medium-scale European Union funded structural genomics projects: X-TB [16], SPINE [17], VIZIER [18], EMeP (e-mep.org), SPINE2 (spine2.eu) and one private funded network, MEPNET [19].

⇑ Corresponding author. Fax: +33 491 26 67 20.
E-mail address: renaud.vincentelli@afmb.univ-mrs.fr (R. Vincentelli).

1046-2023/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.ymeth.2011.08.010
66 R. Vincentelli et al. / Methods 55 (2011) 65–72

The protein production facility has been one of the official French national service facilities since 2003 (ibisa.net) and has cloned and checked for expression more than two thousand proteins from eukaryotic or prokaryotic origins for academic or industrial customers. Expression screening protocols have played a central role in our protein production and crystallization facilities since the beginning of the structural genomics projects [16,20]. When this work was started, not much was known about the specific impact of culture conditions, fusion-tags, refolding and protein sequences on the final results. We then developed a strategy with multiple rounds of screening in which, after cloning with a Gateway® (Invitrogen, USA) vector [21] and initial expression trials, insoluble proteins were rescued by screening the culture conditions or expressing the protein with various fusion proteins [16]. This allowed us to quickly supply crystallographers with proteins but also, with time, to collect data and rank each condition (strains, induction and culture temperature, media and use of fusion-tag) on the level of impact it has on soluble protein expression. We now have a clear picture of which culture conditions should be tried first to increase the success rate while decreasing the number of cultures, and which alternatives should be tried next.

To cope with the thousands of proteins in the pipeline and still be able to benchmark the impact of each parameter on the final soluble yield, we designed a high throughput protocol that we adapted for use with a Tecan (Switzerland) robot [4,14,22]. Using this protocol it is possible for a single person to work on up to 1152 cultures in parallel and, within a week, to determine the best conditions for producing a protein for structure determination. Using the same protocol to obtain data for hundreds of targets from various organisms allowed us to predict the best conditions for expressing a wide spectrum of proteins in a soluble form. This article summarizes the analysis of the statistics on the expression screening of more than a thousand proteins that were mainly produced for collaborators and customers through the French national service facility. These statistics allowed us to create and validate a simple protocol that gives maximum soluble expression with a minimum number of cultures.

2. Screening of culture conditions with high throughput expression screening

The most important part of an expression screening protocol is the detection method, which must be sensitive and compatible with downstream applications. Most of the time, the preferred approach is the detection of “soluble” proteins directly in the crude lysate of E. coli after a centrifugation step. Using SDS–PAGE, the lower limit of detection of a protein in the soluble fraction is around 2 mg/L of culture. But expression levels around 0.5 mg/L of culture give enough protein to perform a wide range of crystallization screenings when using nanodrop-dispensing robots. Therefore, a capture step in the analytical expression screening that enriches the fraction containing the protein of interest and hence increases the sensitivity of the assay is recommended. This capture step is often based on nickel affinity chromatography with a poly-histidine tag (His-tag) fused to the protein of interest. Sometimes other tags can be used, such as the Strep-tag [23,24], glutathione S-transferase (GST) [25,26] and maltose binding protein (MBP) [27].

It is worth noting that soluble proteins identified in an expression screen may not always be properly folded. There are several approaches for further characterization of the soluble proteins at the screening stage (e.g. small-scale size exclusion chromatography [28]) but probably the best method is an activity test that can discriminate between soluble active and inactive proteins. At the AFMB, since almost all the proteins we have worked on do not have activity assays, we decided to use quantitative automated immunodetection with an anti-His-tag antibody [22] to probe the soluble fractions in dot blots. At the time, this was the only choice that allowed for screening of a thousand samples in a week. The limit of detection using parallelized nickel purification and dot blots is below 0.1 mg/L of culture. This detection system is quicker and more sensitive than Coomassie blue-stained SDS–PAGE but does not provide molecular weight information. However, we have successfully used the dot blot detection system on tens of thousands of cultures, providing most of the data presented in this article. Recently, we have replaced the dot blot assay with a Labchip GX II (Caliper, USA) microfluidic detection system that provides results more quickly and cheaply. In addition, the Labchip system provides the molecular weight, concentration and purity of the proteins with a detection limit of approximately 0.1 mg/L of culture, making it almost as sensitive as the dot blot assay.

To optimize the level of soluble expression, the first parameters to tune are the culture conditions because this is easy, cheap and has been proven to have an impact on protein solubility levels [16]. Factors known to have an impact on the protein solubility levels include the plasmid copy number, the promoter [29], the culture media, the temperature, the bacterial strain, the time, and the type and concentration of the inducer [6]. Once the system of detection is selected, the next step is to choose the culture condition parameters that will be fixed and the conditions that will be tuned. To decrease the number of cultures that have to be grown and evaluated, we have chosen to use a fractional factorial approach very similar to that previously described by Abergel and co-workers [30]. In this approach, three parameters are varied, namely the choice of bacterial expression strain (four possible strains), the type of culture media (three media formulations) and induction temperatures (three growth temperatures). In general we adapted the expression conditions we had previously used for screening [30]. Thus we chose similar expression strains but with an additional pLys S plasmid for the Rosetta (Ros) and the Origami (Ori) strains to decrease expression leakage before induction [31]. Therefore we used BL21 (DE3) pLys S (B), Ros (DE3) pLys S (R) and Ori (DE3) pLys S (O), respectively. We added the Ros (DE3) pLys S plasmid (called pRARE) to the C41 (DE3) strain [32] to supply the pLys S and rare codons to this strain (C41 (DE3) pLys S, C). This introduced the same resistance to all strains (chloramphenicol), which minimized the possibility of antibiotic errors in the culture setup. Furthermore, the use of two antibiotics allowed us to work with minimal sterility concerns. Compared with [30], we kept two temperatures identical (37 °C and 25 °C) but replaced the culture at 42 °C with a culture at 17 °C to increase the chance of the soluble expression of difficult targets. Finally, the media were kept identical to the initial study (their composition can be found in the supplementary section of [14]). Most of the other experimental conditions described previously [30] remained unchanged, including the plasmid backbone [21], the T7 promoter [33] and the time of induction for each temperature (3 h at 37 °C, 18 h at 25 °C/17 °C). Some parameters were slightly adapted, for example, the time of culture pre-induction (two hours instead of one hour) and the concentration of the inducer (1 mM IPTG instead of 0.5 mM). Under these culture conditions (temperature, medium, time of harvest post induction, etc.) the four bacterial expression strains grow on average at a similar growth rate and reach the “plateau” growth phase before the time of harvest. It is therefore, in most cases, not necessary to compensate for differences in biomass in the determination of soluble expression levels. Initially, the Ori (DE3) pLys S cultures grew more slowly than the three other strains. This slow doubling time was the consequence of the selection of the double mutation of the thioredoxin reductase and glutathione reductase genes by kanamycin and tetracycline.

By leaving out these antibiotics, we observed that the Ori (DE3) pLys S grew at almost the same rate as the other strains. We have kept the protocol identical throughout the years and the details have been described elsewhere [4,14,22] and are shown schematically in Fig. 1.

3. Choice of the default culture conditions

The analysis of thousands of cultures that went through our expression screening pipeline, following identical procedures, sheds light on the culture conditions that are, on average, the

Fig. 1. Schematic representation of the high-throughput expression screening procedure. The protocol can be used for 96 proteins in 12 culture conditions (4 strains, 3 temperatures and 3 culture media) or up to 1152 proteins in one culture condition (1 strain, 1 medium, 1 temperature, Supplementary protocol 1). The protocol depicted here is for 96 proteins in 12 culture conditions. The detailed methods can be found in [4,14,22] and in Supplementary protocol 1. (A) Day 1: Transformation of plasmids in expression strains and pre-culture. 96 plasmids are transformed in four expression strains (see text for the strain choices) or 1152 plasmids in one strain (see text for the strain choices). At the end of the transformation, a portion is plated on agar plates to keep the colonies for the scale-up culture, and a portion is diluted in a 96-well deep-well plate (DW96) as a pre-culture for the expression screening. Day 2: Culture, induction, harvest. 1152 cultures at three temperatures (or one temperature for one culture condition). Induction is done when the OD is between 0.4 and 0.8. The cultures at 37 °C are harvested 3 h post-induction, re-suspended in lysis buffer and frozen. The cultures at 17 °C/25 °C are grown overnight (o/n). For the simplified protocol the temperature of culture is adjusted to 17 °C or 25 °C and the cultures are grown o/n. Day 3: Harvest and purification. The cultures at 17 °C/25 °C are harvested in the morning in lysis buffer and frozen. The cultures at 37 °C are thawed in the morning and purified on Nickel 96 in the afternoon. Day 4: Purification and analysis. The cultures at 17 °C/25 °C are thawed in the morning and purified on Nickel 96 in the afternoon. The elutions from the 37 °C culture plates are analyzed on dot blot (B) or Labchip GX II (C). Day 5: Analysis. The elutions from the 17 °C/25 °C culture plates are analyzed on dot blot (B) or Labchip GX II (C) on day 4/5. The choice of targets to put in production and the conditions of culture are determined. The scale-up cultures are done the week after from the colonies on plates obtained on day 1 after the transformation. (B) Analysis of the purified proteins by automated dot blot. The robot re-organizes the purified fractions so that the 12 culture conditions per protein are on the same dot blot (6 proteins and a calibration curve per plate). Media: gray, 2YT; dark gray, SB; black, TB. Culture temperatures (in °C). Bacterial strains: BL21 (DE3) pLys S (B), Rosetta (DE3) pLys S (R), Origami (DE3) pLys S (O), C41 (DE3) pROS (C). The quantification is automated and the best combination of medium, temperature and strain is determined for the scale-up culture. (C) Analysis of the purified proteins by the Labchip GX II. Fraction of a 96-sample virtual gel image of purified fractions analyzed using a HT Low MW Protein Express LabChip® Kit (following the manufacturer’s protocol). A protein ladder for molecular weight determination and quantification (molecular weight in kDa) is loaded every 12 samples. For further details, see Supplementary Fig. 1.
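The plate arithmetic in the caption can be sketched in a few lines of Python. The sketch below enumerates the 36 combinations of a full 4 × 3 × 3 design and then builds a 12-run fraction in which every strain is paired once with every temperature and the medium is rotated. The rotation rule and the function names are assumptions made for illustration only: the text states that 12 conditions were screened with a fractional factorial approach, but does not list which 12 of the 36 combinations were used.

```python
from itertools import product

# Factor levels named in the text; single letters follow the figure legend.
STRAINS = ["B", "R", "O", "C"]      # BL21, Rosetta, Origami, C41 (all DE3)
TEMPERATURES_C = [37, 25, 17]
MEDIA = ["2YT", "SB", "TB"]


def full_factorial():
    """Every strain x temperature x medium combination (4 * 3 * 3 = 36)."""
    return list(product(STRAINS, TEMPERATURES_C, MEDIA))


def fractional_design():
    """Hypothetical 12-run fraction (an assumption, not the authors' layout):
    each strain/temperature pair appears once and the medium is rotated so
    that each medium is used in exactly four runs."""
    runs = []
    for i, strain in enumerate(STRAINS):
        for j, temperature in enumerate(TEMPERATURES_C):
            medium = MEDIA[(i + j) % len(MEDIA)]
            runs.append((strain, temperature, medium))
    return runs


def cultures_needed(n_proteins, n_conditions):
    """Number of small-scale cultures for a screening campaign."""
    return n_proteins * n_conditions


design = fractional_design()
print(len(full_factorial()), "combinations in the full design")
print(len(design), "conditions in the fractional design")
print(cultures_needed(96, len(design)), "cultures for 96 proteins")
```

With 96 targets, the 12-condition fraction gives the 1152 cultures quoted in the caption, whereas the full design would require three times as many.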

Fig. 2. Results, in percentages, for the effect of each E. coli strain (A), temperature at induction (B) and medium (C) on prokaryotic protein solubility at analytical scale. The results for soluble expression were collected from the screening of several hundred prokaryotic targets in the facility.

most favorable for soluble protein expression. Therefore, these conditions should be the first to be tested among the 12 cultures that we have systematically checked. We have analyzed the conditions with two groups of protein targets: prokaryotic and eukaryotic proteins. The prokaryotic target statistics based on several hundred proteins can be found in Fig. 2.

As expected when expressing proteins from a prokaryotic organism, the best of the four strains (Fig. 2A) used was the BL21 (DE3) pLys S (B) strain. Interestingly, the Ros (DE3) pLys S (R) strain performed almost as well. The differences were probably because of a slightly longer generation time as a result of tRNA overexpression. In general, the Origami (DE3) pLys S (O) and the C41 (DE3) pRos (C) strains were significantly less useful than the other strains. However, for some targets, they were the only strains that led to soluble expression and should therefore be considered as second-choice rescue strains. The impact of temperature (Fig. 2B) on the level of soluble expression is roughly correlated with the biomass level, indicating that overall, prokaryotic target folding is as efficient at 37 °C as at 17 °C. Working at 37 °C allows for a higher protein quantity to be produced per target, but decreasing the temperature of the culture often increases the variety of targets that can be detected in a soluble form. Among the three media tested (Fig. 2C), the results are similar, indicating that the medium is not a major determinant of protein solubility for prokaryotic targets.

For the analysis of eukaryotic targets, the results that were obtained on the expression screening benchmarking experiment within the SPINE consortium [14] still represent our observations on a wider range of targets expressed in the lab. Our conclusion on this set of 96 proteins was that by screening with a sparse matrix and selecting the best combination per target, we could double the chances of obtaining sufficient amounts of protein for crystallization studies. Out of 96 proteins, we could produce 39 proteins with the best combination for each target. But, when using only one culture, the Rosetta (DE3) pLysS at 25 °C in Super-Broth (SB) medium, we could already produce 27 proteins. We could not always get the highest quantities, but we were able to achieve levels comparable with subsequent production for structure determination (Code P 2 [14]). This led to the question of why 12 cultures needed to be screened when two-thirds of the proteins could be obtained by a single condition. Since this benchmarking experiment [14], the statistics generated from the protein expression screening analysis of several hundred new eukaryotic proteins in the lab confirmed that for eukaryotic proteins, the Rosetta (DE3) pLys S strain is more efficient than the BL21 (DE3) pLys S strain (data not shown). The difference comes from additional tRNAs overcoming the effect of codon bias, and this explains the inversion of strain efficiency compared to the prokaryotic targets. Again, these two strains are significantly more efficient than the Origami (DE3) pLys S and the C41 (DE3) pROS. The optimum culture temperature is 25 °C, followed by 37 °C and 17 °C. The lower optimum temperature than that found for prokaryotic targets demonstrates that a slower growth rate helps the more complex folding process of eukaryotic proteins. At 17 °C, the biomass has not always reached a plateau, which explains why this temperature is not as efficient as 25 °C. Again, the composition of the media has a minimal impact; overall, the solubility is the same per cell, and the higher the biomass, the more protein is produced. This was confirmed by experiments (data not shown) on 96 proteins in 4 strains and 3 temperatures using the traditional protocol (3 media) compared with the same experiments in auto-induction medium ZYP5052 [34] only. The overall results showed that the same proteins were soluble in both experiments, but the biomass was higher for the ZYP5052, resulting in more protein purified. The other advantages of auto-induction include the elimination of the need to monitor the growth, production of toxic proteins and growth to high cell densities (OD600 of 12 and 20 at 17 °C and 37 °C, respectively). This prompted us to use this medium exclusively in all of our expression screens and production phases. In our hands, in both deep-well plates and flasks, the optimum growth conditions for the Rosetta (DE3) pLys S, BL21 (DE3) pLys S or C41 (DE3) pRos strains in ZYP5052 are four hours at 37 °C (time of the glucose depletion and induction by the lactose) followed by 18 h at 17 °C. This produces significantly more soluble proteins than shifting the culture to 25 °C for 18 h in the same conditions. The growth rate of the Origami (DE3) pLys S strain is slightly slower. Therefore, 37 °C/25 °C is advisable for this strain.

4. Choice of protein constructs

In the last 10 years, we have cloned several hundred proteins with a His-tag alone or with a His-tag and a fusion partner and compared their solubility to determine the influence of several fusion-tags on the level of soluble protein expression [35–38]. After a bibliographic survey, among many other possible tags [39], the four most popular fusion-tags were selected and compatible Gateway destination vectors were created [16] with glutathione S-transferase (GST) [25,26], maltose binding protein (MBP) [27,40–42], thioredoxin (TRX) [43,44] and the N-utilizing substance A (Nus A) [26,43]. In all cases, a His-tag and a TEV protease cleavage site were present in the final construct to enable, if necessary, removal of the tag prior to crystallization [16]. We have conducted extensive expression trials with these four vectors [16,19,45].

The first project in which we used these fusion-tags dealt with proteins of unknown function from Mycobacterium tuberculosis [16,46], a well-known source of genes that are difficult to express as soluble proteins. We present a summary of the results obtained with the fusion proteins in this project because they are very similar to the vast majority of the data that we have obtained for other difficult-to-express protein families. After screening the culture conditions with variable strains, temperatures and media, we could only detect 13 soluble proteins from the 182 that we cloned with a His-tag alone [16,47]. Therefore, we sub-cloned the 182 clones in parallel into the 4 destination vectors with various N-terminal fusion-tags and a His-tag and compared them to the His-tag constructs (pDEST17 OI [16]) alone. The results for the 118/182 proteins that have been successfully cloned into all vectors can be found in Fig. 3. For the 64 other proteins, at least one construct was missing and so they are not included in these statistics. With

Table 1
Targets recovered after cleavage and removal of the fusion-tags on the Mycobacterium tuberculosis proteins.

                               GST   TRX   Nus A   MBP   Total
Soluble                         19    24      39    50     132
Recovered after TEV cleavage     2    24       6    25      57
Yield (%)                       11   100      15    50      43
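The yield row of Table 1 follows directly from the two count rows, and a short Python sketch makes the arithmetic explicit. The counts are taken from Table 1 and the denominator of 118 targets from the text; the variable and function names are illustrative assumptions, and rounding to the nearest integer is assumed because it reproduces the published percentages.

```python
# Counts copied from Table 1 (M. tuberculosis fusion-tag screen).
soluble = {"GST": 19, "TRX": 24, "Nus A": 39, "MBP": 50}
recovered = {"GST": 2, "TRX": 24, "Nus A": 6, "MBP": 25}

# Number of targets successfully cloned into all vectors, as given in the text.
TARGETS_IN_ALL_VECTORS = 118


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Recovery yield per tag: proteins still soluble after TEV cleavage
# divided by proteins soluble as a fusion.
yields = {tag: percent(recovered[tag], soluble[tag]) for tag in soluble}

# Overall yield across the four fusion-tags (57 out of 132).
total_yield = percent(sum(recovered.values()), sum(soluble.values()))

# Fraction of the 118 targets that were soluble with each fusion-tag.
solubility = {tag: percent(n, TARGETS_IN_ALL_VECTORS) for tag, n in soluble.items()}

for tag in soluble:
    print(f"{tag}: soluble {solubility[tag]}% of targets, yield after cleavage {yields[tag]}%")
print(f"Total yield after cleavage: {total_yield}%")
```

Dividing the soluble counts by 118 gives the per-tag solubility rates of 16%, 20%, 33% and 42% quoted in the text, which is a useful cross-check on the table.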

the use of a fusion partner, the number of soluble proteins increased from 8.5% with a His-tag alone to 16% (GST), 20% (TRX), 33% (NusA) and 42% (MBP). We obtained an impressive fivefold increase with the use of MBP compared with His-tag alone. Once expressed as a fusion partner, the concentration of the proteins that were already soluble with a His-tag alone increased in all cases. Overall, we detected 132 soluble proteins with a fusion partner representing 70 out of 118 different targets of the AFMB M. tuberculosis project in at least one construct. Some targets were soluble in all 5 constructs, some in only one or two (59% of the M. tuberculosis targets cloned with fusion-tags are soluble with at least one fusion-tag). After cleavage of the fusion-tag and elimination of the uncut and precipitated proteins, we obtained the results detailed in Table 1.

Fig. 3. Effect of protein fusion-tags on the soluble expression. Example of the Mycobacterium tuberculosis project at AFMB. The different steps are depicted along the x-axis in a chronological order from left to right. The exact numbers are written on top of each bar. The various fusion-tags used are along the y-axis. pDEST17 OI: His-tag alone. For the map of the other constructs, see Supplementary Fig. 2. For the list of available Gateway vectors see Supplementary Table 1. Protocol: after an analytical culture followed by a Nickel 96 capture step the purified proteins with correct molecular weight (soluble) were cleaved by the TEV protease (1/10 w/w) for 18 h at 4 °C. The samples were filtered on a 96-well 0.22 µm filter plate to remove precipitated protein and the cleaved fraction (cleaved) was compared to the uncleaved and total fraction (expressed) on SDS–PAGE.

In summary, from all of the 132 soluble fusion proteins (representing 70 targets), only 57 (43%) have the potential to be useful for structural studies. These correspond to 29/70 different M. tuberculosis proteins (41% of the targets soluble with a fusion-tag, 25% of the 118 targets used on the fusion-tag screening). The 29 targets were almost exclusively rescued by fusion to either TRX or MBP. The results still represent a threefold increase in the number of soluble proteins compared to using a His-tag alone, but indicate that for fusion proteins, a significant number are probably misfolded or unfolded and precipitate when the fusion partner is removed. When expressing a protein of interest with a fusion-tag, the risk of getting soluble aggregates or non-functional proteins is high and depends on both the protein of interest and the chosen fusion-tag [48]. Therefore, when working with fusion partners, if possible, the protein integrity should be confirmed by an activity test or by the confirmation of solubility after the cleavage and separation of the protein of interest from the fusion-tag. In this experiment, the GST and Nus A fusions performed badly compared to the MBP and TRX fusions. The protein concentration after removing the MBP or TRX fusions is similar in both cases and higher than the concentration of the soluble protein expressed with a His-tag alone. However, 50% of the MBP fusions are lost during the process whereas we can recover all the TRX fusions. This indicates that the MBP fusion probably gives a large proportion of false positives, which has also been observed by other researchers [48,49]. The impact of the respective fusion partners on solubility obtained in the X-TB project has since been confirmed on several hundred new proteins in our lab. MBP tends, in many cases, to produce soluble aggregates [48] whilst for more than 80% of our proteins a soluble protein with a TRX fusion will still be usable for downstream processes and crystal-structure determination after removal of the fusion-tag by TEV cleavage. As a consequence, TRX is always preferred as the first fusion-tag, MBP as a second option and Nus A and GST as final options. In addition, TRX seems to help properly folded proteins to be expressed at a higher level. In some cases, TRX probably acts as a chaperone where its co-expression would help the folding of the protein of interest.

5. Simplified expression screening and protein production protocols

Using the information generated from thousands of analytical cultures, we have modified our protocol to decrease the number of screens necessary to obtain soluble expression in the first set of trials (see Fig. 4). For a new project, expression screening is initially carried out for a protein in only one culture condition. Because we usually produce several proteins at a time, we use the Rosetta (DE3) pLys S strain as the default strain without considering the protein origin. The cultures are grown in the ZYP5052 auto-induction medium at 37 °C for four hours; the temperature is then dropped to 17 °C and the cultures are left overnight for another 18–20 h. The detailed protocol can be found in Supplementary protocol 1. The temperature change corresponds to the glucose depletion time. We clone all proteins directly into two plasmids: one with an N-terminal His-tag (pDEST17OI) and one with a TRX fusion-tag [16]. If the His-tag construct is soluble (above 2 mg/L), then production is carried out with this construct. Alternatively, if the His-tag construct is not soluble or produces less than 0.5 mg/L of protein, and if the TRX construct is soluble above 2 mg/L, production is carried out using this construct. In all other cases, constructs are screened following the rescue strategy 1 using the 4 strains (B, R, O, C) at three temperatures (37 °C/25 °C/17 °C) in ZYP5052. The same thresholds are applied, and for recalcitrant targets, rescue strategy 2 is applied (cloning with three fusion-tags and checking for solubility in the default culture conditions or alternative culture conditions). The TRX fusion-tag is removed during purification only when the protein is produced for crystal-structure determination. Finally, if this fails, the His-tag construct is refolded using the suitable protocol when it is already known [50,51] or after a protein refolding screening step [47].

6. Case study: Production of proteins containing the DNA-binding domains of Ciona intestinalis

The procedures described above were successfully applied to several small- to medium-sized projects. Here, we illustrate the application of our protocol to one of the largest projects that we have undertaken so far, the study of Ciona intestinalis transcription factor (TF) DNA-binding specificity. This organism was selected as a model for characterizing the full repertoire of DNA-binding

domain (DNA BD) specificities of all the transcription factors in a metazoan genome. We have cloned and checked the expression of most of the Ciona intestinalis DNA BDs. Soluble DNA BDs could be purified in milligram amounts, and their DNA-binding specificity was characterized in vitro by a newly designed automated Systematic Evolution of Ligands by Exponential enrichment (SELEX) assay [45]. For this project, 424 DNA BD sequences were cloned into pENTR/D-Topo vectors (Invitrogen, USA) [45]. To be more cost efficient, we first studied 96 DNA BDs, a subset of the 424 DNA BD clones that represented the various families of DNA-binding proteins. We sub-cloned the BDs into pDest17OI and checked directly for soluble expression by dot blot in one culture condition. Only 15% of the proteins were detected in the soluble fraction, at an average concentration of approximately 0.2 mg/L. The 96 clones were then checked for soluble expression using the automated expression screening method after Nickel 96. The soluble levels increased for many proteins, and it was possible to detect soluble expression for 80% of the proteins. However, the overall average soluble level was only 0.5 mg/L and most of these proteins would not have been detected without the dot blot protocol. To check the influence of tags on the level of soluble expression, we selected 12 DNA BDs from 10 transcription factor families and sub-cloned them in parallel into the four destination vectors with TRX, MBP, Nus A and GST. We compared their soluble levels to cultures of the same DNA BD with only a His-tag. To confirm that the expression screening results were correct (not shown), we used a nickel column to purify the protein from 750 mL of ZYP5052 and used MALDI to confirm the protein and tag identities. The results are summarized in Table 2.
As anticipated, the use of fusion-tags had a drastic effect on the number and expression level of soluble proteins. Out of the 12 proteins tried, we were able to purify 11 proteins with a TRX or MBP fusion-tag, 7 with a Nus A fusion-tag and 5 with a GST fusion-tag. The missing protein was either insoluble or had stability problems (Nus A/MBP fusion-tags). The ranking of the tags was in strong agreement with our previous experiments on proteins of unknown function from M. tuberculosis. However, the gains in protein concentration were much higher. DNA BD proteins are small proteins, which could explain the differences in yield. The level of solubility of target AC14 (2 mg/L of culture), the only protein soluble when using only a His-tag, increased 40–100-fold, depending on the fusion-tag. AC14 was one of the proteins that had the highest level of soluble expression using the TRX and MBP fusion-tags, which suggests a quantitative correlation between the level of soluble expression with a His-tag alone and the level with a tag that boosts soluble expression. For this particular project, less than one milligram of purified protein was necessary for the characterization, and there was no need to remove the tag. Based on these solubility levels, we were able to downsize the cultures in the production phase to 32 mL (8 wells of a DW24 plate) and proceed directly to purification with a 1 mL nickel column. We were able to skip the expression screening procedure completely and directly process 48–96 proteins at a time in the milligram range using a newly designed 24 nickel column purification procedure (see Supplementary protocol 2). We first cloned the 424 proteins into a TRX fusion vector, expressed them in Ros (DE3) pLys S in auto-induction media (37 °C/17 °C) and purified them with the 24 nickel system. If there was less than 1 mg of purified protein, the proteins were sub-cloned into the MBP, Nus A and GST vectors, one after the other, and purified. Overall, 268 proteins could be successfully purified (63% of the clones) by this procedure. The simplest production protocol (Ros (DE3) pLys S strain, 37 °C/17 °C

Table 2
Effect of the fusion-tags on the solubility of DNA BD. The proteins are produced in the consensus protocol and purified. Protein concentrations are in milligram per liter of culture.
ND, not determined. The protein sequences will be disclosed elsewhere [45].

Fig. 4. Schematic of the simplified expression screening and protein production protocol. Bacterial strains: BL21 (DE3) pLys S (B), Rosetta (DE3) pLys S (R), Origami (DE3) pLys
S (O), C41 (DE3) pROS (C). Proteins are cloned in parallel with a His-tag and a His-TRX fusion-tag. Soluble expression is checked in one culture condition after nickel chelate
affinity purification. Depending on the soluble expression level, the His-tag or the His-TRX clone is kept for production and the tag removed if necessary. In case of failure, the
alternative 1 is followed; new cultures are carried out under different culture conditions. The level of soluble expression is checked following nickel chelate affinity
purification. In case of failure, the alternative 2 is followed; going back to the Entry clone, the proteins are cloned in a His-MBP fusion-tag and the screening process repeated.
Depending on the level of soluble expression, the His-MBP clone is used for production. Alternatively, the protein is cloned in His-Nus A and again the level of soluble
expression assessed.
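
The decision cascade in Fig. 4 can be sketched in code. The following is only an illustrative sketch, not the authors' software: the function names, the condition labels, the 1.0 mg/L threshold, the choice of alternative conditions, and the rule of trying the His-tag construct before the His-TRX construct are all assumptions made for the example. The caller supplies a `measure` function that returns the soluble yield (mg/L) observed for a given construct and culture condition.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical condition labels. Strain letters follow the Fig. 4 caption
# (B, R, O, C); the exact alternative conditions are assumptions.
DEFAULT_CONDITION = "R 37C/17C ZYP5052"
ALTERNATIVE_CONDITIONS = (
    "B 37C/17C ZYP5052",
    "O 37C/17C ZYP5052",
    "C 37C/17C ZYP5052",
)

# (construct, condition) -> soluble yield in mg/L of culture
Measure = Callable[[str, str], float]
Hit = Optional[Tuple[str, str]]


def first_hit(measure: Measure, constructs: Sequence[str],
              conditions: Sequence[str], threshold: float) -> Hit:
    """Return the first (construct, condition) pair meeting the threshold."""
    for condition in conditions:
        for construct in constructs:
            if measure(construct, condition) >= threshold:
                return construct, condition
    return None


def screening_cascade(measure: Measure, threshold: float = 1.0) -> Hit:
    """Walk through the Fig. 4 cascade; the threshold value is an assumption."""
    first_round = ("His", "His-TRX")
    # Round 1: both constructs in the single default culture condition.
    hit = first_hit(measure, first_round, (DEFAULT_CONDITION,), threshold)
    if hit:
        return hit
    # Alternative 1: same constructs under different culture conditions.
    hit = first_hit(measure, first_round, ALTERNATIVE_CONDITIONS, threshold)
    if hit:
        return hit
    # Alternative 2: re-clone from the Entry clone as His-MBP, then His-Nus A,
    # and repeat the screening for each new construct.
    all_conditions = (DEFAULT_CONDITION,) + ALTERNATIVE_CONDITIONS
    for construct in ("His-MBP", "His-NusA"):
        hit = first_hit(measure, (construct,), all_conditions, threshold)
        if hit:
            return hit
    return None  # no soluble expression found in this sketch
```

In practice `measure` would wrap the readout obtained after nickel chelate affinity purification. For a quick check it can be a lookup in a dictionary of recorded yields that returns 0.0 for untested combinations.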

in ZYP5052) was successful for 190 targets (71% of the purified proteins). Alternative cultures (change of strains and/or temperature) were needed to produce the other proteins. Of the 268 purified proteins, 260 carried a TRX fusion-tag (97%), 7 an MBP fusion-tag (2.6%) and 1 a Nus A fusion-tag (0.4%). No additional DNA BDs could be rescued by the GST fusion-tag or the His-tag alone. All transcription factor families were well represented: 77% of the proteins outside the abundant zinc finger family were purified (almost half of the total DNA BDs present in Ciona intestinalis have zinc finger domains), whereas only 47% of the zinc finger proteins could be purified. The 268 DNA BDs were characterized by automated SELEX, and the DNA-binding specificity of at least 140 DNA BDs has been identified to date, validating the functionality of these TRX fusion-tags [45].

7. Conclusion

Our involvement in large-scale projects, and the initial lack of suitable statistics on optimum culture conditions, prompted us to set up a high-throughput expression-screening protocol (Fig. 1). Over the last ten years, we systematically checked the soluble expression of hundreds of proteins under 12 culture conditions. This has enabled the rapid production of proteins for structure determination and, at the same time, allowed us to empirically determine the best protocols for expressing suitable amounts of protein. On the basis of this experience, we have developed a revised and simpler protocol (see Supplementary protocol 1) that can be easily adapted to any laboratory needing only medium throughput (96 cultures in parallel) with the use of deep-well plates, a simple vacuum manifold and SDS–PAGE detection.
In our last projects, with the use of our simple protocol based on the culture of two constructs (His-tag and His-TRX) per protein, we decreased the number of cultures necessary to produce suitable quantities of soluble proteins while we increased the yields ten- to a hundred-fold compared to our initial default protocol (Round 1, [16]). When necessary, failures were rescued by alternative expression conditions, which were conducted according to the order suggested by statistics (see Fig. 4). For all projects at our facility, between 50% and 100% of the proteins that were produced at a suitable soluble level were already being produced with the default combination of conditions. To obtain soluble proteins from the other targets, additional culture conditions were necessary. In our hands, when the TRX and MBP fusion-tags failed to produce soluble proteins, the Nus A fusion-tag rescued only very few targets and the GST fusion-tag was never successful. For projects with only tens of targets, we skipped the initial expression screening and directly produced the protein at a flask scale (culture in 750 mL ZYP5052 medium at 37 °C for four hours, then culture at 17 °C overnight), followed by purification on the AKTA Xpress (GE Healthcare, Sweden) or on our custom nickel 24 system. When there was not enough soluble protein, depending on the scale of the project, we would go back to the analytical expression screening to determine the best conditions for soluble expression. If screening of culture conditions and constructs failed, we tried to increase our chances of success by expressing homologs and/or orthologs [18,52] or by expressing multiple variants of the proteins of interest [53–57].

Acknowledgments

We thank Dr. S. Canaan (EIPL-UPR 9025 CNRS, Marseille, France) and Dr. Y. Bourne (AFMB, Marseille, France) for their participation in the Mycobacterium tuberculosis project; Dr. G. Stier (BZH, University of Heidelberg, Germany) for the modified pET vectors that were the templates for the pETG vectors used in this work; and Dr. K.R. Nitta and Dr. P. Lemaire (IBDML – UMR6216, Marseille, France) for our fruitful collaboration on the Ciona intestinalis project.

Funding: This work was supported by the ANR Blanc Chor-reg-net (NT05-2_42083), the ANR PFTV, the grant-in-Aid from MEXT (21671004) and the Marseille-Nice Génopole.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ymeth.2011.08.010.

References

[1] U. Heinemann, Nat. Struct. Biol. 7 (Suppl) (2000) 940–942.
[2] S.A. Lesley et al., Proc. Natl. Acad. Sci. USA 99 (2002) 11664–11669.
[3] R.C. Stevens, S. Yokoyama, I.A. Wilson, Science 294 (2001) 89–92.
[4] S. Graslund et al., Nat. Meth. 5 (2) (2008) 135–146.
[5] R. Xiao et al., J. Struct. Biol. 172 (1) (2010) 21–33.
[6] A. Correa, P. Oppezzo, Biotechnol. J. 6 (6) (2011) 715–730.
[7] P. Savitsky et al., J. Struct. Biol. 172 (1) (2010) 3–13.
[8] E. Vernet et al., Protein Expres. Purif. 77 (1) (2011) 104–111.
[9] R.K. Knaust, P. Nordlund, Anal. Biochem. 297 (2001) 79–85.
[10] C. Scheich et al., BMC Struct. Biol. 4 (2004) 4.
[11] P.M. Alzari et al., Acta Crystallogr. D 62 (Pt 10) (2006) 1103–1113.
[12] D. Busso et al., J. Struct. Funct. Genomics 6 (2–3) (2005) 81–88.
[13] M.B. Murphy, S.A. Doyle, Method. Mol. Biol. 310 (2005) 123–130.
[14] N.S. Berrow et al., Acta Crystallogr. D 62 (Pt 10) (2006) 1218–1226.
[15] T.C. Terwilliger, D. Stuart, S. Yokoyama, Ann. Rev. Bio. 38 (2009) 371–383.
[16] R. Vincentelli et al., Acc. Chem. Res. 36 (3) (2003) 165–172.
[17] M.J. Fogg et al., Acta Crystallogr. D 62 (Pt 10) (2006) 196–207.
[18] B. Coutard, B. Canard, Antivir. Res. 87 (2) (2010) 85–94.
[19] K. Michalke et al., Anal. Biochem. 386 (2) (2009) 147–155.
[20] G. Sulzenbacher et al., Acta Crystallogr. D 58 (Pt 12) (2002) 2109–2115.
[21] A.J. Walhout et al., Methods Enzymol. 328 (2000) 575–592.
[22] R. Vincentelli et al., Anal. Biochem. 346 (1) (2005) 77–84.
[23] A. Skerra, T.G. Schmidt, Biomol. Eng. 16 (1–4) (1999) 79–86.
[24] T.G. Schmidt, A. Skerra, Nature Protocols 2 (6) (2007) 1528–1535.
[25] D.B. Smith, K.S. Johnson, Gene 67 (1988) 31–40.
[26] K. Terpe, Appl. Microbiol. Biotechnol. 60 (5) (2003) 523–533.
[27] C.V. Maina et al., Gene 74 (1988) 365–373.
[28] E. Sala, A. de Marco, Protein Expres. Purif. 74 (2) (2010) 231–235.
[29] F.W. Studier, B.A. Moffatt, J. Mol. Bio. 189 (1) (1986) 113–130.
[30] C. Abergel et al., J. Struct. Funct. Genomics 4 (2–3) (2003) 141–157.
[31] J.W. Dubendorff, F.W. Studier, J. Mol. Bio. 219 (1) (1991) 45–59.
[32] B. Miroux, J.E. Walker, J. Mol. Biol. 260 (3) (1996) 289–298.
[33] F.W. Studier et al., Method. Enzymol. 185 (1990) 60–89.
[34] F.W. Studier, Protein. Expr. Purif. 41 (1) (2005) 207–234.
[35] J. Bogomolovas et al., Protein Expres. Purif. 64 (1) (2009) 16–23.
[36] S. Kim, S.B. Lee, Protein Expres. Purif. 62 (1) (2008) 116–119.
[37] D. Esposito, D.K. Chatterjee, Curr. Opin. Biotechnol. 17 (4) (2006) 353–358.
[38] M. Hammarstrom et al., Protein Sci. 11 (2) (2002) 313–321.
[39] D. Walls, S.T. Loughran, Meth. Mol. Biol. 681 (2011) 151–175.
[40] C. di Guan et al., Gene 67 (1) (1988) 21–30.
[41] R.B. Kapust, D.S. Waugh, Protein Science: A Publication of the Protein Society 8 (8) (1999) 1668–1674.
[42] S. Nallamsetty, D.S. Waugh, Nature Protocols 2 (2) (2007) 383–391.
[43] G.D. Davis et al., Biotechnol. Bioeng. 65 (1999) 382–388.
[44] E.R. LaVallie et al., Bio/Technol. 11 (2) (1993) 187–193.
[45] K.R. Nitta et al., An extensive atlas of transcription factor DNA-binding specificity in a chordate, in preparation.
[46] S.T. Cole, P.M. Alzari, Science 307 (5707) (2005) 214–215.
[47] R. Vincentelli et al., Protein Sci. 13 (10) (2004) 2782–2792.
[48] K. Zanier et al., Protein Expres. Purif. 51 (1) (2007) 59–70.
[49] W. Peti, R. Page, Protein Expres. Purif. 51 (1) (2007) 1–10.
[50] A.M. Buckle et al., Nature Meth. 2 (1) (2005) 3.
[51] J. Phan et al., Meth. Mol. Biol. 752 (2011) 45–57.
[52] A. Savchenko et al., Proteins 50 (3) (2003) 392–399.
[53] S. Graslund et al., Protein Expres. Purif. 58 (2) (2008) 210–221.
[54] G.S. Waldo et al., Nat. Biotechnol. 17 (7) (1999) 691–695.
[55] F. Tarendeau et al., Nat. Struct. Mol. Biol. 14 (3) (2007) 229–233.
[56] H. Yumerefendi et al., J. Struct. Biol. 172 (1) (2010) 66–74.
[57] E. Littler, Drug Discov. Today 15 (11–12) (2010) 461–467.
