Introduction

High-throughput genome sequencing of pathogens and tumor antigens and advances in bioinformatics have disclosed a massive number of potential protein targets of immune responses. The use of this new knowledge for the development of novel vaccines is termed “reverse vaccinology” (Rappuoli 2000). The use of major histocompatibility complex (MHC)-binding algorithms has allowed the identification of novel T cell epitopes that could be used in vaccine design against infectious diseases and cancer, the so-called epitope-driven vaccine design. Epitope-based vaccines have the ability to focus the immune response on highly antigenic epitopes, free from the original protein scaffold, which can be selected to be conserved and amply recognized by the target population while increasing safety. Given the fundamental role played by CD4+ T cells in determining the functional status of both innate and adaptive immune responses, the inclusion of appropriate CD4+ T cell epitopes may be essential for vaccine efficacy. Here we describe the background underlying the need for CD4+ T cell help in vaccine development and how to select CD4+ T cell epitopes for an epitope-based vaccine. We also provide examples of how the approach can be used to successfully overcome barriers observed in whole protein-based vaccines.

Antigen Presentation and T Cell Recognition

T cells recognize antigen via the binding of T cell receptors (TCRs) to peptides derived from self or foreign proteins (Braga-Neto and Marques 2006). This recognition depends on the prior binding of the antigenic peptides to cell-surface glycoproteins, the MHC proteins. The human MHC locus (HLA, human leukocyte antigens) encodes three HLA class I molecules (HLA-A, -B, -C) and three HLA class II molecules (HLA-DR, -DQ, -DP). The structures of the MHC class I and class II molecules complexed with different peptides were solved by X-ray crystallographic studies. CD8+ T cell epitopes bound by MHC class I range from 8 to 11 residues, while CD4+ T cell epitopes bound to HLA class II range from 11 to more than 20 residues in length. Bound peptides are buried in the antigen-binding groove formed between the helices of the MHC molecule, leaving only a few of their side chains available for direct TCR contact (Bjorkman et al. 1987; Stern et al. 1994). Only nine residues from a given peptide bind to MHC class II within the antigen-binding groove, while flanking residues may also interact with the TCR (Carson et al. 1997). Specific binding of certain peptides to an MHC molecule comes from the interaction of peptide side chains with the irregular surface of the floor and sides of the groove, the “pockets” and ridges formed by the protrusion of MHC residues. The major pockets in the floor of the groove of HLA-DR molecules are occupied by the side chains of residues 1, 4, 6, and 9 of the bound peptide (Stern et al. 1994).
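As a minimal illustration of the nine-residue binding frame described above, the following Python sketch enumerates every possible 9-mer register within a longer peptide and reports which residues would occupy positions 1, 4, 6, and 9. The function names and the example peptide are hypothetical and chosen only for illustration; the sketch does not predict which register is actually used.

```python
# Illustrative sketch only: function names and the example peptide are hypothetical.
CORE_LENGTH = 9                   # residues accommodated in the class II groove
POCKET_POSITIONS = (1, 4, 6, 9)   # major HLA-DR pocket positions (1-based)


def binding_registers(peptide):
    """Return every 9-residue window (candidate binding core) of a peptide."""
    return [peptide[i:i + CORE_LENGTH]
            for i in range(len(peptide) - CORE_LENGTH + 1)]


def anchor_residues(core):
    """Map each major pocket position to the residue that would occupy it."""
    return {pos: core[pos - 1] for pos in POCKET_POSITIONS}


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQR"  # arbitrary 15-residue string, not a real epitope
    for core in binding_registers(peptide):
        print(core, anchor_residues(core))
```

A 15-residue peptide offers seven alternative registers, which is one reason class II prediction must consider both the core and its position within the longer ligand.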

HLA molecules are highly polymorphic. To date, more than 2,215 HLA class I and 986 HLA class II allelic sequences have been identified (Robinson et al. 2009). This polymorphism is concentrated in the region encoding the peptide-binding groove, yielding very diverse amino-acid sequences in this region among different HLA alleles. Thus each pocket in the peptide-binding groove of each HLA molecule is shaped by clusters of polymorphic residues. It follows that each allelic HLA molecule only binds peptides with amino-acid sequences that are capable of interacting with its antigen-binding groove. This implies that T cells from different individuals may be able to recognize distinct epitopes from a given protein.

Escape from Presentation and Recognition by “Molecular Evolution”

For millions of years, pathogens have evolved molecular mechanisms to escape effective presentation and recognition by the immune system. Tumor cells and pathogens evolve by accumulating mutations in protein antigens, either flanking or inside an epitope, to escape immune pressure by one of several mechanisms: (1) abrogation of epitope binding to host MHC or TCR (De Groot et al. 2008), (2) loss of sites for sequence-specific processing proteases (Moudgil et al. 1996), and (3) antagonism or partial agonism of TCR signaling (Franco et al. 1995). Thus the immune response, or lack thereof, against pathogen protein antigens may reflect the evolutionary success of the parasite. Immunization with substituted synthetic peptides or DNA/proteins can present neoepitopes or alter the hierarchy of dominant/cryptic T cell epitopes, bypassing escape mechanisms (Cunha-Neto 1999).

Immunodominance

The hallmarks of the adaptive immune response are specificity and memory, for which T cells are indispensable. Protein antigens typically contain multiple epitopes capable of binding MHC class II molecules, but T cell responses are limited to only a small number of these determinants in each individual. The ability of the immune system to regulate and focus T cell responses to a select number of epitopes is termed immunodominance (Berzofsky 1988). As the immune response is always mounted against immunodominant epitopes of an antigen, exposure to these regions will result in efficient priming of the immune system, which can generate protection on subsequent challenge. In other words, immunodominant epitopes, or collections of them, can be potential vaccine candidates. Such epitope-based vaccines exploit small but useful antigenic regions of a protein and ignore portions that are poorly immunogenic or can cause a harmful response (Gowthaman and Agrewala 2008).

A number of mutually nonexclusive hypotheses have been put forth to explain this very restricted specificity of T cells, including: (1) endosomal antigen processing may restrict the array of peptides available to recruit CD4+ T cells, (2) MHC molecules may be able to bind only a limited subset of antigenic peptides that are released during antigen processing, the so-called determinant selection and (3) the TCR repertoire may be limited and only able to detect some MHC:peptide combinations (holes in the repertoire) (Sant et al. 2005). Processing reactions within an antigen-presenting cell may also influence immunodominance since gross changes in antigen structure may modulate the efficiency of an epitope’s presentation (Li et al. 2009).

While dominant peptides efficiently elicit recall responses in animals primed with whole antigens, nondominant or cryptic peptides are only immunogenic when directly administered in vivo. Epitope immunodominance is an MHC-restricted phenomenon (i.e. individuals with distinct MHCs will select different immunodominant epitopes). Immunodominance of certain pathogen epitopes can also induce a detrimental effect by restricting the breadth of recognized T cell epitopes. In a recent phase I clinical trial, subjects immunized with a recombinant adenovirus 5 vector encoding the whole human immunodeficiency virus type 1 (HIV-1) proteins Gag, Pol, and Nef recognized on average a single epitope per HIV protein, a number clearly insufficient to provide protection against the highly variable HIV-1 (Watkins et al. 2008).

The Importance of CD4+ T Cell Help

CD4+ T cells play a central role in a functional adaptive immune response. They promote the optimal expansion of cytotoxic CD8+ T cells, maintain CD8+ T cell memory, and communicate with innate immune cells. Furthermore, they promote B cell differentiation into plasma cells to produce neutralizing antibodies and assist memory B cells for a swift recall response to re-infection (Yang and Yu 2009). The help provided by CD4+ T cells is essential for the generation of a robust primary and memory CD8+ T cell response and protective immunity against various viral and bacterial infections (Bevan 2004; Novy et al. 2007). Besides, CD4+ T cells can themselves act as antiviral effector cells either by killing infected cells directly or by secretion of antiviral cytokines, such as IFN-γ and TNF-α, and can become memory helper T cells. The importance of CD4+ T cells in immunity to infection is further underscored in that mice with absent or defective CD4+ T cell help have an impaired ability to clear viral and protozoan pathogens, such as lymphocytic choriomeningitis virus (Khanolkar et al. 2004), herpes simplex virus-1 (Rajasagi et al. 2009), and Plasmodium (Xu et al. 2002).

The mechanisms underlying CD4+ T cell help to CD8+ T cells are not completely understood, but recent evidence suggests that CD4+ T cells facilitate the activation and development of CD8+ T cell responses either directly through the provision of cytokines or by the major pathway, i.e. dendritic cell (DC) licensing (Lanzavecchia and Sallusto 2001; Smith et al. 2004; Wodarz and Jansen 2001). In DC licensing, the CD40 ligand-CD40 interaction results in IL-12 and IL-15 production and up-regulation of costimulatory molecules, which leads to subsequent activation and maturation of DCs, making them competent to stimulate an antigen-specific CD8+ T cell response (Bennett et al. 1998; Ridge et al. 1998; Schoenberger et al. 1998; Smith et al. 2004; Zhang et al. 2009). In addition to signaling via surface molecules, CD4+ T cell-derived IL-2 is an important component of help for CD8+ T cell immunity to pathogens (Livingstone et al. 2009).

Viruses with a tropism for helper T cells, such as HIV, can potentially impair the CD4+ T cell response, resulting in compromised cytotoxic T lymphocyte (CTL) activity and persistent infection. The importance of HIV-specific CD4+ T cells and their association with a protective antiviral immune response has been demonstrated (Gandhi and Walker 2002). Preservation of memory CD4+ T cells correlated with primate survival after challenge with simian immunodeficiency virus (SIV) (Letvin et al. 2006), further showing the importance of memory CD4+ T cells in protection.

Tumor-associated antigen-specific CD4+ T cell helper activity is exerted for the induction and maintenance of CTLs, in addition to other immune cells. Moreover, CD4+ T cells can also possess cytotoxic ability and directly kill tumor cells (Guo et al. 2005) and thus may have a role in cancer immunotherapy (Ohkuri et al. 2009). Taken together, evidence indicates that effective CD4+ T cell help or direct effector action is essential for anti-infection or anticancer immunity.

Prediction of CD4+ T Cell Epitopes

Given the fundamental role of CD4+ T cells in anti-infection/anticancer immunity, the identification and characterization of CD4+ T cell epitopes is a crucial step in vaccine design. It is also important for studying the immunobiology of autoimmunity, allergy, and transplantation along with immune diagnostics (Moise and De Groot 2006; Valentino and Frelinger 2009). A conventional approach to identifying T cell epitopes is to synthesize many overlapping peptides (usually 15-mers) spanning the full length of the target antigen and test for immunogenicity using T cell assays. However, this approach is time-consuming, expensive (especially for whole-proteome analyses), and may not disclose all the longer CD4+ T cell epitopes. On the other hand, bioinformatic/in silico tools can predict which peptides are more likely to contain T cell epitopes, greatly reducing the number of candidate sequences (Gowthaman and Agrewala 2008).
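The conventional overlapping-peptide approach can be sketched in a few lines of Python, which also makes its cost easy to estimate. The 15-residue length follows the text; the step of 5 residues (a 10-residue overlap), the function name, and the example sequence are assumptions made for illustration only.

```python
# Illustrative sketch: the step size and the example sequence are assumptions.
def overlapping_peptides(sequence, length=15, step=5):
    """Split a protein sequence into overlapping peptides of fixed length.

    Consecutive peptides start `step` residues apart. If the last regular
    window does not reach the C-terminus, one extra peptide ending at the
    final residue is appended so the whole sequence is covered.
    """
    if len(sequence) <= length:
        return [sequence]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + length] for s in starts]


if __name__ == "__main__":
    toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMN"  # arbitrary 32-residue string
    for peptide in overlapping_peptides(toy_protein):
        print(peptide)
```

Under these assumptions the number of peptides grows roughly as the sequence length divided by the step, which is why scanning a whole proteome by synthesis alone quickly becomes impractical.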

T cell epitope prediction dates back to the 1980s, when the first algorithm was developed based on the identification of amphipathic helical regions on protein antigens (Berzofsky et al. 1987). Since then, new methods based initially on MHC peptide-binding motifs and, more recently, on MHC-binding properties based on scores calculated from actual peptide-binding assays have been developed. MHC class II binding prediction methods are more complex than those for MHC-I binding (Yang and Yu 2009).

A range of bioinformatic algorithms have been developed to predict MHC class II epitopes (Table 1). Since the most selective requirement for a peptide to be immunogenic is its ability to bind to the MHC molecule, most prediction methods focus on this stage of the pathway. The algorithms that are used vary in complexity and accuracy. Several of them rely on the fact that most pockets in the MHC class II binding groove are shaped by clusters of polymorphic residues and thus have distinct chemical and size characteristics in different MHC class II alleles. MHC class II binding prediction methods are categorized into two main groups: qualitative and quantitative. Qualitative methods determine only the binding status (whether a peptide is a “binder” or “non-binder”) from a predictive score derived from position-specific binding profiles (e.g. SYFPEITHI, RANKPEP, MULTIPRED), while quantitative approaches also predict the strength of binding (e.g. TEPITOPE, PROPRED, NetMHCIIpan, SVRMHC, ARB, and SMM-align) (Rajapakse et al. 2007; Wan et al. 2007). Several studies have compared the performances of MHC class II binding prediction methods (Gowthaman and Agrewala 2008; Lin et al. 2008; Wang et al. 2008). Briefly, these studies reached similar conclusions: despite differences in the datasets used, contemporary methods have shown little improvement over the older TEPITOPE algorithm (Lin et al. 2008). The TEPITOPE HLA-DR binding prediction algorithm (Sturniolo et al. 1999) uses the concept that each HLA-DR pocket can be characterized by “pocket profiles,” a quantitative representation of the interaction of all natural amino-acid residues with a given pocket (Hammer et al. 1994). A small database of pocket profiles was sufficient to generate a large number of virtual HLA-DR matrices representing a significant proportion of HLA-DR peptide-binding specificity. Such matrices were incorporated in the TEPITOPE software.
For each HLA-DR specificity, TEPITOPE generated a binding score corresponding to the algebraic sum of the strength of interaction between each residue and pocket, which correlated to binding affinity. Peptide scores along a scanned protein sequence were normalized for each HLA-DR as the proportion of the best binder peptides. Since the software displays a significant number of HLA-DR specificities, it is also capable of predicting promiscuous HLA class II ligands (Panigada et al. 2002). The TEPITOPE prediction model has been successfully applied to the identification of T cell epitopes in the context of several human diseases (Bian and Hammer 2004).
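The additive scoring idea described above can be illustrated with a short Python sketch. It is not the TEPITOPE implementation: the pocket-profile values below are invented, the function names are hypothetical, and the thresholding step (keeping the best-scoring fraction of windows in the scanned sequence) is a simplified reading of the normalization described in the text.

```python
import math

# Toy pocket profiles: invented values for illustration only, NOT TEPITOPE data.
# Maps core position (1-based) to {residue: score}; unlisted residues score 0.
TOY_PROFILE = {
    1: {"F": 1.0, "Y": 0.8, "L": 0.5, "D": -1.0},
    4: {"L": 0.6, "D": 0.4},
    6: {"A": 0.5, "S": 0.3},
    9: {"L": 0.7, "K": -0.5},
}


def score_core(core, profile):
    """Sum the position-specific scores over the residues of one 9-mer core."""
    return sum(profile.get(pos, {}).get(res, 0.0)
               for pos, res in enumerate(core, start=1))


def scan_protein(sequence, profile, core_length=9):
    """Score every 9-mer window; return (start_index, core, score) tuples."""
    hits = []
    for i in range(len(sequence) - core_length + 1):
        core = sequence[i:i + core_length]
        hits.append((i, core, score_core(core, profile)))
    return hits


def top_fraction(hits, fraction):
    """Keep the best-scoring fraction of windows (always at least one)."""
    keep = max(1, math.ceil(len(hits) * fraction))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)[:keep]


if __name__ == "__main__":
    hits = scan_protein("FAALAAAALG", TOY_PROFILE)
    print(top_fraction(hits, 0.03))
```

Repeating the scan with one matrix per HLA-DR specificity, and then counting how many specificities retain a given peptide, is the basis for flagging promiscuous ligands.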

Table 1 Examples of in silico tools for class II epitope prediction

The applications of computational HLA class II epitope prediction include vaccine candidate discovery, the study of pathogenesis (infectious diseases, autoimmunity, and cancer), allergy treatment, drug development, engineering of therapeutic proteins, and diagnostics. A brief account of some studies identifying epitopes with the aid of prediction algorithms performed over the last decade is presented in Table 2.

Table 2 Applications of computational class II epitope prediction

Our group and others have successfully used TEPITOPE for mapping promiscuous CD4+ T cell epitopes in different pathogen antigens, including HIV-1 whole proteome (Fonseca et al. 2006), Schistosoma mansoni Sm14 and paramyosin (Fonseca et al. 2005a; Fonseca et al. 2005b), Plasmodium vivax MSP-1 (Rosa et al. 2006), Paracoccidioides brasiliensis gp43 (Iwai et al. 2007), several Mycobacterium tuberculosis proteins, cytomegalovirus pp65 and glyP86, and SIV whole proteome (unpublished observations). Peptide-HLA-DR binding assays confirmed the capacity of the predicted epitopes to bind to several HLA class II molecules, in that most peptides bound to at least 50% of the HLA-DR molecules tested. Furthermore, a significant correlation was observed between the TEPITOPE-predicted promiscuity (i.e. the number of HLA-DR molecules predicted to bind to a certain peptide) and the promiscuity observed in binding assays to multiple HLA-DR molecules (Fig. 1). Individual peptides were recognized by 25–50% of patients; combined T cell recognition of such peptides was detected in a high proportion of patients (75–90%).

Fig. 1

Correlation between the prediction of promiscuous epitopes by TEPITOPE and the promiscuity assessed by binding assays to nine prevalent HLA-DR molecules. TEPITOPE prediction was performed using peptides selected at a 3% threshold (25 HLA-DR molecules scanned). Data are from peptides derived from Schistosoma mansoni (paramyosin and Sm14), Paracoccidioides brasiliensis (gp43), Mycobacterium tuberculosis (groEL2, PBP-1, 19kAg, 19Kmmp, mtp40, and mce-1), and cytomegalovirus (pp65 and glyP86). HLA-DR binding data (DRB1*0101, *0301, *0401, *0405, *0701, *1101, *1302, *1501, DRB5*0101) were generated by Alessandro Sette and John Sidney (La Jolla Institute of Allergy and Immunology, USA).
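The kind of comparison summarized in Fig. 1 can be sketched as follows: promiscuity is counted per peptide as the number of HLA-DR molecules flagged as binders, and predicted and observed counts are then correlated. The source does not state which correlation statistic was used, so the Pearson coefficient here is an assumption, as are the function names and the invented example values.

```python
import math


def promiscuity(binding_by_allele):
    """Count the HLA-DR molecules flagged as binders for one peptide."""
    return sum(1 for binds in binding_by_allele.values() if binds)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example values, not data from the studies cited in the text.
    predicted = [20, 12, 25, 8, 17]  # alleles predicted to bind (of 25 scanned)
    observed = [7, 4, 9, 3, 5]       # alleles bound in assays (of 9 tested)
    print(round(pearson(predicted, observed), 3))
```

Because the two counts are taken over different numbers of alleles (25 scanned versus 9 tested), a correlation coefficient is a convenient scale-free way to compare them.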

Predicted CD4+ Epitopes and Vaccine Design

Immunization strategies that focus solely on CD8+ T cell immunity might prove to be insufficient because even though they might stimulate vigorous early responses, they will be unable to confer long-term protective immunity (Khanolkar et al. 2007). Although considerable information has been gathered on CD8+ T cell epitopes, comparatively few CD4+ T cell epitopes have been identified so far, whether derived from pathogens or from tumor-associated proteins. The rational selection of protein sequences that function as promiscuous CD4+ epitopes in vaccine formulations is crucial for successful application of this vaccination strategy. Given their major role in determining the functional status and memory of effector responses, appropriate CD4+ T cell epitopes should be an essential part of any candidate vaccine. It is thus necessary to identify the still-missing CD4+ T cell epitopes recognized by the majority of individuals.

The advent of whole-genome sequencing and advances in bioinformatics marked the beginning of a new era that approaches vaccine development starting from genomic information, a process named “reverse vaccinology” (Rappuoli 2000). In just a few years, reverse vaccinology applied to Neisseria meningitidis has resulted in the identification of more vaccine candidates than those discovered during the previous four decades of research by conventional methods (Pizza et al. 2000; Serruto and Rappuoli 2006). A related approach, termed “epitope-driven vaccine design” (De Groot et al. 2001), originally named “reverse immunogenetics” by Davenport and Hill, employs T cell-epitope mapping tools for finding new protein candidates for vaccines and diagnostic tests (Davenport and Hill 1996) (Fig. 2). Epitope-driven vaccine design allows the discovery of previously unknown and undescribed antigens and epitopes as vaccine candidates. Furthermore, predicted CD4+ T cell epitopes should be validated using in vitro evaluation of HLA binding, ex vivo assays with T cells from sensitized hosts (ELISpot assays, MHC-tetramers, flow cytometry for cytokine production and proliferation), and in vivo validation using HLA class II transgenic mice.

Fig. 2

Epitope-driven vaccine design flowchart

The rational design of promiscuous epitopes has been applied to create an artificial TH-cell epitope by inserting anchor residues for diverse HLA-DR molecules, yielding a powerful pan-DR epitope (PADRE) that is presented by multiple HLA DR molecules (Alexander et al. 1994). The use of synthetic epitopes such as PADRE to optimize CD8+ function (Kim et al. 2008) and antibody production (Rosa et al. 2004) holds promise for a new generation of highly efficacious vaccines.

The epitope-based vaccine approach allows focusing immune responses on relevant epitopes, the rational engineering of epitopes for increased potency and breadth, as well as increasing safety (Sette and Fikes 2003). The major disadvantage of the epitope-based approach is that algorithms may fail to predict all the relevant epitopes (Iwai et al. 2003). Moreover, only a limited set of HLA class I specificities have algorithms designed for them. Furthermore, whole-protein immunization resembles infection-associated immunity and may offer epitopes that can be recognized by patients bearing such “uncharted” HLA class I specificities. This limitation may be minimized, however, by the use of a significant number of longer, promiscuous CD4+ epitopes containing CD8+ T cell epitopes.

The use of the available matrix-based MHC-binding algorithms allows the potential selection of high-affinity binding peptides, the ones with a higher chance of eliciting T cell responses. From the vaccine immunologist’s point of view, however, the identification of MHC allele-specific T cell epitopes may not be enough, since one is searching for vaccine epitopes that can effectively cover the human population, with its large diversity of HLA alleles (Wilson et al. 2001). This implies the identification of multiple “promiscuous” epitopes that can bind to multiple distinct HLA alleles (Sturniolo et al. 1999) whose combined frequency in the population approaches 100%. The alignment of peptides binding to several distinct HLA-DR molecules with TEPITOPE or another quantitative matrix-based MHC class II prediction algorithm can identify such potentially “promiscuous” epitopes. Since many pathogens exhibit high mutation rates, it is important to select conserved protein sequences for scanning with MHC-binding algorithms (Khan et al. 2006).
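One simple way to reason about population coverage is sketched below. It assumes Hardy-Weinberg proportions at a single locus, so that an individual lacks every covered allele with probability (1 - f)^2, where f is the summed frequency of the covered alleles. This model, the function name, and the example frequencies are assumptions for illustration and are not taken from the cited studies.

```python
def carrier_frequency(allele_frequencies):
    """Fraction of individuals carrying at least one of the listed alleles.

    Assumes Hardy-Weinberg proportions at a single locus: a person lacks all
    listed alleles with probability (1 - f)^2, where f is the summed allele
    frequency of the listed alleles.
    """
    f = sum(allele_frequencies)
    if not 0.0 <= f <= 1.0:
        raise ValueError("summed allele frequency must lie between 0 and 1")
    return 1.0 - (1.0 - f) ** 2


if __name__ == "__main__":
    # Invented allele frequencies for the alleles bound by a set of epitopes.
    covered = [0.10, 0.12, 0.08, 0.15]
    print(round(carrier_frequency(covered), 3))
```

Under this assumption, coverage rises faster than the summed allele frequency because each individual carries two alleles, which is why a modest panel of promiscuous epitopes can reach a large share of a population.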

Our group recently identified a group of such epitopes from the whole proteome of the HIV-1 B-subtype consensus sequence using the TEPITOPE algorithm. This led to the identification of 18 sequences predicted to bind significantly to at least two-thirds of the HLA-DR molecules covered by the TEPITOPE algorithm, representing ca. 90% of the Caucasian population (Fonseca et al. 2006). Binding assays of the 18 selected peptides with the 9 most prevalent HLA-DR molecules in the general population confirmed the ability of the selected peptides to bind to multiple HLA-DR molecules. Peripheral blood mononuclear cells (PBMCs) from over 90% of a group of HIV-1-infected patients recognized at least one of the promiscuous peptides. All 18 peptides were recognized, and the PBMCs from most of the patients recognized multiple peptides, demonstrating that the epitopes are presented in vivo during infection. Similar responses were obtained in CD8+ T cell-depleted PBMCs. Together, these findings suggest that this epitope combination could have good potential for use as an immunogen against HIV. Polyepitopic recombinant vaccines, containing multiple epitopes in tandem, may alter the hierarchy of dominant epitopes, circumventing the mechanisms of escape from processing and presentation built into sequences of native viral proteins (Fu et al. 1997). In addition, a polyepitopic vaccine in which each epitope is promiscuous increases the chances that each individual in a genetically heterogeneous population acquires immunity to multiple epitopes, a necessary measure in a vaccine against the highly polymorphic HIV. We thus constructed a polyepitopic DNA vaccine encoding the 18 HIV-1 CD4+ T cell epitopes. Preliminary data from immunogenicity studies in mice indicate the vaccine elicits a powerful CD4+ and CD8+ T cell response, detectable against all the contained epitopes (unpublished observations).
Significantly, this showed that epitope prediction by TEPITOPE can also predict epitopes recognized in nonhuman species. Rosa et al. (2006) used the TEPITOPE algorithm to identify a promiscuous CD4+ T cell epitope in P. vivax merozoite surface protein-1. Mice immunized with a recombinant P. vivax MSP1-19 protein containing the above promiscuous epitope developed significantly higher anti-MSP1-19 IgG antibody titers than those immunized with intact MSP1-19.
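The selection criterion used for the HIV-1 peptides (predicted binding to at least two-thirds of the scanned HLA-DR molecules) and the idea of joining epitopes in tandem can be sketched as follows. The function names, the optional spacer argument, and the example predictions are hypothetical; the source does not describe how the epitopes were ordered or joined in the actual construct.

```python
def select_promiscuous(predictions, numerator=2, denominator=3):
    """Return peptides predicted to bind at least numerator/denominator of alleles.

    `predictions` maps each peptide to a {allele_name: bool} dictionary.
    Integer arithmetic avoids rounding problems at the exact cutoff.
    """
    selected = []
    for peptide, by_allele in predictions.items():
        bound = sum(1 for binds in by_allele.values() if binds)
        if by_allele and bound * denominator >= numerator * len(by_allele):
            selected.append(peptide)
    return selected


def assemble_in_tandem(peptides, spacer=""):
    """Join peptides head to tail; the spacer is an illustrative option."""
    return spacer.join(peptides)


if __name__ == "__main__":
    # Invented predictions for three hypothetical alleles.
    predictions = {
        "ACDEFGHIKLMNPQR": {"allele_1": True, "allele_2": True, "allele_3": False},
        "STVWYACDEFGHIKL": {"allele_1": True, "allele_2": False, "allele_3": False},
    }
    chosen = select_promiscuous(predictions)
    print(assemble_in_tandem(chosen))
```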

In addition to immunogenicity studies, several recent studies suggest that CD4+ T cell epitope-based vaccines may indeed have an increased protective effect. Mice immunized with TEPITOPE-selected peptides derived from S. mansoni Sm14 protein displayed partial protection against infectious challenge (Garcia et al. 2008). Barker et al. (2008) scanned the Chlamydia proteome with the SYFPEITHI and PROPRED algorithms to identify proteins with promiscuous epitopes. Adoptive transfer of antigen-primed CD4+ T cells was found to confer significant reduction in shedding of chlamydial organisms and duration of infection. Moreover, serum from immunized mice was found to neutralize Chlamydia infection of a cell monolayer in vitro, demonstrating the importance of CD4 help in generating appropriate antibody responses. Despite the importance of CD4+ T cells, it must be kept in mind that antibodies, CD8+ T cells, and the delayed-type hypersensitivity reaction are the major end-effectors of acquired immunity. Thus CD4+ T cell-based vaccines must necessarily engage these effectors to achieve any degree of protection.

Conclusion

Increasing evidence demonstrates the critical role of CD4+ T cells in natural antigen encounter and immunization. We have reviewed the essential role of CD4+ T cell responses in the acquired immune response and the unavoidable need to include a CD4 epitope component to provide cognate help in vaccines that aim also to induce cytotoxic CD8+ T cells and neutralizing antibodies. In light of the availability of epitope prediction tools, and the recent demonstration of the important immunogenicity of epitope-based vaccines, we anticipate that the near future will show an increasing number of candidate vaccines containing cassettes of promiscuous, conserved CD4+ T cell epitopes, with consequent improvement of immunogenicity and protection.