Journal of Virology. 2002 Jun;76(11):5435–5451. doi: 10.1128/JVI.76.11.5435-5451.2002

Human Immunodeficiency Virus Type 1 Subtype C Molecular Phylogeny: Consensus Sequence for an AIDS Vaccine Design?

V Novitsky 1, U R Smith 2, P Gilbert 3, M F McLane 1, P Chigwedere 1, C Williamson 4, T Ndung'u 1, I Klein 1, S Y Chang 1, T Peter 5, I Thior 5, B T Foley 2, S Gaolekwe 6, N Rybak 1, S Gaseitsiwe 5, F Vannberg 7, R Marlink 1, T H Lee 1, M Essex 1,*
PMCID: PMC137027  PMID: 11991972

Abstract

An evolving dominance of human immunodeficiency virus type 1 subtype C (HIV-1C) in the AIDS epidemic has been associated with a high prevalence of HIV-1C infection in the southern African countries and with an expanding epidemic in India and China. Understanding the molecular phylogeny and genetic diversity of HIV-1C viruses may be important for the design and evaluation of an HIV vaccine for ultimate use in the developing world. In this study we analyzed the phylogenetic relationships (i) between 73 nonrecombinant HIV-1C near-full-length genome sequences, including 51 isolates from Botswana; (ii) between HIV-1C consensus sequences that represent different geographic subsets; and (iii) between specific isolates and consensus sequences. Based on the phylogenetic analyses of 73 near-full-length genomes, 16 “lineages” (a term that is used hereafter for discussion purposes and does not imply taxonomic standing) were identified within HIV-1C. The lineages were supported by high bootstrap values in maximum-parsimony and neighbor-joining analyses and were confirmed by the maximum-likelihood method. The nucleotide diversity between the 73 HIV-1C isolates (mean value of 8.93%; range, 2.9 to 11.7%) was significantly higher than the diversity of the samples to the consensus sequence (mean value of 4.86%; range, 3.3 to 7.2%, P < 0.0001). The translated amino acid distances to the consensus sequence were significantly lower than distances between samples within all HIV-1C proteins. The consensus sequences of HIV-1C proteins accompanied by amino acid frequencies were presented (that of Gag is presented in this work; those of Pol, Vif, Vpr, Tat, Rev, Vpu, Env, and Nef are presented elsewhere [http://www.aids.harvard.edu/lab_research/concensus_sequence.htm]). Additionally, in the promoter region three NF-κB sites (GGGRNNYYCC) were identified within the consensus sequences of the entire set or any subset of HIV-1C isolates. 
This study suggests that the consensus sequence approach could overcome the high genetic diversity of HIV-1C and facilitate an AIDS vaccine design, particularly if the assumption that an HIV-1C antigen with a more extensive match to the circulating viruses is likely to be more efficacious is proven in efficacy trials.


The dynamic character of the AIDS epidemic and the nonuniform historical distribution of human immunodeficiency virus type 1 (HIV-1) subtypes have been exemplified by an unprecedented increase in the prevalence of HIV-1 subtype C (HIV-1C). Dominating in several major areas, including southern Africa, the Horn of Africa, India, and China, HIV-1C represents at least 56% of all circulating group M infections (18). HIV-1C is the only HIV-1 subtype that accounts for prevalence rates as high as 20 to 40% in the general adult population of entire countries (51; also see http://www.who.int/emc-hiv/fact_sheets/).

The divergent patterns of the AIDS epidemic in different geographic areas may be an important consideration for the design and testing of HIV vaccines. The monophyletic HIV-1C epidemic in southern Africa contrasts with the multisubtype epidemic in other regions of sub-Saharan Africa (reviewed in references 19, 31, and 43). Both cross-clade immunity (6, 10, 17, 30, 53) and subtype-specific immune responses have been reported (11, 15, 45). This indicates that the relative importance of cross-reactive versus clade-specific immunity that might be elicited by a protective vaccine remains unknown.

Several studies that analyze the full-length HIV-1C genome have been recently reported (21, 28, 33, 36, 37, 39, 41, 44, 46). The 27 nonrecombinant near-full-length HIV-1C isolates that were sequenced and analyzed earlier (21, 28, 37, 41, 44, 46) represented viruses from Botswana (9 isolates), India (9 isolates), Tanzania (2 isolates), Zambia (2 isolates), Brazil (2 isolates), Ethiopia (1 isolate), South Africa (1 isolate), and Israel (1 isolate). An additional set of four near-full-length genome sequences from South Africa (52) was used in this study. Another new set of five near-full-length genome sequences from Ethiopia was reported recently (24), although the sequences were not available in the public domain and were not included in this study. By performing analyses of phylogenetic patterns within available near-full-length genome nonrecombinant HIV-1C, we addressed (i) the phylogenetic relationship of HIV-1C viruses, including genetic diversity within and between geographically distinct subsets, and (ii) analysis of the HIV-1C consensus sequence as reference information for HIV-1C vaccine studies in southern Africa and, particularly, in Botswana.

HIV-1 diversity has been considered a major problem for the development of a vaccine. A relatively high level of HIV-1C intrasubtype diversity in southern Africa (12, 41, 52) increases the magnitude of the challenge. Choosing the “right” vaccine candidate by employing a particular viral strain, clone, or isolate is still the current approach to HIV vaccine design. A homologous vaccine might be an ideal one, although in the case of HIV-1C infection it is a rather unrealistic goal. If a higher homology between vaccine and circulating strain(s) results in a more efficacious vaccine, then a consensus sequence approach to AIDS vaccine design might lead to a better vaccine. In this study we address (i) whether the genetic diversity of HIV-1C might be overcome by using a consensus sequence as a vaccine candidate instead of any particular viral isolate and (ii) what the extent of potential vaccine coverage of circulating viral variants would be.

The most common presentation of the consensus is a one-string consensus sequence showing the nucleotides or amino acids that are the most frequent (e.g., the HIV consensus sequences in reference 9) or that occur in more than 50% of cases at a particular position of the alignment, with uncertainties (e.g., question marks or “X”) otherwise. However, such a one-string consensus sequence does not include any diversity information (except for uncertainties) and might not be very informative in the case of high viral diversity. Moreover, minor residues with high frequencies (e.g., 20 to 49%) might be excluded from the simple one-string consensus. In this study we present an extended version of the HIV-1C consensus sequence that addresses amino acid diversity by combining the consensus sequence with the amino acid frequency data across the HIV-1C proteins. An extended consensus of the HIV-1C proteins should provide valuable information for vaccine design and for the potential monitoring of epitope immunity in vaccine efficacy trials (22).

MATERIALS AND METHODS

HIV-1 sequences.

Thirty-one sequences reported previously by us (37, 41) and others (21, 28, 44, 46) and 42 of 43 new, near-full-length genome sequences were used (Fig. 1 legend), for a total of 73 HIV-1 subtype C sequences. Each sequence represents a different patient. When multiple clones were available, one clone per patient was included in the analysis. All sequences used are nonrecombinant. For all 43 new sequences, phylogenetic analyses (data not shown) were performed to determine the subtype and to check for recombination across the entire genome: one sequence (00BW98MO35 [GenBank accession no. AY074891]) proved to be a C/Denv/C recombinant and was excluded from this study. All 43 new sequences were isolated in Botswana from 1996 to 2000 (as reflected in the sequence names): 2 isolates in 1996; 7 isolates in 1998; 5 isolates in 1999; and 28 isolates in 2000. Most isolates were from asymptomatic patients and were collected in Gaborone, except for a few isolates collected in Mochudi and Molepolole (reflected in the sequence names as “MC” and “MO,” respectively).

FIG. 1.


Phylogenetic relationship of near-full-length genome HIV-1 subtype C sequences. The identified HIV-1C lineages are shaded, and the corresponding nodes are indicated with an oval across the trees. The following sequences were included in the analysis: 9 isolates from Botswana described previously (accession number shown in parentheses)—96BW01B21 (AF110960), 96BW04.07 (AF110963), 96BW0502 (AF110967), 96BW06.J4 (AF290028), 96BW11.06 (AF110970), 96BW12.10 (AF110972), 96BW15B03 (AF110973), 96BW16.26 (AF110978), and 96BW17A09 (AF110979); 9 sequences from India—98IN022 (AF286232), 94IN476.104 (AF286223), 98IN012.14 (AF286231), 93IN.101 (AB023804), 94IN.11246 (AF067159), 95IN.21068 (AF067155), 93IN.301999 (AF067154), 93IN301904 (AF067157), and 93IN301905 (AF067158); 2 sequences from Zambia—96ZM651.8m (AF286224) and 96ZM751.3m (AF286225); 2 sequences from Tanzania—98TZ013.10 (AF286234) and 98TZ017.2 (AF286235); 2 sequences from Brazil—92BR025 (U52953) and 98BR004 (AF286228); 1 sequence from Ethiopia, ETH2220 (U46016); 1 sequence from Israel, 98IS002.5 (AF286233); 5 sequences from South Africa—97ZA012.1 (AF286227) plus 4 recently described sequences (52); and 42 newly generated sequences from Botswana. The consensus sequences are shown in black boxes and are designated as follows: 73C_cons is   the consensus of the entire set of 73 HIV-1 subtype C sequences, 51BW_cons is a consensus for a subset of 51 sequences from Botswana, 22nonBW_cons is a consensus for a subset of 22 non-Botswana sequences, 9IN_cons is a consensus sequence for 9 samples from India, and 5ZA_cons is a consensus for 5 sequences from South Africa. When multiple clones were available, one clone per sample was included in the analysis. (A) An MP tree is shown. The SIV CPZGAB (accession number X52154) was used as an outgroup. The numbers above or beyond the branches correspond to the number of changes between nodes and depict branches' lengths according to the scale at the bottom left. 
Bootstrap values obtained in MP and NJ analyses that were higher than 75% (at least by one of the methods, MP or NJ) for the delineated lineages are shown at the right of the tree. MPars, MP. (B) An ML tree is shown. The numbers above or beyond the branches correspond to the substitution per site and depict branches' lengths according to the scale at the bottom left. Abbreviations: BW, Botswana; ZA, South Africa; IN, India; ETH, Ethiopia; BR, Brazil; and IS, Israel.

Amplification, cloning, and sequencing.

Forty-two near-full-length genome sequences from patients in Botswana were amplified, cloned, and sequenced as described previously (41) with modifications (38, 40). Briefly, isolated peripheral blood mononuclear cells (PBMC) were frozen in cell-freezing medium and transported from Gaborone, Botswana, to Boston, Mass. Thawed PBMC were short-term cocultured with donor PBMC, followed by DNA extraction. An ∼9-kb amplicon was obtained by one round of long-range PCR using the 696-9690 primer set (14). After a gel extraction of the amplicon, cloning was performed using a TOPO XL PCR cloning kit (Invitrogen, Carlsbad, Calif.). DNA plasmid purification and both-strand sequencing were performed as described previously (38, 40, 41).

Multiple alignment procedures.

The alignment procedure applied in the phylogenetic study involved the application of ClustalX (version 1.81) (50) followed by manual alignment editing using BioEdit (23). Pairwise alignment parameters were set to the dynamic “slow-accurate” programming, using 10 as the gap opening penalty and 0.1 as the gap extension penalty. Multiple alignment parameters included a gap extension penalty equal to 0.2.

Phylogenetic distances.

The pairwise evolutionary distances of the nucleotide alignment were computed by DNADist with the Kimura two-parameter model (PHYLIP: phylogeny inference package [versions 3.52c and 3.572c]; University of Washington, Seattle). Pairwise distances between translated amino acid alignments were computed by PROTDist with the PAM model (PHYLIP: phylogeny inference package [versions 3.52c and 3.572c]). By using sequence distances that weighted positions in the viral genome equally, our method implicitly assumed that all positions were equally important in determining vaccine protection. If information were available on the relative structural and immunological importance of positions, then it could be incorporated into the analysis.
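As an illustration of the Kimura two-parameter model named above, the following minimal sketch computes the distance d = −(1/2) ln(1 − 2P − Q) − (1/4) ln(1 − 2Q), where P and Q are the proportions of transitions and transversions among the compared sites. It is not the DNADist implementation: the function name `k2p_distance` is hypothetical, and skipping columns that contain gaps or ambiguity codes is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption of this sketch: columns with gaps or ambiguity codes are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent (saturation).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the distances reported in this study.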

Consensus sequences.

Consensus nucleotide sequences were obtained using BioEdit (23). Gaps were treated as a fifth residue. The threshold frequency for inclusion of a residue in a consensus sequence was 51%. Sites where no residue exceeded the threshold were scored as missing. Consensus amino acid sequences were obtained by translating nucleotide sequences, realigning codons using ClustalX (version 1.81) (50) followed by manual editing using BioEdit (23), and then computing the consensus sequences using Consensus (version 3; available from M. Essex). In Fig. 1 to 3 and 5 the consensus sequence 73C (or 73C_cons) is based on all 73 HIV-1 subtype C sequences, 51BW (or 51BW_cons) is based on 51 sequences from Botswana, 22nonBW (or 22nonBW_cons) is based on 22 non-Botswana sequences, 9IN (or 9IN_cons) is based on 9 sequences from India, and 5ZA (or 5ZA_cons) is based on 5 sequences from South Africa.
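The thresholded consensus rule described above, together with the per-position frequencies that underlie the extended consensus, can be sketched as follows. This is a minimal illustration and not the BioEdit or Consensus (version 3) code. The function names are hypothetical, "?" as the missing symbol is an assumption, and the sketch treats a residue whose frequency equals the 51% threshold as included, which the text does not specify.

```python
from collections import Counter


def column_frequencies(sequences):
    """For each alignment column, return a dict mapping residue to relative frequency.

    Gap characters are counted like any other residue (gaps as a fifth residue).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must be aligned to the same length")
    freqs = []
    for column in zip(*sequences):
        counts = Counter(column)
        freqs.append({res: n / len(column) for res, n in counts.items()})
    return freqs


def consensus(sequences, threshold=0.51, missing="?"):
    """One-string consensus: the top residue per column if it reaches the threshold."""
    out = []
    for freq in column_frequencies(sequences):
        residue, share = max(freq.items(), key=lambda kv: kv[1])
        out.append(residue if share >= threshold else missing)
    return "".join(out)
```

For the toy alignment `["ACGT", "ACGA", "ACTA", "A-TA"]`, `consensus` returns `"AC?A"`: the third column is split evenly between G and T, so no residue reaches the threshold.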

FIG. 3.


Nucleotide distances and corresponding statistics. Distances between samples are compared with distances to the consensus sequence. The boundary of the box closest to zero indicates the 25th percentile, a solid line within the box marks the mean value, a dashed line within the box shows the median, and the boundary of the box farthest from zero indicates the 75th percentile. Whiskers above and below the box indicate the 10th and 90th percentiles. Points above and below the whiskers indicate the 5th and 95th percentiles when the sample size permitted these calculations. The 95% CI and 99% CI for the between-sample analysis are not shown because the nonindependence of the pairwise distances invalidates the standard calculations. “Size” delineates the number of distances in each set or subset. (A) Distances among the entire set of 73 near-full-length HIV-1C genomes. (B) Distances among the subsets of near-full-length HIV-1C genomes. The consensus sequences are designated as follows: 51BW is a subset of 51 sequences from Botswana, 22nonBW is a subset of 22 non-Botswana sequences, 9IN is a subset of 9 samples from India, and 5ZA is a subset of 5 sequences from South Africa.

FIG. 5.

HIV-1C LTR promoter-enhancer region. Alignment of 73 HIV-1C nucleotide sequences and 5 consensus sequences demonstrates the promoter-enhancer region immediately downstream of the nef stop-codon TGA. 73C_cons is a consensus sequence for 73 subtype C sequences, 51BW_cons is a consensus of 51 isolates from Botswana, 22nonBW_cons is a consensus sequence for 22 non-Botswana subtype C sequences, 9IN_cons stands for 9 sequences from India, and 5ZA_cons is a consensus sequence of 5 viral isolates from South Africa. Sequences were compared with 73C_cons. Dashes across the alignment indicate identity, while periods denote gaps introduced to improve the alignment. The dashed boxes delineate NF-κB sites. NF-κB sites that do not conform to the GGGRNNYYCC consensus are shown as black boxes. Open boxes correspond to potential or prospective NF-κB sites, representing a region that does not comply with but is relatively close to the GGGRNNYYCC consensus. The number of NF-κB sites observed is shown in the last column; pluses denote a potential or prospective NF-κB site.
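A scan for sites conforming to the GGGRNNYYCC consensus (R = A or G, Y = C or T, N = any nucleotide) can be sketched with a regular expression. This is an illustrative sketch only: the function names are hypothetical, only the given strand is searched, and the input is assumed to be an ungapped nucleotide string.

```python
import re

# IUPAC codes needed for the GGGRNNYYCC consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif string into a regular-expression fragment."""
    return "".join(IUPAC[ch] for ch in motif)


def find_nfkb_sites(sequence, motif="GGGRNNYYCC"):
    """Return (start, site) for every match, including overlapping ones."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

Counting the returned matches per sequence gives a tally of conforming sites; near-matches such as the "potential or prospective" sites in Fig. 5 would need a separate, more permissive rule.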

Phylogenetic trees.

The neighbor-joining (NJ) and maximum-parsimony (MP) methods were employed for tree-building using the program PAUP* 4.0b8 (Sinauer Associates, Sunderland, Mass.). The character-status summary of the phylogenetic analysis was as follows: (i) all 9,601 characters had equal weight, (ii) 3,955 characters were constant, (iii) 1,856 variable characters were parsimony uninformative, (iv) the number of parsimony-informative characters equals 3,790, and (v) gaps were treated as “missing.” Parameters in the MP analysis were set to default in the PAUP* 4.0b8 software (Sinauer Associates) except for the “addition of sequences,” which was set to “random.” An MP tree is shown in Fig. 1A. The numbers above or beyond the branches correspond to the number of changes between nodes and depict the branches' length according to the scale at the bottom left of Fig. 1A. To obtain a statistical estimate of the reliability of particular groupings, a bootstrapping was employed for both NJ and MP analyses using 100 replicates and resampling of all 9,601 characters in the alignment. Bootstrap values for the delineated groups across the phylogenetic trees that were higher than 75 (at least by one of the methods used) are shown at the right of Fig. 1A. The maximum-likelihood (ML) analysis was performed on Nirvana, a 2,048-processor cluster of SGI Origin 2000 systems, and on a dual-processor Pentium III computer. Access to Nirvana was provided by the Advanced Computing Laboratory, Los Alamos National Laboratory, Los Alamos, N.Mex. The ML analysis was performed using versions of fastDNAml (20, 42) and DNArates (G. J. Olsen, S. Pracht, and R. Overbeek, unpublished data [http://geta.life.uiuc.edu/∼gary/programs/DNArates.html]) modified as described in reference 26. After stripping sites coded as gaps, missing, or uncertain in the included consensus sequences (see below), the alignment was reduced from 9,601 to 7,664 sites that were used in the ML analysis. 
Custom rate and category matrices were computed from the alignment. As described previously (26), the ML analysis consisted of iterated pairs of runs: a tree run to compute a new tree and then a rate run to compute a new pair of rate and category matrices; each run was based on results of the previous run. The transition/transversion ratio was set to 2.1, based on extensive analyses of full-length sequences of HIV-1 group M (C. L. Kuiken and T. Bhattacharya, unpublished data). Ten categories were used. Twelve iterations were required for convergence of the ML score of the best trees computed from the tree runs. In all tree runs, the Jumble option (PHYLIP: phylogeny inference package, version 3.3; University of Washington) was turned on, and each tree run consisted of three parallel searches for the best tree. This allowed the search algorithm to cover more of the space of all possible trees. The best of the three trees resulting from each tree run was used in the subsequent rate run. The root of the ML tree was based on a permutation test using seven different sets of outgroup sequences (49). The best ML tree with the lowest score was drawn using TreeExplorer 2.12 (K. Tamura [http://evolgen.biol.metro-u.ac.jp/TE/TE_man.html]) (Fig. 1B). The simian immunodeficiency virus (SIV) CPZGAB (accession number X52154) was used as an outgroup in MP and NJ analysis. The SIV outgroup was omitted from the ML analysis. This was done because, in an initial ML analysis, the SIV CPZGAB outgroup was connected to the subtype C clade at a node that is considered to be deeply nested within the clade, based on an ML analysis of all available full-length sequences for HIV and SIV (U. R. Smith, unpublished data). Incorrect (and often unstable) placement of individual SIV sequences with respect to HIV group M subtypes was observed also by Korber and coworkers (26, 56). 
We recommend that, when performing an ML analysis of any part of HIV group M, individual SIV sequences should be used as the outgroup only with extreme caution.
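The character-status summary reported above (3,955 constant + 1,856 parsimony-uninformative + 3,790 parsimony-informative = 9,601 characters) follows the usual parsimony definitions: a site is informative when at least two states each occur in at least two sequences. The sketch below classifies alignment columns on that basis. It is not PAUP* code; the function names are hypothetical, and the set of symbols treated as missing ("-", "?", "N", ".") is an assumption made for illustration.

```python
from collections import Counter

# Assumption of this sketch: these symbols are treated as missing data.
MISSING = set("-?N.")


def classify_site(column):
    """Classify one alignment column as constant, uninformative, or informative."""
    counts = Counter(ch for ch in column.upper() if ch not in MISSING)
    if len(counts) <= 1:
        # Columns with a single observed state (or only missing data) count as constant.
        return "constant"
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def character_status(sequences):
    """Tally the site classes over an alignment given as equal-length strings."""
    summary = Counter(classify_site("".join(col)) for col in zip(*sequences))
    return {k: summary.get(k, 0) for k in ("constant", "uninformative", "informative")}
```

For the toy alignment `["AACA", "AACC", "ATGG", "ATG-"]`, the tally is one constant site, one variable but uninformative site, and two parsimony-informative sites.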

Statistical analysis.

Statistical analysis and basic graphical delineation were done using Microsoft Excel 2000 software (Microsoft Corp.), Splus (version 6.0; Insightful Corporation, Seattle, Wash.), and SigmaPlot 2001 (SPSS Inc.). Additional graphical presentation was performed using Adobe Illustrator software (version 8.0). Two sample t tests were used to compare the mean values of distances to the consensus sequence between sets of sequences. Corrected chi-square tests were used to compare the frequency of sequences with three NF-κB sites between sets of sequences. To assess if mean diversity between samples was different from the mean distance to consensus, we developed a new test statistic that appropriately accounted for the nonindependence of pairwise distances. For a set of $n$ sequences, the new statistic is defined as the difference in between-sample and to-consensus-sample means, $\bar{d}_b - \bar{d}_c$, divided by the square root of the sum of the squared standard error of $\bar{d}_b$ and the squared standard error of $\bar{d}_c$. The standard error of $\bar{d}_b$ cannot be calculated in the standard way, because many of the $m = n(n - 1)/2$ pairwise distances are not independent. To compute it, we first noted that, by a direct calculation, $c$ of the $m(m - 1)$ pairs of pairwise distances, where $c = m(m - 1) - m(n - 2)(n - 3)/2$, have a positive correlation due to a common sequence in the two distances. We assumed a common linear correlation and estimated it by Pearson's correlation, $r$, based on all possible pairs of pairwise distances with a shared sequence. Then, the standard error of $\bar{d}_b$ can be shown to equal $\sqrt{(1/m) + (c \times r/m^2)}$ multiplied by the standard deviation calculated using all pairwise distances. We compared the resulting Z statistic to a standard normal distribution to test the null hypothesis. For nucleotide distances, we calculated r to be 0.432, 0.350, 0.486, 0.696, and 0.869 for the 73 HIV-1C, 51BW, 22nonBW, 9IN, and 5ZA sequence sets, respectively. 
For protein distances, we calculated r to be 0.319, 0.299, 0.304, 0.413, 0.289, 0.420, 0.286, and 0.134 for the 72 or 73 HIV-1C Pol, Vif, Vpr, Tat, Rev, Vpu, Env, and Nef sequences, respectively.
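The Z statistic described in this section can be sketched as follows. This is a reading of the text, not the authors' Splus code. The function names and the dictionary layout for the pairwise distances are hypothetical, and the sketch assumes that the standard error of the mean distance to consensus is the ordinary one (standard deviation divided by the square root of n), which the text implies but does not state.

```python
import math
from itertools import permutations
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def shared_sequence_count(n):
    """c = m(m - 1) - m(n - 2)(n - 3)/2: ordered pairs of distances sharing a sequence."""
    m = n * (n - 1) // 2
    return m * (m - 1) - m * (n - 2) * (n - 3) // 2


def z_between_vs_consensus(pairwise, to_consensus):
    """pairwise: {(i, j): distance} for all i < j; to_consensus: list of n distances."""
    n = len(to_consensus)
    m = n * (n - 1) // 2
    if len(pairwise) != m:
        raise ValueError("expected one distance for every pair of sequences")
    xs, ys = [], []
    for e, f in permutations(pairwise, 2):
        if set(e) & set(f):  # the two distances involve a common sequence
            xs.append(pairwise[e])
            ys.append(pairwise[f])
    r = pearson(xs, ys)
    c = shared_sequence_count(n)
    values = list(pairwise.values())
    # Corrected standard error of the mean between-sample distance.
    se_b = stdev(values) * math.sqrt(1 / m + c * r / m ** 2)
    # Assumption: ordinary standard error for the mean distance to consensus.
    se_c = stdev(to_consensus) / math.sqrt(n)
    z = (mean(values) - mean(to_consensus)) / math.sqrt(se_b ** 2 + se_c ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided, r
```

For n = 73 the formula gives m = 2,628 pairwise distances and c = 373,176 correlated ordered pairs, so the correction term c × r/m² dominates 1/m whenever r is appreciably positive.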

Distribution of distance to consensus.

If the consensus sequence from a sample of n sequences (one per individual) in a geographic region is used in a vaccine targeted to the region, then it is of interest to assess the distribution of the distance from a randomly sampled sequence to the observed consensus sequence. The mean distance of sequences in the population to the observed consensus was estimated by the average distance of the samples to the consensus sequence. Using the 73 HIV-1C sequences, a 95% confidence interval (CI) for the mean distance was calculated under the assumption that the distances are an independent, identically distributed random sample from a normal distribution. In addition, the 80th and 95th percentiles of the distance to the observed consensus were estimated with the sample percentiles, and the nonparametric bootstrap percentile method was used to compute 95% CI for the percentiles.
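The interval estimates described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical, the mean interval uses the normal quantile 1.96 in place of a Student t quantile (a close approximation for n = 73), percentiles use linear interpolation between order statistics, and the number of bootstrap resamples is an arbitrary choice.

```python
import math
import random
from statistics import mean, stdev


def sample_percentile(values, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def mean_ci_normal(values, z=1.96):
    """Approximate 95% CI for the mean, assuming i.i.d. normally distributed values."""
    half = z * stdev(values) / math.sqrt(len(values))
    return mean(values) - half, mean(values) + half


def bootstrap_percentile_ci(values, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile CI for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(values)
    stats = [statistic([rng.choice(values) for _ in range(n)]) for _ in range(n_boot)]
    return (sample_percentile(stats, 100 * alpha / 2),
            sample_percentile(stats, 100 * (1 - alpha / 2)))
```

A call such as `bootstrap_percentile_ci(distances, lambda v: sample_percentile(v, 80))` would give an interval for the 80th percentile of the distance to consensus, where `distances` stands for the list of per-sequence distances.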

Proximity of a new sequence to the consensus.

The probability that a randomly selected sequence in a population is within a certain threshold, D%, of the observed consensus sequence was estimated by the sample proportion of sequences within the threshold. A 95% CI for the probability was calculated using the normal approximation to the binomial distribution.
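The proportion estimate and its interval can be sketched in a few lines. The function name is hypothetical, counting a distance equal to the threshold as "within" is an assumption, and clipping the interval to [0, 1] is a convenience of this sketch rather than something stated in the text.

```python
import math


def proportion_within(distances, threshold, z=1.96):
    """Sample proportion of distances within the threshold, with a 95% normal-approximation CI."""
    n = len(distances)
    # Assumption: a distance exactly equal to the threshold counts as within it.
    p_hat = sum(1 for d in distances if d <= threshold) / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)
```

The normal approximation becomes unreliable when the proportion is close to 0 or 1 or when n is small; in those cases the clipped interval should be read with caution.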

Extent of the consensus change.

We considered the extent of anticipated change in the observed consensus sequence that was built on n = 73 HIV-1C sequences if it is rebuilt including a new sequence. We addressed this by evaluating how much the distribution of the distance of a randomly sampled virus to the observed consensus sequence changes in response to the introduction of a new sequence. The mean amount by which the mean distance to the observed consensus changes was estimated by the sample mean of the n numbers $d_i = |\bar{x}_{[-i]} - \bar{x}|$, $i = 1, \ldots, n$, where $\bar{x}_{[-i]}$ is the sample mean distance to the observed consensus, which was calculated using the n sequences with the ith sequence removed, and $\bar{x}$ is the sample mean distance to consensus for the full data set. A 95% CI for the mean change was calculated using the nonparametric bootstrap percentile method. In addition, the nonparametric bootstrap method was used to estimate 95% CIs for the mean amount by which the 80th and 95th percentiles of the distance to the observed consensus change.
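The leave-one-out quantity defined above can be sketched as follows, reading the definition literally: the consensus is held fixed and only the ith distance is dropped from the mean. The function names, the number of bootstrap resamples, and the choice to resample the distances and recompute the statistic are assumptions of this sketch, not details given in the text.

```python
import random
from statistics import mean


def leave_one_out_changes(distances):
    """Absolute change in the mean distance when each distance is left out in turn."""
    n = len(distances)
    total = sum(distances)
    full_mean = total / n
    return [abs((total - d) / (n - 1) - full_mean) for d in distances]


def bootstrap_ci_mean_change(distances, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap percentile CI for the mean leave-one-out change."""
    rng = random.Random(seed)
    n = len(distances)
    stats = sorted(
        mean(leave_one_out_changes([rng.choice(distances) for _ in range(n)]))
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]
```

Under this reading each change equals the absolute deviation of the dropped distance from the full mean divided by n − 1, so with n = 73 a single sequence can shift the mean distance only slightly.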

All tests were two tailed, and a cutoff P of 0.05 was used to judge statistical significance.

Accession numbers.

The 42 new, nonrecombinant HIV-1C nucleotide sequences from Botswana were deposited in GenBank under accession numbers AF443074 to AF443115.

RESULTS

Overview of newly generated HIV-1C sequences.

Together with previously described sequences (37, 40, 41), the 43 new sequences reported here bring the total number of near-full-length genome sequences from Botswana to 53, including two recombinants. The nonrecombinant sequences, all from different patients, are all subtype C, consistent with the 96.2% predominance of subtype C in Botswana. These sequences contained a number of previously described subtype C signatures (21, 41, 44, 46), namely, (i) an insertion at the N-terminal region of the Vpu (from 5 to 13 amino acids in 70.0% of isolates), (ii) a prematurely truncated second exon of Rev (9 amino acids; 100% of cases), (iii) a third NF-κB site seen in the majority of the isolates (see details below), and (iv) a 4-amino-acid extension at the C-terminal region of Pol in some sequences (21.4%).

Phylogenetic relationship of near-full-length HIV-1C.

The molecular phylogeny of the 73 nonrecombinant HIV-1C sequences was analyzed by applying MP, NJ (or distance), and ML methods. Figure 1A represents an MP phylogenetic tree that was found to be the most parsimonious tree and had a minimal score of 29,382 when gaps were treated as missing. Figure 1B shows an ML tree with a score of −91931.28. The MP and ML scores are not comparable.

Branch lengths in the MP and ML analyses are not easily compared. This is because each method uses a large number of sites not used by the other method. MP uses sites that contain gaps; ML does not. This difference may be responsible for the relatively longer “backbone” branches of the MP tree (Fig. 1A) versus the ML tree (Fig. 1B). ML uses sites that are nearly constant, differing for only one sequence; MP does not. The longer terminal branches in the ML tree (Fig. 1B), compared to the MP tree (Fig. 1A), are due in part to this difference. These sites, which represent nucleotide changes unique to a single sequence in the analysis, necessarily increase the length of the terminal branch for that sequence.

Although there are numerous well-supported “lineages” (a term that is used hereafter for discussion purposes and does not imply taxonomic standing) of sequences within HIV-1C, the backbone of the HIV-1C lineage itself is not well supported; hence, the tree topology is unstable. This instability is evident in the very different positions of lineages between the MP and ML trees (Fig. 1) and in the prevalence of relatively short branch lengths along the backbones of both trees. The sequences used in this study are nearly full-length genomes: these genomes can yield little or no further sequence data. Hence, to obtain a more-stable, reliable topology it will be necessary to find new genomes having sequences that help to support these unstable portions of the tree and/or to improve methods of phylogenetic analysis to make better use of the available sequences. For example, current ML analyses discard sites that have gaps for any sequences included in the analysis. This is unfortunate because many of these sites represent “indels” (insertions and deletions) outside the hypervariable regions of the HIV-1 genome, and many indels in other organisms are known to contain significant phylogenetic information.

Because a detailed phylogenetic analysis of HIV-1C sequence evolution was outside the scope of this study, we addressed only the general shape of the ML tree based on near-full-length genome HIV-1C sequences, as well as congruency among the ML, MP, and NJ trees. The basal pair of sequences found in the MP tree (96BW17A09 and 00BW1471.27 [Fig. 1A]) agrees with results of an explicit search for the root of the HIV-1C lineage (49). This result is reflected in the ML tree (Fig. 1B), which is assumed to have the same root as the MP tree. Note that the ML tree includes only HIV-1C sequences and that, in phylogenetic terms, all trees produced in this study are unrooted.

A number of well-supported lineages (shaded in Fig. 1) were identified within the 73 near-full-length genome HIV-1C sequences. All these lineages occur in the best trees generated by all three phylogenetic methods (Fig. 1). Topology of the samples grouped within lineages was nearly identical in MP, NJ, and ML trees (sequence 00BW1811.3 was within the 00BW2128.3-96BW16.26-00BW1773.2-96BW0502 lineage in the ML but not the NJ and MP trees). Generally the bootstrap values were highly consistent by the MP and NJ analysis, although in one case (samples 96BW0407 and 98BWMO18.d5) a relatively high bootstrap value of 83 for a lineage in the NJ tree was not supported in the bootstrap analysis of the MP tree (value of 57 only). Overall, 57 of 73 (78.1%) HIV-1 sequences were found within lineages. Eight of the identified lineages were composed of two isolates, two lineages contained three sequences, three lineages embraced four isolates, one lineage included five, and two lineages consisted of eight sequences. Eight of nine samples from India and four of five samples from South Africa formed separate lineages, apparently demonstrating relevance to the phylogenetic founder effect. Sequences from Brazil, Ethiopia, and Israel were found in one lineage, which suggested a phylogenetic relatedness of these geographically distinct isolates. The Botswana sequences were scattered across the phylogenetic tree, with 39 viral isolates (76.5%) that formed 13 distinct lineages supported by high bootstrap values (Fig. 1). The Botswana lineages did not include any of the 22 non-Botswana sequences, and none of the Botswana sequences were part of any of three non-Botswana lineages. A number of sequences outside the lineages demonstrated similar topologies in MP and ML trees (00BW3886.8, 94IN476.104, and 00BW1783.5). 
In the MP analysis, neither the content of each lineage nor the overall topology of the best tree was changed when additional outgroup sequences were introduced to the alignment (HIV-1 subtype A, Q23-17, accession number AF004885; HIV-1 subtype B, HXB2CG, accession number K03455; data not shown).

To address the phylogenetic relationship among consensus sequences, we compared the topology of the consensus sequence that represented the entire set of the 73 available HIV-1C sequences in the study to the topology of the consensus sequences that represented different subsets comprising 51 Botswana isolates, 22 non-Botswana sequences, 9 sequences from India, and 5 sequences from South Africa. The origin of the consensus sequences was reflected in their phylogenetic relationships (Fig. 1). While an observed closeness of the consensus sequences to the tree root in MP (Fig. 1A), NJ (data not shown), and ML (Fig. 1B) trees was not unusual, note that the consensus sequences were placed closer to the root of the tree than was any particular sequence from which they originated. As expected, the consensuses of the Indian and South African subsets clustered within the corresponding groups of sequences. Interestingly, the consensus sequence of the subset of 22 non-Botswana isolates, which included the subsets from both India and South Africa, was closer to the root of the tree than was 5ZA_cons or 9IN_cons. An obvious nearness between consensuses representing 51 sequences from Botswana and the entire set of 73 HIV-1C sequences was not surprising, because Botswana sequences were dominant in the entire set. However, a relative closeness between consensuses of 51 sequences from Botswana and 22 non-Botswana sequences was striking. Assuming that sequence homology between the vaccine candidate and the infecting or challenging virus is essential, the overall topology of the consensus sequences and their phylogenetic relationship with corresponding sequences might suggest that candidate AIDS vaccines incorporating the consensus sequence have a greater potential than those incorporating any particular isolate sequence.

In terms of pairwise genetic distances between consensus sequences, the consensus of the entire set of 73 HIV-1C sequences was almost identical to the 51BW consensus (Fig. 2). This result is due to the predominance of Botswana sequences in the entire set. The nucleotide distance between the 51BW and 22nonBW consensuses was only 0.48%. The consensus sequences for nine sequences from India and five sequences from South Africa differed from the 51BW consensus by 2.08 and 1.84%, respectively, while the distance between the Indian and South African consensus sequences was 3.5%. Overall, pairwise genetic distances (Fig. 2) demonstrated a remarkable closeness between consensus sequences of different HIV-1C subsets.

FIG. 2.

Nucleotide distances between consensus sequences of 73 near-full-length HIV-1C genomes and subsets within HIV-1C. The consensus sequences are designated as follows: 73C is the consensus of the entire set of 73 HIV-1 subtype C sequences, 51BW is a consensus for a subset of 51 sequences from Botswana, 22nonBW is a consensus for a subset of 22 non-Botswana sequences, 9IN is a consensus sequence for 9 samples from India, and 5ZA is a consensus for 5 sequences from South Africa.

HIV-1C genetic distances.

The pairwise nucleotide distances and corresponding statistics among the entire set of 73 near-full-length genome HIV-1C sequences are shown in Fig. 3A. While the mean value of diversity between samples reached 8.93% (range, 2.9 to 11.7%), the diversity to the consensus sequence was significantly lower, with a mean value of 4.86% (range, 3.3 to 7.2%; P < 0.0001). The results of a similar analysis and accompanying statistical data among subsets of sequences are presented in Fig. 3B. The diversity between samples was significantly higher than the distances to the consensus sequences in all groups analyzed (P < 0.0001 for 51BW and 22nonBW, P = 0.0049 for 9IN, and P = 0.0076 for 5ZA), which supports a potential advantage of using the consensus sequence in vaccine design. It is noteworthy that the subset of sequences from Botswana (51BW) demonstrated the highest level of pairwise diversity between samples (mean value of 9.17%). Its distances to the consensus sequence (mean value of 4.92%) were also significantly higher than the corresponding mean values for the 22nonBW, 9IN, and 5ZA subsets (t tests comparing mean distances to consensus: P = 0.0014 for 22nonBW and P < 0.001 for 9IN and 5ZA).
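The contrast between between-sample diversity and to-consensus distance can be illustrated with a minimal sketch. It assumes a simple majority-rule consensus and an uncorrected p-distance (percentage of differing aligned positions); the distance measure and consensus program actually used are described in Materials and Methods and may differ. The toy alignment and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def p_distance(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def majority_consensus(seqs):
    """Most frequent character in each alignment column (assumed rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mean_between_samples(seqs):
    """Mean p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_to_consensus(seqs):
    """Mean p-distance from each sequence to the majority consensus."""
    cons = majority_consensus(seqs)
    return sum(p_distance(s, cons) for s in seqs) / len(seqs)


# Toy alignment (hypothetical; not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTAC", "TCGTACGTAC"]
print(majority_consensus(toy))    # ACGTACGTAC
print(mean_between_samples(toy))  # 15.0
print(mean_to_consensus(toy))     # 7.5
```

In this toy case each sequence carries private substitutions, so any two sequences differ at the sites unique to both, whereas each differs from the consensus only at its own sites. That is the intuition behind the roughly twofold difference reported above.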

HIV-1C Gag extended consensus.

Figure 4 delineates the extended version of the consensus sequence for the HIV-1C Gag p17, p24, and p2/p7/p1/p6 based on the amino acid alignment of 73 subtype C sequences. Invariable amino acid residues within the HIV-1C Gag p17 were observed at 37 of 129 positions (28.7%). In addition to the invariable amino acids, 57 positions (44.2%) across HIV-1C Gag p17 were relatively conserved, showing less than 10% diversity at the given position in the consensus sequence. The number of variable positions at which the consensus residue had a frequency of 90% or less was 35 (27.1%), which was comparable to the number of invariable positions. The most variable positions, at which the consensus residue had a frequency of less than 50%, were seen at four sites within p17. The differing properties (e.g., charge) of the amino acid residues at positions 15 (K46, T30, A12), 90 (E36, A27, K21), 91 (G38, K29, N11, E10), and 119 (K45, E44) might suggest a substantial difference in the biological properties of different p17 proteins (the number following each residue indicates its percent frequency at that position in the alignment). Positions 15, 28, 62, 90, 91, 93, 111, 115, 118, and 119 might be considered the most variable within p17 by virtue of accommodating seven to nine different amino acids each. A few sequences demonstrated indels in the C-terminal part of p17.

FIG. 4.

Extended consensus of the HIV-1 subtype C Gag. The consensus was built based on the 73 near-full-length HIV-1C genome sequences. A horizontal string of amino acid residues represents a consensus sequence. Columns of amino acid residues are accompanied by the percentage of their frequency at a particular position in the alignment (shown as a subscript). Dashes denote gaps introduced to improve alignment. Mutations that resulted in frameshifts and/or stop codons are indicated by an X. Open boxes highlight variable positions with 10% and higher diversity in the consensus sequence. Shaded boxes represent insertions that were seen among the minority of samples. Two numbering systems are used: (i) a sequential numbering of amino acid residues in the HIV-1C consensus sequence, shown as a scale with plain numbers above the consensus, and (ii) the HXB2 numbering system (27a), shown as numbers with asterisks in brackets. HXB2 numbers do not necessarily correspond to the sequential numbers of amino acid residues in the HIV-1C consensus sequence. (A) Extended consensus of HIV-1C Gag p17. (B) Extended consensus of HIV-1C Gag p24. (C) Extended consensus of HIV-1C Gag p2/p7/p1/p6.

The more conserved p24 protein demonstrated a lesser extent of amino acid variation (Fig. 4B). One hundred forty invariable amino acid residues (60.6%) and 69 relatively conserved amino acids that had less than 10% variation (29.9%) indicated strong conservation of the HIV-1C p24. However, 22 amino acid positions (9.5%) showed 10% or greater diversity in the consensus sequence of HIV-1C Gag p24. Although the consensus residues at some positions (252, 256, and 357 in Gag) had frequencies close to 50%, only one (position 223 of Gag: V47, I46, N4, and A3) had a frequency lower than 50%. In contrast to the variations in p17, substitutions within HIV-1C p24 occurred between relatively similar amino acids, e.g., I85 and V15 at position 159 (I-to-V, or I159V), V223I, E230D, V256I, D260E, K286R, D312E, R335K, and T342S. No indels were found within the HIV-1C p24.

As shown in Fig. 4C, HIV-1C Gag p2/p7/p1/p6 comprised 45 invariable amino acid residues (34.1%), 56 relatively conserved residues with less than 10% diversity in the consensus (42.4%), and 31 variable residues with a frequency of 90% or less in the consensus sequence (23.5%). The frequency of the consensus residue at positions 372 and 373 was less than 50%: N49, S40, —8, T1, Q1, G1 and T42, —16, A15, I10, V5, P5, S3, M3, G1, respectively (where “—” denotes a gap introduced to improve alignment). The asparagine and serine at position 451 were observed at frequencies of 50% and 47%, respectively. Positions 373, 389, and 478 were each represented by eight or nine different amino acid residues. Although insertions were not rare across p2/p7/p1/p6, most amino acid residues in the insertions were seen at low frequency. However, the frequency of some amino acid residues within the insertion between positions 455 and 456 reached 30% (Fig. 4C).
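The per-position bookkeeping behind the extended consensus can be sketched as follows. This is a hypothetical illustration, not the Consensus program used in the study: it ranks the residues in one alignment column by percent frequency and labels the column with the three categories used in the text (invariable; relatively conserved, i.e., consensus residue above 90%; variable, i.e., consensus residue at 90% or less). Counting the gap character as a residue is an assumption made here because gap frequencies are reported for positions 372 and 373.

```python
from collections import Counter


def column_profile(column):
    """Residues in one alignment column with percent frequency, most common first."""
    counts = Counter(column)
    total = len(column)
    return [(res, 100.0 * n / total) for res, n in counts.most_common()]


def classify(column):
    """Label a column by the frequency of its most common (consensus) residue."""
    profile = column_profile(column)
    if len(profile) == 1:
        return "invariable"
    if profile[0][1] > 90.0:
        return "relatively conserved"
    return "variable"


# Hypothetical columns of 20 sequences each (not data from this study).
print(classify("K" * 20))                      # invariable
print(classify("K" * 19 + "R"))                # relatively conserved
print(classify("K" * 9 + "E" * 9 + "T" * 2))   # variable
print(column_profile("KKKE"))                  # [('K', 75.0), ('E', 25.0)]
```

Applying `classify` to every column of a protein alignment and tallying the labels would yield counts of the kind reported above for p17, p24, and p2/p7/p1/p6.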

Table 1 displays the amino acid distances across HIV-1C Gag, p17, p24, and p2/p7/p1/p6 for the entire set of 73 HIV-1C sequences (73C) and for the subsets. A high diversity between samples in the p17 (mean, 14.8%) and in the p2/p7/p1/p6 (mean, 12.7%), together with a low diversity within the p24 region (mean, 5.1%), resulted in the overall mean diversity of 9.5% across the entire HIV-1C Gag. The amino acid diversity among isolates from Botswana was slightly higher than the diversity among non-Botswana samples for the entire Gag and its subregions. Statistical significance of the differences depended on the presence of sequences from India in the group of non-Botswana samples, which changed from highly significant to nonsignificant if sequences from India were excluded. No significant differences were found between the subsets of Botswana and non-Botswana sequences for the amino acid distances to the consensus sequences (P > 0.10). However, the subset of sequences from Botswana demonstrated significantly higher amino acid diversity than the subset from India for the entire HIV-1C Gag and for any of the Gag regions in both between-sample (P < 0.0001) and to-consensus (P < 0.0001) comparisons. The subset of HIV-1C sequences from South Africa showed lower diversity within the p24 region than Indian sequences, although the difference was not statistically significant (P = 0.40).

TABLE 1.

HIV-1C Gag diversity, expressed as percent amino acid distances across Gag, p17, p24, and p2/p7/p1/p6^a

All distances are mean % distance ± SD (range).

| Sequence subset | Gag, between samples | Gag, to consensus | Gag, P | p17, between samples | p17, to consensus | p17, P | p24, between samples | p24, to consensus | p24, P | p2/p7/p1/p6, between samples | p2/p7/p1/p6, to consensus | p2/p7/p1/p6, P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 73C | 9.5 ± 1.8 (2.6-15.4) | 5.0 ± 1.3 (2.7-8.3) | <0.0001 | 14.8 ± 4.1 (1.0-32.3) | 7.4 ± 2.8 (2.4-15.1) | <0.0001 | 5.1 ± 1.5 (0.4-10.4) | 3.0 ± 1.1 (0.8-6.4) | <0.0001 | 12.7 ± 3.9 (0.9-29.9) | 6.5 ± 2.9 (1.6-14.6) | <0.0001 |
| 51BW | 9.7 ± 1.6 (5.3-14.7) | 5.4 ± 1.0 (3.8-8.0) | <0.0001 | 15.2 ± 3.9 (3.1-29.2) | 8.0 ± 2.8 (3.2-16.0) | <0.0001 | 5.1 ± 1.4 (0.4-10.0) | 3.1 ± 1.1 (1.1-5.9) | <0.0001 | 13.1 ± 3.6 (2.3-29.9) | 7.4 ± 2.7 (2.3-16.3) | <0.0001 |
| 22nonBW | 8.6 ± 2.5 (2.6-14.6) | 4.7 ± 1.9 (2.2-8.9) | <0.0001 | 13.3 ± 4.9 (1.0-27.5) | 7.2 ± 3.8 (1.6-16.9) | 0.0003 | 4.7 ± 1.6 (0.5-9.6) | 2.7 ± 1.3 (1.2-5.1) | 0.0003 | 11.6 ± 4.6 (0.9-22.5) | 6.3 ± 3.5 (0.8-14.8) | 0.0007 |
| 9IN | 5.9 ± 1.9 (2.6-9.1) | 3.2 ± 1.4 (1.8-6.0) | 0.018 | 7.7 ± 3.1 (1.0-13.6) | 3.9 ± 2.6 (0.8-9.1) | 0.046 | 4.2 ± 1.5 (1.2-7.3) | 2.2 ± 1.2 (0.5-4.2) | 0.028 | 7.3 ± 3.2 (0.9-13.1) | 3.9 ± 2.0 (1.6-7.3) | 0.069 |
| 5ZA | 6.7 ± 1.5 (3.6-8.9) | 3.4 ± 1.4 (1.4-5.3) | 0.014 | 11.2 ± 3.1 (5.5-13.8) | 5.7 ± 3.2 (0.8-8.9) | 0.052 | 3.7 ± 1.1 (2.1-5.9) | 1.6 ± 1.3 (0.6-3.4) | 0.044 | 8.1 ± 2.8 (4.6-13.3) | 4.2 ± 2.2 (2.3-7.8) | 0.11 |

^a The entire set of 73 HIV-1C sequences is shown as 73C. The subset of 51 sequences from Botswana is designated 51BW, that of 22 non-Botswana sequences is designated 22nonBW, that of 9 isolates from India is designated 9IN, and that of 5 sequences from South Africa is designated 5ZA. Computation of the consensus sequences for the entire set (73C) or for the subsets (51BW, 22nonBW, 9IN, and 5ZA) is described in Materials and Methods. Distances between samples are compared with the distances to the consensus sequence. The P values indicate the significance of the test comparing between-sample diversity to the mean to-consensus diversity.

For the 73 HIV-1C sequences, the 51BW sequences, and the 22nonBW sequences, the mean amino acid diversity measured between samples was significantly higher than the mean distance measured to the consensus sequence for all Gag regions (P < 0.001 in each test). For the 9IN sequences there were significant differences for Gag (P = 0.018), p17 (P = 0.046), and p24 (P = 0.028), but not for p2/p7/p1/p6 (P = 0.069). Similarly, for the 5ZA sequences there were significant differences for Gag (P = 0.014) and p24 (P = 0.044), with a trend toward significance for p17 (P = 0.052) and for p2/p7/p1/p6 (P = 0.11). We believe that the lack of statistical significance in some subregions was due to a small sample size and propose that between-sample diversity in the populations of sequences is consistently larger than the to-consensus diversity in all Gag subregions. Overall the diversity to the consensus was 1.9 times lower (range, 1.6 to 2.3) than the diversity between sequences for the entire HIV-1C Gag and its subregions.

Amino acid diversity within other HIV-1C proteins is shown in Table 2 together with P values and ratios for the comparison of between-sample and to-consensus distances. The between-sample distances ranged from 6.42% in Pol to 25.20% in Vpu, while to-consensus distances were in the range of 3.45% in Pol to 13.68% in Vpu. Importantly, to-consensus distances were significantly lower than between-sample distances for each HIV-1C protein. Amino acid diversity in Env demonstrated the highest ratio of between-sample to to-consensus distances, 2.4. Table 3 summarizes the diversity within lineages across HIV-1C proteins. In most cases amino acid distances within lineages are shorter than the average distances for the entire set of 73 HIV-1C sequences, although there are a number of examples of equal or even higher diversity within lineages.
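The ratio column of Table 2 is simply the mean between-sample distance divided by the mean to-consensus distance. A short sketch that re-derives it from the mean values printed in Table 2 (the variable names are illustrative only):

```python
# (mean between-sample %, mean to-consensus %) as printed in Table 2.
table2 = {
    "Pol": (6.42, 3.45),
    "Vif": (11.62, 6.56),
    "Vpr": (11.94, 6.59),
    "Tat": (18.27, 9.93),
    "Rev": (17.35, 9.33),
    "Vpu": (25.20, 13.87),
    "Env": (20.02, 8.32),
    "Nef": (18.56, 11.43),
}

# Ratio of between-sample to to-consensus distance, rounded to one decimal.
ratios = {
    protein: round(between / to_cons, 1)
    for protein, (between, to_cons) in table2.items()
}

for protein, ratio in ratios.items():
    print(f"{protein}: {ratio}")

# Protein with the largest ratio.
print(max(ratios, key=ratios.get))  # Env
```

The rounded values agree with the ratios listed in the table, with Env the highest at 2.4 and Nef the lowest at 1.6.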

TABLE 2.

HIV-1C diversity, expressed as percent amino acid distances in various proteins^a

| Protein | Between samples, mean % distance ± SD (range) | To consensus, mean % distance ± SD (range) | Ratio |
|---|---|---|---|
| Pol | 6.42 ± 1.01 (0.15-10.17) | 3.45 ± 0.72 (1.89-5.32) | 1.9 |
| Vif | 11.62 ± 2.61 (0.54-22.06) | 6.56 ± 1.97 (2.75-12.27) | 1.8 |
| Vpr | 11.94 ± 3.45 (1.04-24.53) | 6.59 ± 2.60 (0.98-12.62) | 1.8 |
| Tat | 18.27 ± 5.36 (4.15-39.11) | 9.93 ± 4.54 (0.65-24.27) | 1.8 |
| Rev | 17.35 ± 5.98 (2.73-46.84) | 9.33 ± 3.94 (1.63-26.40) | 1.9 |
| Vpu | 25.20 ± 6.89 (2.45-49.97) | 13.87 ± 5.46 (4.92-26.35) | 1.8 |
| Env | 20.02 ± 2.28 (7.43-26.35) | 8.32 ± 1.62 (4.81-12.72) | 2.4 |
| Nef | 18.56 ± 3.28 (5.11-30.57) | 11.43 ± 2.18 (7.32-17.76) | 1.6 |

^a The Pol, Vif, Vpr, Env, and Nef amino acid distances were computed based on the entire set of 73 HIV-1C sequences. The Tat, Rev, and Vpu amino acid distances were computed based on the set of 72 HIV-1C sequences (clone 00BW2128.3 was excluded from the analysis because of the deletion spanning the first exons of Tat and Rev and the N terminus of Vpu). Consensus sequences were built as follows: those of Pol, Vif, Vpr, Env, and Nef were based on 73 HIV-1C sequences, while those of Tat, Rev, and Vpu were based on 72 HIV-1C sequences (clone 00BW2128.3 was excluded). Distances between samples are compared with distances to the consensus sequence. The P values indicate the significance of the test comparing between-sample diversity to the mean to-consensus diversity. The last column shows the ratio of the mean value of distances between samples to the distances to the consensus sequence. P < 0.0001 in all cases.

TABLE 3.

Protein diversity of lineages across the HIV-1C proteins^a

Values are amino acid diversity (%).

| Sequences in HIV-1C lineage | Gag | Pol | Vif | Vpr | Tat | Rev | Vpu | Env | Nef |
|---|---|---|---|---|---|---|---|---|---|
| 98IN022, 93IN.101, 94IN.11246, 93IN301904, 93IN301905, 95IN.21068, 93IN.301999, 98IN012.14 | 5.4 | 3.6 | 6.2 | 8.1 | 10.2 | 9.4 | 10.7 | 13.5 | 11.1 |
| ZA.Du151, ZA.Du.422, 97ZA012.1, ZA.Du.179 | 6.7 | 4.8 | 8.1 | 4.8 | 12.0 | 20.4 | 14.6 | 16.9 | 11.6 |
| 96BW06.J4, 98BWMC14.a3, 00BW1686.8, 00BW3871.3, 00BW1921.13 | 7.3 | 4.7 | 7.7 | 11.3 | 19.0 | 14.7 | 19.1 | 18.5 | 14.8 |
| 96BW0502, 00BW1773.2, 96BW16.26, 00BW2128.3 | 7.8 | 5.4 | 12.1 | 11.9 | 13.9^b | 12.8^b | 25.5^b | 17.4 | 20.6 |
| 96BW15B03, 00BW2036.1, 00BW3842.8 | 7.5 | 3.9 | 7.2 | 10.6 | 12.8 | 7.3 | 16.6 | 17.3 | 12.6 |
| 99BW4642.4, 00BW1859.5 | 5.5 | 4.0 | 7.3 | 6.5 | 18.8 | 7.4 | 8.2 | 13.7 | 12.4 |
| 98BWMC12.2, 00BW0874.21 | 8.6 | 5.0 | 9.1 | 4.3 | 20.0 | 14.2 | 22.2 | 14.9 | 13.3 |
| 99BW3932.12, 00BW2127.214 | 9.6 | 5.3 | 9.7 | 7.8 | 15.1 | 15.4 | 13.5 | 20.1 | 18.4 |
| 99BW4745.8, 00BW3891.6 | 8.7 | 4.7 | 10.0 | 10.5 | 14.7 | 8.8 | 26.4 | 20.2 | 11.2 |
| 98BWMO37.d5, 00BW3970.2 | 7.4 | 4.9 | 10.2 | 12.6 | 12.1 | 18.5 | 22.7 | 22.2 | 14.7 |
| 96BW1210, 99BWMC16.8 | 9.9 | 7.1 | 10.8 | 5.5 | 22.2 | 18.2 | 35.4 | 17.4 | 14.2 |
| 96BW0407, 98BWMO18.d5 | 8.4 | 7.0 | 10.3 | 10.3 | 13.9 | 6.4 | 18.0 | 20.3 | 14.4 |
| 98BWMO14.10, 00BW2087.2 | 8.7 | 1.8 | 12.2 | 12.7 | 14.5 | 14.5 | 25.2 | 16.4 | 12.8 |
| ETH2220, 98IS002.5, 92BR025, 98BR004 | 12.0 | 5.1 | 9.9 | 9.4 | 15.1 | 18.2 | 26.1 | 18.2 | 16.5 |
| 96BW11.06, 96BWMO1.5, 00BW1795.6, 98BWMC13.4, 00BW2063.6, 99BW4754.7, 00BW0762.1, 00BW1880.2 | 7.8 | 6.3 | 9.9 | 9.3 | 14.4 | 11.2 | 20.7 | 16.3 | 18.8 |
| 96BW17A09, 00BW1471.27 | 5.3 | 3.6 | 10.0 | 4.4 | 9.6 | 10.1 | 13.5 | 14.9 | 13.0 |
| 73 HIV-1C^c | 9.5 | 6.4 | 11.6 | 11.9 | 18.3^b | 17.4^b | 25.2^b | 20.0 | 18.6 |

^a Amino acid diversity was calculated as the mean pairwise sequence distances within lineages for each HIV-1C protein. Actual amino acid distances represent lineages that include only two sequences.

^b Clone 00BW2128.3 was removed from the analysis because of the deletion spanning the first exons of Tat and Rev and the N terminus of Vpu.

^c This row demonstrates amino acid distances for the entire set of 73 sequences across the HIV-1C proteins (72 for Tat, Rev, and Vpu).

We addressed the predictive power of the consensus sequence for new hypothetical (not-yet-sequenced) HIV-1C isolates. Assuming that sequence homology between the vaccine candidate and the infecting virus is essential for the vaccine design, it might be important to know (i) the distribution of the distance of a randomly sampled sequence to the observed consensus (i.e., how far do individual sequences tend to be from the observed consensus sequence?); (ii) the probability that an arbitrary new sequence in the population will be within a certain threshold, D%, of the observed consensus sequence; and (iii) the degree of perturbation of the observed consensus sequence if a new sequence is used in its recalculation (i.e., how does the distribution of the distance of a randomly sampled virus to the observed consensus sequence change as a result of the introduction of the new sequence?). If the consensus sequence is used for the vaccine antigen, then analyses of points i, ii, and iii address the extent of vaccine coverage.

Based on the set of 73 HIV-1C sequences, we estimated that the mean distance to the observed consensus sequence was 4.86%, with a 95% CI of 4.69 to 5.02%. The estimated 80th percentile was 5.45% (95% CI, 5.21 to 5.52%), and the 95th percentile was 5.80% (95% CI, 5.51 to 6.79%). Thus, on average HIV-1C sequences are within about 5% of the observed consensus, and with high confidence, 95% of viruses in the sampled population are within 6.79% of the observed consensus sequence.

The observed proportion of the 73 HIV-1C sequences within D% of the observed consensus, with D being equal to 4, 5, 6, and 7%, respectively, was 8.22% (95% CI, 1.92 to 14.52%), 54.79% (95% CI, 43.48 to 66.21%), 97.26% (95% CI, 93.52 to 100.00%), and 98.63% (95% CI, 95.96 to 100.00%). Thus, fewer than 15% of HIV-1C viruses are inferred to be within 4% of the observed consensus, between 43% and 66% of viruses are inferred to be within 5%, and at least 96% are inferred to be within 7% of the observed consensus.
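As a rough check on these intervals, the sketch below computes a proportion with a normal-approximation (Wald) 95% CI clipped to the range 0 to 100%. Two assumptions are made here: that the intervals were obtained with this approximation (the method is not restated in this section), and that the underlying counts are 6, 40, 71, and 72 of 73 sequences, which is what the reported percentages imply. The function name is hypothetical.

```python
import math


def wald_ci(successes, n, z=1.959964):
    """Return (proportion, lower, upper) in percent, using the normal
    approximation and clipping the bounds to [0, 100]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    lower = max(0.0, 100 * (p - half_width))
    upper = min(100.0, 100 * (p + half_width))
    return 100 * p, lower, upper


# Counts inferred from the reported percentages (assumption).
for threshold, count in [(4, 6), (5, 40), (6, 71), (7, 72)]:
    pct, lo, hi = wald_ci(count, 73)
    print(f"within {threshold}%: {pct:.2f}% (95% CI, {lo:.2f} to {hi:.2f}%)")
```

Under these assumptions the output comes close to the intervals quoted above, including the upper bounds truncated at 100% for the 6% and 7% thresholds.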

We found that a single sequence does not appreciably alter the distribution of distances to the observed consensus. Based on the 73 HIV-1C viruses, we calculated that the mean of the 73 numbers $d_i = |\bar{x}_{[-i]} - \bar{x}|$ was 0.135%. The bootstrap 95% CI about the mean change in the mean distance to the observed consensus was 0.121 to 0.159%. The estimated mean change in the 80th percentile of the distance to the observed consensus was 0.126% (95% CI, 0.111 to 0.135%), and for the 95th percentile the estimated mean change was 0.117% (95% CI, 0.075 to 0.197%).
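A leave-one-out calculation of this kind can be sketched as below. The reading assumed here is that the first term is the mean distance to a consensus rebuilt without sequence i, the second term is the mean distance to the consensus of all sequences, and d_i is the absolute difference of the two; the exact definition is given in Materials and Methods and may differ. The majority-rule consensus, the p-distance, the toy alignment, and all function names are likewise assumptions for illustration.

```python
from collections import Counter


def p_distance(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def majority_consensus(seqs):
    """Most frequent character in each alignment column (assumed rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mean_to_consensus(seqs):
    """Mean p-distance from each sequence to the consensus of the same set."""
    cons = majority_consensus(seqs)
    return sum(p_distance(s, cons) for s in seqs) / len(seqs)


def leave_one_out_changes(seqs):
    """d_i = |mean distance without sequence i - mean distance with all|."""
    full_mean = mean_to_consensus(seqs)
    changes = []
    for i in range(len(seqs)):
        rest = seqs[:i] + seqs[i + 1:]
        changes.append(abs(mean_to_consensus(rest) - full_mean))
    return changes


# Toy alignment (hypothetical; not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTAC", "TCGTACGTAC"]
d = leave_one_out_changes(toy)
print(d)
print(sum(d) / len(d))  # mean change, analogous to the 0.135% figure
```

With only four toy sequences each removal shifts the mean noticeably; with 73 near-full-length genomes the same calculation would be expected to give much smaller changes, consistent with the values reported above.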

HIV-1C LTR promoter-enhancer region.

Figure 5 depicts the extent of conformity to the GGGRNNYYCC consensus within NF-κB sites among the 73 HIV-1C sequences and 5 consensus sequences. The number of NF-κB sites within the promoter-enhancer region of HIV-1C varied from one (isolate 00BW1880.2) to three or more (isolates 96BW0502 and 96BWMO3.2), while the consensus sequences for the entire set or any subset of HIV-1C isolates demonstrated three NF-κB sites that conformed to the GGGRNNYYCC consensus. Within the subsets, the HIV-1C sequences from Botswana demonstrated three NF-κB sites in 32 of 51 (62.7%) cases, while only 10 of 22 (45.5%) non-Botswana sequences had the third NF-κB site (P > 0.10 [chi-square test]). A potential or prospective NF-κB site is a region that does not comply with the GGGRNNYYCC consensus but is relatively close to it and might become a new NF-κB site through a few point mutations. Viral isolates from Botswana that did not have three NF-κB sites demonstrated a potential/prospective NF-κB site more often than non-Botswana subtype C sequences (25.5 versus 4.5%, respectively; P < 0.001), which suggests that the promoter-enhancer region might be a hot zone within the evolving HIV-1C.
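Scanning a sequence for sites that conform to GGGRNNYYCC amounts to a degenerate-motif search with the standard IUPAC nucleotide codes (R = A or G, Y = C or T, N = any base). The sketch below is a hypothetical illustration of that search on one strand only; it does not implement the criteria for potential/prospective sites, which are not specified in enough detail here. The example sequence and function names are invented for illustration.

```python
import re

# IUPAC codes needed for the GGGRNNYYCC consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping sites match."""
    body = "".join(IUPAC[c] for c in motif)
    return re.compile("(?=(" + body + "))")


NFKB = motif_to_regex("GGGRNNYYCC")


def find_sites(seq):
    """Return (0-based start, matched site) for each conforming site."""
    return [(m.start(), m.group(1)) for m in NFKB.finditer(seq.upper())]


# Hypothetical fragment containing two conforming sites.
fragment = "TTGGGACTTTCCAAGGGGCGTTCCTT"
print(find_sites(fragment))
# [(2, 'GGGACTTTCC'), (14, 'GGGGCGTTCC')]
print(len(find_sites("ccGGGACTTTCT")))  # 0: last two bases are not CC
```

Counting the entries returned by `find_sites` for each promoter-enhancer sequence would give per-isolate site counts of the kind summarized above.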

DISCUSSION

Although correlates of immune protection are still unknown, there is steady progress toward a better design of an AIDS vaccine, and there is optimism concerning the benefits a vaccine could provide for the control of HIV infection, reduced transmission, and prevention of the development of the disease. Studies of nonhuman primates demonstrated vaccine protection against a potentially pathogenic SIV or SHIV (reviewed in references 4, 25, and 35). Examples of cross-protection between HIV-1 and HIV-2 (3) or SIV and SHIV (5, 8, 13, 16, 29, 32, 48, 54, 55) in macaque experiments suggested that a high level of viral diversity might be overcome and that viral variability may not necessarily preclude effective AIDS vaccine development (3), while viral virulence might also impact vaccine protection (34). However, while it is obvious that heterologous cross-challenge provides protection compared with naïve controls, an advantage of heterologous versus homologous challenge or infection is not evident. The extent of vaccine cross-protection needs to be further assessed by addressing heterologous versus homologous challenge (perhaps in a series of compatible experiments employing primate lentivirus models). How the HIV-1 subtype specificity is relevant to vaccine protection and whether a mismatch between vaccine and challenge virus could confer better immune protection than a complete match between vaccine and infecting virus are still questions to be addressed. Given a similar design methodology, identical delivery vehicles (equivalent potency at eliciting immune responses), and comparable immunization protocols, it seems unlikely that a mismatched vaccine could provide a better breadth, strength, and durability of immune response than a specific vaccine.
If this assumption is true, sequence homology between the vaccine candidate and the infecting or challenging virus may be an important consideration for selecting the antigenic component in the design of an AIDS vaccine.

From the point of view of vaccine design strategy, a preferred homology between the vaccine candidate and the circulating viruses might be a hard task to achieve, and a differential approach to an AIDS vaccine formulation might be required for different geographic areas based on particular molecular epidemiological data. For example, the predominance of HIV-1C in the southern African epidemic might be a strong argument for a subtype C specific vaccine design for the southern African countries.

Our results of the phylogenetic analysis of 73 near-full-length HIV-1C genomes, taken together with the presumed homology between the vaccine and the circulating or infecting virus, suggest that a consensus sequence approach to vaccine design could surmount the high viral diversity. Results obtained in the distance analysis on both the nucleotide (Fig. 3) and the translated amino acid (Tables 1 and 2) levels across the entire genome of HIV-1C and each viral protein convincingly justify the rationale for the consensus-based vaccine, although the concept might await evaluation in an efficacy trial.

The cumulative genetic information of a relatively large number of near-full-length genome sequences had sufficient power to segregate HIV-1 subtype C into multiple lineages. It is worth mentioning that, recently, lineages within HIV-1C have been found also among incomplete (i.e., env C2-V5 or C2-V3) sequences from India (47) and Ethiopia (1, 2). In this study, within the 73 near-full-length genome sequences of HIV-1C, numerous lineages were supported by (i) high bootstrap values by both MP and NJ methods, (ii) nearly identical content and topology of the lineages in MP, NJ, and ML analyses, and (iii) shorter pairwise distances between sequences within a lineage. In addition, certain of these lineages were further supported by (iv) accord with geographic area (Indian lineage, South African lineage), (v) consistent topology when different outgroups were used, and (vi) consistent topology when some sequences were excluded.

It is important to recognize that the appearance of lineages within HIV-1 subtype C might depend on the sampling. For example, without clone 94IN476.104, the two sequences 98BWMO37.d5 and 00BW3970.2 would have shared 208 (107 + 101) sites and could have received 100% rather than 90% bootstrap support. Some lineages had a short branch length at the base and might be unstable (e.g., collapse upon adding new isolates or disintegrate within shorter regions). Furthermore, based on data that were generated for a subset of sequences from Botswana (39), the lineages demonstrated no specific patterns related to the sequence segregation by viral load, CD4/CD8 counts, and/or HLA class I types. The number of NF-κB sites, the size of the insertion at the N terminus of the Vpu, or the extension at the C terminus of the Pol could not be assigned either independently or collectively for the lineages within HIV-1C.

New lineages might be identified in the future. For example, all but one Indian sequence and all but one sequence from South Africa form separate lineages. The out-of-main-group sequences from India (94IN476.104) and South Africa (ZA.CTSc2) proved that there were at least two distinct lineages of HIV-1C in India and in South Africa. In fact, some new HIV-1C sequences from India form additional lineages within the subtype C tree (49). Perhaps every sequence outside of an identified lineage might be seen as a potentially underrepresented lineage. Additionally, unidentified lineages (i.e., that cannot yet be distinguished due to a small sample) might fill their niche in the HIV-1C tree; also, new lineages might evolve within the HIV-1C.

An assumption about the beneficial effect of sequence homology between vaccine candidate and infecting virus raises a few issues related to the HIV-1C lineages. First, should a vaccine include different lineage sequences? If the epidemic is represented by multiple lineages, like the epidemic in Botswana, this may be worthwhile. Second, a representative consensus sequence as a vaccine for southern Africa and/or India might be a better choice irrespective of the number of lineages identified. Third, the notion of lineages within HIV-1C should be adjusted for the probability that available viral sequences adequately represent circulating viruses in a given geographic area. Fourth, a consensus approach to an AIDS vaccine design might overcome lineage clustering within HIV-1C. Finally, assuming that the available sequences accurately represent the epidemic, a superconsensus based on the consensus sequences of lineages instead of individual isolates might be a more appropriate candidate for the vaccine (B. Korber, personal communication).

The evaluation of the predictive power of the HIV-1C consensus sequence provided additional evidence for the beneficial use of the consensus-based vaccine. HIV-1C sequences deviated from the consensus sequence by about 5% on average. Moreover, according to the prediction made based on the present data, a new sequence would not appreciably alter the distribution of distances to the consensus sequence. Taken together, the relationships of the sequences to the consensus, the high probability that a new sequence will fall within a defined distance of the identified consensus sequence, and the stability of the consensus upon the introduction of a new isolate strongly suggest that a vaccine construct based on an HIV-1C consensus sequence could be superior to one based on any particular viral isolate.

Despite potentially surmounting high viral diversity and the demonstrated high predictive power of the HIV-1C consensus sequence, the immunogenicity and protection efficiency of the consensus-based vaccine needs to be assessed in further studies. As a concept, a consensus sequence approach to the vaccine design and development may contain certain limitations. Covariability that was described previously within the HIV-1 V3 loop (7, 27) implies that amino acid substitutions across HIV-1 proteins are not independent. Use of the consensus sequence for the vaccine design might face the problem of generating artificial constructs that do not occur in wild-type virus. Apparently, this problem could be overcome by narrowing the set of residues and by preventing the inclusion of residue combinations that rarely occur in the population. The vaccine constructs could be adjusted for the optimal expression and proper folding of the consensus-based protein. If the vaccine incorporates multiple copies of numerous viral variants that correspond to the minor amino acids in the extended consensus (eCons), the size of such a hypothetical vaccine construct could be dramatically excessive, and it would be unrealistic to generate this kind of vaccine product. Perhaps, if some particular regions within HIV-1C proteins could be identified as promising candidates for the vaccine, a reasonable vaccine construct could be limited to these immunodominant and subdominant regions and could include multiple variants of protein regions to cover the variability of the virus, as well as potentially prevent viral escape from the immune recognition. Further studies should address the feasibility of this approach in detail. 
Additionally, if a vaccine efficacy trial could provide the data needed for selecting the weights optimally based on protective efficacy, the information on the relative structural and immunological importance of particular positions across the sequence could be incorporated straightforwardly by weighting the distance measures. Thus, the concept of the consensus-based vaccine needs to be tested with regard to the correlates of immune protection.

The sequence summary of HIV-1C proteins was presented in the form of eCons sequence by ranking amino acid frequencies across the proteins and by highlighting the prevailing amino acid residues at each position. The eCons sequence might be seen as a halfway point between the traditional alignment and the simple one-string consensus sequence. While alignment of sequences is a valuable tool for detailed phylogenetic analysis, the eCons might be a more convenient instrument for the analysis of a relatively sizable genomic region of a large sample set. The eCons sequence has also overcome the simplicity of the commonly used one-string consensus, which is good for relatively conserved proteins but is less informative for the variable regions. The eCons might be helpful for analysis in which the frequency of amino acids at a particular position is critical, i.e., characterization of immunodominant regions or epitopes. Moreover, the eCons can be used in the vaccine design by launching multiple copies of viral variants into vaccine constructs, which might increase vaccine coverage for exposure to naturally occurring viruses. A vaccine based on an eCons sequence might theoretically shift the distribution of genetic distances of viruses in the population to the vaccine sequence toward zero and considerably reduce the average deviation of the ordinary consensus sequence. Available via the Internet (http://www.aids.harvard.edu/lab_research/concensus_sequence.htm), an eCons of the HIV-1C proteins might be a useful reference for vaccine formulation, as well as for the generation of synthetic peptides and other reagents to be used in assessing immune responses in relation to potential vaccine efficacy.

In summary, we examined the molecular phylogeny patterns of the HIV-1C epidemic based on near-full-length genome sequences. Most of the analyzed sequences represented southern Africa, a region with the most severe HIV epidemic in the world. A number of lineages were identified within HIV-1C. A consensus approach to vaccine design is suggested to potentially overcome the high genetic diversity of HIV-1C. A generated, extended version of the consensus sequence for all HIV-1C proteins might be a useful tool for vaccine design and to monitor vaccine trials.

Acknowledgments

We thank the Botswana Ministry of Health for encouragement; Bette T. Korber for critical analysis; Sergei Novitsky for the program Consensus; E. Sepako, G. Sebetso, N. Monametsi, and Y. Wu for sample processing and HIV-1 diagnostics; personnel of the National Blood Transfusion Center in Botswana for collaboration; and Chanc E VanWinkle for editorial assistance.

This research was supported in part by grants AI47067, AI43255, and HD37793 from the National Institutes of Health and grant TW00004 from the Fogarty International Center, National Institutes of Health.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)
