AU2179101A - Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds

Info

Publication number: AU2179101A
Application number: AU21791/01A
Authority: AU
Inventors: Maria Ball; Carmela Cappellano; Sophie Courtois; Francois Francou; Asa Frostegard; Michel Guerineau; Pascale Jeannin; Jean-Luc Pernodet; Alain Raynal; Guennadi Sezonov; Pascal Simonet; Karine Tuphile
Original assignee: Aventis Pharma SA
Current assignee: Aventis Pharma SA
Priority date: 1999-11-29
Filing date: 2000-11-27
Publication date: 2001-06-12
Anticipated expiration: 2020-11-27
Also published as: NO20022532L; IL149846A0; JP2003520578A; EP1268764A2; NO20022532D0; WO2001040497A3; AU781961B2; KR20020060242A; CA2393041A1; BR0015993A; WO2001040497A2; AU2005211587A1

Description

WO 01/40497 PCT/FR00/03311

Process for obtaining nucleic acids from an environmental sample, nucleic acids thus obtained and their use in the synthesis of novel compounds.

The present invention relates to a process for preparing nucleic acids from an environmental sample, more particularly a process for obtaining a collection of nucleic acids from a sample. The invention also relates to the nucleic acids or to the collections of nucleic acids obtained according to the process and to their use in the synthesis of novel compounds, in particular novel compounds of therapeutic interest.

The invention also relates to the novel means used in the above process for obtaining nucleic acids, such as novel vectors and novel processes for preparing such vectors or alternatively recombinant host cells comprising a nucleic acid of the invention.

The invention also relates to processes for detecting a nucleic acid of interest in a collection of nucleic acids obtained according to the above process, as well as to the nucleic acids detected by such a process and to the polypeptides encoded by such nucleic acids.

The invention also relates to nucleic acids obtained and detected according to the above processes, in particular nucleic acids encoding an enzyme which participates in the pathway for the biosynthesis of antibiotics such as β-lactams, aminoglycosides, heterocyclic nucleotides or polyketides, as well as the enzyme encoded by these nucleic acids, the polyketides produced by means of the expression of these nucleic acids and, finally, pharmaceutical compositions comprising a pharmacologically active amount of a polyketide produced by means of the expression of such nucleic acids.

Since the discovery of the production of streptomycin by actinomycetes, the search for novel compounds of therapeutic interest, and most particularly of novel antibiotics, has made increasing use of methods for screening the metabolites produced by soil microorganisms.

Such methods consist mainly in isolating the organisms of the telluric microflora, in culturing them on specially adapted nutrient media and then in detecting a pharmacological activity in the products found in the culture supernatants or in the cell lysates which have, where appropriate, undergone one or more prior separation and/or purification steps.

Thus, the methods for the in vitro isolation and culturing of the organisms constituting the telluric microflora have, to date, enabled the characterization of about 40,000 molecules, about half of which show biological activity.

Major products have been characterized according to such in vitro culture methods, such as antibiotics (penicillin, erythromycin, actinomycin, tetracycline, cephalosporin), anticancer agents, anti-cholesterolaemic agents or pesticides.

The products of therapeutic interest of microbial origin which are known to date originate in the majority (about 70%) from the actinomycetes and more particularly from the Streptomyces genus. However, other therapeutic compounds, such as teicoplanins, gentamycin and spinosins, have been isolated from microorganisms of genera that are more difficult to culture, such as Micromonospora, Actinomadura, Actinoplanes, Nocardia, Streptosporangium, Kitasatosporia or Saccharomonospora.

However, practice shows that the characterization of novel natural products synthesized by the microorganisms of the soil microflora remains limited, partly because the in vitro culturing step usually results in a selection of organisms that are already known.

The methods for in vitro separation and culturing of telluric organisms in order to identify novel compounds of interest thus have many limitations.

For example, in actinomycetes, the level of rediscovery of antibiotics that are already known is about 99%. Specifically, fluorescence microscopy techniques have made it possible to count more than 10^10 bacterial cells in 1 g of soil, whereas only 0.1 to 1% of these bacteria can be isolated after inoculation on culture media.

With the aid of DNA recombination kinetics techniques, it has been possible to show that between 12,000 and 18,000 bacterial species can be contained in 1 g of soil, whereas, to date, only 5000 non-eukaryotic microorganisms have been described, all habitats considered.

Molecular ecology studies have made it possible to amplify and clone many novel sequences of 16S rDNA from environmental DNA. The results of these studies have led to a trebling of the number of bacterial divisions previously characterized.

At the present time, bacteria are subdivided into 40 divisions, some of which consist only of bacteria which cannot be cultured. These latest results bear witness to the breadth of microbial biodiversity which remains unexploited to date.

Recent studies have attempted to overcome the many obstacles to gaining access to the biodiversity of the soil microflora, in particular the step of in vitro culturing prior to the isolation and characterization of compounds of industrial interest, especially of therapeutic interest.

Methods have thus been developed which include a step of extracting the DNA from telluric organisms, where appropriate after a prior isolation of the organisms contained in the soil samples.

The DNA thus extracted, after lysis of the bacterial cells without prior in vitro culturing, is cloned into vectors used to transfect host organisms, in order to constitute libraries of DNA originating from soil bacteria.

These libraries of recombinant clones are used to detect the presence of genes encoding compounds of therapeutic interest or alternatively to detect the production of compounds of therapeutic interest by these recombinant clones.

However, the methods for gaining direct access to the DNA of soil microflora, described in the prior art, present drawbacks during the implementation of each of the steps described above, these drawbacks being of a nature to considerably affect the quantity and quality of the genetic material obtained and exploitable.

The prior art regarding each of the steps for constructing libraries of DNA originating from soil samples is detailed below, along with the technical drawbacks identified by the Applicant and which have been overcome according to the present invention.

1. Step of extracting DNA from a soil sample

1.1 Direct extraction of environmental DNA

This is essentially a process using DNA extraction techniques performed directly on the environmental sample, usually after a prior in situ lysis of the organisms in the sample.

Such techniques have been used on samples originating from aquatic media, both fresh water and marine water. They comprise a first step of preconcentrating the cells present in free form or in the form of particles, which generally consists of a filtration of large volumes of water on different filtration devices, for example conventional membrane filtration, tangential or rotational filtration or alternatively ultrafiltration. The pore size is between 0.22 and 0.45 µm and often requires a prefiltration in order to avoid blockages due to the treatment of large volumes.

In a second stage, the cells harvested are lysed directly on the filters in small volumes of solutions, by enzymatic and/or chemical treatment.

This technique is illustrated for example by the studies by Stein et al., 1996, Journal of Bacteriology, Vol. 178 (3): 591-599, who describe the cloning of genes encoding ribosomal DNA and encoding a transcription elongation factor (EF 2) from Archaebacteria of marine plankton.

Techniques of direct extraction of DNA from samples of soil or sediment have also been described, which are based on protocols of physical, chemical or enzymatic lysis performed in situ.

For example, US patent No. 5 824 485 (Chromaxome Corporation) describes a chemical lysis of bacteria directly on the sample taken by addition of a hot lysis buffer based on guanidinium isothiocyanate.

International patent application No. WO 99/20799 (Wisconsin Alumni Research Foundation) describes a step of in situ lysis of bacteria using an extraction buffer containing a protease and SDS.

Other techniques have also been used, such as carrying out several cycles of freezing-thawing on the sample followed by high-pressure pressing of the thawed sample. Techniques of bacterial lysis using a succession of steps of sonication, heating with microwaves and heat shocks have also been used (Picard et al. 1992).

However, the techniques of the prior art described above for the direct extraction of DNA have very variable efficacy in quantitative and qualitative terms.

Thus, in situ chemical or enzymatic treatments of the sample have the drawback of lysing only certain categories of microorganisms on account of the selective resistance of the various indigenous microorganisms to the lysis step due to their heterogeneous morphology.

Thus, Gram-positive bacteria withstand a treatment with hot SDS detergent whereas virtually all Gram-negative cells are lysed.

In addition, some of the direct extraction protocols described above promote the adsorption of the nucleic acids extracted onto the mineral particles of the sample, thus significantly reducing the amount of available DNA.

Moreover, although some of the protocols of the prior art disclose a mechanical treatment step to lyse the microorganisms in the sample taken, such a mechanical lysis step is systematically carried out in liquid medium in an extraction buffer, which does not allow good homogenization of the starting sample in the form of fine particles enabling maximum accessibility to the diversity of organisms present in the sample.

Grinding tests have also been carried out on crude soil samples using glass beads, but the amount of DNA extracted was low.

It has been observed according to the invention that a first step of in situ mechanical lysis in liquid medium has negative effects on the amount of DNA which can be extracted.

The amount of DNA which can be used directly for cloning in recombinant vectors is also dependent on the purification steps subsequent to its extraction.

In the prior art, the DNA extracted is then purified, for example by using polyvinylpolypyrrolidone, by a precipitation in the presence of ammonium acetate or potassium acetate, by centrifugations on a caesium chloride gradient, or by chromatographic techniques, in particular on a hydroxyapatite support, on an ion-exchange column or molecular sieving, or by electrophoresis techniques on agarose gel.

The DNA purification techniques previously described, especially when combined with the abovementioned techniques for extracting environmental DNA, are liable to lead to a co-purification of the DNA with inhibitory compounds, originating from the initial sample, that are difficult to remove.

The co-extraction of inhibitory compounds with the DNA necessitates the multiplication of the number of purification steps, which leads to considerable losses of the DNA initially extracted and simultaneously reduces the diversity of the genetic material initially contained in the sample, as well as its quantity.

Another aim of the invention was to overcome the drawbacks of the prior purification protocols and to develop a DNA purification step which makes it possible to maintain an optimum level of diversity of the DNA in the initial sample, on the one hand, and to promote quantitatively its production, on the other hand.

Most particularly, the qualitative and quantitative improvements to the purification of DNA are at a maximum when they make use of a combination of a direct DNA extraction process according to the invention and a subsequent purification process, as will be described hereinbelow.

1.2. Indirect extraction of environmental DNA.

Such techniques involve a first step of separation of the various organisms in the telluric microflora from the other constituents of the starting sample, prior to the actual DNA extraction step.

In the state of the art, the prior separation of a microbial fraction from a soil sample usually comprises a physical dispersion of the sample by grinding it in liquid medium, for example using devices such as a Waring Blender or a mortar.

Chemical dispersions have also been described, for example dispersions on ion-exchange resins or dispersions using non-specific detergents such as sodium deoxycholate or polyethylene glycol.

Whatever the mode of dispersion, the solid sample should be suspended in water, phosphate buffer or a saline solution.

The physical or chemical dispersion step can be followed by a centrifugation on a density gradient allowing the separation of the cells contained in the sample and of the particles of this sample, it being understood that bacteria have lower densities than those of most soil particles.

The physical dispersion step can also alternatively be followed by a step of low-speed centrifugation or a step of cell elutriation.

The DNA can then be extracted from the separated cells by any available method of lysis and can be purified by many methods, including the purification methods described in paragraph 1.1 above. In particular, the inclusion of the cells in low-melting agarose can be carried out in order to control the lysis.

However, the methods described in the prior art that are known to the Applicant are unsatisfactory on account of the presence, in the fractions containing the extracted DNA, of unwanted constituents of the starting sample which have a significant influence on the final quality and quantity of DNA.

The present invention proposes to solve the technical difficulties encountered in the processes of the prior art, as will be described hereinbelow.

2. Molecular characterization of the extracted DNA.

When it is desired to construct a DNA library from an environmental sample, in particular from a soil sample, it is advantageous to check the quality and diversity of the source of DNA extracted and purified before it is inserted into suitable vectors.

The object of such a molecular characterization of the DNA extracted and purified is to obtain profiles representing the proportions of the various bacterial taxons present in this DNA extract. The molecular characterization of the DNA extracted and purified makes it possible to determine whether or not artefacts have been introduced during the implementation of the various extraction and purification steps and, where appropriate, whether or not the original diversity of the DNA extracted and purified is representative of the microbial diversity initially present in the sample, in particular in the soil sample.

To the Applicant's knowledge, the prior art makes use of quantitative hybridization processes using oligonucleotide probes that are specific for different bacterial groups, applied directly to the DNA extracted from the environment.

Unfortunately, such an approach is relatively insensitive and does not make it possible to detect taxonomic groups or genera that are present in low abundance.

The prior art also describes quantitative PCR processes, such as MPN-PCR or competitive quantitative PCR. However, these techniques have major drawbacks.

Thus, MPN-PCR is complicated to carry out on account of the multiplication of the dilutions and repetitions, making it unsuitable for a large number of samples or of primer pairs.

Moreover, competitive quantitative PCR is difficult to carry out on account of the need to construct a competitor which is specific to the target DNA and which, in addition, does not introduce any bias or artefacts into the competition itself.

According to the invention, a process is thus proposed for prescreening a library of DNA originating from an environmental sample, which is quick, simple and reliable and which makes it possible to test the quality of the DNA extracted and purified beforehand and thus to determine the value of constructing a library of clones prepared from this purified starting DNA.

3. Vectors for cloning DNA extracted and purified from an environmental sample.

Many vectors have already been described in the prior art for cloning DNA preextracted from an environmental sample.

Thus, according to the description of international patent application No. WO 99/20799, viral vectors, phages, plasmids, phagemids, cosmids, phosmids, vectors of the BAC (bacterial artificial chromosome) type or bacteriophage P1, vectors of PAC type (artificial chromosome based on bacteriophage P1), vectors of the YAC (yeast artificial chromosome) type, yeast plasmids or any other vector capable of maintaining and expressing a genomic DNA in a stable manner can be used.

Example 1 of PCT patent application No. WO 99/20799 describes the construction of a genomic DNA library by cloning into a vector of the BAC type.

To the Applicant's knowledge, no DNA library originating from an environmental sample has yet been effectively produced with vectors of conjugative type, such a technique being made available to and reproducible by those skilled in the art for the first time by virtue of the teaching of the present invention.

4. Host cells

In the prior art, many host cells have been described as being able to be used in order to accommodate vectors containing inserts of DNA originating from the DNA extracted and purified from an environmental sample.

Thus, PCT patent application No. WO 99/20799 cites many suitable host cells, such as Escherichia coli, in particular the strain DH10B or the strain 294 (ATCC 31446), the strain E. coli B, E. coli X 1776 (ATCC No. 31.537), E. coli DH5α and E. coli W3110 (ATCC No. 27.325).

This PCT patent application also cites other suitable host cells such as Enterobacter, Erwinia, Klebsiella, Proteus, Salmonella, Serratia, Shigella or strains of the bacillus type such as B. subtilis and B. licheniformis as well as bacteria of the genus Pseudomonas, Streptomyces or Actinomyces.

US patent No. 5 824 485 in particular cites the Streptomyces lividans TK66 strain or yeast cells such as those of Saccharomyces pombe.

5. Characterization of genes of interest in DNA libraries originating from an environmental sample.

PCT patent application No. WO 99/20799 describes an identification of the phenotype of different clones belonging to the DNA library of B. cereus, respectively a clone producing haemolysin, a clone hydrolysing esculin or a clone producing an orange pigment.

Mutagenesis techniques based on the use of a transposon encoding the phoA enzyme made it possible subsequently to isolate mutated clones and to characterize the sequences responsible for the phenotypes observed.

The abovementioned article by Stein et al. (1996) describes the use of specific primers for ribosomal DNA in order to amplify the DNA inserted into the vectors harboured by certain clones of a genomic DNA library of marine plankton Archaebacteria and the identification of several coding sequences in the DNA thus amplified.

The article by Borschert S. et al. (1992) describes the screening of a genomic DNA library of Bacillus subtilis using pairs of primers which hybridize with conserved regions of known peptide synthetases in order to identify one or more corresponding genes in the genome of Bacillus subtilis.

This technique made it possible to detect a chromosomal DNA fragment of about 26 kb carrying a portion of the surfactin biosynthesis operon.

The article by Kah-Tong S. et al. (1997) describes the screening of a library of DNA originating from the soil with the aid of primers which hybridize with conserved sequences of the operon responsible for the biosynthetic pathway of type II polyketides and shows the identification, in this DNA library, of sequences belonging to the PKS-P gene. This article also describes the construction of hybrid expression cassettes in which the sequence of the PKS-P subunit, found naturally in the operon responsible for polyketide biosynthesis, has been replaced with various similar sequences found in the DNA library.

Similarly, the article by Hong-Fu et al. (1995) describes the construction of expression cassettes containing the various open reading frames of the operon responsible for polyketide biosynthesis, the various expression cassettes having been constructed artificially by combining open reading frames which are not found together naturally in the genome of Streptomyces coelicolor. This article shows that the combination, in the artificial expression cassettes, of open reading frames originating from different bacterial strains allows the production of polyketides that have different structural characteristics and relatively high antibiotic activity against Bacillus subtilis and Bacillus cereus.

Polyketides form part of a large family of natural products of variable structure having great diversity of biological activity. Among the polyketides are, for example, tetracyclines and erythromycin (antibiotics), FK506 (immunosuppressant), doxorubicin (anticancer agent), monensin (a coccidiostatic agent) and avermectin (an antiparasitic agent).

These molecules are synthesized by means of multifunctional enzymes known as polyketide synthases, which catalyse repeated cycles of condensation between acyl thioesters (in general acetyl, propionyl, malonyl or methylmalonyl thioesters). Each condensation cycle results in the formation, on a growing carbon chain, of a β-keto group which can then undergo, where appropriate, one or more series of reductive steps.

Given the major clinical interest of polyketides, their common mechanism of biosynthesis and the high degree of conservation observed between the groups of genes encoding polyketide synthases, there has been growing interest in the development of novel polyketides by genetic engineering.

Novel artificial polyketides have thus been produced by genetic engineering, such as mederrhodin A or dihydrogranatirhodin.

The vast majority of the novel polyketide molecules obtained by genetic engineering are very different, in structural terms, from the corresponding natural polyketides.

From the prior art, it thus emerges that there is a need to obtain novel polyketides of interest and most particularly polyketides of therapeutic interest which have in particular, relative to their natural homologues, an increased level of antibiotic activity or a different spectrum of antibiotic activity, either broader than that of the known polyketides or, on the other hand, more selective.

As will be described below, this need is partly fulfilled according to the present invention.

DESCRIPTION OF THE INVENTION

The invention relates firstly to a process for constructing libraries of DNA originating from an environmental sample, such a sample possibly being, without discrimination, an aquatic medium (fresh water or marine water), a sample of soil (surface layer of soil, subsoil or sediments), or a sample of eukaryotic organisms containing an associated microflora, such as, for example, a sample originating from plants, insects or marine organisms and having an associated microflora.

The development of a process for constructing a library of DNA from an environmental sample, and most particularly from a soil sample, comprises critical steps whose implementation must necessarily be optimized in order to obtain a library of DNA whose content of nucleic acids of interest satisfies the objectives initially set.

A first critical step consists in extracting and subsequently purifying the nucleic acids initially contained in the sample, i.e. mainly the nucleic acids contained in the various organisms of which the microflora of this sample is composed. The quality of purification of the extracted DNA is a factor which determines the result obtained.

A second important step of a process for constructing a library of nucleic acids originating from an environmental sample is the evaluation of the genetic diversity of the nucleic acids extracted and purified. A simple and reliable pre-screening step checks that the DNA extracted and purified reflects, at least partially, the phylogenetic diversity of the organisms initially present in the starting sample. Such a step makes it possible to decide whether the initial source of extracted and purified DNA is worth using for the construction of the nucleic acid library itself or whether, on the contrary, the construction should not be continued on account of excessive artefacts introduced at the time of the extraction and purification of the nucleic acids.

It has also been identified, according to the invention, that the quality of the inserts introduced into the vectors to construct the library is a determining factor. It has thus been determined that the use of restriction enzymes to cleave the DNA extracted and purified from the environmental sample was of a nature to introduce artefacts or "bias" into the structure of the inserts obtained. Specifically, the DNA extracted from the soil or from other environments, originating in the vast majority of cases from unculturable organisms, is composed of molecules whose content of G and C bases is by definition unknown and furthermore variable as a function of the origin of these organisms.

A third critical step is the insertion of the extracted and purified nucleic acids into vectors capable of integrating nucleic acids of chosen length, on the one hand, and of allowing their transfection or integration into the genome of given host cells, on the other hand, as well as, where appropriate, of allowing their expression in such host cells. Vectors capable of integrating large nucleic acids, i.e. larger than 100 kb in size, constitute vectors of interest when the objective pursued consists in cloning and identifying a complete operon capable of directing a complete biosynthetic pathway of a compound of industrial interest, in particular of a compound of pharmaceutical or agronomic interest.

DEFINITIONS

For the purposes of the present invention, the terms "nucleic acids", "polynucleotides" and "oligonucleotides" mean not only DNA and RNA sequences but also hybrid RNA/DNA sequences of more than 2 nucleotides, in either single-stranded or double-stranded form.

The term "library" or "collection" is used in the present description with reference either to a set of extracted, and where appropriate purified, nucleic acids originating from an environmental sample, to a set of recombinant vectors, each of the recombinant vectors of the set comprising a nucleic acid originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, or to a set of recombinant host cells comprising one or more nucleic acids originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, the said nucleic acids being either carried by one or more recombinant vectors or integrated into the genome of the said recombinant host cells.

The expression "environmental sample" denotes, without discrimination, a sample of aquatic origin, for example from fresh or salt water, or a telluric sample originating from the surface layer of a soil, from sediments or from lower layers of the soil (subsoil), as well as samples of eukaryotic organisms, which may be multicellular, of plant origin, originating from marine organisms or from insects and having an associated microflora, this associated microflora constituting organisms of interest.

According to the invention, the term "operon" means a set of open reading frames whose transcription and/or translation is co-regulated by a unique set of signals for regulating the transcription and/or translation. According to the invention, an operon can also comprise the said signals for regulating the transcription and/or translation.

For the purposes of the invention, the expression "metabolic pathway" or "biosynthetic pathway" means a set of anabolic or catabolic biochemical reactions which results in the conversion of a first chemical species into a second chemical species.

For example, a biosynthetic pathway for an antibiotic consists of the set of biochemical reactions converting primary metabolites into intermediate products of the antibiotics, and then subsequently into antibiotics.

The expression "regulation sequence which is operably linked relative to a nucleotide sequence whose expression is desired" means that the transcription regulation sequence(s) is (are) located, relative to the nucleotide sequence of interest whose expression is desired, so as to allow the expression of the said sequence of interest, the regulation of the said expression being dependent on factors which interact with the regulatory nucleotide sequences.

According to another terminology, it may also be said that the nucleotide sequence of interest whose expression is desired is placed "under the control" of the transcription-regulating nucleotide sequences.

For the purposes of the present invention, the term "isolated" denotes a biological material which has been removed from its original environment (the environment in which it is naturally located).

For example, a polynucleotide or a polypeptide present in the natural state in an organism (virus, bacterium, fungus, yeast, plant or animal) is not isolated. The same polypeptide separated from its natural environment or the same polynucleotide separated from the adjacent nucleic acids within which it is naturally inserted in the genome of the organism, is isolated.

Such a polynucleotide can be included in a vector and/or in a composition and nevertheless remain in isolated form, due to the fact that the vector or composition does not constitute its natural environment.

The term "purified" does not require the material to be present in a form of absolute purity, exclusive of the presence of other compounds. Rather, this is a relative definition.

A polypeptide or polynucleotide is in purified form after purification of the starting material by at least one order of magnitude, preferably two or three and preferentially four or five orders of magnitude.

For the purposes of the present invention, the "percentage of identity" between two sequences of nucleotides or of amino acids can be determined by comparing two optimally aligned sequences across a comparison window.

The portion of the nucleotide or polypeptide sequence in the comparison window can thus comprise additions or deletions (for example "gaps") relative to the reference sequence (which does not comprise these additions or deletions) so as to obtain an optimum alignment of the two sequences.

The percentage is calculated by determining the number of positions at which an identical nucleic base or an identical amino acid residue is observed for the two compared sequences (nucleic acid or peptide), followed by dividing the number of positions at which there is identity between the two bases or amino acid residues by the total number of positions in the comparison window, followed by multiplying the result by 100 in order to obtain the percentage of sequence identity.
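By way of illustration only, the calculation of the percentage of identity described above can be sketched as follows. This is a minimal sketch, not the Applicant's implementation: it assumes the two sequences have already been optimally aligned by some external tool, that the comparison window is the whole alignment, and that gaps are written as "-". The function name `percent_identity` and the gap symbol are assumptions introduced for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identity between two pre-aligned sequences.

    Assumption: both strings are rows of the same alignment, so they have
    equal length, and the comparison window is the full alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")

    a = aligned_a.upper()
    b = aligned_b.upper()

    # Count positions where both sequences carry the same base or residue.
    # A gap facing a gap is not counted as an identity.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)

    # Divide by the total number of positions in the window, times 100.
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    # Five identical positions out of a six-position window.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

Gapped positions stay in the denominator here, which follows the wording "total number of positions in the comparison window"; other conventions exclude them, so the choice should be stated whenever a percentage is reported.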

The optimum alignment of the sequences for the comparison can be achieved by computer with the aid of known algorithms contained in the Wisconsin Genetics Software Package from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin.

By way of illustration, the percentage of sequence identity may be determined using the BLAST software (BLAST versions 1.4.9 of March 1996, BLAST 2.0.4 of February 1998 and BLAST 2.0.6 of September 1998), exclusively using the default parameters (S. F. Altschul et al., J. Mol. Biol. 1990 215: 403-410; S. F. Altschul et al., Nucleic Acids Res. 1997 25: 3389-3402). BLAST searches for sequences similar/homologous to a reference "query" sequence, with the aid of the algorithm of Altschul et al. The query sequence and the databases used can be of peptide or nucleic nature, any combination being possible.

EXTRACTION AND PURIFICATION OF NUCLEIC ACIDS ORIGINATING FROM AN ENVIRONMENTAL SAMPLE

1. Direct extraction of nucleic acids

It has been shown according to the present invention that, in order to obtain a library of nucleic acids originating from organisms contained in a sample of soil, it was important to create conditions under which, on the one hand, the various organisms in the sample are made accessible to the subsequent steps for extracting the nucleic acids, and, on the other hand, the initial step of treatment of the sample of soil allows a maximum mechanical lysis of the organisms in the sample, which is of a nature to make the nucleic acids of these organisms, mainly the genomic and plasmid DNA, directly accessible to the buffers used for the subsequent extraction steps.
It has thus been demonstrated according to the invention that maximum accessibility of nucleic acids originating from microorganisms from a sample of soil was achieved by a thorough dry-grinding of the pre-dried soil sample in order to obtain microparticles. The Applicant has thus determined that the drying of the soil sample prior to any subsequent treatment brings about a significant reduction in the cohesion of the crude soil sample and consequently promotes its subsequent disintegration in the form of microparticles, when a suitable grinding treatment is carried out.

Surprisingly, the Applicant has shown that microparticles of dry soil samples combined physicochemical properties that are favourable to the extraction of an optimum quantity of nucleic acids which, in their nature, could be representative of the genetic diversity of the organisms initially present in the starting soil sample. It has been shown in particular that the process of direct extraction of nucleic acids according to the invention allows the extraction of DNA originating from rare microorganisms, such as certain rare Streptomyces or sporulated microorganisms.

For the purposes of the present invention, the term "microparticles" of the soil sample means particles derived from the sample which have an average size of about 50 µm, i.e. on average between 45 and 55 µm.

According to the invention, the microparticles are obtained from soil samples that are pre-dried or pre-desiccated and then ground until microparticles with an average size of between 2 µm and 50 µm are obtained, before resuspension of the microparticles obtained in a liquid buffer medium.

Such a liquid buffer medium can consist of a nucleic acid extraction buffer, in particular a conventional DNA extraction buffer which is well known to those skilled in the art.

The grinding of the soil sample into microparticles has the twin function of mechanically lysing most of the organisms present in the initial soil sample and of making the organisms that are not lysed by this mechanical treatment accessible to optional subsequent steps of chemical and/or enzymatic lysis.

Thus, a first subject of the invention consists of a process for preparing a collection of nucleic acids from a soil sample containing organisms, the said process comprising a first step I-(a) of obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample, followed by suspending the microparticles in a liquid buffer medium.

In an entirely preferred manner, the grinding step is carried out using a device with agate or tungsten beads or alternatively using a device with tungsten rings. These devices are preferred since the hardness of materials such as agate or tungsten significantly facilitates the production of microparticles of the size specified above. For this reason, a grinding device with glass beads, which is found to be much less efficient, will preferably be avoided.

The drying or desiccation of the soil sample can be carried out by any method known to those skilled in the art. For example, the crude soil sample can be dried at room temperature for a period of 24 to 48 hours.

As indicated previously, the liquid buffer medium can consist of a medium for extracting the DNA present in the microparticles. An extraction buffer known as TENP containing, respectively, 50 mM Tris, 20 mM EDTA, 100 mM NaCl and 1% (weight/volume) of polyvinylpolypyrrolidone, at pH 9.0, will most preferably be used.
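By way of illustration, the composition of the TENP buffer stated above can be converted into masses to weigh out for a given volume. The sketch below is only an example under stated assumptions: the description does not specify the reagent forms, so Tris base, disodium EDTA dihydrate and sodium chloride are assumed, with their usual molar masses. The adjustment to pH 9.0 is not modelled, and the function name is hypothetical.

```python
# Assumed reagent forms and molar masses in g/mol (not specified in the description):
# Tris base, disodium EDTA dihydrate, sodium chloride.
MOLAR_MASS_G_PER_MOL = {"Tris": 121.14, "EDTA": 372.24, "NaCl": 58.44}

# Concentrations taken from the description of the TENP buffer.
TENP_MILLIMOLAR = {"Tris": 50.0, "EDTA": 20.0, "NaCl": 100.0}
PVPP_PERCENT_WV = 1.0  # 1% (weight/volume) = 1 g per 100 ml


def tenp_masses(volume_ml: float) -> dict:
    """Return the mass in grams of each component for the requested buffer volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    litres = volume_ml / 1000.0
    masses = {
        name: (millimolar / 1000.0) * litres * MOLAR_MASS_G_PER_MOL[name]
        for name, millimolar in TENP_MILLIMOLAR.items()
    }
    masses["PVPP"] = PVPP_PERCENT_WV / 100.0 * volume_ml
    return masses


# Example: masses for 500 ml of buffer (pH adjustment to 9.0 is done separately).
for component, grams in tenp_masses(500.0).items():
    print(f"{component}: {grams:.3f} g")
```

If other reagent forms are used (for example Tris hydrochloride or anhydrous EDTA), the molar masses in the table must be changed accordingly.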

The process for preparing a collection of nucleic acids from a soil sample is also characterized in that the step for obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample is followed by a step I-(b) of extracting the nucleic acids present in the microparticles.

It is generally accepted that the extraction of the nucleic acids is accompanied by a co-extraction of unwanted soil constituents and/or compounds, thus necessitating the subsequent purification of the nucleic acids extracted, such a subsequent purification step needing to be both selective enough to allow the removal of the unwanted soil constituents and/or compounds, and of a yield which is sufficient to entail only a small loss in terms of the amount of pre-extracted DNA.

It has been shown according to the invention that a step of purifying the DNA extracted from the microparticles of the soil sample which satisfies the selectivity and yield criteria defined above comprises a treatment of the extracted DNA with a combination of two successive chromatography steps, a chromatography on molecular sieves and an anion-exchange chromatography, respectively.

According to another characteristic of the above process, step I-(b) of extracting the nucleic acids is followed by a step I-(c) of purifying the extracted nucleic acids with the aid of the following two chromatography steps:

- passing the solution containing the nucleic acids over a molecular sieve, followed by recovery of the elution fractions enriched in nucleic acids;
- passing the elution fractions enriched in nucleic acids over an anion-exchange chromatography support, followed by recovery of the elution fractions containing the nucleic acids.

The nature and order of the above chromatography steps are essential for good selectivity and an excellent yield for the step of purifying the DNA pre-extracted from the microparticles of the pre-dried or pre-desiccated soil sample.
In a very advantageous manner, the chromatographic support of the "molecular sieve" type in the above nucleic acid purification step consists of a chromatographic support of Sephacryl® S400 HR type or a chromatographic support of equivalent characteristics.

In an entirely preferred manner, the anion-exchange chromatographic support used in the second step for purifying the extracted DNA is a support of Elutip® d type, or a chromatographic support of equivalent characteristics.

By combining the steps I-(a) of obtaining microparticles of the dry soil sample, I-(b) of extracting the nucleic acids present in the microparticles and I-(c) of purification by the chromatography steps described above, it is possible according to the invention to extract the DNA from the soil directly without prior purification of the cells of the organisms initially contained in the sample, while at the same time avoiding the co-extraction of soil contaminants, such as, for example, humic acids, which is observed with the processes of the prior art.

The contaminants, such as humic acids, severely impair the analyses and the subsequent uses of the nucleic acids whose purification is desired.

According to the above process, it is also possible to gain access to the nucleic acids contained in the organisms which have not been lysed mechanically during step I-(a) of obtaining microparticles of the soil sample, with the aim of obtaining a virtually exhaustive collection of the genetic diversity of nucleic acids initially present in the soil sample.

Thus, the microparticles of the soil sample can undergo subsequent steps of chemical, enzymatic or physical lysis treatment, or alternatively a combination of chemical, enzymatic or physical treatments.

According to a first aspect, the process for preparing a collection of nucleic acids from a soil sample according to the invention can also be characterized in that step I-(a) is followed by the following steps:

- treatment of the soil suspension in a liquid buffer medium by sonication;
- extraction and recovery of the nucleic acids.

In a preferred manner, for a treatment by sonication, use will be made of a device of titanium micro-point type, such as the 600 W Vibracell Ultrasonicator device sold by the company Bioblock or a sonicator of Cup Horn type.

In an entirely preferred manner, the sonication step is carried out at a power of 15 W for a duration of 7 to 10 minutes and comprises successive cycles of sonication, the sonication itself being carried out for 50% of the duration of each cycle.

According to a second aspect, the above process can also be characterized in that step I-(a) is followed by the following steps:

- treatment of the soil suspension in a liquid buffer medium by sonication;
- incubation of the suspension at 37 °C after sonication in the presence of lysozyme and achromopeptidase;
- addition of SDS before centrifugation and precipitation of the nucleic acids;
- recovery of the precipitated nucleic acids.

Preferably, the step of incubation in the presence of lysozyme and achromopeptidase will be carried out at a final concentration of 0.3 mg/ml of each of the two enzymes, preferably for 30 minutes at 37 °C. Preferably, the SDS will be used at a final concentration of 1% and for an incubation time of 1 hour at a temperature of 60 °C before centrifugation and precipitation.
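The preferred sonication settings given above (15 W, 7 to 10 minutes, sonication during 50% of each cycle) imply an effective sonication time that is half the total duration. The short sketch below works this out. It assumes that the stated power is the nominal power delivered only during the active half of each cycle, which the description does not state explicitly; the function name is hypothetical.

```python
def sonication_summary(total_minutes: float, power_w: float = 15.0, duty: float = 0.5):
    """Return (active sonication time in seconds, nominal energy in joules).

    duty is the fraction of each cycle during which sonication is applied.
    The energy figure assumes the nominal power is delivered only while active.
    """
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty must be in (0, 1]")
    active_seconds = total_minutes * 60.0 * duty
    return active_seconds, power_w * active_seconds


for minutes in (7.0, 10.0):
    active_s, energy_j = sonication_summary(minutes)
    print(f"{minutes:.0f} min total: {active_s:.0f} s active, {energy_j:.0f} J nominal")
```

Under these assumptions the active sonication time lies between 210 s and 300 s.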
According to a third aspect, the above process for preparing a collection of nucleic acids from a soil sample is also characterized in that step I-(a) is followed by the following steps:

- homogenization of the soil suspension with a step of vigorous mixing (vortex) followed by a step of simple stirring;
- freezing of the homogeneous suspension followed by thawing;
- treatment of the suspension by sonication after thawing;
- incubation of the suspension at 37 °C after sonication in the presence of lysozyme and achromopeptidase;
- addition of SDS before centrifugation and precipitation of the nucleic acids;
- recovery of the nucleic acids.

Preferably, the suspensions of soil microparticles are mixed on the vortex machine and then homogenized by gentle stirring on a stirrer with circular rotation for a duration of two hours, after which they are frozen at -20 °C.

Preferably, the suspensions are again vigorously stirred with a vortex machine for 10 minutes, after thawing and before the sonication step.

It goes without saying that the nucleic acids extracted by the embodiments of the process described above for the direct extraction of nucleic acids are preferably purified according to the purification step consisting of a first passage over molecular sieves and then a subsequent passage, of the elution fractions obtained after the chromatography on molecular sieves, over an anion-exchange chromatographic support.

2. Indirect extraction of nucleic acids

According to a second embodiment of the process for preparing a collection of nucleic acids from an environmental sample, according to the invention, the said environmental sample undergoes a first treatment which is of a nature to allow separation of the organisms, contained in this sample, from the other macro-constituents of the sample.

This second embodiment of the process for preparing a collection of nucleic acids according to the invention promotes the production of large nucleic acids, which are virtually impossible to obtain according to the first embodiment of the process according to the invention described above, the mechanical lysis step performed in order to obtain the microparticles also having the effect of physically breaking the nucleic acids in the soil sample or the nucleic acids contained in the organisms in the soil sample.

The production of large nucleic acids has been sought by the Applicant for the purpose of isolating and characterizing nucleic acids comprising, at least partially, all of the coding sequences belonging to the same operon capable of directing the biosynthesis of a compound of industrial interest.

Preferably, by carrying out the second embodiment of the process for preparing a collection of nucleic acids from a soil sample according to the invention, nucleic acids are obtained which are greater than 100 kb in size, preferably greater than 200, 250 or 300 kb, and most preferably nucleic acids greater than 400, 500 or even 600 kb in size.
This second embodiment of a process for preparing a collection of nucleic acids from an environmental sample according to the invention consists of a combination of four successive steps intended to obtain nucleic acids having the characteristics described above.

When the environmental sample is a soil sample, it has been shown according to the invention that a first step for obtaining a suspension by dispersing the soil sample in liquid medium promotes the accessibility of the organisms contained in the sample without bringing about any significant mechanical lysis of the cells.

The first step of obtaining a dispersion of the above soil sample makes the organisms in the sample accessible to the external medium and also allows a partial dissociation of the organisms in the sample and of the macro-constituents. It thus makes possible a subsequent separation of the organisms initially contained in the sample from the other constituents of this sample.

When the environmental sample originates, for example, from plants, from marine organisms or from insects, a pretreatment by grinding is necessary in order to make the organisms of the associated microflora accessible to the subsequent steps of the process.

Thus, the present process comprises a step of separating the organisms from the other inorganic and/or organic constituents obtained above by means of centrifugation on a density gradient. The organisms thus separated are then subjected to a step of lysis and then of extraction of the nucleic acids.

The step of centrifugation on a density gradient makes it possible, surprisingly, to separate the cells of the organisms from the soil particles contained in the sample suspension. In point of fact, it might have been expected that a proportion of the cells would be entrained with the macroparticles in the gradient phase.
In addition, it had never been demonstrated hitherto that a centrifugation of a soil sample on a density gradient made it possible to find, at the aqueous phase/gradient interface, a population of organisms representative of the diversity of the organisms present in the starting sample, due to the fact that these organisms are extremely variable in volume, density and shape. It could reasonably be assumed that they would be found either in the aqueous phase, at the aqueous phase/density gradient interface or in the density gradient itself.

Thus, a person skilled in the art could expect that organisms with densities less than or greater than the density of the density gradient used (density of the density gradient of between 1.2 and 1.5 g/ml, preferably 1.3 g/ml) could not be recovered, the effect of which would have been to introduce a bias into the representativeness of the organisms effectively separated and, consequently, also into the diversity of the nucleic acids extracted.

Also, in one specific embodiment of the process, a step of germination of spores, in particular of actinomycetes, is carried out, the effect of which is to significantly increase the amount of actinomycete DNA recovered.

The final step consists of a step of purifying the nucleic acids thus extracted on a caesium chloride gradient.

Surprisingly, the purification of the nucleic acids on the caesium chloride gradient allows a substantial or even complete removal of the substances of which the density gradient is composed. This characteristic is a determining factor as regards the subsequent use of the purified nucleic acids, since the density-gradient medium is known to be a powerful enzymatic inhibitor, capable where appropriate of inhibiting the catalytic activity of the enzymes used to prepare the insertion of extracted nucleic acids into vectors.
According to this second embodiment, the process for preparing a collection of nucleic acids from an environmental sample containing organisms according to the invention comprises the succession of steps below:

(i) production of a suspension by dispersing the environmental sample in liquid medium and then homogenizing the suspension obtained by gentle stirring;

(ii) separating the organisms from the other inorganic and/or organic constituents of the homogeneous suspension obtained in step (i) by centrifugation on a density gradient;

(iii) lysis of the microorganisms separated in step (ii) and extraction of the nucleic acids;

(iv) purification of the nucleic acids on a caesium chloride gradient.

Preferably, the suspension of the soil sample is obtained by dispersing this sample by grinding with the aid of a device such as a Waring Blender or a device of equivalent characteristics. In an entirely preferred manner, the sample suspension is obtained after three successive grinding operations each lasting one minute in a device such as a Waring Blender. Preferably, the ground sample will be cooled in ice between each of the grinding operations.

Preferably, the organisms are then separated from the soil particles by centrifugation on a density cushion of the "Nycodenz" type, sold by the company Nycomed Pharma AS (Oslo, Norway). The preferred centrifugation conditions are 10,000 × g for 40 minutes at 4 °C, advantageously in a rotor with swing-out buckets of the "rotor TST 28.38" type sold by the company Kontron.

The ring of organisms located, after centrifugation, at the interface of the upper aqueous phase and the lower Nycodenz phase is then removed and washed by centrifugation before taking up the cell pellet in a suitable buffer.

Step (iii) of lysis of the organisms separated out in step (ii) described above can be carried out in any manner known to those skilled in the art.

Advantageously, the cells are lysed in a 10 mM Tris-100 mM EDTA solution at pH 8.0 in the presence of lysozyme and achromopeptidase, advantageously for one hour at 37 °C.

The actual extraction of the DNA can advantageously be carried out by adding a solution of lauryl sarcosyl (1% of the final weight of the solution) in the presence of proteinase K and incubation of the final solution at 37 °C for 30 minutes.

The nucleic acids extracted in step (iii) are then purified on a caesium chloride gradient. Preferably, the step of purifying the nucleic acids on a caesium chloride gradient is carried out by centrifugation at 35,000 rpm for 36 hours, for example on a rotor of the Kontron 65.13 type.

According to one specific aspect of the process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention, the said nucleic acids consist predominantly, if not exclusively, of DNA molecules.

According to another aspect, the nucleic acids can be recovered after inclusion of the organisms, separated on a density gradient, in an agarose block and lysis, for example chemical and/or enzymatic lysis, of the organisms included in the agarose block.

Another subject of the invention consists of a collection of nucleic acids consisting of the nucleic acids obtained in step II-(iv) of the process for preparing a collection of nucleic acids according to the invention, or alternatively obtained in step I-(c) or a subsequent step of the process for preparing a collection of nucleic acids according to the invention.

The invention also relates to a nucleic acid which is characterized in that it is contained in a collection of nucleic acids as defined above.

According to a first aspect, such a nucleic acid constituting a collection of nucleic acids according to the invention is characterized in that it comprises a nucleotide sequence encoding at least one operon, or part of an operon.

Most preferably, such an operon encodes all or part of a metabolic pathway.

Example 9 describes the construction of a genomic DNA library from a strain of Streptomyces alboniger and its cloning into the shuttle cosmids pOS700I and pOS700R, respectively. It has been shown according to the invention that, in the DNA library prepared in the integrative vector pOS700I, new clones contain nucleotide sequences belonging to the operon responsible for the puromycin biosynthetic pathway. Similarly, twelve clones containing nucleotide sequences of the operon responsible for the puromycin biosynthetic pathway have been identified in the DNA library prepared in the replicative vector pOS700R.

In particular, certain integrative and replicative cosmids of the libraries produced have, after digestion with the restriction endonucleases ClaI and EcoRV, a 12-kb fragment capable of containing all of the sequences of the operon responsible for the puromycin biosynthetic pathway.
Thus, according to another aspect, a nucleic acid according to the invention contains, at least partially, nucleotide sequences of the operon responsible for the puromycin biosynthetic pathway.

Example 2 below describes the construction of a DNA library according to a process in accordance with the present invention, in a pBluescript SK- vector starting with a soil contaminated with lindane.

The recombinant vectors were transfected into Escherichia coli DH10B cells and the transformed cells were then cultured in a suitable culture medium in the presence of lindane. Screening of the clones on transformed cells of the library made it possible to show that, out of 10,000 screened clones, 35 of them had a lindane degradation phenotype. The presence of the linA gene in these clones was confirmed by PCR amplification by means of primers specific for this gene.

Thus, according to another aspect, the invention also relates to a nucleic acid containing a nucleotide sequence for the metabolic pathway which brings about the biodegradation of lindane.

It is thus clearly demonstrated, as described above, that a process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention and a process for preparing a collection of recombinant vectors containing the constituent nucleic acids of the abovementioned collection of nucleic acids are entirely suitable for the isolation and characterization of nucleotide sequences included in an operon.
An additional demonstration of the ability of a process according to the invention to identify coding nucleotide sequences involved in a biosynthetic pathway regulated in the form of an operon is also described later: this concerns the cloning and characterization of sequences encoding polyketide synthases involved in the pathway for the biosynthesis of polyketides, which belong to a family of molecules certain representatives of which are of major therapeutic interest, in particular antibiotic interest.

A subject of the present invention is thus also a constituent nucleic acid of a collection of nucleic acids according to the invention, characterized in that it comprises all of a nucleotide sequence encoding a polypeptide.

According to a first aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of prokaryotic origin.

According to a second aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention originates from a bacterium or from a virus.

According to a third aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of eukaryotic origin. In particular, such a nucleic acid is characterized in that it originates from a fungus, a yeast, a plant or an animal.

MOLECULAR CHARACTERIZATION OF THE COLLECTION OF NUCLEIC ACIDS EXTRACTED FROM THE SOIL

In order to overcome the various technical drawbacks of the methods for characterizing libraries of DNA extracted and purified from an environmental sample which have been described in the section of the description relating to the prior art, the Applicant has developed a simple and reliable process for qualitatively and semi-quantitatively characterizing the nucleic acids obtained from the process described above.
The process according to the invention thus consists in universally amplifying a 700 bp fragment located inside a sequence of ribosomal DNA of 16S type, and then in hybridizing the amplified DNA with an oligonucleotide probe of variable specificity and finally in comparing the hybridization intensity of the sample relative to an external calibration range of DNA of known sequence or origin.

The amplification prior to the hybridization with the oligonucleotide probe makes it possible to quantify relatively scarce microorganism genera or species. Furthermore, the amplification with universal primers makes it possible, during the hybridization, to use a broad series of oligonucleotide probes.

Thus, a subject of the invention is also a process for determining the diversity of nucleic acids contained in a collection of nucleic acids, and most particularly of a collection of nucleic acids originating from an environmental sample, preferably from a soil sample, the said process comprising the following steps:

- placing the nucleic acids of the collection of nucleic acids to be tested in contact with a pair of oligonucleotide primers hybridizing at any sequence of bacterial 16S ribosomal DNA;
- carrying out at least three amplification cycles;
- detection of the amplified nucleic acids using an oligonucleotide probe or a plurality of oligonucleotide probes, each probe hybridizing specifically with a 16S ribosomal DNA sequence common to a bacterial kingdom, order, subclass or genus;
- where appropriate, comparison of the results from the preceding detection step with the detection results, using the probe or the plurality of probes of nucleic acids of known sequence constituting a calibration range.

Preferably, a first pair of primers hybridizing with universally conserved regions of the gene for the 16S ribosomal RNA consists, respectively, of the primers FGPS 612 (SEQ ID No 12) and FGPS 669 (SEQ ID No 13).
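The comparison of a hybridization intensity with an external calibration range, as described above, can be sketched as a simple interpolation. This is only an illustrative assumption about how such a comparison might be computed: the description does not specify the calculation, and the function name, the units and the calibration values below are hypothetical. The sketch assumes that the signal increases with the amount of target DNA and interpolates linearly between neighbouring calibration points.

```python
def estimate_from_calibration(signal: float, standards: list) -> float:
    """Estimate an amount of target DNA from a hybridization signal.

    standards is a list of (known_amount, measured_signal) pairs forming the
    external calibration range. Linear interpolation is used between the two
    calibration points that bracket the sample signal.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    if len(points) < 2:
        raise ValueError("at least two calibration points are required")
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal lies outside the calibration range")
    for (amount_lo, signal_lo), (amount_hi, signal_hi) in zip(points, points[1:]):
        if signal_lo <= signal <= signal_hi:
            if signal_hi == signal_lo:
                return (amount_lo + amount_hi) / 2.0
            fraction = (signal - signal_lo) / (signal_hi - signal_lo)
            return amount_lo + fraction * (amount_hi - amount_lo)
    raise ValueError("no bracketing calibration points found")


# Hypothetical calibration range: (amount of DNA, arbitrary signal units).
calibration = [(0.0, 0.0), (10.0, 100.0), (50.0, 400.0)]
print(estimate_from_calibration(250.0, calibration))
```

Signals outside the calibration range are rejected rather than extrapolated, which keeps the estimate semi-quantitative in the sense used above.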

A second embodiment of a preferred pair of primers according to the invention consists of the pair of universal primers 63 f (SEQ ID No 22) and 1387 r (SEQ ID No 23).

According to one specific embodiment of a process for determining the diversity of nucleic acids in a collection of nucleic acids, the amplification step using a pair of universal primers can be carried out on a collection of recombinant vectors into each of which has been inserted a nucleic acid from the collection of nucleic acids under consideration, prior to the step of hybridization with the oligonucleotide probes specific for a particular bacterial kingdom, order, subclass or genus.

Such a process for determining the diversity of the nucleic acids contained in a collection is most particularly applicable to the collections of nucleic acids obtained in accordance with the teaching of the present description.

Thus, Example 3 details a process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of indirect extraction of DNA by dispersion of a soil sample prior to the separation of the cells on a Nycodenz gradient, lysis of the cells and then purification of the DNA on a caesium chloride gradient.

The collection of nucleic acids thus obtained was used as obtained or in the form of inserts into vectors of cosmid type in an amplification process using the abovementioned universal primers for 16S rDNA, and the amplified DNA was then subjected to a step of detection using oligonucleotide probes of sequences SEQ ID No 14 to SEQ ID No 21 which are presented in Table 4.

The results show that a process for preparing a collection of nucleic acids starting with a soil sample containing organisms according to the invention makes it possible to gain access to the DNA of more than 14% of the total telluric microflora, i.e.
2 × 10⁸ cells per gram of soil, whereas the total microflora which can be cultured represents barely 2% of the total microbial population.

In order to determine the phylogenetic diversity of a collection of nucleic acids prepared in accordance with the invention, 47 sequences of the 16S rRNA gene were isolated and sequenced. These sequences correspond, respectively, to the nucleotide sequences SEQ ID No 60 to SEQ ID No 106.

The nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106 also form part of the invention, as do nucleic acids possessing at least 99%, preferably 99.5% or 99.8%, nucleic acid identity with the nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106. Such sequences can be used in particular as probes for screening clones of a DNA library and for thus identifying those, among the clones of the library, which contain such sequences, these sequences being liable to be close to coding sequences of interest, such as sequences encoding enzymes involved in the biosynthetic pathway of antibiotic metabolites, for example polyketides.

Comparison of the sequences of 16S rRNA from a DNA library prepared in accordance with the invention, with the sequences listed in the RDP database (Maidak B.L., Cole J.R., Parker C.T., Garrity G.M., Larsen N., Li B., Lilburn T.G., McCaughey M.J., Olsen G.J., Overbeek R., Pramanik S., Schmidt T.M., Tiedje J.M., Woese C.R. (1999) "A new version of the RDP (Ribosomal Database Project)" Nucleic Acids Research Vol. 27: 171-173) made it possible to determine that the nucleic acids contained in a collection of nucleic acids according to the invention originate from α-proteobacteria, from β-proteobacteria, from δ-proteobacteria, from γ-proteobacteria, from actinomycetes and from a genus related to Acidobacterium.
These results, presented in Table 7 and in the phylogenetic tree in Figure 7, reflect the huge phylogenetic diversity of the nucleic acids contained in a DNA library prepared in accordance with the process according to the invention.

CLONING AND/OR EXPRESSION VECTORS

Each of the nucleic acids contained in a collection of nucleic acids prepared in accordance with the invention can be inserted into a cloning and/or expression vector.

For this purpose, any type of vector known in the prior art can be used, such as viral vectors, phages, plasmids, phagemids, cosmids, fosmids, vectors of BAC type, P1 bacteriophages, vectors of YAC type, yeast plasmids or any other vector known in the prior art to a person skilled in the art.

Use will advantageously be made according to the invention of vectors which allow a stable expression of the nucleic acids of a DNA library. To this end, such vectors preferentially include transcription regulation sequences which are operably linked with the genomic insert so as to allow the initiation and/or regulation of the expression of at least a portion of the said DNA insert.

It results from the text hereinabove that the invention also relates to a process for preparing a collection of recombinant vectors, characterized in that the nucleic acids obtained in step II-(iv) or in step I-(c) or any other subsequent step of a process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention are inserted into a cloning and/or expression vector.

Prior to their insertion into a cloning and/or expression vector, the constituent nucleic acids of a collection of nucleic acids according to the invention can be separated as a function of their size, for example by electrophoresis on an agarose gel, where appropriate after digestion with a restriction endonuclease.
According to another aspect, the size of the constituent nucleic acids of a collection of nucleic acids according to the invention can be made substantially uniform by carrying out a step of physical rupture prior to their insertion into the cloning and/or expression vector.

Such a step of physical or mechanical rupture of nucleic acids can consist of successive passages of these nucleic acids, in solution, through a metal channel about 0.4 mm in diameter, for example the channel of a syringe needle having such a diameter.

The average size of the nucleic acids can be, in this case, between 30 and 40 kb in length.

The construction of the vectors that are preferred according to the invention is represented schematically in Figures 25 (conjugative integrative cosmid) and 26 (integrative BAC).

Cloning and/or expression vectors which can be used advantageously for the purposes of inserting nucleic acids contained in a DNA library or collection according to the invention are, in particular, the vectors described in European patent No EP 0 350 341 and in US patent No 5 688 689, such vectors being especially suitable for the transformation of actinomycete strains. Such vectors contain, besides an insert DNA sequence, an attachment sequence att and a DNA sequence encoding an integrase (int sequence) which is functional in actinomycete strains.

However, it has been observed according to the invention that certain cloning and/or expression vectors had drawbacks and that their theoretical functional capacity was not achieved in practice.

Thus, it was seen that the integration system contained in vectors of the prior art, and in particular in the vectors described in European patent No EP 0 350 341, does not in reality allow good integration of the DNA insert from the library into the bacterial chromosome.
Starting from the hypothesis that the functional defects in the integration of such vectors into the bacterial chromosome were due to a defect in the expression of the integrase gene present in these vectors, the Applicant first attempted to increase the expression of the integrase gene by replacing the initial transcription promoter with a transcription promoter capable of significantly increasing the number of integrase transcripts. The results were disappointing and the function of integration of these vectors into the chromosome was not improved.

Surprisingly, it has been shown according to the invention that the integrase expression difficulties encountered with this family of integrative vectors did not lie in the amount of transcript expression, but in the stability of the transcripts.

According to a second hypothesis, the Applicant was able to show that the stability defect of the integrase transcripts was caused by defects in termination of the transcription of the corresponding messenger RNA.

The Applicant thus inserted a stop site placed downstream of the sequence encoding the integrase of the vector so as to obtain a messenger RNA of given size.

The insertion of an additional termination signal downstream of the nucleotide sequence encoding the integrase of the vector made it possible to obtain a family of integrative vectors of cosmid type and of BAC type. Preferentially, the stop site is placed downstream of the attachment site att.

In addition, the Applicant has developed novel conjugative vectors and novel replicative vectors of cosmid type and novel conjugative vectors of BAC type which can be used advantageously to insert constituent nucleic acids of a collection of nucleic acids prepared according to the process of the invention.

When the insertion of DNA fragments of average size is desired, vectors of the cosmid type, capable of receiving inserts having a maximum size of about 50 kb, are preferably used.
Such cosmid vectors are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained according to the process of the invention comprising a first step of direct DNA extraction by mechanical lysis of the organisms contained in the initial soil sample.

When the insertion of large nucleic acids, in particular of nucleic acids greater than 100 kb in size, or even greater than 200, 300, 400, 500 or 600 kb, is desired, use will then preferentially be made of vectors of the BAC type which are capable of receiving DNA inserts of such a size.

Such vectors of BAC type are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained in accordance with the process according to the invention, in which the first step consists of an indirect extraction of the DNA by prior separation of the organisms contained in the initial soil sample and removal of the macro constituents from the said soil sample.

In particular, vectors of the BAC type are advantageously used to insert large nucleic acids containing, at least partially, the nucleotide sequence of an operon.

Thus, the process for preparing a collection of recombinant cloning and/or expression vectors according to the invention is also characterized in that the cloning and/or expression vector is of the plasmid type.

According to another aspect, such a process is characterized in that the cloning and/or expression vector is of the cosmid type.

According to a first aspect, it can be a cosmid which is replicative in E. coli and integrative in Streptomyces. An entirely preferred cosmid corresponding to such a definition is the cosmid pOS7001 described in Example 3.

According to yet another aspect, the cosmid vector is conjugative and integrative in Streptomyces.

In general, conjugative vectors of cosmid type or of BAC type, which comprise in their nucleotide sequences a unit recognized by the cellular enzymatic machinery known as a "conjugation origin", are used whenever it is desired to avoid resorting to laborious transformation techniques that are difficult to automate.

For example, the transfection of vectors initially harboured by E. coli cells into Streptomyces cells conventionally requires a step of recovering the recombinant vector contained in the Escherichia coli cells, and purifying it prior to the step of transforming Streptomyces protoplasts. It is commonly accepted that a transfection of an assembly of 1000 Escherichia coli clones into Streptomyces requires the production of about 8000 clones in order for each E. coli clone to have a chance of being represented.

Conversely, a step of transfection by conjugating a vector harboured by E. coli into Streptomyces cells requires the same number of clones of each of the microorganisms, the conjugation step taking place "clone to clone" and moreover not comprising the technical difficulties associated with the step for transferring genetic material by transformation of protoplasts, for example in the presence of polyethylene glycol.

In order to optimize the construction of a DNA library in Streptomyces, novel conjugative vectors of cosmid type and of BAC type which are of a nature to allow maximum efficacy of the conjugation step have been developed according to the invention.
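The figure of about 8000 transformants for a library of 1000 clones can be illustrated with a simple sampling model. The sketch below assumes, as a simplification that the text does not state, that each transformant receives one clone drawn uniformly at random and independently from the pooled library; the function names are illustrative only.

```python
import math


def prob_clone_represented(n_clones: int, n_transformants: int) -> float:
    """Probability that one given clone occurs at least once among the
    transformants, assuming uniform and independent sampling (assumption)."""
    return 1.0 - (1.0 - 1.0 / n_clones) ** n_transformants


def expected_missing_clones(n_clones: int, n_transformants: int) -> float:
    """Expected number of library clones absent from the transformants."""
    return n_clones * (1.0 - 1.0 / n_clones) ** n_transformants


def transformants_needed(n_clones: int, target_prob: float) -> int:
    """Smallest number of transformants giving each clone at least
    `target_prob` probability of being represented."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - 1.0 / n_clones))


if __name__ == "__main__":
    print(prob_clone_represented(1000, 8000))
    print(expected_missing_clones(1000, 8000))
    print(transformants_needed(1000, 0.999))
```

Under this model, 8000 transformants give each of the 1000 clones a probability of roughly 0.9997 of being represented, with fewer than one clone expected to be missing. Transfer by conjugation, taking place "clone to clone", avoids this oversampling.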

In particular, the novel conjugative vectors according to the invention have been constructed by placing a selection marker gene at the end of the vector DNA which is transferred last into the recipient bacterium. This improvement to the conjugative vectors of the prior art makes it possible to positively select only the recipient bacteria which have received all of the vector DNA and, consequently, all of the insert DNA of interest.

Cosmids which are conjugative and integrative in Streptomyces and which are preferred according to the invention are the cosmids pOSV303, pOSV306 and pOSV307 described in Example 5.

According to another aspect, a process for preparing a collection of recombinant vectors according to the invention is carried out using a cosmid which is replicative both in E. coli and in Streptomyces. Such a cosmid is advantageously the cosmid pOS700R described in Example 6.

According to yet another aspect, the above process can be carried out with a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.

Such a replicative and conjugative cosmid can be obtained from a replicative cosmid in accordance with the invention, by inserting a suitable transfer origin, such as RK2, as described in Example 5 for the construction of the vector pOSV303.

According to another advantageous embodiment of the process for preparing a collection of recombinant vectors according to the invention, use is made of a cloning and/or expression vector of BAC type.

According to a first aspect, the vector of the BAC type is integrative and conjugative in Streptomyces.

In an entirely preferred manner, such a BAC vector which is integrative and conjugative in Streptomyces is the vector BAC pOSV403 described in Example 8 or else the vectors BAC pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD-5 and pMBD-6 described in Example 15.
A subject of the invention is also a recombinant vector, characterized in that it is chosen from the following recombinant vectors:

a) a vector comprising a constituent nucleic acid of a collection of nucleic acids according to the invention;

b) a vector as obtained according to a process which avoids any involvement of the action of a restriction endonuclease on the DNA fragment to be inserted, as described previously.

In an entirely preferable manner, the invention also relates to a vector chosen from the following vectors:

- the cosmid pOS7001;
- the cosmid pOSV303;
- the cosmid pOSV306;
- the cosmid pOSV307;
- the cosmid pOS700R;
- the vector BAC pOSV403;
- the vector BAC pMBD-1;
- the vector BAC pMBD-2;
- the vector BAC pMBD-3;
- the vector BAC pMBD-4;
- the vector BAC pMBD-5;
- the vector BAC pMBD-6.

The invention also relates to a collection of recombinant vectors as obtained according to any one of the processes according to the invention.

Process for preparing a recombinant cloning and/or expression vector according to the invention.

The conventional techniques for inserting DNA into a vector in order to prepare a recombinant cloning and/or expression vector involve a first step in which a restriction endonuclease is incubated both with the DNA to be inserted and with the recipient vector, thus creating compatible ends between the DNA to be inserted and the vector DNA, allowing the assembly of the two DNAs before a final ligation step allowing the production of the recombinant vector.

However, such a conventional technique has notable drawbacks, most particularly when it is desired to insert large nucleic acids into a cloning and/or expression vector.

Specifically, the prior action of a restriction enzyme on the DNA fragments intended to be inserted into a vector is liable to appreciably reduce the size of this DNA prior to its insertion into the vector. It goes without saying that a significant reduction in the size of the DNA prior to its insertion into a vector is a situation that is particularly unfavourable when it is desired to clone large fragments of DNA liable to contain all of the coding sequences and, where appropriate, also the regulatory sequences, of an operon whose expression constitutes a complete biosynthetic pathway of a metabolite of industrial interest, and most particularly of a compound of therapeutic interest.

To overcome the drawbacks of the prior art, two processes have been developed according to the invention, for preparing a recombinant cloning and/or expression vector which do not use a restriction endonuclease on the DNA to be inserted prior to its introduction into the vector.

Such processes are consequently entirely suitable for cloning long DNA fragments liable to contain, at least partially, all of the coding sequences and, where appropriate, also the regulatory sequences, of a complete operon responsible for a biosynthetic pathway.

According to a first aspect, one process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the insertion of a nucleic acid into the cloning and/or expression vector comprises the following steps:

- opening the cloning and/or expression vector at a chosen cloning site, using a suitable restriction endonuclease;

- adding a first homopolymeric nucleic acid at the free 3' end of the open vector;

- adding a second homopolymeric nucleic acid, whose sequence is complementary to the first homopolymeric nucleic acid, at the free 3' end of the nucleic acid to be inserted into the vector;

- assembling the nucleic acid of the vector and the nucleic acid by hybridizing the first and second homopolymeric nucleic acids of mutually complementary sequence;

- closing the vector by ligation.

Such a process is described in Examples 10 and 13 below.

Advantageously, the above process can comprise the following characteristics, separately or in combination:

- the first homopolymeric nucleic acid is of poly(A) or poly(T) sequence;

- the second homopolymeric nucleic acid is of poly(T) or poly(A) sequence.

In an entirely preferred manner, the homopolymeric nucleic acids have a length of between 25 and 100 nucleotide bases, preferably between 25 and 70 nucleotide bases.

The process for preparing a recombinant cloning and/or expression vector described above is particularly suitable for the construction of DNA libraries in vectors of BAC type. Thus, according to one advantageous embodiment of the process for preparing a recombinant vector described above, the said process is also characterized in that the size of the nucleic acid to be inserted is at least 100 kb and preferably at least 200, 300, 400, 500 or 600 kb.

Such a preparation process is thus particularly suited to the insertion of nucleic acids contained in a collection of nucleic acids obtained according to the process of the invention.
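The homopolymeric tailing steps listed above can be sketched in a few lines. This is only a toy model under stated assumptions: strands are plain strings written 5' to 3', the example sequences and function names are hypothetical, and only the 25 to 100 base length range and the poly(A)/poly(T) pairing are taken from the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
MIN_TAIL, MAX_TAIL = 25, 100  # tail length range stated in the text


def add_homopolymer_tail(strand: str, base: str, length: int) -> str:
    """Return the strand (written 5'->3') with a homopolymer of `base`
    appended at its free 3' end."""
    if base not in COMPLEMENT:
        raise ValueError(f"unknown base: {base!r}")
    if not MIN_TAIL <= length <= MAX_TAIL:
        raise ValueError(f"tail length {length} outside {MIN_TAIL}-{MAX_TAIL}")
    return strand + base * length


def tails_are_complementary(vector_base: str, insert_base: str) -> bool:
    """True when the two homopolymer tails can hybridize with each other."""
    return COMPLEMENT[vector_base] == insert_base


if __name__ == "__main__":
    # Hypothetical short sequences standing in for vector and insert ends.
    vector_strand = add_homopolymer_tail("GGATCC", "A", 40)
    insert_strand = add_homopolymer_tail("ACGTACGT", "T", 40)
    print(vector_strand, insert_strand, tails_are_complementary("A", "T"))
```

Because the vector carries only one kind of tail and the insert only the complementary kind, neither molecule can anneal to itself, which is why no restriction digestion of the insert is needed to create compatible ends.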
In order to allow the insertion of large DNA fragments into cloning and/or expression vectors, a second process has been developed according to the invention, which makes it possible to dispense with any use of a restriction endonuclease on the DNA intended to be inserted into the vector.

Such a process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the step of inserting a nucleic acid into the said cloning and/or expression vector comprises the following steps:

- creation of blunt ends on the ends of the nucleic acid of the collection by removing the protruding 3' sequences and filling in the protruding 5' sequences;

- opening the cloning and/or expression vector at a chosen cloning site using a suitable restriction endonuclease;

- adding complementary oligonucleotide adapters;

- creation of blunt ends at the ends of the vector nucleic acid by removing the protruding 3' sequences and filling in the protruding 5' sequences, then dephosphorylating the 5' ends in order to prevent a recircularization of the vector;

- inserting the nucleic acid of the collection into the vector by ligation.

Preferably, the removal of the protruding 3' sequences is carried out using an exonuclease, such as the Klenow enzyme.

Preferably, the filling in of the protruding 5' sequences is carried out using a polymerase, and most preferably T4 polymerase, in the presence of the four nucleotide triphosphates.

A process for preparing a recombinant cloning and/or expression vector by removing the protruding 3' sequences and filling in the protruding 5' sequences as described above is particularly suitable for the construction of DNA libraries from vectors of cosmid type.

Such a process for obtaining recombinant vectors is described in Example 12.
In one specific method for preparing a recombinant vector according to the invention, oligonucleotides comprising one or more rare restriction sites are added to the vector in the cloning site of the DNA to be inserted, in accordance with the teaching of Example 10. This addition of oligonucleotides facilitates the subsequent recovery of the inserts without cleavage thereof.

HOST CELLS

Although any type of host cell can be used for the transfection or transformation with a nucleic acid or a recombinant vector according to the invention, in particular a prokaryotic or eukaryotic host cell, host cells whose physiological, biochemical and genetic properties are well characterized, which can be cultured easily on a large scale and whose culturing conditions for the production of metabolites are well known will preferably be used.

Preferably, the host cell receiving a nucleic acid or a recombinant vector according to the invention is phylogenetically close to the donor organisms initially contained in the environmental sample from which the nucleic acids originate.

In a most preferred manner, a host cell according to the invention should have a codon usage similar, or at least close, to that of the donor organisms initially present in the environmental sample, most particularly in the soil sample.

The size of the DNA fragments liable to carry the desired nucleotide sequences of interest can be variable. Thus, enzymes encoded by genes with an average size of 1 kb may be expressed using inserts of small size, while the expression of secondary metabolites will require the maintenance in the host organism of much larger fragments, for example from 40 kb to more than 100 kb, 200 kb, 300 kb, 400 kb or 600 kb.

Thus, the host cells of Escherichia coli constitute a preferred choice for cloning large DNA fragments.

In a most preferred manner, use will be made of the Escherichia coli strain known as DH10B and described by Shizuya et al. (1992), for which protocols for cloning into BAC vectors have been optimized.

However, other strains of Escherichia coli can be used advantageously to construct a DNA library according to the invention, such as the strains E. coli Sure, E. coli DH5α, or E. coli 294 (ATCC No. 31446).

In addition, the construction of a DNA library by transfecting E. coli cells with recombinant vectors according to the invention is also possible, the expression of genes of various prokaryotes such as Bacillus, Thermotoga, Corynebacterium, Lactobacillus or Clostridium having been described in PCT patent application No WO 99/20799.

In general, E. coli host cells can in all cases constitute transient hosts in which recombinant vectors according to the invention may be maintained highly effectively, it being possible for the genetic material to be handled easily and archived stably.

For the purposes of expressing the widest possible molecular diversity, other host cells may also advantageously be used, such as Bacillus, Pseudomonas, Streptomyces, Myxococcus, Aspergillus nidulans or Neurospora crassa cells.

It has also been shown according to the present invention that Streptomyces lividans cells can be used successfully and constitute expression systems complementary to Escherichia coli.

Streptomyces lividans constitutes a model for studying the genetics of Streptomyces and has also been used as a host for the heterologous expression of many secondary metabolites.
Streptomyces lividans has, in common with other actinomycetes such as Streptomyces coelicolor, Streptomyces griseus, Streptomyces fradiae and Streptomyces griseochromogenes, the precursor molecules and the regulatory systems required for the expression of all or part of complex biosynthetic pathways, such as, for example, the polyketide biosynthetic pathway or the pathway for the biosynthesis of non-ribosomal polypeptides representing classes of molecules of very diverse structure.

Streptomyces lividans also has the advantage of accepting foreign DNA with high transformation efficacies.

Thus, the invention also relates to a recombinant host cell comprising a nucleic acid according to the invention, which is a constituent of a collection of nucleic acids prepared according to a process in accordance with the invention, or alternatively a recombinant host cell comprising a recombinant vector as defined above.

According to a first aspect, it may be a recombinant host cell of prokaryotic or eukaryotic origin.

Advantageously, a recombinant cell according to the invention is a bacterium, and most preferably a bacterium chosen from E. coli and Streptomyces.

According to another aspect, a recombinant host cell according to the invention is characterized in that it is a yeast or a filamentous fungus.

The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid originating from a collection of nucleic acids prepared in accordance with a process for preparing a collection of nucleic acids from a soil sample containing organisms as described above.

The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a recombinant vector according to the invention.

On account of the large size of the inserts, it is necessary to have maximum transformation efficacy.
With this aim, a recipient strain of Streptomyces lividans constitutively expressing the pSAM2 integrase in order to promote the site-specific integration of the vector is preferred. For this, the int gene under the control of a strong promoter is integrated into the chromosome. The overproduction of integrase does not induce any excision phenomena (Raynal et al., 1998).

The production of a novel metabolite from the insert might be toxic for Streptomyces if the insert does not contain genes for resistance to the antibiotic produced or if this gene is not expressed or only expressed to a small extent. The capacity of the various genes for allowing Streptomyces ambofaciens to resist the antibiotic that it produces has been studied (Gourmelen et al., 1998; Pernodet et al., 1999). Some of these genes encode transporters of ABC type which are liable to impart a broad spectrum of resistance. These genes can be introduced into and overexpressed in the Streptomyces lividans host strain.

Conversely, a strain that is hypersensitive to antibiotics can be used (Pernodet et al., 1996) in order to detect the presence of resistance genes in the library. Specifically, in antibiotic-producing microorganisms, these resistance genes are often associated with the genes for the biosynthetic pathway of the antibiotic. The selection of resistance clones can make it possible to carry out a first sorting easily before the more complex tests for detecting a novel metabolite produced by the clone.

ISOLATION AND CHARACTERIZATION OF NOVEL NUCLEOTIDE SEQUENCES ENCODING POLYKETIDE SYNTHASES.

According to the invention, a collection of recombinant host cells was obtained after transfecting host cells with a collection of recombinant vectors each containing a nucleic acid insert originating from a collection of nucleic acids prepared in accordance with the process according to the invention.

More specifically, the DNA fragments obtained according to the process of the invention, in which a step of indirect extraction of DNA from the organisms contained in the soil sample is carried out, were first cloned into the integrative cosmid pOS7001.

The step of inserting DNA fragments into the integrative cosmid pOS7001 was carried out according to the process of the invention in which homopolymeric polynucleotide tails poly(A) and poly(T) were added to the 3' end of the vector nucleic acid and of the DNA fragments to be inserted, respectively.

The recombinant vectors thus constructed were encapsidated in lambda phage heads and the phages obtained were used to infect E. coli cells according to techniques that are well known to those skilled in the art.

A library of about 5000 Escherichia coli clones was obtained.

This library of clones was screened with pairs of primers specific for a nucleotide sequence encoding an enzyme involved in the polyketide biosynthetic pathway, the type I PKS enzyme, also known as β-ketoacyl synthase.

It is recalled here that polyketides constitute a chemical category of wide structural diversity comprising a large number of molecules of pharmaceutical interest such as tylosin, monensin, avermectin, erythromycin, doxorubicin or FK506.

Polyketides are synthesized by condensation of acetate molecules under the action of enzymes known as polyketide synthases (PKSs). Two types of polyketide synthase exist. The type II polyketide synthases are generally involved in the synthesis of polycyclic aromatic antibiotics and catalyze the iterative condensation of acetate units. The type I polyketide synthases are involved in the synthesis of macrocyclic or macrolide polyketides and constitute modular multifunctional enzymes.

Given their therapeutic interest, there is a need in the state of the art to isolate and characterize novel polyketide synthases which can be used for the production of novel pharmaceutical compounds, in particular novel pharmaceutical compounds with antibiotic activity.

The screening of the library of recombinant clones described above using PCR primers which selectively amplify nucleotide sequences encoding type I polyketide synthases has made it possible to identify recombinant clones containing DNA inserts comprising a nucleotide sequence encoding novel polyketide synthases. The nucleotide sequences encoding these novel polyketide synthases are referenced as the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Another subject of the invention consists of a nucleic acid encoding a novel polyketide synthase I, characterized in that it comprises one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Preferably, such a nucleic acid is in isolated and/or purified form.

The invention also relates to a recombinant vector comprising a polynucleotide comprising one of the sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

The invention also relates to a recombinant host cell comprising a nucleic acid chosen from polynucleotides comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120 as well as to a recombinant host cell comprising a recombinant vector into which is inserted a polynucleotide comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Advantageously, the recombinant vectors containing a DNA insert encoding a novel type I polyketide synthase according to the invention are cloning and expression vectors.

Preferably, a recombinant host cell as described above is a bacterium, a yeast or a filamentous fungus.

The amino acid sequences of novel polyketide synthases originating from organisms contained in a soil sample were deduced from the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120 above. They are polypeptides comprising one of the amino acid sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to SEQ ID No. 126.

The invention also relates to novel polyketide synthases comprising an amino acid sequence chosen from the sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to SEQ ID No. 126.

The nucleotide sequence SEQ ID No. 114, which comprises six open reading frames respectively encoding the polypeptides of sequences SEQ ID No. 121 to SEQ ID No. 126, also forms part of the invention.

The nucleotide sequence SEQ ID No. 113 of the a26G1 cosmid, which contains the sequence complementary to the sequence SEQ ID No. 114, also forms part of the invention.

Genomic DNA originating from pure bacterial strains, such as Streptomyces coelicolor (ATCC No. 101.478), Streptomyces ambofaciens (NRRL No. 2.420), Streptomyces lactamandurans (ATCC No. 27.382), Streptomyces rimosus (ATCC No. 109.610), Bacillus subtilis (ATCC No. 6633) or Bacillus licheniformis and Saccharopolyspora erythraea, was also extracted and amplified according to the invention.

A PCR amplification of DNA from each of the bacterial strains described above was carried out using pairs of primers specific for the nucleic acid sequences of type I polyketide synthase.

Novel bacterial type I polyketide synthase genes were thus able to be isolated and characterized. These are the nucleic acid sequences SEQ ID No 30 to SEQ ID No 32.

A subject of the invention is also, therefore, nucleotide sequences encoding novel type I polyketide synthases chosen from the polynucleotides comprising one of the nucleotide sequences SEQ ID No 30 to SEQ ID No 32.

Recombinant vectors comprising the nucleotide sequences encoding novel type I polyketide synthases defined above also form part of the invention.

The invention also relates to recombinant host cells, characterized in that they contain a nucleic acid encoding a novel type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 30 to SEQ ID No 32 and recombinant host cells comprising a recombinant vector as defined above.

A subject of the invention is also polypeptides encoded by sequences comprising the nucleic acids SEQ ID No 30 to SEQ ID No 32, and more specifically polypeptides comprising the amino acid sequences SEQ ID No 47 to SEQ ID No 50.

A subject of the invention is also a process for producing a type I polyketide synthase according to the invention, the said production process comprising the following steps:

- production of a recombinant host cell comprising a nucleic acid encoding a type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120;

- culturing of the recombinant host cells in a suitable culture medium;

- recovery and, where appropriate, purification of the type I polyketide synthase from the culture supernatant or from the cell lysate.

The novel type I polyketide synthases obtained according to the process described above can be characterized by binding to an immunoaffinity chromatography column onto which antibodies recognizing these polyketide synthases have been pre-immobilized.
The type I polyketide synthases according to the invention, and more particularly the recombinant polyketide synthases described above, can also be purified by high performance liquid chromatography (HPLC) techniques such as, for example, reverse-phase chromatography techniques or anion-exchange or cation-exchange chromatography techniques, that are well known to those skilled in the art.

The recombinant or non-recombinant polyketide synthases according to the invention can be used for the preparation of antibodies.

According to another aspect, a subject of the invention is also an antibody which specifically recognizes a type I polyketide synthase according to the invention or a peptide fragment of such a polyketide synthase.

The antibodies according to the invention may be monoclonal or polyclonal. The monoclonal antibodies can be prepared from hybridoma cells according to the technique described by Kohler and Milstein C. (1975), Nature, Vol. 256:495. The polyclonal antibodies can be prepared by immunizing a mammal, in particular mice, rats or rabbits, with a type I polyketide synthase according to the invention, where appropriate in the presence of an immunity-adjuvant compound, such as complete Freund's adjuvant, incomplete Freund's adjuvant, aluminium hydroxide or a compound from the muramyl peptide family.

For the purposes of the present invention, antibody fragments such as the Fab, Fab', F(ab')2, or single-chain antibody fragments containing the variable portion (ScFv) described by Martineau et al. (1998) J. Mol. Biol., Vol. 280 (1):117-127 or in US patent 4 946 778, and the humanized antibodies described by Reinmann KA et al. (1997), AIDS Res. Hum. Retroviruses, Vol. 13(11):933-943 or by Leger O.J et al. (1997), Hum. Antibodies, Vol. 8 (1): 3-16, also constitute "antibodies".
The antibody preparations according to the invention are useful in particular in qualitative or quantitative immunological tests intended either simply to detect the presence of a type I polyketide synthase according to the invention or to quantify the amount of this polyketide synthase, for example in the culture supernatant or the cell lysate of a bacterial strain capable of producing such an enzyme.

Another subject of the invention consists of a process for detecting a type I polyketide synthase according to the invention or a peptide fragment of this enzyme, in a sample, the said process comprising the steps of:
a) placing an antibody according to the invention in contact with the sample to be tested;
b) detecting the antigen/antibody complex possibly formed.

The invention also relates to a kit or equipment for detecting a type I polyketide synthase according to the invention in a sample, comprising:
a) an antibody according to the invention;
b) where appropriate, reagents required for detecting the antigen/antibody complex possibly formed.

An antibody directed against a type I polyketide synthase according to the invention can be labelled using an isotopic or non-isotopic detectable label, according to processes that are well known to those skilled in the art.

The screening of a DNA library according to the invention using a pair of primers which hybridize with target sequences whose presence is sought, such as sequences of the puromycin biosynthetic pathway, sequences of the linA gene involved in the biodegradation of lindane or sequences encoding type I polyketide synthases, has been detailed hereinabove.
A subject of the invention is thus a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps:
- placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence;
- carrying out at least three amplification cycles;
- detecting any nucleic acid amplified.

For the amplification conditions that are appropriate as a function of the desired target sequences, a person skilled in the art may advantageously refer to the examples below.
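As an illustration only, the primer-pair detection process can be mimicked in silico: a clone is scored positive when the forward primer and the binding site of the reverse primer both occur in its insert, in the right order. The sketch below is a simplification that uses exact matching and does not model amplification cycles or mismatched hybridization; the clone names, insert sequences and primers are invented for the example and do not come from the description.

```python
# In-silico analogue of primer-pair screening. All sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Predicted amplicon length, or None if the primer pair cannot amplify.

    Exact matching only: the forward primer must occur on the given strand and
    the reverse primer must anneal downstream of it on the opposite strand.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return end + len(reverse_site) - start


def screen_collection(clones: dict, forward: str, reverse: str) -> dict:
    """Map the name of each positive clone to its predicted amplicon length."""
    hits = {}
    for name, insert in clones.items():
        length = amplicon_length(insert, forward, reverse)
        if length is not None:
            hits[name] = length
    return hits


# Hypothetical two-clone collection and primer pair.
clones = {
    "clone_A": "TTGACCGGTACGATCGATTTAGGCCATGCAAGT",
    "clone_B": "TTTTTTTTTTTTTTTTTTTTTTTT",
}
positives = screen_collection(clones, forward="ACCGGTAC", reverse="GCATGGCC")
print(positives)
```

In practice the detection is experimental; a check of this kind would at most help to choose primers against sequences that are already known.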

According to another aspect, the invention also relates to a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps:
- placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence;
- detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.

To screen a DNA library according to the invention for the presence of a nucleotide sequence encoding a polypeptide capable of degrading lindane, the recombinant clones of interest were detected on the basis of their phenotype, namely their capacity to degrade lindane. With this aim, the isolated clones and/or sets of clones of the DNA library prepared were cultured in a culture medium in the presence of lindane, and lindane degradation was observed by the formation of a cloudy halo in the immediate environment of the cells.

The invention also relates to a process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps:
- culturing the recombinant host cells of the collection in a suitable culture medium;
- detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant cells cultured.

A subject of the invention is also a process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps:
- culturing recombinant host cells of the collection in a suitable culture medium;
- detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured;
- selecting recombinant host cells which produce the compound of interest.

The invention also relates to a process for producing a compound of interest, characterized in that it comprises the following steps:
- culturing a recombinant host cell selected according to the process described above;
- recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.

The invention also relates to a compound of interest, characterized in that it is obtained according to the process described above.

A compound of interest according to the invention can consist of a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44, SEQ ID No 30 to SEQ ID No 32 and SEQ ID No 115 to SEQ ID No 120.

The invention also relates to a composition comprising a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44, SEQ ID No 30 to SEQ ID No 32 and SEQ ID No 115 to SEQ ID No 120.

A polyketide produced by means of expressing at least one nucleotide sequence above is preferentially the product of the activity of several coding sequences included in a functional operon whose translation products are the various enzymes required for the synthesis of a polyketide, one of the above sequences being included and expressed in the said operon. Such an operon comprising a nucleic acid sequence according to the invention encoding a polyketide synthase can be constructed, for example, according to the teaching of Borchert et al. (1992).

The invention also relates to a pharmaceutical composition comprising a pharmacologically active amount of a polyketide according to the invention, where appropriate in combination with a pharmaceutically compatible vehicle.

Such pharmaceutical compositions will advantageously be adapted for the administration, for example parenteral administration, of an amount of a polyketide synthesized by a type I polyketide synthase according to the invention ranging from 1 µg/kg per day to 10 mg/kg per day, preferably at least 0.01 mg/kg per day and most preferably between 0.01 and 1 mg/kg per day.

The pharmaceutical compositions according to the invention can be administered orally, rectally, parenterally, intravenously, subcutaneously or intradermally.

The invention also relates to the use of a polyketide obtained by means of expressing a type I polyketide synthase according to the invention, for the manufacture of a medicinal product, in particular a medicinal product with antibiotic activity.

The invention will also be illustrated, without however being limited, by the figures and examples below.

Figure 1 illustrates the scheme of the various lysis steps carried out according to protocols 1, 2, 3, 4a, 4b, 5a and 5b described in Example 1.

Figure 2 illustrates an electrophoresis on 0.8% agarose gel of the DNAs extracted from 300 mg of soil No 3 (St André coast) after various lysis treatments (protocols 1 to 5, cf. Fig. 1). M: lambda phage molecular weight marker.

Figure 3 illustrates the proportion of various genera of actinomycetes cultured after treatments 1 to 5 (cf. Fig. 1). The cfu (colony forming unit) number was determined on a medium which is selective for this group of bacteria. A total number of about 400 colonies was analysed.

Figure 4 illustrates the recovery of lambda phage DNA digested with HindIII added to the soils at different concentrations before (G) or after (G*) grinding. The treatments T (heat shocks) and S (sonication) are additional lysis treatments. The quantification was carried out by analysis with a phospho-imager after dot-blot hybridization. A sample of each soil was used for each concentration of lambda phage added. The characteristics of the soils are given in Table 1. The samples corresponding to 10 and 15 µg of DNA added were not treated.

Figure 5 illustrates the PCR amplification of the DNAs extracted from soil No 3 according to protocols 1, 2, 3, 5a and 5b. The primers FGPS 122 and FGPS 350 (Table 2) were used to target indigenous Streptosporangium spp. The DNAs extracted were used undiluted or at 10-fold and 100-fold dilutions. M: 123 bp molecular weight marker (Gibco BRL), C: DNA-free amplification control.

Figure 6 illustrates the amounts of DNA extracted after inoculating spores (a) or mycelium (b) of S. lividans OS48.3 into the soils at different concentrations. The amounts of mycelium added to the soil correspond to the number of spores inoculated in the germination medium. About 50% of the spores germinated and the number of cells or genomes contained in the germinated spore hyphae was not determined. The amounts of spores and of mycelium inoculated are thus not directly comparable. The extraction was carried out according to protocol 6 (cf. materials and methods section). Symbol (') indicates that RNA was included in the extraction buffer. The target DNA was amplified by PCR with the primers FGPS 516 and FGPS 517, and the quantification was carried out with a phospho-imager after dot-blot hybridization using the probe FGPS 518. A sample of each soil was used for each concentration of hyphae or of spores. The characteristics of the soils are described in Table 1.

Figure 7 represents the phylogenetic tree obtained with the Neighbour Joining algorithm, positioning the 16S rDNA sequences contained in the soil DNA library, relative to cultured reference bacteria. In grey: the sequences obtained from the pools of clones of the library. The bootstrap values are indicated at the nodes, after re-sampling of 100 repetitions. The scale bar indicates the number of substitutions per site. The accession number of the sequences in the Genbank database is indicated in parentheses.

Figure 8 represents a scheme of the vector pOSint 1.

Figure 9 represents a scheme of the vector pWED 1.

Figure 10 represents a scheme of the vector pWE15 (ATCC No 37503).

Figure 11 represents a scheme of the vector pOS7001.

Figure 12 represents a scheme of the vector pOSV010.

Figure 13 represents the fragment containing a "cos" site inserted into the plasmid pOSV010 during construction of the vector pOSV303.

Figure 14 represents a scheme of the vector pOSV303.

Figure 15 represents a scheme of the vector pEl 16.

Figure 16 represents a scheme of the vector pOS700R.

Figure 17 represents a scheme of the vector pOSV001.

Figure 18 represents a scheme of the vector pOSV002.

Figure 19 represents a scheme of the vector pOSV014.

Figure 20 represents a scheme of the vector pBAC11.

Figure 21 represents a scheme of the vector pOSV403.

Figure 22 represents the electrophoresis gels for DNA of the library after digestion with the enzymes BamHI and DraI of the positive clones of the library screened with the PKS-I oligonucleotides.

Figure 23 illustrates the production of puromycin by the S. lividans recombinants compared with the production of the S. alboniger wild-type strain.

Figure 24 illustrates the alignment of soil PKSs with the conserved active sites of other PKSs. The references for each peptide are indicated. The beta-ketoacyl synthase domains were aligned using the GCG PILEUP program (Wisconsin Package Version 9.1, Genetics Computer Group, Madison, Wisc).

Figure 25 illustrates the construction of an integrative conjugative cosmid.

Figure 26 illustrates the construction of an integrative conjugative BAC.

Figure 27 illustrates the scheme for constructing the vector pOSV308.

Figure 28 illustrates the scheme for constructing the vector pOSV306.

Figure 29 illustrates the scheme for constructing the vector pOSV307.

Figure 30 illustrates the scheme for constructing the vector pMBD-1.

Figure 31 shows a detailed map of the plasmid pMBD-2 and also a scheme for constructing the vector pMBD-3.

Figure 32 illustrates a detailed map of the plasmid pMBD-4.

Figure 33 illustrates the scheme for constructing the plasmid pMBD-5 from the plasmid pMBD-1.

Figure 34 illustrates the detailed map of the vector pBTP-3.

Figure 35 illustrates the scheme for constructing the vector pMBD-6 from the vector pMBD-1.

Figure 36 illustrates the map of the cosmid a26G1 whose DNA insertion contains open reading frames encoding several polyketide synthases.

Figure 37 is a scheme representing the DNA insertion (+ strand) of the cosmid a26G1, on which are positioned the various reading frames encoding several polyketide synthases.

EXAMPLES:

EXAMPLE 1: Process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of direct extraction of DNA from the soil sample.

1. MATERIAL AND METHODS

1.1 SOILS: The characteristics of the six soils used in this study are listed in Table 1. The clay content and organic matter content range, respectively, from 9 to 47% and from 1.7 to 4.7%, the pH ranging from 4.3 to 5.8.

Soil samples were collected from the surface layer of 5 to 10 cm in depth. All the visible roots were removed and the soils were stored at 4 °C for a few days if necessary, after which they were dried for 24 hours at room temperature and screened (average mesh size: 2 mm) and then stored for up to several months at 4 °C.

1.2 BACTERIAL STRAIN AND CULTURE CONDITIONS: The extracellular DNA and the bacterial strains supplying vegetative cells, spores or hyphae, used to inoculate the soil samples, were chosen such that their presence could be specifically monitored.
In order to obtain large amounts of extracellular DNA, the lysogenic strain of E. coli 1192 Hfr P4X (metB), containing the lambda phage cI857 Sam7, was cultured on Luria-Bertani (LB) medium for two hours at 30 °C, then for 30 minutes at 40 °C, and then for 3 hours at 37 °C. The lambda phage DNA was extracted according to the technique described by Sambrook J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor N.Y.

The avirulent strain of Bacillus anthracis (STERNE 7700) was used as bacterial cell inoculum. Bacillus anthracis was multiplied in trypticase soy broth (TSB) (Biomérieux, Lyons, France) for about 6 hours, checking that the OD600 was maintained below 0.6. These conditions allow the growth of vegetative cells without formation of spores (Patra et al., (1996), FEMS Immunol. Medical Microbiology, vol. 15:223-231).

The spores of Streptomyces lividans OS48.3 (Clerc-Bardin et al., unpublished) were removed mechanically from cultures of the organism on R2YE medium (Hopwood et al., (1985), Genetic Manipulation of Streptomyces: A Laboratory Manual. The John Innes Foundation, Norwich, United Kingdom). The hyphae of S. lividans OS48.3 were obtained from pre-germinated spores, since it was expected that the use of short hyphae would minimize the rupture and subsequent loss of DNA. The spores were suspended in TES buffer (N-tris[hydroxymethyl]methyl-2-aminoethanesulphonic acid; Sigma-Aldrich Chimie, France) (0.05 M; pH 8) (Holben WE et al., (1988), Appl. Environ. Microbiol. vol. 54:703-711), subjected to a heat shock (50 °C for 10 minutes), cooled under cold running water and then added to an equal volume of pre-germination medium (1% yeast extract, 1% casamino acids, 0.01 M CaCl2). The solution was incubated at 37 °C on an agitator. The proportion of germinated spores was estimated at about 50%, in accordance with the results of Hopwood et al.
(1985). After centrifugation, the pellets were resuspended in TES buffer, added to 3% TSB medium and incubated at 37 °C until an OD450 of 0.15 was obtained (Hopwood et al., (1985)).

Streptomyces hygroscopicus SWN 736 and Streptosporangium fragile AC1296 (Institute Pushino, Moscow) were cultured according to techniques described by Hickey and Tresner (1952).

The DNA of the spores and hyphae of S. lividans was extracted from pure cultures according to the lysis protocol 6 described below (except that no grinding was carried out), while the spores of S. hygroscopicus and S. fragile were extracted by chemical/enzymatic lysis (Hintermann et al., 1981).

1.3 CHOICE OF THE EXTRACTION BUFFER: A TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% wt/vol of polyvinylpolypyrrolidone (PVPP)) developed by Picard (1992) was used. Similar buffers were subsequently used by other authors (Clegg et al., 1997; Kuske et al., 1998; Zhou et al., 1996).

The Tris and the EDTA protect the DNA from nuclease activity, the NaCl provides a dispersant effect and the PVPP absorbs the humic acids and the other phenolic compounds (Holben et al. (1988); Picard et al., (1992)).

In this study, the extraction efficacy of this buffer was evaluated at different pH values (6.0-10.0) using 20 different soils having a pH range from 5.8 to 8.3 and an organic matter content of between 0.2 and 6.3%. These twenty soils (the other characteristics are not indicated) were used only in this experiment. The amount of DNA was determined by colorimetric means as described by Richard (1974), and detailed below.

1.4 PROTOCOL OF IN SITU LYSIS AND OF DNA EXTRACTION: Several protocols using an increasing number of steps were tested in order to evaluate the efficacy of various techniques for lysing the soil microbes in situ. For these experiments, the indigenous soil microflora was targeted in six soils.
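The TENP composition given in section 1.3 can be turned into weighed amounts for a chosen batch volume. The sketch below is illustrative only: the description does not state which reagent forms were used, so the molar masses (Tris base, disodium EDTA dihydrate, NaCl) are assumptions, and pH adjustment is not modelled.

```python
# Molar masses in g/mol. The reagent forms are assumptions, not stated in the text.
MOLAR_MASS = {
    "Tris base": 121.14,
    "EDTA disodium dihydrate": 372.24,
    "NaCl": 58.44,
}

# Target concentrations of the TENP buffer, in mol/l, as given in section 1.3.
TENP_MOLAR = {
    "Tris base": 0.050,
    "EDTA disodium dihydrate": 0.020,
    "NaCl": 0.100,
}

PVPP_PERCENT_WV = 1.0  # 1% wt/vol means 1 g per 100 ml


def tenp_recipe(volume_ml: float) -> dict:
    """Grams of each component needed for volume_ml of TENP buffer."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    litres = volume_ml / 1000.0
    grams = {
        name: round(conc * MOLAR_MASS[name] * litres, 3)
        for name, conc in TENP_MOLAR.items()
    }
    grams["PVPP"] = round(PVPP_PERCENT_WV / 100.0 * volume_ml, 3)
    return grams


for component, mass in tenp_recipe(1000).items():
    print(f"{component}: {mass} g per litre")
```

If a different salt form is used (for example Tris hydrochloride), only the corresponding entry in `MOLAR_MASS` needs to change.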
Additional experiments were carried out in order to study the effects of the lysis treatments on the DNA released, by analysing the quantity and quality of the DNA recovered from lambda phage DNA added beforehand to the soils.

Once an optimized protocol (referred to as protocol 6) had been developed, it was used to quantify the DNA originating from indigenous actinomycetes and the DNA originating from gram-positive bacteria inoculated in the selected soils.

In all cases, the soil samples were dried and screened as described above. After grinding, 0.5 ml of TENP buffer was added to 200 mg dry weight of soil, except for protocol 1 in which the buffer was added to an unground soil.

For the various lysis treatments (see below), the soil suspensions were vortexed for 10 minutes and centrifuged (4000 x g for five minutes), after which an aliquot fraction (25 µl) of the supernatant was analysed by gel electrophoresis (0.8% agarose).

Another aliquot fraction of the supernatant representing a known volume, generally 350 µl, was precipitated with isopropanol. Five aliquot fractions (representing the DNA derived from 1 g of soil) were combined and resuspended in 100 µl of a sterile TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) before purification (protocol D, see below) and quantification, either by hybridization (Dot-Blot) of the total DNA, or by hybridization (Dot-Blot) of the PCR amplification products (see below).

The hybridization signals were quantified by phosphorescence imaging ("phospho-imaging" technique, see below).

1.5 EVALUATION OF THE METHODS OF IN SITU CELL LYSIS: The quality and quantity of DNA extracted after an increasing number of lysis treatment steps (protocols 2 to 5b) were compared with those of the extracellular DNA obtained after washing the soil with an extraction buffer (protocol 1; see also Figure 1).

Protocol 1: No lysis treatment. The TENP buffer was added to an unground soil, and a DNA extraction step was carried out as described above.

Protocol 2: Grinding of the soil followed by a DNA extraction. Two different types of device were used to grind the soil.

In order to compare their respective efficacy, 5 g of dry soil were ground for 30 seconds in a grinder containing tungsten rings, or for times varying up to 60 minutes in a soil grinder containing a mortar and agate beads (20 mm in diameter). The TENP buffer was then added and the DNA was extracted as described above.

The gel electrophoresis results showed that grinding for 40 minutes using agate beads was necessary in order to obtain amounts of extracted DNA equivalent to those obtained after grinding for 30 seconds using tungsten rings. The size distribution of the DNA fragments is similar whatever the method used. Thus, these treatments were considered as equivalent, and the one used in the protocols described below will consequently not be specified.

In protocols 3 to 5, the efficacy of several other lysis treatments subsequent to the grinding of the soil was tested, either separately or in different combinations.

Protocol 3: This protocol is identical to protocol 2, except that it comprises a step of homogenization using an Ultra-turrax type mixer (Janker and Kunkel, IKA Labortechnik, Germany) set at half the maximum speed for 5 minutes.

Protocols 4a and 4b: These protocols are identical to protocol 3, except for an additional sonication step. Two types of sonicator device were compared: a titanium micropoint sonicator (600W Vibracell Ultrasonicator, Bioblock, Illkirch, France) (protocol 4a) and a sonicator of Cup Horn type (protocol 4b). The Vibracell micropoint producing ultrasound is in direct contact with the soil solution. In the device of Cup Horn type, the soil solution is held in tubes which are placed in a water bath through which the ultrasound passes.

Preliminary experiments were carried out in order to determine the optimum conditions for the two sonicators (results not presented).
The best compromise, in terms of amount of DNA extracted and fragment size, consists of a sonication with the titanium micropoint and the sonicator of Cup Horn type for 7 and 10 minutes respectively, adjusting the power to 15 W and with 50% active cycles.

Protocols 5a and 5b: After sonication with a titanium micropoint or a device of Cup Horn type (protocols 4a and 4b respectively), lysozyme and achromopeptidase were added, each enzyme at a final concentration of 0.3 mg/ml. The soil suspensions were incubated for 30 minutes at 37 °C, after which lauryl sulphate was added at a final concentration of 1%, and the suspensions were then incubated for 1 hour at 60 °C before centrifugation and precipitation as described above.

In addition to the protocols described above, the effects of sonication (Cup Horn, see protocol 4b) and of heat shocks (30 seconds in liquid nitrogen followed by three minutes in boiling water, the treatments being repeated three times) on lambda phage DNA digested with HindIII and added beforehand to the soil were examined (see below).

Heat shocks were suggested in the prior art as means for in situ cell lysis (Picard et al. (1992)). However, because such a treatment has a harmful effect on the free DNA (see the results section), it was not included in the protocols described above.

OPTIMIZED PROTOCOL

After evaluation of the various lysis treatments, an optimized protocol was defined, which is referred to as protocol 6. Protocol 6 is identical to protocol 5b except that, before sonication, the soil suspensions are vortexed and then agitated by rotation on a wheel for two hours before being frozen at -20 °C. After thawing, the soil suspensions were vortexed for 10 minutes before sonication.

Protocol 6 was used in the experiments in which the soils were inoculated with bacterial cells, as well as in the experiments in which the indigenous actinomycetes were quantified (see below).
1.6 COUNTING BY MICROSCOPE: The efficacy of grinding of the soil as a method for lysing bacterial cells was examined by microscope. 5 g of dried crude soil were mixed in a Waring Blender device with 50 ml of ultrapure sterilized water for 1.5 minutes; simultaneously, 1 g (dry weight) of ground soil (protocol 2) was suspended in 10 ml by agitation for 10 minutes.

The soil suspensions were serially diluted and acridine orange was added to a final concentration of 0.001%. After 2 minutes, the suspensions were filtered through a black 0.2 µm Nucleopore brand membrane. Each filter was rinsed with lysed sterile water, treated with 1 ml of isopropanol for 1 minute in order to fix the bacterial cells, and then rinsed again.

The bacterial cells were counted using a Zeiss Universal epifluorescence microscope with a 100x objective lens. For each of the types of soil, three filters were counted, and at least 200 cells were counted on each of the filters.

1.7 COUNTING OF THE CULTURABLE ACTINOMYCETES AND TOTAL NUMBER OF COLONY-FORMING UNITS (CFU): The actinomycetes which survived the lysis treatments (protocols 1-5) were examined specifically with soil No. 3 (Saint André coast, see Table 1).

After a 10-fold dilution in a solution of yeast extract (6% weight/volume) and of SDS (0.05%) in order to induce germination (Hayakawa et al. (1988)), the soil suspensions were serially diluted in sterile water, incubated at 40 °C for 20 minutes and inoculated on HV medium (Hayakawa et al., 1987). The HV medium was supplemented with actidione (50 mg/l) and nystatin (50 mg/l).

The actinomycete colonies were counted after incubation for 15 days at 28 °C. In total, about 400 colonies were examined. The identification was carried out on the basis of the macro- and microscopic morphological characteristics as well as on the analysis of the diaminopimelic acid content of the isolates (Shirling et al., 1966; Staneck et al., 1974; Williams et al., 1993).
The total amount of culturable bacteria (total CFU) was also determined for each of the lysis protocols 1 to 5. The soil suspensions were serially diluted and inoculated in triplicate on a Bennett agar medium (Waksman et al., 1961) supplemented with nystatin and actidione (each at 50 mg/l).

Each Petri dish was covered with a cellulose nitrate filter (Millipore) and incubated for three days at 28 °C. After counting the colonies on the membranes, the filters were removed and the Petri dishes were reincubated for 7 days at 28 °C and then counted again.

1.8 RECOVERY OF THE LAMBDA PHAGE DNA ADDED TO THE SOILS: The lambda phage DNA was digested with HindIII, extracted with a phenol-chloroform mixture, precipitated and then resuspended in ultrapure sterile water according to standard protocols (Sambrook et al., 1989).

Dilutions corresponding, respectively, to 0, 2.5, 5, 7.5, 10 and 15 µg of DNA/g of dry weight of soil were prepared in 60 µl volumes. These DNA dilutions were added to 5 g batches of dry soil which were subsequently vortexed vigorously for 5 minutes before grinding.

The lambda phage DNA was also added to a soil before grinding at concentrations corresponding to 0, 10 and 15 µg of DNA/g of dry weight of soil.

After grinding, the extraction buffer was added and the DNA was extracted according to protocol 2 (see above).

1.9 SATURATION OF THE ADSORPTION SITES WITH RNA: In order to determine whether or not the saturation of the nucleic acid adsorption sites of the soil colloids could increase the level of recovery of the DNA, the sandy compost (soil No. 4) and the clayey soil (soil No. 5) were incubated with an RNA solution before any other treatment.

Commercial Saccharomyces cerevisiae RNA (Boehringer Mannheim, Meylan, France) was diluted in phosphate buffer (pH 7.1) and added to the dry, screened soil samples (2 ml/g of soil) at final concentrations of 20, 50 and 100 mg of RNA/g of dry weight of soil.
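The RNA pre-treatment above fixes both the volume added (2 ml per gram of soil) and the final dose (20, 50 or 100 mg of RNA per gram), which together determine the strength of the RNA solution. The short sketch below works this out; the 1 g batch size in the example is a hypothetical value chosen for illustration.

```python
def rna_solution_mg_per_ml(dose_mg_per_g: float, volume_ml_per_g: float = 2.0) -> float:
    """Concentration of the RNA solution that delivers the dose in the stated volume."""
    if volume_ml_per_g <= 0:
        raise ValueError("volume per gram must be positive")
    return dose_mg_per_g / volume_ml_per_g


def rna_plan(soil_g: float, doses_mg_per_g=(20, 50, 100)) -> list:
    """One row per dose: solution strength, volume to add and total RNA for a batch."""
    return [
        {
            "dose_mg_per_g": dose,
            "solution_mg_per_ml": rna_solution_mg_per_ml(dose),
            "volume_ml": 2.0 * soil_g,
            "rna_mg": dose * soil_g,
        }
        for dose in doses_mg_per_g
    ]


# Hypothetical 1 g soil batch.
for row in rna_plan(soil_g=1.0):
    print(row)
```

Under these figures the three doses correspond to RNA solutions of 10, 25 and 50 mg/ml.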
The tubes containing the soil suspensions were agitated by rotation for two hours at room temperature. After centrifugation, the soil pellets were dried in an oven (50 °C) overnight.

The lambda phage DNA was then added to the soils (0, 20 or 50 µg/g of dry weight of soil) in order to simulate the fate of the DNA released after cell lysis. The DNA was extracted according to protocol 2. It was determined thereafter that an identical effect of addition of RNA on the recovery of DNA could be achieved by adding the RNA directly to the extraction buffer.

This simplified procedure was used for the clayey soil No. 5 in the experiments in which the microorganisms were inoculated in the soils. The RNA was then added at a concentration corresponding to 50 mg of RNA/g of dry weight of soil.

1.10 QUALITATIVE AND QUANTITATIVE DETERMINATION OF THE EFFICACY OF THE EXTRACTION PROTOCOLS: The quality of the DNA (absence of degradation) was estimated on the basis of the size of the DNA fragments or the relative position of the DNA migration bands after electrophoresis of an aliquot fraction of a DNA solution on a 0.8% agarose gel. The fluorescence intensity allowed a semi-quantitative estimation of the extraction yields.

Another aliquot fraction was used for quantitative determinations of the DNA content by hybridization (Dot-Blot) and analysis with a phospho-imager. The Dot-Blot hybridization protocol has been described by Simonet et al. (1990).

The hybridization membranes (GeneScreen Plus, Life Science Products, Boston, USA) were prehybridized for at least 2 hours in 20 ml of a solution containing 6 ml of 20x SSC, 1 ml of Denhardt's solution, 1 ml of 10% SDS and 5 mg of salmon sperm DNA.

The hybridization was carried out overnight in the same solution in the presence of a labelled probe, followed by two washes of the membranes in 2x SSC buffer for 5 minutes at room temperature, a third wash in 2x SSC, 0.1% SDS buffer and a fourth wash in 1x SSC, 0.1% SDS buffer for 30 minutes at the hybridization temperature.

The hybridization signals were quantified with a Biorad radioanalytical imaging system (Molecular Analyst Software, BIORAD, Ivry-sur-Seine, France).

In order to quantify the total amount of DNA derived from the indigenous microflora, the various soils were extracted according to protocols 1 to 5. The non-amplified DNA was applied to the Dot-Blot membranes and hybridized using the universal probe FGPS431 (Table 2). This probe, which hybridizes to positions 1392-1406 of the E. coli 16S rDNA gene (Amann et al. (1995)), was end-labelled with 32P-ATP using T4 polynucleotide kinase (Boehringer Mannheim, Meylan, France).

A calibration curve was prepared using E. coli DH5α DNA. The conversion of the calculations to the soil bacteria required a simplification, starting from the hypothesis that the average number of copies (rrn) is 7, as for E. coli.

The lambda phage DNA digested with HindIII was used to quantify the recovery of the extracellular DNA. Non-amplified extracts from soils, to which lambda phage DNA had been added, were hybridized with lambda phage DNA digested with HindIII and labelled at random using the Klenow fragment (Boehringer Mannheim, Meylan, France).
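Reading an unknown dot-blot signal off a calibration curve, as done above with E. coli DNA standards, amounts to interpolating between the standards that bracket the signal. The following is a minimal sketch of that step using piecewise-linear interpolation; the choice of interpolation scheme and the standard values are assumptions made for illustration, not data or methods taken from the description.

```python
def interpolate_amount(signal: float, standards: list) -> float:
    """Estimate a DNA amount from a hybridization signal.

    standards is a list of (amount_ng, signal) pairs measured on the same
    membrane. Interpolation is piecewise linear between neighbouring standards,
    and signals outside the calibrated range are rejected, not extrapolated.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal outside the calibrated range")
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return a0
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("at least two standards are required")


# Hypothetical standards: (ng of E. coli DNA, phospho-imager signal).
standards = [(0, 0), (10, 1000), (50, 4800), (100, 9000)]
print(interpolate_amount(2900, standards))
```

Because the standards are E. coli DNA, the result is an E. coli DNA equivalent; treating it as soil bacterial DNA relies on the stated hypothesis of 7 rrn copies per genome.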

The amounts of DNA were calculated by interpolation using a calibration curve prepared with the purified DNA.

The total amount of DNA extracted from soils 1, 2, 3, 4 and 6 according to protocol 2 (grinding) was also quantified by colorimetric means according to the technique described by Richard (1974). Briefly, the DNA was mixed with concentrated HClO4 (the final concentration of HClO4 was 1.5 N). 2.5 volumes of this solution were mixed with 1.5 volumes of DPA (diphenylamine, Sigma-Aldrich, France) and the mixture was left to incubate at room temperature for 18 hours, prior to determination of the OD at 600 nm. The soil DNA extracts were quantified relative to a standard curve prepared with the DNA extracted from E. coli DH5α according to the standard protocols (Sambrook et al., (1989)).

1.11 DEVELOPMENT OF A DNA QUANTIFICATION TECHNIQUE USING PCR AMPLIFICATION AND HYBRIDIZATION: For the PCR amplifications, Taq DNA polymerase (Appligene Oncor, France) was used according to the manufacturer's instructions.

The PCR programme used for all the amplifications is as follows: initial denaturing for 3 minutes at 95 °C, followed by 35 cycles consisting of 1 minute at 95 °C, 1 minute at 55 °C and 1 minute at 72 °C and then a final extension at 72 °C for 3 minutes. The DNA isolated and purified from Streptosporangium fragile was used as control at concentrations ranging from 100 fg to 100 ng.

In order to amplify specifically the DNA of this bacterial genus, the primers FGPS122 and FGPS350 (Table 2) were selected, which are complementary to a portion of the 16S rDNA, after alignment of the sequences of actinomycetes 16S rDNA. Their specificity was tested on a collection of actinomycetes strains (Streptomyces, Streptosporangium and other highly similar genera).
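The PCR programme described above can be written down as a small data structure, which makes the step order explicit and allows the total hold time to be computed. This is only a sketch: the class and function names are invented for the example, and ramp times between temperatures, which depend on the thermal cycler, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Programme as described in section 1.11.
INITIAL_DENATURATION = Step("initial denaturation", 95, 180)
CYCLE = (
    Step("denaturation", 95, 60),
    Step("annealing", 55, 60),
    Step("extension", 72, 60),
)
FINAL_EXTENSION = Step("final extension", 72, 180)
N_CYCLES = 35


def hold_time_seconds(n_cycles: int = N_CYCLES) -> int:
    """Total programmed hold time, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + n_cycles * per_cycle + FINAL_EXTENSION.seconds


total = hold_time_seconds()
print(f"{total} s, i.e. {total // 60} min of programmed holds")
```

With 35 cycles of three one-minute steps and two three-minute holds, the programmed time comes to 111 minutes before ramping is taken into account.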

The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2).

In order to simulate the level of purity routinely obtained with DNA extracted from the soil, controls of pure DNA from S. fragile were mixed with the soil extracts obtained after treatments according to the lysis protocols 4b and 5b and then purified according to protocol D.

Before use, the soil extracts were treated with DNase (one unit of DNase/ml, Gibco BRL) for 30 minutes at room temperature. The DNase was then inactivated by heating at 65°C for 10 minutes. Verification of the inactivation was carried out by PCR. The humic acid concentrations were measured by spectrophotometry (OD at 280 nm) against a standard curve of commercial humic acids (Sigma).

DNase-treated soil solutions, undiluted, 10-fold diluted and 100-fold diluted, were mixed with from 100 fg to 100 ng of S. fragile DNA before the PCR amplification.

In another series of experiments, increasing concentrations of Streptomyces hygroscopicus DNA (from 100 pg to 1 µg) were added to the S. fragile DNA in order to simulate the presence of non-target DNA and its influence on the PCR process.

1.12 PURIFICATION OF THE CRUDE DNA EXTRACTS:

Four DNA purification methods were compared. The DNA was extracted from 1 g (dry weight of soil) according to protocol 4a and resuspended in 100 µl of buffer TE8 (50 mM Tris, 20 mM EDTA, pH 8.0).

Protocol A:

Elution through two successive Elutip d columns (Schleicher and Schuell, Dassel, Germany) (Picard et al., (1992)).

Protocol B:

Elution through a Sephacryl S200 column (Pharmacia Biotech, Uppsala, Sweden) followed by an elution through an Elutip d column (Nesme et al. (1995)).

Protocol C:

Separation using a two-phase aqueous system with 17.9% (weight/weight) of PEG 8000 (Merck, Darmstadt, Germany) and 14.3% (weight/weight) of (NH4)2SO4 (Zaslavsky, (1995)). After vigorous vortex mixing, the two phases were left at room temperature to separate. 1 ml of each of the phases was transferred into another tube, mixed with 100 µl of the sample and left at 4°C overnight to allow separation. The lower phase was dialysed for one hour through a Millipore membrane in the presence of an excess of a TE 7.5 buffer (10 mM Tris, 1 mM EDTA at pH 7.5 and 1 M MgCl2) in order to remove the excess salts.

Protocol D:

Elution through a Microspin Sephacryl S400 HR column (Pharmacia Biotech, Uppsala, Sweden), followed by elution through an Elutip d column.

Each protocol is completed by a step of precipitation with ethanol and the DNA is resuspended in 10 µl of TE 7.5 buffer. The efficacy of the purification protocols was checked by PCR amplification of undiluted aliquot fractions of the DNA solutions and of 10-fold and 100-fold diluted aliquot fractions, using standard protocols (see below).

1.13 RECOVERY OF THE DNA FROM INOCULATED MICROORGANISMS:

The cells, spores and hyphae were washed twice and counted by plate counting or by direct microscopic counting. 5 g batches of dry, screened soil (soils 2, 3 and 5) were inoculated with 100 µl of a suspension of S. lividans spores and hyphae at concentrations corresponding to 0, 10^3, 10^5, 10^7 and 10^9 spores/g of dry weight of soil, or with B. anthracis vegetative cells at concentrations corresponding to 0, 10^7 and 10^9 cells per gram of dry weight of soil. The amounts of S. lividans hyphae were calculated on the basis of the number of spores from which they originate.

After addition of the bacterial suspensions, the soil samples were vortexed vigorously for 5 minutes before grinding. The DNA was extracted according to protocol 6 (see below).

PCR amplification followed by Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) was used in order to quantify the amounts of DNA recovered from the cells and spores and from the bacterial mycelium inoculated in the soils. The DNA extraction was carried out according to lysis protocol 6. The PCR amplification and the hybridization were carried out as described above.

The primers and probes are targeted on chromosome regions located outside the 16S region, and are highly specific for the respective organisms, so as to avoid background signals.

For the soils inoculated with B. anthracis, the primers R499 and R500 were used (Patra et al. (1996)) and the amplification products were hybridized with the oligonucleotide probe C501 (Table 2).

For the soils inoculated with S. lividans, the PCR reactions were carried out using the primers FGPS516 and FGPS517, and the amplification products were hybridized with the oligonucleotide probe FGPS518 (Table 2).

The amplified region is a portion of the cassette constructed specifically to obtain the strain OS48.3 (Clerc-Bardin et al., unpublished). The calibration counts were obtained in all cases using the purified DNA from the target organism.

2. RESULTS

2.1 CHOICE OF THE EXTRACTION BUFFER:

20 different soils were used in order to determine the optimum pH of the DNA extraction buffer. For all the soils, the DNA yield increases as the buffer pH increases. The yield for each pH (± sd), calculated as the percentage of the highest value for each soil, is as follows: pH 6.0: 31 ± 13; pH 7.0: 43 ± 16; pH 8.0: 60 ± 14; pH 9.0: 82 ± 12; pH 10.0: 98 ± 3.

For 16 out of the 20 soils, the highest yield was obtained at pH 10.0, whereas for the other four soils, the highest yield was obtained at pH 9.0. However, at pH 10.0, larger amounts of humic material were released, compared with pH 9.0 (results not presented). Consequently, pH 9.0 was chosen for all the experiments presented below.

2.2 EFFICACY OF THE DNA EXTRACTION PROTOCOLS:

The total DNA from the indigenous soil organisms was extracted and quantified so as to evaluate the efficacy of several in situ cell lysis protocols. Soil samples 1-6 (Table 1) were treated according to protocols 1 to 5 described in the Materials and Methods section (Figure 1).
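The normalization used in section 2.1 (each yield expressed as a percentage of the best yield obtained for that soil, then averaged across soils) can be sketched as follows. The yield values below are invented purely to show the calculation and are not data from this document; the function names are likewise illustrative.

```python
from statistics import mean, stdev


def percent_of_best(yields_by_ph):
    """Express each yield as a percentage of the highest yield for that soil."""
    best = max(yields_by_ph.values())
    return {ph: 100.0 * y / best for ph, y in yields_by_ph.items()}


def summarize(soils):
    """Mean and sample standard deviation of the normalized yield at each pH."""
    normalized = [percent_of_best(soil) for soil in soils]
    summary = {}
    for ph in sorted(normalized[0]):
        values = [n[ph] for n in normalized]
        summary[ph] = (mean(values), stdev(values))
    return summary


# Hypothetical yields (arbitrary units) for two soils; not measured values.
soil_a = {6.0: 10, 7.0: 15, 8.0: 20, 9.0: 30, 10.0: 40}  # best at pH 10
soil_b = {6.0: 5, 7.0: 6, 8.0: 8, 9.0: 10, 10.0: 9}      # best at pH 9

for ph, (avg, sd) in summarize([soil_a, soil_b]).items():
    print(f"pH {ph}: {avg:.1f} ± {sd:.1f}")
```

Because every soil is scaled to its own maximum, a soil whose best yield occurs at pH 9.0 pulls the pH 10.0 average below 100, which is consistent with the reported 98 ± 3.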

After the DNA extraction, the soil suspensions were precipitated with isopropanol, and aliquot fractions of the resuspended pellets were analysed by gel electrophoresis, in a first step, in order to estimate the quality and quantity of the DNA released.

However, the colour of the DNA extract turned darker and darker as the number of lysis steps increased, due to the co-extraction of compounds, such as humic acids, with the DNA. Some of these dark-coloured crude extracts do not migrate in the expected manner in the agarose gels.

Consequently, the crude DNA solutions were purified (protocol B) before quantification. The gel electrophoreses of the purified solutions obtained after the various lysis treatments are given as examples for soil 3 (Figure 2).

A visual comparison under ultraviolet radiation of the intensities of the stained DNA allowed a semi-quantitative estimation of the efficacy of the treatments. Furthermore, the presence of migration profiles of multiple sizes of DNA fragments (discrete bands) and the disappearance of the long fragments indicate that a degradation of the DNA has taken place. No DNA could be extracted from the clayey soil No. 5.

A more precise quantification of the DNA from all the soils, extracted according to protocols 1 to 5, was carried out by Dot-Blot hybridization without a prior PCR amplification step and using an oligonucleotide probe complementary to a highly conserved sequence of the 16S rDNA region (probe FGPS431, Table 2).

The DNA was detected in the extracts of all the soils after each of the various lysis steps, except for the clayey soil No. 5. The results agree with the estimations made after gel electrophoresis.

In order to compare with an independent quantification method, the DNA extracted according to protocol 2 (from all the soils except soil No. 5) was also quantified using a colorimetric DNA detection method (Richard, 1974).

Good correlation was found (r = 0.88) between the DNA quantified using this colorimetric technique and the results obtained by Dot-Blot hybridization/radio-imaging, confirming the hypothesis that the average number of copies of the soil bacteria (rrn) is 7.

The hybridization (Dot-Blot) showed that the amounts of extracellular DNA, as determined by extraction without a lysis treatment (protocol 1), ranged from 4 µg/g for the acidic soil (No. 6) to 36 µg/g for soil No. 3 (Table 3).

Grinding of the soil (protocol 2) increased the amounts of DNA extracted from all the soils (e.g. 26 µg/g of soil for soil No. 6 and 59 µg/g of soil for soil No. 3) (Table 3; Figure 2).

For the two grinding treatments (see the Materials and Methods section), the discrete DNA migration was detected on the agarose gels, indicating that the DNA molecules were partially degraded (Figure 2). The size of the DNA fragments is between 20 and 0.2 kb. The band intensity of the smallest fragments is very low, indicating that most of the fragments are much bigger than 1 kb.

Protocol 3 comprises a step of homogenization in an Ultra-turrax mixing device after addition of the extraction buffer to the soil samples. This step leads to an increase in the amounts of DNA extracted, as determined by Dot-Blot hybridization, for two of the soils (the sandy soil No. 3 and the acidic soil No. 6), whereas the two soils rich in organic matter (soils No. 1 and No. 2) led to the production of smaller amounts of DNA.

Protocols 4a and 4b made it possible to evaluate the effect of two types of sonication on the yields of DNA from pre-ground and pre-homogenized soils.

The sonication had no positive effect on the DNA yield, compared with protocol 3, except for soil No. 6. However, the lysis efficacy for the two types of sonicator differs. For soils 2, 3 and 4, the largest amounts of DNA extracted were obtained using the titanium micropoint (Table 3; Figure 2), whereas for soils Nos. 1 and 6, the DNA yield was higher using the Cup Horn device.

Contradictory results were also obtained when a step of enzymatic/chemical lysis was added (protocols 5a and 5b) after the sonication step; in certain cases, the amounts of DNA extracted were larger than those recovered according to protocols 4a and 4b, whereas in other cases the yields were lower (Table 3).

2.3 DIRECT COUNTING OF THE MICROORGANISMS:

Counting by microscope of the total number of bacterial cells after staining with acridine orange was carried out for all the soils, before and after grinding.

Before grinding, the number of bacteria per gram of dry weight of soil ranged from 1.4 × 10^9 (± 0.4) in the tropical soil No. 5, to 10 × 10^9 (± 0.7) in the soil obtained from the Saint-André coast (soil No. 3) (Table 1).

After grinding, the numbers of cells were, respectively, 45, 74, 75, 54, 34 and 75% of the initial values for soils Nos. 1 to 6.

2.4 COUNTING OF THE CULTURABLE ACTINOMYCETES BELONGING TO DIFFERENT GENERA:

A modification in the populations of actinomycetes in soil No. 3 was noted after the various lysis treatments (Figure 3).

For example, the colonies of Streptomyces sp. dominated the viable actinomycetes flora when no lysis treatment was applied (protocol 1) and represented 65% of the total number of colonies identified. After grinding, the percentage of Streptomyces colonies fell to 51%, whereas the proportion of colonies belonging to the Micromonospora genus increased by 14% to 41%.

The chemical/enzymatic lysis (protocols 5a and 5b) appeared to be particularly effective for the lysis of Streptomycetes. When all the lysis treatments were applied, including a chemical/enzymatic lysis (protocols 5a and 5b), the actinomycetes microflora, which still comprised more than 10^6 cfu/g of soil, was dominated by the species belonging to the Micromonospora genus, while few or no Streptomyces colonies were recovered.

The organisms belonging to genera such as Streptosporangium, Actinomadura, Microbispora, Dactylosporangium and Actinoplanes appeared in small number on the plates (2-8% of the total number of colonies identified) after grinding, homogenization with the Ultra-turrax device and sonication, but were generally absent when these treatments were combined with a chemical/enzymatic lysis.

The total number of culturable bacteria remaining after each lysis treatment (protocols 2 to 5) was also investigated for soil No. 4. The results indicate that the number of culturable bacteria does not decrease with the intensity of the lysis treatments (about 2 × 10^6 cfu/g of soil in all cases, including when no treatment is applied, as in protocol 1).

These low cfu values are probably due to the fact that dry soil was used and that only the most resistant bacteria multiplied on the plates. The number of actinomycetes forming colonies was generally greater than that of the total cfu (all the bacteria) because the spore-germination step included in the actinomycetes detection protocol was omitted when counting the total bacteria.

2.5 RECOVERY OF THE LAMBDA PHAGE DNA ADDED:

The aim of these experiments was to estimate the way in which successive lysis treatments might affect the recovery of naked DNA, and whether or not these successive lysis treatments contributed to its degradation.

The DNA could be either a fraction of extracellular DNA released from already-dead organisms, which can persist in the soil for months (Ward et al., 1990), or DNA released from organisms readily lysed during the first steps of the treatment. In order to simulate this situation, lambda phage DNA digested with HindIII was added, at various concentrations, to the soils before and after grinding. In addition to grinding, a combination of the other lysis treatments was tested, including sonication (Cup Horn device, see protocol 4b) and heat shocks (see the Materials and Methods section).

After extraction, aliquot fractions which should theoretically contain from 25 to 150 ng of lambda phage DNA were analysed by gel electrophoresis. No DNA fragment specific for the lambda phage could be observed when the DNA was inoculated into the soil samples prior to grinding, independently of the dose or of the type of soil.

When the DNA was added after grinding, and extracted without an additional lysis treatment step, the specific lambda phage DNA profiles were detected in the extracts of four out of the five soils tested.

In all these cases, a direct relationship was observed between the amount of DNA added and the intensity of the signals on the agarose gels. However, the signal intensities were lower than expected when compared with those of the molecular standards.

Furthermore, the band at 23 kb was absent in several cases, indicating that the long fragments were preferentially adsorbed onto the soil particles, or were more sensitive to degradation, compared with the short fragments.

No band was detected in the samples of tropical soil No. 5, which is characterized by a very high clay content (Table 1).

For a more precise quantification, the recovery of DNA was determined on a phosphorescence imaging device (phospho-imager) after Dot-Blot hybridization. According to this technique, the DNA was detected in all the samples, including those which had been inoculated before grinding, except for soil No. 5, in which no DNA could be detected. In all the other soils, the amount of DNA extracted increases as the size of the inoculum increases (Figures 4a-d).

However, the recoveries of lambda phage DNA were low. When grinding was the only lysis treatment applied, the recoveries were between 0.6 and 5.9% of the DNA added when this DNA was added before grinding, and from 3.6 to 24% of the DNA added when the latter was added after grinding. The highest levels of recovery were obtained from soil No. 2.

Gel electrophoresis of aliquot fractions of samples treated by heat shock and sonication did not allow any DNA bands to be observed in any of the samples, including the tests in which the DNA had been added after grinding. The Dot-Blot hybridization experiments confirmed these results.

The hybridization signals obtained from soil suspensions which were treated with heat shocks and sonications were, at best, low. The sample showing the largest amount of DNA (15 µg of DNA/g of dry weight of soil) was the only one for which the signal obtained was substantially different from the background level. No difference (or only small differences) was observed between the samples treated with heat shock and those treated with heat shocks and sonication, indicating that the heat shocks have a harmful effect on the DNA.

The best recoveries were observed for soil No. 2, which has the highest organic matter content (Table 1), whereas no DNA was recovered from the clayey soil No. 5.

Additional experiments were carried out with non-ground samples of soils No. 4 and No. 5, which were inoculated with 20 and 50 µg of lambda phage DNA per gram of soil. The samples were extracted immediately or after an incubation period of one hour at 28°C, and the DNA extracts were then purified and analysed by gel electrophoresis.

The incubation of soil No. 4 for one hour after the inoculation did not give profiles that were qualitatively or quantitatively different from those obtained without incubation or from those observed previously when the DNA was added after grinding.

These results suggest that enzymatic degradation by the soil nucleases is not involved in the low level of DNA recovery. Furthermore, the absence of a grinding step does not allow an increase in the recovery of the DNA from soil No. 5, indicating that the changes to the structure of the soil due to the grinding do not significantly increase the adsorption of the nucleic acids onto the colloids.

2.6 SATURATION OF THE ADSORPTION SITES WITH RNA:

Most of the profiles obtained on the agarose gels do not differ significantly from the previous profiles in which the RNA treatment was not carried out.

For example, no band was detected from the clay-rich soil No. 5, independently of the RNA concentrations and of the lambda phage DNA concentrations used.

Furthermore, the specific bands of lambda phage DNA digested with HindIII remained undetectable in the sandy compost treated with RNA (soil No. 4) when the RNA was added before grinding. The intensity of the bands obtained from samples inoculated with DNA after grinding increases as the RNA concentration increases, indicating that the treatment might have a positive effect.

However, the results after hybridization and analysis by phosphorescence imaging did not confirm the electrophoresis results. For example, the positive effect of the RNA treatment on the recovery of DNA from the clayey compost, when DNA was added after grinding, did not appear clearly.

On the other hand, a positive effect of the RNA was found for the clay-rich soil (No. 5) when the DNA was added after grinding. Although the hybridization signals for the control samples do not differ from the background noise levels, significant amounts of DNA were released from the samples treated with RNA, and the signals increased as the amount of DNA added increased and as the RNA concentration increased. However, even for the highest RNA concentration (100 mg/g of weight of dry soil), the recovery level never exceeded 3%.

2.7 PURIFICATION OF THE CRUDE DNA EXTRACTS:

Of the four protocols tested, the best amplification of the undiluted DNA extracts (1 µl of extract in 50 µl of PCR mixture) was observed after elution through Microspin S400 columns followed by an elution through an Elutip d column, as shown by the gel electrophoresis of the PCR products.

The DNA purified by the two-phase aqueous system (protocol C) gave smaller amounts of PCR products after amplification starting with undiluted DNA extract.

No amplification product could be obtained from the undiluted extracts after amplification following the use of protocols A or B. Consequently, protocol D (see Materials and Methods section) was used for all the experiments in which the PCR amplifications and/or the Dot-Blot hybridizations were performed.

2.8 QUANTIFICATION BY PCR AND HYBRIDIZATION:

The first step was to determine whether or not the amounts of PCR product were proportional to the number of target DNA molecules initially present in the reaction tube. DNA from Streptosporangium fragile was used as target (see Materials and Methods section). The primers used were FGPS122 and FGPS350 (Table 2).

Gel electrophoresis of the PCR products showed that the band intensity increases as the concentration of the targets increases. The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2), and the signals were quantified by phosphorescence imaging (phospho-imaging). A good correlation (r² = 0.98) was found between the log[number of targets] and the log[intensity of the hybridization signal].

An investigation was then carried out to see whether or not the efficacy of the PCR amplification was affected by the humic acids and the non-target DNA.
When analysed by gel electrophoresis, the increase in band intensity of the PCR products with increasing amounts of target DNA was conserved when the amplification was carried out with DNA solutions to which extracts of DNase-treated soil had been added, containing humic acids at concentrations ranging up to 8 ng in 50 µl of the PCR mixture.

With 20 ng of humic acid in the PCR mixture, the bands corresponding to the small levels of target DNA disappeared, and at humic acid concentrations of 80 ng and higher, no band was visible.

The various amounts of target DNA from S. fragile yielded the expected amounts of PCR product when, before amplification, the S. fragile DNA was mixed with Streptomyces hygroscopicus DNA, added to 50 µl of the PCR mixture in a range from 100 pg to 1 µg in order to simulate the non-target DNA released from the soil microflora.

2.9 QUANTIFICATION OF THE INDIGENOUS SOIL ACTINOMYCETES AFTER DIFFERENT LYSIS TREATMENTS:

Purification protocol D was applied, followed by a PCR amplification as described above, in order to quantify the actinomycetes belonging to the Streptosporangium genus in soil No. 3 after extraction in accordance with protocols 1, 2, 3, 5a and 5b (Figure 5).

After grinding (protocol 2), the amount of target DNA originating from this actinomycete was estimated by hybridization (Dot-Blot) and radio-imaging as being 2.5 ± 1.3 ng/g of weight of dry soil.

If it is postulated that the DNA content is 10 fg per cell, as for Streptomyces (Gladek et al., 1984), this value corresponds to approximately 2.5 × 10^5 genomes. Similar values were obtained after the other lysis treatments (2.6 ± 1.1 and 1.8 ± 1.3 ng of DNA/g of dry soil using protocols 3 and 4b, respectively).

2.10 EFFICACY OF THE RECOVERY OF DNA FROM SOILS PRE-INOCULATED WITH BACTERIA:

Three soils (Nos. 2, 3 and 5) were inoculated with different concentrations of Streptomyces lividans spores or hyphae (see Materials and Methods section). The amounts of mycelium added to the soil (Figure 6b) correspond to the number of spores inoculated in the germination medium. Approximately 50% of these spores germinated. The exact number of cells in the hyphae of the germinated spores was not determined. Consequently, the amounts of spores and mycelium inoculated in the soils are not directly comparable.

For each soil sample, the extraction protocol No. 6, the purification method D and PCR amplification combined with Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) were used to count the specific target DNAs which had been released. The DNA extracted can be clearly distinguished from the background noise only when the number of spores added exceeds 10^5 for soils No. 3 and No. 5 and 10^7 for soil No. 2 (Figure 6a).

When the mycelium is added, the DNA extracted can be detected at and above an amount corresponding to 10^3 spores/g of soil for soils No. 2 and No. 3, and at and above 10^7 spores/g of soil for soil No. 5 (Figure 6b).

Above the detection level, the hybridization signal increases as the amount of inoculated cells increases. For the spore inoculum, a 100-fold increase in the number of cells inoculated leads to a close to 100-fold increase in the DNA yield. This increase is clearly less than when the hyphae are inoculated, particularly into soils No. 2 and No. 3 (Figure 6).
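Returning to the estimate in section 2.9, the conversion from a mass of target DNA to a number of genome equivalents is a simple unit calculation. The sketch below reproduces it under the assumption stated in the text (10 fg of DNA per cell); the function name is illustrative.

```python
FG_PER_NG = 1_000_000  # 1 ng = 10^6 fg


def genome_equivalents(dna_ng_per_g, fg_per_genome=10.0):
    """Genome equivalents per gram of dry soil for a given DNA mass.

    fg_per_genome defaults to the 10 fg per cell postulated in section 2.9.
    """
    return dna_ng_per_g * FG_PER_NG / fg_per_genome


# 2.5 ng of Streptosporangium DNA per gram of dry soil (protocol 2)
print(f"{genome_equivalents(2.5):.1e}")  # 2.5e+05
```

With the reported uncertainty of ± 1.3 ng/g, the same conversion gives a spread of about ± 1.3 × 10^5 genomes per gram, so the estimate is an order-of-magnitude figure rather than a precise count.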

In contrast to the results obtained when lambda phage DNA was used as the inoculum, the DNA was also recovered from the clay-rich soil (No. 5) when the bacterial cells were used as the inoculum. However, for the latter inoculum also, the treatment with RNA increased the recovery of Streptomyces DNA from this soil both for the spores and the mycelium (Figure 6).

Inoculating the soils with vegetative Bacillus anthracis cells gave recovery levels similar to those obtained for Streptomyces. Furthermore, the levels of DNA recovery from soil No. 5 increased after treatment with RNA for this inoculum also.

Example 2: Construction of a library of low molecular weight DNA (<10 kb) using a soil contaminated with lindane, and cloning and expression of the linA gene

This example describes the construction of a DNA library in E. coli. It demonstrates the cloning and expression of small genes obtained from a non-culturable microflora.

Lindane is an organochlorine pesticide, which is recalcitrant to degradation and persistent in the environment. Under aerobic conditions, biodegradation is catalyzed by a dehydrochlorinase, encoded by the linA gene, allowing lindane to be converted into 1,2,4-trichlorobenzene. The linA gene has been identified only from two strains isolated from soil: Sphingomonas paucimobilis, isolated in Japan (Seeno and Wada 1989; Imai et al., 1991; Nagata et al., 1993) and Rhodanobacter lindaniclasticus, isolated in France (Thomas et al., 1996; Nalin et al., 1999).

However, the degradation potential of lindane, demonstrated by assaying the chloride ions released and PCR amplification of the linA gene from soils which have been in contact with lindane or otherwise, appears to be more widespread in the environment (Biesiekierska-Galguen, 1997).

1. Direct extraction of soil DNA

The dry soils are ground for 10 minutes in a Retsch centrifugal-force grinder equipped with 6 tungsten beads. 10 grams of ground soil are suspended in 50 ml of pH 9 TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% w/v polyvinylpolypyrrolidone), and homogenized by vortexing for 10 min.

After centrifuging for 5 min at 4000 × g and 4°C, the supernatant is precipitated with sodium acetate (3 M, pH 5.2) and with isopropanol, then taken up in sterile TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0). The DNA extracted is then purified on an S400 molecular sieve column (Pharmacia) and on an Elutip d ion-exchange column (Schleicher and Schuell), according to the manufacturers' instructions, then stored in TE.

2. Construction of the library of DNA extracted from the soil in the vector pBluescript SK-

The vector pBluescript SK- and the DNA extracted from the soil are each digested with the enzymes HindIII and BamHI (Roche), at a rate of 10 units of enzymes per 1 µg of DNA (incubation for 2 hours at 37°C). The DNAs are then ligated by the action of T4 DNA ligase (Roche) overnight at 15°C, at a rate of one enzyme unit per 300 ng of DNA (about 200 ng of DNA insert and 100 ng of digested vector). Electrocompetent Escherichia coli cells, ElectroMAX DH10B™ (Gibco BRL), are transformed with the ligation mixture (2 µl) by electroporation (25 µF, 200 and 500 Ω, 2.5 kV) (Biorad Gene Pulser).

After one hour of incubation in the LB medium, the transformed cells are diluted so as to obtain about 100 colonies per dish, and then plated out on LB medium (10 g/l tryptone, 5 g/l yeast extract, 5 g/l NaCl) supplemented with ampicillin (100 mg/l), γ-HCH (500 mg/l), X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside, 60 mg/l) and IPTG (isopropylthio-β-D-galactoside, 40 mg/l), and incubated overnight at 37°C. Since γ-hexachlorocyclohexane (Merck-Schuchardt) is insoluble in water, a 50 g/l solution is prepared in DMSO (dimethyl sulphoxide) (Sigma). A library of 10,000 clones was thus obtained.

3. Cloning and expression of the linA gene

Screening of the library was carried out by visualization of a lindane degradation halo around the colony (the lindane precipitating in the culture media). Out of 10,000 clones screened, 35 thus exhibited lindane-degrading activity. The presence of the linA gene in these clones was confirmed by PCR with the aid of specific primers, described by Thomas et al. (1996). Digestions carried out on the inserts and on the amplification products showed identical profiles between all the clones screened and the reference control, R. lindaniclasticus. The clones carrying the linA gene also had an insert of the same size (about 4 kb).

It was thus demonstrated that the soil DNA could be cloned and expressed in a heterologous host, E. coli, and that genes derived from a microflora that is difficult to culture could be expressed. Libraries prepared by partial digestion of DNA extracted from soil, with restriction enzymes such as Sau3AI, can thus be envisaged also.
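The screening figures reported in Example 2 can be put in proportion with a short calculation. The sketch below computes the fraction of positive clones and, as a rough planning estimate only, the number of dishes implied by plating about 100 colonies per dish; the text does not state how many dishes were actually used, and the function names are illustrative.

```python
import math


def hit_rate_percent(positive_clones, clones_screened):
    """Percentage of screened clones showing the activity of interest."""
    return 100.0 * positive_clones / clones_screened


def dishes_needed(total_clones, colonies_per_dish):
    """Rough number of dishes for a library at a given plating density."""
    return math.ceil(total_clones / colonies_per_dish)


print(hit_rate_percent(35, 10_000))   # 0.35
print(dishes_needed(10_000, 100))     # 100
```

A hit rate of 0.35% means that roughly one clone in 290 carried a detectable lindane-degrading activity.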

EXAMPLE 3: Process for preparing a collection of nucleic acids from a soil sample, comprising a step of indirect DNA extraction.

1. MATERIALS AND METHODS

1.1 Extraction of the bacterial fraction of the soil

5 g of soil are dispersed in 50 ml of sterile 0.8% NaCl, by grinding in a Waring Blender for 3 × 1 minute, with cooling in ice between each grinding. The bacterial cells are then separated from the soil particles by centrifugation on a density cushion of Nycodenz (Nycomed Pharma AS, Oslo, Norway). In a centrifugation tube, 11.6 ml of a Nycodenz solution with a density of 1.3 g/ml (8 g of Nycodenz suspended in 10 ml of sterile water) are placed below 25 ml of the soil suspension previously obtained. After centrifugation at 10,000 × g in a rotor with swing-out buckets (TST 28.38 rotor, Kontron) for 40 minutes at 4°C, the cellular ring, located at the interphase between the aqueous phase and the Nycodenz phase, is taken, washed in 25 ml of sterile water and centrifuged at 10,000 × g for 20 minutes. The cell pellet is then taken up in a 10 mM Tris, 100 mM EDTA, pH 8.0 solution.

Prior to dispersion of the soil in the Waring Blender, a step of enrichment of the soil in a solution of yeast extract can be included, in particular to allow the germination of the soil bacterial spores. 5 g of soil are thus incubated in 50 ml of a sterile solution of 0.8% NaCl-6% yeast extract, for 30 minutes at 40°C. The yeast extract is removed by centrifugation at 5000 rpm for 10 minutes in order to avoid the formation of a foam during the grinding.

1.2 Lysis of the soil bacterial cells

- Lysis of the cells in liquid medium and purification on a caesium chloride gradient

The cells are lysed in a 10 mM Tris, 100 mM EDTA, pH 8.0 solution containing 5 mg/ml of lysozyme and 0.5 mg/ml of achromopeptidase for 1 hour at 37°C. A solution of lauryl sarcosyl (1% final) and proteinase K (2 mg/ml) is then added and incubated at 37°C for 30 minutes. The DNA solution is then purified on a density gradient of caesium chloride by centrifugation at 35,000 rpm for 36 hours on a Kontron 65.13 rotor. The caesium chloride gradient used is a gradient at 1 g/ml of CsCl, with a refractive index of 1.3860 (Sambrook et al., 1989).

- Lysis of the cells after inclusion in an agarose block

The cells are mixed with an equal volume of low-melting-point agarose at 1.5% (weight/volume) (Seaplaque agarose, FMC Products, TEBU, Le Perray en Yvelines, France) and poured into a 100 µl block. The blocks are then incubated in a lysis solution (250 mM EDTA, 10.3% sucrose, 5 mg/ml lysozyme and 0.5 mg/ml achromopeptidase) at 37°C for 3 hours. The blocks are then washed in a 10 mM Tris-500 mM EDTA solution and incubated overnight at 37°C in 500 mM EDTA containing 1 mg/ml of proteinase K and 1% lauryl sarcosyl. After washing several times in Tris-EDTA, the blocks are stored in 500 mM EDTA.

The quality of the DNAs thus extracted is checked by pulsed-field electrophoresis.

The amount of DNA extracted was evaluated on electrophoresis gel relative to a calibration range of calf thymus DNA.

1.3 Molecular characterization of the DNA extracted from soil

The DNAs extracted from the soil are characterized by PCR hybridization. This method consists, in a first stage, in amplifying the DNAs using primers located on universally conserved regions of the 16S rRNA gene, and then in hybridizing the amplified DNAs with different oligonucleotide probes of known specificity (Table 4), with the aim of quantifying the intensity of the hybridization signal relative to an external calibration range of genomic DNA.

The DNAs extracted from the soil and the genomic DNAs extracted from pure cultures are amplified with the primers FGPS 612-669 (Table 1) under the standard PCR amplification conditions. The amplification products are then denatured with an equal volume of 1 N NaOH, deposited on a Nylon membrane (GeneScreen Plus, Life Science Products) and hybridized with an oligonucleotide probe end-labelled with γ-32P ATP by the action of T4 polynucleotide kinase. After prehybridization of the membrane in a 20 ml solution containing 6 ml of 20X SSC, 1 ml of Denhardt's solution, 1 ml of 10% SDS and 5 mg of heterologous salmon sperm DNA, the hybridizations are carried out overnight at the temperature defined by the probe. The membranes are washed twice in 2X SSC for 5 minutes at room temperature, then once in 2X SSC, 0.1% SDS and a second time in 1X SSC, 0.1% SDS for 30 minutes at the hybridization temperature. The hybridization signals are quantified using the Molecular Analyst software (Biorad, Ivry sur Seine, France) and the amounts of DNA are estimated by interpolation on the calibration curves obtained from the genomic DNAs.
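The interpolation step described in section 1.3 can be illustrated with a minimal sketch. It assumes a piecewise-linear calibration curve; the text does not state which curve model the quantification software applies, and the function name and calibration values below are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def interpolate_dna_amount(signal, calibration):
    """Estimate a DNA amount from a hybridization signal.

    calibration: list of (dna_amount, signal) pairs measured on the
    external range of genomic DNA. Piecewise-linear interpolation is an
    assumption of this sketch, not a detail given in the text.
    """
    points = sorted(calibration, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the calibration range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    # Linear interpolation between the two bracketing calibration points.
    return x0 + (signal - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical calibration range: (ng of genomic DNA, signal intensity).
CALIBRATION = [(0.0, 0.0), (10.0, 100.0), (50.0, 500.0), (100.0, 1000.0)]
estimate = interpolate_dna_amount(250.0, CALIBRATION)
print(f"estimated amount: {estimate:.1f} ng")
```

Signals outside the calibration range are rejected rather than extrapolated, since an external calibration range only supports estimates within the amounts actually measured.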

2. RESULTS AND DISCUSSION

2.1 Extraction and lysis of the bacterial fraction of the soil

Separation of the microbial cells from the soil particles, prior to extraction of the DNA, is an alternative which has many advantages over the methods of direct extraction of the DNA in the soil. Specifically, extraction of the microbial fraction limits the contamination of the DNA extract with extracellular DNA freely present in the soil or with DNA of eukaryotic origin. Above all, the DNA extracted from the microbial fraction of the soil has longer fragments of better integrity than the DNA extracted by direct lysis (Jacobson and Rasmussen (1992)). Furthermore, separation from the soil particles makes it possible to avoid contamination of the DNA extract with humic and phenolic compounds, which can seriously impair the cloning efficacies.

One of the determining steps for the extraction of the cells from the soil is the dispersion of the soil sample in order to dissociate the cells which adhere to the surface or to the inside of aggregates of soil particles. Three successive cycles of grinding for one minute each give better cell extraction efficacy and a larger amount of DNA recovered, compared with a single cycle of grinding for one minute 30 seconds.

Table 5 reports the extraction efficacies obtained after centrifugation on a Nycodenz gradient, on the total viable microflora (counted by microscopy after staining with acridine orange), on the total culturable microflora (counted on solid 10% Trypticase-Soy medium), and on the actinomycetes microflora culturable on HV agar medium (after incubation at 40°C in a solution of 6% yeast extract-0.05% SDS in order to bring about germination of the spores).
Moreover, the extracted DNA was quantified either after lysing the cells in liquid medium (without purification on a caesium chloride gradient) or after lysing the cells included in an agarose block (after digesting the agarose with a β-agarase).

The results show that more than 14% of the total telluric microflora is recovered by this method (i.e. 2 x 10⁸ cells per gram of soil) and that the total culturable microflora represents barely 2% of the total microbial population.

Moreover, the amount of DNA extracted from the cells is 330 ng per gram of dry soil. Estimating the DNA content per soil microbial cell to be between 1.6 and 2.4 fg, and given the amount of cells extracted (2 x 10⁸ cells per gram of soil), it can be estimated that virtually all of the cells are lysed and that this lysis does not place any major bias on this approach.

The pulsed-field electrophoreses show that the DNA from the soil extracted after Nycodenz and CsCl gradients could be up to 150 kb in size and that the agarose block lysis allowed fragments of more than 600 kb to be extracted.

These results confirm the advantage of this culture-independent approach for the construction of environmental DNA libraries, as an alternative to the methods of direct DNA extraction.

2.2 Molecular characterization of the DNA extracted from the soil

The aim of the molecular characterization of the DNA extracted from the soil is to obtain profiles representing the proportions of the various bacterial taxa present in the DNA extract. It also serves to determine the extraction biases induced by the prior separation of the cellular fraction of the soil, in comparison with a direct extraction method, in the absence of a direct visualization of the microbial diversity present in the soils.
Specifically, little information has been collected on the extraction of cells on a Nycodenz gradient as a function of their morphological structure (cell diameter, filamentous or sporulated forms).

The methods in place hitherto were based on:

- quantitative hybridizations using oligonucleotide probes specific for different bacterial groups, applied directly to DNA extracted from the environment. Unfortunately, this approach is not very sensitive and does not allow taxonomic groups or genera present in low abundance to be detected (Amann (1995)).

- quantitative PCR such as MPN-PCR (Most Probable Number) (Sykes et al. (1992)) or competitive quantitative PCR (Diviacco et al. (1993)). The respective drawbacks of these approaches are (i) the laborious nature due to the multiplication of the dilutions and repetitions, which makes the technique unsuitable for a large number of samples or pairs of primers, and (ii) the need to construct a competitor which is specific for the target DNA and which does not induce any bias in the competition.

The method introduced according to the present invention consists in universally amplifying a 700 bp fragment inside the 16S rDNA sequence, in hybridizing this amplification product with an oligonucleotide probe of variable specificity (as regards the kingdom, order, subclass or genus) and in comparing the hybridization intensity of the sample relative to an external calibration range. The amplification prior to the hybridization makes it possible to quantify genera or species of microorganisms that are relatively sparse. Furthermore, the amplification with universal primers makes it possible, during the hybridization, to use a wide series of oligonucleotide probes. It allows a comparison between different modes of lysis (direct or indirect extraction) on well-defined taxonomic groups. The results are collated in Table 6.

They show similar profiles between the two extraction methods (direct and indirect). Thus, it appears that prior extraction of the telluric microbial fraction does not introduce any genuine bias among the taxa tested. The only significant difference between the two extraction approaches would appear to be the greater abundance of rDNA sequences belonging to γ-proteobacteria in the extract obtained by the indirect extraction method.

Furthermore, a significant effect of incubating the soil sample in a solution of yeast extract is observed on the sporulated soil populations (Gram+, low percentage of GC and actinomycetes). This step brings about germination of the spores and, firstly, allows better recovery of cells of this type and, secondly, allows greater lysis efficacy on germinating cells.

This approach allows a semi-quantitative analysis, targeted on the main taxa defined using cultured microorganisms usually found in soils. Only molecular tools make it possible to estimate the magnitude of the various taxa, since culture methods are too restrictive and depend on the specificity of the medium used.

The results show that a large proportion of the microbial population is not represented in the phylogenetic groups described, thus demonstrating the existence of novel groups made up of microorganisms which have not been cultured hitherto, or which are not culturable.

Thus, novel probes can be defined using given sequences starting with DNA extracted from the soil (novel phyla composed of non-cultured microorganisms, Ludwig et al. (1997)) in order to obtain a more exact image of the composition of the DNA extract.
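The lysis estimate given in section 2.1 of this example (330 ng of DNA per gram of soil, 2 x 10⁸ extracted cells per gram, 1.6 to 2.4 fg of DNA per cell) can be checked with a short calculation. The sketch below uses only those figures from the text; the function names are hypothetical.

```python
FG_PER_NG = 1e6  # 1 ng = 10^6 fg


def expected_dna_ng(cells_per_gram, fg_per_cell):
    """DNA expected per gram of soil if every extracted cell were lysed."""
    return cells_per_gram * fg_per_cell / FG_PER_NG


def lysis_fraction_range(measured_ng, cells_per_gram, fg_low, fg_high):
    """Return (lowest, highest) fraction of the expected DNA recovered."""
    most_expected = expected_dna_ng(cells_per_gram, fg_high)
    least_expected = expected_dna_ng(cells_per_gram, fg_low)
    return measured_ng / most_expected, measured_ng / least_expected


# Figures quoted in section 2.1.
low, high = lysis_fraction_range(330.0, 2e8, 1.6, 2.4)
print(f"recovered fraction: {low:.0%} to {high:.0%}")
```

With these inputs the expected yield lies between roughly 320 and 480 ng per gram, so the measured 330 ng matches the expectation at the lower per-cell DNA content, which is the reading that supports the statement that virtually all of the cells are lysed.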

Example 4: CONSTRUCTION OF THE COSMID pOS7001

Characteristics of pOS7001:
Replicative in E. coli
Integrative in Streptomyces
Selectable in E. coli (AmpR, HygroR) and in Streptomyces (HygroR)

The properties of the cosmid make it possible to insert large DNA fragments of between 30 and 40 kb. It comprises:
1 - The inducible promoter tipA of Streptomyces lividans
2 - The integration system specific for the element pSAM2
3 - The hygromycin-resistance gene
4 - The cosmid pWED1, derived from pWE15

1) - The inducible promoter of the tipA gene of S. lividans

The tipA gene encodes a 19 kDa protein whose transcription is induced by the antibiotic thiostrepton or nosiheptide. The tipA promoter is well regulated: induction in exponential phase and in stationary phase (200X) (Murakami T, Holt TG, Thompson CJ., J. Bacteriol 1989; 171:1459-66).

2) - The hygromycin-resistance gene

- Hygromycin: antibiotic produced by S. hygroscopicus
- The resistance gene encodes a phosphotransferase (hph)
- The gene used originates from a cassette constructed by Blondelet et al., in which the hyg gene is under the control of its own promoter and of the IPTG-inducible plac promoter (Blondelet-Rouault et al.; Gene 1997; 190:315-7)

3) - The site-specific integration system

The element pSAM2 integrates into the chromosome by means of a site-specific integration mechanism. The recombination takes place between two identical 58 bp sequences present on the plasmid (attP) and on the chromosome (attB).

The int gene, located close to the attP site, is involved in the site-specific integration of pSAM2, and its product has similarities with the integrases of the temperate bacteriophages of enterobacteria. It has been demonstrated that a pSAM2 fragment containing only the attP attachment site and the int gene was capable of integrating in the same manner as the entire element (see French patent No. 88 06638 of 18/05/1988 and Raynal A et al., Mol. Microbiol. 1998 28:333-42).
4) - Construction of the cosmid pOS7001

Step 1/ The promoter TipA was isolated from the plasmid pPM927 (Smokvina et al., Gene 1990; 94:53-9) on a 700-base pair HindIII-BamHI fragment and cloned into the vector pUC18 (Yanisch-Perron et al., 1985) digested with HindIII/BamHI.

Step 2/ This HindIII-BamHI fragment was subsequently transferred from pUC18 to pUC19 (Yanisch-Perron et al., 1985).

Step 3/ A 1500-base pair BamHI-BamHI insert carrying the int gene and the attP site of pSAM2 was isolated from pOSint1, represented in Figure 8 (Raynal A et al. Mol Microbiol 1998 28:333-42), and cloned into the BamHI site of the preceding vector (pUC19/TipA), in the orientation which allows the int gene to be placed under the control of the promoter TipA.

Step 4/ The BamHI site located on the 5' side of the int gene was deleted by partial digestion with BamHI followed by treatment with the Klenow enzyme. A HindIII-BamHI fragment carrying TipA-int-attP was thus isolated from pUC19 and transferred into pBR322 HindIII/BamHI.

Step 5/ The hygromycin cassette isolated from pHP45Ωhyg (Blondelet-Rouault et al., 1997) on a HindIII-HindIII fragment was cloned into the HindIII site located upstream of the promoter TipA.

Step 6/ The HindIII site located between the Ωhyg cassette and the promoter TipA was deleted by Klenow treatment after partial HindIII digestion.

Step 7/ The plasmid obtained after the preceding step makes it possible to isolate a single HindIII-BamHI fragment, carrying all the Ωhyg/TipA/int attP elements, which was cloned after Klenow treatment into the EcoRV site of the cosmid pWED1. The cosmid pWED1, represented in Figure 9, is derived from the cosmid pWE15, represented in Figure 10 (Wahl GM, et al., Proc. Natl. Acad. Sci. USA 1987 84:2160-4), by deletion of an HpaI-HpaI fragment carrying the neomycin gene and the SV40 origin.

A map of the vector pOS7001 is represented in Figure 11.
Example 5: Construction of the cosmid which is conjugative and integrative in Streptomyces, the vectors pOSV303, pOSV306 and pOSV307

5.1 Construction of the vector pOSV303

Given that the packaging selects clones larger than 30 kb, only 10 to 15% of the clones contain no insert, and it is thus not really necessary to have a system for selecting recombinants, which allows a smaller vector to be constructed.

Construction:

Step 1: the vector pOSV001

Cloning of an 800 base pair PstI-PstI fragment carrying the transfer origin OriT of the replicon RK2 (Guiney et al., 1983) into the plasmid pUC19 opened with PstI. This cloning step makes it possible to obtain a vector which is transferable from E. coli to Streptomyces by conjugation.

The map of the vector pOSV001 is represented in Figure 17.

Step 2: the vector pOSV002

Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, such that the hygromycin-resistance gene is transferred last, thus making it possible to ensure complete transfer of the BAC with the soil DNA insert.

Cloning of the hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001. This PstI site was chosen, given the direction of the transfer, such that the Hygro marker is transferred last during the conjugation. The PstI and HindIII ends are made compatible after treatment with the Klenow fragment of DNA polymerase, allowing "blunt ends" to be generated. The orientation of the Ωhyg fragment is determined at the end of construction.

The map of the vector pOSV002 is represented in Figure 18.

Step 3: the vector pOSV010

The XbaI-HindIII fragment isolated from the plasmid pOSV002 and containing the hygromycin-resistance marker and the transfer origin is cloned into the plasmid pOSint1 digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.

The plasmid pOSint1, represented in Figure 8, was described in the article by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998 28:333-42).

This construct allows the expression of the integrase in E. coli and Streptomyces.

Step 4: insertion of the "cos" site

The principle is to insert a "cos" site, which allows packaging, into the plasmid pOSV010, represented in Figure 12.

The production of the "cos" fragment is represented in Figure 13. This fragment is obtained by PCR. Starting with a fragment carrying the cohesive ends (cos) of λ (bacteriophage lambda or cosmid pHC79), a PCR amplification is carried out using oligonucleotides corresponding to the sequences -50/+130 relative to the cos site. These oligonucleotides also contain the NsiI cloning site, PstI compatible, the XhoI site, SalI compatible, and the EcoRV site for obtaining "blunt ends". Addition of the rare SwaI and PacI sites makes it possible to isolate and/or map the cloned insert.

The PCR fragment is delimited by a PstI site at the 5' end and by a HincII site at the 3' end, allowing cloning into the vector pOSV010 (Figure 12) predigested with the enzymes NsiI and EcoRV, bringing about deletion of the lacIq repressor.

The map of the vector pOSV303 is represented in Figure 14.

The vector pOSV303 contains cloning sites such as the NsiI site, PstI compatible, the XhoI site, SalI compatible, or the EcoRV site for obtaining "blunt ends".

5.2 Construction of the vector pOSV306

Step 1: Construction of the vector pOSV308

The vector pOSV308 was constructed according to the process illustrated in Figure 27.

A 643-bp fragment containing the cos region was amplified using a pair of primers of sequences SEQ ID No. 107 and SEQ ID No. 108 from the cosmid vector pHC79 described by Hohn B and Collins (1980).

This amplified nucleotide fragment was cloned directly into the pGEMT-easy vector sold by the company Promega, as illustrated in Figure 27, so as to produce the vector pOSV308.

Step 2: Construction of the vector pOSV306

The vector pOSV010 was constructed as described in step 3 of the construction of the vector pOSV303, in paragraph 5.1 of the present example.

The vector pOSV010 was digested with the enzymes EcoRV and NsiI in order to excise a 7874-bp fragment, which was subsequently purified, as illustrated in Figure 28.

Next, the vector pOSV308 obtained in step 1) above was digested with the enzymes EcoRV and PstI in order to excise a 617-bp fragment, which was subsequently purified.

Next, the 617-bp cos fragment obtained from the vector pOSV308 was integrated by ligation into the vector pOSV010, so as to obtain the vector pOSV306, as illustrated in Figure 28.

5.3 Construction of the vector pOSV307

The cosmid pOSV307 still contains the lacIq gene so as to improve the stability of the cosmid in Streptomyces, for example in the S17-1 strain of Streptomyces.

In order to construct the vector pOSV307, the vector pOSV010 was subjected to a digestion with the enzyme PvuII, to obtain an 8761-bp fragment which was purified and then dephosphorylated.

Next, the vector pOSV308, obtained as described in step 1) of paragraph 5.2 above, was digested with the enzyme EcoRI so as to obtain a 663-bp fragment, which was then purified and treated with the Klenow enzyme.

The nucleotide fragment thus treated was integrated into the vector pOSV010 after ligation so as to obtain the vector pOSV307, as illustrated in Figure 29.

Example 6: Construction of the E. coli-Streptomyces replicative shuttle cosmid pOS700R

The fragments of the plasmid pE116 (Volff et al., 1996) represented in Figure 15 were isolated and Klenow-treated. These fragments contain the sequences required for replication and stability originating from the plasmid SCP2. These two fragments are inserted separately into the EcoRV site of the cosmid pWED1, leading to 2 different clones.

The hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment was cloned into the HindIII site of the pWED1 cosmids containing the SCP2 insert in the form of PstI-EcoRI or XbaI fragments. It imparts hygromycin resistance which can be selected both in E. coli and in Streptomyces.

Transformation of S. lividans and determination of the transformation efficacy.

It was found that the cosmid containing the XbaI insert was less stable than that containing the PstI-EcoRI fragment. It is therefore the latter cosmid which was selected under the name pOS700R.

The map of the vector pOS700R is represented in Figure 16.

Example 7: Transformation efficacy of the integrative (pOS7001) and replicative vectors

Possibilities

To render the strain of S. lividans resistant to thiostrepton by integrating the plasmid pTO1 carrying the thiostrepton-resistance marker.

Preparation of protoplasts from S. lividans cultured in the presence of thiostrepton.

With the pOS7001 vector, the transformation efficacy is about 3000 transformants per µg of DNA.

With the vector pOS700R, the transformation efficacy is about 30,000 transformants per µg of DNA.

Example 8: Construction of a BAC vector which is integrative in Streptomyces and conjugative

Characteristics:
Replicative in E. coli
Transferable by conjugation of E. coli with Streptomyces
Integrative in Streptomyces
Selectable in E. coli and Streptomyces
Capable of inserting large DNA fragments; it should be pointed out that it is necessary to have available soil DNA which is between 100 and 300 kb in size and which is not contaminated with small fragments. The reason for this is that the small fragments are very preferentially integrated.
Endowed with a screen for selecting plasmids carrying an insert.
This screen makes it possible, by removing the vectors which are closed on themselves and which are not digested, to work with a higher ratio between the vector and the DNA to be inserted, thus making it possible to have better cloning efficacy for making libraries.

Construction:

Step 1: the vector pOSV001

Cloning of an 800 base pair PstI-PstI fragment carrying the transfer origin OriT of the replicon RK2 (Guiney et al., 1983) into the plasmid pUC19 opened with PstI. This cloning step makes it possible to obtain a vector which is transferable from E. coli to Streptomyces by conjugation.

The map of the vector pOSV001 is represented in Figure 17.

Step 2: the vector pOSV002

Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, such that the hygromycin-resistance gene is transferred last, thus making it possible to ensure complete transfer of the BAC with the soil DNA insert.

Cloning of the hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001. This PstI site was chosen, given the direction of the transfer, such that the Hygro marker is transferred last during the conjugation. The PstI and HindIII ends are made compatible after treatment with the Klenow fragment of DNA polymerase for generating "blunt ends". The orientation of the Ωhyg fragment is determined at the end of construction.

The map of the vector pOSV002 is represented in Figure 18.

Step 3: the vector pOSV010

The XbaI-HindIII fragment isolated from the plasmid pOSV002 and containing the hygromycin-resistance marker and the transfer origin is cloned into the plasmid pOSint1 digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.

The plasmid pOSint1, represented in Figure 8, was described in the article by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998 28:333-42).

This construct allows the expression of the integrase in E. coli and Streptomyces.

Step 4: the vector pOSV014

Addition of a "cassette" making it possible to select, in the final construct, the plasmids which have foreign DNA inserted.

This "cassette" carries the gene encoding the λ phage cI repressor and the tetracycline-resistance gene. The latter gene carries the target sequence of the repressor in its non-coding 5' region. The insertion of DNA into the HindIII site located in the coding sequence of cI leads to the non-production of the repressor and thus to the expression of tetracycline resistance. The cassette is carried by the plasmid pUN99 described in the article by Nilsson et al. (Nucleic Acids Res. 1983, 11:8019-30).

A PvuII-HindIII fragment isolated from pOSV010 and containing the sequences Int, attP, Hygro and oriT is cloned into the MscI site of pUN99.
The map of the vector pOSV014 is represented in Figure 19.

Step 5: the vector pOSV403, an integrative and conjugative BAC vector

This last step of cloning into pBAC11 (represented in Figure 20) gives the final plasmid BAC (Bacterial Artificial Chromosome) characteristics, in particular the ability to accept very large DNA inserts.

The PstI-PstI fragment of the vector pOSV014 carrying the set of elements and functions described previously is cloned into the plasmid pBAC11 (pBeloBAC11) digested with NotI. The ends are made compatible by treatment with the Klenow enzyme.

The map of the vector pOSV403 is represented in Figure 21. The scheme of Figure 21 indicates the orientation selected.

Step 6:

The vector pOSV403 contains the HindIII and NsiI sites. The NsiI site is quite rare in Streptomyces and has the advantage of being compatible with PstI. On the other hand, the PstI site is common in Streptomyces and can be used to carry out partial digestions.

The recombinant clones carrying an insert cloned into the cI repressor, and thus inactivating this repressor, become tetracycline resistant. Given that the BACs are present at only one copy per cell, it is necessary to select the recombinant clones with a lower dose of tetracycline than the usual dose of 20 µg/ml, for example with a dose of 5 µg/ml. Under these conditions, there is no background noise.

It is also possible to use the system developed and sold by the company InVitrogen, in which the insertion of DNA into the vector inactivates a gyrase inhibitor whose expression is toxic for E. coli. The fragment is preferentially isolated from the vector pZErO-2 (http://www.invitrogen.com/).

Example 9: Construction of an S. alboniger library in the integrative cosmid (pOS7001) and the replicative cosmid (pOS700R)

1) - Construction of the library

To evaluate the efficacy of the cloning system, the puromycin biosynthetic pathway of Streptomyces alboniger was cloned into the two shuttle cosmids pOS7001 and pOS700R. The genes of the puromycin biosynthetic pathway are carried by a BamHI DNA fragment of about 15 kb.

The genomic DNA of Streptomyces alboniger was isolated. 90% of this DNA has a molecular weight of between 20 and 150 kb, determined by pulsed-field electrophoresis.

The two cosmids were digested with the enzyme BamHI (single cloning site).

The conditions of partial BamHI digestion of the genomic DNA were determined (50 µg of DNA and 12 units of enzyme, digestion for 5 minutes). After checking the size by agarose gel electrophoresis, the partially digested DNA was introduced into the vectors. In the ligation, 15 µg of genomic DNA + 2 µg of the integrative vector or 5 µg of the replicative vector were used.

Each ligation mixture was used for the in vitro encapsidation of the DNA into the heads of bacteriophage lambda. The encapsidation mixtures (0.5 ml) were titrated (integrative vector pOS7001 = 7.5 x 10 cosmids/ml, replicative vector = 5 x 10⁴ cosmids/ml).

The cosmids were used to transfect E. coli and thus to generate libraries of about 25,000 ampicillin-resistant clones. The DNA from all of these clones was isolated and quantified.

To test the libraries, several clones were chosen and their DNA was purified and digested with BamHI in order to check the presence and size of the inserts. The clones tested contain between 20 and 35 kb of S. alboniger insert.

2) - Identification of the clones containing the puromycin biosynthetic pathway

The clones liable to contain the complete puromycin biosynthetic pathway were identified by hybridization with a probe corresponding to the puromycin-resistance gene, the 1.1 kb pac gene (Lacalle et al., Gene 1989; 79, 375-80).

Library made in the integrative vector pOS7001:

Among 2000 clones analysed, 9 clones hybridized with the probe; they contain inserts of about 40 kb.

Library made in the replicative vector pOS700R:

Among 2000 clones analysed, 12 clones hybridized with the probe; they contain inserts of about 40 kb.
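The screening figures above (9 and 12 hybridizing clones among 2000 analysed, for libraries of about 25,000 clones) translate into hit frequencies as in the following sketch. The extrapolation to the whole library assumes that the 2000 clones analysed are representative of each library, which the text does not state; the function names are hypothetical.

```python
def hit_rate(positives, screened):
    """Fraction of screened clones that hybridized with the pac probe."""
    if screened <= 0:
        raise ValueError("screened must be a positive number of clones")
    return positives / screened


def expected_positives(positives, screened, library_size):
    """Extrapolate the hit rate to a whole library (assumes the screened
    clones are a representative sample; not stated in the text)."""
    return hit_rate(positives, screened) * library_size


# Counts reported for the two libraries; library size of about 25,000 clones.
LIBRARIES = {"pOS7001 (integrative)": 9, "pOS700R (replicative)": 12}
for name, positives in LIBRARIES.items():
    rate = hit_rate(positives, 2000)
    total = expected_positives(positives, 2000, 25_000)
    print(f"{name}: {rate:.2%} of clones, about {total:.0f} per library")
```

The two frequencies are of the same order, which is consistent with both vectors capturing the pac region at a comparable rate under these conditions.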

Using the data published by Tercero et al. (J Biol. Chem. 1996; 271, 1579-90), the clones containing the entire biosynthetic pathway were identified after hybridization with suitable probes. Certain integrative and replicative cosmids contain a 12,360-base pair fragment after ClaI-EcoRV digestion, which leads to the assumption of an insert containing the entire puromycin biosynthetic pathway.

4) - Checking the production of puromycin by the resistant clones (Rhône-Poulenc).

a) Materials and Methods

Strains and culture conditions:

Three resistant clones were selected to check the production of puromycin. They correspond to the S. lividans recombinants containing an insert in the integrative vector pOS7001 (G20) or an insert in the replicative vector (G21 and G22).

Reference strains were used to ensure that the culture media used allowed this production. They are the S. alboniger wild-type strain ATCC 12461, which produces puromycin, and the S. lividans recombinant strain containing the complete puromycin cluster cloned into the plasmid pRCP11 (Lacalle et al, 1992, the EMBO journal, 11, 785-792) (G23).

The strains were inoculated in a culture medium whose composition is as follows (g/l of final medium):

Organotechnie bacteriological peptone    5
Springer yeast extract                   5
Liebig meat extract                      5
Prolabo glucose                         15
Prolabo CaCO3 (1)                        3
Prolabo NaCl                             5
Difco agar (2)                           1

(1) The 3 g of carbonate are mixed with 200 ml of distilled water and then sterilized separately. The addition is carried out after sterilization.
(2) The agar is melted beforehand in 100 ml of distilled water, after which it is added to the other ingredients of the medium.
pH adjusted to 7.2 before sterilization; sterilization for 25 minutes at 121°C.

50 µg/l of hygromycin and 5 µg/l of thiostrepton are added to the medium after sterilization so as to maintain a selection pressure for the clones containing an insert by means of the marker gene present on the vector (the thiostrepton-resistance gene being carried by the plasmid pRCP11).

50 ml of liquid culture medium, distributed in 250 ml conical flasks, are inoculated with 2 ml of aqueous suspension of spores and mycelium of each of the strains. The cultures are incubated for 4 days at 28°C with stirring at 220 rpm. 50 ml of production medium, distributed in 250 ml conical flasks, are then inoculated with 2 ml of these precultures. The production medium used is an industrial medium optimized for the production of pristinamycin (medium RPR 201). The cultures are incubated at 28°C, with stirring at 220 rpm.

After different incubation times, a conical flask of each culture is brought to pH 11 and then extracted twice with 1 volume of dichloromethane. The organic phase is concentrated to dryness under reduced pressure and the extract is then taken up in 10 µl of methanol. 100 µl of the methanol solution are analysed by HPLC with a diode-array detector, in a water-acetonitrile 0.05% TFA V/V gradient system on a C18 column, for the detection of puromycin.

b) Results

The comparative HPLC analyses of the cultures of the various strains show the production of puromycin in the culture of the wild-type strain at and above 24 h of incubation. A production, although lower, is also clearly detected at and above 48 h in the culture of the clone G20 containing the cosmid pOS7001 (Figure 23). Puromycin was also detected in trace amounts in the clone G23 containing the complete operon encoding the compound in the plasmid pRCP11. However, no production was observed in the cultures of the clones G21 and G22 containing the cosmid pOS700R. The results are given in Figure 23.
c) Conclusions

The results obtained demonstrate the efficacy of the cloning system developed in the cosmid pOS7001 for expressing, in a heterologous host such as S. lividans, a complete biosynthetic pathway under the control of its own regulatory sequences. Moreover, these data also validate the screening of the libraries on the basis of the resistance of the clones to puromycin, since it leads to the identification, among a small number of clones, of a recombinant capable of expressing the biosynthetic pathway associated with the resistance gene. The absence of puromycin production in the other clones can probably be explained by the cloning of only a portion of the operon, containing the resistance gene but devoid of certain regulatory, transduction or transcription sequences necessary for the synthesis of the compound.

EXAMPLE 10 - CLONING OF SOIL DNA INTO VECTORS

1) - Preparation of the soil DNA to be cloned

The various DNA fragments need to be purified according to their destination:

Cosmids

The size of the molecules should be between 30 and 40 kb. However, the DNA extracted from the soil is heterogeneous in size and comprises molecules of up to 200 or 300 kb. In order to homogenize the sizes, the DNA is broken mechanically by passing the solution through a needle 0.4 mm in diameter. Fragments of a size in the region of 30 kb are not affected by these repeated passages through a needle, and it is thus not necessary to carry out a separation on the basis of size, especially since the packaging in the phage particles automatically eliminates the short inserts.

BACs

Preparation of the DNA

The soil DNA is separated by pulsed-field electrophoresis (CHEF type) under conditions such that the fragments between 100 and 300 kb are concentrated in a band of about 5 mm. This is obtained by carrying out the migration in a gel containing 0.7% of normal agarose or 1% of agarose of low melting point, with a pulsation time of 100 seconds, for 20 hours and at a temperature of 10°C.

Recovery of the DNA

Two methods are used, the choice depending on the size of the molecules it is desired to isolate: either up to 150 kb or higher.

- Up to 150 kb

The porosity of a 0.7% agarose gel allows the DNA to exit by electroelution, on the condition that there is total absence of ethidium bromide. This DNA is then handled with hydrophobic, enlarged-orifice pipetting instruments in order to avoid mechanical fragmentation of the molecules.

- Between 100 and 300 kb

The band containing the fragments between 100 and 300 kb in size is cut out. For the migration, a gel containing 1% of agarose of low melting point is used. This property makes it possible to melt the gel at a temperature of 65°C, which can be tolerated by the DNA, and then to digest it with agarase (Agarase sold by the company Boehringer) at a temperature of 45°C according to the supplier's instructions.

2) - Use of the integrative cosmid pOS700I and the replicative cosmid pOS700R

Construction with polyA polyT tails

Principle

A cosmid vector, opened at any cloning site, is modified at the 3' ends by adding a monotonous polynucleotide. Moreover, the DNA to be cloned is modified at the 3' ends by adding a monotonous polynucleotide which can pair up with the above polynucleotide.

The combination of the vector and the fragment to be cloned is made by means of these polynucleotides, and the cos sequence of the vector allows the in vitro packaging of the DNA into lambda phage capsids.

Preparation of the vector

The vector used is a vector which is self-replicating in E. coli and integrative in Streptomyces.

For E. coli, the selection is made on the ampicillin resistance, and for Streptomyces, it is made on the hygromycin resistance.

The cosmid is opened at one of the 2 possible sites (BamHI or HindIII) and the 3' ends are extended with polyA with terminal transferase under the conditions in which the enzyme supplier envisages the addition of 50 to 100 nucleotides.

Preparation of the DNA to be inserted

The 3' ends of the DNA are extended with polyT with terminal transferase under conditions giving an extension comparable to that of the vector. Under the experimental conditions described by the manufacturer, the polyA polyT tails are from 30 to 70 bases long.

Assembly of the molecules and in vitro encapsidation

For the assembly of the molecules, one vector molecule is mixed per molecule of DNA to be inserted. The concentration of the DNA by mass is 500 µg.ml⁻¹.

The mixture is encapsidated. The transfection efficacy depends on the strain used as recipient and on the DNA inserted: it is zero with the test DNA and the strain DH5α, and comparable for the SURE and DH10B strains; on extraction, the DNA yield is, however, higher with the strain DH10B.

Construction by dephosphorylation

The soil DNA is given blunt ends by removal of the protruding 3' sequences and filling in of the protruding 5' sequences. This operation is carried out with: Klenow enzyme, T4 polymerase, the 4 nucleotide triphosphates. The cosmid vector is digested with BamHI and then treated with the Klenow enzyme to make the ends blunt, then dephosphorylated to prevent it from closing up on itself. After ligation, the mixture is encapsidated and transfected as described previously.

3) - Use of pBACs

Principle

The conjugative and integrative plasmid pBAC contains the HindIII and NsiI sites as cloning sites. The insertion of a DNA sequence into these sites inactivates the lambda phage cI repressor, which controls the expression of the tetracycline-resistance gene. Inactivation of the repressor thus makes the cell resistant to this antibiotic (5 µg.ml⁻¹). The cloning at these sites is facilitated by modifying the vector and preparing the DNA to be cloned.

Preparation of the vector. HindIII example

In order for the vector not to close up on itself, the HindIII site is modified: the first base (A) is reinserted to form a protruding 5' sequence, which cannot pair up with its partners. The operation is carried out with the Klenow enzyme in the presence of dATP.

The success of the operation is checked by carrying out a self-ligation of the vector before and after treatment with the Klenow enzyme. For an identical amount of test DNA, 3000 clones are obtained before treatment and 60 clones after treatment.

Preparation of the DNA (size between 100 and 300 kb)

Giving the DNA blunt ends

The DNA is given blunt ends by removing the protruding 3' sequences and filling in the protruding 5' sequences. This operation is carried out with: Klenow enzyme, T4 polymerase, the 4 nucleotide triphosphates.

Preparation of the ends. HindIII example

The addition of DNA to the vector is carried out by means of oligonucleotides which recognize the modified HindIII sequence of the vector. They contain rare restriction sites to allow the subsequent clonings (SwaI; NotI). This technique is derived from that of: Elledge SJ, Mulligan JT, Ramer SW, Spottswood M, Davis RW. Proc. Natl Acad. Sci. USA 1991 Mar 1;88(5):1731-5.

Two complementary oligonucleotides are used:

Oligo 1: 5'-GCTTATTTAAATATTAATGCGGCCGCCCGGG-3' (SEQ ID No 25)
Oligo 2: 5'-CCCGGGCGGCCGCATTAATATTTAAATA-3' (SEQ ID No 26)

They are phosphorylated at the 5' end with T4 polynucleotide kinase in the presence of ATP, after hybridization. This phosphorylation step can be eliminated by using already-phosphorylated oligonucleotides.

The ligation of this double-stranded adapter with the DNA to be inserted into a vector is carried out with T4 ligase in the presence of a very large excess of adapter (1000 adapter molecules per molecule of DNA to be inserted) over 15 hours at 14°C.
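The geometry of the adapter can be checked with a short script. The sketch below is illustrative only: it assumes the standard recognition sequences A^AGCTT (HindIII), ATTTAAAT (SwaI) and GCGGCCGC (NotI), and it assumes that filling in the first base (A) of the HindIII 5' overhang AGCT leaves the three-base overhang AGC. The function and variable names are hypothetical.

```python
# Sketch: check that Oligo 1 and Oligo 2 anneal into an adapter with one
# blunt end and one 5' overhang compatible with the A-filled HindIII end.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


OLIGO_1 = "GCTTATTTAAATATTAATGCGGCCGCCCGGG"  # SEQ ID No 25
OLIGO_2 = "CCCGGGCGGCCGCATTAATATTTAAATA"     # SEQ ID No 26

# Assumed recognition sequences of the rare-cutting enzymes named in the text.
RARE_SITES = {"SwaI": "ATTTAAAT", "NotI": "GCGGCCGC"}

# Assumption: HindIII leaves the 5' overhang AGCT; adding one A with Klenow
# and dATP shortens it to AGC.
FILLED_HINDIII_OVERHANG = "AGC"


def adapter_overhang(long_oligo: str, short_oligo: str) -> str:
    """Return the unpaired 5' bases of long_oligo once short_oligo is annealed.

    Raises ValueError if short_oligo does not pair with the 3' part of
    long_oligo (i.e. if the adapter would not have a blunt end).
    """
    paired = reverse_complement(short_oligo)
    if not long_oligo.endswith(paired):
        raise ValueError("oligonucleotides do not form a blunt-ended duplex")
    return long_oligo[: len(long_oligo) - len(paired)]


overhang = adapter_overhang(OLIGO_1, OLIGO_2)
pairs_with_vector = reverse_complement(overhang) == FILLED_HINDIII_OVERHANG
sites_present = {name: site in OLIGO_1 for name, site in RARE_SITES.items()}

print("5' overhang of the adapter:", overhang)
print("pairs with the filled-in HindIII end:", pairs_with_vector)
print("rare sites present:", sites_present)
```

Under these assumptions, the blunt end of the adapter ligates to the blunt-ended soil DNA, while the three-base overhang pairs only with the modified vector end and not with itself.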
The excess adapter is removed by agarose gel electrophoresis and the molecules of interest are recovered from the gel by hydrolysing it with agarase or by electroelution.

Vector-DNA ligation

The ligation is carried out at 14°C over 15 hours with 10 molecules of vector per insert molecule.

Transformation

The recipient strain is the strain DH10B. The transformation is carried out by electroporation. To express the tetracycline resistance, the transformants are incubated at 37°C for 1 hour in antibiotic-free medium. The clones are selected by culturing overnight on gelled LB medium supplemented with 5 µg.ml⁻¹ of tetracycline.

Example 11: CLONE-TO-CLONE CONJUGATION BETWEEN E. COLI AND STREPTOMYCES

CONJUGATION BETWEEN E. COLI STRAIN S17.1 CONTAINING pPM803 AND STREPTOMYCES LIVIDANS TK21

Introduction

It is possible to carry out conjugations between E. coli and Streptomyces (Mazodier et al., 1989). The adaptation of this method, by developing a so-called drop technique in which 10 µl of an E. coli culture containing a recombinant vector are mixed with one drop of recipient S. lividans, consists in carrying out a clone-to-clone transformation while ensuring that, at the end of the operation, all of the library constructed in E. coli is introduced into S. lividans. A bulk transformation would necessarily require a several-fold excess of Streptomyces transformant clones in order to be sure in practice that the library in E. coli is fully represented in S. lividans. Furthermore, this method is easy to automate.

Preliminary tests

Conjugation between E. coli strain S17.1 containing the vector pOSV303 and S. lividans TK21.

Under these conditions, 6 x 10^6 E. coli cells are mixed with 2 x 10^6 pre-germinated S. lividans spores in a final volume of 20 µl.

Development of the method

It is known that the DNA extracted from certain actinomycetes is modified and, as a result, cannot be introduced into certain strains of E. coli without being restricted. The E. coli strain DH10B, which accepts these DNAs, is not capable of transferring to Streptomyces a plasmid containing only oriT, and it is thus necessary to construct such a plasmid. A derivative of RP4 should be introduced therein by integration into the chromosome, this derivative being capable of supplying in trans all the functions required to ensure the transfer of the recombinant clones containing the transfer origin oriT.

Example 12: Construction of a cosmid library in E. coli and Streptomyces lividans: Cloning of the soil DNA

The object is to construct a library of large-sized environmental DNA, without a prior step of culturing the microorganisms, with the aim of gaining access to the metabolic genes of bacteria (or of any other organism) which it is not known how to culture under standard laboratory conditions.

The procedure described was used to generate a DNA library in Escherichia coli using the E. coli-S. lividans shuttle cosmid pOS700I and DNA extracted and purified from the bacterial fraction of a soil. This last method makes it possible to obtain DNA of high purity and with an average size of 40 kb. Also, in order to avoid a partial digestion of the extracted DNA in the cloning, an alternative strategy was adopted based on the use of the terminal transferase enzyme for adding polynucleotide tails to the 3' ends of the DNA and of the vector.

5 µg of DNA were extracted from 60 mg of "Saint-André coast" soil according to the protocol described in Example 3, and were treated with terminal transferase (Pharmacia) to extend the 3' ends with a monotonous polynucleotide (poly T) (Example 10). The integrative cosmid pOS700I is prepared according to protocol B1, Orsay. After a standard step of purification in the presence of phenol/chloroform, the DNA and the vector are assembled by mixing one molecule of vector and one molecule of inserted DNA.
The mixture is then encapsidated in the heads of lambda bacteriophages (Amersham kit), which serve to transfect E. coli DH10B. The transfected cells are then inoculated on LB agar medium in the presence of ampicillin for the selection of the recombinants resistant to this antibiotic.

A library of about 5000 ampicillin-resistant E. coli clones was obtained. Each clone was inoculated in LB or TB medium + ampicillin in a microplate well (96 wells) and stored at -80°C.

The sequence at the sites of insertion of the soil fragments into the vector pOS700I, generated during the construction of the library, was analysed. For this, 17 cosmids of the library were purified and sequenced with a primer, seq. 5' CCGCGAATTCTCATGTTTGACCG 3', which hybridizes between the BamHI site and the HindIII cloning site present in the vector. The sequences obtained made it possible to estimate that the length of the homopolymeric tails at the junction points is very variable, between 13 and 60 poly-dA/dT.

Beyond the tails, the sequences of the soil fragments thus generated have a percentage of G+C of between 53 and 70%. Such high percentages were unexpected, but similar results have already been reported on crude preparations of soil DNA (Chatzinotas A. et al., 1998).

A strategy of "pooling" 48 or 96 clones was used to analyse the microbial and metabolic richness. The cosmid DNA extracted from these "pools" of clones was then used to carry out PCR or hybridization experiments.
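The bookkeeping implied by this storage and pooling scheme can be sketched as follows. The figures of about 5000 clones, 96-well plates and pools of 48 or 96 clones come from the text. Taking 40 kb as the average insert size is an assumption (the text gives 40 kb as the average size of the extracted DNA, not of the inserts), and the function names are hypothetical.

```python
# Sketch: plates, pools and total cloned DNA for the library described above.
import math

LIBRARY_CLONES = 5000        # "about 5000" ampicillin-resistant clones
WELLS_PER_PLATE = 96         # one clone per microplate well
ASSUMED_INSERT_KB = 40       # assumption: inserts average the 40 kb of the input DNA


def plates_needed(clones: int, wells: int = WELLS_PER_PLATE) -> int:
    """Number of microplates required to store one clone per well."""
    return math.ceil(clones / wells)


def pools_needed(clones: int, pool_size: int) -> int:
    """Number of pools when clones are grouped pool_size at a time."""
    return math.ceil(clones / pool_size)


def cloned_dna_mb(clones: int, insert_kb: float = ASSUMED_INSERT_KB) -> float:
    """Approximate total amount of cloned DNA, in megabases."""
    return clones * insert_kb / 1000


print("microplates:", plates_needed(LIBRARY_CLONES))
print("pools of 48:", pools_needed(LIBRARY_CLONES, 48))
print("pools of 96:", pools_needed(LIBRARY_CLONES, 96))
print("cloned DNA (Mb, approximate):", cloned_dna_mb(LIBRARY_CLONES))
```

Pooling reduces the number of PCR or hybridization reactions from one per clone to one per pool, at the cost of a second round on the parent microplate to find the positive clone.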

Example 13: Diversity of the 16S ribosomal DNA in the cloned DNA

a) Materials and methods

The cosmids of the library are extracted from pools of clones by alkaline lysis and are then purified on a caesium chloride gradient, in order to take up the band of cosmid DNA in supercoiled form and to eliminate any Escherichia coli chromosomal DNA which might interfere in the study.

After linearization of the cosmids by the action of S1 nuclease (50 units, 30 minutes at 37°C), the 16S rDNA sequences contained in the pools of clones are amplified under the standard amplification conditions, using the universal primers 63f (5'-CAGGCCTAACACATGCAAGTC-3') and 1387r (5'-GGGCGGWGTGTACAAGGC-3') defined by Marchesi et al. (1998). The amplification products of about 1.5 kilobases are purified using the QIAquick gel extraction kit (Qiagen) and then cloned directly into the vector pCR II (Invitrogen) in Escherichia coli TOP10, according to the manufacturer's instructions. The insert is then amplified using the primers M13 forward and M13 reverse, specific for the cloning site of the vector pCR II. The amplification products of expected size (about 1.7 kb) are analysed by RFLP (Restriction Fragment Length Polymorphism) using the enzymes CfoI, MspI and BstUI (0.1 units) in order to select the clones to be sequenced. The restriction profiles obtained are separated on 2.5% MetaPhor agarose gel (FMC Products) containing 0.4 mg of ethidium bromide per ml.

The 16S rDNA sequences are then determined directly using the PCR products purified with the QIAquick gel extraction kit, with the aid of the sequencing primers defined by Normand (1995). The phylogenetic analyses are obtained by comparing the sequences with the prokaryotic 16S rDNA sequences collated in the Ribosomal Database Project (RDP) database, version 7.0 (Maidak et al. (1999)) by means of the SIMILARITY MATCH program, which makes it possible to obtain the similarity values relative to the database sequences.

b) Results

To determine the phylogenetic diversity represented in the library, 47 sequences of the 16S rRNA gene were isolated from pools of 288 clones and were sequenced almost entirely. The results are given in Table 7.

Analysis of the sequences by interrogation of the databases reveals that most of the sequences (>61%) have percentages of similarity of less than or equal to 95% with identified bacterial species (Table 7). Out of the 47 sequences analysed, 28 sequences have as closest neighbours non-cultured bacteria whose sequences were obtained directly from DNA extracted from the environment. The majority of these sequences moreover have very low percentages of similarity (88-95%), 17 sequences out of 28 thus differing by more than 5% relative to their closest neighbours.

Among the sequences which can be classified in a phyletic group, a majority belong to the α subclass of the proteobacteria (18 sequences with a percentage of similarity of between 89 and 99%). A second group of sequences is represented by the γ subclass of the proteobacteria, comprising 9 sequences whose percentages of similarity range between 84 and 99%. The groups of β-proteobacteria, δ-proteobacteria, Firmicutes with a low G+C% and Firmicutes with a high G+C% comprise 1, 4, 3 and 5 sequences, respectively. Only one sequence could not be classified among the major bacterial taxonomic groups defined: the sequence a22.1(19), its closest neighbour Aerothermobacter marianas (with a similarity of 89%) itself being a strain isolated from the marine environment and not classified at the current time. Finally, 6 sequences can be classified in the Acidobacterium/Holophaga group.
This group has the particular feature of being represented by only two cultured bacteria, Acidobacterium capsulatum and Holophaga foetida, the rest of the group being composed of bacteria for which only the 16S rRNA gene has been detected by amplification and cloning using DNA extracted from an environmental sample (mainly from soil) (Ludwig et al., (1997)). The low values of similarity between the different sequences composing this group make it possible to predict great heterogeneity and diversity within this group. The set of results is represented in Table 7.

These results show that the sequences contained in the cosmid library are derived from microorganisms that are not only phylogenetically diverse but, above all, have never been isolated to date.

The results of the sequencing of the amplified DNAs allowed the establishment of a phylogenetic tree of the organisms present in the soil sample whose characterized sequences are novel.

The phylogenetic tree represented in Figure 7 was produced from the alignment of the sequences by the MASE software (Faulkner and Jurka, 1988), corrected by the Kimura 2-parameter method (1980), and with the aid of the Neighbour Joining algorithm (Saitou and Nei, 1987). The phylogenetic analysis allowed comparison of the 16S rDNA sequences cloned in the soil DNA library with sequences of prokaryotic 16S rDNA collated in the Ribosomal Database Project (RDP) databases (version 7.0, SIMILARITY-MATCH program, Maidak et al., 1999) and in the GenBank base by means of the BLAST 2.0 software (Altschul et al., 1997).

Example 14: Genetic preselection of the library to evaluate the metabolic richness

To characterize the library obtained in terms of metabolic diversity and to identify the clones containing inserts carrying genes which may be involved in biosynthetic pathways, genetic screening techniques based on PCR methods were developed according to the invention in order to detect and identify type I PKS genes.

1. Bacterial strains, plasmids and culture conditions

S. coelicolor ATCC101478, S. ambofaciens NRRL2420, S. lactamandurans ATCC27382, S. rimosus ATCC109610, B. subtilis ATCC6633 and B. licheniformis THE1856 (collection RPR) were used as DNA sources for the PCR experiments. S. lividans TK24 is the host strain used for the shuttle cosmid pOS700I.

For the preparation of genomic DNA and of suspensions of spores and protoplasts, and for the transformation of S. lividans, the standard protocols described in Hopwood et al. (1986) were followed.

Escherichia coli TOP10 (Invitrogen) was used as host for the cloning of the PCR products and E. coli SURE (Stratagene) was used as host for the shuttle cosmid pOS700I. The E. coli culture conditions, the preparation of plasmids, the digestion of the DNA and the agarose gel electrophoresis were carried out according to the standard procedures (Sambrook et al., 1996).

2. PCR primers

The primer pairs a1-a2 and b1-b2 were defined by the team of N. Bamas-Jacques, and their use was optimized for the screening of the DNA from the pure strains and of the soil library for the investigation of genes encoding PKS I.

Table 8: PCR primers that are homologous to the PKS I genes used for screening the library.
a1 (+)  5' CCSCAGSAGCGCSTSTTSCTSGA 3'
a2 (-)  5' GTSCCSGTSCCGTGSGTSTCSA 3'
b1      5' CCSCAGSAGCGCSTSCTSCTSGA 3'
b2      5' GTSCCSGTSCCGTGSGCCTCSA 3'

Amplification conditions:

For the investigation of PKS I from the DNA of pure strains, the amplification mixture contained, in a final volume of 50 µl: between 50 and 150 ng of genomic DNA, 200 µM of dNTP, 5 mM of MgCl2 final, 7% DMSO, 1x Appligene buffer, 0.4 µM of each primer and 2.5 U of Appligene Taq polymerase. The amplification conditions used are: denaturing at 95°C for 2 minutes, hybridization at 65°C for 1 minute and elongation at 72°C for 1 minute for the first cycle, followed by 30 cycles in which the temperature is reduced to 58°C, as described in K. Seow et al., 1997. The final extension step is carried out at 72°C for 10 minutes.

For the investigation of PKS I from the DNA of the library, the PCR conditions are the same as above for the a1-a2 pair, using between 100 and 500 ng of cosmid extracted from pools of 48 clones.
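The primers of Table 8 use the IUPAC code S (G or C) at the variable positions, so each primer is in fact a mixture of sequences. The sketch below counts and enumerates the sequences in each mixture. The primer sequences are copied from Table 8; the helper names are hypothetical and only the IUPAC codes needed here are listed.

```python
# Sketch: degeneracy of the GC-biased primers of Table 8.
from itertools import product

# Subset of the IUPAC nucleotide codes (S = strong = C or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}

PRIMERS = {
    "a1": "CCSCAGSAGCGCSTSTTSCTSGA",
    "a2": "GTSCCSGTSCCGTGSGTSTCSA",
    "b1": "CCSCAGSAGCGCSTSCTSCTSGA",
    "b2": "GTSCCSGTSCCGTGSGCCTCSA",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every non-degenerate sequence contained in the primer mixture."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


for name, sequence in PRIMERS.items():
    print(name, len(sequence), "nt,", degeneracy(sequence), "sequences")
```

Because every degenerate position is restricted to G or C, the mixtures stay small, which is consistent with the G+C-rich codon usage of actinomycetes discussed later in this example.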

For the b1-b2 primer pair, 500 ng of cosmids derived from pools of 96 clones were used. The amplification mixture contained 200 µM of dNTP, 2.5 mM of MgCl2 final, 7% DMSO, 1x Qiagen buffer, 0.4 µM of each primer and 2.5 U of hot-start Taq polymerase (Qiagen). The amplification conditions used are: denaturing for 15 minutes at 95°C followed by 30 cycles: 1 minute of denaturing at 95°C, 1 minute of hybridization at 65°C for the first cycle and 62°C for the other cycles, 1 minute of elongation at 72°C; final extension step of 10 minutes at 72°C.

The identification of the positive clones from the pools of 48 or 96 clones is carried out using replicas of the corresponding parent microplates on solid medium or any other standard replication method.

3. Subcloning and sequencing

The PCR products of the clones identified were sequenced according to the following protocol:

The fragments are purified on agarose gel (gel extraction kit (Qiagen)) and cloned into E. coli TOP10 (Invitrogen) using the TOPO TA cloning kit (Invitrogen). The plasmid DNA of subclones is extracted by alkaline lysis on a Biorobot (Qiagen) and dialysed for 2 h on a 0.025 µm VS membrane (Millipore). The samples are sequenced with the "universal" and "reverse" M13 primers on the ABI 377 96 sequencer (Perkin Elmer).

4. Results

Definition and validation of the PCR primers

Two highly conserved regions of actinomycete type I PKSs, comprising the active site of the enzyme, were targeted for the amplification of homologous genes with degenerate primers. These two regions correspond to the sequences PQQR(L)(L)LE and VE(A)HGTGT, respectively.

Primers (Table 8) were tested with the DNA of strains producing or not producing macrolides: Streptomyces coelicolor, Streptomyces ambofaciens, producing spiramycin, and Saccharopolyspora erythraea, producing erythromycin.
Irrespective of the primers used, bands representing fragments of about 700 bp, corresponding to the length of the expected fragment, were obtained with all the strains.

These results demonstrate the specificity of the primers a and b for the PKS I genes of productive strains or of silent genes in S. coelicolor.

The sequencing of the PCR products obtained with the a1-a2 primer pair made it possible to identify, from the S. ambofaciens strain, the sequence of a KS gene already described (European patent application No. EP 0 791 656) as belonging to the pathway for the biosynthesis of platenolide, a macrolide precursor of spiramycin, and two sequences never described, Stramb9 and Stramb12 (see sequence listing).

As regards S. erythraea, the screening method allowed the identification of a KS sequence (sacery17) which is identical to that of the KS of module 1 already published in GenBank (Access number M63677), encoding synthetase 1 (DEBS1) of 6-deoxyerythronolide B. Another sequence, not correlated to the erythromycin biosynthetic pathway, was identified and is the sequence SEQ ID No 32.

Conclusion

A method for analysing the presence of genes encoding type I PKSs by PCR from different microorganisms has been developed. The highly conserved structure of the type I keto-synthetase domain made it possible to produce a PCR method based on the use of degenerate primers that are GC-biased in the choice of the codons.

This approach shows the possibility of identifying genes or clusters involved in the biosynthetic pathway of type I polyketides. The cloning of these genes allows the creation of a collection which may then be used to construct polyketide hybrids. The same principle can be applied to other classes of antibiotics.

The results obtained here also show the presence of genes which may belong to silent clusters (SEQ ID No 30 to 32).

The presence of silent clusters has already been documented in S. lividans, and their expression is triggered by specific or pleiotropic regulators (Horinouchi et al.; Umeyama et al. 1996). These results suggest that the genes detected as belonging to so-called silent pathways in reality encode active enzymes capable of directing, in combination with the other specific enzymes of the pathway, the enzymatic steps required for the synthesis of the secondary metabolites.

Screening of the library

The screening was carried out under the conditions described in the Materials and Methods section using the primer pairs validated on productive strains.

In the presence of the a1-a2 primer pair, the size of the PCR products obtained from cosmid DNA extracted from pools of 48 or 96 clones was about 700 bp, which is thus in agreement with the expected results. The intensity of the bands obtained was variable, but only one amplification band was present for each pool of target DNA.

Under these conditions, 8 groups of target DNA were detected, corresponding to 9 positive clones after dereplication.

The screening carried out with the second primer pair, b1-b2, gave less specific amplification results since many satellite bands were observed alongside the 700 bp band. Nevertheless, 9 groups of target DNA were detected, corresponding to 14 positive clones after dereplication. Starting with these positive clones, the DNA was extracted for the steps of sequencing and transformation of S. lividans.

Analysis of the cosmids

Digestion of the cosmids identified by PCR with the enzyme DraI, which recognizes an AT-rich site, frees a fragment greater than 23 kb (Figure 22). This suggests that the PCR method preferentially targets soil DNA containing a high percentage of G+C. This result is the consequence of the degeneracy of the primers used, which are GC-biased in the choice of the codons. The inserts, as expected in the case of cosmids, are larger than 23 kb in size, except in one case (clone a9B12), which might reflect a certain level of instability of the cosmids. Moreover, among all the clones selected, only two of them, GS.F1 and GS.G11, showed the same restriction profile, indicating a low level of redundancy in the library.

The cosmids selected were transferred into Streptomyces lividans by transformation of protoplasts in the presence of PEG 1000. The transformation efficacy ranges between 30 and 1000 transformants per µg of cosmid DNA used.

Sequencing and phylogenetic analysis of the soil PKS I genes

The PCR method developed on the pure strains was used as described on the cosmids of the library and 24 clones were thus identified.

The PCR products of about 700 bp obtained from the DNA of two pools (48 clones) and of 8 unique clones were cloned, after purification on agarose gel, and sequenced. This allowed the identification of 11 sequences.

The alignment of the deduced protein sequences of soil PKSs I with other PKSs I present in different microorganisms (Figure 24) shows the presence of a highly conserved region which corresponds to the consensus region of the active site of β-ketoacyl synthetase.

Analysis of the sequences obtained with the "codon preference" method (Gribskov et al., 1984; Bibb et al., 1984) revealed the presence of a strong bias in the use of codons rich in G+C in a single reading frame.

The proteins deduced according to this reading frame show strong similarity with known type I KSs (Blast program). In particular, the similarity between the sequences of KSs from the soil and of KSs of the erythromycin cluster is about 53%.

After dereplication of a pool and identification of the unique clone, the sequence of the PCR product obtained from this clone is identical to that of the pool, which confirms the reliability of the method used.

Analysis of the sequence of the PCR product of a clone allowed the probable identification of 3 different KS I genes. One of these sequences (SEQ ID No 34) has a similarity of 98.7% with the sequence of another pool, suggesting that they encode the same enzyme. The other two sequences are different but strongly homologous.

The cloning and identification in a soil DNA library of pathways for the biosynthesis of secondary metabolites containing genes encoding type I KSs is described here for the first time.
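The idea behind the codon-bias observation can be illustrated with a much simpler statistic than the published "codon preference" method: the G+C content at the third codon position (GC3) in each reading frame. In G+C-rich genomes such as those of actinomycetes, the coding frame typically shows a markedly higher GC3 than the other two. The sketch below is a simplified stand-in and not the Gribskov or Bibb procedure. It scans only the three frames of the strand supplied, and the function name is hypothetical.

```python
# Sketch: G+C content at third codon positions for the three forward frames.

def gc3_by_frame(seq: str) -> dict[int, float]:
    """Return {frame: fraction of G or C at third codon positions}.

    Frames are numbered 0, 1 and 2 according to the offset of the first
    codon. Only complete codons contribute a third position.
    """
    seq = seq.upper()
    result = {}
    for frame in range(3):
        # Third bases of the codons starting at frame, frame + 3, ...
        third_bases = seq[frame + 2 :: 3]
        if third_bases:
            gc = sum(base in "GC" for base in third_bases)
            result[frame] = gc / len(third_bases)
        else:
            result[frame] = 0.0
    return result


# Toy sequence in which only frame 0 has G or C at every third position.
print(gc3_by_frame("AAGAAGAAG"))
```

For a sequence read from the opposite strand, the same function can be applied to its reverse complement.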

The high percentage of G+C in the soil sequences suggests that they may derive from genomes having a codon use similar to that of actinomycetes.

Although the data available in the literature are limited, it is known that the genes encoding type I PKSs are highly diversified as regards their physical organization in the genome, their size and the number of modules contained in each gene.

The presence of several domains originating from a single clone is confirmation that they belong to asymmetric polyketide clusters. In a single case, two clones appear to form a contig since they share the same sequence for the KS domain.

The size of the genetic regions involved in PKS I synthesis ranges from a few kb for penicillin to about 120 kb for rapamycin. The size of the cosmid inserts may thus not be sufficient for the expression of the most complex clusters.

Genes encoding PKSs I, capable of working iteratively like the PKSs II and of controlling the synthesis of aromatic polyketides, have been described (Jae-Hyuk et al., 1995). The study of soil PKS I clusters may provide further novelties in this field.

5. Identification of 6 genes encoding polyketide synthases

On continuing the screening of the cosmid library according to the protocols described in the present example, the inventors identified a cosmid clone containing a 34071-bp insertion containing several open reading frames encoding polypeptides of the polyketide synthase type.

More specifically, the cosmid thus identified by screening the library contains six open reading frames encoding polyketide synthase polypeptides or very closely related polypeptides, non-ribosomal peptide synthases. A detailed map of this cosmid is represented in Figure 36. The complete nucleotide sequence of the cosmid constitutes the sequence SEQ ID No. 113 of the sequence listing. The DNA insertion contained in the sequence SEQ ID No. 113 constitutes the complementary nucleotide sequence (- strand) of the nucleotide sequence encoding the various polyketide synthases.

The nucleotide sequence of the DNA insertion contained in the cosmid in Figure 36 which comprises the open reading frames encoding the polyketide synthase polypeptides (+ strand) is represented schematically in Figure 37 and constitutes the sequence SEQ ID No. 114 of the sequence listing.

Furthermore, a detailed map of the various open reading frames contained in the DNA insertion of this cosmid is represented in Figure 37.

The characteristics of the nucleotide sequences comprising open reading frames contained in the DNA insertion of this cosmid are detailed below.

ORF1 sequence

The orf1 sequence comprises a partial open reading frame 4615 nucleotides long. This sequence constitutes the sequence SEQ ID No. 115, which starts at the nucleotide in position 1 and ends at the nucleotide in position 4615 of the sequence SEQ ID No. 114.

The sequence SEQ ID No. 115 encodes the 1537-amino acid ORF1 polypeptide, this polypeptide constituting the sequence SEQ ID No. 121.

The polypeptide of sequence SEQ ID No. 121 is related to the non-ribosomal peptide synthases. This polypeptide has a degree of amino acid identity of 37% with the peptide synthase of Anabaena sp. 90 referenced under the access number "emb CAC01604.1" in the Genbank database.

ORF2 sequence

The orf2 nucleotide sequence is 8301 nucleotides long and constitutes the sequence SEQ ID No. 116, which starts at the nucleotide in position 4633 and ends at the nucleotide in position 12933 of the sequence SEQ ID No. 114.

The ORF2 sequence encodes the 2766-amino acid ORF2 polypeptide, this polypeptide constituting the sequence SEQ ID No. 122.

The polypeptide of sequence SEQ ID No. 122 has an amino acid sequence identity of 41% with the MtaD sequence of Stigmatella aurantiaca referenced under the access number "gb AAF19812.1" from the Genbank database.
The ORF2 polypeptide constitutes a polyketide synthase. ORF3 sequence 20 The orf3 nucleotide sequence is 5292 nucleotides long and constitutes the sequence SEQ ID No. 117. The sequence SEQ ID No. 117 corresponds to the sequence which starts at the nucleotide in position 12936 and which ends at the nucleotide in position 18227 of the sequence SEQ ID No. 114. 25 The nucleotide sequence SEQ ID No. 117 encodes the 1763-amino acid ORF3 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 123 according to the invention. The ORF3 polypeptide of sequence SEQ ID No. 123 has an amino acid identity of 42% with the MtaB sequence of Stigmatella aurantiaca 141 referenced under the access number "gb AAF 19810.1" from the Genbank database. ORF4 sequence 5 The orf4 nucleotide sequence is 6462 nucleotides long and constitutes the sequence SEQ ID No. 118 according to the invention. The nucleotide sequence SEQ ID No. 118 corresponds to the sequence starting at the nucleotide in position 18224 and ending at the 10 nucleotide in position 24685 of the nucleotide sequence SEQ ID No. 114. The nucleotide sequence SEQ ID No. 118 encodes the 2153-amino acid ORF4 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 124 according to the invention. The ORF4 polypeptide of sequence SEQ ID No. 124 has an amino 15 acid sequence identity of 46% with the epoD sequence of Sorangium cellulosum referenced under the access number "gb AAF62883.1" of the Genbank database. ORF5 sequence 20 The orf5 nucleotide sequence is 5088 nucleotides long and constitutes the sequence SEQ ID No. 119 according to the invention. The sequence SEQ ID No. 119 corresponds to the sequence starting at the nucleotide in position 24682 and ending at the nucleotide in 25 position 29769 of the nucleotide sequence SEQ ID No. 114. The nucleotide sequence SEQ ID No. 
119 encodes the 1695-amino acid ORF5 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 125 according to the invention.
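The coordinates given so far for orf1 to orf5 on SEQ ID No. 114 can be cross-checked with a few lines of Python. The sketch below is illustrative only: the positions, nucleotide lengths and amino acid counts are copied from the text, while the function names and the assumption that the nucleotide length of a complete open reading frame includes its stop codon are ours and are not stated in the source.

```python
# Positions on SEQ ID No. 114 (1-based, inclusive), nucleotide lengths and
# amino acid counts as stated in the text for orf1 to orf5.
ORFS = {
    "orf1": {"start": 1, "end": 4615, "nt": 4615, "aa": 1537, "partial": True},
    "orf2": {"start": 4633, "end": 12933, "nt": 8301, "aa": 2766, "partial": False},
    "orf3": {"start": 12936, "end": 18227, "nt": 5292, "aa": 1763, "partial": False},
    "orf4": {"start": 18224, "end": 24685, "nt": 6462, "aa": 2153, "partial": False},
    "orf5": {"start": 24682, "end": 29769, "nt": 5088, "aa": 1695, "partial": False},
}


def span_length(orf):
    """Length of an inclusive 1-based interval."""
    return orf["end"] - orf["start"] + 1


def codons_consistent(orf):
    """Check the amino acid count against the nucleotide length.

    Assumption (not from the source): for a complete ORF the nucleotide
    length includes one stop codon, so aa == nt / 3 - 1. For a partial ORF
    we only require that the encoded residues fit in the sequence.
    """
    if orf["partial"]:
        return orf["aa"] * 3 <= orf["nt"]
    return orf["nt"] % 3 == 0 and orf["nt"] // 3 - 1 == orf["aa"]


def junction_offsets(orfs):
    """Nucleotides between consecutive ORFs; a negative value is an overlap."""
    names = sorted(orfs, key=lambda name: orfs[name]["start"])
    return [
        (left, right, orfs[right]["start"] - orfs[left]["end"] - 1)
        for left, right in zip(names, names[1:])
    ]


if __name__ == "__main__":
    for name, orf in ORFS.items():
        print(name, span_length(orf), codons_consistent(orf))
    for left, right, offset in junction_offsets(ORFS):
        print(f"{left} -> {right}: {offset} nt")
```

With the stated positions, every interval length matches the stated nucleotide length, orf1 and orf2 are separated by 17 nucleotides, orf2 and orf3 by 2 nucleotides, and the orf3/orf4 and orf4/orf5 pairs each overlap by 4 nucleotides.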

The ORF5 polyketide synthase polypeptide of sequence SEQ ID No. 125 has an amino acid identity of 43% with the epoD sequence of Sorangium cellulosum referenced under the accession number "gb AAF62883.1" in the Genbank database.

ORF6 sequence

The orf6 nucleotide sequence is 4306 nucleotides long and constitutes the sequence SEQ ID No. 120 according to the invention. The nucleotide sequence SEQ ID No. 120 corresponds to the sequence starting at the nucleotide in position 29766 and ending at the nucleotide in position 34071 of the sequence SEQ ID No. 114.

The sequence SEQ ID No. 120 contains a partial open reading frame encoding the 1434-amino acid ORF6 polypeptide of the polyketide synthase type, this polypeptide constituting the sequence SEQ ID No. 126 according to the invention.

The polypeptide of sequence SEQ ID No. 126 has an amino acid identity of 43% with the epoD sequence of Sorangium cellulosum referenced under the accession number "gb AAF62883.1" in the Genbank database.

EXAMPLE 15: Construction of shuttle vectors of integrative BAC type in Streptomyces

Construction of shuttle vectors of the integrative and conjugative BAC type in Streptomyces

15.1 Construction of the vector pMBD-1

The BAC vector pMBD-1 was obtained according to the following steps:

Step 1: The vector pOSVO10 was digested with the enzymes PstI and BstZ17I in order to obtain a 6.3-kb nucleotide fragment.

Step 2: The vector pDNR-1 was digested with the enzymes PstI and PvuII in order to obtain a 4.15-kb nucleotide fragment.

Step 3: The 6.3-kb nucleotide fragment derived from the vector pOSVO17 was fused by ligation with the 4.15-kb fragment derived from the vector pDNR-1, so as to produce the vector pMBD-1, as illustrated in Figure 30.

15.2 Construction of the vector pMBD-2

The vector pMBD-2 is a vector of the BAC type containing a "φc31 int-Ωhyg" integrative box.
φc31 is a temperate phage with a broad host range whose attachment site (attP) is well localized. The φc31 int fragment is the minimal fragment of the actinophage φc31 capable of inducing the integration of a plasmid into the chromosome of Streptomyces lividans.

Ωhyg is a derivative of the Ω interposon capable of conferring hygromycin resistance in E. coli and S. lividans.

BAC vectors containing the φc31 integration system are described by Sosio et al. (2000) and in PCT patent application No. 99/6734 published on 29 December 1999.

The BAC vector pMBD-2 was constructed according to the following steps:

Step 1: Construction of a φc31 int-Ωhyg integrative box in an E. coli multicopy plasmid.

The φc31 int fragment was first amplified from the plasmid pOJ436 using the following pair of primers:
- the primer EVφc31I (SEQ ID No. 109), which allows the introduction of an EcoRV site at the 5' end of the φc31 sequence, and the primer BIIφc31F (SEQ ID No. 110), which allows the introduction of a BglII site at the 3' end of the φc31 sequence.

The Ωhyg fragment was obtained by digesting the plasmid pHP45 Ωhyg, described by Blondelet-Rouault (1997), with the BamHI enzyme.

Next, the φc31 int-Ωhyg integrative box was cloned into the vector pMCS5 digested with the enzymes BglII and EcoRV.

Step 2: Construction of the vector pMBD-2

The bacterial artificial chromosome pBACe3.6 described by Frengen et al. (1999) was digested with the enzyme NheI and then treated with the enzyme Eco polymerase. Next, the vector pMCS5 φc31 int-Ωhyg was digested with the enzymes SnaBI and EcoRV so as to recover the integrative box.

The detailed map of the vector pMBD-2 is represented in Figure 31.

15.3 Construction of the vector pMBD-3

The vector pMBD-3 is an integrative (φc31 int) and conjugative (OriT) vector of the BAC type, which comprises the selection marker Ωhyg.

The map of the vector pMBD-3 and also the method for constructing it are illustrated in Figure 31.

The vector pMBD-3 was obtained by amplifying the OriT gene from the plasmid pOJ436 using the pair of primers of sequences SEQ ID No. 111 and SEQ ID No. 112, which contain PacI restriction sites. The nucleotide fragment amplified using the primers SEQ ID No. 111 and SEQ ID No. 112 was cloned into the vector pMBD-2 predigested with the PacI enzyme. The scheme for constructing the vector pMBD-3 is illustrated in Figure 31.

15.4 Construction of the vector pMBD-4

The detailed map of the vector pMBD-4 is represented in Figure 32.

The vector pMBD-4 was obtained by cloning the φc31 int-Ωhyg integrative box into the vector pCYTAC2.

15.5 Construction of the vector pMBD-5

The scheme for constructing the vector pMBD-5 is illustrated in Figure 33.

The vector pMBD-5 was constructed by recombination of the nucleotide fragment included between the two loxP sites of the vector pMBD-1 illustrated in Figure 33 with the loxP site contained in the BAC vector designated pBTP3, a detailed map of the plasmid pBTP3 being represented in Figure 34.

15.6 Construction of the vector pMBD-6

The vector pMBD-6 was constructed by recombining the nucleotide fragment included between the two loxP sites of the vector pMBD-1 into the loxP site of the BAC vector pBeloBAC11, as represented in Figure 35.
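The cloning steps of Example 15 rely on pairing fragment ends produced by different restriction enzymes. The following Python sketch is offered as a reading aid only: the recognition sites and overhangs are standard properties of these enzymes and are not values given in the present description, and the names ENZYMES, ends_compatible and find_sites are hypothetical helpers introduced for illustration.

```python
# Standard recognition sites and end types of the restriction enzymes named in
# Example 15 (general enzyme properties, not data from the description).
# Each entry: (recognition site, end type, single-stranded overhang).
ENZYMES = {
    "PstI": ("CTGCAG", "3' overhang", "TGCA"),
    "BstZ17I": ("GTATAC", "blunt", ""),
    "PvuII": ("CAGCTG", "blunt", ""),
    "EcoRV": ("GATATC", "blunt", ""),
    "SnaBI": ("TACGTA", "blunt", ""),
    "BglII": ("AGATCT", "5' overhang", "GATC"),
    "BamHI": ("GGATCC", "5' overhang", "GATC"),
    "NheI": ("GCTAGC", "5' overhang", "CTAG"),
    "PacI": ("TTAATTAA", "3' overhang", "AT"),
}


def ends_compatible(enzyme_a, enzyme_b):
    """True if ends cut by the two enzymes can be ligated directly.

    All overhangs listed above are palindromic, so two ends are compatible
    when they have the same end type and the same overhang sequence.
    """
    _, kind_a, overhang_a = ENZYMES[enzyme_a]
    _, kind_b, overhang_b = ENZYMES[enzyme_b]
    return kind_a == kind_b and overhang_a == overhang_b


def find_sites(sequence, enzyme):
    """1-based start positions of the enzyme's site on the given strand.

    The sites above are palindromic, so scanning one strand is sufficient.
    """
    site = ENZYMES[enzyme][0]
    sequence = sequence.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index + 1)
        index = sequence.find(site, index + 1)
    return positions


if __name__ == "__main__":
    # Blunt BstZ17I and PvuII ends, as in the pMBD-1 ligation.
    print(ends_compatible("BstZ17I", "PvuII"))
    # BamHI-cut fragment into a BglII-cut site.
    print(ends_compatible("BamHI", "BglII"))
    # Made-up test sequence, not a sequence from the listing.
    print(find_sites("aaGATATCttGATATC", "EcoRV"))
```

Under these standard properties, the PstI ends of the two fragments used for pMBD-1 pair with each other and the BstZ17I and PvuII ends are both blunt, while a BamHI-cut fragment carries the same GATC overhang as a BglII-cut vector. Reading the polymerase treatment of the NheI-digested pBACe3.6 as a way of producing blunt ends for the SnaBI-EcoRV fragment is our interpretation, not a statement of the description.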

TABLE 9: Sequences

SEQ ID No.  Name

Probes and primers
  1  FGPS431
  2  FGPS122
  3  FGPS350
  4  FGPS643 (T)
  5  FGPS643 (C)
  6  R499
  7  R500
  8  C501
  9  FGPS516
 10  FGPS517
 11  FGPS518
 12  FGPS612
 13  FGPS669
 14  FGPS618
 15  FGPS614
 16  FGPS615
 17  FGPS616
 18  FGPS621
 19  FGPS617
 20  FGPS680
 21  FGPS619
 22  63f
 23  1387r
 24  Oligo-1 (Example 10)
 25  Oligo-2 (Example 10)
 26  A1
 27  A2
 28  B1
 29  B2

PKS-I nucleic acids
 30  Amb9
 31  Amb12
 32  Eryl 9
 33  A9b12
 34  A23G11-1
 35  A26G11-2
 36  A26G1-10
 37  A35 E4-16
 38  A49F1-32
 39  A17d2-3
 40  A53F11-13
 41  A53F11-14
 42  A22A 2-11
 43  A36E8-1
 44  A52E8-2

PKS-I amino acid sequences
 45  Amb9
 46  Amb12
 47  Eryl 9
 48  A9b12
 49  A23G11-1
 50  A26G11-2
 51  A26G1-10
 52  A35 E4-16
 53  A49F1-32
 54  A17d2-3
 55  A53F11-13
 56  A53F11-14
 57  A22A 2-11
 58  A36E8-1
 59  A52E8-2

16S rDNA sequences
 60  a24.1(2)
 61  a4.a6.a7(7)
 62  a52.a53.a5(15)
 63  a49.a50.a51(11)
 64  a4.a6.a7(14)
 65  a30.a31.a32(7)
 66  a37.a38.a39(6)
 67  a46.a47.a48(14)
 68  a49.a50.a51(1)
 69  a52.a53.a5(8)
 70  a8.a9.a10(13)
 71  a1.a2.a3(13)
 72  a43.a44.a45(10)
 73  a27.a28.a29(5)
 74  a23.1
 75  a25.1
 76  a18.1(22)
 77  a33.1
 78  a14.7
 79  a21.7
 80  a8.a9.a10(7)
 81  a8.a9.a10(18)
 82  a27.a28.a29(3)
 83  a34.a35.a36(5)
 84  a22.1(19)
 85  a11.a12.a13(5)
 86  a19.a20.a26(9)
 87  a40.a41.a42(6)
 88  a27.a28.a29(8)
 89  a27.a28.a29(12)
 90  a37.a38.a39(12)
 91  a46.a47.a48(6)
 92  a11.a12.a13(11)
 93  a15.a16.a17(12)
 94  a15.a16.a17(5)
 95  a19.a20.a26(13)
 96  a37.a38.a39(14)
 97  a8.a9.a10(9)
 98  a19.a20.a26(5)
 99  a43.a44.a45(4)
100  a1.a2.a3(4)
101  a4.a6.a7(23)
102  a49.a50.a51(22)
103  a8.a9.a10(2)
104  a34.a35.a36(3)
105  a34.a35.a36(10)
106  a40.a41.a42(13)

Primers
107  cos 1 n (Example 5)
108  cos 2 n (Example 5)
109  EVφc31I (Example 15)
110  BIIφc31F (Example 15)
111  Primer 1 (Example 15)
112  Primer 2 (Example 15)

PKS-I nucleic acids
113  Cosmid a2641 (vector + insertion, (-) strand)
114  Cosmid a2641 (insertion, (+) strand)
115  orf1
116  orf2
117  orf3
118  orf4
119  orf5
120  orf6

PKS-I amino acid sequences
121  ORF1
122  ORF2
123  ORF3
124  ORF4
125  ORF5
126  ORF6

REFERENCES

* Amann, R. I., W. Ludwig, and K.-H. Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143-169.
* Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs" Nucleic Acids Research Vol 25: 3389-3404
* Altschul SF et al., 1990, J. Mol Biol, 215: 403-410.
* Bakken, L. R. 1985. Separation and purification of bacteria from soil. Appl. Environ. Microbiol. 49:1482-1487.
* Bibb MJ, Findlay PR, Johnson MW, The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences. Gene 30: 1-3, 157-66, Oct, 1984.
* Biesiekierska-Galguen M. (1997) "Attenuation biologique de contaminants xenobiotiques dans le sol - modele lindane [Biological attenuation of xenobiotic contaminants in soil - lindane model]" National DEP Diploma in Toxicology, Université Claude Bernard Lyon 1.
* Blondelet-Rouault MH, Weiser J, Lebrihi A, Branny P, Pernodet JL. Institute of Genetics and Microbiology, URA CNRS 2225, Université Paris XI, Orsay, France. Gene 1997 May 6;190(2):315-7
* Borchert S et al., 1992, Microbiology Letters, 92: 175-180
* Blondelet-Rouault, 1997, Gene, 315-317
* Boccard, F., Smokvina T., Pernodet J.L., Friedmann, A. & Guerineau M. (1989). The integrated conjugative plasmid pSAM2 of Streptomyces ambofaciens is related to temperate bacteriophages. Embo J 8, 973-80
* Chatzinotas A., Sandaa R-A., Schönhuber W., Amann R., Daae F.L., Torsvik V., Zeyer J., Hahn D. (1998) "Analysis of broad-scale differences in microbial community composition of two pristine forest soils" Systematic and Applied Microbiology Vol 21: 579-587
* Clegg, C. D., K. Ritz, and B. S. Griffiths. 1997. Direct extraction of microbial community DNA from humified upland soils. Lett. Appl. Microbiol. 25:30-33.
* Clerc-Bardin, S., J.-L. Pernodet, A. Frostegård, and P. Simonet. Development of a conditional suicide system for a Streptomyces lividans strain and its use to investigate conjugative transfer in soil. Submitted.
* Elledge SJ, Mulligan JT, Ramer SW, Spottswood M, Davis RW. Department of Biochemistry, Baylor College of Medicine, Houston, TX 77030. Proc Natl Acad Sci U S A 1991 Mar 1;88(5):1731-5
* Engelen, B., K. Meinken, F. Von Wintzingerode, H. Heuer, H.-P. Malkomes, and H. Backhaus. 1998. Monitoring impact of a pesticide treatment on bacterial soil communities by metabolic and genetic fingerprinting in addition to conventional testing procedures. Appl. Environ. Microbiol. 64:2814-2821.
* Farrelly, V., F. A. Rainey, and E. Stackebrandt. 1995. Effect of genome size and rrn gene copy number on PCR amplification of 16S rRNA genes from a mixture of bacterial species. Appl. Environ. Microbiol. 61:2798-2801.
* Faulkner D.V., Jurka J. (1988) "Multiple Aligned Sequence Editor (MASE)" Trends in Biochemical Sciences Vol 13: 321-322
* Frengen et al., 1999, Genomics, 58: 250-258
* Frostegård, A., Tunlid, A., and Bååth, E. 1991. Microbial biomass measured as total lipid phosphate in soils of different organic content. J. Microbiol. Meth. 14:151-163.
* Giddings, G. 1998. The release of genetically engineered micro-organisms and viruses into the environment. New Phytol. 140:173-184.
* Gladek, A., and J. Zakrzewska. 1984. Genome size of Streptomyces. FEMS Microbiol. Lett. 24:73-76.
* Gribskov M, Devereux J, Burgess RR, The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res 12: 1 Pt 2, 539-49, Jan 11, 1984.

* Guiney et al., 1983, Proc. Natl. Acad. Sci USA, (12): 3595-3598.
* Gourmelen A., Blondelet-Rouault, M.H. & Pernodet, J.L. (1998). Characterization of a glycosyl transferase inactivating macrolide, encoded by gimA from Streptomyces ambofaciens. Antimicrob Agents Chemother 42, 2612-9.
* Hayakawa, M., and H. Nonomura. 1987. Humic acid-vitamin agar, a new medium for the selective isolation of soil actinomycetes. J. Ferment. Technol. 65:501-509.
* Hayakawa, M., Ishizawa K., and H. Nonomura. 1988. Distribution of rare actinomycetes in Japanese soils. J. Ferment. Technol. 66:367-373.
* Hickey, R. J., and H. D. Tresner. 1952. A cobalt-containing medium for sporulation of Streptomyces species. J. Bacteriol. 64:891-892.
* Hintermann, G., R. Crameri, T. Kieser, and R. Hutter. 1981. Restriction analysis of the Streptomyces glaucescens genome by agarose gel electrophoresis. Arch. Microbiol. 130:218-222.
* Holben, W. E., J. K. Jansson, B. K. Chelm, and J. M. Tiedje. 1988. DNA probe method for the detection of specific microorganisms in the soil bacterial community. Appl. Environ. Microbiol. 54:703-711.
* Hong Fu et al., 1995, Molecular diversity, 1: 121-124
* Hopwood DA, Bibb M J, Chater K F, Kieser T., Bruton C.J., Kieser H.M., Lydiate D.J., Smith C.P., Ward J.M. and Schrempf H. 1985. Genetic Manipulation of Streptomyces. A Laboratory manual. The John Innes Foundation, Norwich, U.K.
* Hopwood, D. A., M. J. Bibb, K. F. Chater, T. Kieser, C. J. Bruton, H. M. Kieser, D. J. Lydiate, C. P. Smith, J. M. Ward, and H. Schrempf. 1985. Genetic manipulation of Streptomyces - a laboratory manual. The John Innes Foundation, Norwich, United Kingdom.
* Hohn B. and Collins J., 1980, Gene, 11: 291-298
* Horinouchi S., Malpartida F., Hopwood D. and Beppu T., Mol. Gen. Genet. (1989) 215: 355-357.
* Imai R., Nagata Y., Fukuda M., Takagi M., Yano K. (1991) "Molecular cloning of a Pseudomonas paucimobilis gene encoding a 17-kilodalton polypeptide that eliminates HCl molecules from γ-Hexachlorocyclohexane" Journal of Bacteriology Vol 173, No 21: 6811-6819
* Jacobsen, C. S., and O. F. Rasmussen. 1992. Development and application of a new method to extract bacterial DNA from soil based on separation of bacteria from soil with cation-exchange resin. Appl. Environ. Microbiol. 58:2458-2462.
* Jae-Hyuk Y.U. and Leonard T.J., 1995. Sterigmatocystin biosynthesis in Aspergillus nidulans requires a ... type I polyketide synthase. J. Bacteriol. (August): 4792-4800.
* Ka, J. O., W. E. Holben, and J. M. Tiedje. 1994. Analysis of competition in soil among 2,4-dichlorophenoxyacetic acid-degrading bacteria. Appl. Environ. Microbiol. 60:1121-1128.
* Kah-Tong S et al., 1997, J Bacteriol, 179(23): 7360-7368
* Kimura M. (1980) "A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences" Journal of Molecular Evolution Vol 16: 111-120
* Kuske, C. R., K. L. Banton, D. L. Adorada, P. C. Stark, K. K. Hill, and P. J. Jackson. 1998. Small-scale DNA sample preparation method for field PCR detection of microbial cells and spores in soil. Appl. Environ. Microbiol. 64:2463-2472.
* Lacalle RA, Pulido D, Vara J, Zalacain M, Jimenez A. Centro de Biologia Molecular (CSIC-UAM), Universidad Autonoma, Canto Blanco, Madrid, Spain. Gene 1989 Jul 15;79(2):375-80
* Lee, S.-Y., J. Bollinger, D. Bezdicek, and A. Ogram. 1996. Estimation of the abundance of an uncultured soil bacterial strain by a competitive quantitative PCR method. Appl. Environ. Microbiol. 62:3787-3793.
* Leff, L. G., J. R. Dana, J. V. McArthur, and L. J. Shimkets. 1995. Comparison of methods of DNA extraction from stream sediments. Appl. Environ. Microbiol. 61:1141-1143.
* Liesack, W., and E. Stackebrandt. 1992. Occurrence of novel groups of the domain Bacteria as revealed by analysis of genetic material isolated from an Australian terrestrial environment. J. Bacteriol. 174:5072-5078.

* Liesack, W., P. H. Janssen, F. A. Rainey, N. L. Ward-Rainey, and E. Stackebrandt. 1997. Microbial diversity in soil: the need for a combined approach using molecular and cultivation techniques. In J. D. Van Elsas, J. T. Trevors, and E. M. H. Wellington (ed.), Modern soil microbiology, Marcel Dekker, Inc., New York. (p 375-439)
* Lorentz, M. G., and W. Wackernagel. 1994. Bacterial gene transfer by natural genetic transformation in the environment. Microbiol. Reviews 58:563-602.
* Maidak B.L., Cole J.R., Parker C.T., Garrity G.M., Larsen N., Li B., Lilburn T.G., McCaughey M.J., Olsen G.J., Overbeek R., Pramanik S., Schmidt T.M., Tiedje J.M., Woese C.R. (1999) "A new project of the RDP (Ribosomal Database Project)" Nucleic Acids Research Vol 27: 171-173
* Mazodier P. et al., 1989, J. Bacteriol., 171(6): 3583-3585.
* More, M. I., J. B. Herrick, M. C. Silva, W. C. Ghiorse, and E. L. Madsen. 1994. Quantitative cell lysis of indigenous microorganisms and rapid extraction of microbial DNA from sediment. Appl. Environ. Microbiol. 60:1572-1580.
* Murakami T, Holt TG, Thompson CJ, Microbiological Engineering Unit, Institut Pasteur, Paris, France. J. Bacteriol 1989 Mar;171(3):1459-66
* Nagata Y., Hatta T., Imai R., Kimbara K., Fukuda M., Yano K., Takagi M. (1993) "Purification and characterization of γ-Hexachlorocyclohexane (γ-HCH) dehydrochlorinase (LinA) from Pseudomonas paucimobilis" Bioscience, Biotechnology and Biochemistry Vol 57 No 9: 1582-1583
* Nalin R., Simonet P., Vogel T.M., Normand P. (1999) "Rhodanobacter lindaniclasticus gen. nov., sp. nov., a lindane-degrading bacterium" International Journal of Systematic Bacteriology Vol 49: 19-23
* Nesme, X., C. Picard, and P. Simonet. 1995. Specific DNA sequences for detection of soil bacteria. In J. T. Trevors, and J. D. van Elsas (ed.), Nucleic acids in the environment, methods and application. Springer Lab Manual. (p 111-139)
* Nilsson B, Uhlen M, Josephson S, Gatenbeck S, Philipson L. Nucleic Acids Res 1983 Nov 25;11(22):8019-30
* Normand P. et al., 1995, Oceanis, 21(1): 31-56
* Ogram, A. V., M. L. Mathot, J. B. Harsh, J. Boyle, and C. A. Pettigrew, Jr. 1994. Effects of DNA polymer length on its adsorption to soils. Appl. Environ. Microbiol. 60:393-396.
* Ogram, A., G. S. Sayler, and T. Barkay. 1987. The extraction and purification of microbial DNA from sediments. J. Microbiol. Methods 7:57-66.
* Olsen, R. A., and Bakken, L. R. 1987. Viability of soil bacteria: optimization of the plate-counting technique. Microb. Ecol. 13:59-74.
* Paget, E., L. Jocteur Monrozier, and P. Simonet. 1992. Adsorption of DNA on clay minerals: protection against DNase I and influence on gene transfer. FEMS Microbiol. Lett. 97:31-40.
* Patra, G., P. Sylvestre, V. Ramisse, J. Therasse, and J.-L. Guesdon. 1996. Isolation of a specific chromosomic DNA sequence of Bacillus anthracis and its possible use in diagnosis. FEMS Immunol. Medical Microbiology 15:223-231.
* Pernodet, J.L., Fish, S., Blondelet-Rouault, M.H. & Cundliffe, E. (1996). The macrolide-lincosamide-streptogramin B resistance phenotypes characterized by using a specifically deleted, antibiotic-sensitive strain of Streptomyces lividans. Antimicrob Agents Chemother 40, 581-5.
* Pernodet, J.L., Gourmelen, A., Blondelet-Rouault, M.H. & Cundliffe, E. (1999). Dispensable ribosomal resistance to spiramycin conferred by srmA in the spiramycin producer Streptomyces ambofaciens. 145, 2355-64.
* Picard, C., C. Ponsonnet, X. Nesme, and P. Simonet. 1992. Detection and enumeration of bacteria in soil by direct DNA extraction and polymerase chain reaction. Appl. Environ. Microbiol. 58:2717-2722.
* Preud'homme, J., Belloc, A., Charpentie, Y., and Tarridec, P. 1965. Un antibiotique formé de deux groupes de composants à synergie d'action : la pristinamycine [An antibiotic formed from two groups of components with synergistic action: pristinamycin] C. R. Acad. Sci. 260:1309-1312.

* Prieme, A., J. I. B. Sitaula, A. K. Klemedtsson, and L. R. Bakken. 1996. Extraction of methane-oxidizing bacteria from soil particles. FEMS Microbiol. Ecol. 21:59-68.
* Prosser, J. 1994. Molecular marker systems for detection of genetically engineered micro-organisms in the environment. Microbiol. 140:5-17.
* Raynal A, Tuphile K, Gerbaud C, Luther T, Guerineau M, Pernodet JL; Laboratory of Biology and Molecular Genetics, Institute of Genetics and Microbiology, URA CNRS 2225, Université Paris-Sud, Orsay, France. Mol Microbiol 1998 Apr;28(2):333-42
* Raynal, A., Tuphile, K., Gerbaud, C., Luther, T., Guerineau, M. & Pernodet, J.L. (1998). Structure of the chromosomal insertion site for pSAM2: functional analysis in Escherichia coli. Mol. Microbiol 28, 333-42.
* Richard, G. M. 1974. Modifications of the diphenylamine reaction giving increased sensitivity and simplicity in the estimation of DNA. Analytical Biochem. 57:369-376.
* Romanowski, G., M. G. Lorentz, and W. Wackernagel. 1993. Use of polymerase chain reaction and electroporation of Escherichia coli to monitor the persistence of extracellular plasmid DNA introduced into natural soils. Appl. Environ. Microbiol. 59:3438-3446.
* Saitou N., Nei M. (1987) "The Neighbour-Joining method: a new method for reconstructing phylogenetic trees" Molecular and Biological Evolution Vol 2: 112-118
* Sambrook J., Fritsch E. F. and Maniatis T. 1996. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
* Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
* Senoo K., Wada H. (1989) "Isolation and identification of an aerobic γ-HCH-decomposing bacterium from soil" Soil Science, Plant Nutrition Vol 35, No 1: 79-87.
* Sezonov, G., Blanc, V., Bamas-Jacques, N., Friedmann, A., Pernodet, J.L. & Guerineau, M. (1997). Complete conversion of antibiotic precursor to pristinamycin IIA by overexpression of Streptomyces pristinaespiralis biosynthetic genes. Nat Biotechnol 15, 349-53.

* Shirling, E. B., and D. Gottlieb. 1966. Methods for characterization of Streptomyces species. Int. J. Syst. Bacteriol. 16:313-340.
* Shizuya et al., 1992, Proc. Natl. Acad. Sci USA, 89: 8794-8797.
* Siefert, J. L., and G. E. Fox. 1998. Phylogenetic mapping of bacterial morphology. Microbiology 144:2803-2808.
* Simonet, P., P. Normand, A. Moiroud, and R. Bardin. 1990. Identification of Frankia strains in nodules by hybridization of polymerase chain reaction products with strain-specific oligonucleotide probes. Arch. Microbiol. 153:235-240.
* Smalla, K., N. Cresswell, L. Mendonca-Hagler, A. Wolters, and D. J. van Elsas. 1993. Rapid DNA extraction protocol from soil for polymerase chain reaction-mediated amplification. J. Appl. Bacteriol. 74:78-85.
* Sosio M. et al., 2000, Nature Biotechnology, vol 18: 343-345
* Smit, E., P. Leeflang, and K. Wernars. 1997. Detection of shifts in microbial community structure and diversity in soil caused by copper contamination using amplified ribosomal DNA restriction analysis. FEMS Microbiol. Ecol. 23:249-261.
* Smokvina T, Mazodier P, Boccard F, Thompson CJ, Guerineau M. Laboratory of Biology and Molecular Genetics, Université Paris-Sud, Orsay, France. Gene 1990 Sep 28;94(1):53-9
* Smokvina, T., Mazodier, P., Boccard, F., Thompson, C.J. & Guerineau, M. (1990). Construction of a series of pSAM2-based integrative vectors for use in actinomycetes. Gene 94, 53-9.
* Stackebrandt, E. 1988. Phylogenetic relationships vs. phenotypic diversity: how to achieve a phylogenetic classification system of the eubacteria. Can. J. Microbiol. 34:552-556.
* Staneck, J. L., and G. D. Roberts. 1974. Simplified approach to identification of aerobic Actinomycetes by thin-layer chromatography. Appl. Microbiol. 28:226-231.
* Stapleton, R. D., S. Ripp, L. Jimenez, S. Cheol-Koh, J. T. Fleming, I. R. Gregory, and G. S. Sayler. 1998. Nucleic acid analytical approaches in bioremediation: site assessment and characterization. J. Microbiol. Methods 32:165-178.

* Steffan, R. J., J. Goksoyr, A. K. Bej, and R. Atlas. 1988. Recovery of DNA from soils and sediments. Appl. Environ. Microbiol. 54:2908-2915.
* Tebbe, C. C., and W. Vahjen. 1993. Interference of humic acids and DNA extracted directly from soil in detection and transformation of recombinant DNA from bacteria and a yeast. Appl. Environ. Microbiol. 59:2657-2665.
* Tercero JA, Espinosa JC, Lacalle RA, Jimenez A. Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas, Madrid, Spain. J Biol Chem 1996 Jan 19;271(3):1579-90
* Thomas J-C., Berger F., Jacquier M., Bernillon D., Baud-Grasset F., Truffaut N., Normand P., Vogel T.M., Simonet P. (1996) "Isolation and Characterisation of a novel γ-Hexachlorocyclohexane-degrading bacterium" Journal of Bacteriology Vol 178, No 20: 6049-6055
* Torsvik, V. L. 1980. Isolation of bacterial DNA from soil. Soil Biol. Biochem. 12:15-21.
* Torsvik, V., R. Sorheim, and J. Goksoyr. 1996. Total bacterial diversity in soil and sediment communities - a review. J. Ind. Microbiol. 17:170-178.
* Tsai, Y.-L., and B. Olson. 1991. Rapid method for direct extraction of DNA from soil and sediments. Appl. Environ. Microbiol. 57:1070-1074.
* Umeyama T., Tanabe Y., Aigle B.D. and Horinouchi S., FEMS (1996) 144: 177-184.
* Van Elsas, J. D., G. F. Duarte, A. S. Rosado, and K. Smalla. 1998. Microbiological and molecular biological methods for monitoring microbial inoculants and their effects in the soil environment. J. Microbiol. Methods 32:133-154.
* Van Elsas, J. D., V. Mäntynen, and A. C. Wolters. 1997. Soil DNA extraction and assessment of the fate of Mycobacterium chlorophenolicum strain PCP-1 in different soils by 16S ribosomal RNA gene sequence based most-probable-number PCR and immunofluorescence. Biol. Fert. Soils 24:188-195.
* Volff JN et al., 1996, Mol. Microbiol., 21(5): 1037-1047.
* Volossiouk, T., E. J. Robb, and R. N. Nazar. 1995. Direct DNA extraction for PCR-mediated assays. Appl. Environ. Microbiol. 61:3972-3976.

* Wahl GM, Lewis KA, Ruiz JC, Rothenberg B, Zhao J, Evans GA. Proc Natl Acad Sci U S A 1987 Apr;84(8):2160-4.
* Waksman, S. A. 1961. Williams and Wilkins (ed.) The actinomycetes. Classification, identification and description of genera and species. Vol. 2. Baltimore.
* Ward, D. M., R. Weller, and M. M. Bateson. 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 344:63-65.
* Widmer, F., R. J. Seidler, and L. S. Watrud. 1996. Sensitive detection of transgenic plant marker gene persistence in soil microcosms. Mol. Ecol. 5:603-613.
* Williams, S. T., R. Locci, A. Beswick, D. I. Kurtböke, V. D. Kuznetsov, F. J. Le Monnier, P. F. Long, K. A. Maycroft, R. A. Palma, B. Petrolini, S. Quaroni, J. I. Todd, and M. West. 1993. Detection and identification of novel actinomycetes. Res. Microbiol. 144:653-656.
* Wilson, I. G. 1997. Inhibition and facilitation of nucleic acid amplification. Appl. Environ. Microbiol. 63:3741-3751.
* Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.
* Yanisch-Perron et al., 1985, Gene, 33(1): 103-119.
* Zaslavsky, B. Y. 1995. Separation of biomolecules, p. 503-667. In Aqueous two phase partitioning. Boris Y. Zaslavsky (ed.) Physical Chemistry and Bioanalytical Applications, Marcel Dekker, Inc., New York.
* Zhou, J., M. A. Bruns, and J. M. Tiedje. 1996. DNA recovery from soils of diverse composition. Appl. Environ. Microbiol. 62:316-322.

Claims

1. Process for preparing a collection of nucleic acids from a soil sample containing organisms, the said process comprising the following sequence of steps: I-(a) obtaining microparticles by grinding a pre-dried or pre-desiccated soil sample, followed by suspension of the microparticles in a liquid buffer medium; and I-(b) extracting the nucleic acids present in the microparticles; and I-(c) passage of the solution containing the nucleic acids over a molecular sieve, followed by recovery of the elution fractions enriched in nucleic acids and passage of the elution fractions enriched in nucleic acids over an anion-exchange chromatography support, followed by recovery of the elution fractions containing the purified nucleic acids.

2. Process for preparing a collection of nucleic acids from an environmental sample containing organisms, the said process comprising the following sequence of steps: II-(i) production of a suspension by dispersing the environmental sample in a liquid medium and then homogenizing the suspension by gentle stirring; and II-(ii) separating the organisms and the other inorganic and/or organic constituents of the homogeneous suspension obtained in step (i) by centrifugation on a density gradient; and II-(iii) lysis of the organisms separated out in step (ii) and extraction of the nucleic acids; and II-(iv) purification of the nucleic acids on a caesium chloride gradient.

3. Process according to claim 1, characterized in that step I-(a) is followed by an additional step of treating the microparticles suspended in a liquid buffer by sonication.

4. Process according to claim 1, characterized in that step I-(a) is followed by the following additional steps: - treatment of the microparticles suspended in a liquid buffer by sonication; - incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; - addition of SDS; - recovery of the nucleic acids.

5. Process according to claim 1, characterized in that step I-(a) is followed by the following additional steps: - homogenization of the microparticles using a step of vigorous mixing (vortex) followed by a step of simple stirring; - freezing the homogeneous suspension followed by thawing; - treatment of the suspension by sonication after thawing; - incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; - addition of SDS.

6. Process according to one of claims 1 to 5, characterized in that the nucleic acids are DNA molecules.

7. Process for preparing a collection of recombinant vectors, characterized in that the nucleic acids obtained by the process according to one of claims 1 to 6 are inserted into a cloning and/or expression vector.

8. Process according to claim 7, characterized in that the nucleic acids are separated as a function of their size prior to inserting them into the cloning and/or expression vector.

9. Process according to claim 7, characterized in that the average size of the nucleic acids is made substantially uniform by physical rupture, prior to inserting them into the cloning and/or expression vector.

10. Process according to claim 7, characterized in that the cloning and/or expression vector is of the plasmid type.

11. Process according to claim 7, characterized in that the cloning and/or expression vector is of the cosmid type.

12. Process according to claim 11, characterized in that it is a cosmid which is replicative in E. coli and integrative in Streptomyces.

13. Process according to claim 12, characterized in that it is the cosmid pOS700I.

14. Process according to claim 11, characterized in that it is a cosmid which is conjugative and integrative in Streptomyces.

15. Process according to claim 14, characterized in that the cosmid is chosen from cosmids pOSV303, pOSV306 and pOSV307.

16. Process according to claim 11, characterized in that it is a cosmid which is replicative both in E. coli and in Streptomyces.

17. Process according to claim 16, characterized in that it is the cosmid pOS700R.

18. Process according to claim 11, characterized in that it is a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.

19. Process according to claim 7, characterized in that the cloning and/or expression vector is of the BAC type.

20. Process according to claim 19, characterized in that it is a BAC vector which is integrative and conjugative in Streptomyces.

21. Process according to claim 20, characterized in that the vector is chosen from BAC vectors pOSV403, pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD-5 and pMBD-6.

22. Process for preparing a recombinant cloning and/or expression vector, characterized in that the step of inserting a nucleic acid into the cloning and/or expression vector comprises the following steps: - opening the cloning and/or expression vector at a chosen cloning site, using a suitable restriction endonuclease; - adding a first homopolymeric nucleic acid to the free 3' end of the open vector; - adding a second homopolymeric nucleic acid, whose sequence is complementary to the first homopolymeric nucleic acid, at the free 3' end of the nucleic acid from the collection to be inserted into the vector; - assembling the nucleic acid of the vector and the nucleic acid of the collection by hybridizing the first and second homopolymeric nucleic acids of mutually complementary sequence; - closing the vector by ligation.

23. Process according to claim 22, characterized in that: - the first homopolymeric nucleic acid is of poly(A) or poly(T) sequence; and - the second homopolymeric nucleic acid is of poly(T) or poly(A) sequence.
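The complementarity requirement of claims 22 and 23 can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: strands are plain strings written 5' to 3', the tail length of 20 nucleotides is an arbitrary choice that does not come from the claims, and the function names are hypothetical.

```python
# Toy model of homopolymer tailing; names and tail length are illustrative assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def add_tail(strand: str, base: str, length: int) -> str:
    """Append a homopolymeric tail to the free 3' end of a strand (5'->3')."""
    return strand + base * length


def tails_can_anneal(vector_strand: str, insert_strand: str, length: int) -> bool:
    """Check that the 3' tail of the vector pairs with the 3' tail of the insert."""
    vector_tail = vector_strand[-length:]
    insert_tail = insert_strand[-length:]
    # Antiparallel pairing: one tail must be the reverse complement of the other.
    return reverse_complement(vector_tail) == insert_tail


# Hypothetical sequences: the vector end receives poly(A), the insert end poly(T).
vector_end = add_tail("GATTACA", "A", 20)
insert_end = add_tail("CCGGTTAA", "T", 20)
same_tail_insert = add_tail("CCGGTTAA", "A", 20)
```

With these toy inputs, `tails_can_anneal(vector_end, insert_end, 20)` is true, whereas two ends carrying the same homopolymer cannot pair, which is why the second tail has to be complementary to the first.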

24. Process for preparing a recombinant vector according to either of claims 22 and 23, characterized in that the size of the nucleic acid to be inserted is at least 100 kilobases, preferably at least 200 kilobases.

25. Process for preparing a recombinant vector according to one of claims 22 to 24, characterized in that the nucleic acid to be inserted is contained in the collection of nucleic acids obtained by the process according to one of claims 1 to 6.

26. Process for preparing a recombinant cloning and/or expression vector, characterized in that the step of inserting a nucleic acid into the cloning and/or expression vector comprises the following steps: - creation of blunt ends on the ends of the nucleic acid of the collection by removing the protruding 3' sequences and filling in the protruding 5' sequences; - opening the cloning and/or expression vector at a chosen cloning site using a suitable restriction endonuclease; - creation of blunt ends at the ends of the nucleic acid of the vector by removing the protruding 3' sequences and filling in the protruding 5' sequences, then dephosphorylating the 5' ends; - adding complementary oligonucleotide adapters; - inserting the nucleic acid of the collection into the vector by ligation.

27. Process for preparing a recombinant vector according to claim 26, characterized in that the size of the nucleic acid to be inserted is at least 100 kilobases, preferably at least 200 kilobases.

28. Process for preparing a recombinant vector according to either of claims 26 and 27, characterized in that the nucleic acid to be inserted is contained in the collection of nucleic acids obtained by the process according to one of claims 1 to 6.

29. Process according to one of claims 22 to 28, characterized in that the nucleic acids are inserted as obtained, without treatment with one or more restriction endonucleases prior to inserting them into the cloning and/or expression vector.

30. Collection of nucleic acids consisting of the nucleic acids obtained by the process of one of claims 1 to 6.

31. Nucleic acid, characterized in that it is contained in the collection of nucleic acids according to claim 30.

32. Nucleic acid according to claim 31, characterized in that it comprises a nucleotide sequence encoding at least one operon, or part of an operon.

33. Nucleic acid according to claim 32, characterized in that the operon encodes all or part of a metabolic pathway.

34. Nucleic acid according to claim 33, characterized in that the metabolic pathway is the polyketide synthesis pathway.

35. Nucleic acid according to claim 34, characterized in that it is chosen from polynucleotides comprising the sequences SEQ ID No 30 to 44 and SEQ ID No. 115 to 120.

36. Nucleic acid according to claim 31, characterized in that it comprises all of a nucleotide sequence encoding a polypeptide.

37. Nucleic acid according to one of claims 31 to 36, characterized in that it is of prokaryotic origin.

38. Nucleic acid according to claim 37, characterized in that it originates from a bacterium or from a virus.

39. Nucleic acid according to one of claims 31 to 33 and 36, characterized in that it is of eukaryotic origin.

40. Nucleic acid according to claim 39, characterized in that it originates from a fungus, a yeast, a plant or an animal.

41. Recombinant vector, characterized in that it is chosen from the following recombinant vectors: a) a vector comprising a nucleic acid according to one of claims 35 to 40; b) a vector obtained according to the process of one of claims 22 to 25 and 29; c) a vector obtained according to the process of one of claims 26 to 29.

42. Vector, characterized in that it is the cosmid pOS700I.

43. Vector, characterized in that it is the cosmid pOSV303.

44. Vector, characterized in that it is the cosmid pOSV306.

45. Vector, characterized in that it is the cosmid pOSV307.

46. Vector, characterized in that it is the cosmid pOS700R.

47. Vector, characterized in that it is the BAC vector pOSV403.

48. Vector, characterized in that it is the vector pMBD-1.

49. Vector, characterized in that it is the vector pMBD-2.

50. Vector, characterized in that it is the vector pMBD-3.

51. Vector, characterized in that it is the vector pMBD-4.

52. Vector, characterized in that it is the vector pMBD-5.

53. Vector, characterized in that it is the vector pMBD-6.

54. Collection of recombinant vectors as obtained according to the process of one of claims 7 to 21, 25 and 28.

55. Recombinant cloning and/or expression vector, characterized in that it is contained in the collection of recombinant vectors according to claim 54.

56. Recombinant host cell comprising a nucleic acid according to one of claims 31 to 40 or a recombinant vector according to claim 55.

57. Recombinant host cell according to claim 56, characterized in that it is a prokaryotic or eukaryotic cell.

58. Recombinant host cell according to claim 57, characterized in that it is a bacterium.

59. Recombinant host cell according to claim 58, characterized in that it is a bacterium chosen from E. coli and Streptomyces.

60. Recombinant host cell according to claim 58, characterized in that it is a yeast or a filamentous fungus.

61. Collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid from the collection of nucleic acids according to claim 30.

62. Collection of recombinant host cells, each of the constituent host cells of the collection comprising a recombinant vector according to either of claims 41 and 55.

63. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to either of claims 61 and 62, characterized in that it comprises the following steps: - placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; - carrying out at least three amplification cycles; - detecting any nucleic acid amplified.
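The primer-based screening of claim 63 can be mimicked in silico. The following is a minimal sketch under strong assumptions: primers match the template exactly (real primers also hybridize with structurally similar sequences), only the top strand written 5' to 3' is searched, and every sequence, pool name and function name is hypothetical rather than taken from the sequence listing.

```python
# In-silico stand-in for a PCR screen; exact matching is a simplifying assumption.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def predicted_amplicon(template, forward_primer, reverse_primer):
    """Return the region bounded by both primer sites, or None if no product."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # top strand reads as the reverse complement of the primer.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def screen_pools(pools, forward_primer, reverse_primer):
    """Return the names of pools holding at least one insert predicted to amplify."""
    return [
        name
        for name, inserts in pools.items()
        if any(predicted_amplicon(seq, forward_primer, reverse_primer) for seq in inserts)
    ]


# Hypothetical primers and clone pools, for illustration only.
FORWARD = "ATGCGT"
REVERSE = "TTGACC"
POOLS = {
    "pool_1": ["CCATGCGTAAAAGGTCAATT"],
    "pool_2": ["GGGGCCCCAAAATTTT"],
}
positive_pools = screen_pools(POOLS, FORWARD, REVERSE)
```

Screening pools first and individual clones afterwards is a common way to limit the number of amplification reactions; the claim itself does not prescribe any pooling scheme.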

64. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to either of claims 61 and 62, characterized in that it comprises the following steps: - placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence; - detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.

65. Process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to either of claims 61 and 62, characterized in that it comprises the following steps: - culturing the recombinant host cells of the collection in a suitable culture medium; - detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured.

66. Process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to either of claims 61 and 62, characterized in that it comprises the following steps: - culturing recombinant host cells of the collection in a suitable culture medium; - detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; - selecting recombinant host cells which produce the compound of interest.

67. Process for producing a compound of interest, characterized in that it comprises the following steps: - culturing a recombinant host cell selected according to the process of claim 66; - recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.

68. Compound of interest, characterized in that it is obtained according to the process of claim 67.

69. Compound according to claim 68, characterized in that it is a polyketide.

70. Polyketide, characterized in that it is produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from sequences SEQ ID No 30 to 44 and SEQ ID No. 115 to 120.

71. Composition comprising a polyketide according to claim 69 or 70.

72. Pharmaceutical composition comprising a pharmacologically active amount of a polyketide according to claim 69 or 70, in combination with a pharmaceutically compatible vehicle.

73. Process for determining the diversity of the nucleic acids contained in a collection of nucleic acids and most particularly of a collection of nucleic acids originating from an environmental sample, preferentially from a soil sample, the said process comprising the following steps: - placing the nucleic acids of the collection of nucleic acids to be tested in contact with a pair of oligonucleotide primers which hybridize with any sequence of bacterial 16S ribosomal DNA; - carrying out at least three amplification cycles; - detecting the amplified nucleic acids using an oligonucleotide probe or a plurality of oligonucleotide probes, each probe hybridizing specifically with a sequence of 16S ribosomal DNA common to a bacterial kingdom, order, subclass or genus; - where appropriate, comparing the results of the preceding detection step with the detection results, using the probe or the plurality of probes, for nucleic acids of known sequence constituting a calibration range.

74. Process according to claim 73, characterized in that the pair of primers which hybridize with any sequence of bacterial 16S ribosomal DNA consists of the primer FGPS 612 (SEQ ID No 12) and the primer FGPS 669 (SEQ ID No 13).

75. Process according to claim 73, characterized in that the pair of primers which hybridize with any sequence of bacterial 16S ribosomal DNA consists of the primer 63f (SEQ ID No 22) and the primer 1387r (SEQ ID No 23).

76. Nucleic acid comprising a 16S rDNA nucleotide sequence chosen from the sequences having at least 99% nucleotide identity with the sequences SEQ ID No 60 to SEQ ID No 106.
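The 99% identity threshold of claim 76 can be checked numerically once two sequences have been aligned. The sketch below is an assumption-laden illustration: it takes two pre-aligned sequences of equal length with gaps written as "-", counts a gap opposite a base as a mismatch, and skips columns where both sequences have a gap. The claim does not specify this convention, and the function names and test sequences are hypothetical.

```python
# Percent identity over a pairwise alignment; the gap convention is an assumption.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical positions between two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column of two gaps carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 99.0) -> bool:
    """Return True when the identity is at least the given threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 200-nucleotide reference and two variants, for illustration only.
reference = "ACGT" * 50
one_mismatch = "T" + reference[1:]
three_mismatches = "TTT" + reference[3:]
```

On a 200-nucleotide alignment, a single mismatch gives 99.5% identity and passes the threshold, while three mismatches give 98.5% and fail. In practice the alignment itself would come from a dedicated alignment tool, which this sketch does not attempt to reproduce.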

77. Process for producing a type I polyketide synthase, the said production process comprising the following steps: - production of a recombinant host cell comprising a nucleic acid encoding a type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120; - culturing of the recombinant host cells in a suitable culture medium; - recovery and, where appropriate, purification of the type I polyketide synthase from the culture supernatant or from the cell lysate.

78. Polyketide synthase comprising an amino acid sequence chosen from the sequences SEQ ID No 45 to 59 and SEQ ID No. 121 to SEQ ID No. 126.

79. Antibody directed against a polyketide synthase according to claim 78.

80. Process for detecting a type I polyketide synthase or a peptide fragment of this enzyme, in a sample, the said process comprising the steps of: a) placing an antibody according to claim 79 in contact with the sample to be tested; b) detecting any antigen/antibody complex possibly formed.

81. Kit for detecting a type I polyketide synthase in a sample, comprising: a) an antibody according to claim 79; b) where appropriate, reagents required for detecting any antigen/antibody complex possibly formed.