METHOD OF GENERATION AND MANAGEMENT OF UNIQUE SEQUENCES IN DNA PRODUCTION
Field of the Invention
The present invention concerns improvements in or relating to synthetic DNA production, particularly in relation to security or tracking devices.
Background of the Invention
In the current climate of mass production and global commercialisation, there is a need for methods to uniquely mark or tag products so that they can be traced or linked back to their origin. The facility of traceability is of importance and can be used by companies in product tracking. Product tracking can be particularly useful in addressing the problem of grey-market imports. Although product tagging has a number of potential applications, one of the most important is arguably in crime reduction.
Crimes such as theft and burglary are common social problems. The loss of personal property as a result of crime can be a distressing, not to mention expensive, experience. Generally speaking, it is the responsibility of the Police to investigate such crimes and hopefully recover any stolen property.
The recovery of stolen property is only half the problem; the identification of the rightful owners of such property can be an equally difficult task.
There are a variety of ways of marking items of property so as to uniquely identify them, so that, in the event they are stolen and subsequently recovered, they stand a better chance of being returned to their rightful owners.
Synthetic DNA has been used as a tracking and security device, as discussed in International PCT applications PCT/GB91/00719 and PCT/GB93/01822. However, the coding used to describe the base sequence does not lend itself to easy computer handling. The sequences are usually represented by a series of letters and digits separated by commas. Although these do provide all the necessary information to fully describe the sequence, this type of representation does not lend itself to automated production of sequences or to easy management of the sequences produced. Synthetic DNA has to date been exclusively used as a tracking/security product due to the unique nature of the sequences and the small amount of material required to perform an identification.
Summary of the Invention
The present invention discloses a way of modeling and producing synthetic nucleic acid chains (DNA) so as to contain a unique identifying marker that can be assigned to a unique origin. In one embodiment, the present invention provides a method of generating and managing unique sequences of synthetic nucleic acid, comprising: applying a secure interpretation system to a known unique decimal number; and synthesising a nucleic acid chain based on the sequence provided by the interpretation of said decimal number.
In another embodiment, the present invention provides a method of tracing and/or identifying goods comprising: modeling and synthesising at least one nucleic acid chain with a base sequence contained therein; applying a secure interpretation system to obtain a unique identifying marker from the base sequence; establishing a database in which the unique identifying marker is assigned to a unique source; and determining to which items the synthesised nucleic acid chain has been applied, identifying the base sequence therein, and obtaining the unique identifying marker from said sequence so as to determine the unique source from the database.
Preferably, an indicator is also applied to any items to which the nucleic acid chain is applied, thus facilitating identification of the tagged items. The invention further provides for a security composition for tracing or identifying goods, comprising an indicator material and at least one nucleic acid chain, which has been synthesised to store a unique identifying marker.
Preferably the above composition also comprises a solvent system for the indicator material, said solvent system containing a solvent which is volatile under conditions of application.
Preferably, the present invention involves the use of a multilevel security product. At least one additional level of security is provided by the composition further comprising a plurality of separately identifiable trace materials that can be varied in such a manner as to produce unique formulations, the combination of trace materials being varied by modeling each composition on a binary string to produce a unique code. Preferably, the unique chemical code may provide the information required to determine the primers necessary to amplify the nucleic acid to a level suitable for analysis of the unique identifying marker stored therein. The primer specification can be obtained via the mathematical processing of the unique code. Such a security product provides a concealed extra level that would not be apparent to any would-be counterfeiter. Furthermore, without knowledge of the mathematical process involved, the chemical code from the first level of the product cannot be converted into the information required to identify the primers needed to access the unique identifying code held within the nucleic acid sequence.
Alternatively, the unique chemical code may indicate the start location and/or size of a sequence of bases within the nucleic acid chain, such sequence providing the unique identifying marker.
Preferably the indicator, which shows where the composition has been applied, is covert. A suitable covert indicator could be visible under ultraviolet light only, but alternative types will be appreciated by the skilled man.
It may be of further advantage if the composition is adapted for aerosol spraying.
The multilevel security product in accordance with the present invention can suitably be utilised in connection with the compositions disclosed in our UK Patent Nos. 2286044 and 2319337.
Detailed Description of the Invention
The synthesis of modeled sequences of nucleic acid can be achieved by methods and procedures currently known in the technology field of nucleic acid synthesis, such as Polymerase Chain Reaction (PCR). The present invention utilises this technology to provide nucleic acid chains with specific base sequences which have been modeled according to the secure interpretation systems discussed below, such base sequences being modeled to provide unique identifying markers following their interpretation using a secure interpretation system. The present invention provides a number of these secure interpretation systems that utilise the unique codes held within the modeled base sequences to store an identifying marker. Such markers are in turn assigned to a unique source i.e. the owner or maker of the item to which the tag has been applied. As discussed above, the present invention provides mechanisms for representing the base sequence within a nucleic acid chain (i.e. DNA) with a simple numerical code. Each sequence can therefore be represented by a numerical code that is unique to that sequence.
Under a first preferred system each of the four main bases, Adenine (A), Cytosine (C), Guanine (G) and Thymine (T), is assigned a value of 1, 2, 3 or 4 respectively. This value may be represented via a 3-digit binary string. By replacing the bases in a particular sequence with their binary equivalents, the resulting strings can then be combined to form a single composite string. Such a composite string may be used as a unique identifying marker in itself. However, it is more preferable that the decimal equivalent of the string is used to express the unique numerical value of that sequence.
Example 1
The following theoretical examples show how this would work with different sequences:
T   A   A   A   A   A   T   G   A   C
100 001 001 001 001 001 100 011 001 010
composite string 100001001001001001100011001010
decimal code = 556046538

T   T   T   T   T   T   T   T   T   T
100 100 100 100 100 100 100 100 100 100
composite string 100100100100100100100100100100
decimal code = 613566756
This coding can be used as a model to produce unique nucleic acid strands in an automated and computer controlled manner. It provides a mathematical block on the duplication of nucleic acid strands and is more easily managed than the accepted alphabetic labeling of the base sequences of the oligonucleotide. Such a system can be applied in a single level security product in which the unique base sequence in its entirety encodes the unique identifying marker.
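By way of illustration only, the first preferred system and the two sequences of Example 1 can be sketched in Python as follows. The function names are hypothetical and do not form part of the disclosure; the sketch assumes only the base values 1 to 4 and the 3-digit binary representation stated above.

```python
# Illustrative sketch of the first preferred system (names are hypothetical).
# Each base is assigned the value 1-4 and written as a 3-digit binary string.
BASE_VALUES = {"A": 1, "C": 2, "G": 3, "T": 4}
VALUE_BASES = {value: base for base, value in BASE_VALUES.items()}


def sequence_to_binary(sequence: str) -> str:
    """Concatenate the 3-digit binary value of each base into one composite string."""
    return "".join(format(BASE_VALUES[base], "03b") for base in sequence.upper())


def sequence_to_decimal(sequence: str) -> int:
    """Read the composite binary string as a single decimal code."""
    return int(sequence_to_binary(sequence), 2)


def decimal_to_sequence(code: int) -> str:
    """Invert the encoding by splitting the binary form into 3-digit groups."""
    bits = format(code, "b")
    # Pad on the left to a whole number of 3-digit groups (e.g. a leading A is 001).
    bits = bits.zfill(-(-len(bits) // 3) * 3)
    groups = (int(bits[i:i + 3], 2) for i in range(0, len(bits), 3))
    try:
        return "".join(VALUE_BASES[group] for group in groups)
    except KeyError:
        raise ValueError("code contains a 3-digit group that is not a base") from None


print(sequence_to_binary("TAAAAATGAC"))   # 100001001001001001100011001010
print(sequence_to_decimal("TAAAAATGAC"))  # 556046538
print(sequence_to_decimal("TTTTTTTTTT"))  # 613566756
print(decimal_to_sequence(556046538))     # TAAAAATGAC
```

Because only four of the eight possible 3-digit groups correspond to a base, not every decimal number decodes to a sequence; the sketch rejects such numbers rather than guessing.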
Alternatively, a higher level of security may be created by using a two level marker system, wherein the first level of information is provided by a unique chemical formulation of separately identifiable trace materials, which is represented by its own unique code. The second level of information is contained in the nucleic acid.
The nucleic acid strands can then be manufactured based on a mathematical relationship between them and the first level device. For security purposes, this mathematical relationship can be varied and can form part of the information stored with the first level unique code.
The information stored in the first level unique code may be used to indicate the appropriate primer necessary to synthesize an effective amount of the nucleic acid chain, thus enabling the analysis of the unique identifying marker stored therein. It is appreciated that the first level unique code can be used to store other information relevant to the interpretation of the unique identifying marker stored within the nucleic acid chain.
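As a minimal sketch only, a first level formulation modeled on a binary string might be represented as below. The assumption that each trace material contributes one binary digit (1 if present, 0 if absent) is made for illustration, and the material names and function names are hypothetical. The mathematical processing that leads from this code to a primer specification is not specified here and is therefore not shown.

```python
# Hypothetical sketch: one binary digit per trace material (assumption, not disclosed).
TRACE_MATERIALS = ["trace_1", "trace_2", "trace_3", "trace_4", "trace_5"]  # placeholder names


def formulation_code(present, materials=TRACE_MATERIALS) -> int:
    """Return the decimal code of a formulation, given the materials it contains."""
    present = set(present)
    unknown = present - set(materials)
    if unknown:
        raise ValueError("unknown trace materials: {}".format(sorted(unknown)))
    bits = "".join("1" if material in present else "0" for material in materials)
    return int(bits, 2)


def formulation_from_code(code: int, materials=TRACE_MATERIALS) -> set:
    """Recover the set of trace materials present from the decimal code."""
    if not 0 <= code < 2 ** len(materials):
        raise ValueError("code does not correspond to a formulation")
    bits = format(code, "0{}b".format(len(materials)))
    return {material for material, bit in zip(materials, bits) if bit == "1"}


print(formulation_code({"trace_1", "trace_4"}))  # 18 (binary 10010)
print(sorted(formulation_from_code(18)))         # ['trace_1', 'trace_4']
```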
It is also appreciated that the first preferred system of the present invention is more suitable for relatively short oligonucleotides, e.g. less than 20 bases.
An alternative approach for a larger sequence would be to use just one base to carry the code. In a second preferred system, the positions occupied by a particular base within the coding section could be used to provide the code, using a binary approach. Therefore, within this system, the presence or absence of the chosen base can be represented by a 1 or 0 respectively. Any other bases can be used to make up the sequence and these would simply add a 0 to the string.
Furthermore, it will be appreciated that the information stored in the first level unique code may be used to indicate the chosen base for the interpretation of the code.
Either of the above preferred systems could be applied to a nucleic acid chain wherein all of the bases therein contribute to the unique identifying marker.
Alternatively, the above systems could be applied to a specific region of bases on a nucleic acid strand. In such cases, the information stored in the first level unique code may be used to indicate the start/end location and/or size of a sequence of bases within the nucleic acid chain, such sequence providing the unique identifying marker.
The start and end location points of the base sequence may alternatively be marked by a specific base sequence (usually four bases long).
Such an alternative may be appropriate in compositions of the present invention that have only one level of security, i.e. compositions that contain only the nucleic acid strand.
Example 2
The start of the coding sequence could be given by the four part sequence AGCT, which sequence will only appear again at the end of the coding section. This also indicates that the coding will be obtained from the position of base A within the sequence.
AAAACCAAAC AGCT AAACCCGGTGC AGCT GCTTTTTAAAA
           start{           }end
The coding sequence reads AAACCCGGTGC which produces the binary code 11100000000. To conform with normal binary code usage this should be reversed and read right to left i.e. 00000000111 or decimal value 7. Alternatively the coding sequence within the code area could be assembled to run right to left to match this form of usage.
Alternatively the sequence:
AAACCTTTGGAAGCTTTTTGGAAATGTTGGAAAAAAAAAAAGCTTTGGGGGAAAA
Code (coding section, left to right): 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Reversed and read as binary code = 1111111111000000111000000, which in decimal equals 33522112. A thirty-base coding sequence used in this manner would provide a basis for the generation and management of over 1 billion unique sequences (2^30 = 1,073,741,824).
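The reading of Example 2 can be sketched as follows, for illustration only. The function names are hypothetical; the sketch assumes that the marker AGCT occurs exactly at the start and end of the coding section, that base A is the chosen base, and that the code is read right to left as described above.

```python
# Illustrative sketch of the second preferred system (names are hypothetical).
def coding_region(sequence: str, marker: str = "AGCT") -> str:
    """Return the bases lying between the first two occurrences of the marker."""
    start = sequence.find(marker)
    if start == -1:
        raise ValueError("start marker not found")
    begin = start + len(marker)
    end = sequence.find(marker, begin)
    if end == -1:
        raise ValueError("end marker not found")
    return sequence[begin:end]


def region_to_decimal(region: str, chosen_base: str = "A") -> int:
    """Score 1 for the chosen base and 0 for any other base, then read right to left."""
    if not region:
        raise ValueError("coding region is empty")
    bits = "".join("1" if base == chosen_base else "0" for base in region)
    return int(bits[::-1], 2)


first = "AAAACCAAACAGCTAAACCCGGTGCAGCTGCTTTTTAAAA"
print(coding_region(first))                     # AAACCCGGTGC
print(region_to_decimal(coding_region(first)))  # 7

# Coding section of the second sequence, written out from its code line.
second_region = "TTTTGG" + "AAA" + "TGTTGG" + "A" * 10
print(region_to_decimal(second_region))         # 33522112
```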
It is appreciated that the present invention provides a mechanism for representing a nucleic acid (DNA) sequence as a simple decimal number, which could be shown mathematically (using a secure interpretation system) to correspond to a specific sequence.
For example, where x is a decimal number, x = ACGTACGT
Simple software would then produce the number x+1 :-
hence x+1 = ACGTACTG.
The software would then move on to x+2 and calculate that :-
x+2 = ACGTATCG etc...
Obviously, processor speeds would allow the production of thousands of such representations in seconds, which could then be used as a basis for synthesising the specific nucleic acid chain sequences by PCR.
The above disclosed secure interpretation systems will permit automation of the synthesis of nucleic acid chains that encode a unique identifying marker.
A computer programmed with the appropriate secure interpretation system could apply a simple process to produce a large number of base sequences corresponding to a known sequence of unique decimal numbers.
Such a facility would make the use of nucleic acid (DNA) chains in the storage of unique identifying markers a much more economical option.
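Purely as an illustration of such an automated process, the sketch below produces base sequences for a run of consecutive decimal numbers using the second preferred system. It does not reproduce the ACGTACGT example given earlier. The function names, the fixed filler base C, the default width of thirty bases and the use of AGCT as the start and end marker are all assumptions made for the sketch, and no account is taken of whether a given sequence is practical to synthesise.

```python
# Hypothetical sketch: consecutive decimal numbers -> marked coding sequences.
from typing import Iterator, Tuple


def decimal_to_region(code: int, width: int = 30,
                      chosen_base: str = "A", filler: str = "C") -> str:
    """Write `code` as a coding section whose binary value reads right to left."""
    if not 0 <= code < 2 ** width:
        raise ValueError("code does not fit in the stated width")
    bits = format(code, "0{}b".format(width))
    # Reverse so that the least significant digit occupies the leftmost base.
    return "".join(chosen_base if bit == "1" else filler for bit in reversed(bits))


def tagged_sequences(first_code: int, count: int, width: int = 30,
                     marker: str = "AGCT") -> Iterator[Tuple[int, str]]:
    """Yield (decimal number, marked sequence) pairs for consecutive numbers."""
    for code in range(first_code, first_code + count):
        yield code, marker + decimal_to_region(code, width) + marker


print(decimal_to_region(7, width=11))  # AAACCCCCCCC

for code, sequence in tagged_sequences(5, 3, width=4):
    print(code, sequence)
```

With a filler restricted to C, the marker AGCT cannot arise inside the coding section, so the start and end points remain unambiguous; a practical system could vary the filler bases provided that property is preserved.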