NL9101181A

NL9101181A - Method and device for detecting one or more known character strings in a collection of characters

Info

Publication number: NL9101181A
Application number: NL9101181A
Authority: NL
Original assignee: Nederland Ptt
Priority date: 1991-07-05
Filing date: 1991-07-05
Publication date: 1993-02-01

Abstract

Method for detecting the presence of one or more search strings in a collection of characters which is to be searched, wherein a specific combination of characters from each search string is stored in a memory. From the collection of characters to be searched, combinations of a similar type are formed in windows and are then compared with the first-mentioned combinations. In the event of a correspondence between a first-mentioned combination and a last-mentioned combination, a code signal is emitted to carry out a more precise check for correspondence between the search string and the relevant part from the collection of characters to be searched. The method is highly suitable for the fast detection of computer viruses of which an identification string is known. A database can also be efficiently searched in this way for the occurrence of various search terms.

Description

KONINKLIJKE PTT NEDERLAND N.V.

GRONINGEN

Title: Method and device for detecting one or more known character strings in a collection of characters.

The invention relates to a method for detecting the presence of one or more character strings, hereinafter referred to as search strings, in a collection of characters to be examined. The invention is particularly suitable for detecting computer viruses, but also for simultaneously detecting several search terms.

A computer virus, or simply a virus, is a program that attaches itself to other programs or information carriers and disrupts their proper functioning. A virus, like any other program, comprises a unique collection of characters or a unique code made up of a number of characters. A character (byte) is usually encoded by 8 bits and can therefore take 256 values (0 to 255). For all viruses that have become known so far, character strings have been established that are characteristic of a particular virus. Such a string is referred to as a virus identification string and usually comprises ten to twenty characters. If a combination of characters corresponding to a virus identification string occurs in a computer program, there is a high probability that a virus is present. Hundreds of different viruses are already known. In order to check programs for the presence of viruses, it must therefore be possible to recognize hundreds of different strings. The number of different viruses is expected to increase sharply in the future.

According to the methods currently in use, viruses are traced by means of a scanning device or scanner, which compares the characters present in a computer (for example on the hard disk), string by string, with all known virus strings. The characters to be examined are inspected through 'windows' that are as long as the longest virus string. As soon as a character string examined in this way is found to contain one of the known virus strings, a warning code signal is given.

For each virus (search string), the entire collection of characters (on the hard disk) must be traversed and examined. For hundreds of virus strings, for example, this takes a considerable amount of time. The same applies to looking up other strings in a collection of characters, such as the (simultaneous) lookup of a number of keywords in a database, which at present can only be done sequentially. There is therefore a need for a method of quickly examining a collection of characters for the presence of one or (in particular) more search strings.

The object of the invention is to meet this need and to provide a fast and reliable method for detecting the presence of one or more search strings, for example the identification strings of a number of computer viruses, in a collection of characters. To this end, according to the invention, a method for detecting the presence of one or more character strings, designated as search strings, in a collection of characters to be examined that is stored in a computer memory, is characterized in that one or more combinations of characters from each search string, determined in a predetermined manner, are stored in a memory; that one or more combinations of characters are formed from the collection of characters to be examined in the same predetermined manner; that the latter combinations are compared with the former combinations; and that, upon correspondence between a former combination and a latter combination, a first code signal is emitted.

In order to achieve high efficiency, and thus speed, in comparing the character combinations with each other, the invention preferably provides for the use of table markers. The invention is then preferably characterized in that at least one combination of at least two characters of each search string, determined in a predetermined manner, is stored as a marker in at least one table that is defined in a memory and has at least two dimensions and thus at least two table directions, the first of the at least two characters being mapped in the first table direction, the second in the second table direction, and so on; that one or more combinations of characters are formed from the collection of characters to be examined in the same predetermined manner, and their corresponding position in said table is determined in the same way; that it is checked whether the table position thus determined contains a marker; and that, if said table position contains a marker, a first code signal is emitted.
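The general comparison step described above can be sketched as follows. This is only an illustrative sketch: the function names, the choice of "the first n characters" as the predetermined combination, and the two search strings are assumptions made for the example, not details taken from the patent.

```python
def combination(chars, start, n):
    """Form the predetermined combination: here, n consecutive characters."""
    return tuple(chars[start:start + n])

def first_code_signals(data, search_strings, n=2):
    """Return every offset in data whose combination matches a stored one."""
    # Combinations taken from the search strings and stored in memory.
    stored = {combination(s, 0, n) for s in search_strings}
    # The same kind of combination is formed from the data, window by window.
    return [i for i in range(len(data) - n + 1)
            if combination(data, i, n) in stored]

# Hypothetical search strings and data, used purely for illustration.
signals = first_code_signals(b"xxWOyyVI", [b"VIRUS", b"WORM"])
print(signals)  # [2, 6]
```

Each reported offset is only a first code signal: it marks a place where a more precise comparison with the complete search string is still required.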

The invention also comprises a detection device for detecting one or more character strings, designated as search strings, in a collection of characters to be examined by means of the foregoing method. The device is characterized by a first memory device for storing the collection of characters to be examined; a second memory device for storing the search strings; a third memory device in which at least one n-dimensional table is defined; a converter which converts the at least one predetermined combination of n characters of each search string in the second memory device into a marker in the at least one table of the third memory device, at an n-dimensional position corresponding to the respective values of those n characters; a working memory to which, in operation, successive characters from the first memory device are supplied; a device which repeatedly forms the at least one predetermined combination of n characters from the successive character string in the working memory and determines the corresponding n-dimensional position in said table; and a first comparison device which each time detects the content of the table at the position thus determined and which, if that position contains a marker, emits a first code signal at an output.

In the following, the invention will be described in more detail with reference to the accompanying figures.

Figure 1 shows, by way of example, a 2-dimensional table that can be used in a method and device according to the invention; Figure 2 schematically shows an example of a detection device according to the invention.

The following description takes the detection of one or more computer viruses as an example. As is already apparent from the foregoing, the invention is equally applicable to detecting other character strings in a collection of characters, for example a number of search terms that can be entered via the computer keyboard.

Table 1 shown in Figure 1 contains information about known virus identification strings (search strings). According to the invention, however, this information does not consist of the complete virus identification strings but of a combination of a predetermined number of characters from all known virus identification strings, or at least from all virus identification strings whose presence one wishes to be able to detect. One could say that the table contains an extract of each virus identification string to be detected.

According to the invention, the table is composed in such a way that it is largely empty and can be searched very quickly by a computer or a special scanner. For this purpose, each extract of a virus identification string could, for example, consist of its first two characters. The corresponding string is then indicated in the table by, for example, a marker at the intersection of the row whose number is the number (for example the ASCII number) of the first character and the column whose number is the number of the second character.

When characters are checked for the possible presence of a virus, the first step is merely to determine whether the collection of characters contains a combination of two consecutive characters that would lead to a marker at the same place in the above-mentioned table as one of the virus identification strings. Such a check can be carried out very quickly.

If, at a certain place in the collection, said combination of characters points to a table position that has already been marked by a virus identification string, there is a chance that the collection contains a virus at that place. Only in that case, according to the invention, is a further, more precise examination carried out to determine whether a virus is indeed present.

According to the invention, the collection of characters to be checked is therefore examined successively, window by window. For each examination window (which is only two characters wide when a combination of two consecutive characters is used), the combinations of characters determined in the predetermined manner are compared with the combinations of characters of known virus identification strings that were formed in the same predetermined manner and stored as markers in a table. Only if corresponding combinations are detected in this comparison step is a further examination carried out into the possible presence of a virus.

For further explanation, three virus identification strings a, b and c are taken as an example, consisting of the following consecutive characters:

a) 1 - 33 - 27 - 234
b) 254 - 126 - 99 - 127
c) 0 - 227 - 158 - 216

These numerical values of the characters can be ASCII or similar values and can represent, for example, letters, digits and the like. Of these three strings, the combinations of the first two characters can, for example, be stored in the above-mentioned table, with the first character corresponding to the row number (first table direction) and the second character to the column number (second table direction) of the table. String a) thus leads to a marker at the intersection of row 1 and column 33; string b) to a marker at the intersection of row 254 and column 126; and string c) to a marker at the intersection of row 0 and column 227.
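The marking step for the uncompressed table can be sketched as follows, using the three example strings a, b and c. The names `SEARCH_STRINGS` and `build_table` are assumptions introduced for this illustration, and a boolean list of lists is only one possible way of holding the markers.

```python
# The three example strings from the description, keyed by their letter.
SEARCH_STRINGS = {
    "a": [1, 33, 27, 234],
    "b": [254, 126, 99, 127],
    "c": [0, 227, 158, 216],
}

def build_table(strings):
    """Return a 256 x 256 table with one marker per search string."""
    table = [[False] * 256 for _ in range(256)]
    for chars in strings.values():
        row, col = chars[0], chars[1]  # first character: row, second: column
        table[row][col] = True
    return table

table = build_table(SEARCH_STRINGS)
marked = [(r, c) for r in range(256) for c in range(256) if table[r][c]]
print(marked)  # [(0, 227), (1, 33), (254, 126)]
```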

In the example shown in Figure 1, the table is compressed by using not the actual value of the second character but its "modulo 16" value. The "modulo X" value of a number is the remainder that is left after the number has been divided by X. The modulo 10 value of 17 is thus 7. Likewise, the modulo 16 value of 25 is 9. By using the modulo 16 value of the second character, 16 columns suffice instead of the 256 columns that would normally be required.

In the table of Figure 1, the strings a, b and c therefore lead to the markers indicated by crosses at the positions (1,1), (254,14) and (0,3), since 33 modulo 16 is 1, 126 modulo 16 is 14 and 227 modulo 16 is 3. The table has 16 x 256 = 4096 positions. Two hundred virus identification strings occupy at most two hundred positions, so that the table is largely empty and can be searched very quickly.
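The positions in the compressed table can be checked with a few lines of code. The helper name `table_position` is an assumption made for this sketch; the strings and the modulus 16 are those of the example.

```python
SEARCH_STRINGS = {
    "a": [1, 33, 27, 234],
    "b": [254, 126, 99, 127],
    "c": [0, 227, 158, 216],
}

def table_position(first, second, modulus=16):
    """Row is the first character; column is the second character modulo `modulus`."""
    return first, second % modulus

positions = {name: table_position(chars[0], chars[1])
             for name, chars in SEARCH_STRINGS.items()}
print(positions)  # {'a': (1, 1), 'b': (254, 14), 'c': (0, 3)}
```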

If the number of known virus identification strings increases, the table can be enlarged if desired by recording the second characters as a modulo Y value, where Y > 16, or by recording the "normal" value of the second character. In the latter case the table has 256 rows and 256 columns.
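The effect of the modulus on the table size, and on how sparsely the table is filled, follows from simple arithmetic. In the sketch below the function names and the intermediate value Y = 64 are assumptions chosen for illustration; 16 and 256 are the values mentioned in the description.

```python
def table_cells(modulus, rows=256):
    """Number of positions in a table with `rows` rows and `modulus` columns."""
    return rows * modulus

def max_occupancy(strings, modulus, rows=256):
    """Upper bound on the fraction of marked positions (one marker per string)."""
    return strings / table_cells(modulus, rows)

for modulus in (16, 64, 256):
    cells = table_cells(modulus)
    print(f"modulo {modulus}: {cells} positions, "
          f"at most {max_occupancy(200, modulus):.2%} marked by 200 strings")
```

A larger modulus keeps the table sparser as the number of strings grows, at the cost of more memory.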

To check a collection of characters for the possible presence of a virus, the table position belonging to each pair of consecutive characters of that collection is determined in turn. If at some point that position corresponds to a marker (cross) belonging to one of the known viruses, there is a chance that the collection contains a virus at that place. If the position found does not correspond to a virus marker, the next window of two characters of the collection is checked, and so on.
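The window-by-window check can be sketched as a single loop over pairs of consecutive characters. The function name `candidate_offsets` and the input sequence are assumptions invented for this illustration; the marker positions are those of Figure 1 with the modulo 16 compression.

```python
# Marker positions of Figure 1: (first character, second character modulo 16).
MARKERS = {(1, 1), (254, 14), (0, 3)}

def candidate_offsets(data, markers, modulus=16):
    """Return the offsets of all two-character windows that hit a marker."""
    hits = []
    for i in range(len(data) - 1):
        position = (data[i], data[i + 1] % modulus)
        if position in markers:
            hits.append(i)  # a virus may start here; closer inspection needed
    return hits

# Hypothetical input, chosen only to produce two hits.
hits = candidate_offsets([5, 0, 19, 7, 254, 30], MARKERS)
print(hits)  # [1, 4]
```

Each window costs one table lookup regardless of how many markers the table holds, which is consistent with the observation that the checking time hardly depends on the number of virus markers.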

Such a check can be carried out very quickly. Moreover, the speed at which the check can be carried out hardly depends on the number of virus markers. Using the table shown in Figure 1, in which the modulo 16 value of the second character is always used, a difference in checking time of only approximately 0.1% was observed between the situation in which 150 virus markers are present in the table and the situation in which only one virus marker is present in the table.

By way of example, the checking of a character file using the table of Figure 1, with the indicated virus markers at the positions (0,3), (1,1) and (254,14), will now be described.

Suppose that a collection of characters contains the following characters: 17 - 28 - 254 - 110 - 1 - 33 - 27 - 67 - 35.

To check these characters for the possible presence of a virus, the first two characters are taken: (17,28). The corresponding table position is (17,12). The table has no cross at that position, so this combination does not correspond to a virus. The next window comprises the two characters (28,254); position (28,14) does not occur in the table. The next characters to be considered are (254,110); position (254,14) does occur in the table, because the modulo 16 value of 110 is 14. A virus may now have been found, so a closer examination is needed. For all virus strings that have 254 as their first character, the characters are now compared one by one with the characters in the collection. For string b), the first character matches but the second character (126 versus 110) does not. Virus string b) therefore does not occur at this place. Since this was also the only virus string whose first character is 254, no virus string with 254 as its first character occurs here.

Further on in the characters to be checked, the combination 1, 33 of two consecutive characters occurs. The modulo 16 value of 33 is 1, so this combination corresponds to position (1,1) in the table. A virus marker is indeed present at this position, so it must be checked whether a virus is present.

To this end, all virus strings that have a first character with the value 1 are compared with the characters from the character file, starting at the character with the value 1. Virus identification string a) begins with a character with the value 1, and its next character has the value 33, as does the next character of the character file. The characters that follow (27) are also equal. The fourth character of the virus identification string (234), however, differs from the fourth character of the character file counted from the character with the value 1 (67), so it must be concluded that virus a) is not present. Nor does the set of known virus identification strings contain any other strings beginning with a character with the value 1, so the character file does not contain a (known) virus.
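The complete worked example, table check followed by the closer examination, can be reproduced with the sketch below. The function names and the structure of the return value are assumptions made for this illustration; the strings, the modulus and the first sample are those of the example, while the second sample is a hypothetical input added to show a positive result.

```python
SEARCH_STRINGS = {
    "a": [1, 33, 27, 234],
    "b": [254, 126, 99, 127],
    "c": [0, 227, 158, 216],
}
MODULUS = 16

def build_markers(strings, modulus=MODULUS):
    """Table of Figure 1 as a set of (first, second modulo 16) positions."""
    return {(s[0], s[1] % modulus) for s in strings.values()}

def scan(data, strings, modulus=MODULUS):
    """Return (offsets flagged by the table, confirmed (name, offset) matches)."""
    markers = build_markers(strings, modulus)
    candidates, matches = [], []
    for i in range(len(data) - 1):
        if (data[i], data[i + 1] % modulus) not in markers:
            continue  # fast path: no marker, move to the next window
        candidates.append(i)
        # Closer examination: only strings with the same first character.
        for name, s in strings.items():
            if s[0] == data[i] and data[i:i + len(s)] == s:
                matches.append((name, i))
    return candidates, matches

sample = [17, 28, 254, 110, 1, 33, 27, 67, 35]
print(scan(sample, SEARCH_STRINGS))    # ([2, 4], [])

infected = [9, 1, 33, 27, 234, 4]      # hypothetical data containing string a)
print(scan(infected, SEARCH_STRINGS))  # ([1], [('a', 1)])
```

For the sample from the description, the table flags the windows starting at 254 and at 1, and the closer examination rejects both, in line with the conclusion that no known virus is present.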

When the method according to the invention is used, a larger number of consecutive characters of a character file actually needs to be compared with some or all of the characters of a virus identification string only in a small number of cases. Since this comparison is relatively time-consuming, the method according to the invention leads to a considerable saving of time. In practice, the relatively time-consuming check needs to be carried out in only about 1 in 10,000 cases, while in the remaining cases a check using the table suffices.

As a result, when the invention is applied, the virus check can take place, for example, at the same time as data is entered or copied.

Figure 2 schematically shows an example of a device according to the invention. The device shown comprises a memory table ST, in which the known virus identification strings are stored in complete form. There is also a memory LT, in which a table is defined and in which the extracts of the virus identification strings are stored. This table thus corresponds to the table of Figure 1, which in the example described contains the first two characters of the virus identification strings.

A converter TT forms, from the characters in the memory ST, the markers to be placed in the table LT.
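The role of the converter TT can be sketched in a few lines of Python. This is a minimal illustration, not the device of the patent: the function name `build_lookup_table`, the 256 x 256 layout (which assumes 8-bit characters) and the sample identification strings are all assumptions made for the example.

```python
def build_lookup_table(search_strings):
    """Mark, for every search string, the position given by its first two bytes."""
    # 256 x 256 table of flags: row = first character, column = second character.
    table = [bytearray(256) for _ in range(256)]
    for s in search_strings:
        if len(s) < 2:
            raise ValueError("each search string needs at least two characters")
        table[s[0]][s[1]] = 1
    return table


# Hypothetical identification strings, used only for illustration.
ST = [b"\xeb\x3c\x90VIR", b"\xcd\x21\xb4\x4c"]
LT = build_lookup_table(ST)
marked = sum(sum(row) for row in LT)  # number of marked table positions
```

With two strings that differ in their leading pair, two of the 65,536 positions are marked, so most pairs of characters taken from the data to be checked will land on an unmarked position.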

The set of characters to be checked is stored in a memory MS, and each time a part of the characters is transferred from that memory MS to a working memory W, which forms part of a comparison device C. The comparison device C comprises a first comparator C1, which can compare the subset of characters in the working memory W with the table LT. If a compressed table is used, in which the second character is in each case represented as a modulo X or modulo Y number, the second character of a pair of consecutive characters must additionally be translated into the associated modulo X or modulo Y value. This translation can take place between the memory MS and the working memory W, between the working memory W and the comparator C1, in the working memory W, or in the comparator C1. In the example shown, a modulo converter M is placed between the working memory W and the first comparator C1.
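The effect of the modulo converter M on a compressed table can be illustrated as follows. This is a sketch under stated assumptions: 8-bit characters, an arbitrarily chosen modulus X = 16, and hypothetical function names `build_compressed_table` and `first_stage_hit`.

```python
X = 16  # assumed modulus; must be smaller than 2**8 for 8-bit characters


def build_compressed_table(search_strings, modulus=X):
    """Table of 256 rows by `modulus` columns; the second character is reduced modulo X."""
    table = [bytearray(modulus) for _ in range(256)]
    for s in search_strings:
        table[s[0]][s[1] % modulus] = 1
    return table


def first_stage_hit(table, first, second, modulus=X):
    """Apply the same modulo translation to the data before the table look-up."""
    return table[first][second % modulus] == 1


# Hypothetical identification string, used only for illustration.
table = build_compressed_table([b"\xeb\x3c\x90"])
```

The compressed table is smaller, but different second characters can share a column (here 0x3C and 0x4C both reduce to 12), so it produces more first code signals that the full comparison later rejects. It does not miss a pair that the uncompressed table would have flagged.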

If the comparison step performed by the first comparator C1 shows that the table position corresponding to a pair of consecutive characters from the set of characters to be checked carries a marker belonging to a virus identification string, the first comparator C1 supplies a first code signal to a second comparator C2.

The second comparator C2 has two inputs. One input is connected directly to the memory table ST, which contains the complete virus identification strings.

The other input is connected to the working memory W, so that the second comparator, after receiving a first code signal from the first comparator, can compare the characters stored in the working memory with the complete virus identification strings stored in the memory ST.

If it turns out that a character string matching a virus identification string does indeed occur in the checked characters, a second (alarm) code signal is supplied to a memory RS.
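The interplay of the two comparators can be sketched in software as follows. This is an illustrative model only, assuming 8-bit characters, an uncompressed table built from the first two characters, and search strings of at least two characters; the function name `scan` and the sample data are hypothetical.

```python
def scan(data, search_strings):
    """Return (offset, string) pairs where a search string occurs in data."""
    # Table LT: one flag per pair of leading characters.
    table = [bytearray(256) for _ in range(256)]
    for s in search_strings:
        table[s[0]][s[1]] = 1

    alarms = []  # plays the role of the memory RS
    for i in range(len(data) - 1):
        # First comparator C1: a single table look-up per pair of characters.
        if not table[data[i]][data[i + 1]]:
            continue
        # Second comparator C2: full comparison, only after a first code signal.
        for s in search_strings:
            if data.startswith(s, i):
                alarms.append((i, s))
    return alarms


# Hypothetical data and identification strings, used only for illustration.
hits = scan(b"....\xeb\x3c\x90VIR....", [b"\xeb\x3c\x90VIR", b"\xcd\x21"])
```

The inner loop over the complete strings runs only at offsets where the table holds a marker, which is why the expensive comparison is needed so rarely.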

An alarm device, such as an LED, can also be activated.

If, after all the characters to be checked have been compared with the table LT and, where applicable, with the table ST, the memory RS turns out to contain an alarm indication (second code signal), further measures must be taken, in particular to render the detected virus(es) harmless.

If, on the other hand, the memory RS contains no alarm indication after the check has been carried out, the checked characters can actually be put to further use, for example to run an application program.

It is noted that, in the light of the foregoing, various modifications will be obvious to the person skilled in the art.

For example, several tables can be used. A first table can then, as described above, represent the first and second characters of a virus identification string; a second table the second and third characters; a third table the third and fourth characters; a fourth table the fourth and second characters; etc.

When more than one table is used, a check is first carried out using the first table. If that table contains a marker at the position corresponding to the relevant characters of a string of characters, a further check is carried out using the next table. Only when markers have been found in all tables is a comparison with one or more complete virus identification strings carried out.
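A cascade of tables of this kind might be modelled as below. The sketch assumes 8-bit characters and, for simplicity, that table k represents the consecutive characters at positions k and k+1 of each search string; the function names `build_tables` and `passes_all_tables` are hypothetical, and every search string must be at least one character longer than the number of tables.

```python
def build_tables(search_strings, count):
    """Table k marks the characters at positions k and k+1 of every search string."""
    tables = [[bytearray(256) for _ in range(256)] for _ in range(count)]
    for s in search_strings:
        for k in range(count):
            tables[k][s[k]][s[k + 1]] = 1
    return tables


def passes_all_tables(tables, data, i):
    """True only if every table holds a marker for its pair of characters at offset i."""
    if i + len(tables) >= len(data):
        return False  # the pairs would run past the end of the data
    # all() stops at the first table without a marker, like the cascade in the text.
    return all(t[data[i + k]][data[i + k + 1]] for k, t in enumerate(tables))


# Hypothetical search string, used only for illustration.
tables = build_tables([b"ABCD"], 3)
```

Only offsets for which `passes_all_tables` returns true would then be handed on to the comparison with the complete strings.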

As already mentioned, one or more compressed tables or uncompressed tables can be used as desired. Tables with three or more dimensions (matrices, arrays) can also be used.

Other search strings can be detected in the same manner as described above, for example when searching a database for the occurrence of a number of search terms.

Claims

Method for detecting the presence of one or more character strings, further referred to as search strings, in a collection of characters to be examined which is stored in a computer memory, characterized in that one or more predetermined combinations of characters from each search string are stored in a memory, that in the same predetermined manner one or more combinations of characters are formed from the set of characters to be examined, that said latter combinations are compared with the former combinations and that, upon correspondence between a former combination and a latter combination, a first code signal is issued.

Method according to claim 1, characterized in that at least one predetermined combination of at least two characters of each search string is stored as a marker in at least one table defined in a memory, the table being at least two-dimensional with at least two table directions, the first of the at least two characters being represented in the first table direction, the second in the second table direction, etc., that in the same predetermined manner one or more combinations of characters are formed from the set of characters to be examined and their corresponding location in said table is determined, that it is checked whether the table location thus determined contains a marker and that, if said table location contains a marker, a first code signal is issued.

Method according to claim 2, wherein each character is encoded by x bits, characterized in that the modulo X value of the character concerned is used to determine the table position of at least one of the characters belonging to a combination of characters, wherein X < 2^x.

Method according to claim 2 or 3, characterized in that two or more tables are used, the locations in each table corresponding to a predetermined specific combination of characters of a search string, respectively of the set of characters to be examined.

Method according to claim 1 or 2, characterized in that, after said first code signal has been issued, the set of characters to be examined is compared at the relevant location with the relevant search string or at least a part thereof, wherein, if agreement is found, a second code signal is issued.

Detection device for detecting one or more character strings, further referred to as search strings, in a collection of characters to be examined, using the method according to any one of the preceding claims, characterized by a first memory device (MS) for storing the set of characters to be examined, a second memory device (ST) for storing the search strings, a third memory device (LT) in which at least one n-dimensional table is defined, a converter (TT) which converts at least one predetermined combination of n characters from each search string in the second memory device (ST) into a marker in at least one table of the third memory device (LT) at an n-dimensional location corresponding to the respective values of those n characters, furthermore a working memory (W) to which successive characters from the first memory device are supplied in operation, a device (M) which each time forms, from the successive characters supplied to the working memory (W), the at least one predetermined combination of n characters and determines the corresponding n-dimensional location thereof in said table, and a first comparator (C1) which each time detects the content of that table at the table location thus determined and which, if that table location contains a marker, supplies a first code signal to an output.

Detection device according to claim 6, characterized by a second comparison device (C2), comprising a control input connected to the output of the first comparison device, an output, a first input connected to the working memory (W) and a second input connected to the second memory device (ST), for comparing the content of at least a part of the working memory with at least a part of the content of the second memory device after receipt of said first code signal and for supplying a second code signal if agreement is found.

Detection device according to claim 7, characterized in that the output of the second comparison device (C2) is connected directly or indirectly to an alarm device (RS).

Detection device according to any one of claims 6 to 8, characterized in that the maximum value of a dimension of the at least one n-dimensional table is equal to the maximum value of the character corresponding to that dimension from said combination of n characters.

Detection device according to any one of claims 6 to 9, wherein each character is encoded by x bits, characterized in that the maximum value of a dimension of the at least one n-dimensional table is equal to the modulo X value of the maximum value of the character corresponding to that dimension from the combination of n characters, where X < 2^x.

Computer system provided with a detection device according to any one of claims 6 to 10.