WO2007020466A2

WO2007020466A2 - Data classification apparatus and method

Info

Publication number: WO2007020466A2
Application number: PCT/GB2006/003111
Authority: WO
Inventors: Christopher Kirkham; Roberta Cambio; Helge Nareid
Original assignee: Axeon Limited
Priority date: 2005-08-19
Filing date: 2006-08-18
Publication date: 2007-02-22
Also published as: WO2007020466A3; WO2007020456A3; WO2007020456A2

Abstract

The present invention relates to data classification apparatus (10). The data classification apparatus (10) comprises a data processor (13) operative to present a classification input, which corresponds to data to be classified, to each of a plurality of neural networks (15). Each neural network of the plurality of neural networks (15) has a different response characteristic corresponding to a different, predetermined classification, with each neural network being operable to produce a network output in dependence upon the neural network's response characteristic and the received classification input. The data classification apparatus also has a classification processor (17) operable to receive a network output from each neural network of the plurality of neural networks (15) and, in dependence upon at least one received network output, to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications.

Description

Data Classification, Apparatus and Method

Field of the invention

The present invention relates to data classification apparatus and a method of data classification, which make use of neural networks .

Background to the invention

Artificial Neural Network (ANN) technology is widely used in a variety of applications to which conventional programming and processing techniques are unsuited. These applications include systems having complex data relationships, such as problems involving a large number of variables, classification applications, pattern recognition and control applications, and function estimation.

Different neural network architectures have been developed for different applications, such as SeIf- Organising Map (SOM) networks described in Kohonen, T., ISBN 3-540-67921-9, published by Springer, or Learning Vector Quantisation (LVQ) networks and a number of derived models .

An alternative neural network architecture is the subject of International Patent Publication number WO 00/45333 in the name of Axeon Limited, and is marketed under the Vindax® technology brand. The technology concerns a modular approach to neural network architecture, based on an adaptation of the Kohonen SOM algorithm. The technology is generally referred to as the modular map processor or architecture. The modular map technology has particular application as a classifier, in which a discrete value is output from a set of possible outputs.

US Patent Number 6,805,668 describes an ANN structure that is used for classification applications and which contains a number of networks in parallel whose responses are numerically integrated to obtain a cumulative result from the system. In US 6,805,668 "backpropagation" (MLP) networks are utilised. Target output values for the training data are therefore required in order for the algorithm to function.

The present inventors have appreciated that known artificial neural networks for classification applications have shortcomings.

It is therefore an object of the present invention to provide data classification apparatus and a method of classifying data.

It is a further object of the present invention to provide data classification apparatus comprising a neural network structure and a method of classifying data comprising operating a neural network structure.

Statement of invention

According to a first aspect of the present invention, there is provided data classification apparatus comprising: a data processor operative to present a classification input, which corresponds to data to be classified, to each of a plurality of neural networks, each neural network of the plurality of neural networks having a different response characteristic corresponding to a different, predetermined classification, each neural network being operable to produce a network output in dependence upon the neural network's response characteristic and the received classification input; and a classification processor operable to receive a network output from each neural network of the plurality of neural networks and, in dependence upon at least one received network output, to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications. 3 4 More specifically, the classification processor may be 5 operative to determine that the classification input 6 belongs to none of the plurality of different 7 predetermined classifications. Q O 9 Alternatively or in addition, the classification 0 processor may be operative to receive a network output 1 from each neural network and to make its determination in 2 dependence on the received network outputs. In use, the data classification apparatus may, for example, be operative to classify data from an automobile engine as being indicative of normal combustion events in the engine, of misfire events in the engine or of neither normal combination nor misfire events. More specifically, a classification input, which corresponds to the acquired data, may be presented to each of two neural networks. In this example, a first of the two neural networks has been trained to have a response characteristic corresponding to normal combustion. The second of the two neural networks has been trained to have a response characteristic corresponding to misfire events. The classification processor may receive network outputs from each of the two neural networks and in dependence upon the two network outputs may determine that the acquired data represents either a misfire event or normal combination. Alternatively, the classification processor may be operative to determine that the acquired data represents neither a misfire event nor normal combustion.

The ANN structure of US Patent Number 6,805,668 has three neural networks, with each network receiving sleep data and generating a sleep stage score by selecting one of six scores or classifications. In contrast, the data classification apparatus of the present invention has a plurality of networks, with each network having a response characteristic that corresponds to a single, different classification rather than a plurality of classifications common to all networks. Thus, the data classification apparatus of the present invention comprises at least the same number of neural networks as different predetermined classifications. For example, data classification apparatus having two classifications, such as misfire events and normal combustion, has at least two neural networks.

The data classification apparatus of the present invention differs from known neural network apparatus comprising a single neural network. Such single neural network arrangements include those described in EP 546835 and EP 689154.

As mentioned above, WO 00/45333 relates to a modular approach to neural network architecture. The modular approach described in WO 00/45333 involves providing a plurality of neural network modules such that they have either a lateral configuration or a hierarchical configuration. In the lateral configuration, each of plurality of neural network modules is configured to respond to a different part of a map for a particular classification. An input vector is provided to each of the plurality of neural network modules and the network responds with a single active neuron, which belongs to one of the neural network modules. In other words, operation of the neural network modules WO 00/45333 is synchronised such that the modules operate together as a single neural network. In effect, the lateral configuration provides a large network that is spread across the plurality of modules. In the hierarchical configuration, each of a plurality of neural network modules forming an input layer is provided with a different part of an input vector and the outputs from the modules provided as inputs to an output layer comprising fewer network modules than the input layer. In effect, the hierarchical configuration caters for input vectors that are larger than can be accepted by a single neural network module. Irrespective of the configuration, WO 00/45333 describes architectures in which a plurality of neural network modules functions as a single network that caters by means of the modular approach for either a larger map or larger input vector than can be accommodated by a single neural network module. In contrast, the present invention relates to a configuration in which each of the plurality of networks can operate autonomously to provide an output corresponding to a different classification (e.g. misfire or normal combustion) , with the outputs being subject to arbitration by the classification processor.

Alternatively, the classification processor may be operative to determine that the classification input belongs to at least two of the plurality of different predetermined classifications.

Alternatively or in addition, the plurality of neural networks may be comprised in a single neural network array. Thus, although the plurality of networks form part of one neural network array, each of the plurality of neural networks may be configured and operative as a separate neural network.

Alternatively or in addition, the plurality of networks may be comprised in an unsupervised neural network arrangement .

More specifically, the plurality of neural networks may be comprised in a Self -Organising Map ( SOM) neural network arrangement . Alternatively or in addition, at least one neural network may be operative such that the produced network output comprises a distance metric associated with a responding (i.e. winning) neuron of the neural network.

More specifically, at least one neural network may be operative such that the produced network output comprises a response metric associated with a responding neuron of the neural network, the response metric representing a smallest distance value for the responding neuron. The smallest distance will have been determined for the responding neuron, and all other neurons, upon presentation of classification inputs corresponding to a set of training data to the neural network.

Alternatively or in addition, at least one neural network may be operative such that the produced network output comprises an activation frequency metric associated with a responding neuron of the neural network. The activation frequency metric may represent a number of times that the responding neuron has been determined as being a winning neuron upon presentation of a set of training data to the neural network.

Alternatively or in addition, the produced network output may comprise a reinforced metric, the reinforced metric being based on a combination of a distance metric and a response metric. More specifically, the reinforced metric may be the sum of the distance metric and the response metric. Alternatively or in addition, where at least one neural network is comprised in an SOM neural network arrangement the at least one neural network may be operative such that the produced network output comprises at least one of a relative array reference and a weight metric of a responding neuron of the neural network.

At least one of the distance metric, the response metric and the activation frequency metric may have been predetermined as a consequence of a training procedure.

Alternatively or in addition, the classification processor may be operative to determine the different, predetermined classification of the plurality of different, predetermined classifications to which the classification input belongs in dependence on at least one metric associated with a responding neuron of each neural network.

More specifically, the data classification apparatus may be operative to determine the different, predetermined classification in dependence upon at least one of: a correlation; a comparison with a threshold value; and an ANN classifier.

Where the determination is in dependence upon a correlation, the correlation may comprise at least one of correlating the at least one metric with at least one metric associated with a responding neuron of at least one further neural network. More specifically, the correlation may comprise at least one of: division of respective metrics to determine a ratio; and full cross correlation of respective metrics.

Alternatively or in addition, the different, predetermined classification may be determined in dependence upon a committee decision based upon cross- correlation.

In one form of the invention, the data classification apparatus may be operative to provide a confidence level output indicative of a level of confidence with which the classification input belongs to at least one of the plurality of different, predetermined classifications.

More specifically, the confidence level output may be indicative of a level of confidence with which the classification input belongs to the different, predetermined classification that the classification processor has determined the classification input to belong.

A confidence level output can be advantageous when the data classification apparatus is operating on data to be classified in high noise environments, e.g. data acquired from an automobile engine. For example, where the data classification apparatus comprises two neural networks, one for a misfire classification and the other for a normal combustion classification, the apparatus can provide a confidence level for each classification, such as an 80% confidence level that the data to be classified is a misfire and a 20% confidence level that the data to be classified is normal combustion. Thus, upon making its determination the classification processor can determine that the data to be classified belongs the misfire classification with an 80% confidence level .

Alternatively or in addition, the confidence level output may be determined in dependence upon a statistical measure of a metric associated with a responding neuron of each neural network, the statistical measure indicating a likelihood of a given value for the metric being obtained.

The metric may comprise at least one of: a distance metric, a response metric, an activation frequency metric and a reinforced metric.

Alternatively or in addition, the data classification apparatus may be operative such that a metric associated with a responding neuron of each neural network is compared with the statistical measure.

The data classification apparatus may be configured such that the statistical measure of the at least one metric, e.g. a reinforced metric, is stored in the data classification apparatus as a mean and a variance. Thus, a statistical measure having a form of a normal distribution is characterised by the mean and the variance.

In a second form of the present invention, at least one neural network of the plurality of neural networks may comprise a first neural network and a second neural network. More specifically, the first neural network may have a response characteristic corresponding to a first part of a classification and the second neural network may have a response characteristic corresponding to a second, different part of the classification, each of the first and the second neural networks being operative to produce a network output in dependence upon its response characteristic.

Alternatively or in addition, the first neural network may have a response characteristic configured to respond to a classification input that belongs to first set of data and the second neural network may have a response characteristic configured to respond to a classification input that belongs to a second, different set of data, one of the first and the second neural networks being operative to produce a network output in dependence upon a classification input belonging to either the first or the second set of data.

Alternatively or in addition, the at least one neural network may further comprise a secondary classification processor operative to receive a network output from at least one of the first and second neural networks.

Where the first neural network has a response characteristic corresponding to a first part of a classification and the second neural network has a response characteristic corresponding to a second, different part of the classification, the secondary classification processor may be operative to determine to which of the network outputs produced by the first and second neural networks the classification input belongs. Where the first neural network has a response characteristic configured to respond to a classification input that belongs to first set of data and the second neural network has a response characteristic configured to respond to a classification input that belongs to a second, different set of data, the secondary classification processor may be operative to receive a network output from one or other of the first and second neural networks and to convey the received network output to the classification processor.

In a third form of the present invention, where the classification processor is further operative to determine that the classification input belongs to none of the plurality of different predetermined classifications, said classification input constituting a rejected classification input, the data classification apparatus may be re-configured to comprise a further neural network having a response characteristic corresponding to a further classification to which the rejected classification input belongs, the further neural network being operative to produce a network output in dependence on its response characteristic and a further classification input.

Thus, a rejected classification input may be used to create a fresh classification in the form of a further neural network having a response characteristic corresponding to the fresh classification. The rejection of the classification input may thus represent a change, for example, in a condition of apparatus, such as an engine, from which data to be classified is acquired. More specifically, the data classification apparatus may be operative to re-configure itself as comprising the further neural network in dependence upon a determination by the classification processor that the classification input belongs to none of the plurality of different predetermined classifications. Thus, the data classification apparatus may be operative to re-configure itself during use, e.g. by a user in the field.

Alternatively or in addition, the data classification apparatus may be configured to comprise the further neural network as part of a re-training process, such as may be carried out at a service location.

The data processor may be configured to receive the data to be classified and to be operative to produce a classification input corresponding to the data to be classified.

The data classification apparatus may be configured to convey the classification input to the classification processor .

Alternatively or in addition, the data processor may be further operative to receive the data to be classified and to perform at least one task of: validation of the data to be classified; conditioning of the data to be classified; parameter reduction; and classification input scaling.

More specifically, the data classification apparatus may be configured to convey a result of performing the said at least one task from the data processor to the classification processor.

Alternatively or in addition, the data to be classified may comprise at least one of digital data and analogue data.

Alternatively or in addition, at least one of the neural networks may have a plurality of neurons and a function processor, the function processor being operable to receive the network output from a neuron of the neural network and to provide a processor output in dependence upon the received output and a trained response characteristic of the . function processor. Thus, the processor output constitutes a network output that the classification processor is operable to receive.

More specifically, the at least one neural network may comprise a plurality of function processors.

More specifically, the at least one neural network may comprise fewer function processors than neurons in the neural network. Therefore, the at least one neural network may comprise a plurality of function processors, each of the function processors being operable to receive outputs from a plurality of (e.g. four) neurons. Thus, sets of weights of reference vectors of the plurality of neurons may be stored in the associated function processor and the at least one neural network may be operative to select, for use, a corresponding one of the sets of weights. The selection may be in dependence upon operation of one of the plurality of neurons. For example, the selection may be by means of a pointer to one of the sets of weights.

Alternatively, the at least one neural network may comprise at least a same number of function processors as neurons in the neural network, with a function processor being operative to receive an output from a respective neuron of the neural network. Thus, a set of weights of a reference vector of a neuron may be stored in the associated function processor.

In another form, the at least one neural network apparatus may comprise one function processor operable to receive an output from each of the plurality of neurons in the neural network. Thus, sets of weights for the function processor may be stored in the data classification apparatus, and the function processor may be operative, in use, to receive a set of weights corresponding to an operative one of the plurality of neurons.

Alternatively or in addition, the at least one neural network may be configured such that the network output from the one neuron is received in a neighbouring function processor, the neighbouring function processor being operative to provide a neighbourhood processor output. In use, the processor output and neighbourhood processor output may be used to provide for an improvement in approximation accuracy towards a transition between the subspaces of the neighbouring function processors. An overall response characteristic of the at least one neural network may correspond to a function that defines a particular classification.

More specifically, the neural network may be operative to provide a first approximation to the function. Thus, the at least one function processor may be operative to provide an improved approximation to the function in relation to the first approximation and in a subspace of the function associated with the neuron that provides an output to the function processor.

Alternatively or in addition, the trained response characteristic of the function processor may comprise a numerical function. For example, the numerical function may be a linear polynomial. Thus, the trained response characteristic of the function processor, which defines a part of the function defined by an overall response characteristic of the neural network and the function processor, can be simple in comparison to the function defined by the overall response characteristic. Hence, complicated functions can be accommodated by means of the neural network and function processor structure whilst reducing processing demands.

Alternatively or in addition, the at least one function processor may comprise at least one perceptron of a further neural network.

The present invention may be applied in the fields of machine learning, artificial intelligence, neural networks, and Self Organising Maps/Networks . The present invention may be at least partially embodied in hardware, such as in embedded Digital Signal Processors (DSPs) and silicon solutions.

According to a second aspect of the present invention there is provided a method of classifying data, the method comprising: receiving a classification input, which corresponds to data to be classified, in each of a plurality of neural networks, each neural network of the plurality of neural networks having a different response characteristic corresponding to a different, predetermined classification; operating each neural network to produce a network output in dependence upon the neural network's response characteristic and the received classification input; receiving a network output from at least one neural network of the plurality of neural networks in a classification processor; and operating the classification processor in dependence upon the received network output to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications.

Embodiments of the second aspect of the present invention may comprise one or more features of the first aspect of the present invention.

According to a third aspect of the present invention, there is provided a computer program comprising program instructions for causing a computer to perform the method of the second aspect of the present invention. More specifically, the computer program may be at least one of: embodied on at least one of a record medium, stored in a computer memory, embodied in a read-only memory; and carried on an electrical signal.

Further embodiments of the third aspect of the present invention may comprise one or more features of the first aspect of the present invention.

According to a fourth aspect of the present invention there is provided condition monitoring apparatus comprising data classification apparatus according to the first aspect of the present invention.

More specifically, the condition monitoring apparatus may further comprise at least one sensor operative to provide data to be classified to the data classification apparatus.

Alternatively or in addition, the condition monitoring apparatus may further comprise an actuator operable to control further apparatus in dependence on operation of the data classification apparatus.

Alternatively or in addition, the condition monitoring apparatus may further comprise a user output operative to provide a signal discernable by a user in dependence on operation of the data classification apparatus.

Further embodiments of the fourth aspect of the present invention may comprise one or more features of the first aspect of the present invention. According to a fifth aspect of the present invention there is provided an automobile comprising data classification apparatus according to the first or fourth aspect of the present invention.

More specifically, the data classification apparatus may be operative to perform at least one of: misfire detection; emissions monitoring; and catalyser performance monitoring.

According to a sixth aspect of the present invention, there is provided a method of training a data classification apparatus, the method comprising: presenting training data to each of a plurality of neural networks of the data classification apparatus, training data presented to each neural network consisting substantially of data belonging to a different, predetermined classification, whereby each neural network has a different response characteristic corresponding to the different, predetermined classification; each neural network being operable to produce a network output in dependence upon a received classification input, which corresponds to data to be classified, and the neural network's response characteristic, the apparatus comprising a classification processor operable to receive- a network output from at least one neural network and to determine, in dependence upon the received network output, if the classification input belongs to at least one of the plurality of different, predetermined classifications.

More specifically, presenting training data to each of a plurality of neural networks of the data classification apparatus may change a reference vector of neurons in each neural network.

Alternatively or in addition, the method may further comprise after the step of presenting training data to each of a plurality of neural networks the step of presenting further training data to each of the plurality of neural networks.

More specifically, a reference vector of neurons in each neural network is not changed in dependence upon the received further training data.

Alternatively or in addition, the method may further comprise the step of determining a metric in dependence upon the received further training data.

More specifically, the metric may comprise at least one of distance metrics for each neuron, a minimum distance metric for each neuron and an activation frequency metric for each neuron.

Alternatively or in addition, the method may comprise the step of determining a probability distribution for a metric determined in dependence upon the received further training data.

Alternatively or in addition, at least one neural network may have a plurality of neurons and at least one function processor, the function processor being operable to receive an output from at least one of the plurality of neurons and to provide a processor output in dependence upon the received output, and the method may further comprise: receiving a first set of training data in the neural network, the neural network being operative to adopt a trained response characteristic in dependence upon the received first set of training data, and receiving a second set of training data in the function processor, the function processor being operative to adopt a trained response characteristic in dependence upon the received second set of training data, in which the function processor is operative to adopt its trained response characteristic after the neural network is operative to adopt its trained response characteristic.

Thus, the at least one neural network may have what can be considered to be a two layer structure, with the first layer comprising the neural network itself and the second layer comprising the function processor. An increase in accuracy may be obtained by the function processor providing a further approximation within a subspace (of the total state space) associated with at least one neuron of the neural network.

The method of training a neural network according to the two immediately preceding paragraphs may take advantage of the neural network - function processor architecture by training the neural network on a first set of training data and thereafter training the function processor on a second set of training data. Thus the two training stages can have independent dynamics. This means that more rapid convergence can be obtained during training compared with an approach in which the neural network and function processor are trained at the same time. More specifically, the second set of training data may be received in the function processor after the first set of training data is received in the neural network.

Alternatively or in addition, the first set of training data may be different from the second set of training data. Thus, data contained in the first and second sets may be determined to provide for at least one of: an improved rate of convergence during training; and an improvement in a degree of accuracy of a function approximated by the neural network.

More specifically, the second set of training data may be a subset of the first set of training data. For example, the second set of training data may comprise data of the first set of training data, which is associated with a subspace of the neuron from which the function processor is operative to receive an output. Thus, the second set of training data may be determined in dependence upon the first set of training data.

Alternatively or in addition, the method may further comprise a step of receiving a third set of training data in the function processor, the function processor being operative to modify its trained response characteristic in dependence upon the received third set of training data.

More specifically, the third set of training data may comprise at least one data element not comprised in the second set of training data. More specifically, the at least one data element not comprised in the second set of training data may be determined based on an analysis of the trained response characteristic adopted in dependence upon the received second set of training data. For example, where the analysis determines that the response characteristic is based upon insufficient data elements to properly characterise a function, further appropriate data elements may be determined and be comprised in the third data set.

Alternatively or in addition, the at least one data element not comprised in the second set of training data may be determined based on a response characteristic of at least one further function processor associated with at least one neuron neighbouring the neuron from which the output is received by the function processor. Thus, the content of the third data set can be determined to reduce a discontinuity that may be present in a transition between the subspace associated with the neuron from which the output is received by the function processor and at least one neighbouring subspace.

Alternatively or in addition, the neural network may comprise a plurality of function processors, each of the plurality of function processors being operable to receive an output from a respective neuron of the neural network.

In a first form, the neural network may comprise at least a same number of function processors as neurons in the neural network, with each of the function processors being operative to receive an output from a respective neuron of the neural network. Thus, a set of weights of a reference vector of a neuron may be stored in its associated function processor.

In a second form, the neural network may comprise a plurality of function processors, each of the function processors being operable to receive outputs from a plurality but not all of the neurons of the neural network (e.g. four) neurons. Thus, sets of weights of reference vectors of the plurality of neurons may be stored in the associated function processor and the neural network may be operative to select, for use, a corresponding one of the sets of weights. The selection may be in dependence upon selection of one of the plurality of neurons, i.e. operation of the neural network that determines the so-called "winning" neuron. For example, the selection may be by means of a so-called "pointer" , which is a form of software or firmware function, to one of the sets of weights.

In a third form, the neural network may comprise one function processor operable to receive an output from each of the plurality of neurons in the neural network. Thus, sets of weights for the function processor may be stored in the neural network, and the function processor may be operative, in use, to receive a set of weights corresponding to an operative one of the plurality of neurons.

Alternatively or in addition, the neural network may be operative such that a location of an input to the neural network within a subspace associated with a neuron is passed to the function processor. Alternatively or in addition, the trained response characteristic of the function processor may comprise a numerical function. For example, the numerical function may be a linear polynomial. Thus, the trained response characteristic of the function processor, which defines a part of the function defined by an overall response characteristic of the neural network and the function processor, can be simple in comparison to the function defined by the overall response characteristic. Hence, complicated functions can be accommodated by the neural network - function processor architecture whilst reducing processing demands.

Alternatively or in addition, the at least one function processor may comprise at least one perceptron of a further neural network .

Further embodiments of the sixth aspect of the present invention may comprise one or more features of the first to fifth aspects of the present invention.

According to a further aspect of the present invention there is provided data classification apparatus comprising: a data processor operative to present a classification input, which corresponds to data to be classified, to an unsupervised neural network having a response characteristic corresponding to a predetermined classification, the neural network being operative to produce a network output in dependence upon the neural network's response characteristic and the received classification input; and a classification processor operative to receive a network output from the neural network and, in dependence upon the received network output, to determine if the classification input belongs to the predetermined classification, the data classification apparatus being operative to provide a confidence level output indicative of a level of confidence with which the classification input belongs or does not belong to the predetermined classification.

More specifically, the unsupervised neural network may be an SOM neural network.

Alternatively or in addition, the data classification apparatus may comprise a plurality of neural networks and the data processor may be operative to present the classification input to each of the plurality of neural networks, each neural network of the plurality of neural networks having a different response characteristic corresponding to a different, predetermined classification, each neural network being operative to produce a network output in dependence upon the neural network's response characteristic and the received classification input; and the classification processor being operative to receive a network output from each neural network and, in dependence upon the received network outputs, to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications.

Further embodiments of the further aspect of the present invention may comprise one or more features of the first to sixth aspects of the present invention. According to a further aspect of the present invention, there is provided apparatus for classifying data into one of a plurality of output categories, the apparatus comprising a neural network system adapted to receive an input and associate that input with an output category, wherein the neural network system is a self-organising neural network, and the apparatus is further adapted to provide an output indicative of a level of confidence to which the input is associated with the output category.

According to a yet further aspect of the present invention, there is provided a method of classifying data into one of a plurality of output categories, the method comprising the steps of: presenting input data to a neural network system; calculating a first metric for the input data; associating the input data with an output category; and comparing the first metric for input data with stored metrics to generate a confidence measure indicative of a level of confidence to which the input is associated with the output category.

According to a yet further aspect of the present invention, there is provided a method of using a self- organising neural network system, the method comprising the steps of: providing a neural network system trained on a set of training data; deriving a probability distribution of a first metric using the training data; presenting a feature vector to the neural network system; calculating the first metric for the feature vector; and comparing the first metric for the feature vector with the probability distribution to provide an indication of the similarity. According to a yet further aspect of the present invention, there is provided a method of configuring a neural network system, the method comprising the steps of: providing a neural network system trained on a set of training data; presenting the training data to the trained neural network in a non-adaptive mode; calculating at least one metric from the training data; and storing the at least one metric for later use in a classification or change detection method implemented by the neural network system.

There will now be described, by way of example only, various embodiments of the invention with reference to the following drawings, of which:

Figure 1 is a block diagram showing schematically a system in accordance with the first embodiment of the invention;

Figure 2 is a block diagram representing a method of configuring the system according to an embodiment of the invention;

Figure 3 is a block diagram representing a method of operation of the system of Figure 1;

Figure 4 is a graphical representation of the probability distributions of reinforced metrics for each category in a two-category example;

Figure 5 is a block diagram showing schematically a system according to an alternative embodiment of the invention, generally depicted at 60 ; Figure 6 is a graphical representation of the probability distributions of reinforced metrics for each category in the example application to misfire detection;

Figure 7 is a schematic representation of components of an embodiment of the invention;

Figure 8 is a block diagram showing steps forming part of a method according to an embodiment of the invention; and

Figure 9 is a representation of a two-dimensional input space with subspaces associated with processing elements.

Figure 1 is a block diagram showing schematically a system in accordance with the first embodiment of the invention, shown generally at 10. The system 10 provides a one-of-N classification of an input vector (which constitutes a classification input) and an associated confidence metric, which is a measure of the significance to which the classified input is associated to the output category.

The system 10 comprises an input module 11 for receiving input data into the system. The input module 11 is selected according to the application of the system, and functions to provide the input to the pre-processing module 13 (which constitutes a data processor) . The input could be from electronic sensor measurements (either digital or analogue) or other parameters in electronic form such as open/close state indicators or manually recorded observations which have been transcribed or converted to machine readable format . The input module 11 passes the data to pre-processing module 13 which generates a feature vector by conditioning the input data and extract relevant features for presentation to the artificial neural network (ANN) layer 14.

The pre-processing module 13 comprises the means for carrying out a number of processing tasks necessary before presentation to the particular neural network implementation adopted. These tasks include data validation and conditioning, parameter reduction, feature calculation and feature vector scaling. The primary function of the pre-processing module 13 is to distribute, to each ANN 15 in the neural network layer 14, a copy of the feature vector to be classified. The pre-processing module 13 also functions to provide a copy of the feature vector direct to the post-processing module 17, along with other features and parameters required by the post-processing module 13.

The ANN layer 14 comprises N neural networks 15 (ANNi to ANN_N in Figure 1) . The value N will typically be at least equal to the number of output categories for the particular classification problem to be solved by the system. Each network 15 is trained on a subset of available training data according to a particular category.

Each ANNi to ANN_N is a self-organising neural network (which could alternatively be described as a clustering, competitive or unsupervised neural network type architecture) , and in a preferred embodiment is the modular map neural network architecture described in WOOO/45333 Al.

The system also comprises a post-processing module 17 (or arbitration module) adapted to receive an output from the neural network layer 14 along with the feature vector and other data direct from the pre-processing module 13. The post-processing module is a sophisticated decision-making mechanism that embodies a number of techniques, paradigms and processes. The post-processing module 17 (which constitutes a classification processor) functions to perform the necessary processing to provide the classification output 51 and an output of confidence measure 52.

All input data and features, plus the pattern of all metrics from all networks are provided to the post- processing module in order to carry out the decision processing. This is discussed in further detail below.

Figure 2 is a block diagram representing a method 20 of configuring the system according to an embodiment of the invention. The method' 20 comprises two separate parts, described here "training mode" 21 and a "classification mode" 26. In training mode 21, training data is input 22 to each AWN 15. Each ANN 15 is trained on data examples of an individual category (or a subset of each category) . The reference vectors of neurons in each ANN are updated 23 to provide a trained ANN.

On completion of the training mode 21, the system is switched 24 to a "classification mode", in which the reference vectors of the ANNs are not adapted or updated. In this mode, the training data for each ANN are again presented to the respective ANNs, and for each input vector a distance metric will be calculated. The smallest distance value (or the "minimum distance metric") obtained from any given neuron on presentation of the entirety of the input vectors from the training data during the classification mode is referred to here as a "response metric". The response metric of each neuron in each network is a single value associated with that neuron. The response metric can be described as an offset value for each neuron with respect to its relative similarity to the entire training data.

In addition, the configuration method includes the step 28 of determining an "activation frequency" of the responses of each neuron to the training data. The activation frequency is the number of times each neuron is determined as being the winning neuron on presentation of all training data while in the classification mode. This information can be used to augment the response of the respective network when an output classification decision is made, and is particularly useful when a neuron in a network responds to disproportionately few, or none of, the training data. This is useful in real life applications where there is an imbalance between different categories of input data.

The response metric of the winning neuron and the output distance metric are combined to provide a "reinforced metric". Typically the distance metric and the response metric will be summed. All training data are presented to the respective network and the reinforced metrics are obtained and stored to provide a probability distribution of the reinforced metrics . The probability distributions for a two category example are shown diagrammaticalIy as normal distribution curves in Figure 4.

The mean and standard deviation of the reinforced metrics distributions are calculated such that each classification category, i.e. each network will have a respective pair of values.

At . this point, the first arbitration of a given test example can be achieved. The test input vector is applied to all AWNs and reinforced metric for the winning neuron/processing element in each ANN is calculated. These are then converted to z-valu.es (a variance measure in units of standard deviations) for each respective ANN by subtracting its stored distribution mean and dividing this result by the stored distribution standard deviation. The smallest of these z-values can be used to supplement the output classification decision.

Using "normal distribution" statistics, further metrics can be calculated that provide a number of confidence measures, which can be utilised in isolation or as a pattern to present the confidence output measure. For example, the data can be used for significance or hypothesis testing.

The above can be applied to the reinforced metric directly, or to the correlated metrics as described in more detail below. The system also carries out processing to establish a threshold value for use in later classification of input data. The threshold value is derived from the correlations of training data, and is selected to enable effective separation and to minimise errors in the classification of the training data. The threshold value can be a variable or a fixed value. This is described in more detail below in relation to the application example.

Figure 3 is a block diagram representing a method of operation of the system of Figure 1.

The method, generally depicted at 30 comprises the initial step of inputting 31 input data to the pre- processing module 13. The input signals could be electronic sensor measurements in digital or analogue form or other parameters that have been converted to machine readable form.

Within the pre-processing module 13, the step of conditioning 32 the input data is carried out. The pre- processing module 13 carries out any conditioning required for the particular application and neural network implementation, extracting relevant features from the input vector, carrying out data validation, parameter reduction, feature calculation and feature vector scaling. Copies of the feature vector are then presented 33 to all of the neural networks 14 in the neural network layer 15. The pre-processing module 13 also presents 34 the feature vector direct to the post-processing module 17, along with other data extracted from the input vector.

Each network receives a copy of the feature vector and generates an output distance metric and a response metric. For each feature vector presented for classification, the responding ("winning") neuron provides its distance metric, activation frequency value and response metric as outputs. Any additional parameters available from the particular ANN architecture employed are also output to the post-processing module (steps 35, 36, 37, 38 respectively) . In the case of an SOM network, these additional parameters could comprise the relative array co-ordinates of the winning neuron and its reference vector.

The response metric and the output distance metric are combined 39 to provide a reinforced metric, and the reinforced metric, along with other metrics where available are presented 40 to the post-processing module 17. Typically the distance metric and the response metric will be summed. The post-processing module receives all available output parameters and metrics from all ANNs 15 and all available parameters from the pre- processing module.

The post-processing module 13 (or arbitration module 13) compares the reinforced metrics output from each ANN and determines the "winning" ANN. The category on which the winning ANN was trained is output as 51. Thus the output 51 is a decision as to which one of N categories the input vector belongs, and the output 52 is a statistical measure of the significance of the decision.

The nature of the arbitration processing 41 is dependent upon the structure of the components in the ANN layer 14 of Figures 1. It could involve a committee decision based upon the calculated cross-correlations of reinforced metrics, a simple threshold comparison, an ANN classifier, or a mixture of all these, utilising the metrics and a selection of the available parameters that have been passed to the post-processing module. The nature of the additional parameters is specific to the application and problem.

In the event there is no clear winning ANN, and if further arbitration by the use of other parameters is not possible, a default output is applied which is appropriate for the particular application.

The simplest method of arbitration is for the system to classify the input vector according to the smallest reinforced metric value output by the networks 15. The smallest reinforced metric value indicates that the input vector is more similar to one category than another, and thus this closest category is output by the system at 51. However, it is a feature of the invention to use the probability distributions generated from the input of the training data to provide additional information.

An alternative arbitration technique is based on the correlation and analysis of the reinforced metrics. The distribution of the reinforced metrics for a particular category or network is known from the presentation of the training data during the classification mode. The reinforced metric of the current input vector is compared to the distribution and expressed in terms of z-values for that category. Thus, the reinforced metric of the current input feature vector is compared to the probability distribution for that ANN'S category.

The classification decision can therefore be made by a comparison of the z-values for the different categories, with the category having the smallest absolute z-value being selected as the output category.

The z-value, being a measure of similarity of the reinforced metric to the distribution is determined and output 52 to give a measure of the significance associated with that classification decision.

It will be appreciated that the probability distributions produced for the various categories can be used for statistical analysis and hypothesis testing in a number of ways according to the particular application.

In a further arbitration technique, the reinforced metrics of all categories are correlated and analysed. Analysis of these may be as simple as their division to obtain a ratio, or (but not limited to) , a full cross correlation between all categories.

Two categories are typically correlated as follows. The reinforced metric for an input feature vector from a first network is calculated, and divided by the reinforced metric produced by a second network. The calculated ratio gives a value for the pair of categories, termed the ratio metric.

The ratio metric can be used to directly determine the output classification decision, since whether the ratio is greater than or less than 1 indicates which of the reinforced metrics has the smallest value.

However, in an alternative arbitration technique, the values can be compared to a ratio metric distribution. The ratio metric distribution is calculated by presenting an input vector from the training data to a pair of categories. The resulting pair of reinforced metrics is divided to create a ratio metric value. This process is repeated for each input data in the training set to create two distributions of ratio metrics for each given category. The standard deviations and means are calculated, and this information is used to represent the ratio metric of the current input feature vector in terms of z-values of the overall ratio metric distributions for each category. The lower of the two absolute values is used for the output classification decision, and the z- value also provides an indication of confidence in the classification decision.

Where more than two categories exist, multiple categories are correlated by the pair-wise correlation described above, with ^λλ losing" categories eliminated from the arbitration decision.

For cases where the categories represent an ordered parameter, for example, weights, distances, temperatures, speed bands or ranges, the raw "reinforced metrics" produced by the multiple networks provide a pattern of metric values which will tend to increase radiating away from - the "winning network", . By comparing the "reinforced metrics" for adjacent categories, this pattern provides additional decision making information. This information can be used to make the classification decision, particularly where the number of categories is much greater than one, or in other cases can be used to support the other metrics and arbitration.

The activation value, which may be represented as a percentage of the number of total activations (i.e. as a relative term) may also be used in the arbitration decision. Other arbitration techniques are not excluded from the invention.

Note that although the above-described examples produce confidence measures for systems with a plurality of networks, a comparison of a reinforced metric with a probability distribution of that network can be made even where only a single network is present in the structure.

One application of the single network system is as a novelty filter or for change detection. In this example, the training data will be presented to a single network in its classification mode to create a probability distribution of the reinforced metric. The standard deviation and mean are calculated, and the reinforced metrics of each input feature vector are compared to the distribution. This system is used to provide an output which indicates how similar the feature vector is to the data on which the network has been trained. Where the reinforced metric of the feature vector is outside of a selected threshold, e.g. has a z-value greater than 3, the system indicates that the input vector is dissimilar enough to the training data (i.e. what has been seen by the system before) to be considered data representing a different or new type of event.

Figure 5 is a block diagram showing schematically a system according to an alternative embodiment of the invention, generally depicted at 60. This embodiment is similar to the embodiment of Figure 1, although differs in the neural network and post-processing structure.

As before, the input vector is provided to the pre- processing module, which generates a feature vector for presentation to all of the ANN'S 65 in the neural network layer 64. In this example, the neural network structure includes nested JiMS!' s, which increases the number of ANN'S above that of the number of categories for the classification problem.

In the example of Figure 5, the nested structure of the neural network layer includes ANNi 66 comprising a pair of ANN'S, 67a and 67b. ANNi 66 also comprises a local post- processing module 68 providing output to the main post- processing module 69.

By providing a nested structure, the system 60 is capable of resolving the problem domain for one classification category, or alternatively allows the category to be resolved into sub-categories.

The following describes the application of an embodiment of the invention to a two-category problem, namely distinguishing misfire events from normal combustion events in a multi-cylinder automobile engine.

In this embodiment, the system is similar to that described with reference to Figure 1. The system consists of two ANNs 15 in the ANN layer 14. ANNi is trained to recognise normal combustion events, and ANN₂ is trained to recognise misfire events.

In this specific application, one class of event (namely, normal combustion) has significantly more examples available than the other (misfire events) . Typically the ratio of normal combustion events to misfire events will be much greater than 4:1 for induced misfire data collection. In normal operation, the ratio is in the order of approximately 100:1 or greater.

The detection of misfire events is presented as a specific example whereby the classification errors for one category are more sensitive than for the other, and hence need to be specifically minimised. That is, the classification errors for normal combustion events need to be minimised, as opposed to minimising the overall quantity of errors for the categories combined.

The system is configured by presenting both sets of training data (i.e. the combined training data) to both ANNs, such that the distance metric and response metric parameters are obtained.

The reinforced metrics are then calculated for each vector in the combined training data, and their "reinforced metric ratios" are calculated by dividing one reinforced metric by the other. In this case, the reinforced metric for the normal combustion event is divided by the reinforced metric for the misfire event. The log to base 10 of the resultant ratios ("log ratio") is then calculated. For each event type, the respective "log ratio" distribution mean and standard deviation are calculated.

For the "log ratio" distributions arbitration, the z- value of the "log ratio" with respect to each class distribution is first calculated:

z-value_normai = ("log ratio" value - mean_normai) / standard deviationnormai

z-value_mi_Sfi_re = ( " log ratio " value - mean_mi_Sfi_re) / standard deviation_mi_sfire

A misfire decision is output when:

abs ( z-value_mi_Sfire) < abs ( z- value_normai)

where abs = absolute value. Hence, in Figure 6, the line (element 71 in Figure 6) is the mid-point between the two distributions in z-value units (is equidistant from the distribution means in these units) .

The z-value is a measure in units of standard deviations from the mean of the distributions of the training data, hence the z-value alone provides a measure of similarity to the training data distribution, and is output for the respective classification as its confidence measure. Additional statistical analysis and hypothesis testing can be applied and parameters inferred from the values calculated.

The "baseline classification" is the number of errors when the decision boundary is at the mid-overlap point between the classes, as illustrated by element 71 in Figure 6, this is the optimal total errors decision point at which the minimum errors for each class exist, any movement in this decision point will reduce the errors in one class to the detriment of the other.

In practice, a particular application may have one "optimised" category for which errors are required to be minimised. That is, there will be a requirement to correctly classify as many of the events in that category as possible, even if this increases the error in the other category. The error rate considered to be acceptable for the optimised category will be known, and using normal distribution statistics, the corresponding z-value will be determined. The z-value provides the threshold for the classification decision.

For the "reinforced metric ratio" arbitration, the decision point is where this value equals one, and thus when classifying an input to the system, a misfire decision is output when its calculated "reinforced metric ratio" is greater than a threshold of one.

As explained above, the application to engine misfire detection is sensitive to normal combustion classification errors (or "alpha errors") and hence it is necessary to minimise these "alpha errors". Undertaking this, however, will be detrimental to the misfire event errors (or "beta errors"), but in a practical misfire detection application the solution can have a higher tolerance to errors of this latter type.

For the "reinforced metric ratio" arbitration, it is necessary to define a threshold value (decision point) greater than one. This is because, in this example, the "reinforced metric ratio" is a division of the normal combustion "reinforced metric" by the misfire event "reinforced metric", otherwise the opposite would apply. To achieve the best classification rates, the threshold would be non-linear (or variable) in nature (and can be implemented as a gradient, a curve, an Nth order polynomial, a neural network, etc.). The threshold could be dependent on an external variable, such as engine speed. However, it can simply be increased as a fixed, linear value until an appropriate rate is achieved.

For the "log ratio" distributions arbitration, a positive z-value_nOrmai value (i.e. a positive standard deviation value) is chosen (which is greater than that shown as element 1 in Figure 6) , and is converted back to a "log ratio" value by the inverse of the z-value calculation:

"log ratio" value = (z-value_normai * standard deviation_normai) + mean_πθrinai

The point shown element 71 in Figure 6 is dependent upon the amount of overlap between the two distributions. Unless perfect classification can be achieved, this will ^■ need to be a value in the order of +3 (= +3 standard deviations, a point at which, for a completely symmetric normal distribution, would exclude 0.135% of the distribution, and thus give an error rate of 0.135%, which is an acceptable error rate for this application) . The setting of this value could be automated.

In the present example, the conversion of a z-value_normai of, say, +3 to a "log ratio threshold" value would now be the fixed value (decision point) for classification arbitration. Thus, if the "log ratio" value of the input example is greater than this "log ratio threshold", the output is a misfire decision. (This is in the case in which the normal ratio values have been divided by misfire values, otherwise the reverse would apply).

As previously, this "log ratio threshold" could be more complex, for example by utilising ANNS to further improve the decision by a non-fixed, non-linear means. However, experimentation has shown equivalent rates of classification per category are obtained for a fixed "log ratio threshold", when compared to a linear gradient "reinforced metric ratio" . In addition, as discussed previously, each input example is classified, and also accompanied by a z-value statistic to provide a confidence measure.

The above-described application example does not incorporate all features of the proposed invention, however it does provide validation of the technique for a two-class problem, and of the distribution statistics analysis and confidence measures techniques. In addition, the description provides a comparative study of classification arbitration based upon the ratio between the "reinforced metrics" from each ANN, with the correlation of the "reinforced metric" to each event's distributions (also illustrating the confidence value obtained by the latter technique) .

Figures 7 to 9 provide illustration of an embodiment of the present invention in which at least one of the neural networks 14, 65, 66 has the architecture shown in Figure 7.

Figure 7 shows a schematic representation of components of the neural network architecture. The system, generally depicted at 110, is a two-layered neural network, where data are passed sequentially from the first layer to the second layer. The first layer is referred to as the selector layer 112, and the second layer as the estimator layer 116.

The selector layer 112 comprises a neural network 113 consisting of a plurality of processing elements or neurons 114. The neural network 113 is, in this example, a neural network modular map using a modified Kohonen SOM, of the type described in WO 00/45333.

The primary function of the selector layer is to determine which region of the input space an input vector belongs to. It can also be used for extracting additional inforination, described in more detail below.

The estimator layer 116 comprises a plurality of numerical estimators 118, which are in this example perceptron processing elements of a second neural network. The numerical estimator provides a single numerical output 140 for a multi-dimensional input vector, such as a polynomial of first, second or higher order or a sum of sigmoid functions. The numerical estimator 118 will normally be characterised by a set of coefficients, often called weights in neural network terminology. Each numerical estimator 118 is associated with a processing element 114 of the selector layer 112.

The neural network 113 is trained according to the normal method on training data representing the state space of the function to be estimated, and each processing element in a trained network will have an associated reference vector. The reference vector will be of the same dimension as input vectors 122 presented to the system.

The estimator layer 116 is trained using a data set identical or similar to the data set used to train the selector layer, and is provided with associated actual numerical values for each input vector of the training data. The numerical estimator is, for example, trained using an optimising technique, where the numerical estimator coefficients are optimised so that they minimise the errors between the actual numerical values and the values calculated by the numerical estimator from the input vector. The errors can be evaluated using a merit function, such as a Root Mean Square (RMS) error estimate. Further details of the training of the estimator layer 116 are given below.

Figure 8 is a block diagram representing steps of the method carried out in the selector layer 112 and the estimator layer 116. Initially, the trained selector layer 112 is presented with an input vector 122. The input vector 122 is compared to the reference vectors of all the processing elements in this layer, according to the algorithm implemented in the neural network modular map 113. The reference vector which is most similar to the input vector 122 is selected, and the processing element with which this reference vector is associated is identified (step 124) as the winning processing element 115.

Each processing element 114 will be the winning processing element for a subset of input vectors from the set of possible input vectors. Each processing element 114 may thus be associated with a localised subspace within the multidimensional hyperspace spanned by the set of possible input vectors. This subspace will contain the reference vector of the processing element 114. This is an inherent property of modular map networks and related neural network architectures such as the SOM and LVQ architectures.

Figure 9 is a graphical representation of a two- dimensional input space, generally depicted at 130. Reference vectors for the individual processing elements are shown as points 131, while the area (which in the general, higher dimensional case is a subspace) associated with each processing element is shown as an irregular polygon 132.

For any input vector, there will be a responding processing element with an associated subspace of the entire input space. In this technique, the selector layer 112 of the system is used to determine which subspace an input vector 122 is associated with.

When the associated subspace has been identified, the location of the input vector within that subspace is determined (step 126) . The location of the input vector 122 within the localised subspace can either be represented relative to the reference vector of the processing element 115 associated with the subspace, or relative to another fixed point within the total input space. Although either technique is valid, it is likely that using a local reference point will be advantageous from a numerical computation perspective, since the numerical values will be smaller.

The location of the input vector within the localised subspace of the input space is input (step 128) to the numerical estimator 119 that is associated with the winning processing element 115. Other information generated by the selector layer, such as a distance value (the distance of the input vector from the local reference vector) , may be used as an additional input (step 128a) for the estimator layer.

The additional input could include an indication of whether the input vector is located within the state space represented by the training data. This indication can be derived using the distance metric inherent in SOM- type networks. The indication can also be used to indicate whether the system is interpolating or extrapolating.

The system may use a reinforced metric, being the result or product of the distance metric of the selector layer and a numerical label applied to each of the selector layer processing elements. This numeric label provides further information relative to defining the input space. Thus, the distance metric alone, or a metric including or derived from the distance metric can be used.

The numerical estimator 119 is in this example implemented as a perceptron, which is trained on the subset of the data training set which activates the processing element in the selector layer with which it is associated. That is, it is trained on data which would cause the processing element to be identified as the winning processing element. The training data for the numerical estimator thus is representative of a subspace of the input space.

The numerical estimator 119 calculates a numerical value (step 129) and provides a numerical output (step 140), corresponding to the original input vector.

The system operates on the assumption that the complexity of the function within each subspace of the input space is less than complexity of the function over the entire input space. This allows acceptable numerical accuracy to be achieved with a simpler estimator function than would be required for adequate estimation over the entire input space. The estimator will calculate an estimated numerical function value for the input vector it has received. Since the estimator function will be a relatively simple function, it will be well suited for hardware implementations, but could equally be implemented in software. The estimator layer is trained after the selector layer.

In an alternative training method, the training data may also include those data which activate a neighbourhood of processing elements around the associated processing element 115, during all or part of the training. The definition of a neighbourhood in this context may be similar to the definition of a neighbourhood, in a modular map given in WO 00/45333 (the neighbourhood comprises those processing elements 114 with reference vectors falling within a predefined distance metric) , or may correspond to a logical cluster of processing elements. This enables the system to map the probability density distribution of the input data with better definition at the extremes or transitions between of the local subspace(s) .

The accuracy of the estimator can be assessed during the training process. Where the accuracy of a particular estimator is insufficient, it is possible to bias the training data for the selector layer in such a way that the particular subspace represents a greater proportion of the training data. This can be used to "subdivide" the problematic subspace and potentially achieve better accuracy in the problem areas. This will result in another training cycle for the network; this process can be repeated until an optimum selector network configuration and size has been found.

The network configuration of this embodiment may be implemented fully or partially in hardware. For the selector layer, a hardware implementation is preferred, and the hardware will be substantially similar to the hardware described in WO 00/45333. The estimator layer may be implemented in software, e.g. as software operating on a general purpose computer platform or in hardware. Possible implementations include the following: i. The estimator layer has a dedicated estimator for each processing element in the selector layer. In this case, the weights for the reference vector of the associated processing element are permanently stored in the estimator. ii. The estimator layer comprises a single generic estimator which is able to receive both its weights and its inputs from the selector layer (which stores the associated weights for each of its processing element) . iii. The estimator may comprise a number of estimators, each of which serves a cluster of selector processing elements (e.g. 4). In this case, the weights are stored in the estimator, and the selector layer provides an input with a pointer to the correct set of weights to be used.

In an application, a requirement may be to model a function representing a class of data, e.g. misfire or normal combustion in an internal combustion engine. Such a function will be of the general form:

The function value Y is assumed to be a numerical value. The set of input values X₁-X_n is termed the input vector, and the number of components n in the input vector is the dimensionality of the vector. The full set of values which can be potentially held by the input vector is the input space of the input vector, which can be visualised as an n-dimensional hyperspace. The state space of the function Y is the subspace of the input space which contains the actual range of function inputs, and will normally be significantly smaller than the potential input space.

An optimised input vector will consist of the minimum number of linearly independent components required to map the complete state space of the function. Full linear independence of the vector components is not a requirement, and indeed in most practical applications, some interdependence among input vector components is to be expected. The only necessary requirement for the input vector is that it completely fills the state space of the function, and that will in many cases result in a higher number of vector components , and thus dimensionality than strictly necessary.

The function f (...) is assumed to be at least partially continuous, that is, continuous over discrete areas of the input space. The function will also be deterministic, that is that the function has a single output value for any given input vector xχ...x_n. If the latter requirement is not fully satisfied, the situation may frequently be remedied by increasing the dimensionality of the input vector.

Beyond the above-mentioned limitations, which are necessary in order to establish a meaningful functional relationship, there may not be anything further known about the function. In particular, the function need not typically be known in an analytical form, nor need an algorithm be known (or found) to calculate the function value. A function estimation technique will typically be required to operate on the basis of the above information and assumptions alone.

The embodiment shown in Figures 6 to 9 allows results to be achieved using a significantly smaller network. This facilitates implementation in an embedded control system where resources may be limited.

In an alternative application, the apparatus and method are used as a novelty filter or change detector. In this application, the selector layer is used to determine whether a specific input vector is within the state space on which the network has been trained.

The input vector is presented to the selector layer 112, which will determine which processing element 114 responds to the input vector, that is, which is the winning processing element 115. The input vector is determined to be located within the subspace of the total input hyperspace that is associated with the processing element 119. The location of the input vector within that subspace is subsequently passed to the estimator layer 116, where the numerical estimator function associated with this particular subspace is used to provide a numerical output.

By comparing the input vector with training data distance metrics, its location with respect to the state space can be determined. In the event that the input vector is outside the state space on which the network has been trained, a distance metric is obtained from the selector layer 112 and used to provide an out-of-range indicator. Alternatively the estimator 119 can provide an extrapolated output value for the input vector. The two methods can also be combined, so that the extrapolated numerical output value for the input vector can be associated with a confidence level derived from the out- of-range indicator.

Although the above-described embodiment refers to a selector layer having a modular map network implementation, in an alternatively embodiment another network implementation of a similar type, such- as a SOM or LVQ network could be used.

The modular map implementation is preferred as it has a number of advantages for function estimation. An advantage of this class of neural network architectures is that it maps the n-dimensional state space to a two- dimensional surface. The mapping retains the statistical distribution of the training data used, so that the area occupied on the modular map by a region in the state space is roughly proportional to the cumulative probability of the region within the training data. This property ensures that the entire state space of the training data will be properly mapped.

Another important property is that relationships between data points are retained, in the sense that points which are close to each other in the original input space remain close to each other in the trained modular map. This is one reason why neural networks of the self- organising map family are frequently used for visualisation of complex multi-dimensional state spaces. The embodiment described above has the estimator layer implemented as a perceptron. In an alternative embodiment of the invention, th.e numerical estimator comprises a numerical function which outputs a plurality of numbers, for instance by performing a numerical transform, such as a Fourier or wavelet transform on the input vector, or a data set associated with the input vector, with coefficients for the transform provided by the selector network.

In the embodiment described above, the location of the input vector is passed 128 to only one of the numerical estimators 115 in the estimator layer 116, being the numerical estimator associated with the winning processing element in the selector layer 112. In a variation, the invention may also pass the data to estimators 118 neighbouring the estimator 119.

Claims

CLAIMS :

1. Data classification apparatus comprising: a data processor operative to present a classification input, which corresponds to data to be classified, to each of a plurality of neural networks, each neural network of the plurality of neural networks having a different response characteristic corresponding to a different , predetermined classification, each neural network being operable to produce a network output in dependence upon the neural network's response characteristic and the received classification input; and a classification processor operable to receive a network output from each neural network of the plurality of neural networks and, in dependence upon at least one received network output, to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications.

2. Apparatus according to claim 1, in which the classification processor is operative to determine that the classification input belongs to none of the plurality of different predetermined classifications.

3. Apparatus according to claim 1 or 2 , in which the classification processor is operative to receive a network output from each neural network and to make its determination in dependence on the received network outputs.

4 . Apparatus according to any preceding claim, in which the classification processor is operative to determine that the classification input belongs to at least two of the plurality of different predetermined classifications.

5. Apparatus according to any preceding claim, in which the plurality of neural networks are comprised in a single neural network array.

6. Apparatus according to any preceding claim, in which the plurality of networks are comprised in an unsupervised neural network arrangement.

7. Apparatus according to claim 6 , in which the plurality of neural networks are comprised in a SeIf- Organising Map (SOM) neural network arrangement.

8. Apparatus according to any preceding claim, in which at least one neural network is operative such that the produced network output comprises a distance metric associated with a responding neuron of the neural network .

9. Apparatus according to claim 8, in which at least one neural network is operative such that the produced network output comprises a response metric associated with a responding neuron of the neural network, the response metric representing a smallest distance value for the responding neuron.

10. Apparatus according to any preceding claim, in which at least one neural network is operative such that the produced network output comprises an activation frequency metric associated with a responding neuron of the neural network.

11. Apparatus according to any preceding claim, in which the produced network output comprises a reinforced metric, the reinforced metric being based on a combination of a distance metric and a response metric.

12. Apparatus according to claim 11, in which the reinforced metric is the sum of the distance metric and the response metric.

13. Apparatus according to any preceding claim in which at least one neural network is comprised in an SOM neural network arrangement, the at least one neural network being operative such that the produced network output comprises at least one of a relative array reference and a weight metric of a responding neuron of the neural network.

14. Apparatus according to any preceding claim, in which the classification processor is operative to determine a different, predetermined classification of the plurality of different, predetermined classifications to which the classification input belongs in dependence on at least one metric associated with a responding neuron of each neural network.

15. Apparatus according to claim 15, in which the data classification apparatus is operative to determine the different, predetermined classification in dependence upon at least one of: a correlation; a comparison with a threshold value; and an ANN classifier.

16. Apparatus according to claim 15 where the determination is in dependence upon a correlation, the correlation comprising at least one of correlating the at least one metric with at least one metric associated with a responding neuron of at least one further neural network.

17. Apparatus according to claim 16, in which the correlation comprises at least one of: division of respective metrics to determine a ratio; and full cross correlation of respective metrics.

18. Apparatus according to any preceding claim, in which a different, predetermined classification is determined in dependence upon a committee decision based upon cross- correlation.

19 . Apparatus according to any preceding claim, in which the data classif ication apparatus is operative to provide a confidence level output indicative of a level of confidence with which the classification input belongs to at least one of the plurality of different , predetermined classifications.

20. Apparatus according to claim 19, in which the confidence level output is indicative of a level of confidence with which the classification input belongs to the different, predetermined classification that the classification processor has determined the classification input to belong.

21. Apparatus according to claim 19 or 20, in which the confidence level output is determined in dependence upon a statistical measure of a metric associated with a responding neuron of each neural network, the statistical measure indicating a likelihood of a given value for the metric being obtained.

22. Apparatus according to claim 21, in which the metric comprises at least one of: a distance metric, a response metric, an activation frequency metric and a reinforced metric.

23. Apparatus according to claim 21 or 22, in which the data classification apparatus is operative such that a metric associated with a responding neuron of each neural network is compared with the statistical measure.

24. Apparatus according to any of claims 21 to 23 configured such that the statistical measure of the at least one metric is stored in the data classification apparatus as a mean and a variance.

25. Apparatus according to any preceding claim, in which at least one neural network of the plurality of neural networks comprises a first neural network and a second neural network.

26. Apparatus according to claim 25, in which the first neural network has a response characteristic corresponding to a first part of a classification and the second neural network may have a response characteristic corresponding to a second, different part of the classification, each of the first and the second neural networks being operative to produce a network output in dependence upon its response characteristic.

27. Apparatus according to claim 25 or 26, in which the first neural network has a response characteristic configured to respond to a classification input that belongs to first set of data and the second neural network has a response characteristic configured to respond to a classification input that belongs to a second, different set of data, one of the first and the second neural networks being operative to produce a network output in dependence upon a classification input belonging to either the first or the second set of data.

28. Apparatus according to any of claims 25 to 27, in which the at least one neural network further comprises a secondary classification processor operative to receive a network output from at least one of the first and second neural networks.

29. Apparatus according to claim 28, in which the first neural network has a response characteristic corresponding to a first part of a classification and the second neural network has a response characteristic corresponding to a second, different part of the classification, the secondary classification processor being operative to determine to which of the network outputs produced by the first and second neural networks the classification input belongs.

30. Apparatus according to claim 28, in which the first neural network has a response characteristic configured to respond to a classification input that belongs to a first set of data and the second neural network has a response characteristic configured to respond to a classification input that belongs to a second, different set of data, the secondary classification processor being operable to receive a network output from one or other of the first and second neural networks and to convey the received network output to the classification processor.

31. Apparatus according to any preceding claim, in which the classification processor is further operative to determine that the classification input belongs to none of the plurality of different predetermined classifications, said classification input constituting a rejected classification input, the data classification apparatus is re-configurable to comprise a further neural network having a response characteristic corresponding to a further classification to which the rejected classification input belongs, the further neural network being operative to produce a network output in dependence on its response characteristic and a further classification input.

32. Apparatus according to claim 31, in which the data classification apparatus is operable to re-configure itself as comprising the further neural network in dependence upon a determination by the classification processor that the classification input belongs to none of the plurality of different predetermined classifications.

33. Apparatus^' according to claim 31 or 32, in which the data classification apparatus is configurable to comprise the further neural network as part of a re-training process.

34. Apparatus according to any preceding claim, in which the data processor is configured to receive the data to be classified and to be operative to produce a classification input corresponding to the data to be classified.

35. Apparatus according to any preceding claim, in which the data classification apparatus is configured to convey the classification input to the classification processor.

36. Apparatus according to any preceding claim, in which the data processor is further operative to receive the data to be classified and to perform at least one task of: validation of the data to be classified; conditioning of the data to be classified; parameter reduction; and classification input scaling.

37. Apparatus according to claim 36, in which the data classification apparatus is configured to convey a result of performing the said at least one task from the data processor to the classification processor.

38. Apparatus according to any preceding claim, in which the data to be classified comprises at least one of digital data and analogue data.

39. Apparatus according to any preceding claim, in which at least one of the neural networks has a plurality of neurons and a function processor, the function processor being operable to receive the network output from a neuron of the neural network and to provide a processor output in dependence upon the received output and a trained response characteristic of the function processor.

40. Apparatus according to claim 40, in which the at least one neural network comprises a plurality of function processors.

41. Apparatus according to claim 41, in which the at least one neural network comprises fewer function processors than neurons in the neural network.

42. Apparatus according to claim 41, in which the at least one neural network comprises at least a same number of function processors as neurons in the neural network, with a function processor being operative to receive an output from a respective neuron of the neural network.

43. Apparatus according to claim 41, in which the at least one neural network apparatus comprises one function processor operable to receive an output from each of the plurality of neurons in the neural network.

44. Apparatus according to any of claims 39 to 43, in which the at least one neural network is configured such that the network output from the one neuron is received in a neighbouring function processor, the neighbouring function processor being operative to provide a neighbourhood processor output.

45. Apparatus according to any of claims 39 to 44, in which the trained response characteristic of the function processor comprises a numerical function.

46. Apparatus according to any of claims 39 to 45, in which the at least one function processor comprises at least one perceptron of a further neural network.

47. A method of classifying data, the method comprising: receiving a classification input, which corresponds to data to be classified, in each of a plurality of neural networks, each neural network of the plurality of neural networks having a different response characteristic corresponding to a different, predetermined classification; operating each neural network to produce a network output in dependence upon the neural network's response characteristic and the received classification input; receiving a network output from at least one neural network of the plurality of neural networks in a classification processor; and operating the classification processor in dependence upon the received network outputs to determine if the classification input belongs to at least one of the plurality of different, predetermined classifications.

48. A computer program comprising program instructions for causing a computer to perform the method of claim 47.

49. A computer program according to claim 48 which is at least one of: embodied on at least one of a record medium, stored in a computer memory, embodied in a read- only memory; and carried on an electrical signal. 61

50 . Condition monitoring apparatus comprising data classif ication apparatus according to any one of claims 1 to 46 .

51. Condition monitoring apparatus according to "claim 50 further comprising at least one sensor operative to provide data to be classified to the data classification apparatus .

52. Condition monitoring apparatus according to claim 50 or 51 further comprising an actuator operable to control further apparatus in dependence on operation of the data classification apparatus.

53. Condition monitoring apparatus according to any one of claims 50 to 52 further comprising a user output operative to provide a signal discernable by a user in dependence on operation of the data classification apparatus .

54. An automobile comprising data classification apparatus according to any one of claims 1 to 46 and 50 to 53.

55. Data classification apparatus according to claim 54 operative to perform at least one of: misfire detection; emissions monitoring; and catalyser performance monitoring.

56. A method of training a data classification apparatus, the method comprising: presenting training data to each of a plurality of neural networks of the data classification apparatus, training data presented to each neural network consisting substantially of data belonging to a different, predetermined classification, whereby each neural network has a different response characteristic corresponding to the different, predetermined classification; each, neural network being operable to produce a network output in dependence upon a received classification input, which corresponds to data to be classified, and the neural network's response characteristic, the apparatus comprising a classification processor operable to receive a network output from each neural network and to determine, in dependence upon the received network outputs, if the classification input belongs to at least one of the plurality of different, predetermined classifications.

57. A method according to claim 56, in which presenting training data to each of a plurality of neural networks of the data classification apparatus changes a reference vector of neurons in each neural network.

58. A method according to claim 56 or 57 further comprising, after the step of presenting training data to each of a plurality of neural networks, the step of presenting further training data to each of the plurality of neural networks.

59. A method according to claim 58, in which a reference vector of neurons in each neural network is not changed in dependence upon the received further training data.

60. A method according to any one of claims 56 to 59, in which the method further comprises the step of determining a metric in dependence upon the received further training data .

61. A method according to claim 60, in which the metric comprises at least one of distance metrics for each neuron, a minimum distance metric for each neuron and an activation frequency metric for each neuron.

62. A method according to any one of claims 58 to 61, in which the method comprises the step of determining a probability distribution for a metric determined in dependence upon the received further training data.

63. A method according to any one of claims 56 to 62, in which at least one neural network has a plurality of neurons and at least one function processor, the function processor being operable to receive an output from at least one of the plurality of neurons and to provide a processor output in dependence upon the received output, and the method further comprises: receiving a first set of training data in the neural network, the neural network being operative to adopt a trained response characteristic in dependence upon the received first set of training data, and receiving a second set of training data in the function processor, the function processor being operative to adopt a trained response characteristic in dependence upon the received second set of training data, in which the function processor is operative to adopt its trained response characteristic after the neural network is operative to adopt its trained response characteristic.

64. A method according to claim 63, in which the second set of training data is received in the function processor after the first set of training data is received in the neural network.

65. A method according to claim 63 or 64, in which the first set of training data is different from the second set of training data.

66. A method according to claim 65, in which the second set of training data is a subset of the first set of training data.

67. A method according to any one of claims 63 to 65, in which the method further comprises a step of receiving a third set of training data in the function processor, the function processor being operative to modify its trained response characteristic in dependence upon the received third set of training data.

68. A method according to claim 67, in which the third set of training data comprises at least one data element not comprised in the second set of training data.

69. A method according to claim 68, in which the at least one data element not comprised in the second set of training data is determined based on an analysis of the trained response characteristic adopted in dependence upon the received second set of training data.

70. A method according to claim 68 or 69, in which the at least one data element not comprised in the second set of training data is determined based on a response characteristic of at least one further function processor associated with at least one neuron neighbouring the neuron from which the output is received by the function processor.