1 Introduction

Biometric authentication is a form of personal identification performed by computer using characteristics of the human body [1]. Face recognition is one example of biometric authentication. Compared with other biological features such as the iris and the fingerprint, face images are more convenient to acquire and the acquisition equipment is less intrusive. As a method of using effective information for identification, face recognition has been widely applied in many areas over the past few decades [2].

Over the past few decades, face recognition technology has attracted growing attention from researchers around the world. Especially in recent years, its research and application have made great progress, and a large number of academic papers are published every year [3]. Some websites and mobile applications use face-based login and registration. Recently, the iPhone X produced by Apple Inc. adopted face recognition for unlocking. At the same time, many commercial face recognition systems have entered the market, such as advanced video surveillance for law enforcement and portal access control.

As a complex pattern recognition problem [4], face recognition involves many disciplines, including image processing, mathematics, physiology, and computer vision. Because it is influenced by many factors, face recognition is a technique of high complexity, and good methods are needed for both feature extraction and recognition.

How can we extract facial features accurately? Feature extraction is a key step in face recognition, and it directly determines the recognition results. It is affected by many factors, including pose, expression, and age [5]. The extracted features should reflect the identity as completely as possible. A single extraction method captures only part of this information, so the recognition results are often unsatisfactory. By combining several extraction methods we can obtain more complete features and lay the foundation for the subsequent recognition. Many extraction methods exist: Reference [6] proposed a method based on the Canny operator to detect edges, and Reference [7] used the stationary wavelet transform (SWT), which has good time-frequency localization properties and is therefore well suited to image processing, to extract features from MR brain images.

In addition to feature extraction, the design of the classifier also has a great influence on the performance of a face recognition algorithm, since different classifiers can produce different results. In general, recognition adopts a single classifier such as an SVM [8] or a neural network. However, relying on a single classifier alone cannot ensure the accuracy and stability of the results. Thus, multiple classifiers are combined through ensemble techniques [9] to improve the generalization ability and reliability of the classification system. When designing an ensemble system, the first factor affecting performance is the selection of the base classifiers, which need to be stable and diverse. The second factor is the combination strategy. Reference [10] used a weighted majority voting combination of classifiers for relation extraction from biomedical sentences.

In this paper, we propose a multi-view ensemble learning method for face recognition. The Canny operator and the wavelet transform are used to extract features from the image itself and from its transform domain [11]. Three simple and widely used classifiers, KNN, SVM, and WNN, are then used for identification. Each of the three views pairs one feature extraction method with one classifier. Finally, a voting strategy is adopted to integrate the decisions.

2 Classifiers

The choice of classifier affects the final result. This section introduces the classifiers used in our method.

2.1 KNN (k-Nearest Neighbor Classifier)

The k-nearest neighbor classifier is an effective method in pattern recognition [12].

It uses the known categories of the nearest neighbor samples to judge the category of an unknown sample, and it is suitable for dealing with overlapping or crossing class regions. The specific steps are as follows: calculate the distance (also known as similarity) in the feature space between the sample to be classified and the known samples; this is the key step of the method. Then find the k known samples closest to the unknown sample. Count the categories of these k samples and find the category with the largest number of votes. Finally, classify the unknown sample into this category. A minimal sketch of this procedure follows.
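The sketch below illustrates these steps in Python with NumPy; it is an illustration only, not the implementation used later in the paper (which uses the Euclidean distance and k = 5, per Sect. 4):

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=5):
    # 1. Distances from the query to all known samples (Euclidean here).
    dists = np.linalg.norm(train_x - query, axis=1)
    # 2. Indices of the k nearest known samples.
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the labels of the k neighbors.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```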

2.2 SVM (Support Vector Machine)

The support vector machine [8] has great advantages in solving nonlinear classification problems. Its basic principle is to map the input space into a high-dimensional space through a non-linear mapping, in which the samples become linearly separable and an optimal separating hyperplane can be found.

Suppose the known training set is \( C = \{ (x_{i} ,\,y_{i} )\} \), where \( x_{i} \in R^{n} \), \( y_{i} \in \{ - 1,\,1\} \), \( i = 1,2, \ldots ,l \). A linear separating hyperplane is given by \( w^{T} x + b = 0 \). The hyperplane that satisfies \( y_{i} (w^{T} x_{i} + b) - 1 \ge 0 \) for all samples and minimizes \( \frac{1}{2}\left\| w \right\|^{2} \) is the optimal classification hyperplane.

Under this condition, training can be transformed into the dual optimization problem (subject to \( \sum_{i = 1}^{l} \alpha_{i} y_{i} = 0 \) and \( \alpha_{i} \ge 0 \)):

$$ \mathop {\hbox{min} }\limits_{\alpha } \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {y_{i} y_{j} \alpha_{i} \alpha_{j} K(x_{i} ,x_{j} )} } - \sum\limits_{j = 1}^{l} {\alpha_{j} } $$
(1)

Then the discriminant function can be determined from the optimal solution \( \alpha \) and the threshold \( b \) computed from the training samples:

$$ f(x) = \text{sgn} \left( {\sum\limits_{x_{i} \in S} {\alpha_{i} y_{i} K(x_{i} ,\,x)} + b} \right) $$
(2)

where \( \alpha_{i} \) are the Lagrange multipliers, \( K(x_{i} ,\,x) \) is the kernel function, and \( S \) is the set of support vectors.

We can construct multiple classifiers to solve multi-class problems. On the one hand, a multi-class SVM can be realized by combining several two-class classifiers. On the other hand, the objective function can be modified so that the multiple classification surfaces are merged into a single optimization problem. Both strategies are sketched below.
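As an illustration of the two strategies, the sketch below uses scikit-learn (an assumption for illustration; the experiments in Sect. 4 use the LIBSVM-FarutoUltimate toolbox instead, and the sigmoid kernel matches the choice reported there):

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC

# Strategy 1: combine two-class classifiers.
# SVC trains one classifier per pair of classes internally (one-vs-one);
# the wrapper trains one binary classifier per class (one-vs-rest).
ovo_svm = SVC(kernel="sigmoid", decision_function_shape="ovo")
ovr_svm = OneVsRestClassifier(SVC(kernel="sigmoid"))

# Strategy 2: merge all classes into a single optimization problem
# (Crammer-Singer formulation; linear kernel only in this implementation).
cs_svm = LinearSVC(multi_class="crammer_singer")

# Usage (X_train, y_train assumed): ovo_svm.fit(X_train, y_train)
```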

2.3 WNN (Wavelet Neural Network)

The wavelet neural network combines the wavelet transform with an artificial neural network. It not only inherits the local time-frequency and multi-scale decomposition characteristics of the wavelet transform, but also retains the self-learning, adaptive, and fault-tolerant abilities of a neural network [13]. Simply speaking, a wavelet function replaces the activation function in the hidden layer of a BP neural network. The signal of the wavelet neural network is transmitted forward, while the error is propagated backward.

The output of WNN is given by:

$$ y(k) = \sum\limits_{j = 1}^{l} {\omega_{jk} \,h_{j} \left( {\left( {\sum\limits_{i = 1}^{n} {\omega_{ij} x_{i} } - b_{j} } \right)/a_{j} } \right)} $$
(3)

where \( h_{j} \) is the mother wavelet function, \( a_{j} \) is the scaling factor and \( b_{j} \) is the translation factor of the j-th hidden node, \( \omega_{ij} \) and \( \omega_{jk} \) are the input and output weights, and \( n \) is the number of input nodes.

The error function is used as the fitness function to measure how well the parameters have been corrected:

$$ Error = \sum\limits_{k = 1}^{m} {(y(k) - D(k))^{2} /2} $$
(4)

where \( D(k) \) is the expected output of the network.

We need to adjust the parameters according to the error. Many correction methods exist; gradient descent is the most common in wavelet neural networks, but it converges slowly and easily falls into a local minimum. In this paper, we add a momentum term to the parameter updates:

$$ \omega_{ij} (t + 1) = \omega_{ij} (t) + \Delta \omega_{ij} (t + 1) + \eta \,(\omega_{ij} (t) - \omega_{ij} (t - 1)) $$
(5)
$$ a_{j} (t + 1) = a_{j} (t) + \Delta a_{j} (t + 1) + \eta \,(a_{j} (t) - a_{j} (t - 1)) $$
(6)
$$ b_{j} (t + 1) = b_{j} (t) + \Delta b_{j} (t + 1) + \eta \,(b_{j} (t) - b_{j} (t - 1)) $$
(7)

where \( t \) indexes the training iteration and \( \eta \) is the momentum coefficient.

3 Multi-view Ensemble Learning

3.1 The Multi-view Ensemble Learning Model

The ensemble learning classification technique combines multiple classifiers to enhance the reliability and generalization of the system. In order to identify face images better, different feature extraction methods and classifiers are adopted in this study. The recognition model is shown in Fig. 1.

Fig. 1. The multi-view ensemble learning model.

View 1 (LDA + KNN)

In this view, LDA is used to obtain lower-dimensional features, which are then identified with KNN.

LDA [11], also called Fisher Linear Discriminant Analysis, is a supervised dimensionality reduction algorithm. Its principle is as follows: labeled data are projected to a lower dimension such that the projected points within the same class are as close as possible, while the distances between different classes are as large as possible. Thus the projected data can be distinguished by category. A sketch of this view follows.
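A sketch of this view with scikit-learn (the library and its defaults are assumptions made for illustration; the neighbor count matches the k = 5 chosen experimentally in Sect. 4):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# View 1: supervised dimensionality reduction with LDA, then 5-NN voting.
view1 = make_pipeline(
    LinearDiscriminantAnalysis(),
    KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
)
# view1.fit(X_train, y_train); y_pred = view1.predict(X_test)
```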

View 2 (Edge detection + SVM)

As an edge detection method, the Canny operator has good noise immunity and detection accuracy [14]. In this view, we use the Canny operator to obtain the edge information of the image. After Gaussian smoothing, the gradient magnitude and direction of the image are calculated, and non-maximum suppression followed by double thresholding yields the final edges.

After obtaining the edge features, we use an SVM to classify them, as sketched below. Face recognition is a typical multi-class identification problem, and the support vector machine has strong generalization ability and a good recognition rate on such pattern classification tasks.
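A sketch of this view using OpenCV's Canny implementation (the threshold and smoothing parameters are assumptions for illustration; the sigmoid kernel mirrors the choice reported in Sect. 4):

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def canny_features(gray_img, low=50, high=150):
    # Gaussian smoothing, then Canny (gradient computation, non-maximum
    # suppression, double thresholding) to obtain a binary edge map.
    smoothed = cv2.GaussianBlur(gray_img, (5, 5), 1.4)
    edges = cv2.Canny(smoothed, low, high)
    return edges.flatten().astype(np.float32) / 255.0

view2 = SVC(kernel="sigmoid")
# view2.fit(np.stack([canny_features(img) for img in train_imgs]), train_labels)
```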

View 3 (Wavelet Transform + WNN)

First, we process the image with the wavelet transform, which is well known for its multi-scale representation ability. In this model we use the two-dimensional discrete wavelet transform, which can be realized via one-dimensional wavelet transforms. The transformed image is divided into four parts: the LL subband contains the approximation coefficients and carries the major features of the image, while LH, HL, and HH contain the detail coefficients. Among them, HH has high frequency in both the horizontal and vertical directions, LH has low frequency horizontally and high frequency vertically, and HL has high frequency horizontally and low frequency vertically. A sketch of this step follows.
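A sketch of the decomposition with PyWavelets (the library and wavelet choice are assumptions for illustration; the paper itself uses the MATLAB wavefast function, per Sect. 4):

```python
import pywt

def dwt_features(gray_img, wavelet="db1"):
    # Single-level 2-D DWT: LL holds the approximation coefficients; the
    # detail subbands are returned as (horizontal, vertical, diagonal).
    LL, (LH, HL, HH) = pywt.dwt2(gray_img, wavelet)
    return LL.flatten()  # the major features, fed to the WNN
```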

In the WNN, we adopt the three-layer feed-forward neural network shown in Fig. 2; this kind of wavelet neural network has one hidden layer.

Fig. 2. The structure of WNN for MIMO system.

3.2 Ensemble Learning Method

When designing an ensemble system, multiple classifiers need to be combined well [15], and the choice of ensemble method affects the final results. Among the many available methods, voting is the most intuitive and performs surprisingly well. Table 1 shows the voting method.

Table 1. Algorithm: voting method.

The voting results can be divided into three categories:

The Unanimous Voting:

The ensemble result is the class on which all classifiers agree. In other words, if KNN, SVM, and WNN produce the same output, that output is the final result.

The Plurality Voting:

The ensemble result is the class on which more than half of the classifiers agree. For example, if KNN and WNN both output A while SVM outputs B, the final result is A.

The Weighted Voting:

If the outputs of the three classifiers all differ, the output of the classifier with the highest recognition rate becomes the final result. In our experiments the recognition rate of the WNN is higher than those of KNN and SVM, so the final result is taken from the WNN. The three rules are sketched together below.
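The three voting rules collapse into one procedure, sketched here (the `rates` argument is an assumed dictionary of per-classifier validation accuracies used only for the weighted fallback):

```python
def ensemble_vote(pred_knn, pred_svm, pred_wnn, rates):
    # rates: accuracy per classifier, e.g. {"knn": .90, "svm": .92, "wnn": .95}
    preds = {"knn": pred_knn, "svm": pred_svm, "wnn": pred_wnn}
    votes = list(preds.values())
    # Unanimous or plurality voting: any label backed by at least
    # two of the three classifiers wins.
    for label in votes:
        if votes.count(label) >= 2:
            return label
    # Weighted voting: all three disagree, so defer to the most accurate
    # classifier (the WNN in the paper's experiments).
    return preds[max(rates, key=rates.get)]
```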

4 Experiments

In order to verify the feasibility of the algorithm, experiments are carried out on the ORL face database. We select 320 face images from the ORL database, covering 32 people; each image is 92 × 112 pixels with 256 gray levels. Some of the face images are shown in Fig. 3. Five images of each person (160 images in total) are used as training samples, and the remaining images are used as test samples.

Fig. 3. Some sample images in ORL database.

In view 1, we obtain 160-dimensional features after LDA and classify them with the k-nearest neighbor classifier. There are many ways to compute the distance between the samples to be classified and the known samples, such as the Euclidean distance, the Minkowski distance, and the Manhattan distance; here we use the Euclidean distance, and we choose k = 5 neighbors based on our experiments.

In view 2, a two-dimensional Gaussian function serves as the noise filter in the Canny operator. After obtaining the edge features, we use the LIBSVM-FarutoUltimate toolbox to construct the SVM classifier; this toolbox provides a series of auxiliary functions for parameter searching, processing, and result visualization, which makes it convenient to use. Different inner-product kernel functions yield different SVM algorithms; in this model we use the sigmoid kernel.

In view 3, in order to improve speed, we adopt the wavefast function from the wavelet toolbox. Figure 4 shows an original image (left) and its 1-scale wavelet transform (right). As can be seen, the low-frequency part retains the approximate information, while the high-frequency parts retain some edge information and noise. In the wavelet neural network, the Morlet wavelet shown in Fig. 5 is used as the activation function of the hidden layer.

Fig. 4. The original image and its wavelet transform.

Fig. 5. Morlet wavelet function.

Table 2 shows the recognition rate of the ensemble learning method proposed in this paper, together with the single classifiers applied to the same data set. The recognition rate is lowest when using KNN alone, and compared with any single classifier, the ensemble learning method improves the recognition rate markedly.

Table 2. Average accuracy rates on ORL.

For further comparison, we randomly select some images from the FERET database, with 7 different images per person. In the experiment, four images of each person are randomly chosen for the training set and the remaining three images are used for the test set. Some of the face images are shown in Fig. 6. Table 3 shows the recognition rates on this small data set; the ensemble learning method again achieves the highest rate.

Fig. 6. Some sample images in FERET database.

Table 3. Average accuracy rates on FERET.

5 Conclusions

In this paper, combining multiple feature extraction and classification techniques, we propose a multi-view ensemble learning method for face recognition. A variety of feature extraction methods are used, which avoids incomplete information and represents the features more fully. SVM, KNN, and WNN serve as the base classifiers, and the multi-view results are integrated with a voting strategy to ensure the accuracy of the identification results. The experimental results show that our method achieves impressive recognition accuracy on the face databases.

Future work includes parallelizing the algorithm to compensate for its complexity and further reducing the running time. We believe that face recognition technology will become increasingly accurate, stable, and powerful in the near future.