CN110348350A - Driver state detection method based on facial expressions - Google Patents

Driver state detection method based on facial expressions
- Publication number: CN110348350A (application CN201910584900.8A)
- Authority: CN (China)
- Prior art keywords: facial expression, driver, convolution, layer, image
- Prior art date: 2019-07-01
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F18/2135 — Pattern recognition: feature extraction by transforming the feature space, based on approximation criteria, e.g. principal component analysis
- G06F18/2193 — Pattern recognition: validation and performance evaluation based on specific statistical tests
- G06V20/597 — Context or environment of the image inside a vehicle: recognising the driver's state or behaviour, e.g. attention or drowsiness
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06V40/174 — Facial expression recognition
Abstract
Description
Technical Field
The invention belongs to the technical field of driver state monitoring and, more specifically, relates to a driver state detection method based on facial expressions, i.e., a method that detects the driver's facial expressions in real time and uses them to judge the driver's current driving state.
Background Art
The driver's state plays a vital role in safe driving; detecting it in real time goes a long way toward ensuring that the driver drives safely.
Current approaches to analysing and judging the driver's state fall into two broad categories: contact and non-contact. Contact methods judge the driver's state from physiological signals such as EEG and EMG collected by wearable devices; their main drawbacks are that the measurement itself interferes with safe driving and that the equipment is costly. Non-contact methods fall into three sub-categories. The first judges the driver's state from the vehicle's trajectory, but it is strongly affected by the road environment and has low accuracy. The second judges the driver's state from real-time measurements such as the steering-wheel angle and the force applied to the brake and clutch pedals, but it is strongly affected by the driver's personal driving habits. The third uses computer vision: a camera captures the driver's face, the current facial expression is recognised from the image, and the driver's state is detected in real time. Because this approach offers good real-time performance and high accuracy, computer-vision-based driver state detection is the current mainstream direction.
Facial expressions play an important role in human communication; compared with media such as text and speech, they express emotion more intuitively and accurately. This mode of emotional interaction is already used in scenarios such as virtual reality, digital entertainment, communication and video conferencing, and human-computer interaction. Driver state detection based on facial expressions is therefore more capable and more user-friendly than simple fatigue detection. Facial expression recognition generally comprises three parts: face image preprocessing, facial expression feature learning, and facial expression classification; the driver's state is then detected from the classified expression.
However, existing facial-expression-based driver state detection uses a large number of parameters and runs slowly, which hurts the real-time performance of the detection; its accuracy also leaves room for improvement.
Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art and provide a driver state detection method based on facial expressions that improves both the real-time performance and the accuracy of driver state detection.
To achieve the above objective, the facial-expression-based driver state detection method of the present invention comprises the following steps:
(1) Acquire the driver's facial image
A camera mounted in front of the driver captures a video stream; a Haar-feature + AdaBoost face detection algorithm detects the driver's face region in the video frames, yielding the facial image.
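The text specifies only "Haar features + AdaBoost" and leaves the implementation open. A minimal sketch of this step, assuming OpenCV's bundled pretrained frontal-face Haar cascade (a Viola-Jones-style Haar + AdaBoost detector) as the concrete detector:

```python
import cv2

# Assumption: OpenCV's pretrained frontal-face cascade stands in for the
# patent's unspecified Haar-feature + AdaBoost detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # camera mounted in front of the driver
ret, frame = cap.read()            # one frame of the video stream
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_img = frame[y:y + h, x:x + w]   # facial image passed to step (2)
cap.release()
```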
(2) Preprocess the acquired facial image
First, convert the facial image to grayscale:
Gray = 0.3R + 0.59G + 0.11B
where Gray is the pixel's grayscale value and R, G, and B are its red, green, and blue values;
Then apply Gamma correction:
I = Gray^γ
where I is the corrected grayscale value and γ = 0.5;
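A minimal sketch of the two formulas above, assuming an 8-bit RGB input that is normalised to [0, 1] before the power law is applied (the text does not state the value range):

```python
import numpy as np

def preprocess(rgb):
    """Grayscale conversion followed by Gamma correction (gamma = 0.5)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.3 * r + 0.59 * g + 0.11 * b    # Gray = 0.3R + 0.59G + 0.11B
    gray = gray / 255.0                     # assumption: 8-bit input, map to [0, 1]
    return gray ** 0.5                      # I = Gray^gamma with gamma = 0.5
```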
Finally, process the grayscaled, Gamma-corrected image with PCA (principal component analysis):
Treat the facial image as a matrix X of n rows and m columns. First zero-mean each row of X, then compute the covariance matrix of X and its eigenvalues and corresponding eigenvectors. Arrange the eigenvectors as rows of a matrix O in descending order of their eigenvalues, and take the first K rows of O to form the matrix P (K rows, n columns). The facial image Y = PX is then the image reduced to K dimensions;
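A sketch of the PCA step exactly as described, with the image rows treated as the variables; NumPy is an implementation choice, not part of the patent:

```python
import numpy as np

def pca_reduce(X, K):
    """Project the n x m facial image X onto its top-K principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)     # zero-mean each row of X
    C = np.cov(Xc)                             # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending by eigenvalue
    P = eigvecs[:, order[:K]].T                # matrix P: K rows, n columns
    return P @ X                               # Y = PX, the K-dimensional image
```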
(3) Driver facial expression recognition
3.1) Construct the facial expression recognition convolutional neural network
The facial expression recognition convolutional neural network consists of four sequentially connected stages, followed by an average pooling layer, a dropout layer, and a softmax classifier;
Each stage comprises a first convolutional layer with a 3×3 kernel and stride 2; second and third convolutional layers with 3×3 kernels and stride 1; a pooling layer; and an inception structure. The pooling layers of the first three stages use a 3×3 kernel with stride 2; the pooling layer of the fourth stage uses a 3×3 kernel with stride 1;
The inception structure consists of four feature-map processing branches operating in parallel plus a filter concatenation layer. The first branch applies a 3×3 pooling operation to the input feature map, then a 1×1 convolution, and sends the result to the filter concatenation layer. The second branch applies a 1×1 convolution to the input feature map and sends the result to the filter concatenation layer. The third branch applies a 1×1 convolution to the input feature map, then convolves the result separately with a 3×1 kernel and a 1×3 kernel; both resulting feature maps are sent to the filter concatenation layer. The fourth branch applies a 1×1 convolution to the input feature map, then a 3×3 convolution, and then convolves that result separately with a 3×1 kernel and a 1×3 kernel; both resulting feature maps are sent to the filter concatenation layer. The filter concatenation layer concatenates the feature maps from the four branches to produce the connected feature map;
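The text fixes the kernel sizes but not the channel widths or the pooling type of the first branch. A PyTorch sketch of the four-branch module, with the channel count c and max pooling as assumptions:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches plus filter concatenation, as described.
    The channel width c and the use of max pooling are assumptions."""
    def __init__(self, c_in, c=32):
        super().__init__()
        # branch 1: 3x3 pooling on the input, then a 1x1 convolution
        self.b1 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, c, 1))
        # branch 2: a single 1x1 convolution
        self.b2 = nn.Conv2d(c_in, c, 1)
        # branch 3: 1x1 convolution, then parallel 3x1 and 1x3 convolutions
        self.b3 = nn.Conv2d(c_in, c, 1)
        self.b3_31 = nn.Conv2d(c, c, (3, 1), padding=(1, 0))
        self.b3_13 = nn.Conv2d(c, c, (1, 3), padding=(0, 1))
        # branch 4: 1x1 then 3x3, then parallel 3x1 and 1x3 convolutions
        self.b4 = nn.Sequential(nn.Conv2d(c_in, c, 1),
                                nn.Conv2d(c, c, 3, padding=1))
        self.b4_31 = nn.Conv2d(c, c, (3, 1), padding=(1, 0))
        self.b4_13 = nn.Conv2d(c, c, (1, 3), padding=(0, 1))

    def forward(self, x):
        b3, b4 = self.b3(x), self.b4(x)
        # filter concatenation of all six feature maps along the channel axis
        return torch.cat([self.b1(x), self.b2(x),
                          self.b3_31(b3), self.b3_13(b3),
                          self.b4_31(b4), self.b4_13(b4)], dim=1)
```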
The first convolutional layer of the first stage receives the K-dimensional facial image produced by the dimensionality reduction. The image passes through the first, second, and third convolutional layers in turn, then through the pooling layer, and the pooled feature maps are processed by the inception structure to produce the connected feature map. The connected feature map from the first stage is passed to the second stage, which applies the same processing as the first; the second stage's output is passed to the third stage and the third stage's output to the fourth, each applying the same processing. The connected feature map from the fourth stage is fed to the average pooling layer; after average pooling, a proportion of the activations is dropped in the dropout layer, and the result is fed to the softmax classifier, which outputs the facial expression;
3.2) Train the facial expression recognition convolutional neural network
Feed K-dimensional facial images labelled with facial expressions into the network constructed in step 3.1) and train it, obtaining the trained facial expression recognition convolutional neural network;
During training, the activation function is ReLU, the optimisation algorithm is SGD (stochastic gradient descent), the initialisation method is Xavier, and the learning rate is:
lr = base_lr × (1 − iter/max_iter) × 0.5
where base_lr = 0.01 is the initial learning rate, iter is the current iteration number, and max_iter is the maximum number of iterations;
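Read literally, this schedule decays linearly from half the base rate to zero over training; a one-line sketch:

```python
def learning_rate(it, max_iter, base_lr=0.01):
    """lr = base_lr * (1 - it / max_iter) * 0.5, as given in the text."""
    return base_lr * (1.0 - it / max_iter) * 0.5
```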
3.3) After processing by steps (1) and (2), the acquired driver facial image is fed into the trained facial expression recognition convolutional neural network, which outputs the driver's facial expression;
(4) Output the result
Once the driver's facial expression has been recognised, the driver's state is obtained and displayed on a screen in real time, and the driver can be prompted promptly. When the driver shows a facial expression unsuited to driving, such as anger, the system issues an unfit-to-drive alert, promptly giving the driver an effective reminder or applying a series of measures to relieve the driver's current unsuitable driving state.
The objective of the invention is achieved as follows:
In the facial-expression-based driver state detection method of the present invention, grayscale conversion, Gamma correction, and PCA dimensionality reduction shrink the facial image and enhance its features. On this basis, the invention builds a facial expression recognition convolutional neural network consisting of four sequentially connected stages, an average pooling layer, a dropout layer, and a softmax classifier, with a small parameter count. Its inception structure adopts a new design that splits the traditional regular 3×3 convolution into a 1×3 convolution and a 3×1 convolution. This saves a large number of parameters, speeds up computation, and reduces overfitting, while adding an extra layer of nonlinearity that expands the model's expressive capacity, letting it handle more and richer spatial features and increasing feature diversity. This inception design makes the facial expression recognition network more lightweight while detecting better, i.e., it improves the accuracy of driver state detection.
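A quick check of the parameter saving from this factorization, for a hypothetical layer with 64 input and 64 output channels (the patent gives no concrete widths):

```python
import torch.nn as nn

count = lambda m: sum(p.numel() for p in m.parameters())
full = nn.Conv2d(64, 64, 3, padding=1)                     # one 3x3 convolution
factored = nn.Sequential(nn.Conv2d(64, 64, (1, 3), padding=(0, 1)),
                         nn.Conv2d(64, 64, (3, 1), padding=(1, 0)))
print(count(full), count(factored))   # 36928 vs 24704: roughly a third fewer
```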
Brief Description of the Drawings
Fig. 1 is a flow chart of the facial-expression-based driver state detection method of the present invention;
Fig. 2 is a block diagram of one specific embodiment of the facial expression recognition convolutional neural network;
Fig. 3 is a block diagram of the inception structure in the network shown in Fig. 2.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings, so that those skilled in the art can better understand the invention. Note that in the following description, detailed accounts of known functions and designs are omitted where they would dilute the main content of the invention.
Embodiment
In the present invention, a Haar-feature + AdaBoost face detection algorithm first detects the driver's face region, i.e., the facial image; the detected facial image is then preprocessed and fed into the constructed facial expression recognition convolutional neural network, which detects the driver's current facial expression in real time and yields the driver's driving state.
Fig. 1 is a flow chart of the facial-expression-based driver state detection method of the present invention.
In this embodiment, as shown in Fig. 1, the facial-expression-based driver state detection method of the present invention comprises the following steps:
Step S1: Acquire the driver's facial image
A camera mounted in front of the driver captures a video stream, and a Haar-feature + AdaBoost face detection algorithm detects the driver's face region in the video frames, i.e., the facial image: Haar-like features are extracted from the image and fed into an AdaBoost classifier, which locates the region containing the driver's face; the framed face region is taken as the facial image for subsequent processing.
Step S2: Preprocess the acquired facial image
First, convert the facial image to grayscale:
Gray = 0.3R + 0.59G + 0.11B
where Gray is the pixel's grayscale value and R, G, and B are its red, green, and blue values.
Then apply Gamma correction:
I = Gray^γ
where I is the corrected grayscale value and γ = 0.5.
Finally, process the grayscaled, Gamma-corrected image with PCA: treat the facial image as a matrix X of n rows and m columns; first zero-mean each row of X, then compute the covariance matrix of X and its eigenvalues and corresponding eigenvectors; arrange the eigenvectors as rows of a matrix O in descending order of their eigenvalues, and take the first K rows of O to form the matrix P (K rows, n columns); the facial image Y = PX is the image reduced to K dimensions. This completes the preprocessing of the driver's face image and eases the subsequent expression recognition by the neural network. The grayscale conversion and PCA dimensionality reduction shrink the facial image and improve real-time performance, while Gamma correction enhances the image features and improves recognition accuracy.
Step S3: Driver facial expression recognition
Step S3.1: Construct the facial expression recognition convolutional neural network
In this embodiment, as shown in Fig. 2, the facial expression recognition convolutional neural network consists of four sequentially connected stages, followed by an average pooling layer, a dropout layer, and a softmax classifier.
Each of the four stages comprises a first convolutional layer with a 3×3 kernel and stride 2; second and third convolutional layers with 3×3 kernels and stride 1; a pooling layer; and an inception structure. The pooling layers of the first three stages use a 3×3 kernel with stride 2; the pooling layer of the fourth stage uses a 3×3 kernel with stride 1.
In this embodiment, as shown in Fig. 3, the inception structure consists of four feature-map processing branches operating in parallel plus a filter concatenation layer. The first branch applies a 3×3 pooling operation to the input feature map, then a 1×1 convolution, and sends the result to the filter concatenation layer. The second branch applies a 1×1 convolution to the input feature map and sends the result to the filter concatenation layer. The third branch applies a 1×1 convolution to the input feature map, then convolves the result separately with a 3×1 kernel and a 1×3 kernel; both resulting feature maps are sent to the filter concatenation layer. The fourth branch applies a 1×1 convolution to the input feature map, then a 3×3 convolution, and then convolves that result separately with a 3×1 kernel and a 1×3 kernel; both resulting feature maps are sent to the filter concatenation layer. The filter concatenation layer concatenates the feature maps from the four branches to produce the connected feature map.
In this embodiment, the inception structure adopts a new design that splits a traditional regular convolution such as the 3×3 convolution into a 1×3 convolution and a 3×1 convolution. This saves a large number of parameters, speeds up computation, and reduces overfitting, while adding an extra layer of nonlinearity that expands the model's expressive capacity, letting it handle more and richer spatial features and increasing feature diversity. This special structural design makes the facial expression recognition network more lightweight while detecting better.
The first convolutional layer of the first stage receives the K-dimensional facial image produced by the dimensionality reduction. The image passes through the first, second, and third convolutional layers in turn, then through the pooling layer, and the pooled feature maps are processed by the inception structure to produce the connected feature map. The connected feature map from the first stage is passed to the second stage, which applies the same processing as the first; the second stage's output is passed to the third stage and the third stage's output to the fourth, each applying the same processing. The connected feature map from the fourth stage is fed to the average pooling layer; after average pooling, a proportion of the activations is dropped in the dropout layer, and the result is fed to the softmax classifier, which outputs the facial expression.
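Putting the pieces together, a skeleton of the whole network under the same assumptions: placeholder channel widths chosen so that each stage's inception output matches the next stage's input, max pooling, a dropout rate of 0.5, and a single-channel input map for the PCA-reduced image. It reuses the InceptionBlock class from the earlier sketch:

```python
import torch.nn as nn

class ExpressionNet(nn.Module):
    """Four stages, then average pooling, dropout and a softmax classifier.
    All channel widths, the pooling type, the dropout rate and the
    single-channel input are placeholder assumptions."""
    def __init__(self, n_classes=7, p_drop=0.5):
        super().__init__()
        blocks = []
        for i, (c_in, c_out) in enumerate([(1, 48), (48, 96), (96, 96), (96, 192)]):
            last = (i == 3)                    # fourth stage pools with stride 1
            blocks += [
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(c_out, c_out, 3, stride=1, padding=1), nn.ReLU(),
                nn.Conv2d(c_out, c_out, 3, stride=1, padding=1), nn.ReLU(),
                nn.MaxPool2d(3, stride=1 if last else 2, padding=1),
                # six branch outputs of c_out // 6 channels concatenate to c_out
                InceptionBlock(c_out, c=c_out // 6),
            ]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Dropout(p_drop),
                                  nn.Linear(192, n_classes), nn.Softmax(dim=1))

    def forward(self, x):
        return self.head(self.features(x))
```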
Step S3.2: Train the facial expression recognition convolutional neural network
Feed K-dimensional facial images labelled with facial expressions into the network constructed in step S3.1 and train it, obtaining the trained facial expression recognition convolutional neural network.
During training, the activation function is ReLU, the optimisation algorithm is SGD, the initialisation method is Xavier, and the learning rate is:
lr = base_lr × (1 − iter/max_iter) × 0.5
where base_lr = 0.01 is the initial learning rate, iter is the current iteration number, and max_iter is the maximum number of iterations.
Step S3.3: After processing by steps (1) and (2), the acquired driver facial image is fed into the trained facial expression recognition convolutional neural network to obtain the driver's facial expression.
In this embodiment, seven basic driver facial expressions are output. The training data are a subset of a facial expression database; after preprocessing, they are fed into the facial expression recognition convolutional neural network for training, and the remaining part of the database is used for testing.
Step S4: Obtain the driver's driving state
From the recognised facial expression, the driver's state is obtained and displayed on a screen in real time, and the driver can be prompted promptly. When the driver shows a facial expression unsuited to driving, such as anger, the system issues an unfit-to-drive alert, promptly giving the driver an effective reminder or applying a series of measures to relieve the driver's current unsuitable driving state.
In this embodiment, training on the data set and recognising the driver's facial expressions verified the correctness and effectiveness of the improved inception structure proposed by the present invention.
Facial expression recognition was carried out with the VGG network model, the Inception V2 network model, the ResNet network model, and the facial-expression-based driver state detection method of the present invention; the recognition results are shown in Table 1.

Table 1: comparison of algorithm recognition rates

As Table 1 shows, the present invention can raise its accuracy by dynamically adding or removing improved inception structures, and so adapts to different conditions; once the number of improved inception structures grows to a certain point, its accuracy exceeds that of the best VGG network model while using fewer parameters. The present invention therefore holds a clear advantage in both the accuracy and the real-time performance of driver facial expression recognition.
Although illustrative specific embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be clear that the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, various changes will be apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are under protection.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910584900.8A CN110348350B (en) | 2019-07-01 | 2019-07-01 | Driver state detection method based on facial expressions |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348350A (en) | 2019-10-18 |
CN110348350B (en) | 2022-03-25 |
Family
ID=68177588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910584900.8A Active CN110348350B (en) | 2019-07-01 | 2019-07-01 | Driver state detection method based on facial expressions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348350B (en) |
Family application events
- 2019-07-01 (CN): application CN201910584900.8A granted as patent CN110348350B (en), status active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018133034A1 (en) * | 2017-01-20 | 2018-07-26 | Intel Corporation | Dynamic emotion recognition in unconstrained scenarios |
EP3355247A1 (en) * | 2017-01-27 | 2018-08-01 | STMicroelectronics Srl | A method of operating neural networks, corresponding network, apparatus and computer program product |
CN108108677A (en) * | 2017-12-12 | 2018-06-01 | Chongqing University of Posts and Telecommunications | Facial expression recognition method based on improved CNN |
CN108491858A (en) * | 2018-02-11 | 2018-09-04 | Nanjing University of Posts and Telecommunications | Fatigue driving detection method and system based on convolutional neural networks |
CN109034090A (en) * | 2018-08-07 | 2018-12-18 | Nantong University | Emotion recognition system and method based on body movements |
CN109376692A (en) * | 2018-11-22 | 2019-02-22 | Hohai University, Changzhou Campus | A Transfer Convolutional Neural Network Method for Facial Expression Recognition |
Non-Patent Citations (3)
Title |
---|
YAO A et al.: "HoloNet: towards robust emotion recognition in the wild", Proceedings of the 18th ACM International Conference on Multimodal Interaction *
DANG Hongshe et al.: "Multi-person expression recognition algorithm based on convolutional neural networks", Modern Computer *
GAN Lutao: "Research on driver state analysis methods based on facial expressions", China Master's Theses Full-text Database (Engineering Science & Technology II) *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325190A (en) * | 2020-04-01 | 2020-06-23 | 京东方科技集团股份有限公司 | Expression recognition method and device, computer equipment and readable storage medium |
WO2021196928A1 (en) * | 2020-04-01 | 2021-10-07 | 京东方科技集团股份有限公司 | Expression recognition method and apparatus, computer device, and readable storage medium |
US20220343683A1 (en) * | 2020-04-01 | 2022-10-27 | Boe Technology Group Co., Ltd. | Expression Recognition Method and Apparatus, Computer Device, and Readable Storage Medium |
CN111325190B (en) * | 2020-04-01 | 2023-06-30 | 京东方科技集团股份有限公司 | Expression recognition method and device, computer equipment and readable storage medium |
US12002289B2 (en) * | 2020-04-01 | 2024-06-04 | Boe Technology Group Co., Ltd. | Expression recognition method and apparatus, computer device, and readable storage medium |
CN111563468A (en) * | 2020-05-13 | 2020-08-21 | 电子科技大学 | A method for detecting abnormal driver behavior based on neural network attention |
CN111563468B (en) * | 2020-05-13 | 2023-04-07 | 电子科技大学 | Driver abnormal behavior detection method based on attention of neural network |
CN111402143A (en) * | 2020-06-03 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN111832416A (en) * | 2020-06-16 | 2020-10-27 | 杭州电子科技大学 | A method for motor imagery EEG signal recognition based on enhanced convolutional neural network |
CN113642467A (en) * | 2021-08-16 | 2021-11-12 | 江苏师范大学 | Facial expression recognition method based on improved VGG network model |
CN113642467B (en) * | 2021-08-16 | 2023-12-01 | 江苏师范大学 | Facial expression recognition method based on improved VGG network model |
Also Published As
Publication number | Publication date |
---|---|
CN110348350B (en) | 2022-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110348350A (en) | A kind of driver status detection method based on facial expression | |
CN110532900B (en) | Facial Expression Recognition Method Based on U-Net and LS-CNN | |
CN107609638B (en) | A Method for Optimizing Convolutional Neural Networks Based on Linear Encoders and Interpolated Sampling | |
CN111046964B (en) | Convolutional neural network-based human and vehicle infrared thermal image identification method | |
JP6788264B2 (en) | Facial expression recognition method, facial expression recognition device, computer program and advertisement management system | |
CN108615010A (en) | Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern | |
CN108491858A (en) | Method for detecting fatigue driving based on convolutional neural networks and system | |
CN108664947A (en) | A kind of fatigue driving method for early warning based on Expression Recognition | |
CN108875674A (en) | A kind of driving behavior recognition methods based on multiple row fusion convolutional neural networks | |
CN110399821A (en) | Customer Satisfaction Acquisition Method Based on Facial Expression Recognition | |
CN106845351A (en) | It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term | |
CN106650786A (en) | Image recognition method based on multi-column convolutional neural network fuzzy evaluation | |
CN110046575A (en) | Based on the remote sensing images scene classification method for improving residual error network | |
CN105005765A (en) | Facial expression identification method based on Gabor wavelet and gray-level co-occurrence matrix | |
CN110059593B (en) | Facial expression recognition method based on feedback convolutional neural network | |
CN111507227B (en) | Multi-student individual segmentation and state autonomous identification method based on deep learning | |
CN107742095A (en) | Chinese sign language recognition method based on convolutional neural network | |
CN107862692A (en) | A kind of ribbon mark of break defect inspection method based on convolutional neural networks | |
CN110443296B (en) | Hyperspectral image classification-oriented data adaptive activation function learning method | |
CN112990007B (en) | Facial expression recognition method and system based on regional grouping and internal association fusion | |
CN109815920A (en) | Gesture recognition method based on convolutional neural network and adversarial convolutional neural network | |
CN107622261A (en) | Face age estimation method and device based on deep learning | |
CN105956570B (en) | Smile recognition method based on lip features and deep learning | |
CN106503619B (en) | Gesture recognition method based on BP neural network | |
CN110096991A (en) | A kind of sign Language Recognition Method based on convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |