CN112287855A

CN112287855A - Driving behavior detection method and device based on multitask neural network

Info

Publication number: CN112287855A
Application number: CN202011207383.1A
Authority: CN
Inventors: 周婷; 刘威; 袁淮; 吕晋; 周伟杰
Original assignee: Neusoft Reach Automotive Technology Shenyang Co Ltd
Current assignee: Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority date: 2020-11-02
Filing date: 2020-11-02
Publication date: 2021-01-29
Anticipated expiration: 2040-11-02
Also published as: CN112287855B

Abstract

The invention provides a driving behavior detection method and device based on a multitask neural network, and relates to the technical field of driving state detection. The method comprises: determining a first neural network based on convolutional layers of a base neural network, the first neural network comprising a first output network and a second output network; determining a second neural network based on the base neural network and the Hourglass network; determining a third neural network based on the first neural network and the second neural network; training the third neural network using sample information; and predicting the sample information based on the trained third neural network, wherein the first output network is used to predict the key point coordinates of the sample and the second output network is used to predict the head pose angle of the sample. The method addresses the long computation time and low prediction accuracy of existing driving behavior detection technology, thereby improving computational performance.

Description

Driving behavior detection method and device based on multitask neural network

Technical Field

The invention relates to the technical field of driving state detection, in particular to a driving behavior detection method and device based on a multitask neural network.

Background

As the number of private cars grows, motor vehicles have become a very common mode of travel. In recent years, however, the accident rate of motor vehicles has remained high, and driving behaviors such as fatigued driving or driver inattention have become an important cause of frequent traffic accidents. Detecting and supervising driver behavior is therefore of great significance for reducing traffic accidents. Driver fatigue monitoring technology generally performs head pose estimation and face key point detection on the driver in order to support attention detection, face verification, and similar functions.

However, when existing driving behavior detection technology performs head pose estimation and face key point detection at the same time, it usually uses a separate neural network for each task, which leads to long computation time and low prediction accuracy.

Disclosure of Invention

The invention aims to provide a driving behavior detection method and device based on a multitask neural network, so as to solve the technical problems of long operation time consumption and low prediction accuracy in the prior art.

In a first aspect, an embodiment of the present invention provides a driving behavior detection method based on a multitask neural network, where the method includes: determining a first neural network based on a convolutional layer of a base neural network, the first neural network comprising: a first output network and a second output network;

determining a second neural network based on the base neural network and the Hourglass network;

determining a third neural network based on the first neural network and the second neural network;

training the third neural network using sample information; the sample information comprises a head image of a sample, the head image comprising face information of the sample;

predicting the sample information based on the trained third neural network, wherein the first output network is used for predicting the key point coordinates of the sample, and the second output network is used for predicting the head pose angle of the sample.

In some embodiments, the step of determining the first neural network based on the convolutional layer of the base neural network comprises:

connecting a specified number of convolutional layers in the base neural network with the convolutional layers of the first output network;

and connecting the last convolutional layer of the base neural network with the convolutional layer of the second output network.

In some embodiments, the first output network comprises a keypoint detection network; the second output network comprises a head pose estimation network.

In some embodiments, the step of determining a second neural network based on the base neural network and the Hourglass network comprises:

taking a plurality of convolutional layers in the base neural network as the down-sampling convolutional layers of a first Hourglass network to construct a first cascade network;

wherein the first Hourglass network further comprises the same number of up-sampling convolutional layers as down-sampling convolutional layers.

In some embodiments, the step of determining a second neural network based on the base neural network and the Hourglass network further comprises:

connecting a plurality of Hourglass networks to construct a second cascade network;

and connecting the second cascade network after the first cascade network to determine a second neural network.

In some embodiments, the step of training the third neural network using sample information comprises:

training a first loss function of the first output network and a second loss function of the second output network using sample information;

training a fusion loss function of the second neural network based on training results of the first loss function and the second loss function.

In some embodiments, the step of training the third neural network using sample information comprises:

training a first loss function of the first output network, a second loss function of the second output network, and a fusion loss function of the second neural network simultaneously using sample information.

In a second aspect, an embodiment of the present invention provides a driving behavior detection apparatus based on a multitask neural network, including:

a first determining module for determining a first neural network based on a convolutional layer of a base neural network, the first neural network comprising: a first output network and a second output network;

a second determining module for determining a second neural network based on the base neural network and the Hourglass network;

a third determining module for determining a third neural network based on the first neural network and the second neural network;

a training module for training the third neural network using sample information; the sample information comprises a head image of a sample, the head image comprising face information of the sample;

and the prediction module is used for predicting the sample information based on the trained third neural network, wherein the first output network is used for predicting the key point coordinates of the sample, and the second output network is used for predicting the head posture angle of the sample.

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the method according to any one of the first aspect when executing the computer program.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing machine executable instructions that, when invoked and executed by a processor, cause the processor to perform the method of any of the first aspects.

The invention provides a driving behavior detection method and device based on a multitask neural network. The method comprises: determining a first neural network based on convolutional layers of a base neural network, the first neural network comprising a first output network and a second output network; determining a second neural network based on the base neural network and the Hourglass network; determining a third neural network based on the first neural network and the second neural network; training the third neural network using sample information; and predicting the sample information based on the trained third neural network, wherein the first output network is used to predict the key point coordinates of the sample and the second output network is used to predict the head pose angle of the sample. The method addresses the long computation time and low prediction accuracy of existing driving behavior detection technology, so that calculation time is saved and detection precision is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic flowchart of a driving behavior detection method based on a multitask neural network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an Hourglass network according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a multitask neural network according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a driving behavior detection apparatus based on a multitask neural network according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

Detecting and supervising driver behavior is of great significance for reducing traffic accidents. Driver fatigue monitoring technology generally performs head pose estimation and face key point detection on the driver in order to support attention detection, face verification, and similar functions. Existing driving behavior detection technology, however, typically runs a face detector first and then applies two separate networks to the resulting face ROI, one to estimate the head pose and one to detect the face key points. This approach is time-consuming on an embedded system. In other words, when head pose estimation and face key point detection are performed together in the prior art, separate neural networks are usually required, which leads to long computation time and low prediction accuracy.

Based on the above, the embodiments of the present invention provide a driving behavior detection method and device based on a multitask neural network, which can address the long computation time and low prediction accuracy of the prior art. To facilitate understanding, the driving behavior detection method is described in detail first. Referring to the schematic flowchart shown in FIG. 1, the method may be executed by an electronic device and mainly includes the following steps S110 to S150:

S110, determining a first neural network based on the convolutional layer of the base neural network, wherein the first neural network comprises: a first output network and a second output network;

The base neural network may be any neural network that includes convolutional layers. As a specific example, the base neural network may be the lightweight network MobileNetV2 or the residual network ResNet.

In some embodiments, the first output network may be a keypoint detection network, which may be used to predict keypoint coordinates of the input image; the second output network may be a head pose estimation network that may be used to predict an angle of a head pose of the input image.

The first neural network is constructed as follows: a specified number of convolutional layers of the base neural network are connected to the convolutional layer of the first output network, and the last convolutional layer of the base neural network is connected to the convolutional layer of the second output network. A first neural network constructed in this way performs key point detection and head pose estimation at the same time on shared features, which reduces computation time and improves prediction efficiency.
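The wiring described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_layer`, `FirstNeuralNetwork`, and `tap_index` are hypothetical names, and each "layer" is a simple scaling function standing in for a real convolutional layer. The point it shows is that the key point branch reads the features after a specified number of base layers, while the pose branch reads the output of the last base layer, so the base network runs only once per image.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
Layer = Callable[[Features], Features]


def toy_layer(factor: float) -> Layer:
    """Hypothetical stand-in for one convolutional layer: scales each feature."""
    def apply(features: Features) -> Features:
        return [factor * value for value in features]
    return apply


class FirstNeuralNetwork:
    """Shared base network with a key point branch and a head pose branch."""

    def __init__(self, base_layers: Sequence[Layer], tap_index: int,
                 keypoint_head: Layer, pose_head: Layer) -> None:
        if not 1 <= tap_index <= len(base_layers):
            raise ValueError("tap_index must select an existing base layer")
        self.base_layers = list(base_layers)
        self.tap_index = tap_index          # the "specified number" of base layers
        self.keypoint_head = keypoint_head  # first output network
        self.pose_head = pose_head          # second output network

    def forward(self, image: Features) -> Tuple[Features, Features]:
        features = image
        tapped = image
        for depth, layer in enumerate(self.base_layers, start=1):
            features = layer(features)
            if depth == self.tap_index:
                tapped = features  # intermediate features feed the key point branch
        # The pose branch always reads the output of the last base layer.
        return self.keypoint_head(tapped), self.pose_head(features)


network = FirstNeuralNetwork(
    base_layers=[toy_layer(2.0) for _ in range(4)],
    tap_index=2,
    keypoint_head=toy_layer(1.0),
    pose_head=lambda features: [sum(features)],
)
keypoints, pose = network.forward([1.0, 2.0])
```

In a real implementation the toy layers would be replaced by the convolutional blocks of the chosen base network, and the two heads by the key point detection and head pose estimation networks.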

S120, determining a second neural network based on the base neural network and the Hourglass network;

The Hourglass network generally has the structure shown in part (a) of FIG. 2, and each square in the diagram is a residual block as shown in part (b) of FIG. 2.

The second neural network is formed by fusing the base neural network with the Hourglass network. In some embodiments, several convolutional layers may be taken from the base neural network to serve as the down-sampling convolutional layers of the first Hourglass network, corresponding to the down-sampling portion on the left in part (a) of FIG. 2. The first Hourglass network also includes the same number of up-sampling convolutional layers as down-sampling convolutional layers, corresponding to the up-sampling portion on the right in part (a) of FIG. 2. Fusing the base neural network with the Hourglass network in this way yields the first cascade network, which can itself serve as the second neural network.

To obtain a better training effect, however, a second cascade network can be constructed by connecting a plurality of Hourglass networks, and the second cascade network can then be connected after the first cascade network to form a second neural network consisting of a plurality of Hourglass networks. As a specific example, the number of Hourglass networks used to build the second cascade network may be 3. According to this embodiment, a second neural network formed by a plurality of Hourglass networks predicts more accurately than either the base neural network or a single Hourglass network.
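The structure of the second neural network can be made concrete with a small bookkeeping sketch. It is not the patent's implementation: `HourglassSpec`, `build_second_network`, and `trace_sizes` are hypothetical names, and two assumptions are made that the text does not state, namely that every down-sampling layer halves the feature map size and every up-sampling layer doubles it, and that the additional Hourglass networks have the same depth as the first one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HourglassSpec:
    name: str
    down_layers: int          # down-sampling convolutional layers
    up_layers: int            # up-sampling convolutional layers
    reuses_base_layers: bool  # True when the down path is taken from the base network


def build_second_network(shared_down_layers: int,
                         extra_hourglasses: int = 3) -> List[HourglassSpec]:
    """First cascade network followed by the second cascade network."""
    first = HourglassSpec("first_cascade", shared_down_layers,
                          shared_down_layers, True)
    # Assumption: the stacked Hourglass networks mirror the first one's depth.
    extra = [
        HourglassSpec(f"second_cascade_{index + 1}", shared_down_layers,
                      shared_down_layers, False)
        for index in range(extra_hourglasses)
    ]
    return [first] + extra


def trace_sizes(spec: HourglassSpec, input_size: int) -> List[int]:
    """Feature map size after each layer, assuming stride-2 down and up steps."""
    sizes = [input_size]
    for _ in range(spec.down_layers):
        sizes.append(sizes[-1] // 2)
    for _ in range(spec.up_layers):
        sizes.append(sizes[-1] * 2)
    return sizes


second_network = build_second_network(shared_down_layers=3)
first_cascade_sizes = trace_sizes(second_network[0], input_size=64)
```

Because each Hourglass network has as many up-sampling layers as down-sampling layers, its output has the same spatial size as its input, which is what allows several of them to be chained one after another.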

S130, determining a third neural network based on the first neural network and the second neural network;

Referring to FIG. 3, the third neural network includes: the second neural network, formed by fusing the base neural network and the Hourglass network; and the first output network and the second output network, determined based on the convolutional layers of the base neural network. The third neural network may thus comprise at least three loss functions: a fusion loss function of the second neural network, a first loss function of the first output network, and a second loss function of the second output network. During training, the loss functions can be supplied to an optimizer, and each corresponding network is trained by minimizing its loss function with respect to the network parameters.
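One common way to hand several loss functions to a single optimizer is to minimize their weighted sum. The sketch below illustrates that idea only; the source does not specify the form of the three losses or how they are combined, so the use of mean squared error, the weights, and the names `mean_squared_error` and `total_loss` are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def mean_squared_error(predicted: Sequence[float],
                       target: Sequence[float]) -> float:
    """Assumed loss form; the source does not name a specific loss."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(first_loss: float, second_loss: float, fusion_loss: float,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three losses (the weights are hypothetical)."""
    first_weight, second_weight, fusion_weight = weights
    return (first_weight * first_loss
            + second_weight * second_loss
            + fusion_weight * fusion_loss)


first_loss = mean_squared_error([0.0, 0.0], [1.0, 1.0])  # key point coordinates
second_loss = mean_squared_error([10.0], [12.0])         # head pose angle
fusion_loss = mean_squared_error([0.5], [0.0])           # second neural network output
combined = total_loss(first_loss, second_loss, fusion_loss,
                      weights=(1.0, 0.5, 2.0))
```

With a weighted sum, setting a weight to zero switches the corresponding loss off, which is one simple way to express training stages that use only a subset of the losses.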

S140, training a third neural network by using the sample information;

wherein the sample information comprises a head image of the sample, and the head image comprises face information of the sample.

Because the third neural network includes the second neural network, the first output network, and the second output network, training the third neural network using the sample information includes: the second neural network, the first output network, and the second output network are trained using the sample information, respectively.

As a specific example, the training process may include:

Step (A): training a first loss function of the first output network and a second loss function of the second output network using the sample information;

Step (B): training the fusion loss function of the second neural network based on the training results of the first loss function and the second loss function.

Alternatively, the training process may include:

Step (C): training a first loss function of the first output network, a second loss function of the second output network, and a fusion loss function of the second neural network simultaneously using the sample information.
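The two alternatives differ only in which loss functions are active at each stage of training. The sketch below expresses that as data, under the assumption that a training framework exposes some per-epoch callback; `staged_schedule`, `joint_schedule`, `run_schedule`, and the `train_epoch` callback are hypothetical names, and the epoch counts are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[str, ...]

FIRST_LOSS = "first_loss"    # first output network (key points)
SECOND_LOSS = "second_loss"  # second output network (head pose)
FUSION_LOSS = "fusion_loss"  # second neural network


def staged_schedule() -> List[Stage]:
    """Steps (A) then (B): output-network losses first, fusion loss afterwards."""
    return [(FIRST_LOSS, SECOND_LOSS), (FUSION_LOSS,)]


def joint_schedule() -> List[Stage]:
    """Step (C): all three losses are trained simultaneously."""
    return [(FIRST_LOSS, SECOND_LOSS, FUSION_LOSS)]


def run_schedule(schedule: Sequence[Stage], epochs_per_stage: int,
                 train_epoch: Callable[[Stage], None]) -> int:
    """Call train_epoch once per epoch with the losses active in that stage."""
    epochs_run = 0
    for active_losses in schedule:
        for _ in range(epochs_per_stage):
            train_epoch(active_losses)
            epochs_run += 1
    return epochs_run


# Record which losses were active in each epoch instead of training a model.
log: List[Stage] = []
staged_epochs = run_schedule(staged_schedule(), epochs_per_stage=2,
                             train_epoch=log.append)
```

In the staged variant the second stage starts from the parameters left by the first stage, which is one reading of training the fusion loss "based on the training results" of the other two losses.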

To improve the precision of the trained parameters, the second neural network used during training may consist of both the first cascade network and the second cascade network, which improves the training precision of the fusion loss function of the second neural network.

S150, predicting the sample information based on the trained third neural network;

The first output network is used for predicting the key point coordinates of the sample, and the second output network is used for predicting the head pose angle of the sample.

To improve calculation speed, prediction may use a reduced third neural network consisting of a second neural network that contains only the first cascade network, together with the first output network and the second output network. This improves prediction efficiency and saves computing resources.
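The difference between the network used for training and the reduced network used for prediction can be sketched as a simple module selection. The module names and the unit costs below are hypothetical placeholders chosen only to make the comparison visible; they are not measurements from the source.

```python
from typing import Dict, List, Sequence

# Hypothetical module names for the parts of the trained third neural network.
TRAINED_MODULES = [
    "base_network",
    "first_cascade_up_path",
    "second_cascade",
    "first_output_network",
    "second_output_network",
]

# Placeholder relative costs, NOT measured values.
PLACEHOLDER_COST: Dict[str, float] = {
    "base_network": 4.0,
    "first_cascade_up_path": 1.0,
    "second_cascade": 6.0,
    "first_output_network": 0.5,
    "second_output_network": 0.5,
}


def select_inference_modules(trained: Sequence[str],
                             dropped: Sequence[str] = ("second_cascade",)) -> List[str]:
    """Keep every trained module except those dropped for prediction."""
    return [name for name in trained if name not in dropped]


def total_cost(modules: Sequence[str], unit_cost: Dict[str, float]) -> float:
    """Sum the placeholder cost of the modules that would be executed."""
    return sum(unit_cost[name] for name in modules)


inference_modules = select_inference_modules(TRAINED_MODULES)
training_cost = total_cost(TRAINED_MODULES, PLACEHOLDER_COST)
inference_cost = total_cost(inference_modules, PLACEHOLDER_COST)
```

The design choice this reflects is that the stacked Hourglass networks are used to shape the shared features during training, while prediction keeps only the parts needed to produce the two outputs.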

In summary, the driving behavior detection method based on the multitask neural network comprises: determining a first neural network based on convolutional layers of the base neural network, the first neural network comprising a first output network and a second output network; determining a second neural network based on the base neural network and the Hourglass network; determining a third neural network based on the first neural network and the second neural network; training the third neural network using the sample information; and predicting the sample information based on the trained third neural network, wherein the first output network is used to predict the key point coordinates of the sample and the second output network is used to predict the head pose angle of the sample. The method addresses the long computation time and low prediction accuracy of existing driving behavior detection technology, thereby improving computational performance.

The embodiment of the present application further provides a driving behavior detection device based on a multitask neural network. Referring to FIG. 4, the device includes:

a first determining module 310 for determining a first neural network based on a convolutional layer of a base neural network, the first neural network comprising: a first output network and a second output network;

a second determining module 320 for determining a second neural network based on the base neural network and the Hourglass network;

a third determining module 330 for determining a third neural network based on the first neural network and the second neural network;

a training module 340 for training the third neural network using the sample information; the sample information comprises a head image of the sample, and the head image comprises face information of the sample;

and the predicting module 350 is configured to predict the sample information based on the trained third neural network, where a first output network is used to predict the key point coordinates of the sample, and a second output network is used to predict the head pose angle of the sample.

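In some embodiments, the first determining module is further configured to: connect a specified number of convolutional layers in the base neural network with the convolutional layers of the first output network; and connect the last convolutional layer of the base neural network with the convolutional layer of the second output network.

The first output network comprises a key point detection network; the second output network comprises a head pose estimation network.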

In some embodiments, the second determining module is further configured to: take a plurality of convolutional layers in the base neural network as the down-sampling convolutional layers of a first Hourglass network to construct a first cascade network; wherein the first Hourglass network further comprises the same number of up-sampling convolutional layers as down-sampling convolutional layers.

In some embodiments, the second determining module is further configured to: connect a plurality of Hourglass networks to construct a second cascade network; and connect the second cascade network after the first cascade network to determine the second neural network.

In some embodiments, the training module is further to: training a first loss function of the first output network and a second loss function of the second output network using the sample information; and training the fusion loss function of the second neural network based on the training results of the first loss function and the second loss function.

In some embodiments, the training module is further to: a first loss function of the first output network, a second loss function of the second output network, and a fusion loss function of the second neural network are trained simultaneously using the sample information.

The driving behavior detection device based on the multitask neural network provided by the embodiment of the application may be specific hardware on a piece of equipment, or software or firmware installed on the equipment. The device has the same implementation principle and technical effect as the foregoing method embodiments; for brevity, where the device embodiments do not mention a detail, reference may be made to the corresponding content in the method embodiments. Those skilled in the art will understand that the specific working processes of the foregoing systems, apparatuses, and units may likewise refer to the corresponding processes in the method embodiments and are not repeated here. Because the device shares the technical features of the driving behavior detection method provided by the above embodiments, it solves the same technical problems and achieves the same technical effects.

The embodiment of the application further provides an electronic device, and specifically, the electronic device comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the above described embodiments.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device 400 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.

The Memory 41 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, etc. may be used.

The bus 42 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.

The memory 41 is used for storing a program, the processor 40 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40, or implemented by the processor 40.

The processor 40 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 40. The Processor 40 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory 41, and the processor 40 reads the information in the memory 41 and completes the steps of the method in combination with the hardware thereof.

Corresponding to the method, the embodiment of the application also provides a computer-readable storage medium storing machine-executable instructions which, when called and executed by a processor, cause the processor to execute the steps of the method.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

It should be noted that: like reference numbers and letters indicate like items in the figures, and thus once an item is defined in a figure, it need not be further defined or explained in subsequent figures, and moreover, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, anyone skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as falling within its protection scope.

Claims

1. A driving behavior detection method based on a multitask neural network is characterized by comprising the following steps:

determining a first neural network based on a convolutional layer of a base neural network, the first neural network comprising: a first output network and a second output network;

2. The multitask neural network-based driving behavior detection method according to claim 1, wherein the step of determining the first neural network based on the convolutional layer of the basic neural network comprises:

3. The multitask neural network-based driving behavior detection method of claim 2, wherein the first output network comprises a keypoint detection network; the second output network comprises a head pose estimation network.

4. The multitask neural network-based driving behavior detection method according to claim 1, wherein the step of determining a second neural network based on the base neural network and the Hourglass network includes:

5. The multitask neural network-based driving behavior detection method of claim 4, wherein the step of determining a second neural network based on the base neural network and the Hourglass network further comprises:

6. The multitask neural network-based driving behavior detection method according to claim 5, wherein the step of training the third neural network using the sample information includes:

7. The multitask neural network-based driving behavior detection method according to claim 5, wherein the step of training the second neural network using the sample information includes:

8. A multitask neural network-based driving behavior detection apparatus, comprising:

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.

10. A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to execute the method of any of claims 1 to 7.