WO2021077840A1

WO2021077840A1 - Gesture control method and apparatus

Info

Publication number: WO2021077840A1
Application number: PCT/CN2020/105593
Authority: WO
Inventors: 曾彬; 肖琴
Original assignee: 上海商汤智能科技有限公司
Priority date: 2019-10-22
Filing date: 2020-07-29
Publication date: 2021-04-29
Also published as: CN110716648A; CN110716648B; KR20210141688A; JP2022520030A; JP7479388B2

Abstract

The embodiments of the present disclosure provide a gesture control method and apparatus. Said method comprises: performing gesture recognition processing on N chronologically consecutive images in a video stream acquired by a camera, to obtain a gesture recognition result sequence, the gesture recognition result sequence comprising recognition results of a plurality of gestures included in the N images; in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, determining the identical gesture recognition results to be target gesture recognition results, both N and M being integers greater than 1, and N being greater than or equal to M; and sending to a target device a control instruction corresponding to the target gesture recognition results, or controlling a target device to execute an operation corresponding to the target gesture recognition results.

Description

Gesture control method and device

Cross-reference to related applications

This application is filed based on a Chinese patent application with an application number of 201911008049.0 and an application date of October 22, 2019, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated into this application by reference.

Technical field

The present disclosure relates to computer vision technology, and in particular to a gesture control method and device.

Background

With the continuing development of intelligent, electronic, and interconnected products, more and more intelligent human-computer interaction methods have emerged to meet people's demand for personalization and fashion. For example, the touch screen of a smartphone is a human-computer interaction system based on touch. Some products are also controlled through voice interaction: the user inputs instructions by voice, and the product performs the corresponding operations.

Summary of the invention

The embodiments of the present disclosure provide at least one gesture control method and device.

According to a first aspect of the present disclosure, there is provided a gesture control method. The method includes: performing gesture recognition processing on N chronologically consecutive frames of images in a video stream collected by a camera to obtain a gesture recognition result sequence, where the gesture recognition result sequence includes the recognition results of multiple gestures included in the N frames of images; in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, determining that the identical gesture recognition result is a target gesture recognition result, where N and M are both integers greater than 1 and N is greater than or equal to M; and sending a control instruction corresponding to the target gesture recognition result to a target device, or controlling the target device to perform an operation corresponding to the target gesture recognition result.

According to a second aspect of the present disclosure, there is provided a gesture control device. The device includes: a recognition processing module, configured to perform gesture recognition processing on N chronologically consecutive frames of images in a video stream collected by a camera to obtain a gesture recognition result sequence, where the gesture recognition result sequence includes the recognition results of multiple gestures included in the N frames of images; a gesture determination module, configured to determine, in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, that the identical gesture recognition result is a target gesture recognition result, where N and M are both integers greater than 1 and N is greater than or equal to M; and an operation control module, configured to send a control instruction corresponding to the target gesture recognition result to a target device, or to control the target device to perform an operation corresponding to the target gesture recognition result.

According to a third aspect of the present disclosure, there is provided an electronic device that includes a processor and a memory on which a computer program is stored. The computer program can be executed by the processor to implement the gesture control method according to the first aspect of the present disclosure.

According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the gesture control method according to the first aspect of the present disclosure.

According to the gesture control method and device provided by the embodiments of the present disclosure, a gesture is confirmed to be valid only when a preset number of identical gesture recognition results are obtained, and the valid gesture is determined as the target gesture recognition result. This avoids false triggering of gestures to a certain extent and improves the accuracy of gesture recognition. For example, if the user accidentally makes a certain gesture, the gesture will not be recognized as a valid target gesture recognition result as long as the number of identical gesture recognition results for that gesture does not reach the preset number, so the target device will not respond to the gesture and false triggers are reduced.

Description of the drawings

Fig. 1 shows a flow chart of a gesture control method according to at least one embodiment of the present disclosure;

Fig. 1a shows a schematic diagram of a static gesture according to at least one embodiment of the present disclosure;

Fig. 1b shows a schematic diagram of a dynamic gesture according to at least one embodiment of the present disclosure;

Fig. 2 shows a flowchart of another gesture control method according to at least one embodiment of the present disclosure;

Fig. 3 shows a flowchart of another gesture control method according to at least one embodiment of the present disclosure;

Fig. 4 shows a schematic diagram of a functional interface of a music player according to at least one embodiment of the present disclosure;

Fig. 5 shows a block diagram of a gesture control device according to at least one embodiment of the present disclosure;

Fig. 6 shows a block diagram of another gesture control device according to at least one embodiment of the present disclosure;

Fig. 7 shows a block diagram of an electronic device according to at least one embodiment of the present disclosure.

Detailed description

The embodiments of the present disclosure provide a gesture control method to control a device through gesture interaction.

FIG. 1 shows a flow chart of a gesture control method according to at least one embodiment of the present disclosure. The method may be executed by a gesture control device, and the method may include step 100 to step 104.

In step 100, gesture recognition processing is performed on each of N chronologically consecutive frames of images in the video stream collected by the camera to obtain a gesture recognition result sequence.

When the user wants to control a device, for example to enable a certain function in the device, the user can make a certain gesture. The device may be referred to as a target device. Controlling the target device may mean controlling a functional component in the target device, and the functional component may be a hardware or software module. In an example, the target device includes but is not limited to a vehicle, and controlling the target device may include, but is not limited to, controlling one or more functional components provided in the vehicle, such as a media player, an air conditioner controller, and a window controller. It is understandable that the target device may also include other devices such as mobile phones, TVs, air conditioners, stereos, and smart home devices.

In this step, a camera can be used to collect a video stream of the user's gestures; for example, a camera on the target device can be used. The video stream includes N chronologically consecutive frames of images collected by the camera, and the gestures in the images are gestures made when the user wants to control the operation of a functional component in the target device. N is an integer greater than 1.

By performing gesture recognition processing on the N frames of images in the aforementioned video stream, respectively, a gesture recognition result sequence can be obtained, and the gesture recognition result sequence includes recognition results of multiple gestures.

The gestures made by the user can be static gestures or dynamic gestures. Figures 1a and 1b illustrate some gestures, but it is understandable that the actual implementation is not limited to these gestures. Illustratively, Fig. 1a shows a series of static gestures: OK gesture, V gesture, like gesture, palm gesture, index finger gesture, and fist gesture. Illustratively, Fig. 1b shows a series of dynamic gestures: fist-palm change (fist-to-palm, palm-to-fist), palm translation (up, down, left, and right), index finger rotation (clockwise, counterclockwise).

For example, the recognition result of the multiple gestures included in the gesture recognition result sequence may be a static gesture: for example, the gesture recognized in the image is a V gesture, or the gesture recognized in the image is an OK gesture.

For another example, by performing gesture recognition processing on N frames of images, the obtained gesture recognition result sequence may also include multiple dynamic gestures, for example, multiple “palm translation” gestures are recognized.

For another example, the gesture recognition result sequence may also be a combination of static gestures and dynamic gestures. For example, the gesture recognition result sequence includes an OK gesture and a palm translation gesture.

The gesture recognition in this step, for example, can be performed by a pre-trained gesture recognition neural network, and the image collected by the camera is input into the neural network, and the gesture recognition result corresponding to the image can be obtained.

In step 102, in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, it is determined that the same gesture recognition result is a target gesture recognition result.

In this step, it can be set to confirm that the gesture is valid only when it is determined that a preset number of identical gesture recognition results are obtained, and the effective gesture is called the target gesture recognition result. The preset number can be set to M, M is also an integer greater than 1, and N is greater than or equal to M.

For example, if five consecutive V gestures are recognized in N consecutive frames of images, the recognized "V gesture" is confirmed as the target gesture recognition result. For another example, if five consecutive "palm translation gestures" are recognized in N consecutive frames of images, this "palm translation gesture" is the target gesture recognition result, where each palm translation gesture can be represented by multiple frames of images.

If the number of consecutive gestures recognized does not reach the preset number, these recognition results are discarded and recognition starts again. For example, if only three V gestures are recognized in N consecutive frames of images and the preset number of five is not reached, the three V gestures are discarded and gesture recognition is performed again on N consecutive frames of images.

When the target gesture recognition result is determined, step 104 is continued. Otherwise, if the target gesture recognition result is not determined, then continue to perform step 100 to step 102.
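The decision in step 102 can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the function names and the use of plain string labels for recognition results are assumptions. Two variants are shown because the text describes both a count of identical results and a run of consecutive identical results.

```python
from collections import Counter
from typing import Optional, Sequence


def find_target_gesture(results: Sequence[str], m: int) -> Optional[str]:
    """Return the gesture that occurs at least m times in results, else None."""
    if m <= 1 or len(results) < m:
        return None  # the text requires M > 1 and N >= M
    gesture, count = Counter(results).most_common(1)[0]
    return gesture if count >= m else None


def find_consecutive_target(results: Sequence[str], m: int) -> Optional[str]:
    """Return the first gesture that appears m times in a row, else None."""
    if m <= 1:
        return None
    run_gesture, run_length = None, 0
    for gesture in results:
        if gesture == run_gesture:
            run_length += 1
        else:
            # A different result starts a new run.
            run_gesture, run_length = gesture, 1
        if run_length >= m:
            return gesture
    return None
```

When either function returns None, the caller would go back to step 100 and process further frames; otherwise it would continue to step 104 with the returned gesture.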

In step 104, a control instruction corresponding to the target gesture recognition result is sent to the target device, or the target device is controlled to perform an operation corresponding to the target gesture recognition result.

In this step, the target device can be controlled according to the target gesture recognition result determined above. Specifically, a functional component in the target device can be controlled. For example, if the functional component is a media player, such as a volume control module for playing music in a vehicle, the volume can be increased or decreased according to the target gesture recognition result. In actual implementation, a control instruction corresponding to the target gesture recognition result can be sent to the target device, and the target device operates according to the instruction; alternatively, the gesture control device of this embodiment can directly control the target device to perform the operation corresponding to the target gesture recognition result.

According to the gesture control method of this embodiment, a gesture is confirmed to be valid only when a preset number of identical gesture recognition results are obtained, and the valid gesture is determined as the target gesture recognition result. This avoids false triggering of gestures to a certain extent and improves the accuracy of gesture recognition. For example, if the user accidentally makes a certain gesture, the gesture will not be recognized as a valid target gesture recognition result as long as the number of identical gesture recognition results for that gesture does not reach the preset number, so the target device will not respond to the gesture and false triggers are reduced.

FIG. 2 shows another gesture control method according to at least one embodiment of the present disclosure. The method may include step 200 to step 208, wherein the same steps as those in FIG. 1 will not be described in detail.

In step 200, a multi-frame image collected by a camera is received, and the gesture in the image is a gesture made when the user wants to control the function component in the target device to run.

The multi-frame images may be N frames of images continuously in sequence included in the video stream collected by the camera.

In step 202, gesture recognition processing is performed on the multiple frames of images to obtain a gesture recognition result sequence.

For example, the image collected by the camera has multiple frames, and multiple gestures can be recognized according to the multiple frames of images, and these multiple gestures can form a sequence of gesture recognition results. For example, the gesture recognition result sequence may include "V, V, V, V, V, V, fist, V, V".

In the above gesture recognition result sequence, the multiple "V" results can be referred to as multiple identical gesture recognition results, and "fist" can be referred to as a difference gesture recognition result, that is, a gesture recognition result that differs from the identical gesture recognition results. In other examples, there may also be multiple difference gesture recognition results.

In step 204, the difference gesture recognition result is smoothed when both of the following conditions are met: the gesture recognition result sequence includes multiple identical gesture recognition results with a difference gesture recognition result among them, and the proportion of difference gesture recognition results in the gesture recognition result sequence is lower than a preset value. The difference gesture recognition result is a result that differs from the identical gesture recognition results. In other words, smoothing is applied when the sequence includes a difference gesture recognition result for at least one frame of image, the gesture recognition results before and after the difference gesture recognition result are both the identical gesture recognition result, and the proportion of difference gesture recognition results in the sequence is lower than the preset value. In this case, determining that the number of consecutive identical gesture recognition results included in the gesture recognition result sequence is greater than or equal to M includes: determining that the number of consecutive identical gesture recognition results included in the smoothed gesture recognition result sequence is greater than or equal to M.

For example, in the gesture recognition result sequence "V, V, V, V, V, V, fist, V, V" from the example above, "fist" is the difference gesture recognition result. Six V gestures are recognized before the fist gesture and two V gestures are recognized after it, so the gesture recognition results before and after the difference gesture recognition result are both the identical gesture recognition result, namely the V gesture. If, in addition, the proportion of difference gesture recognition results in the sequence is lower than a preset value, for example if the ratio of the number of difference gestures to the total number of results in the sequence is lower than 15%, the difference gesture recognition result is smoothed. The actual implementation is not limited to this judgment method, which is only an example.
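The proportion check in this example can be worked through in code. The sketch below is an assumption-laden illustration: the function names are hypothetical, the most frequent result is treated as the identical gesture recognition result, and the 15% threshold is simply the example value from the text. It checks only the proportion, not whether the difference result is flanked by identical results.

```python
from collections import Counter
from typing import Sequence


def difference_ratio(results: Sequence[str]) -> float:
    """Fraction of results that differ from the most common gesture."""
    if not results:
        return 0.0
    _, majority_count = Counter(results).most_common(1)[0]
    return (len(results) - majority_count) / len(results)


def should_smooth(results: Sequence[str], max_ratio: float = 0.15) -> bool:
    """True when difference results exist and their share is below max_ratio."""
    ratio = difference_ratio(results)
    return 0.0 < ratio < max_ratio
```

For the sequence "V, V, V, V, V, V, fist, V, V", one of nine results differs, a ratio of about 11%, which is below the example threshold of 15%.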

Once it is determined that the difference gesture recognition result is to be smoothed, the smoothing process may include, but is not limited to, any of the following:

For example, the difference gesture recognition result may be corrected to the same gesture recognition result, for example, a fist gesture may be corrected to a V gesture. The aforementioned gesture recognition result sequence "V, V, V, V, V, V, fist, V, V" is modified to "V, V, V, V, V, V, V, V, V".

For another example, the difference gesture recognition result can also be removed from the gesture recognition result sequence. For example, the above sequence "V, V, V, V, V, V, fist, V, V" can be modified to "V, V, V, V, V, V, V, V".

For another example, the gesture recognition result whose time sequence is before the difference gesture recognition result and the gesture recognition result whose time sequence is after the difference gesture recognition result can also be used as consecutive multiple gesture recognition results. That is, the gesture recognition result sequence "V, V, V, V, V, V, fist, V, V" is considered to recognize eight consecutive V gestures, and fist gestures are ignored.
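The smoothing options above can be sketched as a single function. This is an illustrative sketch under stated assumptions: the function name, the `mode` argument, and the choice of the most frequent result as the identical gesture recognition result are hypothetical, and the 15% default is the example value from the text. The third option (treating the results before and after the difference result as consecutive) gives the same run length as removal, so it is not modeled separately.

```python
from collections import Counter
from typing import List, Sequence


def smooth(results: Sequence[str], mode: str = "correct",
           max_ratio: float = 0.15) -> List[str]:
    """Smooth difference results that are flanked by the majority gesture.

    mode="correct" rewrites each difference result to the majority gesture;
    mode="remove" drops it from the sequence.
    """
    if mode not in ("correct", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    if not results:
        return []
    majority, majority_count = Counter(results).most_common(1)[0]
    ratio = (len(results) - majority_count) / len(results)
    if ratio >= max_ratio:
        return list(results)  # too many difference results: leave unchanged
    smoothed: List[str] = []
    i, n = 0, len(results)
    while i < n:
        if results[i] == majority:
            smoothed.append(majority)
            i += 1
            continue
        j = i
        while j < n and results[j] != majority:
            j += 1  # extend over the whole run of difference results
        flanked = i > 0 and j < n  # majority gesture on both sides of the run
        if not flanked:
            smoothed.extend(results[i:j])
        elif mode == "correct":
            smoothed.extend([majority] * (j - i))
        # flanked and mode == "remove": the run is dropped
        i = j
    return smoothed
```

Applied to "V, V, V, V, V, V, fist, V, V", the "correct" mode yields nine V results and the "remove" mode yields eight, matching the two modified sequences given above.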

In step 206, if the smoothed gesture recognition result sequence includes a preset number of consecutive identical gesture recognition results, it is confirmed that the target gesture recognition result has been recognized.

For example, in this embodiment, it may be set that if consecutive M identical gesture recognition results are recognized, it is confirmed that the same gesture recognition result is the target gesture recognition result, and the target gesture recognition result is valid. For example, if 8 consecutive V gestures are recognized, it is confirmed that the V gesture is the target gesture recognition result.

In step 208, a control instruction corresponding to the target gesture recognition result is sent to the target device, or the target device is controlled to perform an operation corresponding to the target gesture recognition result.

According to the gesture control method of this embodiment, a gesture is confirmed to be valid when a preset number of identical gestures are recognized, which improves the accuracy of gesture recognition. Smoothing the difference gesture recognition results also increases the sensitivity of gesture recognition and improves its response speed.

For example, suppose that the user actually makes the preset number of V gestures, such as ten V gestures, but because of misrecognition, nine V gestures and two fist gestures are recognized. Without smoothing, the nine V gestures would have to be discarded and recognition restarted, so the user's gesture could not be responded to in time. According to the method of this embodiment, the two fist recognition results can be corrected to V gesture recognition results, so that a valid V gesture is recognized quickly and the user's gesture is responded to promptly.

FIG. 3 shows a flowchart of another gesture control method according to at least one embodiment of the present disclosure. The method includes step 300 to step 306.

In step 300, a single frame of image collected by the camera is acquired from the video stream. The collected image corresponds to the camera's shooting field of view space, which includes an effective space area for gesture control.

In this embodiment, the camera is fixed at a certain position in the vehicle. The camera has a corresponding shooting field of view space when collecting images, and the images it collects are images of this space. The field of view space includes the effective space area for gesture control. For example, gesture-based control is triggered only when the driver makes a gesture within a certain space area in front of the vehicle's central control panel; gestures made outside this effective space area do not trigger gesture control. The image collected by the camera includes the image corresponding to the effective space area for gesture control.

In step 302, from the image collected by the camera, a partial image area corresponding to the effective space area of the gesture control is selected.

In this step, the image collected by the camera can be cropped to extract a partial image area whose corresponding shooting field of view space is the effective space area for gesture control. For example, the camera may capture a large area of space, such as the entire interior of the vehicle. The partial image area selected in this step is the part of the collected image that shows the area in front of the vehicle's central control panel. This area is the effective space area for gesture control: the gesture control response is triggered only when the driver makes gestures within it.

In step 304, gesture recognition processing is performed on the partial image area to obtain a gesture recognition result.

In some embodiments, when performing gesture recognition on N frames of images in a video stream, a partial image area may be selected from each frame of image, and gesture recognition processing may be performed on the partial image area. The above-mentioned image is the image collected by the camera.
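Selecting the partial image area amounts to cropping a fixed rectangle from each frame before recognition. The sketch below is a minimal illustration: the nested-list image representation, the function name, and the `(top, left, height, width)` region format are assumptions chosen to keep the example self-contained; a real system would more likely operate on camera frames held in an array library.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real frame


def crop_region(image: Image, region: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image for region = (top, left, height, width)."""
    top, left, height, width = region
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("region needs a non-negative origin and positive size")
    if top + height > len(image) or left + width > len(image[0]):
        raise ValueError("region extends beyond the image")
    # Keep only the rows and columns inside the effective area.
    return [row[left:left + width] for row in image[top:top + height]]


# Usage: crop the same (hypothetical) effective area from every frame.
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
effective_area = (1, 2, 2, 3)
partial = crop_region(frame, effective_area)
```

Because the camera is fixed, the rectangle can be configured once and reused for all N frames, and only the cropped area is passed to gesture recognition.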

In step 306, the target device is controlled according to the gesture recognition result.

For example, for N frames of images collected by a camera, a gesture recognition result sequence is obtained by recognition. If there are a preset number of M identical gesture recognition results in the gesture recognition result sequence, or there are consecutive M identical gesture recognition results, it is confirmed that the same gesture recognition result is a target gesture recognition result. The target device is controlled according to the control instruction corresponding to the target gesture recognition result.

According to the gesture control method of this embodiment, device control is performed according to a gesture only when a preset number of identical gesture recognition results are obtained, which can prevent false triggering. In addition, recognizing gestures only in the partial image area avoids, to a certain extent, interference from other areas of the image, which makes gesture recognition more accurate. Performing gesture recognition on only the partial image area is also faster than processing the entire image.

In another embodiment, some parameters of the gesture control function can be adjusted visually. For example, the gesture recognition parameters used for gesture recognition can be displayed on a visual interface, and the user adjusts them in this parameter adjustment interface by means of progress bars. For example, the gesture recognition parameters may include M in the "M identical gesture recognition results are recognized" condition mentioned above. For example, it can be set that a V gesture is confirmed once 10 identical V gestures are recognized, or once 8 identical V gestures are recognized. After the user adjusts a gesture recognition parameter, the system performs gesture recognition processing according to it. Adjusting gesture recognition parameters through a visual interface is more convenient.

In addition, different gesture recognition parameters can be set for different gestures. Taking the above M as an example, the M corresponding to different gestures can be different. For example, if 10 identical V gestures are recognized, it is confirmed that the V gesture is recognized; if 6 OK gestures are recognized, it is confirmed that the OK gesture is recognized. That is, the M corresponding to the V gesture is 10, and the M corresponding to the OK gesture is 6.

The gesture recognition parameters may also include, for example, the number of different gestures in the sequence, the number of the same gestures before the different gestures, and so on. These parameters can also be adjusted and set in the form of a progress bar through the above-mentioned visual interface. In addition, each parameter can be adjusted independently. For example, taking M corresponding to different gestures as an example, in the above example, the M corresponding to the V gesture is 10, and the M corresponding to the OK gesture is 6. The M corresponding to different gestures such as OK gesture can be adjusted separately.
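Per-gesture parameters of this kind can be held in a small configuration object. The sketch below is only an illustration: the class name, method names, and the default value of 8 are assumptions, and only the parameter M is modeled; the other parameters mentioned above could be stored in the same way.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GestureRecognitionParams:
    """Holds M per gesture, with a fallback for gestures not set explicitly."""
    default_m: int = 8  # hypothetical fallback value
    per_gesture_m: Dict[str, int] = field(default_factory=dict)

    def set_m(self, gesture: str, m: int) -> None:
        """Adjust M for one gesture independently of the others."""
        if m <= 1:
            raise ValueError("M must be an integer greater than 1")
        self.per_gesture_m[gesture] = m

    def m_for(self, gesture: str) -> int:
        """Return the M to apply when confirming the given gesture."""
        return self.per_gesture_m.get(gesture, self.default_m)


params = GestureRecognitionParams()
params.set_m("V", 10)   # a V gesture needs 10 identical results
params.set_m("OK", 6)   # an OK gesture needs 6 identical results
```

A progress bar in the visual interface would then simply call `set_m` for the gesture being adjusted.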

The following uses the application of gesture control in a vehicle as an example to describe the gesture control method of the present disclosure. It is understandable that the gesture control method is not limited to vehicles and can also be applied to other devices, such as mobile phones and smart home systems.

In a vehicle, the driver can use gestures to adjust vehicle accessories such as the windows, light brightness, and air-conditioning temperature, and can also control in-vehicle entertainment components, for example controlling music playback by switching songs or adjusting the volume. Gestures can also be used to control games, and so on. For example, FIG. 4 shows a schematic diagram of a functional interface of a music player according to at least one embodiment of the present disclosure. The user can tap to open the music player. In an illustrative example, when the user taps the gesture control area 41 on the player interface (that is, the area at the bottom of the player), gesture control of music playback functions is enabled; if the user taps the gesture control area 41 again, gesture control of music playback functions is disabled.

The interface shown in FIG. 4 is the functional interface of the music player. The user makes various gestures, the camera collects images of them, and the gesture control device controls the music playing function of the music player according to the received images. In addition, in the interface shown in FIG. 4, the music player can be controlled in response to the gesture recognition result of the images. For example, the volume of music playback can be increased in response to the gesture recognition result; for another example, the window controller can also be controlled to move the window glass in response to the gesture recognition result. For another example, not only can the volume of music playback be increased in response to the gesture recognition result, but the resulting change in state of the related control function of the music player can also be displayed synchronously.

Referring again to FIG. 4, the gesture control area 41 lights up icons for multiple gestures, indicating that multiple gestures are supported in the music playback scene. The gestures and their corresponding music playback functions are listed in Table 1.

Table 1 Gestures and corresponding control functions

Gesture	Control function
OK	Play
Thumbs up	Like
Index finger rotates clockwise	Increase volume
Index finger rotates counterclockwise	Decrease volume
Palm translates right	Next song
Palm translates left	Previous song
Fist	Pause

For each gesture in Table 1 above, the target gesture recognition result can be determined according to the following predetermined rule: if there are a preset number of identical gesture recognition results in the gesture recognition result sequence, the identical gesture recognition result is confirmed as the target gesture recognition result.
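Table 1 can be represented as a simple lookup from a target gesture recognition result to a control function. The identifiers below are hypothetical labels chosen for illustration; the disclosure does not specify how gestures or control instructions are encoded.

```python
from typing import Dict, Optional

# Table 1 as a lookup table (all keys and values are illustrative labels).
GESTURE_TO_FUNCTION: Dict[str, str] = {
    "ok": "play",
    "thumbs_up": "like",
    "index_finger_clockwise": "increase_volume",
    "index_finger_counterclockwise": "decrease_volume",
    "palm_right": "next_song",
    "palm_left": "previous_song",
    "fist": "pause",
}


def control_function_for(target_gesture: str) -> Optional[str]:
    """Return the control function for a target gesture, or None if unmapped."""
    return GESTURE_TO_FUNCTION.get(target_gesture)
```

Returning None for an unmapped gesture lets the caller ignore gestures that the current scene does not support.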

For example, after gesture control of music playback functions is enabled, the user can make an OK gesture and the music player starts to play music. The start of the music playing function can also be displayed synchronously in the function status interface of FIG. 4. Similarly, when the user makes a fist gesture, music playback is paused, and the stop of the music playing function can also be displayed synchronously in the function status interface.

For example, the user makes a gesture of rotating the index finger, and at this time, after the index finger rotating gesture is recognized, the gesture control device may first determine whether the "OK" gesture has been recognized. If "OK" has not been recognized before, no response will be made; if "OK" has been recognized before, the volume of the music player can be adjusted according to the component control information corresponding to the index finger rotation gesture. For example, if the gesture is "index finger rotate clockwise", you can control the music player to increase the volume of music playback. At the same time, in the function status interface of FIG. 4, the volume adjustment display module 42 can also be used to synchronously display the volume increase signal as the index finger rotates clockwise.

For another example, the user makes a gesture of translating the palm to the right. After the palm translation gesture is recognized, the gesture control device may first determine whether the "OK" gesture has been recognized. If "OK" has not been recognized before, no response is made; if "OK" has been recognized before, the music player can be controlled to switch to the next song according to the gesture of translating the palm to the right. At the same time, in the function status interface of FIG. 4, the song display module 43 can also be used to synchronously display the song switching effect as the palm moves to the right.

In addition, users can also use gestures to like songs. For example, if the user gives a thumbs up, in response to the gesture, the gesture control device can control the music player to display the like mark of a certain song in the function status interface shown in FIG. 4. For example, the like mark 44 in FIG. 4 is illuminated. Whether the "OK" gesture has been recognized can also be judged before the like is applied.
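
The "OK first" check in the examples above can be sketched as a small piece of state. This is only an illustration: the class name `MusicGestureGate` and the gesture labels are hypothetical, and the sketch covers only the volume and song-switching gestures; whether the like gesture is gated, and whether a fist gesture resets the state, is left open by the text and is not modeled here.

```python
from typing import Optional


class MusicGestureGate:
    """Ignore follow-up gestures until an "ok" gesture has been recognized."""

    # Hypothetical labels for gestures that require a prior "ok".
    GATED = {
        "index_clockwise": "volume_up",
        "index_counterclockwise": "volume_down",
        "palm_right": "next_song",
        "palm_left": "previous_song",
    }

    def __init__(self) -> None:
        self.ok_seen = False

    def handle(self, gesture: str) -> Optional[str]:
        """Return the action to perform, or None when no response is made."""
        if gesture == "ok":
            self.ok_seen = True
            return "play"
        if gesture in self.GATED:
            # Respond only if "ok" was recognized earlier.
            return self.GATED[gesture] if self.ok_seen else None
        return None
```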

Gesture control for other functions will not be described in detail.

FIG. 5 shows a block diagram of a gesture control device according to at least one embodiment of the present disclosure. As shown in FIG. 5, the device may include: a recognition processing module 500, a gesture determination module 502, and an operation control module 504.

The recognition processing module 500 may perform gesture recognition processing on N chronologically consecutive frames of images in the video stream collected by the camera, respectively, to obtain a gesture recognition result sequence. The gesture recognition result sequence includes recognition results of multiple gestures included in the N frames of images.

The gesture determination module 502 may determine that the same gesture recognition result is a target gesture recognition result in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, where N and M are both integers greater than 1, and N is greater than or equal to M.

The operation control module 504 may send a control instruction corresponding to the target gesture recognition result to the target device, or control the target device to perform an operation corresponding to the target gesture recognition result.

According to the gesture control device of this embodiment, the recognition processing module and the gesture determination module confirm that a gesture is valid only when the preset number of identical gesture recognition results is obtained, and the valid gesture is determined as the target gesture recognition result. This avoids false triggering of gestures to a certain extent and improves the accuracy of gesture recognition. For example, if the user accidentally makes a certain gesture, as long as the number of identical gesture recognition results corresponding to that gesture does not reach the preset number, the gesture is not recognized as a valid target gesture recognition result, so the target device does not respond to the gesture and false triggers are reduced.
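
The M-out-of-N rule applied by the gesture determination module can be sketched as follows. The function name `target_gesture` and the string labels are hypothetical, and the sketch assumes that frames in which no gesture was detected have already been filtered out by the caller. When N is at least 2M, two labels could both reach M; this sketch then returns the most frequent one, which is a choice the disclosure does not specify.

```python
from collections import Counter
from typing import Optional, Sequence


def target_gesture(results: Sequence[str], m: int) -> Optional[str]:
    """Return the label occurring at least m times in the N results, else None."""
    # The rule requires M > 1 and N >= M.
    if m <= 1 or len(results) < m:
        return None
    label, count = Counter(results).most_common(1)[0]
    return label if count >= m else None
```

A gesture made by accident for only one or two frames does not reach the count M, so the function returns `None` and no instruction is sent.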

In one embodiment, the gesture determination module 502 may determine that the same gesture recognition result is a target gesture recognition result in response to the number of consecutive identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M.
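
The consecutive variant differs from a plain count in that the identical results must be adjacent in the sequence. A minimal sketch, with the hypothetical name `target_gesture_consecutive`:

```python
from itertools import groupby
from typing import Optional, Sequence


def target_gesture_consecutive(results: Sequence[str], m: int) -> Optional[str]:
    """Return the first label that appears at least m times in a row, else None."""
    # groupby yields one group per run of equal adjacent labels.
    for label, run in groupby(results):
        if sum(1 for _ in run) >= m:
            return label
    return None
```

For the sequence `["ok", "ok", "fist", "ok"]` with M = 3, the label "ok" occurs three times but never three times in a row, so this variant returns `None`.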

In one embodiment, the gesture determination module 502 may also smooth a difference gesture recognition result in response to the difference gesture recognition result being present among the multiple same gesture recognition results included in the gesture recognition result sequence, and the proportion of the difference gesture recognition results in the gesture recognition result sequence being lower than a preset value; wherein the difference gesture recognition result is different from the same gesture recognition result. In other words, the difference gesture recognition result is smoothed when three conditions hold: the gesture recognition result sequence includes, for at least one frame of image, a difference gesture recognition result that is different from the same gesture recognition result; the gesture recognition result before the difference gesture recognition result and the gesture recognition result after it in the sequence are both the same gesture recognition result; and the proportion of the difference gesture recognition results in the gesture recognition result sequence is lower than the preset value. In this case, determining that the number of consecutive identical gesture recognition results included in the gesture recognition result sequence is greater than or equal to M includes: determining that the number of consecutive identical gesture recognition results included in the gesture recognition result sequence after the smoothing process is greater than or equal to M.

In one embodiment, when the gesture determination module 502 smooths the difference gesture recognition result, it may correct the difference gesture recognition result to the same gesture recognition result, or remove the difference gesture recognition result from the gesture recognition result sequence.

In one embodiment, when the gesture determination module 502 smooths the difference gesture recognition result, it may treat the gesture recognition result that precedes the difference gesture recognition result in time sequence and the gesture recognition result that follows it in time sequence as multiple consecutive gesture recognition results.
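
The smoothing options above can be sketched as one helper. This is an illustration under stated assumptions: the name `smooth`, the `max_ratio` parameter (standing in for the preset value) and the `remove` flag are hypothetical, and only single-frame difference results whose immediate neighbours agree are treated; the disclosure does not restrict the difference result to a single frame.

```python
from typing import List, Sequence


def smooth(results: Sequence[str], max_ratio: float, remove: bool = False) -> List[str]:
    """Correct (or remove) isolated results flanked by the same label on both sides."""
    n = len(results)
    # Indices whose two neighbours agree with each other but not with the result itself.
    outliers = {
        i for i in range(1, n - 1)
        if results[i - 1] == results[i + 1] != results[i]
    }
    # Leave the sequence unchanged unless the outliers' share is below the preset value.
    if not outliers or len(outliers) / n >= max_ratio:
        return list(results)
    if remove:
        # Removing the outlier makes its neighbours adjacent, i.e. consecutive.
        return [r for i, r in enumerate(results) if i not in outliers]
    # Otherwise correct each outlier to the label of its neighbours.
    return [results[i - 1] if i in outliers else r for i, r in enumerate(results)]
```

The smoothed sequence can then be passed to a consecutive-run check, so a single misrecognized frame in the middle of a held gesture no longer breaks the run.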

In an embodiment, when performing gesture recognition processing on the N chronologically consecutive frames of images in the video stream collected by the camera, the recognition processing module 500 may perform the following operations: acquiring a single frame of camera-captured image in the video stream, where the camera-captured image is an image corresponding to the field of view space captured by the camera, and the field of view space captured by the camera includes an effective space area for gesture control; selecting, from the camera-captured image, a partial image area corresponding to the effective space area for gesture control; and performing the gesture recognition processing on the partial image area.
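
Selecting the partial image area can be sketched as a crop. The sketch assumes, purely for illustration, that the effective space area maps to an axis-aligned pixel rectangle and that a frame is a list of pixel rows; the name `crop_effective_region` and the box convention are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # rows of pixel values; a stand-in for a real image type


def crop_effective_region(frame: Frame, box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return the sub-image for box = (left, top, right, bottom), right/bottom exclusive."""
    left, top, right, bottom = box
    height = len(frame)
    width = len(frame[0]) if height else 0
    # Clamp the box to the frame so a misconfigured region cannot index out of range.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [list(row[left:right]) for row in frame[top:bottom]]
```

Only the cropped area would then be passed to the gesture recognizer, so hands outside the effective space area are not considered.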

In an embodiment, as shown in FIG. 6, the device may further include a parameter receiving module 600.

The parameter receiving module 600 may receive the gesture recognition parameters configured by the user through the visual interface for parameter adjustment, so that the recognition processing module 500 executes the gesture recognition processing according to the gesture recognition parameters.
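
A sketch of the kind of parameter object such an interface might hand to the recognition processing. The class name and field names are hypothetical, and making N configurable alongside M is an assumption of this sketch; the validation simply restates the constraints on N and M given in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureRecognitionParams:
    n_frames: int    # N: number of consecutive frames in the result sequence
    m_required: int  # M: identical results needed to confirm a gesture

    def __post_init__(self) -> None:
        # N and M are both integers greater than 1, and N >= M.
        if self.n_frames <= 1 or self.m_required <= 1:
            raise ValueError("N and M must both be integers greater than 1")
        if self.n_frames < self.m_required:
            raise ValueError("N must be greater than or equal to M")
```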

In one embodiment, the target device includes a vehicle, and the operation control module 504 may send a control instruction corresponding to the target gesture recognition result to a functional component in the vehicle, or control the functional component in the vehicle to perform the operation corresponding to the target gesture recognition result.

In one embodiment, the functional component includes a media player and/or a window controller. When controlling the functional component in the vehicle to perform the operation corresponding to the target gesture recognition result, the operation control module 504 may control the media player to change the media playing state in response to the target gesture recognition result, or control the window controller to move the window glass in response to the target gesture recognition result.
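
Routing a target gesture recognition result to a functional component can be sketched as a handler registry. The class `FunctionalComponentDispatcher` and the gesture labels are hypothetical; real media player or window controller interfaces are not described in the disclosure, so handlers are plain callables here.

```python
from typing import Callable, Dict


class FunctionalComponentDispatcher:
    """Map target gesture labels to callables that operate a functional component."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], None]] = {}

    def register(self, gesture: str, handler: Callable[[], None]) -> None:
        self._handlers[gesture] = handler

    def dispatch(self, target_gesture: str) -> bool:
        """Run the handler for the gesture; return False if none is registered."""
        handler = self._handlers.get(target_gesture)
        if handler is None:
            return False
        handler()
        return True
```

For example, a handler registered for a fist gesture could pause the media player, while another registered for a different gesture could ask the window controller to move the window glass.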

In one embodiment, in response to the target gesture recognition result, the operation control module 504 may display, through images, at least one of the following items on the function status interface corresponding to the functional component to be controlled: the running start or running stop state of the functional component; a volume change; the like mark of a target object.

FIG. 7 shows a block diagram of an electronic device according to at least one embodiment of the present disclosure. The electronic device includes a memory 71 and a processor 72. The memory 71 stores a computer program, and when the computer program is executed by the processor 72, the gesture control method according to any embodiment of the present disclosure is implemented.

At least one embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the gesture control method according to any embodiment of the present disclosure is implemented.

Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The embodiments of the present disclosure also provide a computer-readable storage medium, and the storage medium may store a computer program. When the program is executed by a processor, the steps of the method for training a neural network for gesture recognition described in any embodiment of the present disclosure are implemented, and/or the steps of the gesture recognition method described in any embodiment of the present disclosure are implemented. Here, "and/or" means at least one of the two; for example, "A and/or B" includes three schemes: A, B, and "A and B".

The various embodiments in the present disclosure are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, as for the data processing device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.

The specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown in order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The embodiments of the subject matter and functional operations described in the present disclosure can be implemented in: digital electronic circuits, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in the present disclosure and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in the present disclosure may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing device or to control the operation of the data processing device. Alternatively or in addition, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver device for execution by the data processing device. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processing and logic flow described in the present disclosure can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output. The processing and logic flow can also be executed by a dedicated logic circuit, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.

Computers suitable for executing computer programs include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, the computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer will be operatively coupled to such mass storage devices to receive data from them, send data to them, or both. However, the computer does not have to have such devices. In addition, the computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by or incorporated into a dedicated logic circuit.

Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or the scope of protection, but are mainly used to describe the features of specific embodiments of a specific disclosure. Certain features described in multiple embodiments within the present disclosure can also be implemented in combination in a single embodiment. On the other hand, various features described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. In addition, although features may function in certain combinations as described above and even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination can be directed to a sub-combination or a variant of the sub-combination.

Similarly, although operations are depicted in a specific order in the drawings, this should not be construed as requiring these operations to be performed in the specific order shown or sequentially, or requiring all illustrated operations to be performed to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. In addition, the separation of various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can usually be integrated together in a single software product or packaged into multiple software products.

The above are only some embodiments of the present disclosure, and are not used to limit the present disclosure. Any modification, equivalent replacement, transformation, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims

A gesture control method, the method includes:

Performing gesture recognition processing on N chronologically consecutive frames of images in a video stream collected by a camera to obtain a gesture recognition result sequence, the gesture recognition result sequence including recognition results of multiple gestures included in the N frames of images;

In response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, determining that the same gesture recognition result is a target gesture recognition result, where N and M are both integers greater than 1, and N is greater than or equal to M;

Send a control instruction corresponding to the target gesture recognition result to the target device, or control the target device to perform an operation corresponding to the target gesture recognition result.
The method according to claim 1, wherein in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, determining that the same gesture recognition result is a target gesture recognition result comprises:

In response to the number of consecutive identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, it is determined that the same gesture recognition result is a target gesture recognition result.
The method according to claim 1 or 2, wherein before determining, in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, that the same gesture recognition result is a target gesture recognition result, the method further includes:

In response to a difference gesture recognition result being present among the multiple same gesture recognition results included in the gesture recognition result sequence, and the proportion of the difference gesture recognition results in the gesture recognition result sequence being lower than a preset value, smoothing the difference gesture recognition result; wherein the difference gesture recognition result is different from the same gesture recognition result.
The method according to claim 3, wherein the smoothing of the difference gesture recognition result comprises:

Correct the difference gesture recognition result to the same gesture recognition result; or,

The difference gesture recognition result is removed from the gesture recognition result sequence.
The method according to claim 3, wherein the smoothing of the difference gesture recognition result comprises:

The gesture recognition result whose time sequence is before the difference gesture recognition result and the gesture recognition result whose time sequence is after the difference gesture recognition result are used as consecutive multiple gesture recognition results.
The method according to any one of claims 1 to 5, wherein said performing gesture recognition processing on N frames of images consecutive in time series in the video stream collected by the camera respectively comprises:

Acquiring a single frame of a camera-captured image in the video stream, where the camera-captured image is an image corresponding to a camera-captured field of view space, and the camera-captured field of view space includes an effective space area for gesture control;

Selecting a partial image area corresponding to the effective space area controlled by the gesture from the image collected by the camera;

Perform the gesture recognition processing on the partial image area.
The method according to any one of claims 1 to 6, the method further comprising:

Receive the gesture recognition parameters configured by the user through the visual interface for parameter adjustment,

Wherein, the gesture recognition processing is executed according to the gesture recognition parameters.
The method according to claim 7, wherein:

The gesture recognition parameter includes M.
The method according to any one of claims 1 to 8, wherein the target device comprises a vehicle, and sending the control instruction corresponding to the target gesture recognition result to the target device, or controlling the target device to perform the operation corresponding to the target gesture recognition result, includes:

Send a control instruction corresponding to the target gesture recognition result to the functional component in the vehicle; or control the function component in the vehicle to perform an operation corresponding to the target gesture recognition result.
The method according to claim 9, wherein:

The functional component includes a media player;

The controlling the functional components in the vehicle to perform an operation corresponding to the target gesture recognition result includes: in response to the target gesture recognition result, controlling the media player to change the media playing state.
The method according to claim 9, wherein:

The functional component includes a car window controller;

The controlling the functional components in the vehicle to perform an operation corresponding to the target gesture recognition result includes: in response to the target gesture recognition result, controlling the window controller to move the window glass.
The method according to claim 9, wherein:

The controlling the functional components in the vehicle to perform an operation corresponding to the target gesture recognition result includes:

In response to the target gesture recognition result, displaying, through images, at least one of the following items on the function status interface corresponding to the functional component to be controlled: the running start or running stop state of the functional component, the volume change, or the like mark of the target object.
A gesture control device, the device includes:

The recognition processing module is used to perform gesture recognition processing on N chronologically consecutive frames of images in the video stream collected by the camera to obtain a gesture recognition result sequence, where the gesture recognition result sequence includes recognition results of multiple gestures included in the N frames of images;

The gesture determination module is configured to determine that the same gesture recognition result is a target gesture recognition result in response to the number of identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, where N and M are both integers greater than 1, and N is greater than or equal to M;

The operation control module is configured to send a control instruction corresponding to the target gesture recognition result to the target device, or to control the target device to perform an operation corresponding to the target gesture recognition result.
The device according to claim 13, wherein:

The gesture determination module is configured to: in response to the number of consecutive identical gesture recognition results included in the gesture recognition result sequence being greater than or equal to M, determine that the same gesture recognition result is a target gesture recognition result.
The device according to claim 13 or 14, wherein:

The gesture determination module is configured to smooth a difference gesture recognition result in response to the difference gesture recognition result being present among the multiple same gesture recognition results included in the gesture recognition result sequence, and the proportion of the difference gesture recognition results in the gesture recognition result sequence being lower than a preset value; wherein the difference gesture recognition result is different from the target gesture recognition result.
The device according to claim 15, wherein the gesture determination module is configured to smooth the difference gesture recognition result through the following operations:

Correct the difference gesture recognition result to the same gesture recognition result; or,

The difference gesture recognition result is removed from the gesture recognition result sequence.
The device according to claim 15, wherein the gesture determination module is configured to smooth the difference gesture recognition result through the following operations:

The gesture recognition result whose time sequence is before the difference gesture recognition result and the gesture recognition result whose time sequence is after the difference gesture recognition result are used as consecutive multiple gesture recognition results.
The device according to any one of claims 13-17, wherein the identification processing module is configured to:

When performing gesture recognition processing on the N chronologically consecutive frames of images in the video stream collected by the camera, acquire a single frame of camera-captured image in the video stream, where the camera-captured image is an image corresponding to the field of view space captured by the camera, and the field of view space captured by the camera includes an effective space area for gesture control;

Selecting a partial image area corresponding to the effective space area controlled by the gesture from the image collected by the camera;

Perform the gesture recognition processing on the partial image area.
The device according to any one of claims 13-18, wherein the device further comprises:

The parameter receiving module is configured to receive gesture recognition parameters configured by the user through the visual interface for parameter adjustment, so that the recognition processing module executes the gesture recognition processing according to the gesture recognition parameters.
The device according to any one of claims 13-19, wherein:

The target device includes a vehicle, and the operation control module is configured to: send a control instruction corresponding to the target gesture recognition result to a functional component in the vehicle; or control the functional component in the vehicle to perform the operation corresponding to the target gesture recognition result.
The device of claim 20, wherein:

The functional component includes a media player;

The operation control module is configured to control the media player to change the media playing state in response to the target gesture recognition result when controlling the functional components in the vehicle to perform the operation corresponding to the target gesture recognition result.
The device of claim 20, wherein:

The functional component includes a car window controller;

The operation control module is used to control the window controller to move the window glass in response to the target gesture recognition result when controlling the functional components in the vehicle to perform the operation corresponding to the target gesture recognition result.
The device of claim 20, wherein:

The operation control module is further configured to: when controlling the functional components in the vehicle to perform an operation corresponding to the target gesture recognition result, in response to the target gesture recognition result, display, through images, at least one of the following items on the function status interface corresponding to the functional component to be controlled: the running start or running stop state of the functional component, the volume change, or the like mark of the target object.
An electronic device, the device includes:

A processor; and

A memory having a computer program stored thereon, where the computer program can be executed by the processor to implement the gesture control method according to any one of claims 1-12.
A computer-readable storage medium having a computer program stored thereon, and the computer program can be executed by a processor to implement the gesture control method according to any one of claims 1-12.