
CN111986700B - Method, device, equipment and storage medium for triggering contactless operation


Info

Publication number
CN111986700B
CN111986700B
Authority
CN
China
Prior art keywords
target
image data
sound data
motion
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010886923.7A
Other languages
Chinese (zh)
Other versions
CN111986700A (en)
Inventor
陈文琼 (Chen Wenqiong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202010886923.7A
Publication of CN111986700A
Application granted
Publication of CN111986700B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Social Psychology (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for triggering a contactless operation, belonging to the field of computer technology. The method comprises the following steps: acquiring sound data collected by an audio acquisition device, and determining, among stored reference sound data, target reference sound data matching the sound data; in response to determining the target reference sound data matching the sound data, determining a target action and a target operation corresponding to the target reference sound data based on the stored correspondence between reference sound data, actions, and operations; detecting, based on an action detection model, whether the target action is present in the collected image data, and executing the target operation if it is. The application can improve operation convenience.

Description

Method, device, equipment and storage medium for triggering contactless operation
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for triggering a contactless operation.
Background
With the development of computer technology, people's forms of entertainment have changed greatly, and watching live streams has gradually become an important one.
In order to enhance interaction between the anchor and the audience, technicians provide a number of functions for the anchor, and the anchor controls the turning on and off of these functions through a mouse or keyboard. For example, when using the lottery function, the anchor may start and end the lottery through a mouse or keyboard.
In carrying out the application, the inventors have found that the prior art has at least the following problems:
The anchor controls the starting and stopping of these functions through a mouse or keyboard, but this way of triggering operations is not suitable for all scenarios. For example, running a lottery while the anchor is dancing is very inconvenient: the anchor must stop dancing and then start or stop the lottery through the mouse or keyboard, which results in poor operation convenience.
Disclosure of Invention
The embodiments of the application provide a method, a device, equipment, and a storage medium for triggering a contactless operation, which can solve the above problem of poor operation convenience. The technical scheme is as follows:
in one aspect, a method of contactless operation triggering is provided, the method comprising:
Acquiring sound data acquired by an audio acquisition device, and determining target reference sound data matched with the sound data in stored reference sound data;
in response to determining target reference sound data matched with the sound data, determining target actions and target operations corresponding to the target reference sound data based on the corresponding relation of the reference sound data, the actions and the operations;
And executing the target operation when the target action exists in the acquired image data based on the action detection model.
Optionally, the performing the target operation when the collected image data is detected to have the target action based on the action detection model includes:
Acquiring image data acquired in a preset time period, and determining the position information of key points of a human body in the image data based on a key point extraction model;
Determining a first action corresponding to the image data based on an action detection model and the position information of the human body key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the human body key point position information includes head key point position information, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
Or alternatively
The human body key point position information comprises hand key point position information, the motion detection model comprises a hand motion detection model, the first motion comprises a first hand motion, and the target motion comprises a target hand motion.
Optionally, the human body key point position information includes hand key point position information and head key point position information, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
The determining, based on the motion detection model and the human body key point position information in the image data, a first motion corresponding to the image data includes:
and determining a first hand motion corresponding to the image data based on the hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on the head motion detection model and the head key point position information.
Optionally, before determining the target action and the target operation corresponding to the target reference sound data based on the correspondence between the reference sound data, the action and the operation, the method further includes:
performing face detection on the acquired image data to obtain face image data in the image data;
sending a facial image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identifier sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start the target function or to close the target function.
In another aspect, there is provided a device for contactless operation triggering, the device comprising:
The acquisition module is used for acquiring sound data acquired by the audio acquisition equipment and determining target reference sound data matched with the sound data in the stored reference sound data;
the determining module is used for determining a target action and a target operation corresponding to the target reference sound data based on the corresponding relation among the reference sound data, the action and the operation in response to determining the target reference sound data matched with the sound data;
and the execution module is used for executing the target operation when the collected image data is detected to have the target action based on the action detection model.
Optionally, the execution module is configured to:
Acquiring image data acquired in a preset time period, and determining the position information of key points of a human body in the image data based on a key point extraction model;
Determining a first action corresponding to the image data based on an action detection model and the position information of the human body key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the human body key point position information includes head key point position information, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
Or alternatively
The human body key point position information comprises hand key point position information, the motion detection model comprises a hand motion detection model, the first motion comprises a first hand motion, and the target motion comprises a target hand motion.
Optionally, the human body key point position information includes hand key point position information and head key point position information, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
the determining module is used for:
and determining a first hand motion corresponding to the image data based on the hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on the head motion detection model and the head key point position information.
Optionally, the method further includes a matching module, where the matching module is configured to:
performing face detection on the acquired image data to obtain face image data in the image data;
sending a facial image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identifier sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start the target function or to close the target function.
In yet another aspect, a computer device is provided that includes a processor and a memory having instructions stored therein that, when executed by the processor, cause the computer device to implement the method of contactless operation triggering.
In yet another aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing instructions that, when executed by a computer device, cause the computer device to implement the method of contactless operation triggering.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
According to the application, whether the sound data matches the target reference sound data is verified first; after verification passes, the collected image data is checked for the target action, and if the target action is present, the target operation corresponding to the target reference sound data is executed. A user can thus trigger the corresponding operation simply by making the corresponding sound and performing the corresponding action, without operating a mouse or keyboard, which improves operation convenience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for contactless operation triggering according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for triggering a contactless operation according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for triggering a contactless operation according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a device for contactless operation triggering according to an embodiment of the present application;
fig. 6 is a schematic diagram of a terminal structure according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The embodiment of the application provides a method for triggering a contactless operation, by which a user can trigger a corresponding operation without operating hardware such as a keyboard or a mouse; the method may be implemented by a terminal. The terminal may be a mobile phone, a desktop computer, a tablet computer, a notebook computer, a smart wearable device, or the like, and may be provided with components such as a screen, a microphone, and a camera. The terminal may have image display, audio acquisition, and image acquisition functions, and may have applications installed, such as a live-streaming application or a short-video application. It should be noted that this scheme takes a live-streaming application as an example, which is not described in detail here.
As shown in fig. 1, when the anchor uses the live-streaming application, the anchor may click the start-live control. The terminal then acquires permissions for the audio acquisition device and the image acquisition device, synthesizes the sound data and image data they collect into video data, and sends the video data to a server. The server obtains the terminal identifiers of the viewers in the anchor's live room, determines the viewers' terminals from those identifiers, and sends the video data to those terminals.
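To make this flow concrete, the following is a minimal Python sketch of the capture-synthesize-distribute loop described above. All names (LiveSession, VideoChunk, capture, push) are illustrative assumptions; the patent does not specify an implementation.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class VideoChunk:
    sound: Any  # sound data from the audio acquisition device
    image: Any  # image data from the image acquisition device

class LiveSession:
    """Hypothetical sketch of the broadcast flow; names are illustrative."""

    def __init__(self, audio_device, camera, server):
        self.audio_device = audio_device  # microphone handle
        self.camera = camera              # image acquisition device
        self.server = server              # live-streaming server stub

    def broadcast_step(self):
        sound = self.audio_device.capture()   # collected sound data
        image = self.camera.capture()         # collected image data
        chunk = VideoChunk(sound, image)      # synthesized video data
        # The server resolves the viewers' terminal identifiers for this
        # live room and forwards the chunk to each viewer terminal.
        self.server.push(chunk)
```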
Fig. 2 is a flowchart of a method for triggering a contactless operation according to an embodiment of the present application. Referring to fig. 2, the process includes:
step 201, acquiring sound data acquired by an audio acquisition device, and determining target reference sound data matched with the sound data in stored reference sound data.
In implementation, while the anchor is live, the terminal may collect sound data through the audio acquisition device and detect it to obtain its frequency. The terminal may then obtain the frequencies corresponding to the internally stored reference sound data and compare the frequency of the collected sound data against them; if stored reference sound data whose frequency matches the frequency of the sound data exists, that reference sound data is determined to be the target reference sound data.
For example, when the anchor claps during a live broadcast, a clapping sound is produced. The terminal may detect the clapping sound collected by the microphone to obtain its frequency, obtain the internally stored reference sound data, and compare the clapping frequency with the frequencies of the reference sound data; if reference sound data whose frequency matches the clapping frequency is found, that reference sound data is determined to be the target reference sound data.
Here, the terminal may detect not only the frequency of the sound data but also its duration, pitch, beat count, and semantics.
Here, the detection of collected sound data and the determination of matching target reference sound data described above is a low-cost process; while it runs, the image-data detection function described below is off. Only after target reference sound data matching the sound data has been determined does the terminal turn on image-data detection, which reduces wasted computing resources.
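As an illustration of this two-stage gating, here is a minimal Python sketch that matches collected sound against stored reference sounds by dominant frequency, the criterion named above. The tolerance value and function names are assumptions; the patent also allows matching on duration, pitch, beat count, or semantics.

```python
import numpy as np

FREQ_TOLERANCE_HZ = 20.0  # assumed tolerance; the patent does not give one

def dominant_frequency(samples: np.ndarray, sample_rate: int) -> float:
    """Return the strongest frequency component of a mono signal."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def match_reference_sound(samples, sample_rate, reference_sounds):
    """reference_sounds maps a reference-sound id to its stored frequency.

    Returns the matching reference id, or None. Image-data detection
    stays off until this returns a match (the low-cost gating above).
    """
    freq = dominant_frequency(samples, sample_rate)
    for ref_id, ref_freq in reference_sounds.items():
        if abs(freq - ref_freq) <= FREQ_TOLERANCE_HZ:
            return ref_id
    return None
```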
Optionally, the anchor may configure a function switch before going live.
In implementation, the anchor may start the live-streaming application and trigger the account control, whereupon the terminal displays an account settings page in which the anchor's avatar and nickname may be shown in an account-information editing control. The account page may display a function-switch control; when the anchor clicks it, the terminal displays a function-switch page listing the existing function switches, each with its reference audio, head action, hand action, function name, and state, as shown in fig. 3. Below the function-switch list is an add control. When the user triggers it, the terminal displays a function-switch adding interface in which the user may in turn add or select the reference audio, head action, and hand action, select the function name and state, set the interval between audio detection and action detection according to the user's own habits, and trigger the confirm control, as shown in fig. 4. The anchor has then finished setting the function switch.
It should be noted here that setting the function switch generates the user-defined correspondence between reference sound data, actions, and operations, so that the terminal can perform the processing of step 202 below.
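A minimal sketch of that correspondence, assuming it is held as a simple in-terminal table keyed by reference sound; the field names are assumptions, and the example entries are taken from the clapping and finger-snap examples in this description, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FunctionSwitch:
    reference_sound_id: str  # which reference sound triggers the check
    head_action: str         # target head action identifier
    hand_action: str         # target hand action identifier
    function: str            # function name, e.g. "lottery"
    state: str               # "start" or "stop"
    interval_s: float        # gap between audio detection and action detection

# One user-defined entry per row of the function-switch list (figs. 3 and 4).
SWITCHES = [
    FunctionSwitch("clap", head_action="nod", hand_action="ok",
                   function="lottery", state="start", interval_s=3.0),
    FunctionSwitch("finger_snap", head_action="shake", hand_action="fist",
                   function="lottery", state="stop", interval_s=3.0),
]

def lookup(reference_sound_id: str) -> Optional[FunctionSwitch]:
    """Return the target actions and operation for a matched sound (step 202)."""
    for s in SWITCHES:
        if s.reference_sound_id == reference_sound_id:
            return s
    return None
```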
The head action may include facial actions, such as blinking and opening the mouth, which are not limited here.
Here, the hand action may include static hand actions and dynamic hand actions, for example an "OK" gesture or a finger-snapping motion, which are not limited here.
Step 202, in response to determining target reference sound data matched with the sound data, determining a target action and a target operation corresponding to the target reference sound data based on the corresponding relation among the reference sound data, the action and the operation.
The target action comprises a target hand action and/or a target head action; the correspondence between reference sound data, actions, and operations is stored in the terminal, and the target action is an identifier that identifies an action.
In implementation, after the terminal determines the target reference sound data matching the sound data collected by the audio acquisition device, the terminal may determine the target action and target operation corresponding to the target reference sound data based on the stored correspondence between reference sound data, actions, and operations.
Here, the target operation is to start the target function or to close the target function.
For example, if the terminal stores the correspondence between reference audio, head action, hand action, function, and state, then after determining the target reference sound data, the terminal may determine the target reference audio corresponding to it, and from that determine the corresponding target head action, target hand action, function (lottery), and state (start).
It should be noted that the first action and the target action are identifiers corresponding to simple basic actions, which reduces computational complexity; at the same time, to ensure detection accuracy and reduce the misjudgment rate, the scheme detects a combination of two simple actions.
Step 203: executing the target operation when the target action is detected in the collected image data based on the action detection model.
In implementation, after the terminal determines the target action and the target operation corresponding to the target reference sound data, the terminal may first acquire the image data acquired by the image acquisition device within the preset duration.
The preset time length can be set manually or can be a default value of the system.
For example, after the clap, the terminal may obtain the head action (nod), the gesture action ("OK" gesture), and the target operation (start the lottery) corresponding to the clapping sound. The user may then nod and make the "OK" gesture with the hand within the preset duration, and the terminal collects image data of the user doing so through the image acquisition device.
In this case, the hand motion and the head motion may be dynamic motion or static motion, and are not limited herein.
Secondly, after the image data is collected, the terminal can determine a first action corresponding to the image data through a key point extraction model and an action detection model, and the specific processing can be as follows:
first, human body key point position information in image data is determined based on a key point extraction model.
The key point extraction model comprises a hand key point extraction model and a head key point extraction model, and the human body key point position information comprises hand key point position information and head key point position information.
In implementation, the terminal may input the image data into the hand keypoint extraction model and the head keypoint extraction model, which compute and output the hand keypoint position information and head keypoint position information in the image data.
Next, a first motion corresponding to the image data is determined based on the motion detection model and the human body key point position information in the image data.
Wherein the motion detection model comprises a hand motion detection model and a head motion detection model, and the first motion comprises a first hand motion and a first head motion.
In an implementation, after the hand keypoint position information and the head keypoint position information are acquired, the hand keypoint position information may be input to the hand motion detection model, and the head keypoint position information may be input to the head motion detection model, and further, the hand motion detection model may determine the first hand motion corresponding to the image data, and the head motion detection model may determine the first head motion corresponding to the image data.
After the first action corresponding to the image data is determined, when the first action is the same as the target action, the target operation is executed.
In implementation, the first head action, the first hand action, the target head action, and the target hand action may be obtained through the above processing. The terminal may then confirm whether the first head action is identical to the target head action and whether the first hand action is identical to the target hand action; if both are identical, the target operation is performed.
For example, if the first head action is a nod and the target head action is also a nod, and the first hand action is "OK" and the target hand action is also "OK", then the first head action matches the target head action and the first hand action matches the target hand action, and the process of starting the lottery is executed.
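Continuing the FunctionSwitch sketch above, the detection-and-execution step might look like the following. The model interfaces are placeholder assumptions standing in for the keypoint extraction models and motion detection models; only the overall control flow mirrors the text.

```python
def detect_and_execute(frames, switch, models) -> bool:
    """Sketch of step 203: detect the target action, then run the operation.

    `models` is assumed to expose four callables mirroring the text:
    hand/head keypoint extraction and hand/head motion classification.
    """
    for image in frames:  # image data collected within the preset duration
        hand_kp = models.extract_hand_keypoints(image)
        head_kp = models.extract_head_keypoints(image)
        first_hand = models.classify_hand_motion(hand_kp)
        first_head = models.classify_head_motion(head_kp)
        # Both simple actions must match; combining two checks keeps the
        # per-model complexity low while reducing false triggers.
        if first_hand == switch.hand_action and first_head == switch.head_action:
            execute_operation(switch.function, switch.state)
            return True
    return False

def execute_operation(function: str, state: str):
    print(f"{state} {function}")  # placeholder for e.g. starting the lottery
```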
Optionally, in some special scenarios the security of contactless triggering needs to be considered. Based on this consideration, the developer sets a security check performed before the above step 202; the specific processing may be as follows:
first, face detection is performed on image data, and face image data in the image data is obtained.
In implementations, after the terminal acquires the image data, the terminal may input the image data into a face detection model, which may in turn output facial image data in the image data.
Secondly, a facial image data request carrying an account identifier of the current login account is sent to a server.
In an implementation, after the terminal obtains the facial image data in the image data, the terminal may obtain the account identifier of the current login account and generate the facial image data request based on the account identifier of the current login account and the facial image data in the image data. Then, the face image data request is transmitted to the server.
Next, the reference face image data corresponding to the account identifier sent by the server is received.
In an implementation, after the server receives the facial image data request, the server may obtain an account identifier in the facial image data request, and obtain reference facial image data corresponding to the account identifier according to a correspondence between the account identifier and the reference facial image data. Then, the acquired reference face image data is transmitted to the terminal.
Then, it is determined that the face image data matches the reference face image data.
In implementation, the terminal may receive the reference face image data sent by the server, and further the terminal may input the face image data and the reference face image data into a face matching model, and a matching result of the face image data and the reference face image data may be obtained through the face matching model calculation.
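A minimal sketch of this security check, under the assumption of duck-typed face models and a server stub; the patent does not define the actual request format or model APIs, so every call name here is hypothetical.

```python
def verify_anchor(image, account_id, face_models, server) -> bool:
    """Hypothetical sketch of the identity check performed before step 202."""
    face = face_models.detect_face(image)  # face image data, or None
    if face is None:
        return False
    # Request the stored reference face for the currently logged-in account.
    reference = server.request_reference_face(account_id)
    # The face matching model returns whether the two faces belong to the
    # same person, i.e. whether the person on camera is the anchor.
    return face_models.match(face, reference)
```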
Optionally, after the terminal performs the target operation, the following processing may be performed:
First, when the anchor wants to stop a target function currently being executed, the terminal may acquire sound data acquired by the audio acquisition device, and determine target reference sound data matching the sound data among the stored reference sound data.
For example, the anchor turns on the lottery function during the live broadcast and after some time needs to stop it. The anchor may snap a finger, producing a finger-snapping sound. The terminal may detect the snapping sound collected by the microphone to obtain its frequency, obtain the internally stored reference sound data, and compare the snapping frequency with the frequencies of the reference sound data; if reference sound data whose frequency matches the snapping frequency is found, that reference sound data is determined to be the target reference sound data.
And secondly, acquiring image data acquired by the image acquisition equipment.
For example, after snapping the finger, the anchor may shake the head within the preset duration and make a "fist" gesture with the hand; the terminal may then collect image data of the user shaking the head and making the "fist" gesture.
Next, a first motion corresponding to the image data is determined based on the motion detection model.
In an implementation, the terminal determines human body key point position information in the image data based on the key point extraction model, and then determines a first action corresponding to the image data based on the action detection model and the human body key point position information in the image data.
For example, the terminal may input the image data containing the head-shaking and fist-making actions into the keypoint extraction models to obtain the hand keypoint position information and head keypoint position information in the image data, and then input these into the hand motion detection model and the head motion detection model respectively to obtain the first hand action and the first head action.
Next, the target action and target operation corresponding to the target reference sound data are determined based on the correspondence between reference sound data, actions, and operations.
For example, the terminal stores the correspondence between reference audio, head action, hand action, function name, and on/off state. After determining the target reference sound data, the terminal may determine the target reference audio corresponding to it and then determine the corresponding target head action, target hand action, function (lottery), and state (off).
Then, if the first action is the same as the target action, the target operation is performed.
For example, if the first head action is a head shake and the target head action is also a head shake, and the first hand action is a "fist" and the target hand action is also a "fist", then the first head action matches the target head action and the first hand action matches the target hand action, and the process of stopping the lottery is executed.
According to the application, whether the sound data matches the target reference sound data is verified first; after verification passes, the collected image data is checked for the target action, and if the target action is present, the target operation corresponding to the target reference sound data is executed. A user can thus trigger the corresponding operation simply by making the corresponding sound and performing the corresponding action, without operating a mouse or keyboard, which improves operation convenience.
Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
Fig. 5 is a schematic diagram of an apparatus for triggering a contactless operation according to an embodiment of the present application. Referring to fig. 5, the apparatus may be the terminal described above, and the apparatus includes:
an obtaining module 510, configured to obtain sound data collected by an audio collecting device, and determine target reference sound data matched with the sound data from stored reference sound data;
a determining module 520, configured to determine, in response to determining target reference sound data that matches the sound data, a target action and a target operation corresponding to the target reference sound data based on a correspondence of the reference sound data, the action, and the operation;
an execution module 530, configured to execute the target operation when the target action is detected in the collected image data based on the action detection model.
Optionally, the executing module 530 is configured to:
Acquiring image data acquired in a preset time period, and determining the position information of key points of a human body in the image data based on a key point extraction model;
Determining a first action corresponding to the image data based on an action detection model and the position information of the human body key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the human body key point position information includes head key point position information, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
Or alternatively
The human body key point position information comprises hand key point position information, the motion detection model comprises a hand motion detection model, the first motion comprises a first hand motion, and the target motion comprises a target hand motion.
Optionally, the human body key point position information includes hand key point position information and head key point position information, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
the determining module 520 is configured to:
and determining a first hand motion corresponding to the image data based on the hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on the head motion detection model and the head key point position information.
Optionally, the method further includes a matching module, where the matching module is configured to:
performing face detection on the acquired image data to obtain face image data in the image data;
sending a facial image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identifier sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start the target function or to close the target function.
According to the application, whether the sound data matches the target reference sound data is verified first; after verification passes, the collected image data is checked for the target action, and if the target action is present, the target operation corresponding to the target reference sound data is executed. A user can thus trigger the corresponding operation simply by making the corresponding sound and performing the corresponding action, without operating a mouse or keyboard, which improves operation convenience.
It should be noted that the apparatus for triggering a contactless operation provided in the above embodiment is illustrated only with the division into the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment above and the method embodiments of triggering a contactless operation provided in the foregoing embodiments belong to the same concept; for the specific implementation process see the method embodiments, which is not repeated here.
Fig. 6 shows a block diagram of a terminal 600 according to an exemplary embodiment of the present application. The terminal may be the terminal in the above embodiments, and the terminal 600 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 600 may also be called a user device, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 600 includes: a processor 601 and a memory 602.
Processor 601 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 601 may also include a main processor and a coprocessor. The main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in the awake state; the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 601 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the method of contactless operation triggering provided by the method embodiments of the present application.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603, and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 603 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 604, a touch display 605, a camera 606, audio circuitry 607, a positioning component 608, and a power supply 609.
Peripheral interface 603 may be used to connect at least one Input/Output (I/O) related peripheral to processor 601 and memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 601, memory 602, and peripheral interface 603 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 604 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, the display 605 also has the ability to collect touch signals at or above its surface. The touch signal may be input as a control signal to the processor 601 for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, providing the front panel of the terminal 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved or folded surface of the terminal 600. The display 605 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera may be fused to realize a background blurring function, or the main camera and the wide-angle camera may be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing, or inputting the electric signals to the radio frequency circuit 604 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 607 may also include a headphone jack.
The location component 608 is used to locate the current geographic location of the terminal 600 to enable navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 609 is used to power the various components in the terminal 600. The power source 609 may be alternating current, direct current, disposable battery or rechargeable battery. When the power source 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 600 further includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyroscope sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 601 may control the touch display screen 605 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 611. The acceleration sensor 611 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 may collect a 3D motion of the user on the terminal 600 in cooperation with the acceleration sensor 611. The processor 601 may implement the following functions based on the data collected by the gyro sensor 612: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed at a side frame of the terminal 600 and/or at a lower layer of the touch screen 605. When the pressure sensor 613 is disposed at a side frame of the terminal 600, a grip signal of the terminal 600 by a user may be detected, and a left-right hand recognition or a shortcut operation may be performed by the processor 601 according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used to collect a fingerprint of a user, and the processor 601 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 614 may be provided on the front, back, or side of the terminal 600. When a physical key or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical key or vendor Logo.
The optical sensor 615 is used to collect ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the intensity of ambient light collected by optical sensor 615. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 605 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 based on the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also referred to as a distance sensor, is typically provided on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects a gradual decrease in the distance between the user and the front face of the terminal 600, the processor 601 controls the touch display 605 to switch from the bright screen state to the off screen state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 6 is not limiting of the terminal 600 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory comprising instructions executable by a processor in a terminal to perform the method of contactless operation triggering in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description is merely of preferred embodiments of the present application and is not intended to limit the application; any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall be included in the scope of protection of the application.

Claims (7)

1. A method of contactless operation triggering, for application to a live application, the method comprising:
responding to a triggering instruction of the function switch control, and displaying a current existing function switch list and an adding control;
Responding to a trigger instruction of the adding control, displaying a function switch adding interface, wherein the function switch adding interface is used for adding or selecting reference sound data, actions, operations and interval duration of audio detection and action detection by a host;
responding to a trigger instruction of a determined control, establishing a corresponding relation among the reference sound data, the action and the operation, and storing the interval duration;
Acquiring sound data acquired by an audio acquisition device, and determining target reference sound data matched with the frequency, the duration, the pitch or the beat number of the sound data in stored reference sound data;
Responding to the determined target reference sound data matched with the sound data, starting an image data detection function, and carrying out face detection on the acquired image data to obtain face image data in the image data; sending a facial image data request carrying an account identifier of a current login account to a server; receiving reference face image data corresponding to the account identifier sent by the server; if the face image data is determined to be matched with the reference face image data, determining that the person is the anchor, and determining a target action and a target operation corresponding to the target reference sound data based on the corresponding relation of the reference sound data, the action and the operation;
Acquiring image data collected within a preset time period upon determining the target reference sound data, and determining the position information of key points of a human body in the image data based on a key point extraction model;
Determining a first action corresponding to the image data acquired in a preset time period based on the action detection model and the position information of the human body key points in the image data, wherein the first action comprises a combination of a plurality of basic actions;
And when the first action is the same as the target action, performing the target operation including a lottery operation.
2. The method of claim 1, wherein the human keypoint position information comprises head keypoint position information, the action detection model comprises a head action detection model, the first action comprises a first head action, and the target action comprises a target head action;
or
the human keypoint position information comprises hand keypoint position information, the action detection model comprises a hand action detection model, the first action comprises a first hand action, and the target action comprises a target hand action.
3. The method of claim 1, wherein the human keypoint position information comprises hand keypoint position information and head keypoint position information, the action detection model comprises a hand action detection model and a head action detection model, the first action comprises a first hand action and a first head action, and the target action comprises a target hand action and a target head action; and
the determining, based on the action detection model and the human keypoint position information in the image data, a first action corresponding to the image data comprises:
determining a first hand action corresponding to the image data based on the hand action detection model and the hand keypoint position information, and determining a first head action corresponding to the image data based on the head action detection model and the head keypoint position information.
4. The method of claim 1, wherein the target operation is starting a target function or shutting down a target function.
5. A device for triggering a contactless operation, applied to a live-streaming application, the device being configured to:
in response to a trigger instruction for a function switch control, display a list of existing function switches and an add control;
in response to a trigger instruction for the add control, display a function-switch add interface, wherein the function-switch add interface is used by an anchor to add or select reference sound data, an action, an operation, and an interval duration between audio detection and action detection; and
in response to a trigger instruction for a confirm control, establish a correspondence among the reference sound data, the action, and the operation, and store the interval duration;
the device comprising:
an acquisition module, configured to acquire sound data collected by an audio collection device, and determine, among the stored reference sound data, target reference sound data matching the frequency, duration, pitch, or beat count of the sound data;
a matching module, configured to: in response to determining the target reference sound data matching the sound data, start an image data detection function and perform face detection on the collected image data to obtain face image data in the image data; send a face image data request carrying an account identifier of the currently logged-in account to a server; receive, from the server, reference face image data corresponding to the account identifier; and if the face image data is determined to match the reference face image data, determine that the person on camera is the anchor;
a determining module, configured to determine a target action and a target operation corresponding to the target reference sound data based on the correspondence among the reference sound data, the action, and the operation; and
an execution module, configured to: acquire image data collected within a preset time period from the moment the target reference sound data is determined; determine human keypoint position information in the image data based on a keypoint extraction model; determine, based on an action detection model and the human keypoint position information in the image data, a first action corresponding to the image data collected in the preset time period, wherein the first action comprises a combination of a plurality of basic actions; and when the first action is the same as the target action, perform the target operation, the target operation including a lottery operation.
6. A computer device comprising a processor and a memory, the memory storing at least one instruction, wherein the at least one instruction is loaded and executed by the processor to implement the operations performed by the method for triggering a contactless operation according to any one of claims 1 to 4.
7. A computer-readable storage medium having stored therein at least one instruction, wherein the at least one instruction is loaded and executed by a processor to implement the operations performed by the method for triggering a contactless operation according to any one of claims 1 to 4.
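First sketch: the reference-sound matching step of claim 1, in Python. All identifiers (SoundFeatures, ReferenceSound, match_sound) and the tolerance value are illustrative assumptions, not names from the patent; the claim lists frequency, duration, pitch, or beat count as candidate features and leaves the matching rule open, so this sketch conjoins all four under a relative tolerance, but an any-of rule would fit the claim wording equally well.

```python
# Illustrative sketch of claim 1's sound-matching step; every name and
# threshold here is an assumption, not taken from the patent.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoundFeatures:
    frequency_hz: float  # dominant frequency of the clip
    duration_s: float    # clip length in seconds
    pitch_hz: float      # fundamental pitch estimate
    beat_count: int      # number of beats detected in the clip


@dataclass
class ReferenceSound:
    sound_id: str        # key into the stored correspondences
    features: SoundFeatures


def close(a: float, b: float, rel_tol: float) -> bool:
    return abs(a - b) <= rel_tol * abs(b)


def match_sound(captured: SoundFeatures,
                stored: list[ReferenceSound],
                rel_tol: float = 0.1) -> Optional[ReferenceSound]:
    """Return the stored reference whose features all fall within a
    relative tolerance of the captured features, i.e. the target
    reference sound data; None if nothing matches."""
    for ref in stored:
        f = ref.features
        if (close(captured.frequency_hz, f.frequency_hz, rel_tol)
                and close(captured.duration_s, f.duration_s, rel_tol)
                and close(captured.pitch_hz, f.pitch_hz, rel_tol)
                and captured.beat_count == f.beat_count):
            return ref
    return None
```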
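Second sketch: the anchor-verification and correspondence-lookup steps. Claim 1 verifies that the person on camera is the anchor before looking up the target action and operation; the sketch below models the stored correspondence among reference sound data, action, and operation as a plain dictionary. Correspondence, lookup_target, and the sample entry are hypothetical, and the face comparison itself (a model inference plus a server round-trip for the account's reference face) is reduced to a boolean input.

```python
# Hypothetical shape for the correspondence established in claim 1's
# third step; the dictionary key is a reference-sound identifier.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correspondence:
    target_action: tuple[str, ...]  # ordered combination of basic actions
    target_operation: str           # e.g. start/stop a function, run a lottery


# Built when the anchor confirms the function-switch add interface.
CORRESPONDENCES: dict[str, Correspondence] = {
    "ref_clap_twice": Correspondence(("nod", "wave"), "start_lottery"),
}


def lookup_target(sound_id: str, is_anchor: bool) -> Optional[Correspondence]:
    """Proceed only when the captured face matched the reference face
    fetched for the logged-in account, i.e. the person is the anchor."""
    if not is_anchor:
        return None
    return CORRESPONDENCES.get(sound_id)
```

In this shape, adding a new function switch is just another dictionary entry, which mirrors the add-and-confirm flow the claims describe for the function-switch interface.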
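Third sketch: the action-detection and trigger steps. Frames collected within the preset time period are run through a keypoint extraction model, and an action detection model turns the keypoint tracks into a first action composed of basic actions, which is compared against the target action. The sketch substitutes a trivial rule for both models; classify_basic_action, detect_first_action, the segment length, and the keypoint layout are all assumptions.

```python
# Toy stand-in for the keypoint-based action detection of claim 1;
# a real system would use trained keypoint and action models.
from typing import Callable, Sequence

# keypoint name -> (x, y) image coordinates, one dict per frame
Keypoints = dict[str, tuple[float, float]]


def classify_basic_action(frames: Sequence[Keypoints]) -> str:
    """Map a short run of human-keypoint positions to one basic-action
    label (toy rule: head ending lower in the frame counts as a nod)."""
    head_y = [f["head"][1] for f in frames if "head" in f]
    return "nod" if head_y and head_y[-1] > head_y[0] else "wave"


def detect_first_action(frames: Sequence[Keypoints],
                        segment_len: int = 15) -> tuple[str, ...]:
    """Split the preset-period frames into segments and classify each,
    so the first action is a combination of several basic actions."""
    segments = [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]
    return tuple(classify_basic_action(seg) for seg in segments if seg)


def maybe_trigger(first_action: tuple[str, ...],
                  target_action: tuple[str, ...],
                  run_operation: Callable[[], None]) -> bool:
    """Perform the target operation (e.g. the lottery operation) only
    when the detected first action equals the target action."""
    if first_action == target_action:
        run_operation()
        return True
    return False
```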
CN202010886923.7A 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering contactless operation Active CN111986700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886923.7A CN111986700B (en) 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering contactless operation

Publications (2)

Publication Number Publication Date
CN111986700A CN111986700A (en) 2020-11-24
CN111986700B true CN111986700B (en) 2024-09-06

Family

ID=73440905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010886923.7A Active CN111986700B (en) 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering contactless operation

Country Status (1)

Country Link
CN (1) CN111986700B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697686B (en) * 2020-12-25 2023-11-21 北京达佳互联信息技术有限公司 Online interaction method and device, server and storage medium
CN114741561B (en) * 2022-02-28 2024-10-29 商汤国际私人有限公司 Action generation method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108076392A (en) * 2017-03-31 2018-05-25 北京市商汤科技开发有限公司 Living broadcast interactive method, apparatus and electronic equipment
CN110446115A (en) * 2019-07-22 2019-11-12 腾讯科技(深圳)有限公司 Living broadcast interactive method, apparatus, electronic equipment and storage medium
CN110881134A (en) * 2019-11-01 2020-03-13 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9743138B2 (en) * 2015-07-31 2017-08-22 Mutr Llc Method for sound recognition task trigger
CN106791921B (en) * 2016-12-09 2020-03-03 北京小米移动软件有限公司 Processing method and device for live video and storage medium
CN107124664A (en) * 2017-05-25 2017-09-01 百度在线网络技术(北京)有限公司 Exchange method and device applied to net cast
CN111353805A (en) * 2018-12-24 2020-06-30 阿里巴巴集团控股有限公司 Lottery drawing processing method and device in live broadcast and electronic equipment
CN111382624B (en) * 2018-12-28 2023-08-11 杭州海康威视数字技术股份有限公司 Action recognition method, device, equipment and readable storage medium
CN111274910B (en) * 2020-01-16 2024-01-30 腾讯科技(深圳)有限公司 Scene interaction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant