
CN113657163B - Behavior recognition method, electronic device and storage medium - Google Patents

Behavior recognition method, electronic device and storage medium

Info

Publication number
CN113657163B
Authority
CN
China
Prior art keywords
thermodynamic diagram
current frame
key point
behavior
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110801176.7A
Other languages
Chinese (zh)
Other versions
CN113657163A (en)
Inventor
张澍
魏乃科
潘华东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110801176.7A
Publication of CN113657163A
Application granted
Publication of CN113657163B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a behavior recognition method, an electronic device and a storage medium. The behavior recognition method comprises the following steps: determining a key point sequence of the target object based on a monitoring video containing the to-be-processed behavior of the target object, wherein the key point sequence comprises key point information of the target object in a plurality of video frames of the monitoring video; determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence; extracting features of the key point sequence to obtain a first behavior feature; extracting features of the gesture type sequence to obtain a second behavior feature; and identifying the behavior to be processed based on the first behavior feature and the second behavior feature. This solves the problem that the behavior of the target object cannot be accurately described, thereby reducing the error rate of identifying the behavior to be processed.

Description

Behavior recognition method, electronic device and storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to a behavior recognition method, an electronic device, and a storage medium.
Background
Behavior recognition has wide application prospects, such as intelligent video monitoring, video abstraction, intelligent interfaces, human-computer interaction, sports video analysis, video retrieval and the like.
In recent years, feature algorithms suitable for behavior recognition have been proposed, such as the Local Binary Pattern (LBP), the Histogram of Oriented Gradients (HOG), and the Scale-Invariant Feature Transform (SIFT). However, a single feature is often affected by factors such as the target object's appearance, the environment, and the camera setup, and therefore cannot accurately describe the behavior of the target object.
Disclosure of Invention
In this embodiment, a behavior recognition method, an electronic device, and a storage medium are provided to solve the problem that the behavior of a target object cannot be accurately described in the related art.
In a first aspect, in this embodiment, there is provided a behavior recognition method, including:
Determining a key point sequence of a target object based on a monitoring video containing a to-be-processed behavior of the target object, wherein the key point sequence comprises key point information of the target object in a plurality of video frames of the monitoring video;
Determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence;
Extracting features of the key point sequences to obtain first behavior features;
extracting features of the gesture type sequence to obtain a second behavior feature;
and identifying the behavior to be processed based on the first behavior feature and the second behavior feature.
In some of these embodiments, the method further comprises:
If the type of the behavior to be processed corresponds to the first gesture type in the result of identifying the behavior to be processed, and the type of the behavior to be processed is determined to be the set abnormal identification type, alarm processing is carried out; the first gesture type is a gesture type corresponding to the last frame of the target object in the monitoring video.
In some embodiments, the determining the key point sequence of the target object specifically includes:
Acquiring current frame data and associated frame data in the monitoring video, wherein the current frame data is any frame data in a plurality of video frames of the monitoring video, and the associated frame data is the previous frame data of the current frame data;
determining a first thermodynamic diagram according to the current frame data;
determining a second thermodynamic diagram from the associated frame data;
determining a third thermodynamic diagram from the first thermodynamic diagram and the second thermodynamic diagram; wherein the third thermodynamic diagram is a thermodynamic diagram of the target object at the current frame predicted based on the associated frame data;
determining a thermodynamic diagram of the target object in the current frame according to the first thermodynamic diagram and the third thermodynamic diagram;
And acquiring key point information corresponding to the target object in the current frame according to the thermodynamic diagram of the target object in the current frame.
In some of these embodiments, determining a third thermodynamic diagram from the first thermodynamic diagram and the second thermodynamic diagram comprises:
Determining a fourth thermodynamic diagram based on a difference between the first thermodynamic diagram and the second thermodynamic diagram;
Determining the third thermodynamic diagram based on the second thermodynamic diagram and the fourth thermodynamic diagram.
In some of these embodiments, determining a thermodynamic diagram of the target object in the current frame from the first thermodynamic diagram and the third thermodynamic diagram comprises:
And fusing the first thermodynamic diagram with the third thermodynamic diagram to obtain the thermodynamic diagram of the target object in the current frame.
In some of these embodiments, determining the third thermodynamic diagram from the second thermodynamic diagram and the fourth thermodynamic diagram comprises:
Based on a key point extraction module, identifying key points in the second thermodynamic diagram and the fourth thermodynamic diagram, and obtaining key point information corresponding to the second thermodynamic diagram and key point information corresponding to the fourth thermodynamic diagram;
And determining the third thermodynamic diagram according to the key point information corresponding to the second thermodynamic diagram and the key point information corresponding to the fourth thermodynamic diagram.
In some embodiments, the determining the key point sequence of the target object based on the monitoring video including the to-be-processed behavior of the target object specifically includes:
Acquiring key point information of a current frame, key point information of n frames before the current frame and key point information of m frames after the current frame based on the monitoring video, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video;
Determining the smoothed key point information of the current frame according to the key point information of the current frame, the key point information of the n frames before the current frame and the key point information of the m frames after the current frame;
and determining the key point sequence of the target object according to the key point information of the current frame after smoothing.
In some embodiments, feature extraction is performed on the gesture type sequence, and before obtaining the second behavior feature, the method includes:
Acquiring confidence coefficient corresponding to the gesture type of a current frame, confidence coefficient corresponding to the gesture type of a p frame before the current frame and confidence coefficient corresponding to the gesture type of a q frame after the current frame, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video;
determining the gesture type of the current frame after smoothing according to the confidence coefficient corresponding to the gesture type of the current frame, the confidence coefficient corresponding to the gesture type of the p frame before the current frame and the confidence coefficient corresponding to the gesture type of the q frame after the current frame;
and determining the gesture type sequence according to the gesture type of the current frame after smoothing.
In a second aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the behavior recognition method according to the first aspect.
In a third aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the behavior recognition method of the first aspect described above.
Compared with the related art, the behavior recognition method, electronic device, and storage medium provided in this embodiment extract features from the key point sequence to obtain a first behavior feature, extract features from the gesture type sequence to obtain a second behavior feature, and identify the behavior to be processed based on both. By adding a gesture type sequence that carries image information to the recognition, the problem that the behavior of the target object cannot be accurately described is solved, and the error rate of identifying the behavior to be processed is reduced.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of an application terminal of a behavior recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of a behavior recognition method according to an embodiment of the present application;
FIG. 3 is a flow chart of a method of keypoint sequence determination of a target object according to an embodiment of the application;
FIG. 4 is a flow chart of yet another behavior recognition method in accordance with an embodiment of the present application;
FIG. 5 is a flow chart of yet another behavior recognition method in accordance with an embodiment of the present application;
Fig. 6 is a schematic diagram of a behavior recognition module according to an embodiment of the present application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application are not intended to be limiting in number, but may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used herein, are intended to encompass non-exclusive inclusion; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this disclosure, merely distinguish similar objects and do not represent a particular ordering for objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or similar computing device. For example, running on a terminal, fig. 1 is a block diagram of a hardware structure of an application terminal of a behavior recognition method according to an embodiment of the present application. As shown in fig. 1, the terminal may include one or more (only one is shown in fig. 1) processors 102 and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a microprocessor MCU, a programmable logic device FPGA, or the like. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the behavior recognition method in the present embodiment, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a behavior recognition method is provided, fig. 2 is a flowchart of a behavior recognition method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
In step S201, a key point sequence of the target object is determined based on the monitoring video including the behavior to be processed of the target object, the key point sequence including key point information of the target object in a plurality of video frames of the monitoring video.
In this embodiment, the target object is a human body or an animal. Assuming the target object is a human body, the key point sequence includes key point information of the human body in a plurality of temporally consecutive video frames of the monitoring video.
Step S202, determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence.
It will be appreciated that the gesture type sequence includes the gesture type of the target object in a plurality of temporally consecutive video frames of the surveillance video. The target object's key points generally correspond to joints with a degree of freedom, such as the neck, shoulders, elbows, wrists, waist, knees, and ankles. The current gesture type of the target object is estimated by computing the relative positions of these key points in three-dimensional space, and the gesture type represents the current state of the target object, such as standing, raising, or lying down.
In this embodiment, the key point sequence may be input to a convolutional neural network or an SVM classifier to determine the gesture type sequence corresponding to the target object; however, acquisition of the gesture type sequence is not limited to these two methods, and no limitation is placed on how the gesture type sequence is acquired. A minimal sketch of the SVM route follows.
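As an illustrative sketch only (not a prescribed implementation), a per-frame posture classifier can be trained on normalized key point coordinates. The feature layout (first-point-centered, scale-normalized, flattened coordinates), the label set, and the placeholder data below are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def keypoints_to_feature(keypoints):
    """keypoints: (K, 2) array of (x, y) coordinates for one frame.
    Center on the first key point and scale to unit norm so the feature is
    roughly invariant to position and person size (an assumed normalization)."""
    pts = keypoints - keypoints[0]
    return (pts / (np.linalg.norm(pts) + 1e-6)).ravel()

# Placeholder training data: per-frame key points with assumed posture labels,
# e.g. 0 = standing, 1 = lying down, 2 = sitting.
train_keypoints = np.random.rand(200, 17, 2)
train_labels = np.random.randint(0, 3, size=200)

clf = SVC(kernel="rbf", probability=True)
clf.fit(np.stack([keypoints_to_feature(k) for k in train_keypoints]), train_labels)

# Gesture (posture) type sequence for one tracked target across 64 frames.
video_keypoints = np.random.rand(64, 17, 2)
posture_sequence = clf.predict(np.stack([keypoints_to_feature(k) for k in video_keypoints]))
```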
And step S203, extracting the characteristics of the key point sequence to obtain a first behavior characteristic.
In this embodiment, feature extraction may be performed on the key point sequence through a temporal convolutional network, a recurrent neural network, or a Transformer module, but it is not limited to these three methods; any network that can extract behavior features may be used, and no limitation is placed here on how the key point sequence features are extracted.
It should be noted that the time-series convolution network (Temporal convolutional network, TCN) is a network structure capable of processing time-series data.
A recurrent neural network (RNN) is a type of recursive neural network that takes sequence data as input and recurses along the direction of sequence evolution, with all nodes (recurrent units) connected in a chain.
Recurrent neural networks have memory, parameter sharing, and Turing completeness, which gives them certain advantages in learning the nonlinear characteristics of a sequence. They are applied in natural language processing (NLP) tasks such as speech recognition, language modeling, and machine translation, and are also used for various time-series predictions.
The Transformer discards traditional RNNs and CNNs. First, it uses an attention mechanism to reduce the distance between any two positions in the sequence to a constant; second, unlike the sequential structure of an RNN, it offers better parallelism, fits existing GPU frameworks, and effectively alleviates the long-term dependency problem in NLP.
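As a concrete but purely illustrative sketch of step S203 with a temporal convolutional network, the module below maps a key point sequence to a first behavior feature. The channel sizes, kernel sizes, dilation, and pooling are assumed values, not parameters fixed by this embodiment.

```python
import torch
import torch.nn as nn

class KeypointTCN(nn.Module):
    """Toy temporal convolutional extractor for a key point sequence.
    Input:  (batch, T, K*2) flattened key point coordinates per frame.
    Output: (batch, feat_dim) first behavior feature."""
    def __init__(self, in_dim: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )

    def forward(self, x):
        # (batch, T, D) -> (batch, D, T) for Conv1d, then squeeze the time axis.
        return self.net(x.transpose(1, 2)).squeeze(-1)

# Example: 17 key points with (x, y) coordinates over 64 frames.
first_feature = KeypointTCN(in_dim=34)(torch.randn(8, 64, 34))  # -> (8, 128)
```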
In existing behavior recognition methods, the behavior to be processed is recognized only through the first behavior feature obtained in this step, so it cannot be recognized accurately. For this reason, the method further comprises step S204 and step S205, which solve the problem that the behavior to be processed cannot be recognized accurately.
And step S204, extracting features of the gesture type sequence to obtain second behavior features.
Step S205, identifying the behavior to be processed based on the first behavior feature and the second behavior feature.
In this embodiment, identifying the behavior to be processed means identifying either its type or its completeness. For example, if the behavior to be processed is a sit-up and the goal is to identify its completeness, whether the target object fully completes the sit-up is determined according to the first behavior feature and the second behavior feature.
In addition, this embodiment does not limit how the behavior to be processed is identified from the first behavior feature and the second behavior feature; any manner of identifying the behavior through these two features is within the protection scope of the present application. For example, the first behavior feature and the second behavior feature may be fused in proportion, and the fused feature input into a classifier to identify the behavior to be processed, as sketched below.
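A minimal sketch of such proportional fusion followed by a softmax classifier is given below; the fusion weight alpha, the feature dimension, and the number of behavior classes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Weighted fusion of the first and second behavior features, then classification."""
    def __init__(self, feat_dim: int = 128, num_classes: int = 10, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha                      # assumed fusion proportion
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, first_feat, second_feat):
        fused = self.alpha * first_feat + (1.0 - self.alpha) * second_feat
        return torch.softmax(self.head(fused), dim=-1)

probs = FusionClassifier()(torch.randn(8, 128), torch.randn(8, 128))  # (8, 10) class probabilities
```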
Through the steps, when the target object behavior is identified, the gesture type sequence with the image information is added, so that the problem that the target object behavior cannot be accurately described is solved, and the error rate of identifying the behavior to be processed is reduced.
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein. For example, step S203 and step S204 may be interchanged.
In some embodiments, if the type of the behavior to be processed corresponds to the first gesture type in the result of identifying the behavior to be processed, and the type of the behavior to be processed is determined to be the set abnormal identification type, alarm processing is performed; the first gesture type is a gesture type corresponding to the last frame of the target object in the monitoring video.
In this embodiment, if the behavior recognition type does not correspond to the first gesture type (for example, the recognized behavior is a fall while the first gesture type is standing), the target object is not currently in a fallen state, and issuing an alarm would produce a false alarm. Therefore, in the present application, no alarm is issued when the behavior recognition type does not correspond to the first gesture type.
It can be understood that before judging whether the behavior recognition type is the set abnormal recognition type, the application judges whether the behavior recognition type corresponds to the first gesture type, thereby reducing false alarm and improving the accuracy and effectiveness of alarm.
In some embodiments, fig. 3 is a flowchart of a method for determining a key point sequence of a target object according to an embodiment of the present application, as shown in fig. 3, where the flowchart specifically includes the following steps:
Step S301, acquiring current frame data and associated frame data in a surveillance video, where the current frame data is any frame data in a plurality of video frames of the surveillance video, and the associated frame data is a frame data preceding the current frame data.
Step S302, a first thermodynamic diagram is determined according to the current frame data.
In this embodiment, the first thermodynamic diagram is a thermodynamic diagram of the target object in the current frame data.
Step S303, determining a second thermodynamic diagram according to the associated frame data.
In this embodiment, the second thermodynamic diagram is a thermodynamic diagram of the target object in the associated frame data.
Step S304, determining a third thermodynamic diagram according to the first thermodynamic diagram and the second thermodynamic diagram; wherein the third thermodynamic diagram is a thermodynamic diagram of the predicted target object at the current frame based on the associated frame data.
In step S305, a thermodynamic diagram of the target object in the current frame is determined according to the first thermodynamic diagram and the third thermodynamic diagram.
Step S306, obtaining key point information corresponding to the target object in the current frame according to the thermodynamic diagram of the target object in the current frame.
In this embodiment, how the key point information is obtained from the thermodynamic diagram is not limited; any way of obtaining the key point information from the thermodynamic diagram of step S305 is within the protection scope of the present application. For example, the coordinates of the local maxima of the thermodynamic diagram may be used as the coordinates of the target object's key points, i.e., the key point information may be read directly from the thermodynamic diagram, as sketched below.
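A minimal sketch of reading key point coordinates from a per-key-point thermodynamic diagram (heatmap) is shown below; it simply takes the global maximum of each channel, which is one common choice rather than the only admissible one.

```python
import numpy as np

def heatmap_to_keypoints(heatmaps):
    """heatmaps: (K, H, W) array, one channel per key point.
    Returns (K, 3): (x, y, confidence) taken at each channel's maximum response."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))
    return np.stack([xs, ys, flat.max(axis=1)], axis=1)

keypoints = heatmap_to_keypoints(np.random.rand(17, 64, 48))  # (17, 3)
```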
Through the steps, the thermodynamic diagram of the target object in the current frame is determined based on the associated frame data, namely the associated frame data is added when the thermodynamic diagram of the target object in the current frame is determined, and the influence of the associated frame data on the thermodynamic diagram of the target object in the current frame is considered, so that the thermodynamic diagram of the target object in the current frame can be more accurately determined, and further, the key point information corresponding to the target object in the current frame can be more accurately determined.
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein. For example, step S302 and step S303 may be interchanged.
In one embodiment, the first thermodynamic diagram and the second thermodynamic diagram may also be obtained by:
acquiring current frame data and associated frame data in video data, wherein the current frame data is any frame data in a plurality of video frames of a monitoring video, and the associated frame data is the previous frame data of the current frame data;
respectively acquiring a target object sub-graph corresponding to the target object from the current frame data and the associated frame data;
Determining a first thermodynamic diagram according to a target object subgraph corresponding to current frame data;
And determining a second thermodynamic diagram according to the target object subgraph corresponding to the associated frame data.
It will be appreciated that, because the first thermodynamic diagram and the second thermodynamic diagram are determined according to the target object subgraph corresponding to the target object, the first thermodynamic diagram and the second thermodynamic diagram obtained in the above manner can more accurately reflect the thermodynamic diagram of the target object in the video data, and according to the thermodynamic diagram, the key point sequence and the gesture type sequence corresponding to the target object can be more accurately determined, and according to the key point sequence and the gesture type sequence, the behavior to be processed can be more accurately identified.
In some of these embodiments, determining the third thermodynamic diagram from the first thermodynamic diagram and the second thermodynamic diagram comprises:
determining a fourth thermodynamic diagram based on the difference between the first thermodynamic diagram and the second thermodynamic diagram;
And determining a third thermodynamic diagram according to the second thermodynamic diagram and the fourth thermodynamic diagram.
By the method, the thermodynamic diagram of the current frame can be accurately predicted according to the related frame data and the difference between the current frame data and the related frame data, meanwhile, the related frame data is considered in the thermodynamic diagram, and further, the behavior to be processed can be accurately identified based on the thermodynamic diagram.
In some of these embodiments, determining the third thermodynamic diagram from the second thermodynamic diagram and the fourth thermodynamic diagram comprises:
Based on the key point extraction module, key points in the second thermodynamic diagram and the fourth thermodynamic diagram are identified, and key point information corresponding to the second thermodynamic diagram and key point information corresponding to the fourth thermodynamic diagram are obtained;
And determining a third thermodynamic diagram according to the key point information corresponding to the second thermodynamic diagram and the key point information corresponding to the fourth thermodynamic diagram.
In the present embodiment, the keypoint extraction module may be constructed based on a convolutional neural network or a deformable convolutional neural network, but is not limited to the above two ways, and no limitation is made here as to how to construct the keypoint extraction module.
It can be appreciated that inputting the second thermodynamic diagram and the fourth thermodynamic diagram into the convolutional neural network or the deformable convolutional neural network allows the thermodynamic diagram of the current frame to be predicted relatively accurately while taking the associated frame data into account, and the behavior to be processed can then be identified more accurately based on these thermodynamic diagrams.
In some of these embodiments, determining the thermodynamic diagram of the target object in the current frame from the first thermodynamic diagram and the third thermodynamic diagram comprises:
and fusing the first thermodynamic diagram and the third thermodynamic diagram to obtain the thermodynamic diagram of the target object in the current frame.
In this embodiment, the first thermodynamic diagram and the third thermodynamic diagram may be blended in proportion, and a specific proportion is not limited.
It will be appreciated that the third thermodynamic diagram is predicted from the associated frame data, and therefore, the thermodynamic diagrams obtained according to the first thermodynamic diagram and the third thermodynamic diagram can reflect the associated frame data, that is, the influence of the associated frame data on the current frame data is considered when the thermodynamic diagram of the current frame is predicted, so that the thermodynamic diagram of the target object in the current frame can be obtained more accurately, and further, the key point information of the target object in the current frame can be determined more accurately according to the thermodynamic diagram.
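Since the embodiment leaves the exact ratio open, the following short sketch shows one possible proportional fusion of the first (current-frame) and third (predicted) thermodynamic diagrams; the weight value is an assumption.

```python
import numpy as np

def fuse_heatmaps(current_heatmap, predicted_heatmap, weight=0.6):
    """Weighted blend of the current-frame heatmap with the heatmap predicted
    from the associated (previous) frame; `weight` is illustrative only."""
    return weight * current_heatmap + (1.0 - weight) * predicted_heatmap
```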
In some embodiments, determining the key point sequence of the target object based on the monitoring video including the behavior to be processed of the target object specifically includes:
Based on the monitoring video, acquiring key point information of a current frame, key point information of n frames before the current frame and key point information of m frames after the current frame, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video;
Determining the smoothed key point information of the current frame according to the key point information of the current frame, the key point information of the n frames before the current frame and the key point information of the m frames after the current frame;
and determining a key point sequence of the target object according to the key point information of the current frame after smoothing.
It can be appreciated that the application considers the change of the key point information of the current frame in the previous and subsequent frames when determining the key point information of the current frame, thereby reducing the influence of abnormal key points and determining the corresponding key point information of each video frame more accurately.
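A minimal sketch of this temporal smoothing is given below, using a plain average over the n preceding and m following frames; the window sizes and the uniform weighting are assumptions, since the embodiment does not fix them.

```python
import numpy as np

def smooth_keypoints(keypoint_seq, n=2, m=2):
    """keypoint_seq: (T, K, 2) key point coordinates per frame.
    Each interior frame is replaced by the mean over frames [t-n, t+m]; the first
    and last frames are left unchanged, mirroring the embodiment."""
    smoothed = keypoint_seq.copy()
    t_len = keypoint_seq.shape[0]
    for t in range(1, t_len - 1):
        lo, hi = max(0, t - n), min(t_len, t + m + 1)
        smoothed[t] = keypoint_seq[lo:hi].mean(axis=0)
    return smoothed
```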
In some embodiments, feature extraction is performed on the gesture type sequence, and before obtaining the second behavior feature, the method includes:
Acquiring confidence coefficient corresponding to the gesture type of the current frame, confidence coefficient corresponding to the gesture type of the p frames before the current frame and confidence coefficient corresponding to the gesture type of the q frames after the current frame, wherein the current frame is any video frame except the first video frame and the last video frame in a plurality of video frames of the monitoring video;
Determining the gesture type of the current frame after smoothing according to the confidence coefficient corresponding to the gesture type of the current frame, the confidence coefficient corresponding to the gesture type of the p frames before the current frame and the confidence coefficient corresponding to the gesture type of the q frames after the current frame;
and determining a gesture type sequence according to the gesture type of the current frame after smoothing.
In this embodiment, the confidence coefficient corresponding to the gesture type of the current frame, the confidence coefficient corresponding to the gesture type of the p frames before the current frame, and the confidence coefficient corresponding to the gesture type of the q frames after the current frame are compared, and the gesture type with the largest confidence coefficient is selected as the smoothed gesture type.
It can be understood that when determining the gesture type of the current frame, the application considers the gesture type change of the current frame in the front and rear frames, thereby reducing the influence of abnormal gesture types, and determining the gesture type corresponding to each video frame more accurately, namely determining the gesture type sequence more accurately.
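A minimal sketch of the confidence-based gesture type smoothing is shown below: per-frame confidences over the p preceding and q following frames are accumulated, and the gesture type with the largest accumulated confidence is kept. The window sizes are assumed values.

```python
import numpy as np

def smooth_pose_types(pose_confidences, p=2, q=2):
    """pose_confidences: (T, C) per-frame confidence for each of C gesture types.
    Returns (T,) smoothed gesture type indices; first and last frames are untouched."""
    smoothed = pose_confidences.argmax(axis=1)
    t_len = pose_confidences.shape[0]
    for t in range(1, t_len - 1):
        lo, hi = max(0, t - p), min(t_len, t + q + 1)
        smoothed[t] = pose_confidences[lo:hi].sum(axis=0).argmax()
    return smoothed
```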
In some embodiments, feature extraction is performed on the gesture type sequence by using a second feature extraction module to obtain a second behavior feature, where the second feature extraction module includes:
a graph convolutional network, a graph attention network, a graph autoencoder, a graph generative network, or a graph spatio-temporal network.
It can be understood that a graph convolutional network, graph attention network, graph autoencoder, graph generative network, or graph spatio-temporal network can simultaneously learn node feature information and structure information end to end, and can therefore better extract the feature information of the gesture type sequence; based on this feature information, the behavior to be processed can be identified more accurately.
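As one illustrative sketch of extracting the second behavior feature with a graph convolutional network, the module below applies a single GCN layer to the gesture type sequence, treating consecutive frames as neighbors in a chain graph. The one-hot encoding, chain adjacency, layer size, and mean pooling are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def chain_adjacency(t_len: int) -> torch.Tensor:
    """Chain graph over frames: each frame is linked to itself and its neighbors,
    then symmetrically normalized (D^-1/2 A D^-1/2)."""
    a = torch.eye(t_len)
    idx = torch.arange(t_len - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class PoseTypeGCN(nn.Module):
    """One GCN layer over one-hot gesture types, pooled into a second behavior feature."""
    def __init__(self, num_types: int, feat_dim: int = 128):
        super().__init__()
        self.lin = nn.Linear(num_types, feat_dim)

    def forward(self, pose_types: torch.Tensor) -> torch.Tensor:
        # pose_types: (T,) integer gesture types for one video.
        x = F.one_hot(pose_types, num_classes=self.lin.in_features).float()  # (T, C)
        h = torch.relu(self.lin(chain_adjacency(pose_types.shape[0]) @ x))   # A_hat X W
        return h.mean(dim=0)                                                 # -> (feat_dim,)

second_feature = PoseTypeGCN(num_types=5)(torch.randint(0, 5, (64,)))  # (128,)
```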
Fig. 4 is a flowchart of still another behavior recognition method according to an embodiment of the present application, as shown in fig. 4, the flowchart including the steps of:
Step S401, based on the monitoring video, acquiring key point information of a current frame, key point information of n frames before the current frame and key point information of m frames after the current frame, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video.
Step S402, determining the smoothed key point information of the current frame according to the key point information of the current frame, the key point information of the n frames before the current frame and the key point information of the m frames after the current frame.
Step S403, determining the key point sequence of the target object according to the key point information of the current frame after smoothing.
And step S404, extracting the characteristics of the key point sequence to obtain a first behavior characteristic.
Step S405, determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence.
Step S406, obtaining a confidence coefficient corresponding to the gesture type of the current frame, a confidence coefficient corresponding to the gesture type of the p frames before the current frame, and a confidence coefficient corresponding to the gesture type of the q frames after the current frame, wherein the current frame is any video frame except the first video frame and the last video frame among a plurality of video frames of the monitoring video.
Step S407, determining the gesture type of the current frame after smoothing according to the confidence coefficient corresponding to the gesture type of the current frame, the confidence coefficient corresponding to the gesture type of the p frames before the current frame and the confidence coefficient corresponding to the gesture type of the q frames after the current frame.
Step S408, determining a gesture type sequence according to the gesture type of the current frame after smoothing.
And S409, extracting features of the gesture type sequence to obtain second behavior features.
Step S410, based on the first behavior feature and the second behavior feature, the behavior to be processed is identified.
Through the steps, when the behavior to be processed is identified, the gesture type sequence with the image information is added, so that the problem that the behavior of the target object cannot be accurately described is solved, and the error rate of the behavior identification to be processed is reduced.
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein. For example, step S404 and step S405 may be interchanged.
Fig. 5 is a flowchart of yet another behavior recognition method according to an embodiment of the present application, as shown in fig. 5, the flowchart including the steps of:
Step S501, each frame of data in the video data is acquired, a plurality of target objects in each frame of data are detected using the target detection model, and a human body sub-graph corresponding to each target object is determined according to the region where each target object is located.
Step S502, carrying out ID binding on each target object by using a tracking algorithm, so that each target object has a fixed ID, and binding the ID of each target object with the human body subgraph to which the target object belongs to obtain the human body subgraph after binding.
In this embodiment, each human sub-graph has its corresponding target object ID number, and the human sub-graph IDs corresponding to the same target object between the previous and subsequent frames are the same.
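As a purely illustrative sketch of the ID binding in step S502, the functions below greedily reuse a previous-frame ID when a current-frame detection overlaps it strongly (IoU above an assumed threshold) and otherwise assign a new ID. A real deployment would typically use a dedicated multi-object tracker; this is only a minimal stand-in.

```python
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def bind_ids(prev_tracks, detections, next_id, iou_thresh=0.3):
    """prev_tracks: {track_id: box}; detections: current-frame boxes.
    Returns ({track_id: box} for the current frame, updated next_id)."""
    current, used = {}, set()
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, box in prev_tracks.items():
            overlap = iou(box, det)
            if tid not in used and overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        current[best_id] = det
    return current, next_id
```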
Step S503, inputting the current human body sub-graph and the related human body sub-graph of the same ID into the convolutional neural network to obtain the thermodynamic diagram of the current human body sub-graph and the thermodynamic diagram of the related human body sub-graph, wherein the related human body sub-graph is the human body sub-graph in the previous frame data of the current human body sub-graph.
In this embodiment, the current human body sub-graph is the human body sub-graph in the current frame data, the thermodynamic diagram of the current human body sub-graph and the thermodynamic diagram of the related human body sub-graph are cached, and the cached thermodynamic diagram of the current human body sub-graph and the thermodynamic diagram of the related human body sub-graph do not need to be predicted again.
Step S504, subtracting the thermodynamic diagram of the current human body subgraph from the thermodynamic diagram of the related human body subgraph to obtain a residual thermodynamic diagram.
In step S505, the residual thermodynamic diagram is input to a plurality of convolution modules, and the position offset and the mask of the convolution samples are obtained.
Step S506, inputting the thermodynamic diagram of the related human body subgraph, the position offsets of the convolution sampling, and the mask into the deformable convolution network to obtain a fifth human body thermodynamic diagram.
It should be noted that a deformable convolutional network (DCN) is built mainly from deformable convolution kernels, whose sampling locations are variable: the kernel's samples should cluster in the image regions of real interest. To achieve this, an additional convolution is applied to the input feature map in order to learn the position offsets of the convolution samples.
In this embodiment, the fifth human body heat map is a heat map for predicting the current human body sub-map based on the associated human body sub-map.
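A hedged sketch of steps S505 and S506 is given below, using torchvision's modulated deformable convolution: small convolutions over the residual thermodynamic diagram produce the sampling offsets and mask, which are then applied to the associated-frame heatmap to predict the fifth (current-frame) heatmap. Channel counts, kernel size, and weight initialization are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class PredictCurrentHeatmap(nn.Module):
    """Residual heatmap -> offsets + mask -> deformable conv on the associated heatmap."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        k = kernel_size
        self.offset_conv = nn.Conv2d(channels, 2 * k * k, k, padding=k // 2)  # (dy, dx) per sample
        self.mask_conv = nn.Conv2d(channels, k * k, k, padding=k // 2)        # modulation per sample
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        self.pad = k // 2

    def forward(self, residual_heatmap, associated_heatmap):
        offset = self.offset_conv(residual_heatmap)
        mask = torch.sigmoid(self.mask_conv(residual_heatmap))
        return deform_conv2d(associated_heatmap, offset, self.weight,
                             padding=self.pad, mask=mask)

# Example with 17 key point channels on a 64x48 heatmap.
fifth = PredictCurrentHeatmap(17)(torch.randn(1, 17, 64, 48), torch.randn(1, 17, 64, 48))
```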
Step S507, fusing the thermodynamic diagram of the current human body subgraph and the fifth human body thermodynamic diagram, determining the thermodynamic diagram of the target object in the current frame, and obtaining the human body key point image corresponding to the target object based on the thermodynamic diagram of the target object in the current frame.
Step S508, determining a human body key point sequence corresponding to the target object in the video data according to the human body key point image corresponding to each frame data.
Step S509, determining a human body posture type sequence corresponding to the target object according to the determined human body key point sequence.
And step S510, smoothing the human body key point sequence and the human body posture type sequence to obtain a smoothed human body key point sequence and a smoothed human body posture type sequence.
In this embodiment, the preceding and following multi-frame data are used to smooth the human body key point image and the human body posture type of the current frame, where the current frame is any frame except the first and last frames. When smoothing the human body key point image of the current frame, the key point image of the current frame and those of the frames before and after it are weighted and averaged; when smoothing the human body posture type of the current frame, the confidences of the different posture types over the frames before and after the current frame are accumulated, and the posture type with the maximum accumulated confidence is taken as the smoothed current posture type.
In step S511, the smoothed human body key point sequence and the smoothed human body gesture type sequence are input into the behavior recognition module to obtain a human body behavior recognition type corresponding to the target object.
In this embodiment, a schematic diagram of the behavior recognition module is shown in fig. 6, and as shown in fig. 6, the smoothed human body key point sequence and the smoothed human body gesture type sequence are input into the behavior recognition module, the behavior recognition module extracts features in the smoothed human body key point sequence and the smoothed human body gesture type sequence respectively, fuses the extracted features, and classifies the extracted features by using softmax to obtain a human body behavior recognition type corresponding to the target object.
Step S512, determining whether the human behavior recognition type corresponds to the first human gesture type.
In this embodiment, the first human body posture type is a human body posture type corresponding to the last frame of the target object in the video data, if the human body behavior recognition type corresponds to the first human body posture type, step S513 is entered, and if not, step S514 is entered.
Step S513, determines whether the human behavior recognition type is the set abnormal human behavior recognition type.
In this embodiment, if the human body behavior recognition type is the set abnormal human body behavior recognition type, the process proceeds to step S515; otherwise, it proceeds to step S514. For example, if the human body behavior recognition type obtained in step S511 is smoking and the first human body posture type is a cigarette located in front of the target object's lips, the behavior recognition type corresponds to the first posture type; if it is also found to match the set abnormal recognition type, an alarm is issued.
Step S514, no alarm is sent.
Step S515, an alarm is issued.
Through the steps, the existing human body behavior recognition method is improved, a human body gesture type sequence with image information is added, and compared with a human body behavior recognition method only using a key point sequence, more information is utilized to predict human body behaviors, so that the accuracy of human body behavior recognition is improved; the prediction effect of human body key points in the current frame is improved by using the historical frame information in the video data, the predicted human body key point images and human body gesture types are smoothed, the influence of abnormal values is weakened, and the accuracy of human body behavior recognition is further improved; in addition, the application does not use 3D convolution, is more friendly to hardware and is more suitable for being deployed to front-end products.
There is also provided in this embodiment an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
Determining a key point sequence of the target object based on a monitoring video containing the to-be-processed behavior of the target object, wherein the key point sequence comprises key point information of the target object in a plurality of video frames of the monitoring video;
Determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence;
extracting features of the key point sequences to obtain first behavior features;
extracting features of the gesture type sequence to obtain second behavior features;
and identifying the behavior to be processed based on the first behavior feature and the second behavior feature.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.
In addition, in combination with the behavior recognition method provided in the above embodiment, a storage medium may be provided in this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the behavior recognition methods of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure in accordance with the embodiments provided herein.
It is to be understood that the drawings are merely illustrative of some embodiments of the present application and that it is possible for those skilled in the art to adapt the present application to other similar situations without the need for inventive work. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a departure from the disclosure.
The term "embodiment" in this disclosure means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in the present application can be combined with other embodiments without conflict.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the claims. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (7)

1. A method of behavior recognition, comprising:
Determining a key point sequence of a target object based on a monitoring video containing a to-be-processed behavior of the target object, wherein the key point sequence comprises key point information of the target object in a plurality of video frames of the monitoring video;
the determining the key point sequence of the target object based on the monitoring video containing the to-be-processed behavior of the target object comprises the following steps:
Acquiring current frame data and associated frame data in the monitoring video, wherein the current frame data is any frame data in a plurality of video frames of the monitoring video, and the associated frame data is the previous frame data of the current frame data; determining a first thermodynamic diagram according to the current frame data; determining a second thermodynamic diagram from the associated frame data; determining a third thermodynamic diagram from the first thermodynamic diagram and the second thermodynamic diagram; wherein the third thermodynamic diagram is a thermodynamic diagram of the target object at the current frame predicted based on the associated frame data; determining a thermodynamic diagram of the target object in the current frame according to the first thermodynamic diagram and the third thermodynamic diagram; acquiring key point information corresponding to the target object in the current frame according to the thermodynamic diagram of the target object in the current frame;
Acquiring key point information of a current frame, key point information of n frames before the current frame and key point information of m frames after the current frame based on the monitoring video, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video; determining the smoothed key point information of the current frame according to the key point information of the current frame, the key point information of the n frames before the current frame and the key point information of the m frames after the current frame; determining a key point sequence of the target object according to the smoothed key point information of the current frame;
Determining a gesture type sequence corresponding to the behavior to be processed according to the key point sequence;
The determining the gesture type sequence corresponding to the behavior to be processed according to the key point sequence comprises the following steps:
Acquiring confidence coefficient corresponding to the gesture type of a current frame, confidence coefficient corresponding to the gesture type of a p frame before the current frame and confidence coefficient corresponding to the gesture type of a q frame after the current frame, wherein the current frame is any video frame except a first video frame and a last video frame in a plurality of video frames of the monitoring video; determining the gesture type of the current frame after smoothing according to the confidence coefficient corresponding to the gesture type of the current frame, the confidence coefficient corresponding to the gesture type of the p frame before the current frame and the confidence coefficient corresponding to the gesture type of the q frame after the current frame; determining the gesture type sequence according to the gesture type of the current frame after smoothing;
Extracting features of the key point sequences to obtain first behavior features;
extracting features of the gesture type sequence to obtain a second behavior feature;
and identifying the behavior to be processed based on the first behavior feature and the second behavior feature.
2. The behavior recognition method of claim 1, further comprising:
If the type of the behavior to be processed corresponds to the first gesture type in the result of identifying the behavior to be processed, and the type of the behavior to be processed is determined to be the set abnormal identification type, alarm processing is carried out; the first gesture type is a gesture type corresponding to the last frame of the target object in the monitoring video.
3. The behavior recognition method of claim 1, wherein determining a third thermodynamic diagram from the first thermodynamic diagram and the second thermodynamic diagram comprises:
Determining a fourth thermodynamic diagram based on a difference between the first thermodynamic diagram and the second thermodynamic diagram;
Determining the third thermodynamic diagram based on the second thermodynamic diagram and the fourth thermodynamic diagram.
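Read literally, claim 3 forms the fourth thermodynamic diagram as the difference between the observed current-frame diagram and the previous-frame diagram, and then predicts the third diagram from the previous diagram together with that motion cue. A minimal sketch of the first step, assuming the diagrams are (K, H, W) arrays with one channel per key point:

```python
import numpy as np

def fourth_heatmap(first, second):
    """Fourth thermodynamic diagram based on the difference between the current
    frame's diagram (first) and the previous frame's diagram (second).  Whether the
    difference is signed or rectified is not stated in the claim; it is kept signed
    here as an assumption."""
    return np.asarray(first) - np.asarray(second)
```

How the second and fourth diagrams are then combined into the third is elaborated in claim 5; a sketch of that step follows claim 5 below.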
4. The behavior recognition method of claim 1, wherein determining a thermodynamic diagram of the target object in a current frame from the first thermodynamic diagram and the third thermodynamic diagram comprises:
And fusing the first thermodynamic diagram with the third thermodynamic diagram to obtain the thermodynamic diagram of the target object in the current frame.
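Claim 4 fuses the observed current-frame diagram with the predicted one. An element-wise weighted sum is one plausible fusion operator (an element-wise maximum or a learned fusion layer would also fit the wording); the weight w below is an assumed hyper-parameter.

```python
def fuse_heatmaps(first, third, w=0.6):
    """Thermodynamic diagram of the target object in the current frame as a fusion
    of the observed diagram (first) and the predicted diagram (third), per claim 4.
    The weighted element-wise sum and the value of w are assumptions of this sketch."""
    return w * first + (1.0 - w) * third
```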
5. The behavior recognition method of claim 3, wherein determining the third thermodynamic diagram from the second thermodynamic diagram and the fourth thermodynamic diagram comprises:
Identifying, based on a key point extraction module, key points in the second thermodynamic diagram and the fourth thermodynamic diagram to obtain key point information corresponding to the second thermodynamic diagram and key point information corresponding to the fourth thermodynamic diagram;
And determining the third thermodynamic diagram according to the key point information corresponding to the second thermodynamic diagram and the key point information corresponding to the fourth thermodynamic diagram.
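For claim 5, a standard arg-max decoder can stand in for the key point extraction module, and the predicted (third) diagram can be re-rendered from the decoded key points. The blending factor, the Gaussian rendering, and the use of the difference diagram's key points as a motion cue are all assumptions of this sketch; the claim only requires that the third diagram be determined from the two sets of key point information.

```python
import numpy as np

def heatmap_to_keypoints(hm):
    """Arg-max decoding of a (K, H, W) thermodynamic diagram into (x, y, confidence)
    per key point; a common but assumed choice of key point extraction module."""
    K, H, W = hm.shape
    flat = hm.reshape(K, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (H, W))
    return np.stack([xs, ys, flat.max(axis=1)], axis=1)     # (K, 3)

def third_heatmap(second, fourth, alpha=0.5, sigma=2.0):
    """Predicted diagram of the target object at the current frame from the key points
    of the previous-frame diagram (second) and of the difference diagram (fourth):
    blend the two positions, then render Gaussian peaks at the blended locations.
    The blending rule and Gaussian width are illustrative assumptions."""
    K, H, W = second.shape
    kps_prev = heatmap_to_keypoints(second)[:, :2]
    kps_diff = heatmap_to_keypoints(fourth)[:, :2]
    predicted = (1.0 - alpha) * kps_prev + alpha * kps_diff  # assumed motion update
    yy, xx = np.mgrid[0:H, 0:W]
    third = np.zeros((K, H, W), dtype=float)
    for k, (x, y) in enumerate(predicted):
        third[k] = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma ** 2))
    return third
```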
6. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the behavior recognition method of any one of claims 1 to 5.
7. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the behavior recognition method of any one of claims 1 to 5.
CN202110801176.7A 2021-07-15 2021-07-15 Behavior recognition method, electronic device and storage medium Active CN113657163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801176.7A CN113657163B (en) 2021-07-15 2021-07-15 Behavior recognition method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113657163A (en) 2021-11-16
CN113657163B (en) 2024-06-28

Family

ID=78489488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801176.7A Active CN113657163B (en) 2021-07-15 2021-07-15 Behavior recognition method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113657163B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596530B (en) * 2022-03-23 2022-11-18 中国航空油料有限责任公司浙江分公司 Airplane refueling intelligent management method and device based on non-contact optical AI
CN115019386B (en) * 2022-04-15 2024-06-14 北京航空航天大学 Exercise assisting training method based on deep learning
CN114821717B (en) * 2022-04-20 2024-03-12 北京百度网讯科技有限公司 Target object fusion method and device, electronic equipment and storage medium
CN114818989B (en) * 2022-06-21 2022-11-08 中山大学深圳研究院 Gait-based behavior recognition method and device, terminal equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197589A (en) * 2018-01-19 2018-06-22 北京智能管家科技有限公司 Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture
CN111104816A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target object posture recognition method and device and camera

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388876B (en) * 2018-03-13 2022-04-22 腾讯科技(深圳)有限公司 Image identification method and device and related equipment
CN112668359A (en) * 2019-10-15 2021-04-16 富士通株式会社 Motion recognition method, motion recognition device and electronic equipment
CN111753643B (en) * 2020-05-09 2024-05-14 北京迈格威科技有限公司 Character gesture recognition method, character gesture recognition device, computer device and storage medium
CN111914661A (en) * 2020-07-06 2020-11-10 广东技术师范大学 Abnormal behavior recognition method, target abnormal recognition method, device, and medium
CN111767888A (en) * 2020-07-08 2020-10-13 北京澎思科技有限公司 Object state detection method, computer device, storage medium, and electronic device
CN112163479A (en) * 2020-09-16 2021-01-01 广州华多网络科技有限公司 Motion detection method, motion detection device, computer equipment and computer-readable storage medium
CN112257567B (en) * 2020-10-20 2023-04-07 浙江大华技术股份有限公司 Training of behavior recognition network, behavior recognition method and related equipment
CN112633196A (en) * 2020-12-28 2021-04-09 浙江大华技术股份有限公司 Human body posture detection method and device and computer equipment

Also Published As

Publication number Publication date
CN113657163A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN113657163B (en) Behavior recognition method, electronic device and storage medium
CN112949508B (en) Model training method, pedestrian detection method, electronic device, and readable storage medium
CN110443210B (en) Pedestrian tracking method and device and terminal
CN112597941A (en) Face recognition method and device and electronic equipment
CN109063584B (en) Facial feature point positioning method, device, equipment and medium based on cascade regression
WO2022121130A1 (en) Power target detection method and apparatus, computer device, and storage medium
CN108875482B (en) Object detection method and device and neural network training method and device
WO2021129107A1 (en) Depth face image generation method and device, electronic apparatus, and medium
CN112380955B (en) Action recognition method and device
CN115862136A (en) Lightweight filler behavior identification method and device based on skeleton joint
CN111626105A (en) Attitude estimation method and device and electronic equipment
CN111429476A (en) Method and device for determining action track of target person
CN111563245A (en) User identity identification method, device, equipment and medium
CN111159476A (en) Target object searching method and device, computer equipment and storage medium
CN112562159B (en) Access control method and device, computer equipment and storage medium
CN113792700A (en) Storage battery car boxing detection method and device, computer equipment and storage medium
CN113627334A (en) Object behavior identification method and device
CN111382638A (en) Image detection method, device, equipment and storage medium
CN113963202A (en) Skeleton point action recognition method and device, electronic equipment and storage medium
CN110633630B (en) Behavior identification method and device and terminal equipment
CN112668357A (en) Monitoring method and device
CN112508135B (en) Model training method, pedestrian attribute prediction method, device and equipment
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
CN111353349B (en) Human body key point detection method and device, electronic equipment and storage medium
CN108932115B (en) Visual image updating method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant