CN114724058A - Method for extracting key frames of fusion characteristic motion video based on human body posture recognition - Google Patents
- Publication number
- CN114724058A (application CN202210245767.5A)
- Authority
- CN
- China
- Prior art keywords
- frame
- video
- extracting
- motion
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for extracting key frames from a fused-feature motion video based on human body posture recognition, which comprises the following steps: S1, segmenting a target video clip frame by frame; S2, extracting static features with a residual network and performing dimensionality reduction on the data to obtain the static features of the video frames; S3, abstracting the skeleton data of the human body in three-dimensional space and extracting the motion features of the video frames to obtain motion features S_d; S4, performing linear weighted fusion of the extracted static features and motion features; S5, extracting global features from the fused features through a self-attention mechanism, calculating the importance of each video frame, selecting key frames for the corresponding actions through a Bernoulli function, and optimizing the result set with reinforcement learning.
Description
Technical Field
The invention relates to the field of video processing, in particular to a method for extracting a key frame of a motion video with fused features.
Background
A video is an image sequence: it contains far more content than a single image, is highly expressive, and carries a large amount of information. Video analysis is generally performed after the video has been decomposed into video frames, but video frames usually contain a great deal of redundancy, so extracting the key frames first and analyzing only those can effectively reduce computation time.
With the development of networks, multimedia information retrieval increasingly influences all areas of society. Traditional video retrieval can apply image-retrieval methods frame by frame, but this requires processing a large amount of image information and places a heavy burden on transmission and computation. In addition, now that home camera equipment is widespread, the monitored area often needs to be recorded, and storing the raw video occupies a large amount of space; storing the video as key frames preserves the authenticity of the video information while saving space to a great extent.
In a motion video, the state of the moving object changes frequently. Because of the diversity of moving targets and the similarity between motions, relying on motion features alone easily leads to missed detections and large deviations in feature extraction. This application therefore studies key frame extraction for motion video by means of feature fusion.
Disclosure of Invention
In order to solve the above problems, the present application provides a method for extracting key frames from a feature-fused motion video, which fuses static features and motion features and thereby improves the accuracy and completeness of key frame extraction to a certain extent.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a method for extracting a fusion characteristic motion video key frame based on human body posture recognition comprises the following steps:
s1, performing frame-by-frame segmentation on a target video segment, and segmenting a video into a series of video frames;
S2, extracting static features by using a residual network, and performing dimensionality reduction on the data to obtain the static features of the video frames S_s = [S_s1, S_s2, …, S_sT];
S3, abstracting the skeleton data of the human body in three-dimensional space, and extracting the motion features of the video frames to obtain motion features S_d = [S_d1, S_d2, …, S_dT];
S4, performing linear weighted fusion of the extracted static features S_s and motion features S_d, S = m·S_s + n·S_d, where m and n are the weight factors of the static features and the motion features respectively;
and S5, extracting global features from the fused features through a self-attention mechanism, then calculating the importance of the video frame, extracting key frames of corresponding actions through a Bernoulli function, and optimizing a result set by using reinforcement learning.
Further preferably, the specific method of step S3 is as follows:
s31, extracting human skeleton for each frame in the video, and analyzing human posture by using a light-weight HRNet;
s32, using the coordinates and confidence degrees of the bone key points identified in each frame of the video as input, constructing a topological graph according to physical relations among bones, and then carrying out batch normalization processing on the topological graph;
S33, performing feature extraction on the processed data through a plurality of S-GCN units, assigning different weight coefficients to different trunks, and obtaining the feature representation of the video S_d = {S_d1, S_d2, …, S_dT}.
Further preferably, the specific method of step S31 is as follows:
s311, each sub-network of each branch at each stage comprises two residual blocks and a multi-resolution fusion module;
s312, replacing all residual blocks in the original network with a Shuffle module of the Shuffle Net, wherein the Shuffle module divides a channel into two parts, one part directly passes through the channel without any convolution operation, and the other part needs to be subjected to deep separable convolution;
S313, replacing the 1×1 convolution in the depthwise separable convolution with channel weighting, down-sampling by average pooling to the same size as the minimum resolution, performing channel-wise additive feature fusion on the processed feature maps of the i branches of different resolutions, obtaining a weight matrix W_t with an SE module, up-sampling W_t for each branch, restoring the original size, and weighting the channels.
Further preferably, the specific method of step S5 is as follows:
s51, modeling position information between video frames through bidirectional masks;
s52, after global context information of a video sequence is obtained, calculating a feature matching degree based on global correlation features, and then predicting an importance score of a video frame by adopting a full connection layer;
S53, after the frame score of each video frame is obtained, selecting key frames for the corresponding action through a Bernoulli distribution a_t ~ B(Y), where a_t represents the probability of taking the current frame as a key frame;
and S54, judging the quality of the extracted key frame result set by using reinforcement learning, where the state-action value is represented as the sum of the importance and the diversity of the result set; the importance of the result set is evaluated by how well the key frame set covers the complete video information, and the diversity is evaluated by the size of the feature-space differences between the selected frames.
Further preferably, the specific method of step S51 is as follows:
s511, the forward mask indicates that the attention weight is related to the calculation result before the current position, and the reverse mask indicates that the attention weight is related to the calculation result after the current position;
S512, inputting a T-frame video X = {x_i | i = 1, …, T}, each frame containing N key points, and calculating the correlation coefficients through the self-attention mechanism,
where t, i ∈ [0, T), U and V are the weight matrices of the two frames respectively, M is the position coding matrix, the forward mask retains the upper triangular information, the reverse mask retains the lower triangular information, λ is the eigenvalue of the fused feature matrix, s_t is the fused feature of the current frame, and s_i is the fused feature of a preceding or following frame;
S513, combining the correlation coefficients with the relative position information of the frames (representing the positional relationship with the preceding and following frames), fusing the forward and backward directions, and mapping back to the original video frame sequence to obtain a sequence c = {c_t | t = 1, …, T} containing context information.
Advantageous effects
(1) The invention provides a motion video key frame extraction technique that fuses features through human body posture recognition, spatial graph convolution and feature fusion, meeting the requirements on the accuracy and completeness of key frame extraction.
(2) By the proposed way of extracting video frame features, the static features obtained through human body posture recognition and the motion features obtained through spatial graph convolution are fused as the final video frame features for importance analysis, which effectively avoids missed and false detections.
(3) By replacing the residual modules and adding an attention mechanism, the invention makes HRNet lightweight, greatly reducing the amount of computation without losing accuracy.
Drawings
Fig. 1 is a schematic diagram of a stage of a method for extracting a feature-fused key frame of a motion video according to an embodiment of the present invention;
fig. 2 is a specific schematic diagram of a human body gesture recognition module of the method for extracting a feature-fused motion video key frame according to the embodiment of the present invention;
fig. 3 is a schematic diagram of a key frame extraction result of a method for extracting a feature-fused motion video key frame according to an embodiment of the present invention;
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a method for extracting key frames from a feature-fused motion video. As shown in Figure 1, it fuses the static features extracted by a lightweight human body posture recognition algorithm and the motion features extracted by spatial graph convolution, improving the accuracy and completeness of key frame detection. The specific embodiment comprises the following steps:
(1) The target video segment is segmented frame by frame into a series of video frames.
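As an illustration of step (1), the following minimal sketch decodes a video file into individual frames; OpenCV is assumed here, since the patent does not name a particular decoding library.

```python
# Sketch of step (1): split the target video segment into individual frames.
import cv2

def split_into_frames(video_path):
    """Decode a video file and return its frames as a list of BGR arrays."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:              # end of the stream
            break
        frames.append(frame)
    capture.release()
    return frames
```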
(2) In order to better retain the original information of the input image and reduce loss, the residual network ResNet50 is used for static feature extraction; the data dimensionality is reduced to 256, and the resulting static features of the video frames are represented as S_s = [S_s1, S_s2, …, S_sT].
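A minimal sketch of the static branch in step (2), assuming a PyTorch/torchvision ResNet-50 backbone and a learned linear projection for the reduction to 256 dimensions (the patent states the target dimensionality but not the reduction layer itself):

```python
# Per-frame static features S_s from ResNet-50, reduced to 256 dimensions.
import torch
import torch.nn as nn
from torchvision import models

class StaticFeatureExtractor(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head and keep the 2048-d pooled features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.reduce = nn.Linear(2048, out_dim)    # assumed dimensionality reduction

    def forward(self, frames):                    # frames: (T, 3, H, W)
        feats = self.backbone(frames).flatten(1)  # (T, 2048)
        return self.reduce(feats)                 # (T, 256) = S_s
```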
(3) The skeleton data of the human body is abstracted in three-dimensional space, human posture analysis is performed with the lightweight HRNet algorithm, and the motion features of the video frames are then extracted with an ST-GCN network, yielding motion features S_d = [S_d1, S_d2, …, S_dT].
The specific method of the step (3) is as follows:
(3.1) Human skeleton extraction is performed for each frame in the video. To improve accuracy without adding excessive computational burden, this application uses the lightweight HRNet for human posture analysis.
As shown in fig. 2, the specific method of step (3.1) is as follows:
(3.1.1) HRNet is considerably more accurate than other bottom-up algorithms, but its obvious drawbacks are a large number of parameters and a slow running speed; the method therefore makes lightweight improvements targeting these drawbacks to speed up the analysis.
(3.1.2) to make the model as light as possible, the depth and width of the original HRNet network are first reduced, reducing the sub-net of each branch at each stage into two residual blocks and one multi-resolution fusion module.
(3.1.3) All residual blocks in the original network are replaced by the Shuffle module of ShuffleNet. This module splits the channels into two parts: one part passes through directly without any convolution operation, and the other part undergoes a depthwise separable convolution.
(3.1.4) The 1×1 convolution in the depthwise separable convolution is replaced by channel weighting, which also exchanges information between channels but at a time complexity far lower than that of the 1×1 convolution. The feature maps are down-sampled by average pooling to the same size as the minimum resolution, the processed feature maps of the i branches of different resolutions are fused by channel-wise addition, and an SE module (consisting of a Squeeze part and an Excitation part) then produces a weight matrix W_t; W_t is up-sampled for each branch, restored to the original size, and used to weight the channels.
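The following sketch illustrates steps (3.1.3)–(3.1.4): a channel-split shuffle unit in which the pointwise (1×1) convolution of the depthwise separable block is replaced by an SE-style per-channel weighting. Layer sizes, the reduction ratio and the placement of batch normalization are assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    """SE-style per-channel reweighting used in place of the 1x1 convolution."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze -> (N, C)
        return x * w[:, :, None, None]         # excite: weight each channel

class LiteShuffleBlock(nn.Module):
    """Channel-split block: one half passes through, the other half is convolved."""
    def __init__(self, channels):              # channels assumed even
        super().__init__()
        half = channels // 2
        self.depthwise = nn.Conv2d(half, half, 3, padding=1, groups=half)
        self.bn = nn.BatchNorm2d(half)
        self.reweight = ChannelWeighting(half)

    def forward(self, x):
        identity, branch = x.chunk(2, dim=1)   # split the channels into two parts
        branch = self.reweight(self.bn(self.depthwise(branch)))
        out = torch.cat([identity, branch], dim=1)
        # Channel shuffle so information mixes between the two halves.
        n, c, h, w = out.shape
        return out.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```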
And (3.2) taking the coordinates and confidence degrees of the bone key points identified in each frame of the video as input, constructing a topological graph according to physical relations among bones, and then carrying out batch normalization processing on the topological graph to unify scattered data.
(3.3) Feature extraction is performed on the processed data through 9 S-GCN units, different weight coefficients are assigned to different trunks, and the feature representation of the video S_d = {S_d1, S_d2, …, S_dT} is obtained.
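As an illustration of step (3.3), a single spatial graph-convolution (S-GCN) unit can be sketched as below; the normalised adjacency matrix and the learnable per-edge (trunk) weights are implementation assumptions consistent with the description.

```python
import torch
import torch.nn as nn

class SGCNUnit(nn.Module):
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        # adjacency: (N, N) normalised skeleton graph over N key points.
        self.register_buffer("A", adjacency)
        # Learnable edge importance: different trunks receive different weights.
        self.edge_weight = nn.Parameter(torch.ones_like(adjacency))
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                      # x: (batch, C, T, N)
        x = self.proj(x)                       # per-joint feature transform
        return torch.einsum("bctn,nm->bctm", x, self.A * self.edge_weight)
```

Nine such units are stacked in the embodiment described above.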
(4) The extracted static features and motion features are fused by linear weighting, S = m·S_s + n·S_d, where m and n are the weight factors of the static features and the motion features respectively.
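A worked example of the fusion in step (4); the weight factors m and n and the assumption that both feature streams share the same dimensionality are illustrative choices, since the patent leaves their values open.

```python
import torch

T, D = 120, 256                  # example: 120 frames, 256-d features per stream
S_s = torch.randn(T, D)          # static features from the residual network
S_d = torch.randn(T, D)          # motion features from the S-GCN branch
m, n = 0.6, 0.4                  # example weight factors
S = m * S_s + n * S_d            # fused per-frame features
```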
(5) extracting global features from the fused features through a self-attention mechanism, then calculating the importance of video frames, extracting key frames of corresponding actions through a Bernoulli function, and optimizing a result set by using reinforcement learning.
The specific method of the step (5) is as follows:
(5.1) modeling the position information between the video frames through the bidirectional mask can ensure that the importance of the current video frame is influenced not only by the previous video frame but also by the subsequent video frame.
The specific method of step (5.1) is as follows:
(5.1.1) the forward mask indicates that the weights of attention are related to the calculation results before the current position, and the backward mask indicates that the weights of the current position are related to the calculation results after.
(5.1.2) A T-frame video X = {x_i | i = 1, …, T} is input, each frame containing N key points, and the correlation coefficients are calculated through the self-attention mechanism,
where t, i ∈ [0, T), U and V are the weight matrices of the two frames respectively, M is the position coding matrix, the forward mask retains the upper triangular information, and the reverse mask retains the lower triangular information.
(5.1.3) The correlation coefficients are combined with the relative position information of the frames (representing the positional relationship with the preceding and following frames); the forward and backward directions are fused and mapped back to the original video frame sequence, yielding a sequence c = {c_t | t = 1, …, T} containing context information.
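A minimal sketch of steps (5.1.1)–(5.1.3), assuming scaled dot-product attention: the fused frame features are projected with U and V, the position code M is added, a forward (upper-triangular) and a reverse (lower-triangular) mask are applied, and the two directions are fused and mapped back to the frame order to give the context sequence c. The additive fusion of the two directions and the scaling factor are assumptions.

```python
import torch
import torch.nn.functional as F

def bidirectional_context(S, U, V, M):
    """S: (T, D) fused frame features; U, V: (D, D) projections; M: (T, T) position codes."""
    T, D = S.shape
    logits = (S @ U) @ (S @ V).t() / D ** 0.5 + M     # frame-to-frame correlation
    upper = torch.triu(torch.ones(T, T)).bool()       # forward mask keeps the upper triangle
    lower = torch.tril(torch.ones(T, T)).bool()       # reverse mask keeps the lower triangle
    neg = torch.finfo(logits.dtype).min
    attn_fwd = F.softmax(logits.masked_fill(~upper, neg), dim=-1)
    attn_bwd = F.softmax(logits.masked_fill(~lower, neg), dim=-1)
    # Fuse both directions and map back to the original frame order: c = {c_t}.
    return (attn_fwd + attn_bwd) @ S
```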
And (5.2) after obtaining the global context information of the video sequence, calculating a feature matching degree based on the global correlation features, and then predicting the importance score of the video frame by adopting the full connection layer.
(5.3) After the frame score of each video frame is obtained, key frames are selected for the corresponding action through a Bernoulli distribution a_t ~ B(Y), where a_t represents the probability of taking the current frame as a key frame.
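A sketch of steps (5.2)–(5.3): a fully connected layer predicts a per-frame importance score from the context feature c_t, and key frames are drawn from a Bernoulli distribution over those scores. The 256-dimensional context size and the sigmoid output are assumptions.

```python
import torch
import torch.nn as nn

score_head = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())

def select_key_frames(context):                   # context: (T, 256) sequence c
    scores = score_head(context).squeeze(-1)      # per-frame importance in (0, 1)
    actions = torch.bernoulli(scores)             # a_t = 1 means "keep as key frame"
    return scores, actions.bool()
```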
(5.4) The quality of the extracted key frame result set is judged with reinforcement learning: the state-action value is represented as the sum of the importance and the diversity of the result set, the importance is evaluated by how well the key frame set covers the complete video information, and the diversity is evaluated by the size of the feature-space differences between the selected frames.
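A sketch of the reward used in step (5.4): the state-action value is the sum of an importance (coverage) term and a diversity term. The concrete formulas below (mean distance to the nearest selected frame, mean pairwise cosine dissimilarity) are assumptions consistent with the description, not formulas taken from the patent.

```python
import torch
import torch.nn.functional as F

def reward(features, selected):
    """features: (T, D) fused frame features; selected: (T,) bool mask of key frames."""
    picked = features[selected]
    if picked.shape[0] == 0:
        return torch.tensor(0.0)
    # Importance: how well the key-frame set covers the complete video.
    dist = torch.cdist(features, picked)                     # (T, K) pairwise distances
    r_importance = torch.exp(-dist.min(dim=1).values.mean())
    # Diversity: feature-space differences between the selected frames.
    normed = F.normalize(picked, dim=1)
    r_diversity = 1.0 - (normed @ normed.t()).mean()
    return r_importance + r_diversity
```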
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles disclosed herein.
Claims (5)
1. A method for extracting a fusion characteristic motion video key frame based on human body posture recognition is characterized by comprising the following steps:
s1, performing frame-by-frame segmentation on a target video segment, and segmenting a video into a series of video frames;
S2, extracting static features by using a residual network, and performing dimensionality reduction on the data to obtain the static features of the video frames S_s = [S_s1, S_s2, ..., S_sT];
S3, abstracting the skeleton data of the human body in three-dimensional space, and extracting the motion features of the video frames to obtain motion features S_d = [S_d1, S_d2, ..., S_dT];
S4, performing linear weighted fusion of the extracted static features S_s and motion features S_d, S = m·S_s + n·S_d, where m and n are the weight factors of the static features and the motion features respectively;
and S5, extracting global features from the fused features through a self-attention mechanism, then calculating the importance of the video frame, extracting key frames of corresponding actions through a Bernoulli function, and optimizing a result set by using reinforcement learning.
2. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 1, wherein the specific method of step S3 is as follows:
s31, extracting human skeleton for each frame in the video, and analyzing human posture by using a light-weight HRNet;
s32, using the coordinates and confidence degrees of the bone key points identified in each frame of the video as input, constructing a topological graph according to physical relations among bones, and then carrying out batch normalization processing on the topological graph;
S33, performing feature extraction on the processed data through a plurality of S-GCN units, assigning different weight coefficients to different trunks, and obtaining the feature representation of the video S_d = {S_d1, S_d2, ..., S_dT}.
3. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 2, wherein the specific method of the step S31 is as follows:
s311, each sub-network of each branch at each stage comprises two residual blocks and a multi-resolution fusion module;
s312, replacing all residual blocks in the original network with a Shuffle module of the Shuffle Net, wherein the Shuffle module divides a channel into two parts, one part directly passes through the channel without any convolution operation, and the other part needs to be subjected to deep separable convolution;
S313, replacing the convolution in the depthwise separable convolution with channel weighting, down-sampling by average pooling to the same size as the minimum resolution, performing channel-wise additive feature fusion on the processed feature maps of the i branches of different resolutions, obtaining a weight matrix W_t with an SE module, up-sampling W_t for each branch, restoring the original size, and weighting the channels.
4. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 1, wherein the specific method of step S5 is as follows:
s51, modeling position information between video frames through bidirectional masks;
s52, after global context information of a video sequence is obtained, calculating a feature matching degree based on global correlation characteristics, and then predicting an importance score of a video frame by adopting a full connection layer;
S53, after the frame score of each video frame is obtained, selecting key frames for the corresponding action through a Bernoulli distribution a_t ~ B(Y), where a_t represents the probability of taking the current frame as a key frame;
and S54, judging the quality of the extracted key frame result set by using reinforcement learning, representing the sum of the importance and diversity of the result set by using a state-action value, evaluating the importance of the result set by using the covering capability of the key frame set on complete video information, and evaluating the diversity of the result set by using the difference of feature spaces among selected frames.
5. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 4, wherein the specific method of the step S51 is as follows:
s511, the forward mask indicates that the attention weight is related to the calculation result before the current position, and the reverse mask indicates that the attention weight is related to the calculation result after the current position;
S512, inputting a T-frame video X = {x_i | i = 1, ..., T}, each frame containing N key points, and calculating the correlation coefficients through the self-attention mechanism,
where t, i ∈ [0, T), U and V are the weight matrices of the two frames respectively, M is the position coding matrix, the forward mask retains the upper triangular information, the reverse mask retains the lower triangular information, λ is the eigenvalue of the fused feature matrix, s_t is the fused feature of the current frame, and s_i is the fused feature of a preceding or following frame;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210245767.5A CN114724058B (en) | 2022-03-14 | Human body gesture recognition-based fusion feature motion video key frame extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114724058A true CN114724058A (en) | 2022-07-08 |
CN114724058B CN114724058B (en) | 2024-11-15 |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228111A (en) * | 2016-07-08 | 2016-12-14 | 天津大学 | A kind of method based on skeleton sequential extraction procedures key frame |
US20200302180A1 (en) * | 2018-03-13 | 2020-09-24 | Tencent Technology (Shenzhen) Company Limited | Image recognition method and apparatus, terminal, and storage medium |
KR20200108548A (en) * | 2019-03-11 | 2020-09-21 | 광운대학교 산학협력단 | A system of compressing the sequence of 3D point clouds and the method thereof |
CN112686153A (en) * | 2020-12-30 | 2021-04-20 | 西安邮电大学 | Three-dimensional skeleton key frame selection method for human behavior recognition |
CN113283400A (en) * | 2021-07-19 | 2021-08-20 | 成都考拉悠然科技有限公司 | Skeleton action identification method based on selective hypergraph convolutional network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117726977A (en) * | 2024-02-07 | 2024-03-19 | 南京百伦斯智能科技有限公司 | Experimental operation key node scoring method and system based on DCNN |
CN117726977B (en) * | 2024-02-07 | 2024-04-12 | 南京百伦斯智能科技有限公司 | Experimental operation key node scoring method and system based on DCNN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
CN109948475B (en) | Human body action recognition method based on skeleton features and deep learning | |
CN112163498B (en) | Method for establishing pedestrian re-identification model with foreground guiding and texture focusing functions and application of method | |
CN112950477A (en) | High-resolution saliency target detection method based on dual-path processing | |
CN113033454B (en) | Method for detecting building change in urban video shooting | |
CN116030498A (en) | Virtual garment running and showing oriented three-dimensional human body posture estimation method | |
CN114821374A (en) | Knowledge and data collaborative driving unmanned aerial vehicle aerial photography target detection method | |
CN114743273B (en) | Human skeleton behavior recognition method and system based on multi-scale residual error map convolution network | |
CN113807232B (en) | Fake face detection method, system and storage medium based on double-flow network | |
CN115482523A (en) | Small object target detection method and system of lightweight multi-scale attention mechanism | |
CN111882495B (en) | Image highlight processing method based on user-defined fuzzy logic and GAN | |
CN116934796B (en) | Visual target tracking method based on twinning residual error attention aggregation network | |
CN112528077A (en) | Video face retrieval method and system based on video embedding | |
CN114724058A (en) | Method for extracting key frames of fusion characteristic motion video based on human body posture recognition | |
CN117876905A (en) | Quick high-accuracy unmanned aerial vehicle aerial photographing target detection method | |
CN116993775A (en) | Pedestrian multi-target tracking method combined with instance segmentation | |
CN114758285B (en) | Video interaction action detection method based on anchor freedom and long-term attention perception | |
CN114724058B (en) | Human body gesture recognition-based fusion feature motion video key frame extraction method | |
CN113627245B (en) | CRTS target detection method | |
CN117058235A (en) | Visual positioning method crossing various indoor scenes | |
CN115965905A (en) | Crowd counting method and system based on multi-scale fusion convolutional network | |
CN115830707A (en) | Multi-view human behavior identification method based on hypergraph learning | |
CN110503061B (en) | Multi-feature-fused multi-factor video occlusion area detection method and system | |
Ranjan et al. | Video Frame Prediction by Joint Optimization of Direct Frame Synthesis and Optical-Flow Estimation | |
CN114882417B (en) | Light LIGHTDIMP single-target tracking method based on dimp tracker |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant |