
CN114724058A - Method for extracting key frames of fusion characteristic motion video based on human body posture recognition - Google Patents


Info

Publication number
CN114724058A
Application CN202210245767.5A; granted as CN114724058B
Authority
CN
China
Prior art keywords
frame
video
extracting
motion
fusion
Prior art date
Legal status
Granted
Application number
CN202210245767.5A
Other languages
Chinese (zh)
Other versions
CN114724058B (en)
Inventor
郑艳伟
江文
李博韬
于东晓
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University
Priority to CN202210245767.5A
Publication of CN114724058A
Application granted
Publication of CN114724058B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting key frames of a fusion-feature motion video based on human body posture recognition, which comprises the following steps: S1, segmenting the target video clip frame by frame; S2, extracting static features with a residual network and reducing the data dimensionality to obtain the static features of the video frames; S3, abstracting the skeleton data of the human body in three-dimensional space and extracting the motion features of the video frames to obtain the motion features S_d; S4, linearly weighting the extracted static features and motion features according to their weights; S5, extracting global features from the fused features through a self-attention mechanism, computing the importance of each video frame, selecting key frames through corresponding actions drawn from a Bernoulli function, and optimizing the result set with reinforcement learning.

Description

Method for extracting key frames of fusion characteristic motion video based on human body posture recognition
Technical Field
The invention relates to the field of video processing, in particular to a method for extracting a key frame of a motion video with fused features.
Background
A video is an image sequence: it carries far more content than a single image, with strong expressiveness and a large amount of information. Video analysis is generally performed after the video has been decomposed into frames, but those frames are usually highly redundant; extracting key frames before analysis therefore effectively reduces the processing time.
With the development of networks, multimedia information retrieval has an ever greater influence on many areas of society. Traditional video retrieval applies image retrieval methods frame by frame, which requires processing a large amount of image data and places a heavy burden on transmission and computation. In addition, now that home cameras are widespread, monitored areas frequently need to be recorded, but storing raw video occupies a large amount of space; storing the video as key frames preserves the authenticity of the video information while saving space to a great extent.
In a motion video the state of the moving subject changes frequently. Because of the diversity of motion targets and the similarity between motions, motion features alone are prone to missed detections, and feature extraction may therefore deviate considerably. The present method accordingly studies key frame extraction for motion videos through feature fusion.
Disclosure of Invention
In order to solve the above problems, the present application provides a method for extracting feature-fused motion video key frames, which fuses static features obtained through human body posture recognition with motion features, thereby improving the accuracy and completeness of key frame extraction to a certain extent.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a method for extracting a fusion characteristic motion video key frame based on human body posture recognition comprises the following steps:
S1, segmenting the target video clip frame by frame into a series of video frames;
S2, extracting static features with a residual network and reducing the data dimensionality to obtain the static features of the video frames, S_s = [S_s1, S_s2, …, S_sT];
S3, abstracting the skeleton data of the human body in three-dimensional space and extracting the motion features of the video frames to obtain the motion features S_d = [S_d1, S_d2, …, S_dT];
S4, linearly weighting the extracted static features S_s and motion features S_d according to their weights, S = m·S_s + n·S_d, where m and n are the weight factors of the static and motion features respectively;
S5, extracting global features from the fused features through a self-attention mechanism, computing the importance of each video frame, selecting key frames through corresponding actions drawn from a Bernoulli function, and optimizing the result set with reinforcement learning.
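As an illustration of the fusion and selection performed in steps S4 and S5, a minimal, runnable Python sketch is given below; the equal weights m = n = 0.5, the random stand-in features, and the single linear scorer standing in for the self-attention module are illustrative assumptions and not part of the filing.

    import torch

    def fuse_and_select(S_s, S_d, m=0.5, n=0.5):
        # S4: linear weighted fusion of static and motion features.
        S = m * S_s + n * S_d
        # S5 (stand-in): the method uses masked self-attention plus a fully connected
        # layer to score frames; a single linear layer plays that role in this sketch.
        scorer = torch.nn.Sequential(torch.nn.Linear(S.shape[1], 1), torch.nn.Sigmoid())
        scores = scorer(S).squeeze(-1)          # per-frame importance in (0, 1)
        actions = torch.bernoulli(scores)       # Bernoulli key-frame actions a_t
        return torch.nonzero(actions).flatten().tolist(), scores

    T, D = 120, 256                             # T frames, 256-dim features (step S2)
    S_s = torch.randn(T, D)                     # stand-in for the static features S_s
    S_d = torch.randn(T, D)                     # stand-in for the motion features S_d
    keyframes, _ = fuse_and_select(S_s, S_d)
    print(f"selected {len(keyframes)} of {T} frames as key frames")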
Further preferably, the specific method of step S3 is as follows:
S31, extracting the human skeleton from each frame of the video and analyzing the human posture with a lightweight HRNet;
S32, using the coordinates and confidences of the skeletal key points identified in each frame as input, constructing a topological graph according to the physical relations among the bones, and then batch-normalizing the topological graph;
S33, extracting features from the processed data through a number of S-GCN units, assigning different weight coefficients to different trunks, and obtaining the feature representation of the video, S_d = {S_d1, S_d2, …, S_dT}.
Further preferably, the specific method of step S31 is as follows:
S311, each sub-network of each branch at each stage comprises two residual blocks and a multi-resolution fusion module;
S312, replacing all residual blocks in the original network with the Shuffle module of ShuffleNet, which splits the channels into two parts: one part passes through directly without any convolution, while the other undergoes a depthwise separable convolution;
S313, replacing the 1×1 convolution in the depthwise separable convolution with channel weighting, downsampling by average pooling to the size of the lowest resolution, performing channel-wise additive feature fusion on the processed feature maps of the i branches of different resolutions, obtaining a weight matrix W_t with an SE module, upsampling W_t for each branch to restore the original size, and weighting the channels.
Further preferably, the specific method of step S5 is as follows:
S51, modeling the position information between video frames through bidirectional masks;
S52, after the global context information of the video sequence is obtained, calculating a feature matching degree based on the global correlation features, and then predicting the importance score of each video frame with a fully connected layer;
S53, after the frame score of each video frame is obtained, selecting key frames through corresponding actions drawn from a Bernoulli distribution, a_t ~ B(Y), where a_t expresses the probability of taking the current frame as a key frame;
S54, judging the quality of the extracted key-frame result set with reinforcement learning, where the state-action value is expressed as the sum of the importance and the diversity of the result set: the importance is evaluated by how well the key-frame set covers the complete video information, and the diversity by how far apart the selected frames are in feature space.
Further preferably, the specific method of step S51 is as follows:
S511, the forward mask indicates that the attention weights are related to the calculation results before the current position, and the reverse mask indicates that they are related to the calculation results after it;
S512, inputting a T-frame video X = {x_i | i = 1, …, T}, where each frame contains N key points, and computing the forward and backward correlation coefficients through the self-attention mechanism (the correlation-coefficient formulas are given as images in the original filing), where t, i ∈ [0, T), U and V are the weight matrices of the two frames, M is the position-coding matrix, the forward mask retains the upper-triangular information and the reverse mask the lower-triangular information, λ is an eigenvalue of the fused feature matrix, s_t is the fused feature of the current frame, and s_i is the fused feature of a preceding or following frame;
S513, combining the correlation coefficients with the relative position information of the frame (representing its positional relation to the preceding and following frames), fusing the forward and backward directions, and mapping back to the original video frame sequence to obtain a sequence c = {c_t | t = 1, …, T} containing context information.
Advantageous effects
(1) The invention provides a motion video key frame extraction technique that fuses features obtained through human body posture recognition and spatial graph convolution, and meets the requirements on key frame extraction accuracy and completeness.
(2) The invention provides a way of extracting video frame features in which the static features extracted through human body posture recognition and the motion features extracted through spatial graph convolution are fused into the final video frame features used for importance analysis, which effectively avoids missed and false detections.
(3) The invention makes HRNet lightweight by replacing the residual modules and adding an attention mechanism, greatly reducing the amount of computation without losing accuracy.
Drawings
Fig. 1 is a schematic diagram of a stage of a method for extracting a feature-fused key frame of a motion video according to an embodiment of the present invention;
fig. 2 is a specific schematic diagram of a human body gesture recognition module of the method for extracting a feature-fused motion video key frame according to the embodiment of the present invention;
fig. 3 is a schematic diagram of a key frame extraction result of a method for extracting a feature-fused motion video key frame according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a method for extracting feature-fused motion video key frames which, as shown in Fig. 1, fuses the static features extracted by a lightweight human body posture recognition algorithm with the motion features extracted by spatial graph convolution, and improves the accuracy and completeness of key frame detection; the specific embodiment comprises the following steps:
(1) and performing frame-by-frame segmentation on the target video segment, and segmenting the video into a series of video frames.
(2) In order to better retain the original information of the input image and reduce loss, the residual network ResNet50 is used for static feature extraction; the data dimension is reduced to 256, and the resulting static features of the video frames are denoted S_s = [S_s1, S_s2, …, S_sT].
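One possible realization of this step is sketched below; the use of torchvision's pretrained ResNet-50 as the residual network and a learned linear projection for the reduction to 256 dimensions are assumptions, since the filing does not name a specific implementation.

    import torch
    from torchvision import models, transforms

    # ResNet-50 backbone with the classification head removed gives 2048-d pooled
    # features; a linear projection (an assumed choice) reduces them to 256 dimensions.
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    backbone.fc = torch.nn.Identity()
    backbone.eval()
    reduce_to_256 = torch.nn.Linear(2048, 256)

    preprocess = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def static_features(frames):
        # frames: list of HxWx3 uint8 RGB arrays -> (T, 256) static feature matrix S_s.
        batch = torch.stack([preprocess(f) for f in frames])
        return reduce_to_256(backbone(batch))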
(3) Abstracting the skeleton data of the human body in three-dimensional space, analyzing the human posture with a lightweight HRNet algorithm, and then extracting the motion features of the video frames with an ST-GCN network to obtain the motion features S_d = [S_d1, S_d2, …, S_dT].
The specific method of the step (3) is as follows:
(3.1) The human skeleton is extracted from each frame of the video. To improve accuracy without adding an excessive computational burden, this application uses a lightweight HRNet for human posture analysis.
As shown in fig. 2, the specific method of step (3.1) is as follows:
(3.1.1) The accuracy of HRNet is considerably higher than that of bottom-up algorithms, but its obvious drawbacks are a large number of parameters and slow operation; this method therefore makes lightweight improvements targeting these drawbacks and speeds up the analysis.
(3.1.2) to make the model as light as possible, the depth and width of the original HRNet network are first reduced, reducing the sub-net of each branch at each stage into two residual blocks and one multi-resolution fusion module.
(3.1.3) All residual blocks in the original network are replaced with the Shuffle module of ShuffleNet, which splits the channels into two parts: one part passes through directly without any convolution, while the other undergoes a depthwise separable convolution.
(3.1.4) The 1×1 convolution in the depthwise separable convolution is replaced by channel weighting, which still achieves information exchange but at a time complexity far lower than that of the 1×1 convolution. The feature maps are downsampled to the size of the lowest resolution by average pooling, the processed feature maps of the i branches of different resolutions are fused by a channel-wise add operation, and a weight matrix W_t is obtained through an SE module (comprising Squeeze and Excitation parts); W_t is then upsampled for each branch to restore the original size and used to weight the channels.
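The channel weighting described here behaves like a Squeeze-and-Excitation block; a minimal sketch follows, in which the channel count, the reduction ratio, and the placement after a depthwise 3×3 convolution are illustrative assumptions rather than values taken from the filing.

    import torch
    import torch.nn as nn

    class ChannelWeighting(nn.Module):
        # SE-style channel weighting: squeeze (global average pooling) + excitation
        # (two FC layers), used in place of the 1x1 convolution of a depthwise-separable unit.
        def __init__(self, channels, reduction=4):   # reduction ratio is an assumption
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):                          # x: (N, C, H, W)
            w = x.mean(dim=(2, 3))                     # squeeze: per-channel statistics
            w = self.fc(w).view(x.shape[0], -1, 1, 1)  # excitation: per-channel weights W_t
            return x * w                               # reweight channels instead of mixing them

    # Depthwise 3x3 convolution followed by channel weighting instead of a 1x1 convolution.
    depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)
    se = ChannelWeighting(64)
    y = se(depthwise(torch.randn(2, 64, 56, 56)))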
And (3.2) taking the coordinates and confidence degrees of the bone key points identified in each frame of the video as input, constructing a topological graph according to physical relations among bones, and then carrying out batch normalization processing on the topological graph to unify scattered data.
(3.3) The processed data are passed through 9 S-GCN units for feature extraction, different weight coefficients are assigned to different trunks, and the feature representation of the video, S_d = {S_d1, S_d2, …, S_dT}, is obtained.
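To make the S-GCN step concrete, a single spatial graph-convolution unit over the skeleton topology is sketched below; the joint count, the truncated edge list, and the per-trunk weighting via a learnable per-joint parameter are illustrative assumptions, and in the described method nine such units are stacked.

    import torch
    import torch.nn as nn

    NUM_JOINTS = 17          # assumed COCO-style joint count; the filing does not fix it
    EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]  # truncated illustrative edges

    # Adjacency built from the physical connections between bones, plus self-loops,
    # with symmetric normalization.
    A = torch.eye(NUM_JOINTS)
    for i, j in EDGES:
        A[i, j] = 1.0
        A[j, i] = 1.0
    D_inv_sqrt = torch.diag(A.sum(1).rsqrt())
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt

    class SGCNUnit(nn.Module):
        # One spatial graph-convolution unit: neighborhood aggregation over the skeleton
        # graph, a 1x1 feature transform, and a learnable per-joint (per-trunk) weight.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.theta = nn.Conv2d(in_ch, out_ch, kernel_size=1)
            self.trunk_weight = nn.Parameter(torch.ones(NUM_JOINTS))
            self.register_buffer("adj", A_norm)

        def forward(self, x):                       # x: (N, C, T, V) with V joints
            x = torch.einsum("nctv,vw->nctw", x, self.adj * self.trunk_weight)
            return torch.relu(self.theta(x))

    # Input channels: x, y coordinates plus confidence per joint, batch-normalized upstream.
    x = torch.randn(1, 3, 120, NUM_JOINTS)          # (batch, channels, frames, joints)
    features = SGCNUnit(3, 64)(x)                   # nine such units are stacked in the method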
(4) The extracted static features and motion features are linearly weighted according to their weights, S = m·S_s + n·S_d, where m and n are the weight factors of the static and motion features respectively.
(5) extracting global features from the fused features through a self-attention mechanism, then calculating the importance of video frames, extracting key frames of corresponding actions through a Bernoulli function, and optimizing a result set by using reinforcement learning.
The specific method of the step (5) is as follows:
(5.1) modeling the position information between the video frames through the bidirectional mask can ensure that the importance of the current video frame is influenced not only by the previous video frame but also by the subsequent video frame.
The specific method of step (5.1) is as follows:
(5.1.1) the forward mask indicates that the weights of attention are related to the calculation results before the current position, and the backward mask indicates that the weights of the current position are related to the calculation results after.
(5.1.2) A T-frame video X = {x_i | i = 1, …, T} is input, where each frame contains N key points, and the correlation coefficients are calculated through the self-attention mechanism (the correlation-coefficient formula is given as an image in the original filing), where t, i ∈ [0, T), U and V are the weight matrices of the two frames, M is the position-coding matrix, the forward mask retains the upper-triangular information, and the reverse mask retains the lower-triangular information.
(5.1.3) The correlation coefficients are combined with the relative position information of the frame (representing its positional relation to the preceding and following frames), the forward and backward directions are fused, and the result is mapped back to the original video frame sequence to obtain a sequence c = {c_t | t = 1, …, T} containing context information.
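Because the correlation-coefficient formulas appear only as images in the original filing, the following is a loose reconstruction from the surrounding description (projection matrices U and V, a position-coding term M, upper-/lower-triangular masks for the forward and reverse directions, scaling, and fusion of the two directions); the exact formulas used by the inventors may differ.

    import torch

    def bidirectional_context(S):
        # S: (T, D) fused per-frame features -> (T, D) context sequence c.
        # Loose reconstruction: the forward mask keeps the upper-triangular scores and
        # the reverse mask the lower-triangular ones, as stated in the description; the
        # scaling by sqrt(D) stands in for the eigenvalue lambda used in the filing.
        T_len, D = S.shape
        U = torch.nn.Linear(D, D, bias=False)       # weight matrix U (current frame)
        V = torch.nn.Linear(D, D, bias=False)       # weight matrix V (other frames)
        M = torch.zeros(T_len, T_len)               # stand-in for the position-coding matrix

        logits = U(S) @ V(S).transpose(0, 1) / D ** 0.5 + M
        fwd_mask = torch.triu(torch.ones(T_len, T_len)).bool()   # forward direction
        bwd_mask = torch.tril(torch.ones(T_len, T_len)).bool()   # reverse direction

        a_fwd = torch.softmax(logits.masked_fill(~fwd_mask, float("-inf")), dim=-1)
        a_bwd = torch.softmax(logits.masked_fill(~bwd_mask, float("-inf")), dim=-1)
        return 0.5 * (a_fwd @ S + a_bwd @ S)        # fuse the two directions

    c = bidirectional_context(torch.randn(120, 256))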
And (5.2) after obtaining the global context information of the video sequence, calculating a feature matching degree based on the global correlation features, and then predicting the importance score of the video frame by adopting the full connection layer.
(5.3) After the frame score of each video frame is obtained, key frames are selected through corresponding actions drawn from a Bernoulli distribution, a_t ~ B(Y), where a_t expresses the probability of taking the current frame as a key frame.
(5.4) The quality of the extracted key-frame result set is judged with reinforcement learning: the state-action value is expressed as the sum of the importance and the diversity of the result set, the importance is evaluated by how well the key-frame set covers the complete video information, and the diversity by how far apart the selected frames are in feature space.
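The reward described here (coverage of the complete video plus feature-space diversity among the selected frames) can be sketched as follows; this follows the general spirit of diversity-representativeness rewards for video summarization, with cosine similarity as an assumed distance measure, and does not reproduce the exact formulas of the filing.

    import torch

    def reward(features, selected):
        # features: (T, D) fused frame features; selected: list of selected frame indices.
        # R = R_importance + R_diversity, a sketch of the state-action value described above:
        # - R_importance: how well the selected frames cover (are close to) all frames;
        # - R_diversity: how dissimilar the selected frames are from one another.
        feats = torch.nn.functional.normalize(features, dim=1)
        sel = feats[selected]                                   # (K, D)

        # Importance/coverage: mean similarity of every frame to its nearest selected frame.
        r_imp = (feats @ sel.t()).max(dim=1).values.mean()

        # Diversity: mean pairwise dissimilarity among the selected frames.
        sim = sel @ sel.t()                                     # (K, K) cosine similarities
        k = sel.shape[0]
        off_diag = sim[~torch.eye(k, dtype=torch.bool)]
        r_div = (1.0 - off_diag).mean() if k > 1 else torch.tensor(0.0)

        return r_imp + r_div

    r = reward(torch.randn(120, 256), [3, 40, 77, 110])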
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles disclosed herein.

Claims (5)

1. A method for extracting a fusion characteristic motion video key frame based on human body posture recognition is characterized by comprising the following steps:
S1, segmenting the target video clip frame by frame into a series of video frames;
S2, extracting static features with a residual network and reducing the data dimensionality to obtain the static features of the video frames, S_s = [S_s1, S_s2, ..., S_sT];
S3, abstracting the skeleton data of the human body in three-dimensional space and extracting the motion features of the video frames to obtain the motion features S_d = [S_d1, S_d2, ..., S_dT];
S4, linearly weighting the extracted static features S_s and motion features S_d according to their weights, S = m·S_s + n·S_d, where m and n are the weight factors of the static and motion features respectively;
S5, extracting global features from the fused features through a self-attention mechanism, computing the importance of each video frame, selecting key frames through corresponding actions drawn from a Bernoulli function, and optimizing the result set with reinforcement learning.
2. The method for extracting fusion-feature motion video key frames based on human body posture recognition according to claim 1, wherein the specific method of step S3 is as follows:
S31, extracting the human skeleton from each frame of the video and analyzing the human posture with a lightweight HRNet;
S32, using the coordinates and confidences of the skeletal key points identified in each frame as input, constructing a topological graph according to the physical relations among the bones, and then batch-normalizing the topological graph;
S33, extracting features from the processed data through a number of S-GCN units, assigning different weight coefficients to different trunks, and obtaining the feature representation of the video, S_d = {S_d1, S_d2, ..., S_dT}.
3. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 2, wherein the specific method of the step S31 is as follows:
S311, each sub-network of each branch at each stage comprises two residual blocks and a multi-resolution fusion module;
S312, replacing all residual blocks in the original network with the Shuffle module of ShuffleNet, which splits the channels into two parts: one part passes through directly without any convolution, while the other undergoes a depthwise separable convolution;
S313, replacing the 1×1 convolution in the depthwise separable convolution with channel weighting, downsampling by average pooling to the size of the lowest resolution, performing channel-wise additive feature fusion on the processed feature maps of the i branches of different resolutions, obtaining a weight matrix W_t with an SE module, upsampling W_t for each branch to restore the original size, and weighting the channels.
4. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition as claimed in claim 1, wherein the specific method of step S5 is as follows:
S51, modeling the position information between video frames through bidirectional masks;
S52, after the global context information of the video sequence is obtained, calculating a feature matching degree based on the global correlation features, and then predicting the importance score of each video frame with a fully connected layer;
S53, after the frame score of each video frame is obtained, selecting key frames through corresponding actions drawn from a Bernoulli distribution, a_t ~ B(Y), where a_t expresses the probability of taking the current frame as a key frame;
and S54, judging the quality of the extracted key frame result set by using reinforcement learning, representing the sum of the importance and diversity of the result set by using a state-action value, evaluating the importance of the result set by using the covering capability of the key frame set on complete video information, and evaluating the diversity of the result set by using the difference of feature spaces among selected frames.
5. The method for extracting the key frame of the fusion feature motion video based on human body posture recognition according to claim 4, wherein the specific method of the step S51 is as follows:
S511, the forward mask indicates that the attention weights are related to the calculation results before the current position, and the reverse mask indicates that they are related to the calculation results after it;
S512, inputting a T-frame video X = {x_i | i = 1, ..., T}, where each frame contains N key points, and calculating the forward and backward correlation coefficients through the self-attention mechanism (the correlation-coefficient formulas are given as images in the original filing), where t, i ∈ [0, T), U and V are the weight matrices of the two frames, M is the position-coding matrix, the forward mask retains the upper-triangular information and the reverse mask the lower-triangular information, λ is an eigenvalue of the fused feature matrix, s_t is the fused feature of the current frame, and s_i is the fused feature of a preceding or following frame;
S513, combining the correlation coefficients with the relative position information of the frame, fusing the forward and reverse directions, and mapping back to the original video frame sequence to obtain a sequence c = {c_t | t = 1, ..., T} containing context information.
CN202210245767.5A 2022-03-14 Human body gesture recognition-based fusion feature motion video key frame extraction method Active CN114724058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210245767.5A CN114724058B (en) 2022-03-14 Human body gesture recognition-based fusion feature motion video key frame extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210245767.5A CN114724058B (en) 2022-03-14 Human body gesture recognition-based fusion feature motion video key frame extraction method

Publications (2)

Publication Number Publication Date
CN114724058A (en) 2022-07-08
CN114724058B (en) 2024-11-15



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228111A (en) * 2016-07-08 2016-12-14 天津大学 A kind of method based on skeleton sequential extraction procedures key frame
US20200302180A1 (en) * 2018-03-13 2020-09-24 Tencent Technology (Shenzhen) Company Limited Image recognition method and apparatus, terminal, and storage medium
KR20200108548A (en) * 2019-03-11 2020-09-21 광운대학교 산학협력단 A system of compressing the sequence of 3D point clouds and the method thereof
CN112686153A (en) * 2020-12-30 2021-04-20 西安邮电大学 Three-dimensional skeleton key frame selection method for human behavior recognition
CN113283400A (en) * 2021-07-19 2021-08-20 成都考拉悠然科技有限公司 Skeleton action identification method based on selective hypergraph convolutional network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726977A (en) * 2024-02-07 2024-03-19 南京百伦斯智能科技有限公司 Experimental operation key node scoring method and system based on DCNN
CN117726977B (en) * 2024-02-07 2024-04-12 南京百伦斯智能科技有限公司 Experimental operation key node scoring method and system based on DCNN

Similar Documents

Publication Publication Date Title
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN112163498B (en) Method for establishing pedestrian re-identification model with foreground guiding and texture focusing functions and application of method
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN113033454B (en) Method for detecting building change in urban video shooting
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
CN114821374A (en) Knowledge and data collaborative driving unmanned aerial vehicle aerial photography target detection method
CN114743273B (en) Human skeleton behavior recognition method and system based on multi-scale residual error map convolution network
CN113807232B (en) Fake face detection method, system and storage medium based on double-flow network
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN111882495B (en) Image highlight processing method based on user-defined fuzzy logic and GAN
CN116934796B (en) Visual target tracking method based on twinning residual error attention aggregation network
CN112528077A (en) Video face retrieval method and system based on video embedding
CN114724058A (en) Method for extracting key frames of fusion characteristic motion video based on human body posture recognition
CN117876905A (en) Quick high-accuracy unmanned aerial vehicle aerial photographing target detection method
CN116993775A (en) Pedestrian multi-target tracking method combined with instance segmentation
CN114758285B (en) Video interaction action detection method based on anchor freedom and long-term attention perception
CN114724058B (en) Human body gesture recognition-based fusion feature motion video key frame extraction method
CN113627245B (en) CRTS target detection method
CN117058235A (en) Visual positioning method crossing various indoor scenes
CN115965905A (en) Crowd counting method and system based on multi-scale fusion convolutional network
CN115830707A (en) Multi-view human behavior identification method based on hypergraph learning
CN110503061B (en) Multi-feature-fused multi-factor video occlusion area detection method and system
Ranjan et al. Video Frame Prediction by Joint Optimization of Direct Frame Synthesis and Optical-Flow Estimation
CN114882417B (en) Light LIGHTDIMP single-target tracking method based on dimp tracker

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant