
CN114612936A - Unsupervised abnormal behavior detection method based on background suppression - Google Patents

Unsupervised abnormal behavior detection method based on background suppression

Info

Publication number
CN114612936A (application number CN202210252961.6A)
Authority
CN
China
Prior art keywords
dimensional
layer
convolution
activation function
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210252961.6A
Other languages
Chinese (zh)
Other versions
CN114612936B (en)
Inventor
路文
李玎
朱志强
朱振杰
何立火
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210252961.6A priority Critical patent/CN114612936B/en
Publication of CN114612936A publication Critical patent/CN114612936A/en
Application granted granted Critical
Publication of CN114612936B publication Critical patent/CN114612936B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unsupervised abnormal behavior detection method based on background suppression, which comprises the following steps: (1) acquiring a training sample set and a test sample set; (2) constructing an unsupervised abnormal behavior detection network model H; (3) iteratively training the unsupervised abnormal behavior detection network model H; (4) defining the anomaly score function score of the trained unsupervised abnormal behavior detection network model H*; (5) acquiring the abnormal behavior detection result. The unsupervised abnormal behavior detection network model constructed by the invention overcomes the defect of the prior art, which considers neither the influence of video-frame background features on algorithm perception nor the influence of training-set labeling accuracy on supervised learning, and improves the recognition accuracy of abnormal behavior detection.

Description

Unsupervised abnormal behavior detection method based on background suppression
Technical Field
The invention belongs to the technical field of computer vision, and relates to an abnormal behavior detection method, in particular to an unsupervised road monitoring video abnormal behavior detection method based on background suppression.
Background
Road monitoring is the most convenient and direct way to observe the behavior of passersby, and as the number of traffic accidents caused by passersby failing to use sidewalks according to traffic regulations increases, there is an urgent need to detect abnormal passerby behavior.
In recent years, with the rapid development of deep learning and open-source data sets, intelligent monitoring equipment has developed correspondingly; abnormal behavior detection is currently the most widely applied function of intelligent monitoring equipment in daily life and provides reliable safety guarantees for people's daily work and life. However, when detecting passersby, current intelligent monitoring equipment with built-in detection algorithms is easily influenced by factors such as ambient light, background targets and background features similar to the foreground; in addition, if a supervised abnormal behavior detection algorithm is adopted, the accuracy of the manually labeled data set also influences the algorithm. These factors introduce unavoidable interference, reduce the accuracy of abnormal behavior detection and weaken the robustness of the algorithm. Therefore, detection accuracy and robustness are important indexes for evaluating the performance of an abnormal behavior detection algorithm.
The patent application "Abnormal behavior detection method based on deep learning" (application number CN202110611720.1; publication number CN113361370A) filed by Nanjing Tech University discloses an abnormal behavior detection method based on deep learning. The method first obtains RGB images of the actual scene with a camera, then detects pedestrians in the current video frame with the YOLOv5 algorithm, outputting the position, confidence and category of each detection box; a constructed appearance feature network performs cascade matching of targets in adjacent frames to obtain matched tracks; finally, Kalman prediction deletes, creates and tracks track results to obtain the final tracks, which are matched with the next frame, and the cycle repeats. The method has two disadvantages: first, it does not consider the influence of video-frame background features on algorithm perception, so background interference degrades the accuracy of the abnormal behavior detection algorithm; second, YOLOv5 is a supervised algorithm, so the accuracy of pedestrian labels in the manually labeled data set also affects the detection accuracy during training.
The patent application "A violent abnormal behavior detection method based on deep learning" (application number CN202110224967.8; publication number CN113191182A) filed by Harbin University of Science and Technology proposes a violent abnormal behavior detection method. The method first divides the videos of a data set into frames, then stacks several consecutive frames into a cube, extracts three-dimensional features from the cube with a three-dimensional convolutional neural network, fuses the features, and uses the YOLO algorithm to judge whether the extracted features contain forbidden articles such as knives, guns and sticks. The method has two disadvantages: first, it does not fully consider the interference of background features similar to the foreground in real-life scenes; second, YOLO is a supervised algorithm, so the accuracy of pedestrian labels in the manually labeled data set also affects the detection accuracy during training.
Disclosure of Invention
The invention aims to provide an unsupervised abnormal behavior detection method based on background suppression that addresses the defects of the prior art, solving the technical problem of low detection accuracy caused by neglecting the background information of the video to be detected and by relying on manually labeled data sets.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) Randomly selecting M sidewalk monitoring videos and decomposing them to obtain a set of M frame sequences S_v = {S_v^1, ..., S_v^M}, where S_v^m = {v_1, ..., v_{K_m}} denotes the m-th frame sequence containing K_m frame images, v_k denotes the k-th frame image of S_v^m, M ≥ 200, and K_m ≥ 100;
(1b) Screening out, from each frame sequence S_v^m of the set S_v, the N_m frame images that contain only pedestrian walking events to form a normal behavior frame sequence, the normal behavior frame sequences of all M frame sequences forming the training sample set B_train; the P_m frame images remaining in S_v^m forming an abnormal behavior frame sequence, and all abnormal behavior frame sequences forming the test sample set B_test, where N_m ≥ P_m and P_m = K_m − N_m;
(2) Constructing an unsupervised abnormal behavior detection network model H:
(2a) Constructing an unsupervised abnormal behavior detection network model H comprising a background suppression module, a prediction module and a background suppression constraint module connected in sequence, the output end of the background suppression module also being connected to a context memory module; wherein:
the prediction module comprises a spatial encoder, a convolutional long short-term memory (ConvLSTM) module and a decoder connected in sequence; the spatial encoder adopts a feature extraction network comprising a plurality of two-dimensional convolution layers and a plurality of activation function layers; the ConvLSTM module adopts a memory convolutional neural network comprising a plurality of two-dimensional convolution layers, a plurality of tensor decomposition layers and a plurality of activation function layers; the decoder adopts a transposed convolutional neural network comprising a plurality of two-dimensional transposed convolution layers and a plurality of activation function layers;
the context memory module comprises a motion matching encoder and a memory module connected in sequence; the motion matching encoder adopts a three-dimensional convolutional neural network comprising a plurality of three-dimensional convolution layers, a plurality of activation function layers, a plurality of three-dimensional max pooling layers and one three-dimensional average pooling layer;
the output end of the memory module in the context memory module is connected to the input end of the decoder in the prediction module;
(2b) Defining the background suppression loss function L_BGS of the background suppression constraint module, the background constraint loss function L_restrain, the least square error L_2 and the least absolute deviation L_1:

L_BGS = ||Binary(x̂_n) − Binary(x_n)||_1

L_2 = ||x̂_n − x_n||_2^2

L_1 = ||x̂_n − x_n||_1

L_restrain = L_BGS + L_2 + L_1

where ||·||_1 denotes the 1-norm, ||·||_2 the 2-norm, Binary(·) denotes binarization, x̂_n denotes the prediction result of x_n, and x_n denotes the n-th frame image of the normal behavior frame sequence;
(3) carrying out iterative training on the unsupervised abnormal behavior detection network model H:
(3a) Initializing the iteration number t and the maximum iteration number T, T ≥ 80; denoting the parameters of the feature extraction network at the t-th iteration by θ_G1_t, the memory convolutional neural network parameters by θ_G2_t, the transposed convolutional neural network parameters by θ_G3_t and the three-dimensional convolutional neural network parameters by θ_G4_t; and letting t = 1;
(3b) Taking the training sample set B_train as the input of the unsupervised abnormal behavior detection network model H and obtaining, at the t-th iteration, the prediction results x̂ of the frame sequences:
(3b1) The background suppression module suppresses the background information of each normal behavior frame image x_n in each normal behavior frame sequence of the training sample set B_train, obtaining M background-suppressed frame sequences;
(3b2) The spatial encoder in the prediction module extracts features from each frame image of the background-suppressed c-th frame sequence, and the ConvLSTM module decomposes the feature tensor composed of all extracted features to obtain the feature information of that sequence and stores it, c ∈ [2, M−1];
(3b3) The context memory module extracts features from each frame image of the M−1 normal behavior frame sequences other than the c-th one; the features of all frame images preceding the c-th sequence form the preceding-context information and are stored, while the features of all frame images following it form the following-context information and are stored;
(3b4) The decoder in the prediction module decodes the feature information obtained in step (3b2) together with the preceding-context and following-context information obtained in step (3b3), obtaining the prediction result of the c-th frame sequence at the t-th iteration;
(3c) The background suppression constraint module binarizes the prediction result x̂_n and the normal behavior frame image x_n of the c-th normal behavior frame sequence, obtaining the binary image Binary(x̂_n) of the prediction result at iteration t and the binary image Binary(x_n) of the n-th normal behavior frame image;
(3d) Using the background suppression loss function L_BGS, computing the background suppression loss value L_BGS of H_t from Binary(x̂_n) and Binary(x_n); and using the background constraint loss function L_restrain, computing the background constraint loss value L_restrain of H_t from L_BGS, L_2 and L_1;
(3e) Using back-propagation, computing the gradients of H_t's network parameters from L_restrain; then updating the network parameters θ_G1_t, θ_G2_t, θ_G3_t and θ_G4_t by stochastic gradient descent using these gradients, obtaining this iteration's unsupervised abnormal behavior detection network model H_t;
(3f) Judging whether t ≥ T; if so, the trained unsupervised abnormal behavior detection network model H* is obtained; otherwise, letting t = t + 1 and H = H_t, and returning to step (3b);
(4) acquiring an abnormal behavior detection result:
(4a) Taking the c-th abnormal behavior frame sequence of the test sample set B_test as the input of the trained unsupervised abnormal behavior detection network model H* and forward-propagating it to obtain its predicted frame image ŷ;
(4b) Using the anomaly score function score, computing F = score(ŷ, y) from the predicted frame image ŷ and the real frame image y, and judging whether F and the preset anomaly score detection threshold I satisfy F ≥ I; if so, the frame sequence contains abnormal behavior, otherwise it does not, where score is computed from the difference between the predicted frame image and the real frame image.
compared with the prior art, the invention has the following advantages:
First, because the constructed abnormal behavior detection network model comprises a background suppression module and a background suppression constraint module, the influence of background target feature information on foreground anomaly detection is considered during training and detection: the model first weakens static background information with the background suppression module, then suppresses dynamic background information with the background suppression constraint module, and finally strengthens the information of the foreground target. This avoids the false detections of the prior art, which considers only foreground information and neglects background information, and effectively improves detection accuracy.
Second, because the prediction module of the constructed network model connects a spatial encoder, a ConvLSTM module and a decoder in sequence, the invention realizes unsupervised abnormal behavior detection by means of this encoder-decoder structure and overcomes the influence of manual-labeling accuracy on supervised learning, giving the method strong robustness across different data sets.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Fig. 2 is a schematic structural diagram of an abnormal behavior detection network model constructed by the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set and a testing sample set:
(1a) Randomly selecting M sidewalk monitoring videos and decomposing them to obtain a set of M frame sequences S_v = {S_v^1, ..., S_v^M}, where S_v^m = {v_1, ..., v_{K_m}} denotes the m-th frame sequence containing K_m frame images, v_k denotes the k-th frame image of S_v^m, M ≥ 200, and K_m ≥ 100;
In this example, experiments show that when M = 200, training is fast and the model's detection effect is good.
(1b) Screening out, from each frame sequence S_v^m of the set S_v, the N_m frame images that contain only pedestrian walking events to form a normal behavior frame sequence, the normal behavior frame sequences of all M frame sequences forming the training sample set B_train; the P_m frame images remaining in S_v^m forming an abnormal behavior frame sequence, and all abnormal behavior frame sequences forming the test sample set B_test, where N_m ≥ P_m and P_m = K_m − N_m;
In this example, a pedestrian walking in the sidewalk monitoring video is defined as normal behavior, while riding a bicycle or a skateboard is defined as abnormal behavior.
Step 2), constructing an unsupervised abnormal behavior detection network model H:
(2a) Constructing an unsupervised abnormal behavior detection network model H comprising a background suppression module, a prediction module and a background suppression constraint module connected in sequence, the output end of the background suppression module also being connected to a context memory module. The prediction module comprises a spatial encoder, a convolutional long short-term memory (ConvLSTM) module and a decoder connected in sequence; the spatial encoder adopts a feature extraction network comprising a plurality of two-dimensional convolution layers and a plurality of activation function layers; the ConvLSTM module adopts a memory convolutional neural network comprising a plurality of two-dimensional convolution layers, a plurality of tensor decomposition layers and a plurality of activation function layers; the decoder adopts a transposed convolutional neural network comprising a plurality of two-dimensional transposed convolution layers and a plurality of activation function layers. The context memory module comprises a motion matching encoder and a memory module connected in sequence, the output end of the memory module being connected to the input end of the decoder in the prediction module; the motion matching encoder adopts a three-dimensional convolutional neural network comprising a plurality of three-dimensional convolution layers, a plurality of activation function layers, a plurality of three-dimensional max pooling layers and one three-dimensional average pooling layer;
The spatial encoder contains 4 two-dimensional convolution layers and 4 activation function layers, with the structure: first two-dimensional convolution layer → first activation function layer → second two-dimensional convolution layer → second activation function layer → third two-dimensional convolution layer → third activation function layer → fourth two-dimensional convolution layer → fourth activation function layer. The first two-dimensional convolution layer has 1 input channel, 64 output channels and stride 2; the second has 64 input channels, 64 output channels and stride 1; the third has 64 input channels, 128 output channels and stride 2; the fourth has 128 input channels, 128 output channels and stride 1. All 4 two-dimensional convolution layers use 3×3 convolution kernels, and all 4 activation function layers use the ELU function.
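As a concrete illustration, the spatial encoder above can be sketched in PyTorch as follows (a minimal sketch: the channel widths, strides, 3×3 kernels and ELU activations follow the description, while padding=1 is an assumption made so that the stride-1 layers preserve spatial size):

```python
import torch
import torch.nn as nn

spatial_encoder = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),     # first two-dimensional convolution layer
    nn.ELU(),                                                 # first activation function layer
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),    # second
    nn.ELU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # third
    nn.ELU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),  # fourth
    nn.ELU(),
)

# Example: a grayscale 256x256 frame maps to a 128-channel 64x64 feature map.
feat = spatial_encoder(torch.randn(1, 1, 256, 256))
print(feat.shape)  # torch.Size([1, 128, 64, 64])
```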
Because each frame sequence in this example is obtained by decomposing a video, the feature information of the frame images within a sequence is strongly correlated. Compared with the prior art, which uses only an ordinary convolutional neural network to extract frame image features, this example uses the spatial encoder to extract features from each frame image, so that the extracted feature information retains this strong correlation and yields a better decoding effect in the decoder.
The ConvLSTM module contains 2 two-dimensional convolution layers, 2 tensor decomposition layers and 3 activation function layers, with the structure: first two-dimensional convolution layer → second two-dimensional convolution layer → first tensor decomposition layer → second tensor decomposition layer → first activation function layer → second activation function layer → third activation function layer. The two two-dimensional convolution layers are identical, with 128 input channels and 128 output channels; the 3 activation function layers all use the sigmoid function.
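The three sigmoid layers match the gating of a convolutional LSTM; a standard ConvLSTM cell with 128 channels is sketched below as one plausible reading of this module (the tensor decomposition layers are unspecified in the text and are omitted here):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, channels=128, kernel_size=3):
        super().__init__()
        # One convolution produces all four gate pre-activations at once.
        self.gates = nn.Conv2d(2 * channels, 4 * channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # three sigmoid gates
        c = f * c + i * torch.tanh(g)   # update the cell (memory) state
        h = o * torch.tanh(c)           # update the hidden state
        return h, c

# Usage: step the cell over the feature maps of a frame sequence.
conv_lstm = ConvLSTMCell()
```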
The decoder contains 4 two-dimensional transposed convolution layers and 3 activation function layers, with the structure: first two-dimensional transposed convolution layer → first activation function layer → second two-dimensional transposed convolution layer → second activation function layer → third two-dimensional transposed convolution layer → third activation function layer → fourth two-dimensional transposed convolution layer. The first two-dimensional transposed convolution layer has 256 input channels, 128 output channels and stride 1; the second has 128 input channels, 64 output channels and stride 2; the third has 64 input channels, 64 output channels and stride 1; the fourth has 64 input channels, 1 output channel and stride 1. All 4 transposed convolution layers use 3×3 convolution kernels, and the 3 activation function layers all use the ELU function.
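A corresponding PyTorch sketch of the decoder (the padding and output_padding values are assumptions chosen so that the single stride-2 layer exactly doubles spatial resolution):

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=3, stride=1, padding=1),  # first
    nn.ELU(),
    nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2,
                       padding=1, output_padding=1),                   # second (upsamples 2x)
    nn.ELU(),
    nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1, padding=1),    # third
    nn.ELU(),
    nn.ConvTranspose2d(64, 1, kernel_size=3, stride=1, padding=1),     # fourth: back to a 1-channel frame
)

# Example: a 256-channel 64x64 tensor decodes to a 1-channel 128x128 frame.
frame = decoder(torch.randn(1, 256, 64, 64))
print(frame.shape)  # torch.Size([1, 1, 128, 128])
```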
The motion matching encoder contains 6 three-dimensional convolution layers, 6 activation function layers, 4 three-dimensional max pooling layers and 1 three-dimensional average pooling layer, with the structure: first three-dimensional convolution layer → first activation function layer → first three-dimensional max pooling layer → second three-dimensional convolution layer → second activation function layer → second three-dimensional max pooling layer → third three-dimensional convolution layer → third activation function layer → fourth three-dimensional convolution layer → fourth activation function layer → third three-dimensional max pooling layer → fifth three-dimensional convolution layer → fifth activation function layer → sixth three-dimensional convolution layer → sixth activation function layer → fourth three-dimensional max pooling layer → three-dimensional average pooling layer. The first three-dimensional convolution layer has 1 input channel and 64 output channels; the second 64 and 128; the third 128 and 256; the fourth 256 and 256; the fifth 256 and 512; the sixth 512 and 512; all strides are 1, and all 6 layers use 3×3 convolution kernels. The first three-dimensional max pooling layer has pooling kernel size 1×2 and stride 1×2; the second, third and fourth have kernel size 2×2 and stride 2×2; the three-dimensional average pooling layer has kernel size 1×2. All 6 activation function layers use the ReLU function.
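A PyTorch sketch of the motion matching encoder follows; since the kernel and pooling sizes in the text appear truncated, full three-dimensional sizes are assumed here (3×3×3 convolution kernels, a (1,2,2) first max pooling and (2,2,2) for the remaining max pooling layers):

```python
import torch
import torch.nn as nn

def conv3d(cin, cout):
    # 3-D convolution (assumed 3x3x3 kernel) followed by ReLU.
    return nn.Sequential(nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                         nn.ReLU())

motion_matching_encoder = nn.Sequential(
    conv3d(1, 64),
    nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),  # first 3-D max pooling layer
    conv3d(64, 128),
    nn.MaxPool3d(kernel_size=2, stride=2),                  # second
    conv3d(128, 256),
    conv3d(256, 256),
    nn.MaxPool3d(kernel_size=2, stride=2),                  # third
    conv3d(256, 512),
    conv3d(512, 512),
    nn.MaxPool3d(kernel_size=2, stride=2),                  # fourth
    nn.AvgPool3d(kernel_size=(1, 2, 2)),                    # three-dimensional average pooling layer
)

# Example: 16 stacked 128x128 frames -> a 512-channel spatio-temporal code.
code = motion_matching_encoder(torch.randn(1, 1, 16, 128, 128))
print(code.shape)  # torch.Size([1, 512, 2, 4, 4])
```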
(2b) Defining the background suppression loss function L_BGS of the background suppression constraint module, the background constraint loss function L_restrain, the least square error L_2 and the least absolute deviation L_1:

L_BGS = ||Binary(x̂_n) − Binary(x_n)||_1

L_2 = ||x̂_n − x_n||_2^2

L_1 = ||x̂_n − x_n||_1

L_restrain = L_BGS + L_2 + L_1

where ||·||_1 denotes the 1-norm, ||·||_2 the 2-norm, Binary(·) denotes binarization, x̂_n denotes the prediction result of x_n, and x_n denotes the n-th frame image of the normal behavior frame sequence;
In this example, if the background constraint loss function L_restrain used only the least square error L_2 and the background suppression loss function L_BGS to compute the loss of the unsupervised abnormal behavior detection network model, the pixel similarity between the prediction result x̂_n and the normal behavior frame image x_n could be guaranteed, but the prediction x̂_n would easily become blurred; therefore, to alleviate the blurring of x̂_n, the least absolute deviation L_1 is also added to the background constraint loss function L_restrain when computing the model loss.
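Under the reconstructed loss forms above, the background constraint loss can be sketched in PyTorch as follows (a minimal sketch; note that the hard binarization carries no gradient, so in this naive form L_BGS contributes only its value, not a gradient):

```python
import torch

def binary(img):
    # Set every non-zero pixel value to 1, as in step (3c).
    return (img != 0).float()

def restrain_loss(pred, target):
    l2 = torch.sum((pred - target) ** 2)                         # least square error L_2
    l1 = torch.sum(torch.abs(pred - target))                     # least absolute deviation L_1
    l_bgs = torch.sum(torch.abs(binary(pred) - binary(target)))  # background suppression loss L_BGS
    return l_bgs + l2 + l1                                       # L_restrain = L_BGS + L_2 + L_1
```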
Step 3) carrying out iterative training on the unsupervised abnormal behavior detection network model H:
(3a) Initializing the iteration number t and the maximum iteration number T, T ≥ 80; denoting the parameters of the feature extraction network at the t-th iteration by θ_G1_t, the memory convolutional neural network parameters by θ_G2_t, the transposed convolutional neural network parameters by θ_G3_t and the three-dimensional convolutional neural network parameters by θ_G4_t; and letting t = 1;
In this example, the trained unsupervised abnormal behavior detection network model achieves the best detection effect when the maximum iteration number T = 100;
(3b) Taking the training sample set B_train as the input of the unsupervised abnormal behavior detection network model H and obtaining, at the t-th iteration, the prediction results x̂ of the frame sequences:
(3b1) The background suppression module suppresses the background information of each normal behavior frame image x_n in each normal behavior frame sequence of the training sample set B_train, and all background-suppressed frame images form a frame image sequence. The implementation steps are as follows: the background suppression module adjusts the illumination of each normal behavior frame image x_n by gamma correction; the gamma-corrected frame image is Gaussian-filtered to remove noise; and Laplacian sharpening is then applied to the Gaussian-filtered frame image to suppress background information, obtaining the background-suppressed frame image.
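A sketch of this preprocessing chain with OpenCV (the gamma value, Gaussian kernel size and sharpening weight are illustrative assumptions; the text does not state them):

```python
import cv2
import numpy as np

def suppress_background(frame_gray, gamma=0.8):
    # Gamma correction to adjust illumination.
    norm = frame_gray.astype(np.float32) / 255.0
    corrected = np.power(norm, gamma)
    # Gaussian filtering to remove noise points.
    smoothed = cv2.GaussianBlur(corrected, (5, 5), 1.0)
    # Laplacian sharpening: subtracting the Laplacian emphasises edges
    # (foreground contours) and flattens smooth background regions.
    lap = cv2.Laplacian(smoothed, cv2.CV_32F, ksize=3)
    sharpened = np.clip(smoothed - lap, 0.0, 1.0)
    return (sharpened * 255).astype(np.uint8)
```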
(3b2) The spatial encoder in the prediction module extracts features from each frame image of the background-suppressed c-th frame sequence, and the ConvLSTM module decomposes the feature tensor composed of all extracted features to obtain the feature information of that sequence and stores it, c ∈ [2, M−1]. The process is as follows: the spatial encoder extracts features from each frame image of the sequence through the convolution layers and activation function layers of the feature extraction network and stacks them into a feature tensor; the ConvLSTM module then decomposes this tensor through its convolution layers, tensor decomposition layers and activation function layers to obtain the feature information.
(3b3) The context memory module extracts features from each frame image of the M−1 normal behavior frame sequences other than the c-th one. The process is as follows: the motion matching encoder extracts features from each frame image of all frame sequences other than the c-th one by means of the three-dimensional convolutional neural network and encodes the extracted features; the features of all frame sequences preceding the c-th sequence are stored as the preceding-context information, and the features of all frame sequences following it are stored as the following-context information.
(3b4) The decoder in the prediction module decodes the feature information obtained in step (3b2) together with the preceding-context and following-context information obtained in step (3b3), obtaining the prediction result of the c-th frame sequence at the t-th iteration. The process is as follows: by means of the transposed convolutional neural network, the decoder transposes and decodes the tensor formed by the preceding-context information, the following-context information and the feature information of the c-th frame sequence, yielding the prediction result at the t-th iteration. Because the decoder in this example simultaneously uses the feature information extracted from the c-th frame sequence by the spatial encoder and the feature information extracted from the other frame sequences by the motion matching encoder, the prediction results are more diverse and the model is more intelligent.
(3c) The background suppression constraint module binarizes the prediction result x̂_n and the normal behavior frame image x_n of the c-th normal behavior frame sequence, obtaining the binary image Binary(x̂_n) of the prediction result at iteration t and the binary image Binary(x_n) of the n-th normal behavior frame image. The binarization performed by the background suppression constraint module sets every pixel value of a frame image that is not 0 to 1.
Because the foreground object and the background object both move continuously in the video, and the change of the pixel value is continuous, when the moving object passes through a certain area, the pixel value of the area changes, and the fluctuation of the pixel value is also taken as potential feature extraction in the process of extracting the feature by the algorithm, thereby causing false detection.
In this example, the binarization process would be to normally-behave frame images
Figure BDA0003547448580000111
And predicting the result
Figure BDA0003547448580000112
All the pixel values which are not 0 in the background image are changed into 1, and then the problem that the pixel value of a moving target passing area is not 0 caused by target motion is solved through the difference frame of the two pixel values, so that dynamic background information is suppressed, and the accuracy of detection is improved.
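A sketch of the binarization and difference-frame step with NumPy (hypothetical helper names; the rule "every non-zero pixel becomes 1" follows the description):

```python
import numpy as np

def binarize(frame):
    # Every pixel value that is not 0 becomes 1.
    return (frame != 0).astype(np.uint8)

def binary_difference(pred_frame, real_frame):
    # 1 exactly where one frame is non-zero and the other is zero:
    # regions a moving target has merely passed through cancel out.
    return np.abs(binarize(pred_frame).astype(np.int16)
                  - binarize(real_frame).astype(np.int16))
```

L_BGS is then the 1-norm of this difference image.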
(3d) Using the background suppression loss function L_BGS, computing the background suppression loss value L_BGS of H_t from Binary(x̂_n) and Binary(x_n); and using the background constraint loss function L_restrain, computing the background constraint loss value L_restrain of H_t from L_BGS, L_2 and L_1;
(3e) Using back-propagation, computing the gradients of H_t's network parameters from L_restrain; then updating the network parameters θ_G1_t, θ_G2_t, θ_G3_t and θ_G4_t by stochastic gradient descent using these gradients, obtaining this iteration's unsupervised abnormal behavior detection network model H_t;
(3f) Judging whether t ≥ T; if so, the trained unsupervised abnormal behavior detection network model H* is obtained; otherwise, letting t = t + 1 and H = H_t, and returning to step (3b);
The stochastic gradient descent algorithm updates H_t's feature extraction network parameters θ_G1_t, memory convolutional neural network parameters θ_G2_t, transposed convolutional neural network parameters θ_G3_t and three-dimensional convolutional neural network parameters θ_G4_t through H_t's network parameter gradients, with the update formulas:

g_t = ∇_θ f_t(θ_{t−1})
m_t = β_1·m_{t−1} + (1 − β_1)·g_t
v_t = β_2·v_{t−1} + (1 − β_2)·g_t²
m̂_t = m_t / (1 − β_1^t)
v̂_t = v_t / (1 − β_2^t)
θ_t = θ_{t−1} − α·m̂_t / (√v̂_t + ε)

where g_t is the gradient at iteration number t; θ_Gi_t (i = 1, 2, 3, 4) are the updated feature extraction network, memory convolutional neural network, transposed convolutional neural network and three-dimensional convolutional neural network parameters; f_ti(θ) (i = 1, 2, 3, 4) is the objective function of parameter θ_Gi_t; β_1 and β_2 are the exponential decay rates of the first and second moments; m_ti (i = 1, 2, 3, 4) are the first-moment estimates of H_t's network parameter gradients and v_ti (i = 1, 2, 3, 4) the second-moment estimates; m̂_ti and v̂_ti are the bias-corrected m_ti and v_ti; β_i^t is the t-th power of β_i; α_i (i = 1, 2, 3, 4) are the learning rates; and ε_i (i = 1, 2, 3, 4) are constants added to maintain numerical stability.
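These update formulas are the Adam variant of stochastic gradient descent; in PyTorch the same update is applied by torch.optim.Adam. The sketch below reuses the module and loss sketches given earlier and replaces the full forward pass of steps (3b1)-(3b4) with a toy one (hyperparameter values are illustrative):

```python
import itertools
import torch

# The four parameter groups theta_G1..theta_G4 are optimized jointly.
params = itertools.chain(spatial_encoder.parameters(),          # theta_G1
                         conv_lstm.parameters(),                # theta_G2
                         decoder.parameters(),                  # theta_G3
                         motion_matching_encoder.parameters())  # theta_G4
optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

# Toy forward pass standing in for steps (3b1)-(3b4).
frames = torch.randn(8, 1, 256, 256)   # background-suppressed frames
target = torch.randn(8, 1, 128, 128)   # frames to be predicted
pred = decoder(torch.cat([spatial_encoder(frames),
                          spatial_encoder(frames)], dim=1))

loss = restrain_loss(pred, target)  # L_restrain from the loss sketch above
optimizer.zero_grad()
loss.backward()                     # back-propagation yields the gradients g_t
optimizer.step()                    # applies the moment-based update shown above
```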
step 4), obtaining an abnormal behavior detection result:
(4a) Taking the c-th abnormal behavior frame sequence of the test sample set B_test as the input of the trained unsupervised abnormal behavior detection network model H* and forward-propagating it to obtain its predicted frame image ŷ;
(4b) Using the anomaly score function score, computing F = score(ŷ, y) from the predicted frame image ŷ and the real frame image y, and judging whether F and the preset anomaly score detection threshold I satisfy F ≥ I; if so, the frame sequence contains abnormal behavior, otherwise it does not, where score is computed from the difference between the predicted frame image and the real frame image.
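The exact score function is defined by the formulas above, which are not reproduced in this text; as an illustrative stand-in, prediction-based anomaly detectors commonly score a frame by a normalized PSNR-style prediction error, sketched below (the score form and threshold value are assumptions, not the patent's exact definition):

```python
import numpy as np

def psnr(pred, real, peak=255.0):
    mse = np.mean((pred.astype(np.float32) - real.astype(np.float32)) ** 2)
    return 10.0 * np.log10(peak ** 2 / (mse + 1e-8))

def anomaly_score(pred, real):
    # Higher prediction error (lower PSNR) -> higher anomaly score.
    # Illustrative stand-in, not the patent's exact score function.
    return 1.0 / (1.0 + psnr(pred, real))

pred = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # toy frames
real = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
F = anomaly_score(pred, real)
I = 0.02  # preset anomaly score detection threshold (illustrative)
print("abnormal" if F >= I else "normal")
```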
the effect of the present invention will be further explained with reference to the following experiments:
1. the experimental conditions are as follows:
The hardware platform of the experiments of the invention is: two NVIDIA GeForce RTX 2080 Ti GPUs.
The software platform of the experiments of the invention is: Ubuntu 16 operating system, PyTorch 1.7 framework, Python 3.8.
The data set used for the experiment was the ShanghaiTech data set, which had a total of 437 videos, each with different lighting conditions and camera angles.
2. Analysis of experimental contents and results thereof:
(1) evaluation index
The main evaluation index in the field of video-monitoring abnormal behavior detection is the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. The ROC curve takes the false positive rate as the abscissa and the true positive rate as the ordinate; the false positive rate is the probability that a negative sample is predicted as positive, and the true positive rate is the probability that a positive sample is predicted as positive. The closer the ROC curve is to the upper-left corner, the larger the AUC value and the better the performance of the algorithm model. For the abnormal behavior detection task, AUC values are calculated from image-level anomaly scores.
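With image-level anomaly scores and frame labels in hand, the AUC can be computed with scikit-learn (a minimal sketch with toy values):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 1, 1, 0])             # ground-truth frame labels (toy example)
scores = np.array([0.1, 0.2, 0.8, 0.7, 0.3])   # per-frame anomaly scores
print("AUC:", roc_auc_score(labels, scores))
```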
(3) Results and analysis of the experiments
This experiment verifies the advantage of the proposed method over other existing abnormal behavior detection methods in detection accuracy. In the experiment, the various abnormal behavior detection methods are trained and tested on the ShanghaiTech data set, and the evaluation index AUC on this data set is obtained.
Table 1. Experimental results of different algorithms on the ShanghaiTech data set

Method          AUC
Conv-AE         60.9%
StackedRNN      68.0%
Liu et al.      72.8%
VEC             74.8%
HF2-VAD         76.2%
The invention   76.5%
As can be seen from the experimental results in Table 1, the invention achieves higher accuracy than the prior art.
In conclusion, compared with the prior art, the invention achieves a higher detection accuracy for abnormal behavior and has important practical significance. While the invention has been shown and described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention.

Claims (4)

1. An unsupervised abnormal behavior detection method based on background suppression is characterized by comprising the following steps:
(1) acquiring a training sample set and a testing sample set:
(1a) randomly selecting M sidewalk monitoring videos and decomposing them to obtain a set of M frame sequences S_v = {S_v^1, ..., S_v^M}, where S_v^m = {v_1, ..., v_{K_m}} denotes the m-th frame sequence containing K_m frame images, v_k denotes the k-th frame image of S_v^m, M ≥ 200, and K_m ≥ 100;
(1b) screening out, from each frame sequence S_v^m of the set S_v, the N_m frame images that contain only pedestrian walking events to form a normal behavior frame sequence, the normal behavior frame sequences of all M frame sequences forming the training sample set B_train; the P_m frame images remaining in S_v^m forming an abnormal behavior frame sequence, and all abnormal behavior frame sequences forming the test sample set B_test, where N_m ≥ P_m and P_m = K_m − N_m;
(2) Constructing an unsupervised abnormal behavior detection network model H:
(2a) constructing an unsupervised abnormal behavior detection network model H comprising a background suppression module, a prediction module and a background suppression constraint module connected in sequence, the output end of the background suppression module also being connected to a context memory module; wherein:
the prediction module comprises a spatial encoder, a convolutional long short-term memory (ConvLSTM) module and a decoder connected in sequence; the spatial encoder adopts a feature extraction network comprising a plurality of two-dimensional convolution layers and a plurality of activation function layers; the ConvLSTM module adopts a memory convolutional neural network comprising a plurality of two-dimensional convolution layers, a plurality of tensor decomposition layers and a plurality of activation function layers; the decoder adopts a transposed convolutional neural network comprising a plurality of two-dimensional transposed convolution layers and a plurality of activation function layers;
the context memory module comprises a motion matching encoder and a memory module connected in sequence; the motion matching encoder adopts a three-dimensional convolutional neural network comprising a plurality of three-dimensional convolution layers, a plurality of activation function layers, a plurality of three-dimensional max pooling layers and one three-dimensional average pooling layer;
the output end of the memory module in the context memory module is connected to the input end of the decoder in the prediction module;
(2b) defining the background suppression loss function L_BGS of the background suppression constraint module, the background constraint loss function L_restrain, the least square error L_2 and the least absolute deviation L_1:

L_BGS = ||Binary(x̂_n) − Binary(x_n)||_1

L_2 = ||x̂_n − x_n||_2^2

L_1 = ||x̂_n − x_n||_1

L_restrain = L_BGS + L_2 + L_1

where ||·||_1 denotes the 1-norm, ||·||_2 the 2-norm, Binary(·) denotes binarization, x̂_n denotes the prediction result of x_n, and x_n denotes the n-th frame image of the normal behavior frame sequence;
(3) carrying out iterative training on the unsupervised abnormal behavior detection network model H:
(3a) initializing the iteration number t and the maximum iteration number T, T ≥ 80; denoting the parameters of the feature extraction network at the t-th iteration by θ_G1_t, the memory convolutional neural network parameters by θ_G2_t, the transposed convolutional neural network parameters by θ_G3_t and the three-dimensional convolutional neural network parameters by θ_G4_t; and letting t = 1;
(3b) taking the training sample set B_train as the input of the unsupervised abnormal behavior detection network model H and obtaining, at the t-th iteration, the prediction results x̂ of the frame sequences:
(3b1) the background suppression module suppresses the background information of each normal behavior frame image x_n in each normal behavior frame sequence of the training sample set B_train, all background-suppressed frame images forming a frame image sequence;
(3b2) the spatial encoder in the prediction module extracts features from each frame image of the background-suppressed c-th frame sequence, and the ConvLSTM module decomposes the feature tensor composed of all extracted features to obtain the feature information of that sequence and stores it, c ∈ [2, M−1];
(3b3) the context memory module extracts features from each frame image of the M−1 normal behavior frame sequences other than the c-th one; the features of all frame images preceding the c-th sequence form the preceding-context information and are stored, while the features of all frame images following it form the following-context information and are stored;
(3b4) the decoder in the prediction module decodes the feature information obtained in step (3b2) together with the preceding-context and following-context information obtained in step (3b3), obtaining the prediction result of the c-th frame sequence at the t-th iteration;
(3c) the background suppression constraint module binarizes the prediction result x̂_n and the normal behavior frame image x_n of the c-th normal behavior frame sequence, obtaining the binary image Binary(x̂_n) of the prediction result at iteration t and the binary image Binary(x_n) of the n-th normal behavior frame image;
(3d) using the background suppression loss function L_BGS, computing the background suppression loss value L_BGS of H_t from Binary(x̂_n) and Binary(x_n); and using the background constraint loss function L_restrain, computing the background constraint loss value L_restrain of H_t from L_BGS, L_2 and L_1;
(3e) using back-propagation, computing the gradients of H_t's network parameters from L_restrain; then updating the network parameters θ_G1_t, θ_G2_t, θ_G3_t and θ_G4_t by stochastic gradient descent using these gradients, obtaining this iteration's unsupervised abnormal behavior detection network model H_t;
(3f) judging whether t ≥ T; if so, the trained unsupervised abnormal behavior detection network model H* is obtained; otherwise, letting t = t + 1 and H = H_t, and returning to step (3b);
(4) acquiring an abnormal behavior detection result:
(4a) taking the c-th abnormal behavior frame sequence of the test sample set B_test as the input of the trained unsupervised abnormal behavior detection network model H* and forward-propagating it to obtain its predicted frame image ŷ;
(4b) using the anomaly score function score, computing F = score(ŷ, y) from the predicted frame image ŷ and the real frame image y, and judging whether F and the preset anomaly score detection threshold I satisfy F ≥ I; if so, the frame sequence contains abnormal behavior, otherwise it does not, where score is computed from the difference between the predicted frame image and the real frame image.
2. the background suppression-based unsupervised abnormal behavior detection method according to claim 1, wherein the unsupervised abnormal behavior detection network model H in step (2a) is a network model H in which:
the spatial encoder contains 4 two-dimensional convolution layers and 4 activation function layers, with the structure: first two-dimensional convolution layer → first activation function layer → second two-dimensional convolution layer → second activation function layer → third two-dimensional convolution layer → third activation function layer → fourth two-dimensional convolution layer → fourth activation function layer; the first two-dimensional convolution layer has 1 input channel, 64 output channels and stride 2; the second has 64 input channels, 64 output channels and stride 1; the third has 64 input channels, 128 output channels and stride 2; the fourth has 128 input channels, 128 output channels and stride 1; all 4 two-dimensional convolution layers use 3×3 convolution kernels, and all 4 activation function layers use the ELU function;
the ConvLSTM module contains 2 two-dimensional convolution layers, 2 tensor decomposition layers and 3 activation function layers, with the structure: first two-dimensional convolution layer → second two-dimensional convolution layer → first tensor decomposition layer → second tensor decomposition layer → first activation function layer → second activation function layer → third activation function layer; the two two-dimensional convolution layers are identical, with 128 input channels and 128 output channels; the 3 activation function layers all use the sigmoid function;
the decoder contains 4 two-dimensional transposed convolution layers and 3 activation function layers, with the structure: first two-dimensional transposed convolution layer → first activation function layer → second two-dimensional transposed convolution layer → second activation function layer → third two-dimensional transposed convolution layer → third activation function layer → fourth two-dimensional transposed convolution layer; the first two-dimensional transposed convolution layer has 256 input channels, 128 output channels and stride 1; the second has 128 input channels, 64 output channels and stride 2; the third has 64 input channels, 64 output channels and stride 1; the fourth has 64 input channels, 1 output channel and stride 1; all 4 transposed convolution layers use 3×3 convolution kernels, and the 3 activation function layers all use the ELU function;
the motion matching encoder contains 6 three-dimensional convolution layers, 6 activation function layers, 4 three-dimensional max pooling layers and 1 three-dimensional average pooling layer, with the structure: first three-dimensional convolution layer → first activation function layer → first three-dimensional max pooling layer → second three-dimensional convolution layer → second activation function layer → second three-dimensional max pooling layer → third three-dimensional convolution layer → third activation function layer → fourth three-dimensional convolution layer → fourth activation function layer → third three-dimensional max pooling layer → fifth three-dimensional convolution layer → fifth activation function layer → sixth three-dimensional convolution layer → sixth activation function layer → fourth three-dimensional max pooling layer → three-dimensional average pooling layer; the first three-dimensional convolution layer has 1 input channel and 64 output channels; the second 64 and 128; the third 128 and 256; the fourth 256 and 256; the fifth 256 and 512; the sixth 512 and 512; all strides are 1, and all 6 layers use 3×3 convolution kernels; the first three-dimensional max pooling layer has pooling kernel size 1×2 and stride 1×2; the second, third and fourth have kernel size 2×2 and stride 2×2; the three-dimensional average pooling layer has kernel size 1×2; all 6 activation function layers use the ReLU function.
3. The background suppression-based unsupervised abnormal behavior detection method according to claim 1, wherein the background suppression module in step (3b1) suppresses the background information of each normal behavior frame image in each normal behavior frame sequence of the training sample set B_train, implemented as follows:

the background suppression module performs gamma correction on each normal behavior frame image in each normal behavior frame sequence of the training sample set B_train, performs Gaussian filtering on the gamma-corrected frame image, and performs Laplacian sharpening on the Gaussian-filtered frame image, obtaining a frame image with suppressed background information.
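A minimal sketch of the three-step background suppression described in claim 3, using OpenCV; the gamma value, Gaussian kernel size and sigma are illustrative assumptions, since the claim does not fix them:

```python
import cv2
import numpy as np

def suppress_background(frame: np.ndarray, gamma: float = 0.5,
                        ksize: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Background suppression per claim 3: gamma correction, then Gaussian
    filtering, then Laplacian sharpening. Parameter values are assumptions."""
    # 1) gamma correction on the normalized grayscale frame
    f = frame.astype(np.float32) / 255.0
    f = np.power(f, gamma)
    # 2) Gaussian low-pass filtering to smooth noise
    f = cv2.GaussianBlur(f, (ksize, ksize), sigma)
    # 3) Laplacian sharpening: subtracting the Laplacian enhances edges
    lap = cv2.Laplacian(f, cv2.CV_32F)
    f = np.clip(f - lap, 0.0, 1.0)
    return (f * 255.0).astype(np.uint8)

# usage on one grayscale frame of a normal-behavior sequence:
# frame = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
# suppressed = suppress_background(frame)
```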
4. The background suppression-based unsupervised abnormal behavior detection method according to claim 1, characterized in that: in step (3e), the network parameters θ_{G1_t}, θ_{G2_t}, θ_{G3_t}, θ_{G4_t} are updated by stochastic gradient descent using the gradients of the network parameters of H_t; the update formulas are:

θ_t = θ_{t-1} - α·m̂_t / (√(v̂_t) + ε)
g_t = ∇_θ f_t(θ_{t-1})
m_t = β_1·m_{t-1} + (1 - β_1)·g_t
v_t = β_2·v_{t-1} + (1 - β_2)·g_t^2
m̂_t = m_t / (1 - β_1^t)
v̂_t = v_t / (1 - β_2^t)

wherein g_t is the gradient at iteration number t; θ_{G1_t}, θ_{G2_t}, θ_{G3_t}, θ_{G4_t} are, respectively, the updated feature extraction network parameters, memory convolutional neural network parameters, transposed convolutional neural network parameters and three-dimensional convolutional neural network parameters; {f_{ti}(θ) | i = 1,2,3,4} is the objective function of parameter θ_{Gi_t}; β_1 and β_2 are the exponential decay rates of the first and second moments, respectively; {m_{ti} | i = 1,2,3,4} are the first-moment estimates of the H_t network parameter gradients, and {v_{ti} | i = 1,2,3,4} are the second-moment estimates; m̂_{ti} and v̂_{ti} are the bias corrections of {m_{ti} | i = 1,2,3,4} and {v_{ti} | i = 1,2,3,4}; β_i^t is β_i raised to the power t; {α_i | i = 1,2,3,4} are the learning rates; and {ε_i | i = 1,2,3,4} are constants added to maintain numerical stability.
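The update formulas in claim 4 are the Adam variant of stochastic gradient descent; a minimal NumPy sketch of one update step follows (hyperparameter defaults are the common Adam values, not taken from the patent):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One update per claim 4, applied independently to each parameter
    group theta_Gi_t, i in {1,2,3,4}."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate m_t
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate v_t
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# usage: iterate t = 1, 2, ... for one parameter group with a toy objective
theta, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
for t in range(1, 11):
    grad = 2.0 * theta - 1.0   # gradient of sum((theta - 0.5)^2)
    theta, m, v = adam_step(theta, grad, m, v, t)
```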
CN202210252961.6A 2022-03-15 2022-03-15 Unsupervised abnormal behavior detection method based on background suppression Active CN114612936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252961.6A CN114612936B (en) 2022-03-15 2022-03-15 Unsupervised abnormal behavior detection method based on background suppression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210252961.6A CN114612936B (en) 2022-03-15 2022-03-15 Unsupervised abnormal behavior detection method based on background suppression

Publications (2)

Publication Number Publication Date
CN114612936A true CN114612936A (en) 2022-06-10
CN114612936B CN114612936B (en) 2024-08-23

Family

ID=81862820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210252961.6A Active CN114612936B (en) 2022-03-15 2022-03-15 Non-supervision abnormal behavior detection method based on background suppression

Country Status (1)

Country Link
CN (1) CN114612936B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103264A1 (en) * 2014-06-24 2017-04-13 Sportlogiq Inc. System and Method for Visual Event Description and Event Analysis
CN111832516A (en) * 2020-07-22 2020-10-27 西安电子科技大学 Video behavior identification method based on unsupervised video representation learning
CN113032778A (en) * 2021-03-02 2021-06-25 四川大学 Semi-supervised network abnormal behavior detection method based on behavior feature coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANITHA RAMCHANDRAN et al.: "Unsupervised deep learning system for local anomaly event detection in crowded scenes", Multimedia Tools and Applications, 12 May 2019 (2019-05-12) *
LI Ding: "Research and Application of Unsupervised Abnormal Event Detection Algorithms for Surveillance Video", Wanfang Data, 6 July 2023 (2023-07-06) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024055948A1 (en) * 2022-09-14 2024-03-21 北京数慧时空信息技术有限公司 Improved unsupervised remote-sensing image abnormality detection method

Also Published As

Publication number Publication date
CN114612936B (en) 2024-08-23

Similar Documents

Publication Publication Date Title
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN108765506B (en) Layer-by-layer network binarization-based compression method
CN104268594B (en) A kind of video accident detection method and device
CN111861925B (en) Image rain removing method based on attention mechanism and door control circulation unit
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN112597815A (en) Synthetic aperture radar image ship detection method based on Group-G0 model
CN114882434A (en) Unsupervised abnormal behavior detection method based on background suppression
CN106529419A (en) Automatic detection method for significant stack type polymerization object in video
CN113378775B (en) Video shadow detection and elimination method based on deep learning
CN107424175B (en) Target tracking method combined with space-time context information
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
Wang et al. Fast infrared maritime target detection: Binarization via histogram curve transformation
CN111008608B (en) Night vehicle detection method based on deep learning
CN110929635A (en) False face video detection method and system based on face cross-over ratio under trust mechanism
Cai et al. A real-time smoke detection model based on YOLO-smoke algorithm
CN111368634A (en) Human head detection method, system and storage medium based on neural network
CN112634171B (en) Image defogging method and storage medium based on Bayesian convolutional neural network
CN114612936A (en) Unsupervised abnormal behavior detection method based on background suppression
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN116189096A (en) Double-path crowd counting method of multi-scale attention mechanism
CN111079572A (en) Forest smoke and fire detection method based on video understanding, storage medium and equipment
CN114462490A (en) Retrieval method, retrieval device, electronic device and storage medium of image object
CN105872859A (en) Video compression method based on moving target trajectory extraction of object
CN115375966A (en) Image countermeasure sample generation method and system based on joint loss function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant