CN112766165B - Falling pre-judging method based on deep neural network and panoramic segmentation
- Publication number: CN112766165B (application CN202110076029.8A / CN202110076029A)
- Authority: CN (China)
- Prior art keywords: image, segmentation, neural network, data, deep neural
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods (neural networks)
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
Abstract
The invention provides a fall prejudgment method based on combining a deep neural network with panoramic image segmentation. The method efficiently and quickly realizes fall detection and prejudgment: it performs short-term, real-time assessment and notification of imminent fall risk, and long-term behavior learning and prediction of future risk. A deep neural network (DNN) is used to construct a panoramic segmentation network, and a panoramic image segmentation algorithm then performs pixel-level segmentation of the video images used in fall detection, achieving scene understanding of the cared-for person and the surrounding environment and enabling fall prejudgment in dangerous environments.
Description
Technical Field
The invention relates to the field of intelligent communication, and in particular to a fall prejudgment method based on the combination of a deep neural network and a panoramic image segmentation algorithm.
Background
At present, fall prediction methods based on computer vision have been widely explored at home and abroad. By algorithm and implementation they fall into three types:

(1) Pose estimation. Data on various human poses are acquired by combining deep learning with a recurrent neural network to build a personal pose library, and an impending fall is prejudged and alarmed from a sequence of pose actions. This achieves fall prejudgment to a certain extent, but the computational load is enormous, the hardware requirements are high, real-time detection is difficult, and the method lacks recognition and understanding of the person's environment, so its accuracy is low.

(2) Behavior recognition. A convolutional neural network (CNN) is trained on walking, squatting, sitting, lying, falling and other behaviors to generate a fall model library; behaviors are then classified and recognized, and alarms of different levels are raised according to the fall-similarity grade, realizing fall prejudgment. The model library greatly improves prediction accuracy, but CNN training is computationally heavy, so the algorithm is inefficient and cannot prejudge in real time.

(3) Scene understanding. Input video images are classified by a deep learning framework; the human body and its environment are classified and recognized against a trained image data set, the relation between the human body and the surroundings is displayed, objects are extracted from the body and the environment through candidate boxes, and a fall grade is set from the environment's danger level to raise alarms. This realizes scene understanding between the human body and its surroundings and can effectively prejudge falls, but the training method is complex, a fast and accurate image segmentation algorithm is lacking, and the human body and its surroundings are hard to extract accurately. In addition, the large volume of image data trained by the deep neural network makes the computation and energy consumption too high for real-time prejudgment.
From the above analysis of the current state of research worldwide, current fall prediction methods face two problems: (1) the computation the algorithms require is large, so they run slowly and cannot operate in real time; (2) a fast and accurate image segmentation algorithm is lacking.
Disclosure of Invention
The technical problem the invention aims to solve is how to improve image segmentation quality and reduce the computation required to train an image data set, thereby effectively realizing the fall prejudgment function.
To solve this problem, the invention provides a fall prejudgment method based on the combination of a deep neural network and a panoramic segmentation algorithm. In the convolutional layer, a data conversion method converts floating-point data into integer data, reducing the cost of floating-point operations. In the fully connected layer, a matrix compression method based on matrix singular value decomposition (SVD) decomposes the original large fully connected layer matrix into two small fully connected layer matrices and a middle-layer matrix; because the middle layer contains few neurons, the matrix is compressed, the number of connections and the weight scale shrink, and the computation and storage requirements drop, which greatly reduces the computation the deep neural network needs to train the image data set, lowers power consumption, and makes the algorithm real-time. Then, by organically combining feature-pyramid-based image feature fusion with a fully convolutional network structure, semantic segmentation and instance segmentation are fused into two processes of a single segmentation network, merging two originally parallel network structures into one and yielding a new panoramic segmentation algorithm. This algorithm distinguishes the human body from the environment it is in, achieving scene understanding and thereby the fall prejudgment function.
Specifically, the invention provides a fall prejudging method based on the combination of a deep neural network and a panoramic segmentation method, which comprises the following steps:
step 1, acquiring stable indoor video images by using a full-color camera;
step 2, carrying out image processing on the video image obtained in the step 1, eliminating noise interference factors and obtaining processed image information;
step 3, training a data set PASCAL VOC2012, activating a neural network through an activation layer, and inputting the processed image information obtained in the step 2 into a convolutional layer;
step 4, inputting the acquired image information into the convolution layer, extracting image characteristics, and converting the floating point data into integer data by adopting a data conversion method on the data acquired in the step 3 so as to reduce the amount of operation data;
step 5, performing batch normalization processing on the extracted features, and uniformly outputting the features;
step 6, sending the images subjected to batch normalization processing into a pooling layer, performing feature dimensionality reduction, and extracting key features as output results, wherein the key features comprise main body components, contours, shapes and texture features in the images;
and 7, transmitting the output result in the step 6 into a full-connection layer, and classifying the data sets, wherein the classification of the data sets specifically adopts a matrix compression method, and specifically comprises the following steps: decomposing an original large full-connection layer matrix into two small full-connection layer matrices and an intermediate layer matrix by using a matrix singular value decomposition method, wherein the two small full-connection layer matrices comprise most neurons, and the intermediate layer matrix comprises a small number of neurons;
the matrix singular value decomposition method is specifically shown in the following formula:

W ≈ UΣV

where W is the m × n weight matrix in fully connected layer FC8, U and V are the two small fully connected layer matrices produced by the singular value decomposition, and Σ is the middle-layer matrix, so that the original weight matrix becomes a product of matrices;

the matrix product is associative, so the mapping from the input N to the output y is given by:

y = U(Σ(VN)) + b

where b represents the bias of the fully connected layer, b = 0 if no fine adjustment of the deep neural network is needed, and N is the data volume output in step 6;
step 8, outputting through a full connection layer, classifying and identifying all images in the data set, marking the categories of all the images, finishing the training of the data set, matching the video image obtained in the step 2 with the trained data set, and classifying and identifying all things in the video image so as to construct a panoramic segmentation image network;
step 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting an image after characteristic fusion;
step 10, performing semantic segmentation on the image after the feature fusion obtained in the step 9, selecting an interested area through a candidate frame, analyzing each pixel through the interested area, applying the panoramic segmentation image network trained in the step 8, realizing semantic category prediction on each pixel by using a pixel category prediction formula, and distinguishing different types of objects;
step 11, performing instance segmentation on the image output in step 10, distinguishing different objects of the same type by instance-mask region segmentation, where in the instance segmentation formula L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance-mask segmentation regions;
step 12, after step 11 the panoramic video-image segmentation task is complete; an image segmentation model labeled with categories is then obtained through the deep neural network and the panoramic image segmentation algorithm, each recognized object is assigned a risk coefficient according to how objects are placed, and the risk level is defined from the risk coefficients. Specifically, the conditions in the environment are recognized from the image segmentation model as follows: if there is no water or obstacle in the environment, it is judged a safe environment; if accumulated water or an obstacle is present, it is judged a generally dangerous environment; if fall-prone hazards such as stairs, accumulated water and obstacles are present, it is judged a high-risk environment and an alarm is triggered to alert pedestrians and medical staff.
Preferably, the activation function selected in step 3 is the rectified linear unit function:

f(X) = max(0, X)

where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
Preferably, in the step 4, the 32-bit floating-point type data obtained in the step 3 is converted into 8-bit integer type data.
Preferably, in step 5, the batch normalization process formula applied is:
g(x) = (x^(k) − E[x^(k)]) / sqrt(Var[x^(k)])

where g(x) is the normalized image output information, x^(k) is the image dimension information, E denotes expectation, and Var denotes variance.
Preferably, the pooling layer processing method in step 6 adopts a maximum pooling method to reduce the amount of calculation and increase the training speed, and the obtained image information data volume is:
N=(g(x)-F+2P)/S+1
n is the amount of data after pooling, g (x) is the output of step 5, F is the filter size, P is the number of pixels padded by Padding, and S is the step size.
Preferably, in step 9, ResNet-50 is used as the base network for image feature extraction, and the fusion is realized by the following formula:

L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))

where L_i denotes the result after fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP represents an upsampling operation, and f_{5×5} is a convolution with kernel size 5 × 5.
Preferably, in step 10, the pixel class prediction formula is as follows:
L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)

where L(p_i, l_i) is the pixel-class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the label probability, λ is the segmentation coefficient, l_i is a vector of the four coordinates of the true candidate-box boundary, l_i* gives the predicted candidate-box boundary coordinates, N_cls is the total number of pixels of the object class, and L_cls is the log-loss function of the object class (including the background), calculated as:

L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)]

N_reg is the number of pixels in the region of interest and L_reg is the regression loss, calculated as:

L_reg(l_i, l_i*) = smooth(l_i − l_i*)

where smooth is a smoothing function; the obtained data are converted into 8-bit integer data, reducing computation and saving data storage space.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention realizes real-time prejudgment and greatly reduces the risk of possible falls, providing a new fall prejudgment method based on the combination of a deep neural network (DNN) and a panoramic image segmentation algorithm.

(2) The invention uses the PASCAL VOC2012 data set, whose categories are relatively complete. The data set is trained through the deep neural network and the different objects in its images are annotated, completing the image-classification training and greatly improving detection accuracy. The trained data set is then used to construct the panoramic segmentation network, and images are panoramically segmented by pixel class and by instance identity to complete scene understanding, realizing the fall prejudgment function.

(3) The convolutional-layer data conversion method and the fully-connected-layer matrix compression method greatly reduce the computation required for training and lower the algorithm's operating power consumption, so that the deep neural network trains on large image data sets much faster while image-classification accuracy is preserved, guaranteeing the real-time performance of the algorithm.

(4) The panoramic segmentation algorithm based on feature pyramid fusion and a fully convolutional network (FCN) accurately segments the acquired video images. Feature pyramid fusion reduces the computation of the segmentation network and speeds up image segmentation, while the fully convolutional network improves segmentation accuracy. Combining semantic segmentation and instance segmentation in the same network structure preserves pixel-class segmentation while also distinguishing individual instances, making the image segmentation algorithm more complete and its results clearer, which aids scene understanding of the video images and makes the fall prejudgment results more reliable.
Drawings
FIG. 1 is a general block diagram of a deep neural network and panorama segmentation based algorithm according to the present invention;
FIG. 2 is a schematic diagram of deep neural network training data according to the present invention;
FIG. 3 is a comparison of the effect of the fully connected layer matrix of the present invention before and after compression;
FIG. 4a is a schematic diagram of a panoramic segmentation network model of the present invention before improvement;
FIG. 4b is a schematic diagram of the panorama segmentation network model of the present invention after improvement;
fig. 5 is a flow chart of the image panorama segmentation algorithm of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
The fall prediction method disclosed by the invention comprises a deep neural network module and a panoramic segmentation module as shown in fig. 1.
The deep neural network module comprises an activation layer, a full connection layer, a convolution layer, a batch normalization layer and a pooling layer. The specific implementation steps of carrying out data set training and constructing a panoramic segmentation network based on the deep neural network are as follows:
Step 1, collect images with a color camera fixed to the ceiling at the entrance so that the whole indoor environment is observed. Room objects such as tables, chairs, obstacles, beds, books and stationery are static, and the indoor light is good, so stable and clear scene images can be captured. The experimental subject simulates the behavior and posture of the elderly; the movements are slow and can be regarded approximately as uniform motion.
Step 2, process the video images obtained in step 1 with basic image-processing algorithms such as Gaussian filtering, median filtering and morphological denoising, eliminating interference such as Gaussian noise and salt-and-pepper noise and facilitating further analysis and processing of the images.
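For illustration, a minimal sketch of this preprocessing chain using OpenCV is given below; the kernel sizes and the use of morphological opening are assumptions, since the patent does not fix these parameters.

```python
# Sketch of the step-2 denoising chain (assumed parameters, not from the patent).
import cv2
import numpy as np

def denoise_frame(frame: np.ndarray) -> np.ndarray:
    """Suppress Gaussian and salt-and-pepper noise in a BGR video frame."""
    out = cv2.GaussianBlur(frame, (5, 5), 0)      # attenuates Gaussian noise
    out = cv2.medianBlur(out, 5)                  # removes salt-and-pepper noise
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    # Morphological opening clears the small isolated specks that survive filtering.
    return cv2.morphologyEx(out, cv2.MORPH_OPEN, kernel)
```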
Step 3, train the data set PASCAL VOC2012, activate the neural network through the activation layer, and input the image information into the convolutional layer. The activation function is the rectified linear unit (ReLU):

f(X) = max(0, X)

where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
Step 4, input the acquired image information into the convolutional layer and extract the image features. Related research has shown that in a relatively stable image-acquisition scene, integer fixed-point calculation can deliver results on par with floating-point operations, and that in a convolutional neural network structure, reduced-precision fixed-point calculation and 32-bit floating-point calculation are almost equally accurate. Therefore the data obtained in step 3 are converted into integer data by a data conversion method, reducing the amount of operation data.
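As an illustration of this data conversion, the following sketch applies symmetric per-tensor linear quantization from float32 to int8; the scaling scheme is an assumption, since the patent only states that floating-point data are converted to integer data.

```python
# Sketch of float32 -> int8 conversion (assumed symmetric per-tensor scaling).
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float32 tensor onto int8 with a single scale factor."""
    scale = float(np.abs(x).max()) / 127.0 or 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
print(np.abs(x - dequantize(q, s)).max())  # small quantization error
```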
Step 5, perform batch normalization on the extracted features and output them uniformly. The batch normalization formula is:

g(x) = (x^(k) − E[x^(k)]) / sqrt(Var[x^(k)])

where g(x) is the normalized image output information, x^(k) is the image dimension information, E denotes expectation, and Var denotes variance.
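A short sketch of this normalization step, written directly from the formula above (the small eps added for numerical stability is an assumption):

```python
# Sketch of step-5 batch normalization over a (batch, features) array.
import numpy as np

def batch_normalize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mean = x.mean(axis=0)               # E[x^(k)] per dimension k
    var = x.var(axis=0)                 # Var[x^(k)] per dimension k
    return (x - mean) / np.sqrt(var + eps)
```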
Step 6, send the batch-normalized images g(x) into the pooling layer, perform feature dimension reduction, extract the key features, and compress the image-information data volume, reducing computation and speeding up training. Pooling comes in two forms: average pooling and maximum pooling. Maximum pooling is adopted here, and the resulting image-information data volume is:
N=(g(x)-F+2P)/S+1
n is the amount of data after pooling, g (x) is the output of step 5, F is the filter size, P is the number of pixels padded by Padding, and S is the step size.
Step 7, pass the output of step 6 into the fully connected layer and classify the data set. The convolutional layers, pooling layers and activation functions can be understood as mapping the original data distribution into an implicit feature space, while the fully connected layer maps the learned features to the labeled class space. The subjects detected by the invention are the elderly, whose movement is slow and approximately uniform, so nonlinear operations can be filtered out. The invention adopts a matrix compression method: matrix singular value decomposition (SVD) decomposes the original large fully connected layer matrix into two small fully connected layer matrices and a middle-layer matrix. The middle layer contains few neurons, so the matrix is compressed, reducing the number of connections, the weight scale, and the computation and storage requirements. The method is shown in the following formula:
W ≈ UΣV

where W is the m × n weight matrix in fully connected layer FC8, U and V are the two small fully connected layer matrices produced by the singular value decomposition, and Σ is the middle-layer matrix. The original weight matrix becomes a product of matrices, and since the matrix product is associative, the mapping from the input N to the output y can be expressed as:

y = U(Σ(VN)) + b

where b represents the bias of the fully connected layer (b = 0 if no fine-tuning of the deep neural network is required) and N is the data volume output in step 6.
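A minimal numerical sketch of this compression follows: truncated SVD splits one large fully connected weight matrix into two thin matrices around a small middle layer of k retained singular values. The matrix sizes and the rank k are illustrative assumptions.

```python
# Sketch of FC-layer compression via truncated SVD (assumed sizes and rank k).
import numpy as np

m, n, k = 512, 512, 32
W = np.random.randn(m, n).astype(np.float32)   # original FC weight matrix
b = np.zeros(m, dtype=np.float32)              # bias; 0 when no fine-tuning

U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :k]                 # m x k small FC matrix
S_k = np.diag(sigma[:k])       # k x k middle-layer matrix (few "neurons")
V_k = Vt[:k, :]                # k x n small FC matrix

x = np.random.randn(n).astype(np.float32)      # step-6 output fed into the FC layer
y = U_k @ (S_k @ (V_k @ x)) + b                # y = U(Σ(Vx)) + b, right to left
# Parameters drop from m*n to k*(m + n + k); associativity means the full
# matrix W never has to be rebuilt at inference time.
```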
Step 8, output through the fully connected layer, classify and recognize all images in the data set, and mark the category to which each image belongs; the data-set training is now finished. Match the video images obtained in step 2 against the trained data set and classify and recognize everything in them, thereby constructing the panoramic segmentation image network. Classification and recognition mainly distinguish human bodies from objects and rely chiefly on image annotation; a general image-annotation method is adopted in practical applications.
Because data conversion and matrix compression are applied while training the data set, the image data volume is greatly reduced, the computation time a large data volume would cost is saved, and training efficiency is improved. By compressing the image data volume, the method greatly improves training efficiency without affecting classification and recognition accuracy, reduces computation-time loss, and guarantees the real-time fall prejudgment function.
The panoramic image segmentation module comprises feature pyramid fusion together with semantic segmentation and instance segmentation based on a fully convolutional network (FCN). Image features are extracted with the feature pyramid fusion method; semantic segmentation then analyzes the image at pixel level, distinguishing different classes of objects by pixel; finally, instance segmentation distinguishes individuals within the same class. This realizes scene understanding and thereby fall prejudgment.
The specific implementation mode comprises the following steps:
and 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting image characteristics. The present invention uses ResNet-50 as the underlying network for image feature extraction. ResNet is divided into 5 stages according to the size of feature maps, which are respectively called res1, res2, res3, res4 and res5, and the feature map sizes are respectively 1/2,1/4,1/8,1/16 and 1/32 of the original. For the visual task, the depth of the network corresponds to the receptive field, and the larger the receptive field of the pixel points on the deep characteristic diagram is, the stronger the classification capability is. The fused feature maps with different resolutions can be used for object detection with corresponding resolution sizes respectively. The method can ensure that each layer has proper resolution and strong semantic features, and meanwhile, the method only adds extra cross-layer connection on the original basic network and hardly adds extra time and calculation amount.
The fusion is realized by the following formula:

L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))

where L_i denotes the result after fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP represents an upsampling operation, and f_{5×5} is a convolution with kernel size 5 × 5.
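A hedged sketch of this top-down fusion on ResNet-style maps follows; the channel count, the nearest-neighbor upsampling, and deferring the 5 × 5 convolution to a per-level smoothing step are assumptions read off the formula above.

```python
# Sketch of feature-pyramid fusion L_i = g(x_i) + UP(L_{i+1}) (assumptions noted above).
import torch
import torch.nn.functional as F

def fuse_pyramid(features: list[torch.Tensor]) -> list[torch.Tensor]:
    """features: res2..res5 maps, finest first, all with equal channel counts."""
    fused = [features[-1]]                              # coarsest level is used as-is
    for g_xi in reversed(features[:-1]):
        up = F.interpolate(fused[0], scale_factor=2, mode="nearest")  # UP(...)
        fused.insert(0, g_xi + up)                      # L_i = g(x_i) + UP(L_{i+1})
    # A 5x5 convolution per level (f_{5x5} in the formula) would smooth each map here.
    return fused

maps = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]   # 1/4 ... 1/32 scales
print([tuple(m.shape) for m in fuse_pyramid(maps)])
```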
Step 10, perform semantic segmentation on the feature-fused image: select regions of interest through candidate boxes, analyze each pixel within the regions of interest, and apply the panoramic segmentation network trained in step 8 to predict the semantic category of each pixel and distinguish different objects. The pixel-class prediction formula is:

L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)

where L(p_i, l_i) is the pixel-class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the label probability, λ is the partition coefficient, l_i is a vector of the four coordinates of the true candidate-box boundary, and l_i* gives the predicted candidate-box boundary coordinates. N_cls is the total number of pixels of the object class and L_cls is the log-loss function of the object class (including the background), calculated as:

L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)]

N_reg is the number of pixels in the region of interest and L_reg is the regression loss, calculated as:

L_reg(l_i, l_i*) = smooth(l_i − l_i*)

where smooth is a smoothing function; the obtained data are converted into 8-bit integer data, reducing computation and saving data storage space.
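The two loss terms can be written out under the symbol names above; the smooth-L1 form shown is the standard choice for this kind of candidate-box regression and is an assumption where the patent leaves smooth unspecified.

```python
# Sketch of L_cls (log loss) and L_reg (smooth-L1 regression) from step 10.
import numpy as np

def l_cls(p: float, p_star: float) -> float:
    """Log loss of an object-vs-background pixel prediction."""
    return -float(np.log(p_star * p + (1 - p_star) * (1 - p)))

def smooth(d: np.ndarray) -> np.ndarray:
    """Assumed smooth-L1: 0.5*d^2 when |d| < 1, else |d| - 0.5."""
    return np.where(np.abs(d) < 1, 0.5 * d ** 2, np.abs(d) - 0.5)

def l_reg(l: np.ndarray, l_star: np.ndarray) -> float:
    """Regression loss over the four candidate-box coordinates."""
    return float(smooth(l - l_star).sum())

print(l_cls(p=0.9, p_star=1.0))                         # confident correct prediction
print(l_reg(np.array([0.1, 0.2, 1.5, -0.3]), np.zeros(4)))
```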
Step 11, perform instance segmentation on the image output in step 10. The instance segmentation task must not only predict pixel-level classes but also distinguish different individuals belonging to the same class, i.e. predict instance identifiers. The invention distinguishes different objects of the same type by instance-mask region segmentation, realizing panoramic segmentation and scene understanding of the image and accurately judging the situation of the person in the surrounding environment. In the instance segmentation formula, L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance-mask segmentation regions.
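Purely to illustrate the goal of this step, giving distinct identifiers to different individuals of the same class, the toy sketch below separates same-class pixels with connected-component labeling. This is a stand-in for illustration only, not the patent's instance-mask region method.

```python
# Toy stand-in: split one semantic class into instance ids via connected components.
import numpy as np
from scipy import ndimage

def split_instances(semantic: np.ndarray, cls: int) -> np.ndarray:
    """Label each connected blob of class `cls` with its own instance id (1..n)."""
    labels, _ = ndimage.label(semantic == cls)
    return labels

sem = np.array([[1, 1, 0, 2, 2],
                [1, 0, 0, 0, 2],
                [0, 0, 1, 1, 0]])
print(split_instances(sem, cls=1))   # the two class-1 blobs receive ids 1 and 2
```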
Step 12, through step 11 the panoramic segmentation task for the video image is completed. The deep neural network and the panoramic image segmentation algorithm yield an image segmentation model labeled with categories, from which the situation of the person in the environment can be recognized. The invention sets the rule: if there is no water or obstacle in the pedestrian's path, it is judged a safe environment; if accumulated water or an obstacle is present on the path, it is judged a generally dangerous environment; if fall-prone hazards such as stairs, accumulated water and obstacles are present in the path, it is judged a high-risk environment and an alarm is triggered to alert pedestrians and medical staff.
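A small sketch of this grading rule follows; the class-label strings and the exact grouping of hazards into the three grades are assumptions consistent with the rule stated above.

```python
# Sketch of the step-12 environment risk grading (assumed labels and grouping).
GENERAL_HAZARDS = {"water", "obstacle"}
HIGH_HAZARDS = {"stairs"}

def risk_level(detected: set[str]) -> str:
    hazards = detected & (GENERAL_HAZARDS | HIGH_HAZARDS)
    if not hazards:
        return "safe"
    if hazards & HIGH_HAZARDS or len(hazards) > 1:
        return "high-risk"        # would trigger the alarm
    return "general danger"

print(risk_level({"person", "bed"}))                 # safe
print(risk_level({"person", "water"}))               # general danger
print(risk_level({"person", "stairs", "water"}))     # high-risk
```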
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention shall fall within the protection scope defined by the claims of the present invention.
Claims (7)
1. A falling prediction method based on the combination of a deep neural network and a panoramic segmentation method, characterized in that it comprises the following steps:
step 1, acquiring a stable indoor video image by using a full-color camera;
step 2, carrying out image processing on the video image obtained in the step 1, eliminating noise interference factors and obtaining processed image information;
step 3, training a data set PASCAL VOC2012, activating a neural network through an activation layer, and inputting the processed image information obtained in step 2 into a convolutional layer;
step 4, extracting image characteristics after inputting the acquired image information into the convolutional layer, converting the floating point data into integer data by filtering out decimal numbers by adopting a data conversion method for the data acquired in the step 3, and thus reducing the amount of operation data;
step 5, performing batch normalization processing on the extracted features, and uniformly outputting the features;
step 6, sending the images subjected to batch normalization processing into a pooling layer, performing feature dimensionality reduction, and extracting key features as output results, wherein the key features comprise main body components, contours, shapes and texture features in the images;
and 7, transmitting the output result in the step 6 into a full-connection layer, and classifying the data sets, wherein the classification of the data sets specifically adopts a matrix compression method, and specifically comprises the following steps: decomposing an original large full-connection layer matrix into two small full-connection layer matrices and an intermediate layer matrix by using a matrix singular value decomposition method, wherein the two small full-connection layer matrices comprise most neurons, and the intermediate layer matrix comprises a small number of neurons;
the matrix singular value decomposition method is specifically shown in the following formula:
W ≈ UΣV

where W is the m × n weight matrix in fully connected layer FC8, U and V are the two small fully connected layer matrices produced by the singular value decomposition, and Σ is the middle-layer matrix, so that the original weight matrix becomes a product of matrices;

the matrix product is associative, so the mapping from the input N to the output y is given by:

y = U(Σ(VN)) + b

where b represents the bias of the fully connected layer, b = 0 if no fine adjustment of the deep neural network is needed, and N is the data volume output in step 6;
step 8, outputting through a full connection layer, classifying and identifying all images in the data set, marking the categories of all the images, finishing the training of the data set, matching the video image obtained in the step 2 with the trained data set, and classifying and identifying all things in the video image so as to construct a panoramic segmentation image network;
step 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting an image after characteristic fusion;
step 10, performing semantic segmentation on the image after the feature fusion obtained in the step 9, selecting an interested area through a candidate frame, analyzing each pixel through the interested area, applying the panoramic segmentation image network trained in the step 8, realizing semantic category prediction on each pixel by using a pixel category prediction formula, and distinguishing different types of objects;
step 11, performing instance segmentation on the image output in step 10, distinguishing different objects of the same type by instance-mask region segmentation, where in the instance segmentation formula L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance-mask segmentation regions;
step 12, after step 11 the panoramic video-image segmentation task is complete; an image segmentation model labeled with categories is then obtained through the deep neural network and the panoramic image segmentation algorithm, each recognized object is assigned a risk coefficient according to how objects are placed, and the risk level is defined from the risk coefficients. Specifically, the conditions in the environment are recognized from the image segmentation model as follows: if there is no water or obstacle in the environment, it is judged a safe environment; if accumulated water or an obstacle is present, it is judged a generally dangerous environment; if fall-prone hazards such as stairs, accumulated water and obstacles are present, it is judged a high-risk environment and an alarm is triggered to alert pedestrians and medical staff.
2. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein the activation function selected in step 3 is the rectified linear unit function:

f(X) = max(0, X)

where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
3. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in the step 4, the 32-bit floating-point data obtained in the step 3 is converted into 8-bit integer data.
4. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in step 5, the batch normalization processing formula applied is:
g(x) = (x^(k) − E[x^(k)]) / sqrt(Var[x^(k)])

where g(x) is the normalized image output information, x^(k) is the image dimension information, E denotes expectation, and Var denotes variance.
5. The fall pre-judging method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 4, wherein: the pooling layer processing method in the step 6 adopts a maximum pooling method to reduce the calculated amount and improve the training speed, and the obtained image information data amount is as follows:
N=(g(x)-F+2P)/S+1
n is the amount of data after pooling, g (x) is the output of step 5, F is the filter size, P is the number of pixels padded by Padding, and S is the step size.
6. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein in step 9, ResNet-50 is used as the base network for image feature extraction, according to the following formula:

L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))

where L_i denotes the result after fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP represents an upsampling operation, and f_{5×5} is a convolution with kernel size 5 × 5.
7. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein in step 10 the pixel-class prediction formula is as follows:

L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)

where L(p_i, l_i) is the pixel-class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the label probability, λ is the partition coefficient, l_i is a vector of the four coordinates of the true candidate-box boundary, l_i* gives the predicted candidate-box boundary coordinates, N_cls is the total number of pixels of the object class, and L_cls is the log-loss function of the object class, calculated as:

L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)]

N_reg is the number of pixels in the region of interest and L_reg is the regression loss, calculated as:

L_reg(l_i, l_i*) = smooth(l_i − l_i*)

where smooth is a smoothing function that converts the resulting data into 8-bit integer data.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110076029.8A (CN112766165B) | 2021-01-20 | 2021-01-20 | Falling pre-judging method based on deep neural network and panoramic segmentation
Publications (2)
Publication Number | Publication Date |
---|---|
CN112766165A CN112766165A (en) | 2021-05-07 |
CN112766165B true CN112766165B (en) | 2022-03-22 |
Family
ID=75701752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110076029.8A Active CN112766165B (en) | 2021-01-20 | 2021-01-20 | Falling pre-judging method based on deep neural network and panoramic segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766165B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113297991A (en) * | 2021-05-28 | 2021-08-24 | 杭州萤石软件有限公司 | Behavior identification method, device and equipment |
CN113449611B (en) * | 2021-06-15 | 2023-07-07 | 电子科技大学 | Helmet recognition intelligent monitoring system based on YOLO network compression algorithm |
CN114595748B (en) * | 2022-02-21 | 2024-02-13 | 南昌大学 | Data segmentation method for fall protection system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107967516A (en) * | 2017-10-12 | 2018-04-27 | 中科视拓(北京)科技有限公司 | A kind of acceleration of neutral net based on trace norm constraint and compression method |
CN110276765A (en) * | 2019-06-21 | 2019-09-24 | 北京交通大学 | Image panorama dividing method based on multi-task learning deep neural network |
CN111428726A (en) * | 2020-06-10 | 2020-07-17 | 中山大学 | Panorama segmentation method, system, equipment and storage medium based on graph neural network |
CN112163564A (en) * | 2020-10-26 | 2021-01-01 | 燕山大学 | Tumble prejudging method based on human body key point behavior identification and LSTM (least Square TM) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11179064B2 (en) * | 2018-12-30 | 2021-11-23 | Altum View Systems Inc. | Method and system for privacy-preserving fall detection |
- 2021-01-20: CN application CN202110076029.8A filed; granted as CN112766165B (status: active)
Non-Patent Citations (4)

Title
---
Ahmad Lotfi et al., "Supporting Independent Living for Older Adults; Employing a Visual Based Fall Detection Through Analysing the Motion and Shape of the Human Body," IEEE Access, vol. 6, 2018.
Ma Lu, "Fall behavior recognition based on deep learning," China Master's Theses Full-text Database, Information Science and Technology, 2020-06-15.
Sun Pengfei, "Research on fall detection algorithms and their implementation on a mobile robot platform," China Master's Theses Full-text Database, Information Science and Technology, 2019-08-15.
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |