
CN112766165B - Falling pre-judging method based on deep neural network and panoramic segmentation - Google Patents

Falling pre-judging method based on deep neural network and panoramic segmentation

Info

Publication number
CN112766165B
CN112766165B (application CN202110076029.8A)
Authority
CN
China
Prior art keywords
image
segmentation
neural network
data
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110076029.8A
Other languages
Chinese (zh)
Other versions
CN112766165A (en)
Inventor
张立国
李枫
胡林
杨曼
刘博
孙胜春
张子豪
李义辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202110076029.8A priority Critical patent/CN112766165B/en
Publication of CN112766165A publication Critical patent/CN112766165A/en
Application granted granted Critical
Publication of CN112766165B publication Critical patent/CN112766165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a fall prediction method based on the combination of a deep neural network and a panoramic (panoptic) segmentation method. The method can efficiently and quickly realize fall detection and prediction: it combines the deep neural network and the image panoramic segmentation method to give short-term, real-time assessment and notification of imminent fall risk, and it learns long-term behavior to predict future risk. The invention uses a deep neural network (DNN) to construct a panoramic segmentation network, and then performs pixel-level segmentation of the video images used in fall detection through an image panoramic segmentation algorithm, thereby achieving scene understanding of the cared-for person and the surrounding environment and fall prediction in dangerous environments.

Description

Falling pre-judging method based on deep neural network and panoramic segmentation
Technical Field
The invention relates to the field of intelligent communication, and in particular to a fall prediction method based on the combination of a deep neural network and an image panoramic segmentation algorithm.
Background
At present, fall prediction methods based on computer vision have been widely explored at home and abroad. They can be divided into three types according to the algorithm and implementation: (1) Pose estimation: data on various human postures are acquired by combining deep learning with a recurrent neural network to build a personal posture library, and an impending fall is predicted and an alarm raised from a sequence of posture actions. This approach achieves fall prediction to a certain extent, but its computation is enormous, its hardware requirements are high, real-time detection is difficult, and it lacks recognition and understanding of the person's environment, so its accuracy is low. (2) Behavior recognition: a convolutional neural network (CNN) is trained on behaviors such as walking, squatting, sitting, lying and falling to generate a fall model library; the trained behaviors are classified and recognized, and alarms of different levels are issued according to the fall-similarity grade, thereby achieving fall prediction. The model library greatly improves the accuracy of fall prediction, but training the CNN model requires heavy computation, so the algorithm is inefficient and cannot predict in real time. (3) Scene understanding: input video images are classified by a deep learning framework; the human body and its environment are classified and recognized against a trained image data set, the relationship between the human body and its surroundings is presented, the objects in the scene are extracted separately with candidate boxes, and a fall level is set according to the environmental danger level to raise an alarm, thereby achieving fall prediction. This approach achieves scene understanding between the human body and the surrounding environment and can effectively perform fall prediction, but its training procedure is complex, it lacks a fast and accurate image segmentation algorithm, and the human body and its surroundings are difficult to extract accurately. In addition, the large amount of image data needed to train the deep neural network makes the computation and energy consumption too high for real-time prediction.
From the analysis of the current state of research above, existing fall prediction methods face two problems: (1) the computation required is so large that the algorithms run slowly and cannot operate in real time; (2) a fast and accurate image segmentation algorithm is lacking.
Disclosure of Invention
The technical problem addressed by the invention is how to improve image segmentation quality and reduce the computation required to train on an image data set, so that fall prediction can be realized effectively.
In order to solve this technical problem, the invention provides a fall prediction method based on the combination of a deep neural network and a panoramic segmentation algorithm. In the convolutional layers, a data conversion method converts floating-point data into integer data, reducing the amount of floating-point computation. In the fully connected layers, a matrix compression method is adopted: matrix singular value decomposition (SVD) decomposes the original large fully connected layer matrix into two small fully connected layer matrices and an intermediate layer matrix; because the intermediate layer contains few neurons, the matrix is compressed, the number of connections and the weight scale are reduced, and the computation and storage requirements drop. This greatly reduces the computation needed for the deep neural network to train on the image data set, lowers power consumption, and makes the algorithm run in real time. Then, by organically combining an image feature fusion method based on feature pyramid fusion with a fully convolutional network structure, semantic segmentation and instance segmentation are fused into two stages of the same segmentation network, and the two originally parallel network structures are merged into one, yielding a brand-new panoramic segmentation algorithm. This panoramic segmentation algorithm distinguishes the human body from the environment it is in, achieving scene understanding and hence the fall prediction function.
Specifically, the invention provides a fall prediction method based on the combination of a deep neural network and a panoramic segmentation method, which comprises the following steps:
step 1, acquiring stable indoor video images by using a full-color camera;
step 2, carrying out image processing on the video image obtained in the step 1, eliminating noise interference factors and obtaining processed image information;
step 3, training a data set PASCAL VOC2012, activating a neural network through an activation layer, and inputting the processed image information obtained in the step 2 into a convolutional layer;
step 4, inputting the acquired image information into the convolution layer, extracting image characteristics, and converting the floating point data into integer data by adopting a data conversion method on the data acquired in the step 3 so as to reduce the amount of operation data;
step 5, performing batch normalization processing on the extracted features, and uniformly outputting the features;
step 6, sending the images subjected to batch normalization processing into a pooling layer, performing feature dimensionality reduction, and extracting key features as output results, wherein the key features comprise main body components, contours, shapes and texture features in the images;
step 7, transmitting the output result of step 6 into the fully connected layer and classifying the data set, wherein the classification of the data set specifically adopts a matrix compression method, specifically: decomposing the original large fully connected layer matrix into two small fully connected layer matrices and an intermediate layer matrix by a matrix singular value decomposition method, wherein the two small fully connected layer matrices contain most of the neurons and the intermediate layer matrix contains a small number of neurons;
the matrix singular value decomposition method is specifically shown in the following formula:
W ≈ U Σ V^T
wherein: W is the m × n weight matrix of the fully connected layer FC8, U and V are the factor matrices produced by the SVD, and Σ is the intermediate layer matrix; the original weight matrix thus becomes the product of smaller matrices:
W ≈ U (Σ V^T)
since matrix multiplication is associative, the mapping from the input N to the output y is expressed as:
y = W N + b ≈ U (Σ V^T N) + b
b represents the bias of the fully connected layer; if fine-tuning of the deep neural network is not needed, the value of b is 0, and N is the data volume output in step 6;
step 8, outputting through a full connection layer, classifying and identifying all images in the data set, marking the categories of all the images, finishing the training of the data set, matching the video image obtained in the step 2 with the trained data set, and classifying and identifying all things in the video image so as to construct a panoramic segmentation image network;
step 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting an image after characteristic fusion;
step 10, performing semantic segmentation on the image after the feature fusion obtained in the step 9, selecting an interested area through a candidate frame, analyzing each pixel through the interested area, applying the panoramic segmentation image network trained in the step 8, realizing semantic category prediction on each pixel by using a pixel category prediction formula, and distinguishing different types of objects;
step 11, performing example segmentation on the image output in the step 10, distinguishing different objects of the same type by setting example mask region segmentation,
an example segmentation formula is shown below:
[Formula reproduced only as an image in the original.]
wherein: L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance mask segmentation regions;
step 12, after step 11 the panoramic video image segmentation task is complete; an image segmentation model labeled with categories is then obtained through the deep neural network and the image panoramic segmentation algorithm, the scene is classified according to the placement of objects, each identified object is assigned a risk coefficient, and the risk level is defined solely by the risk coefficient; specifically, the conditions in the environment are identified from the image segmentation model as follows: if there is no water or obstacle in the environment, it is judged to be a safe environment; if standing water or obstacles exist in the environment, it is judged to be a generally dangerous environment; if fall-inducing hazards such as stairs, standing water or obstacles exist in the environment, it is determined to be a high-risk environment and an alarm is triggered to alert pedestrians and medical staff.
Preferably, the activation function selected in step 3 is a rectified linear unit (ReLU) function, with the specific expression:
f(X) = max(0, X)
where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
Preferably, in the step 4, the 32-bit floating-point type data obtained in the step 3 is converted into 8-bit integer type data.
Preferably, in step 5, the batch normalization process formula applied is:
g(x) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])
wherein g(x) is the normalized image output information, x^(k) is the image information in the k-th dimension, E denotes expectation, and Var denotes variance.
Preferably, the pooling layer processing method in step 6 adopts a maximum pooling method to reduce the amount of calculation and increase the training speed, and the obtained image information data volume is:
N=(g(x)-F+2P)/S+1
where N is the amount of data after pooling, g(x) is the output of step 5, F is the filter size, P is the number of pixels added by padding, and S is the stride.
Preferably, in the step 9, using ResNet-50 as a basic network for image feature extraction, the implementation principle is as shown in the following formula:
L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))
wherein L_i denotes the result after fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP represents an upsampling operation, and f_{5×5} denotes convolution with a 5 × 5 feature kernel.
Preferably, in step 10, the pixel class prediction formula is as follows:
L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)
wherein: L(p_i, l_i) is the pixel class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the labeled probability, λ is the segmentation coefficient, l_i is a vector giving the four coordinates of the true candidate box boundary, and l_i* gives the predicted candidate box boundary coordinates; N_cls is the total number of pixels of the object classes, and L_cls is the log-loss function of the object class (including the background), calculated as:
L_cls(p_i, p_i*) = -[ p_i* log p_i + (1 - p_i*) log(1 - p_i) ]
N_reg is the number of pixels in the region of interest, and L_reg is the regression loss function, calculated as:
L_reg(l_i, l_i*) = smooth_L1(l_i - l_i*)
where smooth_L1 is a smoothing processing function; the obtained data is converted into 8-bit integer data, which reduces the amount of computation and saves data storage space.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention achieves real-time prediction, greatly reducing the risk of possible falls, and provides a new fall prediction method based on the combination of a deep neural network (DNN) and an image panoramic segmentation algorithm.
(2) The invention uses the PASCAL VOC2012 data set, whose categories are relatively complete. The data set is trained through the deep neural network and the different objects in its images are labeled, completing the image classification training and greatly improving detection accuracy. The trained data set is then used to construct the panoramic segmentation network, and the image is panoramically segmented according to pixel class and instance differences to complete scene understanding, thereby realizing the fall prediction function.
(3) The convolutional-layer data conversion method and the fully-connected-layer matrix compression method adopted by the invention greatly reduce the computation required for training and lower the power consumption of the algorithm, so that the training speed of the deep neural network on a large image data set is greatly improved while image classification accuracy is maintained, guaranteeing the real-time performance of the algorithm.
(4) The invention adopts a panoramic segmentation algorithm based on feature pyramid fusion and a fully convolutional network (FCN) to segment the acquired video images accurately. The feature pyramid fusion method reduces the computation of the segmentation network and increases segmentation speed, while the fully convolutional network improves segmentation accuracy. Combining semantic segmentation and instance segmentation in the same network structure guarantees pixel-level classification while also distinguishing individual instances, so the segmentation algorithm is more complete, the segmentation results are clearer, scene understanding of the video image is facilitated, and the fall prediction results are more realistic and reliable.
Drawings
FIG. 1 is a general block diagram of a deep neural network and panorama segmentation based algorithm according to the present invention;
FIG. 2 is a schematic diagram of deep neural network training data according to the present invention;
FIG. 3 is a comparison of the effect of the fully connected layer matrix of the present invention before and after compression;
FIG. 4a is a schematic diagram of a panoramic segmentation network model of the present invention before improvement;
FIG. 4b is a schematic diagram of the panorama segmentation network model of the present invention after improvement;
fig. 5 is a flow chart of the image panorama segmentation algorithm of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
The fall prediction method disclosed by the invention comprises a deep neural network module and a panoramic segmentation module as shown in fig. 1.
The deep neural network module comprises an activation layer, a full connection layer, a convolution layer, a batch normalization layer and a pooling layer. The specific implementation steps of carrying out data set training and constructing a panoramic segmentation network based on the deep neural network are as follows:
Step 1: images are collected with a color camera fixed on the ceiling near the entrance so that the whole indoor environment can be observed. Room objects such as tables, chairs, obstacles, beds, books and stationery are all static, indoor lighting is good, and stable, clear scene images can be captured. The experimental subject simulates the behavior and posture of an elderly person; the movement is slow and can be approximated as uniform motion.
Step 2: the video image obtained in step 1 is processed with basic image processing algorithms such as Gaussian filtering, median filtering and morphological denoising, eliminating interference such as Gaussian noise and salt-and-pepper noise and facilitating further analysis and processing of the image.
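For illustration only, the preprocessing of step 2 can be sketched in Python with OpenCV; the function name and filter parameters below are assumptions chosen for the example rather than values specified by the invention.

import cv2

def preprocess_frame(frame):
    # Gaussian filtering suppresses Gaussian noise.
    smoothed = cv2.GaussianBlur(frame, (5, 5), 0)
    # Median filtering suppresses salt-and-pepper noise.
    smoothed = cv2.medianBlur(smoothed, 5)
    # Morphological opening removes small leftover speckle artifacts.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(smoothed, cv2.MORPH_OPEN, kernel)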
Step 3: the data set PASCAL VOC2012 is trained; the neural network is activated through the activation layer and the image information is input to the convolutional layer. The activation function is the rectified linear unit (ReLU), with the specific expression:
f(X) = max(0, X)
where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
Step 4: the acquired image information is input into the convolutional layer and image features are extracted. Related research has shown that, in a relatively stable image acquisition scene, integer fixed-point computation can deliver results comparable to floating-point computation, and in a convolutional neural network the accuracy of reduced-precision fixed-point computation is almost the same as that of 32-bit floating-point computation. Therefore, the data obtained in step 3 is converted into integer data by a data conversion method, reducing the amount of computation.
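A minimal sketch of the step-4 data conversion idea, assuming a simple per-tensor symmetric linear quantization; the scale computation and function names are illustrative assumptions, not the patent's exact procedure.

import numpy as np

def float32_to_int8(x):
    # Per-tensor symmetric scale; 127 is the largest int8 magnitude used.
    scale = max(np.abs(x).max(), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_to_float32(q, scale):
    # Recover an approximate float32 tensor for the following layers.
    return q.astype(np.float32) * scale

features = np.random.randn(1, 64, 32, 32).astype(np.float32)
q, s = float32_to_int8(features)
restored = int8_to_float32(q, s)   # error bounded by one quantization step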
Step 5: batch normalization is applied to the extracted features, which are then output in a uniform form. The batch normalization formula is:
g(x) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])
wherein g(x) is the normalized image output information, x^(k) is the image information in the k-th dimension, E denotes expectation, and Var denotes variance.
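The normalization above corresponds to the following small numpy sketch; the eps term added for numerical stability is an implementation detail not stated in the patent.

import numpy as np

def batch_normalize(x, eps=1e-5):
    # x has shape (batch, features); each feature dimension x^(k) is
    # shifted by its expectation and scaled by its standard deviation.
    mean = x.mean(axis=0)          # E[x^(k)]
    var = x.var(axis=0)            # Var[x^(k)]
    return (x - mean) / np.sqrt(var + eps)

features = np.random.randn(16, 128)
normalized = batch_normalize(features)   # zero mean, unit variance per dimension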
Step 6: the batch-normalized images g(x) are sent to the pooling layer for feature dimensionality reduction; key features are extracted and the amount of image information is compressed, which reduces computation and increases training speed. Pooling can be done in two ways: average pooling and maximum pooling. Maximum pooling is adopted here, and the resulting amount of image information is:
N = (g(x) - F + 2P)/S + 1
where N is the amount of data after pooling, g(x) is the output of step 5, F is the filter size, P is the number of pixels added by padding, and S is the stride.
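As a worked example of the output-size formula, an input of 224 with filter F = 2, padding P = 0 and stride S = 2 gives N = (224 - 2 + 0)/2 + 1 = 112. A naive max-pooling sketch follows, with shapes and values chosen only for illustration.

import numpy as np

def max_pool2d(x, size=2, stride=2):
    h, w = x.shape
    out_h = (h - size) // stride + 1   # N = (g(x) - F + 2P)/S + 1 with P = 0
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
print(max_pool2d(fmap))   # 2 x 2 map of local maxima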
Step 7: the output of step 6 is passed into the fully connected layer and the data set is classified. The convolutional layers, pooling layers and activation functions can be understood as mapping the original data distribution into a hidden space, while the fully connected layer maps the learned features into the labeled class space. The subjects detected by the invention are elderly people, whose movement is slow and can be approximated as uniform motion, so nonlinear operations can be filtered out. The invention adopts a matrix compression method: matrix singular value decomposition (SVD) decomposes the original large fully connected layer matrix into two small fully connected layer matrices and an intermediate layer matrix. Because the intermediate layer contains few neurons, the matrix is compressed, the number of connections and the weight scale are reduced, and the computation and storage requirements are lowered. The decomposition is shown in the following formula:
W ≈ U Σ V^T
wherein W is the m × n weight matrix of the fully connected layer FC8, U and V are the factor matrices produced by the SVD, and Σ is the intermediate layer matrix. The original weight matrix thus becomes the product of smaller matrices:
W ≈ U (Σ V^T)
Since matrix multiplication is associative, the mapping from the input N to the output y can be expressed as:
y = W N + b ≈ U (Σ V^T N) + b
where b represents the bias of the fully connected layer; if no fine-tuning of the deep neural network is needed, the value of b is 0. N is the data output in step 6.
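The replacement of one large fully connected layer by two smaller ones via truncated SVD can be sketched as follows; the layer size and the retained rank k are illustrative assumptions, and how well the low-rank product approximates the original mapping depends on the singular value spectrum of the trained weights.

import numpy as np

m, n, k = 1024, 1024, 64           # illustrative layer size and retained rank
W = np.random.randn(m, n).astype(np.float32)   # stand-in for the FC8 weights
b = np.zeros(m, dtype=np.float32)  # bias; 0 when no fine-tuning is needed

# Truncated SVD: keep only the k largest singular values.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W1 = np.diag(S[:k]) @ Vt[:k, :]    # (k, n): first small fully connected matrix
W2 = U[:, :k]                      # (m, k): second small fully connected matrix

x = np.random.randn(n).astype(np.float32)
y_full = W @ x + b                 # original mapping y = W N + b
y_compressed = W2 @ (W1 @ x) + b   # compressed mapping: k*(m+n) weights instead of m*n
print(np.linalg.norm(y_full - y_compressed) / np.linalg.norm(y_full))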
Step 8: the result is output through the fully connected layer, all images in the data set are classified and recognized, and the category of each image is labeled; at this point the data set training is finished. The video image obtained in step 2 is then matched against the trained data set, and all objects in the video image are classified and recognized so as to construct the panoramic segmentation image network. During classification and recognition, the main task is to distinguish human bodies from objects; image labeling is mainly used, and a general image labeling method is adopted in practical applications.
Because data conversion and matrix compression are applied while training the data set, the amount of image data is greatly reduced, the computation time caused by a large data volume is saved, and training efficiency improves. By compressing the image data volume, the method greatly improves training efficiency without affecting classification and recognition accuracy, reduces computation time, and guarantees real-time fall prediction.
The image panorama segmentation module comprises feature pyramid fusion, plus semantic segmentation and instance segmentation based on a fully convolutional network (FCN). Image features are extracted with the feature pyramid fusion method; semantic segmentation then provides pixel-level analysis of the image, distinguishing objects of different types by pixel; finally, instance segmentation distinguishes individual differences within the same type. This achieves scene understanding and hence fall prediction.
The specific implementation mode comprises the following steps:
and 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting image characteristics. The present invention uses ResNet-50 as the underlying network for image feature extraction. ResNet is divided into 5 stages according to the size of feature maps, which are respectively called res1, res2, res3, res4 and res5, and the feature map sizes are respectively 1/2,1/4,1/8,1/16 and 1/32 of the original. For the visual task, the depth of the network corresponds to the receptive field, and the larger the receptive field of the pixel points on the deep characteristic diagram is, the stronger the classification capability is. The fused feature maps with different resolutions can be used for object detection with corresponding resolution sizes respectively. The method can ensure that each layer has proper resolution and strong semantic features, and meanwhile, the method only adds extra cross-layer connection on the original basic network and hardly adds extra time and calculation amount.
The implementation principle is shown in the following formula:
L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))
wherein L_i denotes the result after fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP represents an upsampling operation, and f_{5×5} denotes convolution with a 5 × 5 feature kernel.
Step 10: semantic segmentation is performed on the feature-fused image. Regions of interest are selected with candidate boxes, each pixel within a region of interest is analyzed, and the panoramic segmentation network trained in step 8 is applied to predict the semantic category of each pixel and distinguish different objects. The pixel class prediction formula is as follows:
L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)
wherein: L(p_i, l_i) is the pixel class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the labeled probability, λ is the segmentation coefficient, l_i is a vector giving the four coordinates of the true candidate box boundary, and l_i* gives the predicted candidate box boundary coordinates. N_cls is the total number of pixels of the object classes, and L_cls is the log-loss function of the object class (including the background), calculated as:
L_cls(p_i, p_i*) = -[ p_i* log p_i + (1 - p_i*) log(1 - p_i) ]
N_reg is the number of pixels in the region of interest, and L_reg is the regression loss function, calculated as:
L_reg(l_i, l_i*) = smooth_L1(l_i - l_i*)
where smooth_L1 is a smoothing processing function; the obtained data is converted into 8-bit integer data, which reduces the amount of computation and saves data storage space.
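The structure of this loss can be sketched in numpy; the binary log-loss and smooth-L1 forms below are standard choices assumed here for illustration.

import numpy as np

def log_loss(p, p_star):
    # Binary log loss between predicted probability p and label p*.
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))

def smooth_l1(x):
    # Quadratic near zero, linear elsewhere.
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def prediction_loss(p, p_star, boxes_true, boxes_pred, lam=1.0):
    # L = (1/N_cls) sum L_cls + lam * (1/N_reg) sum p* L_reg
    n_cls, n_reg = p.size, boxes_true.shape[0]
    cls_term = log_loss(p, p_star).sum() / n_cls
    reg_term = (p_star[:, None] * smooth_l1(boxes_true - boxes_pred)).sum() / n_reg
    return cls_term + lam * reg_term

p = np.array([0.9, 0.2, 0.7]); p_star = np.array([1.0, 0.0, 1.0])
l_true = np.random.randn(3, 4); l_pred = l_true + 0.1 * np.random.randn(3, 4)
print(prediction_loss(p, p_star, l_true, l_pred))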
Step 11: instance segmentation is performed on the image output in step 10. The instance segmentation task must not only predict the pixel-level class but also distinguish different individuals belonging to the same class, i.e. predict the instance identity. In the invention, different objects of the same type are distinguished by instance mask region segmentation, so that panoramic segmentation and scene understanding of the image are achieved and the person's specific situation in the surrounding environment can be judged accurately. An instance segmentation formula is shown below:
[Formula reproduced only as an image in the original.]
wherein: L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance mask segmentation regions.
Step 12: with step 11 the panoramic segmentation of the video image is complete. An image segmentation model labeled with categories is obtained through the deep neural network and the image panoramic segmentation algorithm, and the specific situation of the person in the environment can be recognized from this model. The invention specifies that if there is no water or obstacle in the person's path, the environment is judged to be safe; if standing water or an obstacle is present in the path, it is judged to be a generally dangerous environment; and if fall-inducing hazards such as stairs, standing water or obstacles are present in the path, it is judged to be a high-risk environment and an alarm is triggered to alert the person and the medical staff.
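The step-12 decision can be summarized as a simple rule over the labels produced by the segmentation model; the label names and the three risk levels below are illustrative assumptions rather than the patent's exact categories.

from enum import Enum

class Risk(Enum):
    SAFE = 0
    GENERAL = 1
    HIGH = 2

HAZARDS = {"water", "obstacle"}        # hypothetical label names
FALL_INDUCING = {"stairs"}

def assess_environment(labels):
    # labels: set of object categories segmented along the person's path.
    if labels & FALL_INDUCING:
        return Risk.HIGH               # trigger an alarm to the person and medical staff
    if labels & HAZARDS:
        return Risk.GENERAL
    return Risk.SAFE

print(assess_environment({"floor", "water", "stairs"}))   # Risk.HIGH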
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention shall fall within the protection scope defined by the claims of the present invention.

Claims (7)

1. A fall prediction method based on the combination of a deep neural network and a panoramic segmentation method, characterized in that it comprises the following steps:
step 1, acquiring a stable indoor video image by using a full-color camera;
step 2, carrying out image processing on the video image obtained in the step 1, eliminating noise interference factors and obtaining processed image information;
step 3, training a data set PASCAL VOC2012, activating a neural network through an activation layer, and inputting the processed image information obtained in the step 2 into a convolutional layer;
step 4, extracting image characteristics after inputting the acquired image information into the convolutional layer, converting the floating point data into integer data by filtering out decimal numbers by adopting a data conversion method for the data acquired in the step 3, and thus reducing the amount of operation data;
step 5, performing batch normalization processing on the extracted features, and uniformly outputting the features;
step 6, sending the images subjected to batch normalization processing into a pooling layer, performing feature dimensionality reduction, and extracting key features as output results, wherein the key features comprise main body components, contours, shapes and texture features in the images;
step 7, transmitting the output result of step 6 into the fully connected layer and classifying the data set, wherein the classification of the data set specifically adopts a matrix compression method, specifically: decomposing the original large fully connected layer matrix into two small fully connected layer matrices and an intermediate layer matrix by a matrix singular value decomposition method, wherein the two small fully connected layer matrices contain most of the neurons and the intermediate layer matrix contains a small number of neurons;
the matrix singular value decomposition method is specifically shown in the following formula:
W ≈ U Σ V^T
wherein: W is the m × n weight matrix of the fully connected layer FC8, U and V are the factor matrices produced by the SVD, and Σ is the intermediate layer matrix; the original weight matrix thus becomes the product of smaller matrices:
W ≈ U (Σ V^T)
since matrix multiplication is associative, the mapping from the input N to the output y is expressed as:
y = W N + b ≈ U (Σ V^T N) + b
b represents the bias of the fully connected layer; if fine-tuning of the deep neural network is not needed, the value of b is 0, and N is the data volume output in step 6;
step 8, outputting through a full connection layer, classifying and identifying all images in the data set, marking the categories of all the images, finishing the training of the data set, matching the video image obtained in the step 2 with the trained data set, and classifying and identifying all things in the video image so as to construct a panoramic segmentation image network;
step 9, carrying out characteristic pyramid fusion on the image information output in the step 8, and extracting an image after characteristic fusion;
step 10, performing semantic segmentation on the image after the feature fusion obtained in the step 9, selecting an interested area through a candidate frame, analyzing each pixel through the interested area, applying the panoramic segmentation image network trained in the step 8, realizing semantic category prediction on each pixel by using a pixel category prediction formula, and distinguishing different types of objects;
step 11, performing example segmentation on the image output in the step 10, distinguishing different objects of the same type by setting example mask region segmentation,
an example segmentation formula is shown below:
[Formula reproduced only as an image in the original.]
wherein: L_ins(x_i) represents the image instance segmentation result, x_i is the i-th pixel point, and N_mask(i,j) represents the number of instance mask segmentation regions;
step 12, after step 11 the panoramic video image segmentation task is complete; an image segmentation model labeled with categories is then obtained through the deep neural network and the image panoramic segmentation algorithm, the scene is classified according to the placement of objects, each identified object is assigned a risk coefficient, and the risk level is defined solely by the risk coefficient; specifically, the conditions in the environment are identified from the image segmentation model as follows: if there is no water or obstacle in the environment, it is judged to be a safe environment; if standing water or obstacles exist in the environment, it is judged to be a generally dangerous environment; if fall-inducing hazards such as stairs, standing water or obstacles exist in the environment, it is determined to be a high-risk environment and an alarm is triggered to alert pedestrians and medical staff.
2. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: the activation function selected in step 3 is a rectified linear unit (ReLU) function, with the specific expression:
f(X) = max(0, X)
where X denotes the image gradient and f(X) denotes the image gradient obtained from the data set.
3. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in the step 4, the 32-bit floating-point data obtained in the step 3 is converted into 8-bit integer data.
4. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in step 5, the batch normalization processing formula applied is:
g(x) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])
wherein g(x) is the normalized image output information, x^(k) is the image information in the k-th dimension, E denotes expectation, and Var denotes variance.
5. The fall pre-judging method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 4, wherein: the pooling layer processing method in the step 6 adopts a maximum pooling method to reduce the calculated amount and improve the training speed, and the obtained image information data amount is as follows:
N=(g(x)-F+2P)/S+1
where N is the amount of data after pooling, g(x) is the output of step 5, F is the filter size, P is the number of pixels added by padding, and S is the stride.
6. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in the step 9, ResNet-50 is used as a basic network for image feature extraction, and the implementation principle is shown in the following formula:
L_i = f_{5×5}(g(x_i) + UP(L_{i+1}))
wherein L_i is the result of fusing the i-th layer features, g(x_i) is the i-th layer feature input, UP is the upsampling operation, and f_{5×5} denotes convolution with a 5 × 5 feature kernel.
7. The fall prediction method based on the combination of the deep neural network and the panorama segmentation method as claimed in claim 1, wherein: in step 10, the pixel class prediction formula is as follows:
L(p_i, l_i) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(l_i, l_i*)
wherein: L(p_i, l_i) is the pixel class prediction result, i is the pixel index, p_i is the pixel probability, p_i* is the labeled probability, λ is the segmentation coefficient, l_i is a vector giving the four coordinates of the true candidate box boundary, and l_i* gives the predicted candidate box boundary coordinates; N_cls is the total number of pixels of the object classes, L_cls is the log-loss function of the object class, and L_cls is calculated as:
L_cls(p_i, p_i*) = -[ p_i* log p_i + (1 - p_i*) log(1 - p_i) ]
N_reg is the number of pixels in the region of interest, L_reg is the regression loss function, and L_reg is calculated as:
L_reg(l_i, l_i*) = smooth_L1(l_i - l_i*)
where smooth_L1 is a smoothing function; the resulting data is converted into 8-bit integer data.
CN202110076029.8A 2021-01-20 2021-01-20 Falling pre-judging method based on deep neural network and panoramic segmentation Active CN112766165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110076029.8A CN112766165B (en) 2021-01-20 2021-01-20 Falling pre-judging method based on deep neural network and panoramic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110076029.8A CN112766165B (en) 2021-01-20 2021-01-20 Falling pre-judging method based on deep neural network and panoramic segmentation

Publications (2)

Publication Number Publication Date
CN112766165A (en) 2021-05-07
CN112766165B (en) 2022-03-22

Family

ID=75701752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110076029.8A Active CN112766165B (en) 2021-01-20 2021-01-20 Falling pre-judging method based on deep neural network and panoramic segmentation

Country Status (1)

Country Link
CN (1) CN112766165B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297991A (en) * 2021-05-28 2021-08-24 Hangzhou Ezviz Software Co., Ltd. Behavior identification method, device and equipment
CN113449611B (en) * 2021-06-15 2023-07-07 University of Electronic Science and Technology of China Helmet recognition intelligent monitoring system based on YOLO network compression algorithm
CN114595748B (en) * 2022-02-21 2024-02-13 Nanchang University Data segmentation method for fall protection system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967516A (en) * 2017-10-12 2018-04-27 SeetaTech (Beijing) Technology Co., Ltd. A neural network acceleration and compression method based on trace norm constraint
CN110276765A (en) * 2019-06-21 2019-09-24 Beijing Jiaotong University Image panorama dividing method based on multi-task learning deep neural network
CN111428726A (en) * 2020-06-10 2020-07-17 Sun Yat-sen University Panorama segmentation method, system, equipment and storage medium based on graph neural network
CN112163564A (en) * 2020-10-26 2021-01-01 Yanshan University Tumble prejudging method based on human body key point behavior identification and LSTM (long short-term memory)

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ahmad Lotfi et al. Supporting Independent Living for Older Adults; Employing a Visual Based Fall Detection Through Analysing the Motion and Shape of the Human Body. IEEE Access, Volume 6, 2018, full text. *
Fall behavior recognition based on deep learning; Ma Lu; China Master's Theses Full-text Database, Information Science and Technology; 2020-06-15; full text *
Research on fall detection algorithm and its implementation on a mobile robot platform; Sun Pengfei; China Master's Theses Full-text Database, Information Science and Technology; 2019-08-15; full text *

Also Published As

Publication number Publication date
CN112766165A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN110363183B (en) Service robot visual image privacy protection method based on generating type countermeasure network
CN112766165B (en) Falling pre-judging method based on deep neural network and panoramic segmentation
CN106407889B (en) Method for recognizing human body interaction in video based on optical flow graph deep learning model
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN107085716A (en) Across the visual angle gait recognition method of confrontation network is generated based on multitask
CN112464730B (en) Pedestrian re-identification method based on domain-independent foreground feature learning
CN111639719A (en) Footprint image retrieval method based on space-time motion and feature fusion
CN111241963B (en) First person view video interactive behavior identification method based on interactive modeling
CN110991340A (en) Human body action analysis method based on image compression
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN106056553A (en) Image inpainting method based on tight frame feature dictionary
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN112149613B (en) Action pre-estimation evaluation method based on improved LSTM model
CN109522865A (en) A kind of characteristic weighing fusion face identification method based on deep neural network
CN116664462A (en) Infrared and visible light image fusion method based on MS-DSC and I_CBAM
CN114299279B (en) Mark-free group rhesus monkey motion quantity estimation method based on face detection and recognition
CN115346272A (en) Real-time tumble detection method based on depth image sequence
Wang et al. Infrared and visible image fusion based on Laplacian pyramid and generative adversarial network.
CN116993760A (en) Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism
CN116580450A (en) Method for recognizing gait at split viewing angles
CN112613405B (en) Method for recognizing actions at any visual angle
Mo et al. The image inpainting algorithm used on multi-scale generative adversarial networks and neighbourhood
CN113673303A (en) Human face action unit intensity regression method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant