Method and system for identifying elderly people living alone based on electricity consumption data
Technical Field
The invention belongs to the technical field of electricity consumption data analysis and application, and relates to a method and a system for identifying elderly people living alone based on electricity consumption data.
Background
At present there are many methods for monitoring elderly people living alone. For example, smart devices such as smart walking sticks and smart wristbands are used to detect when an elderly person living alone is in danger so that rescue is not delayed, or video surveillance and artificial intelligence are used to recognize and monitor the behavior and facial expressions of the elderly.
However, many elderly people living alone are on the margins of society, especially those who urgently need help, and there is currently no good method for finding them so that the necessary help can be provided. The list of elderly people living alone is usually obtained only through door-to-door surveys by grassroots community workers. This approach consumes a great deal of manpower and material resources and depends entirely on the management level and working efficiency of the community. In very large communities with large and complex populations, the list is difficult to keep up to date. Installing monitoring equipment can reduce the community workload, but it requires a large up-front capital investment and dedicated maintenance afterwards.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present application provides a method and a system for identifying elderly people living alone based on electricity consumption data.
To achieve the above purpose, the invention adopts the following technical solution:
a method for identifying elderly people living alone based on electricity consumption data, characterized in that the method comprises the following steps:
step 1: randomly selecting a batch of low-voltage user electricity consumption data, cleaning the selected data, and removing abnormal values and null values;
step 2: performing feature selection and vectorization on the data from step 1;
step 3: performing cluster analysis on the vectorized data from step 2 using a clustering algorithm;
step 4: combining the cluster analysis result of step 3 with power marketing data to screen out sample data of the group suspected to be elderly people living alone;
step 5: randomly verifying the sample data of the suspected group from step 4, and determining positive samples and negative samples for model training;
step 6: constructing an identification model for elderly people living alone, labeling the positive samples and negative samples determined in step 5 respectively, and combining them as training data to train the identification model;
step 7: acquiring the electricity consumption data of all users, and identifying elderly people living alone using the identification model.
The invention further comprises the following preferred embodiments:
preferably, the low-voltage users in step 1 are users with an access voltage lower than 380 V;
the electricity consumption data used is the daily electricity consumption over the last two years of 500,000 randomly selected users;
the abnormal values include daily electricity consumption that is negative or that exceeds a set daily consumption threshold.
Preferably, in step 2, electricity consumption statistical features are selected and calculated for each user, and each user's features are stored as a vector.
Preferably, the electricity consumption statistical features comprise the ratio of average electricity consumption in summer to that in winter, the ratio of the average level on working days to that on non-working days, and the mean and variance of electricity consumption during holidays longer than three days and during holidays not longer than three days.
Preferably, the specific steps of step 3 are as follows:
step 301: taking each user feature vector as a class, and calculating the minimum Euclidean distance between every two users;
step 302: combining the two classes with the minimum Euclidean distance into a new class;
step 303: recalculating the distances between the new class and all other classes;
step 304: repeating step 302 and step 303 until all classes are merged into one class.
Preferably, the power marketing data of step 4 comprises user basic files and payment channels.
Preferably, in step 5, samples are randomly drawn from the sample data of the suspected group and verified on site. If the accuracy reaches a set threshold, the sample data of the suspected group from step 4 is used as the negative samples, and the data group in the clustering result of step 3 that is farthest in Euclidean distance from the negative samples is used as the positive samples. If the accuracy does not reach the set threshold, the method returns to step 2 and feature selection and vectorization are performed again.
Preferably, in step 6, a random forest model is used to construct the identification model for elderly people living alone. Assuming the number of samples is N and each sample has M features, the specific training steps are as follows:
step 601: sampling from the samples N times, drawing 1 sample each time, to form N samples; the N randomly selected samples are used to train one decision tree and serve as the samples at its root node;
step 602: each time a node of the decision tree of step 601 is split, randomly selecting m features from the M features such that m << M, and then selecting one of the m features as the splitting feature of the node using an information gain strategy;
step 603: repeating step 602 until the nodes of the decision tree can no longer be split;
step 604: building a batch of decision trees by following step 601, step 602 and step 603 in order;
step 605: forming a random forest from the decision trees built in step 604 and using it as the identification model for elderly people living alone.
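For reference, the information gain criterion named in step 602 is commonly defined as below. This is the standard textbook definition, not a formula taken from the present text; the symbols S (the samples at a node), p_k (the fraction of S belonging to class k), f (a candidate feature) and S_v (the subset of S falling into branch v of the split on f) are introduced here only for illustration.

```latex
H(S) = -\sum_{k} p_k \log_2 p_k,
\qquad
\mathrm{Gain}(S, f) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)
```

Under this definition, the feature chosen at a node would be the one among the m candidates with the largest gain.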
Preferably, in step 7, feature selection and vectorization are performed on the electricity consumption data of all users in the manner of step 2, and the result is input into the identification model trained in step 6 to obtain a list of elderly people living alone;
the electricity consumption data of all users refers to the daily electricity consumption of all low-voltage users in the whole province over the last two years.
The present application also discloses a system for identifying elderly people living alone based on electricity consumption data, the system comprising:
an initial data acquisition module, used for randomly selecting a batch of low-voltage user electricity consumption data, cleaning the selected data, and removing abnormal values and null values;
a feature selection and vectorization module, used for performing feature selection and vectorization on the data from the initial data acquisition module;
a cluster analysis module, used for performing cluster analysis on the vectorized data using a bottom-up hierarchical clustering algorithm;
a sample data screening module, used for screening out sample data of the group suspected to be elderly people living alone by combining the cluster analysis result with power marketing data;
a training sample verification module, used for randomly verifying the sample data of the suspected group and determining positive samples and negative samples for model training;
a model building module, used for constructing the identification model for elderly people living alone and training it with the positive samples and negative samples;
an identification module, used for acquiring the electricity consumption data of all users and identifying elderly people living alone using the identification model.
The beneficial effects achieved by the present application are as follows:
1. by analyzing electricity consumption data, with auxiliary verification from other power marketing data, the method quickly produces a list of users who are highly likely to be elderly people living alone, reduces the survey workload of communities and similar organizations, and provides a basis for subsequent care work for elderly people living alone;
2. the invention obtains its data from power metering equipment such as electricity meters, so no large capital investment in monitoring equipment is needed; the identification model can be built from raw electricity consumption data even when training samples are lacking, so that elderly people living alone can be located quickly.
Drawings
Fig. 1 is a flow chart of a solitary old man identification method based on electricity consumption data according to the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
The method uses users' electricity consumption data to quickly locate users who are elderly people living alone. When training samples are lacking, it obtains the labels and feature information of positive and negative samples for model training by means of a clustering algorithm, auxiliary verification with power marketing data, and the like.
Specifically, as shown in Fig. 1, the method for identifying elderly people living alone based on electricity consumption data of the present invention includes the following steps:
step 1: randomly selecting a batch of low-voltage user electricity consumption data, cleaning the selected data, and removing abnormal values, null values and the like, wherein the low-voltage users are users with an access voltage lower than 380 V;
the electricity consumption data used is the daily electricity consumption over the last two years of 500,000 randomly selected users;
the abnormal values include daily electricity consumption that is negative or excessively large;
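The cleaning in step 1 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `clean_daily_usage`, the date-keyed dictionary layout and the 200 kWh upper bound are all assumptions, since the text only says that negative, excessively large and null readings are removed.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical upper bound in kWh per day; the text only says "excessively large".
MAX_DAILY_KWH = 200.0

def clean_daily_usage(readings: Dict[date, Optional[float]],
                      max_kwh: float = MAX_DAILY_KWH) -> Dict[date, float]:
    """Keep only readings that are present, non-negative and not above max_kwh."""
    return {day: kwh for day, kwh in readings.items()
            if kwh is not None and 0.0 <= kwh <= max_kwh}

raw = {
    date(2023, 1, 1): 3.2,
    date(2023, 1, 2): None,    # null value
    date(2023, 1, 3): -1.0,    # negative reading
    date(2023, 1, 4): 950.0,   # above the assumed threshold
    date(2023, 1, 5): 4.8,
}
cleaned = clean_daily_usage(raw)
```

Keeping the dates alongside the readings lets the later feature calculation still tell seasons, weekdays and holidays apart after rows have been dropped.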
step 2: performing feature selection and vectorization on the data from step 1.
Electricity consumption statistical features are selected and calculated for each user, and each user's features are stored as a vector;
the statistical features include the ratio of average electricity consumption in summer to that in winter, the ratio of the average level on working days to that on non-working days, the mean and variance of electricity consumption during holidays longer than three days and during holidays not longer than three days, and the like.
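One possible way to turn a user's cleaned daily readings into the feature vector of step 2 is sketched below. Several points are assumptions, because the text does not define them: summer is taken as June to August and winter as December to February, non-working days are approximated by weekends, and the two holiday calendars are passed in by the caller. The function and variable names are hypothetical.

```python
from datetime import date
from statistics import mean, pvariance

SUMMER_MONTHS = {6, 7, 8}     # assumption: season boundaries are not given in the text
WINTER_MONTHS = {12, 1, 2}    # assumption

def _mean(values):
    return mean(values) if values else 0.0

def _var(values):
    return pvariance(values) if values else 0.0

def _ratio(num, den):
    return num / den if den else 0.0

def feature_vector(usage, long_holidays, short_holidays):
    """usage maps date -> kWh (already cleaned); returns a 6-element feature list."""
    summer = [v for d, v in usage.items() if d.month in SUMMER_MONTHS]
    winter = [v for d, v in usage.items() if d.month in WINTER_MONTHS]
    workday = [v for d, v in usage.items() if d.weekday() < 5]   # Mon-Fri
    offday = [v for d, v in usage.items() if d.weekday() >= 5]   # Sat-Sun (assumption)
    long_h = [v for d, v in usage.items() if d in long_holidays]
    short_h = [v for d, v in usage.items() if d in short_holidays]
    return [
        _ratio(_mean(summer), _mean(winter)),    # summer / winter average
        _ratio(_mean(workday), _mean(offday)),   # working / non-working average
        _mean(long_h), _var(long_h),             # holidays longer than three days
        _mean(short_h), _var(short_h),           # holidays not longer than three days
    ]

# Toy data with a made-up holiday calendar, for illustration only.
usage = {
    date(2023, 1, 2): 4.0,   # Monday, winter
    date(2023, 1, 7): 2.0,   # Saturday, winter
    date(2023, 7, 3): 6.0,   # Monday, summer
    date(2023, 7, 8): 8.0,   # Saturday, summer
}
vec = feature_vector(usage,
                     long_holidays={date(2023, 1, 7)},
                     short_holidays={date(2023, 7, 3), date(2023, 7, 8)})
```

Because the features mix ratios with absolute kWh values, a real pipeline would probably rescale them before computing Euclidean distances; the text does not say whether this is done.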
step 3: performing cluster analysis on the vectorized data from step 2 using a clustering algorithm, the specific steps of which are as follows:
step 301: taking each user's feature vector as a class, and calculating the minimum Euclidean distance between every two users. If the feature vector of user a is $A = (a_1, a_2, a_3, \dots)$, the feature vector of user b is $B = (b_1, b_2, b_3, \dots)$, and the feature vector of user c is $C = (c_1, c_2, c_3, \dots)$, then the Euclidean distance $L_{ab}$ between the feature vectors of user a and user b is

$$L_{ab} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 + \cdots}$$

The Euclidean distance $L_{ac}$ between user a and user c and the Euclidean distance $L_{bc}$ between user b and user c can be obtained in the same way.
step 302: merging the two classes with the minimum Euclidean distance into a new class; for example, if $L_{ab}$ is the smallest of the three distances $L_{ab}$, $L_{bc}$ and $L_{ac}$, user a and user b are merged into a new class;
step 303: recalculating the distances between the new class and all other classes;
step 304: repeating step 302 and step 303 until all classes are merged into one class.
Through this clustering, users with similar electricity consumption characteristics are grouped together.
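Steps 301 to 304 describe bottom-up (agglomerative) clustering. The sketch below is one reading of them, assuming that the distance between two classes is the smallest Euclidean distance between any pair of their members (single linkage), which is how "minimum Euclidean distance" is interpreted here. The function names and the returned merge history are illustrative choices, and the brute-force search is for clarity, not for 500,000 users.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(vectors):
    """Bottom-up clustering; returns the merge history as
    (members_of_first_class, members_of_second_class, distance) tuples."""
    clusters = [[i] for i in range(len(vectors))]    # step 301: one class per user
    merges = []
    while len(clusters) > 1:                          # step 304: until one class is left
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # class distance = smallest member-to-member distance (assumption)
                d = min(euclidean(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]            # step 302: merge the closest pair
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)                       # step 303: distances are recomputed next pass
    return merges

vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = single_linkage(vectors)
```

Since the procedure ends with a single class, the user groups would have to be read off the merge history, for example by stopping once a chosen number of classes remains; the text does not state which cut is used.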
step 4: combining the cluster analysis result of step 3 with power marketing data and the like to screen out sample data of the group suspected to be elderly people living alone;
the power marketing data includes user basic files, payment channels and the like;
for example, conditions such as the user's age and whether the user pays offline can be used to screen out the cluster groups in the clustering result that meet the conditions.
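The screening in step 4 might look like the sketch below. The `Profile` fields, the age limit of 60 and the rule that at least half of a cluster must match are all hypothetical; the text only names age and offline payment as example conditions and gives no thresholds.

```python
from dataclasses import dataclass

@dataclass
class Profile:               # hypothetical slice of the user basic file
    age: int
    pays_offline: bool       # hypothetical flag derived from the payment channel

MIN_AGE = 60                 # assumption: no age limit is given in the text
MIN_SHARE = 0.5              # assumption: share of a cluster that must match

def screen_clusters(clusters, profiles, min_age=MIN_AGE, min_share=MIN_SHARE):
    """Return the ids of clusters in which enough members meet the conditions."""
    suspected = []
    for cluster_id, members in clusters.items():
        hits = sum(1 for user in members
                   if profiles[user].age >= min_age and profiles[user].pays_offline)
        if members and hits / len(members) >= min_share:
            suspected.append(cluster_id)
    return suspected

clusters = {0: ["u1", "u2", "u3"], 1: ["u4", "u5"]}
profiles = {
    "u1": Profile(72, True),
    "u2": Profile(68, True),
    "u3": Profile(35, False),
    "u4": Profile(41, False),
    "u5": Profile(80, False),
}
suspected_ids = screen_clusters(clusters, profiles)
```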
step 5: randomly verifying the sample data of the suspected group from step 4, and determining positive samples and negative samples for model training;
samples are randomly drawn from the sample data of the suspected group and verified on site. If the accuracy reaches a set threshold, for example 80%, the sample data of the suspected group from step 4 is used as the negative samples, and the data group in the clustering result of step 3 that is farthest in Euclidean distance from the negative samples is used as the positive samples. If the accuracy does not reach 80%, the method returns to step 2 and feature selection and vectorization are performed again;
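The check in step 5 can be sketched as follows. The on-site verification is represented by a caller-supplied function `is_confirmed`, and the distance between two data groups is measured between their centroids; both are assumptions, as the text does not say how group-to-group distance is computed. All names are hypothetical.

```python
import math
import random

ACCURACY_THRESHOLD = 0.8     # the 80% example from the text

def verified_accuracy(suspected_users, is_confirmed, sample_size, seed=0):
    """Draw a random sample and return the share confirmed by field verification."""
    rng = random.Random(seed)
    sample = rng.sample(suspected_users, min(sample_size, len(suspected_users)))
    if not sample:
        return 0.0
    return sum(1 for user in sample if is_confirmed(user)) / len(sample)

def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def farthest_cluster(suspected_id, clusters):
    """clusters maps id -> list of feature vectors; compares centroids (assumption)."""
    base = centroid(clusters[suspected_id])
    others = [cid for cid in clusters if cid != suspected_id]
    return max(others, key=lambda cid: math.dist(base, centroid(clusters[cid])))

clusters = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[1.0, 1.0]], "c": [[10.0, 10.0]]}
users = list(range(10))
accuracy = verified_accuracy(users, lambda user: user != 3, sample_size=10)
if accuracy >= ACCURACY_THRESHOLD:
    contrast_id = farthest_cluster("a", clusters)   # group used as the other training class
else:
    contrast_id = None                              # would go back to step 2 instead
```

In the toy run, 9 of 10 sampled users are confirmed, so the threshold is met and cluster "c" is chosen as the group farthest from the suspected cluster "a".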
step 6: constructing an identification model for elderly people living alone, labeling the positive samples and negative samples determined in step 5 respectively, and combining them as training data to train the identification model. The model is a binary classification model that must judge whether the input data corresponds to an elderly person living alone, so two types of training samples are needed, namely the positive samples and negative samples determined in step 5;
a random forest model is used to construct the identification model. Assuming the number of samples is N and each sample has M features, the specific training steps are as follows:
step 601: sampling from the samples N times, drawing 1 sample each time, to form N samples; the N randomly selected samples are used to train one decision tree and serve as the samples at its root node;
step 602: each time a node of the decision tree of step 601 is split, randomly selecting m features from the M features such that m << M, and then selecting one of the m features as the splitting feature of the node using an information gain strategy;
step 603: repeating step 602 until the nodes of the decision tree can no longer be split;
step 604: building a batch of decision trees by following step 601, step 602 and step 603 in order;
step 605: forming a random forest from the decision trees built in step 604 and using it as the identification model for elderly people living alone.
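Steps 601 to 605 can be illustrated with the small pure-Python random forest below. It is a sketch under stated assumptions, not the applicant's implementation: the N draws are made with replacement (the usual bootstrap), m defaults to the square root of M, numeric features are split at midpoints between observed values, and the forest predicts by majority vote. Function names, the tree count and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (gain, feature, threshold) with the largest information gain."""
    base, best = entropy(labels), None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                          # midpoint threshold (assumption)
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow_tree(rows, labels, m, rng):
    if len(set(labels)) == 1:
        return labels[0]                               # pure leaf
    features = rng.sample(range(len(rows[0])), m)      # step 602: m of the M features
    split = best_split(rows, labels, features)
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]    # step 603: no useful split left
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))

def train_forest(rows, labels, n_trees=25, m=None, seed=0):
    rng = random.Random(seed)
    n, total = len(rows), len(rows[0])
    m = m or max(1, int(math.sqrt(total)))             # assumption: m = sqrt(M)
    forest = []
    for _ in range(n_trees):                           # step 604: a batch of trees
        idx = [rng.randrange(n) for _ in range(n)]     # step 601: N draws with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest                                      # step 605: the forest is the model

def predict(forest, row):
    votes = []
    for node in forest:
        while isinstance(node, tuple):                 # internal nodes are tuples
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]         # majority vote

# Toy training data: two well-separated groups labeled 1 and 0.
rows = [[1.0, 0.9, 2.0, 0.1], [1.1, 1.0, 2.2, 0.2], [0.9, 1.1, 1.9, 0.1],
        [1.0, 1.0, 2.1, 0.3], [2.0, 1.5, 6.0, 1.5], [2.1, 1.6, 6.5, 1.2],
        [1.9, 1.4, 5.8, 1.4], [2.2, 1.5, 6.1, 1.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(rows, labels)
```

Which data group receives which label follows step 5; the 1 and 0 used here are placeholders. At the scale described in the text, an established library implementation would be the more practical choice.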
step 7: acquiring the electricity consumption data of all users, and identifying elderly people living alone using the identification model.
Feature selection and vectorization are performed on the electricity consumption data of all users in the manner of step 2, and the result is input into the identification model trained in step 6 to obtain a list of elderly people living alone;
the electricity consumption data of all users refers to the daily electricity consumption of all low-voltage users in the whole province over the last two years.
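Step 7 amounts to running every user through the same vectorization and then through the trained model. The sketch below shows only that wiring: `vectorize` and `classify` are stand-ins supplied by the caller, and the convention that label 1 means "flagged" is an assumption. The toy lambdas are placeholders, not the step 2 features or the trained random forest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]

@dataclass
class Identifier:
    """Hypothetical wrapper around the step 2 vectorizer and the step 6 model."""
    vectorize: Callable[[Sequence[float]], Vector]
    classify: Callable[[Vector], int]        # assumed to return 1 for flagged users

    def identify(self, usage_by_user: Dict[str, Sequence[float]]) -> List[str]:
        """Return the ids of all users the model flags."""
        return [user_id for user_id, usage in usage_by_user.items()
                if self.classify(self.vectorize(usage)) == 1]

# Placeholder components for illustration only.
identifier = Identifier(
    vectorize=lambda usage: [sum(usage) / len(usage)],
    classify=lambda vector: 1 if vector[0] < 3.0 else 0,
)
flagged = identifier.identify({"u1": [2.0, 2.5], "u2": [8.0, 9.0]})
```

Reusing one `vectorize` function for both training and full-population scoring keeps the features consistent, which is what "in the manner of step 2" requires.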
A system for identifying elderly people living alone based on electricity consumption data, the system comprising:
an initial data acquisition module, used for randomly selecting a batch of low-voltage user electricity consumption data, cleaning the selected data, and removing abnormal values and null values;
a feature selection and vectorization module, used for performing feature selection and vectorization on the data from the initial data acquisition module;
a cluster analysis module, used for performing cluster analysis on the vectorized data using a bottom-up hierarchical clustering algorithm;
a sample data screening module, used for screening out sample data of the group suspected to be elderly people living alone by combining the cluster analysis result with power marketing data;
a training sample verification module, used for randomly verifying the sample data of the suspected group and determining positive samples and negative samples for model training;
a model building module, used for constructing the identification model for elderly people living alone and training it with the positive samples and negative samples;
an identification module, used for acquiring the electricity consumption data of all users and identifying elderly people living alone using the identification model.
The applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are merely preferred embodiments of the invention, and that the detailed description is intended only to help the reader better understand the spirit of the invention, not to limit its scope of protection. On the contrary, any improvement or modification made on the basis of the spirit of the invention shall fall within its scope of protection.