CN110597795A - Data preprocessing system and method - Google Patents

Info

Publication number
CN110597795A
Authority
CN
China
Prior art keywords
data
numerical value
equipment
state data
preprocessing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910785781.2A
Other languages
Chinese (zh)
Other versions
CN110597795B (en)
Inventor
刘伟
何金辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
THINKHOME Co Ltd
Original Assignee
THINKHOME Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by THINKHOME Co Ltd
Priority to CN201910785781.2A
Publication of CN110597795A
Application granted
Publication of CN110597795B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Air Conditioning Control Device (AREA)
  • Selective Calling Equipment (AREA)

Abstract

The invention provides a data preprocessing system and a data preprocessing method in the technical field of smart homes. The system comprises a data acquisition module, a data cleaning module and a data storage module. The data acquisition module acquires the operation state data of all smart home devices, indoor environment data collected by a plurality of sensing devices arranged indoors, and outdoor environment data. The data cleaning module cleans the operation state data according to a first preset rule to generate a device state data set, cleans the indoor environment data according to a second preset rule to generate an indoor environment data set, and cleans the outdoor environment data according to a third preset rule to generate an outdoor environment data set. The data storage module adds the device state data set, the indoor environment data set and the outdoor environment data set to a user behavior data set and stores it for further data mining. By adjusting the data format and content, the invention makes the data better suited to mining and helps ensure the accuracy and validity of the mined data.

Description

Data preprocessing system and method
Technical Field
The invention relates to the technical field of smart home, in particular to a data preprocessing system and method.
Background
In the big data era, massive data gives people unprecedented large-scale samples for computational problems, but it also forces them to deal with more complex data objects. In practice, raw data is acquired from many different application systems. Because these systems lack a unified data standard and differ considerably in data structure, their data is inconsistent across systems and cannot be used directly. In addition, design defects in the application systems and human factors during use can leave some attribute values missing or uncertain in the data records, and the loss of necessary data makes the data incomplete. As a result, data mining either cannot be performed directly or produces unsatisfactory results. Data preprocessing is therefore an essential data preparation step before data analysis and mining.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a data preprocessing system, which is characterized in that a plurality of intelligent household devices are preset in a user room, and the data preprocessing system specifically comprises:
a data acquisition module, the data acquisition module specifically comprising:
the first data acquisition unit is used for acquiring the running state data of all the intelligent household equipment used by the user;
the second data acquisition unit is connected with the first data acquisition unit and used for acquiring indoor environment data which corresponds to the operation state data and is acquired by a plurality of sensing equipment arranged in the user room;
the third data acquisition unit is connected with the first data acquisition unit and used for acquiring outdoor environment data corresponding to the running state data;
the data cleaning module is connected with the data acquisition module, and specifically comprises:
the first data cleaning unit is used for performing data cleaning on the running state data of the intelligent home according to a first preset rule, generating a corresponding equipment state data set according to a data cleaning result and outputting the corresponding equipment state data set;
the second data cleaning unit is used for carrying out data cleaning on the indoor environment data according to a second preset rule, generating a corresponding indoor environment data set according to a data cleaning result and outputting the indoor environment data set;
the third data cleaning unit is used for cleaning the outdoor environment data according to a third preset rule, generating a corresponding outdoor environment data set according to a data cleaning result and outputting the corresponding outdoor environment data set;
and the data storage module is connected with the data cleaning module and is used for adding the equipment state data set, the indoor environment data set and the outdoor environment data set into a user behavior data set and storing the data set for further data mining.
Preferably, the first data cleansing unit specifically includes:
the data storage subunit is configured to store a preset first device information set, where the first device information set includes information of all the smart home devices of the user;
the first extraction subunit is used for extracting information of all the intelligent household equipment corresponding to each piece of running state data, and generating a second equipment information set according to an extraction result;
a data comparing subunit, respectively connected to the data storing subunit and the first extracting subunit, and configured to compare the first device information set with the second device information set, output a first comparison result when the first device information set is equal to the second device information set, and
output a second comparison result when the first device information set is not equal to the second device information set;
a data judging subunit, connected to the data comparing subunit, and configured to output, according to the first comparison result, a corresponding first judgment result when the operating state data corresponding to the smart home device in the second device information set indicates that the smart home device is an unadjustable mode device;
outputting a corresponding second judgment result when the running state data corresponding to the intelligent home equipment in the second equipment information set represents that the intelligent home equipment is adjustable mode equipment;
a first processing subunit, connected to the data determining subunit, and configured to set, according to the first determination result and when the operation state data corresponding to the smart home device indicates that the smart home device is in an on state, the operation state data as a first numerical value and add the first numerical value to the device state data set, and
when the running state data corresponding to the intelligent home equipment indicate that the intelligent home equipment is in a closed state, setting the running state data as a second numerical value and adding the second numerical value into the equipment state data set;
a second processing subunit, connected to the data determining subunit, and configured to set, according to the second determination result and when the operation state data corresponding to the smart home device indicates that the smart home device is in an on state, a mode in which the smart home device is located in the operation state data to a corresponding third numerical value, and add the third numerical value and an attribute value corresponding to the mode to the device state data set,
when the running state data shows that the intelligent household equipment is in a closed state, setting the mode as a corresponding fourth numerical value, setting an attribute value corresponding to the mode as a fifth numerical value, and adding the fourth numerical value and the fifth numerical value into the equipment state data set;
and the third processing subunit is connected with the data comparison subunit, and is configured to calculate a difference set between the first device information set and the second device information set according to the second comparison result, set the operating state data of each smart home device in the difference set to a sixth numerical value, and add the operating state data to the device state data set.
Preferably, the first value is 1, and the second value, the fourth value, the fifth value and the sixth value are all 0.
Preferably, the indoor environment data comprises light intensity data and a plurality of indoor detection data;
the second data cleansing unit specifically includes:
the fourth processing subunit is used for carrying out grading division on the light intensity data according to a preset magnitude, representing a grading division result by a seventh value and adding the grading division result into the indoor environment data set;
and the fifth processing subunit is used for converting each indoor detection data according to the national standard, representing the conversion result by using a first numerical value set and adding the conversion result into the indoor environment data set.
Preferably, one is taken as the preset magnitude, or ten is taken as the preset magnitude.
Preferably, the outdoor environment data comprises visibility, weather phenomena, wind direction, wind speed and a plurality of outdoor detection data;
the third data cleaning unit specifically includes:
the sixth processing subunit is configured to perform level division on the visibility according to a national classification standard, and add a level division result to the outdoor environment data set, where the level division result is represented by an eighth value;
the seventh processing subunit is configured to represent the weather phenomenon by a ninth value according to the third preset rule and add the ninth value to the outdoor environment data set;
the eighth processing subunit is configured to represent and add the wind direction by a tenth numerical value to the outdoor environment data set according to the third preset rule;
and the ninth processing subunit is used for representing the wind speed by an eleventh numerical value according to the third preset rule and adding the wind speed into the outdoor environment data set.
A data preprocessing method is applied to any one of the data preprocessing systems, and specifically comprises the following steps:
step S1, the data preprocessing system acquires operation state data of all smart home devices used by a user, indoor environment data that corresponds to the operation state data and is collected by a plurality of sensing devices arranged in the user room, and outdoor environment data corresponding to the operation state data;
step S2, the data preprocessing system carries out data cleaning on the running state data of the smart home according to a first preset rule, and generates a corresponding equipment state data set according to a data cleaning result;
step S3, the data preprocessing system carries out data cleaning on the indoor environment data according to a second preset rule and generates a corresponding indoor environment data set according to a data cleaning result;
step S4, the data preprocessing system carries out data cleaning on the outdoor environment data according to a third preset rule and generates a corresponding outdoor environment data set according to a data cleaning result;
step S5, the data preprocessing system adds the device state data set, the indoor environment data set, and the outdoor environment data set to a user behavior data set and stores them for further data mining.
Preferably, the step S2 specifically includes:
step S21, the data preprocessing system presets and stores a first device information set, wherein the first device information set comprises information of all the intelligent household devices of the user;
step S22, the data preprocessing system extracts information of all the intelligent household equipment corresponding to the running state data, and generates a second equipment information set according to the extraction result;
step S23, the data preprocessing system compares the first set of device information with the second set of device information:
if the first device information set is equal to the second device information set, go to step S24;
if the first device information set is not equal to the second device information set, go to step S27;
step S24, the data preprocessing system determines, according to the operation state data corresponding to the smart home devices in the second device information set:
if the intelligent household equipment is the equipment in the non-adjustable mode, turning to the step S25;
if the intelligent household equipment is adjustable mode equipment, turning to step S26;
step S25, when the running state data corresponding to the intelligent household equipment shows that the intelligent household equipment is in the starting state, the data preprocessing system sets the running state data as a first numerical value and adds the first numerical value into the equipment state data set, and
when the running state data corresponding to the intelligent home equipment indicate that the intelligent home equipment is in a closed state, setting the running state data as a second numerical value and adding the second numerical value into the equipment state data set;
step S26, when the running state data corresponding to the smart home devices indicate that the smart home devices are in the open state, the data preprocessing system sets the mode of the smart home devices in the running state data to be a corresponding third numerical value, and adds the third numerical value and the attribute value corresponding to the mode to the device state data set,
when the running state data shows that the intelligent household equipment is in a closed state, setting the mode as a corresponding fourth numerical value, setting an attribute value corresponding to the mode as a fifth numerical value, and adding the fourth numerical value and the fifth numerical value into the equipment state data set;
step S27, the data preprocessing system calculates a difference set between the first device information set and the second device information set, sets the operating state data of each smart home device in the difference set to a sixth value, and adds the operating state data to the device state data set.
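Steps S21 to S27 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout (`adjustable`, `on`, `mode`, `attr`) and the function name are hypothetical, the numerical values are the ones the text calls preferred (first value 1; second, fourth, fifth and sixth values 0), and the mode codes for cooling (1) and heating (2) are taken from the air-conditioner embodiment. The sketch also assumes that devices which did report are still encoded when the two sets differ, as in the four-device embodiment.

```python
from typing import Dict, Iterable, Tuple, Union

# Preferred values from the text: first value 1; second, fourth, fifth, sixth values 0.
ON_VALUE, OFF_VALUE, MISSING_VALUE = 1, 0, 0
OFF_MODE_VALUE, OFF_ATTR_VALUE = 0, 0

# "Third value" per mode. Cooling = 1 and heating = 2 follow the air-conditioner
# embodiment; any further modes would need their own (assumed) codes.
MODE_CODES = {"cooling": 1, "heating": 2}

Encoded = Union[int, Tuple[int, float]]


def clean_device_states(registered: Iterable[str],
                        reports: Dict[str, dict]) -> Dict[str, Encoded]:
    """Encode operation state data for every registered device.

    registered: the first device information set (all of the user's devices).
    reports:    hypothetical raw records keyed by device id, each with
                'adjustable' (bool), 'on' (bool) and, for adjustable devices
                that are on, 'mode' (str) and 'attr' (number).
    """
    second_set = set(reports)                      # S22: devices that reported
    result: Dict[str, Encoded] = {}
    for device in registered:
        if device not in second_set:               # S23/S27: in the difference set
            result[device] = MISSING_VALUE
            continue
        record = reports[device]
        if not record["adjustable"]:               # S24 -> S25: unadjustable mode device
            result[device] = ON_VALUE if record["on"] else OFF_VALUE
        elif record["on"]:                         # S24 -> S26: adjustable, on
            result[device] = (MODE_CODES[record["mode"]], record["attr"])
        else:                                      # S26: adjustable, off
            result[device] = (OFF_MODE_VALUE, OFF_ATTR_VALUE)
    return result
```

Keeping the result keyed by device id makes it easy to emit the values in the fixed order of the first device information set, which is what keeps every stored record the same length.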
Preferably, the indoor environment data comprises light intensity data and a plurality of indoor detection data;
the step S3 specifically includes:
step S31, the data preprocessing system carries out grading division on the light intensity data according to a preset magnitude, and a grading division result is represented by a seventh numerical value and added into the indoor environment data set;
step S32, the data preprocessing system converts each of the indoor detection data according to the national standard, and represents and adds the conversion result to the indoor environment data set by using a first value set.
Preferably, the outdoor environment data comprises visibility, weather phenomena, wind direction, wind speed and a plurality of outdoor detection data;
the step S4 specifically includes:
step S41, the data preprocessing system carries out grade division on the visibility according to the national grading standard, and the grade division result is expressed by an eighth numerical value and added into the outdoor environment data set;
step S42, the data preprocessing system represents the weather phenomenon with a ninth value according to the third preset rule and adds the ninth value into the outdoor environment data set;
step S43, the data preprocessing system represents the wind direction by a tenth numerical value according to the third preset rule and adds the wind direction into the outdoor environment data set;
and step S44, the data preprocessing system represents the wind speed by an eleventh numerical value according to the third preset rule and adds the wind speed into the outdoor environment data set.
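Steps S41 to S44 amount to table lookups, which the following Python sketch illustrates. The text does not disclose the third preset rule, so every table here is an assumption: only "sunny" = 1, "northeast" = 2 and a wind speed of 3.0 m/s = 2 are taken from the worked example in the detailed description. The Beaufort scale is used for wind speed only because it happens to agree with that example. Visibility is assumed to arrive already graded, since the national grading thresholds are not given. Missing or unknown entries are encoded as 0, following the rule stated for missing environment data.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical code tables; only "sunny" -> 1 and "northeast" -> 2 come from the text.
WEATHER_CODES = {"sunny": 1, "cloudy": 2, "rain": 3, "snow": 4}
WIND_DIRECTION_CODES = {
    "north": 1, "northeast": 2, "east": 3, "southeast": 4,
    "south": 5, "southwest": 6, "west": 7, "northwest": 8,
}

# Lower bounds in m/s of Beaufort forces 1..12 (assumed scale, see lead-in).
BEAUFORT_LOWER_BOUNDS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8,
                         13.9, 17.2, 20.8, 24.5, 28.5, 32.7]


def encode_wind_speed(speed_m_s: Optional[float]) -> int:
    """Map a wind speed in m/s to a Beaufort force; missing data becomes 0."""
    if speed_m_s is None:
        return 0
    # Number of lower bounds that are <= speed equals the Beaufort force.
    return bisect_right(BEAUFORT_LOWER_BOUNDS, speed_m_s)


def clean_outdoor(raw: dict) -> dict:
    """Encode one raw outdoor record (hypothetical keys) as numerical values."""
    return {
        "visibility": raw.get("visibility_level") or 0,                            # S41
        "weather": WEATHER_CODES.get(raw.get("weather"), 0),                       # S42
        "wind_direction": WIND_DIRECTION_CODES.get(raw.get("wind_direction"), 0),  # S43
        "wind_speed": encode_wind_speed(raw.get("wind_speed")),                    # S44
    }
```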
The technical scheme has the following advantages or beneficial effects: by adjusting the data format and the content, the data is more in line with the requirement of mining, and the accuracy and the effectiveness of the mining data are ensured.
Drawings
FIG. 1 is a schematic diagram of a data preprocessing system according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart illustrating a data preprocessing method according to a preferred embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a data cleaning method for operating status data of smart home devices according to a preferred embodiment of the present invention;
FIG. 4 is a flow chart illustrating a data cleansing method for indoor environment data according to a preferred embodiment of the present invention;
FIG. 5 is a flow chart illustrating a data cleansing method for outdoor environmental data according to a preferred embodiment of the invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present invention is not limited to the embodiment, and other embodiments may be included in the scope of the present invention as long as the gist of the present invention is satisfied.
In a preferred embodiment of the present invention, based on the above problems in the prior art, a data preprocessing system is provided, which presets a plurality of smart home devices, as shown in fig. 1, and specifically includes:
the data acquisition module 1, the data acquisition module 1 specifically includes:
the first data acquisition unit 11 is configured to acquire operation state data of all smart home devices used by a user;
the second data acquisition unit 12 is connected with the first data acquisition unit 11 and used for acquiring indoor environment data which corresponds to the operation state data and is acquired by a plurality of sensing devices arranged in the user room;
a third data obtaining unit 13 connected to the first data obtaining unit 11, for obtaining outdoor environment data corresponding to the operation state data;
the data cleaning module 2 is connected to the data acquisition module 1, and the data cleaning module 2 specifically includes:
the first data cleaning unit 21 is configured to perform data cleaning on the operation state data of the smart home according to a first preset rule, and generate and output a corresponding device state data set according to a data cleaning result;
the second data cleaning unit 22 is used for performing data cleaning on the indoor environment data according to a second preset rule, and generating and outputting a corresponding indoor environment data set according to a data cleaning result;
the third data cleaning unit 23 is configured to perform data cleaning on the outdoor environment data according to a third preset rule, and generate and output a corresponding outdoor environment data set according to a data cleaning result;
and the data storage module 3 is connected with the data cleaning module 2 and is used for adding the equipment state data set, the indoor environment data set and the outdoor environment data set into a user behavior data set and storing the data set for further data mining.
Specifically, in this embodiment, the smart home devices include an unadjustable mode device and an adjustable mode device, the running state data of the corresponding unadjustable mode device includes that the smart home device is in an open state or the smart home device is in a closed state, and the running state data of the adjustable mode device includes a mode in which the smart home device is in the open state and an attribute value corresponding to the mode, or the smart home device is in the closed state; the indoor environmental data includes, but is not limited to, one or more of light intensity, temperature, humidity, PM2.5, oxygen content, CO2 concentration, formaldehyde concentration, air flow rate, respirable particles, benzene, ammonia, TVOC; the outdoor environment data includes, but is not limited to, one or more of weather phenomenon, temperature, air pressure, relative humidity, visibility, wind direction, wind speed, and cloud cover.
The operation state data of the smart home devices, the indoor environment data that corresponds to the operation state data and is collected by the plurality of sensing devices arranged in the user room, and the outdoor environment data corresponding to the operation state data are each represented by corresponding numerical values, so that the data format and content are unified, which facilitates further mining and analysis of the data. Specifically, assume that the data acquired by the data acquisition module 1 includes: device 1, device 2 and device 3; indoors, the light intensity reading, a temperature of 30 °C, a humidity of 40%, a PM2.5 of 16, an oxygen content of 21%, a CO2 concentration of 30%, a formaldehyde concentration of 0.08 mg/m³, an air flow rate of 0.2 m/s, respirable particles of 23, a benzene content of 0.20 mg/m³, an ammonia content of 0.2 mg/m³ and a TVOC of 0.5 mg/m³; and outdoors, a weather phenomenon of sunny, a temperature of 31 °C, an air pressure of 6 Pa, a relative humidity of 40%, a visibility of 3, a northeast wind direction and a wind speed of 3.0 m/s. Device 1 is an unadjustable mode device in the on state, device 2 is an adjustable mode device in the off state, and device 3 is an unadjustable mode device in the off state. Then:
device state data set {device 1, device 2, device 3} = {1, (0, 0), 0};
indoor environment data set {light intensity, temperature, humidity, PM2.5, oxygen content, CO2 concentration, formaldehyde concentration, air flow rate, respirable particles, benzene content, ammonia content, TVOC} = {3, 30, 40, 16, 21, 30, 0.08, 0.2, 23, 0.20, 0.2, 0.5};
outdoor environment data set {weather phenomenon, temperature, air pressure, relative humidity, visibility, wind direction, wind speed} = {1, 31, 6, 40, 3, 2, 2};
then the user behavior data set is {1, (0, 0), 0, 3, 30, 40, 16, 21, 30, 0.08, 0.2, 23, 0.20, 0.2, 0.5, 1, 31, 6, 40, 3, 2, 2}.
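The merge performed by the data storage module 3 in this example can be sketched as a simple concatenation. The function and variable names below are hypothetical, and an in-memory list stands in for whatever storage the system actually uses; the numerical values are copied from the three data sets above.

```python
from typing import List

# Cleaned data sets from the worked example, in the order given in the text.
device_state = [1, (0, 0), 0]
indoor_environment = [3, 30, 40, 16, 21, 30, 0.08, 0.2, 23, 0.20, 0.2, 0.5]
outdoor_environment = [1, 31, 6, 40, 3, 2, 2]

# Stand-in for persistent storage (assumption: one record per acquisition).
user_behavior_data_set: List[list] = []


def store_user_behavior_record(device: list, indoor: list, outdoor: list) -> list:
    """Concatenate the three cleaned data sets into one record and store it."""
    record = [*device, *indoor, *outdoor]
    user_behavior_data_set.append(record)
    return record


record = store_user_behavior_record(
    device_state, indoor_environment, outdoor_environment
)
```

The adjustable mode device keeps its (mode, attribute) pair as a single element, matching the way the example writes (0, 0) inside the user behavior data set.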
In a preferred embodiment of the present invention, the first data cleansing unit 21 specifically includes:
the data storage subunit 211 is configured to store a preset first device information set, where the first device information set includes information of all smart home devices of a user;
the first extraction subunit 212 is configured to extract information of all the smart home devices corresponding to each operating state data, and generate a second device information set according to an extraction result;
a data comparing subunit 213, respectively connected to the data storing subunit 211 and the first extracting subunit 212, for comparing the first device information set with the second device information set, outputting a first comparison result when the first device information set is equal to the second device information set, and
outputting a second comparison result when the first device information set is not equal to the second device information set;
the data judgment subunit 214 is connected to the data comparison subunit 213, and configured to output, according to the first comparison result, a corresponding first judgment result when the operation state data corresponding to the smart home device in the second device information set indicates that the smart home device is an unadjustable mode device, and
to output a corresponding second judgment result when the operation state data corresponding to the smart home device in the second device information set indicates that the smart home device is an adjustable mode device;
a first processing subunit 215, connected to the data determining subunit 214, configured to set the operating status data to a first value and add the operating status data to the device status data set according to the first determination result when the operating status data corresponding to the smart home device indicates that the smart home device is in the on state, and
when the running state data corresponding to the intelligent home equipment in the second equipment information set represents that the intelligent home equipment is in a closed state, setting the running state data as a second numerical value and adding the second numerical value into the equipment state data set;
a second processing subunit 216, connected to the data determining subunit 214, configured to set, according to the second determination result and when the operation state data corresponding to the smart home devices indicates that the smart home devices are in the on state, a mode of the smart home devices in the operation state data to a corresponding third value, add the third value and an attribute value corresponding to the mode to the device state data set,
when the running state data indicate that the intelligent household equipment is in a closed state, setting the mode as a corresponding fourth numerical value, setting the attribute value corresponding to the mode as a fifth numerical value, and adding the fourth numerical value and the fifth numerical value into the equipment state data set;
and the third processing subunit 217, connected to the data comparing subunit 213, and configured to calculate a difference set between the first device information set and the second device information set according to the second comparison result, set the operating state data of each smart home device in the difference set to a sixth value, and add the sixth value to the device state data set.
Specifically, in this embodiment, assume that four smart home devices are preset in the user room, that is, the first device information set includes device 1, device 2, device 3 and device 4, and that the operation state data acquired by the first data acquisition unit 11 corresponds to only two of them, that is, the second device information set includes device 1 and device 4. The difference set between the first device information set and the second device information set then includes device 2 and device 3, so the operation state data corresponding to device 2 and device 3 is set to the sixth value, preferably 0. Meanwhile, if the acquired operation state data of device 1 indicates that device 1 is an unadjustable mode device in the on state, the operation state data corresponding to device 1 is set to the first value, preferably 1; if the acquired operation state data of device 4 indicates that device 4 is an unadjustable mode device in the off state, the operation state data corresponding to device 4 is set to the second value, preferably 0.
In this embodiment, if the acquired operation state data of device 1 indicates that device 1 is an adjustable mode device in the on state, the corresponding operation state data includes a mode and an attribute value. Taking an air conditioner as an example, if the mode is the cooling mode and the attribute value is the current temperature setting of 26 °C, the cooling mode is set to a third value, preferably 1, and the corresponding attribute value is 26;
if the acquired operation state data of device 1 indicates that device 1 is an adjustable mode device in the on state, again taking an air conditioner as an example, and the mode is the heating mode with the current temperature setting of 26 °C as the attribute value, the heating mode is set to a third value, preferably 2, and the corresponding attribute value is 26;
if the acquired operation state data of device 1 indicates that device 1 is an adjustable mode device in the off state, again taking an air conditioner as an example, the mode is set to the fourth value, preferably 0, and the attribute value is set to the fifth value, preferably 0.
In a preferred embodiment of the present invention, the first value is 1, and the second value, the fourth value, the fifth value and the sixth value are all 0.
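The encoding of a single device's running state can be illustrated with a minimal Python sketch. The record type `RunningState`, its field names, and the function `encode_state` are hypothetical and are not part of the invention; the mode codes follow the air conditioner example (cooling is 1, heating is 2), and the remaining constants use the preferred values stated above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Preferred values from the embodiment.
FIRST_VALUE = 1    # non-adjustable mode device, on
SECOND_VALUE = 0   # non-adjustable mode device, off
FOURTH_VALUE = 0   # adjustable mode device, off: mode
FIFTH_VALUE = 0    # adjustable mode device, off: attribute value

# Third value per mode, following the air conditioner example (assumed table).
MODE_CODES = {"cooling": 1, "heating": 2}


@dataclass
class RunningState:
    """Hypothetical raw record for one device; field names are illustrative."""
    adjustable: bool                   # True for an adjustable mode device
    is_on: bool
    mode: Optional[str] = None         # only meaningful for adjustable devices
    attribute: Optional[float] = None  # e.g. temperature set point


def encode_state(state: RunningState) -> Tuple:
    """Return the numeric encoding added to the device state data set."""
    if not state.adjustable:
        return (FIRST_VALUE if state.is_on else SECOND_VALUE,)
    if state.is_on:
        # Third value (mode code) plus the attribute value of that mode.
        return (MODE_CODES[state.mode], state.attribute)
    return (FOURTH_VALUE, FIFTH_VALUE)


air_conditioner = RunningState(adjustable=True, is_on=True, mode="cooling", attribute=26)
print(encode_state(air_conditioner))  # (1, 26)
```

Returning a tuple keeps the one-value and two-value cases distinguishable; a real data set might instead use fixed columns per device.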
In a preferred embodiment of the present invention, the indoor environment data includes light intensity data and a plurality of indoor detection data;
the second data cleansing unit 22 specifically includes:
the fourth processing subunit 221, configured to perform level division on the light intensity data according to a preset magnitude, and represent a level division result by a seventh value and add the level division result to the indoor environment data set;
and a fifth processing subunit 222, configured to convert each indoor detection data according to a national standard, represent a conversion result by a first value set, and add the conversion result to the indoor environment data set.
Specifically, in this embodiment, the indoor environment data are collected and reported by corresponding sensing devices arranged indoors, and the outdoor environment data are provided by a third-party weather data company. In the preprocessing of the environment data, the light intensity in the indoor environment data is preferably graded by order of magnitude, with readings in the ones, tens, hundreds, and so on assigned to the first level, second level, third level, and so on; the other indoor environment data are converted into international standard units, and missing data are defined as 0.
In a preferred embodiment of the present invention, the preset magnitude is the order of ones or the order of tens.
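As an illustration of grading by order of magnitude, the following Python sketch maps a light intensity reading to a level. The function name `light_level`, the unit (lux), and the treatment of readings below 1 as level 1 are assumptions made for this sketch; the text only states that the ones, tens, and higher orders map to successive levels and that missing data are defined as 0.

```python
from typing import Optional

MISSING = 0  # the embodiment defines missing data as 0


def light_level(lux: Optional[float]) -> int:
    """Grade a light intensity reading by its order of magnitude.

    Ones -> level 1, tens -> level 2, hundreds -> level 3, and so on.
    Readings below 1 fall into level 1 here, which is an assumption.
    """
    if lux is None:
        return MISSING
    if lux < 0:
        raise ValueError("light intensity cannot be negative")
    # The number of digits in the integer part equals the level.
    return len(str(int(lux)))


print([light_level(v) for v in (5, 42, 350.7, None)])  # [1, 2, 3, 0]
```

Counting digits of the integer part avoids floating-point edge cases that a logarithm-based formula can have at exact powers of ten.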
In a preferred embodiment of the present invention, the outdoor environment data includes visibility, weather phenomenon, wind direction, wind speed and a plurality of outdoor detection data;
the third data cleansing unit 23 specifically includes:
a sixth processing subunit 231, configured to grade the visibility according to the national grading standard, represent the grading result by an eighth numerical value, and add the eighth numerical value to the outdoor environment data set;
a seventh processing subunit 232, configured to represent the weather phenomenon by a ninth numerical value according to the third preset rule and add the ninth numerical value to the outdoor environment data set;
an eighth processing subunit 233, configured to represent the wind direction by a tenth numerical value according to the third preset rule and add the tenth numerical value to the outdoor environment data set;
a ninth processing subunit 234, configured to represent the wind speed by an eleventh numerical value according to the third preset rule and add the eleventh numerical value to the outdoor environment data set.
Specifically, in this embodiment, the weather phenomenon in the outdoor environment data is represented by the ninth numerical value, as follows: clear (day) is represented by 1, clear (night) by 2, three categories of cloudy weather by 3, 4, and 5, light rain by 6, moderate rain by 7, heavy rain by 8, rainstorm by 9, gust by 10, thunderstorm by 11, thunder by 12, hail by 13, light fog by 14, fog by 15, dense fog by 16, haze by 17, sleet by 18, light snow by 19, moderate snow by 20, heavy snow by 21, snowstorm by 22, freezing rain by 23, frost by a value not stated in this text, level-5 wind by 26, level-6 wind by 27, level-7 wind by 28, level-8 wind by 29, level-9 wind by 30, level-10 wind by 31, level-11 wind by 32, wind of level 12 and above by 33, typhoon by 34, floating dust by 35, blowing sand by 36, and sandstorm by 37.
Preferably, the wind direction in the outdoor environment data is also represented by numerical values: due north by 1, northeast by 2, due east by 3, southeast by 4, due south by 5, southwest by 6, due west by 7, northwest by 8, and so on, each wind direction being represented by a corresponding value.
Preferably, the wind speed in the outdoor environment data is represented by a wind level: 0.0-0.2 m/s by level 0, 0.3-1.5 m/s by level 1, 1.6-3.3 m/s by level 2, 3.4-5.4 m/s by level 3, 5.5-7.9 m/s by level 4, 8.0-10.7 m/s by level 5, 10.8-13.8 m/s by level 6, 13.9-17.1 m/s by level 7, 17.2-20.7 m/s by level 8, 20.8-24.4 m/s by level 9, 24.5-28.4 m/s by level 10, 28.5-32.6 m/s by level 11, 32.7-36.9 m/s by level 12, 37.0-41.4 m/s by level 13, 41.5-46.1 m/s by level 14, 46.2-50.9 m/s by level 15, 51.0-56.0 m/s by level 16, 56.1-61.2 m/s by level 17, and 61.3 m/s or more by level 18.
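The wind direction and wind speed encodings can be sketched in Python as a lookup table and a threshold search. The names `WIND_DIRECTION_CODES`, `LEVEL_LOWER_BOUNDS`, and `wind_level` are hypothetical. Two assumptions are made: northwest is taken as 8, continuing the clockwise sequence from due north, and the level thresholds are the contiguous wind speed ranges of the extended Beaufort scale (level 1 beginning at 0.3 m/s, level 11 at 28.5 m/s). A reading that falls between two tabulated ranges, such as 0.25 m/s, is assigned to the lower level here.

```python
from bisect import bisect_right

# Tenth numerical value: eight compass directions, clockwise from due north.
WIND_DIRECTION_CODES = {
    "N": 1, "NE": 2, "E": 3, "SE": 4, "S": 5, "SW": 6, "W": 7, "NW": 8,
}

# Lower bound in m/s of wind levels 1 through 18.
# Level 0 covers every speed below the first bound.
LEVEL_LOWER_BOUNDS = [
    0.3, 1.6, 3.4, 5.5, 8.0, 10.8, 13.9, 17.2, 20.8,
    24.5, 28.5, 32.7, 37.0, 41.5, 46.2, 51.0, 56.1, 61.3,
]


def wind_level(speed_ms: float) -> int:
    """Eleventh numerical value: map a wind speed in m/s to a wind level 0..18."""
    if speed_ms < 0:
        raise ValueError("wind speed cannot be negative")
    # Count how many lower bounds are less than or equal to the speed.
    return bisect_right(LEVEL_LOWER_BOUNDS, speed_ms)


print(WIND_DIRECTION_CODES["SE"], wind_level(6.2), wind_level(70.0))  # 4 4 18
```

A sorted list of lower bounds with a binary search keeps the table in one place and makes the open-ended top level (61.3 m/s or more) fall out naturally.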
A data preprocessing method applied to any one of the data preprocessing systems, as shown in fig. 2, specifically includes:
step S1, the data preprocessing system acquires the running state data of all the smart home devices used by the user, the indoor environment data that correspond to the running state data and are collected by a plurality of sensing devices arranged in the user's room, and the outdoor environment data corresponding to the running state data;
step S2, the data preprocessing system carries out data cleaning on the running state data of the smart home according to a first preset rule, and generates a corresponding equipment state data set according to a data cleaning result;
step S3, the data preprocessing system carries out data cleaning on the indoor environment data according to a second preset rule and generates a corresponding indoor environment data set according to a data cleaning result;
step S4, the data preprocessing system carries out data cleaning on the outdoor environment data according to a third preset rule and generates a corresponding outdoor environment data set according to a data cleaning result;
step S5, the data preprocessing system adds the device status data set, the indoor environment data set, and the outdoor environment data set to a user behavior data set and saves them for further data mining.
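Steps S1 to S5 can be summarized with a short Python sketch. The function `preprocess`, its parameter names, and the dictionary keys are hypothetical; the three cleaning functions stand in for the first, second, and third preset rules and are passed in by the caller, since their details depend on the embodiment.

```python
from typing import Any, Callable, Dict, List

Cleaner = Callable[[Dict[str, Any]], Dict[str, Any]]


def preprocess(
    raw_device: Dict[str, Any],        # S1: running state data
    raw_indoor: Dict[str, Any],        # S1: indoor environment data
    raw_outdoor: Dict[str, Any],       # S1: outdoor environment data
    clean_device: Cleaner,             # first preset rule
    clean_indoor: Cleaner,             # second preset rule
    clean_outdoor: Cleaner,            # third preset rule
    behavior_data_set: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Clean the three inputs and append one record to the user behavior data set."""
    record = {
        "device_state": clean_device(raw_device),            # S2
        "indoor_environment": clean_indoor(raw_indoor),      # S3
        "outdoor_environment": clean_outdoor(raw_outdoor),   # S4
    }
    behavior_data_set.append(record)                         # S5
    return record


# Usage with trivial stand-in rules.
behavior: List[Dict[str, Any]] = []
preprocess(
    {"lamp": "on"}, {"light": 350}, {"wind": "NE"},
    clean_device=lambda d: {k: 1 if v == "on" else 0 for k, v in d.items()},
    clean_indoor=lambda d: {k: len(str(int(v))) for k, v in d.items()},
    clean_outdoor=lambda d: {k: {"N": 1, "NE": 2}[v] for k, v in d.items()},
    behavior_data_set=behavior,
)
print(behavior)
```

Keeping the three cleaning rules as separate callables mirrors the three independent data cleaning units of the system.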
In a preferred embodiment of the present invention, as shown in fig. 3, step S2 specifically includes:
step S21, the data preprocessing system presets and stores a first device information set, wherein the first device information set comprises information of all intelligent household devices of a user;
step S22, the data preprocessing system extracts information of all intelligent household equipment corresponding to each running state data, and generates a second equipment information set according to the extraction result;
step S23, the data preprocessing system compares the first device information set with the second device information set:
if the first device information set is equal to the second device information set, go to step S24;
if the first device information set is not equal to the second device information set, go to step S27;
step S24, the data preprocessing system makes a determination according to the running state data corresponding to each smart home device in the second device information set:
if the smart home device is a non-adjustable mode device, go to step S25;
if the smart home device is an adjustable mode device, go to step S26;
step S25, when the running state data corresponding to the smart home device indicate that the smart home device is in the on state, the data preprocessing system sets the running state data to a first numerical value and adds the first numerical value to the device state data set, and
when the running state data corresponding to the smart home device indicate that the smart home device is in the off state, sets the running state data to a second numerical value and adds the second numerical value to the device state data set;
step S26, when the running state data corresponding to the smart home device indicate that the smart home device is in the on state, the data preprocessing system sets the mode of the smart home device in the running state data to a corresponding third numerical value and adds the third numerical value and the attribute value corresponding to the mode to the device state data set, and
when the running state data indicate that the smart home device is in the off state, sets the mode to a corresponding fourth numerical value, sets the attribute value corresponding to the mode to a fifth numerical value, and adds the fourth numerical value and the fifth numerical value to the device state data set;
step S27, the data preprocessing system calculates a difference set between the first device information set and the second device information set, sets the running state data of each smart home device in the difference set to a sixth numerical value, and adds the running state data to the device state data set.
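The set comparison in steps S21 to S27 can be sketched in Python as follows. The function `build_device_state_set` and its parameters are hypothetical. As an assumption taken from the four-device example in the embodiment, devices that did report are still encoded when the two sets differ, while devices in the difference set receive the sixth value (preferably 0).

```python
from typing import Any, Callable, Dict, Set

SIXTH_VALUE = 0  # preferred value for devices that reported no running state data


def build_device_state_set(
    first_set: Set[int],              # S21: preset ids of all the user's devices
    reported: Dict[int, Any],         # raw running state data keyed by device id
    encode: Callable[[Any], Any],     # per-device encoding (steps S24 to S26)
) -> Dict[int, Any]:
    second_set = set(reported)                                   # S22
    device_state = {dev: encode(raw) for dev, raw in reported.items()}
    if first_set != second_set:                                  # S23
        for dev in first_set - second_set:                       # S27: difference set
            device_state[dev] = SIXTH_VALUE
    return device_state


# Devices 1 to 4 are preset; only devices 1 and 4 reported data.
states = build_device_state_set(
    {1, 2, 3, 4},
    {1: "on", 4: "off"},
    encode=lambda raw: 1 if raw == "on" else 0,
)
print(sorted(states.items()))  # [(1, 1), (2, 0), (3, 0), (4, 0)]
```

Filling the missing devices with a fixed value gives every record the same set of device columns, which is convenient for later data mining.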
In a preferred embodiment of the present invention, the indoor environment data includes light intensity data and a plurality of indoor detection data;
as shown in fig. 4, step S3 specifically includes:
step S31, the data preprocessing system carries out grading division on the light intensity data according to a preset magnitude, and the grading division result is expressed by a seventh value and added into the indoor environment data set;
in step S32, the data preprocessing system converts each indoor detection data according to the national standard, and the conversion result is expressed by a first value set and added to the indoor environment data set.
In a preferred embodiment of the present invention, the outdoor environment data includes visibility, weather phenomenon, wind direction, wind speed and a plurality of outdoor detection data;
as shown in fig. 5, step S4 specifically includes:
step S41, the data preprocessing system carries out grade division on visibility according to the national grading standard, and the grade division result is expressed by an eighth value and added into the outdoor environment data set;
step S42, the data preprocessing system represents the weather phenomenon with a ninth value according to the third preset rule and adds the ninth value into the outdoor environment data set;
step S43, the data preprocessing system expresses the wind direction by a tenth numerical value according to a third preset rule and adds the wind direction into the outdoor environment data set;
step S44, the data preprocessing system represents the wind speed by an eleventh numerical value according to the third preset rule and adds the eleventh numerical value to the outdoor environment data set.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A data preprocessing system, characterized in that a plurality of smart home devices are preset in a user's room, and the data preprocessing system specifically comprises:
a data acquisition module, the data acquisition module specifically comprising:
the first data acquisition unit is used for acquiring the running state data of all the intelligent household equipment used by the user;
the second data acquisition unit is connected with the first data acquisition unit and used for acquiring indoor environment data which corresponds to the operation state data and is acquired by a plurality of sensing equipment arranged in the user room;
the third data acquisition unit is connected with the first data acquisition unit and used for acquiring outdoor environment data corresponding to the running state data;
the data cleaning module is connected with the data acquisition module, and specifically comprises:
the first data cleaning unit is used for performing data cleaning on the running state data of the intelligent home according to a first preset rule, generating a corresponding equipment state data set according to a data cleaning result and outputting the corresponding equipment state data set;
the second data cleaning unit is used for carrying out data cleaning on the indoor environment data according to a second preset rule, generating a corresponding indoor environment data set according to a data cleaning result and outputting the indoor environment data set;
the third data cleaning unit is used for cleaning the outdoor environment data according to a third preset rule, generating a corresponding outdoor environment data set according to a data cleaning result and outputting the corresponding outdoor environment data set;
and the data storage module is connected with the data cleaning module and is used for adding the equipment state data set, the indoor environment data set and the outdoor environment data set into a user behavior data set and storing the data set for further data mining.
2. The data preprocessing system of claim 1, wherein the first data cleansing unit specifically comprises:
the data storage subunit is configured to store a preset first device information set, where the first device information set includes information of all the smart home devices of the user;
the first extraction subunit is used for extracting information of all the intelligent household equipment corresponding to each piece of running state data, and generating a second equipment information set according to an extraction result;
a data comparing subunit, respectively connected to the data storing subunit and the first extracting subunit, and configured to compare the first device information set with the second device information set, output a first comparison result when the first device information set is equal to the second device information set, and
output a second comparison result when the first device information set is not equal to the second device information set;
a data judging subunit, connected to the data comparing subunit, and configured to output, according to the first comparison result, a corresponding first judgment result when the operating state data corresponding to the smart home device in the second device information set indicates that the smart home device is an unadjustable mode device;
outputting a corresponding second judgment result when the running state data corresponding to the intelligent home equipment in the second equipment information set represents that the intelligent home equipment is adjustable mode equipment;
a first processing subunit, connected to the data determining subunit, and configured to set, according to the first determination result and when the operation state data corresponding to the smart home device indicates that the smart home device is in an on state, the operation state data as a first numerical value and add the first numerical value to the device state data set, and
when the running state data corresponding to the intelligent home equipment indicate that the intelligent home equipment is in a closed state, setting the running state data as a second numerical value and adding the second numerical value into the equipment state data set;
a second processing subunit, connected to the data determining subunit, and configured to set, according to the second determination result and when the operation state data corresponding to the smart home device indicates that the smart home device is in an on state, a mode in which the smart home device is located in the operation state data to a corresponding third numerical value, and add the third numerical value and an attribute value corresponding to the mode to the device state data set,
when the running state data shows that the intelligent household equipment is in a closed state, setting the mode as a corresponding fourth numerical value, setting an attribute value corresponding to the mode as a fifth numerical value, and adding the fourth numerical value and the fifth numerical value into the equipment state data set;
and the third processing subunit is connected with the data comparison subunit, and is configured to calculate a difference set between the first device information set and the second device information set according to the second comparison result, set the operating state data of each smart home device in the difference set to a sixth numerical value, and add the operating state data to the device state data set.
3. The data preprocessing system of claim 2 wherein the first value is 1 and the second, fourth, fifth and sixth values are all 0.
4. The data preprocessing system of claim 1 wherein the indoor environmental data includes light intensity data and a plurality of indoor detection data;
the second data cleansing unit specifically includes:
the fourth processing subunit is used for carrying out grading division on the light intensity data according to a preset magnitude, representing a grading division result by a seventh value and adding the grading division result into the indoor environment data set;
and the fifth processing subunit is used for converting each indoor detection data according to the national standard, representing the conversion result by using a first numerical value set and adding the conversion result into the indoor environment data set.
5. The data preprocessing system of claim 4, wherein the preset magnitude is the order of ones or the order of tens.
6. The data pre-processing system of claim 1, wherein the outdoor environmental data comprises visibility, weather phenomena, wind direction, wind speed, and a number of outdoor detection data;
the third data cleaning unit specifically includes:
the sixth processing subunit is configured to perform level division on the visibility according to a national classification standard, and add a level division result to the outdoor environment data set, where the level division result is represented by an eighth value;
the seventh processing subunit is configured to represent the weather phenomenon by a ninth value according to the third preset rule and add the ninth value to the outdoor environment data set;
the eighth processing subunit is configured to represent the wind direction by a tenth numerical value according to the third preset rule and add the tenth numerical value to the outdoor environment data set;
and the ninth processing subunit is used for representing the wind speed by an eleventh numerical value according to the third preset rule and adding the wind speed into the outdoor environment data set.
7. A data preprocessing method applied to the data preprocessing system according to any one of claims 1 to 6, the data preprocessing method specifically comprising:
step S1, the data preprocessing system acquires the running state data of all intelligent household equipment used by a user, and a plurality of sensing equipment arranged in the user room acquire indoor environment data corresponding to the running state data and outdoor environment data corresponding to the running state data;
step S2, the data preprocessing system carries out data cleaning on the running state data of the smart home according to a first preset rule, and generates a corresponding equipment state data set according to a data cleaning result;
step S3, the data preprocessing system carries out data cleaning on the indoor environment data according to a second preset rule and generates a corresponding indoor environment data set according to a data cleaning result;
step S4, the data preprocessing system carries out data cleaning on the outdoor environment data according to a third preset rule and generates a corresponding outdoor environment data set according to a data cleaning result;
step S5, the data preprocessing system adds the device state data set, the indoor environment data set, and the outdoor environment data set to a user behavior data set and stores them for further data mining.
8. The data preprocessing method according to claim 7, wherein the step S2 specifically includes:
step S21, the data preprocessing system presets and stores a first device information set, wherein the first device information set comprises information of all the intelligent household devices of the user;
step S22, the data preprocessing system extracts information of all the intelligent household equipment corresponding to the running state data, and generates a second equipment information set according to the extraction result;
step S23, the data preprocessing system compares the first set of device information with the second set of device information:
if the first device information set is equal to the second device information set, go to step S24;
if the first device information set is not equal to the second device information set, go to step S27;
step S24, the data preprocessing system determines, according to the operation state data corresponding to the smart home devices in the second device information set:
if the intelligent household equipment is the equipment in the non-adjustable mode, turning to the step S25;
if the intelligent household equipment is adjustable mode equipment, turning to step S26;
step S25, when the running state data corresponding to the intelligent household equipment shows that the intelligent household equipment is in the starting state, the data preprocessing system sets the running state data as a first numerical value and adds the first numerical value into the equipment state data set, and
when the operation state data indicate that the intelligent household equipment is in a closed state, setting the operation state data as a second numerical value and adding the second numerical value into the equipment state data set;
step S26, when the running state data corresponding to the smart home devices indicate that the smart home devices are in the open state, the data preprocessing system sets the mode of the smart home devices in the running state data to be a corresponding third numerical value, and adds the third numerical value and the attribute value corresponding to the mode to the device state data set,
when the running state data shows that the intelligent household equipment is in a closed state, setting the mode as a corresponding fourth numerical value, setting an attribute value corresponding to the mode as a fifth numerical value, and adding the fourth numerical value and the fifth numerical value into the equipment state data set;
step S27, the data preprocessing system calculates a difference set between the first device information set and the second device information set, sets the operating state data of each smart home device in the difference set to a sixth value, and adds the operating state data to the device state data set.
9. The data preprocessing method of claim 7, wherein the indoor environment data includes light intensity data and a plurality of indoor detection data;
the step S3 specifically includes:
step S31, the data preprocessing system carries out grading division on the light intensity data according to a preset magnitude, and a grading division result is represented by a seventh numerical value and added into the indoor environment data set;
step S32, the data preprocessing system converts each of the indoor detection data according to the national standard, and represents and adds the conversion result to the indoor environment data set by using a first value set.
10. The data preprocessing method as claimed in claim 7, wherein the outdoor environment data includes visibility, weather phenomenon, wind direction, wind speed and a number of outdoor detection data;
the step S4 specifically includes:
step S41, the data preprocessing system carries out grade division on the visibility according to the national grading standard, and the grade division result is expressed by an eighth numerical value and added into the outdoor environment data set;
step S42, the data preprocessing system represents the weather phenomenon with a ninth value according to the third preset rule and adds the ninth value into the outdoor environment data set;
step S43, the data preprocessing system represents the wind direction by a tenth numerical value according to the third preset rule and adds the wind direction into the outdoor environment data set;
and step S44, the data preprocessing system represents the wind speed by an eleventh numerical value according to the third preset rule and adds the wind speed into the outdoor environment data set.
CN201910785781.2A 2019-08-23 2019-08-23 Data preprocessing system and method Expired - Fee Related CN110597795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785781.2A CN110597795B (en) 2019-08-23 2019-08-23 Data preprocessing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910785781.2A CN110597795B (en) 2019-08-23 2019-08-23 Data preprocessing system and method

Publications (2)

Publication Number Publication Date
CN110597795A (en) 2019-12-20
CN110597795B (en) 2022-06-10

Family

ID=68855369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785781.2A Expired - Fee Related CN110597795B (en) 2019-08-23 2019-08-23 Data preprocessing system and method

Country Status (1)

Country Link
CN (1) CN110597795B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070109301A1 (en) * 2005-11-17 2007-05-17 Donville Smith Data analysis applications
US20120072029A1 (en) * 2010-09-20 2012-03-22 Heatvu Inc. Intelligent system and method for detecting and diagnosing faults in heating, ventilating and air conditioning (hvac) equipment
CN104580424A (en) * 2014-12-26 2015-04-29 珠海格力电器股份有限公司 Data reporting method and device of intelligent home system
CN104656615A (en) * 2015-01-04 2015-05-27 常州市武进区半导体照明应用技术研究院 Environment control equipment, learning method and control method thereof
CN106091255A (en) * 2016-06-16 2016-11-09 安庆市银瑞商贸有限公司 A kind of domestic air conditioner intelligent temperature-controlling system
CN106202457A (en) * 2016-07-17 2016-12-07 合肥赑歌数据科技有限公司 A kind of distributed big data schema method
CN106302041A (en) * 2016-08-05 2017-01-04 深圳博科智能科技有限公司 A kind of intelligent home equipment control method and device
CN109405195A (en) * 2018-10-31 2019-03-01 四川长虹电器股份有限公司 Air conditioner intelligent control system and method
CN109905489A (en) * 2019-04-01 2019-06-18 重庆大学 Multi-sensor data relevance processing method and system based on data anastomosing algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周楠树: "基于可穿戴设备的无监督室内/室外场景探测方法", 《计算机应用研究》 *

Also Published As

Publication number Publication date
CN110597795B (en) 2022-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220610