CN116719799A - Environment-friendly data management method, device, computer equipment and storage medium - Google Patents

Environment-friendly data management method, device, computer equipment and storage medium

Info

Publication number
CN116719799A
Authority
CN
China
Prior art keywords
data
environmental protection
environment
friendly
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310504499.9A
Other languages
Chinese (zh)
Inventor
朱宏权
胡超
陈国金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shencai Technology Co ltd
Original Assignee
Shencai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shencai Technology Co ltd
Priority to CN202310504499.9A
Publication of CN116719799A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to an environment-friendly data management method, an environment-friendly data management device, computer equipment and a storage medium. The method comprises the following steps: acquiring original environment-friendly data; carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse; performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report; and repairing the environmental protection data in the data warehouse according to the quality evaluation report. The method can improve the quality of the environment-friendly data.

Description

Environment-friendly data management method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of environmental protection technology, and in particular, to an environmental protection data management method, an environmental protection data management device, a computer device, a storage medium, and a computer program product.
Background
With the development of environmental protection informatization, data volumes keep growing and the complexity of data services is rising rapidly, so the governance of environmental protection data is becoming increasingly important.
The conventional approach is to physically integrate the data of each business system.
However, the environmental protection data obtained with current processing methods is of low quality.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an environmental protection data governance method, apparatus, computer device, computer readable storage medium and computer program product that can improve the quality of environmental protection data.
In a first aspect, the present application provides an environmental data governance method. The method comprises the following steps:
acquiring original environment-friendly data;
carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse;
performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and repairing the environmental protection data in the data warehouse according to the quality evaluation report.
In one embodiment, the quality assessment of the environmental protection data in the data warehouse according to the preset environmental protection data standard comprises:
and according to a preset environmental protection data standard, evaluating the accuracy, the integrity, the consistency and the timeliness of the environmental protection data in the data warehouse.
In one embodiment, repairing environmental protection data in a data warehouse based on a quality assessment report includes:
acquiring, from the quality evaluation report, the defects existing in the environmental protection data in the data warehouse;
repairing the defects existing in the environmental protection data to obtain the repaired environmental protection data;
and monitoring the data quality of the repaired environment-friendly data.
In one embodiment, obtaining raw environmental protection data includes:
raw environmental protection data is obtained from a plurality of data sources by means of full extraction or incremental extraction.
In one embodiment, the integrated processing of the raw environmental data includes:
performing data cleaning on the original environmental protection data to obtain cleaned environmental protection data;
carrying out data processing on the cleaned environment-friendly data to obtain processed environment-friendly data;
and performing data conversion on the processed environment-friendly data to obtain integrated environment-friendly data.
In one embodiment, the method further comprises:
and sending the repaired environment-friendly data to a data service packaging platform, wherein the data service packaging platform is used for calling a preset interface, and sending the repaired environment-friendly data to a terminal on the data management platform, which has completed data service registration, through the preset interface.
In a second aspect, the application also provides an environmental protection data management device. The device comprises:
the data acquisition module is used for acquiring original environment-friendly data;
the data integration module is used for carrying out integrated processing on the original environment-friendly data and storing the integrated environment-friendly data into a data warehouse;
the data evaluation module is used for carrying out quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and the data restoration module is used for restoring the environment-friendly data in the data warehouse according to the quality evaluation report.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring original environment-friendly data;
carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse;
performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and repairing the environmental protection data in the data warehouse according to the quality evaluation report.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring original environment-friendly data;
carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse;
performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and repairing the environmental protection data in the data warehouse according to the quality evaluation report.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring original environment-friendly data;
carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse;
performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and repairing the environmental protection data in the data warehouse according to the quality evaluation report.
With the above environmental protection data governance method, device, computer equipment, storage medium and computer program product, the acquired original environmental protection data is integrated, and the quality of the integrated environmental protection data is evaluated and repaired. The environmental protection data is thereby effectively governed: its quality is improved and its standards are unified, which in turn improves the accuracy of environmental protection data analysis results and the developability of the environmental protection data.
Drawings
FIG. 1 is an application environment diagram of an environmental data governance method in one embodiment;
FIG. 2 is a flow diagram of an environmental protection data governance method in one embodiment;
FIG. 3 is a schematic diagram of a data platform architecture for data governance in one embodiment;
FIG. 4 is a flow chart of steps for repairing environmental protection data in a data warehouse based on a quality assessment report in one embodiment;
FIG. 5 is a schematic diagram of a data quality assessment and repair process in one embodiment;
FIG. 6 is a flowchart illustrating an integrated processing step performed on original environment-friendly data in one embodiment;
FIG. 7 is a schematic diagram of a data integration ETL flow in one embodiment;
FIG. 8 is a schematic diagram of an open flow of data services in one embodiment;
FIG. 9 is a flow chart of an environmental protection data governance method according to another embodiment;
FIG. 10 is a block diagram of an environmental protection data governance device in one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The environmental protection data governance method provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The server 104 acquires the original environmental protection data sent by the terminal 102, performs integrated processing on it, and stores the integrated data in the data warehouse. The server 104 then performs quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard and generates a quality evaluation report, and repairs the environmental protection data in the data warehouse according to that report, so that the environmental protection data is effectively governed. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, an environmental protection data governance method is provided. The method is described as applied to the server in FIG. 1 and includes the following steps:
Step 202, obtaining original environmental protection data.
The original environmental protection data refers to the data needed in ecological environment governance, the data generated by environmental business processing, and the like.
Specifically, the server acquires original environment-friendly data from a data source, and the original environment-friendly data can be divided into internal data and external data according to different data sources. Internal data refers to data in various business systems and other systems of ecological environmental governance, such as pollution discharge permission data, project approval data, credit evaluation data, and the like. The external data refers to environment-friendly data acquired when interacting with the outside in ecological environment management, and can be data acquired from the Internet and the Internet of things, such as industrial and commercial personnel data, water conservancy facility data, meteorological wind field data, electricity consumption data, site dust data, pollution control data and the like.
Step 204, performing integrated processing on the original environmental protection data, and storing the integrated environmental protection data in a data warehouse.
Integrated processing means logically or physically bringing together data of different sources, formats and characteristics, so that data of multiple types is collected and integrated; this resolves problems such as data redundancy and inconsistency and facilitates subsequent data sharing. The data warehouse stores data and is built to support data analysis and decision-making: it screens and integrates various kinds of data and can guide business process improvement as well as cost, quality and control. It is usually connected to a business system and stores and processes the business data that the system generates in the course of business activities.
Specifically, the server integrates and unifies the original environmental protection data acquired from the data sources, identifies problems such as conflicts and redundancy among the data, screens the data and unifies its format to obtain the integrated data, and stores the integrated data in a target library of the data warehouse.
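The integration step can be illustrated with a minimal sketch. The field names (`permit_id`, `emission`, `unit`) and the convention of converting kilograms to tonnes are assumptions made for illustration only and do not come from the application.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

def integrate(sources: List[List[Record]]) -> Tuple[List[Record], List[str]]:
    """Merge records from several sources, unify units, and flag conflicts."""
    merged: Dict[str, Record] = {}
    conflicts: List[str] = []
    for source in sources:
        for rec in source:
            # Format unification: express every emission in tonnes (assumed convention).
            value = float(rec["emission"])
            if rec.get("unit") == "kg":
                value /= 1000.0
            key = str(rec["permit_id"])
            unified = {"permit_id": key, "emission": value, "unit": "t"}
            if key not in merged:
                merged[key] = unified
            elif merged[key]["emission"] != value:
                # Same key but a different value: record the conflict for later review.
                conflicts.append(key)
            # Same key and same value is a redundant copy and is simply dropped.
    return list(merged.values()), conflicts

system_a = [{"permit_id": "P1", "emission": 1500, "unit": "kg"},
            {"permit_id": "P2", "emission": 2.0, "unit": "t"}]
system_b = [{"permit_id": "P1", "emission": 1.5, "unit": "t"},   # redundant copy
            {"permit_id": "P2", "emission": 2.5, "unit": "t"}]   # conflicting value
integrated, conflicts = integrate([system_a, system_b])
```

In this sketch the first value seen for a key is kept and the conflict is only reported; how conflicts are actually resolved is not specified in the application.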
Step 206, performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report.
The environmental protection data standard refers to a data management standard formed in the course of environmental protection data governance according to laws and regulations, industry standards, business standards, index standards and the like, so that business personnel and technicians have a standard to rely on when they face data standard problems. It may include the standard definition, standard mapping, standard execution and the like of data. Quality evaluation refers to the method and process of evaluating the quality of environmental protection data.
Specifically, the server acquires a preset environmental protection data standard, performs quality assessment on environmental protection data in the data warehouse according to the preset environmental protection data standard, and generates a corresponding quality assessment report according to an assessment result.
Further, the quality evaluation methods employed in the present embodiment may include the analytic hierarchy process, the simple ratio method, and the maximum/minimum value method.
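The application names these methods without defining them. The following minimal sketch assumes the definitions commonly used in data quality assessment: the simple ratio is one minus the share of violating records, and the minimum operation takes the weakest dimension as a conservative overall score. The analytic hierarchy process, which derives weights from pairwise comparisons, is not shown. All function names and figures are hypothetical.

```python
from typing import Dict

def simple_ratio(violations: int, total: int) -> float:
    """Share of records that pass a check: 1 - violations / total."""
    if total == 0:
        return 1.0  # assumption: an empty data set has nothing to violate
    return 1.0 - violations / total

def min_score(dimension_scores: Dict[str, float]) -> float:
    """Conservative overall score: the weakest dimension dominates."""
    return min(dimension_scores.values())

scores = {
    "accuracy": simple_ratio(violations=5, total=100),
    "integrity": simple_ratio(violations=20, total=100),
    "consistency": simple_ratio(violations=0, total=100),
}
overall = min_score(scores)  # driven by the integrity dimension here
```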
Step 208, repairing the environmental protection data in the data warehouse according to the quality evaluation report.
Repair processing refers to repairing defects in the environmental protection data and completing the data, thereby further improving data quality.
Specifically, the server obtains from the quality evaluation report the defects and problems that still exist in the environmental protection data and resolves them with a corresponding method; for example, when a null value exists or a value exceeds the constraint range, the environmental protection data can be deleted or filled in manually. Finally, the quality of the repaired data is monitored.
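A minimal sketch of the delete-or-fill repair described above follows. The `ph` field, the range of 0 to 14 and the `fill_value` parameter, which stands in for a manually supplied value, are assumptions for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

def repair(records: List[Record], field: str, low: float, high: float,
           fill_value: Optional[float] = None) -> List[Record]:
    """Delete or fill records whose field is null or outside [low, high]."""
    repaired: List[Record] = []
    for rec in records:
        value = rec.get(field)
        bad = value is None or not (low <= value <= high)
        if not bad:
            repaired.append(dict(rec))
        elif fill_value is not None:
            # Filling: fill_value stands in for a manually supplied value.
            repaired.append({**rec, field: fill_value})
        # Otherwise the offending record is deleted (not copied over).
    return repaired

readings = [{"ph": 7.2}, {"ph": None}, {"ph": 19.0}]
deleted = repair(readings, "ph", low=0.0, high=14.0)
filled = repair(readings, "ph", low=0.0, high=14.0, fill_value=7.0)
```

The function returns a new list and leaves the input untouched, so the original records remain available for comparison with the repaired ones.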
Further, the server can send the repaired data to the display terminal for display, so that industry technicians can perform data analysis, data mining and other works.
It should be noted that current environmental protection data governance schemes only physically integrate the data of each business system; they perform no data cleaning, lack a unified data standard, and cannot provide functions such as data services. Without a unified data standard, the systems hold redundant data, and data with the same business meaning is named differently in different systems. The resulting poor data quality makes the data hard to apply and prevents data sharing. A feedback mechanism is also lacking, so data quality problems cannot be summarized in time and persist for a long time.
With the above environmental protection data governance method, the acquired original environmental protection data is integrated, and the quality of the integrated environmental protection data is evaluated and repaired. The environmental protection data is thereby effectively governed: its quality is improved and its standards are unified, which in turn improves the accuracy of environmental protection data analysis results and the developability of the environmental protection data.
In one embodiment, a data platform is integrated in the server, and data governance can be performed on the original environmental protection data through the data platform. A schematic diagram of the data platform architecture for data governance is shown in FIG. 3. The data sources include internal data and external data. The internal data includes data of business systems and of other systems, where the other systems are internal systems related to data governance. The business systems exchange data with the business data warehouse, and the other systems interact with the big data platform. The external data includes data from the Internet and the Internet of Things.
The data platform comprises a business data warehouse and a big data platform. The business data warehouse is the data warehouse for business data; it comprises a source-aligned data layer, a data collection layer and a data subject layer, and can perform integrated processing on the data. The big data platform structures the data of the other systems and the external data, and stores the resulting structured data in the business data warehouse.
The data management platform covers the monitoring center, data task scheduling management, data quality management and data standard management. The monitoring center monitors the data flow. Data task scheduling management schedules the processing tasks executed on the data. Data quality management manages the quality of the data according to a preset quality standard. Data standard management manages the quality evaluation standards for the data. Metadata management is a key component of an enterprise-level data warehouse and runs through the whole lifecycle of the data warehouse; metadata drives the development of the data warehouse and makes the data warehouse automated and visualizable.
The data in the data warehouse can be provided externally in two ways, through a data mart or through a data interface. The provided data can be presented as data reports, used for data analysis and mining, or applied in other systems.
In the embodiment, the environment-friendly data is managed through the data platform, so that the quality of the environment-friendly data can be improved, and the sharing of the environment-friendly data is realized through the data service application.
In one embodiment, quality assessment of environmental data in a data warehouse according to preset environmental data criteria includes: and according to a preset environmental protection data standard, evaluating the accuracy, the integrity, the consistency and the timeliness of the environmental protection data in the data warehouse.
The accuracy of environmental protection data refers to whether the information in a data record is abnormal or erroneous, for example a wrong field value, a missing value or a null value. The integrity of environmental protection data describes the degree to which data information is missing; missing data can be divided into missing data records and missing field information. The consistency of environmental protection data refers to whether items of environmental protection data conflict with one another. The timeliness of environmental protection data refers to whether the data is available when needed; timeliness is directly related to an enterprise's data processing speed and efficiency and is a key indicator affecting business processing.
Specifically, the preset environmental protection data standard yields evaluation constraint rules for evaluating the accuracy, integrity, consistency and timeliness of the data. The evaluation rules comprise non-null constraint rules, value range constraint rules, code constraint rules, lexical constraint rules, logical constraint rules, integrity constraint rules, equivalence consistency constraint rules, logical consistency constraint rules and timeliness constraint rules. Rule matching is performed on the environmental protection data according to these rules, thereby evaluating the accuracy, integrity, consistency and timeliness of the environmental protection data in the data warehouse.
The accuracy of the environmental protection data is evaluated according to the non-null constraint rule, the value range constraint rule, the code constraint rule, the lexical constraint rule and the logical constraint rule. For example, when a field value of certain data is wrong or missing, the data has an accuracy defect.
The integrity of the environmental protection data is evaluated according to the integrity constraint rule. For example, when a data record or a field information record is missing, the data has an integrity defect.
The consistency of the data is evaluated according to the equivalence consistency constraint rule and the logical consistency constraint rule. For example, when two data items defined identically in different systems have contradictory values, the data has a consistency defect.
The timeliness of the data is evaluated according to the timeliness constraint rule. For example, if the relevant data cannot be retrieved from the database when it is needed, the data has a timeliness defect.
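The grouping of rules by quality dimension can be sketched as a small rule registry that produces a report. This is an illustrative sketch only; the rule names, record fields and report layout are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, str, Callable[[Record], bool]]  # (dimension, rule name, check)

RULES: List[Rule] = [
    ("accuracy", "non_null_name", lambda r: r.get("name") not in (None, "")),
    ("accuracy", "value_range_ph", lambda r: r.get("ph") is None or 0 <= r["ph"] <= 14),
    ("consistency", "total_equals_parts",
     lambda r: r.get("total") == r.get("part_a", 0) + r.get("part_b", 0)),
]

def build_report(records: List[Record]) -> Dict[str, List[Tuple[int, str]]]:
    """Group rule failures by quality dimension as (record index, rule name)."""
    report: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for index, rec in enumerate(records):
        for dimension, name, check in RULES:
            if not check(rec):
                report[dimension].append((index, name))
    return dict(report)

records = [
    {"name": "Plant A", "ph": 7.0, "total": 10, "part_a": 4, "part_b": 6},
    {"name": "", "ph": 15.2, "total": 10, "part_a": 4, "part_b": 5},
]
report = build_report(records)
```

Keeping the dimension next to each rule lets the report state directly which quality dimension a failing record affects.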
The following explains the above rules:
The non-null constraint rule checks whether a data value is null. For example, when a customer opens an account, the customer name is a required item, so this field must not be empty. The field to be checked is configured, SQL is used to query for rows in which the column value is empty, and the rows found are then rectified.
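As an illustration of such a query, the following minimal sketch uses Python's built-in `sqlite3` module. The `customer` table and its columns are hypothetical, and treating an empty string like a null value is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Plant A"), (2, None), (3, "")],
)

# Rows violating the non-null rule: NULL names or names that are blank.
null_rows = conn.execute(
    "SELECT id FROM customer WHERE name IS NULL OR TRIM(name) = '' ORDER BY id"
).fetchall()
```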
The value range constraint rule requires that an attribute value of the data be one of a defined set of enumerated values; for example, the main type and subtype of a contract must be enumerated values defined in the contract type base data.
The code constraint rule checks whether a code value of the data is in the corresponding code table.
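Both checks reduce to a membership test against reference data, as the following minimal sketch shows. The enumerated contract types and the region code table are hypothetical; the application does not list actual values.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]

# Hypothetical reference data, for illustration only.
CONTRACT_MAIN_TYPES: Set[str] = {"purchase", "service"}
REGION_CODE_TABLE: Dict[str, str] = {"R01": "Region 1", "R02": "Region 2"}

def violations(records: List[Record], field: str, allowed: Iterable[str]) -> List[int]:
    """Return indices of records whose field value is not an allowed value."""
    allowed_set = set(allowed)
    return [i for i, rec in enumerate(records) if rec.get(field) not in allowed_set]

contracts = [
    {"main_type": "purchase", "region_code": "R01"},
    {"main_type": "lease", "region_code": "R02"},     # not an enumerated type
    {"main_type": "service", "region_code": "R09"},   # code missing from the table
]
range_violations = violations(contracts, "main_type", CONTRACT_MAIN_TYPES)
# Iterating over a dict yields its keys, i.e. the valid codes.
code_violations = violations(contracts, "region_code", REGION_CODE_TABLE)
```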
The lexical constraint rule checks whether the value of the data is entered and stored according to the required format and specification.
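A format check of this kind is often expressed as a regular expression. The sketch below assumes, purely for illustration, that dates must be stored in the form `YYYY-MM-DD`; it checks only the shape of the value, not whether the date exists in the calendar.

```python
import re
from typing import List

# Assumed storage convention: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def lexical_violations(values: List[str]) -> List[str]:
    """Return the values that do not match the required format."""
    return [v for v in values if DATE_PATTERN.fullmatch(v) is None]

bad_values = lexical_violations(["2023-05-06", "2023/5/6", "06-05-2023"])
```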
The logical constraint rule is a constraint on the logical relationship between data values. The data value of one check object must satisfy a given logical relationship, for example greater than or less than, with the data value of another check object.
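A minimal sketch of such a relationship check follows, assuming a hypothetical permit record whose start date must precede its end date.

```python
from datetime import date
from typing import Dict

def satisfies_order(record: Dict[str, date], earlier: str, later: str) -> bool:
    """Logical rule: the value of `earlier` must be strictly less than `later`."""
    return record[earlier] < record[later]

valid_permit = {"valid_from": date(2023, 1, 1), "valid_to": date(2025, 12, 31)}
broken_permit = {"valid_from": date(2024, 6, 1), "valid_to": date(2023, 6, 1)}
checks = [satisfies_order(p, "valid_from", "valid_to")
          for p in (valid_permit, broken_permit)]
```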
The integrity constraint rule restricts and controls the entity integrity, referential integrity and user-defined integrity of the data when a user inserts, modifies or deletes data, in order to prevent data that does not meet the specification from entering the database.
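The three kinds of integrity correspond to standard relational constraints. The sketch below uses Python's built-in `sqlite3` module with hypothetical `enterprise` and `permit` tables: a primary key for entity integrity, a foreign key for referential integrity, and a `CHECK` clause as an example of user-defined integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE enterprise (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE permit (
        id INTEGER PRIMARY KEY,                                    -- entity integrity
        enterprise_id INTEGER NOT NULL REFERENCES enterprise(id),  -- referential integrity
        annual_limit REAL CHECK (annual_limit >= 0)                -- user-defined integrity
    )
""")
conn.execute("INSERT INTO enterprise VALUES (1, 'Plant A')")
conn.execute("INSERT INTO permit VALUES (10, 1, 120.0)")

rejected = []
for row in [(10, 1, 50.0),    # duplicate primary key
            (11, 99, 50.0),   # enterprise 99 does not exist
            (12, 1, -5.0)]:   # negative limit violates the CHECK clause
    try:
        conn.execute("INSERT INTO permit VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

permit_count = conn.execute("SELECT COUNT(*) FROM permit").fetchone()[0]
```

Declaring the constraints in the schema means the database itself rejects non-conforming rows at insert time, which matches the stated goal of keeping such data out of the database.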
The equivalence consistency constraint rule is a constraint on data values across data items: under a given rule, one data value must equal one or more other data values.
The logical consistency constraint rule concerns the mutual constraint relationships between fields and the consistency of the data in terms of data structure, data format and attribute encoding.
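One common instance of an equivalence check is that a reported total must equal the sum of its parts. The following minimal sketch assumes a hypothetical annual figure reported alongside quarterly figures and uses a small absolute tolerance for floating-point comparison.

```python
import math
from typing import Sequence

def is_equivalent(total: float, parts: Sequence[float], tol: float = 1e-9) -> bool:
    """Equivalence rule: the total must equal the sum of its parts within tol."""
    return math.isclose(total, sum(parts), rel_tol=0.0, abs_tol=tol)

consistent = is_equivalent(10.0, [2.5, 2.5, 2.5, 2.5])
inconsistent = is_equivalent(10.0, [2.5, 2.5])
```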
The timeliness constraint rule checks whether the data reflects the current state of the corresponding actual business in time. Timeliness problems arise from factors such as multiple systems and communications, and typically require manual verification by business or system personnel.
In this embodiment, through the preset environmental protection data standard, the accuracy, the integrity, the consistency and the timeliness of the environmental protection data in the data warehouse are evaluated, so that the quality problem of the environmental protection data in the data warehouse can be found, and a basis is provided for the subsequent restoration of the environmental protection data, thereby improving the quality of the environmental protection data.
In one embodiment, as shown in FIG. 4, repairing environmental protection data in a data warehouse based on a quality assessment report includes:
Step 402, obtaining from the quality evaluation report the defects existing in the environmental protection data in the data warehouse.
Specifically, the quality evaluation report indicates the problems of the environmental protection data with respect to accuracy, integrity, consistency and timeliness, that is, the defects of the environmental protection data.
Step 404, repairing the defects existing in the environmental protection data to obtain repaired environmental protection data.
Specifically, a corresponding repair strategy is obtained according to the defects of the environmental protection data, and the problems noted in the quality evaluation report regarding accuracy, integrity, consistency and timeliness are repaired accordingly.
Step 406, monitoring the data quality of the repaired environmental protection data.
Quality monitoring is the process of monitoring the repaired data in real time against the existing data standard.
Specifically, the repaired environmental protection data remains subject to the preset environmental protection data standard. Because repairing environmental protection data is a dynamic process, new evaluation rules and standards may emerge during repair and the evaluation rules and standards are updated accordingly; environmental protection data repaired under the old rules may then fail to meet the new rules. Quality monitoring therefore has to be performed continuously on the repaired environmental protection data according to the latest evaluation rules and standards.
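The need to re-check repaired data whenever the rules change can be sketched as follows. The `QualityMonitor` class, the rule name and the pH thresholds are hypothetical and serve only to illustrate the idea.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], bool]

class QualityMonitor:
    """Re-checks repaired data against whatever rules are current."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}

    def update_rule(self, name: str, rule: Rule) -> None:
        # Adding or replacing a rule takes effect on the next check.
        self.rules[name] = rule

    def check(self, records: List[Record]) -> List[int]:
        """Return indices of records that fail at least one current rule."""
        return [i for i, rec in enumerate(records)
                if not all(rule(rec) for rule in self.rules.values())]

monitor = QualityMonitor()
monitor.update_rule("ph_range", lambda r: 0 <= r["ph"] <= 14)
repaired = [{"ph": 7.0}, {"ph": 5.0}]
before = monitor.check(repaired)   # evaluated under the old rule
monitor.update_rule("ph_range", lambda r: 6 <= r["ph"] <= 9)  # stricter rule (hypothetical)
after = monitor.check(repaired)    # evaluated under the updated rule
```

Data that passed under the old rule set is flagged again once the rule is tightened, which is why monitoring is described as continuous rather than a one-off step.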
In this embodiment, the quality of the environmental protection data can be improved by repairing the defects of the environmental protection data in the data warehouse and monitoring the data quality of the repaired data.
In one embodiment, as shown in FIG. 5, the data quality assessment and repair process, wherein the data quality assessment indicators include accuracy, integrity, consistency, and timeliness of the data. The quality evaluation is carried out on the service data according to the quality evaluation rule, and the accuracy of the environment-friendly data can be evaluated according to the non-empty constraint rule, the value domain constraint rule, the code constraint rule, the lexical constraint rule and the logic constraint rule; evaluating the integrity of the environmental protection data according to the integrity constraint rule; evaluating the consistency of the data according to the equivalent consistency constraint rule and the logic consistency constraint rule; and evaluating the timeliness of the data according to the timeliness constraint rule. Quality assessment methods include analytic hierarchy process, simple ratio process, and maximum/minimum value process.
After the quality evaluation is completed, a corresponding data quality analysis report is generated, and the data is repaired intelligently or manually according to the problems identified in the report. The repair involves quality monitoring, active repair, quality improvement, rule accumulation and process dynamics. Quality monitoring means monitoring the quality of the repaired data in real time; active repair means that technicians can repair data vulnerabilities on their own initiative; quality improvement refers to the process of improving data quality; rule accumulation means that new quality evaluation rules and standards may arise during data repair and are collected and summarized; process dynamics means that the quality repair process is dynamic, for example through iterative updates of the evaluation rules.
In this embodiment, the quality of the environmental protection data can be improved by performing quality evaluation on the service data and performing corresponding repair processing.
In one embodiment, obtaining raw environmental protection data includes: raw environmental protection data is obtained from a plurality of data sources by means of full extraction or incremental extraction.
Full extraction refers to extracting the data in its entirety, similar to data migration or data replication. Incremental extraction refers to extracting only the data that has been added or modified in the table to be extracted since the last extraction.
Specifically, for data from a service system or another system that has not yet been processed, the original environmental protection data can be acquired by full extraction: the table or view data in the data source is extracted as is and then copied to the target database. If the target database already contains data, the existing data is overwritten. For data that has already been processed, where only some related data has been added on top of the original data, the original environmental protection data can be acquired by incremental extraction. Further, the incremental extraction methods adopted in this embodiment may include a full table comparison method, a trigger method, a timestamp method, a full table delete-and-insert method, and the like.
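The difference between full extraction and timestamp-based incremental extraction can be sketched with two in-memory SQLite databases. The table name `readings`, its columns and the ISO-8601 text timestamps are assumptions for illustration only; the application does not prescribe a schema or a database product.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical readings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL, updated_at TEXT)"
    )
    return conn


def full_extract(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Full extraction: copy the table as is, overwriting existing target data."""
    rows = src.execute("SELECT id, value, updated_at FROM readings").fetchall()
    dst.execute("DELETE FROM readings")
    dst.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return len(rows)


def incremental_extract(src: sqlite3.Connection, dst: sqlite3.Connection,
                        last_extracted_at: str) -> int:
    """Timestamp method: copy only rows added or modified since the last extraction."""
    rows = src.execute(
        "SELECT id, value, updated_at FROM readings WHERE updated_at > ?",
        (last_extracted_at,),
    ).fetchall()
    # INSERT OR REPLACE updates modified rows and inserts new ones by primary key.
    dst.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", rows)
    return len(rows)


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 7.1, "2023-05-01T00:00:00"), (2, 6.8, "2023-05-02T00:00:00")],
)
copied = full_extract(source, target)

# Later, one row is modified and one row is added in the source system.
source.execute(
    "UPDATE readings SET value = 7.4, updated_at = '2023-05-03T00:00:00' WHERE id = 1"
)
source.execute("INSERT INTO readings VALUES (3, 8.0, '2023-05-04T00:00:00')")
changed = incremental_extract(source, target, "2023-05-02T00:00:00")
```

The timestamp method depends on the source reliably maintaining a modification time column; it also does not detect deleted rows, which is one reason the other listed methods (full table comparison, triggers) exist.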
In this embodiment, the original environmental protection data is obtained from a plurality of data sources by means of full extraction or incremental extraction, so that the efficiency of data acquisition can be improved, and the cost of data management is saved while the data integrity is improved.
In one embodiment, as shown in FIG. 6, integrating the raw environmental protection data includes:
Step 602, data cleaning is performed on the original environmental protection data to obtain cleaned environmental protection data.
Data cleaning refers to the process of correcting or deleting inaccurate data records.
Specifically, the data warehouse can be divided into a data paste source layer, a data collection layer and a data theme layer. The original environmental protection data is cleaned in the data paste source layer of the data warehouse: incomplete, inaccurate, irrelevant or otherwise problematic data and records are identified and replaced, irrelevant data is deleted, and erroneous data is removed from the original environmental protection data, finally yielding the cleaned environmental protection data.
Further, the data cleaning modes may include discarding part of the data, completing missing data, leaving the data unprocessed, and performing true value conversion on the data.
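The four cleaning modes listed above might look like this in practice. This is a sketch under stated assumptions: the field names, the default unit and the `STATUS_CODES` table are invented for the example, and "true value conversion" is interpreted here as replacing a coded value with the value it stands for.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical code table used for true value conversion.
STATUS_CODES = {"0": "normal", "1": "exceeded"}


def clean(records: List[Record], default_unit: str = "mg/L") -> List[Record]:
    """Apply the four cleaning modes to a list of raw records."""
    cleaned: List[Record] = []
    for rec in records:
        # Mode 1, discard: a record without a station identifier cannot be used.
        if not rec.get("station_id"):
            continue
        row = dict(rec)
        # Mode 2, complete missing data: fill in an assumed default unit.
        if row.get("unit") is None:
            row["unit"] = default_unit
        # Mode 4, true value conversion: map a status code to its meaning.
        # Mode 3, no processing: values that are not codes pass through unchanged.
        row["status"] = STATUS_CODES.get(row.get("status"), row.get("status"))
        cleaned.append(row)
    return cleaned


raw = [
    {"station_id": "S1", "unit": None, "status": "1"},
    {"station_id": None, "unit": "mg/L", "status": "0"},
    {"station_id": "S2", "unit": "ug/L", "status": "normal"},
]
cleaned_records = clean(raw)
```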
Step 604, data processing is performed on the cleaned environmental protection data to obtain processed environmental protection data.
Data processing refers to the process of turning collected data into information that meets requirements. It generally comprises classifying, sorting, summarizing and calculating the data, and can also include data development.
Specifically, the cleaned environmental protection data is classified, sorted and summarized in the data paste source layer of the data warehouse. The processed environmental protection data is then sent to the data collection layer.
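A minimal sketch of the classify, sort and summarize steps follows. The grouping key (`pollutant`), the sort key (`sampled_on`) and the summary statistics are assumptions chosen for illustration; the application leaves them open.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

Record = Dict[str, object]


def process(records: List[Record]) -> Dict[str, Dict[str, object]]:
    """Classify records by pollutant, sort each class by date, and summarize it."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[str(rec["pollutant"])].append(rec)  # classification

    summary: Dict[str, Dict[str, object]] = {}
    for pollutant, rows in groups.items():
        ordered = sorted(rows, key=lambda r: str(r["sampled_on"]))  # sorting
        summary[pollutant] = {  # summarizing and calculating
            "count": len(ordered),
            "mean": mean(float(r["value"]) for r in ordered),
            "latest": ordered[-1]["sampled_on"],
        }
    return summary


readings = [
    {"pollutant": "COD", "sampled_on": "2023-05-02", "value": 20.0},
    {"pollutant": "COD", "sampled_on": "2023-05-01", "value": 10.0},
    {"pollutant": "NH3-N", "sampled_on": "2023-05-01", "value": 1.5},
]
summary = process(readings)
```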
Step 606, data conversion is performed on the processed environmental protection data to obtain integrated environmental protection data.
Data conversion refers to the process of converting data from one format or structure to another.
Specifically, the processed environmental protection data is converted to unified data names and formats in the data collection layer of the data warehouse. For example, the same data field may have a different data dictionary or data format in different business systems (e.g., a field called ids in one table may carry a different name in another), so a unified data dictionary and format must be provided in the data warehouse; the data content may be normalized and converted to text format. The converted environmental protection data is then sent to the data theme layer.
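The renaming and format unification described above can be sketched with a per-source field map. The source names (`system_a`, `system_b`), the source field names and the unified names (`station_id`, `value`) are hypothetical; only the idea of mapping differently named fields onto one dictionary and one text format is taken from the description.

```python
from typing import Dict, Optional

# Hypothetical mapping from each source system's field names to the unified names.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "system_a": {"id": "station_id", "val": "value"},
    "system_b": {"ids": "station_id", "reading": "value"},
}


def to_unified(source: str, record: Dict[str, object]) -> Dict[str, Optional[str]]:
    """Rename fields to the unified dictionary and normalize content to text."""
    mapping = FIELD_MAP[source]
    unified: Dict[str, Optional[str]] = {}
    for key, value in record.items():
        name = mapping.get(key, key)  # unmapped fields keep their original name
        unified[name] = None if value is None else str(value).strip()
    return unified


from_a = to_unified("system_a", {"id": 7, "val": 3.5})
from_b = to_unified("system_b", {"ids": " 7 ", "reading": "3.5"})
```

After conversion, records from both systems share the same keys and the same text representation, so they can be stored together in the theme layer.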
In an alternative manner of this embodiment, the data integration process may be performed through ETL (Extract-Transform-Load), as shown in FIG. 7. The data obtained from the data sources may include environmental quality data, environmental law enforcement data, pollution discharge permit data, pollution source data, etc. These data may be extracted by full extraction or incremental extraction, and some text data and unstructured files may be obtained from the intermediate repository. Data quality control, data standard matching and metadata verification are performed through data warehouse technology, where metadata verification refers to verifying the data that controls data visualization. After data cleaning, data conversion, quality inspection and rule matching are performed on the extracted data, the data is loaded and transmitted to the target database in file form.
In this embodiment, by performing data cleaning, data processing and data conversion on the environmental protection data, the quality of the environmental protection data can be improved, and the data from different sources can be stored according to a unified standard, so that an effective and scientific data set is prepared for data analysis.
In one embodiment, the method further comprises: sending the repaired environmental protection data to a data service encapsulation platform, where the data service encapsulation platform is used for calling a preset interface and sending the repaired environmental protection data, through the preset interface, to a terminal that has completed data service registration on the data management platform.
The data service encapsulation platform is a platform that encapsulates the repaired environmental protection data and can provide it to other terminals that have completed data service registration. The preset interface is a preconfigured API. The data management platform is where other terminals register services when they want to acquire the corresponding data services. Data service registration refers to registering the data services required when other users want to acquire and use environmental protection data on the platform.
Specifically, the repaired environmental protection data can be sent to the data service encapsulation platform in real time or in non-real time. A third-party terminal obtains the right to use the environmental protection data in the data service encapsulation platform by completing the corresponding data service registration on the data management platform in advance. When the terminal needs the environmental protection data, the data service encapsulation platform calls the preset interface and sends the environmental protection data to the data management platform through the corresponding API, and the terminal then acquires the environmental protection data from the data management platform. After acquiring the environmental protection data, the third-party terminal can perform data mining and analysis on it and apply it in the third party's own system.
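The registration-gated flow described above can be sketched with two small classes. Everything here is a hypothetical in-process stand-in: the class and method names are invented, and the "preset interface" is modeled as a plain method call rather than a real network API.

```python
from typing import Dict, List, Set

Record = Dict[str, object]


class DataServiceEncapsulationPlatform:
    """Holds the repaired data and exposes it through a preset interface."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def publish(self, records: List[Record]) -> None:
        """Receive repaired data (in real time or in batches)."""
        self._records = [dict(r) for r in records]

    def preset_interface(self) -> List[Record]:
        """Stand-in for the preset API: return copies of the encapsulated data."""
        return [dict(r) for r in self._records]


class DataManagementPlatform:
    """Tracks data service registrations and serves registered terminals."""

    def __init__(self, service: DataServiceEncapsulationPlatform) -> None:
        self._service = service
        self._registered: Set[str] = set()

    def register(self, terminal_id: str) -> None:
        """Complete data service registration for a terminal."""
        self._registered.add(terminal_id)

    def get_data(self, terminal_id: str) -> List[Record]:
        """Serve data only to terminals that registered in advance."""
        if terminal_id not in self._registered:
            raise PermissionError(f"terminal {terminal_id!r} has not registered")
        return self._service.preset_interface()


service = DataServiceEncapsulationPlatform()
service.publish([{"station_id": "S1", "value": 3.5}])
platform = DataManagementPlatform(service)
platform.register("terminal-1")
received = platform.get_data("terminal-1")
```

An unregistered terminal calling `get_data` raises `PermissionError` in this sketch; a real deployment would more likely return an authentication error over the API.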
In an alternative manner of this embodiment, the repaired environmental protection data is sent to the data service encapsulation platform to open up the data as a service. The data service opening flow is shown in FIG. 8: data in the database can be transmitted to the data service encapsulation platform in real time to form a data source, transmitted in non-real time to the database of the data service encapsulation platform to form a data source, or transmitted to the data service encapsulation platform in real time in code form to form a data source. Third-party applications can acquire the data of the data service encapsulation platform on the data management platform through service registration or flow authentication.
In this embodiment, by sending the repaired environmental protection data to the data service encapsulation platform, other terminals can acquire relevant environmental protection data through data service registration, so as to realize sharing and application of the environmental protection data.
In another embodiment, as shown in fig. 9, an environmental data governance method is provided, which may include the steps of:
Step 902, raw environmental protection data is obtained from a plurality of data sources by way of full or incremental extraction.
Step 904, data cleaning is performed on the original environmental protection data to obtain cleaned environmental protection data.
Step 906, data processing is performed on the cleaned environmental protection data to obtain processed environmental protection data.
Step 908, performing data conversion on the processed environmental protection data to obtain integrated environmental protection data, and storing the integrated environmental protection data in a data warehouse.
Step 910, according to a preset environmental protection data standard, evaluating accuracy, integrity, consistency and timeliness of environmental protection data in the data warehouse, and generating a quality evaluation report.
Step 912, obtaining vulnerabilities existing in the environmental protection data in the data warehouse from the quality assessment report.
Step 914, repairing the vulnerabilities existing in the environmental protection data to obtain repaired environmental protection data.
Step 916, data quality monitoring is performed on the repaired environmental protection data.
Step 918, the repaired environmental protection data is sent to a data service encapsulation platform, where the data service encapsulation platform is used for calling a preset interface and sending the repaired environmental protection data, through the preset interface, to a terminal that has completed data service registration on the data management platform.
In this embodiment, data cleaning, data processing and data conversion are performed on the acquired original environmental protection data to obtain integrated environmental protection data, and quality evaluation and data vulnerability repair are then performed on the integrated data. This achieves effective governance of the environmental protection data: it improves data quality and unifies the data standard, which in turn improves the accuracy and developability of environmental protection data analysis results. Sending the repaired environmental protection data to the data service encapsulation platform enables the environmental protection data to be shared.
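The overall flow of steps 904 to 916 can be sketched as a chain of small functions. This is a deliberately simplified illustration, not the claimed implementation: extraction, the data processing step and service publication are omitted, the only quality rule is "value must not be missing", and the `corrections` mapping stands in for whatever intelligent or manual repair supplies. All names are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]
Report = Dict[str, List[int]]


def clean(records: List[Record]) -> List[Record]:
    """Step 904 (simplified): drop records that lack a station identifier."""
    return [dict(r) for r in records if r.get("station_id")]


def convert(records: List[Record]) -> List[Record]:
    """Step 908 (simplified): convert measured values to a unified numeric type."""
    out: List[Record] = []
    for r in records:
        value = r.get("value")
        out.append({**r, "value": None if value is None else float(value)})
    return out


def evaluate(records: List[Record]) -> Report:
    """Steps 910 and 916 (simplified): report the indexes of records missing a value."""
    return {"missing_value": [i for i, r in enumerate(records) if r.get("value") is None]}


def repair(records: List[Record], report: Report,
           corrections: Dict[str, float]) -> List[Record]:
    """Steps 912-914 (simplified): fill reported gaps from supplied corrections."""
    repaired = [dict(r) for r in records]
    for i in report["missing_value"]:
        station = str(repaired[i]["station_id"])
        if station in corrections:
            repaired[i]["value"] = corrections[station]
    return repaired


raw = [
    {"station_id": "S1", "value": "3.5"},
    {"station_id": "", "value": "1.0"},
    {"station_id": "S2", "value": None},
]
warehouse = convert(clean(raw))
report = evaluate(warehouse)
repaired = repair(warehouse, report, corrections={"S2": 4.0})
remaining = evaluate(repaired)  # quality monitoring after repair
```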
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an environmental protection data management device for realizing the environmental protection data management method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the environmental protection data management device provided below may refer to the limitation of the environmental protection data management method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 10, there is provided an environmental protection data governance device comprising: a data acquisition module 1002, a data integration module 1004, a data evaluation module 1006, and a data repair module 1008, wherein:
The data acquisition module 1002 is configured to acquire original environmental protection data.
The data integration module 1004 is configured to integrate the original environmental protection data and store the integrated environmental protection data in a data warehouse.
The data evaluation module 1006 is configured to perform quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generate a quality evaluation report.
The data repair module 1008 is configured to repair the environmental protection data in the data warehouse according to the quality evaluation report.
In one embodiment, the data evaluation module 1006 is further configured to evaluate accuracy, integrity, consistency, and timeliness of the environmental protection data in the data warehouse according to a preset environmental protection data standard.
In one embodiment, the data repair module 1008 is further configured to obtain vulnerabilities existing in the environmental protection data in the data warehouse from the quality assessment report; repair the vulnerabilities existing in the environmental protection data to obtain repaired environmental protection data; and monitor the data quality of the repaired environmental protection data.
In one embodiment, the data acquisition module 1002 is further configured to acquire raw environmental protection data from a plurality of data sources by way of full or incremental extraction.
In one embodiment, the data integration module 1004 is further configured to perform data cleaning on the original environmental protection data to obtain cleaned environmental protection data; carrying out data processing on the cleaned environment-friendly data to obtain processed environment-friendly data; and performing data conversion on the processed environment-friendly data to obtain integrated environment-friendly data.
In one embodiment, the data repair module 1008 is further configured to send the repaired environmental protection data to the data service encapsulation platform, where the data service encapsulation platform is configured to invoke a preset interface and send the repaired environmental protection data, through the preset interface, to a terminal that has completed data service registration on the data management platform.
All or part of the modules in the environmental protection data management device can be implemented by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing original environmental protection data, integrated environmental protection data, repaired environmental protection data and the like. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by the processor implements an environmental data governance method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application. They are described specifically and in detail, but should not thereby be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (10)

1. An environmental protection data governance method, the method comprising:
acquiring original environment-friendly data;
carrying out integrated processing on the original environment-friendly data, and storing the integrated environment-friendly data into a data warehouse;
performing quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and repairing the environmental protection data in the data warehouse according to the quality evaluation report.
2. The method of claim 1, wherein the quality assessment of the environmental data in the data warehouse according to a preset environmental data standard comprises:
and evaluating the accuracy, the integrity, the consistency and the timeliness of the environmental protection data in the data warehouse according to the preset environmental protection data standard.
3. The method of claim 1, wherein repairing environmental protection data in the data warehouse based on the quality assessment report comprises:
acquiring loopholes existing in the environmental protection data in the data warehouse from the quality evaluation report;
repairing the loopholes existing in the environment-friendly data to obtain repaired environment-friendly data;
and monitoring the data quality of the repaired environment-friendly data.
4. The method of claim 1, wherein the obtaining raw environmental protection data comprises:
raw environmental protection data is obtained from a plurality of data sources by means of full extraction or incremental extraction.
5. The method of claim 1, wherein the integrating the raw environmental data comprises:
performing data cleaning on the original environmental protection data to obtain cleaned environmental protection data;
carrying out data processing on the cleaned environment-friendly data to obtain processed environment-friendly data;
and performing data conversion on the processed environment-friendly data to obtain integrated environment-friendly data.
6. The method according to claim 1, wherein the method further comprises:
and sending the repaired environmental protection data to a data service encapsulation platform, wherein the data service encapsulation platform is used for calling a preset interface, and sending the repaired environmental protection data to a terminal on a data management platform, which has completed data service registration, through the preset interface.
7. An environmental protection data processing apparatus, the apparatus comprising:
the data acquisition module is used for acquiring original environment-friendly data;
the data integration module is used for carrying out integrated processing on the original environment-friendly data and storing the integrated environment-friendly data into a data warehouse;
the data evaluation module is used for carrying out quality evaluation on the environmental protection data in the data warehouse according to a preset environmental protection data standard, and generating a quality evaluation report;
and the data restoration module is used for restoring the environment-friendly data in the data warehouse according to the quality evaluation report.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202310504499.9A 2023-05-06 2023-05-06 Environment-friendly data management method, device, computer equipment and storage medium Pending CN116719799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310504499.9A CN116719799A (en) 2023-05-06 2023-05-06 Environment-friendly data management method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310504499.9A CN116719799A (en) 2023-05-06 2023-05-06 Environment-friendly data management method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116719799A true CN116719799A (en) 2023-09-08

Family

ID=87872338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310504499.9A Pending CN116719799A (en) 2023-05-06 2023-05-06 Environment-friendly data management method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116719799A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217561A (en) * 2023-09-27 2023-12-12 国任财产保险股份有限公司 Data quality management system
CN117785983A (en) * 2024-02-20 2024-03-29 四川大学华西医院 Target object evaluation method, system, electronic device and storage medium

Similar Documents

Publication Publication Date Title
US11625387B2 (en) Structuring data
US11650854B2 (en) Executing algorithms in parallel
US10599684B2 (en) Data relationships storage platform
CA2978488C (en) Systems and methods for managing data
CN107908672B (en) Application report realization method, device and storage medium based on Hadoop platform
CN116719799A (en) Environment-friendly data management method, device, computer equipment and storage medium
US20220004529A1 (en) Automated audit balance and control processes for data stores
US20190347343A1 (en) Systems and methods for indexing and searching
CN111339073A (en) Real-time data processing method and device, electronic equipment and readable storage medium
US20070276970A1 (en) Data Consistency Validation
CN111858615A (en) Database table generation method, system, computer system and readable storage medium
CN113868498A (en) Data storage method, electronic device, device and readable storage medium
CN115794839B (en) Data collection method based on Php+Mysql system, computer equipment and storage medium
CN114787790A (en) Data archiving method and system using hybrid storage of data
CN112817958A (en) Electric power planning data acquisition method and device and intelligent terminal
CN110737729A (en) Engineering map data information management method based on knowledge map concept and technology
CN112988770A (en) Method and device for updating serial number, electronic equipment and storage medium
CN116483822B (en) Service data early warning method, device, computer equipment and storage medium
CN113220762A (en) Method, device, processor and storage medium for realizing general record processing of key service field change in big data application
CN108073624B (en) Service data processing system and method
CN114356945A (en) Data processing method, data processing device, computer equipment and storage medium
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium
CN114661693A (en) Data auditing realization method, storage medium, electronic equipment and system
CN112905565A (en) Database management system and data inspection method
CN115733787A (en) Network identification method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination