CN112199433A - Data governance system for a city-level data middle platform
- Publication number
- CN112199433A (application number CN202011176826.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- management
- metadata
- standard
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
- G06F16/287—Visualization; Browsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention provides a data governance system for a city-level data middle platform and relates to the technical field of data processing. The system comprises: a data standard management platform for storing and managing at least one preset data management specification and at least one preset quality management specification; a metadata management module for managing metadata related to the data resources according to the data management specification; a data quality management module for generating a data quality management standard according to the quality management specification and performing quality control on the metadata management process according to that standard; and a public management and portal module for providing unified user permission management support and navigation query for the metadata management module, the data standard management platform and the data quality management module. The technical scheme meets the needs of viewing base-table metadata, viewing data lineage and impact relationships, and analyzing dependencies among data, and can better serve a city-level data middle platform.
Description
Technical Field
The invention relates to the field of data processing, and in particular to a data governance system for a city-level data middle platform.
Background
A data middle platform is a portal for data application development. It covers the whole data application development process in a closed loop, including offline computation and real-time computation applications. It can meet developers' needs at every level for data acquisition, data analysis, data mining, data quality, data maps, data models, and data API applications. A data middle platform can thus free up developer productivity, greatly shorten the process of extracting value from data, and improve an enterprise's ability to refine data value. However, in existing data middle platforms the data governance service is incomplete, and in particular its metadata management function is insufficient.
Disclosure of Invention
In view of the above problems in the prior art, the invention provides a data governance system for a city-level data middle platform,
which is used for performing data governance on data resources stored in a data resource center of the city-level data middle platform and comprises:
a data standard management platform, used for storing and managing at least one preset data management specification and at least one preset quality management specification;
a metadata management module, connected with the data standard management platform and used for managing metadata related to the data resources according to the data management specification;
a data quality management module, connected with the metadata management module and the data standard management platform respectively, and used for generating a data quality management standard according to the quality management specification and performing quality control on the metadata management process according to the data quality management standard;
and a public management and portal module, connected with the metadata management module, the data standard management platform and the data quality management module respectively, and used for providing unified user permission management support and navigation query for the metadata management module, the data standard management platform and the data quality management module.
Preferably, the metadata management module adopts a layered architecture, and the layered architecture comprises, from bottom to top:
a collection layer, used for collecting various kinds of metadata and performing preliminary analysis on the metadata to obtain the metadata relations among them;
a storage layer, used for centrally storing the metadata supplied by the collection layer and storing the metadata relations;
an application layer, used for analyzing and managing the metadata and the metadata relations;
and a service layer, used for providing external services based on the metadata.
Preferably, the collection layer collects the metadata through a pre-configured collection adapter and/or a pre-generated data import module.
Preferably, the metadata includes:
one or more of distributed file system metadata, distributed data warehouse metadata, distributed NoSQL database metadata, data retrieval component metadata, and real-time stream computation component metadata.
Preferably, the application layer includes:
a data lineage analysis unit, used for performing lineage analysis on each piece of metadata to obtain its data source;
an impact analysis unit, used for analyzing the impact on other systems and data when the metadata changes;
a data importance analysis unit, used for analyzing the importance of each piece of metadata based on how frequently it is used, how many times it is accessed, and its dark data identification result;
and an analysis result export unit, connected with the data lineage analysis unit, the impact analysis unit and the data importance analysis unit respectively, and used for exporting the analysis result of each piece of metadata.
Preferably, the application layer further includes a metadata management unit, used for performing metadata view management, and/or metadata maintenance, and/or metadata query, and/or metadata export, and/or metadata version management, and/or metadata change management on the metadata.
Preferably, the data quality management module includes:
a standard mapping unit, used for generating corresponding constraints or templates according to the quality management specification, so that a user can customize data quality rules, scorecards and a knowledge base;
a check management unit, used by the user to configure and schedule the check tasks generated by calculation;
a data quality report database, used for generating a customized data quality report according to the check results corresponding to the check tasks;
and a problem management unit, connected with the data quality report database and used for discovering, alerting on and handling data quality problems according to the data quality report.
Preferably, the public management and portal module includes:
a public management unit, used for providing user management, role management, permission management, log management, database management, security management and audit management services, and for providing unified user permission management support for the metadata management module, the data standard management platform and the data quality management module.
Preferably, the public management and portal module further includes:
a portal unit, used for providing query and retrieval, navigation, general function and portal management services, and for implementing navigation query for the metadata management module, the data standard management platform and the data quality management module.
The technical scheme has the following advantages or beneficial effects:
the technical scheme integrates the metadata assets of all links so that metadata can be browsed and analyzed, and it is also the source from which a data resource management portal is formed; it also meets the needs of viewing base-table metadata, viewing lineage and impact relationships among data, and analyzing dependencies among data, and can better serve a city-level data middle platform.
Drawings
FIG. 1 is a schematic diagram of a data governance system according to a preferred embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the metadata management module according to a preferred embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present invention is not limited to these embodiments; other embodiments also fall within the scope of the present invention as long as they conform to its gist.
In a preferred embodiment of the present invention, in view of the above problems in the prior art, there is provided a data governance system for a city-level data middle platform, used for performing data governance on data resources stored in a data resource center of the city-level data middle platform and, as shown in FIG. 1, including:
a data standard management platform 1, used for storing and managing at least one preset data management specification and at least one preset quality management specification;
a metadata management module 2, connected with the data standard management platform 1 and used for managing metadata related to the data resources according to the data management specification;
a data quality management module 3, connected with the metadata management module 2 and the data standard management platform 1 respectively, and used for generating a data quality management standard according to the quality management specification and performing quality control on the metadata management process according to the data quality management standard;
and a public management and portal module 4, connected with the metadata management module 2, the data standard management platform 1 and the data quality management module 3 respectively, and used for providing unified user permission management support and navigation query for the metadata management module 2, the data standard management platform 1 and the data quality management module 3.
Specifically, in this embodiment, the data governance system is an important tool for data governance and data management. It integrates the metadata assets of each link of the platform so that metadata can be browsed and analyzed, and it is also the source from which a data resource management portal is formed. The data governance system provides a visual metadata management module 2, which meets the needs of viewing base-table metadata, viewing lineage and impact relationships among data, and analyzing dependencies among data.
The data governance system comprises the metadata management module 2, the data standard management platform 1, the data quality management module 3, the public management and portal module 4, and a data cube design tool. It supports extracting data from regional data lakes on demand, performing data cube modeling and data governance according to established data standards, business models and the like to form a basic library and a theme library, and carrying out data resource management work such as metadata management and lineage analysis on the data resources stored in all data resource centers.
In this embodiment, the data set of the metadata is analyzed using online analytical processing (OLAP). OLAP is a multidimensional analysis technique that helps business personnel examine data quickly and interactively from multiple angles and so gain a deeper understanding of the information it contains. OLAP describes the structure of a data set with the concept of a multidimensional view, the cube. Fields in a data set are divided into two categories, dimensions and measures, according to the role they play in decision-making. A dimension is a characteristic attribute describing a fact record and corresponds to an axis of the cube, such as time or location. A measure is the data recorded on a fact record, generally a numeric field, and corresponds to a position in the coordinates, such as sales, production, or population. OLAP starts from transformations of the dimensions and provides operations such as drilling, slicing, dicing, and rotating. Drilling analyzes a dimension at different levels and granularities (from a high level down to a low level, or from a low level up to a high level); slicing and dicing select specific dimensions and analyze the data within them; rotating changes the orientation of the dimensions. Through these operations, decision-makers can extract intuitive, understandable data reflecting government operating conditions from the raw data, which supports decision-making. The data cube design tool serves the business libraries within the theme library that need efficient dimensional queries.
Cube design is carried out through a visual interface and mainly serves the construction of data marts for MOLAP scenarios. An optimized storage table that matches business characteristics is built from the existing data tables, so that queries against the source tables can be served by the optimized storage table, which speeds up queries and increases query concurrency. The tool supports instantiating the built cube in a distributed file system; supports incremental construction and dimension-reduction optimization; supports both snowflake and star schemas; supports data sources in various formats, including distributed file systems and relational database management systems; supports cube lifecycle management, including monitoring and management of cube creation, update and deletion; and supports import and export of cube models.
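The slice, drill-up and rotate operations described above can be illustrated with a minimal Python sketch. The fact records, the dimension names (`year`, `district`), the measure name (`population`) and all function names are hypothetical and chosen only for illustration; the patent does not specify an implementation.

```python
from collections import defaultdict

# Hypothetical fact records: two dimensions (year, district) and one measure (population).
FACTS = [
    {"year": 2019, "district": "North", "population": 120},
    {"year": 2019, "district": "South", "population": 80},
    {"year": 2020, "district": "North", "population": 130},
    {"year": 2020, "district": "South", "population": 90},
]


def slice_cube(facts, dimension, value):
    """Slice: keep only the records where one dimension has a fixed value."""
    return [f for f in facts if f[dimension] == value]


def roll_up(facts, dimensions, measure):
    """Drill up: sum a measure grouped by the listed (coarser) dimensions."""
    totals = defaultdict(int)
    for f in facts:
        key = tuple(f[d] for d in dimensions)
        totals[key] += f[measure]
    return dict(totals)


def rotate(facts, row_dim, col_dim, measure):
    """Rotate: lay the measure out with one dimension as rows and another as columns."""
    table = defaultdict(dict)
    for f in facts:
        table[f[row_dim]][f[col_dim]] = f[measure]
    return {row: dict(cols) for row, cols in table.items()}


if __name__ == "__main__":
    print(slice_cube(FACTS, "district", "North"))
    print(roll_up(FACTS, ["year"], "population"))
    print(rotate(FACTS, "district", "year", "population"))
```

A production cube would precompute and store such aggregates, as the optimized storage table described above does; this sketch recomputes them on each call for clarity.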
In this embodiment, the data standard management platform 1 manages the process and content of formulating data standards according to the data standard management specification. The data standard management platform 1 mainly has the following functions:
External standard import: external national or industry standards are imported by means of an Excel template; importing standards of types such as data items, data dictionaries, and indicators is supported, as is extension according to a meta-model.
Data standard mapping: a mapping relation from the database system to the data standard is established to ensure that the data standard is implemented and enforced.
Data standard cataloging: custom data standard directory structures and content are supported, with corresponding pages for querying, browsing and use.
Data resource association: data standards can be associated with arbitrary resources. This includes association analysis between data standards and metadata or data resources, analysis of the distribution of data resources that reference a specified standard, and statistics on references to each standard.
Data standard maintenance: standard maintenance functions are provided, including adding, changing and auditing standards; standard version management and comparison are supported, and changes to standards are recorded.
Standard format configuration: custom storage and display formats and types for data standards are supported.
Data quality management supports the data quality management process according to the data quality management standard, so that check rules can be matched according to the data standards and check objects can be matched automatically according to the metadata. The main process of data quality management comprises quality rule configuration and data quality checking, and addresses problems of data completeness, uniqueness, authoritativeness, consistency, validity and the like.
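As an illustration of quality rule configuration followed by data quality checking, the following Python sketch applies two simple rules, completeness and uniqueness, to a list of rows. The rule names, the row layout and the report format are assumptions made for this example and are not taken from the patent.

```python
def check_completeness(rows, field):
    """Return the indexes of rows where the field is missing or empty."""
    return [i for i, row in enumerate(rows) if row.get(field) in (None, "")]


def check_uniqueness(rows, field):
    """Return the values of the field that occur in more than one row."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # missing values are reported by the completeness rule
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def run_checks(rows, rules):
    """Apply configured (rule_name, check_function, field) triples and collect findings."""
    return {f"{name}:{field}": check(rows, field) for name, check, field in rules}


if __name__ == "__main__":
    # Hypothetical rows and rule configuration.
    rows = [
        {"id": 1, "name": "a"},
        {"id": 2, "name": ""},
        {"id": 2, "name": "c"},
    ]
    rules = [
        ("completeness", check_completeness, "name"),
        ("uniqueness", check_uniqueness, "id"),
    ]
    print(run_checks(rows, rules))
```

Keeping the rules as data (a list of triples) mirrors the separation between rule configuration and check execution described above.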
In a preferred embodiment of the present invention, the metadata management module 2 adopts a layered architecture which, as shown in FIG. 2, comprises, from bottom to top:
a collection layer 21, used for collecting various kinds of metadata and performing preliminary analysis on the metadata to obtain the metadata relations among them;
a storage layer 22, used for centrally storing all metadata supplied by the collection layer 21 and storing the metadata relations;
an application layer 23, used for analyzing and managing the metadata and the metadata relations;
and a service layer 24, used for providing external services based on the metadata.
Specifically, in this embodiment, the metadata collection process of the collection layer 21 includes the following. Collection data source management: the technical scheme provides management functions for various metadata sources, including access to different types of collection sources, collection source parameter configuration and the like.
Collection template management: the technical scheme provides template management for metadata collection as an aid for technicians maintaining metadata. Templates are mainly used when metadata is imported automatically or manually: the function lets technicians export a data template for the metadata to be imported, which fixes the format of the imported data.
Metadata mapping management: in enterprise information construction, several metadata entries at different layers may describe the same metadata, and likewise several entries in design and in implementation may describe the same metadata. To address this, the technical scheme introduces metadata mapping management: by configuring mapping relations between metadata, entries under metadata directories that have a mapping relation are regarded as the same metadata when their codes are identical.
The collection process also includes collection task management and collection scheduling management.
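The metadata mapping rule described above (entries with the same code under mapped directories count as the same metadata) can be sketched in Python as follows. The entry layout (`code`, `directory`), the directory names, and the choice to treat two entries in the same directory as trivially mapped are assumptions for illustration only.

```python
def same_metadata(a, b, mapped_directories):
    """Decide whether two metadata entries describe the same metadata.

    Assumed rule: the codes must match, and the two directories must either be
    identical or appear together in the configured set of mapped pairs.
    """
    if a["code"] != b["code"]:
        return False
    if a["directory"] == b["directory"]:
        return True  # assumption: a directory is always mapped to itself
    return frozenset((a["directory"], b["directory"])) in mapped_directories


if __name__ == "__main__":
    # Hypothetical mapping configuration: the design and implementation
    # directories are declared to describe the same objects.
    mapped = {frozenset(("design", "implementation"))}
    design_entry = {"code": "T_RESIDENT", "directory": "design"}
    impl_entry = {"code": "T_RESIDENT", "directory": "implementation"}
    other_entry = {"code": "T_RESIDENT", "directory": "archive"}
    print(same_metadata(design_entry, impl_entry, mapped))
    print(same_metadata(design_entry, other_entry, mapped))
```

Storing each mapping as an unordered pair (`frozenset`) makes the relation symmetric without listing both directions.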
In a preferred embodiment of the present invention, the collection layer 21 collects the metadata through a pre-configured collection adapter and/or a pre-generated data import module.
Specifically, in this embodiment, the collection layer 21 further manages the collection adapters as follows: metadata information is collected from each enterprise system, and configuring collection tasks and viewing the collected model results are supported. The technical scheme supports collecting metadata from sources such as various relational databases, city data middle platforms, stored procedures, and modeling tools.
In a preferred embodiment of the present invention, the metadata includes:
one or more of distributed file system metadata, distributed data warehouse metadata, distributed NoSQL database metadata, data retrieval component metadata, and real-time stream computation component metadata.
Specifically, in this embodiment, metadata is where each component of the data governance system (distributed file system, analytical data warehouse, distributed NoSQL database, data retrieval component, real-time computation component, etc.) stores information about its data. It describes the data, including creation information, owning space, access permissions, type descriptions and the like. The data governance system needs to provide a highly available database that gives all components unified management and storage of metadata.
The distributed file system metadata comprises file attribute information such as file names, directory names, parent directory information, file sizes, creation times and modification times, as well as storage-related information such as how files are split into blocks, the number of replicas, and the nodes on which each replica resides. It also records data ownership, giving the owning user and user group, and can mark the permissions of that user and user group.
The distributed data warehouse metadata includes:
Library-level meta-information: includes the library name, description, creator, creation time, permissions to create and query tables in the library, etc.
Table-level meta-information: includes the table name, description, creator, creation time, owning library, fields in the table, permissions to add and delete table data, permission to delete the table, etc.
Field meta-information: includes the field name, description, field type, default value, whether the field may be empty, user access permissions, etc.
The distributed NoSQL database metadata refers to the metadata of the NoSQL database mapping tables in the distributed data warehouse. Like the table-level and field-level meta-information of the distributed data warehouse, it includes the table name, table description, creator, creation time, owning library, fields in the table, permissions to add and delete table data, permission to delete the table, field names, field descriptions, field types, user access permissions, etc.
Data retrieval component metadata refers to the metadata of the data retrieval engine mapping tables in the distributed data warehouse. Like the table-level and field-level meta-information of the distributed data warehouse, it includes the table name, table description, creator, creation time, owning library, fields in the table, permissions to add and delete table data, permission to delete the table, field names, field descriptions, field types, user access permissions, etc.
Real-time stream computation component metadata: real-time stream computation has three core concepts, namely streams, stream tasks, and stream applications. A stream is a flow of data; a stream task performs computations on one or more streams and writes the results into a table; a stream application is a collection of one or more stream tasks.
The stream meta-information needs to include the stream name, description, creator, creation time, owning library, fields in the stream, permissions to add, delete and modify data in the stream, permission to delete the stream, etc.
The stream task meta-information needs to include the task name, description, creator, start time, owning library, task logic, start/stop permissions, etc.
The stream application meta-information needs to include the application name, description, creator, creation time, owning library, information on the application's stream tasks, etc.
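The library-level, table-level and field-level meta-information listed above could be modeled with simple record types. The following Python sketch is one possible shape; the class names, attribute names and the omission of permission attributes are simplifying assumptions, not the patent's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class FieldMeta:
    """Field meta-information: name, description, type, default value, nullability."""
    name: str
    description: str
    field_type: str
    default: Optional[str] = None
    nullable: bool = True


@dataclass
class TableMeta:
    """Table-level meta-information (permissions omitted for brevity)."""
    name: str
    description: str
    creator: str
    created_at: datetime
    library: str
    fields: List[FieldMeta] = field(default_factory=list)

    def field_names(self) -> List[str]:
        return [f.name for f in self.fields]


@dataclass
class LibraryMeta:
    """Library-level meta-information with the tables it owns."""
    name: str
    description: str
    creator: str
    created_at: datetime
    tables: List[TableMeta] = field(default_factory=list)


if __name__ == "__main__":
    # Hypothetical example data.
    now = datetime(2020, 10, 29)
    table = TableMeta(
        name="resident",
        description="Resident base table",
        creator="admin",
        created_at=now,
        library="base_library",
        fields=[
            FieldMeta("id", "Primary key", "bigint", nullable=False),
            FieldMeta("name", "Resident name", "string"),
        ],
    )
    library = LibraryMeta("base_library", "Basic library", "admin", now, [table])
    print(library.tables[0].field_names())
```

Because the NoSQL mapping tables and the retrieval-engine mapping tables carry the same kinds of attributes, the same `TableMeta` and `FieldMeta` shapes could plausibly be reused for them.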
In a preferred embodiment of the present invention, the application layer 23 includes:
a blood margin analysis unit 231, configured to perform blood margin analysis on each metadata to obtain a data source of each metadata;
and an influence analysis unit 232 for analyzing influences on other systems and data when the metadata is changed.
A data importance analyzing unit 233 for analyzing the importance of each metadata from the used frequency, the number of times of access, and the dark data discrimination result of each metadata;
the analysis result derivation section 234 is connected to the blood margin analysis section 231, the influence analysis section 232, and the data importance analysis section 233, and derives the analysis result of each metadata.
Specifically, in the embodiment, the blood margin analysis unit 231 performs blood margin analysis by using the metadata, and traces back the source of the data upward for reference for tracking and tracing the system operation and maintenance and data quality problems.
The influence analysis unit 232 analyzes the influence of a metadata object change on a downstream system in the next direction, and is used for analyzing the influence on other systems and data when the metadata change occurs.
The data importance analysis unit 233 analyzes the importance of data from three aspects: how frequently the data is used, how many times it is accessed, and whether it is dark data.
The analysis result derivation unit 234 exports the results of each metadata analysis.
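One way the three importance signals might be combined is a weighted, normalized score. The embodiment does not give a formula, so the weights, the dark data penalty, and all names in the sketch below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UsageStats:
    name: str
    use_frequency: float   # e.g. how many jobs per day read the object
    access_count: int      # total number of accesses
    is_dark: bool          # result of dark data discrimination


def importance_scores(stats: List[UsageStats],
                      w_freq: float = 0.5,
                      w_access: float = 0.5,
                      dark_penalty: float = 0.5) -> Dict[str, float]:
    """Score each object in [0, 1]; weights and penalty are illustrative assumptions."""
    # Normalize against the largest observed value; fall back to 1 to avoid dividing by 0.
    max_freq = max((s.use_frequency for s in stats), default=0) or 1
    max_access = max((s.access_count for s in stats), default=0) or 1
    scores = {}
    for s in stats:
        base = w_freq * s.use_frequency / max_freq + w_access * s.access_count / max_access
        scores[s.name] = round(base * (dark_penalty if s.is_dark else 1.0), 4)
    return scores


# Hypothetical usage figures.
scores = importance_scores([
    UsageStats("dw.permit_clean", use_frequency=10, access_count=100, is_dark=False),
    UsageStats("ods.legacy_dump", use_frequency=5, access_count=50, is_dark=True),
    UsageStats("tmp.unused", use_frequency=0, access_count=0, is_dark=True),
])
```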
In a preferred embodiment of the present invention, the application layer 23 further comprises a metadata management unit 235 for performing metadata view management, and/or metadata maintenance, and/or metadata query, and/or metadata export, and/or metadata version management, and/or metadata change management on the metadata.
Specifically, in this embodiment, metadata view management means that the data governance system provides enterprise data views organized by business line, so that business personnel can conveniently view data from a business perspective.
Metadata maintenance refers to querying, modifying, and deleting a metadata object's basic information, attributes, depended-on relationships, dependency relationships, and composition relationships.
Metadata query refers to querying, according to the search conditions, the metadata that the user's data access permissions allow.
Metadata export means that the data governance system provides a metadata export function.
Metadata version management means that the data governance system provides lifecycle management of metadata: release, deletion, and state changes follow strict processes, and version management functions are provided.
Metadata change management means that a user can subscribe to the metadata of interest. When that metadata changes, the data governance system notifies the user in the form the user has specified, and the user can then follow the guidance to query the specific content of the change and the related influence analysis in the system.
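The subscribe-and-notify behavior described for metadata change management resembles a publish/subscribe pattern. A minimal sketch follows, assuming the user-specified notification form can be represented as a callback; the `ChangeSubscriptions` class and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A notifier receives (user, message); it stands in for e-mail, SMS, portal message, etc.
Notifier = Callable[[str, str], None]


class ChangeSubscriptions:
    def __init__(self) -> None:
        self._subs: Dict[str, List[Tuple[str, Notifier]]] = defaultdict(list)

    def subscribe(self, user: str, metadata_id: str, notifier: Notifier) -> None:
        """Register a user's interest in one metadata object."""
        self._subs[metadata_id].append((user, notifier))

    def publish_change(self, metadata_id: str, old_version: int,
                       new_version: int, summary: str) -> int:
        """Notify every subscriber of the object; return how many were notified."""
        message = f"{metadata_id} changed from v{old_version} to v{new_version}: {summary}"
        subscribers = self._subs.get(metadata_id, [])
        for user, notifier in subscribers:
            notifier(user, message)
        return len(subscribers)


# Hypothetical usage: notifications are collected in an in-memory inbox.
inbox: List[Tuple[str, str]] = []
subs = ChangeSubscriptions()
subs.subscribe("alice", "dw.permit_clean", lambda user, msg: inbox.append((user, msg)))
notified = subs.publish_change("dw.permit_clean", 1, 2, "column added")
```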
In a preferred embodiment of the present invention, the data quality management module 3 includes:
the standard mapping unit 31 is used for generating corresponding constraints or templates according to the quality management specifications, so that a user can customize data quality rules, scorecards, and a knowledge base;
the checking management unit 32 is used for the user to configure and schedule the checking task generated by calculation;
the data quality report database 33 is used for generating a customized data quality report according to a checking result corresponding to the checking task;
and the problem management unit 34 is connected with the data quality report database 33 and is used for discovering, alarming and processing data quality problems according to the data quality reports.
Specifically, in this embodiment, the standard mapping unit 31 generates corresponding constraints or templates based on the relevant standards of the data standard management platform 1, and lets the user customize data quality rules, scorecards, and the knowledge base. The checking management unit 32 supports the user in configuring, scheduling, and managing checking tasks; the checking tasks are computed and generated in a self-developed data quality analysis engine based on Apache Spark, and scheduling management is implemented by the calculation scheduling module in the universal component and service layer 24. The data quality report database 33 generates customized data quality reports from the checking results, including quality summaries, quality trend analysis, and the like, and supports various customized queries. The problem management unit 34 supports discovering, alerting on, and handling the data quality problems identified in the data quality reports.
The main functions of the data quality management module 3 include:
Quality rule configuration: manages data quality measurement rules and checking methods. The configuration also covers management of the checking tasks, which are maintained through the interface, and supports generating quality checking methods from the data standards.
Data quality checking: performs compliance checks on the checked objects in sequence according to the data quality rules, and records the problem data and abnormal records once a data quality problem is found.
Data quality problem detail report: must be provided after a checking task is executed. It describes the overall situation of the problems and gives detailed quality reports by dimensions such as owning department and resource classification.
The quality management tool can set rules such as data cleaning and conversion through the data migration tool, schedule the quality management process, and embed data standardization work into the business process, so that the type, definition, and rules of data are defined according to unified data standards and governance standards.
Data quality audit: the analytical data warehouse provided by the big data support platform is used to provide a data audit function. Dirty data is written into a specified dirty data table according to rules (including but not limited to data misreads, field type mismatches, and UDF condition filtering). After the data import is finished, the dirty data reasons, record counts, import interface, and data quality report are recorded so that the monitoring program can evaluate and handle them.
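The checking and audit steps above can be sketched as a rule-driven pass that separates clean records from dirty ones and summarizes the outcome. The embodiment runs this on an Apache Spark based engine and an analytical data warehouse; the plain-Python version below is only an illustration, and the rule format, the `audit_import` function, and the interface name are assumptions.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A rule is (reason, predicate); the predicate returns True when the record passes.
Rule = Tuple[str, Callable[[Record], bool]]


def audit_import(records: List[Record], rules: List[Rule],
                 interface: str) -> Tuple[List[Record], List[Record], Dict[str, Any]]:
    """Split records into clean rows and dirty-table rows, and build a summary report."""
    clean: List[Record] = []
    dirty: List[Record] = []
    reason_counts: Counter = Counter()
    for rec in records:
        reasons = [reason for reason, passes in rules if not passes(rec)]
        if reasons:
            # Each dirty row keeps the original record, why it failed, and where it came from.
            dirty.append({"record": rec, "reasons": reasons, "interface": interface})
            reason_counts.update(reasons)
        else:
            clean.append(rec)
    report = {
        "interface": interface,
        "total": len(records),
        "dirty_count": len(dirty),
        "reasons": dict(reason_counts),
    }
    return clean, dirty, report


# Hypothetical rules: a type check, and a range condition standing in for a UDF filter.
rules: List[Rule] = [
    ("field type mismatch", lambda r: isinstance(r.get("age"), int)),
    ("condition filter failed",
     lambda r: not isinstance(r.get("age"), int) or 0 <= r["age"] <= 150),
]
rows = [{"id": 1, "age": 30}, {"id": 2, "age": "x"}, {"id": 3, "age": 200}]
clean_rows, dirty_rows, quality_report = audit_import(rows, rules, "resident_import")
```

Recording the failure reasons next to each dirty row is what allows a later monitoring step to decide how to handle the import.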
In a preferred embodiment of the present invention, the public management and portal module 4 includes:
the public management unit 41 is used for providing services of user management, role management, authority management, log management, database management, security management and audit management, and providing unified user authority management support for the metadata management module 2, the data standard management platform 1 and the data quality management module 3.
Specifically, in this embodiment, user management: interfaces with the unified identity authentication platform to provide user management for the metadata, data standard, data quality, and other modules, including user information, password verification, and the like.
Role management: different roles are defined according to business departments and responsibilities, such as administrator, data subject manager, and query user.
Authority management: manages the permissions related to data governance, including permission definition, permission allocation, and permission removal.
Log management: manages the logs of functional modules such as data standards, data quality, and metadata, as well as the running logs of the system itself.
Knowledge base management: manages the knowledge base related to data governance, including knowledge for handling data problems, knowledge for formulating checking rules, and the like.
Security management: supports page watermark display and can differentiate access permissions by network segment.
Audit management: supports auditing of operations such as create, delete, update, and query.
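Role management, authority management, and audit management fit together naturally as role-based access control with an audit trail. The sketch below assumes that model; the `AccessControl` class, the permission strings, and the role names are hypothetical and are not prescribed by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessControl:
    role_permissions: Dict[str, Set[str]] = field(default_factory=dict)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, role: str, permission: str) -> None:
        """Permission allocation: attach a permission to a role."""
        self.role_permissions.setdefault(role, set()).add(permission)

    def revoke(self, role: str, permission: str) -> None:
        """Permission removal: detach a permission from a role if present."""
        self.role_permissions.get(role, set()).discard(permission)

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def check(self, user: str, permission: str) -> bool:
        """Return whether any of the user's roles carries the permission, and audit it."""
        allowed = any(permission in self.role_permissions.get(role, set())
                      for role in self.user_roles.get(user, set()))
        self.audit_log.append((user, permission, allowed))
        return allowed


# Hypothetical roles and permissions.
acl = AccessControl()
acl.grant("administrator", "standard:edit")
acl.grant("query_user", "standard:read")
acl.assign("alice", "administrator")
acl.assign("bob", "query_user")
alice_can_edit = acl.check("alice", "standard:edit")
bob_can_edit = acl.check("bob", "standard:edit")
```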
In a preferred embodiment of the present invention, the public management and portal module 4 further includes:
the portal unit 42 is used for providing services of query retrieval, navigation, general functions and portal management, and for providing navigation query for the metadata management module 2, the data standard management platform 1 and the data quality management module 3.
Specifically, in this embodiment, query retrieval: supports exact and fuzzy keyword searches for querying and exporting related information, such as data standard information item queries, quality rule queries, and data dictionary queries.
Navigation: provides entry points that guide the user to related functions from different dimensions, including function navigation, classification navigation, role navigation, and the like.
General functions: provides convenience features for the user, including bulletin boards, message notifications, subscription services, favorites, to-do items, and the like.
Portal management: manages and searches the portal, lets users customize browsing views, and the like.
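The difference between exact and fuzzy keyword search in query retrieval can be illustrated as follows. The embodiment does not define the matching semantics, so this sketch assumes that exact search means a case-insensitive full match and fuzzy search means a case-insensitive substring match; the `search` function and the catalog entries are hypothetical.

```python
from typing import List


def search(catalog: List[str], keyword: str, fuzzy: bool = False) -> List[str]:
    """Return catalog entries matching the keyword, preserving catalog order."""
    needle = keyword.strip().lower()
    if not needle:
        return []
    if fuzzy:
        # Assumed fuzzy semantics: the keyword appears anywhere in the entry.
        return [item for item in catalog if needle in item.lower()]
    # Assumed exact semantics: the whole entry equals the keyword, ignoring case.
    return [item for item in catalog if item.lower() == needle]


# Hypothetical data dictionary entries.
catalog = ["resident_id", "resident_name", "permit_id", "Permit Status"]
exact_hits = search(catalog, "permit_id")
fuzzy_hits = search(catalog, "permit", fuzzy=True)
```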
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.
Claims (9)
1. A data governance system for a city-level data center, for data governance of data resources stored by a data resource center of the city-level data center, comprising:
the data standard management platform is used for storing and managing at least one preset data management specification and at least one preset quality management specification;
the metadata management module is connected with the data standard management platform and is used for managing metadata related to the data resources according to the data management specification;
the data quality management module is respectively connected with the metadata management module and the data standard management platform and is used for generating a data quality management standard according to the quality management specification and performing quality control on the management process of the metadata according to the data quality management standard;
and the public management and portal module is respectively connected with the metadata management module, the data standard management platform and the data quality management module and is used for providing unified user authority management support and navigation query for the metadata management module, the data standard management platform and the data quality management module.
2. The data governance system of claim 1, wherein the metadata management module employs a hierarchical architecture comprising, from bottom to top:
the acquisition layer is used for acquiring various metadata and carrying out preliminary data analysis on the metadata to obtain metadata relations among the metadata;
the storage layer is used for carrying out centralized storage on the metadata input by the acquisition layer and storing the metadata relation;
the application layer is used for analyzing and managing the metadata and the metadata relation;
and the service layer is used for providing external services based on the metadata.
3. The data governance system of claim 2, wherein the collection layer performs collection of each of the metadata via a pre-configured collection adapter and/or a pre-generated data import module.
4. The data governance system of claim 2, wherein the metadata comprises:
one or more of distributed file system metadata, distributed data warehouse metadata, distributed NoSQL database metadata, data retrieval component metadata, and real-time stream computation component metadata.
5. The data governance system of claim 2, wherein the application layer comprises:
the lineage (blood-relationship) analysis unit is used for performing lineage analysis on each metadata to obtain a data source of each metadata;
the influence analysis unit is used for analyzing the influence on other systems and data when the metadata is changed;
the data importance analysis unit is used for analyzing the importance of each metadata according to the frequency of use, the number of accesses and the dark data discrimination result of each metadata;
and the analysis result derivation unit is respectively connected with the lineage analysis unit, the influence analysis unit and the data importance analysis unit and is used for exporting the analysis result of each metadata.
6. The data governance system of claim 5, wherein the application layer further comprises a metadata management unit for metadata view management, and/or metadata maintenance, and/or metadata query, and/or metadata export, and/or metadata version management, and/or metadata change management for the metadata.
7. The data governance system of claim 1, wherein the data quality management module comprises:
the standard mapping unit is used for generating corresponding constraints or templates according to the quality management specifications so that a user can customize data quality rules, scorecards and a knowledge base;
the checking management unit is used for the user to configure and schedule the checking task generated by calculation;
the data quality report database is used for generating a customized data quality report according to the checking result corresponding to the checking task;
and the problem management unit is connected with the data quality report database and is used for discovering, alarming and processing the data quality problem according to the data quality report.
8. The data governance system of claim 1, wherein the public management and portal module comprises:
the public management unit is used for providing services of user management, role management, authority management, log management, database management, security management and audit management, and providing unified user authority management support for the metadata management module, the data standard management platform and the data quality management module.
9. The data governance system of claim 8, wherein the public management and portal module further comprises:
and the portal unit is used for providing services of query retrieval, navigation, general functions and portal management and realizing navigation query for the metadata management module, the data standard management platform and the data quality management module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011176826.5A CN112199433A (en) | 2020-10-28 | 2020-10-28 | Data management system for city-level data middling station |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112199433A (en) | 2021-01-08 |
Family
ID=74011859
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011176826.5A Pending CN112199433A (en) | 2020-10-28 | 2020-10-28 | Data management system for city-level data middling station |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199433A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110173149A1 (en) * | 2010-01-13 | 2011-07-14 | Ab Initio Technology Llc | Matching metadata sources using rules for characterizing matches |
CN108717456A (en) * | 2018-05-22 | 2018-10-30 | 浪潮软件股份有限公司 | A kind of data lifecycle management platform that data source is unrelated and method |
CN109344133A (en) * | 2018-08-27 | 2019-02-15 | 成都四方伟业软件股份有限公司 | A kind of data administer driving data and share exchange system and its working method |
CN110232098A (en) * | 2019-04-22 | 2019-09-13 | 汇通达网络股份有限公司 | A kind of data warehouse administered based on data and genetic connection designs |
CN111125068A (en) * | 2019-11-13 | 2020-05-08 | 深圳市华傲数据技术有限公司 | Metadata management method and system |
CN110851426A (en) * | 2019-11-19 | 2020-02-28 | 重庆华龙网海数科技有限公司 | Data DNA visualization relation analysis system and method |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112699175B (en) * | 2021-01-15 | 2024-02-13 | 广州汇智通信技术有限公司 | Data management system and method thereof |
CN112712286A (en) * | 2021-01-15 | 2021-04-27 | 科技谷(厦门)信息技术有限公司 | Data asset management method based on data middleboxes |
CN112699175A (en) * | 2021-01-15 | 2021-04-23 | 广州汇智通信技术有限公司 | Data management system and method thereof |
CN112862445A (en) * | 2021-02-22 | 2021-05-28 | 浪潮云信息技术股份公司 | High-concurrency data exchange processing method and storage medium |
CN112819652A (en) * | 2021-02-24 | 2021-05-18 | 广州汇通国信科技有限公司 | Data center applied to power system and method thereof |
CN112800046A (en) * | 2021-02-26 | 2021-05-14 | 上海帕科信息科技有限公司 | Artificial intelligence platform applied to field data management |
CN112966246A (en) * | 2021-03-15 | 2021-06-15 | 安徽超清科技股份有限公司 | Online reading system in urban brain |
CN113138973A (en) * | 2021-04-20 | 2021-07-20 | 建信金融科技有限责任公司 | Data management system and working method |
CN114416714A (en) * | 2022-01-18 | 2022-04-29 | 军事科学院系统工程研究院后勤科学与技术研究所 | Data management system |
CN114398442B (en) * | 2022-01-25 | 2023-09-19 | 中国电子科技集团公司第十研究所 | Information processing system based on data driving |
CN114398442A (en) * | 2022-01-25 | 2022-04-26 | 中国电子科技集团公司第十研究所 | Data-driven information processing system |
CN114780531A (en) * | 2022-05-07 | 2022-07-22 | 广州光点信息科技股份有限公司 | Multifunctional big data intelligent analysis service system and method |
CN114896234A (en) * | 2022-05-12 | 2022-08-12 | 建信金融科技有限责任公司 | Method, apparatus, device, medium, and program product for generating data quality rules |
CN116226894A (en) * | 2023-05-10 | 2023-06-06 | 杭州比智科技有限公司 | Data security treatment system and method based on meta bin |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210108 |