
CN115357656A - Information processing method and device based on big data and storage medium - Google Patents

Information processing method and device based on big data and storage medium

Info

Publication number
CN115357656A
CN115357656A (application number CN202211298771.4A)
Authority
CN
China
Prior art keywords
data
big data
entity
processing
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211298771.4A
Other languages
Chinese (zh)
Inventor
李慧
李以斌
陈伟
李国良
裴洪岩
贾丹丹
张继影
邵海金
李桂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiji Computer Corp Ltd
Original Assignee
Taiji Computer Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiji Computer Corp Ltd filed Critical Taiji Computer Corp Ltd
Priority to CN202211298771.4A
Publication of CN115357656A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G06F 16/2379: Updates performed during online database operations; commit processing
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2455: Query execution
    • G06F 16/24552: Database cache management
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases
    • G06F 16/288: Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an information processing method, device, and storage medium based on big data, belonging to the technical field of data information processing. The method comprises the following steps: acquiring target format data required by a target application program based on big data provided by a source terminal; extracting the relations among entities, places, and time from the big data, establishing corresponding triples, and constructing a knowledge graph; responding, by the target application, to a user access request from the front end based on the target format data; and, when a first entity suspected of carrying pathogenic microorganisms is found, searching the action track corresponding to the first entity based on the knowledge graph and determining the target entities. Entities are constructed on the basis of a triple atlas of the relations among people, places, and time, so that the association between a node and the next node is built quickly, epidemic contact links are searched accurately, and screening proceeds rapidly; storing the data in a cache improves operating efficiency, and personnel and place information can be presented within seconds along a timeline.

Description

Information processing method and device based on big data and storage medium
Technical Field
The present application relates to the field of data information processing technologies, and in particular, to an information processing method and apparatus based on big data, and a storage medium.
Background
Some infectious pathogenic microorganisms, such as the novel coronavirus, spread among people at an exponentially increasing speed. When an entity (a human, object, animal, etc.) suspected of carrying such a pathogenic microorganism is found, quickly finding the other target entities that have been in close contact with it, and are therefore at risk of infection, is of great significance for blocking transmission and preventing the further spread of pathogenic microorganisms such as the novel coronavirus among people.
The screening efficiency of the screening methods in the prior art needs further improvement.
Disclosure of Invention
Embodiments of the present application provide an information processing method, device, and storage medium based on big data to solve the above problem.
In a first aspect, an embodiment of the present application provides an information processing method based on big data, where the method includes:
acquiring target format data required by a target application program based on big data provided by a source terminal; extracting the relations among entities, places, and time from the big data, establishing corresponding triples, and constructing a knowledge graph, where the entities comprise at least one of people, objects, or animals; responding, by the target application, to a user access request from a front end based on the target format data; and, after a first entity suspected of carrying pathogenic microorganisms is found, searching, based on the knowledge graph, the action track corresponding to the first entity and determining a target entity in direct or indirect contact with the first entity.
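A minimal sketch of the triple and knowledge-graph step in Python, with hypothetical entity, place, and time records standing in for the big data; the action_track and direct_contacts helpers are illustrative assumptions, not part of the claimed method:

```python
from collections import defaultdict

# Hypothetical (entity, place, time) triples extracted from the big data,
# e.g. from code-scanning records. All names and timestamps are made up.
triples = [
    ("person:alice", "place:market_3", "2022-10-20T09:15"),
    ("person:bob",   "place:market_3", "2022-10-20T09:30"),
    ("person:bob",   "place:office_7", "2022-10-20T14:00"),
    ("person:carol", "place:office_7", "2022-10-20T14:20"),
]

# The knowledge graph as two adjacency maps: person -> visits, place -> visitors.
visits_by_person = defaultdict(list)
visitors_by_place = defaultdict(list)
for entity, place, ts in triples:
    visits_by_person[entity].append((place, ts))
    visitors_by_place[place].append((entity, ts))

def action_track(entity):
    """The entity's action track: its place visits sorted along the timeline."""
    return sorted(visits_by_person[entity], key=lambda visit: visit[1])

def direct_contacts(entity):
    """Target entities sharing a place node with the first entity."""
    return {other for place, _ in visits_by_person[entity]
                  for other, _ in visitors_by_place[place] if other != entity}

print(action_track("person:bob"))     # bob's track: market_3, then office_7
print(direct_contacts("person:bob"))  # {'person:alice', 'person:carol'}
```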
As a possible implementation, acquiring the target format data required by the target application based on big data provided by a source terminal includes: integrating big data from at least one source end; performing ETL processing on the big data; and performing standardization processing and/or quality management on the ETL-processed big data to obtain standardized data whose quality meets a preset rule, the standardized data including the target format data.
As a possible implementation, collecting big data from at least one source end includes performing data acquisition with at least one of a Flume, Kafka, or Sqoop cluster. Flume distributes real-time data and stores it into a specified database; Kafka feeds real-time data into a computing engine for calculation; Sqoop collects non-real-time data and/or collects unstructured data into an HBase library for storage. Real-time data is data whose update frequency is greater than or equal to a frequency threshold, and non-real-time data is data whose update frequency is below that threshold.
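As a rough illustration of this routing rule, a minimal sketch in Python, assuming a hypothetical updates_per_hour statistic per source and simplified sink names; the actual Flume/Kafka/Sqoop wiring is cluster configuration rather than application code:

```python
FREQ_THRESHOLD = 60  # hypothetical update-frequency cutoff (updates per hour)

def route(source):
    """Pick a collection path for a source according to the rule above."""
    if not source["structured"]:
        return "Sqoop -> HBase"                                # unstructured data
    if source["updates_per_hour"] >= FREQ_THRESHOLD:
        return "Flume -> database / Kafka -> compute engine"   # real-time data
    return "Sqoop -> warehouse"                                # non-real-time data

print(route({"structured": True, "updates_per_hour": 600}))   # real-time path
print(route({"structured": False, "updates_per_hour": 2}))    # HBase path
```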
As a possible implementation, performing ETL (Extract, Transform, Load) processing on the big data includes extracting, converting, and loading the big data based on ETL technology.
As a possible implementation, performing ETL processing on the big data includes performing A-D-M-S processing on the big data, where the A-D-M-S processing comprises: system-level analysis, table-level analysis, field-level analysis, LDM (Logical Data Model) operation, PDM (Physical Data Model) operation, SDM (Source-to-target Data Mapping) operation, and ETL JOB operation.
As a possible implementation: the system-level analysis is used for system research, covering the system information, functions, and processes that need to be warehoused in the project; the table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer; the field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null; the LDM is derived from ERwin and records the Chinese name information of the model's table fields; the PDM is the data dictionary of the integration layer and determines the English name information of the model layer's table fields; the SDM is a target-to-source template that determines the warehousing form of the source table fields; and the ETL JOB determines the loading algorithm of the integration-layer tasks.
As a possible implementation, performing standardization processing and/or quality management on the ETL-processed big data includes: executing the corresponding standardization processing based on a data standard management framework, where the framework comprises data standard definition, data standard mapping, data standard execution, and the data standard management process; and/or executing the corresponding quality management based on a data quality management framework, where the framework comprises determining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring the improvement process.
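A minimal sketch, in Python, of the rule-checking half of this framework (define check rules, then discover problems); the field names and rules are hypothetical assumptions, and the analysis, resolution, and monitoring steps are organizational processes rather than code:

```python
# Hypothetical data-quality check rules: field name -> validity predicate.
rules = {
    "id_number": lambda v: isinstance(v, str) and len(v) == 18,
    "phone":     lambda v: isinstance(v, str) and v.isdigit() and len(v) == 11,
    "scan_time": lambda v: v is not None,
}

def find_quality_problems(records):
    """Apply every check rule to every record and collect the violations."""
    problems = []
    for i, record in enumerate(records):
        for field, is_valid in rules.items():
            if not is_valid(record.get(field)):
                problems.append((i, field, record.get(field)))
    return problems

sample = [
    {"id_number": "110101199001011234", "phone": "13800138000", "scan_time": "2022-10-20T09:15"},
    {"id_number": "bad", "phone": None, "scan_time": None},
]
print(find_quality_problems(sample))  # the second record violates all three rules
```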
As a possible implementation, the method further includes a scene-note linkage mechanism: generating, through automatic label learning, strong relations based on attribute labels and weak relations based on feature labels for city entities; and establishing atomic-level, large-scale weak relations covering all subjects in the domain, then performing strong-relation convergence based on space-time mapping for any event, thereby realizing automatic extraction and adaptive growth learning of the multi-dimensional complex relations among urban people, enterprises, and things.
In a second aspect, embodiments of the present application further provide an electronic device comprising a memory for storing program instructions and a processor for executing them; when the program instructions are executed by the processor, the electronic device is triggered to perform the method of any implementation of the first aspect.
In a third aspect, embodiments of the present application further provide a storage medium in which program instructions are stored; when run on an electronic device, the program instructions cause the electronic device to perform the method of any implementation of the first aspect.
According to the big-data-based information processing method in the embodiments of the present application, entities are constructed on the basis of a triple atlas of the relations among people, places, and time, so that the association between a node and the next node is built quickly, epidemic contact links are searched accurately, and screening proceeds rapidly; storing the data in a cache improves operating efficiency, and personnel and place information can be presented within seconds along a timeline.
Drawings
Fig. 1 is a flowchart of an information processing method based on big data according to an embodiment of the present application;
Fig. 2 is a flow chart of data integration provided by an embodiment of the present application;
Fig. 3 is a schematic diagram illustrating the module division of a data integration platform according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a process for solving a data quality problem according to an embodiment of the present application;
Fig. 5 is a schematic diagram of an A-D-M-S workflow provided in an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprises" and "comprising," and any variations thereof, in the description of the embodiments and claims and the drawings herein, are intended to cover a non-exclusive inclusion, such as, for example, a list of steps or elements.
The technical solution of the present application is further described in detail below with reference to the accompanying drawings and embodiments.
At present, Health Treasure (Jiankangbao) serves as a powerful and effective technological tool for resuming work and production and plays a role in normalized epidemic prevention and control. A person must open Health Treasure to scan a code and register their own information when entering each place, and personal tracks are fused and stored in real time. Through fusion on a big data platform, machine learning, and the continuous evolution of knowledge-graph-like algorithm models, the data are continuously and automatically cleaned, fused, classified, and labeled, and the association relations among data entities are mined.
The data processing comprises data cleaning, data association, data comparison, data identification, and data desensitization.
Data cleaning detects errors and inconsistencies in the various data, detects and eliminates data anomalies and near-duplicate records, and improves data quality.
Data association is the capability of associating the data with other business data and data models according to association rules and outputting the association information.
Data comparison compares structured or unstructured information according to rules; for data that hit a rule, output according to the output description is supported.
Data identification provides the capability of extended identification of data attributes according to rules and a knowledge base, generating various self-defined data labels, such as basic attribute labels and service labels.
Data desensitization ensures that sensitive information in the data is not leaked, preventing misuse of the data, e.g., by replacing some fields with similar characters, or by shielding and replacing characters.
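A minimal sketch of the character-shielding and field-replacement style of desensitization just described, in Python; the field list and mask pattern are assumptions for illustration:

```python
def mask_middle(value, keep=3):
    """Shield the middle characters of a value, keeping its head and tail."""
    if len(value) <= 2 * keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]

def desensitize(record, sensitive_fields=("name", "id_number", "phone")):
    """Replace sensitive fields with masked look-alike strings."""
    masked = dict(record)
    for field in sensitive_fields:
        if isinstance(masked.get(field), str):
            masked[field] = mask_middle(masked[field])
    return masked

print(desensitize({"name": "Zhang San", "phone": "13800138000", "place": "market_3"}))
# {'name': 'Zha***San', 'phone': '138*****000', 'place': 'market_3'}
```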
The whole process is similar to a knowledge-graph node space: a network formed by many nodes, relations, and attributes, in which the relations among node data are described through a knowledge graph. For example, people, vehicles, and places are data nodes; these data nodes classify and fuse hundreds of millions of records from different channels, each type of data has its own node space, and within it are nodes formed from the relations and attributes of the various entity data. Starting from any node, other related nodes are found through the nodes' respective attributes and relations, spreading out into a huge relation graph; taking one node as the reference node, the direct-relation nodes and secondary-relation nodes associated with it are found, extending layer by layer.
The algorithm model is gradually strengthened through deep learning over massive data and can rapidly process large volumes of complex, interconnected, weakly structured data. For rapid screening and tracing, the whole algorithm model is based on a person-place-time node network: with a person (identity information, mobile phone number) as a node, the information of places entered is sorted along a timeline; with a place (physical location) as a node, the information of people entering it is sorted along a timeline. Based on the pattern of 'whom I scanned and who scanned me', the corresponding next node is continuously associated, so that close contacts are screened out and traced quickly and accurately.
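A sketch of this layer-by-layer expansion from a reference node, assuming a toy adjacency map of hypothetical person and place nodes; layer 1 holds the direct-relation nodes and layer 2 the secondary-relation nodes:

```python
from collections import deque

# Hypothetical person-place-time graph: node -> neighbouring nodes.
graph = {
    "person:index":   ["place:market_3"],
    "place:market_3": ["person:index", "person:a", "person:b"],
    "person:a":       ["place:market_3", "place:office_7"],
    "person:b":       ["place:market_3"],
    "place:office_7": ["person:a", "person:c"],
    "person:c":       ["place:office_7"],
}

def expand(reference, max_depth=2):
    """From a reference node, collect related nodes layer by layer."""
    seen, frontier, layers = {reference}, deque([(reference, 0)]), {}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                layers.setdefault(depth + 1, []).append(neighbour)
                frontier.append((neighbour, depth + 1))
    return layers

print(expand("person:index"))
# {1: ['place:market_3'], 2: ['person:a', 'person:b']}
```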
As shown in fig. 1 and fig. 2, the big-data-based information processing method provided in the embodiment of the present application acquires basic service data from each epidemic prevention organization and channel as follows:
The Beijing Health Treasure basic business data come from epidemic prevention organizations and channels across the country, involving the various health commissions, communities, public security, civil aviation, railways, highways, and communications venues.
Performing data integration processing on the basic service data to obtain integrated data;
the big data governance platform establishes a complete data calculation model according to a data governance process and a standard specification, the data calculation model can uniformly clean, convert and label basic service data of each channel, and through quality detection model verification, high-quality service data such as personnel health states, nucleic acid results and vaccination are provided for Beijing health treasures, accurate epidemic situation prevention and control are achieved, and safety of personal data is guaranteed through data desensitization.
As shown in fig. 3, the collected data comprise internal data and external data. Internal data are divided into real-time and non-real-time data according to timeliness, and into structured and unstructured data according to technical type; real-time data are track-type and communication-type data, and non-real-time data are data whose update frequency is below the frequency threshold.
Real-time data are structured; unstructured data mainly refers to documents and pictures.
The data acquisition tools comprise a Flume cluster, a Kafka cluster, and a Sqoop cluster: the Flume cluster distributes real-time data and stores it into a basic library, the Kafka cluster feeds real-time data into a real-time computing engine for calculation, and Sqoop collects non-real-time data.
Unstructured data are collected directly into an HBase library through the Sqoop cluster for storage; unstructured data are stored mainly in HBase, while some scattered small files are stored directly in HDFS.
The data exchange platform exchanges data, stores external data into an external shared library on the private network, and integrates the data through the internal data integration method.
The data processing process extracts data from the data exchange platform:
an ETL tool performs out-of-database conversion according to the data model, the converted data are loaded into the database, and the database's SQL (Structured Query Language) is used to perform the data model conversion operations.
The data loading process is as follows: the data files provided by the source data system are cleaned, converted, and loaded into a temporary data area; foreign keys are set and processed in the temporary data area, and the data are then loaded from the temporary area into the unified model layer.
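A compressed sketch of this extract, clean/convert, temporary area, unified model layer flow, using in-memory lists in place of a real database and SQL engine; the record shape and the toy foreign-key map are assumptions:

```python
def extract(source_rows):
    """Extract: take the rows handed over by the data exchange platform."""
    return list(source_rows)

def clean_and_convert(rows):
    """Transform: trim string fields and drop rows missing their primary key."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("id"):
            cleaned.append(row)
    return cleaned

def load(rows):
    """Load: stage rows in a temporary area, resolve a toy foreign key,
    then publish to the unified model layer (in-memory stand-ins)."""
    temp_area = rows
    place_keys = {row["place"]: i for i, row in enumerate(temp_area)}
    return [dict(row, place_fk=place_keys[row["place"]]) for row in temp_area]

rows = extract([{"id": "1", "place": " market_3 "}, {"id": None, "place": "x"}])
print(load(clean_and_convert(rows)))  # one surviving row with a resolved key
```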
ETL scheduling: the ETL scheduling needs to handle the dependency relationships of the whole system so that the conversion process requires no manual intervention;
error and exception handling: an error and exception handling mechanism is provided for the ETL system, enhancing its reliability;
extracting common modules improves the reusability of ETL jobs and reduces the maintenance difficulty of ETL code.
The ETL scheduling process schedules ETL jobs according to the relationships between ETL tasks, so that the scheduling process executes in the set order.
The relationships between ETL tasks include those in Table 1:
Table 1:
Relationship | Description
Trigger relationship | After one task completes successfully within the current day, it triggers the execution of another task. If A triggers B, A is the upstream task and B is the downstream task.
Dependency relationship | A task must wait for another task to execute successfully before it can execute. If task A must wait for task B, then A depends on B, B is a dependency task of A, and the data date of task B is greater than or equal to that of A.
Correlation relationship | If task A is a correlated task of task B, then when B executes on a data date, it first checks whether A's execution for the previous day of A's data calendar relative to that data date succeeded; only then can B execute.
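A minimal scheduling sketch in Python: run each job only after its dependencies have succeeded, with success effectively triggering the downstream jobs. The job graph is a hypothetical example; a production scheduler would also implement the per-data-date correlation check, retries, and the error handling noted above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL job graph: task -> the set of tasks it depends on.
depends_on = {
    "load_near_source": set(),
    "load_integration": {"load_near_source"},
    "build_labels":     {"load_integration"},
    "publish_cache":    {"build_labels", "load_integration"},
}

def run(task):
    print("running", task)
    return True  # a real scheduler would inspect the job's exit status here

# Execute in an order where every task waits for its dependencies; a task's
# successful completion effectively "triggers" its downstream tasks.
for task in TopologicalSorter(depends_on).static_order():
    if not run(task):
        break  # error handling: stop the chain when a job fails
```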
As shown in FIG. 5, the A-D-M-S workflow is as follows:
The system-level analysis is mainly used for system research, covering the system information, functions, and processes that need to be warehoused in the project.
The table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer.
The field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null.
The LDM is generally derived from ERwin and records information such as the Chinese names of the model's table fields.
The PDM is the data dictionary of the integration layer and determines information such as the English names of the model layer's table fields.
The SDM is a target-to-source template that determines the warehousing form of the source table fields and is a key step in generating the ETL.
The ETL JOB determines the loading algorithm of the integration-layer tasks.
The A-D-M-S model shortens the development cycle of ETL scripts in the project and provides great convenience for developers.
The data model gathers the data of each internal business line and the external data of each social business, investigates the basic data, and analyzes the relations among the various data contained in the business; data themes are refined, and conceptual, logical, and physical data models are designed.
A common business language for the project is formed, providing a consistent platform for communicating and understanding data meanings across lines of business and development teams. The data modeling process comprises six main stages: early preparation, business investigation, information investigation, conceptual model construction, logical model design, and physical model design. The data model is the basis for building the primary, secondary, and tertiary resource libraries, and truly realizes label application, efficient comprehensive analysis, and high-performance data services.
Data processing standardization: data cleaning and conversion rules, data extraction and association rules, data comparison and identification rules, and data integration rules are formulated, and data processing flows are defined according to the data resource type (structured, unstructured, and semi-structured), standardizing the data processing flow. Data standardization statistics: multi-dimensional statistics on the standardization status of data in the platform are supported, such as the resource standardization rate and standardization level.
Data quality management puts a data quality control mechanism into practice and achieves quality control of the data platform: defining, setting, running, and discovering data quality detection rules; recording the process of resolving data quality problems; and generating data quality governance reports.
As shown in fig. 4, the data quality management framework includes defining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring improvement processes.
The integrated data are stored into the integration database as follows:
the Beijing Jiankangbao big data support platform adopts a Restful style with a unified micro-service technical architecture to realize agile development and deployment, and a support platform application program can adjust service capability through measures of service fusing, current limiting and degradation, thereby completely meeting the high concurrency processing requirement of a front end. The Redis cluster is used as basic service data cache storage, and performance guarantee is provided for highly concurrent access of Beijing health treasure user groups.
The high-throughput distributed Kafka cluster is used for peak clipping and decoupling of the support platform application program, the support platform stores the front-end access log in an asynchronous processing mode through the message queue, and response efficiency of front-end access is improved. The data storage adopts a distributed cluster database for storage, and a sub-database and sub-table master-slave mode is used for realizing read-write separation, so that the throughput of data storage and retrieval is improved, and the data safety is guaranteed.
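A minimal sketch of this cache-plus-queue pattern using the redis-py and kafka-python clients; the host addresses, key scheme, topic name, and TTL are assumptions, and a production deployment would target the clusters rather than a single node:

```python
import json

import redis                     # redis-py client
from kafka import KafkaProducer  # kafka-python client

cache = redis.Redis(host="localhost", port=6379)              # stand-in for the Redis cluster
producer = KafkaProducer(bootstrap_servers="localhost:9092")  # stand-in for the Kafka cluster

def handle_request(user_id, fetch_from_db):
    """Serve basic service data from the cache, falling back to the database
    on a miss; the access log goes to Kafka asynchronously (peak shaving)."""
    producer.send("access-log", json.dumps({"user": user_id}).encode())
    cached = cache.get(f"health:{user_id}")
    if cached is not None:
        return json.loads(cached)
    data = fetch_from_db(user_id)
    cache.setex(f"health:{user_id}", 300, json.dumps(data))  # hypothetical 5-minute TTL
    return data
```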
Space-time cross personnel and epidemic places are searched in the integrated database according to the identity information of confirmed persons;
the scene-note linkage mechanism is adopted as follows:
Aiming at the scene linkage problem for complex relations, an event-driven, complex-scene adaptive, large-scale dynamic relation learning technique is provided. Strong relations based on attribute labels and weak relations based on feature labels are generated for city entities through automatic labeling learning, and atomic-level, large-scale weak relations covering all subjects are established; strong-relation convergence based on space-time mapping can then be performed for any event, achieving automatic extraction and adaptive growth learning of the city-level multi-dimensional complex relations among people, enterprises, and things, supporting real-time response to complex urban scenes, and overcoming the time-consuming, labor-intensive, and inaccurate nature of traditional relation construction methods.
According to the epidemic sites, the tracks of epidemic-related persons and the corresponding epidemic sites are analyzed based on GIS geographic platform dotting technology.
Code application and code scanning linkage mechanism
(1) Code application
Step 1: click 'visitor information registry'.
Step 2: if the person's health status is 'no abnormality', a prompt message pops up; read it carefully and confirm, click the 'agree' button, and the 'application register' page pops up (note: clicking the 'disagree' button closes the prompt box and ends the register application process).
Step 3: fill in the unit name, the street where the unit is located, and the detailed address information; after completion, click the 'submit' button to generate the visitor information register.
The code register makes the health information of persons entering and leaving a place known accurately and immediately, and is applicable to unit registrars and visitors respectively. The visitor information register provides an electronic registration service that can replace paper registration: after a unit registrar applies for the register's two-dimensional code, a visitor registers by scanning that code, avoiding the risks of contact infection and personal privacy disclosure. The register applied for is a physical place unit, analogous to a reference entity node in the knowledge-graph node space; 'who scanned me' is located immediately.
(2) Printing the register
Step 1: click the 'print register' button.
Step 2: click 'save to album'; the electronic register is saved to the mobile phone album and can be printed on paper and posted for use via photo printing.
Exporting registration records
Click the 'export registration record' button to open the export code-scanning record page and view the historical code-scanning records of the past 14 days by day; or click the 'export' button, copy and paste the link into a browser, and download the code-scanning records locally as an EXCEL file. The records can be exported once every four hours.
(3) Code scanning
Step 1: click 'personal information code-scanning registration'; a scanning page pops up.
Step 2: scan the electronic register of the visited place; if the health status detail page shows 'no abnormality', the registration succeeds.
Code scanning records the health status along a timeline; at the same time, 'whom I scanned' associates the different place units immediately, and the next child node is defined based on the physical place unit reference.
When a local confirmed case occurs, whether from hospital fever-clinic screening or 10-in-1 pooled screening at a nucleic acid testing point, once a positive result appears the disease control center implements control tasks and at the same time immediately starts flow-tracing and contact-reaching tasks according to the epidemic situation. Through the confirmed person's identity card and mobile phone number, the person's track over the past 14 days is quickly located along the timeline; the epidemic place units are determined based on 'whom I scanned', place-unit closing and disinfection tasks are started quickly, and 'who scanned me' at those epidemic place units determines the close contacts and secondary close contacts. Based on the high-performance Redis database in the background, space-time cross personnel and epidemic places are found quickly, and the epidemic classifications of 'close contacts', 'secondary close contacts', 'high-risk personnel', and epidemic places are located within minutes; the associated points of the epidemic chains are traced rapidly. Based on GIS geographic platform dotting technology, the tracks of epidemic-related persons and the epidemic places are analyzed within hours. Hidden spread of the epidemic caused by misreporting, concealment, or failure to report is strictly guarded against, and by racing against the transmission speed through technical means, the chain of epidemic transmission is effectively blocked and the transmission risk is minimized.
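A sketch of the space-time cross lookup at the core of this flow: given the confirmed person's track, find the other people whose code scan at the same place falls within a time window, the shared places becoming epidemic sites. The window size, record shape, and names are assumptions:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # hypothetical window defining a "space-time cross"

def spacetime_cross(confirmed_track, scan_records):
    """People whose scan at the same place falls within WINDOW of the confirmed
    person's visit become close-contact candidates; the places become sites."""
    contacts, sites = set(), set()
    for place, visit_time in confirmed_track:
        t0 = datetime.fromisoformat(visit_time)
        for person, scanned_place, scan_time in scan_records:
            if scanned_place == place and abs(datetime.fromisoformat(scan_time) - t0) <= WINDOW:
                contacts.add(person)
                sites.add(place)
    return contacts, sites

track = [("place:market_3", "2022-10-20T09:15")]
records = [
    ("person:a", "place:market_3", "2022-10-20T10:00"),  # within the window
    ("person:b", "place:market_3", "2022-10-21T10:00"),  # a day later: excluded
]
print(spacetime_cross(track, records))  # ({'person:a'}, {'place:market_3'})
```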
Beneficial effects: associations are formed through the three key elements of person, place, and time; big data labeling technology is used to rapidly identify the different categories of epidemic-related persons and accurately identify close contacts, improving the precision of epidemic prevention and control and enabling the disease control center to manage and control epidemic-related persons rapidly.
In a second aspect, embodiments of the present application further provide an electronic device comprising a memory for storing program instructions and a processor for executing them; when the program instructions are executed by the processor, the electronic device is triggered to execute the method of any of the above embodiments.
In a third aspect, embodiments of the present application further provide a storage medium in which program instructions are stored; when run on an electronic device, the program instructions cause the electronic device to perform the method of any of the above embodiments.
The number of processors in the computer may be one or more, and likewise the number of memories may be one or more; the processor and the memory may be connected by a bus or in other ways. The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the devices in the embodiments of the present application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory, that is, it implements the method in any of the above method embodiments. The memory may include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and necessary data. Further, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk), among others.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, and c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An information processing method based on big data, characterized in that the method comprises:
acquiring target format data required by a target application program based on big data provided by a source terminal;
extracting the relation among entities, places and time from the big data, establishing corresponding triples, and constructing a knowledge graph, wherein the entities comprise at least one of people, objects or animals;
responding, by the target application, to a user access request from a front end based on the target format data;
and when a first entity suspected of carrying pathogenic microorganisms is found, searching, based on the knowledge graph, the action track corresponding to the first entity, and determining a target entity in direct or indirect contact with the first entity.
2. The method of claim 1, wherein obtaining the target format data required by the target application based on big data provided by a source terminal comprises:
integrating big data from at least one source end;
performing ETL processing on the big data;
performing standardization processing and/or quality management on the ETL-processed big data to obtain standardized data whose quality meets a preset rule; the standardized data includes the target format data.
3. The method of claim 2, wherein collecting big data from at least one source end comprises:
performing data acquisition with at least one of a Flume, Kafka, or Sqoop cluster; Flume distributes real-time data and stores it into a specified database, Kafka feeds real-time data into a computing engine for calculation, and Sqoop collects non-real-time data and/or collects unstructured data into an HBase library for storage; real-time data is data whose update frequency is greater than or equal to a frequency threshold, and non-real-time data is data whose update frequency is below that threshold.
4. The method of claim 3, wherein performing ETL processing on the big data comprises:
extracting, converting, and loading the big data based on ETL technology.
5. The method of claim 3, wherein performing ETL processing on the big data comprises:
performing A-D-M-S processing on the big data; the A-D-M-S processing comprises:
system-level analysis, table-level analysis, field-level analysis, LDM operation, PDM operation, SDM operation, and ETL JOB operation.
6. The method of claim 5, wherein:
the system-level analysis is used for system research, covering the system information, functions, and processes that need to be warehoused in the project;
the table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer;
the field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null;
the LDM is derived from ERwin and records the Chinese name information of the model's table fields;
the PDM is the data dictionary of the integration layer and determines the English name information of the model layer's table fields;
the SDM is a target-to-source template that determines the warehousing form of the source table fields;
the ETL JOB determines the loading algorithm of the integration-layer tasks.
7. The method of claim 2, wherein performing standardization processing and/or quality management on the ETL-processed big data comprises:
executing the corresponding standardization processing based on a data standard management framework, the framework comprising data standard definition, data standard mapping, data standard execution, and the data standard management process;
and/or
executing the corresponding quality management based on a data quality management framework, the framework comprising determining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring the improvement process.
8. The method of claim 1, further comprising a scene-note linkage mechanism:
generating, through automatic label learning, strong relations based on attribute labels and weak relations based on feature labels for city entities;
and establishing atomic-level, large-scale weak relations covering all subjects in the domain, performing strong-relation convergence based on space-time mapping for any event, and realizing automatic extraction and adaptive growth learning of the multi-dimensional complex relations among urban people, enterprises, and things.
9. An electronic device, comprising a memory for storing program instructions and a processor for executing the program instructions, wherein the program instructions, when executed by the processor, trigger the electronic device to perform the method of any of claims 1-8.
10. A storage medium having stored therein program instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-8.
CN202211298771.4A (priority date 2022-10-24, filing date 2022-10-24): Information processing method and device based on big data and storage medium. Status: Pending. Published as CN115357656A.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211298771.4A | 2022-10-24 | 2022-10-24 | Information processing method and device based on big data and storage medium


Publications (1)

Publication Number | Publication Date
CN115357656A | 2022-11-18

Family

ID=84008515

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211298771.4A | Information processing method and device based on big data and storage medium (Pending) | 2022-10-24 | 2022-10-24

Country Status (1)

Country | Publication
CN | CN115357656A


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068745A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Data processing method and device
CN113836235A (en) * 2021-09-29 2021-12-24 平安医疗健康管理股份有限公司 Data processing method based on data center and related equipment thereof
CN113934729A (en) * 2021-10-20 2022-01-14 平安国际智慧城市科技股份有限公司 Data management method based on knowledge graph, related equipment and medium
CN114550947A (en) * 2022-01-18 2022-05-27 西北工业大学 Infectious disease prevention and control processing method and system
CN114817368A (en) * 2022-04-29 2022-07-29 深圳市东晟数据有限公司 Epidemic situation flow adjustment screening method based on big data and graph calculation
CN115171912A (en) * 2022-07-05 2022-10-11 山东浪潮智慧医疗科技有限公司 Epidemic propagation chain analysis method and device and computer medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FENG Xin et al., "Research on Building a Data Platform for Public Health Emergencies Based on Knowledge Entities", Knowledge Management Forum (知识管理论坛) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117114843A (en) * 2023-10-25 2023-11-24 浙江农商数字科技有限责任公司 Bank data quality control method
CN117114843B (en) * 2023-10-25 2024-02-23 浙江农商数字科技有限责任公司 Bank data quality control method


Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 20221118)