
CN115357656A - Information processing method and device based on big data and storage medium - Google Patents

Information processing method and device based on big data and storage medium

Info

Publication number
CN115357656A
CN115357656A (application number CN202211298771.4A)
Authority
CN
China
Prior art keywords
data
big data
entity
processing
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211298771.4A
Other languages
Chinese (zh)
Inventor
李慧
李以斌
陈伟
李国良
裴洪岩
贾丹丹
张继影
邵海金
李桂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiji Computer Corp Ltd
Original Assignee
Taiji Computer Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiji Computer Corp Ltd filed Critical Taiji Computer Corp Ltd
Priority to CN202211298771.4A
Publication of CN115357656A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G06F 16/2379: Updates performed during online database operations; commit processing
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2455: Query execution
    • G06F 16/24552: Database cache management
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases
    • G06F 16/288: Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an information processing method, device, and storage medium based on big data, belonging to the technical field of data information processing. The method comprises the following steps: acquiring target format data required by a target application program based on big data provided by a source terminal; extracting the relations among entities, places, and time from the big data, establishing corresponding triples, and constructing a knowledge graph; responding, by the target application, to a user access request from the front end based on the target format data; and, when a first entity suspected of carrying pathogenic microorganisms is found, searching the action track corresponding to the first entity based on the knowledge graph and determining the target entities. Entities are constructed on the basis of a triple atlas of the relations among people, places, and time, so that the association between a node and the next node is built quickly, epidemic contact links are searched accurately, and screening proceeds rapidly; storing the data in a cache improves operating efficiency, and personnel and place information can be presented within seconds along a timeline.

Description

Information processing method and device based on big data and storage medium
Technical Field
The present application relates to the field of data information processing technologies, and in particular, to an information processing method and apparatus based on big data, and a storage medium.
Background
Some infectious pathogenic microorganisms, such as the novel coronavirus, spread among people at an exponentially increasing speed. When an entity (a human, object, animal, etc.) suspected of carrying such a pathogenic microorganism is found, quickly finding the other target entities that have been in close contact with it, and are therefore at risk of infection, is of great significance for blocking transmission and preventing the further spread of pathogenic microorganisms such as the novel coronavirus among people.
The screening efficiency of the screening methods in the prior art needs further improvement.
Disclosure of Invention
Embodiments of the present application provide an information processing method, device, and storage medium based on big data to solve the above problem.
In a first aspect, an embodiment of the present application provides an information processing method based on big data, where the method includes:
acquiring target format data required by a target application program based on big data provided by a source terminal; extracting the relations among entities, places, and time from the big data, establishing corresponding triples, and constructing a knowledge graph, where the entities comprise at least one of people, objects, or animals; responding, by the target application, to a user access request from a front end based on the target format data; and, after a first entity suspected of carrying pathogenic microorganisms is found, searching, based on the knowledge graph, the action track corresponding to the first entity and determining a target entity in direct or indirect contact with the first entity.
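A minimal sketch of the triple and knowledge-graph step in Python, with hypothetical entity, place, and time records standing in for the big data; the action_track and direct_contacts helpers are illustrative assumptions, not part of the claimed method:

```python
from collections import defaultdict

# Hypothetical (entity, place, time) triples extracted from the big data,
# e.g. from code-scanning records. All names and timestamps are made up.
triples = [
    ("person:alice", "place:market_3", "2022-10-20T09:15"),
    ("person:bob",   "place:market_3", "2022-10-20T09:30"),
    ("person:bob",   "place:office_7", "2022-10-20T14:00"),
    ("person:carol", "place:office_7", "2022-10-20T14:20"),
]

# The knowledge graph as two adjacency maps: person -> visits, place -> visitors.
visits_by_person = defaultdict(list)
visitors_by_place = defaultdict(list)
for entity, place, ts in triples:
    visits_by_person[entity].append((place, ts))
    visitors_by_place[place].append((entity, ts))

def action_track(entity):
    """The entity's action track: its place visits sorted along the timeline."""
    return sorted(visits_by_person[entity], key=lambda visit: visit[1])

def direct_contacts(entity):
    """Target entities sharing a place node with the first entity."""
    return {other for place, _ in visits_by_person[entity]
                  for other, _ in visitors_by_place[place] if other != entity}

print(action_track("person:bob"))     # bob's track: market_3, then office_7
print(direct_contacts("person:bob"))  # {'person:alice', 'person:carol'}
```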
As a possible implementation, acquiring the target format data required by the target application based on big data provided by a source terminal includes: integrating big data from at least one source end; performing ETL processing on the big data; and performing standardization processing and/or quality management on the ETL-processed big data to obtain standardized data whose quality meets a preset rule, the standardized data including the target format data.
As a possible implementation, collecting big data from at least one source end includes performing data acquisition with at least one of a Flume, Kafka, or Sqoop cluster. Flume distributes real-time data and stores it into a specified database; Kafka feeds real-time data into a computing engine for calculation; Sqoop collects non-real-time data and/or collects unstructured data into an HBase library for storage. Real-time data is data whose update frequency is greater than or equal to a frequency threshold, and non-real-time data is data whose update frequency is below that threshold.
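As a rough illustration of this routing rule, a minimal sketch in Python, assuming a hypothetical updates_per_hour statistic per source and simplified sink names; the actual Flume/Kafka/Sqoop wiring is cluster configuration rather than application code:

```python
FREQ_THRESHOLD = 60  # hypothetical update-frequency cutoff (updates per hour)

def route(source):
    """Pick a collection path for a source according to the rule above."""
    if not source["structured"]:
        return "Sqoop -> HBase"                                # unstructured data
    if source["updates_per_hour"] >= FREQ_THRESHOLD:
        return "Flume -> database / Kafka -> compute engine"   # real-time data
    return "Sqoop -> warehouse"                                # non-real-time data

print(route({"structured": True, "updates_per_hour": 600}))   # real-time path
print(route({"structured": False, "updates_per_hour": 2}))    # HBase path
```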
As a possible implementation, performing ETL (Extract, Transform, Load) processing on the big data includes extracting, converting, and loading the big data based on ETL technology.
As a possible implementation, performing ETL processing on the big data includes performing A-D-M-S processing on the big data, where the A-D-M-S processing comprises: system-level analysis, table-level analysis, field-level analysis, LDM (Logical Data Model) operation, PDM (Physical Data Model) operation, SDM (Source-to-target Data Mapping) operation, and ETL JOB operation.
As a possible implementation: the system-level analysis is used for system research, covering the system information, functions, and processes that need to be warehoused in the project; the table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer; the field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null; the LDM is derived from ERwin and records the Chinese name information of the model's table fields; the PDM is the data dictionary of the integration layer and determines the English name information of the model layer's table fields; the SDM is a target-to-source template that determines the warehousing form of the source table fields; and the ETL JOB determines the loading algorithm of the integration-layer tasks.
As a possible implementation, performing standardization processing and/or quality management on the ETL-processed big data includes: executing the corresponding standardization processing based on a data standard management framework, where the framework comprises data standard definition, data standard mapping, data standard execution, and the data standard management process; and/or executing the corresponding quality management based on a data quality management framework, where the framework comprises determining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring the improvement process.
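A minimal sketch, in Python, of the rule-checking half of this framework (define check rules, then discover problems); the field names and rules are hypothetical assumptions, and the analysis, resolution, and monitoring steps are organizational processes rather than code:

```python
# Hypothetical data-quality check rules: field name -> validity predicate.
rules = {
    "id_number": lambda v: isinstance(v, str) and len(v) == 18,
    "phone":     lambda v: isinstance(v, str) and v.isdigit() and len(v) == 11,
    "scan_time": lambda v: v is not None,
}

def find_quality_problems(records):
    """Apply every check rule to every record and collect the violations."""
    problems = []
    for i, record in enumerate(records):
        for field, is_valid in rules.items():
            if not is_valid(record.get(field)):
                problems.append((i, field, record.get(field)))
    return problems

sample = [
    {"id_number": "110101199001011234", "phone": "13800138000", "scan_time": "2022-10-20T09:15"},
    {"id_number": "bad", "phone": None, "scan_time": None},
]
print(find_quality_problems(sample))  # the second record violates all three rules
```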
As a possible implementation, the method further includes a scene-note linkage mechanism: generating, through automatic label learning, strong relations based on attribute labels and weak relations based on feature labels for city entities; and establishing atomic-level, large-scale weak relations covering all subjects in the domain, then performing strong-relation convergence based on space-time mapping for any event, thereby realizing automatic extraction and adaptive growth learning of the multi-dimensional complex relations among urban people, enterprises, and things.
In a second aspect, embodiments of the present application further provide an electronic device comprising a memory for storing program instructions and a processor for executing them; when the program instructions are executed by the processor, the electronic device is triggered to perform the method of any implementation of the first aspect.
In a third aspect, embodiments of the present application further provide a storage medium in which program instructions are stored; when run on an electronic device, the program instructions cause the electronic device to perform the method of any implementation of the first aspect.
According to the big-data-based information processing method in the embodiments of the present application, entities are constructed on the basis of a triple atlas of the relations among people, places, and time, so that the association between a node and the next node is built quickly, epidemic contact links are searched accurately, and screening proceeds rapidly; storing the data in a cache improves operating efficiency, and personnel and place information can be presented within seconds along a timeline.
Drawings
Fig. 1 is a flowchart of an information processing method based on big data according to an embodiment of the present application;
Fig. 2 is a flow chart of data integration provided by an embodiment of the present application;
Fig. 3 is a schematic diagram illustrating the module division of a data integration platform according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a process for solving a data quality problem according to an embodiment of the present application;
Fig. 5 is a schematic diagram of an A-D-M-S workflow provided in an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprises" and "comprising," and any variations thereof, in the description of the embodiments and claims and the drawings herein, are intended to cover a non-exclusive inclusion, such as, for example, a list of steps or elements.
The technical solution of the present application is further described in detail below with reference to the accompanying drawings and embodiments.
At present, Health Treasure (Jiankangbao) serves as a powerful and effective technological tool for resuming work and production and plays a role in normalized epidemic prevention and control. A person must open Health Treasure to scan a code and register their own information when entering each place, and personal tracks are fused and stored in real time. Through fusion on a big data platform, machine learning, and the continuous evolution of knowledge-graph-like algorithm models, the data are continuously and automatically cleaned, fused, classified, and labeled, and the association relations among data entities are mined.
The data processing comprises data cleaning, data association, data comparison, data identification, and data desensitization.
Data cleaning detects errors and inconsistencies in the various data, detects and eliminates data anomalies and near-duplicate records, and improves data quality.
Data association is the capability of associating the data with other business data and data models according to association rules and outputting the association information.
Data comparison compares structured or unstructured information according to rules; for data that hit a rule, output according to the output description is supported.
Data identification provides the capability of extended identification of data attributes according to rules and a knowledge base, generating various self-defined data labels, such as basic attribute labels and service labels.
Data desensitization ensures that sensitive information in the data is not leaked, preventing misuse of the data, e.g., by replacing some fields with similar characters, or by shielding and replacing characters.
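A minimal sketch of the character-shielding and field-replacement style of desensitization just described, in Python; the field list and mask pattern are assumptions for illustration:

```python
def mask_middle(value, keep=3):
    """Shield the middle characters of a value, keeping its head and tail."""
    if len(value) <= 2 * keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]

def desensitize(record, sensitive_fields=("name", "id_number", "phone")):
    """Replace sensitive fields with masked look-alike strings."""
    masked = dict(record)
    for field in sensitive_fields:
        if isinstance(masked.get(field), str):
            masked[field] = mask_middle(masked[field])
    return masked

print(desensitize({"name": "Zhang San", "phone": "13800138000", "place": "market_3"}))
# {'name': 'Zha***San', 'phone': '138*****000', 'place': 'market_3'}
```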
The whole process is similar to a knowledge-graph node space: a network formed by many nodes, relations, and attributes, in which the relations among node data are described through a knowledge graph. For example, people, vehicles, and places are data nodes; these data nodes classify and fuse hundreds of millions of records from different channels, each type of data has its own node space, and within it are nodes formed from the relations and attributes of the various entity data. Starting from any node, other related nodes are found through the nodes' respective attributes and relations, spreading out into a huge relation graph; taking one node as the reference node, the direct-relation nodes and secondary-relation nodes associated with it are found, extending layer by layer.
The algorithm model is gradually strengthened through deep learning over massive data and can rapidly process large volumes of complex, interconnected, weakly structured data. For rapid screening and tracing, the whole algorithm model is based on a person-place-time node network: with a person (identity information, mobile phone number) as a node, the information of places entered is sorted along a timeline; with a place (physical location) as a node, the information of people entering it is sorted along a timeline. Based on the pattern of 'whom I scanned and who scanned me', the corresponding next node is continuously associated, so that close contacts are screened out and traced quickly and accurately.
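A sketch of this layer-by-layer expansion from a reference node, assuming a toy adjacency map of hypothetical person and place nodes; layer 1 holds the direct-relation nodes and layer 2 the secondary-relation nodes:

```python
from collections import deque

# Hypothetical person-place-time graph: node -> neighbouring nodes.
graph = {
    "person:index":   ["place:market_3"],
    "place:market_3": ["person:index", "person:a", "person:b"],
    "person:a":       ["place:market_3", "place:office_7"],
    "person:b":       ["place:market_3"],
    "place:office_7": ["person:a", "person:c"],
    "person:c":       ["place:office_7"],
}

def expand(reference, max_depth=2):
    """From a reference node, collect related nodes layer by layer."""
    seen, frontier, layers = {reference}, deque([(reference, 0)]), {}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                layers.setdefault(depth + 1, []).append(neighbour)
                frontier.append((neighbour, depth + 1))
    return layers

print(expand("person:index"))
# {1: ['place:market_3'], 2: ['person:a', 'person:b']}
```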
As shown in fig. 1 and fig. 2, the big-data-based information processing method provided in the embodiment of the present application acquires basic service data from each epidemic prevention organization and channel as follows:
The Beijing Health Treasure basic business data come from epidemic prevention organizations and channels across the country, involving the various health commissions, communities, public security, civil aviation, railways, highways, and communications venues.
Performing data integration processing on the basic service data to obtain integrated data;
the big data governance platform establishes a complete data calculation model according to a data governance process and a standard specification, the data calculation model can uniformly clean, convert and label basic service data of each channel, and through quality detection model verification, high-quality service data such as personnel health states, nucleic acid results and vaccination are provided for Beijing health treasures, accurate epidemic situation prevention and control are achieved, and safety of personal data is guaranteed through data desensitization.
As shown in fig. 3, the collected data comprise internal data and external data. Internal data are divided into real-time and non-real-time data according to timeliness, and into structured and unstructured data according to technical type; real-time data are track-type and communication-type data, and non-real-time data are data whose update frequency is below the frequency threshold.
Real-time data are structured; unstructured data mainly refers to documents and pictures.
The data acquisition tools comprise a Flume cluster, a Kafka cluster, and a Sqoop cluster: the Flume cluster distributes real-time data and stores it into a basic library, the Kafka cluster feeds real-time data into a real-time computing engine for calculation, and Sqoop collects non-real-time data.
Unstructured data are collected directly into an HBase library through the Sqoop cluster for storage; unstructured data are stored mainly in HBase, while some scattered small files are stored directly in HDFS.
The data exchange platform exchanges data, stores external data into an external shared library on the private network, and integrates the data through the internal data integration method.
The data processing process extracts data from the data exchange platform:
an ETL tool performs out-of-database conversion according to the data model, the converted data are loaded into the database, and the database's SQL (Structured Query Language) is used to perform the data model conversion operations.
The data loading process is as follows: the data files provided by the source data system are cleaned, converted, and loaded into a temporary data area; foreign keys are set and processed in the temporary data area, and the data are then loaded from the temporary area into the unified model layer.
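A compressed sketch of this extract, clean/convert, temporary area, unified model layer flow, using in-memory lists in place of a real database and SQL engine; the record shape and the toy foreign-key map are assumptions:

```python
def extract(source_rows):
    """Extract: take the rows handed over by the data exchange platform."""
    return list(source_rows)

def clean_and_convert(rows):
    """Transform: trim string fields and drop rows missing their primary key."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("id"):
            cleaned.append(row)
    return cleaned

def load(rows):
    """Load: stage rows in a temporary area, resolve a toy foreign key,
    then publish to the unified model layer (in-memory stand-ins)."""
    temp_area = rows
    place_keys = {row["place"]: i for i, row in enumerate(temp_area)}
    return [dict(row, place_fk=place_keys[row["place"]]) for row in temp_area]

rows = extract([{"id": "1", "place": " market_3 "}, {"id": None, "place": "x"}])
print(load(clean_and_convert(rows)))  # one surviving row with a resolved key
```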
ETL scheduling: the ETL scheduling needs to handle the dependency relationships of the whole system so that the conversion process requires no manual intervention;
error and exception handling: an error and exception handling mechanism is provided for the ETL system, enhancing its reliability;
extracting common modules improves the reusability of ETL jobs and reduces the maintenance difficulty of ETL code.
The ETL scheduling process schedules ETL jobs according to the relationships between ETL tasks, so that the scheduling process executes in the set order.
The relationships between ETL tasks include those in Table 1:
Table 1:
Relationship | Description
Trigger relationship | After one task completes successfully within the current day, it triggers the execution of another task. If A triggers B, A is the upstream task and B is the downstream task.
Dependency relationship | A task must wait for another task to execute successfully before it can execute. If task A must wait for task B, then A depends on B, B is a dependency task of A, and the data date of task B is greater than or equal to that of A.
Correlation relationship | If task A is a correlated task of task B, then when B executes on a data date, it first checks whether A's execution for the previous day of A's data calendar relative to that data date succeeded; only then can B execute.
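A minimal scheduling sketch in Python: run each job only after its dependencies have succeeded, with success effectively triggering the downstream jobs. The job graph is a hypothetical example; a production scheduler would also implement the per-data-date correlation check, retries, and the error handling noted above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL job graph: task -> the set of tasks it depends on.
depends_on = {
    "load_near_source": set(),
    "load_integration": {"load_near_source"},
    "build_labels":     {"load_integration"},
    "publish_cache":    {"build_labels", "load_integration"},
}

def run(task):
    print("running", task)
    return True  # a real scheduler would inspect the job's exit status here

# Execute in an order where every task waits for its dependencies; a task's
# successful completion effectively "triggers" its downstream tasks.
for task in TopologicalSorter(depends_on).static_order():
    if not run(task):
        break  # error handling: stop the chain when a job fails
```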
As shown in FIG. 5, the A-D-M-S workflow is as follows:
The system-level analysis is mainly used for system research, covering the system information, functions, and processes that need to be warehoused in the project.
The table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer.
The field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null.
The LDM is generally derived from ERwin and records information such as the Chinese names of the model's table fields.
The PDM is the data dictionary of the integration layer and determines information such as the English names of the model layer's table fields.
The SDM is a target-to-source template that determines the warehousing form of the source table fields and is a key step in generating the ETL.
The ETL JOB determines the loading algorithm of the integration-layer tasks.
The A-D-M-S model shortens the development cycle of ETL scripts in the project and provides great convenience for developers.
The data model gathers the data of each internal business line and the external data of each social business, investigates the basic data, and analyzes the relations among the various data contained in the business; data themes are refined, and conceptual, logical, and physical data models are designed.
A common business language for the project is formed, providing a consistent platform for communicating and understanding data meanings across lines of business and development teams. The data modeling process comprises six main stages: early preparation, business investigation, information investigation, conceptual model construction, logical model design, and physical model design. The data model is the basis for building the primary, secondary, and tertiary resource libraries, and truly realizes label application, efficient comprehensive analysis, and high-performance data services.
Data processing standardization: data cleaning and conversion rules, data extraction and association rules, data comparison and identification rules, and data integration rules are formulated, and data processing flows are defined according to the data resource type (structured, unstructured, and semi-structured), standardizing the data processing flow. Data standardization statistics: multi-dimensional statistics on the standardization status of data in the platform are supported, such as the resource standardization rate and standardization level.
Data quality management puts a data quality control mechanism into practice and achieves quality control of the data platform: defining, setting, running, and discovering data quality detection rules; recording the process of resolving data quality problems; and generating data quality governance reports.
As shown in fig. 4, the data quality management framework includes defining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring improvement processes.
The integrated data are stored into the integration database as follows:
the Beijing Jiankangbao big data support platform adopts a Restful style with a unified micro-service technical architecture to realize agile development and deployment, and a support platform application program can adjust service capability through measures of service fusing, current limiting and degradation, thereby completely meeting the high concurrency processing requirement of a front end. The Redis cluster is used as basic service data cache storage, and performance guarantee is provided for highly concurrent access of Beijing health treasure user groups.
The high-throughput distributed Kafka cluster is used for peak clipping and decoupling of the support platform application program, the support platform stores the front-end access log in an asynchronous processing mode through the message queue, and response efficiency of front-end access is improved. The data storage adopts a distributed cluster database for storage, and a sub-database and sub-table master-slave mode is used for realizing read-write separation, so that the throughput of data storage and retrieval is improved, and the data safety is guaranteed.
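A minimal sketch of this cache-plus-queue pattern using the redis-py and kafka-python clients; the host addresses, key scheme, topic name, and TTL are assumptions, and a production deployment would target the clusters rather than a single node:

```python
import json

import redis                     # redis-py client
from kafka import KafkaProducer  # kafka-python client

cache = redis.Redis(host="localhost", port=6379)              # stand-in for the Redis cluster
producer = KafkaProducer(bootstrap_servers="localhost:9092")  # stand-in for the Kafka cluster

def handle_request(user_id, fetch_from_db):
    """Serve basic service data from the cache, falling back to the database
    on a miss; the access log goes to Kafka asynchronously (peak shaving)."""
    producer.send("access-log", json.dumps({"user": user_id}).encode())
    cached = cache.get(f"health:{user_id}")
    if cached is not None:
        return json.loads(cached)
    data = fetch_from_db(user_id)
    cache.setex(f"health:{user_id}", 300, json.dumps(data))  # hypothetical 5-minute TTL
    return data
```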
Space-time cross personnel and epidemic places are searched in the integrated database according to the identity information of confirmed persons;
the scene-note linkage mechanism is adopted as follows:
Aiming at the scene linkage problem for complex relations, an event-driven, complex-scene adaptive, large-scale dynamic relation learning technique is provided. Strong relations based on attribute labels and weak relations based on feature labels are generated for city entities through automatic labeling learning, and atomic-level, large-scale weak relations covering all subjects are established; strong-relation convergence based on space-time mapping can then be performed for any event, achieving automatic extraction and adaptive growth learning of the city-level multi-dimensional complex relations among people, enterprises, and things, supporting real-time response to complex urban scenes, and overcoming the time-consuming, labor-intensive, and inaccurate nature of traditional relation construction methods.
According to the epidemic sites, the tracks of epidemic-related persons and the corresponding epidemic sites are analyzed based on GIS geographic platform dotting technology.
Code application and code scanning linkage mechanism
(1) Code application
Step 1: click 'visitor information registry'.
Step 2: if the person's health status is 'no abnormality', a prompt message pops up; read it carefully and confirm, click the 'agree' button, and the 'application register' page pops up (note: clicking the 'disagree' button closes the prompt box and ends the register application process).
Step 3: fill in the unit name, the street where the unit is located, and the detailed address information; after completion, click the 'submit' button to generate the visitor information register.
The code register makes the health information of persons entering and leaving a place known accurately and immediately, and is applicable to unit registrars and visitors respectively. The visitor information register provides an electronic registration service that can replace paper registration: after a unit registrar applies for the register's two-dimensional code, a visitor registers by scanning that code, avoiding the risks of contact infection and personal privacy disclosure. The register applied for is a physical place unit, analogous to a reference entity node in the knowledge-graph node space; 'who scanned me' is located immediately.
(2) Printing the register
Step 1: click the 'print register' button.
Step 2: click 'save to album'; the electronic register is saved to the mobile phone album and can be printed on paper and posted for use via photo printing.
Exporting registration records
Click the 'export registration record' button to open the export code-scanning record page and view the historical code-scanning records of the past 14 days by day; or click the 'export' button, copy and paste the link into a browser, and download the code-scanning records locally as an EXCEL file. The records can be exported once every four hours.
(3) Code scanning
Step 1: click 'personal information code-scanning registration'; a scanning page pops up.
Step 2: scan the electronic register of the visited place; if the health status detail page shows 'no abnormality', the registration succeeds.
Code scanning records the health status along a timeline; at the same time, 'whom I scanned' associates the different place units immediately, and the next child node is defined based on the physical place unit reference.
When a local confirmed case occurs, whether from hospital fever-clinic screening or 10-in-1 pooled screening at a nucleic acid testing point, once a positive result appears the disease control center implements control tasks and at the same time immediately starts flow-tracing and contact-reaching tasks according to the epidemic situation. Through the confirmed person's identity card and mobile phone number, the person's track over the past 14 days is quickly located along the timeline; the epidemic place units are determined based on 'whom I scanned', place-unit closing and disinfection tasks are started quickly, and 'who scanned me' at those epidemic place units determines the close contacts and secondary close contacts. Based on the high-performance Redis database in the background, space-time cross personnel and epidemic places are found quickly, and the epidemic classifications of 'close contacts', 'secondary close contacts', 'high-risk personnel', and epidemic places are located within minutes; the associated points of the epidemic chains are traced rapidly. Based on GIS geographic platform dotting technology, the tracks of epidemic-related persons and the epidemic places are analyzed within hours. Hidden spread of the epidemic caused by misreporting, concealment, or failure to report is strictly guarded against, and by racing against the transmission speed through technical means, the chain of epidemic transmission is effectively blocked and the transmission risk is minimized.
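A sketch of the space-time cross lookup at the core of this flow: given the confirmed person's track, find the other people whose code scan at the same place falls within a time window, the shared places becoming epidemic sites. The window size, record shape, and names are assumptions:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # hypothetical window defining a "space-time cross"

def spacetime_cross(confirmed_track, scan_records):
    """People whose scan at the same place falls within WINDOW of the confirmed
    person's visit become close-contact candidates; the places become sites."""
    contacts, sites = set(), set()
    for place, visit_time in confirmed_track:
        t0 = datetime.fromisoformat(visit_time)
        for person, scanned_place, scan_time in scan_records:
            if scanned_place == place and abs(datetime.fromisoformat(scan_time) - t0) <= WINDOW:
                contacts.add(person)
                sites.add(place)
    return contacts, sites

track = [("place:market_3", "2022-10-20T09:15")]
records = [
    ("person:a", "place:market_3", "2022-10-20T10:00"),  # within the window
    ("person:b", "place:market_3", "2022-10-21T10:00"),  # a day later: excluded
]
print(spacetime_cross(track, records))  # ({'person:a'}, {'place:market_3'})
```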
Beneficial effects: associations are formed through the three key elements of person, place, and time; big data labeling technology is used to rapidly identify the different categories of epidemic-related persons and accurately identify close contacts, improving the precision of epidemic prevention and control and enabling the disease control center to manage and control epidemic-related persons rapidly.
In a second aspect, embodiments of the present application further provide an electronic device comprising a memory for storing program instructions and a processor for executing them; when the program instructions are executed by the processor, the electronic device is triggered to execute the method of any of the above embodiments.
In a third aspect, embodiments of the present application further provide a storage medium in which program instructions are stored; when run on an electronic device, the program instructions cause the electronic device to perform the method of any of the above embodiments.
The number of processors in the computer may be one or more, and likewise the number of memories may be one or more; the processor and the memory may be connected by a bus or in other ways. The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the devices in the embodiments of the present application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory, that is, it implements the method in any of the above method embodiments. The memory may include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and necessary data. Further, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk), among others.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, and c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An information processing method based on big data, characterized in that the method comprises:
acquiring target format data required by a target application program based on big data provided by a source terminal;
extracting the relation among entities, places and time from the big data, establishing corresponding triples, and constructing a knowledge graph, wherein the entities comprise at least one of people, objects or animals;
responding, by the target application, to a user access request from a front end based on the target format data;
and when a first entity suspected of carrying pathogenic microorganisms is found, searching, based on the knowledge graph, the action track corresponding to the first entity, and determining a target entity in direct or indirect contact with the first entity.
2. The method of claim 1, wherein obtaining the target format data required by the target application based on big data provided by a source terminal comprises:
integrating big data from at least one source end;
performing ETL processing on the big data;
performing standardization processing and/or quality management on the ETL-processed big data to obtain standardized data whose quality meets a preset rule; the standardized data includes the target format data.
3. The method of claim 2, wherein collecting big data from at least one source end comprises:
performing data acquisition with at least one of a Flume, Kafka, or Sqoop cluster; Flume distributes real-time data and stores it into a specified database, Kafka feeds real-time data into a computing engine for calculation, and Sqoop collects non-real-time data and/or collects unstructured data into an HBase library for storage; real-time data is data whose update frequency is greater than or equal to a frequency threshold, and non-real-time data is data whose update frequency is below that threshold.
4. The method of claim 3, wherein performing ETL processing on the big data comprises:
extracting, converting, and loading the big data based on ETL technology.
5. The method of claim 3, wherein performing ETL processing on the big data comprises:
performing A-D-M-S processing on the big data; the A-D-M-S processing comprises:
system-level analysis, table-level analysis, field-level analysis, LDM operation, PDM operation, SDM operation, and ETL JOB operation.
6. The method of claim 5, wherein:
the system-level analysis is used for system research, covering the system information, functions, and processes that need to be warehoused in the project;
the table-level analysis determines which tables of the source system enter the integration layer and which enter the near-source layer;
the field-level analysis determines the fields of the near-source-layer tables, their types, unique indexes, primary keys, and whether they may be null;
the LDM is derived from ERwin and records the Chinese name information of the model's table fields;
the PDM is the data dictionary of the integration layer and determines the English name information of the model layer's table fields;
the SDM is a target-to-source template that determines the warehousing form of the source table fields;
the ETL JOB determines the loading algorithm of the integration-layer tasks.
7. The method of claim 2, wherein performing standardization processing and/or quality management on the ETL-processed big data comprises:
executing the corresponding standardization processing based on a data standard management framework, the framework comprising data standard definition, data standard mapping, data standard execution, and the data standard management process;
and/or
executing the corresponding quality management based on a data quality management framework, the framework comprising determining data quality check rules, discovering data quality problems, analyzing data quality problems, resolving data quality problems, and monitoring the improvement process.
8. The method of claim 1, further comprising a scene-note linkage mechanism:
generating, through automatic label learning, strong relations based on attribute labels and weak relations based on feature labels for city entities;
and establishing atomic-level, large-scale weak relations covering all subjects in the domain, performing strong-relation convergence based on space-time mapping for any event, and realizing automatic extraction and adaptive growth learning of the multi-dimensional complex relations among urban people, enterprises, and things.
9. An electronic device, comprising a memory for storing program instructions and a processor for executing the program instructions, wherein the program instructions, when executed by the processor, trigger the electronic device to perform the method of any of claims 1-8.
10. A storage medium having stored therein program instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-8.
CN202211298771.4A (priority date 2022-10-24, filing date 2022-10-24): Information processing method and device based on big data and storage medium. Status: Pending. Published as CN115357656A.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211298771.4A | 2022-10-24 | 2022-10-24 | Information processing method and device based on big data and storage medium


Publications (1)

Publication Number | Publication Date
CN115357656A | 2022-11-18

Family

ID=84008515

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211298771.4A | Information processing method and device based on big data and storage medium (Pending) | 2022-10-24 | 2022-10-24

Country Status (1)

Country | Publication
CN | CN115357656A


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068745A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Data processing method and device
CN113836235A (en) * 2021-09-29 2021-12-24 平安医疗健康管理股份有限公司 Data processing method based on data center and related equipment thereof
CN113934729A (en) * 2021-10-20 2022-01-14 平安国际智慧城市科技股份有限公司 Data management method based on knowledge graph, related equipment and medium
CN114550947A (en) * 2022-01-18 2022-05-27 西北工业大学 Infectious disease prevention and control processing method and system
CN114817368A (en) * 2022-04-29 2022-07-29 深圳市东晟数据有限公司 Epidemic situation flow adjustment screening method based on big data and graph calculation
CN115171912A (en) * 2022-07-05 2022-10-11 山东浪潮智慧医疗科技有限公司 Epidemic propagation chain analysis method and device and computer medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FENG Xin et al., "Research on Building a Data Platform for Public Health Emergencies Based on Knowledge Entities", Knowledge Management Forum (知识管理论坛) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117114843A (en) * 2023-10-25 2023-11-24 浙江农商数字科技有限责任公司 Bank data quality control method
CN117114843B (en) * 2023-10-25 2024-02-23 浙江农商数字科技有限责任公司 Bank data quality control method


Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 20221118)