
CN112214453A - Large-scale industrial data compression storage method, system and medium - Google Patents

Publication number
CN112214453A
Authority
CN
China
Prior art keywords
data
format
avro
compression
scale industrial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010961819.XA
Other languages
Chinese (zh)
Other versions
CN112214453B (en)
Inventor
高响
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Weiyi Intelligent Manufacturing Technology Co ltd, Changzhou Weiyizhi Technology Co Ltd
Priority to CN202010961819.XA
Publication of CN112214453A
Application granted
Publication of CN112214453B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a large-scale industrial data compression storage method, system and medium, the method comprising the following steps: step 1: configuring different data acquisition systems according to the types of the data sources, and extracting the data they collect through a configuration interface; step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in; step 3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data. The invention allows a conversion chain and a compression and storage format to be defined for any type of data, and greatly improves the data processing speed and the data compression ratio of the computing platform.

Description

Large-scale industrial data compression storage method, system and medium
Technical Field
The invention relates to the technical field of data compression and storage, in particular to a large-scale industrial data compression and storage method, system and medium.
Background
With the rapid development of new infrastructure, more and more traditional industrial enterprises are beginning to increase productivity by means of internet technology, and data is the most critical element. In traditional internet big-data processing, data volumes keep growing, and many enterprises keep two backup copies of their data, which wastes disk space.
Patent document CN108304472A (application No. 201711455790.2) discloses a data compression storage method and a data compression storage apparatus, the data compression method including the steps of: a segmentation step, in which original data is segmented into a plurality of fields; and a compression step, in which different compression strategies are adopted for different fields based on their data contents, and the compressed data is stored. Because that method and device choose the compression method according to the data content, data compression efficiency can be effectively improved, and the data compression rate is significantly higher than that of general-purpose compression tools such as GZIP and Snappy.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a large-scale industrial data compression storage method, a large-scale industrial data compression storage system and a large-scale industrial data compression storage medium.
The large-scale industrial data compression and storage method provided by the invention comprises the following steps:
step 1: configuring different data acquisition systems according to the types of the data sources, and extracting the data collected by the data acquisition systems through a configuration interface;
step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
step 3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
Preferably, the step 1 comprises:
step 1.1: classifying the data source according to data format and storage medium, wherein the data format comprises structured data and unstructured data, and the storage medium comprises Kafka and RabbitMQ;
step 1.2: selecting the corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector.
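Steps 1.1 and 1.2 amount to a classification followed by a lookup. The following is only a minimal sketch of that idea in Python; the selector names and the helper functions `classify_source` and `select_source` are hypothetical and do not come from the patent, which states only that each storage medium maps to its own data source selector.

```python
# Hypothetical selector names: the patent only says that Kafka and RabbitMQ
# each correspond to their own data source selector.
SOURCE_SELECTORS = {
    "kafka": "kafka-source-selector",
    "rabbitmq": "rabbitmq-source-selector",
}

DATA_FORMATS = ("structured", "unstructured")


def classify_source(data_format: str, storage_medium: str) -> dict:
    """Step 1.1: classify a data source by data format and storage medium."""
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unknown data format: {data_format}")
    medium = storage_medium.lower()
    if medium not in SOURCE_SELECTORS:
        raise ValueError(f"unsupported storage medium: {storage_medium}")
    return {"format": data_format, "medium": medium}


def select_source(classified: dict) -> str:
    """Step 1.2: pick the selector that corresponds to the storage medium."""
    return SOURCE_SELECTORS[classified["medium"]]


kafka_selector = select_source(classify_source("structured", "Kafka"))
rabbit_selector = select_source(classify_source("unstructured", "Rabbitmq"))
```

Keeping the mapping in a table rather than in branching code mirrors the configuration-driven approach the method describes: supporting another medium would mean adding one entry.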
Preferably, converting the data into the Avro format in step 2 comprises: mapping the industrial data to an Avro schema, and generating temporary Avro-format data from the industrial data.
Preferably, mapping the industrial data to the Avro schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields that need to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept the data, preloading the Avro schema during data conversion, and injecting the schema into the event header.
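As an illustration of step 2.1, a conversion chain can be pictured as an ordered list of small steps driven by the configured input and output fields. This is a sketch under that assumption, not the Morphline configuration itself; the field names (`dev`, `ts`, `device_id`, `date_time`, `value`) and the helpers `rename`, `project` and `run_chain` are invented for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def rename(mapping: Dict[str, str]) -> Step:
    """Build a step that renames input fields to their output names."""
    def step(record: Record) -> Record:
        return {mapping.get(key, key): value for key, value in record.items()}
    return step


def project(output_fields: List[str]) -> Step:
    """Build a step that keeps only the configured output fields."""
    def step(record: Record) -> Record:
        return {name: record.get(name) for name in output_fields}
    return step


def run_chain(chain: List[Step], record: Record) -> Record:
    """Apply every step of the conversion chain in order."""
    for step in chain:
        record = step(record)
    return record


# Hypothetical configuration: input fields "dev" and "ts", three output fields.
chain = [
    rename({"dev": "device_id", "ts": "date_time"}),
    project(["device_id", "date_time", "value"]),
]
raw = {"dev": "press-01", "ts": "2020-07-14T08:00:00", "value": 3.5, "debug": "x"}
converted = run_chain(chain, raw)
```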
Preferably, generating the temporary Avro-format data from the industrial data includes the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export component, which converts each event into a record and passes it to readLine; readLine extracts the log lines from the data pipeline, applies regular-expression matching, emits one record per line of the input stream, and puts each line, as a string, into the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro schema, and converting the data into temporary Avro-format data according to that schema.
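The line-splitting and matching behaviour of step 2.3 can be sketched in plain Python as follows. This only imitates the described behaviour (one record per input line, the line stored in a `message` field, then regular-expression matching); the log layout, the pattern and the function names are assumptions made for the example.

```python
import io
import re
from typing import Dict, Iterable, Iterator, Optional

# Assumed log layout: "<date_time> <device_id> <LEVEL> <free text>".
LOG_PATTERN = re.compile(
    r"^(?P<date_time>\S+) (?P<device_id>\S+) (?P<level>[A-Z]+) (?P<text>.*)$"
)


def read_lines(stream: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Emit one record per non-empty line, with the line in the 'message' field."""
    for line in stream:
        line = line.rstrip("\r\n")
        if line:
            yield {"message": line}


def extract_fields(record: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match the message against the pattern; return None when it does not match."""
    match = LOG_PATTERN.match(record["message"])
    return match.groupdict() if match else None


events = io.StringIO(
    "2020-07-14T08:00:00 press-01 INFO cycle done\n"
    "not a log line\n"
)
parsed = [extract_fields(record) for record in read_lines(events)]
```

Lines that do not match are returned as `None` here so that a later step can decide whether to drop or quarantine them; the patent does not say how unmatched lines are handled.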
Preferably, the step 3 comprises:
step 3.1: generating a JSON file for the data set partitions, wherein the partitions are used to store the data and to process it based on time queries and the enterprise ID;
step 3.2: defining a data set by a uniform resource identifier and a schema, the data management platform creating or specifying the data set through a create command that includes the uniform resource locator of the data set, the specified schema and the partition-field JSON.
Preferably, the step of generating the data set partition strategy JSON file includes:
step 3.1.1: specifying the partition fields and types;
step 3.1.2: specifying the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition strategy JSON.
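Steps 3.1.1 to 3.1.3 can be sketched as building a list of partition definitions and writing it to a chosen path. The JSON layout below (`source`, `type`, `name` keys, with an `identity` type for the enterprise ID) is an assumption loosely modeled on partition strategy files used by dataset tooling; the patent does not give the file format, and the field names `date_time` and `enterprise_id` are illustrative.

```python
import json
import os
import tempfile


def build_partition_strategy(fields):
    """Step 3.1.1: turn (source_field, partition_type, name) triples into a strategy."""
    return [
        {"source": source, "type": ptype, "name": name}
        for source, ptype, name in fields
    ]


def write_partition_strategy(strategy, path):
    """Steps 3.1.2 and 3.1.3: write the strategy as JSON to the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(strategy, handle, indent=2)
    return path


strategy = build_partition_strategy([
    ("date_time", "year", "year"),
    ("date_time", "month", "month"),
    ("date_time", "day", "day"),
    ("enterprise_id", "identity", "enterprise_id"),  # assumed field name
])
strategy_path = write_partition_strategy(
    strategy, os.path.join(tempfile.mkdtemp(), "partition.json")
)
```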
Preferably, the data set is identified by a uniform resource identifier;
and the address and the storage mode of the stored data are obtained through the uniform resource identifier.
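To make the role of the identifier concrete, the sketch below splits a URI into a storage mode (its scheme) and an address (its path) using Python's standard `urllib.parse`. The example URI follows the `HDFS:/user/2020/7/14/` example given in the detailed description; the function name and the returned dictionary keys are hypothetical.

```python
from urllib.parse import urlsplit


def describe_dataset(uri: str) -> dict:
    """Derive where and how data is stored from the data set URI."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"URI has no scheme: {uri}")
    return {
        "storage": parts.scheme.lower(),  # how: the storage system, e.g. hdfs
        "location": parts.path,           # where: the directory of the data set
    }


info = describe_dataset("HDFS:/user/2020/7/14/")
```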
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the types of the data sources, and extracting the data collected by the data acquisition systems through a configuration interface;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the method uses Flume as a data pipeline to connect each data source of the industrial data platform, and uses Morphlines to reduce the time and effort required to build and change ETL stream-processing applications; only the business logic needs attention, configuration is done through configuration files, and data can be extracted, transformed and loaded into a distributed storage system such as HDFS (Hadoop Distributed File System) without writing complex code;
2. the problem that JSON data cannot be directly converted into the Parquet format when stored in HDFS is solved by using a Dataset; when the data set is created, the Dataset specifies a columnar storage format and the snappy compression format; the compression ratio achieved by snappy reaches 30%-40%, and its compression and decompression rates reach 180 MB/s and 430 MB/s respectively, which greatly improves the efficiency of persisting data and the utilization of the disk;
3. data from message systems such as Kafka on the industrial data platform is connected through Flume, processed and persisted through Flume, stored in the Parquet format and compressed with snappy; only one copy of the data is stored, and Flume guarantees data consistency: when data is persisted, Flume can roll back through its own transaction mechanism; because no code needs to be written, developers' working time is greatly reduced, work efficiency is improved, and resource utilization is increased.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Example:
Referring to FIG. 1, the large-scale industrial data compression and storage method provided by the invention comprises:
industrial data extraction step: configuring different Flume Sources for different data sources, and making the Flume Source configuration generic and configurable through a configuration interface;
step of temporarily preloading data as Avro: defining a conversion chain, configuring the schema of the Avro data format, and configuring Morphline to temporarily convert data of different formats into Avro-format data;
Dataset creation step: creating, through the Dataset, a data set in HDFS with Parquet as the storage format, compressing data by a GPL protocol, and declaring that the data finally persisted is in the Parquet format with snappy compression;
combined operation step: connecting and running the above steps through Flume configuration, so that a large amount of data in different formats is compressed, preprocessed and finally stored in the distributed storage system in the Parquet format.
The step of configuring the Flume Source generically through the interface comprises the following steps:
step A1: the data stored by the industrial data processing platform is classified according to data format and storage medium; the data formats include structured data and unstructured data, and the storage media include Kafka and RabbitMQ.
Step A2: the corresponding Flume Source is selected through a Flume configuration management tool; Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector.
The step of temporarily preloading data as Avro comprises: mapping the industrial data to the schema of the Avro data, and generating temporary Avro-format data from the industrial data.
The step of mapping the industrial data to the schema of the Avro data comprises the following steps:
step B1: a conversion chain is defined by configuring the input fields and the fields that need to be output; the conversion chain can consume any type of data from any type of data source.
Step B2: the Flume Interceptor is configured to intercept the data before it flows to the next step, preload the Avro schema when the data is extracted and transformed, and inject the schema into the event header so that the AvroEventSerializer can pick it up.
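The header injection in step B2 can be pictured with a small stand-alone model of an event and an interceptor. This is a plain-Python sketch, not Flume code: the `Event` class, the `SchemaInterceptor` class, the header key `avro.schema.literal` and the `DeviceLog` schema are all assumptions made for illustration. Only the general idea, loading the schema once and attaching it to each event's headers for a downstream serializer, comes from the text.

```python
import json
from dataclasses import dataclass, field
from typing import Dict

SCHEMA_HEADER = "avro.schema.literal"  # hypothetical header key

# Illustrative Avro record schema; the field names are assumptions.
AVRO_SCHEMA = {
    "type": "record",
    "name": "DeviceLog",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "date_time", "type": "string"},
        {"name": "value", "type": "double"},
    ],
}


@dataclass
class Event:
    """Minimal stand-in for a pipeline event: a body plus string headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


class SchemaInterceptor:
    """Preloads the schema once, then injects it into every event header."""

    def __init__(self, schema: dict) -> None:
        self._schema_json = json.dumps(schema, sort_keys=True)

    def intercept(self, event: Event) -> Event:
        event.headers[SCHEMA_HEADER] = self._schema_json
        return event


def schema_from_event(event: Event) -> dict:
    """What a downstream serializer would do: read the schema from the header."""
    return json.loads(event.headers[SCHEMA_HEADER])


interceptor = SchemaInterceptor(AVRO_SCHEMA)
event = interceptor.intercept(Event(body=b"press-01,2020-07-14T08:00:00,3.5"))
recovered = schema_from_event(event)
```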
The step of generating temporary Avro-format data from the industrial data comprises:
step C1: Flume receives the industrial equipment log events and sends them to the Flume MorphlineSink, which converts each Flume event into a record and passes it through the pipeline to the readLine command. The readLine command extracts the log lines from the data pipeline, applies regular-expression pattern matching, and emits one record per line of the input stream; each line is put, as a string, into the message output field.
Step C2: a Flume interceptor is configured to intercept the data after step B2; the structured or unstructured data is matched against the Avro schema to generate temporary Avro-format data, which flows into a FileChannel for further processing.
The Dataset creation step includes:
step D1: generating the data set partition JSON file. A data set is a collection of records, similar to a relational database table. Records are similar to table rows, but their columns may contain not only strings or numbers but also nested data structures such as lists, maps and other records. The create command mainly uses the partition definition of the data set: partition strategies such as date_time:year, date_time:month and date_time:day may be defined so that date_time is partitioned by year, month and day. Partitions define the logical layout of the stored data. Time-based queries are most often used to process data; when data from July 14, 2020 is used, Hadoop only needs to access the data stored in the partition data/year-2020/month-7/day-14. By using partitions that correspond to the most common queries, applications run faster, which increases data computation efficiency and resource utilization.
Step D2: creating the data set. At least a URI and a schema are required to define a data set. The data management platform creates or specifies a data set through the create command, which mainly includes the URL of the data set, the specified schema and the partition-field JSON; the data storage format is Parquet, the schema is the one defined in step B2, and the partition JSON is the one generated in step D1. The data set is identified by its URI, which tells how and where the data is stored. For a Dataset created with the URI HDFS:/user/2020/7/14/, the data is finally stored in the /user/2020/7/14/ directory of HDFS. The created data set finally generates a metadata folder in HDFS that contains a schema file and a descriptor file; the descriptor file records the snappy compression format, the Parquet data format, the data storage path and the partition fields.
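The year/month/day layout described in step D1 can be sketched as a function that maps a record's `date_time` to its partition directory. The directory naming follows the `data/year-2020/month-7/day-14` example in the text; the ISO timestamp format, the record fields and the helper names are assumptions for the example.

```python
from datetime import datetime
from typing import Dict, List


def partition_path(root: str, date_time: str) -> str:
    """Map an ISO-8601 date_time value to its year/month/day partition."""
    ts = datetime.fromisoformat(date_time)
    return f"{root}/year-{ts.year}/month-{ts.month}/day-{ts.day}"


def group_by_partition(root: str, records: List[dict]) -> Dict[str, List[dict]]:
    """Place each record in the partition derived from its date_time field."""
    partitions: Dict[str, List[dict]] = {}
    for record in records:
        path = partition_path(root, record["date_time"])
        partitions.setdefault(path, []).append(record)
    return partitions


records = [
    {"device_id": "press-01", "date_time": "2020-07-14T08:00:00"},
    {"device_id": "press-02", "date_time": "2020-07-14T09:30:00"},
    {"device_id": "press-01", "date_time": "2020-07-15T08:00:00"},
]
partitions = group_by_partition("data", records)
july_14 = partitions[partition_path("data", "2020-07-14T00:00:00")]
```

A query for July 14, 2020 then touches only one directory instead of scanning every record, which is the efficiency gain the paragraph above attributes to partitioning.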
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the types of the data sources, and extracting the data collected by the data acquisition systems through a configuration interface;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
The invention realizes the following functions:
1) the data conversion process is defined by writing configuration files, which solves the problem of having to write a large amount of code and to deploy and maintain that code;
2) the data set for storing the data is created in advance and the data is temporarily converted into the Avro format, which removes the need to borrow Spark to process the data and saves computing resources and data processing steps;
3) dataset partition fields are preset and data is partitioned automatically according to the field contents, which solves the problem that data of different enterprises must be stored in isolation and improves the efficiency of subsequent data analysis and computation.
The invention carries out data access, data flow and data storage through a configuration interface. It has extremely high compression and storage efficiency and greatly improves the utilization of storage and computing resources. In mainstream techniques for storing data in the Parquet format, Spark is mostly used for data processing and storage, which requires additional computing resources and data processing components as well as separate code development and deployment for different industrial data, making maintenance and development cumbersome. With data in this format, subsequent analysis and computation on the same data sample is about 5 times faster.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A large-scale industrial data compression storage method is characterized by comprising the following steps:
step 1: configuring different data acquisition systems according to the types of the data sources, and extracting the data collected by the data acquisition systems through a configuration interface;
step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
step 3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
2. The large-scale industrial data compression storage method according to claim 1, wherein the step 1 comprises:
step 1.1: classifying the data source according to data format and storage medium, wherein the data format comprises structured data and unstructured data, and the storage medium comprises Kafka and RabbitMQ;
step 1.2: selecting the corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector.
3. The large-scale industrial data compression storage method according to claim 2, wherein converting the data into the Avro format in step 2 comprises: mapping the industrial data to an Avro schema, and generating temporary Avro-format data from the industrial data.
4. The large-scale industrial data compression storage method according to claim 3, wherein mapping the industrial data to the Avro schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields that need to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept the data, preloading the Avro schema during data conversion, and injecting the schema into the event header.
5. The large-scale industrial data compression storage method according to claim 4, wherein generating the temporary Avro-format data from the industrial data comprises the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export component, which converts each event into a record and passes it to readLine; readLine extracts the log lines from the data pipeline, applies regular-expression matching, emits one record per line of the input stream, and puts each line, as a string, into the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro schema, and converting the data into temporary Avro-format data according to that schema.
6. The large-scale industrial data compression storage method according to claim 1, wherein the step 3 comprises:
step 3.1: generating a JSON file for the data set partitions, wherein the partitions are used to store the data and to process it based on time queries and the enterprise ID;
step 3.2: defining a data set by a uniform resource identifier and a schema, the data management platform creating or specifying the data set through a create command that includes the uniform resource locator of the data set, the specified schema and the partition-field JSON.
7. The large-scale industrial data compression storage method according to claim 6, wherein the step of generating a data set partition strategy JSON file comprises:
step 3.1.1: specifying the partition fields and types;
step 3.1.2: specifying the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition strategy JSON.
8. The large-scale industrial data compression storage method according to claim 6, wherein the data set is identified by a uniform resource identifier;
and the address and the storage mode of the stored data are obtained through the uniform resource identifier.
9. A large-scale industrial data compression storage system, comprising:
module M1: configuring different data acquisition systems according to the types of the data sources, and extracting the data collected by the data acquisition systems through a configuration interface;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data by using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202010961819.XA 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium Active CN112214453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010961819.XA CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010961819.XA CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Publications (2)

Publication Number Publication Date
CN112214453A true CN112214453A (en) 2021-01-12
CN112214453B CN112214453B (en) 2021-10-01

Family

ID=74050285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010961819.XA Active CN112214453B (en) 2020-09-14 2020-09-14 Large-scale industrial data compression storage method, system and medium

Country Status (1)

Country Link
CN (1) CN112214453B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507013A (en) * 2021-02-07 2021-03-16 北京工业大数据创新中心有限公司 Industrial equipment data storage method and device
CN115017218A (en) * 2022-06-17 2022-09-06 中国电信股份有限公司 Processing method and device of distributed call chain, storage medium and electronic equipment
CN116719866A (en) * 2023-05-09 2023-09-08 上海银满仓数字科技有限公司 Multi-format data self-adaptive distribution method and system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246671A (en) * 2012-02-09 2013-08-14 中兴通讯股份有限公司 Processing method and device for abstract syntax notation files
CN106294374A (en) * 2015-05-15 2017-01-04 北京国双科技有限公司 The method of small documents merging and data query system
CN107784039A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 A kind of data load method, apparatus and system
US10289739B1 (en) * 2014-05-07 2019-05-14 ThinkAnalytics System to recommend content based on trending social media topics
US20190220532A1 (en) * 2018-01-17 2019-07-18 International Business Machines Corporation Data processing with nullable schema information
CN110813783A (en) * 2019-10-29 2020-02-21 常州微亿智造科技有限公司 Appearance intelligent detection system based on manipulator
US10592282B2 (en) * 2015-09-16 2020-03-17 Salesforce.Com, Inc. Providing strong ordering in multi-stage streaming processing
CN110914818A (en) * 2017-06-07 2020-03-24 起元技术有限责任公司 Dataflow graph configuration
CN111046022A (en) * 2019-12-04 2020-04-21 山西云时代技术有限公司 Database auditing method based on big data technology
CN111125513A (en) * 2019-11-22 2020-05-08 博智安全科技股份有限公司 Recommendation system based on Spark
CN111324688A (en) * 2020-02-24 2020-06-23 南京莱斯网信技术研究院有限公司 Semi-structured data and unstructured data acquisition system based on events
CN111625616A (en) * 2020-05-11 2020-09-04 苏州盈数智能科技有限公司 Enterprise-level data management system capable of realizing mass storage

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246671A (en) * 2012-02-09 2013-08-14 中兴通讯股份有限公司 Processing method and device for abstract syntax notation files
US10289739B1 (en) * 2014-05-07 2019-05-14 ThinkAnalytics System to recommend content based on trending social media topics
CN106294374A (en) * 2015-05-15 2017-01-04 北京国双科技有限公司 The method of small documents merging and data query system
US10592282B2 (en) * 2015-09-16 2020-03-17 Salesforce.Com, Inc. Providing strong ordering in multi-stage streaming processing
CN107784039A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 A kind of data load method, apparatus and system
CN110914818A (en) * 2017-06-07 2020-03-24 起元技术有限责任公司 Dataflow graph configuration
US20190220532A1 (en) * 2018-01-17 2019-07-18 International Business Machines Corporation Data processing with nullable schema information
CN110813783A (en) * 2019-10-29 2020-02-21 常州微亿智造科技有限公司 Appearance intelligent detection system based on manipulator
CN111125513A (en) * 2019-11-22 2020-05-08 博智安全科技股份有限公司 Recommendation system based on Spark
CN111046022A (en) * 2019-12-04 2020-04-21 山西云时代技术有限公司 Database auditing method based on big data technology
CN111324688A (en) * 2020-02-24 2020-06-23 南京莱斯网信技术研究院有限公司 Semi-structured data and unstructured data acquisition system based on events
CN111625616A (en) * 2020-05-11 2020-09-04 苏州盈数智能科技有限公司 Enterprise-level data management system capable of realizing mass storage

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
GEOFFREY C. FOX et al.: "HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack", 《2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING》 *
STUBHUB: "Introduction to Morphlines", 《HTTPS://MY.OSCHINA.NET/STUBHUB/BLOG/325044》 *
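TOMSCUT: "Building a big data ETL pipeline -- streaming conversion of JSON data -- Avro to Parquet (Part 2)", 《HTTPS://BLOG.CSDN.NET/QQ_29829081/ARTICLE/DETAILS/80518671》 *
Anonymous: "Introduction to Datasets", 《HTTP://KITESDK.ORG/DOCS/1.0.0/INTRODUCTION-TO-DATASETS.HTML》 *
张志军: "《Research on the Application of Big Data Technology in Colleges and Universities》", 30 September 2017, Beijing University of Posts and Telecommunications Press *
徐宇辉: "Real-time ETL of data with Flume+Morphlines", 《HTTPS://MP.WEIXIN.QQ.COM/S/XCSDKQO1XMQWU91LCH29UW》 *
杨敏 et al.: "Design and implementation of a traffic big data security analysis platform", 《Journal on Communications》 *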

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507013A (en) * 2021-02-07 2021-03-16 北京工业大数据创新中心有限公司 Industrial equipment data storage method and device
CN112507013B (en) * 2021-02-07 2021-07-02 北京工业大数据创新中心有限公司 Industrial equipment data storage method and device
CN115017218A (en) * 2022-06-17 2022-09-06 中国电信股份有限公司 Processing method and device of distributed call chain, storage medium and electronic equipment
CN115017218B (en) * 2022-06-17 2024-01-30 中国电信股份有限公司 Processing method and device of distributed call chain, storage medium and electronic equipment
CN116719866A (en) * 2023-05-09 2023-09-08 上海银满仓数字科技有限公司 Multi-format data self-adaptive distribution method and system
CN116719866B (en) * 2023-05-09 2024-02-13 上海银满仓数字科技有限公司 Multi-format data self-adaptive distribution method and system

Also Published As

Publication number Publication date
CN112214453B (en) 2021-10-01

Similar Documents

Publication Publication Date Title
US10983967B2 (en) Creation of a cumulative schema based on an inferred schema and statistics
CN107622103B (en) Managing data queries
CN104298771B Massive web log data query and analysis method
CN112214453B (en) Large-scale industrial data compression storage method, system and medium
Tao et al. Minimal mapreduce algorithms
CN111324610A (en) Data synchronization method and device
CN103593422A (en) Virtual access management method of heterogeneous database
CN105144080A (en) System for metadata management
Yang et al. F1 Lightning: HTAP as a Service
CN102495906A (en) Incremental data migration method capable of realizing breakpoint transmission
CN106126601A Distributed preprocessing method and system for social security big data
US20130290300A1 (en) In-database parallel analytics
CN104572895A (en) MPP (Massively Parallel Processor) database and Hadoop cluster data intercommunication method, tool and realization method
Samwel et al. F1 query: Declarative querying at scale
WO2014163624A1 (en) Query integration across databases and file systems
CN106528898A (en) Method and device for converting data of non-relational database into relational database
Bidoit et al. Processing XML queries and updates on map/reduce clusters
Sathya et al. Application of Hadoop MapReduce technique to Virtual Database system design
Sethy et al. Big data analysis using Hadoop: a survey
CN106708972B (en) Method for optimizing ABAP program by utilizing SLT component based on HANA database
CN109165262A (en) Fragmentation clustering system and fragmentation method of relational large table
CN109829003A (en) Database backup method and device
CN102360382B (en) High-speed object-based parallel storage system directory replication method
Sinthong et al. AFrame: Extending DataFrames for large-scale modern data analysis (Extended Version)
Bobunov et al. Development of the concept and architecture of an automated system for updating physical knowledge for information support of search design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant