CN112214453A - Large-scale industrial data compression storage method, system and medium - Google Patents
- Publication number
- CN112214453A
- Authority
- CN
- China
- Prior art keywords
- data
- format
- avro
- compression
- scale industrial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/116—Details of conversion of file system types or formats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a large-scale industrial data compression and storage method, system and medium. The method comprises: step 1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations; step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in; step 3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data. The invention can define the conversion chain and the compression and storage formats for any type of data, and greatly improves both the data processing speed of the computing platform and the data compression ratio.
Description
Technical Field
The invention relates to the technical field of data compression and storage, in particular to a large-scale industrial data compression and storage method, system and medium.
Background
With the rapid development of new infrastructure, more and more traditional industrial enterprises are beginning to raise productivity by means of internet technology, and data is the most critical element. In traditional internet settings, big-data processing handles ever-growing volumes of data, and many enterprises keep two backup copies of their data, which wastes disk space.
Patent document CN108304472A (application No. 201711455790.2) discloses a data compression storage method and a data compression storage apparatus. The method includes a segmentation step, in which the original data is segmented into a plurality of fields, and a compression step, in which different fields are compressed with different compression strategies chosen according to their content and the compressed data is stored. Because the method and apparatus can apply different compression methods to different data content, they improve compression efficiency and achieve a markedly better compression rate than general-purpose compression tools such as GZIP and Snappy.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a large-scale industrial data compression storage method, a large-scale industrial data compression storage system and a large-scale industrial data compression storage medium.
The large-scale industrial data compression and storage method provided by the invention comprises the following steps:
step 1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations;
step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
step 3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
Preferably, the step 1 comprises:
step 1.1: classifying the data sources according to data format and storage medium, wherein the data formats comprise structured data and unstructured data, and the storage media comprise Kafka and RabbitMQ;
step 1.2: selecting the corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector and RabbitMQ corresponds to a RabbitMQ data source selector.
Preferably, converting the data into the Avro format in step 2 comprises: mapping the industrial data to an Avro-format schema, and generating temporary Avro-format data.
Preferably, mapping the industrial data to the Avro-format schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept the data, preloading the Avro-format schema during data conversion, and injecting the schema into the event header.
Preferably, generating the temporary Avro-format data from the industrial data comprises the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export component, which converts each event into a record and passes it to readLine; readLine extracts the log lines from the data pipeline, matches them with a regular expression, emits one record per line of the input stream, and places each line as a string in the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro-format schema, and converting it into temporary Avro-format data according to that schema.
Preferably, the step 3 comprises:
step 3.1: generating a JSON file describing the data set partitions, wherein the partitions are used to store the data and to process it with queries based on time and enterprise ID;
step 3.2: defining a data set by its uniform resource identifier and schema, the data management platform creating or specifying the data set through a create command that takes the uniform resource identifier of the data set, the specified schema and the partition field JSON.
Preferably, the step of generating the data set partition strategy JSON file includes:
step 3.1.1: specifying the partition fields and types;
step 3.1.2: specifying the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition strategy JSON.
Preferably, the data set is identified by a uniform resource identifier;
the address and the storage mode of the stored data are obtained through the uniform resource identifier.
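Steps 3.1.1 to 3.1.3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `(source, type, name)` tuple layout and the JSON keys `source`, `type` and `name` are assumptions chosen for illustration, as is the `enterprise_id` field name.

```python
import json
from pathlib import Path


def build_partition_strategy(fields):
    """Step 3.1.1: turn (source_field, partition_type, partition_name)
    tuples into a list of partition definitions (hypothetical layout)."""
    return [
        {"source": source, "type": ptype, "name": name}
        for source, ptype, name in fields
    ]


def write_partition_strategy(fields, path):
    """Steps 3.1.2 and 3.1.3: write the strategy as JSON to the given path."""
    strategy = build_partition_strategy(fields)
    Path(path).write_text(json.dumps(strategy, indent=2), encoding="utf-8")
    return strategy


if __name__ == "__main__":
    # Partition by enterprise, then by year, month and day of date_time.
    fields = [
        ("enterprise_id", "identity", "enterprise_id"),
        ("date_time", "year", "year"),
        ("date_time", "month", "month"),
        ("date_time", "day", "day"),
    ]
    print(json.dumps(build_partition_strategy(fields), indent=2))
```

Keeping the strategy in a standalone JSON file means the same partition definition can be reused whenever a data set is created, without changing code.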
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the method uses Flume as the data pipeline connecting every data source of the industrial data platform, and uses Morphlines to reduce the time and effort needed to build and change ETL stream-processing applications; developers only need to attend to business logic and configure the flow through configuration files, so data can be extracted, transformed and loaded into a distributed storage system such as HDFS (Hadoop Distributed File System) without writing complex code;
2. the DataSet solves the problem that JSON data cannot be converted directly into the Parquet format when stored in HDFS; when the data set is created, the DataSet specifies a columnar storage format and the Snappy compression format; the Snappy compression ratio reaches 30%-40%, and the compression and decompression rates reach 180 MB/s and 430 MB/s respectively, which greatly improves data landing efficiency and disk utilization;
3. the method uses Flume to connect to message data such as Kafka on the industrial data platform, processes and lands the data through Flume, stores it in the Parquet format with snappy compression, and keeps only one copy of the data; Flume guarantees data consistency, since it can roll back through its own transaction mechanism when landing data; because no code has to be written, developer working time is greatly reduced, working efficiency is improved, and resource utilization is increased.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.
Example:
Referring to FIG. 1, the large-scale industrial data compression and storage method provided by the invention comprises:
Industrial data extraction step: configuring a different Flume Source for each data source, and exposing the Flume Source configuration through interface-based operations so that the setup is configurable and general-purpose;
Step of temporarily preloading data as Avro: defining a conversion chain, configuring the Schema of the Avro data format, and configuring Morphline to temporarily convert data of different formats into Avro-format data;
Dataset creation step: creating, through the Dataset, a data set in HDFS with Parquet as the storage format, compressing the data under the GPL protocol, and declaring that the final landed data uses the Parquet format and the snappy compression format;
Combined operation step: the above steps are chained and run through Flume configuration, so that large volumes of data in different formats are preprocessed, compressed and finally stored in the distributed storage system in the Parquet format.
The step of configuring the Flume Source through a general-purpose interface comprises the following steps:
Step A1: the data stored by the industrial data processing platform is classified according to data format and storage medium; the data formats include structured data and unstructured data, and the storage media include Kafka and RabbitMQ.
Step A2: the corresponding Flume Source is selected through a Flume configuration management tool; Kafka corresponds to a Kafka data source selector, and RabbitMQ corresponds to a RabbitMQ data source selector.
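The selection in steps A1 and A2 can be pictured as a lookup from storage medium to Flume source type, rendered as agent configuration lines. The Python sketch below is illustrative only: the agent, source and channel names are placeholders, and the RabbitMQ source class name is hypothetical.

```python
# Mapping from storage medium to the Flume source class used to read it.
SOURCE_TYPES = {
    "kafka": "org.apache.flume.source.kafka.KafkaSource",
    # Hypothetical class name standing in for a RabbitMQ source plug-in.
    "rabbitmq": "com.example.flume.RabbitMQSource",
}


def flume_source_config(agent, source, medium, channel, options):
    """Render Flume properties lines for one source, chosen by storage medium."""
    key = medium.lower()
    if key not in SOURCE_TYPES:
        raise ValueError(f"no data source selector for medium: {medium}")
    prefix = f"{agent}.sources.{source}"
    lines = [
        f"{agent}.sources = {source}",
        f"{prefix}.type = {SOURCE_TYPES[key]}",
        f"{prefix}.channels = {channel}",
    ]
    # Source-specific options (for example topic names) are passed through as-is.
    for name, value in sorted(options.items()):
        lines.append(f"{prefix}.{name} = {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(flume_source_config(
        "a1", "r1", "Kafka", "c1",
        {"kafka.bootstrap.servers": "localhost:9092", "kafka.topics": "device-logs"},
    ))
```

Because the selector is plain data, supporting a further storage medium only requires a new entry in the mapping.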
The step of temporarily preloading data into Avro comprises: mapping the industrial data to the Schema of the Avro data, and generating temporary Avro-format data.
The step of mapping the industrial data to the Avro Schema comprises the following steps:
Step B1: a conversion chain is defined by configuring the input fields and the fields to be output; the conversion chain can consume any type of data from any type of data source.
Step B2: a Flume Interceptor is configured to intercept the data before it flows to the next step, preload the Avro Schema when the extracted data is converted, and inject the schema into the event header so that the AvroEventSerializer can pick it up.
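A minimal Python sketch of steps B1 and B2 follows, modelling an event as a dictionary with `headers` and `body`. The header key, the record name and the field names are assumptions made for illustration; the real interceptor runs inside Flume rather than in Python.

```python
import json

# Assumed header key under which the serializer looks for the schema.
SCHEMA_HEADER = "flume.avro.schema.literal"


def build_avro_schema(record_name, output_fields):
    """Step B1: describe the configured output fields as an Avro record schema.

    output_fields maps each field name to an Avro primitive type name.
    """
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": n, "type": t} for n, t in output_fields.items()],
    }


def intercept(event, schema):
    """Step B2: return a copy of the event with the schema injected into its header."""
    headers = dict(event.get("headers", {}))
    headers[SCHEMA_HEADER] = json.dumps(schema)
    return {"headers": headers, "body": event["body"]}


if __name__ == "__main__":
    schema = build_avro_schema(
        "DeviceLog", {"date_time": "string", "device_id": "string", "text": "string"}
    )
    event = {"headers": {}, "body": b"2020-07-14 08:00:01 pump-7 INFO started"}
    print(intercept(event, schema)["headers"][SCHEMA_HEADER])
```

Carrying the schema in the header keeps the event body untouched, so later stages can decide how to serialize it.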
The step of generating temporary Avro-format data from the industrial data comprises:
Step C1: Flume receives the industrial device log events and sends them to the Flume MorphlineSink, which converts each Flume event into a record and passes it through the pipeline to the readLine command. The readLine command extracts the log lines from the pipeline, applies regular-expression pattern matching, and emits one record per line of the input stream, placing each line as a string in the message output field.
Step C2: a Flume interceptor is configured to intercept the data produced after step B2, convert structured or unstructured data into temporary Avro-format data by matching it against the Avro Schema, and pass the temporary Avro-format data into a FileChannel for further processing.
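The line-splitting and pattern-matching behaviour described in step C1 can be approximated in a few lines of Python. The log layout, the regular expression and the field names below are invented for illustration; an actual deployment would express the same logic in Morphline configuration.

```python
import re

# Hypothetical log layout: "<date> <time> <device> <LEVEL> <free text>".
LOG_PATTERN = re.compile(
    r"(?P<date_time>\S+ \S+) (?P<device_id>\S+) (?P<level>[A-Z]+) (?P<text>.*)"
)


def read_lines(body, charset="utf-8"):
    """Emit one record per non-empty line, with the line stored under 'message'."""
    for line in body.decode(charset).splitlines():
        if line.strip():
            yield {"message": line}


def extract_fields(record, pattern=LOG_PATTERN):
    """Match the message against the pattern; return None when it does not match."""
    match = pattern.match(record["message"])
    if match is None:
        return None
    return {**record, **match.groupdict()}


if __name__ == "__main__":
    body = b"2020-07-14 08:00:01 pump-7 INFO started\n2020-07-14 08:00:05 pump-7 WARN pressure high\n"
    for record in read_lines(body):
        print(extract_fields(record))
```

Records that fail to match are returned as `None` here so that the caller can decide whether to drop or route them elsewhere.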
The creating Dataset step includes:
Step D1: generate the dataset partition JSON file. A dataset is a collection of records, similar to a relational database table. Records are similar to table rows, but the columns may contain not only strings or numbers but also nested data structures such as lists, maps and other records. The create command is used mainly to partition the dataset. A partition strategy such as date_time:year, date_time:month, date_time:day can be defined, which partitions the date_time field by year, month and day. Partitions define the logical divisions of the stored data. Time-based queries are the most common way of processing the data. When the data for 7/14/2020 is needed, Hadoop only has to access the data stored in the partition data/year-2020/month-7/day-14. By using partitions that correspond to the most common queries, applications run faster, which increases data computation efficiency and resource utilization.
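The benefit of date-based partitioning can be seen in a small Python sketch that groups records by a year, month and day path and then reads back a single day. The path layout copies the example in the text; the function names and the timestamp format are assumptions.

```python
from datetime import datetime


def partition_path(ts):
    """Build the partition directory for a timestamp, following the text's example."""
    return f"year-{ts.year}/month-{ts.month}/day-{ts.day}"


def group_by_partition(records, field="date_time"):
    """Place each record in the partition derived from its date_time field."""
    partitions = {}
    for record in records:
        ts = datetime.strptime(record[field], "%Y-%m-%d %H:%M:%S")
        partitions.setdefault(partition_path(ts), []).append(record)
    return partitions


def query_day(partitions, day):
    """Read only the partition for the requested day instead of scanning everything."""
    return partitions.get(partition_path(day), [])


if __name__ == "__main__":
    records = [
        {"date_time": "2020-07-14 08:00:01", "device_id": "pump-7"},
        {"date_time": "2020-07-15 09:30:00", "device_id": "pump-8"},
    ]
    parts = group_by_partition(records)
    print(query_day(parts, datetime(2020, 7, 14)))
```

A query for one day touches one dictionary entry here, which mirrors how a partitioned data set lets Hadoop skip directories that cannot contain matching data.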
Step D2: creating a data set requires at least a URI and a schema to define it. The data management platform creates or specifies the data set through the create command, which mainly takes the URI of the data set, the specified schema and the partition field JSON; the data is stored in the Parquet format, the schema is the one defined in step B2, and the partition JSON is the one generated in step D1. The data set is identified by its URI, which tells the platform how and where to store the data. For a Dataset created with the URI HDFS:/user/2020/7/14/, the data is ultimately stored in the /user/2020/7/14/ directory of HDFS. Creating the data set also generates a metadata folder in HDFS that contains a schema file and a descriptor file; the descriptor file records the snappy compression format, the Parquet data format, the data storage path and the partition fields.
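The information that step D2 attaches to a data set can be modelled as a small descriptor object. The Python sketch below is an illustration under stated assumptions: the class and function names are hypothetical, the storage format is assumed to be Parquet with snappy compression, and the URI is split in the simplest possible way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescriptor:
    """Hypothetical stand-in for the descriptor file written at creation time."""
    uri: str
    schema: dict
    partition_strategy: list
    data_format: str = "parquet"   # assumed columnar storage format
    compression: str = "snappy"    # compression format named in the text


def create_dataset(uri, schema, partition_strategy):
    """Validate the minimum inputs (URI and schema) and build the descriptor."""
    if not uri:
        raise ValueError("a data set needs a URI")
    if schema.get("type") != "record":
        raise ValueError("a data set needs a record schema")
    return DatasetDescriptor(uri, schema, list(partition_strategy))


def storage_location(descriptor):
    """Split the URI into storage scheme and path: how and where data is stored."""
    scheme, _, path = descriptor.uri.partition(":")
    return scheme.lower(), path


if __name__ == "__main__":
    schema = {"type": "record", "name": "DeviceLog", "fields": []}
    ds = create_dataset("HDFS:/user/2020/7/14/", schema, [])
    print(storage_location(ds), ds.data_format, ds.compression)
```

Fixing the format and compression in the descriptor at creation time means every later write to the data set lands in the same layout.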
The large-scale industrial data compression storage system provided by the invention comprises:
module M1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
The invention realizes the following functions:
1) by writing configuration files to define the data conversion process, it removes the need to write large amounts of code and to deploy and maintain that code;
2) by creating the data set for storing the data in advance and temporarily converting the data into the Avro format, it removes the need to borrow Spark for data processing, which saves computing resources and shortens the data processing flow;
3) by presetting the dataset partition fields and partitioning automatically according to field content in the data, it meets the need to store the data of different enterprises in isolation and improves the efficiency of subsequent data analysis and computation.
The invention performs data access, data flow and data storage through a configuration interface. Mainstream approaches to storing data in the Parquet format mostly use Spark for data processing and storage, which requires additional computing resources and data processing components, requires separate code development and deployment for each kind of industrial data, and is cumbersome to maintain and develop. By contrast, the method has very high compression and storage efficiency and greatly improves the utilization of storage and computing resources. In subsequent analysis and computation on data in this format, the computation time for the same data sample improves by about a factor of 5.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A large-scale industrial data compression storage method, characterized by comprising the following steps:
step 1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations;
step 2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
step 3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
2. The large-scale industrial data compression storage method according to claim 1, wherein the step 1 comprises:
step 1.1: classifying the data sources according to data format and storage medium, wherein the data formats comprise structured data and unstructured data, and the storage media comprise Kafka and RabbitMQ;
step 1.2: selecting the corresponding data acquisition system through a software configuration management tool, wherein Kafka corresponds to a Kafka data source selector and RabbitMQ corresponds to a RabbitMQ data source selector.
3. The large-scale industrial data compression storage method according to claim 2, wherein converting the data into the Avro format in step 2 comprises: mapping the industrial data to an Avro-format schema, and generating temporary Avro-format data.
4. The large-scale industrial data compression storage method according to claim 3, wherein mapping the industrial data to the Avro-format schema comprises the following steps:
step 2.1: defining a conversion chain by configuring the input fields and the fields to be output;
step 2.2: configuring an interceptor component of the data acquisition system to intercept the data, preloading the Avro-format schema during data conversion, and injecting the schema into the event header.
5. The large-scale industrial data compression storage method according to claim 4, wherein generating the temporary Avro-format data from the industrial data comprises the following steps:
step 2.3: the data acquisition system receives industrial equipment log events and sends them to its data export component, which converts each event into a record and passes it to readLine; readLine extracts the log lines from the data pipeline, matches them with a regular expression, emits one record per line of the input stream, and places each line as a string in the message output field;
step 2.4: configuring a Flume interceptor to intercept the data carrying the Avro-format schema, and converting it into temporary Avro-format data according to that schema.
6. The large-scale industrial data compression storage method according to claim 1, wherein the step 3 comprises:
step 3.1: generating a JSON file describing the data set partitions, wherein the partitions are used to store the data and to process it with queries based on time and enterprise ID;
step 3.2: defining a data set by its uniform resource identifier and schema, the data management platform creating or specifying the data set through a create command that takes the uniform resource identifier of the data set, the specified schema and the partition field JSON.
7. The large-scale industrial data compression storage method according to claim 6, wherein the step of generating the data set partition strategy JSON file comprises:
step 3.1.1: specifying the partition fields and types;
step 3.1.2: specifying the storage path of the partition JSON;
step 3.1.3: submitting the command that generates the partition strategy JSON.
8. The large-scale industrial data compression storage method according to claim 6, wherein the data set is identified by a uniform resource identifier;
the address and the storage mode of the stored data are obtained through the uniform resource identifier.
9. A large-scale industrial data compression storage system, comprising:
module M1: configuring different data acquisition systems according to the type of data source, and extracting the data they collect through interface-based operations;
module M2: defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in;
module M3: compressing the Avro-format data under the GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010961819.XA CN112214453B (en) | 2020-09-14 | 2020-09-14 | Large-scale industrial data compression storage method, system and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112214453A (en) | 2021-01-12 |
CN112214453B (en) | 2021-10-01 |
Family
ID=74050285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010961819.XA Active CN112214453B (en) | 2020-09-14 | 2020-09-14 | Large-scale industrial data compression storage method, system and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214453B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103246671A (en) * | 2012-02-09 | 2013-08-14 | 中兴通讯股份有限公司 | Processing method and device for abstract syntax notation files |
CN106294374A (en) * | 2015-05-15 | 2017-01-04 | 北京国双科技有限公司 | The method of small documents merging and data query system |
CN107784039A (en) * | 2016-08-31 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of data load method, apparatus and system |
US10289739B1 (en) * | 2014-05-07 | 2019-05-14 | ThinkAnalytics | System to recommend content based on trending social media topics |
US20190220532A1 (en) * | 2018-01-17 | 2019-07-18 | International Business Machines Corporation | Data processing with nullable schema information |
CN110813783A (en) * | 2019-10-29 | 2020-02-21 | 常州微亿智造科技有限公司 | Appearance intelligent detection system based on manipulator |
US10592282B2 (en) * | 2015-09-16 | 2020-03-17 | Salesforce.Com, Inc. | Providing strong ordering in multi-stage streaming processing |
CN110914818A (en) * | 2017-06-07 | 2020-03-24 | 起元技术有限责任公司 | Dataflow graph configuration |
CN111046022A (en) * | 2019-12-04 | 2020-04-21 | 山西云时代技术有限公司 | Database auditing method based on big data technology |
CN111125513A (en) * | 2019-11-22 | 2020-05-08 | 博智安全科技股份有限公司 | Recommendation system based on Spark |
CN111324688A (en) * | 2020-02-24 | 2020-06-23 | 南京莱斯网信技术研究院有限公司 | Semi-structured data and unstructured data acquisition system based on events |
CN111625616A (en) * | 2020-05-11 | 2020-09-04 | 苏州盈数智能科技有限公司 | Enterprise-level data management system capable of realizing mass storage |
Non-Patent Citations (7)
Title |
---|
GEOFFREY C. FOX等: "HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack", 《2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING》 * |
STUBHUB: "Introduction to Morphlines", 《HTTPS://MY.OSCHINA.NET/STUBHUB/BLOG/325044》 * |
TOMSCUT: "构建大数据ETL通道--Json数据的流式转换--Avro转Parquet(二)", 《HTTPS://BLOG.CSDN.NET/QQ_29829081/ARTICLE/DETAILS/80518671》 * |
佚名: "Introduction to Datasets", 《HTTP://KITESDK.ORG/DOCS/1.0.0/INTRODUCTION-TO-DATASETS.HTML》 * |
张志军: "《大数据技术在高校中的应用研究》", 30 September 2017, 北京邮电大学出版社 * |
徐宇辉: "Flume+Morphlines实现数据的实时ETL", 《HTTPS://MP.WEIXIN.QQ.COM/S/XCSDKQO1XMQWU91LCH29UW》 * |
杨敏等: "流量大数据安全分析平台的设计与实现", 《通信学报》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507013A (en) * | 2021-02-07 | 2021-03-16 | 北京工业大数据创新中心有限公司 | Industrial equipment data storage method and device |
CN112507013B (en) * | 2021-02-07 | 2021-07-02 | 北京工业大数据创新中心有限公司 | Industrial equipment data storage method and device |
CN115017218A (en) * | 2022-06-17 | 2022-09-06 | 中国电信股份有限公司 | Processing method and device of distributed call chain, storage medium and electronic equipment |
CN115017218B (en) * | 2022-06-17 | 2024-01-30 | 中国电信股份有限公司 | Processing method and device of distributed call chain, storage medium and electronic equipment |
CN116719866A (en) * | 2023-05-09 | 2023-09-08 | 上海银满仓数字科技有限公司 | Multi-format data self-adaptive distribution method and system |
CN116719866B (en) * | 2023-05-09 | 2024-02-13 | 上海银满仓数字科技有限公司 | Multi-format data self-adaptive distribution method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112214453B (en) | 2021-10-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||