CN109033196A - Distributed data scheduling system and method - Google Patents
- Publication number
- CN109033196A (application number CN201810689485.8A)
- Authority
- CN
- China
- Prior art keywords
- log
- data
- logs
- sub
- distributed
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a distributed data scheduling system and method. The system includes at least one scheduling component, which is adapted to obtain offline logs to be processed from a file system and to divide them into multiple sub-logs. The scheduling component is further adapted to distribute the sub-logs to multiple data mining units, each of which mines log metadata from its sub-log according to preset rules. The scheduling component is further adapted to store the log metadata and other process information in a storage component that includes multiple preset databases. This solves the single-point problem in the prior art caused by a central device handling all tasks: when a task fails or the executing device breaks down, another device node continues the task, so that failed tasks are retried automatically across machines and tasks run correctly and on time. If other problems arise during task execution, the system can also send alert notifications to the user.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a distributed data scheduling system and method.
Background
Heimdall is a massive-data mining and analysis system with fully independent intellectual property rights. It mines and processes massive data and provides easy-to-use tools for data mining personnel and operations analysts. At present, when an analyst uses the system to query files, the files found are usually raw logs, which the analyst must then process and analyze further. This increases the analyst's workload and lowers efficiency. To make the analysts' work easier, further extraction and refinement of the raw logs needs to be implemented directly within the Heimdall system.
However, data mining and data extraction tasks in the system are currently all chained into a complete workflow by means of scripts, a computing platform, or hard coding. In this approach, all data scheduling tasks are completed by a central device, and the scheduling information of the data processing tasks is aggregated at this single management node, which congests the information flow. If the management node fails, the data processing tasks of the whole system are affected. Moreover, the data processing tasks of the current system are inefficient and cannot meet the demand for further extracting and refining raw logs directly within the system.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a distributed data scheduling system and a corresponding method that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a distributed data scheduling system is provided, including at least one scheduling component.
The scheduling component is adapted to obtain offline logs to be processed from a file system and to divide the offline logs to be processed into multiple sub-logs.
The scheduling component is further adapted to distribute the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules.
The scheduling component is further adapted to store the log metadata and other process information in a storage component that includes multiple preset databases.
Optionally, the scheduling component is further adapted to:
divide the offline logs to be processed into multiple sub-logs according to the source of the offline logs, and distribute the sub-logs to corresponding data mining units according to the operating status of each data mining unit.
Optionally, the data mining unit is further adapted to:
mine the sub-log based on the MapReduce model using the Spark engine, and extract the log metadata.
Optionally, the offline logs to be processed obtained from the file system include at least one of the following:
logs produced when a client accesses a server, and logs generated by sample rescan behavior.
Optionally, the content of the log metadata includes at least one of the following:
user identification information and log type.
Optionally, the scheduling component is further adapted to:
monitor each data mining unit's mining process for its sub-log and, when any mining process is found to be running abnormally, automatically start another data mining unit to continue mining the corresponding sub-log.
Optionally, the scheduling component is further adapted to:
if the multiple preset databases in the storage component include a mysql database, an etcd database, and a redis database, store the log metadata in the mysql database and/or the etcd database, and store the other process information in the redis database.
Optionally, the system further includes:
at least one front-end unit, adapted to display the mining process of each data mining unit, monitor the displayed state, and issue an alert notification when the displayed state is abnormal.
Optionally, the front-end unit is further adapted to:
receive a user trigger to pause a mining process that is being executed or to start a mining process that needs to be executed.
Optionally, the system further includes:
a data processing unit, adapted to classify and merge the offline logs to generate corresponding virtual tables, and to compute statistics over the log metadata to obtain corresponding statistical information.
Optionally, the data processing unit is further adapted to:
classify and merge the offline logs according to at least one item of the log metadata content to generate corresponding virtual tables.
Optionally, the data processing unit is further adapted to:
aggregate the offline logs according to preset rules to obtain logs in a specific format;
classify and merge the logs in the specific format to generate corresponding virtual tables.
Optionally, the preset rules include:
the data processing unit aggregates the received offline logs at a preset time interval to obtain logs in a specific format.
According to another aspect of the present invention, a distributed data scheduling method is further provided, including:
obtaining offline logs to be processed from a file system, and dividing the offline logs to be processed into multiple sub-logs;
distributing the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules;
storing the log metadata and other process information in a storage component that includes multiple preset databases.
Optionally, dividing the offline logs to be processed into multiple sub-logs includes:
dividing the offline logs to be processed into multiple sub-logs according to the source of the offline logs.
Optionally, distributing the multiple sub-logs to multiple data mining units includes:
distributing the multiple sub-logs to corresponding data mining units according to the operating status of each data mining unit.
Optionally, mining log metadata from the sub-log according to preset rules includes:
mining the sub-log based on the MapReduce model using the Spark engine, and extracting the log metadata.
Optionally, the multiple offline logs pre-stored in the file system include at least one of the following:
logs produced when a client accesses a server, and logs generated by sample rescan behavior.
Optionally, the content of the log metadata includes at least one of the following:
user identification information and log type.
Optionally, the method further includes:
monitoring each data mining unit's mining process for its sub-log and, when any mining process is found to be running abnormally, automatically starting another data mining unit to continue mining the corresponding sub-log.
Optionally, storing the log metadata and other process information in a storage component that includes multiple preset databases includes:
if the multiple preset databases in the storage component include a mysql database, an etcd database, and a redis database, storing the log metadata in the mysql database and/or the etcd database, and storing the other process information in the redis database.
Optionally, the method further includes:
displaying the mining process of each data mining unit, monitoring the displayed state, and issuing an alert notification when the displayed state is abnormal.
Optionally, the method further includes:
receiving a user trigger to pause a mining process that is being executed or to start a mining process that needs to be executed.
Optionally, the method further includes:
classifying and merging the offline logs to generate corresponding virtual tables, and computing statistics over the log metadata to obtain corresponding statistical information.
Optionally, classifying and merging the offline logs to generate corresponding virtual tables includes:
classifying and merging the offline logs according to at least one item of the log metadata content to generate corresponding virtual tables.
Optionally, classifying and merging the offline logs to generate corresponding virtual tables further includes:
aggregating the offline logs according to preset rules to obtain logs in a specific format;
classifying and merging the logs in the specific format to generate corresponding virtual tables.
Optionally, the preset rules include:
the data processing unit aggregates the received offline logs at a preset time interval to obtain logs in a specific format.
According to another aspect of the present invention, a computer storage medium is further provided. The computer storage medium stores computer program code which, when run on a computer, causes the computer to execute the distributed data scheduling method described in any of the above embodiments.
According to another aspect of the present invention, a computing device is further provided, including a processor and a memory storing computer program code. When the computer program code is run by the processor, it causes the computing device to execute the distributed data scheduling method described in any of the above embodiments.
The distributed data scheduling system of the present invention includes at least one scheduling component, multiple data mining units, and a storage component that includes multiple preset databases. First, the scheduling component obtains offline logs to be processed from the file system and divides them into multiple sub-logs. The scheduling component then distributes the sub-logs to multiple data mining units; after a data mining unit receives its sub-log, it mines log metadata from the sub-log according to preset rules. A data mining unit may also generate other process information while mining a sub-log, and during mining the scheduling component can store the log metadata obtained in real time, together with the other process information, in the storage component that includes multiple preset databases.
It can thus be seen that, through a distributed architecture of at least one scheduling component, multiple data mining units, and a storage component including multiple preset databases, the present invention has the scheduling component divide the obtained offline logs into multiple sub-logs and has multiple data mining units execute the mining tasks in parallel to mine the corresponding log metadata from each sub-log. This solves the single-point problem in the prior art caused by a central device handling all tasks. Moreover, because tasks are executed by multiple nodes in parallel, when a data processing task fails or the executing device breaks down, another device node can continue the task, so that failed tasks are retried automatically across machines and tasks run correctly and on time. In addition, by storing the log metadata and the other process information separately by category, the present invention makes it easier for data analysts or other data processing devices to look up and obtain the corresponding information later, which indirectly improves the timeliness of data processing tasks. Furthermore, the distributed data scheduling system provided by the present invention can automatically classify and distribute the extracted data task code, which simplifies the handling of each data task. Users do not need to care how the code runs; they only upload it to the data scheduling system, which automatically classifies the code to be run and schedules it onto a suitable machine, further supporting automatic retry of failed tasks. Finally, if other problems arise during the execution of a data task, the system can also send alert notifications to the user.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
From the following detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings, the above and other objects, advantages, and features of the present invention will become clearer to those skilled in the art.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a structural schematic diagram of a distributed data scheduling system according to an embodiment of the present invention;
Fig. 2 is another structural schematic diagram of a distributed data scheduling system according to an embodiment of the present invention;
Fig. 3 is a further structural schematic diagram of a distributed data scheduling system according to an embodiment of the present invention;
Fig. 4 is a design structure schematic diagram of a distributed scheduling system according to an embodiment of the present invention; and
Fig. 5 is a flowchart of a distributed data scheduling method according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problem, an embodiment of the present invention provides a distributed data scheduling system. Fig. 1 shows a structural schematic diagram of a distributed data scheduling system according to an embodiment of the present invention. Referring to Fig. 1, the distributed data scheduling system of this embodiment includes at least one scheduling component 10, multiple data mining units 20, and a storage component 30 that includes multiple preset databases.
The functions of the components of the distributed data scheduling system of this embodiment, and the connections between them, are as follows:
The scheduling component 10 is adapted to obtain offline logs to be processed from a file system, divide them into multiple sub-logs, and distribute the sub-logs to the data mining units 20.
The data mining unit 20 is coupled to the scheduling component 10 and is adapted to receive the sub-log distributed by the scheduling component 10 and to mine log metadata from the sub-log according to preset rules.
The storage component 30 is coupled to the scheduling component 10 and is adapted to store, in the multiple preset databases, the log metadata and other process information that the scheduling component 10 obtains from the data mining units 20.
It should be noted that, for convenience and clarity, Fig. 1 shows the connections of only one scheduling component 10 to the other parts. It should be understood that every other scheduling component 10 in the distributed data scheduling system of this embodiment has the same structure and function as the one described here, and its connections are the same as in the example of Fig. 1; they are therefore neither described again nor shown in the figure.
Through a distributed architecture of at least one scheduling component 10, multiple data mining units 20, and a storage component 30 including multiple preset databases, the present invention has the scheduling component 10 divide the obtained offline logs into multiple sub-logs and has the multiple data mining units 20 execute the mining tasks in parallel to mine the corresponding log metadata from each sub-log. This solves the single-point problem in the prior art caused by a central device handling all tasks. Moreover, because tasks are executed by multiple nodes in parallel, when a data processing task fails or the executing device breaks down, another device node can continue the task, so that failed tasks are retried automatically across machines and tasks run correctly and on time. In addition, by storing the log metadata and the other process information separately by category, the present invention makes it easier for data analysts or other data processing devices to look up and obtain the corresponding information later, which indirectly improves the timeliness of data processing tasks. Furthermore, the distributed data scheduling system provided by the present invention can automatically classify and distribute the extracted data task code, which simplifies the handling of each data task. Users do not need to care how the code runs; they only upload it to the data scheduling system, which automatically classifies the code to be run and schedules it onto a suitable machine, further supporting automatic retry of failed tasks. Finally, if other problems arise during the execution of a data task, the system can also send alert notifications to the user.
Specifically, in an embodiment of the present invention, the scheduling component 10 first obtains the offline logs to be processed from the file system. In this embodiment, the file system may be one that stores massive logs, such as HDFS (Hadoop Distributed File System) or S3 (Simple Storage Service), or it may be another file system. The massive logs pre-stored in the file system may include, for example, logs produced when a client accesses a server and logs generated by sample rescan behavior. The embodiment of the present invention places no specific restriction on the type of log.
Further, after the scheduling component 10 obtains offline logs from the file system, such as logs produced when a client accesses a server and logs generated by sample rescan behavior, it can process them further. In an embodiment of the present invention, the scheduling component 10 may divide the offline logs to be processed into multiple sub-logs according to their source and distribute the sub-logs to the corresponding data mining units 20 according to the operating status of each data mining unit 20. Specifically, the logs produced when a client accesses a server may be taken directly as one sub-log, the logs generated by sample rescan behavior as another sub-log, and the remaining offline logs divided into further sub-logs according to their sources. In an alternative embodiment, the client-access logs may be further divided into multiple sub-logs according to other rules, and likewise the sample rescan logs. The scheduling component 10 may also classify the offline logs to be processed according to any other feasible rule. Dividing the offline logs to be processed into multiple sub-logs is a substantial improvement over chaining the whole workflow together with scripts, a computing platform, or hard coding: the data processing workflow is split into modules that can run in parallel, which improves data processing efficiency, avoids a single point of failure in the system, and lets data processing tasks run in a timely and stable manner.
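The source-based division described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the record layout (a dict with a `source` field), the source labels, and the function name `split_by_source` are all hypothetical.

```python
from collections import defaultdict


def split_by_source(offline_logs):
    """Group log records into sub-logs keyed by their (assumed) 'source' field."""
    sub_logs = defaultdict(list)
    for record in offline_logs:
        # Records without a source label fall into a catch-all sub-log.
        sub_logs[record.get("source", "unknown")].append(record)
    return dict(sub_logs)


# Hypothetical records: two sources, matching the two log kinds named in the text.
offline_logs = [
    {"source": "client_access", "line": "GET /a"},
    {"source": "sample_rescan", "line": "scan f1"},
    {"source": "client_access", "line": "GET /b"},
]
sub_logs = split_by_source(offline_logs)
```

Each resulting sub-log can then be handed to a different data mining unit, which is what allows the sub-logs to be processed in parallel.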
In an embodiment of the present invention, after the offline logs to be processed have been divided into multiple sub-logs, the scheduling component 10 may distribute the sub-logs to the corresponding data mining units 20 according to the operating status of each data mining unit 20, and the data mining units 20 then mine their sub-logs. Specifically, when distributing sub-logs according to the status of each data mining unit 20, the scheduling component 10 may first determine whether each data mining unit 20 is currently executing a task. If the distributed data scheduling system has just started, most data mining units 20 are usually idle. In that case, after the scheduling component 10 has obtained the offline logs to be processed from the file system and divided them into multiple sub-logs, it may arbitrarily pick the required number of data mining units 20 from the idle ones to receive the sub-logs. Alternatively, the data mining units 20 may be sorted by their address information or other unique identifiers, and the scheduling component 10 then distributes the sub-logs to a preset number of data mining units 20 in that order. It should be noted that the above description is only an example and does not limit the present invention.
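One way to picture the status-based distribution described above is the sketch below. It assumes each data mining unit reports a `state` of `"idle"` or `"busy"` and is identified by an `address` string; these field names and the function `assign_sub_logs` are hypothetical and are not taken from the embodiment.

```python
def assign_sub_logs(sub_log_ids, units):
    """Assign each sub-log to an idle unit, taking idle units in address order."""
    # Only units that are not currently executing a task are candidates.
    idle = sorted(u["address"] for u in units if u["state"] == "idle")
    if len(idle) < len(sub_log_ids):
        raise RuntimeError("not enough idle data mining units")
    return dict(zip(sub_log_ids, idle))


# Hypothetical unit states as a scheduling component might see them.
units = [
    {"address": "10.0.0.3", "state": "idle"},
    {"address": "10.0.0.1", "state": "busy"},
    {"address": "10.0.0.2", "state": "idle"},
]
assignment = assign_sub_logs(["client_access", "sample_rescan"], units)
```

Sorting by address corresponds to the ordered variant in the text; replacing `sorted` with a random choice among the idle units would give the arbitrary-selection variant.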
After a data mining unit 20 receives the sub-log distributed by the scheduling component 10, it can process the sub-log further. Specifically, in an embodiment of the present invention, the data mining unit 20 may first create a corresponding running environment for the received sub-log and then mine log metadata from the sub-log according to preset rules. For example, the data mining unit 20 may mine the sub-log based on the MapReduce model using the Spark engine and extract the log metadata. MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB), and Spark is a fast, general-purpose computing engine designed for large-scale data processing. In this embodiment, the log metadata may include the user's identification information and the log type; depending on the user's needs, it may also include other content such as the time the log was generated.
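The map and reduce steps behind this kind of metadata extraction can be illustrated in plain Python. Note that this sketch deliberately does not use the Spark engine; it only shows the MapReduce pattern on a single machine. The line layout (`time user_id log_type`, separated by spaces) and the function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce


def map_record(line):
    """Map step: parse one 'time user_id log_type' line into a metadata count."""
    _time, user_id, log_type = line.split()
    return Counter({(user_id, log_type): 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial metadata counts."""
    return left + right


# Hypothetical sub-log contents.
sub_log = [
    "2018-06-28T10:00 u1 access",
    "2018-06-28T10:01 u2 rescan",
    "2018-06-28T10:02 u1 access",
]
metadata_counts = reduce(reduce_counts, map(map_record, sub_log), Counter())
```

In a distributed engine the map step would run on many partitions of the sub-log at once and the reduce step would merge the partial results; the two functions keep the same shape.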
Further, while the data mining units 20 are mining data, the scheduling component 10 can monitor and record how each data mining unit 20 processes its sub-log. When it detects that the processing of any sub-log is abnormal, it can automatically start another data mining unit 20 to continue mining that sub-log. Specifically, when starting another data mining unit 20, the scheduling component 10 may start any data mining unit 20 at random, may select a suitable data mining unit 20 based on the status information of the other data mining units 20, or may choose a data mining unit 20 according to a preset rule. This embodiment only requires that, when the processing of a sub-log in any data mining unit 20 becomes abnormal, another data mining unit 20 is started automatically to continue processing that sub-log, so that the sub-log is processed correctly and in time.
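The automatic retry on another unit can be sketched as a simple failover loop. This is an illustrative sketch only: the unit names, the `mine` callback, and the choice to treat any raised exception as an "abnormal" mining process are assumptions, and the units are simply tried in list order rather than by any rule from the embodiment.

```python
def mine_with_failover(sub_log, units, mine):
    """Try units in order; if one fails, continue the sub-log on the next unit."""
    errors = []
    for unit in units:
        try:
            return unit, mine(unit, sub_log)
        except Exception as exc:  # assumption: any exception means an abnormal run
            errors.append((unit, exc))
    raise RuntimeError(f"all data mining units failed: {errors}")


def flaky_mine(unit, sub_log):
    """Hypothetical mining function in which the first unit is broken."""
    if unit == "unit-1":
        raise ConnectionError("unit-1 is down")
    return {"lines": len(sub_log)}


used_unit, result = mine_with_failover(["a", "b"], ["unit-1", "unit-2"], flaky_mine)
```

Here the sub-log ends up being mined by `unit-2` after `unit-1` fails, which mirrors the multi-machine retry behavior described in the text.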
In addition, after a data mining unit 20 has mined the log metadata from a sub-log, the scheduling component 10 can store the mined log metadata and other process information in the storage component 30, which includes multiple preset databases. Specifically, in this embodiment, the preset databases of the storage component 30 may be an etcd database, a mysql database, or a redis database. The etcd database is a highly available key-value storage system mainly used for shared configuration and service discovery. The mysql database is mainly used to store metadata information of data; for example, log metadata is stored in the mysql database after statistics have been computed on it. The redis database is an open-source, network-capable, log-type key-value database written in ANSI C that can run in memory and can also persist data. In this embodiment, when the storage component 30 includes an etcd database, a mysql database, and a redis database, the log metadata is stored in the etcd database and/or the mysql database, and the other process information is stored in the redis database. Storing the generated log metadata and the other process information by category makes the data processing tasks performed by the distributed data scheduling system more clearly organized and makes it easier for analysts or data processing devices to query and extract the corresponding information later.
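The routing rule described above (log metadata to mysql and/or etcd, other process information to redis) can be sketched as follows. To keep the example self-contained, plain Python lists stand in for the three databases and no real database client is used; the class and method names are hypothetical.

```python
class StorageComponent:
    """Routes records to named stores.

    Assumption: in-memory lists stand in for the mysql, etcd and redis
    databases; a real system would use the corresponding database clients.
    """

    METADATA_STORES = ("mysql", "etcd")

    def __init__(self):
        self.stores = {"mysql": [], "etcd": [], "redis": []}

    def save_metadata(self, metadata, targets=("mysql",)):
        """Store log metadata in mysql and/or etcd."""
        for name in targets:
            if name not in self.METADATA_STORES:
                raise ValueError(f"log metadata cannot be stored in {name!r}")
            self.stores[name].append(metadata)

    def save_process_info(self, info):
        """Store other process information in redis."""
        self.stores["redis"].append(info)


storage = StorageComponent()
storage.save_metadata({"user_id": "u1", "log_type": "access"}, targets=("mysql", "etcd"))
storage.save_process_info({"sub_log": "client_access", "state": "running"})
```

Keeping the routing decision in one place means that callers only say what kind of record they have, not where it should be stored.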
Further, Fig. 2 shows another structural schematic diagram of the distributed data scheduling system according to an embodiment of the present invention. As shown in Fig. 2, the distributed data scheduling system of this embodiment further includes at least one front-end unit 40. The front-end unit 40 is coupled to the scheduling component 10 and is adapted to let data processing personnel develop code and to upload the developed code to the file system for storage.
In addition, in an embodiment of the present invention, the front-end unit 40 is further adapted to display each data mining unit 20's mining process for its sub-log and to monitor the displayed state; when the displayed state is abnormal, it can send an alert notification to the user. The front-end unit 40 can also receive a user trigger to pause the mining task that a data mining unit 20 is executing on any sub-log, or to start any mining task that needs to be executed.
In an embodiment of the present invention, Fig. 3 shows another structural block diagram of the distributed data scheduling system. As shown in Fig. 3, the distributed data scheduling system further includes a data processing unit 50. The data processing unit 50 is coupled to the storage component 30 and is adapted to classify and merge the offline logs in the storage component 30 to generate corresponding virtual tables, and to compute statistics over the log metadata in the storage component 30 to obtain corresponding statistical information. In an embodiment of the present invention, when classifying and merging offline logs to generate virtual tables, the data processing unit 50 may classify and merge the received offline logs based on a proprietary framework, for example according to the different formats, fields, and so on of the offline logs, where a field here is the data form into which the data to be structured is organized. As another example, the logs may be classified and merged according to at least one item of the log metadata content: when the log metadata includes the log generation time, user identification information, log type, and similar content, the data processing unit 50 may classify and merge the logs according to at least one of these to generate corresponding virtual tables.
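Classifying and merging logs by one or more metadata items, as described above, amounts to grouping records by a composite key. The sketch below assumes each log is a dict whose keys include the metadata fields; the function name `build_virtual_tables` and the sample field names are hypothetical, and a "virtual table" is represented here simply as a list of rows.

```python
from collections import defaultdict


def build_virtual_tables(logs, key_fields):
    """Group logs into virtual tables keyed by the chosen metadata fields."""
    tables = defaultdict(list)
    for log in logs:
        # The table key is the tuple of the selected metadata values.
        key = tuple(log[field] for field in key_fields)
        tables[key].append(log)
    return dict(tables)


# Hypothetical logs carrying the metadata items named in the text.
logs = [
    {"user_id": "u1", "log_type": "access", "time": "2018-06-28"},
    {"user_id": "u2", "log_type": "rescan", "time": "2018-06-28"},
    {"user_id": "u3", "log_type": "access", "time": "2018-06-29"},
]
tables = build_virtual_tables(logs, key_fields=("log_type",))
```

Passing several fields, for example `("log_type", "time")`, yields finer-grained tables without changing the function.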
In this embodiment, the virtual tables may include the following:
basic: the sample basic information table, containing key sample information from the logs, such as historical query volume, first-seen time, and level; it gives a quick view of how important a sample is and also supports fast sample queries later;
specimen_detail: the specimen details table;
specimen_cloud_detail: the specimen cloud-query static attribute information table;
upload: the file upload traceability information table;
cloud_info: the sample cloud-query information table, containing cloud-query information of the samples in the logs, such as file path, historical level, PV (Page View), and UV (Unique Visitor);
network_behavior: the network behavior table, containing information on the network behavior of the samples in the logs;
proc_behavior: the process behavior table;
proc_chain: the process chain information table;
dropped_files: the dropped-file information table;
scan_log: the scan information table; scan_info: the scan information table; pestring; mid2ip; file_relations: the file relationship table; specimen: the collected sample information table; pe_info: the executable information table, containing executable-related information of the samples; and so on.
In addition, in an embodiment of the present invention, when classifying and merging offline logs to generate virtual tables, the data processing unit 50 may first aggregate the offline logs to be processed according to preset rules to obtain logs in a specific format, and then classify and merge those logs to generate the corresponding virtual tables. Specifically, the data processing unit 50 in this embodiment has data extraction and aggregation capabilities: driven by a simple configuration file, it can extract logs from different data sources, aggregate them, and finally produce logs in a specific format. For example, the specific format may be json; of course, the aggregated logs may also be in other formats.
In this embodiment, when the offline logs are aggregated according to preset rules, the aggregation may be performed at a preset time interval. The preset time interval may be, for example, once a day, once every two days, or once a month.
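As a minimal sketch of interval-based aggregation into json-format logs, assuming each record carries a `generated_at` date and a `log_type` (both field names are hypothetical), the records can be bucketed into fixed windows and counted. The choice of a simple count as the aggregate is also an assumption; the embodiment does not specify which quantities are aggregated.

```python
import json
from collections import Counter
from datetime import date, timedelta

def aggregate_by_interval(logs, start, interval_days):
    """Count records per (window start, log type) and emit one JSON line per bucket."""
    counts = Counter()
    for record in logs:
        day = date.fromisoformat(record["generated_at"])
        # Integer division maps each day onto the window that contains it.
        window_index = (day - start).days // interval_days
        window_start = start + timedelta(days=window_index * interval_days)
        counts[(window_start.isoformat(), record["log_type"])] += 1
    return [
        json.dumps({"window_start": w, "log_type": t, "count": n}, sort_keys=True)
        for (w, t), n in sorted(counts.items())
    ]

# Hypothetical input: three records aggregated once every two days.
sample_logs = [
    {"generated_at": "2018-06-01", "log_type": "access"},
    {"generated_at": "2018-06-02", "log_type": "access"},
    {"generated_at": "2018-06-03", "log_type": "scan"},
]
lines = aggregate_by_interval(sample_logs, date(2018, 6, 1), interval_days=2)
```

Setting `interval_days` to 1 or 30 would approximate the daily and monthly intervals mentioned above.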
The distributed data scheduling system in the foregoing embodiments is in effect the core equipment for offline processing of offline logs. It is built mainly on a distributed data extraction and aggregation framework and can process massive offline logs in the file system: it can run dozens of tasks per day, process hundreds of TB of data, and extract features on the order of trillions.
The distributed data scheduling system of the present invention is described in detail below with a specific embodiment.
The distributed data scheduling system of this embodiment is built on a distributed scheduling system, which is described first. The core of the distributed scheduling system is distributed task scheduling; a task may be a data conversion task or some other task, and the system is an infrastructure component. The distributed scheduling system is designed on a master-slave structure, automatically recovers and retries a task after it fails, and supports multiple task types, such as extracting log metadata from offline logs in the file system based on the MapReduce model using the Spark engine, scheduling file downloads, and loading data into hdfs for storage. Because the distributed data scheduling system is based on the distributed scheduling system and the two work in similar ways, the specific working process of the distributed scheduling system is introduced first, with reference to Fig. 4.
In the distributed scheduling system, the etcd cluster is the master cluster (a master being the scheduling component of the present invention), and the master cluster contains multiple masters. Any master can fetch pending tasks from file storage (S3/hdfs) and distribute them to the corresponding worker nodes (a worker being the data mining unit of the present invention). For example, the master leader may distribute the fetched tasks to four corresponding worker nodes, and these four worker nodes can execute the tasks in parallel.
While the worker nodes execute tasks, the master can record the tasks currently being executed and the execution progress of each corresponding worker node. In addition, the master can store task metadata generated during execution (such as log metadata) in the etcd and mysql databases, and store other generated records (such as the number of tasks) and temporary information in the memory/redis database.
When data processing tasks are scheduled with the distributed scheduling system of this embodiment, another node can re-execute a task after it fails on one node, which effectively avoids the single-point problem. It also greatly simplifies running data processing tasks: the user need not be concerned with how a task is executed. Once a task is uploaded to the distributed scheduler, it is automatically scheduled onto a suitable machine to run, and is retried on failure.
An embodiment of the present invention further provides a distributed data scheduling system, whose workflow is described below. The core function of the distributed data scheduling system is the scheduling and conversion of data. Data analysis tasks can be scheduled with the distributed scheduling system above; such a task may be a conversion task or some other task. That is, after the distributed scheduling system schedules a task, for example the offline logs to be processed, the distributed data scheduling system performs further data processing on the scheduled task, such as providing an elastic, programmable processing flow, data stream monitoring, easy-to-use storage, and a modular data processing pipeline. The distributed data scheduling system may be designed on the basis of the data processing framework above.
In this embodiment, the distributed data scheduling system redefines tasks in terms of node, rdd, and meta. A node represents one kind of data processing; the output of one node can serve as the input of the next node. The nodes are logically independent of one another but can be chained together through configuration/xml. Nodes may be of the following types:
filter, a filter-type node, which processes the input rdd with custom filter conditions;
event, an event-type node, which extracts results according to custom events;
fill, a fill-type node, which processes the input rdd with custom fill rules;
map/reduce, a node that processes data with a map/reduce program;
spark, a node that processes data with a spark program;
script, a node that processes data with a script.
rdd comes from the spark concept of a resilient distributed dataset: the result set of a node is an rdd. The storage of an rdd can be customized, or selected automatically according to a defined data volume; in addition, an rdd can define trimming rules to trim its output.
meta is metadata: the data type that each node can process. For example, when processing samples, the data structure of a sample can be defined in the form of the virtual tables described above.
It follows that the core function of the distributed data scheduling system is to execute, within each individual node, the business logic specified by that node's configuration.
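The node chaining described above, in which the output of one node is the input of the next and the chain is assembled from configuration, might look roughly like the following. This is a sketch under stated assumptions: a plain Python list stands in for an rdd, only the filter and fill node types are modelled, and the configuration keys `nodes`, `type` and `options` are hypothetical rather than the format used by the embodiment.

```python
import json

# Hypothetical node implementations; each takes a list of records (standing in
# for an rdd) plus its own options and returns a new list.
def filter_node(records, field, min_value):
    return [r for r in records if r.get(field, 0) >= min_value]

def fill_node(records, field, default):
    return [{**r, field: r.get(field, default)} for r in records]

NODE_TYPES = {"filter": filter_node, "fill": fill_node}

def run_pipeline(conf_json, records):
    """Run nodes in configured order, feeding each output to the next node."""
    for node in json.loads(conf_json)["nodes"]:
        records = NODE_TYPES[node["type"]](records, **node["options"])
    return records

conf = json.dumps({"nodes": [
    {"type": "filter", "options": {"field": "pv", "min_value": 100}},
    {"type": "fill", "options": {"field": "level", "default": "unknown"}},
]})
result = run_pipeline(conf, [{"md5": "a", "pv": 150}, {"md5": "b", "pv": 5}])
```

Because each node only sees its input records and its own options, a single node can also be run on its own, which matches the logical independence of nodes noted above.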
A data extraction task is described below with a simple example. Suppose the data extraction task is to extract Baidu samples.
First, md5 (message-digest algorithm 5) and sha1 (secure hash algorithm) values are extracted from the cloud scan-and-kill logs. The daily pv/uv of the corresponding samples is then calculated from the extracted md5 and sha1 values, and the parent_url (parent uniform resource locator) of each sample whose pv/uv is greater than 1,000,000 (100w) is obtained. If a sample involves Baidu, all of its child processes are obtained, and the details of the first hundred child processes are extracted and displayed.
When the distributed data scheduling system of this embodiment executes the data extraction task above, the specific execution steps may be:
Step1, monitor samples whose pv is greater than 1000000; the specific code may be (listing not reproduced)
Step2, pull the parent_url attribute of the samples; the specific code may be (listing not reproduced)
Step3, select the samples whose parent_url contains Baidu; the specific code may be (listing not reproduced)
Step5, display the samples of the first hundred child processes at the front end.
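The steps listed above can be approximated over in-memory records as follows. This is a hypothetical sketch, not the code of the embodiment: the record fields `pv`, `parent_url` and `children`, the lowercase keyword `baidu`, and the function name are all assumptions introduced here, and the child-process collection follows the prose description of the task rather than a listed step.

```python
def top_child_processes(samples, pv_threshold=1_000_000, keyword="baidu", limit=100):
    """Follow the listed steps over in-memory sample records."""
    # Step 1: keep samples whose pv exceeds the threshold.
    hot = [s for s in samples if s["pv"] > pv_threshold]
    # Steps 2-3: read parent_url and keep samples whose parent_url contains the keyword.
    matched = [s for s in hot if keyword in s.get("parent_url", "")]
    # Final step: collect child processes and keep the first `limit` for display.
    children = [child for s in matched for child in s.get("children", [])]
    return children[:limit]

# Hypothetical sample records used only to exercise the sketch.
samples = [
    {"md5": "a", "pv": 2_000_000, "parent_url": "http://www.baidu.com/x",
     "children": ["p1.exe", "p2.exe"]},
    {"md5": "b", "pv": 3_000_000, "parent_url": "http://example.com/y",
     "children": ["q1.exe"]},
    {"md5": "c", "pv": 10, "parent_url": "http://www.baidu.com/z",
     "children": ["r1.exe"]},
]
shown = top_child_processes(samples)
```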
In this embodiment, the scheduling core that implements the logic of the distributed data scheduling system chains the above steps together according to the overall configuration, manages the storage associated with each rdd, and distributes the task of each node to the nodes for execution. In this embodiment, a single node can also be executed independently.
In addition, the data processing system can provide visual editing through a front-end page; the visually edited configuration generates a conf (configuration file) in json format, which is submitted to the scheduling core. The front-end page can not only display the progress of each node but also let the user manually start or stop the task of a single node.
Based on the same inventive concept, an embodiment of the present invention further provides a distributed data scheduling method. Fig. 5 shows a flowchart of the distributed data scheduling method according to an embodiment of the present invention. As shown in Fig. 5, the method includes at least steps S502 to S506:
Step S502, obtain offline logs to be processed from the file system, and divide the offline logs to be processed into multiple sub-logs;
Step S504, distribute the multiple sub-logs to multiple data mining units, each of which mines log metadata from its sub-log according to preset rules;
Step S506, store the log metadata and other process information in a storage assembly that includes multiple preset databases.
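Steps S502 to S506 can be sketched end to end as shown below. The sketch makes several assumptions that are not part of the method: each log line has a hypothetical `user type message` layout, a thread pool stands in for the data mining units, and a dictionary stands in for the storage assembly.

```python
from concurrent.futures import ThreadPoolExecutor

def split_log(lines, parts):
    """S502: divide the log lines into `parts` roughly equal sub-logs."""
    return [lines[i::parts] for i in range(parts)]

def mine_metadata(sub_log):
    """S504: extract (user, log type) pairs; the 'user type message' layout is assumed."""
    return [tuple(line.split(" ", 2)[:2]) for line in sub_log]

def run(lines, workers=2):
    sub_logs = split_log(lines, workers)
    # Each worker thread plays the role of one data mining unit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mined = list(pool.map(mine_metadata, sub_logs))
    # S506: keep metadata and process information in separate stores.
    store = {"metadata": [], "process_info": {"sub_logs": len(sub_logs)}}
    for chunk in mined:
        store["metadata"].extend(chunk)
    return store

store = run(["u1 access GET /", "u2 scan file.exe", "u3 access POST /x"])
```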
In an embodiment of the present invention, when step S502 is executed, after the offline logs to be processed are obtained they may be divided into multiple sub-logs according to their source. Further, when step S504 is executed to distribute the multiple sub-logs to multiple data mining units, the sub-logs may be distributed to the corresponding data mining units according to the operating state of each data mining unit.
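One possible reading of source-based splitting and state-based distribution is sketched below. Modelling the operating state of a data mining unit as a numeric load, and the greedy least-loaded assignment, are assumptions of this sketch; the embodiment does not state how the operating state is measured or used. The `source` field and the unit names are likewise hypothetical.

```python
from collections import defaultdict

def split_by_source(records):
    """Group log records into sub-logs, one per source."""
    sub_logs = defaultdict(list)
    for record in records:
        sub_logs[record["source"]].append(record)
    return dict(sub_logs)

def assign(sub_logs, unit_load):
    """Give each sub-log to the currently least-loaded data mining unit."""
    load = dict(unit_load)  # copy so the caller's view is not mutated
    plan = {}
    # Largest sub-logs first, so big pieces land on the idlest units.
    for source, records in sorted(sub_logs.items(), key=lambda kv: -len(kv[1])):
        unit = min(load, key=load.get)
        plan[source] = unit
        load[unit] += len(records)
    return plan

records = [{"source": "client"}] * 3 + [{"source": "rescan"}]
plan = assign(split_by_source(records), {"w1": 0, "w2": 2})
```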
Further, when step S504 is executed, after a data mining unit receives its sub-log, it mines log metadata from the sub-log according to preset rules. In an embodiment of the present invention, the sub-logs may be mined, and the log metadata extracted, based on the MapReduce model using the Spark engine. The offline logs pre-stored in the file system of this embodiment may include logs generated when clients access the server, logs generated by sample back-scan behavior, and so on. The log metadata may include user identity information, log type, and so on.
In an embodiment of the present invention, while the above steps are executed, the scheduling component may also monitor the mining process of each data mining unit on its sub-log and, when any mining process is found to be running abnormally, automatically start another data mining unit to continue mining that sub-log, thereby providing automatic retry of failed tasks in the system.
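The failover behaviour described above can be illustrated with a small sketch in which an exception stands in for an abnormal mining process. The unit names and the two example mining functions are hypothetical, and trying units strictly in list order is an assumption; the embodiment does not specify how the replacement unit is chosen.

```python
def mine_with_failover(sub_log, units):
    """Try each data mining unit in turn until one finishes the sub-log."""
    errors = []
    for name, mine in units:
        try:
            return name, mine(sub_log)
        except Exception as exc:  # any abnormal run triggers failover
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all units failed: {errors}")

def broken_unit(sub_log):
    # Simulates a data mining unit whose machine has gone down.
    raise ConnectionError("worker unreachable")

def healthy_unit(sub_log):
    # Extracts the first token of each line as stand-in metadata.
    return [line.split()[0] for line in sub_log]

unit, result = mine_with_failover(
    ["u1 access", "u2 scan"],
    [("w1", broken_unit), ("w2", healthy_unit)],
)
```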
Further, after the data mining units extract log metadata from the sub-logs, the extracted log metadata and the other process information generated during mining may be stored in the storage assembly that includes multiple preset databases. Specifically, if the preset databases in the storage assembly include a mysql database, an etcd database, and a redis database, the log metadata is stored in the mysql database and/or the etcd database, and the other process information is stored in the redis database.
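The routing rule above, with log metadata going to mysql and/or etcd and other process information going to redis, is sketched below with in-memory stand-ins. No real database client is used; the class name, method names and the etcd-style key layout are hypothetical and serve only to show which kind of information goes where.

```python
class StorageAssembly:
    """In-memory stand-ins for the three preset databases."""

    def __init__(self):
        self.mysql = []   # log metadata rows
        self.etcd = {}    # log metadata keyed for lookup
        self.redis = {}   # counters and temporary process information

    def store_metadata(self, task_id, metadata):
        # Log metadata is written to both metadata stores in this sketch.
        self.mysql.append({"task_id": task_id, **metadata})
        self.etcd[f"/tasks/{task_id}/metadata"] = metadata

    def store_process_info(self, key, value):
        # Other process information goes to the redis stand-in only.
        self.redis[key] = value

storage = StorageAssembly()
storage.store_metadata("t1", {"user_id": "u1", "log_type": "access"})
storage.store_process_info("task_count", 1)
```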
In addition, while the above steps are executed, a front-end unit may display the mining process of each data mining unit and monitor the displayed state, issuing an alarm notification to the user when an abnormal state is detected. The front-end unit may also receive a trigger from the user to pause a mining process that is in progress or to start a mining process that needs to be executed.
Further, after the above steps have been executed, the data processing unit may classify and merge the offline logs to generate corresponding virtual tables, and compute statistics over the log metadata to obtain corresponding statistical information. Specifically, when the offline logs are classified and merged to generate virtual tables, they may be classified and merged according to at least one item of the log metadata.
Alternatively, when the offline logs are classified and merged to generate virtual tables, the data processing unit may first aggregate the offline logs according to preset rules to obtain logs in a specific format, and then classify and merge the logs in that format to generate the corresponding virtual tables. In this embodiment, the preset rules may include: the data processing unit aggregates the received offline logs at a preset time interval to obtain logs in the specific format.
An embodiment of the present invention further provides a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the distributed data scheduling method of any of the embodiments above.
In addition, an embodiment of the present invention further provides a computing device, including a processor and a memory storing computer program code; when the computer program code is run by the processor, the computing device is caused to execute the distributed data scheduling method of any of the embodiments above.
According to any one of the preferred embodiments above, or a combination of several of them, embodiments of the present invention can achieve the following beneficial effects:
The present invention uses a distributed architecture comprising at least one scheduling component, multiple data mining units, and a storage assembly that includes multiple preset databases. The scheduling component divides the acquired offline logs into multiple sub-logs, and the data mining units execute the mining tasks in parallel, mining the corresponding log metadata from each sub-log. This solves the single-point problem caused in the prior art by a central device processing tasks centrally. Moreover, because tasks are executed by multiple nodes in parallel, when a data processing task fails or the executing device breaks down, the task can be continued by another device node, which provides automatic multi-machine retry of failed tasks and ensures that tasks run promptly and correctly. Furthermore, the present invention stores the log metadata and the other process information by category, so that data analysts or other data processing devices can later look up the corresponding information, which indirectly improves the timeliness of data processing tasks. In addition, the distributed data scheduling system provided by the present invention can automatically classify and dispatch the code of data tasks, which simplifies the processing of each data task: the user need not be concerned with how the code runs and only has to upload the code to the data scheduling system, which automatically classifies it and schedules it onto a suitable machine to run, and retries it automatically if the task fails. Finally, when other problems occur during the execution of a data task, the system can also send an alarm notification to the user.
Those skilled in the art will readily understand that, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; for brevity, they are not repeated here.
In addition, the functional units in the embodiments of the present invention may be physically independent of one another, two or more functional units may be integrated together, or all functional units may be integrated in one processing unit. The integrated functional units may be implemented in hardware, or in software or firmware.
Those of ordinary skill in the art will understand that, if the integrated functional units are implemented in software and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that, when run, cause a computing device (such as a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Alternatively, all or part of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions (a computing device such as a personal computer, a server, or a network device). The program instructions may be stored in a computer-readable storage medium; when they are executed by the processor of the computing device, the computing device executes all or part of the steps of the methods of the embodiments of the present invention.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that, within the spirit and principles of the present invention, the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the corresponding technical solutions to depart from the protection scope of the present invention.
According to one aspect of the present invention, there is provided A1. a distributed data scheduling system, including at least one scheduling component, wherein:
the scheduling component is adapted to obtain offline logs to be processed from a file system and to divide the offline logs to be processed into multiple sub-logs;
the scheduling component is further adapted to distribute the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules;
the scheduling component is further adapted to store the log metadata and other process information in a storage assembly that includes multiple preset databases.
A2. The system according to A1, wherein the scheduling component is further adapted to:
divide the offline logs to be processed into multiple sub-logs according to the source of the offline logs to be processed, and distribute the multiple sub-logs to the corresponding data mining units according to the operating state of each data mining unit.
A3. The system according to A2, wherein the data mining unit is further adapted to:
mine the sub-log based on the MapReduce model using the Spark engine, and extract the log metadata.
A4. The system according to any one of A1-A3, wherein the offline logs to be processed obtained from the file system include at least one of the following:
logs generated when clients access the server, and logs generated by sample back-scan behavior.
A5. The system according to any one of A1-A3, wherein the log metadata includes at least one of the following:
user identity information and log type.
A6. The system according to any one of A1-A3, wherein the scheduling component is further adapted to:
monitor the mining process of each data mining unit on its sub-log and, when any mining process is found to be running abnormally, automatically start another data mining unit to continue mining that sub-log.
A7. The system according to any one of A1-A3, wherein the scheduling component is further adapted to:
if the multiple preset databases in the storage assembly include a mysql database, an etcd database, and a redis database, store the log metadata in the mysql database and/or the etcd database, and store the other process information in the redis database.
A8. The system according to any one of A1-A3, further including:
at least one front-end unit, adapted to display the mining process of each data mining unit, monitor the displayed state, and issue an alarm notification when the displayed state is abnormal.
A9. The system according to A8, wherein the front-end unit is further adapted to:
receive a trigger from a user to pause a mining process that is in progress or to start a mining process that needs to be executed.
A10. The system according to any one of A1-A3, further including:
a data processing unit, adapted to classify and merge the offline logs to generate corresponding virtual tables, and to compute statistics over the log metadata to obtain corresponding statistical information.
A11. The system according to A10, wherein the data processing unit is further adapted to:
classify and merge the offline logs according to at least one item of the log metadata to generate the corresponding virtual tables.
A12. The system according to any one of A1-A3, wherein the data processing unit is further adapted to:
aggregate the offline logs according to preset rules to obtain logs in a specific format;
classify and merge the logs in the specific format to generate the corresponding virtual tables.
A13. The system according to A12, wherein the preset rules include:
the data processing unit aggregates the received offline logs at a preset time interval to obtain logs in the specific format.
According to another aspect of the present invention, there is further provided B14. a distributed data scheduling method, including:
obtaining offline logs to be processed from a file system, and dividing the offline logs to be processed into multiple sub-logs;
distributing the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules;
storing the log metadata and other process information in a storage assembly that includes multiple preset databases.
B15. The method according to B14, wherein dividing the offline logs to be processed into multiple sub-logs includes:
dividing the offline logs to be processed into multiple sub-logs according to the source of the offline logs to be processed.
B16. The method according to B15, wherein distributing the multiple sub-logs to multiple data mining units includes:
distributing the multiple sub-logs to the corresponding data mining units according to the operating state of each data mining unit.
B17. The method according to B14, wherein mining log metadata from the sub-log according to preset rules includes:
mining the sub-log based on the MapReduce model using the Spark engine, and extracting the log metadata.
B18. The method according to any one of B14-B17, wherein the multiple offline logs pre-stored in the file system include at least one of the following:
logs generated when clients access the server, and logs generated by sample back-scan behavior.
B19. The method according to any one of B14-B17, wherein the log metadata includes at least one of the following:
user identity information and log type.
B20. The method according to any one of B14-B17, further including:
monitoring the mining process of each data mining unit on its sub-log and, when any mining process is found to be running abnormally, automatically starting another data mining unit to continue mining that sub-log.
B21. The method according to any one of B14-B17, wherein storing the log metadata and other process information in a storage assembly that includes multiple preset databases includes:
if the multiple preset databases in the storage assembly include a mysql database, an etcd database, and a redis database, storing the log metadata in the mysql database and/or the etcd database, and storing the other process information in the redis database.
B22. The method according to any one of B14-B17, further including:
displaying the mining process of each data mining unit, monitoring the displayed state, and issuing an alarm notification when the displayed state is abnormal.
B23. The method according to B22, further including:
receiving a trigger from a user to pause a mining process that is in progress or to start a mining process that needs to be executed.
B24. The method according to any one of B14-B17, further including:
classifying and merging the offline logs to generate corresponding virtual tables, and computing statistics over the log metadata to obtain corresponding statistical information.
B25. The method according to B24, wherein classifying and merging the offline logs to generate corresponding virtual tables includes:
classifying and merging the offline logs according to at least one item of the log metadata to generate the corresponding virtual tables.
B26. The method according to any one of B14-B17, wherein classifying and merging the offline logs to generate corresponding virtual tables further includes:
aggregating the offline logs according to preset rules to obtain logs in a specific format;
classifying and merging the logs in the specific format to generate the corresponding virtual tables.
B27. The method according to B26, wherein the preset rules include:
a data processing unit aggregates the received offline logs at a preset time interval to obtain logs in the specific format.
According to another aspect of the present invention, there is further provided C28. a computer storage medium storing computer program code which, when run on a computer, causes the computer to execute the distributed data scheduling method of any one of B14-B27.
According to a further aspect of the present invention, there is further provided D29. a computing device, including: a processor; and a memory storing computer program code; when the computer program code is run by the processor, the computing device is caused to execute the distributed data scheduling method of any one of B14-B27.
Claims (10)
1. A distributed data scheduling system, including at least one scheduling component, wherein:
the scheduling component is adapted to obtain offline logs to be processed from a file system and to divide the offline logs to be processed into multiple sub-logs;
the scheduling component is further adapted to distribute the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules;
the scheduling component is further adapted to store the log metadata and other process information in a storage assembly that includes multiple preset databases.
2. The system according to claim 1, wherein the scheduling component is further adapted to:
divide the offline logs to be processed into multiple sub-logs according to the source of the offline logs to be processed, and distribute the multiple sub-logs to the corresponding data mining units according to the operating state of each data mining unit.
3. The system according to claim 2, wherein the data mining unit is further adapted to:
mine the sub-log based on the MapReduce model using the Spark engine, and extract the log metadata.
4. The system according to any one of claims 1-3, wherein the offline logs to be processed obtained from the file system include at least one of the following:
logs generated when clients access the server, and logs generated by sample back-scan behavior.
5. The system according to any one of claims 1-3, wherein the log metadata includes at least one of the following:
user identity information and log type.
6. The system according to any one of claims 1-3, wherein the scheduling component is further adapted to:
monitor the mining process of each data mining unit on its sub-log and, when any mining process is found to be running abnormally, automatically start another data mining unit to continue mining that sub-log.
7. The system according to any one of claims 1-3, wherein the scheduling component is further adapted to:
if the multiple preset databases in the storage assembly include a mysql database, an etcd database, and a redis database, store the log metadata in the mysql database and/or the etcd database, and store the other process information in the redis database.
8. A distributed data scheduling method, including:
obtaining offline logs to be processed from a file system, and dividing the offline logs to be processed into multiple sub-logs;
distributing the multiple sub-logs to multiple data mining units, each data mining unit mining log metadata from its sub-log according to preset rules;
storing the log metadata and other process information in a storage assembly that includes multiple preset databases.
9. A computer storage medium storing computer program code which, when run on a computer, causes the computer to execute the distributed data scheduling method of claim 8.
10. A computing device, including: a processor; and a memory storing computer program code; when the computer program code is run by the processor, the computing device is caused to execute the distributed data scheduling method of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810689485.8A CN109033196A (en) | 2018-06-28 | 2018-06-28 | A kind of distributed data scheduling system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109033196A (en) | 2018-12-18 |
Family
ID=65520776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810689485.8A (Pending) CN109033196A (en) | A kind of distributed data scheduling system and method | 2018-06-28 | 2018-06-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033196A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110179105A1 (en) * | 2010-01-15 | 2011-07-21 | International Business Machines Corporation | Method and system for distributed task dispatch in a multi-application environment based on consensus |
CN102306168A (en) * | 2011-08-23 | 2012-01-04 | 成都市华为赛门铁克科技有限公司 | Log operation method and device and file system |
CN102495758A (en) * | 2011-12-05 | 2012-06-13 | 中南大学 | Scheduling method of real-time tasks in distributing type high performance calculation environment |
CN103138989A (en) * | 2013-02-25 | 2013-06-05 | 武汉华工安鼎信息技术有限责任公司 | System and method for analyzing large number of logs |
CN106202399A (en) * | 2016-07-11 | 2016-12-07 | 浪潮软件集团有限公司 | Method for implementing data management system of big data |
CN107766147A (en) * | 2016-08-23 | 2018-03-06 | 上海宝信软件股份有限公司 | Distributed data analysis task scheduling system |
CN106452899A (en) * | 2016-10-27 | 2017-02-22 | 中国工商银行股份有限公司 | Distributed data mining system and method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459896A (en) * | 2019-01-18 | 2020-07-28 | 阿里巴巴集团控股有限公司 | Data recovery system and method, electronic device, and computer-readable storage medium |
CN111459896B (en) * | 2019-01-18 | 2023-05-02 | 阿里云计算有限公司 | Data recovery system and method, electronic device, and computer-readable storage medium |
CN111723063A (en) * | 2019-03-18 | 2020-09-29 | 北京沃东天骏信息技术有限公司 | Method and device for processing offline log data |
Similar Documents
Publication | Title |
---|---|
US11449562B2 (en) | Enterprise data processing |
US11163731B1 (en) | Autobuild log anomaly detection methods and systems |
US11663033B2 (en) | Design-time information based on run-time artifacts in a distributed computing cluster |
US7418453B2 (en) | Updating a data warehouse schema based on changes in an observation model |
US11182691B1 (en) | Category-based sampling of machine learning data |
WO2014031618A2 (en) | Data relationships storage platform |
CN108985981A (en) | Data processing system and method |
CN108829505A (en) | A kind of distributed scheduling system and method |
CN114756629B (en) | Multi-source heterogeneous data interaction analysis engine and method based on SQL |
CN103077192A (en) | Data processing method and system thereof |
CN109033196A (en) | A kind of distributed data scheduling system and method |
CN107423035B (en) | Product data management system in software development process |
CN116431668A (en) | Metadata acquisition-based data blood-edge analysis method and device and electronic equipment |
Gu et al. | Characterizing job-task dependency in cloud workloads using graph learning |
Davidson et al. | Technical review of apache flink for big data |
CN115658133A (en) | Multi-version gray scale release system for enterprise software |
US11386170B2 (en) | Search data curation and enrichment for deployed technology |
Huang et al. | A web interface for XALT log data analysis |
US12045654B2 (en) | Memory management through control of data processing tasks |
US11822566B2 (en) | Interactive analytics workflow with integrated caching |
CN116860754A (en) | Report data processing method and device, electronic equipment and storage medium |
CN116910352A (en) | Report recommending method, device, equipment and medium based on artificial intelligence |
CN117609362A (en) | Data processing method, device, computer equipment and storage medium |
CN118096387A (en) | Data asset management method, device, equipment and medium |
Bernaschi et al. | Forensic disk image indexing and search in an HPC environment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181218 |