
CN112286990A - Method and device for predicting platform operation execution time and electronic equipment - Google Patents


Info

Publication number
CN112286990A
CN112286990A
Authority
CN
China
Prior art keywords
job
target
execution time
execution
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011177481.5A
Other languages
Chinese (zh)
Inventor
吴恩慈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyue Information Technology Co Ltd filed Critical Shanghai Qiyue Information Technology Co Ltd
Priority to CN202011177481.5A priority Critical patent/CN112286990A/en
Publication of CN112286990A publication Critical patent/CN112286990A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of computers, and in particular to a method and a device for predicting platform operation execution time and electronic equipment, wherein the method comprises the following steps: extracting a job data set from a target job, the job data set comprising feature vector data and measurement index data; acquiring the execution time of each stage of the target job through a time prediction model; calculating the execution time of the target job according to the execution time of each stage of the target job; and automatically and dynamically adjusting and executing the execution plan of the target job according to the predicted execution time of the target job, and acquiring the adjusted execution plan of the target job. The invention can accurately predict the execution time of complex operators in a distributed computing platform, can dynamically adjust the job execution plan, effectively reduces the complexity of the computing process, and improves the overall performance of the computing cluster.

Description

Method and device for predicting platform operation execution time and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for predicting platform operation execution time and electronic equipment.
Background
A distributed computing platform can scale a cluster to thousands of nodes with the help of its core engine, and the Catalyst Optimizer provides rule-based and cost-based optimization that pushes the computing power of the data warehouse to a new height. However, for very large-scale data sets there are usability and extensibility problems: a structured query language (SQL) statement or a Dataset program is parsed into a logical plan before being executed, an executable physical plan is then generated, and different execution plans have a great influence on performance.
Theoretical research and technical practice on job execution time prediction for distributed computing platforms face the following technical problems:
The execution time of a complex operator in a distributed computing platform cannot be accurately predicted. The prior art studies a method for predicting query execution time in a Hadoop distributed storage system, which adopts a kernel canonical correlation analysis statistical model, correlates the query output size with the execution time, and predicts the job execution time from the performance of the most recent similar queries. Experiments show that this method has some reference value for operators with simple computing processes, but its accuracy for complex operators is low.
Disclosure of Invention
The invention provides a method and a device for predicting platform operation execution time and electronic equipment, which are used for accurately predicting the execution time of complex operators in a distributed computing platform, dynamically adjusting the job execution plan, effectively reducing the complexity of the computing process, and improving the overall performance of the computing cluster.
An embodiment of the present specification provides a method for predicting platform operation execution time, including:
extracting a job data set in a target job, the job data set comprising: feature vector data and measurement index data;
acquiring the execution time of each stage of the target job through a time prediction model;
calculating the execution time of the target job according to the execution time of each stage of the target job;
and automatically and dynamically adjusting and executing the execution plan of the target job according to the predicted execution time of the target job, and acquiring the adjusted execution plan of the target job.
Preferably, the extracting the job data set in the target job includes:
acquiring the job data set through any one of a job scheduling page, a REST interface and an external monitoring tool;
extracting the feature vector data by a listener bus mechanism and the metric data by an indicator system.
Preferably, each stage execution time of the target job is a time interval between a start time and a completion time of the stage, the start time being the time at which the earliest task of the stage starts executing, and the completion time being the time at which the last task of the stage finishes executing.
Preferably, the execution time of the target job is the sum of the execution times of each stage of the target job.
Preferably, the obtaining of the execution time of each phase of the target job through the time prediction model includes:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
Preferably, the automatically and dynamically adjusting and executing the execution plan of the target job includes:
and when the adjusted execution plan of the target job fails, adopting a fault tolerance mechanism or retrying the failed execution plan of the target job.
An embodiment of the present specification further provides a device for predicting a platform operation execution time, including:
a data extraction module that extracts a job data set in a target job, the job data set comprising: feature vector data and measurement index data;
the time prediction module is used for acquiring the execution time of each stage of the target job through a time prediction model;
the data processing module is used for calculating the execution time of the target job according to the execution time of each stage of the target job;
and the job execution module automatically and dynamically adjusts and executes the execution plan of the target job according to the predicted execution time of the target job and acquires the adjusted execution plan of the target job.
Preferably, the extracting the job data set in the target job includes:
acquiring the job data set through any one of a job scheduling page, a REST interface and an external monitoring tool;
extracting the feature vector data by a listener bus mechanism and the metric data by an indicator system.
Preferably, each stage execution time of the target job is a time interval between a start time and a completion time of the stage, the start time being the time at which the earliest task of the stage starts executing, and the completion time being the time at which the last task of the stage finishes executing.
Preferably, the execution time of the target job is the sum of the execution times of each stage of the target job.
Preferably, the obtaining of the execution time of each phase of the target job through the time prediction model includes:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
Preferably, the automatically and dynamically adjusting and executing the execution plan of the target job includes:
and when the adjusted execution plan of the target job fails, adopting a fault tolerance mechanism or retrying the failed execution plan of the target job.
An electronic device, wherein the electronic device comprises:
a processor and a memory storing computer executable instructions that, when executed, cause the processor to perform the method of any of the above.
A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of the above.
The beneficial effects are that:
the invention can realize accurate prediction of the execution time of the complex operator in the distributed computing platform, can dynamically adjust the operation execution plan, effectively reduces the complexity of the computing process and improves the overall performance of the computing cluster.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram illustrating a method for predicting platform operation execution time according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an apparatus for predicting platform job execution time according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a computer-readable medium provided in an embodiment of the present specification.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or parts in the drawings, and thus their repetitive description will be omitted.
Features, structures, characteristics or other details described in a particular embodiment do not preclude the fact that the features, structures, characteristics or other details may be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, the present invention has been described with reference to features, structures, characteristics or other details that are within the purview of one skilled in the art to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The diagrams depicted in the figures are exemplary only, and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The term "and/or" and/or "includes all combinations of any one or more of the associated listed items.
Referring to fig. 1, a schematic diagram of a method for predicting platform job execution time provided in an embodiment of the present disclosure includes:
s101: extracting a job data set in a target job, the job data set comprising: feature vector data and measurement index data;
in the preferred embodiment of the present invention, the job data set is extracted from the target job, wherein the job data set includes feature vector data and metric index data, and in this embodiment, the job data set is obtained through the REST interface, the feature vector data is extracted through the listener bus mechanism, the metric index data is extracted through the index system, and the job data set extraction event is submitted to the corresponding event listener by using the asynchronous thread.
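As a minimal illustration of this extraction step on a Spark-style platform, the sketch below registers a listener on the listener bus and records a few per-stage metrics; the class name, the chosen metrics and the `sc` variable are assumptions for illustration, not part of the claimed method.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}
import scala.collection.mutable

// Collects a small job data set (feature/metric values) per stage via the listener bus.
class StageMetricsListener extends SparkListener {
  // stageId -> (executor run time ms, input bytes read, shuffle bytes written)
  val stageMetrics = mutable.Map[Int, (Long, Long, Long)]()

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    val m = info.taskMetrics
    stageMetrics(info.stageId) =
      (m.executorRunTime, m.inputMetrics.bytesRead, m.shuffleWriteMetrics.bytesWritten)
  }
}

// Registration on an existing SparkContext (assumed to be available as `sc`):
// sc.addSparkListener(new StageMetricsListener())
```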
S102: acquiring the execution time of each stage of the target job through a time prediction model;
in the embodiment of the family, the scheduling program identifies the dependency relationship between the elastic distributed data set and the directed acyclic graph, the application program compiles the target operation into the operation execution plan, and the main basis for dividing the scheduling stage is to determine whether the input of the current calculation factor is determined.
In this embodiment, the directed acyclic graph scheduler orders the target jobs; the job scheduler receives the job sets from the directed acyclic graph scheduler, creates a job set manager and adds it to the scheduling pool, sorts all job set managers in the scheduling pool, allocates resources according to the data locality principle, and runs the execution plan on each allocated node. The intermediate and final results of the execution plan are stored in a storage system; a job monitor monitors whether each task in the job succeeds or fails, the execution status of each task is reported to the directed acyclic graph scheduler through monitoring events, and a retry and fault tolerance mechanism exists for failed tasks. The data locality principle is as follows: if the job execution plan is in the scheduling stage at the start of the job, the data locality of the preferred running position of the corresponding resilient distributed dataset partition is Node Local; if the task is in a scheduling stage that is not at the start of the job, the preferred position is obtained from the running position of the parent scheduling stage; and if the Executor is active, the data locality is Process Local.
S103: calculating the execution time of the target job according to the execution time of each stage of the target job;
In this embodiment, feature vectors are extracted by parsing the job execution plan, the time prediction model predicts the execution time of each stage in the execution plan, and the execution time of the job is estimated from the per-stage execution times. Specifically, the execution time of a stage is the interval between the start time of the earliest task started in that stage and the completion time of the last executed task. As shown in formula (1), FTST_i represents the start time of the first task of stage i on its data partition, and LTET_i represents the end time of the last task of stage i. Formula (2) gives how to predict the execution time of the entire job, which has N stages, from the estimated execution time of each stage.
Stage_Time_i = LTET_i - FTST_i (1);
wherein Stage_Time_i is the execution time of the i-th stage of the target job;
Job_Time = Σ_{i=1}^{N} Stage_Time_i (2);
wherein Job_Time is the execution time of the target job.
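A minimal sketch of formulas (1) and (2): the per-stage time is the interval from the first task's start (FTST) to the last task's end (LTET), and the job time is the sum over its N stages. The case class and field names are illustrative assumptions.

```scala
// Formula (1): Stage_Time_i = LTET_i - FTST_i
case class StageTiming(firstTaskStartMs: Long, lastTaskEndMs: Long) {
  def stageTimeMs: Long = lastTaskEndMs - firstTaskStartMs
}

// Formula (2): Job_Time = sum of Stage_Time_i over the N stages of the job
def jobTimeMs(stages: Seq[StageTiming]): Long = stages.map(_.stageTimeMs).sum
```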
Decision trees can fit complex nonlinear models for regression analysis by changing the impurity metric; variance is the measure used to quantify the label uniformity at the nodes of a regression tree.
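A minimal sketch of fitting such a regression tree with variance as the impurity measure, using Spark MLlib; the DataFrame `history` and its column names are assumptions for illustration.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.DecisionTreeRegressor

// Assemble assumed numeric job features into a single feature vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("input_bytes", "shuffle_bytes", "task_count"))
  .setOutputCol("features")

// Regression tree whose node impurity is measured by the variance of the labels.
val dt = new DecisionTreeRegressor()
  .setLabelCol("stage_time")
  .setFeaturesCol("features")
  .setImpurity("variance")

val model = dt.fit(assembler.transform(history)) // `history` is an assumed training DataFrame
```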
As shown in formula (3), the root mean square error RMSE is the square root of the mean square error MSE, which further amplifies the error; the closer it is to zero, the more accurate the prediction. w^T x(i) is the predicted execution time of the i-th job and y(i) is the corresponding actual execution time:
RMSE = sqrt( (1/m) * Σ_{i=1}^{m} ( w^T x(i) - y(i) )^2 ) (3); wherein m is the number of job samples.
as shown in formula (4), the average absolute error MAE is the average value of the absolute values of the differences between the predicted value and the actual value, and the MAE avoids the mutual cancellation of positive and negative errors and better reflects the actual situation of the predicted value error.
MAE = (1/m) * Σ_{i=1}^{m} | w^T x(i) - y(i) | (4)
As shown in formula (5), the goodness of fit (R-squared coefficient) is used to evaluate how well the model fits the data and how much of the variation of the target variable it captures; it indicates the portion of the variation of the dependent variable that can be explained by the variation of the independent variables. The closer R^2 is to 1, the higher the degree to which the independent variables explain the dependent variable.
R^2 = 1 - Σ_{i=1}^{m} ( y(i) - w^T x(i) )^2 / Σ_{i=1}^{m} ( y(i) - y_mean )^2 (5)
wherein R^2 is the goodness of fit and y_mean is the mean of the actual execution times.
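The three metrics in formulas (3)-(5) can be computed, for example, with Spark MLlib's regression evaluator; in this sketch the `predictions` DataFrame and its column names are illustrative assumptions.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator

val evaluator = new RegressionEvaluator()
  .setLabelCol("stage_time")        // actual execution time y(i)
  .setPredictionCol("prediction")   // predicted execution time w^T x(i)

val rmse = evaluator.setMetricName("rmse").evaluate(predictions) // formula (3)
val mae  = evaluator.setMetricName("mae").evaluate(predictions)  // formula (4)
val r2   = evaluator.setMetricName("r2").evaluate(predictions)   // formula (5)
```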
S104: and automatically and dynamically adjusting and executing the execution plan of the target job according to the predicted execution time of the target job, and acquiring the adjusted execution plan of the target job.
In the preferred embodiment of the invention, the distributed computing platform automatically and dynamically adjusts and executes the execution plan of the target job according to the predicted execution time of the target job, obtains the adjusted execution plan of the target job, and makes full use of the data obtained while the job is running, thereby dynamically selecting an optimal physical execution plan. For example, based on the Shuffle information collected while the job is running and on the prediction result of the time prediction model, a suitable Shuffle Partition value is automatically set for each job and the task parallelism of skewed data is adjusted; dynamically adjusting the job execution plan in this way can effectively reduce the complexity of the computing process of the distributed computing platform and improve the overall performance of the computing cluster.
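As one possible sketch of this adjustment (the threshold values, the 128 MB target partition size, and the input variables are assumptions for illustration, not the claimed tuning rule), a shuffle partition count could be derived from the predicted job time and the observed shuffle volume and then applied through the standard spark.sql.shuffle.partitions setting:

```scala
// Derive a shuffle partition count from predicted job time and shuffle volume.
def choosePartitions(predictedJobTimeMs: Long, shuffleBytes: Long): Int = {
  val targetBytesPerPartition = 128L * 1024 * 1024            // aim for ~128 MB per partition
  val byVolume = math.max(1, (shuffleBytes / targetBytesPerPartition).toInt)
  // For jobs predicted to run long, increase parallelism to mitigate skewed partitions.
  if (predictedJobTimeMs > 10L * 60 * 1000) byVolume * 2 else byVolume
}

// Applying it before submitting the job (`spark` is an assumed SparkSession):
// spark.conf.set("spark.sql.shuffle.partitions", choosePartitions(predMs, shuffleBytes).toString)
```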
Further, the extracting the job data set in the target job includes:
acquiring the job data set through any one of a job scheduling page, a REST interface and an external monitoring tool;
extracting the feature vector data by a listener bus mechanism and the metric data by an indicator system.
In a preferred embodiment of the present invention, the job data set may be obtained through any one of a job scheduling page, a REST interface, and an external monitoring tool, and the feature vector data is extracted through a listener bus mechanism and the metric index data is extracted through an index system, so as to complete extraction of the feature data of the target job and predict the execution time of the target job.
Further, each stage execution time of the target job is a time interval between a start time and a completion time of the stage, the start time being the time at which the earliest task of the stage starts executing, and the completion time being the time at which the last task of the stage finishes executing.
Specifically, the execution time calculation mode for each stage of the target job is shown in formula (1).
Further, the execution time of the target job is the sum of the execution times of each stage of the target job.
Specifically, the execution time calculation method of the target job is represented by formula (2).
Further, the obtaining of the execution time of each phase of the target job through the time prediction model includes:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
In this embodiment, the directed acyclic graph scheduler orders the target jobs; the job scheduler receives the job sets from the directed acyclic graph scheduler, creates a job set manager and adds it to the scheduling pool, sorts all job set managers in the scheduling pool, allocates resources according to the data locality principle, and runs the execution plan on each allocated node. The intermediate and final results of the execution plan are stored in a storage system; a job monitor monitors whether each task in the job succeeds or fails, the execution status of each task is reported to the directed acyclic graph scheduler through monitoring events, and a retry and fault tolerance mechanism exists for failed tasks.
Further, the automatically and dynamically adjusting and executing the execution plan of the target job includes:
and when the adjusted execution plan of the target job fails, adopting a fault tolerance mechanism or retrying the failed execution plan of the target job.
Specifically, when the execution plan of the target job is automatically and dynamically adjusted and executed, the execution either succeeds or fails; for a failed execution plan, the distributed computing platform adopts a fault tolerance mechanism or a retry, which makes the system performance of the distributed computing platform more complete.
In the preferred embodiment of the present invention, there are memory application and allocation problems in the task scheduling process. Tungsten is a memory allocation and release implementation that operates directly on system memory through a MemoryBlock data structure similar to the operating system's page cache. Off-heap memory is applied for and released precisely, the space occupied by serialized data is calculated exactly, and management difficulty and errors are reduced. The data in a memory block is located either in the virtual machine heap or in off-heap memory and mainly involves two attributes, obj and offset. The obj attribute of the memory block holds the address of the object in the virtual machine heap, the offset attribute holds the offset of the start address of the page cache relative to that object address, and the length attribute holds the size of the page cache. When Tungsten is in on-heap memory mode, data is stored as an object in the virtual machine heap, and the specific position of the data is located from the heap object using the offset; when it is in off-heap memory mode, data is located in off-heap memory by the offset attribute, and a fixed-length continuous memory block is obtained starting from the position given by obj and offset. If the requested memory block is larger than or equal to 1 MB and a memory block of the specified size exists in the memory buffer pools, the memory block is obtained from the memory cache pool; otherwise a memory block is created and allocated separately.
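The obj/offset/length addressing described above can be illustrated with the following simplified sketch; it is not Spark's actual Tungsten class, only an assumed toy version of the same idea.

```scala
// Simplified model of a memory block: on-heap blocks carry a heap object plus an
// offset into it, off-heap blocks carry a null obj and an absolute native address.
class SimpleMemoryBlock(val obj: AnyRef, val offset: Long, val length: Long) {
  def isOffHeap: Boolean = obj == null
}

// On-heap mode: data lives inside a byte array held in the JVM heap.
val onHeap  = new SimpleMemoryBlock(new Array[Byte](1 << 20), 16L, 1 << 20)
// Off-heap mode: obj is null and offset is an absolute off-heap address (assumed value).
val offHeap = new SimpleMemoryBlock(null, 0x7f0000000000L, 1 << 20)
```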
In the preferred embodiment of the invention, the configuration parameters of the time prediction model are optimized through a cross-validation algorithm, and the goodness of fit of the time prediction model is optimized through a random forest algorithm and a gradient boosting tree algorithm.
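A minimal sketch of such tuning with Spark MLlib, using cross validation over a random forest regressor (a GBTRegressor could be swapped in the same way); column names and grid values are assumptions for illustration.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val rf = new RandomForestRegressor()
  .setLabelCol("stage_time")
  .setFeaturesCol("features")

// Candidate configuration parameters of the time prediction model.
val grid = new ParamGridBuilder()
  .addGrid(rf.numTrees, Array(20, 50, 100))
  .addGrid(rf.maxDepth, Array(5, 10))
  .build()

// 5-fold cross validation selecting the configuration with the best goodness of fit (R^2).
val cv = new CrossValidator()
  .setEstimator(rf)
  .setEvaluator(new RegressionEvaluator().setLabelCol("stage_time").setMetricName("r2"))
  .setEstimatorParamMaps(grid)
  .setNumFolds(5)

// val tunedModel = cv.fit(trainingData)   // `trainingData` is an assumed feature DataFrame
```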
The invention can realize accurate prediction of the execution time of the complex operator in the distributed computing platform, can dynamically adjust the operation execution plan, effectively reduces the complexity of the computing process and improves the overall performance of the computing cluster.
Fig. 2 is a schematic structural diagram of an apparatus for predicting platform job execution time according to an embodiment of the present disclosure, including:
the data extraction module 201 extracts a job data set from a target job, where the job data set includes: feature vector data and measurement index data;
in a preferred embodiment of the present invention, the data extraction module 201 extracts a job data set from a target job, where the job data set includes feature vector data and metric index data, and in this embodiment, the job data set is obtained through a REST interface, the feature vector data is extracted through a listener bus mechanism, the metric index data is extracted through an index system, and an asynchronous thread is adopted to submit an extracted event of the job data set to a corresponding event listener.
The time prediction module 202 is used for acquiring the execution time of each stage of the target job through a time prediction model;
In this embodiment, the scheduler identifies the dependency relationships of the resilient distributed datasets in the directed acyclic graph, the application program compiles the target job into a job execution plan, and the main basis for dividing scheduling stages is whether the input of the current computing factor has been determined.
In this embodiment, the directed acyclic graph scheduler orders the target jobs; the job scheduler receives the job sets from the directed acyclic graph scheduler, creates a job set manager and adds it to the scheduling pool, sorts all job set managers in the scheduling pool, allocates resources according to the data locality principle, and runs the execution plan on each allocated node. The intermediate and final results of the execution plan are stored in a storage system; a job monitor monitors whether each task in the job succeeds or fails, the execution status of each task is reported to the directed acyclic graph scheduler through monitoring events, and a retry and fault tolerance mechanism exists for failed tasks. The data locality principle is as follows: if the job execution plan is in the scheduling stage at the start of the job, the data locality of the preferred running position of the corresponding resilient distributed dataset partition is Node Local; if the task is in a scheduling stage that is not at the start of the job, the preferred position is obtained from the running position of the parent scheduling stage; and if the Executor is active, the data locality is Process Local.
The data processing module 203 calculates the execution time of the target job according to the execution time of each stage of the target job;
In this embodiment, the data processing module 203 extracts feature vectors by parsing the job execution plan, the time prediction model predicts the execution time of each stage in the execution plan, and the execution time of the job is estimated from the per-stage execution times. Specifically, the execution time of a stage is the interval between the start time of the earliest task started in that stage and the completion time of the last executed task, as shown in formula (1); formula (2) gives how to predict the execution time of the whole job, which has N stages, from the estimated execution time of each stage.
And the job execution module 204 is used for automatically and dynamically adjusting and executing the execution plan of the target job according to the predicted execution time of the target job and acquiring the adjusted execution plan of the target job.
In the preferred embodiment of the present invention, the job execution module 204 automatically and dynamically adjusts and executes the execution plan of the target job according to the predicted execution time of the target job, obtains the adjusted execution plan of the target job, and makes full use of the data obtained while the job is running, thereby dynamically selecting an optimal physical execution plan. For example, based on the Shuffle information collected while the job is running and on the prediction result of the time prediction model, a suitable Shuffle Partition value is automatically set for each job and the task parallelism of skewed data is adjusted; dynamically adjusting the job execution plan in this way can effectively reduce the complexity of the computing process of the distributed computing platform and improve the overall performance of the computing cluster.
Further, the extracting the job data set in the target job includes:
acquiring the job data set through any one of a job scheduling page, a REST interface and an external monitoring tool;
extracting the feature vector data by a listener bus mechanism and the metric data by an indicator system.
Further, each stage execution time of the target job is a time interval between a start time and a completion time of the stage, the start time being the time at which the earliest task of the stage starts executing, and the completion time being the time at which the last task of the stage finishes executing.
Further, the execution time of the target job is the sum of the execution times of each stage of the target job.
Further, the obtaining of the execution time of each phase of the target job through the time prediction model includes:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
Further, the automatically and dynamically adjusting and executing the execution plan of the target job includes:
and when the adjusted execution plan of the target job fails, adopting a fault tolerance mechanism or retrying the failed execution plan of the target job.
The invention can realize accurate prediction of the execution time of the complex operator in the distributed computing platform, can dynamically adjust the operation execution plan, effectively reduces the complexity of the computing process and improves the overall performance of the computing cluster.
Based on the same inventive concept, the embodiment of the specification further provides the electronic equipment.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification. An electronic device 300 according to this embodiment of the invention is described below with reference to fig. 3. The electronic device 300 shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 3, electronic device 300 is embodied in the form of a general purpose computing device. The components of electronic device 300 may include, but are not limited to: at least one processing unit 310, at least one memory unit 320, a bus 330 connecting different device components (including the memory unit 320 and the processing unit 310), a display unit 340, and the like.
Wherein the storage unit stores program code executable by the processing unit 310 to cause the processing unit 310 to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned processing method section of the present specification. For example, the processing unit 310 may perform the steps as shown in fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)3201 and/or a cache storage unit 3202, and may further include a read only memory unit (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 300 may also communicate with one or more external devices 400 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 300, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 300 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. Also, the electronic device 300 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 360. Network adapter 360 may communicate with other modules of electronic device 300 via bus 330. It should be appreciated that although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID devices, tape drives, and data backup storage devices, to name a few.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, or a network device, etc.) execute the above-mentioned method according to the present invention. The computer program, when executed by a data processing apparatus, enables the computer readable medium to implement the above-described method of the invention, namely: such as the method shown in fig. 1.
Fig. 4 is a schematic diagram of a computer-readable medium provided in an embodiment of the present disclosure.
A computer program implementing the method shown in fig. 1 may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not to be considered as limited to the specific embodiments thereof, but is to be understood as being modified in all respects, all changes and equivalents that come within the spirit and scope of the invention.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for predicting platform job execution time, comprising:
extracting a job data set in a target job, the job data set comprising: feature vector data and measurement index data;
acquiring the execution time of each stage of the target job through a time prediction model;
calculating the execution time of the target job according to the execution time of each stage of the target job;
and automatically and dynamically adjusting and executing the execution plan of the target job according to the predicted execution time of the target job, and acquiring the adjusted execution plan of the target job.
2. The method of claim 1, wherein extracting the job data set from the target job comprises:
acquiring the job data set through any one of a job scheduling page, a REST interface and an external monitoring tool;
extracting the feature vector data by a listener bus mechanism and the metric data by an indicator system.
3. The method for predicting the execution time of the platform job according to claim 1 or 2, wherein each stage execution time of the target job is a time interval between a start time and a completion time of the stage, the start time being the time at which the earliest task of the stage starts executing, and the completion time being the time at which the last task of the stage finishes executing.
4. The method for predicting platform job execution time according to any one of claims 1-3, wherein the execution time of the target job is the sum of the execution times of each stage of the target job.
5. The method for predicting the execution time of the platform operation according to any one of claims 1 to 4, wherein the obtaining of the execution time of each phase of the target operation through a time prediction model comprises:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
6. The method for predicting platform job execution time according to any one of claims 1-5, wherein the automatically and dynamically adjusting and executing the execution plan of the target job comprises:
and when the adjusted execution plan of the target job fails, adopting a fault tolerance mechanism or retrying the failed execution plan of the target job.
7. An apparatus for predicting a platform job execution time, comprising:
a data extraction module that extracts a job data set in a target job, the job data set comprising: feature vector data and measurement index data;
the time prediction module is used for acquiring the execution time of each stage of the target job through a time prediction model;
the data processing module is used for calculating the execution time of the target job according to the execution time of each stage of the target job;
and the job execution module automatically and dynamically adjusts and executes the execution plan of the target job according to the predicted execution time of the target job and acquires the adjusted execution plan of the target job.
8. The apparatus for predicting platform job execution time according to claim 7, wherein the obtaining of the target job execution time for each phase through the time prediction model comprises:
compiling the target job into a directed acyclic graph, wherein the directed acyclic graph runs in dependence on resilient distributed datasets;
ordering the target jobs by a directed acyclic graph scheduler;
and scheduling and executing the target job through a job scheduler to obtain the execution time of each stage of the target job.
9. An electronic device, wherein the electronic device comprises:
a processor and a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-6.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-6.
CN202011177481.5A 2020-10-29 2020-10-29 Method and device for predicting platform operation execution time and electronic equipment Pending CN112286990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011177481.5A CN112286990A (en) 2020-10-29 2020-10-29 Method and device for predicting platform operation execution time and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011177481.5A CN112286990A (en) 2020-10-29 2020-10-29 Method and device for predicting platform operation execution time and electronic equipment

Publications (1)

Publication Number Publication Date
CN112286990A true CN112286990A (en) 2021-01-29

Family

ID=74373744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011177481.5A Pending CN112286990A (en) 2020-10-29 2020-10-29 Method and device for predicting platform operation execution time and electronic equipment

Country Status (1)

Country Link
CN (1) CN112286990A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472889A (en) * 2019-08-22 2019-11-19 泰康保险集团股份有限公司 Resource allocation method, device for allocating resources, storage medium and electronic equipment
CN111126668A (en) * 2019-11-28 2020-05-08 中国人民解放军国防科技大学 Spark operation time prediction method and device based on graph convolution network
CN111815200A (en) * 2020-07-31 2020-10-23 深圳壹账通智能科技有限公司 Task scheduling method and device, computer equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661444A (en) * 2022-03-30 2022-06-24 阿里巴巴(中国)有限公司 Scheduling method, first computing node, second computing node and scheduling system

Similar Documents

Publication Publication Date Title
US9715408B2 (en) Data-aware workload scheduling and execution in heterogeneous environments
US9971570B2 (en) Automated generation of memory consumption aware code
KR101106595B1 (en) Method and apparatus for automated testing for software program
CN109614227B (en) Task resource allocation method and device, electronic equipment and computer readable medium
JP2018026114A (en) Application profiling job management system, program, and method
CN112287603A (en) Prediction model construction method and device based on machine learning and electronic equipment
US20210152424A1 (en) Selecting and using a cloud-based hardware accelerator
US10963297B2 (en) Computational resource management device, computational resource management method, and computer-readable recording medium
CN115373835A (en) Task resource adjusting method and device for Flink cluster and electronic equipment
Song et al. Start late or finish early: A distributed graph processing system with redundancy reduction
CN108647137B (en) Operation performance prediction method, device, medium, equipment and system
US10977098B2 (en) Automatically deploying hardware accelerators based on requests from users
CN113378007B (en) Data backtracking method and device, computer readable storage medium and electronic device
Nagavaram et al. A cloud-based dynamic workflow for mass spectrometry data analysis
CN113886111B (en) Workflow-based data analysis model calculation engine system and operation method
CN113297057A (en) Memory analysis method, device and system
CN112286990A (en) Method and device for predicting platform operation execution time and electronic equipment
JP2014123249A (en) Information processor, program, and information processing method
CN110689137B (en) Parameter determination method, system, medium, and electronic device
CN113141407B (en) Page resource loading method and device and electronic equipment
Hao et al. Torchbench: Benchmarking pytorch with high api surface coverage
CN113296907B (en) Task scheduling processing method, system and computer equipment based on clusters
Youssef et al. SparkLeBLAST: Scalable parallelization of blast sequence alignment using spark
Vysocký et al. Application instrumentation for performance analysis and tuning with focus on energy efficiency
Choudhury et al. Accelerating comparative genomics workflows in a distributed environment with optimized data partitioning and workflow fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210129