CN106533730B - Hadoop cluster component index acquisition method and device

Info

Publication number: CN106533730B
Application number: CN201510585652.0A
Authority: CN
Inventors: 陈建伟
Original assignee: Nanjing ZTE New Software Co Ltd
Current assignee: Nanjing ZTE New Software Co Ltd
Priority date: 2015-09-15
Filing date: 2015-09-15
Publication date: 2020-07-31
Anticipated expiration: 2035-09-15
Also published as: CN106533730A; WO2016165471A1

Abstract

The invention provides a method and a device for acquiring indexes of a Hadoop cluster component, wherein the method comprises the following steps: obtaining index values of a first node in the Hadoop cluster component at a plurality of times before a specified time; obtaining a first index predicted value of the first node at the specified time from the index values at the plurality of times according to a first predetermined rule; acquiring a plurality of load parameters of the first node and a deviation value between the first index predicted value and the actual acquisition index value at the specified time; performing a weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and obtaining an acquisition time for acquiring the Hadoop cluster component index from the result of the weighted calculation according to a second predetermined rule; and when the acquisition time is reached, acquiring the Hadoop cluster component index and reporting it to the management system.

Description

Hadoop cluster component index acquisition method and device

Technical Field

The invention relates to the field of communication, in particular to a method and a device for acquiring indexes of a Hadoop cluster component.

Background

A big data management system based on the Hadoop ecosystem is one of the key systems of a big data platform. It manages resources, such as cluster creation, the HBase distributed database, the HDFS file system, MR program resources, node resources, installation resources and user states. It manages relationships, such as the relationship between a user and a distributed file directory, between MR program resources and node resources, and between installation resources and node resources. It also manages behaviors and monitors component resources and performance indexes, such as remote installation of HDFS DataNode nodes, starting and stopping services, running Yarn jobs and monitoring job states.

Monitoring the running state, job state and the like of a Hadoop cluster in the big data management system is an important guarantee of the system's normal operation. Monitoring the components in the cluster mainly relies on performance index acquisition technology; at present, network management systems commonly acquire performance indexes in two ways: passive acquisition and active polling acquisition.

The active polling acquisition algorithms in the related art mainly include the following. The equal-interval periodic polling algorithm is simple to operate and easy to implement, but the polling interval is difficult to determine: if it is too long, real-time performance cannot be guaranteed; if it is too short, the load on the node hosts in the cluster and the pressure of storing and summarizing data increase, and collected data may even be lost. The algorithm based on historical polling round-trip delay determines the next polling interval mainly from the response times of past polling requests, a value related to the state of the network bandwidth; although it can adjust the polling time simply and dynamically, time-varying network state information strongly affects network performance and fault management, and the algorithm cannot dynamically reflect the details of data changes. With the advent of technologies such as artificial intelligence and neural networks, polling algorithms have also improved, and polling strategies based on a single prediction algorithm, such as unary linear regression, have appeared. These strategies compare the deviation between the value predicted from collected historical data and the actual value against a threshold and adjust the polling time dynamically, so they can describe data details and achieve simple intelligent adjustment. However, a single prediction algorithm is limited in its application scenarios and accuracy. For example, unary linear regression performs poorly on strongly fluctuating data, and the index values of compute-type components such as Yarn fluctuate widely, which leads to large calculation errors.

In addition, the index collection strategies in the related art ignore the load condition of the nodes in the cluster and the characteristics of the Hadoop components. When a node is overloaded, the component index collection task may even fail to complete, which affects the normal operation of the whole big data management system.

No effective solution currently exists for the problem of poor performance caused by neglecting node loads in the cluster and Hadoop component characteristics in the index collection polling strategy of the related art.

Disclosure of Invention

The invention provides a method and a device for acquiring indexes of a Hadoop cluster component, which are used for at least solving the problem of poor performance caused by neglecting node loads in a cluster and Hadoop component characteristics in an index acquisition polling strategy in the related technology.

According to one aspect of the invention, a method for acquiring indexes of a Hadoop cluster component is provided, which comprises the following steps: obtaining index values of a first node in the Hadoop cluster component at a plurality of times before a specified time; obtaining a first index predicted value of the first node at the specified time from the index values at the plurality of times according to a first predetermined rule; acquiring a plurality of load parameters of the first node and a deviation value between the first index predicted value and the actual acquisition index value at the specified time; performing a weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and obtaining an acquisition time for acquiring the Hadoop cluster component index from the result of the weighted calculation according to a second predetermined rule; and when the acquisition time is reached, acquiring the Hadoop cluster component index and reporting it to a management system.

Further, obtaining the first index predicted value of the first node at the specified time from the index values at the plurality of times according to the first predetermined rule comprises: obtaining a preliminary index predicted value at the specified time from the index values at the plurality of times according to a gray model; obtaining a preliminary index prediction deviation value at the specified time from the preliminary index predicted value and the actual acquisition value at the specified time according to a Markov chain model; and calculating the first index predicted value from the preliminary index prediction deviation value and the preliminary index predicted value.

Further, obtaining the acquisition time for acquiring the Hadoop cluster component index according to the second predetermined rule comprises: comparing the result of the weighted calculation with a preset threshold; and obtaining the polling delay time at the specified time from the comparison result, and taking the sum of the polling delay time and a preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

Further, the load parameter of the first node comprises: CPU utilization rate, memory utilization rate, bandwidth utilization rate and transmission delay.

Further, the component types of the first node include: a compute type and a storage type.

According to another aspect of the present invention, there is provided a device for acquiring indexes of a Hadoop cluster component, including: a first obtaining module, used for obtaining index values of a first node in the Hadoop cluster component at a plurality of times before a specified time; a first processing module, used for obtaining a first index predicted value of the first node at the specified time from the index values at the plurality of times according to a first predetermined rule; a second obtaining module, used for obtaining a plurality of load parameters of the first node and a deviation value between the first index predicted value and the actual acquisition index value at the specified time; a second processing module, used for performing a weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and obtaining an acquisition time for acquiring the Hadoop cluster component index from the result of the weighted calculation according to a second predetermined rule; and an acquisition module, used for acquiring the Hadoop cluster component index and reporting it to a management system when the acquisition time is reached.

Further, the first processing module comprises: a first processing unit, used for obtaining a preliminary index predicted value at the specified time from the index values at the plurality of times according to a gray model; a second processing unit, used for obtaining a preliminary index prediction deviation value at the specified time from the preliminary index predicted value and the actual acquisition value at the specified time according to a Markov chain model; and a calculating unit, used for calculating the first index predicted value from the preliminary index prediction deviation value and the preliminary index predicted value.

Further, the second processing module comprises: the comparison unit is used for comparing the result obtained by the weighting calculation with a preset threshold value; and the third processing unit is used for obtaining the polling delay time of the specified time according to the comparison result, and taking the sum of the polling delay time and the preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

According to the invention, a first index predicted value of a first node at a specified time is obtained, according to a first predetermined rule, from the index values of the first node in the Hadoop cluster component at a plurality of times before the specified time; the deviation value between the first index predicted value and the actual acquisition index value at the specified time and a plurality of load parameters of the first node are weighted according to the component type of the first node; the acquisition time for acquiring the Hadoop cluster component index is obtained from the result of the weighted calculation according to a second predetermined rule; and when the acquisition time is reached, the Hadoop cluster component index is acquired and reported to a management system. The acquisition time of component indexes in the cluster is thus adjusted dynamically by combining the load of the node host with the type of the component to be collected, which solves the problem of poor performance caused by neglecting node loads in the cluster and Hadoop component characteristics in the index collection polling strategy of the related art.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a method for collecting Hadoop cluster component indicators according to an embodiment of the invention;

FIG. 2 is a block diagram of a Hadoop cluster component index acquisition device according to an embodiment of the present invention;

FIG. 3 is a first block diagram of an alternative structure of a Hadoop cluster component index acquisition device according to an embodiment of the present invention;

FIG. 4 is a second block diagram of an alternative structure of a Hadoop cluster component index acquisition device according to an embodiment of the present invention;

FIG. 5 is a block diagram of a Hadoop component index polling acquisition module according to an alternative embodiment of the invention;

FIG. 6 is a flow diagram of Hadoop component index polling delay time calculation according to an alternative embodiment of the invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

In this embodiment, a method for acquiring indexes of a Hadoop cluster component is provided. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:

step S102: obtaining index values of a first node in the Hadoop cluster component at a plurality of moments before a specified moment;

step S104: obtaining a first index predicted value of the first node at the specified time through index values at a plurality of times according to a first preset rule;

step S106: acquiring a plurality of load parameters of a first node and a deviation value between a first index predicted value and an actual acquisition index value at a specified time;

step S108: carrying out weighted calculation on the multiple load parameters and the deviation values according to the component type of the first node, and obtaining acquisition time for acquiring Hadoop cluster component indexes according to a second preset rule;

step S110: and when the acquisition time is up, acquiring Hadoop cluster component indexes and reporting the Hadoop cluster component indexes to the management system.

Through steps S102 to S110, a first index predicted value of a first node at a specified time is obtained, according to a first predetermined rule, from the index values of the first node in the Hadoop cluster component at a plurality of times before the specified time; the deviation value between the first index predicted value and the actual acquisition index value at the specified time and a plurality of load parameters of the first node are weighted according to the component type of the first node; the acquisition time for acquiring the Hadoop cluster component index is obtained from the result of the weighted calculation according to a second predetermined rule; and when the acquisition time is reached, the Hadoop cluster component index is acquired and reported to the management system. In this embodiment, the acquisition time of component indexes in the cluster is therefore adjusted dynamically by combining the load of the node host with the type of the component to be collected, which solves the problem of poor performance caused by neglecting node loads in the cluster and Hadoop component characteristics in the index collection polling strategy of the related art.

In an optional implementation of this embodiment, step S104, in which the first index predicted value of the first node at the specified time is obtained from the index values at the plurality of times according to the first predetermined rule, is implemented as follows:

step S11: obtaining an index preliminary predicted value of a specified time through index values of a plurality of times according to the gray model;

step S12: obtaining an index preliminary prediction deviation value at a specified moment through an index preliminary prediction value and an actual acquisition value at the specified moment according to the Markov chain model;

step S13: and calculating a first index predicted value according to the index preliminary prediction deviation value and the index preliminary predicted value.

As can be seen from steps S11 to S13, the first index predicted value at the specified time is obtained by combining the gray model with the Markov chain model. Together, the two models are applicable to more scenarios than either alone, so the resulting index predicted value is more accurate. The scheme of this embodiment therefore addresses the accuracy problem that arises across different scenarios when a single prediction algorithm is used in the index collection polling strategy of the related art.
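Step S11 can be illustrated with a minimal GM(1,1) sketch. The patent names the gray model but does not give its fitting procedure; the version below uses the textbook formulation (accumulated series, mean background values, least-squares fit of the development coefficient), and the function name `gm11_next` and the four-sample minimum are assumptions made for illustration.

```python
import math
from typing import Sequence


def gm11_next(values: Sequence[float]) -> float:
    """Predict the next value of a positive series with a GM(1,1) gray model."""
    n = len(values)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 samples")

    # Accumulated generating operation (AGO): running sums of the raw series.
    ago = []
    total = 0.0
    for v in values:
        total += v
        ago.append(total)

    # Background values: mean of consecutive accumulated sums.
    z = [0.5 * (ago[k] + ago[k - 1]) for k in range(1, n)]
    y = list(values[1:])

    # Ordinary least squares for y = p * z + q, where p = -a and q = b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate series: background values are all equal")
    p = (m * szy - sz * sy) / denom
    q = (sy - p * sz) / m
    a, b = -p, q

    if abs(a) < 1e-12:
        # No growth or decay detected: the model reduces to a constant.
        return b

    def ago_hat(k: int) -> float:
        # Time response of the fitted model; k = 0 is the first sample.
        return (values[0] - b / a) * math.exp(-a * k) + b / a

    # Undo the accumulation to get the prediction for the next raw sample.
    return ago_hat(n) - ago_hat(n - 1)
```

For a series that grows by about 10% per sample, such as `[100, 110, 121, 133.1]`, the sketch returns a value close to the next term of that trend, which is the kind of preliminary prediction that step S12 then corrects.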

Regarding the weighted calculation of the plurality of load parameters and the deviation value θ according to the component type of the first node in step S108: in an optional implementation of this embodiment, the load parameters of the first node may be CPU utilization, memory utilization, bandwidth utilization and transmission delay. The load parameters are not limited to these; they are the preferred load parameters of this embodiment. Further, the component types of the first node include a compute type and a storage type.

In an application scenario of this embodiment, the load condition of the first node's host in the big data cluster, such as CPU utilization α, memory utilization β, bandwidth utilization χ and transmission delay t, is obtained at the specified time. For a storage-type component, the weights of memory utilization and transmission delay need to be reduced according to the characteristics of that component type, and different weighting coefficients are assigned to compute the weighted sum $f = a \cdot \theta + b \cdot \alpha + c \cdot \beta + d \cdot \chi + k \cdot t$.
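The weighted sum can be sketched as follows. The patent does not disclose concrete coefficients, so every number in `WEIGHTS` is hypothetical. The only property taken from the text is that the storage-type row gives memory utilization and transmission delay smaller weights than the compute-type row. The sketch also assumes that all inputs, including the transmission delay, have been normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLoad:
    cpu: float        # alpha: CPU utilization, 0..1
    memory: float     # beta: memory utilization, 0..1
    bandwidth: float  # chi: bandwidth utilization, 0..1
    delay: float      # t: transmission delay, normalized to 0..1 (assumption)


# Hypothetical coefficients (a, b, c, d, k); not taken from the patent.
WEIGHTS = {
    "storage": (0.45, 0.20, 0.10, 0.20, 0.05),
    "compute": (0.40, 0.20, 0.20, 0.10, 0.10),
}


def weighted_score(theta: float, load: NodeLoad, component_type: str) -> float:
    """Return f = a*theta + b*alpha + c*beta + d*chi + k*t for one node."""
    a, b, c, d, k = WEIGHTS[component_type]
    return (a * theta + b * load.cpu + c * load.memory
            + d * load.bandwidth + k * load.delay)
```

With these illustrative coefficients, a node under heavy memory pressure produces a lower score for a storage-type component than for a compute-type one, which is the effect the reduced weights are meant to have.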

In an optional implementation of this embodiment, obtaining the acquisition time for acquiring the Hadoop cluster component index according to the second predetermined rule in step S108 may be implemented as follows:

step S21: comparing the result obtained by the weighted calculation with a preset threshold value;

step S22: obtaining the polling delay time at the specified time according to the comparison result, and taking the sum of the polling delay time and the preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

The polling delay time may be set empirically to between 2 s and 5 s.

In steps S21 and S22, presetting the threshold and the polling delay time makes the acquisition time calculation more comprehensive and avoids the problems caused by a fixed period; that is, it addresses the real-time and efficiency problems of fixed-period acquisition of performance data in the component index acquisition of the big data management system in the related art.
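Steps S21 and S22 amount to a threshold lookup followed by an addition. The sketch below assumes a caller-supplied table of ascending thresholds and one delay per band; the patent only states that the delay is chosen from the comparison result and may lie between 2 s and 5 s, so the table layout and the function names are illustrative.

```python
import bisect
from typing import Sequence


def polling_delay(score: float,
                  thresholds: Sequence[float],
                  delays: Sequence[float]) -> float:
    """Pick the delay of the band that the weighted score falls into.

    thresholds must be sorted ascending; delays needs one more entry than
    thresholds (one delay per band). Both tables are assumptions.
    """
    if len(delays) != len(thresholds) + 1:
        raise ValueError("need exactly one delay per threshold band")
    return delays[bisect.bisect_right(thresholds, score)]


def next_collection_time(now: float, polling_period: float, score: float,
                         thresholds: Sequence[float],
                         delays: Sequence[float]) -> float:
    """Acquisition time = now + preset polling period + polling delay."""
    return now + polling_period + polling_delay(score, thresholds, delays)
```

Which direction the mapping runs (a higher score giving a longer or a shorter delay) is left to the table the caller passes in, because the source does not specify it.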

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

This embodiment further provides a device for acquiring indexes of a Hadoop cluster component. The device is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a block diagram of the structure of a device for acquiring indexes of a Hadoop cluster component according to an embodiment of the present invention. As shown in FIG. 2, the device includes: a first obtaining module 202, configured to obtain index values of a first node in the Hadoop cluster component at a plurality of times before a specified time; a first processing module 204, coupled to the first obtaining module 202 and configured to obtain a first index predicted value of the first node at the specified time from the index values at the plurality of times according to a first predetermined rule; a second obtaining module 206, coupled to the first processing module 204 and configured to obtain a plurality of load parameters of the first node and a deviation value between the first index predicted value and the actual acquisition index value at the specified time; a second processing module 208, coupled to the second obtaining module 206 and configured to perform a weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and to obtain an acquisition time for acquiring the Hadoop cluster component index from the result of the weighted calculation according to a second predetermined rule; and an acquisition module 210, coupled to the second processing module 208 and configured to acquire the Hadoop cluster component index and report it to the management system when the acquisition time is reached.

FIG. 3 is a first block diagram of an alternative structure of the device for acquiring indexes of a Hadoop cluster component according to an embodiment of the present invention. As shown in FIG. 3, the first processing module 204 includes: a first processing unit 302, configured to obtain a preliminary index predicted value at the specified time from the index values at the plurality of times according to the gray model; a second processing unit 304, coupled to the first processing unit 302 and configured to obtain a preliminary index prediction deviation value at the specified time from the preliminary index predicted value and the actual acquisition value at the specified time according to the Markov chain model; and a calculating unit 306, coupled to the second processing unit 304 and configured to calculate the first index predicted value from the preliminary index prediction deviation value and the preliminary index predicted value.

FIG. 4 is a second block diagram of an alternative structure of the device for acquiring indexes of a Hadoop cluster component according to an embodiment of the present invention. As shown in FIG. 4, the second processing module 208 includes: a comparing unit 402, configured to compare the result of the weighted calculation with a preset threshold; and a third processing unit 404, coupled to the comparing unit 402 and configured to obtain the polling delay time at the specified time from the comparison result, and to take the sum of the polling delay time and a preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

Optionally, the load parameters of the first node include: CPU utilization, memory utilization, bandwidth utilization and transmission delay. The component types of the first node include: a compute type and a storage type.

The invention is illustrated below by means of alternative embodiments of the invention;

This optional embodiment provides a Hadoop cluster component index polling acquisition module. FIG. 5 is a block diagram of the structure of the Hadoop cluster component index polling acquisition module according to the optional embodiment of the present invention. As shown in FIG. 5, the module includes:

the master node summarizing module, which runs in the cluster master node Manager and is used for receiving the index data collected by polling on each node host in the cluster, and for summarizing and persisting that data;

the index acquisition module, which runs on each node of the cluster and is used for executing the tasks of acquiring and sending indexes;

the index prediction module, which runs on each node of the cluster and is used for predicting a new index value from the historical actual index values of the component by combining a gray model, which is suited to trend prediction, with a Markov chain model, which is suited to predicting large fluctuations;

the load acquisition module, which runs on each node of the cluster and is used for acquiring the current load condition of the node host;

the polling calculation module, which runs on each node of the cluster and is used for calculating the delay time required before the index acquisition task is executed.

When a node executes the component index collection task, the index prediction module first calculates the index predicted value while the load acquisition module collects the current node load. The polling calculation module then calculates, in a weighted manner, the delay time required for the current task from the latest historical index prediction deviation, the node load and the component type, and adds this delay time to the configured minimum polling collection period to obtain the time at which the index collection and sending task is executed. When that time is reached, the index acquisition module acquires the component indexes and sends them to the master node summarizing module, which completes one index acquisition; subsequent index acquisition tasks are carried out in the same way.
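The interaction between the modules in one collection cycle can be sketched as a single function that receives each module as a callable. All parameter names are hypothetical, and using the absolute relative error of the previous prediction as the deviation is an assumption; the patent only says that the latest historical prediction deviation is used.

```python
from typing import Callable, List, Tuple


def collection_cycle(history: List[float],
                     last_prediction: float,
                     predict: Callable[[List[float]], float],
                     read_load: Callable[[], float],
                     score: Callable[[float, float], float],
                     delay_for: Callable[[float], float],
                     collect: Callable[[], float],
                     send: Callable[[float], None],
                     min_period: float,
                     sleep: Callable[[float], None]) -> Tuple[float, float]:
    """Run one polling cycle; return (prediction for next cycle, wait used)."""
    # Deviation of the previous prediction against the latest actual value.
    actual = history[-1]
    deviation = abs(actual - last_prediction) / abs(actual) if actual else 0.0

    # Load acquisition module: current load of this node host.
    load = read_load()

    # Polling calculation module: weighted score -> delay -> total wait.
    wait = min_period + delay_for(score(deviation, load))
    sleep(wait)

    # Index acquisition module: collect and send to the master node.
    value = collect()
    send(value)
    history.append(value)

    # Index prediction module: prediction used by the next cycle.
    return predict(history), wait
```

Passing the modules in as callables keeps the sketch independent of any particular predictor, load probe or transport, so each can be replaced without touching the cycle logic.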

In addition, based on the Hadoop cluster component index polling acquisition module, this optional embodiment further provides a Hadoop cluster component index polling acquisition method, which includes the steps of:

step S1, initialization;

and setting a minimum polling period for component index acquisition according to a fixed period polling acquisition strategy, and starting an index acquisition task.

Step S2, predicting from the historical index values of the component;

First, from the historical index values of the component, preliminary predicted index values are obtained with the gray GM(1,1) model, and the deviation percentage between each preliminary predicted value and the corresponding actual collected value is calculated. Then, from these historical preliminary prediction deviations, a predicted deviation value is calculated with the Markov chain model, and the latest preliminary prediction result from the first part of this step is corrected with that predicted deviation to obtain the final index predicted value.
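The Markov chain correction can be sketched as below. The patent does not specify how deviations are divided into states or how the correction is applied, so both are assumptions here: deviations are binned into intervals given by `bounds`, the next deviation is the transition-weighted mean of the interval midpoints, and the correction inverts the relative deviation definition y = (x - x')/x to give x = x'/(1 - y).

```python
import bisect
from typing import List, Sequence


def state_of(value: float, bounds: Sequence[float]) -> int:
    """Index of the interval [bounds[i], bounds[i+1]) containing value."""
    i = bisect.bisect_right(bounds, value) - 1
    return min(max(i, 0), len(bounds) - 2)  # clamp out-of-range values


def markov_next_deviation(deviations: Sequence[float],
                          bounds: Sequence[float]) -> float:
    """Predict the next deviation from one-step state transition counts."""
    n = len(bounds) - 1
    states: List[int] = [state_of(d, bounds) for d in deviations]
    counts = [[0] * n for _ in range(n)]
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1

    mids = [(bounds[i] + bounds[i + 1]) / 2 for i in range(n)]
    row = counts[states[-1]]
    total = sum(row)
    if total == 0:
        # The current state was never left before: stay at its midpoint.
        return mids[states[-1]]
    return sum(c / total * m for c, m in zip(row, mids))


def corrected_prediction(preliminary: float, deviation: float) -> float:
    """Invert y = (x - x') / x to recover x from the gray-model value x'."""
    if deviation == 1.0:
        raise ValueError("a deviation of 1 cannot be inverted")
    return preliminary / (1.0 - deviation)
```

For example, if the gray model has alternated between under- and over-estimating by 5% and the last deviation was an over-estimate, the chain predicts a 5% under-estimate next, and a preliminary value of 95 is corrected to about 100.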

Step S3, combining predicted value deviation, component type characteristics and load weighting calculation;

calculating the deviation percentage of the final predicted value and the actual acquired value, acquiring the current load condition (including CPU utilization rate, memory utilization rate, bandwidth utilization rate and transmission delay) of the node host in the big data cluster, giving different weighting coefficients according to the calculation type and storage type component types, and performing weighting calculation.

Step S4, comparing the threshold values to obtain the current polling time;

and comparing the weighted calculation result with a set threshold value, obtaining the polling delay time corresponding to the current index acquisition according to the comparison result, and adding a set minimum fixed polling period to obtain the current index acquisition task execution time.

Step S5, collecting the performance index value of the component;

and after the time for executing the acquisition index task is reached, acquiring the performance index value of the component, and sending and summarizing the performance index value to the big data management system.

According to the Hadoop cluster component index polling acquisition method and device in the optional embodiment, the node host load in the cluster in the big data management system and the type of the Hadoop component to be acquired are combined, the component index acquisition time in the Hadoop cluster is dynamically adjusted, the calculation algorithm is more comprehensive, and various problems caused by a fixed period are reasonably avoided.

An optional embodiment of the present invention is described below with reference to FIG. 6, using the collection of the HDFS file system used-capacity index as a specific example. FIG. 6 is a flowchart of calculating the index polling delay time of a Hadoop component according to the optional embodiment of the present invention. As shown in FIG. 6, the method includes the following steps:

step S601: acquiring historical index data of the component;

step S602: obtaining the predicted index value with the gray model;

step S603: correcting the predicted value with the Markov chain model; then, step S606 is executed;

step S604: acquiring a component type; then, step S606 is executed;

step S605: acquiring the node load condition; then, step S606 is executed;

step S606: weighted addition and threshold comparison;

step S607: obtaining the polling delay time for index acquisition.

Steps S601 to S607 in fig. 6 will be described in detail below in conjunction with the following steps.

The method comprises the following steps:

step S41: starting the collection task for the HDFS file system used-capacity index (hereinafter referred to as the HDFS capacity index), and setting a minimum fixed polling cycle time T1.

Step S42: according to HDFS use capacity history index value X_n-1＝{x₁,,x₂,...,x_n-1And (4) calculating to obtain a preliminary predicted value X of the use capacity at the moment n by using a classical gray model GM (1, 1)_n'；

Step S43: calculating the deviation Y between the primary predicted value and the actual acquired value of the HDFS through the gray model at each acquisition time of the history of the use capacity of the HDFS_n-1＝X_n-1-X_n-1'/X_n-1；

Step S44: according to the historical use capacity predicted value and the actual collection deviation value Y_n-1Calculating the deviation predicted value y at the time n by using a Markov chain model_n；

Step S45: according to the primary predicted value X of the HDFS use capacity_n' prediction deviation from time n predicted value y_nCalculating the final capacity predicted value x at n time_n；

Step S46: calculating the deviation z between the last time, namely n-1 time, the use capacity value predicted by the gray model and the Markov chain model and actually acquired_n-1＝x_n-1-x_n-1’/x_n-1；

Step S47: acquiring the load condition (CPU utilization α, memory utilization β, bandwidth utilization χ, and transmission delay t) of the node host in the big data cluster at time n; since, according to the characteristics of an HDFS storage-type component, the weights of the memory utilization and the transmission delay need to be reduced, performing the weighted calculation f = a·z_{n-1} + b·α + c·β + d·t + k;

Step S48: comparing the calculated f value with each preset threshold value to obtain polling delay time T at the moment n;

Step S49: after a time of T1 + T has elapsed, acquiring the HDFS used-capacity index value and reporting it to the big data management system, thereby completing the big data index acquisition.
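Steps S43 to S45 do not spell out how the Markov chain model is built or how the final value is derived from the preliminary prediction. The sketch below shows one common way to do this, under stated assumptions: the historical relative deviations are partitioned into equal-width states, a transition count matrix is estimated from consecutive states, and the predicted deviation is the probability-weighted midpoint of the next state. The correction formula assumes the deviation is defined as (actual - predicted) / actual, so that actual = predicted / (1 - deviation). The function names, the number of states, and the fallback rule are hypothetical.

```python
def markov_predict_deviation(deviations, num_states=3):
    """Predict the next relative deviation from a history of deviations.

    Assumption: equal-width state partition between the minimum and maximum
    observed deviation; the invention does not specify the partition.
    """
    if len(deviations) < 2:
        raise ValueError("need at least 2 historical deviations")
    lo, hi = min(deviations), max(deviations)
    if hi == lo:
        return lo  # all deviations identical, nothing to model
    width = (hi - lo) / num_states

    def state_of(d):
        # Clamp so that the maximum value falls into the last state.
        return min(int((d - lo) / width), num_states - 1)

    states = [state_of(d) for d in deviations]

    # Count observed one-step transitions between states.
    counts = [[0] * num_states for _ in range(num_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1

    midpoints = [lo + (i + 0.5) * width for i in range(num_states)]
    row = counts[states[-1]]
    total = sum(row)
    if total == 0:
        # Hypothetical fallback: no transition seen from the current state.
        return midpoints[states[-1]]
    return sum(c / total * mid for c, mid in zip(row, midpoints))


def corrected_prediction(preliminary, predicted_deviation):
    """Combine X_n' and y_n, assuming Y = (X - X') / X, i.e. X = X' / (1 - Y)."""
    if predicted_deviation >= 1.0:
        raise ValueError("relative deviation must be below 1")
    return preliminary / (1.0 - predicted_deviation)


if __name__ == "__main__":
    history_dev = [0.0, 0.3, 0.0, 0.3, 0.0, 0.3]  # made-up deviations
    y_n = markov_predict_deviation(history_dev)
    print(y_n, corrected_prediction(146.0, y_n))
```

Using the probability-weighted midpoint rather than the single most likely state keeps the predicted deviation smooth when several transitions are almost equally likely; either choice is compatible with the description above.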

It should be noted that the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are respectively located in a plurality of processors.

The embodiment of the invention also provides a storage medium. Optionally, in the present embodiment, the storage medium may be configured to store program code for performing the following steps:

step S1: acquiring index values of a first node in a cluster component at a plurality of moments before a specified moment;

step S2: obtaining a first index predicted value of the first node at the specified moment from the index values at the plurality of moments according to a first preset rule;

step S3: acquiring a plurality of load parameters of a first node and a deviation value between a first index predicted value and an actual acquisition index value at a specified time;

step S4: carrying out weighted calculation on the plurality of load parameters and the deviation values according to the component type of the first node, and obtaining acquisition time for acquiring the cluster component indexes according to a second preset rule;

step S5: and when the acquisition time is up, acquiring the cluster component indexes and reporting the cluster component indexes to the management system.
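Steps S4 and S5 can be illustrated with the weighted formula f = a·z_{n-1} + b·α + c·β + d·t + k from step S47 and the rule that the acquisition time is the preset polling cycle plus the polling delay. The sketch below is only an illustration: the weight values per component type, the threshold list, the delay values, the assumption that a larger f maps to a longer delay, and the assumption that all inputs are normalized to the range 0 to 1 are hypothetical and not specified by the invention.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    a: float  # weight of the previous deviation z_{n-1}
    b: float  # weight of CPU utilization (alpha)
    c: float  # weight of memory utilization (beta)
    d: float  # weight of transmission delay (t)
    k: float  # constant offset


# Hypothetical weights: the storage type lowers the memory and delay weights.
WEIGHTS_BY_TYPE = {
    "storage": Weights(a=0.4, b=0.3, c=0.1, d=0.1, k=0.0),
    "compute": Weights(a=0.4, b=0.2, c=0.2, d=0.2, k=0.0),
}

# Hypothetical thresholds on f and the polling delay (seconds) for each band.
THRESHOLDS = [0.2, 0.5, 0.8]
DELAYS_SECONDS = [0.0, 5.0, 15.0, 30.0]


def weighted_score(component_type, z_prev, cpu, mem, delay):
    """Compute f = a*z_{n-1} + b*alpha + c*beta + d*t + k for one node."""
    w = WEIGHTS_BY_TYPE[component_type]
    return w.a * z_prev + w.b * cpu + w.c * mem + w.d * delay + w.k


def polling_delay(f):
    """Map f to a polling delay T by comparing it with the preset thresholds."""
    return DELAYS_SECONDS[bisect.bisect_right(THRESHOLDS, f)]


def acquisition_wait(t1, f):
    """Total wait before the next collection: preset cycle T1 plus delay T."""
    return t1 + polling_delay(f)


if __name__ == "__main__":
    f = weighted_score("storage", z_prev=0.1, cpu=0.5, mem=0.5, delay=0.2)
    print(f, acquisition_wait(60.0, f))
```

Keeping the weights and thresholds in plain tables makes it straightforward to add further component types or to tune the bands without touching the calculation itself.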

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for acquiring indexes of a Hadoop cluster component is characterized by comprising the following steps:

obtaining index values of a first node in the Hadoop cluster component at a plurality of moments before a specified moment;

obtaining a first index predicted value of the first node at the specified moment from the index values at the plurality of moments according to a first preset rule;

acquiring a plurality of load parameters of the first node and a deviation value between the first index predicted value and an actual acquisition index value at the specified time;

carrying out weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and obtaining acquisition time for acquiring indexes of the Hadoop cluster component according to a second preset rule through a result of weighted calculation;

and when the acquisition time is up, acquiring the Hadoop cluster component index and reporting the Hadoop cluster component index to a management system.

2. The method according to claim 1, wherein obtaining the first index prediction value of the first node at the specified time by the index values at the plurality of times according to a first predetermined rule comprises:

obtaining an index preliminary predicted value of the specified time through the index values of the multiple times according to a gray model;

obtaining an index preliminary prediction deviation value at the specified time through the index preliminary predicted value and the actual acquisition value at the specified time according to a Markov chain model;

and calculating the first index predicted value according to the index preliminary prediction deviation value and the index preliminary predicted value.

3. The method of claim 1, wherein obtaining the acquisition time of the Hadoop cluster component indicator according to a second predetermined rule comprises:

comparing the result obtained by the weighted calculation with a preset threshold value;

and obtaining the polling delay time of the specified time according to the comparison result, and taking the sum of the polling delay time and the preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

4. The method of claim 1, wherein the load parameter of the first node comprises: CPU utilization rate, memory utilization rate, bandwidth utilization rate and transmission delay.

5. The method of claim 1, wherein the component type of the first node comprises: a computing type and a storage type.

6. A device for acquiring indexes of a Hadoop cluster component, characterized by comprising:

the first acquisition module is used for acquiring index values of a first node in the Hadoop cluster component at a plurality of moments before a specified moment;

the first processing module is used for obtaining a first index predicted value of the first node at the specified moment from the index values at the plurality of moments according to a first preset rule;

the second obtaining module is used for obtaining a plurality of load parameters of the first node and a deviation value between the first index predicted value and the actual acquisition index value at the specified time;

the second processing module is used for performing weighted calculation on the plurality of load parameters and the deviation value according to the component type of the first node, and obtaining acquisition time for acquiring indexes of the Hadoop cluster component according to a second preset rule through a result of the weighted calculation;

and the acquisition module is used for acquiring the Hadoop cluster component index and reporting the Hadoop cluster component index to a management system when the acquisition time is up.

7. The apparatus of claim 6, wherein the first processing module comprises:

the first processing unit is used for obtaining an index preliminary predicted value of the specified time through the index values of the multiple times according to a gray model;

the second processing unit is used for obtaining an index preliminary prediction deviation value at the specified time through the index preliminary predicted value and the actual acquisition value at the specified time according to a Markov chain model;

and the calculating unit is used for calculating the first index predicted value according to the index preliminary prediction deviation value and the index preliminary predicted value.

8. The apparatus of claim 6, wherein the second processing module comprises:

the comparison unit is used for comparing the result obtained by the weighting calculation with a preset threshold value;

and the third processing unit is used for obtaining the polling delay time of the specified time according to the comparison result, and taking the sum of the polling delay time and the preset polling cycle time as the acquisition time for acquiring the index value of the Hadoop cluster component.

9. The apparatus of claim 6, wherein the load parameter of the first node comprises: CPU utilization rate, memory utilization rate, bandwidth utilization rate and transmission delay.

10. The apparatus of claim 6, wherein the component type of the first node comprises: a computing type and a storage type.