CN110795301A - Job monitoring method, device, terminal and computer storage medium

Info

Publication number: CN110795301A
Application number: CN201810861581.6A
Authority: CN
Inventors: 翁泽梁; 伍应标; 王能
Original assignee: Mashang Consumer Finance Co Ltd
Current assignee: Mashang Consumer Finance Co Ltd
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2020-02-14

Abstract

The application discloses a job monitoring method, a job monitoring device, a terminal, and a computer storage medium. The job monitoring method comprises: judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources; and if such a job exists, sending the job information of the job to a monitoring object. In this way, a job that may be abnormal enters an early-warning state in advance, which effectively reduces the probability of major accidents. In addition, jobs are monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

Description

Job monitoring method, device, terminal and computer storage medium

Technical Field

The present application relates to the field of big data monitoring, and in particular, to a method, an apparatus, a terminal, and a computer storage medium for job monitoring.

Background

With the continuous development of big data and related technologies, traditional data analysis has been transformed, making the analysis of large-scale data possible. As data volumes proliferate, the hardware resources required for data integration and computation face unprecedented challenges.

At present, there are two main ways to address insufficient hardware resources. The first is to add hardware, but this is costly, and because the approval process in a typical enterprise is long, the shortage cannot be resolved in time. The second is to manually check the jobs being executed for abnormalities, which not only has a high labor cost but also detects an abnormality only after it has occurred, so serious accidents may result and other jobs may be delayed.

Disclosure of Invention

The main technical problem solved by the application is to provide a job monitoring method, a job monitoring device, a terminal, and a computer storage medium that can give early warning of job abnormalities without adding hardware resources.

To solve the above technical problem, the first technical solution adopted by the present application is to provide a job monitoring method comprising: judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources;

and if there is a job whose occupied memory resources exceed the first threshold, sending the job information of the job to a monitoring object.

Wherein the step of sending the job information of the job to the monitoring object if there is a job whose occupied memory resources exceed the first threshold includes:

if there is a job whose occupied memory resources exceed the first threshold, judging whether the duration for which the job exceeds the first threshold exceeds a threshold time;

and if the duration exceeds the threshold time, sending the job information of the job to the monitoring object.

Wherein the step of sending the job information of the job to the monitoring object includes:

judging whether the format of the job information is the same as a preset format;

if the format of the job information differs from the preset format, converting the job information into the preset format;

and sending the converted job information to the monitoring object.

The preset format comprises a table format or a text format.

The job information includes the name of the job, the amount of occupied memory resources, and the current execution state.

The first threshold is the average value of the memory resources occupied by historical jobs.

The step of judging whether there is currently a job whose occupied memory resources exceed the first threshold specifically comprises:

judging whether there is currently a job whose number of occupied maps is greater than the first threshold;

and the step of sending the job information of the job to the monitoring object if there is a job whose occupied memory resources exceed the first threshold comprises:

if the number of occupied maps is greater than the first threshold, sending the job information of the job to the monitoring object.

To solve the above technical problem, the second technical solution adopted by the present application is to provide a job monitoring device comprising a judging module and a sending module,

the judging module is used for judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources;

the sending module is used for sending the job information of the job to a monitoring object when there is a job whose occupied memory resources exceed the first threshold.

The judging module is specifically used for judging, when there is a job whose occupied memory resources exceed the first threshold, whether the duration for which the job exceeds the first threshold exceeds a threshold time; the sending module is specifically used for sending the job information of the job to the monitoring object when the duration exceeds the threshold time.

The judging module is also used for judging whether the format of the job information is the same as a preset format and, if the format differs from the preset format, converting the job information into the preset format; the sending module is further used for sending the converted job information to the monitoring object.

The preset format comprises a table format or a text format.

The judging module is specifically used for judging whether there is currently a job whose number of occupied maps is greater than the first threshold; the sending module is used for sending the job information of the job to the monitoring object when the number of occupied maps is greater than the first threshold.

To solve the above technical problem, the third technical solution adopted by the present application is to provide a job monitoring terminal comprising a processor and a communication circuit coupled to each other, wherein the processor, in operation, cooperates with the communication circuit to implement any one of the job monitoring methods described above.

In order to solve the above technical problem, a fourth technical solution adopted by the present application is: there is provided a computer storage medium having stored thereon program data which, when executed by a processor, implements the job monitoring method of any one of the above.

Compared with the prior art, the beneficial effects of the present application are as follows. The method judges whether there is currently a job whose occupied memory resources exceed a first threshold, the first threshold being smaller than the maximum threshold of the total amount of memory resources, and when such a job is detected, sends the job information of the job to a monitoring object. A job that may be abnormal thus enters an early-warning state in advance, rather than being discovered only after it has failed or has disrupted other jobs and caused a major accident, which effectively reduces the probability of major accidents. In addition, jobs are monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a job monitoring method of the present application;

FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a job monitoring method of the present application;

FIG. 3 is a detailed flowchart of an embodiment of the step of sending job information of a job to a monitoring object in the present application;

FIG. 4 is a schematic structural diagram of an embodiment of the job monitoring device of the present application;

FIG. 5 is a schematic structural diagram of an embodiment of a job monitoring terminal according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flow chart of an embodiment of the job monitoring method of the present application. The job monitoring method of this embodiment comprises the following steps:

Step 101: judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources.

When a terminal or a server executes a job, in order to ensure that a plurality of jobs can be normally executed, memory resources occupied by each job are monitored.

The terminal comprises a computer, an intelligent terminal, a PC and other equipment.

In one embodiment, the terminal or server monitors the resources of the jobs executing in the big data platform.

Specifically, the terminal or the server acquires the job information of the jobs being executed, for example by calling a job information collection program that collects, in real time, the job information of currently executing jobs from YARN in the big data platform Cloudera Manager. The job information includes the name of the job, the amount of occupied memory resources, and the current execution state. For example, on the Hadoop software platform, the amount of occupied memory resources can be represented by the number of maps.
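The job information described above could be modeled as a small record. The following is a minimal sketch only: the `JobInfo` class, the `parse_job` helper, and the dictionary keys (`name`, `maps`, `state`) are hypothetical names chosen for illustration and are not the actual response format of YARN or Cloudera Manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobInfo:
    """Job information as listed in the description (hypothetical layout)."""
    name: str   # name of the job
    maps: int   # number of maps, used here as a proxy for occupied memory
    state: str  # current execution state, e.g. "RUNNING"


def parse_job(record: dict) -> JobInfo:
    """Build a JobInfo from one raw record; the keys are assumed for illustration."""
    return JobInfo(
        name=str(record["name"]),
        maps=int(record["maps"]),
        state=str(record["state"]),
    )


# A hand-written record standing in for data collected from the platform.
job = parse_job({"name": "daily_report", "maps": 35, "state": "RUNNING"})
```

Keeping the record immutable makes it safe to pass the same job information to both the threshold check and the notification step.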

To give early warning before a job runs into an accident, in this embodiment, after the job information of the currently executing jobs is acquired, it is judged whether there is a job whose occupied memory resources exceed a first threshold.

The first threshold is smaller than the maximum threshold of the total amount of memory resources. The maximum threshold is the upper limit, set by the terminal or the server, on the memory resources that a single job may occupy; exceeding it may prevent other jobs from running normally or delay them. For example, if the total amount of memory resources is 10000M, the maximum threshold may be 7000M or 8000M.

The first threshold can be set according to practical experience. In a preferred embodiment, the first threshold is the average value of the memory resources occupied by historical jobs, such as 200M. When many jobs are currently being executed, the first threshold may instead be the average value of the memory resources occupied by the current jobs, which is not limited herein.
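One way to derive the first threshold from historical jobs is sketched below. The function name `first_threshold`, its parameters, and the sample values are assumptions made for illustration; the check against the maximum threshold simply mirrors the requirement that the first threshold be smaller than the maximum threshold.

```python
from statistics import mean


def first_threshold(history_mb: list[float], max_threshold_mb: float) -> float:
    """Return the mean of historical per-job memory usage as the first threshold.

    Raises ValueError if the result is not below the maximum threshold,
    since the method requires first threshold < maximum threshold.
    """
    if not history_mb:
        raise ValueError("history is empty")
    threshold = mean(history_mb)
    if threshold >= max_threshold_mb:
        raise ValueError("first threshold must be below the maximum threshold")
    return threshold


# Illustrative numbers only: three historical jobs averaging 200 MB.
threshold = first_threshold([150.0, 200.0, 250.0], max_threshold_mb=7000.0)
```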

On the Hadoop software platform, when the amount of occupied memory resources is represented by the number of maps, the terminal or the server judges whether any currently executing job occupies a number of maps exceeding the first threshold, such as 20 maps or 30 maps.
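The judgment in step 101 can be sketched as a simple filter over a snapshot of running jobs. The function `jobs_over_threshold` and the job names in the snapshot are hypothetical; only the comparison "number of maps greater than the first threshold" comes from the description.

```python
def jobs_over_threshold(map_counts: dict[str, int], first_threshold: int) -> list[str]:
    """Return the names of jobs whose map count is strictly greater than the threshold."""
    return [name for name, maps in map_counts.items() if maps > first_threshold]


# Hypothetical snapshot: job name -> number of maps currently occupied.
snapshot = {"etl_orders": 12, "daily_report": 35, "risk_model": 20}
flagged = jobs_over_threshold(snapshot, first_threshold=20)
```

Note that a job sitting exactly at the threshold is not flagged in this sketch, because the description speaks of exceeding the threshold.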

Step 102: if there is a job whose occupied memory resources exceed the first threshold, sending the job information of the job to a monitoring object.

After the above screening, if any currently executing job occupies memory resources exceeding the first threshold, the job information of that job is sent to the monitoring object to remind the monitoring object to watch the job. Early warning is thus given before a real abnormality or accident occurs. The monitoring object then judges whether the job is abnormal and, if so, restarts or closes it, or takes similar action, so that the execution of other normal jobs is not affected.

The terminal or the server can send the job information to the monitoring object by mail, short message, or another social platform. The monitoring object may be the relevant staff.
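For the mail channel, composing the early-warning message might look like the sketch below, which uses Python's standard `email.message.EmailMessage`. The function `build_alert`, the subject wording, and the recipient address are assumptions for illustration; actually delivering the message is deliberately left out.

```python
from email.message import EmailMessage


def build_alert(job_name: str, maps: int, state: str, recipient: str) -> EmailMessage:
    """Compose an early-warning mail for one job; sending is left to the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"Early warning: job {job_name} exceeds the first threshold"
    msg["To"] = recipient
    msg.set_content(f"Job: {job_name}\nMaps: {maps}\nState: {state}\n")
    return msg


# The address is a placeholder, not a real recipient.
alert = build_alert("daily_report", 35, "RUNNING", "oncall@example.com")
```

The resulting message could then be handed to a mail client such as `smtplib.SMTP.send_message`; a short-message or chat channel would replace only this final delivery step.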

In this way, when a job is found whose occupied memory resources exceed the first threshold, which is smaller than the maximum threshold, the job information of the job is sent to the monitoring object. A job that may be abnormal thus enters an early-warning state in advance, rather than being discovered only after it has failed or has disrupted other jobs and caused a major accident, which effectively reduces the probability of major accidents. Jobs are also monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

During the execution of a job, a transient abnormality sometimes occurs; for example, the occupied memory resources suddenly increase but later return to normal. Such a transient abnormality has no substantial effect on the job or on other jobs, yet the terminal or the server would still flag it. To avoid wasting processor resources and labor on such occasional abnormalities, another embodiment of the job monitoring method is shown in fig. 2. In step 201, it is judged whether there is a job whose occupied memory resources exceed the first threshold, and then step 202 is executed.

Step 202: if there is a job whose occupied memory resources exceed the first threshold, judging whether the duration for which the memory resources occupied by the job exceed the first threshold exceeds a threshold time.

The threshold time may be set empirically, such as 20 seconds. It may also be set according to the average duration of the occasional abnormalities that occurred historically, which is not limited herein.

Step 203: if the duration exceeds the threshold time, sending the job information of the job to a monitoring object.

When it is determined that the memory resources occupied by the job exceed the first threshold and that this condition has lasted longer than the threshold time, the job information of the job is sent to the monitoring object for abnormality detection.

In this way, flagging occasional job abnormalities caused by other sudden conditions, such as network problems, can be effectively avoided, which reduces the workload of both the terminal or server and the monitoring object.
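Steps 201 to 203 amount to remembering when each job first exceeded the first threshold and reporting it only once that condition has persisted. A minimal sketch follows; the `DurationFilter` class and its `update` method are hypothetical, and timestamps are passed in explicitly (in seconds) so that the behavior is easy to follow.

```python
class DurationFilter:
    """Report a job only after it has stayed over the threshold long enough."""

    def __init__(self, threshold_seconds: float) -> None:
        self.threshold_seconds = threshold_seconds
        self._first_seen: dict[str, float] = {}  # job name -> time it first exceeded

    def update(self, job_name: str, over_threshold: bool, now: float) -> bool:
        """Record one observation; return True when the job should be reported."""
        if not over_threshold:
            # The job recovered, so forget any earlier excursion.
            self._first_seen.pop(job_name, None)
            return False
        start = self._first_seen.setdefault(job_name, now)
        return now - start > self.threshold_seconds


f = DurationFilter(threshold_seconds=20)
results = [
    f.update("daily_report", True, now=0),    # first excursion, not yet reported
    f.update("daily_report", True, now=15),   # still within 20 seconds
    f.update("daily_report", True, now=25),   # over the threshold for more than 20 seconds
    f.update("daily_report", False, now=30),  # recovered, state is cleared
    f.update("daily_report", True, now=31),   # a new excursion starts from zero
]
```

Clearing the stored start time on recovery is what filters out the transient spikes described above: a job that briefly exceeds the threshold and then returns to normal never accumulates enough duration to be reported.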

In practice, for convenience of operation or because jobs run on different platforms, the job information acquired by the terminal or the server may come in various formats, such as JSON format or list format, which are not intuitive for a typical monitoring object to read. To let the monitoring object see more clearly and intuitively which jobs may be abnormal, fig. 3 shows a detailed flowchart of an embodiment of the step of sending the job information to the monitoring object.

As shown in fig. 3, the method comprises the following steps:

Step 301: judging whether the format of the job information is the same as a preset format.

After acquiring the job information that needs to be sent to the monitoring object, the terminal or the server further judges whether the format of the job information is the same as the preset format. The preset format may be a table format, such as an Excel format, or a text format, which is not limited herein.

Step 302: if the format of the job information differs from the preset format, converting the job information into the preset format.

Step 303: sending the converted job information to the monitoring object.

Table 1 illustrates an embodiment of the job information.

The job information of this embodiment includes the job number jobID of the job in which an abnormality may occur, the submission time, the start time, the duration, the job name jobname, and the number of occupied maps; in other embodiments, it may also include the end time, the status, and the like.

TABLE 1

In this way, job information from different platforms and in different formats can be converted into a common preset format, so that monitoring personnel can understand the job information of a possibly abnormal job more intuitively and clearly.
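Steps 301 to 303 can be sketched as follows, taking CSV text as one possible table format. This is an illustration under stated assumptions: the `to_table` function, the JSON keys in `COLUMNS`, and the sample record are hypothetical, and the format check is simplified to "a JSON array gets converted, anything else is assumed to be in the preset format already".

```python
import csv
import io
import json

# Column names follow the fields listed for the job information; the keys are assumed.
COLUMNS = ["jobID", "submit_time", "start_time", "duration", "jobname", "maps"]


def to_table(job_info: str) -> str:
    """Convert job information to CSV text if it arrives as a JSON array."""
    try:
        records = json.loads(job_info)
    except json.JSONDecodeError:
        return job_info  # not JSON: assume it is already in the preset format
    if not isinstance(records, list):
        return job_info
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Made-up sample record, for illustration only.
raw = (
    '[{"jobID": "job_001", "submit_time": "10:00", "start_time": "10:01", '
    '"duration": 95, "jobname": "daily_report", "maps": 35}]'
)
table = to_table(raw)
```

Passing the converted text through `to_table` a second time returns it unchanged, which corresponds to the case in step 301 where the format already matches the preset format.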

Unlike the prior art, this embodiment first judges whether there is a job whose occupied memory resources exceed a first threshold, the first threshold being smaller than the maximum threshold of the total amount of memory resources, and when such a job is detected, sends the job information of the job to the monitoring object. A job that may be abnormal thus enters an early-warning state in advance, rather than being discovered only after it has failed or has disrupted other jobs and caused a major accident, which effectively reduces the probability of major accidents. Jobs are also monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

In addition, after a job whose occupied memory resources exceed the first threshold is found, the duration is further judged, and the job information of the job is sent to the monitoring object only if the duration exceeds the threshold time. In this way, flagging occasional job abnormalities caused by other sudden conditions, such as network problems, can be effectively avoided, which reduces the workload of both the terminal or server and the monitoring object.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of the job monitoring device of the present application.

The job monitoring device of this embodiment includes a judging module 401 and a sending module 402.

The judging module 401 is configured to judge whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources.

In this embodiment, the job monitoring device monitors the big data platform resources of the terminal.

Specifically, the judging module 401 acquires the job information of the jobs being executed, for example by calling a job information collection program that collects, in real time, the job information of currently executing jobs from YARN in the big data platform Cloudera Manager. The job information includes the name of the job, the amount of occupied memory resources, and the current execution state. For example, on the Hadoop software platform, the amount of occupied memory resources can be represented by the number of maps.

Further, to give early warning before a job runs into an accident, the judging module 401, after acquiring the job information of the currently executing jobs, judges whether there is a job whose occupied memory resources exceed the first threshold.

The first threshold is smaller than the maximum threshold of the total amount of memory resources. The maximum threshold is the upper limit, set by the terminal or the server, on the memory resources that a single job may occupy; exceeding it may prevent other jobs from running normally or delay them.

The first threshold may be set according to practical experience. In a preferred embodiment, the first threshold is the average value of the memory resources occupied by historical jobs; when many jobs are currently being executed, the first threshold may instead be the average value of the memory resources occupied by the current jobs, which is not limited herein.

The sending module 402 is configured to send the job information of a job to a monitoring object when there is a job whose occupied memory resources exceed the first threshold.

After the screening by the judging module 401, if any currently executing job occupies memory resources exceeding the first threshold, the sending module 402 sends the job information of that job to the monitoring object to remind the monitoring object to watch the job.

The sending module 402 may send the job information to the monitoring object by mail, short message, or another social platform. The monitoring object may be the relevant staff.

In this way, when the judging module 401 finds a job whose occupied memory resources exceed the first threshold, which is smaller than the maximum threshold, the sending module 402 sends the job information of the job to the monitoring object. A job that may be abnormal thus enters an early-warning state in advance, rather than being discovered only after it has failed or has disrupted other jobs and caused a major accident, which effectively reduces the probability of major accidents. Jobs are also monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

In another embodiment, to avoid wasting processor resources and labor on occasional transient abnormalities, the judging module 401 is further configured to judge, when it detects a job whose occupied memory resources exceed the first threshold, whether the duration for which the memory resources occupied by the job exceed the first threshold exceeds the threshold time. If the threshold time is exceeded, the sending module 402 sends the job information of the job to the monitoring object.

Further, to let the monitoring object see more clearly and intuitively which jobs may be abnormal, the judging module 401, after acquiring the job information that needs to be sent to the monitoring object, further judges whether the format of the job information is the same as the preset format. The preset format may be a table format, such as an Excel format, or a text format, which is not limited herein. If the format of the job information differs from the preset format, the judging module 401 converts the job information into the preset format. The sending module 402 sends the converted job information to the monitoring object.
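The division of the device into a judging module and a sending module can be sketched as two small classes. The class names mirror the modules described above, but the method names, the injected `channel` callable, and the sample usage figures are assumptions made for illustration.

```python
from typing import Callable


class JudgingModule:
    """Decides which jobs occupy more than the first threshold."""

    def __init__(self, first_threshold: int, max_threshold: int) -> None:
        if first_threshold >= max_threshold:
            raise ValueError("first threshold must be below the maximum threshold")
        self.first_threshold = first_threshold

    def judge(self, usage: dict[str, int]) -> dict[str, int]:
        """Return only the jobs whose usage is strictly above the first threshold."""
        return {name: used for name, used in usage.items() if used > self.first_threshold}


class SendingModule:
    """Delivers job information through an injected channel (mail, short message, ...)."""

    def __init__(self, channel: Callable[[str], None]) -> None:
        self.channel = channel

    def send(self, flagged: dict[str, int]) -> None:
        for name, used in flagged.items():
            self.channel(f"{name}: {used}")


sent: list[str] = []  # stands in for a real delivery channel
judging = JudgingModule(first_threshold=200, max_threshold=7000)
sending = SendingModule(channel=sent.append)
sending.send(judging.judge({"etl_orders": 120, "daily_report": 950}))
```

Injecting the channel keeps the sending module independent of how the monitoring object is reached, which matches the description's statement that mail, short message, or other platforms may be used.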

Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of the job monitoring terminal of the present application. The job monitoring terminal 50 of this embodiment includes a processor 501 and a communication circuit 502 coupled to each other. The communication circuit 502 is used for communicating with other devices or with monitoring objects.

The terminal 50 may be a PC, a tablet computer, or a smart device such as a smartphone.

The processor 501 is configured to judge whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources.

The processor 501 monitors the resources of jobs executing in a big data platform.

Specifically, the processor 501 acquires the job information of the jobs being executed, for example by calling a job information collection program that collects, in real time, the job information of currently executing jobs from YARN in the big data platform Cloudera Manager. The job information includes the name of the job, the amount of occupied memory resources, and the current execution state. For example, on the Hadoop software platform, the amount of occupied memory resources can be represented by the number of maps.

To give early warning before a job runs into an accident, in this embodiment, after acquiring the job information of the currently executing jobs, the processor 501 judges whether there is a job whose occupied memory resources exceed the first threshold.

The first threshold is smaller than the maximum threshold of the total amount of memory resources. The maximum threshold is the upper limit, set by the processor 501, on the memory resources that a single job may occupy; exceeding it may prevent other jobs from running normally or delay them.

The communication circuit 502 is configured to send the job information of a job to a monitoring object when there is a job whose occupied memory resources exceed the first threshold.

The communication circuit 502 may send the job information to the monitoring object by mail, short message, or another social platform. The monitoring object may be the relevant staff.

In this way, when the processor 501 finds a job whose occupied memory resources exceed the first threshold, which is smaller than the maximum threshold, the job information of the job is sent to the monitoring object. A job that may be abnormal thus enters an early-warning state in advance, rather than being discovered only after it has failed or has disrupted other jobs and caused a major accident, which effectively reduces the probability of major accidents. Jobs are also monitored in real time at low labor cost, which ensures the stability and sustainability of job monitoring.

To avoid wasting processor resources and labor on occasional transient abnormalities, after the processor 501 finds that the memory resources occupied by a job exceed the first threshold, it further judges whether the duration for which the memory resources occupied by the job exceed the first threshold exceeds the threshold time. The communication circuit 502 sends the job information of the job to the monitoring object when the duration exceeds the threshold time.

In another embodiment, to let the monitoring object see more clearly and intuitively which jobs may be abnormal, the processor 501, after acquiring the job information that needs to be sent to the monitoring object, further judges whether the format of the job information is the same as the preset format. The preset format may be a table format, such as an Excel format, or a text format, which is not limited herein. If the format of the job information differs from the preset format, the processor 501 converts the job information into the preset format. The communication circuit 502 sends the converted job information to the monitoring object.

Referring to fig. 6, the present application further provides a schematic structural diagram of an embodiment of a computer storage medium. In this embodiment, the computer storage medium 60 stores processor-executable program data 61, and the program data 61 is used to perform the method in the above-described embodiments.

The computer storage medium 60 may be a medium capable of storing the program data 61, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a server that stores the program data 61, and the server may send the stored program data 61 to another device for execution or may run the stored program data 61 itself.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may divide them differently; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above are merely embodiments of the present application and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of protection of the present application.

Claims

1. A job monitoring method, characterized by comprising:

judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources;

and if there is a job whose occupied memory resources exceed the first threshold, sending the job information of the job to a monitoring object.

2. The job monitoring method according to claim 1, wherein the step of sending the job information of the job to a monitoring object if there is a job whose occupied memory resources exceed the first threshold includes:

if there is a job whose occupied memory resources exceed the first threshold, judging whether the duration for which the job exceeds the first threshold exceeds a threshold time;

and if the duration exceeds the threshold time, sending the job information of the job to the monitoring object.

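3. The job monitoring method according to claim 1 or 2, wherein the step of sending the job information of the job to a monitoring object includes:

judging whether the format of the job information is the same as a preset format;

if the format of the job information differs from the preset format, converting the job information into the preset format;

and sending the converted job information to the monitoring object.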

4. The job monitoring method according to claim 3, wherein the preset format comprises a table or a text format.

5. The job monitoring method according to claim 1 or 2, wherein the job information includes a name of the job, an amount of occupied memory resources, and a current execution state.

6. The job monitoring method according to claim 1 or 2, wherein the first threshold is an average value of the memory resources occupied by the historical jobs.

7. The job monitoring method according to claim 1 or 2, wherein

the step of judging whether there is currently a job whose occupied memory resources exceed the first threshold specifically includes:

judging whether there is currently a job whose number of occupied maps is greater than the first threshold;

and the step of sending the job information of the job to the monitoring object if there is a job whose occupied memory resources exceed the first threshold comprises:

if the number of occupied maps is greater than the first threshold, sending the job information of the job to the monitoring object.

8. A job monitoring device, characterized by comprising a judging module and a sending module,

the judging module is used for judging whether there is currently a job whose occupied memory resources exceed a first threshold, wherein the first threshold is smaller than the maximum threshold of the total amount of memory resources;

the sending module is used for sending the job information of the job to a monitoring object when there is a job whose occupied memory resources exceed the first threshold.

9. A job monitoring terminal, characterized in that the job monitoring terminal includes:

a processor and a communication circuit coupled to each other, wherein the processor, in operation, cooperates with the communication circuit to implement the job monitoring method of any one of claims 1-7.

10. A computer storage medium having stored thereon program data which, when executed by a processor, implements a method of job monitoring as claimed in any one of claims 1 to 7.