CN111026621A - Monitoring alarm method, device, equipment and medium for Elasticsearch cluster - Google Patents

Publication number
CN111026621A
Authority
CN
China
Prior art keywords
cluster
operation data
target
node
elasticsearch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911342583.5A
Other languages
Chinese (zh)
Other versions
CN111026621B (en)
Inventor
蒋方禹
范渊
史光庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN201911342583.5A
Publication of CN111026621A
Application granted
Publication of CN111026621B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/875 Monitoring of systems including the internet
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a monitoring alarm method, device, equipment and medium for an Elasticsearch cluster. The method comprises: acquiring target cluster operation data through a RESTful API on a master node in the Elasticsearch cluster; acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of that working node; performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal; and, if the operating state of the Elasticsearch cluster is abnormal, issuing a corresponding alarm in a preset alarm mode. The operating condition of the Elasticsearch cluster can thus be monitored and alarms raised for abnormal operating conditions, with a low false alarm rate, high reliability and low cost.

Description

Monitoring alarm method, device, equipment and medium for Elasticsearch cluster
Technical Field
The application relates to the technical field of Elasticsearch clusters, and in particular to a monitoring alarm method, device, equipment and medium for an Elasticsearch cluster.
Background
With the advent of the big data era, Elasticsearch is gaining increasing favor as a distributed full-text search engine. How to monitor and manage an Elasticsearch cluster effectively and at low cost has long been a major problem. At present, monitoring of an Elasticsearch cluster mainly consists of collecting cluster operation data on the master node of the cluster, judging whether that data is greater than or equal to a preset alarm threshold, and issuing a corresponding alarm when it is. This approach produces a large number of false alarms, and after each alarm operation and maintenance personnel must investigate its cause, so the false alarm rate is high, the cost is high and the reliability is low.
Disclosure of Invention
In view of this, an object of the present application is to provide a monitoring alarm method, device, equipment and medium for an Elasticsearch cluster, which can monitor the operating condition of the Elasticsearch cluster and raise a corresponding alarm for abnormal operating conditions, with a low false alarm rate, high reliability and low cost. The specific scheme is as follows:
In a first aspect, the application discloses a monitoring alarm method for an Elasticsearch cluster, which includes:
acquiring target cluster operation data through a RESTful API on a master node in the Elasticsearch cluster;
acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of the working node;
performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal;
and, if the operating state of the Elasticsearch cluster is abnormal, issuing a corresponding alarm in a preset alarm mode.
Optionally, acquiring the target cluster operation data through the RESTful API on the master node in the Elasticsearch cluster includes:
acquiring, through the RESTful API on the master node in the Elasticsearch cluster, target cluster operation data comprising any one or a combination of several of: child node IP, child node name, index number, index health, index state, merge thread number, task name, task running time, segment number, segment size, query delay, JVM (Java Virtual Machine) usage, GC time, GC count, storage space occupied by the index, and fragment volume.
Optionally, acquiring the target node operation data through the data collection agent on the working node in the Elasticsearch cluster includes:
acquiring, through the data collection agent on the working node in the Elasticsearch cluster, target node operation data comprising any one or a combination of several of: CPU utilization rate, hard disk utilization rate, memory utilization rate, hard disk read rate, hard disk write rate, and hard disk IO blocking rate.
Optionally, before performing the correlation analysis on the target cluster operation data and the target node operation data, the method further includes:
cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing them in a corresponding database by date.
Optionally, after the target cluster operation data and the target node operation data are cleaned according to the preset association rule, the method further includes:
visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.
Optionally, after the corresponding alarm is issued in the preset alarm mode, the method further includes:
analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying that cause.
Optionally, performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal includes:
judging whether the index running state of the Elasticsearch cluster is abnormal by analyzing the index name, the index number, the index health, the index state, the merge thread number, the segment size, the query delay and the fragment volume;
judging whether the operating efficiency of the Elasticsearch cluster is abnormal by analyzing the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate;
judging whether the running of related tasks of the Elasticsearch cluster is abnormal by analyzing the task running time, the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate;
and judging whether the system operating load of the Elasticsearch cluster is abnormal by analyzing the JVM usage, the GC time, the GC count, the storage space occupied by the index, the CPU utilization rate and the hard disk utilization rate.
In a second aspect, the present application discloses a monitoring alarm device for an Elasticsearch cluster, including:
a first data acquisition module, configured to acquire target cluster operation data through a RESTful API on a master node in the Elasticsearch cluster;
a second data acquisition module, configured to acquire target node operation data through a data collection agent on a working node in the Elasticsearch cluster, where the data collection agent is an executable file deployed on the working node and is configured to collect the target node operation data of the working node;
a data analysis module, configured to perform correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal;
and an alarm module, configured to issue a corresponding alarm in a preset alarm mode when the operating state of the Elasticsearch cluster is abnormal.
In a third aspect, the present application discloses monitoring alarm equipment for an Elasticsearch cluster, including:
a memory and a processor;
wherein the memory is used for storing a computer program;
and the processor is configured to execute the computer program to implement the monitoring alarm method for the Elasticsearch cluster disclosed above.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the monitoring alarm method for the Elasticsearch cluster disclosed above.
As can be seen, the application first acquires target cluster operation data through a RESTful API on the master node in the Elasticsearch cluster, and acquires target node operation data through a data collection agent on a working node in the Elasticsearch cluster, the data collection agent being an executable file deployed on the working node and used for collecting the target node operation data of that node. Correlation analysis is then performed on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal, and if it is, a corresponding alarm is issued in a preset alarm mode. Because the judgment is based on correlating the data from the master node with the data from the working nodes rather than on a single source, the operating condition of the Elasticsearch cluster can be monitored and abnormal conditions alarmed with a low false alarm rate, high reliability and low cost.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a monitoring alarm method for an Elasticsearch cluster disclosed in the present application;
FIG. 2 is a flowchart of a specific monitoring alarm method for an Elasticsearch cluster disclosed in the present application;
FIG. 3 is a schematic structural diagram of a monitoring alarm device for an Elasticsearch cluster disclosed in the present application;
FIG. 4 is a structural diagram of monitoring alarm equipment for an Elasticsearch cluster disclosed in the present application;
FIG. 5 is a structural diagram of a server disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
At present, monitoring of an Elasticsearch cluster mainly consists of collecting cluster operation data on the master node of the cluster, judging whether that data is greater than or equal to a preset alarm threshold, and issuing a corresponding alarm when it is. This produces a large number of false alarms, and after each alarm operation and maintenance personnel must investigate its cause, so the false alarm rate is high, the cost is high and the reliability is low. In view of this, the application provides a monitoring alarm method for an Elasticsearch cluster, which can monitor the operating condition of the Elasticsearch cluster and raise a corresponding alarm for abnormal operating conditions, with a low false alarm rate, high reliability and low cost.
Referring to FIG. 1, an embodiment of the present application discloses a monitoring alarm method for an Elasticsearch cluster, including:
Step S11: acquiring target cluster operation data through a RESTful API on the master node in the Elasticsearch cluster.
In this embodiment, the target cluster operation data on the master node in the Elasticsearch cluster is acquired first; specifically, it may be acquired through a RESTful API (Application Programming Interface) on the master node. The target cluster operation data includes any one or a combination of several of: child node IP, child node name, index number, index health, index state, merge thread number, task name, task running time, segment number, segment size, query delay, JVM (Java Virtual Machine) usage, GC (Garbage Collection) time, GC count, storage space occupied by the index, and fragment volume. The target cluster operation data may also include other cluster operation data.
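Step S11 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the choice of endpoints (`/_cluster/health`, `/_cat/indices?format=json`, `/_nodes/stats/jvm`, all part of the public Elasticsearch REST API) is an assumption, as are the names `ENDPOINTS`, `http_get` and `collect_cluster_data`. The fetch function is injectable so the parsing logic can be exercised without a live cluster.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen

# Assumed selection of endpoints; the patent does not say which REST calls are used.
ENDPOINTS: Dict[str, str] = {
    "health": "/_cluster/health",           # overall cluster status
    "indices": "/_cat/indices?format=json",  # per-index health, state and size
    "nodes": "/_nodes/stats/jvm",            # per-node JVM heap and GC statistics
}


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def collect_cluster_data(base_url: str,
                         fetch: Callable[[str], str] = http_get) -> dict:
    """Query each endpoint on the master node and return the parsed JSON by name."""
    base = base_url.rstrip("/")
    return {name: json.loads(fetch(base + path))
            for name, path in ENDPOINTS.items()}
```

In use, `collect_cluster_data("http://<master-node>:9200")` would be called on a schedule, with the address standing in for whatever host the master node actually listens on.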
Step S12: acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of the working node.
In this embodiment, the target node operation data on the working nodes in the Elasticsearch cluster must also be acquired. Specifically, it is acquired through a data collection agent on each working node, where the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of that node. The target node operation data includes any one or a combination of several of: CPU utilization rate, hard disk utilization rate, memory utilization rate, hard disk read rate, hard disk write rate, and hard disk IO blocking rate. It may also include other node operation data.
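A data collection agent of the kind described in step S12 might compute its readings along the following lines. This is only a sketch under stated assumptions: it covers just two of the listed metrics, it assumes the CPU counters come from the `cpu` line of Linux `/proc/stat`, and the function names and the sample layout are hypothetical rather than taken from the patent.

```python
import shutil
import time
from typing import Dict, Sequence


def cpu_utilization(prev: Sequence[int], cur: Sequence[int]) -> float:
    """CPU utilization (0..1) between two samples of the /proc/stat 'cpu' line.

    Each sample lists jiffies as (user, nice, system, idle, iowait, ...).
    """
    total = sum(cur) - sum(prev)
    idle = (cur[3] + cur[4]) - (prev[3] + prev[4])  # idle + iowait
    return 0.0 if total <= 0 else (total - idle) / total


def disk_utilization(path: str = "/") -> float:
    """Fraction of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def build_sample(node: str, cpu: float, disk: float) -> Dict[str, object]:
    """Attach a timestamp so the reading can later be aligned with cluster data."""
    return {"node": node, "ts": int(time.time()),
            "cpu": round(cpu, 4), "disk": round(disk, 4)}
```

Timestamping each sample at the source is a deliberate choice here: the later cleaning step associates node data with cluster data along the timeline.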
Step S13: performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal.
After the target cluster operation data and the target node operation data are obtained, correlation analysis is performed on them to judge whether the operating state of the Elasticsearch cluster is abnormal. The target cluster operation data on the master node and the target node operation data on the working nodes are related to each other; by analyzing the related data, it can be determined whether the operation data of the working nodes corresponds to and is consistent with the operation data of the master node, and hence whether the operating state of the Elasticsearch cluster is abnormal.
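The consistency check described above can be illustrated as follows. The patent does not specify concrete correlation rules, so everything here is an assumption made for illustration: records are keyed by node IP, `jvm_used` and `mem_used` are hypothetical field names holding fractions between 0 and 1, and the two limits are arbitrary.

```python
from typing import Dict, List


def correlate(cluster_nodes: Dict[str, dict], agent_nodes: Dict[str, dict],
              jvm_limit: float = 0.85, mem_limit: float = 0.90) -> List[str]:
    """Return human-readable findings; an empty list means nothing abnormal."""
    findings = []
    for ip in sorted(set(cluster_nodes) | set(agent_nodes)):
        c, a = cluster_nodes.get(ip), agent_nodes.get(ip)
        if c is None:
            # The two data sources do not correspond.
            findings.append(f"{ip}: agent reports data but node is missing from the cluster view")
        elif a is None:
            findings.append(f"{ip}: node is in the cluster view but has no agent data")
        elif c["jvm_used"] >= jvm_limit and a["mem_used"] >= mem_limit:
            # Both sources agree, so this is less likely to be a false alarm
            # than a single threshold on one metric.
            findings.append(f"{ip}: JVM heap and host memory are both under pressure")
    return findings
```

Requiring agreement between the master-node view and the working-node view is what distinguishes this from the single-threshold approach criticized in the background section.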
Step S14: if the operating state of the Elasticsearch cluster is abnormal, issuing a corresponding alarm in a preset alarm mode.
In a specific implementation, after correlation analysis is performed on the target cluster operation data and the target node operation data, a corresponding alarm is issued in a preset alarm mode if the operating state of the Elasticsearch cluster is abnormal. The alarm mode includes, but is not limited to, visual information prompts and voice prompts. For example, the alarm may be sent by e-mail.
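For the e-mail alarm mentioned above, a message could be assembled as in the sketch below. The subject format, the function name `build_alert_email` and the addresses are illustrative assumptions; only the standard-library `email.message.EmailMessage` class is relied on.

```python
from email.message import EmailMessage
from typing import List


def build_alert_email(findings: List[str], sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text alarm e-mail listing every abnormal finding."""
    msg = EmailMessage()
    msg["Subject"] = f"[Elasticsearch alarm] {len(findings)} abnormal finding(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(findings))
    return msg
```

The resulting message can be handed to `smtplib.SMTP(host).send_message(msg)`, where the mail host is whatever relay the deployment provides.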
Referring to FIG. 2, an embodiment of the present application discloses a specific monitoring alarm method for an Elasticsearch cluster, where the method includes:
Step S21: acquiring target cluster operation data through a RESTful API on the master node in the Elasticsearch cluster.
Step S22: acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of the working node.
Step S23: cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing them in a corresponding database by date.
In a specific implementation, after the target cluster operation data and the target node operation data are obtained, they are cleaned according to a preset association rule, and the cleaned data are stored in the corresponding databases. Specifically, the target cluster operation data and the target node operation data are cleaned according to the preset rule using the timeline as the basis, so that the two sets of data are associated with each other.
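One way to picture the timeline-based cleaning and date-based storage is the sketch below. The actual association rule and database are not disclosed, so this assumes SQLite, a single `metrics` table with a `day` column, a 60-second bucket width and a record layout of `ts`, `name` and `value`; all of these are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable


def bucket(ts: int, width: int = 60) -> int:
    """Round a Unix timestamp down to the start of its time bucket."""
    return ts - ts % width


def store(conn: sqlite3.Connection, source: str, records: Iterable[dict]) -> int:
    """Drop malformed records, align the rest to time buckets, store them by date."""
    conn.execute("CREATE TABLE IF NOT EXISTS metrics "
                 "(day TEXT, bucket INTEGER, source TEXT, name TEXT, value REAL)")
    rows = []
    for r in records:
        if "ts" not in r or not isinstance(r.get("value"), (int, float)):
            continue  # cleaning: skip records lacking a timestamp or a numeric value
        b = bucket(r["ts"])
        day = datetime.fromtimestamp(b, tz=timezone.utc).strftime("%Y-%m-%d")
        rows.append((day, b, source, r["name"], float(r["value"])))
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def associated(conn: sqlite3.Connection, b: int) -> dict:
    """Return every metric, from either source, that falls in the same time bucket."""
    cur = conn.execute("SELECT source, name, value FROM metrics WHERE bucket = ?", (b,))
    return {f"{s}.{n}": v for s, n, v in cur}
```

Rounding both data sources to the same bucket is what makes a cluster reading and a node reading taken a few seconds apart comparable in the later correlation analysis.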
Step S24: visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.
After the cleaned target cluster operation data and target node operation data are stored in the corresponding databases, the data in the databases are displayed visually as ECharts charts, so that operation and maintenance personnel can see the current operating condition of the whole cluster.
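As an illustration of the visual display step, the back end could serialize each metric series into an ECharts option object for the front end to render. The sketch assumes a simple line chart and a hypothetical helper named `line_chart_option`; the keys used (`title`, `tooltip`, `xAxis`, `yAxis`, `series`) are standard ECharts option fields.

```python
import json
from typing import List, Tuple


def line_chart_option(title: str, points: List[Tuple[str, float]]) -> str:
    """Serialize (time label, value) pairs as an ECharts line-chart option."""
    option = {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "xAxis": {"type": "category", "data": [label for label, _ in points]},
        "yAxis": {"type": "value"},
        "series": [{"name": title, "type": "line",
                    "data": [value for _, value in points]}],
    }
    return json.dumps(option)
```

On the page, the parsed JSON would be passed to the chart instance's `setOption` call.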
Step S25: performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal.
Correlation analysis is performed on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal. Specifically, whether the index running state of the Elasticsearch cluster is abnormal is judged by analyzing the index name, the index number, the index health, the index state, the merge thread number, the segment size, the query delay and the fragment volume; whether the operating efficiency of the Elasticsearch cluster is abnormal is judged by analyzing the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; whether the running of related tasks of the Elasticsearch cluster is abnormal is judged by analyzing the task running time, the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; and whether the system operating load of the Elasticsearch cluster is abnormal is judged by analyzing the JVM usage, the GC time, the GC count, the storage space occupied by the index, the CPU utilization rate and the hard disk utilization rate. When any of the index running state, the operating efficiency, the running of related tasks or the system operating load of the Elasticsearch cluster is abnormal, the operating state of the Elasticsearch cluster is judged to be abnormal.
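The four judgments can be sketched as a single function over one time-aligned set of metrics. The patent names the inputs to each judgment but not the rules, so the thresholds in `LIMITS`, the metric keys and the way signals are combined are all assumptions chosen for illustration; only a subset of the listed metrics is used.

```python
from typing import Dict, List

# Hypothetical thresholds; the patent gives no concrete values.
LIMITS = {"query_delay_ms": 1000.0, "io_block": 0.30, "task_seconds": 300.0,
          "jvm_used": 0.85, "cpu": 0.90, "disk_used": 0.90}


def over(m: Dict[str, object], key: str) -> bool:
    """True when the metric is present and at or above its assumed limit."""
    return float(m.get(key, 0.0)) >= LIMITS[key]


def judge(m: Dict[str, object]) -> List[str]:
    """Return the categories judged abnormal; a non-empty list means the
    operating state of the cluster is judged abnormal."""
    abnormal = []
    if m.get("index_health") == "red" or over(m, "query_delay_ms"):
        abnormal.append("index running state")
    if over(m, "io_block"):
        abnormal.append("operating efficiency")
    # A long-running task counts only when disk IO is blocked at the same time.
    if over(m, "task_seconds") and over(m, "io_block"):
        abnormal.append("related task operation")
    # High JVM usage counts only when host CPU or disk confirms the load.
    if over(m, "jvm_used") and (over(m, "cpu") or over(m, "disk_used")):
        abnormal.append("system operating load")
    return abnormal
```

Returning the category names rather than a bare boolean keeps the information needed for the later step of displaying the cause of the abnormality.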
Step S26: if the operating state of the Elasticsearch cluster is abnormal, issuing a corresponding alarm in a preset alarm mode.
Step S27: analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying that cause.
Referring to FIG. 3, an embodiment of the present application discloses a monitoring alarm device for an Elasticsearch cluster, including:
a first data acquisition module 11, configured to acquire target cluster operation data through a RESTful API on the master node in the Elasticsearch cluster;
a second data acquisition module 12, configured to acquire target node operation data through a data collection agent on a working node in the Elasticsearch cluster, where the data collection agent is an executable file deployed on the working node and is configured to collect the target node operation data of the working node;
a data analysis module 13, configured to perform correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal;
and an alarm module 14, configured to issue a corresponding alarm in a preset alarm mode when the operating state of the Elasticsearch cluster is abnormal.
Further, referring to FIG. 4, an embodiment of the present application also discloses monitoring alarm equipment for an Elasticsearch cluster, including a processor 21 and a memory 22.
The memory 22 is used for storing a computer program; the processor 21 is configured to execute the computer program to implement the monitoring alarm method for the Elasticsearch cluster disclosed in the foregoing embodiments.
For the specific process of the monitoring alarm method for the Elasticsearch cluster, reference may be made to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Further, FIG. 5 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, a sensor 25, and a communication bus 26. The memory 22 is configured to store a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the monitoring alarm method for an Elasticsearch cluster disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the server 20. The communication interface 24 creates a data transmission channel between the server 20 and external devices; the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application and is not specifically limited herein. The sensor 25 is used for acquiring sensor data; specific sensor types include, but are not limited to, speed sensors, temperature sensors, infrared sensors, light sensors, sound sensors, image sensors, and the like.
In addition, the memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 221, a computer program 222, data 223, and so on, and the storage may be transient or permanent.
The operating system 221 is used to manage and control the hardware devices on the server 20 and the computer program 222, so that the processor 21 can operate on and process the mass data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. In addition to the computer program that performs the monitoring alarm method for an Elasticsearch cluster disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks. The data 223 may include data received by the server from external devices, and may also include data collected by the sensor 25.
Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster; acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of the working node; performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operating state of the Elasticsearch cluster is abnormal; and, if the operating state of the Elasticsearch cluster is abnormal, issuing a corresponding alarm in a preset alarm mode.
Therefore, the method comprises the steps that firstly, the Restful API on the main node in the Elasticissearch cluster is used for obtaining the running data of the target cluster; acquiring target node operation data through a data collection agent on a working node in an Elasticissearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node; then, performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticissearch cluster is abnormal or not; and if the running state of the Elasticissearch cluster is abnormal, giving corresponding alarm in a preset alarm mode. Therefore, after the target cluster operation data on the main node and the target node operation data on the working node in the Elasticissearch cluster are obtained, the target cluster operation data and the target node operation data are subjected to correlation analysis to judge whether the operation state of the Elasticissearch cluster is abnormal or not, if yes, corresponding alarm is given, so that the operation condition of the Elasticissearch cluster can be monitored, corresponding alarm is given to the abnormal operation condition, and the Elasticissearch cluster alarm system is low in false alarm rate, high in reliability and low in cost.
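The four steps above can be sketched as a single monitoring pass. This is a minimal illustration, not the implementation described in the application: the names `monitor_once`, `Alarm`, and the four callables are hypothetical, and the collection, analysis, and alarm logic is injected so the flow itself stays visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Alarm:
    """One alarm to be delivered through the preset alarm mode."""
    reason: str


def monitor_once(
    get_cluster_data: Callable[[], Metrics],   # step 1: RESTful API on the master node
    get_node_data: Callable[[], Metrics],      # step 2: data collection agent on a working node
    analyze: Callable[[Metrics, Metrics], List[str]],  # step 3: correlation analysis
    send_alarm: Callable[[Alarm], None],       # step 4: preset alarm mode (mail, SMS, ...)
) -> List[Alarm]:
    """Run one monitoring pass and return the alarms that were sent."""
    cluster = get_cluster_data()
    node = get_node_data()
    # The analysis sees both data sets together, so it can relate a
    # cluster-level symptom to a node-level cause.
    reasons = analyze(cluster, node)
    alarms = [Alarm(reason) for reason in reasons]
    for alarm in alarms:
        send_alarm(alarm)
    return alarms
```

Passing the collectors and the alarm sender in as functions is a design choice of this sketch; it lets the same pass be driven by a scheduler in production and by stub functions in a test.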
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: acquiring, through a RESTful API on a master node in an Elasticsearch cluster, target cluster operation data comprising any one or a combination of several of: child node IP, child node name, number of indexes, index health, index state, number of merge threads, task name, task running time, number of segments, segment size, query latency, JVM (Java virtual machine) usage, GC time, GC count, storage space occupied by indexes, and shard capacity.
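As an illustration of the acquisition step, the sketch below polls a few Elasticsearch REST endpoints that expose the kinds of data listed above (cluster health, per-index state, JVM and index statistics, segments, running tasks). The application does not say which endpoints it calls, so the selection in `ENDPOINTS`, the function names, and the base URL used in the usage note are assumptions.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed selection of Elasticsearch REST endpoints; the source text only
# says "a RESTful API on the master node".
ENDPOINTS: Dict[str, str] = {
    "health": "/_cluster/health",                # overall cluster status
    "indices": "/_cat/indices?format=json",      # per-index health, state, size
    "segments": "/_cat/segments?format=json",    # segment count and size
    "node_stats": "/_nodes/stats/jvm,indices",   # JVM usage, GC, merge and search stats
    "tasks": "/_tasks?detailed=true",            # task names and running time
}


def http_get_json(url: str, timeout: float = 5.0) -> Any:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_cluster_data(
    base_url: str,
    get_json: Callable[[str], Any] = http_get_json,
) -> Dict[str, Any]:
    """Return one raw snapshot, keyed by the logical names in ENDPOINTS."""
    base = base_url.rstrip("/")
    return {name: get_json(base + path) for name, path in ENDPOINTS.items()}
```

A call such as `collect_cluster_data("http://localhost:9200")` would return the raw JSON of each endpoint; the `get_json` parameter exists so the HTTP layer can be replaced, for example to add authentication or to test without a live cluster.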
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: acquiring, through a data collection agent on a working node in the Elasticsearch cluster, target node operation data comprising any one or a combination of several of: CPU utilization rate, hard disk utilization rate, memory utilization rate, hard disk read rate, hard disk write rate, and hard disk IO blocking rate.
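A node-side agent could derive some of these values as follows. This is only a sketch under stated assumptions: it takes two samples of the counters on the first `cpu` line of Linux `/proc/stat` (user, nice, system, idle, iowait, ...), treats the iowait share as a stand-in for the "hard disk IO blocking rate", and uses the standard library for disk usage. The function names are hypothetical, and how the application's agent actually measures these values is not stated.

```python
import shutil
from typing import Dict, Sequence


def _deltas(prev: Sequence[int], curr: Sequence[int]) -> list:
    """Per-field difference between two /proc/stat 'cpu' samples."""
    return [c - p for p, c in zip(prev, curr)]


def cpu_utilization_pct(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Share of time spent neither idle nor waiting on I/O, in percent."""
    delta = _deltas(prev, curr)
    total = sum(delta)
    if total <= 0:
        return 0.0
    idle = delta[3] + delta[4]  # fields 3 and 4 are idle and iowait
    return 100.0 * (total - idle) / total


def io_wait_pct(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Share of time spent waiting on I/O, in percent (assumed proxy for IO blocking)."""
    delta = _deltas(prev, curr)
    total = sum(delta)
    return 100.0 * delta[4] / total if total > 0 else 0.0


def disk_usage_pct(path: str = "/") -> float:
    """Used space of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def collect_node_sample(prev: Sequence[int], curr: Sequence[int], path: str = "/") -> Dict[str, float]:
    """Bundle the node-level values into one record."""
    return {
        "cpu_pct": cpu_utilization_pct(prev, curr),
        "disk_io_wait_pct": io_wait_pct(prev, curr),
        "disk_usage_pct": disk_usage_pct(path),
    }
```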
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing the cleaned data into a corresponding database by date.
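The cleaning and date-based storage step might look like the sketch below. The application does not define the "preset association rule" or the database, so both are assumptions here: cleaning keeps only records that carry a node name, a timestamp, and a numeric value (the fields needed to associate cluster data with node data later), and storage uses one SQLite table per UTC day.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def clean(records: Iterable[dict]) -> List[dict]:
    """Drop records that cannot be associated by node and time (assumed rule)."""
    kept = []
    for rec in records:
        if not rec.get("node") or not rec.get("metric") or rec.get("ts") is None:
            continue
        if not isinstance(rec.get("value"), (int, float)):
            continue
        kept.append(rec)
    return kept


def store_by_date(conn: sqlite3.Connection, records: Iterable[dict]) -> Dict[str, int]:
    """Insert each record into the table for its UTC day; return rows per table."""
    counts: Dict[str, int] = {}
    for rec in records:
        day = datetime.fromtimestamp(rec["ts"], tz=timezone.utc).strftime("%Y%m%d")
        table = f"metrics_{day}"  # day contains digits only, so the name is safe to interpolate
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(ts REAL, node TEXT, metric TEXT, value REAL)"
        )
        conn.execute(
            f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
            (rec["ts"], rec["node"], rec["metric"], rec["value"]),
        )
        counts[table] = counts.get(table, 0) + 1
    conn.commit()
    return counts
```

Splitting by day keeps each table small and makes it cheap to drop old data, which is one plausible reason for storing by date; the source does not state its motivation.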
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.
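ECharts renders a chart from a JSON option object, so the server side only needs to turn the cleaned data into that object. The sketch below builds a line-chart option; the function name and the choice of a line chart are assumptions, while the keys (`title`, `tooltip`, `legend`, `xAxis`, `yAxis`, `series`) follow the ECharts option format.

```python
import json
from typing import Dict, Sequence


def line_chart_option(
    title: str,
    timestamps: Sequence[str],
    series: Dict[str, Sequence[float]],
) -> dict:
    """Build an ECharts option object with one line per metric."""
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "legend": {"data": list(series)},
        "xAxis": {"type": "category", "data": list(timestamps)},
        "yAxis": {"type": "value"},
        "series": [
            {"name": name, "type": "line", "data": list(values)}
            for name, values in series.items()
        ],
    }


def option_json(option: dict) -> str:
    """Serialize the option so a web page can pass it to the chart's setOption call."""
    return json.dumps(option)
```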
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying that cause.
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: judging whether the index running state of the Elasticsearch cluster is abnormal by analyzing the index name, the number of indexes, the index health, the index state, the number of merge threads, the segment size, the query latency, and the shard capacity; judging whether the operating efficiency of the Elasticsearch cluster is abnormal by analyzing the hard disk read rate, the hard disk write rate, and the hard disk IO blocking rate; judging whether the related tasks of the Elasticsearch cluster are running abnormally by analyzing the task running time, the hard disk read rate, the hard disk write rate, and the hard disk IO blocking rate; and judging whether the system operation load of the Elasticsearch cluster is abnormal by analyzing the JVM usage, the GC time, the GC count, the storage space occupied by indexes, the CPU utilization rate, and the hard disk utilization rate.
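The four judgments above can be pictured as rules that read cluster-level and node-level values together. The following is a deliberately simplified sketch: the metric keys, the threshold values in `LIMITS`, and the rule bodies are all hypothetical, since the application names the inputs of each judgment but not the decision logic.

```python
from typing import Dict, List, Union

Value = Union[str, float]

# Hypothetical thresholds; the source does not give any numbers.
LIMITS: Dict[str, float] = {
    "query_latency_ms": 500.0,
    "disk_io_wait_pct": 30.0,
    "task_runtime_s": 600.0,
    "jvm_heap_pct": 85.0,
    "cpu_pct": 90.0,
    "disk_usage_pct": 90.0,
}


def _num(data: Dict[str, Value], key: str) -> float:
    """Read a numeric metric, treating missing or non-numeric values as 0."""
    value = data.get(key, 0.0)
    return float(value) if isinstance(value, (int, float)) else 0.0


def _over(data: Dict[str, Value], key: str) -> bool:
    return _num(data, key) > LIMITS[key]


def analyze(cluster: Dict[str, Value], node: Dict[str, Value]) -> List[str]:
    """Return the categories judged abnormal, in the order of the four rules."""
    findings: List[str] = []
    io_blocked = _over(node, "disk_io_wait_pct")
    # 1. Index running state: index health and query latency.
    if cluster.get("index_health") == "red" or _over(cluster, "query_latency_ms"):
        findings.append("index running state")
    # 2. Operating efficiency: disk-level symptoms on the working node.
    if io_blocked:
        findings.append("operating efficiency")
    # 3. Task operation: a long-running task that coincides with blocked disk I/O.
    if _over(cluster, "task_runtime_s") and io_blocked:
        findings.append("task operation")
    # 4. System load: JVM pressure from the cluster side, CPU and disk from the node side.
    if _over(cluster, "jvm_heap_pct") or _over(node, "cpu_pct") or _over(node, "disk_usage_pct"):
        findings.append("system operation load")
    return findings
```

Rule 3 shows why the two data sources are analyzed together: a long task running time alone may be normal, but combined with blocked disk I/O on the node it points to a concrete cause.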
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The monitoring alarm method, device, equipment and medium for an Elasticsearch cluster provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An Elasticsearch cluster-oriented monitoring alarm method, characterized by comprising the following steps:
acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster;
acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is used for collecting the target node operation data of the working node;
performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the running state of the Elasticsearch cluster is abnormal;
and if the running state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode.
2. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 1, wherein the acquiring of the target cluster operation data through the RESTful API on the master node in the Elasticsearch cluster comprises:
acquiring, through the RESTful API on the master node in the Elasticsearch cluster, target cluster operation data comprising any one or a combination of several of: child node IP, child node name, number of indexes, index health, index state, number of merge threads, task name, task running time, number of segments, segment size, query latency, JVM (Java virtual machine) usage, GC time, GC count, storage space occupied by indexes, and shard capacity.
3. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 2, wherein the acquiring of the target node operation data through the data collection agent on the working node in the Elasticsearch cluster comprises:
acquiring, through the data collection agent on the working node in the Elasticsearch cluster, target node operation data comprising any one or a combination of several of: CPU utilization rate, hard disk utilization rate, memory utilization rate, hard disk read rate, hard disk write rate, and hard disk IO blocking rate.
4. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 3, wherein before the performing of the correlation analysis on the target cluster operation data and the target node operation data, the method further comprises:
cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing the cleaned data into a corresponding database by date.
5. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 4, wherein after the target cluster operation data and the target node operation data are cleaned according to the preset association rule, the method further comprises:
visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.
6. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 5, wherein after the corresponding alarm is given in the preset alarm mode, the method further comprises:
analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying that cause.
7. The Elasticsearch cluster-oriented monitoring alarm method as claimed in any one of claims 3 to 6, wherein the performing of the correlation analysis on the target cluster operation data and the target node operation data to judge whether the running state of the Elasticsearch cluster is abnormal comprises:
judging whether the index running state of the Elasticsearch cluster is abnormal by analyzing the index name, the number of indexes, the index health, the index state, the number of merge threads, the segment size, the query latency, and the shard capacity;
judging whether the operating efficiency of the Elasticsearch cluster is abnormal by analyzing the hard disk read rate, the hard disk write rate, and the hard disk IO blocking rate;
judging whether the related tasks of the Elasticsearch cluster are running abnormally by analyzing the task running time, the hard disk read rate, the hard disk write rate, and the hard disk IO blocking rate;
and judging whether the system operation load of the Elasticsearch cluster is abnormal by analyzing the JVM usage, the GC time, the GC count, the storage space occupied by indexes, the CPU utilization rate, and the hard disk utilization rate.
8. An Elasticsearch cluster-oriented monitoring alarm device, characterized by comprising:
a first data acquisition module, configured to acquire target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster;
a second data acquisition module, configured to acquire target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed on the working node and is configured to collect the target node operation data of the working node;
a data analysis module, configured to perform correlation analysis on the target cluster operation data and the target node operation data to judge whether the running state of the Elasticsearch cluster is abnormal;
and an alarm module, configured to give a corresponding alarm in a preset alarm mode when the running state of the Elasticsearch cluster is abnormal.
9. Elasticsearch cluster-oriented monitoring alarm equipment, comprising:
a memory and a processor;
wherein the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the Elasticsearch cluster-oriented monitoring alarm method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the Elasticsearch cluster-oriented monitoring alarm method as claimed in any one of claims 1 to 7.
CN201911342583.5A 2019-12-23 2019-12-23 Monitoring alarm method, device, equipment and medium for Elasticsearch cluster Active CN111026621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911342583.5A CN111026621B (en) 2019-12-23 2019-12-23 Monitoring alarm method, device, equipment and medium for Elasticsearch cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911342583.5A CN111026621B (en) 2019-12-23 2019-12-23 Monitoring alarm method, device, equipment and medium for Elasticsearch cluster

Publications (2)

Publication Number Publication Date
CN111026621A (en) 2020-04-17
CN111026621B (en) 2023-04-07

Family

ID=70211821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911342583.5A Active CN111026621B (en) 2019-12-23 2019-12-23 Monitoring alarm method, device, equipment and medium for Elasticsearch cluster

Country Status (1)

Country Link
CN (1) CN111026621B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090193436A1 (en) * 2008-01-30 2009-07-30 Inventec Corporation Alarm display system of cluster storage system and method thereof
WO2017071563A1 (en) * 2015-10-31 2017-05-04 华为技术有限公司 Data storage method and cluster management node
CN105933175A (en) * 2016-07-14 2016-09-07 微额速达(上海)金融信息服务有限公司 Real-time monitoring early-warning system
CN106383776A (en) * 2016-08-30 2017-02-08 北京北信源软件股份有限公司 Monitoring and self-healing method and apparatus for distributed search cluster system
CN107734035A (en) * 2017-10-17 2018-02-23 华南理工大学 A kind of Virtual Cluster automatic telescopic method under cloud computing environment
CN109101397A (en) * 2018-08-01 2018-12-28 武汉索雅信息技术有限公司 High-Performance Computing Cluster monitoring method, unit and storage medium
CN110309030A (en) * 2019-07-05 2019-10-08 亿玛创新网络(天津)有限公司 Log analysis monitoring system and method based on ELK and Zabbix

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
姚攀;马玉鹏;徐春香;: "基于ELK的日志分析系统研究及应用", 计算机工程与设计 *
王景德;李建宇;付喜春;刘洪海;李家俊;: "基于多架构集群一体化动态监控软件的实现", 信息技术 *
胡庆宝;姜晓巍;石京燕;程耀东;梁翠萍;: "基于Elasticsearch的实时集群日志采集和分析系统实现", 科研信息化技术与应用 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100160A (en) * 2020-11-05 2020-12-18 四川新网银行股份有限公司 Elastic Search based double-activity real-time data warehouse construction method
CN112532435A (en) * 2020-11-20 2021-03-19 深信服科技股份有限公司 Operation and maintenance method, operation and maintenance management platform, equipment and medium
CN112532435B (en) * 2020-11-20 2023-09-08 深信服科技股份有限公司 Operation and maintenance method, operation and maintenance management platform, equipment and medium
CN112380107A (en) * 2020-12-08 2021-02-19 无锡无边网络技术有限公司 Operation and maintenance system data acquisition system and method based on management information system
CN113192228A (en) * 2021-04-30 2021-07-30 中国工商银行股份有限公司 Cluster automation inspection method and device
CN113608964A (en) * 2021-08-09 2021-11-05 宁畅信息产业(北京)有限公司 Cluster automation monitoring method and device, electronic equipment and storage medium
CN115686381A (en) * 2022-12-29 2023-02-03 苏州浪潮智能科技有限公司 Prediction method and device for storage cluster running state

Also Published As

Publication number Publication date
CN111026621B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111026621B (en) Monitoring alarm method, device, equipment and medium for Elasticsearch cluster
US20180365085A1 (en) Method and apparatus for monitoring client applications
US7310590B1 (en) Time series anomaly detection using multiple statistical models
US20170063762A1 (en) Event log analyzer
CN110888783A (en) Monitoring method and device of micro-service system and electronic equipment
CN109861878B (en) Method for monitoring topic data of kafka cluster and related equipment
JP4990018B2 (en) Apparatus performance management method, apparatus performance management system, and management program
JP6823265B2 (en) Analytical instruments, analytical systems, analytical methods and analytical programs
CN111046011A (en) Log collection method, system, node, electronic device and readable storage medium
EP3316175B1 (en) Methods and apparatus of an immutable threat intelligence system
US10657099B1 (en) Systems and methods for transformation and analysis of logfile data
US9043652B2 (en) User-coordinated resource recovery
CN109062769B (en) Method, device and equipment for predicting IT system performance risk trend
US10114731B2 (en) Including kernel object information in a user dump
US10187264B1 (en) Gateway path variable detection for metric collection
CN114006727B (en) Alarm association analysis method, device, equipment and storage medium
US11537576B2 (en) Assisted problem identification in a computing system
CA2759365A1 (en) Identification of thread progress information
CN112800061B (en) Data storage method, device, server and storage medium
CN115883407A (en) Data acquisition method, system, equipment and storage medium
US20140067912A1 (en) System for Remote Server Diagnosis and Recovery
JP2004348640A (en) Method and system for managing network
CN113901441A (en) User abnormal request detection method, device, equipment and storage medium
CN112565228A (en) Client network analysis method and device
CN113342608A (en) Method and device for monitoring streaming computing engine task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant