CN114637656B - Redis-based monitoring method and device, storage medium and equipment - Google Patents

Info

Publication number
CN114637656B
Authority
CN
China
Prior art keywords
index
instance
redis
fault
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210519248.3A
Other languages
Chinese (zh)
Other versions
CN114637656A (en
Inventor
陈实
张益军
王金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd
Priority to CN202210519248.3A
Publication of CN114637656A
Application granted
Publication of CN114637656B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a Redis-based monitoring method, device, storage medium and equipment. The method comprises: performing big data analysis on sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set; after Redis starts running, obtaining the value of each index in the index set of each instance according to the monitoring period corresponding to that index; for each instance, identifying the instance as a target instance if its index set contains a problem index; and generating an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and sending the alarm prompt to a user. Compared with the prior art, the instances need not be troubleshot manually one by one, so the troubleshooting efficiency of Redis is effectively improved.

Description

Redis-based monitoring method and device, storage medium and equipment
Technical Field
The present application relates to the field of databases, and in particular, to a method, an apparatus, a storage medium, and a device for monitoring based on Redis.
Background
Remote Dictionary Server (Redis) is a key-value storage system and a cross-platform non-relational database that can be used as a cache, a database, and message middleware. As Redis is widely used in enterprises, various problems arise during its use, and when a service is found to be abnormal, the problems of Redis need to be troubleshot.
Currently, when a service is abnormal, the instances of Redis are usually troubleshot manually. However, when Redis includes a large number of instances, manual troubleshooting is time-consuming, and it is difficult to finish troubleshooting every instance within a limited time.
Therefore, how to improve the troubleshooting efficiency of Redis has become an urgent problem in the field.
Disclosure of Invention
The application provides a Redis-based monitoring method, device, storage medium and equipment, aiming to improve the troubleshooting efficiency of Redis.
In order to achieve the above object, the present application provides the following technical solutions:
a Redis-based monitoring method, comprising:
carrying out big data analysis on sample operation data of each instance in Redis to obtain an index set, and fault types and fault analyses corresponding to each index in the index set;
configuring a monitoring period, a value range and an importance degree corresponding to each index;
after the Redis is started to operate, calling a preset Info command according to a monitoring period corresponding to each index, and acquiring a value of each index in an index set of each instance;
for each of the instances, identifying the instance as a target instance if the index set of the instance contains a problem index; a problem index is an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value;
and generating an alarm prompt based on the preset id of each target instance, the fault type corresponding to the problem index contained in each target instance and fault analysis, and sending the alarm prompt to a user.
Optionally, performing big data analysis on the sample operation data of each instance in Redis to obtain an index set, and the fault type and fault analysis corresponding to each index in the index set, includes:
grouping the instances of Redis in advance to obtain a global instance set and a special instance set; the global instance set comprises a plurality of global instances, and the special instance set comprises a plurality of special instances; the service processing capacity of each global instance is not greater than a preset threshold, and the service processing capacity of each special instance is greater than the preset threshold;
performing big data analysis on the sample operation data of each global instance to obtain a first index set, and the fault type and fault analysis corresponding to each index in the first index set;
and performing big data analysis on the sample operation data of each special instance to obtain a second index set, and the fault type and fault analysis corresponding to each index in the second index set.
Optionally, the configuring the monitoring period, the value range, and the importance degree corresponding to each index includes:
configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;
and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
Optionally, calling a preset Info command according to the monitoring period corresponding to each index after Redis starts running, to obtain the value of each index in the index set of each instance, includes:
after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index in the first index set, and obtaining the value of each index in the first index set of each global instance;
and calling the Info command according to the monitoring period corresponding to each index in the second index set, and obtaining the value of each index in the second index set of each special instance.
Optionally, for each of the instances, identifying the instance as a target instance if the index set of the instance contains a problem index includes:
for each of the global instances, identifying the global instance as a target instance if the first index set of the global instance contains a problem index;
for each of the special instances, identifying the special instance as the target instance if the second index set of the special instance contains the problem index.
A Redis-based monitoring device, comprising:
the analysis unit is used for carrying out big data analysis on the sample operation data of each instance in Redis to obtain an index set, and fault types and fault analysis corresponding to each index in the index set;
the configuration unit is used for configuring the monitoring period, the value range and the importance degree corresponding to each index;
the obtaining unit is used for calling a preset Info command according to a monitoring period corresponding to each index after the Redis is started to operate, and obtaining a value of each index in an index set of each instance;
an identification unit, configured to identify, for each of the instances, the instance as a target instance if the index set of the instance contains a problem index; a problem index is an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value;
and the warning unit is used for generating an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and sending the alarm prompt to a user.
Optionally, the analysis unit is specifically configured to:
grouping each instance of Redis in advance to obtain a global instance set and a special instance set; the set of global instances comprises a plurality of global instances; the special instance set comprises a plurality of special instances, and the service processing capacity of the global instance is not greater than a preset threshold value; the service processing capacity of the special instance is larger than the preset threshold value;
carrying out big data analysis on the sample operation data of each global instance to obtain a first index set, and fault types and fault analysis corresponding to each index in the first index set;
and carrying out big data analysis on the sample operation data of each special instance to obtain a second index set, and the fault type and fault analysis corresponding to each index in the second index set.
Optionally, the configuration unit is specifically configured to:
configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;
and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
A computer-readable storage medium comprising a stored program, wherein the program performs the Redis-based monitoring method.
A Redis-based monitoring device, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the Redis-based monitoring method during the running process.
According to the above technical solution, big data analysis is performed on the sample operation data of each instance in Redis to obtain an index set, and the fault type and fault analysis corresponding to each index in the index set; after Redis starts running, the value of each index in the index set of each instance is obtained according to the monitoring period corresponding to that index; for each instance, the instance is identified as a target instance if its index set contains a problem index; and an alarm prompt is generated based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and is sent to a user. By identifying the instances that contain problem indexes as target instances and sending the user the preset id of each target instance together with the fault type and fault analysis corresponding to its problem indexes, the application helps the user troubleshoot quickly within a limited time. Compared with the prior art, the instances need not be troubleshot manually one by one, so the troubleshooting efficiency of Redis is effectively improved.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a Redis-based monitoring method provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of another Redis-based monitoring method provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a Redis-based monitoring device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flowchart of a Redis-based monitoring method provided in an embodiment of the present application. The method may be applied to a server and includes the following steps:
S101: Group the instances of Redis in advance to obtain a global instance set and a special instance set.
The global instance set comprises a plurality of global instances, and the special instance set comprises a plurality of special instances; the service processing capacity of each global instance is not greater than a preset threshold, and that of each special instance is greater than the preset threshold. An instance is a process that provides the Redis service; this is common knowledge to those skilled in the art and is not described further here.
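The grouping in S101 can be sketched as a simple partition around the preset threshold. The sketch below is only an illustration under stated assumptions: the `Instance` type, the `service_volume` field (standing in for the service processing capacity, whose unit the application does not specify) and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str       # hypothetical preset id of the instance
    service_volume: float  # hypothetical measure of service processing capacity


def group_instances(instances, threshold):
    """Split instances into (global_set, special_set) around a preset threshold.

    Global instances: capacity not greater than the threshold.
    Special instances: capacity greater than the threshold.
    """
    global_set = [i for i in instances if i.service_volume <= threshold]
    special_set = [i for i in instances if i.service_volume > threshold]
    return global_set, special_set


# Illustrative data only.
instances = [
    Instance("redis-a", 1200.0),
    Instance("redis-b", 50000.0),
    Instance("redis-c", 8000.0),
]
global_set, special_set = group_instances(instances, threshold=10000.0)
```

An instance whose capacity equals the threshold falls into the global instance set, matching the "not greater than" wording of the step.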
S102: Perform big data analysis on the sample operation data of each global instance to obtain a first index set, and the fault type and fault analysis corresponding to each index in the first index set.
The sample operation data of a global instance comprises operation data recorded when the global instance processes services with normal efficiency and operation data recorded when it processes services with lower efficiency. The first index set includes a plurality of indexes that affect the service processing efficiency of the global instance.
Specifically, the first index set includes at least the following indexes: the maximum queue length of the output buffer, the maximum buffer size of the input buffer, the current size of the AOF, the number of AOF blocks per minute, the last bgsave state, the time used by the last fork, the number of full replications executed per minute, the number of partial replication failures per minute, the number of partial replication successes per minute, the number of connections rejected per minute, real-time ops, the memory fragmentation rate, the network output traffic per minute, the network input traffic per minute, the master-slave node offset difference, user CPU consumption, kernel CPU consumption, the cluster state, and the number of slots successfully allocated in the cluster.
Through big data analysis, the fault type corresponding to the maximum queue length of the output buffer is: the server is blocked. The fault analysis corresponding to the maximum queue length of the output buffer is: the client frequently accesses big keys; the client issues a large number of bulk commands such as hgetall and smembers that fetch all elements; or the client executes monitor, which blocks other client connections.
The fault type corresponding to the maximum buffer size of the input buffer is: the server is blocked. The fault analysis corresponding to the maximum buffer size of the input buffer is: a bigkey operation executed by the server causes slow queries and blocks the execution of client commands; or batches of keys are written frequently.
The fault type corresponding to the current size of the AOF is: additional performance overhead when the instance performs a rewrite. The fault analysis corresponding to the current size of the AOF is: an oversized AOF file increases the memory and disk overhead of the server during rewriting and merging.
The fault type corresponding to the number of AOF blocks per minute is: client calls are affected and Redis performance is degraded. The fault analysis corresponding to the number of AOF blocks per minute is: the disk performance of the machine is insufficient and the read/write speed cannot keep up; other processes on the host are writing to the disk; or an AOF rewrite or RDB operation is consuming disk I/O.
The fault type corresponding to the last bgsave state is: master-slave instance switchover may fail. The fault analysis corresponding to the last bgsave state is: disk space is insufficient.
The fault type corresponding to the time used by the last fork is: server performance is affected. The fault analysis corresponding to the time used by the last fork is: the instance consumes too much memory.
The fault type corresponding to the number of full replications executed per minute is: full replication occurs. The fault analysis corresponding to the number of full replications executed per minute is: the replication buffer of the master node exceeds a threshold.
The fault type corresponding to the number of partial replication failures per minute is: partial replication fails. The fault analysis corresponding to the number of partial replication failures per minute is: network failure or insufficient disk space.
The fault type corresponding to the number of partial replication successes per minute is: replication fails. The fault analysis corresponding to the number of partial replication successes per minute is: network failure.
The fault type corresponding to the number of connections rejected per minute is: client connections are refused. The fault analysis corresponding to the number of connections rejected per minute is: the client creates a large number of connections, or there is a connection leak in the client code.
The fault type corresponding to real-time ops is: client performance is affected. The fault analysis corresponding to real-time ops is: client traffic increases suddenly, more client machines are making calls, or the thread concurrency of the clients has increased.
The fault type corresponding to the memory fragmentation rate is: machine memory is seriously wasted, which may cause an OOM crash when the container runs out of memory. The fault analysis corresponding to the memory fragmentation rate is: there is too much memory fragmentation.
The fault type corresponding to the network output traffic per minute is: the network traffic of the machine hits a bottleneck. The fault analysis corresponding to the network output traffic per minute is: the read traffic of the client is too large.
The fault type corresponding to the network input traffic per minute is: the network traffic of the machine hits a bottleneck. The fault analysis corresponding to the network input traffic per minute is: the write traffic of the client is too large.
The fault type corresponding to the master-slave node offset difference is: the data of the master and slave instances are inconsistent. The fault analysis corresponding to the master-slave node offset difference is: network failure, or a large performance difference between the machines hosting the master instance and the slave instance.
The fault type corresponding to user CPU consumption is: the process consumes too much CPU. The fault analysis corresponding to user CPU consumption is: CPU is consumed by tasks executed in forked child processes.
The fault type corresponding to kernel CPU consumption is: the main process consumes too much CPU. The fault analysis corresponding to kernel CPU consumption is: the main process of the instance is busy.
The fault type corresponding to the cluster state is: the Redis cluster is unavailable. The fault analysis corresponding to the cluster state is: one or more groups of master and slave instances are down at the same time, or the machine room has failed.
The fault type corresponding to the number of slots successfully allocated in the cluster is: the client reports an error when a key hits an affected slot. The fault analysis corresponding to the number of slots successfully allocated in the cluster is: one or more groups of master and slave instances are down, and slot calls are abnormal.
It should be noted that the above specific implementation process is only for illustration.
S103: Perform big data analysis on the sample operation data of each special instance to obtain a second index set, and the fault type and fault analysis corresponding to each index in the second index set.
The sample operation data of a special instance comprises operation data recorded when the special instance processes services with normal efficiency and operation data recorded when it processes services with lower efficiency. The second index set includes a plurality of indexes that affect the service processing efficiency of the special instance.
Specifically, the second index set includes all the indexes in the first index set, and additionally includes the key eviction count and the key expiration duration.
The fault type corresponding to the key eviction count is: the hit rate of the instance is low. The fault analysis corresponding to the key eviction count is: one or more groups of master and slave instances are down.
The fault type corresponding to the key expiration duration is: the hit rate of the instance is low. The fault analysis corresponding to the key expiration duration is: the data of the master and slave instances is lost.
It should be noted that the above specific implementation process is only for illustration.
S104: Configure the monitoring period, value range and importance degree corresponding to each index in the first index set.
S105: Configure the monitoring period, value range and importance degree corresponding to each index in the second index set.
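One possible way to represent the configuration of S104 and S105 is a small record per index holding the three configured properties. The sketch below is an assumption for illustration: the `IndexConfig` type, the index names, the periods, ranges and importance values are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    name: str             # hypothetical index name
    period_seconds: int   # monitoring period
    value_range: tuple    # (low, high), treated as inclusive in this sketch
    importance: int       # importance degree; larger means more important

    def in_range(self, value):
        low, high = self.value_range
        return low <= value <= high


# Hypothetical configuration of the first index set (global instances).
FIRST_INDEX_SET = {
    cfg.name: cfg
    for cfg in (
        IndexConfig("mem_fragmentation_ratio", 60, (1.0, 1.5), 3),
        IndexConfig("rejected_connections_per_minute", 60, (0, 0), 5),
        IndexConfig("instantaneous_ops_per_sec", 10, (0, 50000), 2),
    )
}

# The second index set contains every index of the first set plus the two
# additional indexes used for special instances (values are hypothetical).
SECOND_INDEX_SET = {
    **FIRST_INDEX_SET,
    "evicted_keys_per_minute": IndexConfig("evicted_keys_per_minute", 60, (0, 100), 4),
    "key_expiration_duration": IndexConfig("key_expiration_duration", 300, (0, 86400), 1),
}
```

Keeping the two sets as separate mappings lets a special instance share the configuration of the common indexes while adding its own.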
S106: After Redis starts running, call a preset Info command according to the monitoring period corresponding to each index in the first index set, and obtain the value of each index in the first index set of each global instance.
The preset Info command includes, but is not limited to: Info clients, Info persistence, Info stats, Info replication, Info CPU, Info cluster, and the like.
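The reply of the Redis INFO command is plain text made of `# Section` header lines followed by `field:value` lines. A minimal sketch of turning such a reply into a lookup table is given below; the function name `parse_info` and the sample reply are illustrative assumptions, and a real deployment would obtain the text from a Redis client rather than from a constant.

```python
def parse_info(text):
    """Parse an INFO-style reply into {section: {field: value}}.

    Values are kept as strings; converting them to numbers is left to the caller.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header such as "# Clients".
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a field:value line
        sections.setdefault(current or "default", {})[key] = value
    return sections


# Illustrative reply fragment; the numbers are made up.
SAMPLE = (
    "# Clients\r\n"
    "connected_clients:12\r\n"
    "blocked_clients:0\r\n"
    "\r\n"
    "# Stats\r\n"
    "rejected_connections:3\r\n"
    "instantaneous_ops_per_sec:4521\r\n"
)
info = parse_info(SAMPLE)
```

Only the first colon splits a line, so values that themselves contain colons or commas are preserved unchanged.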
Optionally, each index in the first index set and the acquisition time of each index may be stored in a preset data table, so that a user can view each index in the first index set and the acquisition time of each index at any time.
Specifically, by calling Info clients, the maximum queue length of the output buffer and the maximum buffer size of the input buffer can be obtained.
By calling Info persistence, the current size of the AOF, the number of AOF blocks per minute, and the last bgsave state can be obtained.
By calling Info stats, the time used by the last fork, the number of full replications executed per minute, the number of partial replication failures per minute, the number of partial replication successes per minute, the number of connections rejected per minute, real-time ops, the memory fragmentation rate, the network output traffic per minute, and the network input traffic per minute can be obtained.
By calling Info replication, the master-slave node offset difference can be obtained.
By calling Info CPU, user CPU consumption and kernel CPU consumption can be obtained.
By calling Info cluster, the cluster state and the number of slots successfully allocated in the cluster can be obtained.
It should be noted that the above specific implementation process is only for illustration.
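Several of the indexes above are expressed per minute. The application does not state how the per-minute values are derived; one common approach, assumed here, is to sample a cumulative counter twice and scale the difference to a one-minute window. The function name and the reset handling below are illustrative choices, not part of the described method.

```python
def per_minute_rate(prev_value, prev_time, curr_value, curr_time):
    """Estimate a per-minute rate from two samples of a cumulative counter.

    Times are in seconds. If the counter went backwards (for example because
    the instance restarted and the counter reset), the current value is used
    as the increase since the reset.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # assumed counter reset
    return delta * 60.0 / elapsed


# 30 more rejected connections observed over 30 seconds.
rate = per_minute_rate(100, 0.0, 130, 30.0)
```

Scaling by the elapsed time keeps the result comparable when the monitoring period of an index is not exactly one minute.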
S107: Call the Info command according to the monitoring period corresponding to each index in the second index set, and obtain the value of each index in the second index set of each special instance.
The manner of obtaining the indexes that the second index set shares with the first index set is described in S106. The remaining indexes, namely the key eviction count and the key expiration duration, can be obtained by calling Info memory.
Optionally, each index in the second index set and the acquisition time of each index may be stored in a preset data table, so that a user can view each index in the second index set and the acquisition time of each index at any time.
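Storing each index value together with its acquisition time can be sketched with an in-memory SQLite table. The table name `index_samples`, its columns and the sample rows are hypothetical; the application only states that a preset data table is used.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical data table for index samples."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_samples ("
        "instance_id TEXT NOT NULL, "
        "index_name TEXT NOT NULL, "
        "value REAL NOT NULL, "
        "acquired_at REAL NOT NULL)"
    )
    return conn


def record_sample(conn, instance_id, index_name, value, acquired_at):
    """Store one index value with its acquisition time (Unix seconds)."""
    conn.execute(
        "INSERT INTO index_samples VALUES (?, ?, ?, ?)",
        (instance_id, index_name, value, acquired_at),
    )
    conn.commit()


def history(conn, instance_id, index_name):
    """Return (acquired_at, value) pairs for one index, oldest first."""
    rows = conn.execute(
        "SELECT acquired_at, value FROM index_samples "
        "WHERE instance_id = ? AND index_name = ? ORDER BY acquired_at",
        (instance_id, index_name),
    )
    return rows.fetchall()


conn = open_store()
record_sample(conn, "redis-a", "mem_fragmentation_ratio", 1.42, 1700000060.0)
record_sample(conn, "redis-a", "mem_fragmentation_ratio", 1.31, 1700000000.0)
```

Ordering by acquisition time at query time means samples may be inserted in any order and still be viewed chronologically.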
S108: For each global instance, identify the global instance as a target instance if the first index set of the global instance contains a problem index.
A problem index is an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value.
S109: For each special instance, identify the special instance as a target instance if the second index set of the special instance contains a problem index.
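The problem-index rule used in S108 and S109 combines two conditions: the value is outside the configured value range, and the importance degree exceeds the preset value. A minimal sketch follows; the function names, the dictionary layout of the configuration and the sample numbers are assumptions made for illustration.

```python
def find_problem_indexes(values, configs, importance_threshold):
    """Return the names of problem indexes among the collected values.

    An index is a problem index only when BOTH conditions hold:
    its value is outside the configured range, and its importance degree
    is greater than the preset importance threshold.
    """
    problems = []
    for name, value in values.items():
        cfg = configs.get(name)
        if cfg is None:
            continue  # index is not configured for this instance group
        low, high = cfg["value_range"]
        out_of_range = not (low <= value <= high)
        if out_of_range and cfg["importance"] > importance_threshold:
            problems.append(name)
    return problems


def find_target_instances(samples, configs, importance_threshold):
    """Map each target instance id to the problem indexes it contains."""
    targets = {}
    for instance_id, values in samples.items():
        problems = find_problem_indexes(values, configs, importance_threshold)
        if problems:
            targets[instance_id] = problems
    return targets


# Hypothetical configuration and collected values.
configs = {
    "mem_fragmentation_ratio": {"value_range": (1.0, 1.5), "importance": 3},
    "instantaneous_ops_per_sec": {"value_range": (0, 50000), "importance": 1},
}
samples = {
    "redis-a": {"mem_fragmentation_ratio": 2.4, "instantaneous_ops_per_sec": 80000},
    "redis-b": {"mem_fragmentation_ratio": 1.1, "instantaneous_ops_per_sec": 1200},
}
targets = find_target_instances(samples, configs, importance_threshold=2)
```

In this example the ops value of `redis-a` is out of range but is not reported, because its importance degree does not exceed the threshold. The same functions can be applied to global instances with the first index set and to special instances with the second index set.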
S110: Generate an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and send the alarm prompt to a user.
In summary, the instances containing problem indexes are identified as target instances, and the preset id of each target instance, together with the fault type and fault analysis corresponding to the problem indexes it contains, is sent to the user, which helps the user troubleshoot quickly within a limited time.
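The alarm prompt of S110 can be sketched as a lookup from each problem index to its fault type and fault analysis, followed by message assembly. The two knowledge entries below paraphrase the descriptions given earlier for the memory fragmentation rate and the last bgsave state; the index keys, the message layout and the `sink` callback standing in for the delivery channel are hypothetical.

```python
# Fault type and fault analysis per index (keys are hypothetical names).
FAULT_KNOWLEDGE = {
    "mem_fragmentation_ratio": (
        "machine memory is seriously wasted; the container may hit OOM",
        "there is too much memory fragmentation",
    ),
    "rdb_last_bgsave_status": (
        "master-slave instance switchover may fail",
        "disk space is insufficient",
    ),
}


def build_alert(instance_id, problem_indexes, knowledge=FAULT_KNOWLEDGE):
    """Compose an alarm prompt for one target instance."""
    lines = [f"[ALERT] Redis instance {instance_id}"]
    for name in problem_indexes:
        fault_type, analysis = knowledge.get(
            name, ("unknown", "no analysis available")
        )
        lines.append(f"- {name}: fault type: {fault_type}; analysis: {analysis}")
    return "\n".join(lines)


def send_alert(message, sink=print):
    """Deliver the alarm prompt; `sink` is a placeholder for mail, chat, etc."""
    sink(message)


message = build_alert("redis-a", ["rdb_last_bgsave_status"])
```

Passing the delivery channel as a parameter keeps message construction independent of how the user is actually notified.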
It should be noted that S101 in the foregoing embodiment is an optional implementation of the Redis-based monitoring method described in this application, and S107 is likewise an optional implementation. The flow of the above embodiment can therefore be summarized as the method shown in Fig. 2.
Fig. 2 is a schematic flowchart of another Redis-based monitoring method provided in an embodiment of the present application, which includes the following steps:
S201: Perform big data analysis on the sample operation data of each instance in Redis to obtain an index set, and the fault type and fault analysis corresponding to each index in the index set.
S202: Configure the monitoring period, value range and importance degree corresponding to each index.
S203: After Redis starts running, call a preset Info command according to the monitoring period corresponding to each index, and obtain the value of each index in the index set of each instance.
S204: For each instance, identify the instance as a target instance if the index set of the instance contains a problem index.
A problem index is an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value.
S205: Generate an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and send the alarm prompt to a user.
In summary, by identifying the instances containing problem indexes as target instances and sending the user the preset id of each target instance together with the fault type and fault analysis corresponding to its problem indexes, the method helps the user troubleshoot quickly within a limited time. Compared with the prior art, the instances need not be troubleshot manually one by one, so the troubleshooting efficiency of Redis is effectively improved.
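Because each index has its own monitoring period, a collector has to decide at every tick which indexes are due. The sketch below shows one way to do this under stated assumptions: the function names, the index names and the periods are hypothetical, and times are plain numbers of seconds rather than wall-clock timestamps.

```python
def due_indexes(periods, last_polled, now):
    """Return the indexes whose monitoring period has elapsed at time `now`."""
    due = []
    for name, period in periods.items():
        last = last_polled.get(name)
        if last is None or now - last >= period:
            due.append(name)
    return due


def poll_once(periods, last_polled, now):
    """Determine the due indexes and record that they were polled at `now`."""
    due = due_indexes(periods, last_polled, now)
    for name in due:
        last_polled[name] = now
    return due


# Hypothetical monitoring periods in seconds.
periods = {"instantaneous_ops_per_sec": 10, "aof_current_size": 60}
last_polled = {}
schedule = [(t, poll_once(periods, last_polled, t)) for t in (0, 10, 30, 60)]
```

In a running collector, the names returned by `poll_once` would determine which Info sections are requested for each instance at that tick.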
Corresponding to the monitoring method based on Redis provided by the embodiment of the application, the embodiment of the application also provides a monitoring device based on Redis.
As shown in fig. 3, an architecture diagram of a monitoring device based on Redis provided in an embodiment of the present application includes:
the analysis unit 100 is configured to perform big data analysis on the sample operation data of each instance in the Redis to obtain an index set, and a fault type and a fault analysis corresponding to each index in the index set.
Optionally, the analysis unit 100 is specifically configured to: grouping each instance of Redis in advance to obtain a global instance set and a special instance set; the global instance set comprises a plurality of global instances; the special instance set comprises a plurality of special instances, and the service processing amount of the global instance is not greater than a preset threshold value; the service processing capacity of the special case is larger than a preset threshold value; carrying out big data analysis on the sample operation data of each global instance to obtain a first index set, and fault types and fault analysis corresponding to each index in the first index set; and carrying out big data analysis on the sample operation data of each special case to obtain a second index set and fault types and fault analysis corresponding to each index in the second index set.
And the configuration unit 200 is configured to configure the monitoring period, the value range, and the importance corresponding to each index.
Optionally, the configuration unit 200 is specifically configured to: configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set; and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
The obtaining unit 300 is configured to, after Redis starts running, call a preset Info command according to the monitoring period corresponding to each index and obtain the value of each index in the index set of each instance.
Optionally, the obtaining unit 300 is specifically configured to: after Redis starts running, call the preset Info command according to the monitoring period corresponding to each index in the first index set and obtain the value of each index in the first index set of each global instance; and call the Info command according to the monitoring period corresponding to each index in the second index set and obtain the value of each index in the second index set of each special instance.
The identification unit 400 is configured to, for each instance, identify the instance as a target instance if the index set of the instance contains a problem index. A problem index is an index whose value is not within the value range corresponding to that index and whose importance degree is greater than a preset value.
Optionally, the identification unit 400 is specifically configured to: for each global instance, identify the global instance as a target instance if the first index set of the global instance contains a problem index; and for each special instance, identify the special instance as a target instance if the second index set of the special instance contains a problem index.
The warning unit 500 is configured to generate a warning prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to the problem index contained in each target instance, and to send the warning prompt to the user.
In summary, each instance containing a problem index is identified as a target instance, and the preset id of the target instance, together with the fault type and fault analysis corresponding to the problem index it contains, is sent to the user, which helps the user troubleshoot the problem quickly within a limited time.
The present application further provides a computer-readable storage medium including a stored program, wherein the program executes the foregoing Redis-based monitoring method provided by the present application.
The application also provides a monitoring device based on Redis, including: a processor, a memory, and a bus. The processor is connected with the memory through a bus, the memory is used for storing programs, and the processor is used for running the programs, wherein when the programs run, the Redis-based monitoring method provided by the application is executed, and the method comprises the following steps:
carrying out big data analysis on sample operation data of each instance in Redis to obtain an index set, and fault types and fault analyses corresponding to each index in the index set;
configuring a monitoring period, a value range and an importance degree corresponding to each index;
after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index, and acquiring the value of each index in the index set of each instance;
for each of the instances, identifying the instance as a target instance if the index set of the instance contains a problem index, where a problem index is an index whose value is not within the value range corresponding to that index and whose importance degree is greater than a preset value;
and generating an alarm prompt based on the preset id of each target instance, the fault type corresponding to the problem index contained in each target instance and fault analysis, and sending the alarm prompt to a user.
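The final step above, assembling the alarm prompt, might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_alert`, the `FAULT_KNOWLEDGE` table, the message layout, and the fault type and fault analysis strings are all hypothetical. Only the index name `connected_clients` is a real field of the Redis INFO reply.

```python
# Hypothetical knowledge table: index name -> (fault type, fault analysis).
# In the described method these pairs would come from the big data analysis step.
FAULT_KNOWLEDGE = {
    "connected_clients": (
        "connection overload",
        "Client count is outside the configured range; check for connection leaks.",
    ),
}


def build_alert(preset_id, problem_names, knowledge):
    """Build one alarm prompt for a target instance.

    preset_id     -- the preset id of the target instance
    problem_names -- names of the problem indices found on that instance
    knowledge     -- mapping of index name -> (fault type, fault analysis)
    """
    lines = [f"[Redis alert] instance={preset_id}"]
    for name in problem_names:
        # Fall back to placeholders when no analysis was recorded for an index.
        fault_type, analysis = knowledge.get(name, ("unknown", "no analysis recorded"))
        lines.append(f"- index={name}; fault type={fault_type}; analysis={analysis}")
    return "\n".join(lines)


alert = build_alert("cache-b", ["connected_clients"], FAULT_KNOWLEDGE)
print(alert)
```

Keeping the fault type and fault analysis in a lookup table keyed by index name means the prompt can be produced without re-running any analysis at alert time.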
Specifically, on the basis of the above embodiment, the performing big data analysis on the sample operation data of each instance in the Redis to obtain an index set, and a fault type and a fault analysis corresponding to each index in the index set includes:
grouping each instance of Redis in advance to obtain a global instance set and a special instance set; the set of global instances comprises a plurality of global instances; the special instance set comprises a plurality of special instances, and the service processing capacity of the global instance is not greater than a preset threshold value; the service processing capacity of the special instance is larger than the preset threshold value;
carrying out big data analysis on the sample operation data of each global instance to obtain a first index set, and fault types and fault analysis corresponding to each index in the first index set;
and carrying out big data analysis on the sample operation data of each special instance to obtain a second index set, and fault types and fault analysis corresponding to each index in the second index set.
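The grouping rule above can be sketched in a few lines. The sketch assumes that service processing capacity is available as a single number per instance; the field name `ops_per_sec`, the `Instance` class, the sample ids, and the threshold of 5000 are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    preset_id: str       # preset id of the Redis instance
    ops_per_sec: float   # hypothetical measure of service processing capacity


def group_instances(instances, threshold):
    """Split instances into (global_set, special_set).

    Global instances have capacity not greater than the threshold;
    special instances have capacity greater than the threshold.
    """
    global_set = [i for i in instances if i.ops_per_sec <= threshold]
    special_set = [i for i in instances if i.ops_per_sec > threshold]
    return global_set, special_set


instances = [
    Instance("cache-a", 1_200.0),
    Instance("cache-b", 48_000.0),
    Instance("cache-c", 5_000.0),  # exactly at the threshold, so it stays global
]
global_set, special_set = group_instances(instances, threshold=5_000.0)
print([i.preset_id for i in global_set])
print([i.preset_id for i in special_set])
```

Because the two conditions are complementary, every instance lands in exactly one set.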
Specifically, on the basis of the above embodiment, the configuring the monitoring period, the value range, and the importance degree corresponding to each index includes:
configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;
and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
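One possible way to hold the per-index configuration is a small record per index, with one table for each index set. The names `IndexConfig`, `FIRST_INDEX_SET`, `SECOND_INDEX_SET`, and `config_for` are hypothetical, and every number is an illustrative assumption. The index names themselves are real fields of the Redis INFO reply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    period_s: int     # monitoring period, in seconds
    low: float        # lower bound of the acceptable value range
    high: float       # upper bound of the acceptable value range
    importance: int   # importance degree; larger means more important


# Configuration for global instances (first index set); values are assumptions.
FIRST_INDEX_SET = {
    "connected_clients": IndexConfig(60, 0, 5_000, 3),
    "mem_fragmentation_ratio": IndexConfig(300, 1.0, 1.5, 2),
}

# Configuration for special instances (second index set); values are assumptions.
SECOND_INDEX_SET = {
    "connected_clients": IndexConfig(10, 0, 20_000, 5),
    "instantaneous_ops_per_sec": IndexConfig(10, 0, 100_000, 4),
}


def config_for(index_name, is_special):
    """Return the IndexConfig for an index, or None if the set does not track it."""
    table = SECOND_INDEX_SET if is_special else FIRST_INDEX_SET
    return table.get(index_name)


print(config_for("connected_clients", is_special=True))
```

Separate tables let the same index carry a shorter period and a wider range on special instances than on global ones.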
Specifically, on the basis of the above embodiment, the calling, after Redis starts running, of a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance includes:
after Redis starts running, calling the preset Info command according to the monitoring period corresponding to each index in the first index set, and acquiring the value of each index in the first index set of each global instance;
and calling the Info command according to the monitoring period corresponding to each index in the second index set, and acquiring the value of each index in the second index set of each special instance.
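The acquisition step can be illustrated without a live server. The Redis INFO command replies with text made of `# Section` headers and `field:value` lines. The sketch below parses such a reply and decides which indices are due according to their monitoring periods. The helper names `parse_info` and `due_indices`, the canned reply, and the periods are hypothetical; in practice a client library would supply the reply (for example, redis-py's `info()` method returns an already parsed dictionary).

```python
def parse_info(text):
    """Parse an INFO reply into a dict of field name -> raw string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Section" headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def due_indices(periods, last_polled, now):
    """Return the sorted names of indices whose monitoring period has elapsed.

    periods     -- index name -> monitoring period in seconds
    last_polled -- index name -> time of the previous poll (absent = never polled)
    now         -- current time, in the same units as last_polled
    """
    return sorted(
        name
        for name, period in periods.items()
        if now - last_polled.get(name, float("-inf")) >= period
    )


# Canned reply standing in for a real server response.
SAMPLE_INFO = (
    "# Clients\r\nconnected_clients:42\r\nblocked_clients:0\r\n"
    "# Memory\r\nused_memory:1048576\r\n"
)
values = parse_info(SAMPLE_INFO)

periods = {"connected_clients": 10, "used_memory": 60}
last_polled = {"connected_clients": 100.0, "used_memory": 100.0}
due = due_indices(periods, last_polled, now=115.0)
print(due, {name: values[name] for name in due})
```

A single Info call returns every field at once, so the per-index period only determines which values are read from the reply on a given tick.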
Specifically, on the basis of the foregoing embodiment, the identifying, for each of the instances, of the instance as a target instance if the index set of the instance contains a problem index includes:
for each of the global instances, identifying the global instance as a target instance if the first index set of the global instance contains a problem index;
for each of the special instances, identifying the special instance as a target instance if the second index set of the special instance contains a problem index.
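The identification rule combines two conditions: the value lies outside the configured range, and the importance degree exceeds a preset value. A minimal sketch follows, assuming each index is configured as a `(low, high, importance)` tuple; the function names, instance ids, observed values, and thresholds are hypothetical.

```python
def problem_indices(values, configs, min_importance):
    """Return names of indices that are out of range AND important enough.

    values         -- index name -> observed numeric value
    configs        -- index name -> (low, high, importance)
    min_importance -- preset value the importance degree must exceed
    """
    problems = []
    for name, (low, high, importance) in configs.items():
        if name not in values:
            continue  # index was not collected for this instance
        out_of_range = not (low <= values[name] <= high)
        if out_of_range and importance > min_importance:
            problems.append(name)
    return problems


def target_instances(observed, configs, min_importance):
    """Map each target instance id to the problem indices it contains."""
    targets = {}
    for instance_id, values in observed.items():
        problems = problem_indices(values, configs, min_importance)
        if problems:
            targets[instance_id] = problems
    return targets


configs = {
    "connected_clients": (0, 5_000, 3),
    "mem_fragmentation_ratio": (1.0, 1.5, 1),
}
observed = {
    "cache-a": {"connected_clients": 120, "mem_fragmentation_ratio": 1.2},
    "cache-b": {"connected_clients": 9_000, "mem_fragmentation_ratio": 2.4},
}
targets = target_instances(observed, configs, min_importance=2)
print(targets)
```

In this sample, `mem_fragmentation_ratio` on `cache-b` is out of range but its importance degree does not exceed the preset value, so only `connected_clients` is reported for that instance.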
The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A monitoring method based on Redis is characterized by comprising the following steps:
carrying out big data analysis on sample operation data of each instance in Redis to obtain an index set, and fault types and fault analyses corresponding to each index in the index set;
configuring a monitoring period, a value range and an importance degree corresponding to each index;
after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index, and acquiring the value of each index in the index set of each instance;
for each of the instances, identifying the instance as a target instance if the index set of the instance contains a problem index, the problem index being an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value;
and generating an alarm prompt based on the preset id of each target instance, the fault type corresponding to the problem index contained in each target instance and fault analysis, and sending the alarm prompt to a user.
2. The method according to claim 1, wherein the performing big data analysis on the sample operation data of each instance in Redis to obtain an index set, and a fault type and a fault analysis corresponding to each index in the index set comprises:
grouping each instance of Redis in advance to obtain a global instance set and a special instance set; the set of global instances comprises a plurality of global instances; the special instance set comprises a plurality of special instances, and the service processing capacity of the global instance is not greater than a preset threshold value; the service processing capacity of the special instance is larger than the preset threshold value;
carrying out big data analysis on the sample operation data of each global instance to obtain a first index set, and fault types and fault analysis corresponding to each index in the first index set;
and carrying out big data analysis on the sample operation data of each special instance to obtain a second index set, and fault types and fault analysis corresponding to each index in the second index set.
3. The method of claim 2, wherein the configuring the monitoring period, the value range, and the importance degree corresponding to each index comprises:
configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;
and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
4. The method according to claim 3, wherein the calling, after Redis starts running, of a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance comprises:
after Redis starts running, calling the preset Info command according to the monitoring period corresponding to each index in the first index set, and acquiring the value of each index in the first index set of each global instance;
and calling the Info command according to the monitoring period corresponding to each index in the second index set, and acquiring the value of each index in the second index set of each special instance.
5. The method of claim 4, wherein the identifying, for each of the instances, of the instance as a target instance if the index set of the instance contains a problem index comprises:
for each of the global instances, identifying the global instance as a target instance if the first index set of the global instance contains a problem index;
for each of the special instances, identifying the special instance as a target instance if the second index set of the special instance contains a problem index.
6. A Redis-based monitoring device, comprising:
the analysis unit is used for carrying out big data analysis on the sample operation data of each instance in Redis to obtain an index set and fault types and fault analysis corresponding to each index in the index set;
the configuration unit is used for configuring the monitoring period, the value range and the importance degree corresponding to each index;
the obtaining unit is used for calling a preset Info command according to the monitoring period corresponding to each index after Redis starts running, and obtaining the value of each index in the index set of each instance;
an identification unit, configured to identify, for each of the instances, the instance as a target instance if the index set of the instance contains a problem index, the problem index being an index whose value is not within the value range corresponding to the index and whose importance degree is greater than a preset value;
and the warning unit is used for generating a warning prompt based on the preset id of each target instance, the fault type corresponding to the problem index contained in each target instance and fault analysis, and sending the warning prompt to a user.
7. The apparatus according to claim 6, wherein the analysis unit is specifically configured to:
grouping each instance of Redis in advance to obtain a global instance set and a special instance set; the set of global instances comprises a plurality of global instances; the special instance set comprises a plurality of special instances, and the service processing capacity of the global instance is not greater than a preset threshold value; the service processing capacity of the special instance is larger than the preset threshold value;
carrying out big data analysis on the sample operation data of each global instance to obtain a first index set, and fault types and fault analysis corresponding to each index in the first index set;
and carrying out big data analysis on the sample operation data of each special instance to obtain a second index set, and fault types and fault analysis corresponding to each index in the second index set.
8. The apparatus according to claim 7, wherein the configuration unit is specifically configured to:
configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;
and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the Redis-based monitoring method of any of claims 1-5.
10. A Redis-based monitoring device, comprising: a processor, memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the Redis-based monitoring method according to any one of claims 1 to 5.
CN202210519248.3A 2022-05-13 2022-05-13 Redis-based monitoring method and device, storage medium and equipment Active CN114637656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519248.3A CN114637656B (en) 2022-05-13 2022-05-13 Redis-based monitoring method and device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210519248.3A CN114637656B (en) 2022-05-13 2022-05-13 Redis-based monitoring method and device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN114637656A CN114637656A (en) 2022-06-17
CN114637656B true CN114637656B (en) 2022-09-20

Family

ID=81952778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519248.3A Active CN114637656B (en) 2022-05-13 2022-05-13 Redis-based monitoring method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114637656B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880158A (en) * 2022-07-11 2022-08-09 飞狐信息技术(天津)有限公司 Redis instance diagnosis method and device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102740247A (en) * 2011-04-15 2012-10-17 中国移动通信集团山东有限公司 Method and device for generating warning message
CN106649040A (en) * 2016-12-26 2017-05-10 上海新炬网络信息技术有限公司 Automatic monitoring method and device for performance of Weblogic middleware
CN107992398A (en) * 2017-12-22 2018-05-04 宜人恒业科技发展(北京)有限公司 The monitoring method and monitoring system of a kind of operation system
CN110019503A (en) * 2017-09-01 2019-07-16 北京京东尚科信息技术有限公司 The dilatation of Redis cluster and/or the method and device of capacity reducing
CN111338901A (en) * 2020-02-26 2020-06-26 平安壹钱包电子商务有限公司 Redis monitoring method, Redis monitoring device and terminal
CN111459761A (en) * 2020-04-01 2020-07-28 广州虎牙科技有限公司 Redis configuration method, device, storage medium and equipment
CN112131090A (en) * 2020-09-30 2020-12-25 北京北信源软件股份有限公司 Business system performance monitoring method and device, equipment and medium
CN112148733A (en) * 2020-09-15 2020-12-29 珠海格力电器股份有限公司 Method, device, electronic device and computer readable medium for determining fault type
CN112948451A (en) * 2021-03-02 2021-06-11 中国建设银行股份有限公司 Fault detection method of intelligent operation and maintenance system, related device and storage medium
CN113542068A (en) * 2021-07-15 2021-10-22 中国银行股份有限公司 Redis multi-instance monitoring system and method
CN113656287A (en) * 2021-07-28 2021-11-16 北京宝兰德软件股份有限公司 Method and device for predicting software instance fault, electronic equipment and storage medium
CN114090644A (en) * 2022-01-20 2022-02-25 飞狐信息技术(天津)有限公司 Data processing method and device
CN114116391A (en) * 2021-11-19 2022-03-01 天翼数字生活科技有限公司 Redis instance health detection method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210216351A1 (en) * 2020-01-15 2021-07-15 Purdue Research Foundation System and methods for heterogeneous configuration optimization for distributed servers in the cloud
US11880271B2 (en) * 2020-03-27 2024-01-23 VMware LLC Automated methods and systems that facilitate root cause analysis of distributed-application operational problems and failures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"App-Centric and Environment-Aware Monitoring and Diagnosis in the Cloud";Tiago Carvalho et al.;《IEEE ICC 2017 SAC Symposium Cloud Communications and Networking Track》;20171231;1-7页 *
"如何完善Redis监控告警?";码农小胖哥;《CSDN博客(https://blog.csdn.net/qq_35067322/article/details/114109371)》;20210225;全文 *

Also Published As

Publication number Publication date
CN114637656A (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant