Detailed Description
Existing methods for monitoring service calls need to check and analyze the log records of the service system one by one. The volume of data to be processed is therefore huge, a large amount of time and resources is consumed, and abnormal alarms are difficult to raise in time. In the embodiments of this specification, a plurality of service call data under a preset service scene are obtained, each service call data comprising parameter data of a plurality of dimensions; the plurality of service call data are clustered through a pre-trained clustering algorithm to obtain a target clustering result; and, for each class cluster in the target clustering result, a characteristic dimension of the class cluster is determined from the plurality of dimensions based on the parameter data of each service call data in the class cluster under each dimension. The characteristic dimension determined for each class cluster can be used as a service calling instance in the service scene and used for monitoring abnormal calling in the service scene.
In the embodiment of the present specification, the service invocation data collected by the service system is learned through a clustering algorithm, and a feature dimension, i.e., a service invocation instance, representing the unique invocation form of similar service invocation data is extracted, so that the service processing of the service system can be monitored according to the feature dimension, and the position where an abnormal invocation occurs can be located according to the feature dimension.
For example, in one application scenario, the feature dimensions extracted for a certain class of clusters resulting from the clustering algorithm include dimension W1, dimension W2, and dimension W3. At this time, the implementation process of monitoring the abnormal call according to the above feature dimensions, i.e., the dimension W1, the dimension W2, and the dimension W3, may include:
Multiple groups of normal service calling data in the service scene are obtained in advance, wherein each group of normal service calling data comprises multiple service calling data, and the parameter distribution data corresponding to the dimension W1, the dimension W2 and the dimension W3 are analyzed in each group of normal service calling data respectively. For example, the dimension W3 is a dimension corresponding to a payment method, and the parameter data corresponding to the dimension W3 includes a payment method 1, a payment method 2, and a payment method 3. In a certain group of normal service invocation data, the numbers of service invocation data corresponding to the payment method 1, the payment method 2, and the payment method 3 are a, b, and c respectively. The proportion of the number of service invocation data of each payment method in the group is calculated, that is, a/(a+b+c), b/(a+b+c), and c/(a+b+c), and the proportion of each payment method is taken as the parameter distribution data corresponding to the dimension W3.
The distribution data of each parameter data corresponding to the characteristic dimension in each group of normal service call data are then aggregated to obtain the normal proportion threshold range corresponding to the characteristic dimension. If the parameter distribution data corresponding to the feature dimension in the service call data to be analyzed in the service scene does not satisfy the normal proportion threshold range, it can be determined that the call of the service call data in the service scene is abnormal in the feature dimension.
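The monitoring process described above can be illustrated by the following minimal sketch. It is only one possible reading: the record layout (a dict per service call), the function names, and in particular the rule that the normal range is the observed minimum and maximum proportion widened by a `margin` are assumptions made for illustration and are not prescribed by this specification.

```python
from collections import Counter


def distribution(records, dimension):
    """Share of each parameter value of `dimension` within one group of call records."""
    counts = Counter(r[dimension] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def normal_ranges(normal_groups, dimension, margin=0.05):
    """Per-value (low, high) proportion range seen across the normal groups.

    Assumption: the range is [min - margin, max + margin], clipped to [0, 1].
    """
    dists = [distribution(group, dimension) for group in normal_groups]
    values = set().union(*dists)
    ranges = {}
    for value in values:
        shares = [d.get(value, 0.0) for d in dists]
        ranges[value] = (max(0.0, min(shares) - margin),
                         min(1.0, max(shares) + margin))
    return ranges


def abnormal_values(records, dimension, ranges):
    """Parameter values whose share in `records` falls outside the normal range."""
    dist = distribution(records, dimension)
    flagged = []
    for value in set(dist) | set(ranges):
        low, high = ranges.get(value, (0.0, 0.0))  # unseen values are abnormal
        if not (low <= dist.get(value, 0.0) <= high):
            flagged.append(value)
    return sorted(flagged)
```

A non-empty return value from `abnormal_values` would correspond to the case in which the parameter distribution of the feature dimension does not satisfy the normal proportion threshold range.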
Compared with analyzing the call logs one by one, monitoring abnormal calls of the service system through the characteristic dimension obtained by the service data processing method provided by the embodiment of the specification analyzes the service call data at the level of the characteristic dimension, so that a large amount of time and system resources can be saved. Moreover, the abnormality can be further located to a specific characteristic dimension, which helps narrow the range of troubleshooting and reduces the resource consumption of the computer.
Fig. 1 is a schematic diagram illustrating an operating environment suitable for a service data processing method provided in an embodiment of the present specification. As shown in fig. 1, the service data processing method provided by the embodiment of the present specification may be applied to a system architecture including a plurality of servers. Some of the servers may be used as a service system to perform specific service processing; the other part of the servers can be used as monitoring servers for executing the service data processing method provided by the embodiment of the description, and abstracting the similar service calling data of the service system into the service calling instance, so as to further monitor the service processing exception of the service system through the service calling instance.
The server can be an electronic device with data operation, storage, and network interaction functions, or it can be software that runs in such an electronic device to support data processing, storage, and network interaction. The number of servers is not particularly limited in this embodiment. The server may be one server, several servers, or a server cluster formed by several servers.
In order to better understand the service data processing method provided by the embodiments of the present specification, the technical solutions of the embodiments of the present specification are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present specification and the specific features in the embodiments are detailed descriptions of the technical solutions of the present specification, not limitations of them, and that, in a case of no conflict, the embodiments and the technical features in the embodiments may be combined with each other. In the embodiments of the present specification, the term "a plurality" means "two or more", that is, exactly two or more than two.
In a first aspect, an embodiment of the present specification provides a service data processing method, which may be executed by the monitoring server. As shown in fig. 2, the method may include at least the following steps S200 to S204.
Step S200, acquiring a plurality of service call data under a preset service scene, wherein each service call data comprises parameter data of a plurality of dimensions.
In this embodiment, the service system may provide a plurality of interfaces, and by calling these interfaces, the corresponding service may be executed. The interface of the service system receives the processing request initiated by the calling party at every moment to execute the corresponding service, but the service calling data may be different each time a certain interface is called. The service invocation data may be specifically understood as parameter data used for characterizing an invocation process in service data processing, including request parameters, return parameters, and data describing information of a specific invocation flow in the invocation process. The specific calling process involved in the service data processing can be completely and clearly restored through the service calling data.
Each service invocation data includes parameter data for multiple dimensions. For example, the service invocation data may include: an interface, an interface request parameter, an interface return parameter, a request magnitude, a directed acyclic structure of system internal nodes, a called upstream system, a called downstream system, a called deployment unit, and the like. The directed acyclic structure of the system internal nodes may be specifically understood as the call sequence of the calls involved in a call flow and the functional modules called in the call flow. The upstream system may be specifically understood as a callee of any specific call in the calling process. The downstream system may be specifically understood as a caller of any specific call in the call process. The deployment unit may specifically refer to a structural unit or a functional module where the calling and called functional modules involved in the call flow are deployed.
It should be noted that the above-mentioned dimensions are only provided for better illustration of the embodiments of the present disclosure. In specific implementation, other types of service invocation parameters may be introduced as dimensions of the service invocation data according to specific situations, which is not limited in this specification.
In an actual application scene, the corresponding service calling data for each calling in a corresponding service scene can be obtained by collecting and analyzing the interface calling log of the service system. Specifically, a time period may be configured in advance, and a plurality of service invocation data in the preset time period may be acquired as sample data for extracting the feature dimension. The preset time period may be set according to an actual application scenario and a processing requirement, for example, the preset time period may be set to a latest time period, such as a previous hour, a previous 10 minutes, or a previous 1 minute, and the like, which is not limited in this embodiment of the specification.
In this embodiment, the preset service scenario may be one service scenario, or may also include a plurality of different service scenarios. The service scenario may be specifically understood as service processing corresponding to a calling process. In one application scenario, the business system may distinguish between different business scenarios by application name (Appname) and interface. Different service scenarios are obtained when the application names and/or interface numbers are different, the calling processes of different service scenarios correspond to different service processes, and the corresponding service calling data have different dimensions.
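As a minimal sketch of distinguishing service scenarios by application name and interface, the collected records can be grouped by that pair before any clustering is performed. The field names `app_name` and `interface` are hypothetical and are used only for illustration.

```python
from collections import defaultdict


def split_by_scenario(call_records):
    """Group call records by (application name, interface).

    Assumption: each record is a dict carrying hypothetical keys
    'app_name' and 'interface'; every resulting group is one service scenario.
    """
    scenarios = defaultdict(list)
    for record in call_records:
        key = (record["app_name"], record["interface"])
        scenarios[key].append(record)
    return dict(scenarios)
```

Each resulting group can then be processed on its own, since records within one scenario share the same dimensions.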
For example, a service system of a certain network platform includes two service scenarios, where the application name corresponding to one service scenario is A1 and the interface is B1, and the application name corresponding to the other service scenario is A2 and the interface is also B1. Table 1 shows M service invocation data of the application A1 at the interface B1, where each service invocation data includes q dimensions. In each service invocation data in this service scenario, X_ij represents the parameter data in the respective dimension, where i is an integer from 1 to q and j is an integer from 1 to M. For example, if a dimension is "city", the corresponding parameter data is a city name or a feature code indicating the city name. Similarly, Table 2 shows N service invocation data of the application A2 at the interface B1, where each service invocation data includes p dimensions. In each service invocation data in this service scenario, Y_gh represents the parameter data in the respective dimension, where g is an integer from 1 to p and h is an integer from 1 to N. It will be appreciated that, between two different service scenarios, at least one of the dimensions included in the service invocation data is different, and the numbers of dimensions included in the service invocation data, i.e., q and p, may be the same or different.
TABLE 1
TABLE 2
Step S202, clustering the plurality of service calling data based on a pre-trained clustering algorithm to obtain a target clustering result.
It is understood that a clustering algorithm is an algorithm that classifies samples based on similarity between the samples. In an embodiment of the present specification, the pre-trained clustering algorithm may be an algorithm for clustering based on a distance between samples. At this time, the similarity between the service invocation data to be learned can be characterized by the distance between the service invocation data, and the smaller the distance, the higher the similarity.
Since each dimension included in the service invocation data is a parameter related to the service invocation process, such as a request parameter, a return parameter, or the like, or a call structure, such as an upstream system, a downstream system, a deployment unit, or the like, it is necessary to calculate the distance between two service invocation data by respectively comparing whether the parameter data of the two service invocation data in the same dimension are the same. As an optional implementation manner, in the process of clustering the service invocation data obtained in step S200 by using the trained clustering algorithm, the distance between any two service invocation data may be obtained by the following steps: detecting whether parameter data of two service calling data under the same dimension are the same or not to obtain a detection result of each dimension; and obtaining the distance between the two service call data based on the detection result of each dimension.
It can be understood that the service call data in the same service scenario contain the same dimensions. When the distance between two service call data is calculated, it is necessary to determine, for each dimension, whether the parameter data of the two service call data are the same. For example, in a certain service scenario, one dimension of the service call data is the city where the user initiates the call; if the parameter data of the two service call data in this dimension are the same city name, the parameter data of the two service call data in this dimension are the same.
Specifically, the implementation process of obtaining the distance between the two service invocation data based on the detection result of each dimension may include: obtaining the total number of dimensions contained in the service calling data in the service scene, then obtaining the number of dimensions with different parameter data between the two service calling data according to the detection result of each dimension, and taking the ratio of the number of the dimensions with different parameter data in the total number of the dimensions as the distance between the two service calling data. Wherein the distance is a value greater than or equal to 0 and less than or equal to 1. Of course, in other embodiments of the present specification, the distance between any two service invocation data may be calculated in other manners, which is not limited herein.
As another embodiment, how alike two service invocation data are may also be characterized directly by a similarity value: the greater the similarity value, the more alike the two service invocation data. In this case, the number of dimensions with the same parameter data between the two service invocation data can be obtained according to the detection result, and the ratio of the number of dimensions with the same parameter data to the total number of dimensions is taken as the similarity between the two service invocation data. The similarity is also a value greater than or equal to 0 and less than or equal to 1.
For example, assuming that the service invocation data in the service scenario includes 10-dimensional parameter data, when calculating the similarity or distance between two service invocation data, the two service invocation data have the same parameter data in 3 dimensions and different parameter data in 7 dimensions, and then the similarity between the two service invocation data may be: 3/10=0.3, the distance may be 0.7.
In the above process, in order to record which dimensions between two service invocation data have the same parameter data and which dimensions have different parameter data, the detection result may be characterized by a preset characteristic value. For example, in an application scenario, if parameter data of two service invocation data in a certain dimension are the same, for example, if the parameter data of the two service invocation data are the same in an upstream system, a detection result in the dimension may be recorded as 1, and otherwise, the detection result may be recorded as 0. At this time, the number of dimensions with a detection result of 1 is the number of dimensions with the same parameter data between the two service invocation data, and the number of dimensions with a detection result of 0 is the number of dimensions with different parameter data between the two service invocation data.
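The distance and similarity calculations described above can be sketched as follows. The record layout (a dict mapping dimension names to parameter data) and the function names are assumptions made for illustration; the detection result uses 1 for "same" and 0 for "different", as in the example above.

```python
def detection_results(a, b, dimensions):
    """Per-dimension detection result: 1 if the parameter data match, else 0."""
    return [1 if a[d] == b[d] else 0 for d in dimensions]


def similarity(a, b, dimensions):
    """Share of dimensions whose parameter data are the same (value in [0, 1])."""
    results = detection_results(a, b, dimensions)
    return results.count(1) / len(dimensions)


def distance(a, b, dimensions):
    """Share of dimensions whose parameter data differ (value in [0, 1])."""
    results = detection_results(a, b, dimensions)
    return results.count(0) / len(dimensions)
```

With 10 dimensions of which 3 match, `similarity` returns 0.3 and `distance` returns 0.7, in line with the worked example above.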
Specifically, in the above step S202, a density-based clustering algorithm may be employed, such as the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, which can divide regions having a sufficient density into clusters and find clusters of arbitrary shape in a spatial database containing noise. Of course, in other embodiments of the present disclosure, other clustering algorithms such as k-means, hierarchical clustering, manifold clustering, etc. may be used.
In addition, before the step S202 is executed, a trained clustering algorithm needs to be obtained in advance. In this embodiment, the process of training the clustering algorithm may specifically include a parameter training process. Specifically, the parameter training process may include the following steps S300 and S302.
Step S300, a training sample set is obtained, wherein the training sample set comprises a plurality of service calling data samples;
it should be noted that, when algorithm training is performed on a specific service scenario, all service invocation data samples in the training sample set are collected from the specific service scenario, and the dimensions included in all the service invocation data samples in the training sample set are the same. In addition, when algorithm training is performed on a plurality of specific service scenes simultaneously, the service call data samples in the training sample set are respectively acquired from the plurality of service scenes, and at this time, the service call data samples acquired from the same service scene in the training sample set can be divided into subsets, so that the service call data samples in the same subset have the same dimension, and thus training can be performed on each subset respectively to obtain a trained clustering algorithm corresponding to each service scene.
Step S302, clustering is carried out on the training sample set based on a preset clustering algorithm, when a clustering result does not meet a preset aggregation condition, configuration parameters in the preset clustering algorithm are adjusted according to a preset rule until the clustering result meets the preset aggregation condition, and the trained clustering algorithm is obtained.
Taking the DBSCAN algorithm as an example, when the distance is used to measure the similarity between samples, the configuration parameters of the DBSCAN algorithm include: radius and minimum neighborhood point number. It should be noted that, in the process of training the preset clustering algorithm, the initial radius and the minimum neighborhood point number may be set according to experience. At this time, the process of clustering the training sample set based on the preset clustering algorithm to obtain a clustering result may include: acquiring the distance between any two service call data samples in the training sample set; further, determining all core objects in a training sample set based on preset configuration parameters, namely the initial radius, the minimum neighborhood point number and the distance between any two service call data samples; and acquiring a direct density reachable sample of each core object in the training sample set, and acquiring a clustering result based on the direct density reachable sample of each core object.
For example, in one particular application scenario, the radius is denoted as E, and the minimum number of neighborhood points that a sample must have within its E neighborhood in order to become a core object is denoted as MinPts. It should be noted that, in this embodiment, the area within the radius E of a given object is referred to as the E neighborhood of the object.
In the process of clustering by using the DBSCAN algorithm, the process of detecting whether each service invocation data sample in the sample set is a core object may specifically include: traversing the service call data samples in the sample set, taking any one of the service call data samples as a target sample, detecting the number of samples, of which the distance from the target sample is smaller than or equal to the radius E, in other samples except the target sample in the sample set, namely the number of samples in the E neighborhood of the target sample, when the number of samples in the E neighborhood of the target sample is larger than or equal to MinPts, judging that the target sample is a core object, otherwise, judging that the target sample is not the core object. And then taking the next service call data sample as a target sample until the traversal is finished.
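The traversal described above can be sketched as follows, assuming the pairwise distances have already been computed into a square matrix `dist` (a list of lists). The function name and the matrix representation are illustrative assumptions.

```python
def find_core_objects(dist, eps, min_pts):
    """Indices of samples that are core objects.

    A sample is a core object when at least `min_pts` OTHER samples lie
    within distance `eps` of it (the target sample itself is not counted).
    """
    n = len(dist)
    core = []
    for i in range(n):  # take each sample in turn as the target sample
        in_neighbourhood = sum(
            1 for j in range(n) if j != i and dist[i][j] <= eps
        )
        if in_neighbourhood >= min_pts:
            core.append(i)
    return core
```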
After all the core objects in the sample set are determined, all the directly density-reachable samples in the E neighborhood of each core object need to be further determined, and density-connected sample sets are then found from the directly density-reachable samples in the E neighborhoods of all the core objects. Of course, some merging of density-reachable samples is involved in this process. Note that, given a sample set D with x_i and x_j both belonging to D, if x_i is in the E neighborhood of x_j and x_j is a core object, sample x_i is said to be directly density-reachable from sample x_j. Density reachability is the transitive closure of direct density reachability, and this relationship is asymmetric: only core objects are mutually density-reachable. Density connectivity, by contrast, is a symmetric relation, and the purpose of the DBSCAN algorithm is to find the maximal sets of density-connected objects.
For example, there are 12 core objects in the sample set, denoted P1 to P12. P2 is directly density-reachable from P1, P3 from P2, P4 from P3, P5 from P4, P6 from P5, and P7 from P6; P9 is directly density-reachable from P8, P10 from P9, P11 from P10, and P12 from P11. At this time, two density-connected sample sets can be obtained: P1 to P7, together with the directly density-reachable objects of each core object in P1 to P7, are combined into one density-connected sample set, and P8 to P12, together with the directly density-reachable objects of each core object in P8 to P12, are combined into the other density-connected sample set. Each density-connected sample set is a class cluster in the clustering result.
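The merging of core objects into density-connected sample sets can be sketched with a small union-find structure. This is an illustrative sketch only: the function name, the integer sample identifiers, and the `reachable` mapping (core object to the set of samples directly density-reachable from it) are assumptions. Unlike a typical DBSCAN implementation, a non-core sample reachable from core objects of two different sets would appear in both sets here.

```python
def density_connected_sets(core, reachable):
    """Merge core objects linked by direct density reachability into sets.

    core: list of core object ids.
    reachable: dict mapping a core id to the ids directly reachable from it.
    Returns a list of sorted member lists, one per density-connected set.
    """
    parent = {c: c for c in core}

    def find(x):
        # Follow parent links to the set representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for c in core:
        for s in reachable.get(c, ()):
            if s in parent:  # only core-to-core links merge two sets
                parent[find(c)] = find(s)

    clusters = {}
    for c in core:
        members = clusters.setdefault(find(c), set())
        members.add(c)
        members.update(reachable.get(c, ()))  # attach non-core samples too
    return sorted((sorted(m) for m in clusters.values()), key=lambda m: m[0])


# The 12 core objects of the example, numbered 1..12, in two chains.
core = list(range(1, 13))
reachable = {i: {i + 1} for i in range(1, 7)}
reachable.update({i: {i + 1} for i in range(8, 12)})
print(density_connected_sets(core, reachable))
```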
In the training process, after a clustering result is obtained by each iteration, whether the clustering result meets a preset aggregation condition needs to be judged, when the clustering result does not meet the preset aggregation condition, configuration parameters in the preset clustering algorithm are adjusted according to a preset rule, and the next algorithm iteration process is carried out based on the adjusted configuration parameters until the clustering result meets the preset aggregation condition.
In this embodiment, the preset aggregation condition may be specifically set according to the actual application requirement. For example, a required proportion of clustered service invocation data samples in the total number of samples may be set; correspondingly, a permitted proportion of noise points in the clustering result may also be set, where the proportion of noise points is the proportion of service invocation data samples not included in any class cluster in the total number of samples of the training sample set. For another example, in addition to the above requirements, the preset aggregation condition may further include a requirement on the number of class clusters generated by clustering, such as determining whether the number of class clusters included in the clustering result is within a preset number range.
In one embodiment, the parameter adjustment process may include: acquiring the ratio of the number of the service call data samples in the clustering result to the total number of the samples in the training sample set; and when the ratio is smaller than a first preset threshold value, adjusting the configuration parameters according to a preset rule, clustering the training sample set based on the clustering algorithm after the configuration parameters are adjusted until the ratio is larger than or equal to the first preset threshold value, and obtaining the trained clustering algorithm.
In the implementation process, the number of the service call data samples in the clustering result is the sum of the number of the samples contained in each cluster in the clustering result, and the obtained ratio can reflect the aggregation degree of the clustering algorithm under the corresponding configuration parameters. The first preset threshold may be set according to an actual application scenario and a processing requirement, and for example, may be set to 0.9 or 0.95. For example, assuming that the training sample set includes 1000 samples, and the sum of the number of samples included in each cluster in the clustering result is 910, the ratio is: 910/1000=0.91, if the first preset threshold is 0.9, it indicates that the clustering result satisfies the preset aggregation condition.
In addition, in the training process, the process of adjusting the configuration parameters according to the preset rule may specifically include: adjusting the radius and/or the minimum neighborhood point number according to a preset rule until the target radius and the target minimum neighborhood point number are obtained, so that the clustering result of the iteration meets the preset aggregation condition. Specifically, the preset rule may be set according to the preset aggregation condition and multiple tests corresponding to the actual application scenario. It can be understood that, if the radius is larger, the number of service call data samples included in each generated class cluster is larger and the corresponding number of class clusters is smaller, and vice versa. If MinPts becomes smaller, more class clusters can be formed, and vice versa. For example, a first step size and a second step size may be set, where the first step size is the adjustment step size of the radius and the second step size is the adjustment step size of the minimum neighborhood point number. In the implementation process, when the ratio of the number of service call data samples included in the class clusters of the clustering result to the total number of samples in the training sample set is smaller than the first preset threshold, the radius may be adjusted by the first step size and/or the minimum neighborhood point number may be adjusted by the second step size on the basis of the current radius and minimum neighborhood point number; for example, the radius may be increased and/or the minimum neighborhood point number may be reduced, so that more samples fall into class clusters. The specific adjustment rule is set according to the preset aggregation condition and multiple tests.
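The parameter training loop can be sketched as follows. This is a simplified illustration and not a definitive implementation: the compact `dbscan` function works on a precomputed distance matrix, and the adjustment direction (enlarging the radius by `eps_step` and lowering MinPts by `pts_step`, since either change tends to pull more samples into class clusters), the step values, and the iteration cap `max_iter` are all assumptions of this sketch.

```python
def dbscan(dist, eps, min_pts):
    """Label each sample with a cluster index (>= 0) or -1 for noise."""
    n = len(dist)
    neighbours = [
        [j for j in range(n) if j != i and dist[i][j] <= eps] for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]  # only core objects are ever pushed
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


def clustered_ratio(labels):
    """Share of samples that belong to some class cluster."""
    return sum(1 for label in labels if label != -1) / len(labels)


def tune(dist, eps, min_pts, threshold=0.9,
         eps_step=0.1, pts_step=1, max_iter=20):
    """Adjust (eps, min_pts) until the clustered ratio reaches `threshold`."""
    for _ in range(max_iter):
        labels = dbscan(dist, eps, min_pts)
        if clustered_ratio(labels) >= threshold:
            return eps, min_pts, labels
        eps = min(1.0, eps + eps_step)            # first step size: radius
        min_pts = max(1, min_pts - pts_step)      # second step size: MinPts
    raise ValueError("no configuration met the aggregation condition")
```

The returned `eps` and `min_pts` play the role of the target radius and target minimum neighborhood point number; in practice the aggregation condition may also constrain the number of class clusters.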
It should be noted that, in other embodiments of the present disclosure, the preset clustering algorithm may also measure how alike two samples are through a similarity value rather than a distance. For example, the configuration parameters of the DBSCAN algorithm described above may be set to a similarity threshold and a minimum number of samples. Correspondingly, after the similarity threshold and the minimum number of samples are set according to the actual application scene, when the number of samples whose similarity with the target sample is greater than or equal to the similarity threshold is greater than or equal to the minimum number of samples, the target sample is determined to be a core object, and the directly density-reachable samples of the core object are the samples whose similarity with the core object is greater than or equal to the similarity threshold.
Optionally, in order to ensure the stability of the obtained clustering algorithm, after the parameter training process is completed, the process of training the clustering algorithm may further include an algorithm testing process.
As an embodiment, the algorithm testing process may include: obtaining a test sample set in a preset test period, wherein the test sample set also comprises a plurality of service calling data samples; inputting the test sample set into the clustering algorithm obtained by training in the step S302 to obtain a test clustering result; and judging whether the test clustering result meets a preset test condition, and when the test clustering result meets the preset test condition, judging that the clustering algorithm trained in the step S302 is a trained clustering algorithm.
In the algorithm testing process, the test period can be set according to the actual application scene and the processing requirement. For example, the test period may be set to one, two, or three days after the parameter training is completed by the above step S302. In one embodiment, a plurality of specific time periods may be set within the test period. For example, when the test period is three days, 10:00 to 11:00 in the morning, 17:00 to 18:00 in the afternoon, and 21:00 to 22:00 at night of each of the three days may be set as the specific time periods. In this case, obtaining the test sample set in the preset test period specifically means obtaining the service call data in the specific time periods within the preset test period to form the test sample set. Correspondingly, the samples obtained in each specific time period of each day in the test period can be input into the clustering algorithm trained in step S302 to obtain the corresponding test clustering results, and it is then determined whether the test clustering results obtained in the test period meet the preset test conditions.
Specifically, the preset test conditions may be set according to actual application scenarios and processing requirements. For example, it may be determined whether the number of clusters included in the test clustering result is consistent with the number of clusters included in the clustering result satisfying the preset aggregation condition in step S302, and if so, it is determined that the test clustering result satisfies the preset test condition, and if not, it is determined that the test clustering result does not satisfy the preset test condition. Or, when a preset test cycle is provided with a plurality of specific time periods, the consistency degree of the test clustering results corresponding to the test samples in all the specific time periods in the test cycle may be calculated, and when the consistency degree reaches a preset consistency condition, it indicates that the clustering algorithm trained in step S302 meets the stability requirement, and it may be determined that the clustering algorithm trained in step S302 is a trained clustering algorithm.
The consistency degree can be determined according to the number distribution of the clusters contained in all the test clustering results obtained in the test period. For example, the consistency degree of the test clustering results can be represented by the ratio of the maximum number of the test clustering results with the same number of clusters to the total number of the test clustering results, and when the ratio exceeds a preset consistency threshold, the consistency degree is judged to reach a preset consistency condition. For example, 30 test clustering results are obtained in the test cycle, wherein the number of clusters included in 28 test clustering results is 10, and the number of clusters included in 2 test clustering results is 9, and at this time, the consistency degree of the test clustering results is: 28/30=0.933, assuming that the preset consistency threshold is 0.9, it indicates that the clustering algorithm trained in step S302 meets the stability requirement.
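The consistency degree described above can be computed with a short sketch like the following, where the input is the number of class clusters found in each test clustering result. The function name and the input representation are illustrative assumptions.

```python
from collections import Counter


def consistency_degree(cluster_counts):
    """Largest share of test results that agree on the number of class clusters."""
    largest_group = Counter(cluster_counts).most_common(1)[0][1]
    return largest_group / len(cluster_counts)


# 30 test clustering results: 28 with 10 class clusters, 2 with 9.
counts = [10] * 28 + [9] * 2
meets_condition = consistency_degree(counts) > 0.9
```

For the example above the degree is 28/30, which exceeds a consistency threshold of 0.9.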
It should be noted that, when the test clustering result does not satisfy the preset test condition, the service invocation data included in the test sample set may be added to the training sample set as a new training sample set, and the configuration parameters of the clustering algorithm are adjusted, and the above-mentioned parameter training process and test process are repeatedly executed until the adjusted configuration parameters enable the clustering result of the clustering algorithm to satisfy the preset aggregation condition, and the test clustering result satisfies the preset test condition.
Further, the plurality of service call data under the corresponding service scenes can be clustered through the trained clustering algorithm, and a target clustering result is obtained. And then, the following step S204 is executed for the target clustering result, and the feature dimension corresponding to each cluster is extracted as a service call instance in the service scene.
Step S204, for each class cluster in the target clustering result, determining a characteristic dimension of the class cluster from the multiple dimensions based on parameter data of each service call data in the class cluster in each dimension, wherein the characteristic dimension of each class cluster is used for monitoring abnormal call in the preset service scene.
The target clustering result obtained by the trained clustering algorithm comprises more than two clusters, and each cluster is a service calling data set with certain similarity. Furthermore, by comparing the parameter data of each dimension of each service invocation data in the set, a characteristic dimension capable of reflecting the similarity of the service invocation data set is obtained. The characteristic dimension is an aggregation dimension of the service call data set corresponding to the corresponding class cluster.
In this embodiment, each class cluster corresponds to a group of feature dimensions, for example, assuming that the target clustering result includes 5 class clusters, 5 groups of feature dimensions can be obtained correspondingly. Wherein a set of feature dimensions may include more than two dimensions. For example, in one application scenario, a certain set of feature dimensions includes four dimensions, in turn "city & product code & order scene & payment method", where "&" represents a combination of these four dimensions. It should be noted that in other embodiments of the present specification, a feature dimension may also be one dimension.
In step S204, a characteristic dimension of the class cluster is determined from the multiple dimensions, that is, it is determined what aggregation dimension can be aggregated to obtain the service invocation data set of the class cluster. As an optional implementation, the determining a feature dimension of a class cluster from a plurality of dimensions may include: determining a dimension combination with the coverage rate exceeding a second preset threshold value from the multiple dimensions by comparing parameter data of each service call data in the class cluster under each dimension, wherein the dimension combination comprises more than one dimension in the multiple dimensions; and obtaining the characteristic dimension based on the dimension combination of which the coverage rate exceeds a second preset threshold. The coverage rate of the dimension combination is used for representing the proportion of the service invocation data with the same parameter data in the cluster under the dimension combination, that is, the coverage rate can be obtained by calculating the ratio between the number of the service invocation data with the same parameter data and the total number of the service invocation data contained in the cluster under the dimension combination in the cluster.
Specifically, the second preset threshold may be set according to actual needs, for example, may be set to 0.8 or 0.9. Taking the second preset threshold value of 0.9 as an example, it is necessary to determine, for each class cluster, a dimension combination in which the coverage rate exceeds 0.9, that is, parameter data of more than 90% of service invocation data in the class cluster in each dimension in the dimension combination are correspondingly the same. Taking a dimension combination of city & product code & order scene & payment method as an example, if the coverage rate of the dimension combination in a certain cluster exceeds 0.9, it indicates that more than 90% of service invocation data in the cluster contains the same city, the same product code, the same order scene and the same payment method.
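The coverage rate of a dimension combination can be sketched as follows. The sketch assumes that each service invocation data item is a dictionary keyed by dimension name, and it reads "the same parameter data" as the most frequent value tuple in the class cluster; both are assumptions for illustration, and the names `coverage`, `city`, and `payment_method` are hypothetical.

```python
from collections import Counter


def coverage(cluster, dims):
    """Fraction of calls in the cluster that share the most common values on dims."""
    if not cluster:
        return 0.0
    # Count how often each tuple of parameter values occurs under the combination.
    value_counts = Counter(tuple(call[d] for d in dims) for call in cluster)
    return value_counts.most_common(1)[0][1] / len(cluster)


# 19 of 20 calls share the same city and payment method.
cluster = (
    [{"city": "Hangzhou", "payment_method": "card"}] * 19
    + [{"city": "Shanghai", "payment_method": "balance"}]
)
city_and_payment = coverage(cluster, ("city", "payment_method"))
exceeds_threshold = city_and_payment > 0.9  # second preset threshold of 0.9
```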
In an implementation manner of this embodiment, obtaining the feature dimension based on the dimension combinations whose coverage rate exceeds the second preset threshold may include: comparing the number of dimensions contained in each of those dimension combinations, and taking the dimensions contained in the dimension combination with the largest number of dimensions as the feature dimension. For example, if there are 5 dimension combinations whose coverage rate exceeds the second preset threshold, and each of them contains 1, 2, or 4 dimensions, then the dimension combination containing 4 dimensions is taken as the feature dimension. It should be noted that, when two of the dimension combinations whose coverage rate exceeds the second preset threshold tie for the largest number of dimensions, the dimensions contained in the one with the larger coverage rate are taken as the feature dimension, or the dimensions contained in either of the two are taken as the feature dimension.
In addition, in other embodiments of the present specification, a key dimension set may also be preset, where the key dimension set includes a plurality of specified service invocation parameters, for example "city" and "payment method". In this case, among the dimension combinations whose coverage rate exceeds the second preset threshold and which contain at least one service invocation parameter in the key dimension set, the dimensions contained in the dimension combination with the largest number of dimensions are selected as the feature dimension.
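The selection rules of the two preceding paragraphs can be sketched together. The sketch assumes the coverage rate of every candidate dimension combination has already been computed and is passed in as a dictionary; the function name `select_feature_dimension`, the dimension names, and the coverage values are hypothetical, and ties are broken by the larger coverage rate, which is one of the two options the text allows.

```python
def select_feature_dimension(coverages, threshold=0.9, key_dims=None):
    """Pick the dimension combination to use as the feature dimension of one cluster.

    coverages: maps a tuple of dimension names to its coverage rate in the cluster.
    key_dims:  optional key dimension set; when given, a candidate must contain
               at least one of these dimensions.
    """
    candidates = [
        (dims, cov)
        for dims, cov in coverages.items()
        if cov > threshold and (key_dims is None or set(dims) & set(key_dims))
    ]
    if not candidates:
        return None
    # Most dimensions first; a tie is broken by the larger coverage rate.
    return max(candidates, key=lambda item: (len(item[0]), item[1]))[0]


coverages = {
    ("city",): 0.97,
    ("city", "product_code"): 0.95,
    ("product_code", "order_scene"): 0.93,
    ("city", "product_code", "order_scene", "payment_method"): 0.91,
    ("payment_method",): 0.60,
}
feature_dims = select_feature_dimension(coverages, threshold=0.9)
```

Here `feature_dims` is the four-dimension combination, since it is the largest combination whose coverage rate exceeds 0.9.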
After a group of feature dimensions corresponding to each class cluster is obtained in step S204, abnormal calls in the service processing process can be monitored according to each group of feature dimensions.
In an optional embodiment of this specification, after determining the feature dimension of the class cluster from the multiple dimensions of the service invocation data, the service data processing method may further include an exception invocation monitoring step for monitoring exception invocation in the service processing process. Specifically, the abnormal call monitoring step may include: and identifying abnormal calling data in the service calling data to be analyzed in the preset service scene based on the characteristic dimension of each class cluster. That is to say, after clustering is performed on the service invocation data in a certain service scene and the characteristic dimension is extracted, the obtained characteristic dimension can be used to identify whether the service invocation data to be analyzed in the service scene is abnormal or not. And the abnormity monitoring is carried out on the service processing process of the service system from the aspect of the characteristic dimension, namely the service calling instance, so that the monitoring efficiency is improved, and a large amount of time and system resources can be saved. Moreover, the abnormal position can be further positioned on the feature dimension, the range of problem troubleshooting is favorably narrowed, and the resource consumption of a computer is reduced.
In this embodiment, there may be various implementation manners for identifying whether the service invocation data to be analyzed in the preset service scenario is abnormal by using the obtained feature dimension. Two of them are listed below for introduction, and of course, in the specific implementation, the following two cases are not limited.
First, the following detection process may be performed for the feature dimension of each class cluster: determining abnormal parameter data corresponding to the feature dimension; acquiring the occurrence frequency of the abnormal parameter data in the service invocation data to be analyzed, where the occurrence frequency represents the number of service invocation data items, among the service invocation data to be analyzed, that contain the abnormal parameter data; and when the occurrence frequency exceeds a third preset threshold, identifying the service invocation data containing the abnormal parameter data as abnormal invocation data. In this way, abnormal invocations, for example those caused by an attack from a black-market (fraud) group, can be detected and handled in time. Moreover, the abnormal position can be further located to the specific dimension contained in the feature dimension, which helps to narrow the range of troubleshooting.
The service invocation data to be analyzed may be service invocation data in a specified time period, and the specified time period may be set according to actual needs, for example, may be one hour, one day, 7 days, or the like. For example, taking an example that the feature dimension includes "city & product code & order scene & payment method", assuming that there are multiple parameter data in the service invocation data under the "payment method" dimension, where one of the parameter data is error information, that is, abnormal parameter data, there are S service invocation data to be analyzed, and if there are W service invocation data in the S data that include the abnormal parameter data, it indicates that the abnormal parameter data corresponding to the feature dimension has occurred W times in the S data. In this case, the frequency of occurrence of the abnormal parameter data may be W or may be W/S.
The third preset threshold may be obtained according to the specific service scenario and multiple tests, and is used as the frequency threshold of the abnormal parameter data under normal conditions. It is understood that the third preset thresholds corresponding to different groups of feature dimensions, i.e., feature dimensions extracted from different class clusters, may be set to different values. For example, if under normal conditions the abnormal parameter data in a certain feature dimension occurs at most 100 times per day, the third preset threshold corresponding to that feature dimension may be set to 10 times this highest occurrence frequency, that is, 1000 times per day. If the abnormal parameter data in that feature dimension occurs more than 1000 times on a certain day, the service invocation data containing the abnormal parameter data is identified as abnormal.
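The frequency-based detection process can be sketched as follows. The sketch assumes each service invocation data item is a dictionary and uses the count W as the occurrence frequency; the function name `find_abnormal_calls`, the dimension name `payment_method`, and the value `"ERROR"` are hypothetical.

```python
def find_abnormal_calls(calls, dimension, abnormal_value, threshold):
    """Return the calls carrying the abnormal parameter data if they occur too often.

    The occurrence frequency is the count W of calls whose value under `dimension`
    equals `abnormal_value`; the calls are flagged only when W exceeds `threshold`.
    """
    hits = [call for call in calls if call.get(dimension) == abnormal_value]
    return hits if len(hits) > threshold else []


# One day of calls: 1001 carry error information under the payment method dimension.
calls = [{"payment_method": "ERROR"}] * 1001 + [{"payment_method": "card"}] * 5
flagged = find_abnormal_calls(calls, "payment_method", "ERROR", threshold=1000)
```

To use the ratio W/S instead of the count, the comparison would be made against `len(hits) / len(calls)` with a correspondingly scaled threshold.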
Second, after the feature dimensions under the service scene are obtained through the steps S200 to S204, a normal instance library may be constructed for each group of feature dimensions by marking, that is, the following process is performed for each group of feature dimensions: a normal instance library is constructed in advance, and the parameter data marked as normal among the parameter data corresponding to the feature dimension is added to the normal instance library. For example, if the feature dimension includes "city & product code & order scene & payment method", the specific cities, product codes, order scenes and payment methods marked as normal are added to the normal instance library. After the normal instance library corresponding to each group of feature dimensions in the service scene is constructed, whether service invocation data to be analyzed is abnormal can be judged through a certain group of feature dimensions by checking whether the parameter data of that service invocation data in the feature dimensions is in the corresponding normal instance library. If the parameter data is not in the normal instance library, the service invocation data is judged to be abnormal and the abnormal position is located at the feature dimensions; if the parameter data is in the normal instance library, the service invocation data is judged to be normal and there is no abnormal position.
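The normal instance library check can be sketched as follows. The sketch assumes the library stores whole value tuples under the feature dimension, which is one possible reading of the text; the function names and sample values are hypothetical.

```python
def build_normal_library(normal_calls, feature_dims):
    """Collect the value tuples, under the feature dimension, of calls marked normal."""
    return {tuple(call[d] for d in feature_dims) for call in normal_calls}


def is_abnormal(call, feature_dims, library):
    """A call is abnormal when its values under the feature dimension are not in the library."""
    return tuple(call[d] for d in feature_dims) not in library


feature_dims = ("city", "product_code", "order_scene", "payment_method")
marked_normal = [
    {"city": "Hangzhou", "product_code": "P1", "order_scene": "app", "payment_method": "card"},
    {"city": "Shanghai", "product_code": "P1", "order_scene": "web", "payment_method": "balance"},
]
library = build_normal_library(marked_normal, feature_dims)

known_call = dict(marked_normal[0])
unknown_call = dict(marked_normal[0], payment_method="unknown")
```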
The service data processing method provided by the embodiment of the specification learns the service calling data under the preset service scene through the pre-trained clustering algorithm to obtain the characteristic dimension representing the unique calling form of the similar service calling data, and compared with extracting the characteristic dimension according to experience, the method is beneficial to improving the accuracy and efficiency of characteristic dimension extraction, does not need to consume a large amount of time to repeatedly adjust and verify the characteristic dimension, and reduces the resource consumption of the system in the characteristic dimension extraction. The method is suitable for feature dimension extraction of newly added service scenes, and has strong expandability.
Furthermore, the determined characteristic dimension can be used for monitoring abnormal calling in the preset service scene, and abnormal calling of a service system is monitored from the aspect of the characteristic dimension, so that compared with analyzing calling logs one by one, a large amount of time and system resources can be saved. And moreover, the abnormal position can be further positioned on the characteristic dimension, the range of problem troubleshooting is favorably reduced, and the resource consumption of the system on the positioning of the abnormal position is reduced.
In a second aspect, based on the same inventive concept as the service data processing method provided in the foregoing first aspect, an embodiment of the present specification further provides a service data processing apparatus. Referring to fig. 3, the service data processing apparatus 30 includes:
the data acquiring module 31 is configured to acquire multiple service invocation data in a preset service scene, where each service invocation data includes parameter data of multiple dimensions;
the clustering module 32 is configured to cluster the plurality of service call data based on a pre-trained clustering algorithm to obtain a target clustering result;
a dimension determining module 33, configured to determine, for each class cluster in the target clustering result, a feature dimension of each class cluster from the multiple dimensions based on parameter data of each service invocation data in the class cluster in each dimension, where the feature dimension of each class cluster is used to monitor abnormal invocation in the preset service scenario.
As an optional embodiment, the clustering algorithm is an algorithm for clustering based on a distance between service invocation data, and the distance between any two service invocation data is obtained through the following steps:
detecting whether parameter data of two service calling data under the same dimension are the same or not to obtain a detection result of each dimension;
and obtaining the distance between the two service calling data based on the detection result of each dimension.
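One way to turn the per-dimension detection results into a distance is to count the dimensions on which the two service invocation data items differ. This is an assumption for illustration; the text only states that the distance is obtained from the detection results. The function name `call_distance` and the sample values are hypothetical.

```python
def call_distance(call_a, call_b, dims):
    """Number of dimensions on which two calls carry different parameter data."""
    # Detection result per dimension: True when the parameter data differ.
    detection = [call_a.get(d) != call_b.get(d) for d in dims]
    return sum(detection)


dims = ("city", "product_code", "payment_method")
a = {"city": "Hangzhou", "product_code": "P1", "payment_method": "card"}
b = {"city": "Hangzhou", "product_code": "P2", "payment_method": "balance"}
distance_ab = call_distance(a, b, dims)
```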
As an optional embodiment, the service data processing apparatus 30 further includes a training module. The training module comprises:
a sample acquisition submodule, configured to acquire a training sample set, where the training sample set includes a plurality of service invocation data samples;
the clustering submodule is used for clustering the training sample set based on a preset clustering algorithm;
and the parameter adjusting submodule is used for adjusting the configuration parameters in the preset clustering algorithm according to a preset rule when the clustering result does not meet the preset aggregation condition until the clustering result obtained by the clustering submodule meets the preset aggregation condition, so that the trained clustering algorithm is obtained.
As an alternative embodiment, the clustering submodule is configured to:
acquiring the distance between any two service call data samples in the training sample set;
determining all core objects in the training sample set based on preset configuration parameters and the distance between any two service call data samples;
and acquiring a direct density reachable sample of each core object in the training sample set, and acquiring a clustering result based on the direct density reachable sample of each core object.
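The three operations above follow the pattern of density-based clustering, and can be sketched in the style of DBSCAN. The sketch assumes the configuration parameters are a neighborhood radius `eps` and a minimum neighbor count `min_samples`, and that the distance is the number of differing dimensions; these names and choices are assumptions, not taken from the specification.

```python
def cluster_by_density(samples, dims, eps, min_samples):
    """Label each sample with a cluster id, or None when it belongs to no cluster."""
    n = len(samples)

    def dist(i, j):
        # Assumed distance: number of dimensions with different parameter data.
        return sum(1 for d in dims if samples[i][d] != samples[j][d])

    # Neighborhood of each sample (the sample itself is included, distance 0).
    neighbors = [[j for j in range(n) if dist(i, j) <= eps] for i in range(n)]
    # Core objects: samples with at least min_samples neighbors within eps.
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}

    labels = [None] * n
    cluster_id = 0
    for start in sorted(core):
        if labels[start] is not None:
            continue
        labels[start] = cluster_id
        stack = [start]  # only core objects are ever pushed
        while stack:
            p = stack.pop()
            # Every neighbor of a core object is directly density-reachable from it.
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id
                    if q in core:
                        stack.append(q)
        cluster_id += 1
    return labels


dims = ("city", "product_code", "payment_method")
samples = (
    [{"city": "Hangzhou", "product_code": "P1", "payment_method": "card"}] * 4
    + [{"city": "Shanghai", "product_code": "P2", "payment_method": "balance"}] * 4
    + [{"city": "Beijing", "product_code": "P3", "payment_method": "credit"}]
)
labels = cluster_by_density(samples, dims, eps=1, min_samples=3)
```

The last sample differs from every other sample on all three dimensions, so it is left unlabeled.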
As an alternative embodiment, the parameter adjusting submodule is configured to:
acquiring the ratio of the number of the service call data samples in the clustering result to the total number of the samples in the training sample set;
and when the ratio is smaller than a first preset threshold value, adjusting the configuration parameters according to the preset rules, and clustering the training sample set based on the clustering algorithm after the configuration parameters are adjusted until the ratio is larger than or equal to the first preset threshold value, so as to obtain the trained clustering algorithm.
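The adjustment loop can be sketched as follows. The preset rule is assumed here to be "increase the neighborhood radius by a fixed step", and `toy_cluster_fn` is a stand-in clustering function on numbers used only to make the example runnable; both, along with all names, are hypothetical.

```python
def train_until_aggregated(samples, cluster_fn, eps, step, first_threshold, max_rounds=20):
    """Adjust eps until the share of clustered samples reaches first_threshold."""
    for _ in range(max_rounds):
        labels = cluster_fn(samples, eps)
        # Ratio of samples assigned to some cluster to all training samples.
        ratio = sum(label is not None for label in labels) / len(samples)
        if ratio >= first_threshold:
            return eps, ratio
        eps += step  # assumed preset rule: widen the neighborhood
    raise RuntimeError("aggregation condition not met within max_rounds")


def toy_cluster_fn(values, eps):
    """Stand-in clustering: a value is clustered if another value lies within eps."""
    return [
        0 if any(i != j and abs(v - w) <= eps for j, w in enumerate(values)) else None
        for i, v in enumerate(values)
    ]


trained_eps, final_ratio = train_until_aggregated(
    [0, 1, 2, 5, 9], toy_cluster_fn, eps=1, step=1, first_threshold=0.8
)
```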
As an alternative embodiment, the dimension determining module 33 includes:
a first determining sub-module 331, configured to determine, from the multiple dimensions, a dimension combination with a coverage rate exceeding a second preset threshold by comparing parameter data of each service invocation data in the class cluster under each dimension, where the coverage rate of the dimension combination is used to characterize the proportion of service invocation data having the same parameter data in the class cluster under the dimension combination, and the dimension combination includes more than one of the multiple dimensions;
a second determining submodule 332, configured to obtain the feature dimension based on the dimension combination in which the coverage exceeds a second preset threshold.
As an alternative embodiment, the second determining submodule 332 is configured to: among the dimension combinations whose coverage rate exceeds the second preset threshold, take the dimension combination with the largest number of dimensions as the feature dimension.
As an optional embodiment, the service data processing apparatus 30 further includes an exception identifying module, configured to: and identifying abnormal calling data in the service calling data to be analyzed in the preset service scene based on the characteristic dimension of each class cluster.
It should be noted that, in the service data processing apparatus 30 provided in the embodiment of the present specification, specific ways in which the respective modules perform operations have been described in detail in the embodiment of the method provided in the first aspect, and a detailed description thereof will not be provided here.
In a third aspect, based on the same inventive concept as the service data processing method provided in the foregoing embodiment, an embodiment of this specification further provides an electronic device, as shown in fig. 4, including a memory 404, one or more processors 402, and a computer program stored in the memory 404 and executable on the processor 402, where the processor 402 implements the steps of the service data processing method provided in the foregoing first aspect when executing the program.
Fig. 4 shows a bus architecture (represented by bus 400). The bus 400 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 402, and memory, represented by memory 404. The bus 400 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 405 provides an interface between the bus 400 and the receiver 401 and transmitter 403. The receiver 401 and the transmitter 403 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 402 is responsible for managing the bus 400 and general processing, while the memory 404 may be used for storing data used by the processor 402 in performing operations.
It is to be understood that the structure shown in fig. 4 is merely an illustration, and that the electronic device provided by the embodiments of the present description may further include more or less components than those shown in fig. 4, or have a different configuration than that shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
In a fourth aspect, based on the same inventive concept as the service data processing method provided in the foregoing embodiments, the present specification embodiment further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the service data processing method provided in the foregoing first aspect.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, then such modifications and variations are also intended to be included in the present specification.