CN113516556A - Method and system for predicting or training model based on multi-dimensional time series data - Google Patents
Method and system for predicting or training model based on multi-dimensional time series data
- Publication number
- CN113516556A CN113516556A CN202110523902.3A CN202110523902A CN113516556A CN 113516556 A CN113516556 A CN 113516556A CN 202110523902 A CN202110523902 A CN 202110523902A CN 113516556 A CN113516556 A CN 113516556A
- Authority
- CN
- China
- Prior art keywords
- data
- dimensional
- previous
- encoder
- dimensional feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 87
- 238000012549 training Methods 0.000 title claims description 24
- 230000004751 neurological system process Effects 0.000 claims abstract description 62
- 238000012544 monitoring process Methods 0.000 claims abstract description 5
- 230000000694 effects Effects 0.000 claims description 34
- 238000001514 detection method Methods 0.000 claims description 21
- 230000005856 abnormality Effects 0.000 claims description 7
- 230000002123 temporal effect Effects 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 14
- 238000010801 machine learning Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- 238000012545 processing Methods 0.000 description 8
- 238000004220 aggregation Methods 0.000 description 4
- 230000008859 change Effects 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 230000014509 gene expression Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000002159 abnormal effect Effects 0.000 description 2
- 230000002776 aggregation Effects 0.000 description 2
- 230000002547 anomalous effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000010200 validation analysis Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 238000013434 data augmentation Methods 0.000 description 1
- 238000013501 data transformation Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000008035 nerve activity Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
Disclosed is a method for prediction based on multi-dimensional time series data, comprising: monitoring a multi-dimensional time series data stream to obtain a current observation point, wherein the current observation point comprises multi-dimensional feature data; and, based on the current observation point, applying a trained neural process model for prediction, the neural process model being trained using a plurality of previous observation points, each of the previous observation points comprising multi-dimensional feature data and corresponding label data, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module that assigns weights to the plurality of previous observation points based on associations between the multi-dimensional feature data of the current observation point and the multi-dimensional feature data of one or more of the previous observation points, for finally generating a target prediction for the current observation point. The present application also relates to other methods, systems, apparatuses, and computer-readable storage devices. The method of the present application can perform prediction more accurately based on multi-dimensional time series data.
Description
Technical Field
One or more embodiments of the present specification relate to machine learning, and more particularly, to methods, systems, apparatuses, and computer-readable storage media for predicting or training a model based on multi-dimensional time series data.
Background
Currently, machine learning has been applied to time-series data in order to model the time-series data and perform prediction or processing (e.g., abnormality recognition or risk recognition, etc.) using the established model.
For example, in the context of online transactions and the like, modeling may typically be performed, for example, using time series data associated with the user (e.g., login history, transaction history, etc.), such that abnormal transactions (e.g., illegal transactions) or abnormal users (e.g., users performing illegal transactions) may be identified.
However, modeling with time series data in a scenario such as online transactions can be problematic. On the one hand, most machine learning models can only process single-dimensional time series data and have difficulty processing multi-dimensional time series data. On the other hand, for multi-dimensional time series data, existing approaches typically employ a rule-based model (sometimes referred to as a baseline model). The scores such a rule-based model produces over different intervals are strongly correlated, which leads to low accuracy.
Furthermore, when processing multi-dimensional time series data, existing machine learning models typically achieve relatively high accuracy only when the data sample size is large; when the sample size is small (e.g., for a long-tail user or a new user with little data), they typically struggle to produce satisfactory results.
Unfortunately, for a user's online activities, and especially for tasks such as anomaly identification in online transaction scenarios, the data to be processed is often highly correlated, and frequently only a small amount of data is available. For example, malicious entities often register as new users to perform malicious operations, so anomalous transactions of such new users typically must be identified from only a small sample.
Therefore, there is a need for a modeling scheme for multi-dimensional time series data that achieves higher accuracy with a smaller amount of data.
Disclosure of Invention
To overcome the shortcomings of the prior art, one or more embodiments of the present specification provide a more accurate modeling solution by building a private model for each user, and provide more sophisticated protection of user privacy.
One or more embodiments of the present specification achieve the above objects by the following technical solutions.
In one aspect, a method for prediction based on multi-dimensional time series data is disclosed, comprising: monitoring a multi-dimensional time series data stream to obtain a current observation point, wherein the current observation point comprises multi-dimensional feature data xt; and, based on the current observation point, applying a trained neural process model for prediction, the neural process model being trained using a plurality of previous observation points (xi, yi), each previous observation point comprising multi-dimensional feature data xi and corresponding label data yi, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module that assigns weights to the plurality of previous observation points based on associations between the multi-dimensional feature data xt of the current observation point and the multi-dimensional feature data xi of one or more previous observation points, for finally generating a target prediction y of the current observation point.
Preferably, the encoder comprises a deterministic path and a hidden path, wherein the deterministic path comprises a deterministic encoder for generating a plurality of encoded representations ri based on a plurality of previous observation points (xi, yi), and the cross-attention module generates a single aggregated representation r specific to the current observation point based on the multi-dimensional feature data xt of the current observation point, the multi-dimensional feature data xi of the one or more previous observation points, and the plurality of encoded representations.
Preferably, the deterministic encoder uses a self-attention model.
Preferably, the hidden path generates a hidden variable z based on the plurality of previous observation points (xi, yi), and the decoder generates the target prediction y of the current observation point based on the multi-dimensional feature data xt of the current observation point, the representation r specific to the current observation point, and the hidden variable z.
Preferably, the multi-dimensional time series data stream is an online activity data stream of a user, and wherein the method comprises:
detecting an anomaly in online activity of the user based on the target prediction y.
Preferably, the method further comprises:
obtaining offline data, wherein detecting an anomaly in online activity of the user is further based on the offline data.
Preferably, the method further comprises:
providing a single-dimensional time series anomaly detection module, and
detecting an anomaly in online activity of the user using the single-dimensional time series anomaly detection module in conjunction with the neural process model.
Preferably, the method further comprises:
after an anomaly is detected, a cause of the anomaly is automatically determined using an attribution module.
Preferably, the method further comprises:
after the abnormality is detected, outputting alarm information using an alarm module.
In another aspect, a method of training a model based on multi-dimensional time series data is also disclosed, comprising: acquiring multi-dimensional time series data; generating a plurality of previous observation points (xi, yi) based on the multi-dimensional time series data, each observation point corresponding to a time indication and comprising multi-dimensional feature data xi and corresponding label data yi; and training a neural process model using a subset of the plurality of previous observation points (xi, yi) as training data, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module configured to: assign, for each observation point, weights to the plurality of previous observation points based on associations between the multi-dimensional feature data xi of that observation point and the multi-dimensional feature data xi of other observation points in the subset, for finally generating a target prediction ŷi for that observation point, wherein the target prediction ŷi and the label data yi of that observation point are used to generate a loss value, and the neural process model is iteratively adjusted based on the loss value to generate a trained model.
Preferably, the encoder comprises a deterministic path and a hidden path, wherein the deterministic path comprises a deterministic encoder for generating a plurality of encoded representations ri based on the subset, and the cross-attention module generates a single aggregated representation r specific to the observation point based on the multi-dimensional feature data xi of the observation point, the multi-dimensional feature data xi of previous observation points in the subset, and the plurality of encoded representations.
Preferably, the deterministic encoder uses a self-attention model.
Preferably, the hidden path generates a hidden variable z based on the subset, and the decoder generates a target prediction ŷi for the observation point based on the multi-dimensional feature data xi of the observation point, the representation r specific to the observation point, and the hidden variable z.
Preferably, the method further comprises:
calculating a loss value based on the target predictions ŷi and the label data yi of one or more observation points; and
iteratively updating the neural process model based on the loss values to generate a trained neural process model.
In yet another aspect, a system for detecting anomalies in online activities is also disclosed, comprising: a data acquisition module configured to: monitor an online activity data stream of a user to obtain a current observation point, wherein the current observation point comprises multi-dimensional feature data xt; a multi-dimensional time series anomaly detection module configured to: apply a trained neural process model for prediction based on the current observation point, the neural process model being trained using a plurality of previous observation points (xi, yi), each previous observation point comprising multi-dimensional feature data xi and corresponding label data yi, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module that assigns weights to the plurality of previous observation points based on associations between the multi-dimensional feature data xt of the current observation point and the multi-dimensional feature data xi of one or more previous observation points, for finally generating a target prediction y of the current observation point; and detect an anomaly in the online activity of the user based on the target prediction y.
Preferably, the data acquisition module is further configured to acquire offline data, wherein detecting the abnormality in the online activity of the user is further based on the offline data.
Preferably, the system further comprises a single-dimensional time-series anomaly detection module, wherein the single-dimensional time-series anomaly detection module is configured to detect anomalies in online activity of the user in conjunction with the multi-dimensional time-series anomaly detection module.
Preferably, the system further comprises an attribution module, wherein the attribution module is configured to automatically determine a cause of the abnormality.
Preferably, the system further comprises an alarm module, wherein the alarm module is configured to output alarm information after detecting an anomaly.
In yet another aspect, an apparatus for generating a model for a user is disclosed, comprising: a memory; and a processor configured to perform the method of any of the above.
In yet another aspect, a computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the above-described method is disclosed.
Compared with the prior art, one or more embodiments of the present specification can achieve one or more of the following technical effects:
the prediction can be performed more accurately based on the multi-dimensional time-series data;
capable of processing multi-dimensional time series data; and
prediction can be performed with a small amount of data.
Drawings
The foregoing summary, as well as the following detailed description of the embodiments, is better understood when read in conjunction with the appended drawings. It is to be noted that the appended drawings are intended as examples of the claimed invention. In the drawings, like reference characters designate the same or similar elements.
FIG. 1 shows a general flow diagram of a method for modeling multi-dimensional time series data according to an embodiment of the present description.
Fig. 2 shows a schematic diagram of a neural process model in accordance with an embodiment of the present description.
FIG. 3 illustrates a flow diagram of an example method of generating a prediction of a neural process model in accordance with an embodiment of the present description.
FIG. 4 illustrates a flow diagram of an example method for prediction based on multi-dimensional time series data in accordance with an embodiment of the present description.
FIG. 5 illustrates a flow diagram of an example method for detecting anomalies in online activity in accordance with an embodiment of the present description.
FIG. 6 illustrates a block diagram of an example system for detecting anomalies in online activity in accordance with an embodiment of the present specification.
Fig. 7 shows a schematic block diagram of an apparatus that may be used to perform the methods described above according to embodiments of the present description.
Detailed Description
The following detailed description is sufficient to enable any person skilled in the art to understand the technical content of one or more embodiments of the present specification and to implement the same, and the objects and advantages related to one or more embodiments of the present specification can be easily understood by those skilled in the art from the description, claims and drawings disclosed in the present specification.
As described above, modeling with time series data in a scenario such as online transactions can be problematic. On the one hand, most machine learning models can only process single-dimensional time series data and have difficulty processing multi-dimensional time series data. On the other hand, for multi-dimensional time series data, existing approaches typically employ a rule-based model (sometimes referred to as a baseline model). The scores such a rule-based model produces over different intervals are strongly correlated, which leads to low accuracy.
Furthermore, when processing multi-dimensional time series data, existing machine learning models typically achieve relatively high accuracy only when the data sample size is large; when the sample size is small (e.g., for a long-tail user or a new user with little data), they typically struggle to produce satisfactory results.
Attentive Neural Processes (ANP) are a recently emerged algorithm for improving Neural Processes (NP). At present, ANP is commonly used for natural language processing and time modeling, but has not been used to process time series data, particularly multi-dimensional time series data. The present application observes that the ANP model is well suited to multi-dimensional time series data, especially in online transactions (e.g., for anomaly identification), and accordingly designs a scheme for modeling multi-dimensional time series data using the ANP approach. For more details on ANP, reference may be made to the paper "Attentive Neural Processes" published by Hyunjik Kim et al. at ICLR 2019, the contents of which are incorporated herein by reference in their entirety.
Referring to FIG. 1, an overall flow diagram of a method 100 for modeling multi-dimensional time series data is shown, in accordance with an embodiment of the present description.
The method 100 may include: at operation 102, multi-dimensional time series data may be obtained. Time series data is a type of data that may generally include a set of data point sequences arranged in chronological order. In one example, the time interval of the time series set is a constant value (e.g., 1 second, 5 minutes, 1 hour, 12 hours, 7 days, 1 month, etc.). In another example, the time intervals of the set of time series may not be constant values, but may include real-time events. Preferably, the multi-dimensional time series data includes a time stamp for each data point.
Multidimensional time series data is time series data that relates to data of multiple different dimensions.
For example, taking an online trading scenario as an example, over time, events such as user login, user transfer, user transaction, change in city where the user is located, etc. may occur, each event forming a data point. For each data point, there may be data for various dimensions. For example, for each transaction, there may be a transaction time, a transaction amount, a transaction target, a merchant name, a device identifier of a user client performing the transaction, a transaction occurrence location, and so forth. There are also some time-related statistics such as the number of logins in a 3 day period, the number of cities in a week period, the amount of trades in a month period, historical false trade percentage, etc., which may also be associated with a particular time (e.g., the time at which the statistics are performed).
The data associated with the user may also include data that does not change (or substantially does not change) over time or data that is not related to the event. One example is user base attribute data such as age, gender, city, etc. of the user. Another example is a user's equity characteristics such as the user's equity amount, kind of equity, etc. Yet another example is a payment capability characteristic of the user, such as the user's account balance, debit balance, and the like. These data may also be included as part of the multidimensional time series data, if desired, and may be time stamped (e.g., the most recently sampled time stamp). In other examples, the data may not be included as part of the multi-dimensional time series data. For example, the data may be provided separately to the model as offline data.
These collected data, serving as prediction inputs, may be processed in subsequent steps to generate the multi-dimensional feature data.
Data associated with the variables to be predicted, such as whether the current transaction is an illegal transaction or whether the current user is a malicious user, may also be included in the multi-dimensional time series data. These data may be processed in subsequent steps to generate the label data.
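For concreteness, the following sketch shows what a single multi-dimensional data point of the kind described above might look like. The field names and values are hypothetical illustrations chosen for this example; they are not fields prescribed by this application.

```python
# Illustrative only: one multi-dimensional data point from an online-transaction
# stream, with hypothetical field names. The feature values together form the
# multi-dimensional feature data xi; the label yi marks whether the transaction
# is considered illegal.
observation = {
    "timestamp": "2021-05-13T10:32:05Z",
    "features": {                        # multi-dimensional feature data xi
        "transaction_amount": 980.0,
        "logins_last_3_days": 7,
        "cities_last_week": 3,
        "historical_false_trade_pct": 0.02,
        "account_balance": 1500.0,       # slowly changing / offline attribute
    },
    "label": 1,                          # yi: 1 = illegal transaction, 0 = normal
}
```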
It should be appreciated that the above description of multi-dimensional time series data is merely exemplary. Any other suitable time series data may be selected by the designer as desired.
The time series data for multiple dimensions of the user may be obtained, for example, from a data store for storing user data. The data store may be a local store of a server or cluster of servers executing the method 100, or may be a remote store, such as a cloud store, that is accessible by the server or cluster of servers.
The data store may, for example, collect time series data from multiple sources for multiple dimensions with the user. For example, the data store may obtain account data for the user from a user account server, may obtain transaction data for the user from a transaction server, may obtain location information for the user from a client device of the user, and so forth.
The method 100 may further include: at operation 104, a plurality of previous observation points may be generated based on the multi-dimensional time series data. For example, each previous observation point may be represented as (xi, yi), where xi is the multi-dimensional feature data of the observation point and yi is the corresponding label data of the observation point. The multi-dimensional feature data are the variables used to predict other variables, and the label data are the variables to be predicted.
Preferably, each observation point may correspond to a time indication. In one example, the time indication may be represented by a point in time, e.g., as a timestamp. The time indication may represent data of an event occurring at the current point in time, or data of events occurring between the time point of the previous observation point and the current point in time. For data types associated with a point in time, the data value may be data related to an event occurring at that point in time (e.g., the transaction amount of the current transaction). For data types associated with a time period, the data value may be data aggregated between the time point of the previous observation point and the current time point (e.g., a total transaction amount over a certain time period).
In another example, the time indication may be represented by a time period, e.g., as a time interval delimited by two timestamps. The time indication may then represent data of events occurring within that time interval. For example, if a single transaction occurred during the time interval, the data value may be the amount of that transaction; if a plurality of transactions occurred during the time interval, the data value may be the total transaction amount of those transactions.
Assume that the sample data obtained at each time indication t is denoted x_t and the feature data of each observation point is denoted xi. In one example, the feature data of each observation point may be the sample data obtained at the time indication t, i.e., xi = x_t. For example, x1 = x_1, x2 = x_2, x3 = x_3, and so on.
In another preferred example, the feature data of each observation point may be a tensor of the historical data acquired up to the time indication t, i.e., xi = [x_1, x_2, …, x_t]. For example, x1 = [x_1], x2 = [x_1, x_2], x3 = [x_1, x_2, x_3], and so on.
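A minimal sketch of these two feature constructions is given below, assuming the per-timestamp samples have already been collected into arrays; the function and variable names are illustrative only and are not part of this application.

```python
import numpy as np

# Build observation points (xi, yi) from a stream of per-timestamp feature
# vectors `samples` (shape [T, D]) and labels `labels` (shape [T]).
def build_observation_points(samples: np.ndarray, labels: np.ndarray,
                             cumulative: bool = False):
    points = []
    for t in range(len(samples)):
        if cumulative:
            # xi = [x_1, x_2, ..., x_t]: the history up to the time indication t
            # (note: the resulting xi have different lengths for different t)
            x_i = samples[: t + 1]
        else:
            # xi = x_t: only the sample obtained at the time indication t
            x_i = samples[t]
        points.append((x_i, labels[t]))
    return points
```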
Preferably, after the time series data of multiple dimensions is obtained, it may be processed to generate the previous observation points; this processing may include preprocessing. For example, data cleansing, data integration, data augmentation, data transformation, and the like may be performed on the time series data of multiple dimensions. Preferably, dimensionality reduction may also be performed on the time series data of multiple dimensions.
The preprocessed data may then be subjected to feature extraction, feature selection, etc., as desired, to generate the plurality of previous observation points. A reshaping (reshape) operation may also be performed to adapt the dimensions of the corresponding variables as needed. Other suitable operations known to those skilled in the art may also be performed.
The method 100 may further include: at operation 106, a neural process model may be trained using a subset of the plurality of previous observation points (xi, yi) as training data. For example, after the plurality of previous observation points is obtained, a portion of them may be selected as training data, while the remaining data may be used as validation data or test data for validation or testing during or after model training.
Referring to fig. 2, a schematic diagram of a neural process model 200 is shown, according to an embodiment of the present description. As shown in FIG. 2, the neural process model 200 may include an encoder 202 and a decoder 204.
The encoder 202 may include two paths: a deterministic path and a hidden path (also called a latent path). In FIG. 2, the deterministic path is represented by solid lines and the hidden path by dashed lines.
In general, the deterministic path includes a deterministic encoder 206 for generating a representation of each input-output pair (xi, yi). In embodiments of the present specification, the deterministic path also includes a cross-attention module for generating a single aggregated representation.
The hidden path may include a hidden encoder 208 and a mean aggregation module (the mean-aggregation operation shown in FIG. 2) for generating a hidden variable z.
The decoder 204 may include a decoding module. In fig. 2, the decoding module is shown as an MLP (multi-layer perceptron) for generating the prediction based on the output from the encoder.
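The sketch below outlines, in PyTorch, one possible realization of the encoder/decoder structure of FIG. 2, loosely following the Attentive Neural Process of Kim et al. (2019). The layer sizes, the use of plain MLP encoders rather than self-attention, the multi-head cross-attention, and the Gaussian hidden variable are illustrative assumptions of this sketch, not the exact architecture of this application.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU

class NeuralProcessModel(nn.Module):
    def __init__(self, x_dim, y_dim, hid=128, z_dim=64):
        super().__init__()
        self.det_encoder = mlp([x_dim + y_dim, hid, hid])        # deterministic path
        self.hid_encoder = mlp([x_dim + y_dim, hid, 2 * z_dim])  # hidden (latent) path
        self.cross_attn = nn.MultiheadAttention(hid, num_heads=4, batch_first=True)
        self.q_proj = nn.Linear(x_dim, hid)                      # project x into attention space
        self.decoder = mlp([x_dim + hid + z_dim, hid, y_dim])

    def hidden(self, x_c, y_c):
        # mean aggregation of the hidden-encoder outputs -> distribution over z
        s = self.hid_encoder(torch.cat([x_c, y_c], -1)).mean(dim=1)
        mu, log_sigma = s.chunk(2, dim=-1)
        return torch.distributions.Normal(mu, torch.exp(0.5 * log_sigma) + 1e-3)

    def forward(self, x_c, y_c, x_t):
        # x_c, y_c: previous observation points (context); x_t: target queries
        r_i = self.det_encoder(torch.cat([x_c, y_c], -1))        # per-point encodings r_i
        q, k = self.q_proj(x_t), self.q_proj(x_c)
        r_star, _ = self.cross_attn(q, k, r_i)                   # aggregated representation r
        z = self.hidden(x_c, y_c).rsample()                      # hidden variable z
        z = z.unsqueeze(1).expand(-1, x_t.size(1), -1)
        return self.decoder(torch.cat([x_t, r_star, z], -1))     # target prediction y
```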
The specific operation of the neural process model 200 of FIG. 2 is described in detail below with reference to FIG. 3. Referring to FIG. 3, a flow diagram of a method 300 of generating a prediction with a neural process model is shown, in accordance with an embodiment of the present description. Specifically, the method 300 uses a plurality of previous observation points (xi, yi) to generate a prediction (shown as y in FIG. 3) for the feature data (shown as x in FIG. 2) of one observation point. In this context, the observation points (xi, yi) in FIG. 2 may also be referred to as context points, which may be a subset of the training data (e.g., a batch of training data). x may also be referred to as the target query; when applying the model to generate predictions, x may represent the feature data of the current observation point (i.e., the observation point for which the prediction is to be made), which may come, for example, from a real-time data stream; when training the model, x may be taken from previous observation points in the training data. y may be referred to as the target prediction, which is the prediction generated using the neural process model; when training the model, the target prediction may be compared with the actual label to generate a loss value.
Specifically, the method 300 may include: at operation 302, a plurality of previous observation points (xi, yi) in the training data are passed into the deterministic encoder 206 in the encoder 202 to generate a plurality of representations ri. The previous observation points (xi, yi) are, for example, (x1, y1), (x2, y2), and (x3, y3) shown in FIG. 2. Of course, in training, the observation points passed in may be a batch of previous observation points, and the neural process model may be iterated over multiple batches to continually update it and generate a trained neural process model.
In particular, the previous observation points are respectively passed into a deterministic encoder 206 and a hidden encoder 208 in the encoder 202.
In some examples, the deterministic encoder 206 and the hidden encoder 208 may employ an MLP (multi-layer perceptron) model.
In a preferred example, the deterministic encoder 206 and the hidden encoder 208 may employ a self-attention model. The self-attention model can capture the interactions between the various previous observation points (xi, yi). By using a self-attention model, embodiments of the present specification can capture the associations between different previous observation points, so that predictions are generated with the explicit or implicit correlations between the observation points taken into account.
The deterministic encoder 206 may generate a respective representation ri for each previous observation point (xi, yi). For example, in fig. 2, deterministic encoder 206 may generate r1, r2, and r3 for the previous observation points (x1, y1), (x2, y2), and (x3, y3), respectively.
The method 300 may include: at operation 304, the multiple representations (r1, r2, and r3 in FIG. 2), the multi-dimensional feature data xi of the multiple previous observation points (x1, x2, and x3 in FIG. 2), and the multi-dimensional feature data x of the current observation point (i.e., the target query) may be passed to the cross-attention module 210 to generate a single aggregated representation r specific to the current observation point.
The cross-attention module 210 may employ a Laplace model, a dot-product model, or a multi-head model, depending on the particular situation. The specific details of these models are not described herein. It will be appreciated that these models allow the generated single aggregated representation r to attend to the associations between the target query (i.e., the feature data x of the current observation point) and the feature data of the other observation points, weighting their representations (e.g., r1, r2, r3) accordingly. By attending to these correlations, the model can capture how the data changes along the time dimension, giving it high accuracy when processing time series data.
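As an illustration of how such weights could be computed, the sketch below shows a scaled dot-product variant and a Laplace-kernel variant of the attention weights over the previous observation points. The tensor shapes and the softmax normalization are assumptions made for this example.

```python
import torch

def dot_product_weights(x_target, x_context):
    # x_target: [T, D] target queries; x_context: [N, D] previous observation points
    scores = x_target @ x_context.T / x_context.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1)            # [T, N] weights over context points

def laplace_weights(x_target, x_context, scale=1.0):
    # Laplace kernel: weight decays with the L1 distance between query and context
    dist = torch.cdist(x_target, x_context, p=1)
    return torch.softmax(-dist / scale, dim=-1)

def aggregate(weights, r_context):
    # r_context: [N, H] per-point encodings r_i; returns one aggregated r per query
    return weights @ r_context
```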
The method 300 may include: at operation 306, a plurality of previous observation points (xi, yi) in the training data may be passed into the hidden path to generate the hidden variable z. Specifically, first, the plurality of previous observation points (xi, yi) are passed into the hidden encoder 208 in the encoder 202 to generate a plurality of hidden encoder outputs. The plurality of hidden encoder outputs are then passed to an average aggregation module for average-aggregation (mean-aggregation) to generate the hidden variable z.
The method 300 may further include: at operation 308, the multi-dimensional feature data x of the current observation point, the representation r specific to the current observation point, and the hidden variable z may be passed to the decoder 204 to generate the target prediction y of the current observation point.
For example, the decoder may employ an MLP model. MLP models are commonly used models in neural process models. The specific details of the MLP model are not described further herein.
Having described how the neural process model 200 generates a target prediction for a particular observation point, the training of the neural process model 200 can now be described.
For example, in training a neural process model, the feature data xi in each previous observation point (xi, yi) in the training data may be used in turn as a target query x to generate a target prediction y for the previous observation point using the neural process model, and the target prediction y is compared to the actual label data yi for the previous observation point to generate a loss value, and the neural process model is iteratively updated with the loss value. After certain conditions are met (e.g., a set number of iterations is reached, or a loss value reaches a target threshold), training may be stopped and a trained neural process model may be generated.
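A minimal training-loop sketch along these lines is shown below, reusing the NeuralProcessModel sketch given earlier. The squared-error loss is an illustrative stand-in, since this application does not fix a particular loss and the original ANP work trains with an ELBO; the stopping condition and optimizer settings are likewise assumptions.

```python
import torch

def train(model, x_all, y_all, steps=1000, lr=1e-3):
    # x_all: [1, N, x_dim], y_all: [1, N, y_dim] — the subset of previous
    # observation points used as training data, with a leading batch dimension.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        # each observation point is used in turn as the target query x
        idx = step % x_all.size(1)
        x_t, y_t = x_all[:, idx: idx + 1], y_all[:, idx: idx + 1]
        y_pred = model(x_all, y_all, x_t)        # context = the previous observation points
        loss = ((y_pred - y_t) ** 2).mean()      # compare target prediction with label data
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```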
Of course, the trained neural process model may be validated and tested to improve or retrain the neural process model as needed to improve performance of the neural process model.
Referring to FIG. 4, a flow diagram of an example method 400 for prediction based on multi-dimensional time series data is shown, in accordance with an embodiment of the present description.
As shown in FIG. 4, the method 400 may include: at operation 402, a multi-dimensional time series data stream may be monitored to obtain a current observation point that includes multi-dimensional feature data xt. For example, the multi-dimensional data stream may preferably be received in real time and the data in the stream converted into observation points. The current observation point may include only the multi-dimensional feature data xt, which is used to generate a target prediction for the current observation point.
The method 400 may include: at operation 404, a trained neural process model may be applied for prediction based on the current observation point. The neural process model may be trained using a plurality of previous observation points (xi, yi), each of which includes multi-dimensional feature data xi and corresponding label data yi. For example, training of the neural process model may be performed using the methods described above with reference to fig. 1-3 to generate a trained neural process model (e.g., the neural process model 200 of fig. 2). In the prediction process, part or all of the plurality of previous observation points (xi, yi) may also be used.
As described above with reference to FIG. 2, the neural process model 200 may include an encoder 202 and a decoder 204. The encoder 202 may include a deterministic path and a hidden path. The deterministic path may include a deterministic encoder 206 and the hidden path may include a hidden encoder 208. In one example, the deterministic encoder 206 and the hidden encoder 208 may employ an MLP model, while in a preferred example they may employ a self-attention model.
The deterministic encoder 206 may be used to generate a plurality of encoded representations ri based on a plurality of previous observation points (xi, yi). The encoder 202 may also include a cross-attention module 210. The cross-attention module 210 may assign weights to one or more previous observation points based on the associations between the multi-dimensional feature data xt of the current observation point and the multi-dimensional feature data xi of the previous observation points, for finally generating the target prediction y of the current observation point. In particular, the cross-attention module 210 may generate a single aggregated representation r specific to the current observation point based on the multi-dimensional feature data xt of the current observation point, the multi-dimensional feature data xi of the one or more previous observation points, and the plurality of encoded representations.
The hidden path may generate a hidden variable z based on the plurality of previous observation points (xi, yi). The decoder 204 may generate the target prediction y of the current observation point based on the multi-dimensional feature data xt of the current observation point, the representation r specific to the current observation point, and the hidden variable z.
As described above, the multi-dimensional time series data stream may be a user's online activity data stream, and the method may be used to detect anomalies in the user's online activity based on the target prediction y, as described below with reference to FIG. 5.
Referring to FIG. 5, a flow diagram of an example method 500 for detecting anomalies in online activity is shown, in accordance with an embodiment of the present specification. The method 500 is described below in conjunction with the block diagram of the example system 600 for detecting anomalies in online activity of FIG. 6.
The method 500 may include: at operation 502, a user's online activity data stream may be monitored to obtain a current observation point. This operation may be performed, for example, by the data acquisition module 602 of FIG. 6. For example, the data acquisition module 602 may receive the online activity data stream in real time. As described above, the online activity data stream may come from various sources, such as a transaction system, a payment system, a user client, and so forth. The online activity data stream may include the various kinds of time series data introduced above.
In a preferred example, the data acquisition module 602 may also receive an offline data stream. The offline data stream may include, for example, non-real-time data. For example, the offline data stream may include user base attribute data, payment capability data, and the like, as described above. In other examples, the offline data stream may also include non-real-time-series data. As mentioned above, the offline data flow may also be used by the neural process model.
After acquiring the data, an observation point may be generated based on the acquired data, as described above for step 104 or 402. The operation of generating the observation point may be performed by the data acquisition module 602 or the anomaly detection module 604 of FIG. 6. The data (or observation points) acquired by the data acquisition module 602 may be communicated to the anomaly detection module 604.
The method 500 may include: at operation 504, a trained neural process model may be applied for prediction based on the current observation point to obtain a target prediction.
The method 500 may further include: at operation 506, an anomaly in the user's online activity may be detected based on the target prediction. For example, the target prediction may directly indicate that there is an anomaly in the user's online activity (e.g., the target prediction indicates an anomalous transaction). Alternatively, the target prediction may be compared to a threshold to determine whether there is an anomaly in the user's online activity.
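A minimal sketch of this inference-and-thresholding step is shown below, reusing the NeuralProcessModel sketch given earlier. The threshold value, the greater-than comparison direction, and the assumption of a scalar prediction are hypothetical choices for this example.

```python
import torch

@torch.no_grad()
def detect_anomaly(model, x_ctx, y_ctx, x_current, threshold=0.5):
    # x_ctx, y_ctx: previous observation points used as context
    # x_current: [1, 1, x_dim] — the multi-dimensional feature data xt of the
    # current observation point obtained from the monitored data stream
    y_hat = model(x_ctx, y_ctx, x_current)   # target prediction y for the current point
    score = y_hat.squeeze().item()           # assumes a scalar (1-dimensional) prediction
    return score > threshold, score          # True if the online activity looks anomalous
```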
The above operations of obtaining a target prediction and detecting an anomaly may be performed, for example, by a neural process model in the anomaly detection module 604 of FIG. 6. For example, the anomaly detection module 604 may include a multi-dimensional time series anomaly detection module and may also include a single-dimensional time series anomaly detection module. The multi-dimensional time series anomaly detection module may employ, for example, a neural process model as described above. In this case, the target prediction may be generated based on the current observation point using the operations described above (as in step 404 of method 400). The single-dimensional time series anomaly detection module may include, for example, a robust time-frequency model, an ESD model, a CNN model, a Siamese (twin) network model, and the like. Any other suitable model may also be employed to perform anomaly detection.
The method 500 may further include: preferably, at operation 508, after the abnormality is detected, the cause of the abnormality may be automatically determined. This operation may be performed, for example, by the attribution analysis module 606 of FIG. 6. The attribution analysis module 606 may use, for example, a contribution drill-down model (e.g., a multi-layer contribution drill-down model), a drill-up model, or the like. Any other suitable attribution analysis model may also be employed to automatically determine the cause of the anomaly.
The method 500 may further include: preferably, at operation 510, an alert message may be output. For example, a rules engine may be used to output alarm information based on alarm rules. For example, the rules engine may determine whether to perform an alarm, in what manner to perform an alarm, and so on based on predetermined rules.
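A minimal sketch of such a rule check is shown below. The rule fields and alarm channel are hypothetical, since this application only states that the rules engine decides whether and how to alarm based on predetermined rules.

```python
def emit_alarm(score, cause, rules):
    # rules: e.g. [{"min_score": 0.9, "channel": "sms"}, {"min_score": 0.7, "channel": "email"}]
    for rule in rules:
        if score >= rule["min_score"]:
            print(f"[ALARM via {rule['channel']}] anomaly score={score:.2f}, cause={cause}")
            return True
    return False   # no rule matched: do not alarm
```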
By applying the cross-attention model, the method exploits the consistency and correlation of user behavior across different time frames; by applying the self-attention model, it also exploits the correlation between data of different dimensions within the same time frame. The intrinsic correlations of the time series data are thus exploited along both the time dimension and the user dimension, improving the accuracy of model identification.
In addition, the neural process model described herein is particularly suitable for anomaly detection in the case of a small amount of historical data or no historical data, since it can make full use of time series data of multiple dimensions and the correlation between the respective observation points.
Referring to fig. 7, a schematic block diagram of an apparatus 700 that may be used to perform the methods described above is shown, according to an embodiment of the present description.
The apparatus may include a processor 710 configured to perform any of the methods described above, and a memory 715. The memory may store, for example, acquired data (e.g., an online data stream or an offline data stream, etc., as described above). The processor may perform any of the methods as described above.
The apparatus may include a network connection device 725, which may connect to other devices by a wired or wireless connection (e.g., a server may be connected to a user's client and/or to other servers to which the client may be connected). The wireless connection may be, for example, a WiFi connection, a Bluetooth connection, a 3G/4G/5G network connection, or the like.
The device may also include other peripheral components 720 such as a keyboard and mouse.
Each of these modules may communicate with each other directly or indirectly, e.g., via one or more buses such as bus 705.
Also, the present application discloses a computer-readable storage medium comprising computer-executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the method of the embodiments described herein.
Additionally, an apparatus is disclosed that includes a processor and a memory having stored thereon computer-executable instructions that, when executed by the processor, cause the processor to perform the method of the embodiments described herein.
Additionally, a system comprising means for implementing the methods of the embodiments described herein is also disclosed.
It is to be understood that methods according to one or more embodiments of the present description can be implemented in software, firmware, or a combination thereof.
It should be understood that the embodiments in the present specification are described in a progressive manner, and the same or similar parts in the embodiments are referred to each other, and each embodiment is described with emphasis on the differences from the other embodiments. In particular, as to the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple and reference may be made to some descriptions of the method embodiments for related points.
It should be understood that the above description describes particular embodiments of the present specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It should be understood that an element described herein in the singular or shown in the figures only represents that the element is limited in number to one. Furthermore, modules or elements described or illustrated herein as separate may be combined into a single module or element, and modules or elements described or illustrated herein as single may be split into multiple modules or elements.
It is also to be understood that the terms and expressions employed herein are used as terms of description and not of limitation, and that the embodiment or embodiments of the specification are not limited to those terms and expressions. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be looked to in order to cover all such equivalents.
Also, it should be noted that while the present invention has been described with reference to specific exemplary embodiments, it should be understood by those skilled in the art that the above embodiments are merely illustrative of one or more embodiments of the present invention, and various changes and substitutions of equivalents may be made without departing from the spirit of the present invention, and therefore, it is intended that all changes and modifications to the above embodiments be included within the scope of the appended claims.
Claims (23)
1. A method of prediction based on multi-dimensional time series data, comprising:
monitoring a multi-dimensional time series data stream to obtain a current observation point, wherein the current observation point comprises multi-dimensional feature data xt;
applying a trained neural process model for prediction based on the current observation point, the neural process model being trained using a plurality of previous observation points (xi, yi), each previous observation point comprising multi-dimensional feature data xi and corresponding label data yi,
wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module that assigns weights to one or more previous observation points based on an association between the multi-dimensional feature data xt of the current observation point and the multi-dimensional feature data xi of the previous observation points, for finally generating a target prediction y of the current observation point.
2. The method of claim 1, wherein the encoder comprises a deterministic path and a hidden path, wherein the deterministic path comprises a deterministic encoder for generating a plurality of encoded representations ri based on a plurality of previous observation points (xi, yi), and the cross-attention module generates a single aggregated representation r specific to the current observation point based on the multi-dimensional feature data xt of the current observation point, the multi-dimensional feature data xi of the one or more previous observation points, and the plurality of encoded representations.
3. The method of claim 2, wherein the deterministic encoder uses a self-attention model.
4. The method of claim 2, wherein the hidden path generates a hidden variable z based on the plurality of previous observation points (xi, yi), and the decoder generates the target prediction y of the current observation point based on the multi-dimensional feature data xt of the current observation point, the representation r specific to the current observation point, and the hidden variable z.
5. The method of claim 1, wherein the multi-dimensional time series data stream is an online activity data stream of a user, and wherein the method comprises:
detecting an anomaly in online activity of the user based on the target prediction y.
6. The method of claim 5, wherein the method further comprises:
obtaining offline data, wherein detecting an anomaly in online activity of the user is further based on the offline data.
7. The method of claim 5, wherein the method further comprises:
providing a single-dimensional time series anomaly detection module, and
detecting an anomaly in online activity of the user using the single-dimensional time series anomaly detection module in conjunction with the neural process model.
8. The method of claim 5, wherein the method further comprises:
after an anomaly is detected, a cause of the anomaly is automatically determined using an attribution module.
9. The method of claim 5, wherein the method further comprises:
after the abnormality is detected, outputting alarm information using an alarm module.
10. A method of training a model based on multi-dimensional time series data, comprising:
acquiring multi-dimensional time series data;
generating a plurality of previous observation points (xi, yi) based on the multi-dimensional time series data, each observation point corresponding to a time indication and comprising multi-dimensional feature data xi and corresponding label data yi;
training a neural process model using a subset of the plurality of previous observation points (xi, yi) as training data, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module configured to: assign, for each observation point, weights to the plurality of previous observation points based on associations between the multi-dimensional feature data xi of that observation point and the multi-dimensional feature data xi of other observation points in the subset, for finally generating a target prediction ŷi for that observation point, wherein the target prediction ŷi and the label data yi of that observation point are used to generate a loss value, and the neural process model is iteratively adjusted based on the loss value to generate a trained model.
11. The method of claim 10, wherein the encoder comprises a deterministic path and a hidden path, wherein the deterministic path comprises a deterministic encoder for generating a plurality of encoded representations ri based on the subset, and the cross-attention module generates a single aggregated representation r specific to the observation point based on the multi-dimensional feature data xi of the observation point, the multi-dimensional feature data xi of previous observation points in the subset, and the plurality of encoded representations.
12. The method of claim 11, wherein the deterministic encoder uses a self-attention model.
13. The method of claim 11, wherein the hidden path generates a hidden variable z based on the subset, and the decoder generates a target prediction ŷi for the observation point based on the multi-dimensional feature data xi of the observation point, the representation r specific to the observation point, and the hidden variable z.
14. The method of claim 11, wherein the method further comprises:
calculating a loss value based on the target predictions ŷi and the label data yi of one or more observation points; and
iteratively updating the neural process model based on the loss values to generate a trained neural process model.
15. A system for detecting anomalies in online activities, comprising:
a data acquisition module configured to: monitoring an online activity data stream of a user to obtain a current observation point, wherein the current observation point comprises multi-dimensional feature data xt;
a multi-dimensional time series anomaly detection module configured to:
applying a trained neural process model for prediction based on the current observation point, the neural process model being trained using a plurality of previous observation points (xi, yi), each previous observation point comprising multi-dimensional feature data xi and corresponding label data yi, wherein the neural process model comprises an encoder and a decoder, the encoder comprising a cross-attention module that assigns weights to the plurality of previous observation points based on associations between the multi-dimensional feature data xt of the current observation point and the multi-dimensional feature data xi of one or more previous observation points, for finally generating a target prediction y of the current observation point; and
detecting an anomaly in online activity of the user based on the target prediction y.
16. The system of claim 15, wherein the data acquisition module is further configured to acquire offline data, wherein detecting an anomaly in online activity of the user is further based on the offline data.
17. The system of claim 15, wherein the system further comprises a single-dimensional time-series anomaly detection module, wherein the single-dimensional time-series anomaly detection module is configured to detect anomalies in online activity of the user in conjunction with the multi-dimensional time-series anomaly detection module.
18. The system of claim 15, the system further comprising an attribution module, wherein the attribution module is configured to automatically determine a cause of the anomaly.
19. The system of claim 15, wherein the system further comprises an alert module, wherein the alert module is configured to output alert information after an anomaly is detected.
20. An apparatus for generating a model for a user, comprising:
a memory; and
a processor configured to perform the method of any one of claims 1-9.
21. An apparatus for generating a model for a user, comprising:
a memory; and
a processor configured to perform the method of any one of claims 10-14.
22. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the method of any of claims 1-9.
23. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the method of any of claims 10-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110523902.3A CN113516556A (en) | 2021-05-13 | 2021-05-13 | Method and system for predicting or training model based on multi-dimensional time series data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113516556A true CN113516556A (en) | 2021-10-19 |
Family
ID=78064514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110523902.3A Pending CN113516556A (en) | 2021-05-13 | 2021-05-13 | Method and system for predicting or training model based on multi-dimensional time series data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113516556A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105653541A (en) * | 2014-11-11 | 2016-06-08 | 日本电气株式会社 | System and method for recognizing correlations among data elements and evolution pattern |
CN108053080A (en) * | 2017-12-30 | 2018-05-18 | 中国移动通信集团江苏有限公司 | Zone user quantity statistics value Forecasting Methodology, device, equipment and medium |
CN109190979A (en) * | 2018-09-03 | 2019-01-11 | 深圳市智物联网络有限公司 | A kind of industry internet of things data analysis method, system and relevant device |
CN109345302A (en) * | 2018-09-27 | 2019-02-15 | 腾讯科技(深圳)有限公司 | Machine learning model training method, device, storage medium and computer equipment |
CN109492772A (en) * | 2018-11-28 | 2019-03-19 | 北京百度网讯科技有限公司 | The method and apparatus for generating information |
CN109902862A (en) * | 2019-02-13 | 2019-06-18 | 北京航空航天大学 | A kind of time series forecasting system of time of fusion attention mechanism |
CN111679949A (en) * | 2020-04-23 | 2020-09-18 | 平安科技(深圳)有限公司 | Anomaly detection method based on equipment index data and related equipment |
CN111898836A (en) * | 2020-08-18 | 2020-11-06 | 石拓 | Crime space-time prediction method and system |
CN112364975A (en) * | 2020-10-14 | 2021-02-12 | 山东大学 | Terminal operation state prediction method and system based on graph neural network |
CN112561165A (en) * | 2020-12-16 | 2021-03-26 | 南京航空航天大学 | Multidimensional time series data prediction method based on combined model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11977995B2 (en) | Machine learning artificial intelligence system for predicting hours of operation | |
US20230316112A1 (en) | Computer-based systems configured for detecting, classifying, and visualizing events in large-scale, multivariate and multidimensional datasets and methods of use thereof | |
CN112884092B (en) | AI model generation method, electronic device, and storage medium | |
US11775412B2 (en) | Machine learning models applied to interaction data for facilitating modifications to online environments | |
US20170109657A1 (en) | Machine Learning-Based Model for Identifying Executions of a Business Process | |
CN109344170B (en) | Stream data processing method, system, electronic device and readable storage medium | |
EP3637351A1 (en) | System and method for predicting and reducing subscriber churn | |
US20170109668A1 (en) | Model for Linking Between Nonconsecutively Performed Steps in a Business Process | |
US20170109667A1 (en) | Automaton-Based Identification of Executions of a Business Process | |
US20170109636A1 (en) | Crowd-Based Model for Identifying Executions of a Business Process | |
US20140006044A1 (en) | System and method for preparing healthcare service bundles | |
US20170109639A1 (en) | General Model for Linking Between Nonconsecutively Performed Steps in Business Processes | |
CN108170830B (en) | Group event data visualization method and system | |
CN109711929A (en) | Business recommended method and device based on prediction model | |
CN113780329A (en) | Method, apparatus, server and medium for identifying data anomalies | |
US20230153845A1 (en) | System and method for generating custom data models for predictive forecasting | |
US20220414689A1 (en) | Method and apparatus for training path representation model | |
CN112581291A (en) | Risk assessment transaction detection method, device, equipment and storage medium | |
CN110910241B (en) | Cash flow evaluation method, apparatus, server device and storage medium | |
CN117726367A (en) | Intelligent site selection method and device and storage medium | |
CN117540336A (en) | Time sequence prediction method and device and electronic equipment | |
US20170109637A1 (en) | Crowd-Based Model for Identifying Nonconsecutive Executions of a Business Process | |
US20170109670A1 (en) | Crowd-Based Patterns for Identifying Executions of Business Processes | |
CN116911991A (en) | Abnormal transaction behavior monitoring method, device, equipment and storage medium | |
CN110796379A (en) | Risk assessment method, device and equipment of business channel and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||