
CN105556552A - Fraud detection and analysis - Google Patents


Info

Publication number
CN105556552A
Authority
CN
China
Prior art keywords
event
user
fraud
model
account
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201480026670.9A
Other languages
Chinese (zh)
Inventor
Craig Priess
Steven Schramm
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guardian Analytics Inc
Original Assignee
Guardian Analytics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guardian Analytics Inc filed Critical Guardian Analytics Inc
Publication of CN105556552A publication Critical patent/CN105556552A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 - Payment architectures, schemes or protocols
    • G06Q 20/38 - Payment protocols; Details thereof
    • G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401 - Transaction verification
    • G06Q 20/4016 - Transaction verification involving fraud or risk level assessment in transaction processing

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer And Data Communications (AREA)

Abstract

Systems and methods comprise a platform including a processor coupled to a database. Risk engines are coupled to the platform and receive event data and risk data from data sources. The event data comprises data of actions taken in a target account during electronic access of the account, and the risk data comprises data of actions taken in accounts different from the target account. The risk engines, using the event data and the risk data, dynamically generate an account model that corresponds to the target account, and use the account model to generate a risk score. The risk score represents the relative likelihood that an action taken in the target account is fraudulent. A risk application coupled to the platform includes an analytical user interface that displays, for the actions in the target account, at least one of the risk score and event data of any event in the account.

Description

Fraud detection and analysis
RELATED APPLICATIONS
This application claims the benefit of U.S. patent application No. 61/779,472, filed on March 13, 2013.
This application is a continuation-in-part of U.S. patent application Nos. 12/483,887 and 12/483,963, both filed on June 12, 2009.
This application is a continuation-in-part of U.S. patent application No. 13/632,834, filed on October 1, 2012.
Technical Field
The disclosure herein relates generally to fraud detection and analysis. In particular, the present disclosure relates to fraud detection using behavior-based modeling.
Background
Detecting fraud in an online environment is a difficult problem. Fraudsters' strategies evolve rapidly, and today's sophisticated criminal methods mean that online account fraud often does not look like fraud at all. In fact, a fraudster may look and behave exactly like an expected customer. Accurate detection is made more difficult because today's fraudsters use multi-channel fraud methods that combine online and offline steps, each of which appears entirely acceptable but which, taken together, amount to a fraud attack. Identifying the truly suspicious events that are worth acting on with limited fraud-investigation resources is like fishing a needle out of a vast ocean.
Thus, customer financial and information assets remain at risk, and the integrity of the online channel is in jeopardy. Companies simply lack the resources to be aware of, and react to, every possible online fraud threat. Present-day attacks expose the shortfalls of past online fraud prevention technologies, which cannot keep up with organized fraud networks and their surprising speed of innovation.
Reactive strategies are no longer effective against fraudsters. Often, financial institutions only become aware that fraud has occurred when customers complain about losses. Attempting to stop fraudsters by defining new detection rules after the fact is no longer practical, because each new fraud pattern cannot possibly be anticipated and responded to in advance. Remaining in this reactive mode also makes it more difficult to track the performance of online anti-fraud measures over time. Adequate monitoring of trends, policy controls, and compliance requirements still eludes many organizations.
Conventional techniques that attempt to address the problem of online fraud, while generally useful and even necessary as security layers, fail to address the core problem. These solutions often borrow technology from other market domains (e.g., credit card fraud, web analytics) and then attempt to extend that functionality to online fraud detection, with mixed results. Often, these solutions negatively impact the online user experience.
Traditional approaches to the online fraud problem include multi-factor and risk-based authentication solutions, and transaction monitoring solutions based on fraud rules, fraud indicators, and fraud patterns. Multi-factor and risk-based authentication solutions are inefficient because they typically produce high false-positive rates and return unactionable information. Authentication failures and challenge questions are not accurate indicators of fraud, and the challenge rates are too high for limited fraud-investigation resources to act on. Their fraud detection capabilities (e.g., device identification, cookies, etc.) do not deliver the required performance, and they lack the rich behavioral models and account histories required to investigate suspicious activity. Recently, fraudsters have demonstrated the ability to circumvent this technology altogether.
Transaction monitoring solutions based on fraud rules, fraud indicators, and fraud patterns invariably lag behind the latest fraud techniques. These solutions only react to known threats and do not identify new threats as they occur. They require complex rule development, maintenance of known-fraud "truth sets" for algorithm training, and ongoing "care and feeding" maintenance to stay as current as possible. As a result, these solutions are unable to uncover new fraud types and patterns. When a violation does occur, most return minimal detail about any given instance of fraud: little background, limited insight into individual user behavior, opaque analytics, coarse-grained risk scores, and minimal forensics.
Incorporation by reference
The entire contents of each patent, patent application, and/or publication mentioned in this specification are incorporated herein by reference to the same extent as if each individual patent, patent application, and/or publication were specifically and individually indicated to be incorporated by reference.
Drawings
Fig. 1 is a block diagram of a Fraud Prevention System (FPS) according to an embodiment.
Fig. 2A and 2B illustrate block diagrams of an FPS integrated with an online banking application, according to an embodiment.
Fig. 3 is a flow diagram of a method of predicting expected behavior using FPS according to an embodiment.
FIG. 4 is a flow diagram of a method of estimating an action of an account owner using an FPS, according to an embodiment.
Fig. 5 is a flow diagram of a method for determining a relative likelihood of a future event being performed by a user versus a future event being performed by a fraudster using an FPS according to an embodiment.
FIG. 6 is a flow diagram of generating an alert for possible fraudulent activity using an FPS, according to an embodiment.
Fig. 7 illustrates the use of conventional fraud techniques ("fraud knowledge") applied to activities of a user ("normal user") according to the prior art.
FIG. 8 illustrates the use of dynamic account modeling applied to a user's activities according to an embodiment.
Fig. 9 is an example screen of an FPS graphical interface (AUI) according to an embodiment.
Fig. 10 illustrates a variation of the example screen (fig. 9) of an FPS graphical interface (AUI) according to an embodiment.
Fig. 11 is an example AUI illustrating normal usage behavior for a user, according to an embodiment.
Fig. 12 is an example AUI showing a first red alert for a user, according to an embodiment.
FIG. 13 is an example AUI showing a second red alert for a user, according to an embodiment.
Fig. 14 is a diagram illustrating an additional example AUI for a user account, according to an embodiment.
Fig. 15 is an example AUI showing a fraud matching view according to an embodiment.
Fig. 16 is another example AUI showing results obtained in a fraud matching view plotted against time, in accordance with an embodiment.
Fig. 17 is a block diagram of a FraudMAP system according to an embodiment.
FIG. 18 is a block diagram of a FraudMAP online system according to an embodiment.
Fig. 19 is a block diagram of a FraudMAP mobile system according to an embodiment.
Fig. 20 is a block diagram of FraudMAP supporting a mobile deployment scenario, according to an embodiment.
Fig. 21 is a block diagram of a FraudMAP ACH system, according to an embodiment.
Fig. 22 is a block diagram of a FraudDESK system according to an embodiment.
Fig. 23 is a block diagram of Reflex according to an embodiment.
FIG. 24 is a block diagram of a fraud prevention component according to an embodiment.
Fig. 25 is a flow diagram of fraud prevention using the FraudMAP system according to an embodiment.
FIG. 26 is a block diagram of a platform for the FraudMAP product, according to an embodiment.
Fig. 27 is a diagram of the RiskEngine of the FraudMAP system according to an embodiment.
Fig. 28A and 28B (collectively fig. 28) show block diagrams of the FraudMAP data store and data flow, according to an embodiment.
Fig. 29 is a diagram of a data converter process according to an embodiment.
Fig. 30 is a flowchart of the RiskFeed (risk feed) process according to an embodiment.
Fig. 31 is a transaction diagram of the RiskFeed process according to an embodiment.
FIG. 32 is a block diagram of a JBoss application server and ModelMagic technology infrastructure, according to an embodiment.
FIG. 33 is a block diagram of model generation and metadata generation according to an embodiment.
Fig. 34 is a diagram illustrating a risk engine table, according to an embodiment.
Fig. 35 is a diagram illustrating an architecture mapping, according to an embodiment.
Detailed Description
Described below are fraud prevention systems and methods for providing real-time risk management solutions that protect online and offline channels against account fraud and identity theft. The fraud prevention systems and methods described herein, collectively referred to herein as the Fraud Prevention System (FPS), use behavior-based modeling and rich analytics to support end-to-end online risk management processing. As described in detail below, the FPS provides an analytics-based software solution that addresses the entire risk management lifecycle.
As part of an integrated risk management solution, the FPS of an embodiment combines data analytics, online domain expertise, and fraud expertise by providing predictive models of individual behavior that dynamically adjust to identify anomalous and suspicious activity, and then providing actionable alerts and rich investigation capabilities. The FPS automatically detects new and evolving fraud threats without any fraud rule/pattern development or ongoing maintenance effort.
In the following description, numerous specific details are introduced to provide a thorough understanding of, and enabling description for, embodiments of an FPS. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details, or with other components, systems, etc. In other instances, well-known structures or operations are not shown, or are not described in detail, to avoid obscuring aspects of the disclosed embodiments.
In the description and examples provided herein, a user or customer is the owner of an account, a fraudster is anyone who is not the user or account owner, and an analyst or employee is a user of the FPS system.
Fig. 1 is a block diagram of an FPS 100 according to an embodiment. The FPS 100 includes a risk engine 102 coupled to a risk application 104. The risk engine 102 includes or provides an application that uses predictive models of individual online customer behavior together with analytics that detect fraud while minimizing false positives. Unlike traditional approaches, the risk engine application includes real-time dynamic account modeling that automatically detects new fraud attacks without rule development or algorithm training. The risk application 104 features a visual analytics interface that facilitates investigation, resolution, and risk monitoring. The visual analytics interface included in the risk application 104 and/or coupled to the risk application 104 is also referred to herein as the Analytical User Interface (AUI). Beyond simple alerts, the risk application 104 delivers high-fidelity risk scores to analysts, along with extensive background information behind each risk score, to support comprehensive analysis and investigation.
The risk engine 102 of embodiments uses predictive models of individual online customer behavior to detect new and emerging fraud schemes by distinguishing normal user behavior from suspicious activity. The risk engine 102 may use fraud models based on available knowledge of fraud threats, but it does not rely on detailed fraud patterns or predefined fraud rules. To facilitate integration with the customer's online channel, the risk engine 102 features both a real-time API and a file-based batch controller for a wide range of integration and deployment options.
As described herein, the risk engine 102 includes dynamic account modeling. Dynamic account modeling, also referred to herein as "predictive modeling" or "modeling," uses a predictive model of each individual online user's behavior. Because the risk engine 102 does not depend on predefined fraud rules and automatically detects anomalous behavior, new threats are detected as they occur. In addition, the risk engine 102 readily handles real-world situations such as changing user and fraudster behavior, the use of proxy servers, corporate firewalls, dynamic IP addresses, and client hardware and software upgrades. The advanced statistical models of the risk engine are based on probabilities that dynamically adjust to individual user behavior, recognizing that every user behaves differently and that behavior that may be abnormal for one user may be perfectly normal for another.
The risk application 104 provides a visual analytics interface to aid investigation, interpretation, and risk monitoring. As described in detail herein, components of the risk application 104 display detailed views of online account activity from customer sessions with fine-grained risk scoring. The interactive configuration of the risk application 104 enables use by any employee involved in fraud prevention, including fraud analysts, IT security personnel, risk management analysts, online channel analysts, and even customer-facing employees. The functions of the risk application 104 include, but are not limited to, alert management, investigation and forensics, process management, and performance measurement, each of which is described in detail below.
The alert management functions of the risk application 104 include highly accurate risk score alerts that use adjustable thresholds to highlight only the most suspicious activity, isolating compromised accounts. High-fidelity scoring allows a fraud team to optimize its time and effort by ensuring correct investigation priorities. This intuitive, actionable information focuses anti-fraud efforts.
The investigation and forensics functions of the risk application 104 provide visual tools to scrutinize suspicious events with sophisticated investigation capabilities. The application returns session-specific context and detailed customer history to assist each investigation. It detects coordinated attacks and related activities across accounts. Detailed account history and customer activity can additionally support other business operations, facilitating risk assessment of offline transactions.
The process management functions of the risk application 104 include case management tools that enable investigators to track any events, manage related workflows, and analyze fraud case history on an individual or aggregate basis.
The performance measurement functions of the risk application 104 measure and report on the effectiveness of fraud controls over time, improving the risk management organization's knowledge of its risk level. Metrics track risk trends, aggregate analysis across accounts, and provide auditable results that support compliance requirements.
The FPS of an embodiment is operable to prevent one or more of online fraud, offline fraud, and multi-channel fraud. As an example, Figs. 2A and 2B illustrate block diagrams of an FPS integrated with an online banking application, according to an embodiment. In this example, the risk engine 202 is coupled to the online banking application 210 using a real-time Application Programming Interface (API) 212 and/or one or more applications (e.g., authentication, risk assessment, fraud detection and alerting, investigations, compliance reporting, performance measurement, etc.) suited to the configuration of the risk engine 202 and/or the online banking application 210. The FPS may be integrated with the online application 210 by feeding event information in real time or by processing log files containing event information. As described above, the risk application 204 (labeled the fraud application 204 in this example) functions to perform one or more of alert management, investigation and forensics, process management, and performance measurement, to name a few.
In this example, a user or "customer" 220 logs into the online banking system 210 and uses it to perform events (e.g., checking the account balance, viewing a check image, transferring funds, etc.) in his/her account. As described herein, the FPS includes a risk engine 202 coupled to a risk application 204. The risk engine 202 is a real-time event handler that receives data of a user event or a set of events. The risk engine 202 also stores a user account model for each particular user. The risk engine 202 uses the event data and the user account model to calculate a risk score. The risk engine 202 updates the user account model using the risk score and details of the observed event, and stores the updated user account model for use in evaluating the next set of event data (of a subsequent session) for that user. The risk engine 202 also transmits the risk score to the online banking application 210. The risk application 204 provides alerts and enables authorized personnel to perform correlations, reporting, and investigations using the event data.
Regardless of the physical system configuration, the FPS functions to detect and prevent fraud using a behavior-based model corresponding to the behavior of a particular user. As one example, fig. 3 is a flow diagram of a method 300 for predicting expected behavior using FPS, in accordance with an embodiment. The operation begins by dynamically generating 302 a causal model corresponding to a user. The components of the causal model are estimated 304 using event parameters of a first set of events conducted by the user in the user's account. A causal model is used to predict 306 the expected behavior of the user during the second set of events.
The FPS is configured and operates to prevent online fraud, offline fraud, and multi-channel fraud. More specifically, online fraud and offline fraud include account takeover fraud, in which someone steals the account access credentials (username, password, PIN, etc.) of a user or account owner and then impersonates that user to access the account. Multi-channel fraud involves any of the channels through which a user interacts with his/her bank or accesses bank accounts (e.g., ATM, call center, in-branch access, etc.). An example of multi-channel fraud is the following: someone steals account access credentials, accesses the account online to change profile information or to obtain information about the account owner (e.g., account balances, account numbers, a signature from a check image, etc.), and then uses the information obtained via online account access to commit fraud through another channel (e.g., check fraud by forging a signature). This is an example in which the financial fraud occurs offline, but it begins with a fraudster using stolen access credentials to access the user's account online.
An event as used herein includes an online event, an offline event, and/or a multi-channel event. Thus, the first set of events includes at least one of online events, offline events, and multi-channel events. The second set of events includes at least one of online events, offline events, and multi-channel events. An online event is an event conducted via electronic access to an account.
For online events, the online event includes one or more of a login event and an activity event. A set of events includes a session, and a session is a series of related events. The series of related online events includes a session login event and a termination event, and may include one or more activity events.
For offline events, the offline events include one or more of account access events and activity events. A set of events includes a session, and a session is a series of related events. The series of related offline events includes an account access event and a termination event, and may include one or more activity events.
The multi-channel events include online events and offline events. Thus, a multi-channel event includes one or more of a login event, an account access event, and an activity event.
As another example of an FPS operation, fig. 4 is a flow diagram of a method 400 for predicting expected behavior of an account owner using an FPS, according to an embodiment. The operation begins by receiving 402 an observation corresponding to a first event. The first event of an embodiment includes an action taken in the account during electronic access to the account. A probabilistic relationship between the observations and the derived behavioral parameters of the account owner is generated 404. Operation continues with generating 406 an account model that includes the probabilistic relationship, and using the account model to estimate 408 an action of the owner during the second event.
As yet another example of FPS operation, Fig. 5 is a flow diagram of a method 500 for using an FPS to determine the relative likelihood that a future event is performed by the user versus by a fraudster, according to an embodiment. The operation begins by automatically generating 502 a causal model corresponding to a user. Generating the causal model includes estimating the components of the causal model using event parameters of previous events conducted by the user in the user's account. Operation continues by using the causal model to predict 504 the user's expected behavior during the next event in the account. Predicting the expected behavior of the user includes generating expected event parameters for the next event. Operation continues by using a predictive fraud model to generate 506 fraud event parameters. Generating the fraud event parameters assumes that a fraudster, who is anyone other than the user, is conducting the next event. Operation continues by generating 508 a risk score for the next event using the expected event parameters and the fraud event parameters. The risk score indicates the relative likelihood that the future event is performed by the user versus by a fraudster.
Fig. 6 is a flow diagram 600 of generating an alert for possible fraudulent activity using an FPS, according to an embodiment. The operation begins by generating a predictive user model 602 corresponding to a user. The predictive user model 602 includes a number of probability distributions representing event parameters observed during a first event in the user's account. Predicted event parameters 604 are generated using the predictive user model 602. The predicted event parameters 604 are those expected to be observed during a second event 624 in the account, where the second event temporally follows the first event. Generating the predicted event parameters 604 includes generating a first set of predicted probability distributions that represent the predicted event parameters, assuming that the user is conducting the second set of online events.
A second set of predicted probability distributions is generated using a predictive fraud model 612. The second set of predicted probability distributions represents expected fraud event parameters 614, assuming that a fraudster is conducting the second set of online events, where the fraudster is anyone other than the user. A comparison 634 is made between the actual event parameters of the second event 624 and the predicted event parameters 604 and 614, and an alert 606 is generated when the actual event parameters 624 appear more likely to have been created by someone other than the user. Generating the alert 606 includes generating a risk score using information of the predicted event parameters 604, although embodiments are not so limited. The user model 602 is updated 644 using information of the event parameters of the second event 624.
As described above, conventional fraud detection is based on predefined rules, recognized fraud patterns, or known fraud cases used with supervised learning techniques. In online fraud, for example, conventional fraud detection is ineffective because online fraud is very dynamic, and the technologies used to conduct fraud develop and change rapidly. Moreover, the activities associated with online fraud often do not appear suspicious (e.g., viewing account information, viewing check images, etc.). This makes it very difficult to devise rules for detecting fraud, as fraud can be elusive and constantly changing.
Rather than attempting to determine exactly what fraud looks like, or attempting to model fraud precisely and then compare the model to a normal (average) user, embodiments of the FPS described herein instead analyze each individual user and that user's exact behavior. This is more effective because the behavior of each user is a very small subset of the behavior included in a model of the average behavior of many different users. Thus, specific online banking activities or behaviors commonly observed for a single user (e.g., logging in from Palo Alto, Calif., logging in using a particular computer, logging in using a particular Internet Service Provider (ISP), performing the same types of activities (e.g., checking account balances, viewing check images, etc.)) can be used to build an online behavior model that is very specific and unique to each particular user. This makes it easier to detect fraud, because the fraudster does not know how the user behaves online, making it very difficult for the fraudster to appear to be the account owner. In particular, behavior that may be normal for an "average" user may be extremely abnormal for a particular user. Equally important, behavior that might be considered "abnormal" for an "average" user may be very normal for a particular individual. These distinctions are therefore highly useful in separating legitimate activity from fraudulent activity.
The FPS uses a predictive model of each individual user to detect online fraud. This real-time or dynamic predictive modeling, also referred to herein as dynamic account modeling, is an application that runs on or under the direction of the risk engine of an embodiment. Under this approach, the exact behavior of the fraudster becomes less important because the analysis focuses on what the user typically does rather than on detecting specific known patterns of fraud. Unlike systems in which data from previous fraudulent activities is used to train the system or generate rules, the FPS requires no rules or training. Because the FPS is based on the online behavior of each user, it can detect new types of fraud even though that fraud may never have been seen before. This results in a high detection rate and a low false alarm rate.
Generally, the FPS uses two types of models in preventing fraud. The FPS models the behavior of a particular user through a Predictive User Model (PUM), which calculates the probability of an observed event given the particular user. The FPS models the behavior of fraudsters through a Predictive Fraud Model (PFM), which calculates the probability of an observed event given a fraudster. These probabilities are then used to calculate a risk score for the next occurrence of the event to which they correspond.
The following two assumptions for each event are used to support the FPS models described herein: the first assumption is that the observed event was performed by the real user associated with the particular account, and the second assumption is that the observed event was performed by a fraudster. Events include, for example, an account login and/or any specific activity performed in the account while logged in. Each event includes a set of parameters including, but not limited to, the IP address and identification data of the computer used during the event.
Under the first assumption, the FPS generates and maintains the PUM (a causal model specific to each user), and then uses the PUM to predict the expected actions of the individual user to which the model corresponds. The FPS generates the PUM for a user by estimating a probability function for the user based on previous user activity as well as prior expectations of how users normally behave. When no prior activity information is available for a user, the FPS begins with a generic "normal" user activity model. As activity data is collected from events or activities performed by the user, the parameters of the user model are estimated over time based on the accumulated observations, so that at any point in time an accurate PUM of the user is available. Thus, the PUM evolves recursively over time. When a user event occurs, the event is scored, which provides a risk score for the event. The user model is then updated with the event parameters, and the updated user model is used to determine the risk score for the next subsequent user event.
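This score-then-update cycle can be summarized as a simple event loop. The following is a minimal Python sketch, assuming hypothetical likelihood() and update() methods on the models; the patent does not prescribe any particular API:

```python
# Minimal sketch of the recursive score-then-update cycle.
# `user_model` (the PUM) and `fraud_model` (the PFM) are assumed to
# expose hypothetical likelihood() and update() methods.

def process_event(event, user_model, fraud_model):
    # Risk score: relative likelihood P(event | fraud) / P(event | user).
    risk_score = fraud_model.likelihood(event) / user_model.likelihood(event)

    # Fold the observed parameters back into the PUM so the next
    # event is scored against an updated expectation of this user.
    user_model.update(event)

    return risk_score
```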
The PUM is constructed based on observed user behavior and statistical analysis of users in general. The structure of the PUM is formulated in advance, so that the structure of the model does not need to be discovered; only the unknown parameters of the model are estimated. PUM development uses a causal model, represented or formulated in embodiments as a Bayesian network, that probabilistically relates real-world derived parameters (e.g., the user's location (country, state, city), the type of computer being used for the event, the activities detected during an online session) to the observable parameters of the session (e.g., IP address, HTTP header information, page views, etc.). The IP address provides location information (such as country, state, city), the network segment (network block), and an estimate of the Internet service provider. The HTTP headers provide information about the Operating System (OS), user agent string, referrer string, and browser type of the computer used for the event. Thus, probability distributions over the observable parameters of a user's events and sessions can be used to model the behavior of each user. The Bayesian network is decomposed into individual parameters and the relationships between them. Distributions and conditional distributions are based on priors, observed data, "new mode" probability models, and the like.
The user is associated with actual observable parameters (including time, IP address, browser, OS, etc.) corresponding to the event. The FPS predicts future behavior using a causal model based on observed user behavior. Thus, a PUM is a structure formed by the real-world parameters used or selected, the observed event parameters, and the relationship between the real-world parameters and the observed event parameters.
The use of causal models for specific users allows the FPS to detect fraudulent activities and events without specific known rules, patterns, and/or indicators and without the need for training data for known fraud cases. Thus, the FPS can detect all fraud, both known and unknown, including fraudulent activity never before seen.
The PFM is generated under the second assumption of an embodiment. The PFM is typically built using the session or event data of all other online account holders, i.e., everyone other than the user in question. This data is used to generate a global, generic user probability. These probabilities can then be adjusted using known information about prolific fraudsters (e.g., the rate of fraud originating from Nigeria is ten times that of other (low-risk) countries), but this is not required. This differs from traditional fraud systems, which rely on fraud information in the form of new and/or additional rules, indicators, or patterns. In contrast, the FPS generally uses online activity to develop the PFM, which represents a causal model of fraudsters (anyone who is not the specific account owner), and then adjusts the probabilities or expectations of the PFM based on how fraudsters are known to behave. Thus, the FPS is unique in how it incorporates information about fraudulent activity.
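As a rough illustration, a single-parameter slice of a PFM could be estimated from the population of all account holders' events and then reweighted with fraud intelligence. This is a minimal sketch with invented data and helper names, not the patent's prescribed construction:

```python
from collections import Counter

def build_pfm_country_distribution(all_events, risk_multipliers=None):
    """Sketch: estimate P(country | fraud) from the population of all
    account holders' events, optionally reweighted by fraud intelligence
    (e.g., a country observed to produce 10x more fraud)."""
    counts = Counter(e["country"] for e in all_events)
    weights = {
        country: n * (risk_multipliers or {}).get(country, 1.0)
        for country, n in counts.items()
    }
    total = sum(weights.values())
    return {country: w / total for country, w in weights.items()}

# Example: fraud observed to be 10x more likely from country "NG".
# pfm_country = build_pfm_country_distribution(events, {"NG": 10.0})
```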
As described above, the model of an embodiment includes the PUM, which is a joint probability distribution. The PUM is a causal model. The net effect or output of the PUM is the probability of an observed parameter or event given the particular user to which the PUM corresponds. Thus, the PUM is a predicted probability distribution of the event parameters of the next event for the particular user to which the PUM corresponds.
As described above, the FPS model also includes the PFM, which is a joint probability distribution. The PFM is also a causal model. The net effect of the PFM is the probability of an observed parameter or event given a fraudster. Thus, the PFM is a predicted probability distribution of the event parameters of the next event given fraud.
The results of the PUM and the PFM are used to calculate a risk score for the next event. The next event is an event or action performed in the user's account that appears to have been initiated or performed by the account owner. The risk score for the next event is determined or calculated by taking the probability of the observed event given fraud, as determined using the PFM, and dividing it by the probability of the observed event given the particular user, as determined using the PUM. The risk score may be used to generate an alert or warning for the next event.
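For a single categorical parameter, this ratio can be sketched as follows. The smoothing scheme, helper names, and numbers here are illustrative assumptions, not the patent's specification:

```python
def likelihood(value, counts, prior_mass=1.0, universe=1000):
    """P(value | model) from observation counts, with a small prior mass
    spread over `universe` unseen values so no value has probability zero."""
    total = sum(counts.values()) + prior_mass
    return (counts.get(value, 0.0) + prior_mass / universe) / total

def risk_score(event_value, user_counts, fraud_counts):
    # P(event | fraud) / P(event | user): large when the event is far
    # more consistent with fraud than with this user's own history.
    return likelihood(event_value, fraud_counts) / likelihood(event_value, user_counts)

# A user who almost always logs in from the US:
user_ip_country = {"US": 98, "MX": 2}
population_country = {"US": 60, "DE": 15, "NG": 10, "MX": 15}
print(risk_score("DE", user_ip_country, population_country))  # very high ratio, alert-worthy
```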
The FPS uses recursive model construction to generate the PUM. The PUM does not represent comprehensive details of each event that was seen in the user's account, but rather, it includes an individual probability distribution for each of a plurality of specific parameters of one or more observed events. Each probability distribution of an observed parameter is a statistical distribution for that parameter across the observed events corresponding to the account. The individual probability distributions of the parameters are combined to form a joint probability distribution as a PUM.
Generally, the PUM is generated by collecting event data in the form of observed parameters, and after each event, the PUM of the user to which the event corresponds is updated based on the observed parameters. The PUM then allows propagation of the distributions of observed event parameters into the distributions of behavioral event parameters, where the propagation includes the distributions of the observed parameters plus the prior model.
An example of model use begins with someone (a user or a fraudster) initiating an observed event. For example, the observed events include someone logging into the user's account and/or any activity taken during the online session (e.g., checking account balances, transferring funds between accounts, viewing account information, etc.). The observed event may or may not be an online event. Each event includes or corresponds to one or more event parameters. The event parameter is a directly observable parameter of the event or is raw data of the event that can be measured or observed. Examples of event parameters include, but are not limited to, network information including parameters of the network (e.g., IP address, etc.) through which the online event occurred (country, state, city are derived parameters derived from network information; this is implicit information as opposed to actual observed event data), user agent strings (the OS and browser of the device or computer for the event are derived parameters derived from user agent strings; this is implicit information as opposed to actual observed event data), and event or session times (timestamps), to name a few.
The models of embodiments (e.g., PUM and PFM) are used to model the behavior of a given user during a past event to predict actual observed event parameters for the next event. Derived parameters that cannot be directly observed are then derived or propagated from the PUM and observable parameters. Examples of derived parameters include, but are not limited to, the user's geographic location (e.g., country, state, city, etc.) at the time of the event, the device used for the event (e.g., device type/model, device OS, device browser, software application, etc.), the Internet Service Provider (ISP), and the user's local time of day of the event, etc. The causal model of an embodiment includes a probabilistic relationship between derived parameters and event (observable) parameters, as well as a probabilistic relationship between different derived parameters. An example of a relationship between parameters may be that the user's country (event parameters) may be associated with an ISP (derived parameters) and that the ISP may be associated with a particular set of IP addresses (event parameters).
The causal model of an embodiment is represented as a Bayesian Network (BN). The BN of an embodiment uses or comprises conditional probability distributions that model or represent the relationships between parameters (relationships between different derived parameters, relationships between event parameters and derived parameters, etc.). The BN, as implemented in the PUM, is or represents the distributions of the derived parameters, the distributions of the observed parameters, and the relationships between the observed and derived parameters. The output of the PUM is a predicted distribution of the expected event parameters of the next event. The risk score is calculated using the distributions of expected event parameters. Generation of the PUM is described below.
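A toy fragment of such a network, with invented probability tables, shows how a predicted distribution over an observable parameter (a network segment) follows from the distribution over a derived parameter (location):

```python
# Hypothetical two-node fragment of the Bayesian network:
# derived parameter (location) -> observable parameter (network segment).

p_location = {"San Francisco": 0.9, "New York": 0.1}        # P(location | user)
p_netblock_given_loc = {                                    # P(netblock | location)
    "San Francisco": {"64.142.0.0/16": 0.8, "70.9.0.0/16": 0.2},
    "New York":      {"64.142.0.0/16": 0.1, "70.9.0.0/16": 0.9},
}

def predicted_netblock_probability(netblock):
    """Marginalize out the derived parameter:
    P(netblock) = sum over locations of P(netblock | loc) * P(loc)."""
    return sum(
        p_netblock_given_loc[loc].get(netblock, 0.0) * p
        for loc, p in p_location.items()
    )

print(predicted_netblock_probability("70.9.0.0/16"))  # 0.2*0.9 + 0.9*0.1 = 0.27
```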
The PUM is used to predict the event parameters of the next event. The predicted event parameters include predicted probability distributions over the event parameters that may be observed during the next event. Thus, the PUM generates predicted distributions of the event parameters for the next event. The next event is then observed, and information about the observed event parameters is collected or received. Given an observed event parameter value (e.g., the actual IP address) and the predicted probability distribution over all possible IP addresses that might be used (from the PUM), the result is the probability of that particular observed event parameter (e.g., IP address) given the PUM. This is done for all parameters.
Thus, the causal model in an embodiment generates a likelihood that an observed parameter value is observed given a current PUM (i.e., a predicted distribution defined by the PUM), and generates a likelihood that an observed parameter value is observed given a current PFM (i.e., a predicted distribution defined by the PFM). These results are then used to calculate a risk score, as described above.
As described herein, a PUM is generated by collecting event data in the form of observed parameters, and after each event, the PUM of the user to which the event corresponds is updated based on the observed parameters. The PUM then allows propagation of the distribution of observed events into the distribution of behavioral events, wherein the propagation includes the distribution of observed parameters plus a prior model.
The update process updates the distributions of one or more observed parameters in the PUM to produce an updated PUM. The updated PUM thus includes updated expectations of one or more observed parameters in the form of updated probability distributions associated with particular observed parameters. As an example, once a particular parameter has been observed for the user during an event (e.g., an IP address (observed) in the United States (location, a derived parameter)), this information is propagated back into the PUM to update the corresponding distribution, so that for the next event there is a higher expectation that the same or a similar parameter (a US IP address) will be seen.
The model is periodically updated with actual observed event parameters since the last time the model was updated. The joint probability distribution of an embodiment is updated by updating the probability distribution for each observed parameter included in the model. The model update process of an embodiment is recursive and takes into account last observed events, previous user models (i.e., PUM), and prior user models, among others. The previous user model includes the PUM that was current for the last or most recently observed event. The prior user model includes a predicted probability distribution (i.e., PUM) before any event has been observed.
The model update process includes two alternatives. In a first embodiment of the update process, the previous user model is updated using data of the current observed event, and the prior user model is considered embedded in the previous user model, updated as part of a recursive process that updates the prior user model in response to each occurrence of the observed event.
In a second embodiment of the update process, the update process maintains an observation frequency distribution for each observed event parameter. As a result, instead of updating the previous user model, each event parameter probability distribution is updated with data of currently observed events. The updated observed frequency distribution for each event parameter is integrated with the prior user model to generate an updated PUM.
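A minimal sketch of this second alternative treats the prior model as pseudo-observations, a Dirichlet-style blend; the pseudo-count weight and the numbers are illustrative assumptions:

```python
def updated_predictive(observed_counts, prior_dist, prior_weight=5.0):
    """Blend a user's observation-frequency distribution with the prior
    user model; the prior acts as `prior_weight` pseudo-observations.
    (Sketch assumes every observable value appears in the prior's support.)"""
    total = sum(observed_counts.values()) + prior_weight
    return {
        value: (observed_counts.get(value, 0) + prior_weight * p) / total
        for value, p in prior_dist.items()
    }

# Prior before any events, e.g. from general population statistics:
prior_country = {"US": 0.95, "MX": 0.03, "DE": 0.02}
# This user's observed logins so far: 8 from the US, 1 from Mexico.
print(updated_predictive({"US": 8, "MX": 1}, prior_country))
# -> US dominates (~0.91), DE keeps a small nonzero expectation (~0.007)
```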
In general, before any observed event data has been received for a particular user, the probability distributions included in the prior model may be initially adjusted using general statistical information about users and/or data collected from the user or the user's account profile. For example, a uniform probability distribution can be used as a starting point. Probability data corresponding to users' residence information (e.g., the user is a resident of the United States, and 1% of U.S. residents use this particular IP address segment) may also be used to adjust the probability distributions. Further, the probability distributions may be adjusted using data from the user's financial institution (e.g., the user is an XYZ Bank customer, and 95% of XYZ Bank customers are in the United States).
The fraud model (i.e., PFM) of an embodiment is similar to the PUM in that it is based on observed and derived distributions of event parameters. This is in contrast to conventional systems, which are rule-based and use specific indicators (rules) related to fraud. Although such rules may be weighted, a weighting is not a probability distribution, so these systems have little in common with the embodiments described herein.
Fig. 7 illustrates the difficulties and limitations of using traditional fraud techniques 702 (fraud knowledge 702) applied to the activities of a user 704 (normal user 704) according to the prior art. As described above, these conventional techniques may detect some known fraud events 710 and 712, but may cause the true fraud event 720 to go undetected, generating many false positives for events 730 and 732 that are not fraudulent activities. In contrast, FIG. 8 illustrates the use of dynamic account modeling 701 applied to user activities, according to an embodiment. Dynamic account modeling 701 applies a user-specific predictive model 701 to event activities of a user account and, in doing so, detects previously hidden fraud 720 and reduces false alarms for events 730 and 732 that are not fraudulent activities.
The FPS of an embodiment includes a graphical interface for a user account showing account activity and corresponding parameter data. The graphical interface is also referred to herein as an Analysis User Interface (AUI). The AUI displays at least one of a risk score and an event parameter for any event in the account, to name a few. The AUI includes a horizontal axis representing time and a vertical axis representing event parameters. As described above, the event parameters include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data. The IP data includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event. The HTTP data includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
The AUI includes a plurality of columns, and each column represents at least one event conducted in the account. The columns of an embodiment are arranged by date. The AUI also includes a plurality of rows, and a set of rows represents the event parameters of the events. The rows and columns form a plurality of intersection regions, and each intersection region is defined by the intersection of a row and a column. An intersection region corresponds to an event parameter of at least one event. Further, the intersection region includes color coding that relates the event parameter to a corresponding probability of the account model. The color coding represents the relative likelihood ratio that the event parameter corresponds to the user.
The AUI also includes a risk row indicating the risk of each event. Each intersection region defined by the intersection of the risk row and a column corresponds to the risk score of the at least one event corresponding to that column. The intersection region includes color coding that relates the risk score to the at least one event. The color coding represents the relative likelihood ratio that the user performed the event.
Fig. 9 is an example screen 800 of the AUI according to an embodiment. One type of AUI screen includes one or more information portions 802-804 and a chart portion 806. The chart portion 806 of the AUI includes a horizontal axis 810 and a vertical axis 812. The horizontal axis 810 represents time (e.g., date). The horizontal or time axis 810 may be modeled as weekdays and weekends, with each day subdivided into, for example, morning, afternoon, and evening, although the embodiment is not so limited. The vertical axis 812 of the AUI represents the parameter categories (e.g., country, city, state, Internet service provider, network, IP type, etc.) and all of the different parameter values historically observed for the user's activity in each category. Each column 820 of the AUI represents a user login event or user session, organized by date. The AUI includes a color-coded bar 870 within the display area, and the color-coded bar represents the overall risk row for the user to which the display corresponds.
The AUI displays color coding (e.g., red 830, yellow 832, green 834, etc.) representing thresholds corresponding to the component risk score of each parameter of an event. As described above, the FPS models behavior based on the following: as more data is received tying a particular user to a particular parameter value (e.g., 98% of JaneDoe's logins are from the United States), the FPS can better determine the probability of that parameter being different for that user (e.g., what is the probability that JaneDoe logs in from Mexico?). As more event data is collected from the user, the predicted probability distributions of the model parameters become tighter or narrower, and the color displayed on the AUI relates each parameter of the event to the relative model probability (fraud versus user) corresponding to that parameter.
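The per-parameter color coding can be thought of as bucketing each parameter's component likelihood ratio against adjustable thresholds. A sketch with illustrative threshold values follows (the patent describes adjustable thresholds but gives no numbers):

```python
def color_code(component_risk, yellow=10.0, red=100.0):
    """Map a per-parameter likelihood ratio P(value|fraud)/P(value|user)
    to the AUI color coding. Thresholds are adjustable and illustrative."""
    if component_risk >= red:
        return "red"      # strongly more consistent with fraud
    if component_risk >= yellow:
        return "yellow"   # suspicious
    return "green"        # consistent with this user's own history
```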
For example, for event 840, the parameters of country (US 841), city/state (Vienna 842), provider (AOL 843), and IP type (proxy 844) may be coded green, showing a high probability under dynamic account modeling that the account owner initiated the event. Conversely, for another event, the country (Germany 851) and city/state (Frankfurt 852) parameters may be coded red, showing a low probability under dynamic account modeling that the account owner initiated the event, while the provider (AOL 843) and IP type (proxy 844) parameters of the same event may be coded green, showing a high probability that the account owner initiated the event.
The information portions 802-804 of the AUI may be used to display various parameters or data applicable to the FPS and any integrated applications. For example, the AUI may display underlined parameter values 860, where the underline color (e.g., red, yellow, green, etc.) relates to the amount of risk associated with the particular parameter (e.g., Virginia and Vienna are underlined in red to indicate a high probability of fraudster activity).
The adaptive nature of the FPS model is particularly useful in situations where, for example, a user travels frequently and parameters therefore change often. The FPS dynamically adapts to this behavior so that it is not perpetually flagged as fraudulent, as would happen under conventional rule-based systems. The model thus adapts over time using data indicating that a particular behavior has been observed for a user (e.g., the user logged in from Denver), raising the probability that the same behavior will be observed for the same user in the future (e.g., the user logs in from Denver in a subsequent event).
Fig. 10 illustrates a variation of an example screen (fig. 9) of the AUI according to an embodiment. Referring to this example screen, information for all related activity events from the same online session is shown on a timeline within the same column 1001 representing the session. Summary information about the type of activity occurring in each session is represented by a color-coded bar 1002. The color (red, yellow or green) represents the associated risk of this type of activity for this particular session. Detailed information about each activity within the selected session may also be shown within one or more information boxes or regions 1003 of the AUI on the same screen.
If the FPS shows suspected fraudulent activity, the risk application allows analysts to perform fraud matching. Fraud matching of embodiments allows analysts to search all of the institution's accounts for other sessions with similar characteristics (e.g., sessions originating from Mexico, sessions with provider AOL, etc.) in an attempt to identify other fraud cases.
FPS fraud matching enables comparisons between data of one session and all other data of the institution to identify all sessions with one or more similar parameters. Thus, the organization may use a fraud matching function to identify other suspicious sessions with parameters (e.g., ISP, country, machine, etc.) similar or identical to the suspicious fraud attack.
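Conceptually, fraud matching is a filter over the institution's session records. A minimal sketch, assuming dictionary-shaped session records (the field names are invented for illustration):

```python
def fraud_match(sessions, criteria):
    """Sketch: find all sessions across the institution whose parameters
    match a suspected-fraud profile, e.g. {"country": "MX", "isp": "AOL"}."""
    return [
        s for s in sessions
        if all(s.get(param) == value for param, value in criteria.items())
    ]

# suspicious = fraud_match(all_sessions, {"country": "MX", "isp": "AOL"})
```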
Thus, the FPS can provide a risk assessment based on the overall activity of all users within the institution during a particular period of time (e.g., a day, a week, etc.) to help the institution determine whether it is under attack. This is a fundamental difference between the FPS and conventional systems: the FPS takes a risk management approach, in contrast to the conventional approach of attempting to detect and block all fraud.
All of the features of the FPS work together to allow a financial institution, for example, to understand fraud rather than attempt a perfect binary decision (which is futile) on whether to block each transaction as fraudulent. The FPS recognizes that it is important to understand fraud so that observable parameters (related to or translated into derived parameters) can be used to identify fraud early and minimize losses. This is in contrast to attempting to stop all suspicious activity, which, if not done perfectly, only creates customer dissatisfaction and inconvenience whenever non-fraudulent transactions are flagged as fraudulent under the rule-based traditional approach. From a risk management perspective, the fraud matching application allows the institution to view all data collected over time according to one criterion or a defined set of criteria, in order to estimate the overall percentage of fraudulent activity associated with those criteria. This allows more informed decisions to be made: for example, knowing that a high percentage of traffic using a given ISP is not fraudulent may prevent a decision to block all traffic from that ISP based on a high occurrence of fraudulent activity in a recent time period.
The FPS components described herein (e.g., risk engine, risk application, dynamic account model, etc.) can be components of a single system, multiple systems, and/or geographically separated systems. The FPS component can also be a subcomponent or subsystem of a single system, multiple systems, and/or geographically separate systems. The FPS component can be coupled to a host system or one or more other components of a system coupled to the host system (not shown).
The FPS in an embodiment includes and/or operates under and/or in association with a processing system. As is known in the art, a processing system includes any collection of processor-based or computing devices, or any collection of components of a processing system or device, operating together. For example, the processing system may include one or more of a portable computer, a portable communication device operating in a communication network, and/or a network server. The portable computer may be any one of a number and/or combination of devices selected from personal computers and other processor-based devices, but is not so limited. The processing system may include components within a larger computer system.
The processing system in an embodiment includes at least one processor and at least one storage device or subsystem. The processing system may also include or be coupled to at least one database. Generally, the term "processor" as used herein refers to any logical processing unit, such as one or more Central Processing Units (CPUs), Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), and the like. The processor and memory may be monolithically integrated onto a single chip, distributed among multiple chips or components of the FPS, and/or provided by some combination of algorithms. The FPS methods described herein can be implemented in any combination of one or more of software algorithms, programs, firmware, hardware, components, circuits.
The FPS components can be located together or in separate locations. Communication paths couple the FPS components and include any medium for communicating or transferring files between the components. The communication paths include wireless connections, wired connections, and hybrid wireless/wired connections. The communication paths also include couplings or connections to networks, including Local Area Networks (LANs), Metropolitan Area Networks (MANs), Wide Area Networks (WANs), proprietary networks, local or backend networks, and the Internet. Furthermore, the communication paths include removable fixed media such as floppy disks, hard disk drives, and CD-ROM disks, as well as flash RAM, Universal Serial Bus (USB) connections, RS-232 connections, telephone lines, buses, and electronic mail messages.
An example of fraud analysis generated by the FPS using actual data of an account holder at a financial institution is described next. This example is presented merely to aid in describing the operation of the FPS and is not intended to limit embodiments of the FPS to its scope.
Fraud analysis example
Fig. 11 is an example AUI showing normal use behavior of a user, under an embodiment. This is a frequent user who logs in several times a week. The user's normal behavior includes two normal patterns: (1) access from the San Francisco Bay Area using SBC/PacBell with a single machine; and (2) occasional access from an agency called dss.
In this example, the FPS is configured to process only login attempts (i.e., the system has no information about whether a login succeeded or failed, nor about other activities occurring within a single online session). For readability, the AUI displays a separate user name (user_26201) generated for the account identifier string described above.
On 4/2/2007 (the column adjacent to flag or slider 1102), there are two red alerts for this user.
Fig. 12 is an example AUI showing a first red alert for an account event 1202, according to an embodiment. A login attempt was made from network segment 70.9.83.0 using the provider "spscdns." Upon further investigation, it is believed that this network is operated by Sprint mobile broadband and that the IP address is a proxy server that can hide a user's true location (i.e., the user may not be in Indiana). The attempt also came from a new OS (Vista) that had not been seen from this user. The login occurred at 11:57 PM GMT on 04/02/2007, or 06:57 PM Indiana time on 04/02/2007.
FIG. 13 is an example AUI showing a second red alert for an account event 1302, according to an embodiment. The second red alert occurs about 2 hours after the first and is an attempted login from network segment 67.191.79.0 using Comcast, a provider in Miami, Florida. In this case, the browser (Firefox) differs from that of any previous session from this user. The login occurred at 01:45 AM GMT on Tuesday 04/03/2007, or 08:45 PM Miami time on Monday 04/02/2007.
Fig. 14 is an example AUI showing other information for account activity 1402, according to an embodiment. This activity occurs eight hours later and is a sequence of four login attempts from what appears to be the true account owner (likely login failures). It is also worth noting that on March 21 the user (likely the genuine user) logged in from a Hilton Hotel in Phoenix; there may be no reason to link this to the fraud case, but it is worth recording for future reference.
The FPS fraud match function is used to search for other similar user sessions. Fig. 15 is an example AUI showing a fraud matching view, according to an embodiment. A search is performed for other user sessions using Comcast network segment 67.191.79.0. The only sessions identified are: five sessions from a previous fraud case; the session from this fraud case; and the other session corresponding to the first red alert.
Fig. 16 is another example AUI showing results obtained in a fraud matching view plotted against time, according to an embodiment. The ability to perform various analyses of related events provides unique insight. In this example, the timeline view allows the analyst to determine whether the related suspicious activity has changed over time (perhaps as the result of a widespread fraud attack).
Dynamic account modeling is described in detail below.
Risk-based hypothesis testing
A Bayesian network is a well-known probabilistic model that represents a set of variables and their probabilistic independencies as a graph of nodes (parameters) and edges (dependencies). Bayesian hypothesis testing is a well-known technique that can determine the optimal decision criteria for distinguishing between two or more possible hypotheses, given a set of observed data and a known probabilistic model for each hypothesis.
The account owner (user) is the real-world person who owns the online account. In the case of ID theft, a fraudster is defined herein as anyone other than the account owner. Mathematically, the two hypotheses are:
- H0: the observed event (e.g., a login event) is generated by the account holder (also referred to as the user)
- H1: the observed event (e.g., a login event) is generated by someone else (i.e., a fraudster)
If the conditional probability that the event was generated by the genuine user and the conditional probability that the event was generated by a fraudster were both known, then upon observing the current event the optimal fraud/non-fraud decision statistic is the relative likelihood ratio L, defined by:

(0.1)  L(E) = P(F|E) / P(U|E)
Using Bayes' rule, equation (0.1) can be rewritten as:

(0.2)  L(E) = [P(E|F)·P(F)] / [P(E|U)·P(U)]
and, optionally, as:

(0.3)  L(E) = ρ·λ(E),  where  λ(E) = P(E|F)/P(E|U)  and  ρ = P(F)/P(U)

The following definitions apply to the above equations:
- P(E|F) is the fraud model: the expectation of observing the parameters of event E given that the event was caused by a fraudster (someone other than the user).
- P(E|U) is the user model: the expectation of observing the parameters of event E given that the event was caused by the genuine user.
- P(F) is the prior probability of fraud (the prior fraud expectation): the prior probability that an event is caused by a fraudster, without knowing anything else about the event.
- P(U) is the prior probability of the user (the prior user expectation): the prior probability that an event is caused by the genuine user, without knowing anything else about the event.
The prior probabilities, and therefore ρ, are constant if the events are independent of one another. When this is the case, the influence of ρ can be ignored, since any decision criterion with respect to L(E) can equivalently be applied (at an appropriate scale) to the decision statistic λ(E) instead.
For example, λ(E) can be used as part of a binary decision process by introducing a threshold τ:

(0.4)  Decide fraud if λ(E) > τ; decide user if λ(E) ≤ τ
Alternatively, λ (E) may be used to rank a set of events from a high risk of fraud to a low risk of fraud.
In general, it is easier to work with log-likelihood ratios. The risk of an event is formally defined herein as:
(0.5)  R(E) = ln(λ(E)) = ln(P(E|F) / P(E|U))
R(E) can then be used as a decision statistic in the same manner as λ(E) or L(E).
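To make the decision statistic concrete, the following is a minimal Python sketch of equations (0.4) and (0.5); the probability values and the threshold are hypothetical, and in the full system the two conditional probabilities come from the fraud and user models developed below:

```python
import math

def risk_score(p_event_given_fraud: float, p_event_given_user: float) -> float:
    """Risk of an event, R(E) = ln(P(E|F) / P(E|U)), per equation (0.5)."""
    return math.log(p_event_given_fraud / p_event_given_user)

def decide(risk: float, log_tau: float = 0.0) -> str:
    """Binary decision of equation (0.4), expressed on the log scale."""
    return "fraud" if risk > log_tau else "user"

# Hypothetical example: the observed event is 20x more likely under the
# fraud model than under the user model, so the risk is positive.
r = risk_score(p_event_given_fraud=0.02, p_event_given_user=0.001)
print(round(r, 2), decide(r))  # 3.0 fraud
```

Events can equally be ranked by this score rather than thresholded, matching the alternative use of λ(E) noted above.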
Prediction model
The question now is how to calculate R(E), and more specifically, how to calculate the two conditional probabilities P(E|F) and P(E|U). A series of events associated with the user's account is observed, where the k-th observed event is denoted E_k. Further, knowledge of the user may be updated based on previous observations: the information from the user's previous observations is represented as U_{k−1}, so that P(E_k|U_{k−1}) is the user model estimated after observing the series of events E_1, ..., E_{k−1}. Thus, equations (0.3) and (0.5) can be rewritten as:

(0.6)  λ(E_k) = P(E_k|F) / P(E_k|U_{k−1}),  R(E_k) = ln(λ(E_k))
note that in this model, an event fraud model P (X)kIf) and fraud (and users) are constant, i.e. they are not based on observing the previous event E 1…Ek-1But is changed.
In practice, the conditional probabilities are expressed in terms of the actual observed data for the event. Here, the observed data is the set of parameters related to the event that the online application is able to collect at the time of the event (e.g., the client IP address and the user agent string of the user's browser). The observed parameters (i.e., the observed data) are represented, for example, by the vector D_k = [X, Y, ..., Z], where each element represents one of the observed parameters.
The fraud model and the user model can then be expressed as:

(0.7)  P(E_k|F) = P(D_k|F) = P(X, Y, ..., Z|F)
       P(E_k|U_{k−1}) = P(D_k|U_{k−1}) = P(X, Y, ..., Z|U_{k−1})

Each of these models is a predictive model over the observed parameters, one for the fraudster and one for the user. When calculating λ(E_k) and R(E_k), it is the ratio of these models, evaluated at the observed parameter values, that is of interest.
For purposes of illustration, assume that there are two directly observable parameters:
- X = the IP address associated with the HTTP session
- Y = the user agent string of the device used to access the application
Then, for an observed event with D = (IPAddr = x, UserAgent = y), the likelihood ratio is calculated as:

(0.8)  λ(E) = P(IPAddr = x, UserAgent = y | F) / P(IPAddr = x, UserAgent = y | U)
The problem is that these probabilities are usually unknown and are generally difficult, if not impossible, to calculate in this form. Even assuming independence between the observed parameters, computing the individual terms (or at least their ratios) of the resulting likelihood ratio remains problematic:

(0.9)  λ(E) = [P(IPAddr = x|F) · P(UserAgent = y|F)] / [P(IPAddr = x|U) · P(UserAgent = y|U)]
This problem is solved by decomposing the probabilities into manageable components. One way to do this is to introduce the derived, real-world behavior parameters described earlier as conditioning parameters. For example, P(IPAddr = x|U) can be re-expressed in terms of derived parameters such as the service provider and the geographic location:

P(IPAddr = x|U) = P(IPAddr = x | Provider, Location, U) · P(Provider | Location, U) · P(Location | U)
This method of decomposing a complex probabilistic model into a more computationally feasible network of causally related parameters is central to the dynamic account modeling approach. Once the model has been reformulated as a causal model, the Bayesian network formalism allows information to be propagated through the network of related parameters. To simplify the following discussion, the focus is generally on the case of a single observed parameter X; the extension to a full Bayesian network representing the entire PUM, as described herein, is accomplished by introducing conditional parameters and distributions.
User model
For ease of explanation, the underlying mathematics is described below for the category of parameters that are discrete (the parameter takes values from a defined set), of finite cardinality (there is a finite, though perhaps unknown, set of values), and categorical (each value is independent of the others, i.e., there is no explicit or implicit ordering or distance between values). Similar models can be developed for other parameter types (e.g., continuous parameters). Likewise, the extension to conditional parameters is straightforward under the teachings herein.
A number of variables are described below:
- U_k represents the updated user information (model) after k events have been observed.
- X_{k+1} is the observed parameter for event k+1, where X ∈ {x_1, x_2, ..., x_n}.

The predictive user model (distribution) for X_{k+1} is the vector:

(0.10)  P(X_{k+1}|U_k) = P(X|U_k) = {p(x_1|U_k), p(x_2|U_k), ..., p(x_n|U_k)}
Similarly, before any events for this user are observed, the prior distribution over X is:

(0.11)  P(X_1|U_0) = P(X|U_0) = {p(x_1|U_0), p(x_2|U_0), ..., p(x_n|U_0)}
synthetic priors and observations
One method for synthesizing the prior probability distribution and the observed events is to use a Dirichlet distribution; other distributions or synthesis techniques may also be used. The Dirichlet distribution is used to estimate an unknown multinomial probability distribution. More specifically, it extends the beta distribution to multiple dimensions, provides a smooth transition between the prior distribution and the observed distribution, and allows control over how quickly that transition occurs.
The Dirichlet distribution is a second-order distribution (a distribution over distributions). For example, for an event parameter X that takes one and only one value from X ∈ {x_1, x_2, ..., x_n}, with P_X = {p(x_1), p(x_2), ..., p(x_n)}, the Dirichlet distribution over P_X can be expressed as:

(0.12)  p(P_X) = D(P_X | P_X^0, α)

with

(0.13)  D(P_X | P_X^0, α) ∝ Π_i (p(x_i))^(α·p^0(x_i) − 1)
Here:
- p(P_X) is a scalar: the probability that the probability distribution P_X is correct,
- P_X^0 is the prior (assumed) distribution (vector) over X, and
- α is a scale factor (in units of number of observations) that essentially expresses the degree of belief in the prior distribution.

That is, the scale factor controls the rate of convergence away from the prior distribution and toward the observed distribution.
Following this derivation, the maximum likelihood estimate is given by:

(0.14)  P̂_X = E[p(x_i) | P_X^0, α, m_i, k] = (α·p^0(x_i) + m_i) / (α + k)

where m_i is the number of times x_i was observed, and k = Σ_j m_j is the total number of observed events.
The Dirichlet distribution can be used as an estimate of the predictive user model, so that each element p(x_i|U_{k−1}) of equation (0.10) can be estimated as:

(0.15)  p̂(x_i|U_{k−1}) = (α·p(x_i|U_0) + m_i) / (α + k)
The Dirichlet model of equation (0.15) can be rewritten as:

(0.16)  p̂(x_i|U_{k−1}) = β·p(x_i|U_0) + (1 − β)·(m_i / k)

where:

β = α / (α + k)
1 − β = k / (α + k)
Thus, for a given user, the estimated user model provides a smooth and intuitive transition between the prior distribution and the observed distribution over X. The rate of convergence toward the observed distribution is controlled by the parameter α, expressed in units of k (i.e., observed events).
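The following Python sketch illustrates the Dirichlet estimate of equations (0.15) and (0.16); the prior, the counts, and the α value are hypothetical:

```python
def dirichlet_estimate(prior: dict, counts: dict, alpha: float) -> dict:
    """p(x_i|U_{k-1}) = (alpha * p(x_i|U_0) + m_i) / (alpha + k), equation (0.15):
    a smooth blend of the prior distribution and the observed frequencies."""
    k = sum(counts.values())  # total number of observed events
    return {x: (alpha * p0 + counts.get(x, 0)) / (alpha + k)
            for x, p0 in prior.items()}

# Hypothetical country parameter: the prior favors US, and 9 of the 10
# observed events were from the US.
prior = {"US": 0.90, "CA": 0.05, "MX": 0.05}
counts = {"US": 9, "MX": 1}
print(dirichlet_estimate(prior, counts, alpha=5.0))
# {'US': 0.9, 'CA': 0.0166..., 'MX': 0.0833...}
```

With a small α the estimate converges quickly toward the observed frequencies; with a large α it stays close to the prior, matching the role of α described above.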
This is a good model for some parameter types; however, it fails to capture other expectations about user behavior. Notably, for some parameter types (e.g., location), only a few distinct values are expected to be observed for any given user. Moreover, for these parameters, the expectation of seeing a new parameter value may itself be based on the user's previously observed behavior. A model that incorporates this type of expectation is presented in the next section.
Modified event model (new mode probability)
The modified event model accounts for the expectation that a single user will be observed with only a limited set of parameter values. Furthermore, the event model recognizes that a user switching to a new (previously unobserved) parameter value is an event of interest in itself. For example, an individual user is expected to appear in one, or perhaps a few, different countries, and seeing that user in a new country is an occurrence that warrants attention.
Consider the observed random variable X with all the definitions of the previous section. While waiting for the (k+1)-th observation, the possible outcomes can be characterized using a modified experiment based on a new random variable N: if the value of X_{k+1} has been previously observed (for that user), then N_{k+1} is False, and if this is the first time the value is observed (for that user), then N_{k+1} is True. The new mode probability η can then be defined as:

(0.17)  η = P(N_{k+1} = True | U_k)
Combining the new mode event with the actual observed value, the estimated user model can be rewritten as:

(0.18)  p̂(x_i|U_k) = η · p(x_i|U_0) / (1 − υ)   if x_i has not previously been observed
(0.19)  p̂(x_i|U_k) = (1 − η) · p̃(x_i|U_k)      if x_i has previously been observed
where the following definitions apply:
- η is the probability of a new mode for the user, based on previously observed events. The new mode probability η may be modeled in a number of different ways, including a statistical model based on historical data.
- υ is the prior probability mass of the previously observed values of X, specifically: υ = Σ_{i: x_i previously observed} p(x_i|U_0).
- p̃(x_i|U_k) is the estimate based on previous observations of the value x_i, e.g., from equation (0.16).
The decision to use the new mode model (i.e., equations (0.18)-(0.19) or a variant thereof) or a more traditional model such as the Dirichlet model (i.e., equation (0.16)) is determined by the type of parameter being modeled. If the parameter carries a strong expectation about whether a new mode (value) should be observed, equations (0.18)-(0.19) provide additional fidelity. However, if the parameter is adequately modeled simply as an expectation over its values, equation (0.16) provides a simpler and more direct way of modeling the behavior.
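A Python sketch of the new mode variant follows; it implements equations (0.18)-(0.19) as reconstructed above, and the renormalization of the prior mass over unseen values (the 1 − υ divisor) is an assumption of this sketch:

```python
def new_mode_estimate(prior: dict, counts: dict, alpha: float, eta: float) -> dict:
    """New mode model: mass eta is reserved for first-time values, spread over
    the unseen values in proportion to their priors; mass (1 - eta) goes to a
    Dirichlet-style estimate (equation (0.16)) over the already-seen values."""
    seen = {x for x, m in counts.items() if m > 0}
    upsilon = sum(prior[x] for x in seen)   # prior mass of previously seen values
    k = sum(counts.values())
    beta = alpha / (alpha + k)
    est = {}
    for x, p0 in prior.items():
        if x in seen:
            est[x] = (1 - eta) * (beta * p0 / upsilon + (1 - beta) * counts[x] / k)
        else:
            est[x] = eta * p0 / (1 - upsilon)
    return est
```

Note that the estimate still sums to one: the previously seen values carry total mass 1 − η and the unseen values carry total mass η.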
Trust model
The trust model accounts for the fact that an event observed for a user may actually have been caused by a fraudster. If that is the case, the user model should not be updated with the observed information. Of course, this must be done in a probabilistic manner, since the system can never be absolutely certain whether an event was caused by the user or by a fraudster.
The trust model is particularly important for fraud cases that span multiple sessions. It helps prevent a fraudster from fooling the system (by biasing the model) with many benign-looking sessions before attempting more suspicious activity.
The basic idea is to consider two possible updated user models after observing an event:
1. U+ is the resulting user model that includes the effect of the previous event E.
2. U− is the resulting user model that ignores the effect of the previous event E.
The likelihood of a subsequent event E' can then be written as:

(0.20)  P(E'|U) = P(E'|U+)·P(U+ is correct | U) + P(E'|U−)·(1 − P(U+ is correct | U))

Here, P(U+ is correct | U) is essentially the probability that the event E was actually caused by the user. This term is defined as the trust of the event:

(0.21)  T_E ≜ P(U+ is correct | U)
Combining this with equations (0.1) and (0.3) yields:

(0.22)  ρ·λ(E) = L(E) = P(F|E)/P(U|E) = (1 − P(U|E))/P(U|E) = (1 − T_E)/T_E
Rearranging to solve for T_E:

(0.23)  T_E = 1 / (1 + ρ·λ(E)),  where  ρ = P(F)/(1 − P(F)) ≈ P(F)
Intuitively, P(F) << 1, so that when the relative likelihood ratio λ(E) << 1/P(F), the trust of the event will be ≈ 1. Conversely, when λ(E) ≥ 1/P(F), the trust of the event is significantly reduced.
The trust of previous events can be used in the estimation (update) of the user model. For the Dirichlet user model described in equation (0.16), accumulated trust can be used in place of the observed counts for each parameter value (mode) of the predictive user model. Specifically:
(0.24)  p̂(x_i|U_{k−1}) = β_τ·p(x_i|U_0) + (1 − β_τ)·(τ_i / Σ_j τ_j)

where the prior weight coefficient β_τ is now calculated from the accumulated trust over all observations of the parameter, namely:

(0.25)  β_τ = α / (α + Σ_j τ_j)
Here, the following definitions are used:
- p(x_i|U_0) is the prior (user) probability of observing the value x_i.
- α is the Dirichlet scale factor (in number of observations).
- τ_i is the accumulated trust of the events in which x_i was observed for this user: τ_i = Σ_{events E_k with X = x_i} T_{E_k}.
- Σ_j τ_j is the total accumulated trust over all observations of X for this user.
Referring back to the definition of T_E in equation (0.23): when an event is generally consistent with the user model (i.e., λ(E) << 1/P(F)), T_E ≈ 1 and the effect of this equation is equivalent to the original Dirichlet model (equation (0.15)). However, if an event has very high risk (λ(E) ≥ 1/P(F)), the resulting T_E can be well below 1 and has a correspondingly reduced impact on the resulting updated user model. Likewise, by a similar substitution, the trust score can be used in the new mode model of equations (0.18)-(0.19).
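A Python sketch of the event trust of equation (0.23) and the trust-weighted update of equations (0.24)-(0.25); the prior fraud probability and λ values are hypothetical:

```python
import math

def event_trust(lam: float, p_fraud: float) -> float:
    """T_E = 1 / (1 + rho * lambda(E)), equation (0.23), with rho ~= P(F)."""
    return 1.0 / (1.0 + p_fraud * lam)

def trusted_estimate(prior: dict, trust: dict, alpha: float) -> dict:
    """Equation (0.24): the Dirichlet estimate with accumulated trust tau_i in
    place of raw counts, so high-risk (low-trust) events barely move the model."""
    total = sum(trust.values())          # sum_j tau_j
    if total == 0.0:
        return dict(prior)               # nothing trustworthy observed yet
    beta = alpha / (alpha + total)       # equation (0.25)
    return {x: beta * p0 + (1 - beta) * trust.get(x, 0.0) / total
            for x, p0 in prior.items()}

# A routine event (lambda = 1) is almost fully trusted when P(F) is small,
# while a highly suspicious event (lambda = 10,000) is heavily discounted.
print(event_trust(1.0, 1e-3))    # ~0.999
print(event_trust(1e4, 1e-3))    # ~0.091
```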
Time decay model
The derivation of the user model so far does not take elapsed time into account; more specifically, a user may change behavior over time, so that behavior observed long ago may not reflect currently expected behavior. This problem is addressed by introducing a time decay model for the user model.
The basic idea behind the time decay model is that the relevance of an observed event decreases over time. An exponential decay function forms a computationally attractive basis for the model. Using an exponential decay function, the relative weight of each event decays according to the function:

(0.26)  ω(t, t_Event) = e^(−(t − t_Event)/λ)
In this function:
- t is the current time (or any time after the event is observed),
- t_Event is the time the event is observed, and
- λ is the decay parameter of the model (in the same units as t).
The weighting function can be applied recursively from one point in time to another. In particular, for two later points in time t_2 > t_1 > t_Event:

(0.27)  ω(t_2, t_Event) = e^(−(t_2 − t_Event)/λ) = e^(−((t_2 − t_1) + (t_1 − t_Event))/λ) = e^(−(t_2 − t_1)/λ) · e^(−(t_1 − t_Event)/λ) = ω(t_2, t_1) · ω(t_1, t_Event)
With this background, the time decay model is now described. For a parameter value x_i ∈ X, define M_i(t) as the accumulated observed mass. The accumulated observed mass may be based on event counts (i.e., a base weight of 1 per event), on event trust (a base weight of T_E per event), or on some other metric that weights each observed event. As defined, the accumulated observed mass can also vary over time.
Using the exponential decay function with a particular exponential time constant λ, the specific form of the accumulated observed mass at a given time t is defined as:

(0.28)  M_{λ,i}(t) = M_{λ,i}^{Last} · e^(−(t − t_i^{Last})/λ)
In this expression:
- M_{λ,i}^{Last} is the accumulated observed mass for the value x_i immediately after the last event in which x_i was observed,
- t_i^{Last} is the timestamp of the most recent event in which x_i was observed (t_i^{Last} is stored as part of the user model; each x_i has its own),
- t is the current time, typically set by the time of the next event being evaluated, and
- λ is the exponential time constant, a static parameter of the model.
M_{λ,i}^{Last} and t_i^{Last} are computed recursively as part of the user model update process. Specifically, whenever an event containing the value x_i is observed, the user model is updated using:

(0.29)  M_{λ,i}^{Last|k} = m_i^{E_k} + M_{λ,i}^{Last|k−1} · e^(−(t_Event − t_i^{Last|k−1})/λ)
        t_i^{Last|k} = t_Event
where:
- M_{λ,i}^{Last|k} is the new (updated) accumulated observed mass for the value x_i immediately after the current event k (in which x_i was observed),
- M_{λ,i}^{Last|k−1} is the accumulated observed mass for x_i prior to the most recent event,
- m_i^{E_k} is the incremental observed mass for x_i contributed by the current (single) event k:
  - if the observed mass is based on observed counts, then m_i^{E_k} = 1,
  - if the observed mass is based on event trust, then m_i^{E_k} = T_{E_k},
- t_Event is the timestamp of the most recent event k (in which x_i was observed),
- t_i^{Last|k} is the new (updated) last observed time for the value x_i based on event k, and
- t_i^{Last|k−1} is the last observed time for the value x_i prior to the most recent event.
If this is the first observation of x_i (for that user), the initial update reduces to:

(0.30)  M_{λ,i}^{Last|k} = m_i^{E_k},  t_i^{Last|k} = t_Event
Evaluation of events proceeds exactly as without the time decay model, except that the accumulated observed mass M_{λ,i}(t) is used in place of observed counts or accumulated trust when calculating the risk score of an event. Specifically:
- If event counts are used as the basis for the observed mass, then M_{λ,i}(t) replaces m_i in equation (0.16). In addition, the total over all previous observations, Σ_j M_{λ,j}(t), is used to compute k (which is now real-valued).
- If event trust is used as the basis for the observed mass, then M_{λ,i}(t) replaces τ_i in equation (0.24). Similarly, the total Σ_j M_{λ,j}(t) now replaces Σ_j τ_j to complete the normalization.
More complex decay models, such as a weighted average of multiple exponential decays, may also be used.
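A Python sketch of the recursive decayed-mass bookkeeping of equations (0.28)-(0.30); timestamps are assumed to be in days and the decay constant is hypothetical:

```python
import math

class DecayedMass:
    """Accumulated observed mass M_{lambda,i} for one parameter value x_i,
    maintained recursively per equations (0.29) and (0.30)."""
    def __init__(self, decay_constant: float):
        self.decay = decay_constant   # exponential time constant (lambda)
        self.mass = 0.0               # M last: mass just after the last observation
        self.t_last = None            # timestamp of the last observation of x_i

    def observe(self, t_event: float, weight: float = 1.0) -> None:
        """Observe x_i at t_event; weight is 1 for counts, or T_E for trust."""
        if self.t_last is None:
            self.mass = weight                                   # equation (0.30)
        else:
            self.mass = weight + self.mass * math.exp(
                -(t_event - self.t_last) / self.decay)           # equation (0.29)
        self.t_last = t_event

    def value(self, t: float) -> float:
        """M_{lambda,i}(t) at evaluation time t, equation (0.28)."""
        if self.t_last is None:
            return 0.0
        return self.mass * math.exp(-(t - self.t_last) / self.decay)

# Hypothetical: two observations 10 days apart, evaluated 5 days later.
m = DecayedMass(decay_constant=30.0)
m.observe(t_event=0.0)
m.observe(t_event=10.0)
print(round(m.value(t=15.0), 3))  # ~1.453: older events contribute less
```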
Fraud impersonation model
The formulation above assumes that the fraudster acts independently of the user, i.e., that the fraudster knows nothing about users in general or about the specific user, and that even if the fraudster did, he or she would be unable to, or would choose not to, act differently on that knowledge. As fraudsters become more sophisticated, this assumption no longer holds and can affect the performance of the algorithm.
The impersonation model addresses this problem. Two related but distinct cases can be considered:
1. The fraudster has knowledge of users in general (perhaps for a particular target bank). In essence, the fraudster may be able to use this knowledge to guess what a typical user would do. For example, a fraudster attacking an American bank can safely assume that most users access the online application from the United States, so the fraudster might use a U.S.-based proxy server to hide the fraudster's location and, perhaps more importantly, to look like a typical user. Of course, this is more relevant for some parameters (e.g., country) than for others, as the fraudster may be unable to adequately guess what a user would use (e.g., the user agent string) and/or may have difficulty mimicking the behavior (e.g., coming from the exact same network segment).
2. The fraudster has been able to learn something about a specific user (perhaps by collecting data through a phishing website or by installing malware on the user's machine). Based on this information, the fraudster can change the attack profile to look like that specific user. This creates more opportunities and more sophisticated attack profiles. Still, this is more relevant for some parameters than for others. For example, it is relatively easy to present the same user agent string as a specific user, but much harder to come from the exact same network segment (which would require sophisticated malware on the user's machine).
Both cases are based on the same underlying model; however, the model is applied at different times: 1) the ability to guess is handled by adjusting the fraudster's parameter priors, while 2) the ability to actively impersonate a specific user is handled dynamically.
For the case in which a fraudster can guess overall user behavior, the parameter priors in the fraud model can be adjusted to account for this possibility. Specifically, for each parameter in the model, define the probability that a fraudster can guess the user's behavior:
(0.31)  P_Guess = the probability that the fraudster guesses the parameter value
Essentially, this says that with probability P_Guess the fraudster knows the prior probability (for the particular parameter) of the overall user population (for a particular target bank and/or application). This can easily be accounted for within the model by modifying the fraud parameter prior for the parameter under consideration, using the following formula:
(0.32)  P(X|F̂_0) = P_Guess·P(X|U_0) + (1 − P_Guess)·P(X|F_0)
the modified fraud parameter prior is used instead of the original fraud parameter prior. In practice, this is done offline, and the risk engine simply uses the modified fraud parameter prior value.
A more interesting and challenging situation arises when a fraudster is actually able to observe a user and then mimic the user's behavior (or at least the observed parameters). In this case, the impersonation model must consider the combination of: the probability that a fraudster attempts to mimic a particular observed parameter; the probability that the fraudster can observe (or otherwise learn about) the particular behavior (observed parameter) of the specific user (e.g., the fraudster can observe the actual IP address or user agent string that the user would have when accessing the online application); and the probability that the fraudster can successfully mimic the particular parameter value observed for the user. For any particular parameter, the combination of these conditions is modeled by a single, statistically defined parameter:
(0.33)  P_Imp = the probability that a fraudster successfully mimics the parameter value
Then, at any point in time, the resulting fraud model is a probabilistic combination of the original fraud model (which is just the prior) and the impersonated user model:

(0.34)  P(X_k|F_{k−1}) = P_Imp·P(X_k|U_{k−1}) + (1 − P_Imp)·P(X_k|F_0)
This model can be used directly in calculating the likelihood ratio and the risk of an event (see equation (0.6)):

(0.35)  λ_Imp(X_k) = [P_Imp·P(X_k|U_{k−1}) + (1 − P_Imp)·P(X_k|F_0)] / P(X_k|U_{k−1})
                = P_Imp + (1 − P_Imp)·P(X_k|F_0)/P(X_k|U_{k−1})
                = P_Imp + (1 − P_Imp)·λ(X_k)
Therefore:

(0.36)  R(X_k) = ln(P_Imp + (1 − P_Imp)·λ(X_k))
Looking at the limits: if P_Imp << 1, then whenever the original fraud likelihood ratio λ(X_k) >> 1 (i.e., the original risk is positive), the resulting likelihood ratio and risk are essentially unaffected. However, if λ(X_k) << 1 (i.e., the original risk is a relatively large negative number), then including P_Imp effectively sets a lower bound on the risk:

(0.37)  R(X_k) ≥ ln(P_Imp)
Intuitively this makes sense: if a fraudster could plausibly impersonate the user's observed parameters, this limits the amount of trust that can be placed in observing parameter values that would normally be expected from the user. In practice, this becomes useful when the user model includes many parameters and P_Imp is defined according to the properties of each parameter. For example, using a proxy server makes it easier for a fraudster to mimic a user's country than to mimic the user's correct city.
Furthermore, while the full model expressed in equation (0.34) may be used, a simplified model that simply sets the minimum risk according to equation (0.37) may also be used; it provides much of the same value (i.e., it bounds the amount of confidence that the total risk score can gain from observing one of the expected parameter values). Likewise, if the underlying parameter is conditional, P_Imp is interpreted as a conditional probability.
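A Python sketch of the impersonation adjustment of equations (0.36)-(0.37); the per-parameter P_Imp values below are hypothetical:

```python
import math

def impersonation_risk(lam: float, p_imp: float) -> float:
    """R(X_k) = ln(P_Imp + (1 - P_Imp) * lambda(X_k)), equation (0.36)."""
    return math.log(p_imp + (1.0 - p_imp) * lam)

# For an event that matches the user well (lambda << 1), the risk is bounded
# below by ln(P_Imp), equation (0.37): an easily mimicked parameter such as
# country (P_Imp = 0.1) earns far less negative risk than a hard-to-mimic
# parameter such as the exact city (P_Imp = 0.001).
print(round(impersonation_risk(1e-4, 0.1), 2))    # -2.3  (~ln 0.1)
print(round(impersonation_risk(1e-4, 0.001), 2))  # -6.81 (~ln 0.0011)
```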
Fraud co-occurrence model
The fraud co-occurrence model attempts to model the observation that a fraud attack against a single online account typically involves a burst of sessions. For example: an initial session (or sessions) may be used to steal credentials or to confirm that stolen credentials are correct, after which another attack vector is used to carry out the fraud; multiple sessions may be used, each performing a piece of the fraudulent activity, in an effort to keep the financial activity under the radar of transaction monitoring rules; and if a fraud attack succeeds against an account, the fraudster may come back and try again.
Note that in these cases the sequence of fraudulent sessions may or may not share a similar profile. Furthermore, in most cases fraudsters attempt to move as quickly as possible to carry out the fraud before their activity is discovered or their access to the account is shut off. Mathematically, this implies that the observation of a (potentially) fraudulent session should affect the fraud expectation for subsequent events. Using the updated user model U_{k−1}, equation (0.3) for event E_k can be rewritten as:

(0.38)  L(E_k) = [P(E_k|F)/P(E_k|U_{k−1})] · [P(F)/P(U)]
in this equation, p (f) is the prior probability that any observed event E is caused by a fraudster, not a user. In the previous section, it was assumed that each event is independent and that p (f) is constant, so that l (E) and λ (E) can be used as equivalent decision statistics. However, as previously discussed, this is not the case when observing a fraudulent event can change some expectation that the fraud (i.e., p (f)) of a subsequent event will be seen.
Note that in addition to modifying P(F), this could also include some form of dynamic event prediction model for fraud, i.e., P(E_k|F_{k−1}), analogous to the user model. However, such a model is difficult to define and adds considerable complexity to the resulting algorithms and models.
Therefore, the interest is in modifying the estimate of P(F) based on previous observations of potentially fraudulent activity. Ideally, this is done recursively, so that the resulting model does not need to remember every previous event.
One such model is exponential decay. This model implements the assumption that subsequent fraudulent activity (on a single account) tends to occur within a limited time frame (e.g., within the same day or a few days). It also takes advantage of the useful half-life characteristics of a time-based exponential decay model.
Specifically, assume that a fraud event E_F is discovered at time t_F, and that the prior expectation of fraud for a subsequent event E' at time t' is increased and decays over time. One way to model this is to use an exponential decay model for the increased prior expectation of fraud given knowledge of E_F:

(0.39)  P(F'|E_F) = P(F_0) + (ε − P(F_0))·e^(−(t' − t_F)/μ)

where:
- P(F_0) is the original prior probability that any event is fraud (before any events are observed),
- ε is a parameter defining the new prior fraud expectation immediately after the event E_F is observed, and
- μ is a parameter of the model defining the half-life decay of the increased fraud expectation.
Intuitively, when a fraud event E_F is discovered, the prior expectation of finding another fraud event immediately jumps from P(F_0) to ε, and then decays back toward P(F_0) with an exponential half-life equal to μ.
Of course, in practice it is not certain whether some previous event E_i was fraud. To account for this uncertainty, two cases can be considered, one conditioned on E_i having been caused by fraud and the other conditioned on E_i not having been caused by fraud. The first case uses P(F_k|E_i is fraud), as defined above, as the subsequent fraud prior, while the second case uses the original fraud prior P(F_0):

(0.40)  P(F_k|E_i) = P(F_k|E_i is fraud)·P(F_i|E_i) + P(F_0)·(1 − P(F_i|E_i))
Substituting the event trust T_{E_i} from equation (0.21) (so that P(F_i|E_i) = 1 − T_{E_i}) and rewriting:

(0.41)  P(F_k|E_i) = P(F_0)·T_{E_i} + [P(F_0) + (ε − P(F_0))·e^(−(t_k − t_i)/μ)]·(1 − T_{E_i})
                 = P(F_0) + (1 − T_{E_i})·(ε − P(F_0))·e^(−(t_k − t_i)/μ)
Noting that for any case of interest ε >> P(F_0), this can be further simplified to:

(0.42)  P(F_k|E_i) ≈ P(F_0) + (1 − T_{E_i})·ε·e^(−(t_k − t_i)/μ)
This is the new fraud prior based on some previous, potentially fraudulent event E_i. Note that this could alternatively be defined as the increase in the fraud prior, in which case equation (0.42) would be exact. In practice, the two approaches are equivalent.
There are potentially many previously observed events (for the user account), and the possible contribution of each should generally be considered. This is done by introducing a fraud co-occurrence update model.
Since the increased fraud expectation decays exponentially, the proportion of the decay contributed by any single event depends only on the length of the decay interval. This allows a recursive model for the fraud prior of the next observed event E_k, based on all previously observed events {E_1, ..., E_{k−1}}, to be defined as:

(0.43)  P(F_k) = P(F_0) + γ_{k−1}·ε·e^(−(t_k − t_{k−1})/μ)
        γ_k = g(γ_{k−1}, T_{E_k}, (t_k − t_{k−1}))
        γ_0 = 0
In this formulation, γ_{k−1} essentially represents the accumulated mistrust of the observed events through E_{k−1}. The choice of the update function g() defines how the effects of multiple events are combined. A simple recursive update model exhibiting the desired behavior can be defined as:
(0.44)  γ_k = max((1 − T_{E_k}), γ_{k−1}·e^(−(t_k − t_{k−1})/μ))
Other variations that use some accumulation of previous events are possible while ensuring γ_k ≤ 1. For example, an alternative model could allow γ_k to grow toward some value if there are many highly suspicious events. For example:
(0.45)  γ_k = (1 − T_{E_k}) + γ_{k−1}·e^(−(t_k − t_{k−1})/μ)
To calculate likelihood ratios and associated risk scores using the fraud co-occurrence model, equation (0.42) could be used directly; however, it is useful to formulate (and perhaps bound) the relative effect. The relative effect Γ_k is defined as:

(0.46)  Γ_k ≜ L̄(E_k)/L(E_k) = { [P(E_k|F)/P(E_k|U_{k−1})]·[P(F_k)/(1 − P(F_k))] } / { [P(E_k|F)/P(E_k|U_{k−1})]·[P(F_0)/(1 − P(F_0))] }
Here, L is the original likelihood ratio and L̄ is the likelihood ratio incorporating the fraud co-occurrence model. Observing that the first factor is identical in both numerator and denominator, and that 1 − P(F_0) ≈ 1, this reduces to:

(0.47)  Γ_k = P(F_k) / [P(F_0)·(1 − P(F_k))]
Substituting equation (0.43) yields:

(0.48)  Γ_k = [P(F_0) + γ_{k−1}·ε·e^(−(t_k − t_{k−1})/μ)] / { P(F_0)·[1 − P(F_0) − γ_{k−1}·ε·e^(−(t_k − t_{k−1})/μ)] }
Finally, noting that for any case of interest P(F_0) << 1, this yields:

(0.49)  Γ_k ≈ 1 + γ_{k−1}·(ε/P(F_0))·e^(−(t_k − t_{k−1})/μ)

such that:

(0.50)  R̄(E_k) = R(E_k) + ln(Γ_k)
Thus, the fraud co-occurrence model essentially increases the risk of subsequent events by an amount determined by the accumulated mistrust recursively derived from previous events.
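A Python sketch of the recursive mistrust update of equation (0.44) and the risk adjustment of equations (0.49)-(0.50); all parameter values are hypothetical:

```python
import math

def update_gamma(gamma_prev: float, trust: float, dt: float, mu: float) -> float:
    """gamma_k = max(1 - T_{E_k}, gamma_{k-1} * exp(-dt/mu)), equation (0.44)."""
    return max(1.0 - trust, gamma_prev * math.exp(-dt / mu))

def cooccurrence_risk(base_risk: float, gamma_prev: float, dt: float,
                      mu: float, eps: float, p_f0: float) -> float:
    """R_bar(E_k) = R(E_k) + ln(Gamma_k), with the approximation
    Gamma_k ~= 1 + gamma_{k-1} * (eps / P(F_0)) * exp(-dt/mu)
    from equations (0.49) and (0.50)."""
    gamma_factor = 1.0 + gamma_prev * (eps / p_f0) * math.exp(-dt / mu)
    return base_risk + math.log(gamma_factor)

# Hypothetical: a suspicious session (trust 0.2) was seen 2 hours ago, with
# mu = 24 hours, eps = 0.01 and P(F_0) = 1e-4; the current event's base risk
# of 1.0 is inflated substantially by the earlier mistrust.
gamma = update_gamma(0.0, trust=0.2, dt=0.0, mu=24.0)        # 0.8
print(round(cooccurrence_risk(1.0, gamma, dt=2.0, mu=24.0,
                              eps=0.01, p_f0=1e-4), 2))       # ~5.31
```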
Session model
In addition to determining the risk of a single event, the FPS can determine the risk of a series of related events. For example, in the case of online activity, an online session includes one login event, followed by one or more activity events (e.g., viewing an account balance, initiating a funds transfer, viewing check images, etc.), followed by some form of termination event (an explicit logoff by the user or some form of session timeout).
Consider a generic session model that includes 0, 1, or more observations of activity events. At any point in time, a session may be open (further activities may be observed) or closed (no further activities can be observed).
The user's k-th session is identified as:

(0.51)  S_k = (A_1, A_2, ..., A_N)

where A_n is an observed activity event. Each activity event A_n has a type (or category) attribute C_n, which takes one value from a set of predefined types, and a set of observed parameters specified by the vector V_n. Specifically:

(0.52)  A_n = (C_n, V_n),  C_n ∈ {c_1, c_2, ..., c_M},  V_n = (v_1, v_2, ..., v_P)
It is possible to distinguish between open sessions (sessions that can accept future activity events) and closed sessions (sessions that cannot accept future activity events). Where necessary, an open session is denoted S̆_k and a closed session is denoted Ŝ_k.
In general, the likelihood ratio and the associated risk for a session are:

(0.53)  λ(S_k) = P(S_k|F_{k−1}) / P(S_k|U_{k−1}) = P(A_1, A_2, ..., A_N|F_{k−1}) / P(A_1, A_2, ..., A_N|U_{k−1})
        R(S_k) = log(λ(S_k))
An online login session is a special case of the generic session model. Specifically (ignoring login failures), an online login session starts with a login event (which initiates an open session), then contains 0, 1, or more activity events, and finally ends with some form of termination event, which also closes the session. The termination event may be an explicit logoff by the user, or it may be a timeout by the online banking application or by the risk engine.
In essence, the login and termination events are specific types of events that also delimit the beginning and end of the session. The corresponding open and closed sessions are defined as:

(0.54)  S̆_k = (L, A_1, A_2, ..., A_N)
        Ŝ_k = (L, A_1, A_2, ..., A_N, T)

In these definitions, L represents the login event and T represents the termination event. By definition, there is one and only one login event. Likewise, a closed session has one and only one termination event, while an open session has no termination event. In general, both L and T may have parameters and types associated with them.
In most cases, given a particular user or fraud model, it can safely be assumed that the login and termination events are conditionally independent of each other and of all activity events. This allows equation (0.53) for the online login session model to be rewritten as:

(0.55)  R(S_k) = R_L(S_k) + R_Ā(S_k) + R_T(S_k)

where:
- R_L(S_k) = log[P(L_k|F_{k−1}) / P(L_k|U_{k−1})] is the risk of the login event, which may be calculated as described above.
- R_T(S_k) = log[P(T_k|F_{k−1}) / P(T_k|U_{k−1})] is the risk of the termination event. This may incorporate previous or expected behavior (e.g., a user may always explicitly log off). In most cases, both conditional probabilities are constant and usually equal to each other, so the entire term can safely be ignored.
- R_Ā(S_k) = R(A_1, A_2, ..., A_N) = log[P(A_1, A_2, ..., A_N|F_{k−1}) / P(A_1, A_2, ..., A_N|U_{k−1})] is the composite risk of all activity events within the session (also referred to as the activity risk) and is described below.
Computing composite activity risk
The estimates of the activity likelihood ratio and the associated activity risk of session S_k are defined as:

(0.56)  λ_Ā(S_k) ≜ λ(A_1, A_2, ..., A_N) = P(A_1, A_2, ..., A_N|F_{k−1}) / P(A_1, A_2, ..., A_N|U_{k−1})
        R_Ā(S_k) ≜ R(A_1, A_2, ..., A_N) = log(λ_Ā(S_k))
Computing this generic form directly is impractical. However, these terms can be estimated using simpler models that are easier to work with and that capture most of the desired effect. There are many ways to approach this; for purposes of this description, the generic form is decomposed into three components:

(0.57)  λ_Ā(S_k) ≈ λ_Ā^freq(S_k) × λ_Ā^order(S_k) × λ_Ā^params(S_k)
where:
- λ_Ā^freq(S_k) is the composite contribution of each activity in the session based on the observed counts of each activity type.
- λ_Ā^order(S_k) is the composite contribution of each activity in the session based on the particular order of the activity types observed. The underlying probabilities are defined conditioned on the activity type counts (i.e., given the counts, the probability of any particular ordering).
- λ_Ā^params(S_k) is the composite contribution of the specific observed parameters of each activity in the session. The underlying probability likelihoods are defined conditioned on observing the activity type, and in general they may depend on previously observed activities.
Using natural logarithms, the corresponding risk values are defined as:

(0.58)  R_Ā(S_k) = R_Ā^freq(S_k) + R_Ā^order(S_k) + R_Ā^params(S_k)
Each of these terms is considered in turn.
For a closed session, λ_Ā^freq can be written as a product of likelihood ratios in which each term corresponds to the expectation of the number of observations n_c found for each activity type c:

(0.59)  λ_Ā^freq(Ŝ_k) = Π_{c ∈ {c_1, c_2, ..., c_M}} P(N_c = n_c|F_{k−1}) / P(N_c = n_c|U_{k−1})
similarly, the risk of opening a session may be calculated. However, for an open session, the minimum amount of activity that will be observed for that session may be known. This is represented by using ≧ instead of ═ within the probability:
Likewise, the associated R_Ā^freq values are calculated as:

(0.61)  R_Ā^freq(Ŝ_k) = Σ_c log[P(N_c = n_c|F_{k−1}) / P(N_c = n_c|U_{k−1})]
        R_Ā^freq(S̆_k) = Σ_c log[P(N_c ≥ n_c|F_{k−1}) / P(N_c ≥ n_c|U_{k−1})]
Note that all activity types are included in the calculation, even if no activity of a given type is observed in the session.
In most cases, the particular order of activities within a session is not statistically different for a fraudster versus the user. Mathematically, this means the following assumptions can be made:

λ_Ā^order = 1
R_Ā^order = 0
In the most general case, the expected probability distribution of the observed parameters for each activity may depend on previously observed activities. Moreover, the relevant prior activity may have occurred in this session, in some earlier session, or in a combination of both. Information from previous sessions is contained in the updated user activity model U_{k−1} and the updated fraud activity model F_{k−1} (if one is used). Information about previous activities occurring in the current session can be used directly, since aggregate information about the activities is maintained over the lifetime of the session.
Thus, in its most general form, λ_Ā^params can be written as the product of the likelihood ratios of each activity:

(0.62)  λ_Ā^params(S_k) = Π_j P(V_j | C_j, A_{j−1}, ..., A_1, F_{k−1}) / P(V_j | C_j, A_{j−1}, ..., A_1, U_{k−1})

and, similarly:

(0.63)  R_Ā^params(S_k) = Σ_j log[P(V_j | C_j, A_{j−1}, ..., A_1, F_{k−1}) / P(V_j | C_j, A_{j−1}, ..., A_1, U_{k−1})]
In most cases, the parameters of an activity are independent of previous activities (though they may still be conditioned on the activity type). If the parameters of the activity are not related to any previous activity, then:
(0.64)  λ_{A_j}^params = P(V_j|C_j, F_{k−1}) / P(V_j|C_j, U_{k−1})
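A Python sketch of the frequency component of equation (0.59) for a closed session; the activity types and count models are hypothetical placeholders for the per-type occurrence models described in this section:

```python
import math

ACTIVITY_TYPES = ["view_balance", "transfer_funds", "view_check"]

def freq_risk(counts: dict, p_count_fraud, p_count_user) -> float:
    """R_freq for a closed session: the sum over all activity types of
    ln(P(N_c = n_c | F) / P(N_c = n_c | U)), per equation (0.59). The two
    callables map (activity_type, count) to a probability."""
    risk = 0.0
    for c in ACTIVITY_TYPES:
        n = counts.get(c, 0)   # types not seen contribute with n = 0
        risk += math.log(p_count_fraud(c, n) / p_count_user(c, n))
    return risk

# Hypothetical count models: fraudsters initiate one transfer far more often
# than this user does; all other (type, count) pairs are equally likely.
pf = lambda c, n: 0.5 if (c, n) == ("transfer_funds", 1) else 0.1
pu = lambda c, n: 0.05 if (c, n) == ("transfer_funds", 1) else 0.1
print(round(freq_risk({"transfer_funds": 1}, pf, pu), 2))  # ~2.3 = ln(10)
```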
Session cost model
Different types of activities can be associated with different costs from a business and risk perspective. For example, missing fraud on a funds transfer is likely to cost more than missing fraud on a check of an account balance. To account for this, the concept of cost is introduced into the calculation of session risk.
Following a decision-theoretic approach, a possible cost is assigned to each decision outcome. Since the decision space is essentially whether to declare a session fraud or user, there are four possible decision outcomes:
when the session actually came from the user, the FPS determines that the session is fraudulent. This is called the cost of false alarms and is expressed as:
○ £ when truly U, F £ is assertedFA
When the session is actually fraudulent, the FPS determines that the session is fraudulent. This can be referred to as the cost of correct fraud and is expressed as:
o £ £ (F is judged when it is truly F)
When the session is actually fraudulent, the FPS determines the session as a user. This can be referred to as the cost of missed fraud and is expressed as:
○ £ when indeed F, U is asserted £ h Missed
When the session actually came from the user, the FPS determines that the session is a user. This may be referred to as the correct user cost and expressed as:
o £ £ (determine U when really U)
Generally, the expected cost of deciding that a session is fraud is:

(0.65)  E[£ | decide F] = £(decide F when truly F)·P(F|S) + £_FA·P(U|S)

Likewise, the expected cost of deciding that the session is from the user is:

(0.66)  E[£ | decide U] = £_Missed·P(F|S) + £(decide U when truly U)·P(U|S)

Therefore, to minimize the expected cost, the decision criterion is to decide fraud whenever the expected cost of deciding fraud is less than the expected cost of deciding user, i.e.:

(0.67)  £(decide F when truly F)·P(F|S) + £_FA·P(U|S) < £_Missed·P(F|S) + £(decide U when truly U)·P(U|S)

or, equivalently:

(0.68)  [£_Missed − £(decide F when truly F)]·P(F|S) > [£_FA − £(decide U when truly U)]·P(U|S)
The individual costs may represent any cost to the business, including actual fraud losses, resources used to respond to alerts, and the negative impact on the customer if a transaction is stopped. Assume the cost of making a correct decision is 0, i.e., £(decide F when truly F) = £(decide U when truly U) = 0. Note that the cost of making an incorrect decision may depend on the session itself (via the associated activities). With this, the decision criterion of equation (0.68) is rewritten as:

(0.69)  £_Missed(S_k)·P(F|S_k) > £_FA(S_k)·P(U|S_k)
Using Bayes' rule:

(0.70)  £_Missed(S_k)·P(S_k|F)·P(F_0) > £_FA(S_k)·P(S_k|U)·P(U_0)
Recognizing that the user and fraud priors are related by P(U_0) = 1 − P(F_0) and that the fraud prior P(F_0) is constant, these terms can be moved into a threshold τ_£ so that:

(0.71)  £_Missed(S_k)·P(S_k|F) / [£_FA(S_k)·P(S_k|U)] > τ_£
A sufficient statistic can then be defined as:

(0.72)  R_£(S_k) = R(S_k) + log[£_Missed(S_k) / £_FA(S_k)]

That is, the cost-adjusted risk of a session is a generalization of the simple risk score that incorporates the costs associated with different types of sessions. The cost-adjusted risk of a session can therefore be used as the primary decision statistic for the session.
When the cost ratio θ = £_Missed(S_k)/£_FA(S_k) does not depend on the content of the session (i.e., the costs are the same for all sessions), it can be moved into the threshold so that the original R(S_k) is itself a sufficient statistic. This is typically the case when only a single event type is considered, as with login events.
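A Python sketch of the cost-adjusted decision statistic of equation (0.72); the cost figures are hypothetical:

```python
import math

def cost_adjusted_risk(session_risk: float, cost_missed: float,
                       cost_false_alarm: float) -> float:
    """R_cost(S_k) = R(S_k) + ln(cost_missed / cost_false_alarm): the
    cost-adjusted generalization of the session risk score, equation (0.72)."""
    return session_risk + math.log(cost_missed / cost_false_alarm)

# Hypothetical: a session containing a $10,000 transfer is far costlier to
# miss than a balance-viewing session, so at the same base risk it scores
# much higher against a fixed alerting threshold.
print(round(cost_adjusted_risk(1.0, cost_missed=10000.0,
                               cost_false_alarm=50.0), 2))   # ~6.3
print(round(cost_adjusted_risk(1.0, cost_missed=50.0,
                               cost_false_alarm=50.0), 2))   # 1.0
```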
Activity model
In general, there are multiple types of activities, and the risk model applied to each activity type should be based on the nature of that activity. This section describes a generic model that can be used for many types of activities; other models may be derived and used following similar logic.
The model described here calculates the risk of an activity based on whether any activity of that type has been observed in the session (regardless of how many). The cost contribution may include a base cost, an incremental cost per observed activity, and a cost tied to a quantitative observed parameter of the activity (e.g., the amount of a funds transfer).
For all activities of a given type c_i (i.e., A_j ∈ A_{c_i}), the general form of the calculated risk component is:

(0.73)  R_Ā^{c_i}(S_k) = R_Ā^{c_i,freq}(S_k) + Σ_{A_j ∈ A_{c_i}} R_{A_j}^params(S_k)
For this activity model template, all activities of the same type are considered indistinguishable, i.e., P(V|C, F_{k−1}) = P(V|C, U_{k−1}), so that:

(0.74)  R_{A_j}^params(S_k) = 0
The measure R_Ā^{c_i,freq} is based on whether any activity of the given type is observed in the session (i.e., N_{c_i} ≥ 1) or no activity of the type is observed (i.e., N_{c_i} = 0). The model uses a beta distribution to estimate the likelihood that this type of activity is observed for the user:

(0.75)  P(N_{c_i} ≥ 1 | F_{k−1}) = ρ_F
        P(N_{c_i} ≥ 1 | U_{k−1}) = (α·ρ_U + κ_{c_i}) / (α + κ)

where:
- ρ_F = fraud_occurrence_prior: the prior probability of seeing this type of activity within a session, assuming fraud.
- ρ_U = user_occurrence_prior: the prior probability of seeing this type of activity within a session, assuming the genuine user.
- α = alpha_occurrence: the alpha (in number of sessions) associated with the Dirichlet (beta) model for the user.
- κ_{c_i} is the session occurrence for U_{k−1} in which c_i was observed: the prior sessions (a count, or preferably accumulated trust) containing this type of activity observed for the user.
- κ is the total observed session occurrence for U_{k−1}: the total number of prior sessions observed (a count, or preferably accumulated trust), whether or not the activity type was observed.
Using the definitions in equation (0.75), R_Ā^{c_i,freq} is calculated as follows:

1. If S_k is open and no activity of this type has been observed, then (see equation (0.61)):

(0.76)  R_Ā^{c_i,freq}(S_k) = log[P(N_{c_i} ≥ 0|F_{k−1}) / P(N_{c_i} ≥ 0|U_{k−1})] = log(1/1) = 0

2. If S_k is closed and no activity of this type has been observed, then:

(0.77)  R_Ā^{c_i,freq}(S_k) = log[(1 − ρ_F) / (1 − P(N_{c_i} ≥ 1|U_{k−1}))]

3. If at least one activity of this type has been observed (whether S_k is open or closed), then:

(0.78)  R_Ā^{c_i,freq}(S_k) = log[ρ_F / P(N_{c_i} ≥ 1|U_{k−1})]
The missed fraud and false alarm cost models use a generic parameterized form that can model a variety of situations. In particular, for the missed fraud cost:

(0.79)  £_Missed^{c_i}(S_k) = β_MissedType + β_MissedCount·n_{c_i} + β_MissedQuant·Σ_{A_j ∈ A_{c_i}} q(A_j)

where:
- n_{c_i} is the number of activities of type c_i observed in the session, including the current activity,
- q(A_j) is the quantifier parameter associated with activity A_j (e.g., a transfer amount), and
- the β values are cost coefficients set as activity model template parameters:
  - β_MissedType (missed_type_cost)
  - β_MissedCount (missed_count_cost)
  - β_MissedQuant (missed_quantifier_cost)
The false alarm cost model uses the same general parameterized form, but with a separate set of cost coefficients set as activity model template parameters:
- β_FAType (FA_type_cost)
- β_FACount (FA_count_cost)
- β_FAQuant (FA_quantifier_cost)
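A Python sketch combining the occurrence model of equations (0.75)-(0.78) with the parameterized cost form of equation (0.79); all coefficient values are hypothetical template parameters:

```python
import math

def occurrence_risk(seen: bool, session_closed: bool, rho_f: float,
                    rho_u: float, alpha: float, kappa_i: float,
                    kappa: float) -> float:
    """Frequency risk contribution of one activity type, equations (0.75)-(0.78)."""
    p_user = (alpha * rho_u + kappa_i) / (alpha + kappa)  # beta-model estimate
    if seen:
        return math.log(rho_f / p_user)                   # equation (0.78)
    if session_closed:
        return math.log((1.0 - rho_f) / (1.0 - p_user))   # equation (0.77)
    return 0.0                                            # open and unseen: (0.76)

def missed_fraud_cost(n: int, quantifiers: list, b_type: float,
                      b_count: float, b_quant: float) -> float:
    """Parameterized missed-fraud cost, equation (0.79): a base cost per type,
    a per-activity cost, and a quantifier-proportional cost (e.g., amount)."""
    return b_type + b_count * n + b_quant * sum(quantifiers)

# Hypothetical funds-transfer type: rarely seen for this user, so observing
# it is risky, and the missed-fraud cost scales with the transfer amounts.
print(round(occurrence_risk(True, False, rho_f=0.6, rho_u=0.05,
                            alpha=10.0, kappa_i=1.0, kappa=40.0), 2))  # ~3.0
print(missed_fraud_cost(2, [2500.0, 7500.0],
                        b_type=100.0, b_count=25.0, b_quant=0.01))     # 250.0
```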
Embodiments described herein include a method comprising: automatically generating a causal model corresponding to a user; estimating a plurality of components of a causal model using event parameters of a first set of events conducted by a user in an account of the user; and predicting an expected behavior of the user during the second set of events using the causal model.
The automatically generating a causal model in an embodiment includes: statistical relationships between components of the plurality of components are generated.
The method in an embodiment includes representing the causal model as a Bayesian network.
The automatically generating a causal model in an embodiment includes: a joint probability distribution is generated that includes a plurality of components.
The plurality of components in an embodiment includes a plurality of probability distribution functions representing the event parameters.
The event parameters in an embodiment are observable parameters collected during the first set of events.
The event parameters in an embodiment include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data.
The IP data in an embodiment includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event.
The HTTP data in an embodiment includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
The automatically generating a causal model in an embodiment includes: a statistical relationship between the event parameters and the derived parameters is generated.
The derived parameters in an embodiment include one or more of a geographic area where the device initiated the second set of events, a location of the device, an identification of the device, and an electronic service provider of the device.
Predicting the expected behavior of the user in an embodiment includes: expected event parameters for a second set of events are generated.
Generating the expected event parameters in an embodiment includes: a first set of predictive probability distributions representing expected event parameters is generated, wherein generating the first set of predictive probability distributions assumes that the user is conducting a second set of events.
The method in an embodiment includes receiving a predictive fraud model. The method in an embodiment includes generating a second set of predicted probability distributions that represent expected fraud event parameters, wherein generating the second set of predicted probability distributions assumes that a fraudster is conducting a second set of events, wherein the fraudster is anyone other than a user.
The method in an embodiment comprises: the predicted fraud model is automatically generated by estimating a plurality of fraud components of the predicted fraud model using fraud parameters of previous fraud events conducted in a plurality of accounts, wherein the previous fraud events are events suspected as having been conducted by a fraudster.
The automatically generating the predictive fraud model in an embodiment includes generating statistical relationships between fraud components of the plurality of fraud components.
Automatically generating the predictive fraud model in an embodiment includes generating a statistical relationship between the fraud event parameters and the derived fraud parameters.
The derived fraud parameters in embodiments include one or more of a location of the apparatus, an identity of the apparatus, and an electronic service provider of the apparatus.
The method in an embodiment includes generating a risk score for an event in the second set of events in real-time using the expected event parameters and the expected fraudulent event parameters and the observation parameters.
The method in an embodiment comprises: when the expected behavior indicates that a person other than the user is conducting the event, an alert corresponding to the event in the second set of events is generated.
The method in an embodiment comprises: the causal model is automatically updated using a second set of event parameters collected during a second set of events.
The second set of event parameters in an embodiment are observable parameters collected during the second set of events.
Automatically updating the causal model in an embodiment includes updating a joint probability distribution that includes a plurality of components.
Automatically updating the causal model in an embodiment includes updating at least one of the plurality of components.
Automatically updating the causal model in an embodiment includes updating at least one of the plurality of probability distribution functions representing the event parameters, the updating modifying the at least one probability distribution function by considering data of the second set of event parameters.
The method in an embodiment comprises generating a probability distribution function for each event parameter of the first set of events. The method in an embodiment comprises generating an updated probability distribution function for each event parameter by applying data of a second set of event parameters for a second set of events to the probability distribution function.
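A minimal illustrative sketch of such a per-parameter update follows, assuming a simple additive-count (smoothed categorical) estimator rather than any particular estimator of the embodiments.

    # Minimal sketch of updating a per-parameter probability distribution
    # function with newly observed event parameters; the smoothing scheme
    # is an illustrative assumption.
    from collections import Counter

    class ParamDistribution:
        def __init__(self, alpha=1.0):
            self.counts = Counter()
            self.alpha = alpha  # smoothing keeps unseen values at nonzero mass

        def update(self, value):
            self.counts[value] += 1

        def prob(self, value):
            total = sum(self.counts.values())
            support = len(self.counts) + 1  # one slot for "unseen"
            return (self.counts[value] + self.alpha) / (total + self.alpha * support)

    # A first set of events trains the model; a later event updates it.
    dist = ParamDistribution()
    for country in ["US", "US", "US", "CA"]:  # first set of events
        dist.update(country)
    dist.update("US")  # observed parameter from a later event
    print(dist.prob("US"), dist.prob("FR"))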
The method in an embodiment includes receiving a baseline causal model corresponding to a user, the baseline causal model generated without using data of any event. The method in an embodiment includes generating a causal model by generating a joint probability distribution comprising a plurality of components, wherein the plurality of components comprises updated probability distribution functions for any event parameters represented in the causal model.
The first set of events and the second set of events in an embodiment include at least one of online events, offline events, and multi-channel events.
An online event in an embodiment is an event that is conducted via electronic access to an account.
The events in an embodiment include login events.
In an embodiment the event comprises an activity event.
The set of events in an embodiment includes a session, wherein the session is a series of related events.
The series of related events in an embodiment includes a session login event and a termination event.
The series of related events in an embodiment includes at least one activity event.
The method in an embodiment comprises probabilistically determining that the second set of events was performed by the user. The method in an embodiment comprises automatically updating the causal model using the second set of event parameters collected during the second set of events.
The method in an embodiment includes updating the causal model to include a trust factor that represents a probability that the second set of events was actually performed by the user.
The method in an embodiment includes updating the causal model to include a cumulative confidence factor that represents a cumulative probability that an event parameter in a plurality of sets of events was actually performed by the user across the plurality of sets of events.
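One way to picture the trust and cumulative confidence factors is as fractional weights on updates, so that a suspicious session moves the model less than a trusted one; the following sketch, with an assumed weighting scheme, is illustrative only.

    # Minimal sketch of a trust-weighted update: new observations update
    # the model only in proportion to the probability that the user (not
    # a fraudster) produced them. The weighting scheme is an assumption.
    def trusted_update(counts, value, trust):
        """Add a fractional count weighted by trust in the session."""
        counts[value] = counts.get(value, 0.0) + trust

    counts = {"US": 4.0}
    trusted_update(counts, "RO", trust=0.35)  # suspicious session counts less
    print(counts)  # {'US': 4.0, 'RO': 0.35}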
Automatically generating the causal model in an embodiment includes generating the causal model including a decay parameter.
The decay parameter in an embodiment includes an exponential decay function by which the relative weight of each event in a set of events in an account changes with the time elapsed since the event.
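A minimal sketch of such an exponential decay weighting follows, assuming a configurable half-life; the constant is illustrative only.

    # Minimal sketch of exponential decay weighting for account events.
    import math

    def event_weight(age_days, half_life_days=30.0):
        """Relative weight of an event that occurred age_days ago."""
        return math.exp(-math.log(2) * age_days / half_life_days)

    # Recent events dominate the account model; stale ones fade out.
    for age in (0, 30, 90):
        print(age, round(event_weight(age), 3))  # 1.0, 0.5, 0.125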
Embodiments described herein include a method comprising: receiving a plurality of observations corresponding to a first event, the first event comprising an action performed in an account during electronic access of the account; generating a probabilistic relationship between the observations and derived parameters of the owner of the account; automatically generating an account model comprising the probabilistic relationship; and estimating an action of the owner during a second event using the account model, wherein the second event temporally follows the first event.
Embodiments described herein include a method comprising: automatically generating a causal model corresponding to a user, the generating comprising estimating a plurality of components of the causal model using event parameters of previous events conducted by the user in an account of the user; predicting an expected behavior of the user during a next event in the account using the causal model, wherein predicting the expected behavior of the user comprises generating expected event parameters for the next event; receiving observed event parameters for the next event; and updating the causal model for use in a future event, the updating including regenerating the plurality of components based on a relationship between the expected event parameters and the observed event parameters.
Embodiments described herein include a system comprising a processor executing at least one application that receives event parameters for a first set of events by a user in an account of the user, the application automatically generating a causal model corresponding to the user by estimating a plurality of components of the causal model using the event parameters for the first set of events, the application using the causal model to output a prediction of an expected behavior of the user during a second set of events.
Automatically generating the causal model in an embodiment includes generating statistical relationships between components of the plurality of components.
Automatically generating the causal model in an embodiment includes generating a joint probability distribution including a plurality of components.
The plurality of components in an embodiment includes a plurality of probability distribution functions representing the event parameters.
The event parameters in an embodiment are observable parameters collected during the first set of events.
The event parameters in an embodiment include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data.
The IP data in an embodiment includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event.
The HTTP data in an embodiment includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
Automatically generating the causal model in an embodiment includes generating a statistical relationship between the event parameters and the derived parameters.
The derived parameters in an embodiment include one or more of a geographic area where the device initiated the second set of events, a location of the device, an identification of the device, and an electronic service provider of the device.
Predicting the expected behavior of the user in an embodiment includes generating expected event parameters for a second set of events.
Generating the expected event parameters in an embodiment includes generating a first set of predictive probability distributions representing the expected event parameters, where generating the first set of predictive probability distributions assumes that the user is conducting a second set of events.
The system in an embodiment includes receiving a predictive fraud model. The system in an embodiment includes generating a second set of predicted probability distributions that represent expected fraud event parameters, wherein generating the second set of predicted probability distributions assumes that a fraudster is performing the second set of events, wherein the fraudster is anyone other than the user.
The system in an embodiment includes generating a risk score for an event in the second set of events in real-time using the expected event parameters, the expected fraud event parameters, and the observed parameters.
The system in an embodiment includes generating an alert corresponding to an event in the second set of events when the expected behavior indicates that a person other than the user is performing the event.
The system in an embodiment includes automatically updating the causal model using a second set of event parameters collected during the second set of events.
Automatically updating the causal model in an embodiment includes updating at least one of the plurality of probability distribution functions representing the event parameters, the updating modifying the at least one probability distribution function by taking into account data of the second set of event parameters.
The system in an embodiment includes generating a probability distribution function for each of the event parameters for the first set of events. The system in an embodiment includes generating an updated probability distribution function for each of the event parameters by applying data of a second set of event parameters for a second set of events to the probability distribution function.
The first set of events and the second set of events in an embodiment include at least one of online events, offline events, and multi-channel events.
An online event in an embodiment is an event that is conducted via electronic access to an account.
The events in an embodiment include login events.
The events in an embodiment include activity events.
The set of events in an embodiment includes a session, where a session is a series of related events.
The system in an embodiment includes probabilistically determining that the second set of events was performed by the user and automatically updating the causal model using the second set of event parameters collected during the second set of events.
The system in an embodiment includes updating the causal model to include a trust factor that represents a probability that the second set of events was actually performed by the user.
The system in an embodiment includes updating the causal model to include a cumulative confidence factor that represents a cumulative probability that an event parameter in a plurality of sets of events was actually performed by a user across the plurality of sets of events.
Automatically generating the causal model in an embodiment includes generating the causal model including a decay parameter.
The decay parameter in an embodiment comprises an exponential decay function by which the relative weight of each event in a set of events in the account changes with the time elapsed since the event.
Embodiments described herein include a system comprising a processor executing at least one application, the application receiving event parameters for a first set of events by a user in an account of the user, the application automatically generating an account model corresponding to the user, the account model comprising a plurality of components, wherein generating the account model comprises generating the plurality of components using the event parameters for the first set of events, the application using the account model to predict expected behavior of the user during a second set of events, the application generating an updated version of the account model for use in a set of future events, the updating comprising regenerating the plurality of components using the second set of events.
Embodiments described herein include a method comprising: automatically generating a causal model corresponding to a user, the generating comprising estimating a plurality of components of the causal model using event parameters of previous events conducted by the user in an account of the user; predicting an expected behavior of the user during a next event in the account using the causal model, wherein predicting the expected behavior of the user comprises generating expected event parameters for the next event; generating fraud event parameters using a predictive fraud model, wherein generating the fraud event parameters assumes that a fraudster is conducting the next event, wherein the fraudster is anyone other than the user; and generating a risk score for the next event using the expected event parameters and the fraud event parameters, the risk score representing a relative likelihood that the next event is conducted by the user versus by the fraudster.
The method in an embodiment comprises automatically generating the predictive fraud model by estimating a plurality of fraud components of the predictive fraud model using fraud event parameters of previous fraud events conducted in a plurality of accounts, wherein the previous fraud events are events suspected of having been conducted by a fraudster.
Automatically generating the predictive fraud model in an embodiment includes generating statistical relationships between fraud components of the plurality of fraud components.
Automatically generating a predictive fraud model in an embodiment includes generating a joint probability distribution that includes a plurality of fraud components.
The plurality of fraud components in an embodiment includes a plurality of fraud probability distribution functions representing fraud event parameters.
The fraud event parameters in an embodiment are observable fraud parameters collected during previous fraud events.
Automatically generating the predictive fraud model in an embodiment includes generating a statistical relationship between the fraud event parameters and the derived fraud parameters.
The derived fraud parameters in an embodiment include one or more of a location of the device, an identity of the device, and an electronic service provider of the device.
The method in an embodiment includes generating a predictive fraud model.
Generating the predictive fraud model in an embodiment includes generating an original fraud model that includes a probability of observing the event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating a probabilistic combination of the original fraud model and an impersonation model.
The method in an embodiment comprises generating an original fraud model that includes a probability of observing an event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating the predictive fraud model that includes a probability of impersonation, where the probability of impersonation is a probability of a fraudster successfully impersonating parameter values of event parameters of a set of events conducted by a user.
The impersonation model in an embodiment includes a probability that a fraudster mimics an event parameter of a set of events conducted by a user.
The impersonation model in an embodiment includes a probability that a fraudster observes event parameters of a set of events conducted by a user.
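Illustratively, the probabilistic combination of the original fraud model and the impersonation model can be sketched as a mixture: with some small probability the fraudster successfully mimics the user's parameter values, and otherwise the original fraud distribution applies. The mixture form and the constants below are assumptions for illustration.

    # Minimal sketch of an impersonation-adjusted fraud model; names and
    # the mixture form are illustrative, not from any embodiment.
    def fraud_prob(value, p_user, p_fraud_raw, p_imp=0.05):
        """P(value | fraudster), allowing for impersonation of the user."""
        return p_imp * p_user(value) + (1.0 - p_imp) * p_fraud_raw(value)

    p_user = lambda v: 0.95 if v == "US" else 0.01
    p_fraud_raw = lambda v: 0.30 if v == "RO" else 0.05
    print(fraud_prob("US", p_user, p_fraud_raw))  # impersonated-looking value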
The method in an embodiment comprises identifying at least one previous fraud event, the previous fraud event comprising a previous event in the account that may have been conducted by a fraudster. The method in an embodiment includes generating an original fraud model by estimating a plurality of components of the fraud model using event parameters of the at least one previous fraud event conducted in the account, the at least one previous fraud event possibly having been conducted by a fraudster.
The method in an embodiment includes modifying the predictive fraud model based on at least one previous event that may have been conducted by a fraudster.
The method in an embodiment includes generating a predictive fraud model including fraud co-occurrence coefficients for at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficient in an embodiment represents a cumulative distrust recursively derived from at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficients in an embodiment include coefficients representing the effects of a number of previous events that may have been conducted by a fraudster.
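A minimal sketch of a recursively accumulated co-occurrence coefficient follows, in which distrust from earlier suspected-fraud events carries forward with a decay; the recursion and carryover constant are illustrative assumptions.

    # Minimal sketch of a fraud co-occurrence coefficient: suspicion from
    # earlier suspected-fraud events carries forward with decay.
    def update_cooccurrence(prev_coeff, event_suspicion, carryover=0.8):
        """Cumulative distrust after one more event.

        prev_coeff      -- coefficient carried from earlier events
        event_suspicion -- suspicion (0..1) that this event was fraud
        """
        return carryover * prev_coeff + event_suspicion

    coeff = 0.0
    for suspicion in (0.0, 0.7, 0.9):  # two suspicious events in a row
        coeff = update_cooccurrence(coeff, suspicion)
    print(round(coeff, 3))  # elevated distrust for subsequent scoring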
Automatically generating the causal model in an embodiment includes generating statistical relationships between components of the plurality of components.
Automatically generating the causal model in an embodiment includes generating a joint probability distribution including a plurality of components.
The plurality of components in an embodiment includes a plurality of probability distribution functions representing event parameters of previous events.
The event parameters in an embodiment are observable parameters collected during a previous event.
The event parameters in an embodiment include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data.
The IP data in an embodiment includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event.
The HTTP data in an embodiment includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
Automatically generating the causal model in an embodiment includes generating a statistical relationship between the event parameters and the derived parameters.
The derived parameters in an embodiment include one or more of a geographic area in which the device is initiating a next event, a location of the device, an identification of the device, and an electronic service provider of the device.
Predicting the expected behavior of the user in an embodiment includes generating an expected event parameter for a next event.
Generating the expected event parameters in an embodiment includes generating a first set of expected probability distributions representing the expected event parameters, where generating the first set of expected probability distributions assumes that the user is performing the next event.
The method in an embodiment comprises generating an alert corresponding to the next event when the risk score indicates that a person other than the user is performing the next event.
The method in an embodiment includes automatically updating the causal model using a second set of event parameters collected during a next event.
The second set of event parameters in an embodiment are observable parameters collected during the next event.
Automatically updating the causal model in an embodiment includes updating a joint probability distribution that includes a plurality of components.
Automatically updating the causal model in an embodiment includes updating at least one of the plurality of components.
Automatically updating the causal model in an embodiment includes updating at least one of a plurality of probability distribution functions representing the event parameters by modifying the at least one of the plurality of probability distribution functions taking into account data of the second set of event parameters.
The method in an embodiment comprises generating a probability distribution function for each event parameter of a previous event. The method in an embodiment comprises generating an updated probability distribution function for each event parameter by applying data of the second set of event parameters for the next event to the probability distribution function.
The method in an embodiment includes receiving a baseline causal model corresponding to a user, the baseline causal model generated without using data for any event. The method in an embodiment includes generating a causal model by generating a joint probability distribution including a plurality of components, wherein the plurality of components includes an updated probability distribution function for any event parameter represented in the causal model.
The previous and next events in an embodiment include at least one of an online event, an offline event, and a multi-channel event.
An online event in an embodiment is an event made via electronic access to an account.
The events in an embodiment include login events.
The events in an embodiment include activity events.
The method in an embodiment includes probabilistically determining that a next event was performed by a user. The method in an embodiment includes automatically updating the causal model using a second set of event parameters collected during a next event.
The method in an embodiment includes updating the causal model to include a trust factor that represents a probability that a next event was actually performed by the user.
The method in an embodiment includes updating the causal model to include a cumulative confidence factor that represents a cumulative probability that an event parameter in the plurality of events was actually performed by the user across the plurality of events.
Automatically generating the causal model in an embodiment includes generating the causal model including a decay parameter.
The decay parameter in an embodiment includes an exponential decay function by which the relative weight of each event in the account varies with the time elapsed since the event.
Embodiments described herein include a method comprising: automatically generating an account model corresponding to a user, the generating using event parameters of previous events conducted by the user in the user's account to generate predicted distributions of event parameters for a next event in the account, wherein the account model includes the predicted distributions of event parameters; receiving observed event parameters of the next event when the next event occurs; generating a first probability using the account model, wherein the first probability is the probability of observing the observed event parameters assuming the user is conducting the next event; generating a second probability using a fraud model, wherein the second probability is the probability of observing the observed event parameters assuming a fraudster is conducting the next event, wherein the fraudster is a person other than the user; and generating a risk score using the first probability and the second probability, the risk score representing a relative likelihood that the next event is conducted by the user versus by the fraudster.
Embodiments described herein include a method comprising: generating a probabilistic relationship between observations of a first event and derived parameters of an owner of an account; automatically generating an account model comprising the probabilistic relationship; dynamically updating the account model using observations of a second event; and predicting, during a third event and using the account model, whether the owner or a fraudster is conducting the third event, wherein each event comprises an action taken in the account during electronic access of the account.
Embodiments described herein include a system comprising a processor executing at least one application that automatically generates a predictive user model corresponding to a user, wherein the predictive user model comprises a plurality of probability distributions representing event parameters observed during a first event in an account of the user, the application using the predictive user model to generate predicted event parameters expected to be observed during a second event in the account, the second event following the first event, the application comparing actual event parameters of the second event with the predicted event parameters during the second event, and generating an alert corresponding to the second event when the actual event parameters indicate that someone other than the user is conducting the second event.
Embodiments described herein include a system comprising a processor executing at least one application, the application automatically generating a causal model corresponding to a user by estimating a plurality of components of the causal model using event parameters of previous events conducted by the user in the user's account, the application using the causal model to predict the expected behavior of the user during a next event in the account, wherein predicting the expected behavior of the user comprises generating expected event parameters for the next event, the application generating fraud event parameters using a predictive fraud model, wherein generating the fraud event parameters assumes that a fraudster is conducting the next event, wherein the fraudster is anyone other than the user, the application using the expected event parameters and the fraud event parameters to generate a risk score for the next event, the risk score representing the relative likelihood that the next event is conducted by the user versus by a fraudster.
The system in an embodiment includes automatically generating the predictive fraud model by estimating a plurality of fraud components of the predictive fraud model using fraud parameters of previous fraud events conducted in a plurality of accounts, wherein the previous fraud events are events suspected of having been conducted by a fraudster.
Automatically generating the predictive fraud model in an embodiment includes generating statistical relationships between fraud components of the plurality of fraud components.
Automatically generating a predictive fraud model in an embodiment includes generating a joint probability distribution that includes a plurality of fraud components.
The plurality of fraud components in an embodiment includes a plurality of fraud probability distribution functions representing fraud event parameters, wherein the fraud event parameters are observable fraud parameters collected during a previous fraud event.
Automatically generating the predictive fraud model in an embodiment includes generating a statistical relationship between the fraud event parameters and the derived fraud parameters.
The derived fraud parameters in an embodiment include one or more of a location of the device, an identity of the device, and an electronic service provider of the device.
The system in an embodiment includes generating a predictive fraud model.
Generating the predictive fraud model in an embodiment includes generating an original fraud model that includes a probability of observing the event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating a probabilistic combination of the original fraud model and the impersonation model.
The system in an embodiment includes generating an original fraud model that includes a probability of observing an event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating the predictive fraud model that includes a probability of impersonation, wherein the probability of impersonation is a probability of a fraudster successfully impersonating parameter values of event parameters of a set of events conducted by the user.
The impersonation model in an embodiment includes a probability that a fraudster mimics an event parameter of a set of events conducted by a user.
The impersonation model in an embodiment includes a probability of a fraudster observing event parameters of a set of events performed by a user.
The system of an embodiment includes identifying at least one previous fraud event, the previous fraud event including a previous event in the account that may have been caused by a fraudster. The system in an embodiment includes generating an original fraud model by estimating a plurality of components of the fraud model using event parameters of at least one previous fraud event conducted in the account, the at least one previous fraud event possibly being performed by a fraudster.
The system in an embodiment includes modifying the predictive fraud model based on at least one previous event that may have been conducted by a fraudster.
The system in an embodiment includes generating a predictive fraud model including fraud co-occurrence coefficients for at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficient in an embodiment represents a cumulative distrust recursively derived from at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficients in an embodiment include coefficients representing the effects of a number of previous events that may have been conducted by a fraudster.
Automatically generating the causal model in an embodiment includes generating a joint probability distribution including a plurality of components.
The plurality of components in an embodiment includes a plurality of probability distribution functions representing event parameters of previous events.
The event parameters of the previous event in an embodiment are observable parameters collected during the previous event.
The event parameters of the previous event in an embodiment include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data.
The IP data in an embodiment includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event.
The HTTP data in an embodiment includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
Automatically generating the causal model in an embodiment includes generating a statistical relationship between the event parameters and the derived parameters.
The derived parameters in an embodiment include one or more of a geographic area in which the device is initiating the next event, a location of the device, an identification of the device, and an electronic service provider of the device.
Predicting the expected behavior of the user in an embodiment includes generating expected event parameters for the next event, wherein generating the expected event parameters includes generating a first set of predicted probability distributions representing the expected event parameters, wherein generating the first set of predicted probability distributions assumes that the user is conducting the next event.
The system in an embodiment includes generating an alert corresponding to the next event when the expected behavior indicates that a person other than the user is performing the next event.
The system in an embodiment includes automatically updating the causal model using a second set of event parameters collected during a next event, wherein the second set of event parameters are observable parameters collected during the next event.
Automatically updating the causal model in an embodiment includes updating at least one of a plurality of probability distribution functions representing the event parameters by modifying the at least one of the plurality of probability distribution functions taking into account data of the second set of event parameters.
The previous and next events in an embodiment include at least one of an online event, an offline event, and a multi-channel event, wherein the online event is an event via electronic access to an account.
The event in an embodiment includes at least one of a login event and an activity event.
The system in an embodiment includes probabilistically determining that the next event was performed by the user. The system in an embodiment includes automatically updating the causal model using a second set of event parameters collected during the next event.
The system in an embodiment includes updating the causal model to include a trust factor that represents a probability that a next event was actually performed by the user.
The system in an embodiment includes updating the causal model to include a cumulative confidence factor that represents a cumulative probability that an event parameter in the plurality of events was actually performed by the user across the plurality of events.
Automatically generating the causal model in an embodiment includes generating the causal model including a decay parameter, wherein the decay parameter includes an exponential decay function by which a relative weight of each event in the account varies with elapsed time since the event.
Embodiments herein include a system comprising: a risk engine running on the processor and coupled to a financial system that includes an account, the risk engine generating an account model corresponding to a user and to events conducted in the account, the generation of the account model using event parameters of previous events conducted by the user in the account to generate predicted distributions of event parameters for a next event in the account, the risk engine receiving event parameters of the next event when the next event occurs, the risk engine generating a first probability using the account model, wherein the first probability is the probability of observing the event parameters assuming the user is conducting the next event, the risk engine generating a second probability using a fraud model, wherein the second probability is the probability of observing the event parameters assuming a fraudster is conducting the next event, wherein the fraudster is anyone other than the user, wherein the events conducted in the account include the previous events and the next event, the risk engine using the first probability and the second probability to generate a risk score representing a relative likelihood that the next event is conducted by the user versus by the fraudster; and a risk application running on the processor, the risk application including an Analytical User Interface (AUI) that displays, for any event in the account, at least one of a risk score and event parameters.
The AUI in an embodiment includes a horizontal axis representing a series of events ordered by time.
The AUI in an embodiment includes a vertical axis representing event parameters.
The event parameters in an embodiment include one or more of Internet Protocol (IP) data and hypertext transfer protocol (HTTP) data.
The IP data in an embodiment includes one or more of an IP address, an IP address country, an IP address city, an IP network segment, and an internet service provider supporting the event.
The HTTP data in an embodiment includes one or more of an operating system, a user agent string, a source string, and data of an internet browser of a computer for an event.
The AUI in an embodiment includes a plurality of columns, wherein each of the plurality of columns represents at least one of the events performed in the account, wherein the plurality of columns are arranged according to a date.
The AUI in an embodiment includes a plurality of rows, wherein a set of the rows represents an event parameter of an event.
The AUI in an embodiment includes a plurality of intersection regions, each intersection region defined by an intersection of a row and a column in a set of rows, wherein an intersection region corresponds to an event parameter of at least one event, wherein an intersection region includes a color coding that correlates the event parameter to a corresponding probability of the account model.
The color coding in an embodiment represents a relative likelihood ratio that the event parameter corresponds to the user.
The AUI in an embodiment includes a risk row representing a risk of the event, wherein each intersection region defined by the intersection of the risk row and the column corresponds to a risk score of at least one event corresponding to the column.
The intersection region in an embodiment comprises a color coding relating the risk score to at least one event.
The color coding in an embodiment represents a relative likelihood ratio of the user performing the at least one event.
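As a simple illustration of the color coding, a likelihood ratio for an intersection region can be bucketed into display colors; the thresholds and colors below are illustrative assumptions, not part of any embodiment.

    # Minimal sketch of mapping a likelihood ratio (fraud vs. user) onto
    # a color bucket for the AUI grid; thresholds are illustrative.
    def cell_color(likelihood_ratio):
        """Color for one row/column intersection region."""
        if likelihood_ratio < 1.0:
            return "green"    # more consistent with the account owner
        if likelihood_ratio < 10.0:
            return "yellow"   # somewhat unusual for this account
        return "red"          # strongly suggests someone other than the user

    for lr in (0.2, 3.0, 50.0):
        print(lr, cell_color(lr))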
The at least one event in an embodiment comprises at least one of an online event, an offline event, and a multi-channel event.
An online event in an embodiment is an event made via electronic access to an account.
At least one event in an embodiment comprises a login event.
At least one event in an embodiment comprises an activity event.
At least one event in an embodiment comprises a session, wherein a session is a series of related events.
The series of related events in an embodiment includes a session login event and a termination event.
The series of related events in an embodiment includes at least one activity event following a login event.
Generating the account model in an embodiment includes generating statistical relationships between the prediction distributions.
Generating the account model in an embodiment includes generating a joint probability distribution that includes the prediction distribution.
The prediction distribution in an embodiment comprises a plurality of probability distribution functions representing the event parameters.
The event parameters in an embodiment are observable parameters collected during a previous event.
Generating the account model in an embodiment includes generating a statistical relationship between the event parameters and the derived parameters.
The derived parameters in an embodiment include one or more of a geographic area where the device initiates the next event, a location of the device, an identification of the device, and an electronic service provider of the device.
Generating a risk score in an embodiment includes generating an expected event parameter for a next event.
Generating the expected event parameters in an embodiment includes generating a first set of expected probability distributions representing the expected event parameters, wherein generating the first set of expected probability distributions assumes that the user is conducting the next event.
The system in an embodiment includes receiving a predictive fraud model. The system in an embodiment includes generating a second set of expected probability distributions that represent expected fraud event parameters, wherein generating the second set of expected probability distributions assumes that a fraudster is performing a next event.
The system in an embodiment includes automatically generating the predictive fraud model by estimating a plurality of fraud components of the predictive fraud model using fraud event parameters of previous fraud events conducted in a plurality of accounts, wherein the previous fraud events are events suspected of having been conducted by a fraudster.
Automatically generating the predictive fraud model in an embodiment includes generating statistical relationships between fraud components of the plurality of fraud components.
Automatic generation of the predictive fraud model in an embodiment comprises generating a statistical relationship between the fraud event parameters and the derived fraud parameters.
The derived fraud parameters in an embodiment include one or more of a location of the device, an identity of the device, and an electronic service provider of the device.
The system in an embodiment includes generating a predictive fraud model.
Generating the predictive fraud model in an embodiment includes generating an original fraud model that includes a probability of observing the event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating a probabilistic combination of the original fraud model and the impersonation model.
The system in an embodiment includes generating an original fraud model that includes a probability of observing an event assuming that the event was caused by a fraudster and that there is no other information about the event.
Generating the predictive fraud model in an embodiment includes generating the predictive fraud model that includes a probability of impersonation, where the probability of impersonation is a probability of a fraudster successfully impersonating parameter values of event parameters of a set of events conducted by a user.
The impersonation model in an embodiment includes a probability that a fraudster mimics an event parameter of a set of events conducted by a user.
The impersonation model in an embodiment includes a probability that a fraudster observes event parameters of a set of events conducted by a user.
The system of an embodiment includes identifying at least one previous fraud event, the previous fraud event including a previous event in the account that may have been caused by a fraudster. The system in an embodiment includes generating an original fraud model by estimating a plurality of components of the fraud model using event parameters of at least one previous fraud event conducted in the account, the at least one previous fraud event possibly being performed by a fraudster.
The system in an embodiment includes modifying the predictive fraud model based on at least one previous event that may have been conducted by a fraudster.
The system in an embodiment includes generating a predictive fraud model including fraud co-occurrence coefficients for at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficient in an embodiment represents a cumulative distrust recursively derived from at least one previous event that may have been conducted by a fraudster.
The fraud co-occurrence coefficients in an embodiment include coefficients representing the effects of a number of previous events that may have been conducted by a fraudster.
The system in an embodiment includes selectively updating the account model using a second set of event parameters collected during the next event.
The second set of event parameters in an embodiment are observable parameters collected during the next event.
Automatically updating the account model in an embodiment includes updating a joint probability distribution that includes a plurality of components of the account model.
Automatically updating the account model in an embodiment includes updating at least one of the plurality of components of the account model.
Automatically updating the account model in an embodiment includes updating at least one of the plurality of probability distribution functions representing the event parameters by modifying the at least one of the plurality of probability distribution functions taking into account data of the second set of event parameters.
The system in an embodiment includes generating a probability distribution function for each event parameter of the prior events. The system in an embodiment includes generating an updated probability distribution function for each event parameter by applying data for a second set of event parameters for a next event to the probability distribution function.
The system in an embodiment includes receiving a baseline account model corresponding to a user, the baseline account model generated without using data of any event. The system in an embodiment includes generating an account model by generating a joint probability distribution that includes a plurality of components of the account model, wherein the plurality of components include updated probability distribution functions for any event parameters represented in the account model.
The previous and next events in an embodiment include at least one of an online event, an offline event, and a multi-channel event.
An online event in an embodiment is an event made via electronic access to an account.
The events in an embodiment include login events.
The events in an embodiment include activity events.
The events in an embodiment include sessions, where a session is a series of related events.
The series of related events in an embodiment includes a session login event and a termination event.
The series of related events in an embodiment includes at least one activity event.
The system in an embodiment includes probabilistically determining that the next event was performed by the user. The system in an embodiment includes automatically updating the account model using a second set of event parameters collected during a next event.
The system in an embodiment includes updating the account model to include a trust factor that represents a probability that the next event was actually performed by the user.
The system in an embodiment includes updating the account model to include a cumulative trust factor that represents a cumulative probability that an event parameter in the plurality of events was actually performed by the user across the plurality of events.
Automatically generating the account model in an embodiment includes generating the account model including a decay parameter.
The decay parameter in an embodiment includes an exponential decay function by which the relative weight of each of the events in the account changes with the time elapsed since that event.
Embodiments described herein include a system comprising: a risk engine running on the processor, the risk engine receiving observations from a financial system, the observations corresponding to prior events and comprising actions taken in an account of the financial system during electronic access of the account, the risk engine using the observations to estimate parameters of an account model and dynamically generating the account model including the parameters, the account model corresponding only to the user, the risk engine using output of the account model to generate a risk score, the risk score being a relative likelihood that an event in the account following the prior events is performed by the user versus by a fraudster; and a risk application running on the processor, the risk application including an Analytical User Interface (AUI) that displays, for any event in the account, at least one of an event parameter and a risk score.
Embodiments described herein include a system comprising a platform including a processor coupled to at least one database. The system includes a plurality of risk engines coupled to the platform. The plurality of risk engines receive event data and risk data from a plurality of data sources including at least one financial application. The event data includes data of actions taken in a target account during electronic access of the account. The risk data includes data of actions taken in a plurality of accounts different from the target account. The plurality of risk engines dynamically generate an account model corresponding to the target account using the event data and the risk data, and generate a risk score using the account model, the risk score being a relative likelihood that an action taken in the target account is fraudulent. The system includes a risk application coupled to the platform and including an analytical user interface that displays, for actions in the target account, at least one of the risk score and event data of any event in the account.
Embodiments described herein include a method comprising receiving event data and risk data at a plurality of risk engines from a plurality of data sources comprising at least one financial application. The event data includes data of actions taken in a target account during electronic access of the account. The risk data includes data of actions taken in a plurality of accounts different from the target account. The method includes dynamically generating an account model corresponding to the target account, the generating using the event data and the risk data. The method includes generating a risk score using the account model, the risk score being a relative likelihood that an action taken in the target account is fraudulent. The method includes presenting an analytical user interface that displays, for actions in the target account, at least one of the risk score and event data of any event in the account.
The embodiments described herein include additional components as described in detail below.
Implementation of the FraudMAP System
Fig. 17 is a block diagram of a FraudMAP system according to an embodiment.
FIG. 18 is a block diagram of a FraudMAP online system according to an embodiment.
Fig. 19 is a block diagram of a FraudMAP mobile system according to an embodiment.
Fig. 20 is a block diagram of FraudMAP supporting a mobile deployment scenario, according to an embodiment.
Fig. 21 is a block diagram of a FraudMAP ACH system, according to an embodiment.
Fig. 22 is a block diagram of a FraudDESK system according to an embodiment.
Fig. 23 is a block diagram of FraudMAP Reflex, according to an embodiment.
FIG. 24 is a block diagram of a fraud prevention component, according to an embodiment.
Fig. 25 is a flow diagram of fraud prevention using the FraudMAP system according to an embodiment.
FIG. 26 is a block diagram of a platform for the FraudMAP product, according to an embodiment.
Functions of the FraudMAP System
Fig. 27 is a diagram of the RiskEngine of the FraudMAP system according to an embodiment. For the following discussion, refer to the above figures. Products that include the FPS include FraudMAP, the RiskEngine (RE), and RiskFeed. The design, components, and functionality of these products are detailed below, including automation, database design, algorithms, analysis, activation methods, model generation/data generation, and the specification of third-party sources.
FraudMAP is consistent with a number of product requirements and methods, as described in detail below.
Bank application. The behavioral analysis techniques have application in the context of other fraud problems within financial institutions. The foundation of this platform includes behavioral analysis techniques and Dynamic Account Modeling™. Behavioral analysis can be applied to a wide variety of business issues, and the method is "universal" in the sense that it is not limited to assessing risk for internet and mobile banking activities. The method expands current online banking fraud prevention offerings into cross-product, cross-channel offerings.
Third-party risk data to be used for event risk scoring. Several third-party risk data sources (i.e., IP risk, mule accounts) may include aspects of an automatic risk scoring method. The FPS platform is designed to receive this data and automatically incorporate it to provide enhanced risk scoring capabilities.
Specially-built risk engines for generating risk data. Various risk engines may process activity data. Instead of risk scoring individual user activities on a per-organization basis, these risk engines may process data across organizations to identify suspicious activity sequences, IP addresses, funds-transfer destination accounts (i.e., mules), and so on. The data generated by these risk engines can similarly be used as the third-party risk data identified above.
Cross-institution data mining and FraudDESK. Data mining engine capabilities have been prepared for use with FraudDESK. For example, one engine pivots off a fraud case confirmed at any given customer to identify similar activities at any other customer. Combined with the investigations of FraudDESK analysts monitoring activities for individual customers and across all customers, the combined results may generate new risk data and facilitate proactive customer communication.
Retroactive alerts. Based on new information from cross-institution analysis and FraudDESK activity, the platform may automatically provide alerts to institutions about recent historical activity that may be re-scored based on the new information.
Cross-organization collaboration. With almost all customers using a hosted, SaaS-based platform, often forming a compact and collaborative anti-fraud community, and with customers expressing a desire for more opportunities to interact with each other, cross-organization collaboration features are envisioned using a secure and closed FraudMAP application environment. For example, clients may seek counsel about suspicious activity, alert each other to new threats, share results generated by FraudMAP, coordinate with each other on cross-institution attacks, share specific risk factors, and discuss other topics.
Product, database and data flow for a FraudMAP System
Fig. 28A and 28B (collectively Fig. 28) show block diagrams of the FraudMAP data store and data flow, according to an embodiment.
Several practical considerations informed the FraudMAP design, including the following: relational DBMS query execution engines use only one index per table; an RA search involves many parameters; there is no single search index that can "drive" the entire search; the intermediate working sets of RA searches typically involve millions or tens of millions of rows; row-level eligibility joins are too slow for interactive searches; even if row-level eligibility joins are well organized, secondary-index lookups are too slow; HLP tables and set operations are intended to take the place of bitmap and sublist databases; data can be sorted by risk and real-time updating is supported; bitmap indexes are unordered, static, and may not be employed; and RDBMSs that support bitmap or columnar storage are expensive.
In response to these factors, a solution is envisioned that addresses the following: the solution utilizes a small number of very large databases, rather than one database per customer or multiple small databases, which produces significant operational and cost advantages and enables cross-tenant fraud analysis; the solution uses helper (HLP) index tables for all supported searches; the HLP tables are designed to contain the following: the "anchor" search parameter, tenant (service) ID, risk, date, time, and session ID; the HLP tables are designed with a multi-valued primary key so that the InnoDB PK-based index organization exactly matches searches; the FACT table is intended for lookup and display only and contains all session and activity data; a search traverses one HLP table for each search parameter, and each working set is saved to a temporary (TMP) table; after all search parameters are executed, the system applies set operations (INTERSECT, UNION, MINUS) to the TMP table contents to evaluate the search; and the most dangerous sessions that meet the search criteria are extracted from the FACT table for display.
The following results are expected from this design approach: first, in contrast to the "standard RDBMS" approach, where searches in a single-client environment with only a few million records often take many minutes or hours, the RADB can contain nearly 50 billion records while 99% of user searches complete within 5 seconds. Second, the RiskFeed load can be kept ahead of each running RiskEngine with minimal data backlog.
This is effective for the following reasons: the design exploits the index-organized property of the MySQL InnoDB table, which enables an optimal level of page reads; the HLP tables are organized such that a search involves a single B-tree traversal, and they are partitioned by day so that they can be loaded quickly and load time has an absolute upper bound; day-based partitioning enables searches to benefit from partition pruning; because the TMP tables reside in RAM, loading and operating on them is very fast; and the only rows loaded from the FACT table are those to be displayed, so a minimum number of ID-joins and B-tree traversals is required.
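The search strategy can be pictured with a small in-memory sketch: one HLP-style lookup per search parameter yields a candidate session-ID set (standing in for a TMP table), the sets are combined with set operations, and only the rows to be displayed are fetched from the FACT table. The dict/set stand-ins below are assumptions for illustration, not the actual schema.

    # Minimal sketch of per-parameter HLP lookups combined with set
    # operations, followed by a FACT fetch of only the displayed rows.
    hlp_ip_country = {"RO": {101, 104}, "US": {102, 103, 104}}
    hlp_min_risk = {"high": {104, 105}}
    fact = {104: {"user": "jdoe", "risk": 97, "ip_country": "RO"}}

    def search(ip_country, risk_band):
        tmp_a = hlp_ip_country.get(ip_country, set())   # one B-tree-style lookup
        tmp_b = hlp_min_risk.get(risk_band, set())
        matches = tmp_a & tmp_b                         # INTERSECT on TMP sets
        # Fetch only the rows to display, most dangerous sessions first.
        rows = [fact[sid] for sid in matches if sid in fact]
        return sorted(rows, key=lambda r: r["risk"], reverse=True)

    print(search("RO", "high"))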
Data converter of FraudMAP system
Fig. 29 is a diagram of a data converter process according to an embodiment.
Data converter overview. The data converter component is configured to sit between the collector and the RiskEngine. The data converter component provides clean, expected data for processing by one or more RiskEngines. For example, the sequencer component may sort data from the collector before feeding the data into the RiskEngine, which addresses the ever-increasing amount of out-of-order data present in certain customer data. Multiple converter components can be combined so that a series of operations is performed on an input data file. To add a data transformer component (or components) for a customer, the transform.ini file is configured accordingly.
The transformer layer is invoked after the collector component, once the data of the incoming file has been extracted; the determination to invoke the transformer layer is based on whether a transform.ini file is present. The transform.ini file determines exactly what action(s) will be performed in the transformer layer, where one or more actions may be performed. The converted file will have "-transformed" appended to its name.
Operational requirements for the converter component. Data converter functions use transform.ini to control their required operations. For each tenant, if the tenant needs a conversion action, the transform.ini file specifies it. The data converter component conforms to the recording and overwriting framework of the existing architecture. The data converter component processes all data available at the time of invocation and exits with a status code of 0 when no more data exists. When an error is encountered that requires human intervention, the data converter component reports the error and terminates with a non-zero exit status code. The output of the data converter appends "-transformed" to the filename of the file it converts. For example, if the input file being converted (which may be sorted or de-duplicated) is "a.log", the output of the data converter becomes "a-transformed.log". If the output file already exists, the data converter by default reports an error and stops; for this case, the data converter output options include OVERWRITE, SKIP, and ERROR. Data converters may be daisy-chained so that multiple operations can occur; this is specified in the transform.ini file. The data converter component clears all temporary files it created upon termination. The data converter component may have a debug/verbose mode that outputs additional information for debugging purposes. The transform.ini format remains backward compatible where possible; if not, it is controlled by a version number.
Converter chaining requirements. Each converter component may be combined with other components to perform a series of data conversions. If a component cannot be part of a chain due to technical limitations, the converter displays an error message when that unsupported component is selected to join the chain. If a file has passed partway through the chain of conversion actions when the system becomes unavailable, then on restart execution continues with the incomplete portion of the chain. Processing of partially completed files may occur alphabetically alongside new files, and the system is able to process new files in cooperation with partially converted data. The intermediate file produced after each stage of the chain carries a distinguishing file name or extension to distinguish it from the original file and from the next stage's file.
Converter parameters and version requirements. Because the parameters under which the converter component operates change frequently and are complex, the converter component must adapt to new requirements. For example, changes are made to the transform.ini file to make it work with new converter functionality. To provide seamless support, the converter component is defined to include the following requirements.
Any changes made to the converter components remain as backward compatible as possible; this means that an existing configuration file continues to function because no behavior it relies on has changed, although WARN messages may indicate that certain options are deprecated and should be updated to use the new parameters. When backward compatibility is not possible, the converter component outputs an error when an incompatibility between the code and the configuration file is detected. This helps in detecting problems and brings about a fast resolution.
When new, incompatible converter core code is introduced that affects existing configuration files, the affected parties are notified manually so that all affected configuration files can be changed. Any new parameter introduced to an existing component has a default value, so all existing configuration files continue to operate with the default behavior. If a customer requires different behavior, the configuration file for that particular customer has to be modified.
Converter operation. To facilitate converter operation, a command is issued to invoke the transformer. In chained mode, after each stage the converter generates an intermediate result file in the input directory that is processed by the next stage. For a chained configuration such as sorter followed by deduplicator, the sorted file is deduplicated at the final stage to produce the a-transformed.log output.
Converter - known and planned data converter components. The known and planned data converter components include a sequencer that sorts the records in a file into chronological order. The deduplicator removes copies of already-seen data from the next file accessed. The combiner combines multiple files into one file. The conditioner removes data errors and impurities. The divider splits one data source into multiple data sources. The filter removes selected data from view to make the output or display cleaner. The mapper can map one user ID onto another user ID in the display.
RiskFeed component of FraudMAP system
Fig. 30 is a flowchart of the RiskFeed process according to an embodiment.
Fig. 31 is a transaction diagram of the RiskFeed process according to an embodiment.
RiskFeed functional design-overview. RiskFeed is a FraudMAP component designed to handle large-scale data sets. In a typical setup, the RiskEngine populates the REDB and the StagingDB (a temporary database, as the RepDB is today). RiskFeed converts and loads the data from the StagingDB into the RADB, and RA queries the RADB primarily. To this end, RiskFeed converts the data from the StagingDB into an optimized representation in the RADB for better query performance. RiskFeed can also support restricted queries from RA on real-time information in the StagingDB.
RiskFeed is able to perform data refreshes in the RADB (all closed sessions need to be acquired for a short time in the RADB). RiskFeed's work on the StagingDB does not slow down the RE's data loading and model computation, and RiskFeed's data loading into the RADB does not affect RA queries.
Scope of RiskFeed operations. A RiskFeed operation involves the REDB, StagingDB, and RADB elements, which may be three different databases. These databases may typically reside on three different machines, a design that limits or avoids resource contention in a large-scale computing environment. RiskFeed may also be deployed on the same processing unit as the StagingDB, RE, or RA elements.
RiskFeed is employed with configurable scheduling policy options that avoid slowing RA queries during RADB loading. The RADB loading process may also be paused and resumed manually.
The StagingDB function of the system is initialized using the model template information it obtains, and the StagingDB can run after the model template is loaded into both the REDB and the StagingDB. Once this initialization has occurred, RiskFeed can operate without the RA or RE also operating.
RiskFeed-StagingDB design. The RiskEngine populates RiskFeed's StagingDB with four types of records: LoginStatsHistory, SessionHistory and EventHistory and their data, ActivityStatsHistory, and Modedef (mode definition) records, including metadata-type definitions. These are moved to the RADB to support RA queries.
SessionHistory (session history). There is a one-to-one correspondence between LoginStatsHistory and SessionHistory records, established by matching session identifications. RiskFeed moves a matching LoginStatsHistory/SessionHistory pair only when the corresponding session is closed. SessionHistory and its corresponding data include all activities from login to logout and an aggregation of all events that occur during the lifetime of the session. A FIFO queue stores all session identifications that have been selected for movement. The collection thread enqueues the sessions that have been newly closed since the last check, and the move thread moves and then dequeues the record pairs. A separate cleanup thread removes the moved records from the LoginStatsHistory and SessionHistory tables. The movement of the selected records is done not as a single task but in batches, which has the following benefit: if the RADB becomes busy, movement of the selected records stops without disrupting the batches already completed. The move thread is governed by a protocol with RA such that it loads a batch only when the RADB is idle. The collection thread and the cleanup thread are arranged to wake up and execute periodically, and some simple inter-process orchestration occurs between the collector, mover, and cleaner: the collector "wakes up" the mover when it has completed a collection round, and the mover "wakes up" the cleaner after each move round.
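A minimal sketch of this collector/mover/cleaner orchestration is shown below, using an in-memory FIFO of closed-session identifications and semaphores for the wake-up signals. The class name, the semaphore-based signalling, and the two stubbed database operations are illustrative assumptions, not FraudMAP source code.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

public class SessionMovePipeline {
    // FIFO of closed-session identifications selected for movement.
    private final BlockingQueue<Long> closedSessions = new LinkedBlockingQueue<>();
    private final Semaphore moverSignal = new Semaphore(0);
    private final Semaphore cleanerSignal = new Semaphore(0);

    // Collector: enqueue sessions newly closed since the last check, then wake the mover.
    void collectRound(Iterable<Long> newlyClosedSessionIds) {
        for (Long id : newlyClosedSessionIds) closedSessions.add(id);
        moverSignal.release();                    // collector "wakes up" the mover
    }

    // Mover: drain one batch, load it to RADB, then wake the cleaner.
    void moveRound(int batchSize) throws InterruptedException {
        moverSignal.acquire();
        List<Long> batch = new ArrayList<>();
        closedSessions.drainTo(batch, batchSize); // batches, not single records
        if (!batch.isEmpty()) {
            loadBatchToRadb(batch);               // safe to stop between batches if RADB gets busy
            cleanerSignal.release();              // mover "wakes up" the cleaner
        }
    }

    // Cleaner: remove the moved LoginStatsHistory/SessionHistory pairs from the StagingDB.
    void cleanRound() throws InterruptedException {
        cleanerSignal.acquire();
        purgeMovedRecords();
    }

    private void loadBatchToRadb(List<Long> batch) { /* INSERT the pairs into RADB */ }
    private void purgeMovedRecords() { /* DELETE the moved rows from the StagingDB */ }
}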
ActivityStatsEvent history. ActivityStatsEvent history records are created and managed (i.e., moved and stored) in a first-in-first-out manner consistent with the LoginStatsHistory/SessionHistory handling described in this specification.
Modedef. Modedefs are managed (i.e., moved and stored to the RADB) in different ways depending on the type of modedef. For example, shared modedefs are moved as the set of all newly created ones since the last move. Unshared modedefs are managed by reflecting over all modedefs created since the last move; these have distinct values based on the defined fields and are assigned normalized modedef identifiers. These normalized distinct modedefs are moved into the RADB once identified.
As a special case, a mutable modedef (e.g., IPNetBlock) shares its record-move logic with the unshared modedefs. The logic ignores the mutable fields of the modedef because search and retrieval are not part of mutable modedef operation; instead, the mutable modedef move includes an action to update records that were previously moved. To manage a large-cardinality modedef, such as a cookie, the large-cardinality modedef is treated as a shared modedef and can be moved without normalization, since there is no compression gain from deduplication, which is also very expensive.
All modedef identifications selected for movement are placed into a queue and managed first-in-first-out. The collection thread enqueues the modedefs newly created since the last check; the move thread normalizes the modedefs, adjusts the LoginStatsHistory references, applies them to the RADB helper tables, and loads the modedefs into the RADB; the cleanup thread cleans up modedefs after they are loaded into the RADB. To support normalization, a table of all the distinct unshared modedefs (i.e., the normalized ones) throughout history is maintained, along with a mapping of modedef identifications to normalized modedef identifications for all unshared modedefs. This mapping is used to change LoginStatsHistory references from the old modedef to the newly normalized modedef identity when building the helper tables. The record move is done in chunks, as in LoginStatsHistory/SessionHistory. The collection thread and the cleanup thread are activated periodically. In the RADB, a modedef has only the defined columns and only distinct values.
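The sketch below illustrates the normalization bookkeeping with two in-memory maps standing in for the two StagingDB tables described above (the distinct-value table and the identifier mapping). Class and method names are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class ModedefNormalizer {
    // All distinct unshared modedef values seen so far, keyed by their defined fields.
    private final Map<String, Long> normalizedIdByValue = new HashMap<>();
    // Mapping of original modedef identification -> normalized identification.
    private final Map<Long, Long> normalizedIdById = new HashMap<>();
    private long nextNormalizedId = 1;

    // Register a newly collected unshared modedef; returns its normalized identifier.
    long normalize(long modedefId, String definedFieldsValue) {
        Long norm = normalizedIdByValue.computeIfAbsent(definedFieldsValue, v -> nextNormalizedId++);
        normalizedIdById.put(modedefId, norm);
        return norm;
    }

    // Rewrite a LoginStatsHistory reference from the old modedef id to the normalized id.
    long rewriteReference(long modedefId) {
        return normalizedIdById.getOrDefault(modedefId, modedefId); // shared modedefs pass through
    }
}

As noted under the open problems below, both of these structures grow without bound over history, which is exactly what makes normalization hard to scale.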
Synchronizing session, event, and metadata modedefs (metadata mode definitions), ActivityStatsHistory, and LoginStatsHistory records. "LoginStatsHistory" as used here refers to a LoginStatsHistory record and its corresponding SessionHistory record. For LoginStatsHistory, record movement is executed in rounds governed by the activity cycle of the collection thread. In each round, the collection thread starts and looks for records newly created since the last round that require further action. To ensure referential integrity, starting with LoginStatsHistory and passing through and including modedef activity, a snapshot is taken before each round so that the collection thread has a consistent "newly created record set" for both the modedef and LoginStatsHistory session identifications. Since LoginStatsHistory arrives in batches, after the modedefs, the snapshot confirms that all LoginStatsHistory records refer to modedefs that already exist in the StagingDB. The snapshot defines the newly created records, which become the working set for the current round's checks and actions. The move thread performs each round by moving chunks to the RADB. Each chunk is loaded into the RADB in a transaction, and the chunks form checkpoints in the StagingDB so that interrupted processing can resume between chunks. In each round, the move thread performs its tasks in sequence: first one chunk of modedefs is moved, then one chunk of ActivityStatsHistory, and then one chunk of LoginStatsHistory. With this method, referencing records are always moved after the records they refer to.
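A sketch of the chunk-per-transaction move with a StagingDB checkpoint might look as follows in JDBC. The ga_ra_logins_history table name appears later in this description; the column names and the move_progress checkpoint table are assumptions for illustration.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedMover {
    // Load one chunk into RADB as a single transaction, then checkpoint it in the
    // StagingDB so that interrupted processing can resume at the chunk boundary.
    static void moveChunk(Connection radb, Connection staging,
                          List<long[]> rows, int round, int chunkId) throws SQLException {
        radb.setAutoCommit(false);
        try (PreparedStatement ins = radb.prepareStatement(
                "INSERT INTO ga_ra_logins_history (session_id, risk_session) VALUES (?, ?)")) {
            for (long[] row : rows) {
                ins.setLong(1, row[0]);   // session identification
                ins.setLong(2, row[1]);   // session risk
                ins.addBatch();
            }
            ins.executeBatch();
            radb.commit();                // the chunk is atomic on the RADB side
        } catch (SQLException e) {
            radb.rollback();
            throw e;
        }
        try (PreparedStatement ck = staging.prepareStatement(
                "INSERT INTO move_progress (round, chunk) VALUES (?, ?)")) {
            ck.setInt(1, round);
            ck.setInt(2, chunkId);
            ck.executeUpdate();           // checkpoint: resume after this chunk on restart
        }
    }
}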
RiskFeed flow control. Ideally, the RADB load tasks are suspended whenever RA is busy. The system can be configured to pause these tasks using settings designed to minimize system latency. The configuration includes the following elements (see the sketch after these elements):
RiskFeed checks the busy status of the RADB whenever a batch is to be loaded. If the RADB is busy, RiskFeed retries after a specified wait period. The wait period is governed by exponential backoff logic until it reaches a maximum value. When the RADB is not busy, or is no longer busy, the batch is loaded and RiskFeed resets its wait period to the minimum.
RiskFeed enters "catch-up" mode if it falls behind by failing to complete the working set (identified by the collection thread cycle). In catch-up mode, RiskFeed requests that RA block future queries and starts loading as long as the RADB is not busy. When RiskFeed catches up with the collection thread's rounds, it cancels its RA blocking request and returns to the usual mode.
RiskFeed can also resolve contention for the StagingDB between the RE and RiskFeed; this is not needed when the loads from both the RE and RiskFeed are manageable. The RADB contention strategy can be applied in this setting where the loading factors are material.
Communication between RiskFeed and RA is accomplished through RADB tables. These tables include an "RA busy" flag and a "RiskFeed request" flag.
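The sketch referenced above shows the first of these elements, the exponential backoff against a busy RADB, with the busy check abstracted behind a supplier; in the actual design that check would read the "RA busy" flag table. The class name and the millisecond values are illustrative.

import java.util.function.BooleanSupplier;

public class RadbLoadThrottle {
    private final long minWaitMs;
    private final long maxWaitMs;
    private long currentWaitMs;

    RadbLoadThrottle(long minWaitMs, long maxWaitMs) {
        this.minWaitMs = minWaitMs;
        this.maxWaitMs = maxWaitMs;
        this.currentWaitMs = minWaitMs;
    }

    // Try to load one batch; back off exponentially while the RADB is busy.
    void loadBatch(BooleanSupplier radbBusy, Runnable load) throws InterruptedException {
        while (radbBusy.getAsBoolean()) {
            Thread.sleep(currentWaitMs);
            currentWaitMs = Math.min(currentWaitMs * 2, maxWaitMs); // backoff up to the maximum
        }
        load.run();
        currentWaitMs = minWaitMs; // reset to the minimum once a batch is loaded
    }
}

For example, new RadbLoadThrottle(250, 16000) would give the policy a quarter-second floor and a 16-second ceiling; the actual minimum and maximum are configuration parameters, as described below.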
RiskFeed-metadata. RiskFeed processes several types of metadata populated by the RiskEngine or the ModelLoader, including model configuration, data element definitions, event type definitions, and risk component definitions. The first type of metadata is model template information, populated when the model loader writes the metadata into the REDB and StagingDB. The second type is risk engine configuration information, populated into the StagingDB when the RE starts and loads its configuration.
RiskFeed-startup and shutdown. RiskFeed can work on the StagingDB regardless of whether the RE or RA is operating. When the RE is operating, information delay is prevented by running RiskFeed simultaneously. In this setting, RiskFeed may be controlled by an agent through which the RE or a human administrator can start or shut down RiskFeed. The address of the agent is a configuration parameter of the RE.
RiskFeed-configuration. RiskFeed has the following configuration: a collection thread cycle defining the frequency at which the collection thread identifies work for an active round; a batch size configured for each type of record loaded into the RADB; minimum and maximum RADB wait periods configured for the flow control policy; a cleanup thread cycle that determines the frequency at which records loaded into the RADB are cleaned; and the modedef types to be normalized.
Multiple RiskEngine support. Modedef identifiers are globally unique, not just unique within one REDB; thus a LoginStatsHistory reference to a modedef is unambiguous. The RiskEngines are configured to prevent overlap or collision between modedef identities, and RiskFeed is designed under this assumption. There are two possible RiskFeed settings with multiple RiskEngines: one RiskFeed per RiskEngine, or one RiskFeed serving all RiskEngines. Where RiskEngines and RiskFeeds are present in equal numbers, the information is merged in the RADB, so each RiskFeed can operate as if there were only one RiskEngine.
Where there are more RiskEngines than RiskFeeds, a shared modedef is unique within its REDB but may have copies from different RiskEngines. In this case RiskFeed ignores the copies (because the compression gain is small) and loads them into the RADB as in a single-RiskEngine setting. For unshared modedefs, normalization occurs regardless of which RiskEngine they originated from, and RiskFeed applies the same logic as in a single-RiskEngine setting. However, multiple RiskEngines may cause the modedef identifications to arrive out of order, making a FIFO queue implementation inefficient.
Open problems. To support modedef normalization, the StagingDB is designed to hold a table of all distinct modedefs and a mapping table that correlates all modedef identifications with their normalized identifications. Both tables can grow indefinitely, and this unbounded growth makes normalization non-scalable. Another problem is determining the types of queries that RA can use against the StagingDB. With minimal index support, in the context of the need to support rich queries, a scheduling strategy can be devised to move records on an urgent schedule to keep the StagingDB small; such an urgent policy may have an additional impact on RA queries.
RiskFeed-supplemental information
The collector cycle. The cycle includes the following steps: obtain the next round identification; select all closed sessions with identification > last_moved_session_id and enqueue them under the round identification; update and save last_moved_session_id for round t in the round_session_map table; select and insert all "new" modedefs into the work tables; select and insert all session data of the current round into the work tables; and commit the StagingDB. For the purposes of this collector loop, a "new" modedef is one that has not been collected in a previous round, which may be implemented with a collection-round tag on each record (i.e., an untagged record is tagged with the round id prior to collection).
The data movement cycle. The RiskFeed data movement loop includes the following elements:
All work tables are checked for the minimum round id r.
Work on round r is entered into the work tables in the following order: metadata, sessions, risk components, events, miscellaneous helper data.
Based on the schedule for working on each of these tables, records up to round r are considered. Shared modedef work table (for each type): load the round r records (for that type) into the RADB (no batch processing is required because the size is small); commit the RADB; clean the round r records in the work table (for that type); record progress for modedef type m in round r; mark the round r records "moved"; and commit the StagingDB. Unshared modedef work tables (for each type): select the round r records into a temporary table t; add the new distinct values from t to the normalized modedef table (for that type); and select all records in t, join with the normalization table (for that type), and insert the identification and normalized_id into the mapping table (for that type).
Metadata: determine the differences between RA metadata and RF metadata, and move only "new" metadata to the RADB.
Session data: the working data is selected under the following constraints: it is searchable; deduplication and structuring are required; and for each set, the data is mapped and loaded into a temporary work table on the RADB.
Risk components
Event data
Update entity model statistics
Differentiate known RA entities and move only new ones to the RADB
Data element tables (searchable as defined in the model)
Other helper tables
Update round processing statistics: update the references from ga_ra_logins_history in round r with the normalized id, using the mapping table; load the records normalized in round r (for that type) into the RADB; select the records normalized in round r into a temporary table t1; load the next chunk of N records from t1 into the RADB (next = 1 + the largest chunk id in the schedule); commit the RADB; record progress for modedef type m, round r, and chunk c; commit the StagingDB; repeat the last two steps until all are loaded; mark the round r modedefs "moved" (i.e., those with corresponding ids in t); clear the temporary tables t and t1; clear the round r (and all previous rounds') records (for that type) in the work table; and commit the StagingDB.
ActStats (activity statistics) work table: load the next chunk of N round r records into the RADB; commit the RADB; record progress for round r, chunk c; commit the StagingDB; repeat the last two steps until all are loaded; clear the round r (and all previous rounds') records in the work table; and commit the StagingDB.
LoginStats and SessionStats work tables: select the round r records from logins_history and session_history (with updated modedef references); load the next chunk of N round r records into the RADB; load the RADB helper tables using these same records; commit the RADB; record progress for round r, chunk c; commit the StagingDB; repeat the last two steps until all are loaded; clear the round r (and all previous rounds') records in the work table; update round_session_map to mark round r moved; and commit the StagingDB.
The RiskFeed cleaner cycle. The cleaner cycle includes the following operational steps: first, the cleaner wakes up according to its configured schedule or is woken by a signal from the mover; second, the cleaner calculates the number of rounds to clean, N, starting at min(round); third, at the end of each cleaning, the cleaner attempts to clean rounds from the table used to calculate which round to clean; fourth, if that table is being actively used by the mover, the cleaner skips this step; fifth, for the ActStats, Logins_History, and Session_History tables (all moved but uncleaned rounds are selected from the round_session_map table), cleaning is repeated for each round r in the following order: clean the ActStats records with session id < the maximum session identification of round r; clean the logins_history records with id < the maximum session identification of round r; clean the session_history records with session id < the maximum session identification of round r; update round_session_map for the cleaned round r; and commit the StagingDB; and sixth, for the modedef tables, for each modedef type m, clean the shared modedef records marked "moved", and for each modedef type m, clean the unshared modedef records marked "moved".
RiskFeed alternative embodiment
Threading. Each thread manages its own state, and the three (3) additional threads that monitored and blocked the worker threads have been removed. The model of the alternative embodiment uses bounded waits on thread-safe atomic objects; there is no longer any situation in which a thread may be left in an infinite wait state. The run loop has a general catch-all and enables the worker to handle any exception and either continue or panic (shut down). The new exception handling does not use exceptions for branching or conditional processing. The new thread model is used by RFMover, RFCollector, RFPurger, OpenSessionSearch, and MySQLAnalyzer.
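The shape of such a run loop, with its general catch-all and the continue-or-panic decision, might be sketched as follows; the class and the isFatal hook are illustrative, not the actual RFMover/RFCollector/RFPurger code.

public abstract class WorkerThread extends Thread {
    private volatile boolean shutdown = false;   // each thread manages its own state

    @Override
    public void run() {
        while (!shutdown) {
            try {
                doRound();                       // one collector/mover/purger round
            } catch (RuntimeException e) {
                // General catch-all: exceptions are not used for branching; the
                // worker either continues with the next round or panics.
                if (isFatal(e)) {
                    shutdown = true;             // panic: orderly shutdown
                }
            }
        }
    }

    void requestShutdown() { shutdown = true; }

    protected abstract void doRound();
    protected boolean isFatal(RuntimeException e) { return false; }
}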
Choreography. Choreography occurs as follows: RFMover is woken immediately when a collector round completes; RFPurger is woken immediately when a move round completes; MySQLAnalyzer (if enabled) is woken after a move round completes; and all data about activity has been migrated out of the RFController and relocated to the worker classes. The OSS manages its own role.
Database. The underlying database handlers have been rewritten for RFMover, RFController, and RFCollector. For these classes, the new database logic enforces the use of the required JDBC connection options. The PURGER does not use the new db logic, but does use (at least) rewriteBatchedStatements. In addition to the query output, the approximate data throughput (in bytes) of each load is included in the SQL debug log file. Embodiments include an ANALYZE agent configurable via the configuration; the default behavior is to run after each of the first 5 rounds completes and then exit.
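rewriteBatchedStatements is a real MySQL Connector/J option that collapses a JDBC batch into multi-row INSERTs, which is what makes batched loads worthwhile; a connection might be opened as below. The host, schema, and credential handling are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RadbConnectionFactory {
    static Connection open() throws SQLException {
        // Without rewriteBatchedStatements=true, Connector/J sends each batched
        // INSERT as a separate round trip.
        String url = "jdbc:mysql://radb-host:3306/radb?rewriteBatchedStatements=true";
        return DriverManager.getConnection(url, "riskfeed", System.getenv("RADB_PASSWORD"));
    }
}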
Failure recovery. Heartbeat operation is as follows: the RFController uses the "ts" column in the "ga_rf_instance" table for heartbeat updates. The heartbeat interval is hard coded to 1 minute, but is not limited thereto; the timeout after a crash is hard coded to 2 minutes, but is not limited thereto. RiskFeed should be able to restart from a failure at any point in its execution. The collector does not leave the db in an inconsistent state and can stop at any time. The mover should detect that a step was started but not completed and enter RECOVERY_MODE for the remainder of that step. In RECOVERY_MODE, no duplicate-key errors are thrown and all previously completed work in the round is skipped; RECOVERY_MODE is turned off when the step completes. All subsequent steps return to "insert" behavior and would throw a PK exception if one were encountered. After a crash, the following holds: no manual intervention is required; alternatively, another RF instance is started. If no other instance issues a heartbeat, the next RF instance should "re-declare" the crashed instance after 2 minutes. If the ga_rf_instance table is empty, RiskFeed should still probe for another running RF process. With respect to transient database problems, in some cases the RF may appear to be doing nothing when it is actually blocked waiting on a lock. When started, the RF always checks for an active process in the same schema; if an active process is found, it exits with an error. To retry, the RF backs off, waits, and retries (up to 3 times) on: a transaction lock timeout; or db processing stopping with an invalid connection status. As to panic, the RF attempts an orderly shutdown on: a primary key violation; or a DataQualityException (data quality exception).
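A sketch of the heartbeat and the two-minute re-declaration against the ga_rf_instance table follows; the table and its "ts" column are named in the text, while the "id" column and the MySQL interval syntax are assumptions.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class InstanceHeartbeat {
    // Called once per minute: refresh this instance's "ts" heartbeat column.
    static void beat(Connection db, long instanceId) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE ga_rf_instance SET ts = NOW() WHERE id = ?")) {
            ps.setLong(1, instanceId);
            ps.executeUpdate();
        }
    }

    // A starting instance may "re-declare" an instance whose heartbeat is older
    // than the two-minute crash timeout; returns true if a takeover happened.
    static boolean reclaimCrashed(Connection db, long newInstanceId) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE ga_rf_instance SET id = ?, ts = NOW() " +
                "WHERE ts < NOW() - INTERVAL 2 MINUTE")) {
            ps.setLong(1, newInstanceId);
            return ps.executeUpdate() > 0;
        }
    }
}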
Shutdown and pause. The underlying handlers for shutdown and pause have been completely rewritten but behave the same way. On "shutdown", the RF returns as soon as its current step completes. On "pause", the RF pauses as soon as its current query completes. On "resume", the RF resumes immediately. While paused, a shutdown forces an orderly shutdown. To facilitate exit, a CATCHALL has been added to the shutdown path.
Console. Two additional commands have been added to the console: "state" returns the current/last known state of all running services; and "analysis" causes the analysis agent to run once immediately.
Query changes. More than 50% of the queries have been modified in one way or another, some for correctness and most for performance. The ENTITY, EVENT, DATAELT, and RISKCOMPONENT definitions are now loaded incrementally at the beginning of each loop. In addition, the RF temporary work tables no longer use rounds as constraints.
DataQualityChecks (data quality checks). The RF now has data quality checks in critical locations that, if failed, cause the RF to shut down.
OpenSessionSearch (open session search). In addition to the collector, mover, and cleaner, a thread in the RF captures data so that open sessions are searchable. Failure recovery includes the following: OSS processing is self-healing and may take up to 5 minutes to restart after a crash, but is not so limited; no other manual intervention is required. OpenSessionSearch can run independently (outside of RF processing) if necessary.
Continuous operation. RiskFeed continues to run in the event of an RE restart or if a new model is loaded.
Two-letter codes. RiskFeed does not generate these codes; instead, they are loaded by the ModelLoader (model loader).
Logging. The daily rotation now compresses (gzip) the archived logs, and the log is split into RF and SQL content.
Configuration. A parameter, OPEN_SESS_REFRESH_CYCLE, has been added to the RiskFeed configuration; the OpenSessionSearch process takes a new snapshot at this interval, with a default setting of 60000 milliseconds.
Automated intervention for FraudMAP systems.
Overview. A flexible architecture provides automated intervention upon evaluation of specific events. Such intervention may include (among others): deactivating an online banking end user without account holder involvement; deactivating an online banking end user with account holder involvement; releasing financial transactions; and logging event-based transaction-handling activity.
Architecture requirements. The automated intervention solution is separate from any existing FraudMAP component, stands on its own, and can be forward compatible with respect to Doral. The interface writes to the new system quickly, with as little impact on development as possible. The architecture enables the fraud analyst to see (from FraudMAP) what intervention has been attempted and its current state, and is written such that other actions may also be driven after initiation of the intervention or in response to it (e.g., automatically emailing a fraud analyst). The configuration is flexible and simple and provides the system with recoverability after a system failure.
FraudMAPConnect. The FraudMAPConnect service is responsible for the following: managing (and persisting) the status of messages sent to and received from third-party systems; exposing messages received from third parties to other components in FraudMAP; and accepting interactions from FraudMAP to initiate or react to a conversation with a third-party system.
Persistent communication structure. The architecture implements a means to record conversations with third-party systems. These conversations are recorded in the following tables, structured as shown below and located in the RFdb:
ga_fmc_conversation_log
ga_fmc_message_log
ga_fmc_message_parameters
ga_fmc_message_status
A complete conversation may result in the following: a single row in ga_fmc_conversation_log; multiple rows in ga_fmc_message_log; multiple sets of rows in ga_fmc_message_parameters; and multiple rows in ga_fmc_message_status. The following interactions with a third party illustrate this (timestamp fields are omitted but form part of the data):
received wire transfer notification
ID Type of conversation Third party Third party references
12000019 Electric wire REDACTED ABC123XYZ
ID Dialog ID Direction of rotation Message type Application referencing Status of state
15000019 12000019 Input device Wire transfer alarm RCVD
ID Message ID Keyword Value of
12000019 15000019 MID ABC123XYZ
12000019 15000019 Account ID JOHNDOE3
13000019 15000019 Introduction to wire transfer 13489729139
14000019 15000019 Status code HELD
Wire-release request by FraudMAP:
ID message ID Keyword Value of
15000019 16000019 MID ABC123XYZ
16000019 16000019 Status code Release (Release)
17000019 16000019 Reason code Low risk (Low risk)
And sending a Wire-release request:
ID dialog ID Direction of rotation Message type Application referencing Status of state
16000019 12000019 Output of Wire transfer response 212000019-547-000019 Sending
The interaction can also be refined as follows: based on a high-risk event, correspondence with the end user is initiated to determine whether the transaction is acceptable or should be blocked. In this instance the communication may be mediated by the third party, which will take an action based on the response. In this case, the following communication is initiated:
FraudMAP detects a risky event and elects to correspond with the end user:

ID | Conversation type | Third party | Third-party reference
22000019 | End-user verification | CLAIRMAIL | N/A

ID | Conversation ID | Direction | Message type | Application reference | Status
25000019 | 22000019 | Outbound | Verification request | 212000019-547-000019 | QUEUED
ID | Message ID | Key | Value
22000019 | 25000019 | Account ID | JANEYRE2
23000019 | 25000019 | Text | Protection ... for client
24000019 | 25000019 | Email address | janeayre@booboo.com

Correspondence sent:

ID | Conversation ID | Direction | Message type | Application reference | Status
25000019 | 22000019 | Outbound | Verification request | 212000019-547-000019 | SENT

Acknowledgement received:

ID | Conversation type | Third party | Third-party reference
22000019 | End-user verification | REDACTED | XYZ123CBA

ID | Conversation ID | Direction | Message type | Application reference | Status
26000019 | 22000019 | Inbound | Verification ack | | RCVD

Acknowledgement accepted:

ID | Conversation ID | Direction | Message type | Application reference | Status
26000019 | 22000019 | Inbound | Verification ack | | ACCEPTED

End-user response:

ID | Conversation ID | Direction | Message type | Application reference | Status
27000019 | 22000019 | Inbound | Verification success | | RCVD

ID | Message ID | Key | Value
25000019 | 27000019 | Account ID | JANEYRE2
26000019 | 27000019 | Email address | janeayre@booboo.com

End-user response accepted:

ID | Conversation ID | Direction | Message type | Application reference | Status
27000019 | 22000019 | Inbound | Verification success | | ACCEPTED
Outbound interfaces. Interfaces have been developed to receive, initiate, and respond to conversations with third-party systems. These interfaces may use different transport mechanisms, but at a minimum messages are expected to be sent over HTTP. FraudMAPConnect provides the interface to third-party systems and abstracts as much as possible. In practice, a custom class may be written for each third party, adhering to the protocol agreed with that third party. These custom classes communicate using the common conversation logic described above. Each developed interface satisfies specified parameters so that connection classes are reused and new services are developed according to the established conventions, providing professional-services capability with maximum reuse.
Inbound interfaces. The method of interfacing with other FraudMAP components, in order to probe for new inbound messages and initiate outbound messages, is as follows. Since these conversations are maintained in database tables, client APIs have been developed that interact with those tables. Coordination of multiple component instances that would otherwise perform the same action has been addressed; an example of this problem is multiple emails being sent to a customer because two RiskApp instances (active and failover) are running. Therefore, conversations are tightly controlled so that such anomalies do not occur in the FraudMAPConnect system.
Conducting conversations. Most conversations are initiated, at least at first, by the RiskApp, because the RiskEngine currently does not have the ability to detect the relevant conditions within its specification. In this case, the RE may be leveraged using a monitored search. This is demonstrated in the wire transfer scenario: a monitored search is configured to capture all types of wire transfers; the EventEvaluator detects a wire transfer and creates a triggered alert; the RA picks up the triggered alert; the RA then determines that the triggered alert requires further analysis leading to possible automated intervention; and the session identified by the triggered alert is passed to logic that determines whether automated intervention is required. If intervention is indicated, the RA converses appropriately with FraudMAPConnect.
Process for FMConnect analysis. New elements are introduced into the RiskApp model as follows:
The example describes a possible definition for a method of detecting suspect wire transfers and directing how results are communicated to third parties. The purpose is to capture the parameters required to: first, define the type of conversation and with whom ("<monitoredSearchWithFMConnect>"); second, define what type of event triggers further analysis ("<searchCriteria>"); third, define criteria that exactly match events for inclusion or exclusion ("<excludeMatchCriteria>" and "<includeMatchCriteria>"); and fourth, define the data to be recorded and communicated to the involved third parties ("<connectParameters>").
Communicating with FraudMAPConnect. Communication occurs through the API, which effectively updates the FMC tables described above. The data imported into the FMC tables (specifically the ga_fmc_message_parameters table) is selected as specified by the <connectParameters> element, and Velocity (or other similar open source software) is used to interpret variable names and convert them to actual values. When the data is inserted, an appropriate row is added to the ga_fmc_message_status table with the state set to QUEUED. This triggers FMConnect processing to attempt to send a properly formatted message to the third party. If the message cannot be sent or FMConnect processing fails, the state remains QUEUED and the attempt is repeated when FMConnect is available. Under the current failover policy, both the active and the failover RiskApp can run simultaneously; both execute the same background process and are not aware of each other. Moreover, their detection of and reaction to triggered alerts is synchronized only within a relatively open window, so both may be expected to attempt to reply to the conversation at the same time. To prevent transmission of duplicate commands, a Sequence_ID may be inserted into the ga_fmc_message_status table. This Sequence_ID is formed using aspects of the triggered alert, and if the FMConnect process detects more than one similar message with the same Sequence_ID, it ignores all but the first.
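A sketch of that deduplication idea follows, with an in-memory set standing in for the Sequence_ID column of ga_fmc_message_status. How the Sequence_ID is actually composed from the alert is not specified in the text, so the composition shown is an assumption.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateCommandFilter {
    private final Set<String> seenSequenceIds = ConcurrentHashMap.newKeySet();

    // Derive the Sequence_ID from aspects of the triggered alert, so that the
    // active and failover RiskApp instances compute the same value independently.
    static String sequenceId(long triggeredAlertId, long sessionId, String action) {
        return triggeredAlertId + "-" + sessionId + "-" + action;
    }

    // Returns true only for the first message with a given Sequence_ID; later
    // duplicates (e.g., from the failover instance) are ignored.
    boolean accept(String sequenceId) {
        return seenSequenceIds.add(sequenceId);
    }
}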
FraudMAPConnect communication with third parties. Third parties cannot be expected to adhere to a specified protocol. Adaptation to each third party's rules is contemplated, taking into account the conversation and the primitives extracted above. Each implementation differs in the following ways:
and (4) low-layer transmission.Most third parties support HTTP in addition to other modes of communication.
Application protocol (web API). Some protocols may be very basic and very proprietary; others may involve technologies such as SOAP or architectural styles such as REST.
Data format. Some third parties may wish to transfer data in XML or JSON format; others may want to use their own proprietary formats.
Conversation rules. The rules of conversation with third parties may differ widely. Some third parties may only need a response to conversations they initiate; others may expect to be contacted as needed; some may send an acknowledgement and expect one in return; others will not respond at all.
Configuration. For each connection, variable parameters such as host name, port, and URL are specified and may vary from one third party to another.
In view of this significant variability, consider the implementation of a basic interface that can have a minimum of two methods (transmit and receive). Custom implementations of this interface are contemplated for each new third party, where inheritance and other standard practices are used to centralize common code and behaviors.
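A minimal version of the two-method interface described above might look as follows; the names are illustrative, and the base class simply shows where inheritance would centralize common code for each third-party implementation.

import java.io.IOException;
import java.net.URI;

public interface ThirdPartyConnector {
    void transmit(String messageBody) throws IOException;  // send one formatted message
    String receive() throws IOException;                   // fetch one inbound message, if any
}

// Common behavior (endpoint handling, retries, logging) is centralized in a base
// class; each new third party supplies only its protocol-specific details.
abstract class HttpConnectorBase implements ThirdPartyConnector {
    protected final URI endpoint;

    protected HttpConnectorBase(URI endpoint) {
        this.endpoint = endpoint;
    }
}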
System integrity and recovery. When a message is sent out by FraudMAPConnect, the message is queued first and then sent. If the system aborts between queuing and sending, then on startup a resend of all messages in the queued state is initiated. It is the responsibility of the client (internal) API to keep all the information needed to send a message in the database, so any queued but unsent messages can be resent at startup. In the abnormal case, duplicate messages may be sent out, which is preferable to failing to transmit a request to a third party.
FraudMAPConnect processing. FraudMAPConnect performs the following functions: monitoring inbound messages from third parties; monitoring messages from FraudMAP components; and sending messages to third parties.
FraudMAPConnect communicates via HTTP/HTTPS, or via JMS, SMTP, or other more specialized protocols. In any event, running as a web service under an application server such as JBoss enables most communication protocols to be satisfied. The conversation protocol, at both the lower and upper layers of the communication stack, is specified by type. The architecture is designed so that code adhering to common protocols is reused and the system can be configured to enable rapid specification of communication parameters.
Communication through tables in the internal database follows an established pattern. This has the following advantages: persistence is built in, messages are easily included in transactions, and there is little configuration impact on the interaction between FraudMAP components. The main disadvantage is that a message can be absorbed by any component capable of doing so. Thus, the production system is tightly controlled to prevent rogue processes from being started that would erroneously drain a message from the database queue and send it to an undesirable location.
FIG. 32 is a block diagram of a JBoss application server and ModelMagic technology infrastructure, according to an embodiment.
New application. The following new JBoss application, FraudMAPConnect, will be written: first, the configuration interface for the current ModelMagic architecture will be written (the class that reads from the ModelMagic file at startup and configures the appropriate data classes to drive the services in FraudMAPConnect); second, a database services layer will be written that persists to and reads from the database (including APIs to initiate, read, and update conversations); third, a conversation management layer will be written that interacts with the database services layer according to specified business logic configured via ModelMagic; fourth, a tool will be developed that callers can use to facilitate fine matching of events (initially for use by the RiskApp but written so that it can be called by other components; to be configured by ModelMagic); fifth, a generic interface for sending and receiving messages between third parties and FraudMAPConnect will be written; sixth, the HTTP communication implementation described above will be developed (also configured via the appropriate elements in the ModelMagic FMConnect model file); seventh, an implementation of the generic interface to communicate with the DI console will be developed; and eighth, a pseudo-implementation of FraudMAPConnect using the HTTP communication method will be developed to serve as a test third-party endpoint.
RiskApp enhancements. Implement another monitoring type (in addition to general monitoring and RBA) to serve conversations managed by FraudMAPConnect.
ModelMagic enhancements. Another model type, FMConnect, is included for defining the ModelMagic configuration and presenting it to FraudMAPConnect.
Reflex
User stories - major components. User stories are an aspect of the software development method applied to FraudMAPConnect. Successful implementation of the product involves several components, including: an engine that determines how Reflex will look for trigger events and respond; logging that determines how Reflex will record all activities into a log file for subsequent viewing and retrieval; notification that determines how Reflex will alert a financial institution's analysts regarding its activities; display that determines where Reflex activity appears in the FraudMAP GUI and how users can search for and locate it as desired; and configuration that determines how a person builds Reflex criteria on behalf of the FI.
Because each of these components is a separate item, tracking is performed using a "user story" based approach. A user story is considered "successful" and "complete" when development meets a series of acceptance criteria. These criteria are defined at the end-user level, enabling groups of professionals (engineering, DBA, UI, etc.) to define the best possible technical approach to meet them.
Many current FraudMAP users perform multiple manual activities on low-risk items daily. These activities include releasing automatically delayed wire transfers and ACH transactions, and canceling or suspending user accounts performing routine activities. Low-risk activities (e.g., releasing an automatically delayed wire transfer or ACH transaction) take many hours of work time and can be automated, while high-risk activities should trigger a quick response (e.g., account suspension) without manual intervention. Thus, the system uses third-party messages, on behalf of the customer, to automatically perform certain activities defined by the financial institution.
Acceptance criteria. The initial criterion is to create an infrastructure or system to send and receive external messages to and from third-party vendors (e.g., DI consoles) that can invoke changes in the external system. One example is stopping (pausing) a home banking customer's access to the online system without account holder or FI participation. Another example is the automatic release of ACH or wire transfer transactions that were automatically delayed or held based on low scores or low dollar amounts. The initial items defining the underlying activity are shown in the "plan" section above and describe such infrastructure as is suitable for sending messages to the DI console; however, a complete user story defines each of the different systems forming the communication network. Since it is envisioned that each provider will use different terms and will enable automatic notification in different ways (if they allow this at all), it is assumed that each individual provider will require its own distinct user story.
Another criterion is to create a notification and reporting system that notifies bank employees and/or account holders of each automated action and activity. One example is sending an email alert to a particular bank email list any time an account holder's home banking account is suspended. Another example is creating a daily report of all automatically released ACH or wire transfer transactions that meet the "low risk, low dollar" criteria described in the previous paragraph. The purpose is to inform the bank of automated events that require the bank to follow up to some extent. For example, an automatic account suspension may require the bank to contact the account holder at the last known address or phone number to alert them that they need to cancel a potentially compromised bank or credit card account and reactivate their account. Depending on the complexity of each action, a separate user story may be specified for each activity. The criterion also applies to changes to the current risk application UI to accommodate Reflex activities within the currently displayed information. For example: Reflex performs an "account suspend" action on behalf of the bank based on defined trigger criteria; this should be displayed as an "activity" in the activity panel of the session.
Another criterion is to create a new, separate location in the FraudMAP risk application that shows all Reflex-related activities. While this data may also be used in the current system, such as displaying the automatic release of a low-dollar, low-risk transaction when displaying the particular ACH or wire transfer in the risk application, the same data may be displayed in a separate location for FI users who wish to monitor or report on Reflex-based activities. A "Reflex" tab or similar partitioned area dedicated to Reflex activities is expected to be the preferred display.
Another criterion is to create a back-end "console" or control panel that enables personnel to build expressions on behalf of the bank. These expressions define criteria and the resulting actions performed by Reflex when the criteria are met. For example, a "suspend user account" Reflex action may involve a number of criteria, such as "3 or more red alerts in 24 hours" before taking the action, which may include activities such as "edited user contact information", "created new wire transfer template", and "scheduled transfers over 5000 dollars in the same 24 hours". The customer may specify inclusion and exclusion criteria, and whether the particular criteria are met determines whether the particular action is performed. The console may not be displayed to the banks using the system; alternatively, the console may be for internal use, in the form of a custom XML script (TBD). A simple, easy-to-use, menu-based criteria screen is envisioned and made available to internal personnel, which can save individual Reflex triggers and actions for subsequent assignment (to be customized) to all FraudMAP clients. Alternatively, this may enable development of new and customized trigger/response pairs as needed.
Another criterion is to create a matrix of all desired Reflex actions based on the expected use cases, cross-referencing these activities against each home banking provider's system. It is expected that each provider accomplishes tasks in different ways (e.g., performing an ACH or wire transfer release), and that some providers may not allow the actions involved from a remote source. This may be mapped and stored in a central location for later reference. The more desirable Reflex actions are expected to be prioritized (i.e., "what action should be taken").
Another criterion is to collect and define the technical challenges that can hinder or block Reflex actions (engineering impact), and to document these challenges and their solutions on the provider's wiki page in a newly created "Reflex" section (product management impact).
Another criterion is to test all Reflex activity with each third party before going live. A document may be generated for each provider that describes how each Reflex activity will appear to the provider. A clearance may be obtained from each provider showing that the provider (or its infrastructure) is ready to accept notifications and perform tasks. If a bank needs log files or other specific notifications that are not captured in the FraudMAP risk application, this criterion provides those details to the bank before user acceptance testing and product go-live.
Another criterion is documentation of all changes to the system due to the implementation of Reflex, and training or retraining of requesting customers (tech pubs impact). This includes updates to internal wikis, training materials, and all associated support documentation (product management impact), as well as preparation and distribution of newsletters (marketing and possibly product management impact), product flags (marketing impact), and sales and marketing notifications and support documentation (product management impact). Depending on the complexity of each individual task, this may require a separate user story for each component.
FraudMAP system database design
A definition table is a relatively small table that contains data element values and identifications. The definition tables include the ENUM tables, the MODEDEF tables, and the USERMODAL table.
A helper or HLP table is used for the initial search. It is typically joined with no more than one definition table to produce a working set of identifications stored in a temporary table. Note that some searches are made directly on the HLP table and do not use a definition table. The helper tables are "index organized", meaning that the table contents are stored entirely in an index structure organized around the table's "organizing key".
Supplementary tables are special tables that are searched in some queries; they do not contain all SESSION IDs, but they carry the same session-related fields that all HLP tables have, so a working set can be extracted from them into the TMP table. An example of a supplementary table is the SESSION_STATUS table.
FACT tables are used for fact storage. The FACT tables are not searched; "search" in this context means extracting rows from a table using any criteria other than an identification lookup. The only access allowed on a FACT table is an ID lookup.
All queries have three steps. Search: this step searches the helper tables to collect into the TMP tables the identifications that may qualify as part of the "final answer". Combine: this step uses SQL INTERSECT, UNION, and MINUS to compute the final set of identifications that qualify under the logic of the search; it uses LIMIT <N> to produce the "highest risk" rows. Fact step: this step uses the set of identifications from the combine step to extract the facts for display. Note that for some types of queries, particularly REPORT, the fact step may not exist.
The architecture minimizes the number of pages accessed in four different ways: by using very narrow search tables, index organized where possible; by avoiding intermediate joins through completing work in "true" (i.e., non-temporary) tables; by keeping the number of full index traversals at a fixed minimum through the hard search limit in the fact step; and by using an "iterative" strategy for searches that produce a large working set, stopping the search once the number of rows to be displayed is reached.
Index traversal. Minimizing full index traversals is a central goal because they are particularly expensive. This is done by avoiding intermediate joins for qualification based on open-ended IDs. The use of open-ended ID joins for intermediate qualification is why the initial approach of using an exhaustive join policy through a "dimension ID table" fails at about 100M sessions, and why the 2.5 and earlier architectures cannot exceed about 25M sessions.
For this architecture, the number of index traversals is small and bounded:
Max_Index_Traversals = N_Definition_Items + Fact_Limit
where N_Definition_Items = SUM(<number of definition values satisfying each search parameter>).
If wildcards are not used, the number of definition values is simply the number of search boxes in the RA search screen filled with a valid entry. If wildcards are used in a search box, its contribution to the count is the number of search values that satisfy the wildcard. Fact_Limit is the "LIMIT N" constant, a configurable global constant that is typically 500. For example, a search on four exact definition values with Fact_Limit = 500 performs at most 4 + 500 = 504 full index traversals.
Examples and comparison with other methods - example query. Consider the following query: "show me all sessions from Paris, Texas with Comcast as the PROVIDER and Opera as the web browser". The following shows, at a high level, what this looks like in the above framework.
The helper-table search round. In the helper-table round, the candidate space (universe) of possible sessions matching the search criteria is collected by breaking the search into its components:
INSERT INTO Temp1 SELECT HLP.SESSION_ID, HLP.RISK_SESSION
FROM CITY_MODEDEF M, CITY_HLP HLP
WHERE M.MODEDEF_ID = HLP.MODEDEF_ID
AND UPPER(M.CITY_NAME) = 'PARIS';

INSERT INTO Temp2 SELECT HLP.SESSION_ID, HLP.RISK_SESSION
FROM STATE_MODEDEF M, STATE_HLP HLP
WHERE M.MODEDEF_ID = HLP.MODEDEF_ID
AND UPPER(M.STATE_NAME) = 'TEXAS';

INSERT INTO Temp3 SELECT HLP.SESSION_ID, HLP.RISK_SESSION
FROM PROVIDER_MODEDEF M, PROVIDER_HLP HLP
WHERE M.MODEDEF_ID = HLP.MODEDEF_ID
AND UPPER(M.PROVIDER_NAME) = 'COMCAST';

INSERT INTO Temp4 SELECT HLP.SESSION_ID, HLP.RISK_SESSION
FROM BROWSER_MODEDEF M, BROWSER_HLP HLP
WHERE M.MODEDEF_ID = HLP.MODEDEF_ID
AND UPPER(M.BROWSER_NAME) = 'OPERA';
The search has traversed four B-tree indexes to gather the "space" of session candidates for the query.
The filter is run. In filtering, logic is used to derive the rows to be processed. The SQL set operators INTERSECT, UNION ALL, and MINUS are employed to logically execute AND, OR, and NOT filter predicates. FraudMatch 3.0 searches primarily use AND predicates, so all searches use INTERSECT; however, the design can easily implement OR and NOT searches. The LIMIT feature of the database engine is used, with ORDER BY on the RISK_SESSION field, to restrict the rows to be displayed to the "most risky". For REPORT or ACCOUNT searches, the full solution set is traversed. In all types of searches, using the example above, the inner query is as follows:
(SELECT SESSION_ID, RISK_SESSION FROM TEMP1 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP2 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP3 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP4)
Note that the REPORT query adds the DAY_SESSION field. For the FraudMatch and alarm queries, the sessions examined are limited to the "most risky" as determined by RISK_SESSION; this is done using ORDER BY RISK_SESSION with a LIMIT N predicate. The REPORT and ACCOUNT searches traverse the search space completely.
FACT round. In the FACT round, the facts to be displayed are identified and presented. The FACT table is not searched, because searching it is very expensive. The FACT query looks as follows:
SELECT <display_cols>
FROM GA_RA_SESSION_FACTS F, <Modedef tables>,
(SELECT SESSION_ID, RISK_SESSION FROM TEMP1 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP2 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP3 INTERSECT
SELECT SESSION_ID, RISK_SESSION FROM TEMP4
ORDER BY RISK_SESSION
LIMIT 500) T
WHERE F.SESSION_ID = T.SESSION_ID
AND F.<MODEDEFS> = <MODEDEF_TABS>.MODEDEF_ID;
Note that the ACCOUNT search uses the USERMODAL and EXTERNALUSERINFO tables as its FACT tables, but the concept is similar.
Challenges. The worst case is a search involving many qualifiers with multiple matches, e.g., COUNTRY = 'UNITED STATES' in most databases. This will invoke many database reads. In practice, these types of searches are rare, and even these complete reasonably quickly. One of the best features of this method is that the search time is bounded.
Alternative approach. Note that the above method includes multiple steps and may appear overly elaborate. This reflects the trade-off of breaking the query into steps rather than answering it directly with a single combined query. Several attempts were made at the direct method, and performance was often poor, especially for "cold" searches. The best "direct" case is the following: first, the FACT table is searched directly and each searchable column is indexed; second, the initial search producing the minimum number of rows is chosen and used as the "inbound" or "anchor" search; third, the DBMS is directed to start the query with that search by using optimizer hints; and fourth, "outbound" joins from the FACT table to the MODEDEF tables answer the other search criteria.
The search discussed above would be expressed directly as follows:
SELECT /*+ USE_INDEX(CITY_INDEX) */
<display columns>
FROM GA_RA_SESSION_FACTS F,
CITY_MODEDEF MC,
STATE_MODEDEF MS,
PROVIDER_MODEDEF MP,
BROWSER_MODEDEF MB,
<other tables with display information>
WHERE F.CITY_MODEDEF_ID = MC.MODEDEF_ID AND
UPPER(MC.CITY_NAME) = 'PARIS'
AND F.STATE_MODEDEF_ID = MS.MODEDEF_ID AND
UPPER(MS.STATE_NAME) = 'TEXAS'
AND F.PROVIDER_MODEDEF_ID = MP.MODEDEF_ID
AND UPPER(MP.PROVIDER_NAME) = 'COMCAST'
AND F.BROWSER_MODEDEF_ID = MB.MODEDEF_ID
AND UPPER(MB.BROWSER_NAME) = 'OPERA'
AND <other qualifications for the display columns to be extracted>
ORDER BY F.RISK_SESSION
LIMIT 500;
In this example, two HISTORY tables have columns in the FACT table that are joined in order to extract the display information, so there is significant additional logic in the query.
Challenges: by directly involving the FACT table in the initial search, a large number of database pages containing non-qualifying rows are accessed. Even the most selective initial search does not efficiently limit the search space. Furthermore, many searches involve two low-selectivity criteria, which limits the effectiveness of the initial search. The "outbound" qualification in the query involves traversing a large number of B-trees via ID joins on the MODEDEF tables. Even though the MODEDEF B-trees would all remain in the DBMS buffer pool, a search on a large database would involve tens or hundreds of thousands of full B-tree traversals in a single query for this step alone. Because most database engines do not support the use of multiple indexes on the same table, it is not possible in some cases to "vector in" from multiple "sides" of a table using the approximate "star transform" approach described above.
Conclusion. The "alternative method" is effective on small databases, but underperforms on databases with more than 10 million sessions and becomes unusable above roughly 30 million sessions. The approach discussed in the example works well, with most searches completing in 15 seconds even on a database with 360 million sessions. Because the method externalizes the "star transform" in a manner that works on multiple DBMS engines, including those that do not natively support one, it performs well across engines. Note also that the new architecture is more flexible than a star transform in that it allows the use of OR and NOT logic in the search, whereas most star transforms require AND logic.
Database-specific issues. The new architecture depends on three non-standard features, but these are supported by most of the major databases likely to be encountered:
Index-organized tables (Oracle). MySQL: InnoDB storage engine tables are always "index-organized".
DB2: "index only" tables.
SQL Server: "clustered indexes" (note that these are different from Oracle cluster indexes and more like InnoDB storage).
POSTGRES: EnterpriseDB's "Postgres Plus Advanced Server" is required for index-organized table support; the open source version does not appear to support them.
ORDER BY <something> LIMIT <N> syntax (MySQL).
Oracle: SELECT ... FROM (SELECT ... ORDER BY <...>) WHERE ROWNUM <= <N>;
DB2: SELECT ... ORDER BY <...> FETCH FIRST <N> ROWS ONLY;
SQL Server (2005 and later):
SELECT ..., ROW_NUMBER() OVER (ORDER BY <...>) AS ROWCT
WHERE (<where clause>) AND ROWCT <= <N>;
POSTGRES: SELECT ... LIMIT <N>;
SQL INTERSECT, UNION ALL, MINUS.
INTERSECT, UNION ALL, and MINUS are all "standard" SQL but are not supported by all databases, notably MySQL. UNION ALL is the only one that is not easily replaced with a join; fortunately, it is supported by MySQL.
INTERSECT may be replaced as follows:
SELECT A1 INTERSECT SELECT A2 INTERSECT SELECT A3 ... INTERSECT
SELECT AN
with
SELECT <cols>
FROM A1, A2, A3, ..., AN
WHERE A1.cols = A2.cols AND A2.cols = A3.cols AND ... AND A<N-1>.cols = AN.cols
MINUS may be replaced as follows:
SELECT A1 MINUS SELECT A2
with
SELECT A1 WHERE A1.<cols> NOT IN (SELECT A2).
FraudMAP system algorithm
Summary of algorithms employed for trial account ACH: the following special variables are assumed to be available and are used in the various algorithms:
ORIGINATOR: the combination (IMMEDIATEORIGINID, COMPANYID, COMPANYNAME)
ORIGINATOR_QUALIFIED: the combination (IMMEDIATEORIGINID, COMPANYID, COMPANYNAME, COMPANYENTRYDESCRIPTION)
RECIPIENT_RN_ACCT: the combination (RECIPIENTROUTINGNUMBER, RECIPIENTACCOUNTNUMBER)
SUBMISSION_DATE: the time in milliseconds since the epoch of midnight of the submission date, where midnight is taken in the customer's time zone
SUBMISSION_TIME_OF_DAY: the time of day at which the batch was submitted by the customer (in milliseconds since midnight)
SUBMISSION_DAY_OF_WEEK: the day of the week on which the batch was submitted by the customer, in the customer's time zone (1-7)
SUBMISSION_WEEK_OF_MONTH: the week of the month in which the batch was submitted by the customer, in the customer's time zone (1-5)
EFFECTIVEDIFFSUBMISSIONDATE: the date difference between the submission date and the effective date
TOTALCREDITS: the total number of CREDIT transactions in the batch
TOTALDEBITS: the total number of DEBIT transactions in the batch
AVERAGE_CREDIT_AMOUNT: the average amount across all CREDIT transactions in the batch
AVERAGE_DEBIT_AMOUNT: the average amount across all DEBIT transactions in the batch
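For illustration, a minimal Python sketch of how these batch-level variables might be computed is shown below; the Txn record, the function name, and the 1-7/1-5 numbering conventions are assumptions, not taken from the original.

from dataclasses import dataclass
from datetime import datetime
from zoneinfo import ZoneInfo

@dataclass
class Txn:
    amount: float
    is_credit: bool

def batch_features(submitted_at: datetime, txns: list, customer_tz: str) -> dict:
    # Convert the submission instant into the customer's time zone, per the definitions above.
    local = submitted_at.astimezone(ZoneInfo(customer_tz))
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    credits = [t.amount for t in txns if t.is_credit]
    debits = [t.amount for t in txns if not t.is_credit]
    return {
        "SUBMISSION_TIME_OF_DAY": int((local - midnight).total_seconds() * 1000),
        "SUBMISSION_DAY_OF_WEEK": local.isoweekday(),          # 1-7; numbering convention assumed
        "SUBMISSION_WEEK_OF_MONTH": (local.day - 1) // 7 + 1,  # 1-5; convention assumed
        "TOTALCREDITS": len(credits),
        "TOTALDEBITS": len(debits),
        "AVERAGE_CREDIT_AMOUNT": sum(credits) / len(credits) if credits else 0.0,
        "AVERAGE_DEBIT_AMOUNT": sum(debits) / len(debits) if debits else 0.0,
    }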
Risk assessment of transaction layer characteristics. For all transactions:
^,#: (1) the likelihood of a new C for B, if B has previously been used with a different C, given the same A.
^: (2) if C is a known mule at the time of processing, an alert is generated.
The normalized aggregation of evaluations 1-2 is called Transaction_Risk_Default.
For all E1 and E2 (not E3) CREDIT/DEBIT transactions:
1,3,*,#: (3) the likelihood of the time difference between the current transaction and the previous transaction to B, given the same A.
The normalized aggregation of evaluation 3 is called Transaction_Risk_1.
Further, for all E1 (not E2 or E3) transactions:
1,2,*,#: (4) the likelihood of the current transaction amount, given past E1 transactions in the same direction for the same (A and D).
This evaluates the likelihood of A making a transaction of an anomalous amount.
1,2,#: (5) the likelihood of the current transaction amount to C, given past E1 transactions in the same direction for the same (C and A and D).
This evaluates the likelihood of A making a transaction of an anomalous amount to the given C.
The normalized aggregation of evaluations 4-5 is called Transaction_Risk_2.
Risk assessment of batch layer characteristics. For all batches:
1,4,*,#: (6) the likelihood of the batch being submitted at the current F1, given past batch submissions for the same (A and D).
1,3,*,#: (7) the likelihood of the given G, given past values for the same A.
1,3,*,#: (8) the likelihood of the time difference between the current batch and the previous batch for the same (H and D).
1,*,#: (9) the likelihood of the batch's D, given past values for the same A.
The normalized aggregation of evaluations 6-9 is called Batch_Risk_1.
For all batches containing at least one E1 or E2 transaction:
1,*,#: (10) the likelihood of the batch's I (J), given past batches for the same (A and F2 and F3).
For all batches containing at least one E1 transaction:
1,2,*,#: (11) the likelihood of the batch's K (L), given past batches for the same (A and D).
The normalized aggregation of evaluations 10-11 is called Batch_Risk_2.
The maximum of the normalized combination of [Transaction_Risk_1, Transaction_Risk_2, Transaction_Risk_3] is used to show activity risk: at most the top 1.5% of all such scored activities in a day are classified as RED; the next 1.5% of all such scored activities in a day are classified as YELLOW; and the next 3% of all such scored activities in a day are classified as LIGHTGREEN.
The normalized combination of [Transaction_Risk_1, Transaction_Risk_2, Transaction_Risk_3, Batch_Risk_1, Batch_Risk_2] is used to drive alerts: at most the top 0.75% of all such scored eligible batches in a day are classified as RED; the next 0.75% are classified as YELLOW; and the next 1.5% are classified as LIGHTGREEN.
DEBIT-only batches are excluded from the eligible batches.
The following considerations apply to the risk assessment descriptions above:
1: until sufficient history for a particular customer is available, an a priori model configured with population-level properties is used for evaluation. As more history accumulates, the evaluations from the priors and from the user's own history are blended together.
2: smaller amounts are considered less risky.
3: very small time differences are considered a greater risk. Time differences consistent with a daily/weekly/biweekly schedule are considered less risky.
4: submission times outside of business hours are considered higher risk.
5: until sufficient history for a particular (C and A and D) is available, an A-specific model is used for evaluation. Once enough history is available, only the (C and A and D)-specific model is used.
*: the model is configured to generate an appropriate risk reason for at most 2.5% of eligible transactions/batches.
^: the model is configured to generate appropriate risk reasons for all such qualifying transactions/batches.
#: these values are calculated but set to zero for DEBIT transactions and DEBIT-only batches. The risk reasons associated with these transactions/batches are still generated from the actual calculated values.
E1, E2, E3 denote transaction types.
F1, F2, F3 denote timing parameters.
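As an illustration of the percentile-based coloring just described, the following minimal Python sketch classifies one day's scored items; tie-breaking and rounding behavior are assumptions. The activity thresholds (1.5%/1.5%/3%) are shown; the batch alert thresholds would use 0.75%/0.75%/1.5% instead.

def classify_by_daily_percentile(scores: list) -> list:
    # Rank one day's scored items; the riskiest 1.5% are RED, the next 1.5%
    # YELLOW, and the next 3% LIGHTGREEN, per the thresholds described above.
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = ["NONE"] * n
    red_n = int(n * 0.015)
    yellow_n = int(n * 0.015)
    green_n = int(n * 0.03)
    for rank, i in enumerate(order):
        if rank < red_n:
            labels[i] = "RED"
        elif rank < red_n + yellow_n:
            labels[i] = "YELLOW"
        elif rank < red_n + yellow_n + green_n:
            labels[i] = "LIGHTGREEN"
    return labels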
Mule account matching for FraudMAP system
Mule account matching. The fields considered include: first, the routing number (denoted RN), using third_party_current.ga_3pty_access_list.routing_identifier as its third party database field and CONCAT(ACH6_rdfi_id, ACH6_check_digit) as its ACH data field; second, the account number (denoted ACCT), using third_party_current.ga_3pty_acc_list.account_id as its third party database field and ACH6_df1_account_number as its ACH data field; and third, the account holder name (denoted NAME), using third_party_current.ga_3pty_acc_list.user_name as its third party database field and ACH6_individual_name as its ACH data field.
Given the above fields, a recipient can be uniquely identified only by the combination (RN, ACCT). A field named (RN, ACCT) may be included to handle ACH transactions. Regarding the RN field: the value should be 9 digits long. In the third-party DB, this field is essentially always available and is always a 9-digit number. In the ACH data, this field appears to be always available and is always a 9-digit number. Regarding the ACCT field: there is no explicit standard. In the third-party DB, this field appears to be always available; however, it is not clear whether the value alone is sufficient to identify a distinct account, and there is significant variation in the number of digits seen in this field. In the ACH data, the ACH standard allows an "alphanumeric" value in this field, and there are very occasional account numbers (i.e., 334 out of 280 unique recipients in 3 months of data) that are not made up entirely of digits but instead include special characters such as spaces and hyphens. There is no guarantee that such special characters will be reported consistently in the mule set. There is no explicit standard for the NAME field. In the third-party DB, this field is not always filled. When available, some common patterns include:
'FirstName LastName': Marzia Hasan
'FirstName MiddleName LastName': mohammadatrazafirdi
'FirstName MiddleInitial LastName': Christina G. Balew
It is not always the name of a person: D.S. Young & Associates
There is no standard for capitalization, spelling, or abbreviation (Steve instead of Steven).
In the ACH data, the NAME field is not always filled. When available, some common patterns include:
'FirstName LastName': Wally Eberhardt
'LastName, FirstName': Eberhardt, Wallace
'FirstInitial LastName': W. EBERHARDT
'FirstName MiddleName LastName': Wallace Maurice Eberhardt
'FirstName MiddleInitial LastName': Wallace M. Eberhardt
Risk assessment of transactions to mules. Hypothesis: if it can be confirmed that a transaction is being requested to a known mule, the transaction can be alerted on. There is no need to model the behavior of transactions to mules; these are always risky. To confirm a transfer to a mule, a match is attempted on a combination of (RN, ACCT, NAME), with the following caveats: NAME is an optional field, and an exact matching algorithm cannot be defined for non-standard NAME values; and ACCT values do not comply with any standard, so the same ACCT may be reported as different string values.
Options to consider. In some contexts, the following fields are available:
From the third party: RN, ACCT, and NAME
From the ACH data: RN, ACCT, and NAME
Further, the following string methods may be available:
TRIM(X): returns a copy of string X with all leading and trailing spaces removed.
UPPER(X): returns a copy of string X in which all characters [a-z] are replaced with their uppercase equivalents.
REPLACE(X, regexp, b): replaces all matches of the regular expression "regexp" in X with the string "b".
TOKENIZE(X): returns a list of all "tokens" contained in string X, where a token is a sequence of non-space characters delimited by whitespace.
INITIALIZE(X): returns the first character of string X.
ED_n(X, Y): returns TRUE if strings X and Y are within an edit distance n of each other.
xP_BG(X, Y): returns TRUE if at least x percent of the bigrams in the shorter of (X, Y) are contained in the other.
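The following minimal Python sketches show plausible implementations of TOKENIZE, ED_n, and xP_BG; the exact semantics (e.g., set-based bigrams, standard Levenshtein distance) are assumptions, since the descriptions above do not pin them down. TRIM, UPPER, and REPLACE map directly onto Python's strip(), upper(), and re.sub().

def tokenize(x: str) -> list:
    # Non-space character sequences delimited by whitespace.
    return x.split()

def ed_n(x: str, y: str, n: int) -> bool:
    # True iff the edit (Levenshtein) distance between x and y is at most n.
    if abs(len(x) - len(y)) > n:
        return False
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cy)))
        prev = cur
    return prev[-1] <= n

def bigrams(s: str) -> set:
    return {s[i:i + 2] for i in range(len(s) - 1)}

def xp_bg(x: str, y: str, pct: float) -> bool:
    # True iff at least pct of the bigrams in the shorter string appear in the other.
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    bg = bigrams(shorter)
    return bool(bg) and len(bg & bigrams(longer)) / len(bg) >= pct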
Given the above information, the following matching methods may be considered (the first and third are recommended for implementation):
First, the RNaACCT method: exact string match on (RN) and approximate match on (ACCT):
X.RN = TRIM(TPD.RN)
X.ACCT = REPLACE(TRIM(UPPER(TPD.ACCT)), [^0-9A-Z], "")
Y.RN = TRIM(ACH.RN)
Y.ACCT = REPLACE(TRIM(UPPER(ACH.ACCT)), [^0-9A-Z], "")
Return 'RN_aACCT' iff (X.RN == Y.RN && X.ACCT == Y.ACCT)
Second, the RNaNAME_ED1 method: exact string match on (RN) and approximate match on (NAME) using edit distance on the component tokens:
X.RN = TRIM(TPD.RN)
X.NAME_TOKENS = TOKENIZE(REPLACE(UPPER(TPD.NAMES), [^0-9A-Z], ""))
Y.RN = TRIM(ACH.RN)
Y.NAME_TOKENS = TOKENIZE(REPLACE(UPPER(ACH.NAMES), [^0-9A-Z], ""))
LONGER_LIST = whichever of (X.NAME_TOKENS, Y.NAME_TOKENS) is longer
SHORTER_LIST = the token list that is not LONGER_LIST
Return 'RNaNAME_ED1' iff:
X.RN == Y.RN
&& for each token pair (a, b) between the SHORTER_LIST and the LONGER_LIST where a and b are longer than 2 characters and ED_1(a, b) is true: (a, b) are considered to match
&& for each unmatched token pair (c, d) between the SHORTER_LIST and the LONGER_LIST where the length of at least one of (c, d) is 2 characters or less:
(c, d) are considered to match each other
&& no unmatched tokens remain in the SHORTER_LIST
Third, the RNaNAME75PBG method: exact string match on (RN) and approximate match on (NAME) using a bigram match on at least 75% of the constituent tokens:
X.RN = TRIM(TPD.RN)
X.NAME_TOKENS = TOKENIZE(REPLACE(UPPER(TPD.NAMES), [^0-9A-Z], ""))
Y.RN = TRIM(ACH.RN)
Y.NAME_TOKENS = TOKENIZE(REPLACE(UPPER(ACH.NAMES), [^0-9A-Z], ""))
LONGER_LIST = whichever of (X.NAME_TOKENS, Y.NAME_TOKENS) is longer
SHORTER_LIST = the token list that is not LONGER_LIST
Return 'RNaNAME75PBG' iff:
X.RN == Y.RN
&& for each token pair (a, b) between the SHORTER_LIST and the LONGER_LIST where a and b are longer than 2 characters and 75P_BG(a, b) is true: (a, b) are considered to match
&& for each unmatched token pair (c, d) between the SHORTER_LIST and the LONGER_LIST where the length of at least one of (c, d) is 2 characters or less:
(c, d) are considered to match each other
&& no unmatched tokens remain in the SHORTER_LIST
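A minimal Python sketch of the first (RNaACCT) method follows; the function names are hypothetical and only the normalization described above is implemented.

import re

def _norm_acct(s: str) -> str:
    # REPLACE(TRIM(UPPER(s)), [^0-9A-Z], "")
    return re.sub(r"[^0-9A-Z]", "", s.strip().upper())

def match_rn_a_acct(tpd_rn: str, tpd_acct: str, ach_rn: str, ach_acct: str):
    # Exact match on RN, approximate (normalized) match on ACCT.
    if tpd_rn.strip() == ach_rn.strip() and _norm_acct(tpd_acct) == _norm_acct(ach_acct):
        return "RN_aACCT"
    return None

print(match_rn_a_acct("123456789", "12-345 678", "123456789", "12345678"))  # RN_aACCT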
A fixed cost is associated with the value returned by the mule matching method:
If the match value RNaACCT is returned: associate the cost HIGH_LEVEL_MULE_COST with the transaction; else
If the match value RN_aNAME_75P_BG is returned: associate the cost MID_LEVEL_MULE_COST with the transaction; else
Associate a cost of 0 with the transaction.
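The cost assignment above can be sketched as follows; the constant values are placeholders, since the actual costs are configuration values not given here.

HIGH_LEVEL_MULE_COST = 1.0  # placeholder; the real constant is a configuration value
MID_LEVEL_MULE_COST = 0.5   # placeholder

def mule_cost(match_value):
    if match_value == "RNaACCT":
        return HIGH_LEVEL_MULE_COST
    if match_value == "RN_aNAME_75P_BG":
        return MID_LEVEL_MULE_COST
    return 0.0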
More complex matching logic can then be used to extend the above.
Hybrid behavior pattern analysis for FraudMAP systems
Hybrid behavior pattern analysis. When modeling and analyzing a person's behavior patterns to see whether newly observed behavior is consistent with behavior observed or understood in the past, the person's behavior can be interpreted and predicted more effectively if the analysis also considers behavior patterns seen in others who have some commonality with the person. For example, if a company has offices in two different cities (city A and city B), it is quite plausible that an employee who works in city A will at some point appear in city B. In the context of funds transfer, if, for example, Jack and Mike both work for the same company (ANN) and it is known that Jack sends money to Matt Smith (Acc #12345) on behalf of ANN, then if Mike sends money to Matt Smith it should not be unexpected, even though Mike may never have sent money to Matt Smith before. In other words, Mike's transfer pattern may be predicted and interpreted based on Jack's pattern.
In the above example, if an individual's geo-location behavior patterns or funds-transfer history were analyzed based solely on that individual's history, the first time Mike sent money to Matt Smith, or the first appearance in the city B office, would appear to deviate significantly from expected or understood behavior; whereas if these events were analyzed in conjunction with the behavior of others (colleagues), they would appear much more predictable, which correspondingly reduces the number of false alerts.
In hybrid behavior pattern analysis, an individual's behavior patterns are modeled and understood from two perspectives: the Individual Prediction Model (IPM), a model in which the behavior pattern is based only on the individual's historical data; and the Group Prediction Model (GPM), a model in which the behavior pattern is based on the group's historical data (including aggregated data of the past behavior of the individual and of the other group members).
Any newly observed behavior pattern is analyzed by both the IPM and the GPM. Four possible scenarios can be encountered. First, the new behavior pattern is confirmed by both the IPM and the GPM; in this case, the observed behavior is consistent with the user's past behavior pattern and there is no anomaly. Second, the new behavior pattern is confirmed by neither the IPM nor the GPM; in this case, the observed behavior cannot be explained by either model and is considered a significant deviation and unexpected behavior. Third, the new behavior pattern is confirmed by the GPM but not by the IPM; in this case, the observed behavior is not consistent with the individual's past history but is consistent with the group's (the individual's colleagues') history. Returning to the city example, this is the case where a person who has always been seen in city A, and whose colleagues are seen in both city A and city B, now appears in city B. Clearly, the newly observed behavior is not as unexpected as in the second scenario. Depending on the attribute, the risk associated with the deviating behavior should be reduced, based on the fact that the new behavior can be clearly explained by the GPM. For example, in the funds transfer example, Matt Smith should likely be considered a safe recipient for Mike because Mike's colleague Jack has sent money to Matt multiple times. Fourth, the new behavior pattern is confirmed by the IPM but not by the GPM; this situation cannot occur, because all the data used to develop the IPM also exists as part of the GPM's development data.
This hybrid approach to analyzing behavior patterns does not necessarily apply to all aspects of behavior. Some behavior pattern attributes (e.g., login failures or password changes) relate more to an individual's lifestyle, habits, and traits than to any group. For these attributes, considering the behavior patterns of the account group does not necessarily improve the predictability of the individual's patterns.
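The four scenarios above reduce to a simple decision function; the following is a minimal sketch, with the return labels being illustrative rather than taken from the original.

def hybrid_assessment(ipm_confirms: bool, gpm_confirms: bool) -> str:
    if ipm_confirms and gpm_confirms:
        return "consistent"              # scenario 1: no anomaly
    if not ipm_confirms and not gpm_confirms:
        return "significant_deviation"   # scenario 2: unexplained by either model
    if gpm_confirms:
        return "group_explained"         # scenario 3: risk reduced; peers exhibit the pattern
    return "unreachable"                 # scenario 4: cannot occur; IPM data is a subset of GPM data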
FraudMAP system monitoring method
A method monitors changes in the importance of nodes in an undirected graph, such as a social network or a database of user behaviors in which the behaviors can be divided into a finite set. Changes in a user's importance are associated with behavioral changes or external stimuli. A score based on the user's importance is updated at given time intervals. A user's score is based on the importance of the users to which it is connected and on how similar the user is to other users. The method is particularly useful for improving fraud detection, for example in online banking, where user behavior is diverse and changes over time and fraudster behavior evolves over time.
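One plausible reading of this score update is a damped, similarity-weighted propagation over the undirected graph, sketched below in Python; the damping factor, normalization, and iteration count are assumptions, not specified by the description.

def update_importance(adj: dict, similarity, damping: float = 0.85, iters: int = 20) -> dict:
    # adj maps each user to the set of users it is connected to (undirected graph).
    # similarity(u, v) weights how much of v's importance flows to u.
    nodes = list(adj)
    score = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        new = {}
        for u in nodes:
            flow = 0.0
            for v in adj[u]:
                total = sum(similarity(v, x) for x in adj[v])
                if total > 0:
                    flow += score[v] * similarity(v, u) / total
            new[u] = (1.0 - damping) / len(nodes) + damping * flow
        score = new
    return score

# Example: three users; similarity defaults to 1.0 (plain degree-normalized propagation).
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(update_importance(graph, lambda u, v: 1.0))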
FraudMAP System activation
WarpSpeed II activation. WarpSpeed II activation involves a series of actions:
Set up the environment.
Generate customer information.
Start the data download.
Run the initial data analysis script. For trial accounts: run the data analysis script and follow the instructions. The script can be cancelled and re-run at any time during this step.
cd ~/warpspeedtop/warp_speed/model_create
perl getDataForValidation.pl <Host> <HostService> <BankPlatform>
Select whether to use files from "scratch" or "production". Typically, the script is used to indicate the directory containing the files that may be placed in production. If scratch is used, choose synchronous or asynchronous (if unsure, choose synchronous). If it is not certain which directory to use (i.e., if the script lists 0 files in production, or if the script lists significantly more files in scratch than in production), contact Algorithms/PS before proceeding.
Select which set to load. The script compares the headers to determine which files have the same file format; files with the same format will be in the same set. Unless otherwise indicated, select the last listed set. If the last listed set trails its predecessor by at least 10 days of data, contact Algorithms/PS before proceeding. The script will then load the database, which takes some time.
For trial accounts only, the user_id field is always user_name; note "user_id = user_name" and skip to step f. Otherwise, the script prompts for the identification of the database user id column; unless otherwise indicated, use the script's suggested value. Note the user_id.
For all trial accounts only: ignore all text following "update SVN …" and look for the following messages in the output, scrolling up as appropriate; these messages are generated before the updates to the version control system (Subversion) occur. The messages indicate whether to stop for algorithm feedback or whether it is safe to continue, as follows:
ACTION: write down FILE_GAP_STOP_IN_MINUTES. Record the number for reference.
ACTION: write down MAX_GAP_IN_MINUTES. Record the number for reference.
ACTION: calculation problem. It is not safe to continue the WarpSpeed process. Contact Algorithms/PS for further instructions; provide <Host>, <HostService> and <Platform>.
ACTION: warnings found; continuing is not safe. It is not safe to continue the WarpSpeed process. Contact Algorithms/PS for further instructions.
ACTION: no warnings found; continuing is safe. It is safe to continue the WarpSpeed process.
The script will generate analysis files in the script directory and load all data into the database on stageddb01c:
raw_<host>_<hostservice>_db.db_query_results.log: important data analysis for [REDACTED]
raw_<host>_<hostservice>_db.db_warning_etc.log: database error log for [REDACTED]
raw_<host>_<hostservice>_db.db_analyze_info.log: analysis info log for [REDACTED]
raw_<host>_<hostservice>_db.db_analyze_warn.log: analysis warning log for [REDACTED]. This file lists the items that Algorithms can review before proceeding.
Verify that all final output files from (g) above have been committed to svn under [http://svn.guardian.lan/svn/ga/bridges/stable/customer-data/_customerspecificnotes/<HOST>] (the version control system). This should happen automatically. If it does not, do the following (for trial account customers only):
If the following directory does not exist, create it:
mkdir ~/warpspeedtop/cd-bridge-stable/customer-data/_customerspecificnotes/<HOST>
Copy all files from the output folder into the svn customer folder:
cp ~/warpspeedtop/warp_speed/model_create/DATAQUALITY_OUTPUT/<HOST>-<HOSTSERVICE>/* ~/warpspeedtop/cd-bridge-stable/customer-data/_customerspecificnotes/<HOST>
cd ~/warpspeedtop/cd-bridge-stable/customer-data/_customerspecificnotes/<HOST>
svn add *
svn commit -m "BugzID: YOUR_DEPLOYMENT_CASE_NUMBER"
Review the data analysis results and note critical items.
Analyze and check the trial account data; analyze and check the visual data.
Initialize the model for svn.
Take care in this step; if an incorrect value is entered, contact Algorithms/PS before proceeding. Run the following script using the information from all the steps above and follow the prompts. Files of tenant data, named <Host>_<HostService>, are placed in the warpspeed/model_create directory.
cd ~/warpspeedtop/warp_speed/model_create
perl addDetailsCustomer.pl <Host> <HostService> <BugzID>
For trial accounts - retail:
BotUserList: null unless otherwise noted.
Timezone: <TimeZoneForRiskEngine> in Region/City format (for example, "America/Los_Angeles"). See the list.
FILE_GAP_STOP_IN_MINUTES: written down during the initial data analysis step.
MAX_GAP_IN_MINUTES: written down during the initial data analysis step.
USER_ID: written down during the initial data analysis step. This field is typically "member" if the customer is a CU (credit union) and "customer" if the customer is a bank.
Specify whether mobile should be enabled for the customer.
For trial accounts - business:
BotUserList: null unless otherwise noted.
Timezone: <TimeZoneForRiskEngine> in Region/City format (for example, "America/Los_Angeles"). See the list.
FILE_GAP_STOP_IN_MINUTES: written down during the initial data analysis step.
MAX_GAP_IN_MINUTES: written down during the initial data analysis step.
USER_ID: written down during the initial data analysis step. This field is typically "member" for CUs (credit unions) and "customer" for banks.
ACTOR_ID: written down during the initial data analysis step. Important: if "user1" was written down, "user" must be entered instead; "user1" is the db column name and "user" is the raw data name.
For trial accounts - business:
BotUserList: null unless otherwise noted.
A BotUserList is added when the FogBugz case explicitly indicates one, preferably beginning with "BOTUSERLIST" and then listing the users below it.
Synchronize the customer with the dummy model.
For trial accounts:
Synchronize with the dummy model so that the customer model is up to date with the latest dummy build (links are also available at the top of the wiki).
cd ~/warpspeedtop/warp_speed/model_create
perl syncCustomerWithDummy.pl <MYCLIENT>.tenant.data <DUMMYBUILD>
Check the changes made to the model by the script to ensure that they are as expected:
cd ~/warpspeedtop/cd-branch-stable/customer-data/<Host>/<HostService>
Commit the change, making sure the bug number is entered into the comment using "BugzID: XXXXX":
svn commit -m "BugzID: XXXXX"
Run the harness script. At a high level, the script loads data into a first-round database using the RE, runs some R analysis and updates the model, and then reloads the data into a second-round database. The second-round database is loaded only if the analysis of the first-round database succeeds.
Build and deploy the updated model. At this point, the RE has run through the input data and the model has been adjusted. Next, the RE and RA are started from the assigned env via the Puppet context.
Update the model for the RA's tenant. On dc1stagere01.dc1.fm-hosted.com:
In the /svn/puppet/env[YOUR_NUMBER_HERE] directory, edit versions.pp:
Enter the latest build number obtained for the tenant (<BUILDNUMBER>) in the "tenantversion" section, e.g., $[REDACTED]_version = "4.2-snap-r25733".
Update the "component version" section with the latest build.
The RA will use the second-round database created by the harness script. This second-round database should be used for all tenants in the deployment block. Thus, if a new block is being started, also update the DB_NAME entry in the "pobdefinitions" section to match the previously created WS_EXE_DB_NAME_2 parameter. If this is not the first deployment in the current block, leave the "pobdefinitions" section unchanged.
In the /svn/puppet/env[YOUR_NUMBER_HERE] directory, edit tenants.pp. See the SVN examples in the other puppet/envXX/tenants.pp files. Note that env07 is the only env in which the harvester is enabled.
Commit the change to SVN and wait for Puppet to apply the change. Check the status of the Puppet change by logging into dc1stagere01.dc1.fm-hosted.com and typing "ploil", which tails the Puppet log file; the process is usually very fast.
Add the RF and RA hosts and instance # to FogBugz.
Turn on the RF and RA:
sudo /opt/ga/fraudmap4/env[YOUR_NUMBER_HERE]/run/riskfeed_control.sh on
Note: if this is the first tenant loaded, running refresh_ga_count is recommended so the RiskApp loads correctly.
sudo /opt/ga/fraudmap4/env[YOUR_NUMBER_HERE]/run/RiskApp/RiskApp_control.sh on
In the RiskApp, verify that the new thresholds have been validated and entered correctly.
Make a stamped build and send the bug to QA.
On svn.guardian.lan:
sudo su - buildbot
build-model branches/stable -Pstamp -Pproduction -builder Host/HostService
Note down the build number for the next step.
Send the following information to QA: the stamped build; mobile: yes/no; preview URL and login/password; DB information for the RA; the path to the harness directory; the RE time zone; the RA time zone; known fraud. For trial accounts with hosted-only tenants: the staging harvester is not activated. For trial accounts with on-premises (onprem) tenants only, data feed instructions (to be completed before turning on the RE): copy the contents of /mounts/customer-data/HOST-HOSTSERVICE/sftp_archive/ to /mounts/customer-data-prod/HOST-HOSTSERVICE/home/chroma/incoming/; create a gpg harvester for the HOST-HOSTSERVICE tenant and set it up using the cron job schedule listed below: harvester_cronjob_hour => "03-19", harvester_cronjob_minute => ["10", "25", "40", "55"]; then turn on the GPG harvester and let it complete before turning on the RiskEngine.
Schedule the production build with network operations.
RiskEngine model generation and RiskApp metadata generation
Metadata items. The metadata items are the design of RiskEngine model generation and RiskApp metadata generation. Because the RiskApp's metadata is based on the RiskEngine model, processing entails first defining a login model and an activity model; these model definitions can then be used by the RiskApp generator to generate RiskApp metadata. (Depending on how the RiskApp metadata is defined, this may be a simple loader that takes the RiskApp XML definitions and parameter files together with the login model/activity model definition/palette XML files and a parameter XML file, processes them all separately and independently, and loads them into the RE and RA.)
The components are as follows: first, a login model directory and palette, comprising a set of XML files that include the login model definitions from which a login model template may be assembled; second, a login model template definition, an XML file defining the login model template for a host/host service; third, an activity model structure, an XML file describing the activity model structure; fourth, a login parameter/activity parameter generator (optional), which may generate login parameters or activity parameters that are not initially in XML format; fifth, the model/palette stamper, which assigns versions and unique keys to the key elements of the components; sixth, the model loader, which puts the RE model template into the RE and the RA metadata into the RA; and seventh, a RiskApp metadata generator (optional), which generates RiskApp XML metadata if the original source is not in XML.
FIG. 33 is a block diagram of model generation and metadata generation according to an embodiment.
Login model directory and palette. The login model directory includes the components from which login models may be assembled. At the lowest level, the login model directory has a group template palette, in which the unique group templates are defined, and an evaluator template palette, in which the unique evaluators are defined. A set of group templates then forms a group template structure, and a set of evaluators forms an evaluator structure. Together, the group template structure and the evaluator structure define a login model template structure. All palettes and directories may be defined in XML.
The group template palette defines the group templates that are available for use. For example, multiple country group templates may exist, each using different statistics and prior classes. Group templates can be inserted but cannot be updated or deleted. The palette should have the following fields:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
GRP_TMPLT_PALETTE_KEY, a global key that uniquely identifies the group.
GRP_TMPLT_TYPE, which identifies a group type, such as country, and identifies variants. This may be mapped to a name in the group template table.
GRP_CLASSNAME
GRPSTATSCLASSNAME
MODE_CLASSNAME
MODEDEF_CLASSNAME
MODESTATS_CLASSNAME
PRIORS_CLASSNAME
MODEFORGETTING_CLASSNAME
MODEDEF_TABLENAME, used by the RiskApp; the RiskApp can determine which column(s) of such a table to use.
LMTS group structure. Together, LMTS_TYPE and LMTS_GRP_VARIATION select the necessary structure from the two entities described below.
First entity. The first entity is the LMTS group association palette: the group association defines all the groups to be used for an LMTS_TYPE. The fields for this entity are:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
LMTS_TYPE
LMTS_GRP_VARIATION
GRP_TEMPLATE_PALETTE_KEY
GRP_COORD, the order in which the groups should be processed.
MODEDEF_INDEX, the modedef number in ga_ra_registers_history.
All group variants within an LMTS_TYPE have the same group types. Each LMTS_TYPE may have one LMTS_GRP_VARIATION that is marked as the DEFAULT variant.
When an LMTS_TYPE is specified without a variant, it is interpreted as the default variant. The default version may be overridden by providing only the differences between the new variant and the default. For example, if a country variant is required, the LMTS_GRP_VARIATION for that model may specify a different country group instead of the default; all other groups are still taken from the default version.
Second entity. The second entity is the LMTS group relationship palette, which defines the group relationships between parents and children. The fields are:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
Evaluator template palette. The evaluator template palette defines the evaluators that can be used. The palette may have the following fields:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
EVAL_TMPLT_PALETTE_KEY, a global key that uniquely identifies the evaluator.
LMTS evaluator structure. The LMTS evaluator structure is the evaluator definition specified for the groups included in the risk calculation. The LMTS evaluator structure includes the following:
LMTS evaluator association palette, an evaluator association that defines all evaluators to be used for an LMTS_TYPE. The fields include:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
LMTS_TYPE
LMTS_EVAL_VARIATION
EVAL_TMPLT_PALETTE_KEY
EVAL_NAME; the name for EVAL_COORD = 0 should be DEFAULT.
EVAL_COORD: 0, 1 or 2.
LMTS evaluator relationship palette, which defines the group relationships between parents and children. The fields include:
PALETTE_REGISTRATION_KEY, a key assigned by the palette stamper that uniquely protects the palette from changes to the XML.
The current ga_grouptmplt_usage_rel is used for 3 purposes: parent-child relationships between groups (PAR_GRP), specified in the group relationship entity; an evaluator (EVAL_SRC_GRP); and an evaluator mode (EVAL_LEAF_GRP), not currently used.
Login model directory, which includes all known LMTS_TYPEs together with their known group variants and evaluator variants, from which selections can be made, along with descriptions.
<LMTS_CATALOG>
<LMTS_TYPE>
<NAME>DI_BEACON</NAME>
<DESCRIPTION>For DI clients with merged beacon data. It has access type as the top-level node, and a device beacon pseudo node and device beacon available node.</DESCRIPTION>
Global database. It may be XML or a database.
Host service ID dictionary: a host service ID is assigned. If not, stop.
Login model template definition. The entire login model structure is specified as described below and populated with elements from the group structure definition and the evaluator definition, along with all of the previously defined palettes. This can be expressed as follows:
Variable elements for future determination. The manner and location for initiating the analysis remain open for future determination. Similarly, the Deployment_key and LMTS_SOURCE may be generated by a development tool or entered manually.
Activity model: activity models have proven to be very different across customers. The palette concept may be introduced over time. However, each customer's data can be decomposed into structure and parameters (rather than dynamic versus static). The structure comprises two parts, SESSION and ACTIVITY (e.g., GA_ACTIVITY_TMPLT, GA_ACTIVITY_SUBTYPE); it is typically defined at the beginning and may require small adjustments. The parameters (again SESSION and ACTIVITY) are the data typically found in the GA_SESSION_ACT_PARAMSET table, and they may require constant adjustment to get the parameters (e.g., COST) right as iterations through the data occur. Versions should be assigned by the model stamper:
if so, the activity will be written.
The GA_ACTIVITY_TMPLT table also has SHORT_NAME, DISPLAY_NAME, QUALIFIER_DESCR, QUANTIFIER_DESCR, and QUANTIFIER_TYPE. However, this information is only for the RiskApp (ga_ra_activity_tmplt) and may change often; thus, this information may be removed from the RiskEngine metadata. The subtype DISPLAY_NAME, QUALIFIER_DESCR, QUANTIFIER_DESCR, and QUANTIFIER_TYPE may be removed similarly. The activity parameters can be broken down into two parts: one for internal modification and the other for modification at the customer site. The activity parameters modified at the customer site include:
template.addParamSet("TERMINATION", "TIMEOUT_IN_MINUTE", "20", pVersion); // From Mike: the session timeout is set at 15 minutes and is based on inactivity.
template.addParamSet("TERMINATION", "TIMEOUT_ACCURACY", "2", pVersion);
template.addParamSet("TERMINATION", "MAX_BIN_SIZE", "3500", pVersion);
template.addParamSet("TERMINATION", "WARN_BIN_SIZE", "2100", pVersion);
template.addParamSet("TERMINATION", "RECEIVER_IN_MINUTE", "30", pVersion); // Suggestion: make this TIMEOUT_IN_MINUTE + 3 times TIMEOUT_ACCURACY
The following parameters may be found at the session structure level; their assignment is to be determined.
Model parameters expressed in XML. To populate the following XML output from the Excel spreadsheet and analyze the resulting data, the following tools are envisioned:
login parameters:
activity parameters
Model/palette stamper. To keep track of the model deployed at a customer site, the customer is given control over the model, and a unique key is assigned for inclusion in the ETL file. This enables identification of the model that was in use when specific events were processed. This includes a registration process: the data model is stamped with a MODEL_KEY before being supplied to the customer. The key may be used to prevent tampering with the XML files, meaning that the key may be generated based on the content of the parameter versions and structure type.
Each palette (e.g., LMTS_GRP_VARIATION) is protected with a PALETTE_REGISTRATION_KEY. The MODEL_KEY is written to the RE database and into the ETL when deployed at the customer site. This MODEL_KEY is stored internally, e.g., in a database, and is associated with all the keys in its configured ETL file. At the customer site, the model may be stamped with the registration key to be deployed.
The main functions of the stamper are as follows:
Stamp each palette in the login model template palette.
Read the login model structure XML definition from the login model template palette.
Read the login model parameter XML file and verify that the login XML parameter file is compatible with the login model structure XML definition.
Generate an LMTP_VERSION for the login XML parameters. Read the activity model template structure and parameter XML files.
Generate the activity model structure version.
Generate the activity model parameter version.
Generate a model key based on all of the above information and generate a registration XML file as follows. The same information may be stored in the model database, so each key can uniquely identify all components. This file can be shipped to the customer and read by the model loader to validate all components at the customer site when the model is loaded.
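For illustration, a content-derived MODEL_KEY can be sketched as follows; the description says only that the key may be generated from the content of the parameter versions and structure, so the hash choice and layout here are assumptions.

import hashlib

def model_key(structure_xml: bytes, parameter_xmls: list) -> str:
    # A content-derived key: any edit to the structure or to a parameter file
    # yields a different key, which is how tampering can be detected at load time.
    h = hashlib.sha256()
    h.update(structure_xml)
    for blob in parameter_xmls:
        h.update(hashlib.sha256(blob).digest())
    return h.hexdigest()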
Model loader. The model loader is a separate tool released independently of the RiskEngine. The model loader derives its data from the RiskEngine library. The files read by the model loader include the following: the login model structure definition XML file, the login model structure palette, the login model parameter file, the activity model structure file, the activity model parameter file, and the model registration XML file.
The model loader loads its model into a system under the following conditions:
Changes made in the system by the customer are preserved.
A history of all versions is maintained.
Version compatibility (i.e., protection against human error) and tampering with the data are checked by validating all keys generated by the STAMPER.
The loading RiskEngine verifies that it is the correct version to be run.
The activity model may be loaded independently of the login model (if the login model is unchanged and the activity model has changed, only the activity model should be loaded).
The activity model may not exist.
Display metadata generator. The display model is generated using a similar concept to the login model: structure, variants, and parameters. The structure, static parameters, and dynamic parameters are described below. The RiskEngine model generator creates a structure XML file that describes the login model and the activity model. The display metadata generator uses this XML file to generate its metadata.
Structure: the display structure is closely tied to the login model, with a slight twist due to the session model; it defines what the RiskApp needs in order to associate its structure with the login model/activity model (i.e., the base layout). The display uses the group association definition (i.e., all groups being used in the model) to determine the login groups. A display group palette, similar to the login model palette, may be selected from and used to assemble the display model.
Variants: some variations include removing the userAgent from the alert page and adding logType to the alert page. A difficulty is determining in which rows and columns an added or deleted component belongs, and its impact on the other components already present.
Parameters: the thresholds are display parameters.
Display model definition. The display model definition is defined in XML as a sub-part <displayMetadata> within the model.
Fig. 34 is a diagram illustrating a risk engine table, according to an embodiment.
Fig. 35 is a diagram illustrating an architecture mapping, according to an embodiment.
Possible changes to existing RE databases.
Required changes. The following may be the changes required to the RE database: add the priors class name to the group template; add LMTS_SOURCE to the ga_settings table (which ga_grouptmplt_use_param already has) to indicate who made the change; remove the duplicate GrpType_ID from the ga_grouptmplt_use and ga_grouptmplt_use_rel tables; remove rel_order from the ga_grouptmplt_use_rel table; add MODEL_KEY to the ga_settings table; remove subtype_name from ga_group_tmplt; remove TYPE_CD from GA_ACTIVITY_TMPLT; rename LMTS_VARIATION to LMTS_GRP_VARIATION; and add LMTS_EVAL_VARIATION.
Currently, the GA_ACTIVITY_TMPLT table also has SHORT_NAME, DISPLAY_NAME, QUALIFIER_DESCR, QUANTIFIER_DESCR, and QUANTIFIER_TYPE. This information applies only to the RiskApp (ga_ra_activity_tmplt) and may change frequently; thus, it is removed from the RiskEngine metadata.
Recommended changes. The following may be recommended changes to the RE database: add within the evaluator (currently ga_group_use_rel) a modedef index that is independent of the coordinates (coord) specified in the system. The coordinates specify the processing order, and the modedef index specifies which modedef index is used in the ga_ra_registers_history table.
Desired changes. The following may be desired changes to the RE database: add the ga_group_template_palette table to the RE engine; merge the usage and group template tables; and create a separate evaluator table (or the like) for groups, to avoid overloading the group template table via type_cd.
Unplanned changes. The following change to the RE database is not planned: combining the ga_group_template and ga_mode_template tables.
Items for resolution. The following remain to be resolved: merging the usage and group template tables; and whether to use a key (string) or an ID (integer) for group_template identification.
Versioning. Because there are many parameter files that can affect the version of the model, a multi-version system is contemplated, as follows:
RiskEngine version: some models work only with some RiskEngine versions. For example, version 2.5 includes a parameter name change, and an old model with an old parameter name may not work in RiskEngine 2.5. [The RiskEngine can determine whether the installed model version is RiskEngine compatible.] This is defined in a file at the highest level of the model tree, so it applies to all models under the tree.
Groups. The group structure is static; when the group structure changes, it becomes a different LMTS_TYPE. The group definition is defined by a variant; variants may use different processing classes or different evaluators. A concatenated variant name (group-evaluator) is used. The parameters may be static or dynamic. Static parameters apply to items that change infrequently and apply to all customers using the LMTS_TYPE; when a static parameter changes, the change applies to all customers using the LMTS_TYPE, and a version number identifies the change. Dynamic parameters apply to items that vary per customer, with version numbers identifying the changes.
Evaluators. The evaluator design is based on the following: first, the structure defines the groups to be included for the evaluator; second, the definition is the class to be used for the evaluator; and third, the parameters are items that vary per customer. Static parameters apply to fraud co-occurrence bins, login rate bins, time-bin models, etc. Dynamic parameters apply to fraud co-occurrence coefficients, login rate coefficients, etc. Because many files are involved, errors can arise if the user updates manually maintained version numbers. Thus, the model stamper may assign versions based on checksums generated from the parameter XML files.
Activity model version: this is the structure of the activity model, e.g., GA_ACTIVITY_MODEL, GA_ACTIVITY_SUBTYPE.
Activity parameter version: this covers the information entered into the GA_SESSION_ACT_PARAMSET table.
Parameter files. Each group in the Excel spreadsheet may be identified by group template type or possibly by key. The folder structure is as follows:
A new folder "model" is added to the release. This can be identified as the global model directory.
AUTHENTICATION, DI_BEACON, STANDARD, ACCESS_AUTHENTICATION, and STANDARD_COOKIE are the 5 latest LMTS defined. In customers.tar.gz, the structure appears as follows:
The model-related files for a customer are stored in a "model" subfolder under the customer's name (sometimes not directly). Typically, each folder has at least 4 files (sessionModelParams is only needed if a session model exists).
There are typically 6 files describing the model, and they are (to be read by the tool):
Login model - evaluator:
loginModelParams: parameters at the model level, typically customer-specific. This file is typically present in each customer's model folder. Examples are the fraud co-occurrence coefficients (FRAUD_CO-OCCURRENCE Excel worksheet) and the login rate coefficients (LOGIN_RATE_COEFFICIENT Excel worksheet). This is at the evaluator level.
loginModel: parameters at the model level, for example fraud co-occurrence bin definitions, trust model definitions, and login rate bin definitions. This file is typically in the "model/generic" directory because it applies to all models and is not customer specific; whenever it changes, the change typically applies to all models. This is at the evaluator level.
Login model - group:
groupModelParams: group-level, customer-dependent parameters. This file is typically present in each customer's model folder. It typically includes new mode parameters (NEW_MODE Excel worksheet) and priors for the relevant set of user agents (PRIORS Excel worksheet).
groupModel: an example is the Dirichlet parameter α (ALPHA); because this parameter does not change per customer, it should be kept separate from groupModelParams.
Session model:
sessionModelParams: session-level (or activity-level) parameters that depend on the customer. This file is typically present in each customer's model folder.
sessionModel: session-model-level parameters that do not depend on the customer. This file is typically in the "model/generic" directory.
The model generator tool begins searching for a given file in the customer's model folder.
For the login model (4 files):
If not found, it moves up to the next level, the LMTS folder;
If not found, it moves up to the next level, the generic folder.
For the session model (2 files):
If not found, it moves up to the next level, the generic folder (see the sketch below).
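A minimal Python sketch of this fallback search is shown below; the file and directory names are hypothetical.

from pathlib import Path

def find_model_file(name: str, search_levels: list):
    # Return the first existing copy of `name`, walking down the fallback chain.
    for level in search_levels:
        candidate = Path(level) / name
        if candidate.exists():
            return candidate
    return None

# Hypothetical layout: login model files fall back customer -> LMTS -> generic;
# session model files fall back customer -> generic.
customer_dir = "customers/EXAMPLE/model"
lmts_dir = "model/DI_BEACON"
generic_dir = "model/generic"
login_file = find_model_file("loginModelParams", [customer_dir, lmts_dir, generic_dir])
session_file = find_model_file("sessionModelParams", [customer_dir, generic_dir])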
Any parameters that already exist at a lower level override those above. loginModel.txt (typically a generic file) is a candidate for modification away from the generic values; there are two alternatives: first, copy loginModel.txt to the LMTS model layer or the customer model layer and modify the values there; and second, if only one or two values differ, specify those values in loginModelParams.
Another design is to allow any override parameters to be entered into a separate file (the override file) to keep the structure consistent. The tool performs checks to ensure that every group gets parameters; if not, it complains and may prevent the model from being written.
Process for defining a model. The process of defining the model includes the steps of: defining a new group template; defining the LMTS_TYPE and group variants (defining new group associations; defining new relationships); and defining a new evaluator. There are three aspects to selecting an LMTS_TYPE: first, determine whether a pre-existing model (and variants) exists for use; second, if there is no pre-existing model, determine whether any model can be modified to provide a different variant (if so, create a variant for that model; note that a new group template palette or evaluator may need to be defined); third, if no modification is possible, create a group template for the group template palette and then establish a new LMTS.
Structural representation in XML (for reference only)
But this enables the RiskApp to obtain the necessary information without using a palette.
Activity model
Main components
Login model template. The login model template includes the following: a catalog, which is a new component and may exist in XML or a centralized database; and definitions, which may need to change based on the login model dictionary.
Session (activity) model. While there may be some shared activities, for example for DI customers, this is independent of the login model and can therefore be enhanced in the future.
Reading Excel parameters. Summary tables have been developed that isolate changes in the raw data spreadsheet.
Display metadata. The display is done using two components: first, a directory in a shared display model metadata structure; and second, a definition with two parts. First, hard-coded IDs are removed so data can be written programmatically to the database with generated IDs. Second, if sharable RiskApp metadata is desired, that portion may be rewritten to favor sharable RiskApp metadata (similar to the RiskEngine concept of dictionaries and palettes).
Versions. This applies to the login model, the session model/activity model, and the display model. This part is sensitive because portions of the parameters may be shared among customers.
Model loader. If the XML representation of the model remains unchanged, only the versioning aspect may require modification.
Multi-pull (Doral) algorithm for the FraudMAP system
Multi-pull algorithm requirements: primary objective. The primary goal is to enable processing and scoring of multiple event streams from different channels, possibly with different arrival timings (e.g., batch versus real-time).
Multi-pull algorithm requirements: overview. To put some of these requirements into context, consider the following observations about the current state of the product and the commercial environment. Note that the points in this section are not intended as requirements; they are provided to explain the context surrounding the explicit requirements in subsequent sections.
Customers and potential customers want to read, model, score, and display events from multiple, different data sources. The timing (e.g., real-time versus batch) and availability (now versus six months from now) of these data sources are not always conveniently aligned, and it is very cumbersome to expect the customer to align them.
There are increasing opportunities to create products that analyze data that is not specific to online channels (e.g., wire transfers, accounts, offline ACH data).
More broadly, cross-institution data can be used both to improve fraud detection internally and to create information sources that may be "productized". This contemplates processing and modeling data along different dimensions (e.g., IP address, recipient account, device ID, and possibly even sequences of activities).
Furthermore, it is desirable to have a repository containing information (across all users at all institutions) that is broadly useful for risk scoring and other purposes. The repository would contain information from third-party sources as well as internal cross-institution data and analytics. The closest analog of the repository in the current product is IPDB, which contains information from the trial accounts.
It is also desirable to make better use of some of the additional fields associated with activities, particularly for display and search. For example, it is currently not possible to search or match against a particular recipient account.
A new wave of fraudulent attacks driven by sophisticated malware has been identified. These attacks fall roughly into several broad categories:
Stolen credentials/different IP address. Malware is used to steal credentials, but the fraudulent sessions come from IP addresses not associated with the user.
Stolen credentials/proxy through the user's machine. After stealing credentials, a fraudster uses the "loopback" feature of the malware to proxy through the user's machine, so the activity appears to come from the user's legitimate IP address. In this case, the fraudster may also steal the user's cookie and masquerade the user-agent string, although in practice fraudsters do not always do so. Here, a human is probably performing the activity. Most of the recent fraud at the trial account site falls under this description.
Session interception/transaction modification. Malware waits for a user to log into online banking and then initiates a transaction in the background or changes information (payee, amount) on a user-initiated transaction. In this case, the transaction is initiated or modified automatically, without a human fraudster executing each instance. No instances of this type of fraud have been directly observed, but it is frequently mentioned by customers and other players in the fraud space.
The top-level approach enables detection of fraud by modeling different aspects of user behavior as reflected in the data. However, frequently changing fraudster tactics require fast iteration of algorithm improvements to effectively detect and prevent new fraud attacks.
Multi-pull algorithm requirements: primary use cases. Many of the requirements in the following sections are suggested by these use cases:
Use case A (trial account based) is built on a real-time data feed containing online banking activity. In addition, ACH files are obtained in batches as they are processed (several times a day). This creates several complications:
Real-time scoring of the real-time data is to be offered to the customer, but reasonable scoring of the batch events is also expected. Some minimal degradation in the scoring of batch events may be accepted as the compromise for scoring them "out of order".
The information may be presented in a single display.
A "link" must be established between the account number in the ACH file and the online user ID in the online banking data. This may require a lookup table (which may be considered a third data source).
Deployment may occur in stages; for example, the system initially scores only real-time data, and ACH capability is added at a later date. Phased deployment should be possible without introducing architectural changes or reprocessing historical data.
Use case B (cross-institution data feeds) is based on raw data from multiple customers. Cross-institution data can be used to detect mule accounts, score IP reputation, and otherwise understand and score data across dimensions other than the online user. This use case requires the ability to model and score event streams keyed on entities other than the online user. Significant flexibility in the type of information stored and the type of formulas used is the point. It also requires a mechanism whereby information from the cross-institution model is passed to the FI-specific risk engine so that it can be factored into the risk score.
Multi-pull algorithm requirements: data processing. Data processing refers to acting on raw input data from possibly multiple sources to create events and to order those events for use by the computing portion of the risk engine. These events must contain all information necessary for risk calculation (and display). This step also includes determining which events should be skipped (because they are irrelevant, corrupt, etc.). The raw data involved may be provided by the customer or a third party, or may be output from an internal risk engine.
Multi-pull algorithm requirements: definitions. The following definitions are provided:
Event: an event is the fundamental unit of data and can be represented by a single line in a delimited file or by an XML element in an XML file. An example of an event might be "user JSMITH logged in from IP address 123.43.43.43 at 16:44:35 on 5/14/2010." In a data file, this is conceptualized as a set of fields.
Field: a field is a basic component of an event. Fields are separated by delimiters in delimited data, or appear as distinct XML elements inside the event element of an XML file. The previously described event, for example, can be represented in XML with each field as a child element of the event element.
Represented that way, the login event has four fields: username, IPaddr, acttype, and datetime.
Another event has seven fields: username, IPaddr, acttype, datetime, from_account, running_number, and to_account.
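Because the original XML snippet did not survive in this text, the sketch below reconstructs the login event from the field names given above and shows how fields could be extracted from either representation; the element names and the vertical-bar delimiter are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the login event described in the text.
LOGIN_EVENT_XML = """
<event>
  <username>JSMITH</username>
  <IPaddr>123.43.43.43</IPaddr>
  <acttype>login</acttype>
  <datetime>2010-05-14T16:44:35</datetime>
</event>
"""

def parse_xml_event(xml_text: str) -> dict:
    """Each child element of <event> is one field of the event."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

def parse_delimited_event(line: str, header: list[str], delim: str = "|") -> dict:
    """In delimited data, fields are split on the delimiter and named
    according to the header line."""
    return dict(zip(header, line.split(delim)))

print(parse_xml_event(LOGIN_EVENT_XML))
print(parse_delimited_event(
    "JSMITH|123.43.43.43|login|2010-05-14T16:44:35",
    ["username", "IPaddr", "acttype", "datetime"],
))
```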
Model entity: the model entity is the unit around which behavior is modeled. Originally, the model entity was always an online user. With the business banking model, this was generalized so that a company can be a model entity. Desired future capabilities must take into account other applications with other designations of model entity, such as IP reputation scoring (IP address), mule account detection (target account), and offline wire transfer scoring (source account).
Data event versus conceptual event: for purposes of this document, a data event refers to a line of text in delimited data, or an event XML object in XML data. A "conceptual event" refers to the real-world occurrence represented by that data. For example, if a user changes a password, this may be captured as one data event in the weblog data source and as another data event in the audit log. Those are two different data events, but the same conceptual event. As another example, consider use case A. An online event may indicate that an ACH batch has been sent, without details. Subsequently, a supporting file with additional information about the event is transmitted. In this way, the same conceptual event manifests in two different data sources.
Multi-pull algorithm requirements: design. The software must be able to handle any event stream with the following structure (see Appendix: event stream example):
Each event is a line of delimited data or a single XML element. Each event contains a set of fields. In delimited data, fields are separated by delimiters and named according to the header. In XML data, fields are named by the tags of the elements inside the event element.
The fields included may be event specific (a field such as "transfer amount" may not be included in an event such as "account summary").
Within an activity type, there may be variants in which fields are present and which are not.
A data element is selected as the model entity. (Note that the model entity is typically a single field, but in some cases may be found in different fields.)
The model entity specifies the "dimension" along which the modeling occurs. For example, for a retail bank, the model entity is a user; for a commercial bank, the model entity is a company. Other applications with other designations of model entity are contemplated, such as IP reputation scoring (IP address), mule account detection (target account), and offline wire transfer scoring (source account).
The selection of the model entity (roughly) means that only previous events involving the same value of the model entity are relevant for scoring the current event.
A model entity value must exist in each event. If it does not, the event must be skipped or otherwise handled (see below).
The software must support metadata-driven logic to determine which field represents the model entity on an event-by-event basis. For example, in the context of a trial account, (hard-coded) logic exists that (roughly) says: if field A is "business", use field B as the model entity; otherwise, use field C.
In some cases, preprocessing may be required to arrive at a consistent model entity across different sources. In use case A, for example, the online banking user (or company) may be the model entity. The ACH data will probably not contain this field, but a lookup table may link the account to the online user. A preprocessing step would therefore attach the online user to the ACH data via the lookup table, as sketched below.
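A minimal sketch of both ideas, metadata-driven entity selection and lookup-table preprocessing, follows; the field names (field_a, field_b, field_c) and the ACCOUNT_TO_USER table are hypothetical stand-ins for whatever a given deployment defines.

```python
# Hypothetical lookup table linking ACH account numbers to online user IDs.
ACCOUNT_TO_USER = {"523345555": "JSMITH", "884412000": "MJONES"}

def model_entity(event: dict) -> str | None:
    """Metadata-driven selection of the model entity field, mirroring the
    hard-coded trial-account logic described above: if field A is
    'business', field B is the entity; otherwise field C is."""
    if event.get("field_a") == "business":
        return event.get("field_b")
    return event.get("field_c")

def attach_online_user(ach_event: dict) -> dict:
    """Preprocessing for use case A: ACH data lacks the online-user field,
    so attach it via the lookup table before feeding the risk engine."""
    user = ACCOUNT_TO_USER.get(ach_event.get("account_number", ""))
    if user is None:
        ach_event["skip"] = True   # no model entity: skip or handle specially
    else:
        ach_event["online_user"] = user
    return ach_event
```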
A single model entity per RiskEngine instance is contemplated. However, the same data may be fed to different RiskEngine instances, in which case the instances may use each other's outputs. There are timing issues here: one RiskEngine will necessarily process an event before the other channel's RiskEngine has learned anything from that same event. Alternatively, different models with different model entities may exist and interact within the same RiskEngine.
The software must support metadata-driven configuration to handle multiple independent data streams with potentially different "arrival timings" (i.e., real-time versus batch processing). For example, consider use case A. Possible solutions include:
Multiple data streams with identical arrival timing are linked into a single, time-ordered stream of events. This is the simplest way to handle multiple data feeds. However, it must run at the speed of the slowest data source; if even one data source is available only in batch mode, the entire system runs in batch mode.
Multiple data streams with different arrival timings are fed to the same risk engine model (the risk engine may receive data out of order). In this solution, data is fed into the RiskEngine as it becomes available. This requires that the model perform reasonably well when scoring events that occurred "in the past". When events arrive in chronological order, the strategy is to keep more detail about recent events in memory and to "compress" data about older events. In this case, there may be a limit to the richness of the context available when scoring events from hours ago.
Multiple data streams with different arrival timings are fed into separate risk engines, but one risk engine can utilize the results of the others. One possible solution is to have one RiskEngine operating in real time, scoring the online data, and a second RiskEngine operating in batch mode, scoring the ACH data. However, when scoring ACH transactions that originate online, it is preferable to consider the online data. This may be accomplished by feeding the output of the real-time risk engine into the batch risk engine as another data source.
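The first of these solutions, linking identically timed streams into one chronological stream, is straightforward to sketch; the example below assumes each event dictionary carries a comparable datetime field.

```python
import heapq
from datetime import datetime

def merge_streams(*streams):
    """Link multiple already-sorted event streams into one chronological
    stream (solution 1 above). Runs no faster than the slowest source."""
    yield from heapq.merge(*streams, key=lambda event: event["datetime"])

online = [
    {"datetime": datetime(2010, 5, 14, 16, 44, 35), "src": "online"},
    {"datetime": datetime(2010, 5, 14, 17, 2, 1), "src": "online"},
]
ach = [
    {"datetime": datetime(2010, 5, 14, 16, 50, 0), "src": "ach"},
]

for event in merge_streams(online, ach):
    print(event["src"], event["datetime"])
```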
The software must enable the results of cross-institution data to be utilized in the risk scoring process (in real time or near real time). Consider use case B. The cross-institution data is fed into a RiskEngine in which the target account is the model entity. That RiskEngine scores target accounts and outputs data (in a variety of ways) when a recipient account is deemed risky. At the same time, each financial institution has its own RiskEngine scoring the same events (for its own users). Ideally, information about risky accounts from the cross-institution model can be fed to the financial-institution-specific RiskEngine so that it can be factored into the risk score. The primary solution envisioned is for the cross-institution RiskEngine to write to the information repository when it sees a risky account; the financial institution's RiskEngine then has a risk component that makes transfers more risky when the target account is listed in the information repository.
A RiskEngine must not process the same data event more than once. This is harder to guarantee with out-of-order data. Possible solutions include: first, ensuring that the data stream fed into the risk engine contains no duplicate events (at least where out-of-order data is allowed), i.e., the customer controls this in their data source; second, allowing duplicate events within a certain (short but configurable) time span and having the RiskEngine keep a list of checksums so that it skips any duplicates.
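The checksum-based second solution could look like the sketch below; the ten-minute window and the MD5 digest of the serialized event are illustrative assumptions.

```python
import hashlib
import json
from collections import OrderedDict
from datetime import datetime, timedelta

class DuplicateFilter:
    """Skip events whose checksum was already seen within a short,
    configurable window (the second solution above)."""

    def __init__(self, window: timedelta = timedelta(minutes=10)):
        self.window = window
        self.seen: OrderedDict[str, datetime] = OrderedDict()

    def is_duplicate(self, event: dict) -> bool:
        now = event["datetime"]
        # Expire checksums older than the window (oldest entries first).
        while self.seen and next(iter(self.seen.values())) < now - self.window:
            self.seen.popitem(last=False)
        digest = hashlib.md5(
            json.dumps(event, sort_keys=True, default=str).encode()
        ).hexdigest()
        if digest in self.seen:
            return True
        self.seen[digest] = now
        return False
```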
The same conceptual event may appear in different data streams. Understanding the circumstances under which this can happen must be part of the modeling process, so that the conceptual event can be handled appropriately. For example, in the trial-account extension model, the internal account login and the trial account login are (in some ways) the same conceptual event. Many approaches (merging, interleaving, etc.) to handling such conceptual events can therefore be expected.
The configuration scheme must be equipped with logic allowing more complex handling of heterogeneous data anomalies, permitting "graceful degradation" in the face of missing data, incorrectly formatted data, corrupted data, etc. Effectively, this means being as accurate as possible despite data-quality issues, and being robust enough that small data errors do not have wide-ranging impact. Possible scenarios include:
The field "amount" is expected to be a number but instead contains a text string. This may be configured to be treated as zero or null, while other aspects of the event are still scored.
Events from IP address 123.45.67.89 come from a background process that has no relationship to user activity. These events may be configured to be skipped.
Sometimes, some events in a data source have timestamps in a different time zone from the rest of the data. While such events may be handled poorly (because they are "corrupt"), ideally their handling does not result in skipping a large number of other events.
As an example from a trial account, regular expressions are used to parse the data and pick out certain fields. For example, the raw data may contain a URL field such as:
/common/Wire/WireMaintTemplate.asp?Action=Create&ID=20073&Status=I&FromPage=wireCreatePrereq.asp&GoToPage=wireManager.asp
The ability to parse out "Action", ID, and possibly other values is needed, along with logic to determine which event the URL should map to. The data may appear as name-value pairs that are themselves contained within a single common field. The software must be able to read the name-value pairs and apply logic to determine the mapping.
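As a sketch only, the URL above could be handled with standard library parsing; the mapping rule from page name and Action parameter to an event type is a hypothetical illustration of the kind of logic described.

```python
from urllib.parse import urlsplit, parse_qs

URL = ("/common/Wire/WireMaintTemplate.asp?Action=Create&ID=20073"
       "&Status=I&FromPage=wireCreatePrereq.asp&GoToPage=wireManager.asp")

def map_url_to_event(url: str) -> dict:
    """Pick out the page and its name-value pairs, then apply mapping
    logic to decide which event the URL represents."""
    parts = urlsplit(url)
    page = parts.path.rsplit("/", 1)[-1].removesuffix(".asp")
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    # Hypothetical mapping rule: page plus Action decides the event type.
    if page == "WireMaintTemplate" and params.get("Action") == "Create":
        event_type = "wire_template_create"
    else:
        event_type = "unknown"          # candidate for skipping
    return {"acttype": event_type, "template_id": params.get("ID")}

print(map_url_to_event(URL))
```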
A generic event representation is needed to handle these varied situations, especially those outside online banking. Currently, the event representation is hard coded to contain IP, user agent, cookie, session ID, qualifier, quantifier, etc., and these are extracted.
Multi-pull algorithm requirements: sessionization. Sessionization refers to the process by which incoming events from an event stream are "grouped" together into entities called sessions. It has not been determined whether the session concept is useful from a risk scoring perspective (its usefulness for display purposes is a separate question). The concepts of "in the same session" and "in a previous session" can be replaced by the more flexible concept of how long ago the previous event occurred.
Although a session may seem a natural entity, it can in fact be difficult to identify. Most clients do not provide a reliable session ID, and when they do, sessions often do not behave in the ideal manner. For example, a failed-password event immediately preceding a successful login attempt is typically not included in the same session. Without a client-provided session ID, relatively coarse logic is employed to determine session boundaries. This often results in many sessionization errors, which affect the performance of the risk score, and significant modeling and configuration effort is spent trying to minimize the problem.
Sessionization has some advantages. Sessions are useful entities for display purposes: they act as a functional unit one layer above individual events and thus provide a concise summary of those events. Sessions also provide a basis for probabilistic statements such as "how likely is this user to wire transfer". However, all of these can likely be replaced by more fluid concepts, such as considering the set of recent events and how long ago they took place. If desired, a session ID can be used within the risk calculation logic without explicitly splitting the event stream into sessions. For most modeling purposes, the amount of time between events is more informative than whether they fall in the same session.
The burden of sessionization may outweigh its advantages. From a risk modeling perspective, a purely event-based approach may be preferred. If a session ID is provided in the data, logic can be developed in the risk components to use it, which provides more flexibility than maintaining explicit sessions. In some cases, such as concurrent sessions, the session concept actively hinders risk scoring, because events in one session cannot be used to affect risk scores in another. Sessions may still be used in some form for display purposes, if desired.
Multi-pull algorithm requirements: risk calculation. Risk calculation proceeds by taking an event, evaluating the relevant information, and producing a final risk score for output.
Multi-pull algorithm requirements: definitions. The following terms have the following meanings, assuming the model entity is a user; all of these concepts generalize to other choices of model entity.
Risk component: a risk component is the name given to one of many small calculations, each focused on a particular feature of the event and its context. One risk component may focus on the user's location, another may assess how risky the presence of wire transfer activity is, and yet another may assess the cumulative additional risk of a wire transfer approval occurring 15 minutes after the wire transfer was initiated. The values output by the various risk components are then passed through another layer of computation to produce the final risk score. A risk component can be thought of as a function taking as inputs the current event, summary statistics, background variables, model parameters, and information from the information repository.
Summary statistics: summary statistics are stored information about previous events for the same user. This is the essence of the behavioral modeling approach: the user's history is a factor in determining whether a particular event is risky. Because it is not feasible to revisit every past event of a user in order to evaluate the current event, some way of storing and updating a compressed version of the user's history is needed. In particular, only the data relevant to the required calculations is stored (in statistics this is called a sufficient statistic). For example, to score the risk of a wire transfer occurring at this moment, it is important to know how often the user sends wire transfers. For that calculation, it may suffice to know how many wire transfers the user has sent and how long the user has been a client; the exact date and time of each previous wire transfer is not required. In general, the kind of information to be stored varies greatly with the kind of calculation anticipated. For example, to score risk based on the timing of wire transfer approvals, the required data may include the templates, recipients, and initiating users of all wire transfers initiated in the past 24 hours. Some situations may require fuller detail, but only for the recent past. Negotiating the trade-off between the cost and complexity of storing large amounts of information and the accuracy of the risk calculations that information enables is the essence of designing the summary statistics.
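A minimal sketch of such a sufficient-statistics record for the wire transfer example follows; the field names are hypothetical, and a real record would be shaped by whichever risk components are planned.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class WireSummaryStats:
    """Compressed history for one user: enough to ask 'how unusual is a
    wire transfer right now?' without replaying every past event."""
    first_seen: datetime
    wire_count: int = 0
    # Recent past kept in fuller detail: template creation times by name.
    template_created_at: dict[str, datetime] = field(default_factory=dict)

    def record_wire(self, when: datetime, template: str) -> None:
        self.wire_count += 1
        self.template_created_at.setdefault(template, when)

    def wires_per_day(self, now: datetime) -> float:
        tenure_days = max((now - self.first_seen).days, 1)
        return self.wire_count / tenure_days
```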
Background variable: a background variable is information output by one risk component that may be relevant to the calculation of another. For example, whether the user is at a new location is relevant to assessing the risk of seeing a different computer (since a user is more likely to be on a new computer when traveling). Background variables are similar to summary statistics, except that they store previously calculated information about the current event, whereas summary statistics store information about previous events.
Model parameters: model parameters are numbers, consistent across all users and relatively static in time, used by the risk components. Model parameters may be updated manually in response to changing conditions, or even automatically through some process.
Information repository: the information repository is a source of information, applicable to all users, that is expected to be more dynamic in time. It may store information provided by third parties or information output from analysis of cross-institution data. The geolocation information and anonymous proxy data provided by Quova are examples of information repository data.
As an example, consider a risk component evaluating the timing of a wire transfer approval: it outputs VALUE1 if the wire transfer is approved within NUM_HOURS of initiation, and VALUE2 otherwise. VALUE1, VALUE2, and NUM_HOURS are the model parameters of this risk component. The relevant aspect of history is the time of template creation, so the summary statistics capture the creation time of each wire transfer template. The event provides the time, template name, recipient account, and event type (wire transfer). The component's function implements the logic: compute the time since the template was created, compare it to NUM_HOURS, and output VALUE1 or VALUE2 as appropriate. In this case a background variable is not necessary, but one would be if different logic were needed depending on whether the transaction comes from a mobile device, where MOBILE or NORMAL would be calculated by a different risk component. Similarly, to check the recipient account against a list of suspicious mule accounts within the risk score, information from the information repository would be valuable.
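Under the stated assumptions, that component might be sketched as follows (reusing the hypothetical WireSummaryStats record from the earlier sketch; the parameter values are placeholders):

```python
from datetime import datetime, timedelta

# Model parameters (consistent across users, relatively static in time).
NUM_HOURS = 2.0
VALUE1 = 3.5   # riskier: approval very soon after template creation
VALUE2 = 1.0   # baseline

def approval_timing_component(event: dict, stats: "WireSummaryStats") -> float:
    """Output VALUE1 if the wire approval falls within NUM_HOURS of the
    template's creation, VALUE2 otherwise (per the example above)."""
    created = stats.template_created_at.get(event["template"])
    if created is None:
        return VALUE2                      # no history for this template
    elapsed = event["datetime"] - created
    if elapsed <= timedelta(hours=NUM_HOURS):
        return VALUE1
    return VALUE2
```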
Algorithmic changes may take many forms, including modifying an existing risk component or adding a new one. These changes may or may not require tracking new summary statistics. An existing risk component may be extended to output a new background variable for use by another risk component. Different software architectures may require different processes depending on the type of change desired; the goal is to allow as many kinds of change as possible with a minimum of overhead.
Multi-pull algorithm requirements: design. The architecture must allow flexibility in the types of summary statistics it can store. Some examples include:
For each wire transfer (e.g., within the last week), the stored data includes the reference number and time the wire transfer was sent, and the template and amount of the wire transfer. This makes it possible to connect wire transfer activity with the corresponding wire transfer approvals.
For each wire transfer approval, the stored data includes an aggregated distribution of the time between when the wire transfer was sent and when it was approved. (The distribution is "quantized" in the same way as the quantifier bins.)
For ACH file processing, the stored data includes summary statistics on ACH file names, ACH batch names, and recipient names. These aggregated statistics may include the distinct amounts received, an aggregated (quantized) distribution of amounts, the distinct accounts used, and statistics on dates and times.
Similarly, the stored data includes summary information about files and batches. This may include a list of checksums, a list of distinct amounts, a list of batch names included in the file, and statistics on dates and times.
For each user created (in the commercial banking model), the stored data includes the name and time of the user creation. This makes it possible to calculate the time elapsed between a user being created and that user first logging in.
The architecture must enable risk components, summary statistics, background variables, and model parameters to be added or changed without architectural changes or reprocessing of data. (Obviously, if no reprocessing takes place, the change takes effect only from the point it is made forward.) An example risk component scores the time elapsed between initiation and approval of a wire transfer: when a wire approval is observed, it is checked against a list of recently sent wires (stored as summary statistics) to obtain the elapsed time. A probability of legitimacy for that amount of elapsed time is then calculated, which requires summary statistics on previous event intervals for that company/user. By taking the ratio against the probability of fraud for the same interval (also kept as summary statistics), the risk associated with that particular time interval can be determined.
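The ratio described here is a likelihood ratio. The sketch below assumes both interval distributions are kept as quantized histogram bins, as the specification suggests; the bin boundaries and probabilities are placeholders.

```python
# Hypothetical quantized histograms over elapsed-time bins (in hours):
# probability of observing each bin under legitimate vs. fraudulent use.
LEGIT_INTERVAL_DIST = {"<1h": 0.05, "1-24h": 0.60, ">24h": 0.35}
FRAUD_INTERVAL_DIST = {"<1h": 0.70, "1-24h": 0.25, ">24h": 0.05}

def quantize(hours: float) -> str:
    if hours < 1:
        return "<1h"
    return "1-24h" if hours <= 24 else ">24h"

def interval_risk(elapsed_hours: float) -> float:
    """Likelihood ratio P(interval | fraud) / P(interval | legitimate):
    values above 1 make the wire approval look riskier."""
    bin_name = quantize(elapsed_hours)
    return FRAUD_INTERVAL_DIST[bin_name] / LEGIT_INTERVAL_DIST[bin_name]

# A large ratio: an approval 15 minutes after initiation looks risky.
print(interval_risk(0.25))
```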
The architecture must enable risk components, summary statistics, background variables, and model parameters to be added or modified without rebuilding code (e.g., algorithms live in metadata to whatever extent possible). Alternatively, the architecture should allow members of the algorithm team to create and implement new risk components and summary statistics without expertise in the internal workings of the risk engine. Similarly, the architecture should allow an efficient process for implementing algorithm-driven code changes without requiring a complete release cycle.
The architecture must enable algorithmic changes without affecting the display. In particular, when the algorithm team determines the best way to implement a customer model, they should not have to worry about whether it will change the display in an undesirable or unexpected way. Instead, a subsequent step can configure the display after the algorithm team has completed the modeling work. More separation between display and risk calculation in the architecture is advantageous.
Risk components may be "real-time" in the sense that they can be recalculated as new information arrives. This enables processing of messy data from different sources and avoids requiring that all relevant information arrive in a single event. Without sessions, this happens automatically.
The architecture must enable testing of risk components in configurations with minimal coupling to other components, databases, etc.
Multi-pull algorithm requirements: organizational processes. Resources, tools, and documents must be designed and created for the various steps of customer model configuration. These steps include: obtaining/verifying customer data; configuring a converter to transform one or more data sources into an event stream containing all required information; determining the appropriate structure and format of risk components for the client's risk calculations; setting parameters for the risk calculations (as automated and data-driven as possible); testing and verifying the accuracy of the risk calculations (including creating processes within the algorithm team and providing the QA team with appropriate testing tools to verify model changes in the QA environment); determining appropriate display elements and configuration for the client; properly linking display elements to risk components; and verifying the adequacy of the display configuration.
Resources, tools, environments, and processes must be specified and created for adding and modifying risk components. These steps include: research/discovery of new features; implementing new features for testing; verifying features in a test environment; implementing new features in production; QA processes for new features in production; scheduling and cadence for adding new features; and tracking and recording changes to risk components, parameters, and the reasons behind the changes.
A process must be specified for how disaster recovery occurs in a system whose models have changed over time. It remains to be determined under what conditions, and to what extent, the attempt is made to replicate the system as it existed before, versus reprocessing historical data with a new model.
Resources and processes must be specified and created for responding to client requests for enhanced and customized model features. These steps include: if applicable, obtaining and verifying new data sources or changes to existing data sources; modifying the converter to carry the required information into the single data stream; if applicable, adding or modifying the risk component structure for the enhanced risk calculation; setting parameters for the new risk components and adjusting any other parameters that may be affected; testing and verifying the accuracy of the new components and the new model; if applicable, adding or modifying display elements for the enhancement; linking any new display elements to risk components; and determining how to handle the model change in the history.
Appendix: merging versus interleaving
Description of the merged data
Merging requires specifying which "login" events are candidates to be merged with internal events. A "time tolerance" specification determines how far apart in time events may be and still be merged together. If an internal event finds no counterpart in the trial account log, it is assumed to be a "stray beacon" (which occurs for various reasons) and is discarded.
Appendix: correlation of "real-time" risk components
The design may score events as they arrive, processing whatever information is available; in other words, it scores with as much information as is available at the time. A more detailed example of this design is given in Appendix: non-sessionized scoring (since this behavior arises automatically once the notion of a session is omitted). Note that scoring events in this manner also makes it easy to use previously interleaved data: because information is scored as it is received, there is no longer a need to rely on a merge process to ensure that all relevant information is present in a single event. Furthermore, the design enables processing of multiple data feeds in real time.
Appendix: event stream example. Note: the data is tab delimited for readability; in practice, vertical-bar separation ("|") or XML would be used.
Consider the above event stream. When the model entity is the user, the probability (and risk) associated with JSMITH logging in from three different IP addresses in a short period of time can be modeled. When the model entity is the IP address, the risk associated with IP address 123.43.43.43 being seen across three different users can be modeled. When the model entity is the recipient account, the risks associated with account 523345555- can be modeled.
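The example table itself did not survive in this text, so the sketch below substitutes a small hypothetical stream consistent with the description (JSMITH seen on three IP addresses, and one IP address seen for three users) and shows how the same events regroup under each choice of model entity.

```python
from collections import defaultdict

# Hypothetical reconstruction of the event stream described above.
EVENTS = [
    {"username": "JSMITH", "IPaddr": "123.43.43.43", "to_account": "523345555"},
    {"username": "JSMITH", "IPaddr": "99.88.77.66",  "to_account": "523345555"},
    {"username": "JSMITH", "IPaddr": "10.20.30.40",  "to_account": "523345555"},
    {"username": "MJONES", "IPaddr": "123.43.43.43", "to_account": "523345555"},
    {"username": "BCHEN",  "IPaddr": "123.43.43.43", "to_account": "523345555"},
]

def group_by_entity(events: list[dict], entity_field: str) -> dict:
    """The chosen model entity is just the 'dimension' used to group
    events; the same stream supports user, IP, or account modeling."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        groups[event[entity_field]].append(event)
    return groups

print(len(group_by_entity(EVENTS, "username")["JSMITH"]))      # 3 events, 3 IPs
print(len(group_by_entity(EVENTS, "IPaddr")["123.43.43.43"]))  # 3 users, 1 IP
```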
Appendix: non-sessionized scoring
Without sessionization, each risk component applies a time-decay profile (e.g., risk contributed by activity over the last six hours), regardless of whether that activity was in the "same session" or a "different session", and regardless of whether a session would have been "reset" by conditions such as a change of IP address.
Viewed this way, the risk is more appropriately regarded as "the risk of the user at that time" rather than "the risk of the session" or "the risk of the event".
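One common way to realize such a time-decay profile is an exponential discount on the age of each prior event. The sketch below is illustrative only; the six-hour half-life and the per-event risk field are assumptions, not values taken from this specification.

```python
from datetime import datetime, timedelta

HALF_LIFE = timedelta(hours=6)   # illustrative decay horizon

def decayed_risk(prior_events: list[dict], now: datetime) -> float:
    """Sum each prior event's risk contribution, discounted by its age,
    with no reference to session boundaries at all."""
    total = 0.0
    for event in prior_events:     # each dict carries 'datetime' and 'risk'
        age = now - event["datetime"]
        weight = 0.5 ** (age / HALF_LIFE)   # 1.0 now, 0.5 after 6h, ...
        total += event["risk"] * weight
    return total
```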
Third-party data sources for FraudMAP
A fraud intelligence data repository.
Summary of the invention. This summary provides a high-level functional specification for a third-party data repository. Once the high-level design is agreed, the next step will be to create detailed designs for the particular use cases.
Goal: to serve as the source for fraud intelligence data from third parties, the fraud intelligence data repository may be used by a number of proprietary tools, services, and applications. The specific goals are: to provide a central repository and focal point for all fraud intelligence data, independent of platform or product; to execute complex queries efficiently; to support analytical tools such as data mining, reporting, and direct querying; and to operate without interfering with production applications.
Use cases
Creating a data structure for each data source. Each data source may focus on one or more aspects of fraud (IP addresses, account numbers, etc.), and as a result may contain different data elements. Access to some data sources may be tenant specific. The repository should: provide a data structure specific to each data source to support all incoming data elements; store all records in each data source; and provide access at the tenant level.
Supporting automated data collection. A data source may provide an automated transmission method. The repository is intended to: support SFTP via push or pull; support CSV and delimited (tab, vertical bar, etc.) formats; and track the source and submission date of each file.
Supporting manual entry by internal staff. Some data sources are currently downloaded manually. Furthermore, internal staff may learn of contributed intelligence, e.g., by telephone with a customer, and should be able to enter records directly. The repository is intended to: support uploading files for a data source; support CSV and tab-separated formats; support adding manual entries; support editing/deleting manual entries (in case of error); and track the internal staff member performing each file upload and manual entry.
Using fraud data from tenant reports. FraudMAP Online may enable tenants to mark cases as fraudulent. The data elements in these cases may be used by the repository, where they can serve as identifiers of fraud across other tenants and/or be shared out to third-party sources if needed. The repository is intended to collect the available data elements from tenant-reported fraud, including: IP address, user agent string, destination account information, date of occurrence, modified profile information such as email address and phone number, and other elements of interest.
Using and synthesizing with other internal data sources. There are other internal data sources containing information usable for fraud intelligence. The solution will be enhanced to collect several technical data elements that can be used for device fingerprinting and for indicating computer compromise. Furthermore, the IP blacklist (IPBL) proposal being developed internally contains suspicious IP addresses that, when seen in tenant data, indicate a degree of increased risk. This is functionally similar to the IP address classification table. The repository should use the beacon data as a data source and integrate with the IPBL, either as a data source or as part of the repository itself (the IPBL could in fact become a broader internal process within the repository).
Providing a query interface. The repository may be used as a research tool. Being able to query for specific attributes would enhance internal staff's link analysis activities. Access to the data in the repository is best facilitated by a query interface that enables an authorized person to query records on one or more specific attributes. Some of these attributes are date or date range, IP address, email address, account number, RTN number, user agent string, internal data elements, report source, and malware information.
The query interface may also enable logical queries (AND, OR, NOT) and wildcard searches across multiple search criteria. For example:
IP address = 192.168.0.1 OR 192.168.0.2 AND user agent NOT windows
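As a sketch only, such criteria might translate to SQL as below, assuming the repository is backed by a relational store; the intel_records table and its column names are hypothetical.

```python
# Hypothetical translation of the example query into parameterized SQL
# against an assumed intel_records table.
query = """
SELECT *
FROM intel_records
WHERE (ip_address = %s OR ip_address = %s)
  AND user_agent NOT LIKE %s
"""
params = ("192.168.0.1", "192.168.0.2", "%windows%")
# cursor.execute(query, params)   # executed via any DB-API connection
```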
Exposing intelligence data to FraudMAP applications. The real value of the fraud intelligence data repository lies in leveraging the data in the FraudMAP applications (Online, ACH, Mobile, wire transfer, and API). This would allow tenants and FraudDESK personnel to review and respond to fraud intelligence data in near real time. There are a number of ways this data can be used, including: informational notifications (a transfer to a possible mule account), risk factors (a login from an IP address confirmed to be associated with fraud), direct tenant notification (compromised user credentials), and feeds to IPDB (suspect IP addresses reported by third-party sources).
Providing data mining/exception reporting functionality. With fraud intelligence data integrated into the FraudMAP application, the ability to mine the data on the backend would be valuable for identifying suspicious activity. This capability is similar to, and could branch from, the monitoring capability proposed in the IPBL proposal or the suspicious account (mule) report. The high-level workflow might be: on a periodic basis, specific fraud intelligence data (e.g., known fraud accounts) is queried against tenant data; matches are collected and output as exception reports; and personnel review the results and notify the tenant of possible fraudulent activity.
Feedback on confirmed fraud intelligence records. When fraud intelligence leads to fraud being identified within a FraudMAP product, it should be possible to flag the data. In the context of use case G, this implies two-way communication between the repository and the FraudMAP application: the application fetches intelligence from the repository and pushes back confirmed hits on the data. For example, if an IP address identified in NCFTA data is confirmed in a tenant session as an account takeover, an interface is needed so that the confirmation can be fed back to the repository. This is the rationale for the following use case, which focuses on sharing the data back to the various sources from which it came.
Providing outbound intelligence sharing. Part of the benefit of third-party intelligence data is the ability to reciprocate by sharing data from confirmed fraud. The current sharing process is a manual one performed by FraudDESK; automating it would make sharing more efficient and require fewer FraudDESK resources. Moreover, exposing the automated sharing process to tenants would enable them to contribute while maintaining control over what they share. By reviewing known fraud and correlating it with data from third-party sources, the repository should enable authorized personnel and tenants to: select the relevant data fields to share; automatically select the sources to which data is shared; package the data into an input format usable by each data source; transfer the data to all selected sources; and track submissions.
Providing tracking and performance metrics. To know which sources produce actionable results, the repository should be able to track activity and report performance metrics. Exactly how this is done can be explored in more detail, to ensure that the repository tracks the data relevant for reporting. At a minimum, the repository should be able to track when records from a data source link to confirmed fraud in a tenant's data, and to generate trend reports on those relationships.
Direct data service. One option for utilizing the data in the repository is to provide it directly to tenants, without integration into a particular FraudMAP product or application. This would let tenants select the types of data they are interested in, and review and use the data according to their own processes. The advantage of this approach is that the repository provides the data but leaves its application to the tenant; in this regard, the repository serves as a conduit for intelligence data. In this context, FraudXchange would be well suited as the delivery mechanism.
Data classification
Each feed may contain certain data elements usable for analysis or data mining activities. Some feeds contain many useful elements, and some elements are present in multiple feeds. Classifying these elements helps organize data from all feeds into a source-independent structure usable for analysis. These categories include, but are not limited to: account information (routing/account numbers, SWIFT/IBAN numbers, prepaid card numbers), IP addresses (suspicious, confirmed fraudulent activity, or known compromised computers), compromised credentials, email addresses, phone numbers, and physical addresses.
Classifying the data in this manner enables data elements from multiple feeds to be aggregated into one normalized data source usable by a risk engine, by data mining, by direct queries, or by other internal processes, without those consumers needing to understand the format of each data source. In principle, this would enable new data sources to come online in the future without modifying the processing that uses the data.
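A sketch of this source-independent normalization follows; the feed names, native field names, and category names are all hypothetical.

```python
# Hypothetical per-feed field mappings into source-independent categories.
FEED_MAPPINGS = {
    "feed_a": {"ip": "ip_address", "acct": "account_number", "mail": "email"},
    "feed_b": {"ipaddr": "ip_address", "iban": "swift_iban"},
}

def normalize(feed_name: str, record: dict) -> dict:
    """Rename each feed's native fields into the shared categories so
    downstream consumers never see feed-specific formats."""
    mapping = FEED_MAPPINGS[feed_name]
    normalized = {"source": feed_name}
    for native_field, category in mapping.items():
        if native_field in record:
            normalized[category] = record[native_field]
    return normalized

print(normalize("feed_a", {"ip": "203.0.113.9", "mail": "x@example.com"}))
```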
Sources of third-party data. Many third parties provide threat data feeds, and the types of threat intelligence in these feeds are very diverse. Data relating to online or offline fraud is the most valuable for the purposes of the fraud intelligence data repository. These data feeds include the following areas of interest:
Internal data: data from the FraudMAP product, across tenants
Active botnet activity: malware with a financial focus
Suspicious accounts: known mules and/or accounts used in confirmed fraud (as origin or destination)
Compromised credentials: specific to online banking platforms
Prepaid card accounts: an increasingly common destination for cash-outs
In addition, sources of malware intelligence will be evaluated. These sources may provide intelligence such as automated activity sequences, platform/tenant-specific targeting, and indicators usable to identify compromised computers.
Third-party data description. A summary of the data of interest that should be present in a third-party data feed covering the previously listed data categories is as follows. Suspicious account: source, date reported, account holder name, company name, address, email, phone number, bank RTN number, bank account number, bank, SWIFT/IBAN, debit/credit card number, amount attempted, date occurred, transaction date/time, distinct destination accounts, distinct source accounts. Suspicious IP address: source, date reported, IP address, user agent string, URL, domain name. Compromised credentials: source, date reported, user login ID, IP address, geolocation information, login domain name, date compromised, malware name. Malware data: source, date reported, date of infection, malware name/family, malware severity, exploit URL, download URL, command-and-control URL, and drop-server URL. This is not an exhaustive list, but it details the most critical data fields currently known for obtaining intelligence that the data repository can put to work.
Aspects of the FPS described herein can be implemented as functionality programmed into any of a variety of circuits, including Programmable Logic Devices (PLDs), such as Field Programmable Gate Arrays (FPGAs), Programmable Array Logic (PAL) devices, electrically programmable logic and memory devices and standard cell based devices, and Application Specific Integrated Circuits (ASICs). Some other possibilities for implementing aspects of the FPS include: a microcontroller with memory, such as Electrically Erasable Programmable Read Only Memory (EEPROM), an embedded microprocessor, firmware, software, and the like. Furthermore, aspects of the FPS may be implemented in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and any mix of the above device types. Of course, basic device technologies may be provided in a variety of component types, such as Metal Oxide Semiconductor Field Effect Transistor (MOSFET) technologies like Complementary Metal Oxide Semiconductor (CMOS), bipolar technologies like Emitter Coupled Logic (ECL), polymer technologies (e.g., silicon conjugated polymer and metal conjugated polymer metal structures), hybrid analog and digital, and so forth.
It should be noted that any of the systems, methods, and/or other components disclosed herein may be described using computer-aided design tools and expressed (or represented), as data and/or instructions embodied in various computer-readable media, in terms of their behavioral, register-transfer, logic-component, transistor, layout-geometry, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic, or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media, or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.). When received within a computer system via one or more computer-readable media, such data- and/or instruction-based representations of the above-described components may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is intended to mean "including, but not limited to". Words using the singular or plural number also include the plural or singular number, respectively. Further, as used herein, the words "herein," "below," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any item in the list, all items in the list, and combinations of items in the list.
The above description of embodiments of the FPS is not intended to be exhaustive or to limit the systems and methods to the precise forms disclosed. While specific embodiments of, and examples for, the FPS are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the system and method, as those skilled in the relevant art will recognize. The teachings of the FPS provided herein can be applied to other systems and methods, not just the systems and methods described above.
The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the FPS in accordance with the above detailed description.
In general, in the following claims, the terms used should not be construed to limit the FPS to the specific embodiments disclosed in the specification and the claims, but should be construed to include all systems that function in accordance with the claims. Accordingly, the FPS is not limited by the disclosure, but rather the scope of the FPS is to be determined entirely by the claims.
While some aspects of an FPS are presented in some claim forms, the inventors contemplate the aspects of the FPS in any number of claim forms. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the FPS.
Embodiments described herein include additional components as described in detail below.

Claims (2)

1. A system, comprising:
a platform comprising a processor coupled to at least one database;
a plurality of risk engines coupled to the platform, the plurality of risk engines receiving event data and risk data from a plurality of data sources including at least one financial application, wherein the event data includes data of actions taken in a target account during electronic access to the target account, wherein the risk data includes data of actions taken in a plurality of accounts different from the target account, wherein the plurality of risk engines use the event data and the risk data to dynamically generate an account model corresponding to the target account and use the account model to generate a risk score that is a relative likelihood that an action taken in the target account is fraudulent; and
a risk application coupled to the platform and including an analytics user interface that displays at least one of the risk score and the event data for any event in the target account for the action in the target account.
2. A method, comprising:
receiving, at a plurality of risk engines, event data and risk data from a plurality of data sources including at least one financial application, wherein the event data includes data of actions taken in a target account during electronic access to the target account, wherein the risk data includes data of actions taken in a plurality of accounts different from the target account;
dynamically generating an account model corresponding to the target account, the generating using the event data and the risk data;
generating a risk score using the account model, wherein the risk score is a relative likelihood that an action taken in the target account is fraudulent; and
presenting an analytics user interface that displays at least one of the risk score and the event data for any event in the target account for the action in the target account.
CN201480026670.9A 2013-03-13 2014-03-13 Fraud detection and analysis Pending CN105556552A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201361779472P 2013-03-13 2013-03-13
US61/779,472 2013-03-13
PCT/US2014/026264 WO2014160296A1 (en) 2013-03-13 2014-03-13 Fraud detection and analysis

Publications (1)

Publication Number Publication Date
CN105556552A true CN105556552A (en) 2016-05-04

Family

ID=51625388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480026670.9A Pending CN105556552A (en) 2013-03-13 2014-03-13 Fraud detection and analysis

Country Status (4)

Country Link
EP (1) EP2973282A4 (en)
CN (1) CN105556552A (en)
CA (1) CA2905996C (en)
WO (1) WO2014160296A1 (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009152465A1 (en) 2008-06-12 2009-12-17 Guardian Analytics, Inc. Modeling users for fraud detection and analysis
US10290053B2 (en) 2009-06-12 2019-05-14 Guardian Analytics, Inc. Fraud detection and analysis
CN105809502A (en) * 2014-12-30 2016-07-27 Alibaba Group Holding Limited Transaction risk detection method and apparatus
CN104822156B (en) * 2015-04-01 2018-12-11 China United Network Communications Group Co., Ltd. Method and device for user behavior analysis
US10580006B2 (en) * 2015-07-13 2020-03-03 Mastercard International Incorporated System and method of managing data injection into an executing data processing system
US11423414B2 (en) * 2016-03-18 2022-08-23 Fair Isaac Corporation Advanced learning system for detection and prevention of money laundering
US10341369B2 (en) 2016-03-29 2019-07-02 Ncr Corporation Security system monitoring techniques by mapping received security score with newly identified security score
US10938844B2 (en) * 2016-07-22 2021-03-02 At&T Intellectual Property I, L.P. Providing security through characterizing mobile traffic by domain names
CN106529919A (en) * 2016-10-24 2017-03-22 Anhui Baimu Culture Technology Co., Ltd. Third-party online payment collaborative management system
CN108346048B (en) * 2017-01-23 2020-07-28 Alibaba Group Holding Limited Method for adjusting risk parameters, risk identification method and risk identification device
AU2018291009B2 (en) * 2017-06-30 2023-02-23 Equifax Inc. Detecting synthetic online entities facilitated by primary entities
CN107909274B (en) * 2017-11-17 2023-02-28 Ping An Technology (Shenzhen) Co., Ltd. Enterprise investment risk assessment method and device, and storage medium
EP3791296A1 (en) * 2018-05-08 2021-03-17 ABC Software, SIA A system and a method for sequential anomaly revealing in a computer network
WO2020069624A1 (en) 2018-10-05 2020-04-09 Mastercard Technologies Canada ULC Account recommendation based on server-side, persistent device identification
TR201906682A2 (en) * 2019-05-06 2019-06-21 Turkcell Teknoloji Arastirma Ve Gelistirme Anonim Sirketi A fraud detection system
US11157776B2 (en) 2019-09-20 2021-10-26 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11188320B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11080352B2 (en) 2019-09-20 2021-08-03 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11216268B2 (en) 2019-09-20 2022-01-04 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11012861B1 (en) 2020-01-09 2021-05-18 Allstate Insurance Company Fraud-detection based on geolocation data
US11538040B2 (en) * 2020-02-12 2022-12-27 Jpmorgan Chase Bank, N.A. Systems and methods for account validation
WO2022073116A1 (en) * 2020-10-06 2022-04-14 Bank Of Montreal Systems and methods for predicting operational events
CA3133404A1 (en) * 2020-10-06 2022-04-06 Bank Of Montreal Systems and methods for predicting operational events
CN112529505B (en) * 2020-12-21 2024-02-27 Beijing Shunda Tongxing Technology Co., Ltd. Method and device for detecting illegal bills, and readable storage medium
US20220358504A1 (en) * 2021-04-28 2022-11-10 Actimize Ltd. Estimating quantile values for reduced memory and/or storage utilization and faster processing time in fraud detection systems
US20220398310A1 (en) * 2021-06-09 2022-12-15 Mastercard Technologies Canada ULC Sftp batch processing and credentials api for offline fraud assessment
CN113243918B (en) * 2021-06-11 2021-11-30 Shenzhen Banruo Computer Systems Co., Ltd. Risk detection method and device based on multimodal hidden information testing
US20220405659A1 (en) * 2021-06-16 2022-12-22 International Business Machines Corporation Data-driven automated model impact analysis
CN114462018B (en) * 2022-01-10 2023-05-30 University of Electronic Science and Technology of China Password guessing system and method based on a Transformer model and deep reinforcement learning
US12088463B1 (en) 2023-01-27 2024-09-10 Wells Fargo Bank, N.A. Automated configuration of software applications

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119103A (en) * 1997-05-27 2000-09-12 Visa International Service Association Financial risk prediction systems and methods therefor
US8924279B2 (en) * 2009-05-07 2014-12-30 Visa U.S.A. Inc. Risk assessment rule set application for fraud prevention
US8626663B2 (en) * 2010-03-23 2014-01-07 Visa International Service Association Merchant fraud risk score
CA2821095C (en) * 2010-12-14 2018-10-02 Early Warning Services, Llc System and method for detecting fraudulent account access and transfers

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102203724A (en) * 2008-06-12 2011-09-28 Guardian Analytics, Inc. Modeling users for fraud detection and analysis
CN102812488A (en) * 2010-02-08 2012-12-05 Visa International Service Association Fraud reduction system for transactions

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109690547A (en) * 2016-07-11 2019-04-26 Bitdefender IPR Management Ltd. System and method for detecting online fraud
CN109690547B (en) * 2016-07-11 2023-05-05 Bitdefender IPR Management Ltd. System and method for detecting online fraud
TWI673669B (en) * 2016-07-21 2019-10-01 Alibaba Group Services Limited Modeling method and device for an evaluation model
WO2018014811A1 (en) * 2016-07-22 2018-01-25 Alibaba Group Holding Limited Risk identification method, client device, and risk identification system
CN107645533A (en) * 2016-07-22 2018-01-30 Alibaba Group Holding Limited Data processing method, data sending method, risk identification method, and device
JP2019533851A (en) * 2016-07-22 2019-11-21 Alibaba Group Holding Limited Aggregation of service data for transmission and risk analysis
US11570194B2 (en) 2016-07-22 2023-01-31 Advanced New Technologies Co., Ltd. Identifying high risk computing operations
US11075938B2 (en) 2016-07-22 2021-07-27 Advanced New Technologies Co., Ltd. Identifying high risk computing operations
WO2018099276A1 (en) * 2016-11-30 2018-06-07 Alibaba Group Holding Limited Identity authentication method and apparatus, and computing device
CN106802879A (en) * 2017-01-13 2017-06-06 Dalian University of Technology Structural monitoring data anomaly identification method based on multivariate statistical analysis
CN108512822A (en) * 2017-02-28 2018-09-07 Alibaba Group Holding Limited Risk identification method and device for data processing events
CN109120428B (en) * 2017-06-26 2022-04-19 Nanjing Xingyun Digital Technology Co., Ltd. Method and system for risk control analysis
CN109120429B (en) * 2017-06-26 2022-04-15 Nanjing Xingyun Digital Technology Co., Ltd. Risk identification method and system
CN109120428A (en) * 2017-06-26 2019-01-01 Suning Commerce Group Co., Ltd. Method and system for risk control analysis
CN109120429A (en) * 2017-06-26 2019-01-01 Suning Commerce Group Co., Ltd. Risk identification method and system
CN109213736A (en) * 2017-06-29 2019-01-15 Alibaba Group Holding Limited Log compression method and device
US10891161B2 (en) 2017-09-27 2021-01-12 Advanced New Technologies Co., Ltd. Method and device for virtual resource allocation, modeling, and data prediction
US10691494B2 (en) 2017-09-27 2020-06-23 Alibaba Group Holding Limited Method and device for virtual resource allocation, modeling, and data prediction
WO2019062697A1 (en) * 2017-09-27 2019-04-04 Alibaba Group Holding Limited Method and device for virtual resource allocation, model establishment and data prediction
CN108038692B (en) * 2017-11-06 2021-06-01 Advanced New Technologies Co., Ltd. Role identification method and device, and server
CN108038692A (en) * 2017-11-06 2018-05-15 Alibaba Group Holding Limited Role identification method and device, and server
US11526766B2 (en) 2017-12-15 2022-12-13 Advanced New Technologies Co., Ltd. Graphical structure model-based transaction risk control
US11526936B2 (en) 2017-12-15 2022-12-13 Advanced New Technologies Co., Ltd. Graphical structure model-based credit risk control
CN109934706B (en) * 2017-12-15 2021-10-29 Advanced New Technologies Co., Ltd. Transaction risk control method, device and equipment based on graph structure model
CN109934706A (en) * 2017-12-15 2019-06-25 Alibaba Group Holding Limited Transaction risk control method, device and equipment based on graph structure model
WO2019119260A1 (en) * 2017-12-19 2019-06-27 Paypal Inc Universal model scoring engine
US11080637B2 (en) 2017-12-19 2021-08-03 Paypal, Inc. Universal model scoring engine
US11615362B2 (en) 2017-12-19 2023-03-28 Paypal, Inc. Universal model scoring engine
CN112567710A (en) * 2018-08-09 2021-03-26 Microsoft Technology Licensing, LLC System and method for polluting phishing campaign responses
US12015639B2 (en) 2018-08-09 2024-06-18 Microsoft Technology Licensing, Llc Systems and methods for polluting phishing campaign responses
CN112567710B (en) * 2018-08-09 2023-08-18 Microsoft Technology Licensing, LLC System and method for contaminating phishing campaign responses
CN109670930A (en) * 2018-09-13 2019-04-23 OneConnect Smart Technology Co., Ltd. (Shenzhen) Malicious device identification method, apparatus, device, and computer-readable storage medium
CN108989359A (en) * 2018-10-12 2018-12-11 Suzhou Chuanglv Tianxia Information Technology Co., Ltd. Login verification method and system for a server cluster, readable storage medium, and terminal
CN110084525A (en) * 2019-05-05 2019-08-02 Chongqing Tianpeng Network Co., Ltd. Work management engine method and device based on business requirements
CN110210868A (en) * 2019-05-20 2019-09-06 Tencent Technology (Shenzhen) Co., Ltd. Processing method for numerical value transfer data, and electronic device
CN110210868B (en) * 2019-05-20 2022-12-30 Tencent Technology (Shenzhen) Co., Ltd. Numerical value transfer data processing method and electronic device
CN110585719A (en) * 2019-09-08 2019-12-20 Beijing Elex Technology Co., Ltd. Method, device and server for identifying potential cheating players in mobile games
CN110827317A (en) * 2019-11-04 2020-02-21 Xi'an University of Posts and Telecommunications FPGA-based four-eye moving target detection and identification device and method
CN110827317B (en) * 2019-11-04 2023-05-12 Xi'an University of Posts and Telecommunications Four-eye moving target detection and identification device and method based on FPGA
CN111724250A (en) * 2020-06-29 2020-09-29 OneConnect Smart Technology Co., Ltd. (Shenzhen) Risk propagation determination method and device, computer system, and readable storage medium
CN112016851B (en) * 2020-09-14 2022-11-08 Alipay (Hangzhou) Information Technology Co., Ltd. Management method and device for information disclosure
CN112016851A (en) * 2020-09-14 2020-12-01 Alipay (Hangzhou) Information Technology Co., Ltd. Management method and device for information disclosure
CN112765003A (en) * 2020-12-31 2021-05-07 North China University of Technology Risk prediction method based on APP behavior logs
CN112765003B (en) * 2020-12-31 2021-09-14 North China University of Technology Risk prediction method based on APP behavior logs
CN114143807A (en) * 2021-10-27 2022-03-04 Zhongying Youchuang Information Technology Co., Ltd. Method and device for evaluating the integrity rate of route registration
CN114143807B (en) * 2021-10-27 2023-08-08 Zhongying Youchuang Information Technology Co., Ltd. Route registration integrity rate evaluation method and device
CN115563284A (en) * 2022-10-24 2023-01-03 Chongqing University of Technology Semantics-based deep multi-instance weakly supervised text classification method
CN118469298A (en) * 2024-07-09 2024-08-09 Chengdu IceKredit Information Technology Co., Ltd. Business risk prediction method and system based on big data analysis

Also Published As

Publication number Publication date
EP2973282A4 (en) 2016-11-16
CA2905996A1 (en) 2014-10-02
EP2973282A1 (en) 2016-01-20
WO2014160296A1 (en) 2014-10-02
CA2905996C (en) 2022-07-19

Similar Documents

Publication Title
CN105556552A (en) Fraud detection and analysis
US10290053B2 (en) Fraud detection and analysis
US11755628B2 (en) Data relationships storage platform
US11354301B2 (en) Multi-system operation audit log
US11755586B2 (en) Generating enriched events using enriched data and extracted features
US20210112101A1 (en) Data set and algorithm validation, bias characterization, and valuation
US11625371B2 (en) Automated audit balance and control processes for data stores
US10657530B2 (en) Automated transactions clearing system and method
Southerton Datafication
CN110188103A (en) Data reconciliation method, device, equipment and storage medium
US20230409565A1 (en) Data aggregation with microservices
Alghushairy et al. Data storage
Imran et al. Data provenance
US20140122163A1 (en) External operational risk analysis
US11810012B2 (en) Identifying event distributions using interrelated events
WO2023175413A1 (en) Mutual exclusion data class analysis in data governance
Hogan Data center
US11593406B2 (en) Dynamic search parameter modification
EP2698966A2 (en) Tracking end-users in web databases
Wen Data sharing
Zhang Data Synthesis
US20230044695A1 (en) System and method for a scalable dynamic anomaly detector
Chen Digital Ecosystem
Agrawal Data Virtualization
Ferreira Demographic Data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2016-05-04