US9141686B2 - Risk analysis using unstructured data - Google Patents

Info

Publication number
US9141686B2
Authority
US
United States
Prior art keywords
words
individual
structured form
organization
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/672,012
Other versions
US20140129561A1 (en)
Inventor
Daniel C. Kern
David A. Hogeboom
Anne Bromstead
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of America Corp
Original Assignee
Bank of America Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of America Corp
Priority to US13/672,012
Assigned to BANK OF AMERICA CORPORATION. Assignment of assignors interest (see document for details). Assignors: BROMSTEAD, ANNE; HOGEBOOM, DAVID A.; KERN, DANIEL C.
Publication of US20140129561A1
Application granted
Publication of US9141686B2
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F17/30598

Definitions

  • This invention relates generally to risk analysis, and more particularly to risk analysis using unstructured data.
  • Information about risks an organization faces or could face is documented in a variety of systems and sources. This information usually contains unstructured information (i.e., bodies of text) and is not typically easy to quantify and automatically analyze. The information is typically written to allow readers to understand the intended message. While a reader may read small bodies of separate texts, it is impracticable for a reader to read and summarize many bodies of text in a reasonable amount of time.
  • According to embodiments of the present disclosure, disadvantages and problems associated with analyzing risk using unstructured data may be reduced or eliminated.
  • In certain embodiments, unstructured data is received from a plurality of sources to facilitate risk analysis.
  • The unstructured data comprises a plurality of bodies of text.
  • Each body of text from the unstructured data is deconstructed into individual terms.
  • The individual terms from each body of text are converted into a structured form.
  • The individual terms in the structured form are categorized according to a comparison of the structured form to another structured form.
  • The individual terms in the structured form are quantified according to at least the categorization of the individual terms.
  • Certain embodiments of the present disclosure may provide one or more technical advantages.
  • A technical advantage of one embodiment includes extracting, analyzing, and summarizing useful information from various unstructured sources to manage operational risks.
  • Another technical advantage of an embodiment includes transforming unstructured data into a structured form to determine risk.
  • Yet another technical advantage of an embodiment includes identifying and aggregating risks across an organization to manage.
  • FIG. 1 illustrates a block diagram of a system for analyzing risk using unstructured data.
  • FIG. 2 illustrates an example flowchart that analyzes risk using unstructured data.
  • In FIGS. 1 through 2 of the drawings, like numerals are used for like and corresponding parts of the various drawings.
  • Unstructured data represents data that has no easily identifiable structure or consistent and recurring patterns. It is difficult for a reader to understand and summarize a lot of information in a reasonable amount of time. Unstructured data may include substantial text, which results in irregularities and ambiguities that make it difficult for a computer to understand. Therefore, it is advantageous to provide a system and method that employs text mining techniques on bodies of unstructured text to determine patterns of risks, identify emerging risks, and compare external risk data to internal risk data.
  • FIG. 1 illustrates a block diagram of a system for analyzing risk using unstructured data.
  • System 10 includes computers 12, data sources 18, a competitor database 20, a vendor database 22, and a marketing database 24 that communicate over one or more networks 16 with risk analysis module 26 to facilitate the structuring of unstructured data.
  • The unstructured data is structured to determine patterns of risk, identify emerging risks, and compare external risk data to internal risk data.
  • Organization 11 comprises computers 12, competitor database 20, vendor database 22, marketing database 24, and risk analysis module 26.
  • Organization 11 represents an entity in any suitable industry that manages risk.
  • Organization 11 may include companies of any suitable size that evaluate operational risk to manage and identify risk of the organization.
  • Third parties may include any suitable entity that is external to organization 11 , such as vendors of organization 11 , competitors of organization 11 , or entities in industries different from organization 11 .
  • System 10 includes computers 12a-12n, where n represents any suitable number, that communicate with risk analysis module 26 through network 16.
  • Computer 12a communicates with risk analysis module 26 to identify the sources from which to retrieve unstructured data.
  • Computer 12 receives quantified and structured data from risk analysis module 26 in a graphical format.
  • Risk managers, associates, employees, or other suitable individuals in the organization use computer 12.
  • Computer 12 may include a personal computer, a workstation, a laptop, a wireless or cellular telephone, an electronic notebook, a personal digital assistant, a smartphone, a netbook, a tablet, a slate personal computer, or any other device (wireless, wireline, or otherwise) capable of receiving, processing, storing, and/or communicating information with other components of system 10 .
  • Computer 12 may also comprise a user interface, such as a display, keyboard, mouse, or other appropriate terminal equipment.
  • Computer 12 may include a graphical user interface (GUI) 14.
  • GUI 14 may display analyzed external data in a particular format to a user of computer 12 .
  • GUI 14 is generally operable to tailor and filter data entered by and presented to the user.
  • GUI 14 may provide the user with an efficient and user-friendly presentation of information using a plurality of displays having interactive fields, pull-down lists, and buttons operated by the user.
  • GUI 14 may include multiple levels of abstraction including groupings and boundaries. It should be understood that the term GUI 14 may be used in the singular or in the plural to describe one or more GUIs 14 in each of the displays of a particular GUI 14 .
  • Network 16 represents any suitable network operable to facilitate communication between the components of system 10 , such as computers 12 , data sources 18 , competitor database 20 , vendor database 22 , marketing database 24 , and risk analysis module 26 .
  • Network 16 may include any interconnecting system capable of transmitting audio, video, signals, data, messages, or any combination of the preceding.
  • Network 16 may include all or a portion of a public switched telephone network (PSTN), a public or private data network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a local, regional, or global communication or computer network, such as the Internet, a wireline or wireless network, an enterprise intranet, or any other suitable communication link, including combinations thereof, operable to facilitate communication between the components.
  • Data sources 18 represent components that are external to organization 11 that provide unstructured data associated with organization 11 and/or third parties to risk analysis module 26 .
  • Data sources 18 may provide unbiased, independent information for analysis.
  • Data source 18 may include regulatory filings associated with third parties or organization 11, such as filings made with the Securities and Exchange Commission (e.g., 10-Ks and 10-Qs).
  • Data sources 18 may also include press releases, news, events, subscription-based information, or any other digital media that may be related to organization 11 or a third party.
  • Data sources 18 may include independent professional research materials. Data sources 18 may be scanned for targeted, repeatable information.
  • Data sources 18 may include a network server, any suitable remote server, a mainframe, a host computer, a workstation, a web server, a personal computer, a file server, or any other suitable device operable to communicate with other components in system 10 and process data.
  • Data source 18 may execute any suitable operating system such as IBM's zSeries/Operating System (z/OS), MS-DOS, PC-DOS, MAC-OS, WINDOWS, a .NET environment, UNIX, OpenVMS, or any other appropriate operating system, including future operating systems.
  • The functions of data source 18 may be performed by any suitable combination of one or more servers or other components at one or more locations.
  • The server may be a private server, and the server may be a virtual or physical server.
  • Data source 18 may include any suitable component that functions as a server.
  • Competitor database 20 stores, either permanently or temporarily, information associated with competitors of organization 11 .
  • Competitor database 20 is within organization 11 and represents information that organization 11 compiles associated with its competitors.
  • The information stored in competitor database 20 may include, but is not limited to, press releases, regulatory filing information, professional research materials, or other suitable competitor analysis information.
  • Risk analysis module 26 may communicate with competitor database 20 to receive information associated with competitors of organization 11 .
  • Competitor database 20 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information.
  • Competitor database 20 may include Random Access Memory (RAM), Read Only Memory (ROM), magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
  • Vendor database 22 stores, either permanently or temporarily, information associated with vendors of organization 11 . Vendor database 22 is within organization 11 and represents information that organization 11 compiles associated with its vendors. The information stored in vendor database 22 may include, but is not limited to, press releases, regulatory filing information, professional research materials, performance information, relationship information, financial data, or other suitable vendor analysis information. Risk analysis module 26 may communicate with vendor database 22 to receive information associated with vendors of organization 11 . Vendor database 22 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information. For example, vendor database 22 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
  • Marketing database 24 stores, either permanently or temporarily, information associated with organization 11 or other third parties.
  • Marketing database 24 is within organization 11 and represents information that organization 11 compiles regarding itself and third parties.
  • Marketing database 24 stores information on third parties that are not vendors or competitors.
  • The information stored in marketing database 24 may include, but is not limited to, press releases, regulatory filing information, professional research materials, or other suitable marketing information.
  • Risk analysis module 26 may communicate with marketing database 24 to receive information associated with organization 11.
  • Marketing database 24 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information.
  • Marketing database 24 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
  • Risk analysis module 26 represents any suitable component that facilitates the analysis of unstructured data to identify external and internal risks. Risk analysis module 26 receives data from data sources 18 , competitor database 20 , vendor database 22 , and/or marketing database 24 and analyzes the received data to identify risks or emerging risks of organization 11 . In an embodiment, risk analysis module 26 receives unstructured data from the various sources to analyze. Additionally, risk analysis module 26 may create reports based on the analysis, and may communicate the reports to computer 12 .
  • Risk analysis module 26 may include a network server, any suitable remote server, a mainframe, a host computer, a workstation, a web server, a personal computer, a file server, or any other suitable device operable to communicate with computers 12 , data sources 18 , competitor database 20 , vendor database 22 , and/or marketing database 24 .
  • Risk analysis module 26 may execute any suitable operating system such as IBM's zSeries/Operating System (z/OS), MS-DOS, PC-DOS, MAC-OS, WINDOWS, UNIX, OpenVMS, or any other appropriate operating system, including future operating systems.
  • The functions of risk analysis module 26 may be performed by any suitable combination of one or more servers or other components at one or more locations.
  • Where risk analysis module 26 is a server, the server may be a private server, and the server may be a virtual or physical server.
  • The server may include one or more servers at the same or remote locations.
  • Risk analysis module 26 may include any suitable component that functions as a server.
  • Risk analysis module 26 includes a network interface 28, a processor 30, and a memory 32.
  • Network interface 28 represents any suitable device operable to receive information from network 16 , transmit information through network 16 , perform processing of information, communicate with other devices, or any combination of the preceding.
  • Network interface 28 receives competitor information from competitor database 20.
  • Network interface 28 receives information external to organization 11 from data sources 18.
  • Network interface 28 may communicate reports based on the analysis of the received data to computers 12.
  • Network interface 28 represents any port or connection, real or virtual, including any suitable hardware and/or software, including protocol conversion and data processing capabilities, to communicate through a LAN, WAN, MAN, or other communication system that allows risk analysis module 26 to exchange information with network 16 , data sources 18 , competitor database 20 , vendor database 22 , marketing database 24 , or other components of system 10 .
  • Processor 30 communicatively couples to network interface 28 and memory 32 , and controls the operation and administration of risk analysis module 26 by processing information received from network interface 28 and memory 32 .
  • Processor 30 includes any hardware and/or software that operates to control and process information.
  • Processor 30 executes analysis rules 34 to control the operation of risk analysis module 26.
  • Processor 30 may be a programmable logic device, a microcontroller, a microprocessor, any suitable processing device, or any suitable combination of the preceding.
  • Memory 32 stores, either permanently or temporarily, data, operational software, or other information for processor 30 .
  • Memory 32 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information.
  • Memory 32 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or a combination of these devices. While illustrated as including particular modules, memory 32 may include any suitable information for use in the operation of risk analysis module 26.
  • Memory 32 includes analysis rules 34 and vectors 36.
  • Analysis rules 34 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of risk analysis module 26 .
  • Analysis rules 34 facilitate the analysis of data received by risk analysis module 26.
  • Analysis rules 34 facilitate the decomposition of unstructured data into a structured form.
  • Analysis rules 34 may facilitate categorizing the data in the structured form and quantifying the data in the structured form.
  • Vectors 36 generally refer to the structured form of the retrieved, unstructured data.
  • Vectors 36 may represent bodies of text that have been converted into a list of terms. For example, each paragraph in an article may be converted into a list of terms. Therefore, if the article includes ten paragraphs, there will be ten vectors 36 stored in risk analysis module 26 .
  • Each article, press release, or other suitable compilation of information may be converted into a list of terms, and the list of terms from the compilation is converted into a vector 36. Therefore, each article, press release, or other suitable body of text is associated with a vector 36.
  • Each term in the list of terms may be quantified using any suitable technique.
  • Each term in the list of terms may be associated with a number that represents the number of times the term appears in the body of text. For example, if a paragraph includes the term “risk” five times, vector 36 associated with that paragraph will include “risk” and the number “5” by the term.
  • Risk analysis module 26 may quantify the terms based on expert opinion or structured data. Terms may also be quantified based on their association with a materialized risk. For example, if a risk has materialized, then risk analysis module 26 determines the text associated with that materialized risk, and scores the text based on the association.
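The term-count form of vectors 36 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `to_vector` and `article_to_vectors`, the regular-expression tokenizer, and the sample article are hypothetical and do not come from the patent, which leaves the structured form open.

```python
import re
from collections import Counter
from typing import List


def to_vector(body_of_text: str) -> Counter:
    """Convert one body of text into a term -> count mapping (hypothetical helper)."""
    # Assumption: a term is a lowercase run of letters, digits, or apostrophes.
    terms = re.findall(r"[a-z0-9']+", body_of_text.lower())
    return Counter(terms)


def article_to_vectors(article: str) -> List[Counter]:
    """Produce one vector per paragraph; paragraphs are separated by blank lines."""
    paragraphs = [p for p in re.split(r"\n\s*\n", article) if p.strip()]
    return [to_vector(p) for p in paragraphs]


# Invented sample text with two paragraphs.
article = (
    "Risk rose this quarter. Operational risk is the main risk.\n\n"
    "The vendor reported a large loss."
)
vectors = article_to_vectors(article)
print(len(vectors), vectors[0]["risk"])
```

A two-paragraph article yields two vectors, and the first one records how often "risk" appears in that paragraph, mirroring the "risk"/"5" example in the text.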
  • Risk analysis module 26 receives data that is internal to organization 11, data that is external to organization 11, and data that is internal and external to organization 11. After receiving the data to analyze, risk analysis module 26 deconstructs a plurality of bodies of text into individual terms. Risk analysis module 26 converts the individual terms into a structured form. Once in the structured form, risk analysis module 26 categorizes individual terms and quantifies the individual terms. Upon completion of the analysis, risk analysis module 26 creates a report based on the analysis and communicates the report to computers 12 for further use within organization 11.
  • A component of system 10 may include an interface, logic, memory, and/or other suitable element.
  • An interface receives input, sends output, processes the input and/or output and/or performs other suitable operations.
  • An interface may comprise hardware and/or software.
  • Logic performs the operation of the component. For example, logic executes instructions to generate output from input.
  • Logic may include hardware, software, and/or other logic.
  • Logic may be encoded in one or more tangible media, such as a computer-readable medium or any other suitable tangible medium, and may perform operations when executed by a computer.
  • Certain logic, such as a processor, may manage the operation of a component. Examples of a processor include one or more computers, one or more microprocessors, one or more applications, and/or other logic.
  • System 10 may include any number of computers 12, data sources 18, competitor databases 20, vendor databases 22, marketing databases 24, and risk analysis modules 26.
  • Organization 11 may include an organization credit risk database, which includes information regarding risk factors that organization 11 has in different countries. Any suitable logic may perform the functions of system 10 and the components within system 10.
  • FIG. 2 illustrates an example flowchart that analyzes risk using unstructured data.
  • Risk analysis module 26 receives unstructured data.
  • Risk analysis module 26 may receive data internal to organization 11 from competitor database 20, vendor database 22, and/or marketing database 24.
  • Risk analysis module 26 may receive data external to organization 11 from data sources 18.
  • The internal and external data may include unstructured data regarding organization 11 and/or third parties.
  • Risk analysis module 26 may provide insight into the risk landscape faced by competitors in a summarized fashion.
  • Risk analysis module 26 receives unstructured data internal to organization 11, but related to various business units or lines of business within organization 11.
  • Risk analysis module 26 may receive text related to documented issues, emerging risks and risk, meeting minutes, newsletters, or other suitable internal data.
  • Receiving data internal to organization 11, but from varying parts of organization 11, facilitates a comparison of documented issues internal to organization 11 to identify trends or patterns.
  • Risk analysis module 26 receives unstructured data related to third parties and related to organization 11, which may facilitate an analysis of emerging risks that have not materialized for organization 11. Therefore, organization 11 may identify a risk event that materialized at a third party that aligns to documented issues at organization 11, and mitigate that risk before it materializes for organization 11.
  • Risk analysis module 26 deconstructs the bodies of text from the unstructured data into individual terms.
  • A body of text may include any suitable division of the unstructured data.
  • A body of text may include a paragraph within the unstructured data.
  • Risk analysis module 26 removes insignificant words from the individual terms.
  • Insignificant words may include common words, such as “the,” “a,” “an,” and other common words.
  • Insignificant words may also include words that do not have a significant meaning for risk analysis. For example, the word “average” may be removed because it does not have particular significance in the risk context, but the word “large” may remain in the group of individual terms, which may identify a large loss, a large amount, or another significant piece of information in the risk context.
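The deconstruction and filtering described above might look like the following minimal sketch. The two word sets are hypothetical: the patent names only "the," "a," "an," and "average" as examples of insignificant words, so the remaining entries in `COMMON_WORDS` are assumptions added for illustration.

```python
import re

# Assumption: illustrative word lists; only "the", "a", "an", and "average"
# are named in the text as examples of insignificant words.
COMMON_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}
LOW_SIGNIFICANCE_WORDS = {"average"}


def deconstruct(body_of_text):
    """Split a body of text into lowercase individual terms."""
    return re.findall(r"[a-z0-9']+", body_of_text.lower())


def remove_insignificant(terms):
    """Drop common words and words without significance in the risk context."""
    drop = COMMON_WORDS | LOW_SIGNIFICANCE_WORDS
    return [t for t in terms if t not in drop]


terms = remove_insignificant(
    deconstruct("The average vendor reported a large loss.")
)
print(terms)
```

In this sketch "large" survives the filter while "average" is dropped, matching the distinction the text draws between the two words.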
  • Risk analysis module 26 converts the remaining individual terms into a structured form in step 208.
  • The structured form may include any suitable form that facilitates the organization of the individual terms, such as a vector, a list, or a column.
  • Risk analysis module 26 may count the number of instances of each term in the body of text. For example, if a paragraph includes the term “risk” five times, vector 36 associated with that paragraph will include “risk” and the number “5” by the term.
  • Risk analysis module 26 may quantify the terms based on expert opinion or structured data. Terms may also be quantified based on their association with a materialized risk. For example, if a risk has materialized, then risk analysis module 26 determines the text associated with that materialized risk, and scores the text based on the association.
  • Risk analysis module 26 determines whether additional bodies of text from the received data need to be put in structured form. If there is additional data to deconstruct and convert into structured form, the method returns to step 204. If the received data has been deconstructed and converted, the method proceeds to step 212.
  • Risk analysis module 26 categorizes the individual terms included in the structured form. For example, risk analysis module 26 links the terms to specific categories. These categories may include, but are not limited to, the following: organization name, geographical region, size of organization, number of employees, number of countries represented, public organization, private organization, regulatory body, industry, fine amount, or any other suitable category. In an embodiment, risk analysis module 26 may recommend additional categories in which the individual terms may be categorized. When categorizing the individual terms, risk analysis module 26 may use any suitable algorithm to compare the data between a plurality of structured forms, such as a Bayesian inference. For example, risk analysis module 26 compares a plurality of structured forms to identify clusters or groups of structured forms that represent groups of similar terms. In an embodiment, risk analysis module 26 may determine the size of each cluster, which may facilitate additional review of the data, as described below with respect to step 218.
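One way to compare structured forms and group similar ones is sketched below. The text allows "any suitable algorithm" and names Bayesian inference only as an example; this sketch swaps in cosine similarity with a greedy threshold grouping, which is an assumption chosen for brevity. The function names, the 0.5 threshold, and the sample vectors are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy grouping: a vector joins the first cluster whose first member
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented structured forms for three bodies of text.
vectors = [
    Counter({"fine": 2, "regulator": 1}),
    Counter({"fine": 1, "regulator": 1, "penalty": 1}),
    Counter({"outage": 2, "vendor": 1}),
]
clusters = cluster(vectors)
sizes = [len(members) for members in clusters]
print(clusters, sizes)
```

The cluster sizes computed at the end correspond to the size-of-cluster signal that the text says may prompt additional review.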
  • Risk analysis module 26 quantifies the terms included in the structured form. For example, risk analysis module 26 weights each term in the structured form. In an embodiment, the weighting of the terms is predefined by an administrator. In another embodiment, risk analysis module 26 may learn the significance of terms during the implementation of the method, and may determine the appropriate weighting for the terms based on past information. Additionally, risk analysis module 26 may link the terms to quantifiable data when quantifying the terms. To link a term to quantifiable data, risk analysis module 26 determines which terms have associated quantifiable data and then links the terms to the quantifiable data. For example, a term may be categorized as “fine amount” and may have quantifiable data associated with that term. Furthermore, risk analysis module 26 may link the individual terms to a third party, such as a competitor of an organization. By linking individual terms to a third party, risk analysis module 26 may summarize the data associated with the third party.
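The weighting and linking steps above can be illustrated with a short sketch. It assumes administrator-defined weights held in a dictionary and linked records held as plain dictionaries; the weight values, field names, third-party names, and amounts are invented for illustration and are not taken from the patent.

```python
from collections import Counter, defaultdict

# Assumption: weights predefined by an administrator; values are invented.
TERM_WEIGHTS = {"fine": 3.0, "loss": 2.5, "outage": 2.0}
DEFAULT_WEIGHT = 1.0


def score(vector, weights=TERM_WEIGHTS):
    """Weighted sum of term counts for one structured form."""
    return sum(
        count * weights.get(term, DEFAULT_WEIGHT)
        for term, count in vector.items()
    )


def summarize_by_third_party(records):
    """Total the quantifiable data linked to each third party."""
    totals = defaultdict(float)
    for record in records:
        totals[record["third_party"]] += record["value"]
    return dict(totals)


total = score(Counter({"fine": 2, "regulator": 1}))

# Hypothetical terms categorized as "fine amount" and linked to a third party.
linked = [
    {"term": "fine", "category": "fine amount", "value": 250000.0, "third_party": "Competitor A"},
    {"term": "fine", "category": "fine amount", "value": 100000.0, "third_party": "Competitor A"},
    {"term": "fine", "category": "fine amount", "value": 40000.0, "third_party": "Competitor B"},
]
summary = summarize_by_third_party(linked)
print(total, summary)
```

A learned weighting, which the text also contemplates, would replace the fixed `TERM_WEIGHTS` table with values estimated from past information.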
  • Risk analysis module 26 may create a report in step 216 with the analyzed data.
  • The report may take any suitable form that presents the information in a graphical and/or numerical form.
  • The report may include a heat map, a bar chart, or other suitable representation.
  • This report is communicated to computers 12 in step 218 and used in various instances.
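As a rough illustration of a report in graphical and numerical form, the sketch below renders category scores as a plain-text bar chart. The `bar_chart` function, the category labels, and the scores are hypothetical; an actual report could equally be a heat map or another representation.

```python
def bar_chart(scores, width=20):
    """Render category scores as text bars, highest score first."""
    top = max(scores.values(), default=0)
    lines = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for label, value in ranked:
        # Scale each bar relative to the largest score.
        bar = "#" * (round(width * value / top) if top else 0)
        lines.append(f"{label:<12} {bar} {value:g}")
    return "\n".join(lines)


# Invented quantified results per risk category.
chart = bar_chart({"regulatory": 8, "vendor": 4, "fraud": 2})
print(chart)
```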
  • The reports may be analyzed to determine specific risks that need additional review.
  • Risk analysis module 26 may identify a plurality of risks that are quantified as low that may need additional review.
  • Risk analysis module 26 includes criteria to determine whether a plurality of low-risk issues meet a particular threshold to trigger additional review and communicates that information in the report.
  • Risk analysis module 26 may identify high-risk issues for additional review. In this embodiment, a single high-risk issue meets a particular threshold to trigger additional review, and risk analysis module 26 communicates that information in the report.
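The two review triggers described above, many low-risk issues together or one high-risk issue alone, can be sketched as a pair of threshold checks. The function name, the threshold values, and the reason strings are assumptions; the patent does not specify the criteria.

```python
def review_reasons(risk_scores, high_threshold=8.0, low_ceiling=3.0,
                   low_count_threshold=3):
    """Return the reasons, if any, that a set of scored issues needs review."""
    reasons = []
    high = [s for s in risk_scores if s >= high_threshold]
    low = [s for s in risk_scores if s <= low_ceiling]
    if high:
        # A single high-risk issue is enough to trigger review.
        reasons.append("single high-risk issue")
    if len(low) >= low_count_threshold:
        # Low-risk issues trigger review only in sufficient number.
        reasons.append("plurality of low-risk issues")
    return reasons


print(review_reasons([1.0, 2.0, 2.5]))
print(review_reasons([9.0]))
print(review_reasons([5.0]))
```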
  • Risk analysis module 26 determines whether to repeat the analysis. Risk analysis module 26 may repeat the analysis at different points in time to identify trends, forecast potential risks, and/or compare external risks to internal risks. If the process begins again, the method continues from step 202; otherwise, the method ends. Risk analysis module 26 may receive information from the various data sources continuously, and may implement the analysis process on a predetermined schedule. For example, risk analysis module 26 may perform the analysis on a quarterly basis, on a monthly basis, on a weekly basis, or during any predetermined time period.
  • Risk analysis module 26 may determine synonyms for the individual terms in the structured form and may use the synonyms to make the terms more consistent across a plurality of structured forms. Additionally, steps may be performed in parallel or in any suitable order. While discussed as risk analysis module 26 performing the steps, any suitable component of system 10 may perform one or more steps of the method.
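Synonym handling of the kind mentioned above could be sketched as a mapping from each synonym to a canonical term, applied before structured forms are compared. The `SYNONYMS` table and its entries are hypothetical; the patent does not say how synonyms are determined.

```python
from collections import Counter

# Assumption: a hand-built synonym table mapping variants to a canonical term.
SYNONYMS = {"penalty": "fine", "sanction": "fine", "supplier": "vendor"}


def normalize(vector):
    """Merge the counts of synonymous terms under one canonical term."""
    merged = Counter()
    for term, count in vector.items():
        merged[SYNONYMS.get(term, term)] += count
    return merged


normalized = normalize(Counter({"penalty": 1, "fine": 2, "supplier": 1}))
print(normalized)
```

Normalizing first means two bodies of text that use different words for the same concept produce overlapping vectors, which makes the comparison across structured forms more consistent.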

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Unstructured data is received from a plurality of sources to facilitate risk analysis. The unstructured data comprises a plurality of bodies of text. Each body of text from the unstructured data is deconstructed into individual terms. The individual terms from each body of text are converted into a structured form. The individual terms in the structured form are categorized according to a comparison of the structured form to another structured form. The individual terms in the structured form are quantified according to at least the categorization of the individual terms.

Description

TECHNICAL FIELD OF THE INVENTION
This invention relates generally to risk analysis, and more particularly to risk analysis using unstructured data.
BACKGROUND OF THE INVENTION
Information about risks an organization faces or could face is documented in a variety of systems and sources. This information usually contains unstructured information (i.e., bodies of text) and is not typically easy to quantify and automatically analyze. The information is typically written to allow readers to understand the intended message. While a reader may read small bodies of separate texts, it is impracticable for a reader to read and summarize many bodies of text in a reasonable amount of time.
SUMMARY OF THE INVENTION
According to embodiments of the present disclosure, disadvantages and problems associated with analyzing risk using unstructured data may be reduced or eliminated.
In certain embodiments, unstructured data is received from a plurality of sources to facilitate risk analysis. The unstructured data comprises a plurality of bodies of text. Each body of text from the unstructured data is deconstructed into individual terms. The individual terms from each body of text are converted into a structured form. The individual terms in the structured form are categorized according to a comparison of the structured form to another structured form. The individual terms in the structured form are quantified according to at least the categorization of the individual terms.
Certain embodiments of the present disclosure may provide one or more technical advantages. A technical advantage of one embodiment includes extracting, analyzing, and summarizing useful information from various, unstructured sources to manage operational risks. Another technical advantage of an embodiment includes transforming unstructured data into a structured form to determine risk. Yet another technical advantage of an embodiment includes identifying and aggregating risks across an organization to manage.
Certain embodiments of the present disclosure may include some, all, or none of the above advantages. One or more other technical advantages may be readily apparent to those skilled in the art from the figures, descriptions, and claims included herein.
BRIEF DESCRIPTION OF THE DRAWINGS
To provide a more complete understanding of the present invention and the features and advantages thereof, reference is made to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates a block diagram of a system for analyzing risk using unstructured data; and
FIG. 2 illustrates an example flowchart that analyzes risk using unstructured data.
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention and its advantages are best understood by referring to FIGS. 1 through 2 of the drawings, like numerals being used for like and corresponding parts of the various drawings.
Organizations evaluate and manage operational risk as part of the organization's functions. To evaluate and manage that risk, organizations may employ various processes to gather information and evaluate the information that impacts the organization's risk. The information gathered from various sources may be in an unstructured form. Unstructured data represents data that has no easily identifiable structure or consistent and recurring patterns. It is difficult for a reader to understand and summarize a lot of information in a reasonable amount of time. Unstructured data may include substantial text, which results in irregularities and ambiguities that make it difficult for a computer to understand. Therefore, it is advantageous to provide a system and method that employs text mining techniques on bodies of unstructured text to determine patterns of risks, identify emerging risks, and compare external risk data to internal risk data.
FIG. 1 illustrates a block diagram of a system for analyzing risk using unstructured data. System 10 includes computers 12, data sources 18, a competitor database 20, a vendor database 22, and a marketing database 24 that communicate over one or more networks 16 with risk analysis module 26 to facilitate the structuring of unstructured data. The unstructured data is structured to determine patterns of risk, identify emerging risks, and compare external risk data to internal risk data.
In the illustrated embodiment, organization 11 comprises computers 12, competitor database 20, vendor database 22, marketing database 24, and risk analysis module 26. Organization 11 represents an entity in any suitable industry that manages risk. Organization 11 may include companies of any suitable size that evaluate operational risk to manage and identify risk of the organization. Third parties may include any suitable entity that is external to organization 11, such as vendors of organization 11, competitors of organization 11, or entities in industries different from organization 11.
System 10 includes computers 12 a-12 n, where n represents any suitable number, that communicate with risk analysis module 26 through network 16. For example, computer 12 a communicates with risk analysis module 26 to identify the sources from which to retrieve unstructured data. As another example, computer 12 receives quantified and structured data from risk analysis module 26 in a graphical format. In the illustrated embodiment, risk managers, associates, employees, or other suitable individuals in the organization use computer 12. Computer 12 may include a personal computer, a workstation, a laptop, a wireless or cellular telephone, an electronic notebook, a personal digital assistant, a smartphone, a netbook, a tablet, a slate personal computer, or any other device (wireless, wireline, or otherwise) capable of receiving, processing, storing, and/or communicating information with other components of system 10. Computer 12 may also comprise a user interface, such as a display, keyboard, mouse, or other appropriate terminal equipment.
In the illustrated embodiment, computer 12 includes a graphical user interface (“GUI”) 14 that displays information received from risk analysis module 26. For example, GUI 14 may display analyzed external data in a particular format to a user of computer 12. GUI 14 is generally operable to tailor and filter data entered by and presented to the user. GUI 14 may provide the user with an efficient and user-friendly presentation of information using a plurality of displays having interactive fields, pull-down lists, and buttons operated by the user. GUI 14 may include multiple levels of abstraction including groupings and boundaries. It should be understood that the term GUI 14 may be used in the singular or in the plural to describe one or more GUIs 14 in each of the displays of a particular GUI 14.
Network 16 represents any suitable network operable to facilitate communication between the components of system 10, such as computers 12, data sources 18, competitor database 20, vendor database 22, marketing database 24, and risk analysis module 26. Network 16 may include any interconnecting system capable of transmitting audio, video, signals, data, messages, or any combination of the preceding. Network 16 may include all or a portion of a public switched telephone network (PSTN), a public or private data network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a local, regional, or global communication or computer network, such as the Internet, a wireline or wireless network, an enterprise intranet, or any other suitable communication link, including combinations thereof, operable to facilitate communication between the components.
Data sources 18 represent components that are external to organization 11 that provide unstructured data associated with organization 11 and/or third parties to risk analysis module 26. Data sources 18 may provide unbiased, independent information for analysis. For example, data source 18 may include regulatory filings associated with third parties or organization 11, such as filings made with the Securities and Exchange Commission (e.g., 10-Ks and 10-Qs). Data sources 18 may also include press releases, news, events, subscription-based information, or any other digital media that may be related to organization 11 or a third party. Additionally, data sources 18 may include independent professional research materials. Data sources 18 may be scanned for targeted, repeatable information.
Data sources 18 may include a network server, any suitable remote server, a mainframe, a host computer, a workstation, a web server, a personal computer, a file server, or any other suitable device operable to communicate with other components in system 10 and process data. In some embodiments, data source 18 may execute any suitable operating system such as IBM's zSeries/Operating System (z/OS), MS-DOS, PC-DOS, MAC-OS, WINDOWS, a .NET environment, UNIX, OpenVMS, or any other appropriate operating system, including future operating systems. The functions of data source 18 may be performed by any suitable combination of one or more servers or other components at one or more locations. In the embodiment where data source 18 is a server, the server may be a private server, and the server may be a virtual or physical server. Also, data source 18 may include any suitable component that functions as a server.
Competitor database 20 stores, either permanently or temporarily, information associated with competitors of organization 11. Competitor database 20 is within organization 11 and represents information that organization 11 compiles associated with its competitors. The information stored in competitor database 20 may include, but is not limited to, press releases, regulatory filing information, professional research materials, or other suitable competitor analysis information. Risk analysis module 26 may communicate with competitor database 20 to receive information associated with competitors of organization 11. Competitor database 20 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information. For example, competitor database 20 may include Random Access Memory (RAM), Read Only Memory (ROM), magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
Vendor database 22 stores, either permanently or temporarily, information associated with vendors of organization 11. Vendor database 22 is within organization 11 and represents information that organization 11 compiles associated with its vendors. The information stored in vendor database 22 may include, but is not limited to, press releases, regulatory filing information, professional research materials, performance information, relationship information, financial data, or other suitable vendor analysis information. Risk analysis module 26 may communicate with vendor database 22 to receive information associated with vendors of organization 11. Vendor database 22 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information. For example, vendor database 22 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
Marketing database 24 stores, either permanently or temporarily, information associated with organization 11 or other third parties. Marketing database 24 is within organization 11 and represents information that organization 11 compiles regarding itself and third parties. For example, marketing database 24 stores information on third parties that are not vendors or competitors. The information stored in marketing database 24 may include, but is not limited to, press releases, regulatory filing information, professional research materials, or other suitable marketing information. Risk analysis module 26 may communicate with marketing database 24 to receive information associated with organization 11. Marketing database 24 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information. For example, marketing database 24 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or combination of these devices.
Risk analysis module 26 represents any suitable component that facilitates the analysis of unstructured data to identify external and internal risks. Risk analysis module 26 receives data from data sources 18, competitor database 20, vendor database 22, and/or marketing database 24 and analyzes the received data to identify risks or emerging risks of organization 11. In an embodiment, risk analysis module 26 receives unstructured data from the various sources to analyze. Additionally, risk analysis module 26 may create reports based on the analysis, and may communicate the reports to computer 12.
Risk analysis module 26 may include a network server, any suitable remote server, a mainframe, a host computer, a workstation, a web server, a personal computer, a file server, or any other suitable device operable to communicate with computers 12, data sources 18, competitor database 20, vendor database 22, and/or marketing database 24. In some embodiments, risk analysis module 26 may execute any suitable operating system such as IBM's zSeries/Operating System (z/OS), MS-DOS, PC-DOS, MAC-OS, WINDOWS, UNIX, OpenVMS, or any other appropriate operating system, including future operating systems. The functions of risk analysis module 26 may be performed by any suitable combination of one or more servers or other components at one or more locations. In the embodiment where risk analysis module 26 is a server, the server may be a private server, or the server may be a virtual or physical server. The server may include one or more servers at the same or remote locations. Also, risk analysis module 26 may include any suitable component that functions as a server. In the illustrated embodiment, risk analysis module 26 includes a network interface 28, a processor 30, and a memory 32.
Network interface 28 represents any suitable device operable to receive information from network 16, transmit information through network 16, perform processing of information, communicate with other devices, or any combination of the preceding. For example, network interface 28 receives competitor information from competitor database 20. As another example, network interface 28 receives information external to organization 11 from data sources 18. As yet another example, network interface 28 may communicate reports based on the analysis of the received data to computers 12. Network interface 28 represents any port or connection, real or virtual, including any suitable hardware and/or software, including protocol conversion and data processing capabilities, to communicate through a LAN, WAN, MAN, or other communication system that allows risk analysis module 26 to exchange information with network 16, data sources 18, competitor database 20, vendor database 22, marketing database 24, or other components of system 10.
Processor 30 communicatively couples to network interface 28 and memory 32, and controls the operation and administration of risk analysis module 26 by processing information received from network interface 28 and memory 32. Processor 30 includes any hardware and/or software that operates to control and process information. For example, processor 30 executes analysis rules 34 to control the operation of risk analysis module 26. Processor 30 may be a programmable logic device, a microcontroller, a microprocessor, any suitable processing device, or any suitable combination of the preceding.
Memory 32 stores, either permanently or temporarily, data, operational software, or other information for processor 30. Memory 32 includes any one or a combination of volatile or non-volatile local or remote devices suitable for storing information. For example, memory 32 may include RAM, ROM, magnetic storage devices, optical storage devices, or any other suitable information storage device or a combination of these devices. While illustrated as including particular modules, memory 32 may include any suitable information for use in the operation of risk analysis module 26. In the illustrated embodiment, memory 32 includes analysis rules 34 and vectors 36.
Analysis rules 34 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of risk analysis module 26. For example, analysis rules 34 facilitate the analysis of data received by risk analysis module 26. In an embodiment, analysis rules 34 facilitate the decomposition of unstructured data into a structured form. Additionally, rules 34 may facilitate categorizing the data in the structured form and quantifying the data in the structured form.
Vectors 36 generally refer to the structured form of the retrieved, unstructured data. Vectors 36 may represent bodies of text that have been converted into a list of terms. For example, each paragraph in an article may be converted into a list of terms. Therefore, if the article includes ten paragraphs, there will be ten vectors 36 stored in risk analysis module 26. As another example, each article, press release, or other suitable compilation of information may be converted into a list of terms and the list of terms from the compilation is converted into a vector 36. Therefore, each article, press release, or other suitable body of text is associated with a vector 36.
Additionally, each term in the list of terms may be quantified using any suitable technique. For example, each term in the list of terms may be associated with a number that represents the number of times the term appears in the body of text. For example, if a paragraph includes the term “risk” in it five times, vector 36 associated with that paragraph will include “risk” and the number “5” by the term. As another example, risk analysis module 26 may quantify the terms based on expert opinion or structured data. Terms may also be quantified based on their association with a materialized risk. For example, if a risk has materialized, then risk analysis module 26 determines the text associated with that materialized risk, and scores the text based on the association.
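The term-count form of vector 36 described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_vector`, the tokenizing regular expression, and the sample paragraph are all assumptions introduced here.

```python
import re
from collections import Counter
from typing import Dict


def build_vector(body_of_text: str) -> Dict[str, int]:
    """Map each lowercase term in a body of text to the number of times it appears.

    Hypothetical helper: the patent does not specify how terms are tokenized.
    """
    terms = re.findall(r"[a-z0-9']+", body_of_text.lower())
    return dict(Counter(terms))


# Sample paragraph (invented for illustration) that uses the term "risk" five times.
paragraph = (
    "Risk is rising. Credit risk and vendor risk drive operational risk; "
    "risk persists."
)
vector = build_vector(paragraph)
print(vector["risk"])
```

Under these assumptions, the entry for "risk" carries the count for that paragraph, mirroring the "risk" and "5" pairing in the text.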
In an exemplary embodiment of operation, risk analysis module 26 receives data that is internal to organization 11, data that is external to organization 11, and data that is internal and external to organization 11. After receiving the data to analyze, risk analysis module 26 deconstructs a plurality of bodies of text into individual terms. Risk analysis module 26 converts the individual terms into a structured form. Once in the structured form, risk analysis module 26 categorizes individual terms and quantifies the individual terms. Upon completion of the analysis, risk analysis module 26 creates a report based on the analysis and communicates the report to computers 12 for further use within organization 11.
A component of system 10 may include an interface, logic, memory, and/or other suitable element. An interface receives input, sends output, processes the input and/or output, and/or performs other suitable operations. An interface may comprise hardware and/or software. Logic performs the operation of the component; for example, logic executes instructions to generate output from input. Logic may include hardware, software, and/or other logic. Logic may be encoded in one or more tangible media, such as a computer-readable medium or any other suitable tangible medium, and may perform operations when executed by a computer. Certain logic, such as a processor, may manage the operation of a component. Examples of a processor include one or more computers, one or more microprocessors, one or more applications, and/or other logic.
Modifications, additions, or omissions may be made to system 10 without departing from the scope of the invention. For example, system 10 may include any number of computers 12, data sources 18, competitor databases 20, vendor databases 22, marketing databases 24, and risk analysis modules 26. As another example, organization 11 may include an organization credit risk database, which includes information regarding risk factors that organization 11 has in different countries. Any suitable logic may perform the functions of system 10 and the components within system 10.
FIG. 2 illustrates an example flowchart that analyzes risk using unstructured data. At step 202, risk analysis module 26 receives unstructured data. Risk analysis module 26 may receive data internal to organization 11 from competitor database 20, vendor database 22, and/or marketing database 24. Risk analysis module 26 may receive data external to organization 11 from data sources 18. In an embodiment, the internal and external data may include unstructured data regarding organization 11 and/or third parties.
For example, if risk analysis module 26 receives unstructured data related to competitors, the analyzed data may provide insight into the risk landscape faced by competitors in a summarized fashion. As another example, risk analysis module 26 receives unstructured data internal to organization 11, but related to various business units or lines of business within organization 11. In this embodiment, risk analysis module 26 may receive text related to documented issues, emerging risks and risk, meeting minutes, newsletters, or other suitable internal data. Receiving data internal to organization 11, but from varying parts of organization 11, facilitates a comparison of documented issues internal to organization 11 to identify trends or patterns. As yet another example, risk analysis module 26 receives unstructured data related to third parties and related to organization 11, which may facilitate an analysis of emerging risks that have not materialized for organization 11. Therefore, organization 11 may identify a risk event that materialized at a third party that aligns to documented issues at organization 11, and mitigate that risk before it materializes for organization 11.
At step 204, risk analysis module 26 deconstructs the bodies of text from the unstructured data into individual terms. A body of text may include any suitable division of the unstructured data. For example, a body of text may include a paragraph within the unstructured data. At step 206, risk analysis module 26 removes insignificant words from the individual terms. For example, insignificant words may include common words, such as “the,” “a,” “an,” and other common words. Insignificant words may also include words that do not have a significant meaning for risk analysis. For example, the word “average” may be removed because it does not have particular significance in the risk context, but the word “large” may remain in the group of individual terms, which may identify a large loss, a large amount, or another significant piece of information in the risk context.
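Steps 204 and 206 can be sketched as below. The stop list `INSIGNIFICANT_WORDS`, the function names, and the tokenizing rule are hypothetical; the patent only says that common words and words without significance in the risk context (such as "average") are removed.

```python
import re
from typing import List

# Hypothetical list of insignificant words. It mixes common words with a
# domain-specific entry ("average") that the description treats as
# insignificant in the risk context; "large" is deliberately kept.
INSIGNIFICANT_WORDS = {"the", "a", "an", "of", "and", "in", "was", "average"}


def deconstruct(body_of_text: str) -> List[str]:
    """Step 204 (sketch): split a body of text into lowercase individual terms."""
    return re.findall(r"[a-z0-9']+", body_of_text.lower())


def remove_insignificant(terms: List[str]) -> List[str]:
    """Step 206 (sketch): drop terms that appear in the insignificant-word list."""
    return [term for term in terms if term not in INSIGNIFICANT_WORDS]


terms = remove_insignificant(deconstruct("The average fine was a large loss."))
print(terms)
```

A set is used for the word list so that each membership check is constant time, which matters when many bodies of text are processed.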
Risk analysis module 26 converts the remaining individual terms into a structured form in step 208. The structured form may include any suitable form that facilitates the organization of the individual terms, such as a vector, a list, or a column. When the individual terms are converted into a structured form, risk analysis module 26 may count the number of instances of each term in the body of text. For example, if a paragraph includes the term “risk” in it five times, vector 36 associated with that paragraph will include “risk” and the number “5” by the term. As another example, risk analysis module 26 may quantify the terms based on expert opinion or structured data. Terms may also be quantified based on their association with a materialized risk. For example, if a risk has materialized, then risk analysis module 26 determines the text associated with that materialized risk, and scores the text based on the association.
At step 210, risk analysis module 26 determines whether additional bodies of text from the received data need to be put in structured form. If there is additional data to deconstruct and convert into structured form, the method returns to step 204. If the received data has been deconstructed and converted, the method proceeds to step 212.
At step 212, risk analysis module 26 categorizes the individual terms included in the structured form. For example, risk analysis module 26 links the terms to specific categories. These categories may include, but are not limited to, the following: organization name, geographical region, size of organization, number of employees, number of countries represented, public organization, private organization, regulatory body, industry, fine amount, or any other suitable category. In an embodiment, risk analysis module 26 may recommend additional categories in which the individual terms may be categorized. When categorizing the individual terms, risk analysis module 26 may use any suitable algorithm to compare the data between a plurality of structured forms, such as a Bayesian inference. For example, risk analysis module 26 compares a plurality of structured forms to identify clusters or groups of structured forms that represent groups of similar terms. In an embodiment, risk analysis module 26 may determine the size of each cluster, which may facilitate additional review of the data, as described below with respect to step 218.
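One way to picture the comparison of structured forms and the sizing of clusters in step 212 is the sketch below. It deliberately substitutes cosine similarity with a greedy grouping rule for the Bayesian inference named in the description, purely to keep the example short; the threshold value, function names, and sample vectors are assumptions.

```python
import math
from typing import Dict, List

Vector = Dict[str, int]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(vectors: List[Vector], threshold: float = 0.5) -> List[List[int]]:
    """Greedy grouping: a vector joins the first cluster whose first member
    is at least `threshold` similar to it; otherwise it starts a new cluster.
    Returns clusters as lists of indices into `vectors`."""
    clusters: List[List[int]] = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vectors[members[0]], vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Invented sample vectors: two about regulatory fines, one about a vendor outage.
vectors = [
    {"fine": 2, "regulator": 1},
    {"fine": 1, "regulator": 2},
    {"outage": 3, "vendor": 1},
]
clusters = cluster(vectors)
cluster_sizes = [len(members) for members in clusters]
print(clusters, cluster_sizes)
```

The cluster sizes computed at the end correspond to the size determination that the description says may feed the additional review in step 218.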
At step 214, risk analysis module 26 quantifies the terms included in the structured form. For example, risk analysis module 26 weights each term in the structured form. In an embodiment, the weighting of the terms is predefined by an administrator. In another embodiment, risk analysis module 26 may learn the significance of terms during the implementation of the method, and may determine the appropriate weighting for the terms based on past information. Additionally, risk analysis module 26 may link the terms to quantifiable data when quantifying the terms. To link a term to quantifiable data, risk analysis module 26 determines which terms have associated quantifiable data and then links the terms to the quantifiable data. For example, a term may be categorized as “fine amount” and may have quantifiable data associated with that term. Furthermore, risk analysis module 26 may link the individual terms to a third party, such as a competitor of an organization. By linking individual terms to a third party, risk analysis module 26 may summarize the data associated with the third party.
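The administrator-defined weighting embodiment of step 214 might look like the following sketch. The weight table, the default weight, and the rule of multiplying count by weight are assumptions; the description does not state how weights combine with counts.

```python
from typing import Dict

# Hypothetical weights predefined by an administrator.
TERM_WEIGHTS = {"fine": 3.0, "loss": 2.0, "large": 1.5}
DEFAULT_WEIGHT = 1.0  # assumed weight for terms the administrator has not rated


def quantify(vector: Dict[str, int]) -> Dict[str, float]:
    """Weight each term's count (assumed rule: count multiplied by term weight)."""
    return {
        term: count * TERM_WEIGHTS.get(term, DEFAULT_WEIGHT)
        for term, count in vector.items()
    }


scores = quantify({"fine": 2, "large": 1, "vendor": 4})
print(scores)
```

In the learning embodiment, the fixed `TERM_WEIGHTS` table would instead be updated from past information; that update rule is not sketched here because the description does not define it.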
Now that the data has been converted into a structured form, categorized, and quantified, risk analysis module 26 may create a report in step 216 with the analyzed data. The report may take any suitable form that presents the information in a graphical and/or numerical form. For example, the report may include a heat map, a bar chart, or other suitable representation. This report is communicated to computers 12 in step 218 and used in various instances. For example, the reports may be analyzed to determine specific risks that need additional review. In an embodiment, risk analysis module 26 may identify a plurality of risks that are quantified as low that may need additional review. In this embodiment, risk analysis module 26 includes criteria to determine whether a plurality of low-risk issues meet a particular threshold to trigger additional review and communicates that information in the report. In another embodiment, risk analysis module 26 may identify high-risk issues for additional review. In this embodiment, a single high-risk issue meets a particular threshold to trigger additional review, and risk analysis module 26 communicates that information in the report.
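The two review triggers described for the report can be sketched as a single check. All three threshold constants and the function name are hypothetical; the description says only that a "particular threshold" exists for each case.

```python
from typing import List

# Hypothetical criteria; the description does not give concrete values.
HIGH_RISK_THRESHOLD = 8.0      # a single issue at or above this score triggers review
LOW_RISK_CEILING = 3.0         # issues at or below this score count as low risk
LOW_RISK_COUNT_THRESHOLD = 3   # this many low-risk issues together trigger review


def needs_review(issue_scores: List[float]) -> bool:
    """Return True if one high-risk issue, or enough low-risk issues, is present."""
    if any(score >= HIGH_RISK_THRESHOLD for score in issue_scores):
        return True
    low_risk = [score for score in issue_scores if score <= LOW_RISK_CEILING]
    return len(low_risk) >= LOW_RISK_COUNT_THRESHOLD


print(needs_review([9.0]))            # one high-risk issue
print(needs_review([1.0, 2.0, 2.5]))  # several low-risk issues
print(needs_review([1.0, 5.0]))       # neither criterion met
```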
At step 220, risk analysis module 26 determines whether to repeat the analysis. Risk analysis module 26 may repeat the analysis at different points in time to identify trends, forecast potential risks, and/or compare external risks to internal risks. If the process begins again, the method continues from step 202, otherwise the method ends. Risk analysis module 26 may receive information from the various data sources continuously, and may implement the analysis process on a predetermined schedule. For example, risk analysis module 26 may perform the analysis on a quarterly basis, on a monthly basis, on a weekly basis, or during any predetermined time period.
Modifications, additions, or omissions may be made to method 200 depicted in FIG. 2. The method may include more, fewer, or other steps. For example, risk analysis module 26 may determine synonyms for the individual terms in the structured form and may use the synonyms to make the terms more consistent across a plurality of structured forms. Additionally, steps may be performed in parallel or in any suitable order. While discussed as risk analysis module 26 performing the steps, any suitable component of system 10 may perform one or more steps of the method.
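The optional synonym step mentioned above could be sketched as follows, assuming a hand-built synonym table. The table contents and the function name `normalize` are illustrative only; the description does not say how synonyms are determined.

```python
from typing import Dict

# Hypothetical synonym table mapping each variant to one canonical term.
SYNONYMS = {"penalty": "fine", "sanction": "fine", "supplier": "vendor"}


def normalize(vector: Dict[str, int]) -> Dict[str, int]:
    """Merge the counts of synonymous terms under their canonical term."""
    normalized: Dict[str, int] = {}
    for term, count in vector.items():
        canonical = SYNONYMS.get(term, term)
        normalized[canonical] = normalized.get(canonical, 0) + count
    return normalized


print(normalize({"penalty": 2, "fine": 1, "supplier": 3}))
```

Applying the same table to every structured form makes term counts comparable across forms, which is the consistency benefit the description attributes to synonyms.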
Certain embodiments of the present disclosure may provide one or more technical advantages. A technical advantage of one embodiment includes extracting, analyzing, and summarizing useful information from various, unstructured sources to manage operational risks. Another technical advantage of an embodiment includes transforming unstructured data into a structured form to determine risk. Yet another technical advantage of an embodiment includes identifying and aggregating risks across an organization to manage.
Although the present invention has been described with several embodiments, a myriad of changes, variations, alterations, transformations, and modifications may be suggested to one skilled in the art, and it is intended that the present invention encompass such changes, variations, alterations, transformations, and modifications as fall within the scope of the appended claims.

Claims (19)

What is claimed is:
1. A system, comprising:
a network interface operable to receive unstructured data from a plurality of data sources, the plurality of data sources comprising a competitor database, a vendor database, and a marketing database, wherein the unstructured data relates to a financial risk of an organization and comprises a plurality of text documents, each text document comprising a plurality of groups of words;
a processor communicatively coupled to the network interface and operable to:
deconstruct each group of words from the unstructured data into individual words;
convert the individual words from each group of words into a plurality of structured forms, each structured form corresponding to a single group of words;
determine a numerical value associated with each individual word according to: a number of times the individual word appears in the group of words and an association of the group of words with a risk experienced by an organization, wherein each structured form is a vector that includes each individual word and the numerical value associated with the individual word;
compare each structured form to another structured form using a Bayesian inference;
categorize the individual words in each structured form into at least one category according to the comparison and the at least one category is selected from a set of categories consisting of organization name, geographical region, organization size, number of employees, number of countries represented, public organization, private organization, regulatory body, industry, and fine amount, the categories indicating the financial risk of the organization; and
quantify the individual words in each structured form according to at least the categorization of the individual words by weighting each individual word.
2. The system of claim 1, wherein the processor is further operable to remove insignificant words from each group of words before converting the individual words into the structured form.
3. The system of claim 1, wherein:
the processor is further operable to create a report based on the categorized and quantified individual words; and
the network interface is further operable to communicate the report to a computer to facilitate analysis of the individual words and the associated group of words.
4. The system of claim 1, wherein the network interface is further operable to receive the unstructured data from at least one of an external data source and an internal data source.
5. The system of claim 1, wherein the processor is further operable to:
determine whether individual words have associated quantifiable data; and
link the individual words to quantifiable data.
6. The system of claim 1, wherein the processor is further operable to:
compare a first structured form to a second structured form;
identify similar text between the first structured form and the second structured form to determine a similar text group; and
determine a size of the similar text group.
7. Non-transitory computer readable medium comprising logic, the logic, when executed by a processor, operable to:
receive unstructured data from a plurality of data sources, the plurality of data sources comprising a competitor database, a vendor database, and a marketing database, wherein the unstructured data relates to a financial risk of an organization and comprises a plurality of text documents, each text document comprising a plurality of groups of words;
deconstruct each group of words from the unstructured data into individual words;
convert the individual words from each group of words into a plurality of structured forms, each structured form corresponding to a single group of words;
determine a numerical value associated with each individual word according to: a number of times the individual word appears in the group of words and an association of the group of words with a risk experienced by an organization, wherein each structured form is a vector that includes each individual word and the numerical value associated with the individual word;
compare each structured form to another structured form using a Bayesian inference;
categorize the individual words in each structured form into at least one category according to the comparison and the at least one category is selected from a set of categories consisting of organization name, geographical region, organization size, number of employees, number of countries represented, public organization, private organization, regulatory body, industry, and fine amount, the categories indicating the financial risk of the organization; and
quantify the individual words in each structured form according to at least the categorization of the individual words by weighting each individual word.
8. The computer readable medium of claim 7, wherein the logic is further operable to remove insignificant words from each group of words before converting the individual words into the structured form.
9. The computer readable medium of claim 7, wherein the logic is further operable to:
create a report based on the categorized and quantified individual words; and
communicate the report to a computer to facilitate analysis of the individual words and the associated group of words.
10. The computer readable medium of claim 7, wherein the logic is further operable to:
determine whether individual words have associated quantifiable data; and
link the individual words to quantifiable data.
11. The computer readable medium of claim 7, wherein the logic is further operable to:
compare a first structured form to a second structured form;
identify similar text between the first structured form and the second structured form to determine a similar text group; and
determine a size of the similar text group.
12. A method, comprising:
receiving unstructured data from a plurality of data sources, the plurality of data sources comprising a competitor database, a vendor database, and a marketing database, wherein the unstructured data relates to a financial risk of an organization and comprises a plurality of text documents, each text document comprising a plurality of groups of words;
deconstructing, by a processor, each group of words from the unstructured data into individual words;
converting, by the processor, the individual words from each group of words into a plurality of structured forms, each structured form corresponding to a single group of words;
determining, by the processor, a numerical value associated with each individual word according to: a number of times the individual word appears in the group of words and an association of the group of words with a risk experienced by an organization, wherein each structured form is a vector that includes each individual word and a quantification of the individual word;
comparing, by the processor, each structured form to another structured form using a Bayesian inference;
categorizing, by the processor, the individual words in each structured form into at least one category according to the comparison and the at least one category is selected from a set of categories consisting of organization name, geographical region, organization size, number of employees, number of countries represented, public organization, private organization, regulatory body, industry, and fine amount, the categories indicating the financial risk of the organization; and
quantifying, by the processor, the individual words in each structured form according to at least the categorization of the individual words by weighting each individual word.
13. The method of claim 12, further comprising removing insignificant words from each group of words before converting the individual words into the structured form.
14. The method of claim 12, further comprising:
creating a report based on the categorized and quantified individual words; and
communicating the report to a computer to facilitate analysis of the individual words and the associated group of words.
15. The method of claim 12, wherein receiving the unstructured data from the plurality of sources comprises receiving the unstructured data from at least one of an external data source and an internal data source.
16. The method of claim 12, wherein quantifying the individual words comprises weighting the individual words based on quantifiable data.
17. The method of claim 12, wherein quantifying the individual words comprises:
determining whether individual words have associated quantifiable data; and
linking the individual words to quantifiable data.
18. The method of claim 12, further comprising:
comparing a first structured form to a second structured form;
identifying similar text between the first structured form and the second structured form to determine a similar text group; and
determining a size of the similar text group.
19. The method of claim 12, wherein each individual term in the structured form is associated with a number that indicates the number of times the individual word appears in the group of words.
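
For illustration only, the processing steps recited in claims 1, 2, and 6 (deconstructing groups of words, removing insignificant words, building a vector of per-word numerical values, comparing structured forms with a Bayesian inference, categorizing, weighting, and sizing a similar text group) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the stop-word list, the `risk_association` multiplier, the use of a multinomial naive Bayes score as the "Bayesian inference", the category weights, and all function names are hypothetical choices that the claims do not specify.

```python
import math
import re
from collections import Counter

# Hypothetical list of "insignificant words"; the claims do not enumerate them.
STOP_WORDS = {"a", "an", "the", "of", "to", "for", "and", "was", "by", "in"}


def deconstruct(group_of_words):
    """Split one group of words into lowercase individual words, minus stop words."""
    words = re.findall(r"[a-z0-9$]+", group_of_words.lower())
    return [w for w in words if w not in STOP_WORDS]


def to_structured_form(group_of_words, risk_association=1.0):
    """Return a word -> numerical value vector.

    Assumption: the value is the word count scaled by a single number that
    stands in for the group's association with a risk.
    """
    counts = Counter(deconstruct(group_of_words))
    return {word: count * risk_association for word, count in counts.items()}


def categorize(form, labeled_forms):
    """Compare `form` with already-labeled forms and return the best category.

    Assumption: the comparison is a multinomial naive Bayes score with
    add-one smoothing; `labeled_forms` maps category -> non-empty list of forms.
    """
    vocabulary = set(form)
    for examples in labeled_forms.values():
        for example in examples:
            vocabulary.update(example)
    total_examples = sum(len(examples) for examples in labeled_forms.values())

    best_category, best_score = None, -math.inf
    for category, examples in labeled_forms.items():
        totals = Counter()
        for example in examples:
            totals.update(example)  # adds the values word by word
        denominator = sum(totals.values()) + len(vocabulary)
        # Log prior plus value-weighted log likelihood of each word.
        score = math.log(len(examples) / total_examples)
        for word, value in form.items():
            score += value * math.log((totals[word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def quantify(form, category, category_weights):
    """Weight each word's value by a (hypothetical) per-category weight."""
    weight = category_weights.get(category, 1.0)
    return {word: value * weight for word, value in form.items()}


def similar_text_group(first_form, second_form):
    """Return the words shared by two structured forms (compare claim 6)."""
    return sorted(set(first_form) & set(second_form))


# Two labeled reference forms using category names from claim 1.
labeled = {
    "regulatory body": [to_structured_form("regulator issued an enforcement order")],
    "fine amount": [to_structured_form("bank paid a fine of $5 million penalty")],
}
new_form = to_structured_form(
    "The regulator fined the bank; the fine was $2 million", risk_association=2.0
)
category = categorize(new_form, labeled)
weighted = quantify(new_form, category, {"fine amount": 3.0})
shared = similar_text_group(new_form, labeled["fine amount"][0])
print(category, weighted, len(shared))
```

In this sketch each structured form is a plain dictionary, so the "size of the similar text group" is simply the number of shared words; a production system would more likely use a sparse vector library and a trained classifier.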
US13/672,012 2012-11-08 2012-11-08 Risk analysis using unstructured data Active 2033-02-07 US9141686B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/672,012 US9141686B2 (en) 2012-11-08 2012-11-08 Risk analysis using unstructured data

Publications (2)

Publication Number Publication Date
US20140129561A1 (en) 2014-05-08
US9141686B2 (en) 2015-09-22

Family

ID=50623364

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/672,012 Active 2033-02-07 US9141686B2 (en) 2012-11-08 2012-11-08 Risk analysis using unstructured data

Country Status (1)

Country Link
US (1) US9141686B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11263523B1 (en) 2017-01-27 2022-03-01 Manzama, Inc. System and method for organizational health analysis

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055489B2 (en) * 2016-02-08 2018-08-21 Ebay Inc. System and method for content-based media analysis
US11494687B2 (en) * 2018-03-05 2022-11-08 Yodlee, Inc. Generating samples of transaction data sets
US11275776B2 (en) 2020-06-11 2022-03-15 Capital One Services, Llc Section-linked document classifiers
US11941565B2 (en) * 2020-06-11 2024-03-26 Capital One Services, Llc Citation and policy based document classification

Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020077821A1 (en) * 2000-10-19 2002-06-20 Case Eliot M. System and method for converting text-to-voice
US6523026B1 (en) * 1999-02-08 2003-02-18 Huntsman International Llc Method for retrieving semantically distant analogies
US20030046203A1 (en) 2001-08-28 2003-03-06 Genichiro Ichihari Business performance index processing system
US20030065613A1 (en) 2001-09-28 2003-04-03 Smith Diane K. Software for financial institution monitoring and management and for assessing risk for a financial institution
US6629097B1 (en) * 1999-04-28 2003-09-30 Douglas K. Keith Displaying implicit associations among items in loosely-structured data sets
US20050065807A1 (en) 2003-09-23 2005-03-24 Deangelis Stephen F. Systems and methods for optimizing business processes, complying with regulations, and identifying threat and vulnerabilty risks for an enterprise
US20050197952A1 (en) 2003-08-15 2005-09-08 Providus Software Solutions, Inc. Risk mitigation management
US20050278362A1 (en) * 2003-08-12 2005-12-15 Maren Alianna J Knowledge discovery system
US20060100958A1 (en) 2004-11-09 2006-05-11 Feng Cheng Method and apparatus for operational risk assessment and mitigation
WO2006071900A2 (en) 2004-12-29 2006-07-06 Lehman Brothers Inc. System and method for maintaining continuity of operations
US20060155553A1 (en) 2004-12-30 2006-07-13 Brohman Carole G Risk management methods and systems
US20060224500A1 (en) 2005-03-31 2006-10-05 Kevin Stane System and method for creating risk profiles for use in managing operational risk
US20060277205A1 (en) * 2003-01-10 2006-12-07 Cohesive Knowledge Solutions, Inc. Universal knowledge information and data storage system
US7373274B2 (en) 2003-07-10 2008-05-13 Erc-Ip, Llc Methods and structure for improved interactive statistical analysis
US20080154679A1 (en) 2006-11-03 2008-06-26 Wade Claude E Method and apparatus for a processing risk assessment and operational oversight framework
US20080154873A1 (en) * 2006-12-21 2008-06-26 Redlich Ron M Information Life Cycle Search Engine and Method
US20080221944A1 (en) 2005-05-27 2008-09-11 Martin Kelly System and Method for Risk Assessment and Presentment
US7441197B2 (en) 2002-02-26 2008-10-21 Global Asset Protection Services, Llc Risk management information interface system and associated methods
US20090043797A1 (en) * 2007-07-27 2009-02-12 Sparkip, Inc. System And Methods For Clustering Large Database of Documents
US20090043637A1 (en) 2004-06-01 2009-02-12 Eder Jeffrey Scott Extended value and risk management system
US7536405B2 (en) 2002-02-26 2009-05-19 Global Asset Protection Services, Llc Risk management information interface system and associated methods
US7809595B2 (en) 2002-09-17 2010-10-05 Jpmorgan Chase Bank, Na System and method for managing risks associated with outside service providers
US20100324927A1 (en) * 2009-06-17 2010-12-23 Tinsley Eric C Senior care navigation systems and methods for using the same
US7873567B2 (en) 2001-02-05 2011-01-18 Asset Trust, Inc. Value and risk management system
US20110072052A1 (en) * 2008-05-28 2011-03-24 Aptima Inc. Systems and methods for analyzing entity profiles
US20110179009A1 (en) * 2008-09-23 2011-07-21 Sang Hyob Nam Internet-based opinion search system and method, and internet-based opinion search and advertising service system and method
US20110184935A1 (en) * 2010-01-27 2011-07-28 26F, Llc Computerized system and method for assisting in resolution of litigation discovery in conjunction with the federal rules of practice and procedure and other jurisdictions
US20110231382A1 (en) * 2010-03-19 2011-09-22 Honeywell International Inc. Methods and apparatus for analyzing information to identify entities of significance
US8135638B2 (en) 2005-04-29 2012-03-13 International Business Machines Corporation Summarizing risk ratings to facilitate an analysis of risks
US20120072247A1 (en) 2005-07-01 2012-03-22 Matt Rosauer Risk Modeling System
US20120221485A1 (en) 2009-12-01 2012-08-30 Leidner Jochen L Methods and systems for risk mining and for generating entity risk profiles
US20120278336A1 (en) * 2011-04-29 2012-11-01 Malik Hassan H Representing information from documents
US20130246334A1 (en) * 2011-12-27 2013-09-19 Mcafee, Inc. System and method for providing data protection workflows in a network environment

Also Published As

Publication number Publication date
US20140129561A1 (en) 2014-05-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KERN, DANIEL C.;HOGEBOOM, DAVID A.;BROMSTEAD, ANNE;SIGNING DATES FROM 20121106 TO 20121107;REEL/FRAME:029265/0077

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8