WO2006026636A2 - Metadata management - Google Patents
- Publication number
- WO2006026636A2 (PCT/US2005/030897)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- metadata
- data
- storage mechanism
- properties
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24573—Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
Definitions
- This invention relates to the field of information technology, and more particularly to the field of data integration systems.
- Enterprise application integration (EAI) efforts encounter many challenges, including the need to handle different protocols, the need to address ever-increasing volumes of data and numbers of transactions, and an ever-increasing appetite for faster integration of data.
- Various approaches to EAI have been taken, including least-common-denominator approaches, atomic approaches, and bridge-type approaches.
- EAI is based upon communication between individual applications.
- the complexity of EAI solutions grows geometrically in response to linear additions of platforms and applications.
- An integrated, platform-independent approach to metadata management may allow enterprise-wide access to data integration services and underlying data, and facilitate reuse and redesign of tools and jobs in the data integration environment.
- Tools are provided for managing metadata, including maintaining versioned metadata models that may be branched and merged during a design cycle, and dynamically implemented across the enterprise.
- the platform-independent approach may facilitate varied uses including implementations in heterogeneous hardware and software computing environments.
- a method described herein includes: expressing a query in terms native to a first model; translating the query into terms native to a second model using mapping information that describes one or more relationships between the first model and the second model; and translating the query into a native data source format.
- a system includes means for expressing a query in terms native to a first model; a mapping model that translates the query into terms native to a second model using mapping information that describes one or more relationships between the first model and the second model; and means for translating the query into a native data source format in which the query is executed against the data source.
- the mapping information may be queried.
- the mapping information may be available during the translating steps.
- the first model may be a view.
- the second model may be a hub.
- the data source may be a database.
- the database may store metadata for one or more data sources.
- the database may store a persistent model representing enterprise metadata.
- the database may be a relational database and/or a file.
- the method may be performed in an enterprise computing system, or the system may be within an enterprise computing system.
- the method may be performed in a data integration system, or the system may be within a data integration system.
- the terms native to the first model may include a syntax native to an external client.
- the first model may be a view for a user interface.
- the method may further include displaying a result of the query in the user interface, or the system may include a user interface for displaying the result of the query.
- the first model may be a view for a service.
- the service may include a data integration system service.
- the service may include a remote tool and/or a real time integration service.
- At least one of the first model and the second model may be a metadata model stored in a repository.
- the method may further include translating a result of the query into the first model with a translation tool, or the system may include a corresponding translation tool.
- the translation tool may be stored in a repository.
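The two-stage query translation described above (view terms to hub terms, then hub terms to a native data source format) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Query` shape, the `Customer` and `Party` terms, the two mapping dictionaries, and the choice of SQL as the native format do not come from the source.

```python
from dataclasses import dataclass

# Hypothetical mapping information: view-model terms -> hub-model terms.
VIEW_TO_HUB = {
    "Customer": "Party",
    "Customer.name": "Party.displayName",
    "Customer.region": "Party.geoArea",
}

# Hypothetical mapping of hub-model terms to relational table/column names.
HUB_TO_SQL = {
    "Party": "party",
    "Party.displayName": "display_name",
    "Party.geoArea": "geo_area",
}


@dataclass(frozen=True)
class Query:
    entity: str    # model class being queried
    fields: tuple  # properties to return
    where: tuple   # single (property, value) equality filter


def translate(query, mapping):
    """Rewrite every term of the query using the mapping information."""
    where_field, value = query.where
    return Query(
        entity=mapping[query.entity],
        fields=tuple(mapping[f] for f in query.fields),
        where=(mapping[where_field], value),
    )


def to_sql(hub_query):
    """Render a hub-model query as a parameterized SQL string."""
    table = HUB_TO_SQL[hub_query.entity]
    columns = ", ".join(HUB_TO_SQL[f] for f in hub_query.fields)
    where_field, value = hub_query.where
    sql = f"SELECT {columns} FROM {table} WHERE {HUB_TO_SQL[where_field]} = ?"
    return sql, (value,)


# A query expressed in terms native to the view (first model).
view_query = Query("Customer", ("Customer.name",), ("Customer.region", "EMEA"))
hub_query = translate(view_query, VIEW_TO_HUB)   # terms native to the hub
sql, params = to_sql(hub_query)                  # native data source format
```

Because the mapping information is plain data in this sketch, it can itself be queried and remains available during both translation steps, which mirrors the options listed above.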
- a method as described herein may include: registering a metadata model with a repository; associating a first storage mechanism with one or more design properties of the metadata model; and associating a second storage mechanism with one or more operational properties of the metadata model, wherein the second storage mechanism stores a time stamp for at least one of the one or more operational properties of the metadata model.
- the first storage mechanism may be a versioned storage mechanism that stores one or more versions of at least one of the one or more design properties of the metadata model.
- the method may further include annotating the one or more design properties and the one or more operational properties of the metadata model to associate them with either the first storage mechanism or second storage mechanism.
- the method may further include providing a package structure to allocate the one or more design properties and the one or more operational properties of the metadata model between the first storage mechanism and the second storage mechanism.
- the method may further include providing a manifest associated with the metadata model to allocate the one or more design properties and the one or more operational properties of the metadata model between the first storage mechanism and the second storage mechanism.
- the method may further include registering the operational properties as a first model and registering the design properties as a second model.
- the metadata model may be queried across the one or more operational properties and the one or more design properties.
- the method may further include registering one or more mappings with the metadata model, the one or more mappings describing a relationship of the metadata model to one or more
- a system may include: a repository including a registered metadata model; a first storage mechanism within the repository, the first storage mechanism associated with one or more design properties of the metadata model; and a second storage mechanism within the repository, the second storage mechanism associated with one or more operational properties of the metadata model, the second storage mechanism adapted to store a time stamp for at least one of the one or more operational properties of the metadata model.
- the first storage mechanism may be a versioned storage mechanism that stores one or more versions of at least one of the one or more design properties of the metadata model.
- the system may include annotations to associate the one or more design properties of the metadata model and the one or more operational properties of the metadata model with either the first storage mechanism or second storage mechanism.
- the system may include a package structure to allocate the one or more design properties and the one or more operational properties of the metadata model between the first storage mechanism and the second storage mechanism.
- the system may include a manifest associated with the metadata model to allocate the one or more design properties and the one or more operational properties of the metadata model between the first storage mechanism and the second storage mechanism.
- the operational properties may be registered as a first model and the design properties may be registered as a second model.
- the metadata model may be queried across the one or more operational properties and the one or more design properties.
- the system may further include one or more mappings registered with the metadata model, the one or more mappings describing a relationship of the metadata model to one or more other metadata models.
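The split between a versioned store for design properties and a time-stamped store for operational properties can be sketched as follows. This is an illustrative sketch only: the class names, the manifest format (property name to `"design"` or `"operational"`), and the `Job` example are assumptions, not the patent's implementation.

```python
import time


class VersionedStore:
    """Keeps every version of each design property."""

    def __init__(self):
        self._versions = {}

    def put(self, key, value):
        self._versions.setdefault(key, []).append(value)
        return len(self._versions[key])  # 1-based version number

    def get(self, key, version=None):
        history = self._versions[key]
        return history[-1] if version is None else history[version - 1]


class TimestampedStore:
    """Keeps the latest value of each operational property with a time stamp."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}

    def put(self, key, value):
        self._values[key] = (value, self._clock())

    def get(self, key):
        return self._values[key]  # (value, time stamp)


class Repository:
    def __init__(self):
        self.design = VersionedStore()
        self.operational = TimestampedStore()
        self._manifests = {}

    def register(self, model_name, manifest):
        # Hypothetical manifest: property name -> "design" or "operational".
        self._manifests[model_name] = dict(manifest)

    def set(self, model_name, prop, value):
        kind = self._manifests[model_name][prop]
        store = self.design if kind == "design" else self.operational
        return store.put((model_name, prop), value)

    def query(self, model_name):
        """Current values across both kinds of property.

        Assumes every property in the manifest has been set at least once.
        """
        result = {}
        for prop, kind in self._manifests[model_name].items():
            key = (model_name, prop)
            if kind == "design":
                result[prop] = self.design.get(key)
            else:
                result[prop] = self.operational.get(key)[0]
        return result


repo = Repository()
repo.register("Job", {"columns": "design", "last_run_rows": "operational"})
repo.set("Job", "columns", ["id"])
repo.set("Job", "columns", ["id", "name"])   # second design version
repo.set("Job", "last_run_rows", 1200)       # stored with a time stamp
```

Here the manifest plays the allocation role; annotations on the properties or a package structure, as listed above, would be alternative ways to drive the same routing.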
- a method for persisting a model includes: registering a first model; identifying a second model and a mapping of at least one property of the first model to the second model; and persisting the mapping of the at least one property of the first model to the second model.
- the method may include identifying at least one other property of the first model not mapped to the second model; and persisting the at least one other property of the first model.
- the first model may include a plurality of classes.
- the second model may include a plurality of classes.
- the method may include providing a storage mechanism for persisting the mapping of the at least one property of the first model to the second model that is a reflective storage mechanism.
- the method may further include defining a schema for representing metadata models in a relational database, and using the schema to persist the mapping of the at least one property of the first model to the second model.
- the method may further include revising the first model by changing the schema, by changing one or more properties in the relational database, and/or by changing the mapping.
- the first model and the second model may be metadata models.
- a system for persisting a model may include: a mapping of at least one property of a first model to a second model; and a repository for registering the first model, the repository configured to persist the mapping of the at least one property of the first model to the second model.
- At least one other property of the first model may not be mapped to the second model, and the repository may be configured to persist the at least one other property of the first model.
- the first model and/or the second model may each include a plurality of classes.
- the system may further include a storage mechanism for persisting the mapping of the at least one property of the first model to the second model, the storage mechanism including a reflective storage mechanism.
- the system may further include a schema for representing metadata models in a relational database, the schema persisting the mapping of the at least one property of the first model to the second model.
- the first model may be revised by changing the schema, by changing one or more properties in the relational database, and/or by changing the mapping.
- the first model and the second model may be metadata models.
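Persisting a property mapping through a relational schema, and identifying the properties of the first model that are not mapped, might look like the following sketch. It uses Python's standard `sqlite3` module with an in-memory database; the table layout and the `ReportView` and `Hub` model names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema for representing metadata models and their mappings.
conn.executescript("""
CREATE TABLE model (name TEXT PRIMARY KEY);
CREATE TABLE property (
    model TEXT NOT NULL REFERENCES model(name),
    name  TEXT NOT NULL,
    PRIMARY KEY (model, name)
);
CREATE TABLE mapping (
    source_model    TEXT NOT NULL,
    source_property TEXT NOT NULL,
    target_model    TEXT NOT NULL,
    target_property TEXT NOT NULL,
    PRIMARY KEY (source_model, source_property, target_model)
);
""")


def register_model(name, properties):
    """Register a model and its properties as rows in the schema."""
    conn.execute("INSERT INTO model VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO property VALUES (?, ?)", [(name, p) for p in properties]
    )


def persist_mapping(source_model, target_model, pairs):
    """Persist source-property -> target-property pairs."""
    conn.executemany(
        "INSERT INTO mapping VALUES (?, ?, ?, ?)",
        [(source_model, s, target_model, t) for s, t in pairs.items()],
    )


def unmapped_properties(source_model, target_model):
    """Properties of the source model with no mapping to the target model."""
    rows = conn.execute(
        """
        SELECT p.name FROM property p
        WHERE p.model = ?
          AND NOT EXISTS (
              SELECT 1 FROM mapping m
              WHERE m.source_model = p.model
                AND m.source_property = p.name
                AND m.target_model = ?)
        ORDER BY p.name
        """,
        (source_model, target_model),
    )
    return [r[0] for r in rows]


register_model("ReportView", ["title", "author", "page_size"])
register_model("Hub", ["name", "owner"])
persist_mapping("ReportView", "Hub", {"title": "name", "author": "owner"})
leftover = unmapped_properties("ReportView", "Hub")
```

Since models, properties, and mappings are all rows in this sketch, revising the first model reduces to changing rows or the schema itself, in line with the revision options listed above.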
- a model driven metadata transformation architecture may include: a plurality of translation engines that use one or more model-to-model mappings to translate between one or more models; and a translation registry for dynamically selecting one of the plurality of translation engines.
- the translation engines may include one or more of a compiled language engine, an interpreted language engine, or an interpreted mapping engine.
- the model-to-model mappings may be between a hub and one or more views in hub-and-spoke architecture.
- the one or more model-to-model mappings may be user configurable.
- One of the model-to-model mappings may be configured after the corresponding models have been deployed.
- One of the model-to-model mappings may be repeated in a plurality of translation engines for translation between a hub and a plurality of identical views.
- Different model-to-model mappings may be realized in a plurality of translation engines for translation between a hub and a plurality of different views.
- a method for transforming metadata between models includes: receiving a request to translate metadata between a first model and a second model; retrieving a model-to-model mapping characterizing a translation between the first model and the second model; and translating the metadata from the first model to the second model using the model-to-model mapping.
- the model-to-model mapping may include one or more of a compiled language, an interpreted language, or a mapping adapted for translation by a translation engine.
- the model-to-model mapping may be between a hub and a view in a hub-and-spoke architecture.
- the method may further include providing a user interface for configuring the model-to-model mapping.
- the method may further include storing the model-to-model mapping in a registry for dynamic access.
- the method may further include configuring the model-to-model mapping after at least one of the first model and the second model has been deployed.
- the model-to-model mapping may be used concurrently by a plurality of translation engines for translation between a hub and a plurality of identical views.
- the method may further include registering a plurality of different model-to-model mappings wherein the different model-to-model mappings are used concurrently by a plurality of translation engines for translation between a hub and a plurality of different views.
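A translation registry that dynamically selects among translation engines can be sketched as below. The registry API, the engine functions, and the `hub`, `report_view`, and `audit_view` names are assumptions for illustration; a plain Python function stands in for a compiled-language engine.

```python
class TranslationRegistry:
    """Selects a translation engine by (source model, target model) at call time."""

    def __init__(self):
        self._engines = {}

    def register(self, source_model, target_model, engine):
        # Re-registering replaces the engine, so a mapping can be
        # reconfigured after the models have been deployed.
        self._engines[(source_model, target_model)] = engine

    def translate(self, source_model, target_model, metadata):
        engine = self._engines[(source_model, target_model)]
        return engine(metadata)


def interpreted_mapping_engine(mapping):
    """Build an engine that interprets a declarative property mapping."""

    def engine(metadata):
        return {mapping[k]: v for k, v in metadata.items() if k in mapping}

    return engine


def hand_written_engine(metadata):
    """Stand-in for a compiled-language translation with custom logic."""
    return {"label": metadata["name"].upper()}


registry = TranslationRegistry()
registry.register(
    "hub", "report_view",
    interpreted_mapping_engine({"name": "title", "owner": "author"}),
)
registry.register("hub", "audit_view", hand_written_engine)

hub_metadata = {"name": "orders", "owner": "finance", "rows": 42}
report = registry.translate("hub", "report_view", hub_metadata)
audit = registry.translate("hub", "audit_view", hub_metadata)

# Reconfigure the hub-to-report mapping without touching either model.
registry.register(
    "hub", "report_view", interpreted_mapping_engine({"name": "heading"})
)
reconfigured = registry.translate("hub", "report_view", hub_metadata)
```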
- a method of managing metadata disclosed herein includes: organizing an object-oriented metadata model into an operational model that includes operational properties and a design model that includes design properties; storing the operational model in an operational repository; and storing the design model in a common repository.
- the method may further include time-stamping at least one item of metadata for the operational model.
- the common repository may support more than one version of the design model.
- the method may further include providing a metadata environment for user interaction with the model.
- the user environment may include a workspace for editing the model.
- the workspace may be exclusive to a user and/or shared.
- the metadata environment may include a team space.
- the team space may support versioning of metadata instances.
- the metadata environment may reside locally on a user computer or on a remote server accessible to a user computer.
- the method may include dynamically comparing one or more different versions of the design model in the common repository.
- the common repository may support branching of versions of the design model.
- the method may include reconciling a plurality of versions of the design model and/or dynamically reconciling a plurality of versions of the design model.
- the method may include using the metadata model in a metadata service by asynchronously calling the metadata model through a message-oriented service, and/or using the metadata model in a metadata service by synchronously calling the metadata model through an application programming interface.
- the method may include concurrently executing a service that uses the metadata model, and/or using parallelism to execute a service that uses the model.
- a system for managing metadata as described herein may include: an object-oriented metadata model including an operational model having one or more operational properties of the metadata model and a design model having one or more design properties of the metadata model; an operational repository that stores the operational model; and a common repository that stores the design model.
- the common repository may support more than one version of the design model.
- the system may include a metadata environment for user interaction with the model.
- the user environment may include a workspace for editing the model.
- the workspace may be exclusive to a user, or shared.
- the metadata environment may include a team space.
- the team space may support versioning of metadata instances.
- the metadata environment may reside locally on a user computer, or on a remote server.
- the common repository may support dynamic comparison of one or more different versions of the design model.
- the common repository may support branching of versions of the design model.
- the common repository may support reconciliation of a plurality of versions of the design model.
- the common repository may support dynamic reconciliation of a plurality of versions of the design model.
- the system may include a metadata service that uses the metadata model by asynchronously calling the metadata model through a message-oriented metadata service, and/or a metadata service that uses the metadata model by synchronously calling the metadata model through an application programming interface.
- the metadata model may be used in a service that executes at least one of concurrently or in parallel.
- a method for reconciling metadata as disclosed herein may include: associating a reconciliation zone property with a metadata object, the reconciliation zone property identifying a reconciliation zone characterized by a common set of reconciliation rules; and reconciling a plurality of instances of the metadata object according to the common set of reconciliation rules to provide a reconciled instance of the metadata object within the reconciliation zone.
- the method may include defining a second reconciliation zone for reconciling the reconciled instance of the metadata object with one or more additional instances of the metadata object.
- the reconciliation zone may include instances of a plurality of metadata objects.
- the method may further include associating a match type with the reconciliation zone property, the match type defining a treatment of the instance of the metadata object.
- the method may further include associating an identity with the instance of the metadata object that uniquely identifies the instance of the metadata object within the reconciliation zone.
- the method may further include providing a reconciliation lineage for a metadata object.
- the reconciliation lineage may describe a path through one or more reconciliation zones, identify one or more data sources, identify one or more reconciliation rules, and/or include a history of instances of the metadata object.
- a system for reconciling metadata may include: a reconciliation zone characterized by a common set of reconciliation rules; a plurality of instances of a metadata object including a reconciliation zone property that associates each one of the plurality of instances of the metadata object with the reconciliation zone; and a reconciliation engine that generates a reconciled instance of the metadata object within the reconciliation zone by reconciling the plurality of instances of the metadata object according to the common set of reconciliation rules for the reconciliation zone with which the plurality of instances of the metadata object are associated.
- the system may include a second reconciliation zone for reconciling the reconciled instance of the metadata object with one or more additional instances of the metadata object.
- the reconciliation zone may include instances of a plurality of metadata objects.
- a match type may define a treatment of the instances of the metadata object within the reconciliation zone.
- An identity associated with each instance of the metadata object may uniquely identify that instance of the metadata object within the reconciliation zone.
- a reconciliation lineage may be provided for a metadata object. The reconciliation lineage may describe a path through one or more reconciliation zones, identify one or more data sources, identify one or more reconciliation rules, and/or include a history of instances of the metadata object.
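Phased reconciliation across two reconciliation zones, with a lineage recorded along the way, can be sketched as follows. The `MetadataInstance` fields, the `newest_wins` rule, and the `staging` and `enterprise` zone names are hypothetical; the source does not specify particular rules or data structures.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataInstance:
    identity: str      # unique within a reconciliation zone
    zone: str          # reconciliation zone property
    source: str        # data source the instance came from
    properties: dict
    lineage: list = field(default_factory=list)


def newest_wins(instances):
    """Hypothetical rule: later instances override earlier ones."""
    merged = {}
    for inst in instances:
        merged.update(inst.properties)
    return merged


# Each zone is characterized by a common set of rules (one rule here).
ZONE_RULES = {"staging": newest_wins, "enterprise": newest_wins}


def reconcile(instances, zone, target_zone=None):
    """Reconcile all instances in `zone`, one reconciled instance per identity."""
    by_identity = {}
    for inst in instances:
        if inst.zone == zone:
            by_identity.setdefault(inst.identity, []).append(inst)
    rule = ZONE_RULES[zone]
    results = []
    for identity, group in by_identity.items():
        lineage = [step for inst in group for step in inst.lineage]
        lineage.append({
            "zone": zone,
            "rule": rule.__name__,
            "sources": [inst.source for inst in group],
        })
        results.append(MetadataInstance(
            identity, target_zone or zone, "reconciled:" + zone,
            rule(group), lineage,
        ))
    return results


a = MetadataInstance("customer_table", "staging", "crm", {"owner": "sales", "rows": 10})
b = MetadataInstance("customer_table", "staging", "erp", {"rows": 12})
staged = reconcile([a, b], "staging", target_zone="enterprise")

# A second zone reconciles the reconciled instance with an additional one.
c = MetadataInstance("customer_table", "enterprise", "warehouse", {"steward": "dba"})
final = reconcile(staged + [c], "enterprise")
```

The lineage on the final instance lists the path through both zones, the rule applied in each, and the sources that contributed, which corresponds to the lineage contents described above.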
- a method for providing concurrency for metadata services for a data integration system may include: dividing a metadata service into a stream of objects; identifying a cluster of the objects having primarily internal references based upon metadata for the objects; executing the cluster of objects on a single one of a plurality of processors; identifying at least one object outside the cluster of objects; and executing the at least one object on another one of the plurality of processors.
- the objects may include at least one metadata model.
- the processors may be on physically separate hardware.
- the service may include a reconciliation process that resolves metadata conflicts.
- the objects may include a metadata import.
- the primarily internal references may be identified using graphs of data dependencies.
- the service may be organized as a pipeline for concurrency.
- the pipeline may include at least an identify objects phase, a fetch candidates phase, a reconcile phase, a merge phase, and a store phase.
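The clustering and pipelining described above can be sketched under two loud simplifications. First, a "cluster" is taken to be a connected component of the reference graph, so it has only internal references rather than primarily internal ones. Second, threads from Python's standard `concurrent.futures` module stand in for separate processors. The phase names follow the list above; the object names and the phase bodies are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

PHASES = ("identify", "fetch_candidates", "reconcile", "merge", "store")


def clusters(references):
    """Group objects into connected components of the reference graph."""
    neighbors = {obj: set() for obj in references}
    for obj, targets in references.items():
        for target in targets:
            neighbors[obj].add(target)
            neighbors.setdefault(target, set()).add(obj)
    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for n in sorted(neighbors[node]):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        result.append(sorted(component))
    return result


def run_pipeline(cluster):
    """Run every phase in order over one cluster.

    Each phase here only records that it ran; real phases would do the work.
    """
    return [(phase, obj) for phase in PHASES for obj in cluster]


def run_service(references, workers=2):
    """Execute each cluster on its own worker; results keep cluster order."""
    groups = clusters(references)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, groups))


# Hypothetical stream of objects with their outgoing references.
references = {
    "table_a": ["column_a1", "column_a2"],
    "column_a1": [],
    "column_a2": [],
    "import_job": [],
}
groups = clusters(references)
results = run_service(references)
```

Keeping a cluster on one worker means its internal references never cross a worker boundary, which is the motivation for clustering by data dependencies before distributing the work.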
- a computer program product may include a computer useable medium including computer readable program code, wherein the computer readable program code when executed on one or more computers causes the one or more computers to perform any one or more of the methods above.
- "data source" or "data target" are intended to have the broadest possible meaning consistent with these terms, and shall include a database, a plurality of databases, a repository information manager, a queue, a message service, a repository, a data facility, a data storage facility, a data provider, a website, a server, a computer, a computer storage facility, a CD, a DVD, a mobile storage facility, a central storage facility, a hard disk, multiple coordinating data storage facilities, RAM, ROM, flash memory, a memory card, a temporary memory facility, a permanent memory facility, magnetic tape, a locally connected computing facility, a remotely connected computing facility, a wireless facility, a wired facility, a mobile facility, a central facility, a web browser, a client, a laptop, a personal digital assistant ("PDA"), a telephone, a cellular phone, a mobile phone, an information platform, an analysis facility, a processing facility, a business enterprise system or other facility where data is handled.
- PDA personal digital assistant
- Enterprise Java Bean shall include the server-side component architecture for the J2EE platform.
- EJBs support rapid and simplified development of distributed, transactional, secure and portable Java applications.
- EJBs support a container architecture that allows concurrent consumption of messages and provide support for distributed transactions, so that database updates, message processing, and connections to enterprise systems using the J2EE architecture can participate in the same transaction context.
- JMS Java Message Service
- JCA Java Connector Architecture of the J2EE platform described more particularly below. It should be appreciated that, while EJB, JMS, and JCA are commonly used software tools in contemporary distributed transaction environments, any platform, system, or architecture providing similar functionality may be employed with the data integration systems described herein.
- Real time shall include periods of time that approximate the duration of a business transaction or business process and shall include processes or services that occur during a business operation or business process, as opposed to occurring off-line, such as in a nightly batch processing operation. Depending on the duration of the business process, real time might include seconds, fractions of seconds, minutes, hours, or even days.
- Business process shall include any methods, service, operations, processes or transactions that can be performed by a business, including, without limitation, sales, marketing, fulfillment, inventory management, pricing, product design, professional services, financial services, administration, finance, underwriting, analysis, contracting, information technology services, data storage, data mining, delivery of information, routing of goods, scheduling, communications, investments, transactions, offerings, promotions, advertisements, offers, engineering, manufacturing, supply chain management, human resources management, data processing, data integration, work flow administration, software production, hardware production, development of new products, research, development, strategy functions, quality control and assurance, packaging, logistics, customer relationship management, handling rebates and returns, customer support, product maintenance, telemarketing, corporate communications, investor relations, and many others.
- Service oriented architecture shall include services that form part of the infrastructure of a business enterprise.
- services can become building blocks for application development and deployment, allowing rapid application development and avoiding redundant code.
- Each service may embody a set of business logic or business rules that can be bound to the surrounding environment, such as the source of the data inputs for the service or the targets for the data outputs of the service.
- SOA Service oriented architecture
- Metadata shall include data that brings context to the data being processed, data about the data, information pertaining to the context of related information, information pertaining to the origin of data, information pertaining to the location of data, information pertaining to the meaning of data, information pertaining to the age of data, information pertaining to the heading of data, information pertaining to the units of data, information pertaining to the field of data, and/or information pertaining to any other information relating to the context of the data.
- WSDL Web Services Description Language
- WSDL includes an XML format for describing network services (often web services) as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information.
- the operations and messages are described abstractly, and then bound to a concrete network protocol and message format to define an endpoint.
- Related concrete endpoints are combined into abstract endpoints (services).
- WSDL is extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate.
- Fig. 1 is a schematic diagram of a business enterprise with a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
- Fig. 2 is a schematic diagram showing data integration across a plurality of business processes of a business enterprise.
- Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources for a business enterprise.
- Fig. 4 shows an architecture for a metadata management system.
- Fig. 5 shows communication through a view model and data model to query a database.
- Fig. 6 shows a translation engine being accessed to translate a query result for a view model.
- Fig. 7 shows a translation engine being accessed to translate a query result for an external service.
- Fig. 8 shows a static model mapping.
- Fig. 9 shows an extensible model mapping.
- Fig. 10 shows a combination of model mappings.
- Fig. 11 depicts an architecture that exposes a plurality of internal services to external metadata.
- Fig. 12 depicts a mapped-model driven transformation of metadata.
- Fig. 13 shows interaction with a metadata environment.
- Fig. 14 shows a common repository storing a plurality of versions of metadata.
- Fig. 15 depicts a client dynamically comparing metadata versions in a versioned repository.
- Fig. 16A shows a process of metadata reconciliation.
- Fig. 16B depicts phased reconciliation across reconciliation zones.
- Fig. 17 depicts reconciliation of versioned metadata objects.
- Fig. 18 shows an example of the use of concurrency in a metadata process.
- Fig. 19 is a diagram of entities involved in a query process from a user interface 6702 to a metadata database 6712.
- Fig. 20 shows the entities involved in a process of extending a metadata database from a metadata model.
- Fig. 21 shows the entities involved in a process for accessing a repository from a tool.
- Fig. 22 shows the entities involved in a process by which a tool accesses versioned and unversioned metadata models.
- Fig. 23 shows the entities involved in a process by which a user interface accesses multiple versions of metadata in a common repository.
- Fig. 24 shows the entities involved in a reconciliation process for versions of metadata.
- Fig. 25 shows the entities involved in a reconciliation process using concurrency.
- the invention(s) disclosed herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- the invention(s) can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- Fig. 1 represents a platform 100 for facilitating integration of various data of a business enterprise.
- the platform includes a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
- the platform may include several data sources 102, which may be data sources such as those described above. These data sources may include a wide variety of data types from a wide variety of physical locations.
- the data source may include systems from providers such as Sybase,
- the data sources 102 may include systems using database products or standards such as IMS, DB2, ADABAS, VSAM, MD Series, UDB, XML, complex flat files, or FTP files.
- the data sources 102 may include files created or used by applications such as Microsoft Outlook, Microsoft Word, Microsoft Excel, Microsoft Access, as well as files in standard formats such as ASCII, CSV, GIF, TIF, PNG, PDF, and so forth.
- the data sources 102 may come from various locations or they may be centrally located.
- the data supplied from the data sources 102 may come in various forms and have different formats that may or may not be compatible with one another.
- Data targets are discussed later in this description. In general, these data targets may be any of the data sources 102 noted above. This difference in nomenclature typically denotes whether a data system provides data or receives data in a data integration process. However, it should be appreciated that this distinction is not intended to convey any difference in capability between data sources and data targets (unless specifically stated otherwise), since in a conventional data integration system, data sources may receive data and data targets may provide data.
- The platform illustrated in Fig. 1 also includes a data integration system 104.
- the data integration system may, for example, facilitate the collection of data from the data sources 102 as the result of a query or retrieval command the data integration system 104 receives.
- the data integration system 104 may send commands to one or more of the data sources 102 such that the data source(s) provides data to the data integration system 104. Since the data received may be in multiple formats including varying metadata, the data integration system may reconfigure the received data such that it can be later combined for integrated processing.
- the functions that may be performed by the data integration system 104 are described in more detail below.
- the platform 100 also includes several retrieval systems 108.
- the retrieval systems 108 may include databases or processing platforms used to further manipulate the data communicated from the data integration system 104.
- the data integration system 104 may cleanse, combine, transform or otherwise manipulate the data it receives from the data sources 102 such that a retrieval system 108 can use the processed data to produce reports 110 useful to the business.
- the reports 110 may be used to report data associations, answer complex queries, answer simple queries, or form other reports useful to the business or user, and may include raw data, tables, charts, graphs, and any other representations of data from the retrieval systems 108.
- the platform 100 may also include a database or data base management system 112.
- the database 112 may be used to store information temporally, temporarily, or for permanent or long-term storage.
- the data integration system 104 may collect data from one or more data sources 102 and transform the data into forms that are compatible with one another or compatible to be combined with one another. Once the data is transformed, the data integration system 104 may store the data in the database 112 in a decomposed form, combined form or other form for later retrieval.
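- By way of non-limiting illustration, the collect-and-transform step described above might be sketched as follows. The source names, field names, and the `FIELD_MAPS` table are hypothetical and chosen only for this example; they are not taken from any product named in this description.

```python
from typing import Dict, List

# Hypothetical per-source mappings: source field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm": {"cust_name": "name", "zip": "postal_code"},
    "billing": {"CustomerName": "name", "PostCode": "postal_code"},
}


def to_common_form(source: str, record: dict) -> dict:
    """Rename a source record's mapped fields and tag the record with its origin."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["source"] = source
    return common


def integrate(batches: Dict[str, List[dict]]) -> List[dict]:
    """Collect records from every source and return them in one compatible form."""
    combined: List[dict] = []
    for source, records in batches.items():
        combined.extend(to_common_form(source, record) for record in records)
    return combined
```

- In this sketch, fields with no entry in the mapping are simply dropped; a real system could instead retain them or report them.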
- Fig. 2 is a schematic diagram showing data integration across a plurality of entities and business processes of a business enterprise.
- the data integration system 104 facilitates the information flowing between user interface systems 202 and data sources 102.
- the data integration system 104 may receive queries from the interface systems 202, where the queries necessitate the extraction and possibly transformation of data residing in one or more of the data sources 102.
- the interface systems 202 may include any device or program for communicating with the data integration system 104, such as a web browser operating on a laptop or desktop computer, a cell phone, a personal digital assistant ("PDA"), a networked platform and devices attached thereto, or any other device or system that might interface with the data integration system 104.
- A user may be operating a PDA and make a request for information to the data integration system 104 over a WiFi or Wireless Application Protocol/Wireless Markup Language ("WAP/WML") interface.
- the data integration system 104 may receive the request and generate any required queries to access information from a website or other data source 102 such as an FTP file site.
- the data from the data sources 102 may be extracted and transformed into a format compatible with the requesting interface system 202 (a PDA in this example) and then communicated to the interface system 202 for user viewing and manipulation.
- the data may have previously been extracted from the data sources and stored in a separate database 112, which may be a data warehouse or other data facility used by the data integration system 104.
- the data may have been stored in the database 112 in a transformed condition or in its original state.
- the data may be stored in a transformed condition such that the data from a number of data sources 102 can be combined in another transformation process.
- a query from the PDA may be transmitted to the data integration system 104 and the data integration system 104 may extract the information from the database 112. Following the extraction, the data integration system 104 may transform the data into a combined format compatible with the PDA before transmission to the PDA.
- Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources 102 for a business enterprise.
- An embodiment of a data integration system 104 may include a discover data stage 302 to perform, possibly among other processes, extraction of data from a data source and analysis of column values and table structures for source data.
- a discover data stage 302 may also generate recommendations about table structure, relationships, and keys for a data target. More sophisticated profiling and auditing functions may include date range validation, accuracy of computations, accuracy of if-then evaluations, and so forth.
- the discover data stage 302 may normalize data, such as by eliminating redundant dependencies and other anomalies in the source data.
- the discover data stage 302 may provide additional functions, such as drill down to exceptions within a data source 102 for further analysis, or enabling direct profiling of mainframe data.
- a non-limiting example of a commercial embodiment of a discover data stage 302 may be found in IBM's Websphere ProfileStage product.
- the data integration system 104 may also include a data preparation stage 304 where the data is prepared, standardized, matched, or otherwise manipulated to produce quality data to be later transformed.
- the data preparation stage 304 may perform generic data quality functions, such as reconciling inconsistencies or checking for correct matches (including one-to-one matches, one-to-many matches, and deduplication) within data.
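- As a minimal sketch of the matching and deduplication functions mentioned above, records can be grouped by a normalized key. The key definition (name plus postal code) and the keep-the-first-record rule are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, List


def match_key(record: dict) -> str:
    """Build a normalized key: lowercase, drop punctuation, collapse whitespace."""
    raw = f"{record.get('name', '')}|{record.get('postal_code', '')}"
    kept = "".join(ch for ch in raw.lower() if ch.isalnum() or ch in "| ")
    return " ".join(kept.split())


def group_matches(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records whose normalized keys are identical (one-to-many matches)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)


def deduplicate(records: List[dict]) -> List[dict]:
    """Keep the first record from each match group."""
    return [group[0] for group in group_matches(records).values()]
```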
- the data preparation stage 304 may also provide specific data enhancement functions. For example, the data preparation stage 304 may ensure that addresses conform to multinational postal references for improved international communication.
- the data preparation stage 304 may conform location data to multinational geocoding standards for spatial information management.
- the data preparation stage may modify or add to addresses to ensure that address information qualifies for U.S. Postal Service mail rate discounts under Government Certified U.S. Address Correction. Similar analysis and data revision may be provided for Canadian and Australian postal systems, which provide discount rates for properly addressed mail.
- a non-limiting example of a commercial embodiment of a data preparation stage 304 may be found in IBM's Websphere QualityStage product.
- the data integration system may also include a data transformation stage 308 to transform, enrich and deliver transformed data.
- the data transformation stage 308 may perform transitional services such as reorganization and reformatting of data, and perform calculations based on business rules and algorithms of the system user.
- the data transformation stage 308 may also organize target data into subsets known as datamarts or cubes for more highly tuned processing of data in certain analytical contexts.
- the data transformation stage 308 may employ bridges, translators, or other interfaces (as discussed generally below) to span various software and hardware architectures of various data sources and data targets used by the data integration system 104.
- the data transformation stage 308 may include a graphical user interface, a command line interface, or some combination of these, to design data integration jobs across the platform 100.
- a non-limiting example of a commercial embodiment of a data transformation stage 308 may be found in IBM's Websphere DataStage product.
- the stages 302, 304, 308 of the data integration system 104 may be executed using a parallel execution system 310 or in a serial or combination manner to optimize the performance of the system 104.
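- The choice between serial and parallel execution might be sketched as below. The stage functions are placeholders invented for this example, and the sketch assumes every stage works on each record independently, so that the data can be partitioned freely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Stage = Callable[[List[dict]], List[dict]]


def prepare(records: List[dict]) -> List[dict]:
    """Placeholder preparation stage: trim whitespace from names."""
    return [{**r, "name": r["name"].strip()} for r in records]


def transform(records: List[dict]) -> List[dict]:
    """Placeholder transformation stage: upper-case names."""
    return [{**r, "name": r["name"].upper()} for r in records]


def run_serial(stages: List[Stage], records: List[dict]) -> List[dict]:
    """Run each stage in order over the whole data set."""
    for stage in stages:
        records = stage(records)
    return records


def run_parallel(stages: List[Stage], records: List[dict], partitions: int = 4) -> List[dict]:
    """Split the data into partitions and run the full stage sequence on each one."""
    chunks = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda chunk: run_serial(stages, chunk), chunks))
    return [row for chunk in results for row in chunk]
```

- The parallel variant returns the same records as the serial one, though not necessarily in the same order.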
- the data integration system 104 may also include a metadata management system 312 for managing metadata associated with data sources 102.
- the metadata management system 312 may provide for interchange, integration, management, and analysis of metadata across all of the tools in a data integration environment.
- A metadata management system 312 may provide common, universally accessible views of data in disparate sources, such as IBM's Websphere ODBC MetaBroker, CA ERwin, IBM Websphere ProfileStage, IBM Websphere DataStage, IBM Websphere QualityStage, IBM DB2 Cube Views, and Cognos Impromptu.
- the metadata management system 312 may also provide analysis tools for data lineage and impact analysis for changes to data structures.
- the metadata management system 312 may further be used to prepare a business data glossary of data definitions, algorithms, and business contexts for data within the data integration system 104, which glossary may be published for use throughout an enterprise.
- a non-limiting example of a commercial embodiment of a metadata management system 312 may be found in IBM's Websphere MetaStage product.
- "Mapping" refers to a design-time activity of relating metadata and meta-metadata between views, models, or model instances, while "transformation" refers to the corresponding runtime activity. It should also be noted that the following description relates to a metadata management system where the atomic data items are actually metadata for data sources being modeled.
- Metadata within the metadata management system is actually metadata describing this metadata, also known as meta-metadata.
- In the hierarchy of data, metadata, and meta-metadata, data represents the underlying data for one or more data sources/targets.
- metadata may be referred to simply as data (being the data managed by the metadata management system)
- meta-metadata is sometimes correspondingly referred to simply as metadata, i.e., metadata from the perspective of the models within the metadata management system. More generally, the usage should be clear from the context.
- Figure 4 shows an architecture for a metadata management system 5202, which may be, for example, any of the metadata management systems or metadata facilities 312 described above.
- The metadata management system 5202 may include a plurality of external users 5204, such as tools or clients, communicating with a hub 5206 through a plurality of views 5208, and a repository 5210 including at least one model 5212 that includes at least one operational class 5214 relating to operational metadata for the model 5212 and/or at least one design class 5216 relating to design metadata for the model 5212.
- Metadata services 5218 may be provided for interacting with the model 5212 in the repository 5210.
- The users 5204 may be any of the interface systems 202 described above, or any other client device, tool, or other program or software interface, through which a user may run queries or otherwise investigate data in a database.
- The users 5204 may run a query using a view 5208 adapted for communication between a data model employed by the user 5204 and a data model employed by the hub 5206.
- The view 5208 may, for example, include fields, data types, data hierarchy, data relationships, temporal information, source information, or any other information relevant to the manner in which data is displayed or used by the users 5204, as well as any appropriate mappings between the data model in the view 5208 provided to an external user 5204 and the data model employed internally by the hub 5206. While only two views 5208 are illustrated in the figure, any number of views 5208 may be used, and the views 5208 may be the same views 5208, as where there is more than one of the same type of external user 5204, or different views 5208 where there are different external users 5204, or any number and combination of these consistent with the processing capabilities of the metadata management system.
- An external user 5204 may use data or metadata unique to the user 5204, with no corresponding elements within the hub 5206.
- Erwin design tools employ object "coordinates" that are unique to Erwin and describe where an object appears in a graphical "canvas."
- The hub 5206 may be designed to handle special cases by supporting extensions to the hub model in a manner transparent to the user 5204.
- The view 5208 may also, or instead, provide direct mapping to appropriate external data in addition to a connection to the hub 5206.
- the hub 5206 may generally employ a data model 5212 defined by the subject matter of the data or its business context. Thus it is generally expected that the hub model for data would not change frequently within a single application. Where changes are made to the hub model, corresponding updates may be required for one or more views 5208.
- The hub 5206 may interact with underlying data (e.g., metadata for enterprise data) using one or more models 5212 stored within the repository 5210. Although the use of a hub for design classes 5216 of a repository 5210 is one useful architecture with broad applicability, it should be appreciated that the operational classes 5214 typically would not require such a hub 5206. More generally, the metadata management system 5202 described herein may be designed without any hub 5206.
- This architecture may be useful, for example, where there is little or no commonality between the design models of various views.
- other techniques may be employed for communication between various views in a metadata management system 5202, such as dynamically generating a non-persistent, logical hub as a central connector.
- Other principles of the systems described herein may be applicable whether or not a central hub 5206 is employed in a metadata management system 5202.
- the models 5212 may be stored and manipulated using object-oriented techniques in a platform such as Eclipse and the Eclipse Modeling Framework ("EMF").
- a model 5212 may include metadata and mappings to relevant structures in data sources and/or targets, and any other useful, more abstract modeling of metadata. These aspects of the model may be contained in a repository object that is persistently stored within the repository 5210.
- the repository 5210 may store one or more models 5212 that contain operational classes 5214 and design classes 5216.
- the models may include metadata, meta-metadata, or any other useful descriptive or functional characteristics of data.
- a model 5212 may contain values for weight in units such as ounces.
- If a system user wishes to implement a new data source or integrate an existing data source that specifies weight in pounds, this information may be included in the model 5212 so that corresponding metadata for these disparate sources can be consistently treated within the hub 5206, and presented to external users 5204 through one or more views 5208 that may provide different perspectives (or the same perspective) on the data.
- A model 5212 may contain any information about underlying data and metadata useful for integration and any other uses contemplated by the metadata management system. Models 5212 can usefully capture information about data and how it changes to enable consistent treatment and extensibility of data usage across an enterprise or among enterprises.
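- The weight example above might be sketched as follows, with the unit of each source recorded in the model so that values can be treated consistently. The class name `WeightProperty`, the source and field names, and the choice of ounces as the common unit are assumptions made for illustration.

```python
from dataclasses import dataclass

OUNCES_PER_POUND = 16  # avoirdupois pounds to ounces


@dataclass(frozen=True)
class WeightProperty:
    """Model entry recording which unit a source uses for a weight field."""
    source: str
    field: str
    unit: str  # "oz" or "lb"

    def to_ounces(self, value: float) -> float:
        """Convert a value from this source's unit to the common unit (ounces)."""
        if self.unit == "oz":
            return value
        if self.unit == "lb":
            return value * OUNCES_PER_POUND
        raise ValueError(f"unknown unit: {self.unit}")


# Two hypothetical sources describing the same property in different units.
existing = WeightProperty(source="warehouse", field="wt_oz", unit="oz")
added = WeightProperty(source="shipping", field="weight_lb", unit="lb")
```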
- When a model 5212 is created in the repository 5210, it may automatically be partitioned into design and operational components which may be independently managed while being collectively and/or uniformly queried.
- operational classes 5214 may be stored for the model 5212 and inherit any appropriate properties, methods, and so on, among classes.
- The operational classes 5214 may, in particular, model operational aspects of external processes, or provide persistent storage of runtime results.
- the operational classes 5214 may be time-stamped or otherwise labeled for unique reference. It will be appreciated that, while the Eclipse platform is one useful tool for building and maintaining the models described herein, any object-oriented tools or techniques may be similarly employed.
- The term "properties" will be used generally to refer to various characteristics of object-oriented descriptions, or other similar descriptions such as elements of a Unified Modeling Language ("UML") class model, including classes, sub-classes, packages, package structure, properties, attributes, methods, relationships, inheritance, and so on.
- an operational class, package structure, or the like may be an operational property as that term is used herein.
- Design classes 5216 may also be instantiated from the model 5212 and inherit any properties, methods, and so on. Information within these design classes 5216 may also include versioning information so that multiple object instances may be maintained either sequentially or in branches, or combinations thereof.
- the versioned metadata objects may be manipulated, edited, updated, or otherwise controlled and managed by users according to the demands and design goals for the enterprise computing system. Using version control or similar techniques, metadata objects for a design class 5216 may be shared, or checked out to individual users or teams. In general, different versions may be employed as different designs are tried, or when there are changes to underlying data. It will be appreciated that various designs may be reconciled, and branches merged, prior to creation of a runtime executable.
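- One possible reading of the sequential versions, branches, and merges described above is sketched below. The class `VersionedObject`, its method names, and the merge rule (a property changed on the source branch since the fork overrides the target) are assumptions for this example; deleted properties are not handled.

```python
class VersionedObject:
    """A design-time metadata object with numbered versions and named branches."""

    def __init__(self, properties):
        self.versions = {1: dict(properties)}  # version number -> property snapshot
        self.branches = {"main": 1}            # branch name -> latest version number
        self.branch_points = {}                # branch name -> version it forked from

    def _new_version(self, branch, properties):
        number = max(self.versions) + 1
        self.versions[number] = properties
        self.branches[branch] = number
        return number

    def commit(self, branch, changes):
        """Record a new version on a branch by applying changes to its latest version."""
        current = self.versions[self.branches[branch]]
        return self._new_version(branch, {**current, **changes})

    def fork(self, new_branch, from_branch="main"):
        """Start a new branch at the latest version of an existing branch."""
        self.branches[new_branch] = self.branches[from_branch]
        self.branch_points[new_branch] = self.branches[from_branch]

    def merge(self, target, source):
        """Apply to the target every property the source changed since it forked."""
        base = self.versions[self.branch_points[source]]
        theirs = self.versions[self.branches[source]]
        merged = dict(self.versions[self.branches[target]])
        for key, value in theirs.items():
            if base.get(key) != value:
                merged[key] = value
        return self._new_version(target, merged)
```

- Earlier snapshots are never modified, so any prior version remains available for inspection after a merge.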
- An enterprise computing system may include a data integration system 104.
- the enterprise computing system may include any combination of computers, mainframes, portable devices, data sources, and other devices, connected locally through one or more local area networks and/or connected remotely through one or more wide area or public networks using, for example, a virtual private network over the Internet.
- Devices within the enterprise computing system may be interconnected into a single enterprise to share data, resources, communications, and information technology management.
- Resources within the enterprise computing system are used by a common entity, such as a business, association, governmental body, or university.
- Resources of the enterprise computing system may be owned (or leased) and used by a number of different entities, such as where an application service provider offers on-demand access to remotely executing applications.
- the enterprise computing system may also include a plurality of tools, which access a common data structure, termed herein a repository information manager (“RIM”) (also referred to below as the "hub") through respective translation engines (which, in a bridge-based system, may be bridges).
- the RIM may include any of the data sources 102 described above.
- the tools generally comprise, for example, diverse types of database management systems and other applications programs that access shared data stored in the RIM.
- the tools, RIM, and translation engines may be processed and maintained on a single computer system, or they may be processed and maintained on a number of computer systems which may be interconnected by, for example, a network, which transfers data access requests, translated data access requests, and responses between the different components.
- the tools may generate data access requests to initiate a data access operation, that is, a retrieval of data from or storage of data in the RIM.
- Data may be stored in the RIM in an atomic data model and format that will be described below.
- The tools will view the data stored in the RIM in a variety of diverse characteristic data models and formats, as will be described below, and each translation engine, upon receiving a data access request, will translate the data between the respective tool's characteristic model and format and the atomic model and format of the RIM as necessary.
- the translation engine will identify one or more atomic data items in the RIM that jointly comprise the data item to be retrieved in response to the access request, and will enable the RIM to provide the atomic data items to one of the translation engines.
- the translation engine will aggregate the atomic data items that it receives from the RIM into one or more data items as required by the tool's characteristic model and format, or "view" of the data, and provide the aggregated data items to the tool that issued the access request.
- the translation engine may receive the data to be stored in a characteristic model and format for one of the tools.
- the translation engine may translate the data into the atomic model and format for the RIM, and provide the translated data to the RIM for storage. If the data storage access request enables data to be updated, the RIM may substitute the newly-supplied data from the translation engine for the current data. On the other hand, if the data storage access request represents new data, the RIM may add the data, in the atomic format as provided by the translation engine, to the current data in the RIM.
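- The storage and retrieval flow described above might be sketched as follows. The classes `Rim` and `TranslationEngine`, the keying of atomic items by object identifier and attribute name, and the two example tool views are all hypothetical.

```python
class Rim:
    """Shared store of atomic data items, keyed by (object id, atomic attribute)."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        """Substitute the value if the item exists, otherwise add it."""
        outcome = "updated" if key in self.items else "added"
        self.items[key] = value
        return outcome

    def get(self, key):
        return self.items[key]


class TranslationEngine:
    """Translates between one tool's view and the atomic items held in the RIM."""

    def __init__(self, rim, view):
        self.rim = rim
        self.view = view  # tool field name -> atomic attribute name

    def store(self, object_id, tool_record):
        """Break a tool record into atomic items and hand each one to the RIM."""
        return [
            self.rim.put((object_id, self.view[field]), value)
            for field, value in tool_record.items()
        ]

    def retrieve(self, object_id):
        """Aggregate atomic items into the data item this tool expects."""
        return {
            field: self.rim.get((object_id, atom))
            for field, atom in self.view.items()
        }
```

- Two engines configured with different views can then read and write the same atomic items, each in its own tool's terms.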
- the metadata services 5218 may be used to create, edit, delete, or otherwise manipulate objects, classes
- the services 5218 may be presented to a user through a user interface, command line interface, programming interface, or other interface.
- the services 5218 may provide functions such as versioning, branching, merging, and any other operations supported within repository 5210. Some of these operations are described in greater detail below.
- the metadata services 5218 may also include, for example, data analysis services such as impact analysis (how a change to one model type instance affects other type instances in the model), operational analysis (history of executable objects through event metadata), data lineage (history of data movement in a warehouse or across the enterprise computing system), version drilldown (investigation of version history for metadata objects), object differencing (investigation of differences between metadata objects), and object merge (combining two objects of the same class according to specified rules).
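- Impact analysis of the kind listed above can be viewed as a graph traversal over dependency links. The following is a minimal sketch under that assumption; the dependency map and object names are invented for the example.

```python
from collections import deque
from typing import Dict, List, Set


def impact_of(changed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every object that directly or indirectly depends on `changed`.

    `dependents` maps an object to the objects that use it.
    """
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

- Data lineage could be explored with the same traversal run over the reversed links, from an object back to the objects it was derived from.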
- the metadata services 5218 may also include import and export services for transforming metadata, for example, as it is moved into and/or out of the repository 5210.
- The metadata services 5218 may be realized using, for example, a J2EE platform, and provided to users through a service-oriented architecture ("SOA"). Similarly, transactions within the repository 5210 may be managed using, for example, the bean container within a J2EE Application Server. It will be appreciated that the services 5218 may also be provided to an end user as one or more tools in a user interface.
- A metamodeling tool or tools may provide for defining mappings between metamodels, generating interfaces for metamodels, and facilitating implementation and transformation of metadata models.
- the metamodeling tools may be provided through a graphical user interface providing access to a number of related functions. For example, the interface may provide tools to define, validate, test and analyze metamodels and mappings, as well as metadata model output.
- the interface may also provide tools for documentation of metamodels, metamodel mappings, and any instances of metadata models generated through the metamodeling tools. Metamodeling tools could be usefully employed, for example, to deploy new versions of an enterprise model. Diagramming, modeling and mapping may be supported by a service such as IBM Rational XDE.
- The metamodeling tools may be deployed, for example, as services in a service-oriented architecture.
- the metamodeling tools may provide centrally managed mapping specifications for metadata models, with synchronization, versioning, history tracking and other appropriate capabilities consistent with the metadata tools discussed above.
- a mapping model may represent object transformations between a hub and a view (or other models)
- the mapping model from this metamodeling perspective may also, or instead, represent mapping between different metadata models that may ultimately be employed in transformations between the models themselves, such as when upgrading to a newer version of a metadata model.
- The metamodeling tools may, for example, provide an independent specification language separate from, and loosely coupled to, the model definition, to allow for development control and implementation flexibility.
- the metamodeling tools may advantageously provide for dynamic browsing of mapping specifications within a development environment, and may provide tools to automatically generate documentation at various levels of detail.
- a corresponding test framework may be developed to generate test metadata and dynamically execute mappings so that immediate results can be obtained and incorporated into active development.
- the repository 5210 may be logically and/or physically separated into two or more repositories, such as a common repository (not shown) for persistent storage of design classes 5216 and properties and an operational repository (not shown) for persistent storage of operational classes 5214 and properties.
- Operational and design classes may be distinguished within one physical or logical repository using annotations within classes to define their association.
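- The use of annotations to distinguish operational from design classes within a single repository might be sketched with a class decorator, as below. The decorator name `metadata_kind` and the example classes are hypothetical.

```python
def metadata_kind(kind):
    """Class decorator that annotates a class as 'design' or 'operational'."""
    if kind not in ("design", "operational"):
        raise ValueError(f"unknown kind: {kind}")

    def mark(cls):
        cls.metadata_kind = kind
        return cls

    return mark


@metadata_kind("design")
class TableDefinition:
    """Design metadata: the intended structure of a table."""


@metadata_kind("operational")
class JobRun:
    """Operational metadata: the recorded outcome of one job execution."""


def classes_of_kind(classes, kind):
    """Select the classes carrying a given annotation."""
    return [c for c in classes if getattr(c, "metadata_kind", None) == kind]
```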
- Figure 5 shows communication with a database (of metadata) through one or more views or models.
- a service 5302, a user interface 5303, or any other interface may communicate with a database 5312, which may be any of the data sources 102 described above, such as to submit a query to the database 5312.
- the communication may be conducted through metadata models, such as a view 5308 and a hub 5310, provided by a repository 5304, such as the repository 5210 described above.
- These metadata models may include any information about data, such as fields, field names, field attributes, data types, data hierarchy, data relationships, temporal information, source information, or any other information relevant to the structure, location, or use of data, or metadata about such data (i.e., meta-metadata).
- the service 5302 may generate a query using a view of data native to that service 5302, i.e., having a structure and format defined by the service 5302.
- This query may be structured by the service 5302 without any information about the structure of data in the database 5312.
- the view 5308 provided by the repository 5304 to the requesting service 5302 may be mapped to a hub 5310 that provides a model for consistent representation of metadata to a plurality of different views, including the view 5308 receiving the query.
- the hub 5310 may in turn be mapped to a structure used internally by the database 5312.
- Using mapping information between the view 5308, the hub 5310, and the database 5312, the query may be advantageously translated into a query using a data model or syntax native to the database 5312. This may result in significant performance advantages because the query can benefit from any optimization or tuning for the database 5312.
- Mapping information may be queried independently to explore possible optimizations for a particular query.
- a user interface 5303 may communicate with the database 5312 through a number of models provided by the repository 5304.
- a user may create a query in the user interface 5303 using fields with a structure and format corresponding to the presentation of data in the user interface 5303.
- the query may be received by the view 5308 and translated into a query for the hub 5310 using any available mapping information, and in turn translated into a query for the database 5312 using any available mapping information to present the entire query in a syntax native to the database 5312.
- Although a single view 5308 is shown for both the user interface 5303 and the service 5302, each may have its own external model by which it views data, and these models may be maintained and provided by the repository 5304.
- the query may be run against the database 5312 to produce results that may be returned through the hub 5310 and the view 5308 in a form that is readily useable by the user interface 5303. More generally, while a two-tiered structure is depicted in Fig. 5, consistent with a hub-and-spoke architecture of data integration systems, any number of metadata models in any relative relationships to one another may benefit from the techniques described herein for accessing a database, provided mapping information is available concerning relationships among metadata in the various models.
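- The two-step translation of a query from view terms to hub terms to database terms might be sketched as follows. The field names, column names, table name, and the reduction of a query to a list of selected fields are simplifying assumptions for this example.

```python
from typing import Dict, List

# Hypothetical mapping information held by the repository.
VIEW_TO_HUB: Dict[str, str] = {
    "CustName": "customer.name",
    "Zip": "customer.postal_code",
}
HUB_TO_DB: Dict[str, str] = {
    "customer.name": "CUST_NM",
    "customer.postal_code": "CUST_POSTCD",
}


def translate_fields(fields: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite each field name using one layer of mapping information."""
    return [mapping[field] for field in fields]


def to_native_query(view_fields: List[str], table: str = "CUSTOMER_T") -> str:
    """Translate a view-level field selection into SQL phrased in database terms."""
    hub_fields = translate_fields(view_fields, VIEW_TO_HUB)
    columns = translate_fields(hub_fields, HUB_TO_DB)
    return f"SELECT {', '.join(columns)} FROM {table}"
```

- Because the resulting statement names only native columns, it can be handed to the database as a single query rather than being evaluated piecemeal in the intermediate models.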
- Fig. 6 shows repository services 5304 including a translation engine that provides metadata translation services between the view 5308 and the hub 5310.
- the translation engine may provide translations of queries, such as those described above, between various native metadata structures used by the different models and the database 5312, as well as transformation of objects between models.
- The translation engine, or a plurality of translation engines, may be provided as a service within a repository 5304, as generally depicted in Fig. 6, where the translation engines may be registered and/or stored.
- the repository services 5304 may access a translation engine for translation of the query into a format for the hub 5310.
- A similar translation may be provided between the hub 5310 and the database 5312. More generally, a translation engine may receive queries in a number of query languages or programming languages from external models, and use mapping information available for the respective models and the database 5312 to translate the queries into queries in a structure optimized for the database 5312. Thus queries may generally be expressed in terms native to a view 5308 (or other model), and presented to the database 5312 in terms native to the database 5312.
- Fig. 7 shows a repository service providing a translation engine for a plurality of external services 5302.
- The services 5302 may be, for example, a data transformation stage 308, data preparation stage 304, RTI service 2704, a user interface, or any other service or external client that might perform a query on metadata in the database 5312.
- the services 5302 may present a query to the view 5308 in a syntax native to the view 5308.
- the translation engine may translate the query into a syntax native to the hub 5310, which may in turn be translated into a query using a syntax native to the database 5312.
- The query results may be returned to the services 5302 by accessing the translation engine to translate the query results back into a syntax native to the services 5302. In this way services 5302 may efficiently communicate with the database 5312 using their own native syntax.
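- Returning results in a service's own terms amounts to applying the mapping information in the reverse direction. A minimal sketch, assuming one-to-one mappings and result rows represented as dictionaries keyed by database column name (both assumptions of this example):

```python
from typing import Dict, List


def compose(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Chain two mappings, e.g. view-to-hub followed by hub-to-database."""
    return {key: second[value] for key, value in first.items()}


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Reverse a mapping, refusing mappings that are not one-to-one."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one and cannot be inverted")
    return inverse


def results_to_view(rows: List[dict], view_to_hub: Dict[str, str],
                    hub_to_db: Dict[str, str]) -> List[dict]:
    """Relabel database result rows with the field names of the requesting view."""
    db_to_view = invert(compose(view_to_hub, hub_to_db))
    return [{db_to_view[col]: value for col, value in row.items()} for row in rows]
```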
- The term "syntax" refers to any syntax, structure, format, programming language, and/or interface that might be employed to represent queries either externally, such as to services or a database, or internally, such as among metadata models.
- Figures 8-10 depict how a metadata model may be mapped to a schema in a relational database for persistent storage.
- a metadata model may be described using object-oriented relationship management tools.
- the in-memory model may be mapped to a schema in a relational database using a variety of techniques discussed below. This strategy is particularly amenable to management using tools such as the Apache
- FIG. 8 depicts the correspondence between a metadata model and a relational database.
- the metadata model 5602 may include a plurality of object-oriented classes 5604 defining various properties of the model 5602, such as information about metadata including fields, field names, field attributes, data types, data hierarchy, data relationships, temporal information, source information, or any other information relevant to the structure, location, or use of data.
- the database 5608 may include a plurality of tables 5610 representing a relational schema used to physically store the model 5602.
- the mapping between the model 5602 and the database 5608 may be through a one-to-one mapping of classes 5604 in the model 5602 to tables 5610 in the database 5608.
- every aspect of the classes 5604 has a corresponding aspect in one of the tables 5610 so that the model 5602 structure is literally reproduced in the database 5608.
- a conceptually linear translation between the model 5602 and the database 5608 can be maintained.
- Such a representation may generally provide higher performance, and may be directly compiled at run time, or readily pre-compiled; however, changes to the model 5602 may require reconstruction of the entire database 5608 and corresponding changes to compiled versions
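- As a rough illustration of the one-to-one mapping of Fig. 8, the sketch below derives one table per model class, with one column per class attribute. The `FieldDef` class, the type table, and the use of SQLite are assumptions made for the example and are not taken from the source.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python attribute types to SQL column types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}


@dataclass
class FieldDef:
    """Hypothetical model class describing one field of a data source."""
    name: str
    data_type: str
    length: int


def create_table_sql(cls):
    """Build a CREATE TABLE statement whose columns mirror the class attributes."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({columns})"


def save(conn, obj):
    """Store one model object as one row of the table named after its class."""
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {type(obj).__name__} ({', '.join(names)}) VALUES ({placeholders})",
        [getattr(obj, name) for name in names],
    )
```

- Because the table layout is fixed by the class definition, adding an attribute to `FieldDef` would require altering or rebuilding the table, which is the trade-off noted above.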
- Figure 9 shows an alternative mapping of a metadata model to a relational database.
- the metadata model 5702 may be, for example, the metadata model 5602 described above.
- the mapping between the model 5702 and the database 5704, as depicted generally by a vertical arrow in the figure, may be from properties of the classes within the model 5702 into entries in tables 5706 within the database 5704.
- the tables 5706 may be organized to optimize certain uses, such as by organizing version data or runtime artifacts in separate physical tables, regardless of the object-oriented structure employed by the model 5702.
- This approach advantageously permits an arbitrary model to be fully characterized within a generic table structure.
- This approach may enhance extensibility because any change to the model 5702 will only require updates to any affected entries in the database 5704, such as one or two row updates, without otherwise affecting the description stored in the tables 5706. In general, this represents a design trade-off between relatively high performance of the database 5704
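- The generic table structure of Fig. 9 can be sketched as a single descriptive table in which each row records one property of one model class. The table name, column names, and use of SQLite below are illustrative assumptions; the point is only that extending the model becomes a row insert and not a schema change.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one generic, descriptive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE model_property ("
        "class_name TEXT, property_name TEXT, property_type TEXT, "
        "PRIMARY KEY (class_name, property_name))"
    )
    return conn


def describe(conn, class_name, properties):
    """Record (or update) properties of a model class, one row per property."""
    conn.executemany(
        "INSERT OR REPLACE INTO model_property VALUES (?, ?, ?)",
        [(class_name, name, type_name) for name, type_name in properties.items()],
    )


def properties_of(conn, class_name):
    """Read back the stored description of a model class."""
    rows = conn.execute(
        "SELECT property_name, property_type FROM model_property "
        "WHERE class_name = ? ORDER BY property_name",
        (class_name,),
    )
    return dict(rows.fetchall())
```

- Calling `describe` a second time with a new property adds a single row and leaves the existing description untouched.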
- Figure 10 shows a combination of the model mappings described in Figs. 8 and 9 above.
- the metadata model 5802 may be, for example, the metadata model 5602 described above.
- the mapping between the model 5802 and the database 5808 may be partially from classes 5804 within the model 5802 directly to tables 5810 within the database 5808 having a corresponding structure, as described above with reference to Fig. 8.
- the model 5802 may be modified by a user, such as by adding a property 5806 to the class 5804. A corresponding change may be made to the model stored in the database 5808, such as by recording descriptive entries 5812 in the generic table 5814 as described above with reference to Fig. 9.
- the static portions of a model may be mapped to a more performant, fixed schema, while the non-static or user configurable portions of the model may be mapped to an extensible, descriptive schema. In this manner, the relational schema for
- the generic structure described above may provide a reflective storage mechanism for extensible models.
- the storage mechanism may "understand" its environment, and may look to the model description to determine related classes, attributes, mappings, and the like for any object.
- These reflective capabilities may be used to provide a higher-level design environment where a schema such as the generic table format described above can persist model properties in a manner that accommodates extension.
- Figure 11 depicts an architecture that exposes a plurality of internal services to external metadata.
- metadata may reside outside the metadata models managed by the metadata management system described herein, such as where data is shared between separate enterprises or enterprise applications.
- An architecture for accessing such external metadata may include external metadata 5902 with a first view 5904, a hub 5906, and a second view 5908 to a plurality of internal services 5910.
- the metadata management system may provide a first view 5904 of the external metadata 5902, which may in turn be connected to the hub 5906 to provide a common internal model for the external metadata 5902.
- the internal services 5910 may be similarly mapped to the hub 5906 through their own view of metadata, the second view 5908. Through these interconnected models 5904, 5906, 5908, the internal services 5910 may access the external metadata 5902 in a form native to the internal services 5910.
- the internal services 5910 may, in turn, be deployed in a services-oriented architecture to provide access to the external metadata 5902 as a service within the metadata management system, or more generally throughout the enterprise.
- Figure 12 depicts a mapped-model driven transformation of metadata using an interpreted mapping to translate between metadata models such as a view and a hub.
- a metadata management system 6000 may include a hub 6002, one or more translation engines 6004, and one or more views 6006, 6008.
- the translation engines 6004 may include mapping models 6010 characterizing one or more mappings between the hub 6002 and the views 6006, 6008. These models may be interpreted when a request is received to determine, using the mapping model 6010, how an instance of an object should be expressed to the requester.
- the mapping model 6010 may be expressed in a number of forms, including as a model (e.g., a data structure, such as Java classes or EMF objects or instances), which may provide greater design flexibility, or as compiled code, which may provide greater execution efficiency to the translation engines 6004, or as interpreted code. More generally, a single model-to-model mapping, or mapping model 6010, may be instantiated in any number of different translation engines 6004. At the same time, different translation engines 6004 may instantiate any number of different mapping models 6010 in any number of forms ranging from abstract model to compiled code. Mapping models 6010 may be registered in a translation registry (not shown) for the translation engines 6004 to provide common access and consistency.
- a view-to-hub mapping is typically generated as a static mapping that does not change once it is deployed.
- the mapping may be interpreted directly when instances of metadata are moved from a view to a hub, or vice versa.
- the view may be represented internally as, for example, Java classes, Java code, or some interpretation of the underlying model.
- the mapping can be interpreted in various forms, such as Java code, Jython (Java-based scripting), and the like.
- When a request is received, the request may be parameterized by the view model, the mapping model, and the hub model.
- the model-driven translation engine can receive an object expressed in one of the models and return objects expressed in another one of the models.
- the hub may be an object-oriented construct accessed using interpreted Java code.
- the views 6006, 6008 may be interpreted with Java or some other interpreted programming language.
- the translation engine 6004 may use the metadata model mapping between the hub 6002 and the views 6006, 6008 to move requests and object instances between the hub and the views 6006, 6008.
- the translation engine 6004 may be dynamically modified by a user in a manual operation, or automatically (or manually) in response to a change in one or more of the metadata models or objects. It should be appreciated that, whether interpreted, compiled, or otherwise executed, the software or software engine that interprets/executes the model may be synchronous or asynchronous. In an asynchronous environment, access to the model is through a messaging service or other asynchronous technique. In a synchronous environment, calls may be made directly to the engine through an application programming interface or other synchronous interface to the engine.
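- The mapped-model driven translation of Fig. 12 can be sketched with a mapping model held as plain data and interpreted at request time. The class names, the registry, and the attribute names below are hypothetical; the sketch assumes one-to-one attribute mappings and a synchronous, direct call into the engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingModel:
    """Declarative view-to-hub mapping: view attribute name -> hub attribute name."""
    view_to_hub: dict

    def to_hub(self, view_obj):
        """Express a view object in hub terms by interpreting the mapping."""
        return {self.view_to_hub[key]: value for key, value in view_obj.items()}

    def to_view(self, hub_obj):
        """Express a hub object in view terms, dropping unmapped hub attributes."""
        hub_to_view = {hub: view for view, hub in self.view_to_hub.items()}
        return {hub_to_view[k]: v for k, v in hub_obj.items() if k in hub_to_view}


class TranslationEngine:
    """Moves object instances between views through the hub."""

    def __init__(self):
        self.registry = {}  # view name -> MappingModel (a simple translation registry)

    def register(self, view_name, mapping):
        self.registry[view_name] = mapping

    def translate(self, source_view, target_view, obj):
        """Translate obj from one view to another by way of the hub model."""
        hub_obj = self.registry[source_view].to_hub(obj)
        return self.registry[target_view].to_view(hub_obj)
```

- Because the mapping is data, it could be replaced at run time without recompiling the engine, at the cost of a lookup on every request.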
- FIG. 13 shows interaction with a metadata environment.
- a model 6102 may be represented as unversioned classes 6104 (stored in an operational repository) and versioned classes 6106 (stored in a common repository).
- a user metadata environment 6108 may be provided for users 6110 to interact with the model.
- An "environment," as used in the following description, is intended to refer to underlying model data and other contextual information for a model or metadata, which one or more users 6110 may view and manipulate through any suitable graphical user interface, command line interface, or other programmatic interface for viewing, querying, and manipulating models and model data, including stored instances of models and metadata, whether in volatile or nonvolatile memory or both, and including operational properties and design properties thereof, along with any versions of any of the above.
- the general term "environment" (or "user environment") is intended to refer generically to any model context through which one or more users might interact with metadata.
- several environments are specifically contemplated, as described below. The examples that follow do not limit the number and variety of
- the model 6102 may be, for example, any of the views or hubs described above, or any other metadata model.
- the model may include operational classes and attributes as well as design attributes and classes.
- a model 6102 may be stored in two different repositories according to the purpose of various model classes.
- an operational repository may be configured for storing metadata results for jobs executed using a model.
- a common repository may be configured to support collaboration and iterative design processes.
- the operational and common repositories may be physically and/or logically separate, and each is defined in part by the subset of model classes that are stored therein, and in part by the services or methods used to access each.
- the users 6110 may interact with the metadata environment 6108 in a number of different modes, such as a workspace or a team space.
- the workspace, also referred to as a sandbox, may provide live editing of models in an unversioned environment where, for example, metadata changes to design properties are either saved as a new model or overwritten to an existing model.
- the workspace may exist locally on a user's computer, or remotely on a server where the user may interact with metadata.
- placing a model in a workspace would lock the model for other potential users.
- the workspace may provide shared use, such that more than one user may edit and save changes to the workspace.
- the team space may provide versioning, such that multiple versions may be checked out, checked in, branched, and so on.
- the team space may provide a metadata environment for all of the metadata versioning capabilities discussed above.
- a versioned metadata environment may support versioning of metadata that is created or edited by individual users.
- a user of the versioned metadata environment may check out a model, and check the model back in as a new version.
- the team space may enable collaborative and/or sequential editing to metadata with version control.
- a user interface may also provide access to an event space, which is the metadata environment 6108 associated with operational properties and/or the operational repository described above.
- the user environment 6108 may also be, or include, a federated user environment that provides a centralized, global environment for a number of repositories across an enterprise.
- the federated user environment may provide a common view of different repositories, or may represent each repository separately.
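- The check-out and check-in cycle of the team space can be sketched as an append-only version history per model. The `TeamSpace` class and its method names are hypothetical, and the sketch omits branching, locking, and user identity.

```python
import copy


class TeamSpace:
    """Keeps every checked-in version of each model (no branching in this sketch)."""

    def __init__(self):
        self._versions = {}  # model name -> list of snapshots, oldest first

    def check_in(self, name, model):
        """Store a snapshot as a new version and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(copy.deepcopy(model))
        return len(history)

    def check_out(self, name, version=None):
        """Return a private copy of the requested version (latest by default)."""
        history = self._versions[name]
        snapshot = history[-1] if version is None else history[version - 1]
        return copy.deepcopy(snapshot)
```

- Deep copies keep a user's working copy isolated from the stored history, so edits only become visible to others on check-in.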
- the users 6110 may be, for example, human users interacting with the metadata environment 6108 through a graphical or command line interface, or a program or service accessing the metadata models in the repository, such as the discover data stage 302, the data preparation stage 304, or the data transformation stage 308 described above.
- Figure 14 depicts a common repository 6202 storing a plurality of versions of metadata 6204.
- the metadata 6204 may be, for example, metadata for the views and hubs discussed above.
- the metadata database 6206 may be any of the data sources 102 described above. Each version of the metadata 6204 may provide a different, but related, version of metadata stored in a metadata database 6206. The versions of metadata 6204 may be created, for example, by a team of developers working on a data integration project, and compared using, among other things, the instances stored in the database 6206.
- Figure 15 depicts a common repository 6302 containing a plurality of object versions 6304 characterizing metadata stored in a metadata database 6306, all as generally described above.
- a client 6308 may interact with an object version 6304 either directly or in one of the user environments described above, and may perform any of the design operations described generally above. This may include, for example, dynamic comparison of metadata models, drilldown, editing, testing, or any other appropriate functions.
- the client may also use the common repository 6302 and object versions 6304 to investigate underlying metadata in the metadata database 6306.
- Figure 16A depicts a reconciliation of versioned metadata objects.
- the common repository 6402 and the versioned objects 6404 may be the common repository 6302 and versioned objects 6304 described above. Reconciliation of the versions may be desired at various points in a design cycle, and is typically required for release of an executable model.
- a reconciliation of the versioned objects 6404 into a single instance 6408 may be controlled through a reconciliation process 6406.
- a number of techniques are known, and may be used for automated, semi-automated, and manual reconciliation. In general, any such techniques may be employed with the systems described herein.
- the reconciliation process 6406 may advantageously retain a full version history and reconciliation lineage for the reconciled single instance 6408 to permit modifications to the reconciliation process 6406, to return to any previous unreconciled state, or to investigate source metadata and the lineage of reconciliation. Where direct conflicts in metadata are resolved during reconciliation, such as in a merge, previous attribute values may be recalled for use with alternate reconciliation of branches and various versions.
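- One way to picture a reconciliation that retains lineage is a merge that records, for every attribute, which version contributed which value. The function below is a sketch under stated assumptions: versions are flat dictionaries with hashable values, and the conflict rule is supplied by the caller as a hypothetical `prefer` callback.

```python
def reconcile(versions, prefer):
    """Merge versioned attribute dicts into one instance, keeping full lineage.

    versions: mapping of version id -> {attribute: value}
    prefer:   callable(attribute, [(version id, value), ...]) -> winning value,
              consulted only when versions disagree.
    """
    lineage = {}
    for version_id, attributes in versions.items():
        for key, value in attributes.items():
            lineage.setdefault(key, []).append((version_id, value))

    merged = {}
    for key, candidates in lineage.items():
        values = [value for _, value in candidates]
        if len(set(values)) == 1:
            merged[key] = values[0]  # all versions agree
        else:
            merged[key] = prefer(key, candidates)  # direct conflict
    return merged, lineage
```

- Since the lineage keeps every previous value, a different `prefer` rule can be applied later to produce an alternate reconciliation without returning to the source versions.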
- Fig. 16B depicts phased reconciliation across reconciliation zones.
- reconciliation zones may be provided.
- some useful properties of a metadata instance are noted.
- Each metadata instance in an enterprise may have an associated reconciliation zone property that defines an association of the instance with a reconciliation zone.
- the reconciliation zone may be selected by a designer of a reconciliation process to reflect, for example, an institutional separation of data such as human resources, accounting, finance, inventory, manufacturing, payroll, engineering, and so on.
- the reconciliation zone may be geographic at any degree of granularity suitable to the data and the enterprise, such as country, region, state, town, building, facility, and so on.
- the reconciliation zone may be historical or architectural so as to separate, for example, legacy systems from new systems, employee desktops from mainframes, consultants from employees, and so on.
- the reconciliation zone may reflect organization of a business into divisions or other sub-groups, such as consumer products, original equipment manufacturing, products, retail operations, e-commerce operations, and so on, or more generally, manufacturing and retail. Similarly, reconciliation zones may be provided for new business units acquired by a company or spun off from a company.
- the reconciliation zone may further define a match type that defines how reconciliation results are propagated in models referencing the instance, such as no match (duplicates are deleted), view match (versions are retained at the view level), and/or extra-view match (versions are retained at the hub level).
- Each instance of an object may also have an identifier that uniquely identifies the object within a reconciliation zone.
- Each item can be described in terms of various contexts or hierarchies, such as to capture the semantic context of the items.
- the item may be an object, class, attribute, data item, data model, metadata model, model, definition, identity, structure, language, mapping, relationship, instance or other item or concept, including another semantic identifier.
- the semantic identifier may identify the item based on the item's attributes, the item's physical location, the relationship of the item with one or more other items, such as in a hierarchy, or the like. In some cases a relationship may be defined as the absence of some particular relationship.
- a relationship may be based on semantics.
- a relationship may involve the position of the item in a relational hierarchy.
- an item may be identified based on its relationship with the other items to which it is related, and may be directly related to another item, indirectly related to another item, and/or indirectly related to another item through one or more other items. Relationships may be concatenated or recursively defined to permit dynamic, in addition to static, identifiers. For example, if a relationship between two items changes, a semantic identifier for another item that incorporates one of the two items would also incorporate the changed relationship between the two items.
- Jim may be identified as Jim, residing at 111 Anyroad, Anytown, Anystate USA, with phone number 555-555-5555 and social security number 012-34-5678.
- Jim may be identified in terms of his relationships with others.
- Jim may be identified as the son of Betty, brother of Larry and Jeff, father of Jessica and nephew of Frank.
- the semantic identifier may be a unique identifier for an item.
- this semantic identifier would be a unique identifier for Jim. It is possible that a unique semantic identifier to an item takes into account fewer than all of the relationships of that item with other items. If there were only one Jim in the world who was the son of Betty, brother of Larry and father of Jessica, the existence of these relationships alone would be enough to create a unique semantic identifier. Jim's relationships with Jeff and Frank would not need to be considered. It may be advantageous to create a semantic identifier that is based on the minimum number of relationships that ensure uniqueness. For example, if the semantic identifier was to be stored in a database 112 or processed by a data integration system 104, a less complex semantic identifier would require less space and would allow for faster processing.
- the number of relationships required to create a unique semantic identifier for an item may vary based on context. For example, a first item, item 1, may be distinguished from a second item, item 2, within a context, context A, by item 1's relationship with two additional items, item 3 and item 4. That is, in context A, the unique semantic identifier for item 1 may be that it is directly related to items 3 and 4, and indirectly related to any number of other items through items 3 and 4. In a different context, context B, item 1 may be uniquely identified by its relationship to item 3 (but perhaps not item 4), as well as its relationship to another item, item 5 and the absence of a relationship with item 6.
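- The idea of a context-dependent identifier built from the minimum number of relationships can be sketched as a brute-force search over subsets of an item's relationships. The data layout (a context as a mapping from item name to a set of `(relation, other item)` pairs) is an assumption, and the sketch handles only the presence of relationships, not the absence of one.

```python
from itertools import combinations


def minimal_identifier(target, context):
    """Return the smallest tuple of target's relationships that no other item
    in the context shares in full, or None if no such subset exists."""
    relationships = sorted(context[target])
    others = [rels for name, rels in context.items() if name != target]
    for size in range(1, len(relationships) + 1):
        for subset in combinations(relationships, size):
            if not any(set(subset) <= rels for rels in others):
                return subset
    return None
```

- The same item can yield different identifiers in different contexts, because the search depends on which other items are present. The exhaustive search is exponential in the number of relationships and is shown only to make the definition concrete.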
- a semantic identifier for an item such as an item related to a data integration job or a data integration platform, may be provided with a context-dependent identifier for the item.
- a context-dependent identifier may be stored in an atomic format, such as in a data repository.
- Contexts A and B may be two different imports, mappings, run versions, models, metabroker models, instances, tools, views, objects, classes, items, relationships, attributes, or any combination of any of the foregoing.
- a reconciliation or comparison facility may compare the value and/or syntax of the identity of an item in different imports, run versions, models, metabroker models, instances, tools and/or items and determine or assist with the determination of what action to take or refrain from taking based on the comparison.
- a reconciliation engine may compare the model used by import instance A to the model used by metabroker B. Based on this comparison it may be decided that metabroker B can access the data and metadata of import instance A without transformation or modification, and the comparison facility may direct the metabroker B to proceed.
- a tool A may be compared to a tool B, and it may be determined to perform a cross-tool object merge, wherein each tool can access the objects of the other tool.
- the reconciliation facility may trigger a translation facility to assist the cross-tool object merge, such as establishing a bridge, metabroker, hub or the like for translating any objects that require translation, such as translation that is based on the different syntax for the handling of the identity of particular items in each respective tool, or based on other differences between the tools as determined by the comparison.
- a semantic identifier may be stored, maintained, recorded, processed and/or interpreted in a syntax that may be stored, maintained, recorded, processed and/or interpreted in a string structure or format.
- the syntax may be column name::table name::database name.
- This syntax may be related, for example, to a semantic identifier that identifies a column of a table in a database.
- a string composed in this syntax may be age::employee::employee database. This string may be related, for example, to a semantic identifier that identifies the age of an employee in a particular employee database.
- the string corresponding to the semantic identifier for item 1 in context A may be: direct relation to item 3::direct relation to item 4.
- the semantic identifier and corresponding string may also incorporate the lack of a direct relationship between item 1 and item 6, such as occurs in context B above.
- a syntax string may be parsed.
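- Composing and parsing a string in the column name::table name::database name syntax can be sketched as below. The `ColumnIdentifier` type and function names are hypothetical, and the sketch assumes that element names never contain the `::` separator.

```python
from typing import NamedTuple

SEPARATOR = "::"


class ColumnIdentifier(NamedTuple):
    """Semantic identifier for a column of a table in a database."""
    column: str
    table: str
    database: str


def compose(identifier):
    """Render the identifier as 'column::table::database'."""
    return SEPARATOR.join(identifier)


def parse(text):
    """Parse 'column::table::database' back into its three elements."""
    parts = [part.strip() for part in text.split(SEPARATOR)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 elements separated by '::', got {len(parts)}")
    return ColumnIdentifier(*parts)
```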
- a syntax and/or string may be truncated, modified and/or the elements of a syntax and/or string may be re-ordered.
- a translation engine may perform the truncation, modification and/or re-ordering. It may be useful to truncate a syntax and/or string when not all of the relationships included in the syntax and/or string are required for the uniqueness of the semantic identifier.
- for example, suppose all items were directly related to item 3, as where item 3 is a database in which all the items are stored.
- the syntax string could be truncated, such as to create a string omitting a relationship involving item 3, and still remain a unique semantic identifier.
- Truncating a syntax and/or string may reduce storage requirements and increase processing efficiency. It may also be useful to change the order of the relationships in a syntax and/or string, for example, to reduce processing time for data integration processes. If the less common relationships are processed first, a system will likely need to access and process fewer relationships associated with an item in order to identify the item. For example, if very few items were related to item 3, even fewer related to item 4 and many items related to item 2, depending on the context, one syntax string may allow for the identification of item 9 in a shorter time than another syntax string. It could be that only certain elements of a syntax string are needed to uniquely identify an item in one context, while all elements of a syntax string are required in another context.
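- Truncation and re-ordering of the elements of an identifier can be sketched with two small helpers. The element labels and the frequency counts are made-up illustrative values; in practice the counts would come from whatever statistics the context makes available.

```python
def truncate(elements, shared_by_all):
    """Drop relationships that every item in the context shares, since they
    contribute nothing to uniqueness."""
    return [element for element in elements if element not in shared_by_all]


def reorder_by_rarity(elements, frequency):
    """Put the least common relationships first so a lookup narrows quickly.

    frequency maps each relationship to the number of items that have it.
    """
    return sorted(elements, key=lambda element: frequency[element])
```

- With the rarest relationship first, a matcher that filters candidates element by element starts from the smallest candidate set.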
- a reconciliation engine may perform reconciliations on instances of metadata using the identity of metadata instances, as well as a reconciliation zone that defines rules for reconciliation and any match type specifications.
- the reconciliation operation may employ semantic identifiers to uniquely identify instances within a reconciliation zone, and may translate or otherwise modify the format, language and/or data model of a semantic identifier for a reconciled instance in another reconciliation zone.
- a reconciliation operation may involve a reconciliation or mapping to or from one or more data tools, languages, formats and/or data models to or from at least one other data tool, language, format and/or data model.
- a reconciliation operation may involve a reconciliation or mapping to, from or between known data integration tools, such as WebSphere DataStage 7 from IBM, WebSphere QualityStage from IBM, Business Objects tools, IBM DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, IBM's WebSphere ProfileStage, PowerDesigner (with added support for Packages and Extended
- a reconciliation engine and/or reconciliation operation may optionally be embodied in a metabroker.
- a reconciliation operation may be performed, executed and/or conducted in batch, real-time and/or on a continuous basis.
- a reconciliation operation may be provided or made available as a service, for example, as part of a service-oriented architecture.
- mapping of a reconciliation operation can, among other things, trace reconciliation from the execution of the operation backward and forward between an original semantic context and a translated semantic context.
- the appropriate identifier for the data item may vary, such as by varying or truncating a syntax and/or string to enable more efficient storage or faster processing, or by varying the relationships used to form a unique identifier where the semantic context varies.
- a dynamic identifier may combine the benefits of retraceable reconciliation with the benefits of rapid processing, efficient data processing and effective operation in various contexts in which a data item is used.
- Figure 16B depicts reconciliation zones.
- each metadata object or item is uniquely identified within its own data constellation; however, a reconciliation process must also manage identity as it combines different instances of an object from different sources.
- a number of reconciliation zones 6450-6458 may be defined for metadata from a number of sources.
- the reconciliation zones 6450-6454 on the left side of Fig. 16B may be source data from various elements of an enterprise, such as departments within a corporation or discrete databases.
- reconciliation zones, rules, match types, and identifiers may be defined for each metadata instance in each of these source reconciliation zones 6450-6454.
- a reconciliation engine may reconcile data from two reconciliation zones (e.g., zones 6450 and 6452) into a new reconciliation zone 6456 in which each item is uniquely identified, and represents a reconciled version of metadata instances from the source reconciliation zones.
- This new reconciliation zone 6456 may in turn be reconciled with one or more other reconciliation zones (e.g., zone 6454) to provide another reconciliation zone 6458 representing a full reconciliation of metadata instances within the enterprise.
- any reconciliation zone may have one or more reconciliation zones between itself and a source of data to more finely reconcile metadata from one or more sources before introducing it to a particular reconciliation zone.
- the pattern of Fig. 16B may be repeated, altered, and/or expanded in any manner to achieve any arbitrary pattern or flow for reconciliation of metadata instances.
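- The phased pattern of Fig. 16B can be sketched as repeated pairwise reconciliation, where each reconciled instance remembers the source zones it came from. The `Zone` type, the identifiers, the salary figures, and the "second zone wins on conflict" rule are all assumptions chosen for illustration; actual reconciliation rules and match types would be defined per zone.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """A reconciliation zone: identifier -> (value, names of contributing source zones)."""
    name: str
    instances: dict = field(default_factory=dict)


def source_zone(name, values):
    """Build a source zone whose instances originate in the zone itself."""
    return Zone(name, {key: (value, (name,)) for key, value in values.items()})


def reconcile(name, first, second):
    """Reconcile two zones into a new zone. On conflict the second zone's value
    wins (an assumed rule), and the sources of both are retained."""
    result = Zone(name)
    for zone in (first, second):
        for identifier, (value, sources) in zone.instances.items():
            _, earlier = result.instances.get(identifier, (None, ()))
            result.instances[identifier] = (value, earlier + sources)
    return result
```

- Chaining `reconcile` calls reproduces the flow from zones 6450 and 6452 into 6456, and from 6456 and 6454 into 6458, while the recorded sources preserve visibility into where each reconciled value originated.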
- the first reconciliation zone 6450 may represent metadata for human resources that may include starting salaries for all new hires.
- the second reconciliation zone 6452 may represent payroll data that includes weekly pay information for all employees. These reconciliation zones may be reconciled into a new reconciliation zone 6456 by a user such as someone in a company accounting department to track salary information.
- the metadata within this reconciliation zone 6456 may be analyzed for accuracy and consistency, and may be modified until a satisfactory reconciliation is obtained.
- Another reconciliation zone 6454 may represent metadata for a corporate financial database.
- the financial database may include full financial data for the corporation, including metadata for salary costs of the corporation. This data may be characterized as having high quality, and may be audited or otherwise used in other areas of the corporation.
- the reconciliation rules may be designed with deference to any information about data quality, such as where one data source represents a compilation prepared by an outside contractor known to have low quality assurance standards, while another data source represents data entry from well-trained and supervised employees within the company.
- the metadata from this reconciliation zone 6454 may be reconciled with salary metadata from the previously reconciled zone 6456 in another reconciliation zone 6458 that contains metadata representing a fully integrated view of employee salaries within the corporation.
- all of the reconciliation zones 6450-6458 of Fig. 16B may be specific to a corporate division, and may be further integrated with integrated reconciliation zones from other corporate divisions, or from corporate acquisitions. Similarly, data from different corporate departments, geographic locations, subsidiaries, functional business units, and so on, may be progressively integrated using the phased reconciliation described above.
- phased reconciliation provides visibility into sources of data in an integrated model.
- the fully integrated reconciliation zone 6458 of Fig. 16B may be used by analysts or managers as a metadata model for business analytical tools. Prior to forming a business decision based upon the analytical tools, it may be helpful, or even essential, to examine the sources of data and quality thereof. As another example, a business decision may require a particular view of data. The street name of an address may be critical to an in-person marketing campaign, while the zip code may be important for a mailing campaign. Different data sources may carry the relevant information at different levels of detail, and with different levels of accuracy. The reconciliation process may be inspected, and modified as appropriate, to express the best view of the desired metadata in an analytical tool for designing a marketing campaign.
- one data source may define addresses with very fine detail and good accuracy, but be updated only infrequently, for example, bi-annually, or intermittently as information is received.
- Another data source may contain very up-to-date information (such as phone listings) that includes street addresses but no zip codes.
- phased reconciliation provides an ability to propagate reconciliations and modifications upstream from integrated views toward data sources. This may ultimately improve the data structure and quality of metadata and data from original data sources within an enterprise.
- the general approach above may have particular utility in highly heterogeneous data environments.
- a number of discrete groups such as manufacturing, accounting, human resources, and engineering, may each maintain a separate data silo with a broad array of databases specific to that group.
- data integration may be usefully employed to integrate separate databases in a manner that permits improved business intelligence. Integration may be vertical within a group, such as by integrating databases into a comprehensive metadata model for the group, or the integration may be horizontal across groups, such as by integrating payroll from each group into a comprehensive payroll metadata model.
- Full, corporate-wide data integration may include alternating steps of integrating within a group and integrating across groups.
- Figure 17 depicts reconciliation of versioned metadata objects.
- the common repository 6502, versioned objects 6504, reconciliation process 6506, and reconciled single object instance 6508 may all be as described with reference to the figures above.
- each object version 6504 and the single instance 6508 refer to metadata stored in a metadata database 6510.
- the metadata in the metadata database 6510 may change, due either to changes independent of the models (e.g., where a company wishes to track a new, additional characteristic of inventory, or under the influence of some data integration job), or to changes to the metadata itself (e.g., a five-day moving average of some number is added to the model for business analytic purposes).
- Figure 18 shows an example of the use of concurrency in a metadata process.
- a plurality of metadata instances 6602 are reconciled in a reconciliation process 6604.
- the process may be improved by structuring the reconciliation process 6604 as independent process objects that may be streamed to individual processors 6606 for independent or pipelined execution.
- the independent process objects may be streamed to a single hardware device 6608 that contains the plurality of processors 6606, or may be streamed to different hardware devices 6610, 6608, or may be streamed to any other processors or groups of processors available through a network.
- concurrency and the related concept of parallel processing are well known in the art, and need not be described in detail here.
- concurrency and parallelism are appropriate where a process can be broken into "chunks" of primarily self-referential clusters of objects, also known as sub-graphs (referring to a directed graph of dependencies for objects), that can be processed independently or in a pipeline.
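- one way to identify such sub-graphs may be sketched as follows. The sketch treats the dependency graph as undirected and returns its connected components, so that no object in one chunk depends on an object in another. The function name and the example object names are hypothetical.

```python
from collections import defaultdict


def independent_subgraphs(dependencies):
    """Group objects into chunks with no dependency edge crossing between chunks.

    `dependencies` is an iterable of (obj, depends_on) pairs.
    """
    neighbors = defaultdict(set)
    for a, b in dependencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, chunks = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        chunk, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in chunk:
                continue
            chunk.add(node)
            stack.extend(neighbors[node] - chunk)
        seen |= chunk
        chunks.append(chunk)
    return chunks


deps = [("order", "customer"), ("customer", "address"), ("invoice", "ledger")]
chunks = independent_subgraphs(deps)
```

- each returned chunk could then be handed to a separate processor. Chunks that are only "primarily" self-referential would still have a few cross-chunk edges, which this simple sketch does not attempt to handle.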
- a reconciliation process may be readily modeled as a pipeline for concurrent execution.
- the process may include a task for assigning an identity to a stream of objects from a new metadata source, a task for fetching potential conflict candidates from a previous metadata source, a task for reconciling, a task for merging the reconciliation results into an output set of metadata objects, and a task for storing the merged metadata objects.
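- the five tasks above may be sketched as a chain of generator stages, each consuming the output stream of the previous one. The object layout, the identity scheme, and the "new values win, old values fill gaps" reconciliation rule are assumptions made for illustration. The stages here run in a single thread; in a concurrent deployment each stage could instead run on its own worker, connected to its neighbors by queues.

```python
def assign_identity(objects):
    # Task 1: give each incoming object a stable identity (hypothetical scheme).
    for obj in objects:
        yield {**obj, "id": f"{obj['type']}:{obj['name']}"}


def fetch_candidates(objects, previous):
    # Task 2: look up a potential conflict candidate from the previous source.
    for obj in objects:
        yield obj, previous.get(obj["id"])


def reconcile(pairs):
    # Task 3: assumed rule, new values win and old values fill any gaps.
    for new, old in pairs:
        yield new if old is None else {**old, **new}


def merge(objects, output):
    # Task 4: merge reconciliation results into the output set.
    for obj in objects:
        output[obj["id"]] = obj
        yield obj


def store(objects, repository):
    # Task 5: persist the merged objects (a list stands in for a repository).
    for obj in objects:
        repository.append(obj)


previous = {"table:customer": {"id": "table:customer", "type": "table",
                               "name": "customer", "owner": "sales"}}
incoming = [{"type": "table", "name": "customer", "columns": 12},
            {"type": "table", "name": "order", "columns": 7}]

output, repository = dict(previous), []
store(merge(reconcile(fetch_candidates(assign_identity(incoming), previous)), output),
      repository)
```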
- Other metadata processes may also be suitable for concurrency, such as a metadata import.
- the following figures describe several methods associated with metadata management. It will be appreciated that these processes may be realized in hardware, software, or some combination of these. The processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable devices, along with internal and/or external memory for storing program instructions, program data, and program output or other intermediate or final results. It will further be appreciated that these processes may be realized as computer executable code created using a structured programming language such as C, an object oriented programming language such as C++ or Java, or any other high-level or low-level programming language that may be compiled or interpreted to run on any uniform or heterogeneous group of hardware and software platforms including a computer or computers, networks of computers, and combinations thereof. The processes may also employ a wide variety of tools, platforms and architectures to achieve a scalable enterprise metadata management system. While specific examples of software platforms are provided above, other platforms and technologies exist and may be usefully employed with the systems described herein.
- Figure 19 is a diagram of entities involved in a query process from a user interface 6702 to a metadata database 6712.
- the query may begin at a user interface 6702, where a user prepares a query in a native syntax of the user interface.
- the query may be passed to a metadata model 6704, such as a view.
- the query may in turn be translated by a translation engine 6708, or by application of mapping information describing the mapping between the first metadata model 6704, such as a view, and a second metadata model 6710, such as a hub.
- the hub 6710 may pass the translated query to the database 6712 using an additional translation or mapping step to convert the hub-based query into a query in a native syntax of the database 6712.
- the results may be passed through the various entities and any appropriate translations to the user interface 6702 that originally issued the query.
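- the round trip from view to hub to database and back may be sketched with two mapping tables and an in-memory SQLite database. All attribute, table, and column names are hypothetical, and the sketch handles only simple single-table selections. It is not a representation of the translation engine 6708 itself.

```python
import sqlite3

# Hypothetical mapping information: view attribute -> hub attribute -> (table, column).
VIEW_TO_HUB = {"client_name": "customer.name", "client_zip": "customer.postal_code"}
HUB_TO_DB = {"customer.name": ("cust", "cname"), "customer.postal_code": ("cust", "zip")}


def view_query_to_sql(view_fields):
    """Translate a list of view attributes into SQL in the database's native syntax."""
    hub_fields = [VIEW_TO_HUB[f] for f in view_fields]   # view -> hub
    columns = [HUB_TO_DB[h] for h in hub_fields]         # hub -> database
    tables = {table for table, _ in columns}
    if len(tables) != 1:
        raise ValueError("this sketch handles single-table queries only")
    cols = ", ".join(col for _, col in columns)
    return f"SELECT {cols} FROM {tables.pop()}"


def run_view_query(conn, view_fields):
    """Run the translated query and relabel the rows in the view's own terms."""
    rows = conn.execute(view_query_to_sql(view_fields)).fetchall()
    return [dict(zip(view_fields, row)) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cname TEXT, zip TEXT)")
conn.execute("INSERT INTO cust VALUES ('Acme', '02139')")

sql = view_query_to_sql(["client_name", "client_zip"])
result = run_view_query(conn, ["client_name", "client_zip"])
```

- the caller only ever sees view-level names, while the database only ever sees its own table and column names.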
- Figure 20 shows the entities involved in a process of extending a metadata database from a metadata model.
- a user may add an attribute or the like to a view 6802 using an appropriate editing interface.
- a translation engine may be updated for translating metadata between the view model 6802 and a hub 6804.
- although the hub 6804 of a hub-and-spoke model is generally maintained in a consistent form, the hub 6804 may also be updated, depending on the nature of the change to the view 6802 and the reasons therefor.
- a translation engine may be updated for translation between the hub and the database 6808.
- the data model 6804 and/or translation engine may also add rows, columns, or tables to the database 6808 using database-specific commands, as appropriate to reflect the new view 6802 within the database 6808. If changes are made to the database 6808, these changes may be pushed back through the model chain up to the view 6802.
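- a minimal sketch of such an extension, assuming the mappings are held in plain dictionaries and the database is SQLite, is shown below. The function name, its parameters, and the example attribute are hypothetical. Table and column names are interpolated directly into SQL, so the sketch assumes they come from a trusted model definition rather than from user input.

```python
import sqlite3


def add_view_attribute(conn, view_to_hub, hub_to_db,
                       view_attr, hub_attr, table, column, sql_type="TEXT"):
    """Register a new view attribute and extend the database to hold it."""
    # Update the view-to-hub and hub-to-database mapping information.
    view_to_hub[view_attr] = hub_attr
    hub_to_db[hub_attr] = (table, column)

    # Add the column only if the table does not already have it.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT)")
view_to_hub, hub_to_db = {}, {}

add_view_attribute(conn, view_to_hub, hub_to_db,
                   "moving_avg_5d", "inventory.moving_average_5d",
                   "inventory", "mavg5", "REAL")
columns = [row[1] for row in conn.execute("PRAGMA table_info(inventory)")]
```

- calling the function a second time with the same arguments leaves the table unchanged, which keeps the extension step safe to repeat.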
- Figure 21 shows the entities involved in a process for accessing a repository 6910 from a tool 6902.
- the tool 6902 may be a third-party tool communicating in terms of a view 6904.
- the tool 6902 may request mapped metadata through the view 6904, which may be translated by a translation engine into a form for the hub 6908.
- the hub may further translate the request through another translation engine for physical access to the mapped metadata in the repository 6910.
- the request may reach the repository through a series of query transformations.
- the result, one or more metadata objects, may in turn be passed through a number of translation or transformation engines as it moves from the repository 6910 to the hub 6908, to the view 6904, and finally to the requesting tool 6902.
- a method for accessing a repository from an external tool may include transforming a query through one or more models to a repository, and providing one or more objects, such as mapped metadata, through one or more object transformations from the repository 6910 to the tool 6902.
- this method presents a query to the repository 6910 in a native syntax for the repository, while presenting the results to an external tool 6902 in a native syntax for the tool 6902.
- Figure 22 shows the entities involved in a process by which a tool accesses versioned and unversioned metadata models.
- the tool 7002 may communicate with a user environment 7004, which may be, for example, an event user environment, team user environment, or work user environment as described above.
- the user environment 7004 may be implemented as a Java space, or any other framework or platform suitable for use with metadata tools.
- the user environment may communicate with either an unversioned model 7008, i.e., the operational classes and attributes in the operational repository, or a versioned model 7010, i.e., the design classes and attributes in the common repository.
- the metadata model visible in the user environment may be edited and written back to the versioned model 7010 as either a replacement to an existing version or as a new version of the metadata model. It will be appreciated that the tool 7002 may be prevented from checking out versioned metadata 7010 in the common repository if that metadata 7010 is already checked out to another tool or user.
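- the check-out and write-back behavior described above may be sketched as follows. The class and method names are hypothetical, and the sketch keeps versions in memory and uses a single exclusive check-out per model, which is only one possible locking policy.

```python
class VersionedModel:
    """A versioned metadata model with an exclusive check-out (illustrative only)."""

    def __init__(self, initial):
        self.versions = [dict(initial)]
        self.checked_out_by = None

    def check_out(self, user):
        # Refuse the check-out if another tool or user already holds the model.
        if self.checked_out_by is not None:
            raise RuntimeError(f"already checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return dict(self.versions[-1])  # working copy of the latest version

    def check_in(self, user, edited, as_new_version=True):
        if self.checked_out_by != user:
            raise RuntimeError("model is not checked out by this user")
        if as_new_version:
            self.versions.append(dict(edited))   # write back as a new version
        else:
            self.versions[-1] = dict(edited)     # replace the existing version
        self.checked_out_by = None


model = VersionedModel({"name": "customer"})
draft = model.check_out("tool_a")
draft["owner"] = "sales"
model.check_in("tool_a", draft)
```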
- Figure 23 shows the entities involved in a process by which a user interface accesses multiple versions of metadata in a common repository.
- the user interface 7102 may issue a request to a common repository 7104 to access one or more versions 7108 of metadata, and may further query the common repository 7104 concerning other versions of the metadata and the nature and chronology of changes between the various versions.
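- a query about the nature and chronology of changes may be sketched as a comparison of successive versions. The sketch assumes each version is a flat mapping of attribute names to values, and the function names are hypothetical.

```python
def diff_versions(old, new):
    """Describe what was added, removed, or changed between two versions."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}


def change_history(versions):
    """Return the diff between each consecutive pair of versions, oldest first."""
    return [diff_versions(a, b) for a, b in zip(versions, versions[1:])]


versions = [
    {"name": "customer", "length": 10},
    {"name": "customer", "length": 20, "owner": "sales"},
    {"name": "customer", "owner": "sales"},
]
history = change_history(versions)
```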
- Figure 24 shows the entities involved in a reconciliation process for versions of metadata.
- a version 7202 may be reconciled with another version 7204 through a reconciliation process 7212.
- a similar reconciliation may be performed on two or more additional versions 7208, 7210 with additional reconciliation processes 7214, 7218.
- the reconciled versions may be merged into a new version of the metadata reflecting changes from previous versions. This reconciliation may be performed in phases, or all at once, and user control may optionally be exercised over reconciliation of conflicts, order of reconciliation, and so on.
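- a phased reconciliation with user control over conflicts may be sketched as repeated pairwise merging. The `resolve` callback stands in for user control over conflict resolution. The function names and the flat attribute-to-value layout are assumptions of this sketch, which also assumes at least one version is supplied.

```python
def reconcile_pair(a, b, resolve):
    """Merge two versions, asking `resolve` to settle any conflicting attribute."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            merged[key] = resolve(key, merged[key], value)
        else:
            merged[key] = value
    return merged


def reconcile_in_phases(versions, resolve):
    """Reconcile adjacent pairs, then pairs of results, until one version remains."""
    current = list(versions)
    while len(current) > 1:
        nxt = [reconcile_pair(current[i], current[i + 1], resolve)
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            nxt.append(current[-1])   # an unpaired version carries over to the next phase
        current = nxt
    return current[0]


versions = [
    {"name": "cust", "length": 10},
    {"name": "cust", "length": 20},
    {"key": "id"},
    {"key": "id", "length": 30},
]
# Hypothetical conflict policy: keep the larger value.
merged = reconcile_in_phases(versions, resolve=lambda key, old, new: max(old, new))
```

- with four versions, the first phase reconciles two pairs independently and the second phase merges the two results. Performing the reconciliation all at once would correspond to folding the versions together in a single pass instead.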
- Figure 25 shows the entities involved in a reconciliation process using concurrency.
- the reconciliation process may be the reconciliation process as described above, except that each discrete reconciliation may be independently passed to a plurality of processors 7304, which may be in a cluster 7302 or physically remote from one another, and executed in a pipelined or parallel fashion, depending on the nature of dependencies between each reconciliation phase.
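- the parallel case may be sketched with Python's standard `concurrent.futures` module. Worker threads stand in here for the plurality of processors 7304; a process pool or remote workers could be substituted. The reconciliation rule and function names are hypothetical. Independent reconciliations run in parallel, and the final merge, which depends on all of them, runs afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def reconcile(pair):
    # Assumed rule for the sketch: values from the newer version win.
    old, new = pair
    return {**old, **new}


def reconcile_concurrently(pairs, max_workers=4):
    """Run independent reconciliations in parallel, then merge the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(reconcile, pairs))  # map preserves input order

    # The final merge depends on every partial result, so it runs sequentially.
    merged = {}
    for result in partial_results:
        merged.update(result)
    return merged


pairs = [
    ({"name": "cust", "length": 10}, {"length": 20}),
    ({"key": "id"}, {"owner": "sales"}),
]
merged = reconcile_concurrently(pairs)
```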
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2007530290A JP2008511928A (en) | 2004-08-31 | 2005-08-31 | Metadata management |
EP05793044A EP1805645A4 (en) | 2004-08-31 | 2005-08-31 | Metadata management |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60630104P | 2004-08-31 | 2004-08-31 | |
US60/606,301 | 2004-08-31 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006026636A2 (en) | 2006-03-09 |
WO2006026636A3 (en) | 2006-06-08 |
Family
ID=36000698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2005/030897 WO2006026636A2 (en) | 2004-08-31 | 2005-08-31 | Metadata management |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1805645A4 (en) |
JP (1) | JP2008511928A (en) |
CN (1) | CN101040280A (en) |
WO (1) | WO2006026636A2 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8347264B2 (en) * | 2008-09-30 | 2013-01-01 | Ics Triplex Isagraf Inc. | Method and system for an automation collaborative framework |
US8543527B2 (en) * | 2010-01-08 | 2013-09-24 | Oracle International Corporation | Method and system for implementing definable actions |
US9367371B2 (en) | 2010-02-05 | 2016-06-14 | Paypal, Inc. | Widget framework, real-time service orchestration, and real-time resource aggregation |
US9110892B2 (en) | 2012-03-13 | 2015-08-18 | Microsoft Technology Licensing, Llc | Synchronizing local and remote data |
US20140026041A1 (en) * | 2012-07-17 | 2014-01-23 | Microsoft Corporation | Interacting with a document as an application |
JP6022409B2 (en) * | 2013-06-11 | 2016-11-09 | 日本電信電話株式会社 | Virtual DB system and information processing method for virtual DB system |
CN106164847A (en) * | 2014-03-31 | 2016-11-23 | 柯法克斯公司 | Expansible business process intelligence and predictability analysis for Distributed architecture |
CN105353988A (en) * | 2015-11-13 | 2016-02-24 | 曙光信息产业(北京)有限公司 | Metadata reading and writing method and device |
US10216160B2 (en) * | 2016-04-21 | 2019-02-26 | Honeywell International Inc. | Matching a building automation algorithm to a building automation system |
GB201615747D0 (en) | 2016-09-15 | 2016-11-02 | Gb Gas Holdings Ltd | System for data management in a large scale data repository |
CN107678774A (en) * | 2017-10-09 | 2018-02-09 | 用友网络科技股份有限公司 | Method, system, computer installation and the readable storage medium storing program for executing of response data modification |
CN109254989B (en) * | 2018-08-27 | 2020-11-20 | 望海康信(北京)科技股份公司 | Elastic ETL (extract transform load) architecture design method and device based on metadata drive |
US11733990B2 (en) * | 2019-08-27 | 2023-08-22 | Salesforce, Inc. | Generating software artifacts from a conceptual data model |
CN112966047B (en) * | 2021-03-05 | 2023-01-13 | 上海沄熹科技有限公司 | Method for realizing table copying function based on distributed database |
CN112988752A (en) * | 2021-03-29 | 2021-06-18 | 北京大米科技有限公司 | Resource management method, device, storage medium and electronic equipment |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5347653A (en) * | 1991-06-28 | 1994-09-13 | Digital Equipment Corporation | System for reconstructing prior versions of indexes using records indicating changes between successive versions of the indexes |
US6279011B1 (en) * | 1998-06-19 | 2001-08-21 | Network Appliance, Inc. | Backup and restore for heterogeneous file server environment |
US6292932B1 (en) * | 1999-05-28 | 2001-09-18 | Unisys Corp. | System and method for converting from one modeling language to another |
US6684207B1 (en) * | 2000-08-01 | 2004-01-27 | Oracle International Corp. | System and method for online analytical processing |
US7149734B2 (en) * | 2001-07-06 | 2006-12-12 | Logic Library, Inc. | Managing reusable software assets |
US6874001B2 (en) * | 2001-10-05 | 2005-03-29 | International Business Machines Corporation | Method of maintaining data consistency in a loose transaction model |
WO2005022417A2 (en) * | 2003-08-27 | 2005-03-10 | Ascential Software Corporation | Methods and systems for real time integration services |
2005
- 2005-08-31 WO PCT/US2005/030897 patent/WO2006026636A2/en active Application Filing
- 2005-08-31 JP JP2007530290A patent/JP2008511928A/en active Pending
- 2005-08-31 CN CNA2005800288821A patent/CN101040280A/en active Pending
- 2005-08-31 EP EP05793044A patent/EP1805645A4/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of EP1805645A4 * |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8156154B2 (en) | 2007-02-05 | 2012-04-10 | Microsoft Corporation | Techniques to manage a taxonomy system for heterogeneous resource domain |
FR2917203A1 (en) * | 2007-06-11 | 2008-12-12 | Kleverware Sarl | Information auditing and analyzing method for personal computer, involves dynamically creating relations between elements based on analyzing requests on information, and analyzing transformed information from created and existing relations |
JP2009110245A (en) * | 2007-10-30 | 2009-05-21 | Yamatake Corp | Information cooperation window system and program |
US8407235B2 (en) | 2011-03-09 | 2013-03-26 | Microsoft Corporation | Exposing and using metadata and meta-metadata |
US9015118B2 (en) | 2011-07-15 | 2015-04-21 | International Business Machines Corporation | Determining and presenting provenance and lineage for content in a content management system |
US9286334B2 (en) | 2011-07-15 | 2016-03-15 | International Business Machines Corporation | Versioning of metadata, including presentation of provenance and lineage for versioned metadata |
US9384193B2 (en) | 2011-07-15 | 2016-07-05 | International Business Machines Corporation | Use and enforcement of provenance and lineage constraints |
US9418065B2 (en) | 2012-01-26 | 2016-08-16 | International Business Machines Corporation | Tracking changes related to a collection of documents |
EP2901377A2 (en) * | 2012-09-28 | 2015-08-05 | Barclays Bank PLC | A document management system and method |
US9330402B2 (en) | 2012-11-02 | 2016-05-03 | Intuit Inc. | Method and system for providing a payroll preparation platform with user contribution-based plug-ins |
US10755359B1 (en) | 2012-11-06 | 2020-08-25 | Intuit Inc. | Stack-based adaptive localization and internationalization of applications |
US9928085B2 (en) | 2012-11-06 | 2018-03-27 | Intuit Inc. | Stack-based adaptive localization and internationalization of applications |
US11429651B2 (en) | 2013-03-14 | 2022-08-30 | International Business Machines Corporation | Document provenance scoring based on changes between document versions |
WO2014193490A1 (en) * | 2013-05-30 | 2014-12-04 | Intuit Inc. | A content based payroll compliance system |
US9430227B2 (en) | 2013-06-13 | 2016-08-30 | Intuit Inc. | Automatic customization of a software application |
US9922351B2 (en) | 2013-08-29 | 2018-03-20 | Intuit Inc. | Location-based adaptation of financial management system |
US20160140665A1 (en) * | 2014-11-14 | 2016-05-19 | Mastercard International Incorporated | Method and system of improving the integrity of location data in records resulting from atm-based single message transactions processed over a payment network |
US10620924B2 (en) | 2016-08-22 | 2020-04-14 | Oracle International Corporation | System and method for ontology induction through statistical profiling and reference schema matching |
US10620923B2 (en) * | 2016-08-22 | 2020-04-14 | Oracle International Corporation | System and method for dynamic, incremental recommendations within real-time visual simulation |
US10776086B2 (en) | 2016-08-22 | 2020-09-15 | Oracle International Corporation | System and method for metadata-driven external interface generation of application programming interfaces |
US11537369B2 (en) | 2016-08-22 | 2022-12-27 | Oracle International Corporation | System and method for dynamic, incremental recommendations within real-time visual simulation |
US11537371B2 (en) | 2016-08-22 | 2022-12-27 | Oracle International Corporation | System and method for metadata-driven external interface generation of application programming interfaces |
US11137987B2 (en) | 2016-08-22 | 2021-10-05 | Oracle International Corporation | System and method for automated mapping of data types for use with dataflow environments |
US11537370B2 (en) | 2016-08-22 | 2022-12-27 | Oracle International Corporation | System and method for ontology induction through statistical profiling and reference schema matching |
US11526338B2 (en) | 2016-08-22 | 2022-12-13 | Oracle International Corporation | System and method for inferencing of data transformations through pattern decomposition |
US11347482B2 (en) | 2016-08-22 | 2022-05-31 | Oracle International Corporation | System and method for dynamic lineage tracking, reconstruction, and lifecycle management |
US10705812B2 (en) | 2016-08-22 | 2020-07-07 | Oracle International Corporation | System and method for inferencing of data transformations through pattern decomposition |
US10831534B2 (en) | 2018-03-12 | 2020-11-10 | Walmart Apollo, Llc | Mainframe data flow optimization for data integration |
CN112307063A (en) * | 2020-10-16 | 2021-02-02 | 银盛支付服务股份有限公司 | Method and system for checking data quality of each platform by metadata |
US20220309045A1 (en) * | 2021-03-29 | 2022-09-29 | PlanetScale, Inc. | Database Schema Branching Workflow, with Support for Data, Keyspaces and VSchemas |
US11531653B2 (en) * | 2021-03-29 | 2022-12-20 | PlanetScale, Inc. | Database schema branching workflow, with support for data, keyspaces and VSchemas |
CN113687881A (en) * | 2021-08-20 | 2021-11-23 | 广东电网有限责任公司 | Metadata calling method and device, electronic equipment and storage medium |
CN113986305B (en) * | 2021-11-17 | 2022-10-21 | 广州天维信息技术股份有限公司 | B/S model upgrade detection method, device, equipment and storage medium |
CN113986305A (en) * | 2021-11-17 | 2022-01-28 | 广州天维信息技术股份有限公司 | B/S model upgrade detection method, device, equipment and storage medium |
WO2024103714A1 (en) * | 2022-11-18 | 2024-05-23 | 华为云计算技术有限公司 | Data processing method and system, apparatus, and related device |
Also Published As
Publication number | Publication date |
---|---|
EP1805645A2 (en) | 2007-07-11 |
CN101040280A (en) | 2007-09-19 |
WO2006026636A3 (en) | 2006-06-08 |
JP2008511928A (en) | 2008-04-17 |
EP1805645A4 (en) | 2008-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 200580028882.1 Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2007530290 Country of ref document: JP |
| | NENP | Non-entry into the national phase in: | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2005793044 Country of ref document: EP |
| | WWP | Wipo information: published in national office | Ref document number: 2005793044 Country of ref document: EP |