CN116868182A - Data processing system capable of manipulating logical data set group

Info

Publication number: CN116868182A
Application number: CN202280012681.6A
Authority: CN
Inventors: A.威斯曼
Original assignee: Ab Initio Technology LLC
Current assignee: Ab Initio Technology LLC
Priority date: 2021-01-31
Filing date: 2022-01-31
Publication date: 2023-10-10

Abstract

A data processing system receives user input specifying data sets to be operated upon through user interfaces that enable manipulation of hierarchical groups of data sets. The user interfaces may enable a single data set or a previously defined group of data sets to be aggregated into another group. The groups may be scoped, including by the persona of a user, such that when the user is prompted to designate one or more data sets as targets for an operation by the data processing system, the available selections are limited to groups of data sets having a scope that encompasses the user. These interfaces may prompt the user to select groups within the hierarchy containing the data sets on which the operation may be performed. After a group having multiple data sets is selected as a target for an operation that is performed on data sets individually, the operation may be performed on each data set in the selected group.

Description

Data processing system capable of manipulating logical data set group

Cross Reference to Related Applications

The present application claims priority to U.S. patent application Ser. No. 63/163,699, entitled "DATA PROCESSING SYSTEM WITH MANIPULATION OF LOGICAL DATASET GROUPS," filed on March 19, 2021, and to U.S. patent application Ser. No. 63/143,924, entitled "DATA PROCESSING SYSTEM WITH MANIPULATION OF LOGICAL DATASET GROUPS," filed on January 31, 2021, the entire contents of each of which are incorporated herein by reference.

Technical Field

Aspects of the present disclosure relate to techniques for efficiently operating a data processing system having a large number of data sets that may be stored in any one of a large number of data storage devices.

Background

Modern data processing systems manage large amounts of data within an enterprise. For example, a large organization may have millions of data sets. This data may support multiple aspects of enterprise operation, so having such a large number of data sets is invaluable to the enterprise. For example, some data sets may support routines such as tracking customer account balances or sending account statements to customers. In other cases, processing data from one or more data sets may yield business insights, such as concluding that a requested transaction is fraudulent or that the enterprise faces a particular level of financial risk as a result of its aggregate transactions in a particular geographic area. In still other cases, processing data from one or more data sets may yield technical insights, such as concluding that an enterprise is at risk of technical failure due to an incorrect technical process.

A data set may be accessed by an application program executed by the data processing system or by a tool invoked by a user of the data processing system. A programmer may develop an application to perform an iterative process such as tracking a customer account balance or sending an account statement to the customer. The programmer may designate a data set as the data input source for the process, or as the destination for the results generated by executing the process. A tool may also perform operations using a data set. For example, the data processing system may include tools that enable a user to process a data set to remove invalid records or to generate an indicator of the data set (such as the number of records or fields that contain invalid values).

A data set search capability may be provided to assist the user in finding the appropriate data set among the data sets within the enterprise. For example, an application development environment may include a dataset search interface through which an application programmer may specify characteristics of a desired dataset. The programmer may then select an input dataset or an output dataset from the search results. Similar searches may enable a user to identify a dataset as an input or output of a tool.

The search may be based on metadata stored for the dataset. For example, the data processing system may store metadata for the data set that indicates values of one or more parameters characterizing the data set. The metadata may include, for example, fields in the dataset or names or descriptions of the dataset itself. As another example, metadata may indicate an organization within an enterprise that created the data set, a program that generated the data set, or a date of creation of the data set. These or other types of metadata may be used to search for datasets.

Disclosure of Invention

According to some aspects, a method is provided for efficient operation of a data processing system in an environment having multiple data sets by forming groups of data sets and presenting the groups of data sets for selection in connection with configuring operations to access one or more of the data sets. The method includes receiving input from a first user through one or more first user interfaces, the input selecting one or more of the plurality of data sets for association with a group of a plurality of data set groups; storing a representation of the plurality of data set groups; presenting a second user interface configured for use by a second user in selecting one or more data sets for use in connection with the operation of accessing the one or more data sets, wherein the second user has a persona and the data sets have a scope based at least in part on the persona of the user, wherein presenting the second user interface comprises: automatically identifying one or more data set groups based at least in part on correspondence between the persona associated with the second user of the data processing system and scopes associated with the one or more automatically identified data set groups; and rendering an indication of the one or more automatically identified data set groups in the second user interface.

According to one aspect, storing a representation of the plurality of groups includes, for each group of the plurality of data set groups, storing information about one or more users authorized to access the group.

According to one aspect, the one or more first user interfaces comprise a dataset search interface comprising a faceted search interface; and facets in the facet search interface are based on values of metadata associated with the plurality of data sets.

According to one aspect, the one or more first user interfaces include a user interface that displays a lineage of the dataset.

According to one aspect, the one or more first user interfaces include a user interface that displays metadata related to data sets in the plurality of data sets.

According to one aspect, the method further includes receiving, from a second user, through the second user interface, an input specifying a group of the one or more automatically identified groups; and performing the operation for each of the plurality of data sets within the selected group based on the input received from the second user.

According to one aspect, the operations include configuring an application for execution by the data processing system.

According to one aspect, automatically identifying one or more data set groups includes selecting one or more data set groups to which the second user of the data processing system has access, the automatic identification being based at least in part on a correspondence between the persona associated with the second user and scopes associated with the one or more automatically identified data set groups.

According to one aspect, rendering the indication of the one or more automatically identified groups includes rendering, for each of the one or more automatically identified groups, a graphical user interface element indicating a group of data sets; and the method further includes receiving, via the second user interface, a selection of a rendered graphical user interface element indicating a group of data sets and, based on the selection, rendering a plurality of data sets in the group on the second user interface.

According to some aspects, a method is provided for enabling efficient operation of a data processing system in an environment having multiple data sets by presenting groups of data sets for selection by a user of the data processing system in configuring operations to access one or more of the data sets. The method includes presenting a user interface configured for use by the user in selecting one or more data sets for use in connection with the operation of accessing the one or more data sets, wherein the user has a persona and the data sets have a scope based at least in part on the persona of the user, wherein presenting the user interface includes: automatically identifying one or more data set groups based at least in part on correspondence between the persona associated with the user of the data processing system and scopes associated with the one or more automatically identified data set groups; and rendering an indication of the one or more automatically identified data set groups in the user interface.

According to one aspect, the method further includes receiving, through the user interface, user input specifying a group of the one or more groups; and rendering an indication of the data set within the selected group based on the received input.

According to one aspect, the method further includes receiving, through the user interface, user input specifying a group of the one or more groups; and performing the operation for each of the plurality of data sets within the selected group based on the received input.

According to one aspect, automatically identifying one or more data set groups further comprises: receiving a search query for a dataset via the user interface; and performing a search based on the search query to generate search results.

According to one aspect, automatically identifying one or more data set groups includes selecting one or more data set groups to which the user of the data processing system has access, the automatic identification being based at least in part on a correspondence between the persona associated with the user and scopes associated with the one or more automatically identified data set groups.

According to one aspect, rendering the indication of the one or more automatically identified groups includes rendering, for each of the one or more automatically identified groups, a graphical user interface element indicating a group of data sets; and the method further includes receiving a selection of a rendered graphical user interface element indicating a group of data sets and, based on the selection, rendering a plurality of data sets in the group on the user interface.

According to some aspects, a method is provided for enabling efficient operation of a data processing system in an environment having a plurality of data sets by enabling a group of data sets to be selected to perform an operation on each of the data sets in the group. The method includes receiving a search query via a user interface to search for data sets for use in connection with an operation related to data access by the data processing system; presenting search results based on the search query in the user interface, wherein presenting the results includes presenting one or more data set groups, at least some of the data set groups each including one or more of the data sets found by the search; receiving, via the user interface, a manipulation of a first data set group of the one or more data set groups presented in the user interface, wherein the user interface is configured to provide an option for selecting the first data set group via the user interface as a target of the operation related to data access; and after selection of the first data set group, performing the operation on each of the one or more data sets included in the first data set group.

According to one aspect, performing the operation on each of the one or more data sets includes applying a data quality rule to each of the one or more data sets.

According to one aspect, the user interface provides an option for expanding the first data set group to enable selection of one or more data sets of the first data set group via the user interface as a target of the operation related to data access, and upon selection of the one or more data sets of the first data set group, the operation is performed on each of the selected one or more data sets.

According to one aspect, each of the one or more data set groups presented in the user interface has a correspondence between a persona associated with a user entering the search query via the user interface and a scope associated with the one or more data set groups.

According to one aspect, the search results exclude datasets that do not have metadata associated with the persona of the user.

According to some aspects, a method is provided for enabling efficient operation of a data processing system in an environment having multiple data sets by forming a set of data sets. The method includes rendering one or more first user interfaces in which a plurality of data sets are identified; receiving, via the one or more first user interfaces, user input selecting one or more identified data sets for association with a group of the plurality of data set groups; and storing a representation of the plurality of data set groups.

According to one aspect, storing the representation of the plurality of groups includes, for each of the plurality of data set groups, storing information about one or more users authorized to access the group.

According to one aspect, the method further comprises rendering a second user interface associated with a user configuration of the data processing system to perform operations related to data access, wherein the second user interface comprises a data set selection portion; and rendering the second user interface includes presenting a representation of one or more of the plurality of data set groups in the data set selection portion.

According to one aspect, the method further includes selecting the one or more of the plurality of data set groups for presentation in the second user interface based on the persona of the user.

According to one aspect, the second user interface comprises a user interface in a program development environment; and the operation related to data access includes configuring components in the program being developed to access the data set or set of data sets.

According to one aspect, the one or more first user interfaces include a dataset search interface.

According to one aspect, the dataset search interface comprises a faceted search interface; and facets in the facet search interface are based on values of metadata associated with the plurality of data sets.

According to some aspects, a system for enabling efficient operation of a data processing system in an environment having multiple data sets is provided. The system includes means for rendering one or more first user interfaces in which a plurality of data sets are identified; means for receiving user input through the one or more first user interfaces, the user input selecting one or more identified data sets for association with a group of a plurality of data set groups; and means for storing a representation of the plurality of data set groups.

According to one aspect, the system further comprises means for rendering a second user interface associated with a user configuration of the data processing system to perform operations related to data access, wherein the second user interface comprises a data set selection portion; and the means for rendering the second user interface includes means for presenting a representation of one or more of the plurality of data set groups in the data set selection portion.

According to one aspect, the system further includes means for selecting the one or more of the plurality of data set groups for presentation in the second user interface based on the persona of the user.

According to some aspects, a method for creating a group of data sets in a data processing system operable with a plurality of data sets is provided. The method includes identifying a set of data sets available for performing an operation by the data processing system, the operation relating to data access by the data processing system; presenting the identified set of data sets in a first user interface; receiving, via the first user interface, a user selection of one or more data sets from the presented identified set of data sets; and storing a representation of the group comprising the selected one or more data sets.

According to one aspect, identifying the set of data sets available to perform operations related to data access of the data processing system includes: receiving, via a user interface, a search query specifying one or more values describing facets of the plurality of data sets defined in the data processing system; and performing a search based on the search query to generate search results, the search results including the set of data sets available to perform the operation.

According to one aspect, the search query comprises a faceted search query that includes one or more facets for filtering the search results.

According to one aspect, the one or more facets include facets that indicate whether the data set is registered in a directory table that associates information for accessing the physical data set with the logical data set.
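As an illustration of such a registration facet, the following Python sketch filters candidate data sets by whether they appear in a directory table. It is only a sketch under stated assumptions: the `DIRECTORY` mapping, its keys, the storage names and paths, and both function names are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List

# Hypothetical directory table: logical data set name -> information for
# accessing the corresponding physical data set. All values are made up.
DIRECTORY: Dict[str, Dict[str, str]] = {
    "customers": {"store": "warehouse_a", "path": "/data/customers.dat"},
    "accounts": {"store": "warehouse_b", "path": "/data/accounts.dat"},
}


def is_registered(logical_name: str) -> bool:
    """True if the logical data set has an entry in the directory table."""
    return logical_name in DIRECTORY


def filter_by_registration(names: List[str], registered: bool) -> List[str]:
    """Apply the facet: keep data sets whose registration status matches."""
    return [n for n in names if is_registered(n) == registered]


candidates = ["customers", "scratch_tmp", "accounts"]
registered_only = filter_by_registration(candidates, True)
```

Here `registered_only` holds the two registered names, and passing `False` instead would return only `scratch_tmp`.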

According to one aspect, the user interface for receiving the search query includes a plurality of fields for receiving user input identifying values of the one or more facets; and the plurality of fields includes a field for receiving values of logical metadata, physical metadata, and/or operational metadata associated with the plurality of data sets.

According to one aspect, the operation related to data access includes configuring components of an application executed by the data processing system.

According to one aspect, a command to update the group is received via a second user interface, the command including a request to add one or more data sets to the group or a request to delete one or more data sets from the group.

According to one aspect, metadata about a data set of the identified set of data sets is presented via the first user interface in response to user input requesting metadata related to the data set.

According to one aspect, the group is a second group; and receiving the user selection of one or more data sets includes receiving a selection of a previously defined first data set group such that the second group includes a hierarchical data set group.
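A hierarchical group of this kind can be pictured as a small tree whose leaves are data sets. The Python sketch below is illustrative only: the `DatasetGroup` class, its `flatten` method, and the example names are assumptions made for this example, not structures defined in the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DatasetGroup:
    """Hypothetical group whose members are data set names or other groups."""
    name: str
    members: List[Union[str, "DatasetGroup"]] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """Return every data set reachable from this group, in order, once each."""
        seen: List[str] = []
        for member in self.members:
            names = member.flatten() if isinstance(member, DatasetGroup) else [member]
            for n in names:
                if n not in seen:
                    seen.append(n)
        return seen


# A previously defined first group is selected into a second group.
first = DatasetGroup("customer_core", ["customers", "accounts"])
second = DatasetGroup("customer_all", [first, "statements", "accounts"])
all_datasets = second.flatten()
```

Flattening the second group yields the data sets of the first group followed by the directly added ones, with the repeated `accounts` entry listed once.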

According to one aspect, storing the representation of the group includes storing scope information for the group.

According to one aspect, the scope information includes an identification of one or more users authorized to access the group.

According to one aspect, the scope information includes an identification of one or more roles authorized to access the group.

According to one aspect, the method further includes rendering a second user interface associated with a user configuration of the data processing system to perform the operation related to data access, wherein the second user interface includes a data set selection portion and rendering the second user interface includes presenting a representation of a group including the selected one or more data sets in the data set selection portion.

The various aspects described above may alternatively or additionally be used with aspects of any of the systems, methods, and/or processes described herein. Further, the data processing system may be configured to operate in accordance with a method having one or more of the foregoing aspects. Such a data processing system may include at least one computer hardware processor, and at least one non-transitory computer-readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform such a method. Further, a non-transitory computer-readable medium may include processor-executable instructions that, when executed by at least one computer hardware processor of a data processing system, cause the at least one computer hardware processor to perform a method having one or more of the foregoing aspects. Accordingly, the foregoing is a non-limiting summary of the invention, which is defined by the appended claims.

Drawings

Various aspects will be described with reference to the following figures. It should be understood that the figures are not necessarily drawn to scale. Items appearing in multiple figures are indicated by the same or similar reference numerals throughout the figures in which they appear.

FIG. 1A is a diagram illustrating the creation and use of a dataset group (such as a dataset cart) by different users of an exemplary enterprise IT system in accordance with aspects of the technology described herein;

FIG. 1B is a diagram illustrating a user of an exemplary enterprise IT system performing various actions related to a data set for the purpose of creating and/or managing a data set group in accordance with aspects of the technology described herein;

FIG. 1C is a block diagram of an exemplary enterprise IT system having a data processing system with a data set catalog that maintains information about data set groups in accordance with aspects of the technology described herein;

FIG. 2A is a pictorial representation of a simplified exemplary graphical user interface rendered by a data processing system through which a user may specify components of an executable dataflow graph and interconnections between components;

FIG. 2B is a diagram of the exemplary graphical user interface of FIG. 2A in an operational state in which a user has accessed a data set selection tool to select a data set as a step in configuring components of an executable data flow graph to access the data set;

FIG. 2C is a diagram of the exemplary graphical user interface of FIG. 2A with additional elements of the depicted user interface;

FIG. 2D is a simplified exemplary graphical user interface rendered by a data processing system through which a user may specify components of an executable dataflow graph and interconnections between components;

FIG. 2E is a diagram of an exemplary graphical user interface in an operational state in which a user has accessed a dataset selection tool to select a dataset cart as a step in configuring a component of an executable dataflow graph;

FIG. 3 is a pictorial representation of an exemplary graphical user interface rendered by a data processing system through which a user may select a logical data set, wherein the user has entered an input requesting to view data in a physical data set corresponding to an alternative logical data set;

FIG. 4A is an illustration of an exemplary graphical user interface rendered by a data processing system through which a user may select a dataset, wherein the user has navigated through a catalog of datasets as a first mechanism to limit searches and has then entered, as a search query, text to appear in a description of the datasets as a second mechanism to limit searches;

FIG. 4B is an illustration of the exemplary graphical user interface of FIG. 4A, through which the user may select a data set, in an operational state after a search query has been executed and a list of data sets matching the search query is presented for the user to select one or more data sets as targets of the operation;

FIG. 5 is a diagram of an exemplary graphical user interface rendered by a data processing system through which a user may select a dataset, in an operational state after executing a search query that limits a list of datasets to datasets that include fields storing email;

FIG. 6 is a pictorial representation of an exemplary graphical user interface rendered by a data processing system through which a user may view or change information related to a data set;

FIG. 7 is an illustration of an exemplary graphical user interface rendered by a data processing system through which a user may view or change information related to a data set cart;

FIG. 8A is a diagram of an exemplary graphical user interface rendered by a data processing system through which a user may define a data set cart;

FIG. 8B is an illustration of the exemplary graphical user interface of FIG. 8A in a different operational state in which a user may select a dataset to include in a dataset cart;

FIG. 9 is an illustration of an exemplary graphical user interface rendered by a data processing system through which a user may designate a data set for inclusion in a data set cart;

FIG. 10A is a diagram of an exemplary graphical user interface rendered by a data processing system through which a user may search for a data set;

FIG. 10B is a diagram of the exemplary graphical user interface of FIG. 10A in an operational state in which the user has specified additional search criteria to limit the search results to datasets registered with a catalog of datasets;

FIG. 10C is a diagram of the example graphical user interface of FIG. 10A in an operational state in which a user has indicated a dataset for inclusion in a dataset group, the dataset group being indicated herein as a dataset cart;

FIG. 11 is a pictorial representation of an exemplary graphical user interface rendered by a data processing system through which a user may view or change information related to a set of data sets, identified herein as a technology set;

FIG. 12 is a block diagram of an illustrative data structure that maintains information about a set of data sets in accordance with aspects of the data processing system;

FIG. 13 is a flowchart of an exemplary method of operating a data processing system operable with multiple data sets in accordance with aspects of the technology described herein;

FIG. 14 is a flowchart of an exemplary method for operating a data processing system configured to perform operations to access a data set in accordance with aspects of the technology described herein;

FIG. 15 is a flowchart of an exemplary method for operating a data processing system configured to execute a program for accessing a data set in accordance with aspects of the technology described herein; and

FIG. 16 is a block diagram of an illustrative computing system environment that may be used to implement aspects of the technology described herein.

Detailed Description

The inventors have recognized and appreciated that a data processing system may operate more efficiently, and may be a more effective data analysis tool, when the data processing system supports manipulation of groups of data sets that may be targets of operations performed by the data processing system. Instead of or in addition to individual data sets, a group may be presented in a user interface through which the user selects one or more data sets as targets for an operation. The user may then manipulate the group, such as by expanding the group to enable any of its constituent data sets to be selected as a target of the operation or, in some cases, by causing the operation to be performed on all of the data sets in the group. Because the data sets to be processed by the operation can be selected directly by manipulating the groups presented in the user interface, it is no longer necessary to locate each data set individually and to set up the operation for each data set individually. In other words, the techniques described herein provide a graphical shortcut for initiating processing of one or more data sets through user-initiated actions, without having to loop through the data sets and work through a menu for each individual data set that needs to be processed.
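The effect of selecting a group as the target of a per-data-set operation can be sketched in a few lines of Python. This is a minimal sketch, not the disclosed implementation: the in-memory `DATASETS` store, the `run_on_group` helper, and the `count_records` operation are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory stand-in for stored data sets.
DATASETS: Dict[str, List[dict]] = {
    "customers": [{"id": 1}, {"id": 2}, {"id": 3}],
    "leads": [{"id": 10}],
}


def count_records(name: str) -> int:
    """Example operation that acts on a single data set."""
    return len(DATASETS[name])


def run_on_group(group: Iterable[str], operation: Callable[[str], int]) -> Dict[str, int]:
    """Apply `operation` to each data set in the group and collect the results."""
    return {name: operation(name) for name in group}


cart = ["customers", "leads"]          # a group selected as the target
results = run_on_group(cart, count_records)
```

The user selects the group once, and the loop over its members happens inside `run_on_group` rather than through repeated manual selection.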

A group of data sets may be scoped such that a particular group will appear in search results only for searches within the scope of the group. By limiting the scope of data set groups, the data processing system may automatically present the relevant groups when a data set search is performed. In an enterprise that may literally have millions of data sets, the search results may exclude data sets that are not relevant to the user and/or the task being performed by the user. Thus, in addition to providing more relevant search results, searching for the appropriate data set may be faster and consume fewer processing resources. That is, the data set groups described herein facilitate the technical tasks of storing and retrieving data, such as for efficient management of data in a database management system. In other words, the groups of data sets facilitate efficient access to data.
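Scope-limited search can be illustrated with the following Python sketch, which returns only groups whose scope includes the searching user's persona. The `ScopedGroup` class, the persona strings, and the name-matching rule are assumptions chosen for the example; the disclosure does not prescribe this representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ScopedGroup:
    name: str
    personas: FrozenSet[str]  # hypothetical: personas the group is scoped to


def search_groups(groups: List[ScopedGroup], persona: str, text: str = "") -> List[str]:
    """Return names of groups scoped to `persona` whose name contains `text`."""
    return [
        g.name
        for g in groups
        if persona in g.personas and text.lower() in g.name.lower()
    ]


GROUPS = [
    ScopedGroup("Fraud review cart", frozenset({"risk_analyst"})),
    ScopedGroup("Billing cart", frozenset({"billing_developer", "risk_analyst"})),
    ScopedGroup("HR cart", frozenset({"hr_admin"})),
]

visible = search_groups(GROUPS, "risk_analyst")
```

Because out-of-scope groups are dropped before any text matching is reported, a user with the `hr_admin` persona searching for "billing" gets an empty result rather than a group they cannot use.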

Manipulation of groups of data sets may be advantageous in a data processing system that maintains a rich set of metadata about the data sets. Metadata may be used to search for or otherwise specify a data set for use as a target for operations related to data access in a data processing system. While rich metadata sets provide great flexibility in specifying search queries to identify data sets for particular data access operations, such flexibility may result in complex user interfaces, long search times, or substantial use of computer resources, any or all of which may reduce the effectiveness of the data processing system. Searching for groups of data sets scoped to the user may enable a simpler search interface to return equally relevant or more relevant search results in less time and/or with fewer computer resources. Metadata may relate to aspects of a data set, such as logical, physical, and/or operational aspects of the data set.

The logical aspects may refer to the meaning of the data in a data set, or of fields in a data set, to an enterprise or to people within an enterprise. The logical aspects may apply to a data set regardless of the physical storage of the data set. For example, a data set may be defined to hold customer data. The data set may have a schema specifying fields that hold specific types of data of interest within the enterprise, such as customer names, customer identifiers, emails, physical addresses, and phone numbers. Fields may be designated as being associated with such logical entities, regardless of the underlying physical storage of data representing those entities.

In contrast, physical aspects may relate to the manner in which data in a data set is stored. For example, a data set may be stored in a specific data storage device implemented with specific storage hardware and software. The software may, for example, organize the stored data set into a table having a plurality of rows, each with multiple cells. Data corresponding to a logical entity may be stored in one or more particular cells of each row. For example, data constituting an email address may be stored in three fields: one identified as a user name, another as a domain name, and another as a top-level domain (TLD). Metadata regarding physical aspects of a data set may relate to aspects of the physical data storage, such as the storage schema in physical storage, the software for organizing data in the data set, and/or the hardware that holds data of the data set. Alternatively or additionally, the physical metadata may indicate characteristics of the data, including, for example, the amount or quality of the data. Metadata related to the amount of data may indicate, for example, the total amount of data in the data set, such as the number of records in the data set. Other metadata related to the quantity may indicate the number of records having a particular value in a particular field. Metadata related to data quality may indicate, for example, the number of records for which certain fields are missing or contain invalid values.
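The relationship between a logical email entity and its three physical fields, together with simple quantity and quality metadata, can be sketched as follows. This Python example is hypothetical throughout: the field names `user_name`, `domain_name`, and `tld`, the metadata keys, and the rule that a record counts as missing when any of the three parts is empty are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def logical_email(rec: Record) -> Optional[str]:
    """Assemble the logical email value from three hypothetical physical fields."""
    user, domain, tld = rec.get("user_name"), rec.get("domain_name"), rec.get("tld")
    if not (user and domain and tld):
        return None  # at least one physical field is missing or empty
    return f"{user}@{domain}.{tld}"


def quality_metadata(records: List[Record]) -> Dict[str, int]:
    """Compute simple quantity and quality metadata for a physical data set."""
    missing = sum(1 for r in records if logical_email(r) is None)
    return {"record_count": len(records), "email_missing": missing}


records: List[Record] = [
    {"user_name": "ann", "domain_name": "example", "tld": "com"},
    {"user_name": "bob", "domain_name": None, "tld": "org"},
]
metadata = quality_metadata(records)
```

For the two sample records, the metadata reports two records in total and one record whose email cannot be assembled.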

The operational aspects may relate to operations performed on the data set. For example, the operation metadata may be recorded for each job executed by the data processing system. The metadata may indicate the data set accessed during the job, as well as other information about the job, such as parameter values entered into the job, the date or time of job execution, or the user requesting execution of the job.

A metadata repository in the data processing system may store other metadata items about the data set. Such metadata may include items defining the attribution of the data set, such as which user defined the schema of the data set or from which system the data in the physical data set was imported. As another example, a textual description of a dataset or field may be recorded.

Regardless of the particular items of metadata that may be maintained in the data processing system, the metadata may be used to group and/or search for one or more data sets, from among a large number of data sets within an enterprise, to serve as targets for operations of the data processing system. Metadata regarding the various aspects may be stored by the data processing system in such a way that they may be related to each other. Thus, a search may find a dataset that satisfies a combination of aspects of the metadata. The data processing system may provide a data set selection tool having a user interface through which a user may search for data sets that meet a plurality of criteria regarding metadata of the data sets. The user may then select a dataset from the datasets identified by the search as a target. In embodiments where the dataset groups are limited in scope, the dataset selection tool may limit the search to return only dataset groups that contain datasets within scope and/or return only dataset groups that are themselves within scope.

For example, a user developing an application in a development environment may select a dataset as input to the application. The data set selection tool may present a user interface that enables a user to select a data set that is subsequently identified by the development environment as a target of an operation within the development environment that connects the application to the identified data set. To make the selection, the user may input a search query specifying a combination of values for some of the logical metadata aspects, the physical metadata aspects, and/or the operational metadata aspects. As a specific example, the search query may specify a dataset that includes emails, that has a data quality above a specified threshold for the email field, and that was used in a job within the last week. Faceted search interfaces may be used for this purpose, where different aspects of the dataset metadata provide facets for the search. The user may then select from the result set returned by the data processing system as a result of executing the query against the data set metadata store in the system. If the result set includes one or more data set groups, the user may provide input that is used as a command to expand a data set group and display the data sets it contains. A data set may then be selected from the expanded group. The user-selected data set may be returned to the development environment for use as an input data set for the application being developed.
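The example query above can be sketched as a filter over metadata records that combine a logical facet, a physical facet, and an operational facet. This is only an illustration: the `DatasetMetadata` class, its fields, the sample entries, and the `search` function are hypothetical and do not represent the interface of any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetMetadata:
    name: str
    logical_entities: set   # logical facet: business entities in the schema
    email_quality: float    # physical facet: fraction of valid email values
    last_job_run: date      # operational facet: last job that used the data set


def search(catalog, entity, min_quality, used_since):
    """Return names of data sets matching all three facets at once."""
    return [
        m.name
        for m in catalog
        if entity in m.logical_entities
        and m.email_quality > min_quality
        and m.last_job_run >= used_since
    ]


today = date(2022, 1, 31)  # fixed date so the example is reproducible
catalog = [
    DatasetMetadata("customers_eu", {"email", "customer_name"}, 0.97, date(2022, 1, 28)),
    DatasetMetadata("customers_old", {"email"}, 0.99, date(2021, 6, 1)),
    DatasetMetadata("leads", {"email"}, 0.60, date(2022, 1, 30)),
    DatasetMetadata("orders", {"transaction_amount"}, 0.0, date(2022, 1, 30)),
]

# Data sets containing emails, with quality above 0.9, used in the last week.
results = search(catalog, "email", 0.9, today - timedelta(days=7))
```

Only the first entry satisfies all three criteria; each of the others fails exactly one facet.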

As another example, a data set selection tool may be used to select a data set on which maintenance may be performed. For example, a user may wish to select a data set against which to run data quality rules. In this example, the data set selection tool may be used to identify a data set that is provided as a target for a tool that enforces a set of data quality rules on the data set. The user may search, via the selection tool, for data sets that are often used in jobs and that meet other logical, physical, and/or operational requirements, and then select one or more of these data sets from the result set for data quality analysis. If the result set includes one or more data set groups, the user may provide input that is used as a command to expand a data set group and display the data sets it contains. A data set may then be selected from the expanded group. In some embodiments, the user may select a group of data sets instead of selecting a single data set. In this case, rather than presenting the contents of the dataset group for the user to select a single dataset, the dataset group itself may be selected and provided as a target. When a group is provided as a target for a tool that performs an operation on a dataset, the operation may be performed on each dataset within the group.
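The behavior of applying an operation to every member of a selected group may be sketched as follows. The data sets, the group name, and the simple data quality rule are all hypothetical and are chosen only to keep the example self-contained.

```python
# Hypothetical in-memory data sets and one data set group.
datasets = {
    "customers": [{"email": "a@example.com"}, {"email": ""}],
    "leads": [{"email": "b@example.com"}],
}
groups = {"marketing_cart": ["customers", "leads"]}


def count_missing_email(name):
    """Illustrative data quality rule: count records with an empty email."""
    return sum(1 for row in datasets[name] if not row["email"])


def run_on_target(operation, target):
    """Apply the operation to one data set, or to each member of a group."""
    members = groups.get(target, [target])
    return {name: operation(name) for name in members}


group_report = run_on_target(count_missing_email, "marketing_cart")
single_report = run_on_target(count_missing_email, "leads")
```

Here the same call handles both kinds of target: a group name fans out to its members, while a data set name is processed on its own.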

To facilitate selection, the data set selection tool may enable a user to access additional information about the data sets returned in response to the search query. The additional information may include, for example, some or all of the metadata stored for the data sets included in the search set. Alternatively or additionally, the additional information may include information about the data in the selected dataset. For example, the additional information may include several rows in the selected dataset or a view of the data. For example, the additional information may be presented in response to a user interaction with a user interface element.

In an enterprise having a large number of data sets, supporting group manipulation of the data sets may enhance the data set search functionality. A group, represented in an exemplary embodiment herein as a dataset cart, may be predefined and, like the datasets, may have associated metadata, which may define which datasets are members of the group. The associated dataset cart metadata may include logical metadata, physical metadata, and/or operational metadata. Instead of or in addition to returning a single dataset, the dataset search capability may also return a dataset group, such as a dataset cart. The dataset cart may be represented by a visually distinct icon so as to look different from the representation of a single dataset. For example, the icon may be displayed as a shopping cart. In this specification, the description of features in the context of a dataset cart is not limited to a dataset cart, but applies to any representation of a dataset group.

A search for data sets may be limited to returning data set carts in which some or all of the data sets meet the specified search criteria. Alternatively, the search interface may include an option, for example as a facet of the search, for the user to specify that only dataset carts, rather than single datasets, be returned in response to the search query.

The data set cart may enable a user to limit the number of data sets considered when selecting data sets as targets for operations in the data processing system. In an enterprise with millions of data sets, even narrowly specified search criteria may return so many data sets that it is difficult for a user to identify the most appropriate data set, or even an appropriate data set, for further processing without significant additional effort. For example, a data set cart may be predefined to contain data sets suitable for a particular task, such that limiting the selection to data sets in the cart reduces the time required to select an appropriate data set. Moreover, a greater proportion of the search results presented to the user may actually be relevant.

The dataset cart may be predefined by the same user who is performing the dataset search. The user may then consider selecting a data set only from their own data set carts. Alternatively or additionally, the data set cart may be curated by other users of the data processing system. For example, a user responsible for maintaining data about customers registered in a customer loyalty program may curate a data set cart to include the data sets representing the most authoritative sources of information about the loyalty program. Other users may then limit the selection of data sets for data analysis involving the customer loyalty program to data sets in that cart. The data processing system may limit the search results to only the data set carts accessible to the user requesting the search, or to the data sets in those carts.

A data processing system supporting a data set cart may provide any of a number of benefits within an enterprise. For example, the data processing system may automatically execute a process flow that results in greater efficiency. FIG. 1A illustrates how different users of an IT system 100 may create and use data set carts within an enterprise. As shown in FIG. 1A, a first user of a data processing system of the IT system 100, such as user 111a or another person who is knowledgeable about, for example, the data sets, their lineage, and their individual advantages and disadvantages, may define or create data set carts (such as data set carts 1, 2, 3, 4) from a plurality of data sets (such as data sets 1 through N), each cart being suitable for certain types of data analysis. A second user of the data processing system, such as user 112a or 113a or another person knowledgeable about data analysis, may quickly select one or more of these data set carts, or data sets from among the data set carts, that are relevant to a particular analysis task. Another benefit is that the human and computer work of searching for a dataset among the large number of datasets in the enterprise is done once, when the dataset is assigned to a dataset cart. Thereafter, both the human and the computer effort needed to search for data sets for data access related operations can be reduced. As a specific example, a search interface for selecting a dataset for use in an operation related to data access may contain only a subset of the search facets or other options of the search interface for selecting a dataset for inclusion in a dataset cart, as fewer search facets may be required to find a relevant dataset if the search results are limited to dataset carts having a relevant scope.

FIG. 1B illustrates various actions (e.g., actions 115a, 115b, 115c, 115d, 115e) that a first user (such as user 111a) may perform for the purpose of defining, creating, and/or managing a dataset cart. For example, user 111a may view or change information about the data set and/or data set group/cart via the interfaces described with respect to FIGS. 6, 7, and 11. As another example, user 111a may define or create a dataset cart via the interface described with respect to FIGS. 8A and 8B. As yet another example, user 111a may select or designate a data set for inclusion in one or more data set carts via the interfaces described with respect to FIGS. 9 and 10C. As another example, user 111a may search for a dataset via the interface described with respect to FIGS. 10A and 10B.

Performing these or other operations may require user 111a to have expertise with some or all of the Dataset 1, … Dataset N, or may require user 111a to perform a time-consuming search in a large number of such datasets. However, as shown in FIG. 1B, creating a smaller number of data set carts may avoid the burden of these operations on users 112a and 113a and the enterprise IT system. For example, the processing power and network bandwidth required by the user 112a or 113a to make such a selection may be reduced. Moreover, such a reduction in computing resources may be compounded because users such as 112a and 113a may frequently search for related datasets.

The grouping of data sets may be hierarchical. In addition to the data sets, the data set group may also include data set subgroups. The hierarchy may extend to any number of levels, with the subgroups containing further subgroups. In examples where the group is represented as a dataset cart, the dataset cart may include a subset of datasets instead of or in addition to the dataset. The subset may be identified as a dataset cart within the cart, or the dataset cart may identify a top-level grouping, wherein the subset is represented in a different manner.
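A hierarchy of this kind can be modeled as a simple recursive structure. The sketch below is one possible representation, not the one used by the described system; the `Group` class, the `all_datasets` helper, and the sample names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A data set group that may hold data sets and nested subgroups."""
    name: str
    datasets: list = field(default_factory=list)
    subgroups: list = field(default_factory=list)


def all_datasets(group):
    """Collect the data sets of a group and of all of its subgroups."""
    found = list(group.datasets)
    for sub in group.subgroups:
        found.extend(all_datasets(sub))
    return found


# A cart with one direct data set and a two-level subgroup hierarchy.
regional = Group("regional", datasets=["members_eu", "members_us"])
loyalty = Group("loyalty", datasets=["members"], subgroups=[regional])
cart = Group("customer_cart", datasets=["customers"], subgroups=[loyalty])

flattened = all_datasets(cart)
```

The recursion visits each level in turn, so the same helper works for a hierarchy of any depth.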

The data set selection tool may conditionally perform an operation on a group of data sets returned in the search, depending at least in part on the operation for which the data set selection tool has been invoked. For example, if an operation requires a single dataset as its target, user selection of a group after execution of a search query, whether the group is a dataset cart or a subgroup, may result in the dataset selection tool expanding the group to enable the user to select a single dataset. Conversely, if the operation is applicable to multiple data sets, the user may be prompted or otherwise provided with a mechanism either to target all of the data sets in the group or to have the system present the data sets in the group, from which the user may then select. Such a selection tool may be implemented, for example, by providing separate navigation controls and selection controls. Through the navigation control, the user may traverse the hierarchy of data set groupings. Through the selection control, the user may select a single dataset or a group of datasets as desired. In some cases, the selection control may be context sensitive. For example, the selection control may be configured to exclude selection of a group of data sets if only a single data set is an appropriate target.
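The conditional behavior described above may be sketched as a small decision function. The dictionary layout of a search result and the `handle_selection` function are hypothetical; the sketch only shows the branching, not a user interface.

```python
def handle_selection(item, operation_accepts_multiple):
    """Decide what the selection tool does when the user picks a result.

    Returns ("target", names) when the pick can be used directly, or
    ("expand", names) when the group must be opened so that the user can
    choose a single data set from it.
    """
    if item["kind"] == "dataset":
        return ("target", [item["name"]])
    if operation_accepts_multiple:
        # The whole group becomes the target of the operation.
        return ("target", list(item["members"]))
    # The operation needs exactly one data set: expand instead of selecting.
    return ("expand", list(item["members"]))


cart = {"kind": "group", "name": "marketing_cart", "members": ["customers", "leads"]}
single = {"kind": "dataset", "name": "orders"}

as_graph_input = handle_selection(cart, operation_accepts_multiple=False)
as_quality_target = handle_selection(cart, operation_accepts_multiple=True)
plain_pick = handle_selection(single, operation_accepts_multiple=False)
```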

Groups may be limited in scope such that the groups returned in response to a search query are limited based on scope. For example, a dataset cart may be scope-limited based on the persona of the user. A persona may, for example, indicate a particular individual or individuals. An individual may be specified based on his or her identity (e.g., as established through credentials) or based on membership in one or more groups, such as membership in a department or in a particular project team within the enterprise. Alternatively or additionally, personas may be established based on roles within an enterprise, such as data analyst, application developer, test engineer, or database programmer. Other criteria may alternatively or additionally be used to identify users authorized to use the dataset cart and may be used to designate personas.

Scope-limited dataset carts may limit the amount of data returned to any particular user in response to a search for datasets by the dataset selection tool. For example, the tool may determine the persona of the user requesting the search and then limit the result set to only the dataset carts and/or datasets whose scope includes the user's persona. In this way, fewer and more relevant results may be returned from a search for datasets.
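One simple way to picture persona-based scoping is to treat both a user's persona and a cart's scope as sets of labels and to return a cart only when the two overlap. The label format, the sample carts, and the `visible_carts` function below are assumptions made for this sketch.

```python
# Hypothetical carts, each scoped to roles, teams, or individual identities.
carts = [
    {"name": "loyalty_cart", "scope": {"role:data_analyst", "team:loyalty"}},
    {"name": "test_fixtures", "scope": {"role:test_engineer"}},
    {"name": "alice_private", "scope": {"user:alice"}},
]


def visible_carts(all_carts, persona):
    """Return the names of carts whose scope overlaps the user's persona."""
    return [c["name"] for c in all_carts if c["scope"] & persona]


# A persona combines identity, role, and team membership.
alice = {"user:alice", "role:data_analyst"}
bob = {"user:bob", "role:test_engineer"}

alice_results = visible_carts(carts, alice)
bob_results = visible_carts(carts, bob)
```

Each user sees a shorter result list that contains only the carts scoped to them.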

Such a selection method may be used, for example, by a data analyst who creates a data set cart containing data sets related to a project. The data set selection tool may then be used to select target data sets for a plurality of operations within the data processing system. In this way, the available data sets follow the data analyst through the entire workflow, supporting quick and consistent selection of appropriate data sets.

The exact same computer-executable instructions need not be executed to implement the dataset selection tool for each operation that selects one or more datasets as targets. In some embodiments, a generic tool may be implemented to support such operations. However, in other embodiments, the data set selection method may be implemented by different computer-executable instructions that perform the selection functions described above. When different computer-executable instructions are used to support data set selection for different operations performed by a data processing system, each set of computer-executable instructions may render a similar interface for consistency or ease of use. However, it is not necessary that the same interface be used to select data sets for different operations.

Aspects of a data processing system may be implemented to realize any one or more of the foregoing objects and advantages. These objects and advantages may be used alone or in any suitable combination.

Representative data processing system supporting a data set cart

A dataset group, such as the dataset cart described herein, may be used in a data processing system that provides search interfaces through which a user may search for datasets as targets of an operation. These search interfaces may conduct searches that return data set groups/carts instead of or in addition to data sets. Other interfaces may enable a user to create or modify a dataset group/cart. Such a data processing system may include one or more components that maintain a repository of information about the dataset carts, including their scope.

The exemplary data processing system may operate on logical data sets as well as physical data sets. For example, a logical data set may be defined based on a schema that includes elements that are meaningful to the business of an enterprise but are not related to the physical representation of the stored data. The logical data set may correspond to a physical data set.

The co-pending application entitled "Dataset Multiplexer for Data Processing System," attorney docket A1041.70066US02, the entire contents of which are incorporated herein by reference, describes a data processing system that enables operations to be specified on logical data sets while ensuring that those operations are applied to the appropriate physical data sets. That application also describes updating a data set catalog in response to events affecting storage of data associated with a logical data set. The techniques described herein for selecting a data set may be applied in a data processing system as described in the co-pending application.

Operations related to data set selection may be applied to logical data sets and/or physical data sets. For example, a logical data set may be selected. Nevertheless, the selection may relate to or be based on the corresponding physical dataset. Such a result may be achieved by the dataset selection tool accessing the dataset catalog to identify the physical dataset corresponding to the logical dataset when searching for the dataset to be selected so that the physical information of the logical dataset may be obtained and used in the dataset selection process.

FIG. 1C is a block diagram of an IT system 100 including an illustrative data processing system 104 and a data set multiplexer 105 integrated with the data processing system 104. For example, the IT system 100 may be an IT system of an enterprise (such as a financial company). Elements of an enterprise IT system, such as networks, cloud storage, and user devices, are not explicitly shown for simplicity.

The data processing system 104 is configured to access (e.g., read data from and/or write data to) the data storage devices 102-1, 102-3, …, and 102-n. Each of the data storage devices 102-1, 102-3, …, and 102-n may store one or more physical data sets. The data storage device may store any suitable type of data in any suitable manner. The data storage device may store data as flat text files, as spreadsheets, or using, for example, a database system (e.g., a relational database system). In addition, these data storage devices may be internal to the enterprise or external to the enterprise. For example, an external data storage device may be located "in the cloud," or otherwise in storage hardware managed by a third party. Thus, the data storage devices may provide a federated environment in which different data storage devices used by an enterprise may be located in different locations and/or managed by different entities either internal or external to the enterprise.

In some cases, the data storage device may store transaction data. For example, the data storage device may store credit card transaction data, phone record data, or banking transaction data. However, the techniques described herein may be applied to data storage devices that hold other types of data used in enterprises. It should be appreciated that data processing system 104 may be configured to access any suitable number of data storage devices of any suitable type, as aspects of the techniques described herein are not limited in this respect. A data storage device from which data processing system 104 may be configured to read data may be referred to as a data source. A data storage device to which data processing system 104 may be configured to write data may be referred to as a data sink.

Each data storage device may be implemented with one or more storage devices and may include data management software or other control mechanisms to support storing physical data sets in one or more formats of any suitable type. The storage device(s) may be of any suitable type and may include, for example, one or more servers, one or more disk arrays, one or more disk array clusters, one or more portable storage devices, one or more non-volatile storage devices, one or more volatile storage devices, and/or any other device(s) configured to electronically store data. In embodiments where the data storage device includes multiple storage devices, the storage devices may be co-located in one physical location (e.g., in one building) or distributed across multiple physical locations (e.g., in multiple buildings or in different cities, states, or countries). These storage devices may be configured to communicate with each other using one or more networks of any suitable type, as aspects of the technology described herein are not limited in this respect.

The data management software may organize data in the physical storage and provide a mechanism for accessing the data so that the data may be written to or read from the physical storage. The data management software may be, for example, a database system or a file management system. Depending on the type of data management software, the storage device(s) may use one or more formats to store the physical data sets, such as database tables, spreadsheet files, flat text files, and/or files in any other suitable format (e.g., native format of a mainframe). In some embodiments, the data storage devices 102-1, 102-2, 102-3, …, and 102-n may be of the same type (e.g., may all be relational databases) or of different types (e.g., one may be relational databases and the other may be data storage devices that store data in flat files). When the data storage devices are of different types, the storage environment may be referred to as a heterogeneous or federated data environment 102. The data store may be, for example, an SQL server database, an ORACLE database, a TERADATA database, a flat file, a multi-file data store, a HADOOP distributed database, a DB2 data store, a Microsoft SQL SERVER data store, an INFORMIX data store, a table, a collection of tables, or other sub-portion of a database, and/or any other suitable type of data store, as aspects of the techniques described herein are not limited in this respect.

The data processing system 104 supports a wide variety of applications 106 that access (e.g., with read access and/or write access) physical data sets stored in the data storage devices 102-1, 102-3, …, and 102-n. The applications 106 may then perform operations based on the data in the data storage devices. Data processing system 104 may support applications 106-1, 106-2, 106-3, …, and 106-n, which may be of the same type or of different types. In some cases, an application, when executed, may read transaction data from or write transaction data to one or more physical data sets in a data storage device. In other cases, an application, when executed, may read data from or write data to physical data sets stored in different data storage devices and analyze the data to extract business insights from the data sets.

An application 106 may be developed as a dataflow graph. The dataflow graph may include components (referred to as "nodes" or "vertices") representing data processing operations to be performed on data, and links between the components representing the flow of data. Techniques for performing computations encoded by dataflow graphs are described in U.S. Pat. No. 5,966,072, entitled "Executing Computations Expressed as Graphs," the entire contents of which are incorporated herein by reference. An environment for developing an application (e.g., a computer program) as a dataflow graph is described in U.S. patent publication No. 2007/0011668, entitled "Managing Parameters for Graph-Based Applications," the entire contents of which are incorporated herein by reference. The dataflow graph may include a data source and a data sink. The data sources and sinks are represented by end nodes of the flow that represent access to the data storage devices 102-1, 102-3, …, or 102-n.
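To make the notion of components and links concrete, the following toy sketch runs a four-node graph in dependency order using the Python standard library `graphlib` module (Python 3.9 or later). It is not the execution technique of the patents cited above; the component names, the `links` layout, and the `run` function are invented for illustration.

```python
from graphlib import TopologicalSorter

# Each component is a function from a list of upstream outputs to an output.
components = {
    "source": lambda inputs: [3, 1, 2],                 # stands in for a data source
    "sort": lambda inputs: sorted(inputs[0]),
    "double": lambda inputs: [x * 2 for x in inputs[0]],
    "sink": lambda inputs: inputs[0],                   # stands in for a data sink
}

# Links, expressed as node -> list of upstream nodes feeding it.
links = {"sort": ["source"], "double": ["sort"], "sink": ["double"]}


def run(components, links):
    """Execute every component after all of its upstream components."""
    outputs = {}
    for node in TopologicalSorter(links).static_order():
        inputs = [outputs[up] for up in links.get(node, [])]
        outputs[node] = components[node](inputs)
    return outputs


outputs = run(components, links)
```

The end nodes play the roles of the data source and the data sink, while the interior nodes carry out the processing steps.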

However, the application itself need not be programmed to reference a particular data storage device. The application 106 may be programmed in terms of a logical data set rather than being hard coded to access a single physical data set. A logical dataset may refer to a logical representation of one or more datasets. Data processing system 104 may store definitions of multiple logical data sets as well as other metadata regarding these logical data sets. This information may be managed by the data set multiplexer 105. Tools used with data processing system 104 can access metadata about logical data sets and perform functions based on the metadata. For example, the program development environment may provide a user interface through which an available logical data set may be selected and used for application programming.

The logical data set may have a schema that defines the data independent of the format of the corresponding data in the physical data storage device. For example, a logical data set may have a schema defining logical entities in the logical data set. A logical entity may be identifiable and/or understandable by a human user. For example, the logical data set may include logical entities such as customer names. In the physical dataset corresponding to the logical dataset, the customer name may be stored as three fields in a row of a data table, holding data corresponding to the customer's first, middle, and last names, respectively. However, the logical data set may simply comprise the logical entity customer_name, regardless of the data format in the physical storage.
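The customer name example can be sketched as a small conversion from a physical row to a logical record. Only the logical entity name `customer_name` comes from the text above; the physical field names and the `to_logical` function are assumptions for illustration.

```python
def to_logical(row):
    """Map three physical name fields onto the logical entity customer_name."""
    parts = [row.get("first_name"), row.get("middle_name"), row.get("last_name")]
    # Skip absent or empty parts so that a missing middle name leaves no gap.
    return {"customer_name": " ".join(p for p in parts if p)}


physical_rows = [
    {"first_name": "Ada", "middle_name": "King", "last_name": "Lovelace"},
    {"first_name": "Alan", "middle_name": "", "last_name": "Turing"},
]

logical_rows = [to_logical(r) for r in physical_rows]
```

An application written against `customer_name` would not need to change if the physical layout were later reorganized, provided the conversion is updated accordingly.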

Data processing system 104 may include an interface (not shown) through which schemas of logical data sets may be defined. For example, the interface may be a user interface through which a user may define a logical data set, or otherwise introduce it into the system, by specifying a schema for the logical data set. Data processing system 104 may store a set of logical entities commonly used in the business of the enterprise. Examples of common logical entities may include one or more of a name, an identification number, a telephone number, an address, a nationality, an account balance, a transaction amount, or a date. These business terms may be used to specify, at least in part, the schema of the logical data set. However, instead of or in addition to predefined logical entities, schemas may be defined to include other logical entities.

Programming an application based on logical data sets eliminates the need for the programmer creating the application to understand the format of the data storage device storing the corresponding physical data sets. Thus, a data analyst may develop an application using a logical data set even though the data analyst does not understand the data format within the data storage device that holds the physical data set.

As a more detailed example, within an enterprise, a programmer may define a logical dataset that stores new customers. The schema of the logical data set may include logical entities such as customer name, customer address, customer identifier, and customer acquisition date. The data analyst may write applications based on the logical data sets and the logical entities regardless of the storage format of the physical data sets corresponding to the logical data sets. Thus, a data analyst may write an application without knowing the physical data set of the stored data that the application is to access.

Upon execution of the application, data in the physical data set corresponding to the logical data set may be stored in one or more of the data storage devices 102-1, 102-3, …, and 102-n. To execute an application, each operation that specifies access to a logical data set may be performed by data processing system 104 reading or writing data from a corresponding physical data set stored in one of data storage devices 102-1, 102-3, …, and 102-n. The data set multiplexer 105 may enable automatic execution of such operations by automatically accessing the corresponding physical data set and converting between the format of the data stored in the physical data storage device and the format specified in the schema of the logical data set.

As shown in FIG. 1C, the data processing system 104 includes a data set multiplexer 105 for automating access to the corresponding physical data set and conversion between the format of the logical data set and the format of the physical data set. The data set multiplexer 105 may maintain a data set catalog 107, wherein each entry in the catalog corresponds to a logical data set and provides information for accessing one or more physical data sets. For example, a catalog entry may identify a data set in the data storage device 102-1, 102-3, …, or 102-n that corresponds to the logical data set. The catalog entry may alternatively or additionally include information for converting data stored in the physical dataset into the format of the logical dataset. The information may be or may include an executable program. For example, the catalog information may identify a program for converting data in a plurality of fields in a physical dataset into the format of a corresponding logical entity in a logical dataset. Other information for accessing the one or more physical datasets may alternatively or additionally be stored or reflected in the catalog information.

The data set multiplexer 105 enables the application 106 to seamlessly access the physical data set(s) based on the programmed logical data set(s) using the information in the data set catalog. The data set multiplexer 105 of the data processing system 104 may enable access to corresponding physical data sets in a data storage device (e.g., the data storage device 102-1) when operations to access (e.g., read and/or write) logical data sets are performed in an application (e.g., the application 106-3). For example, when the catalog information stored for the logical data set is or includes an access control program, the program may be executed. Thus, even though the application 106-3 is programmed with logical data sets, the physical data sets stored in the data storage device 102-1 are accessed when data access operations are performed.

The data set multiplexer 105 may access its data set catalog to select an entry associated with the logical data set referenced in the application 106-3. Information in that entry identifying the physical data set stored in the appropriate data storage device 102-1, and/or information for converting data from the format of the data storage device 102-1 to the format of the logical data set, may then be used for data access.

The access may be dynamic. The catalog information may be used at the time an operation in an application that requires data access is performed. Entries in the data set catalog associated with the logical data sets may be updated in response to events indicating changes in storage of information associated with the logical data sets. Access to the physical data storage devices via the catalog information may ensure that the application continues to execute correctly even though changes may be made anywhere in the IT system 100, and even if the data analyst or other user who wrote the application 106-3 is unaware of the changes.

For example, a physical data set may be migrated from data storage device 102-1 to data storage device 102-n. The logical data set for which the application is programmed need not be modified to account for this change. Once the catalog entry for the logical data set has been updated, the data set multiplexer 105 can automatically use the updated catalog information to provide the application 106-3 with access to the correct physical data set, regardless of the data storage device on which it resides.
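A minimal sketch of such a migration, under the assumption that a catalog entry is a simple (device, physical name) pair, might look as follows. The dictionaries and the functions read and migrate are hypothetical illustrations only.

```python
from typing import Dict, List, Tuple

# Hypothetical storage devices keyed by reference numeral; contents are illustrative.
storage: Dict[str, Dict[str, List[str]]] = {
    "102-1": {"loyalty.dat": ["rec-a", "rec-b"]},
    "102-n": {},
}
# Catalog: logical data set name -> (device, physical data set name).
catalog: Dict[str, Tuple[str, str]] = {"loyalty": ("102-1", "loyalty.dat")}

def read(logical_name: str) -> List[str]:
    """What an application calls; it never names a device."""
    device, physical = catalog[logical_name]
    return storage[device][physical]

def migrate(logical_name: str, target_device: str) -> None:
    """Move the physical data set and update the catalog entry to match."""
    device, physical = catalog[logical_name]
    storage[target_device][physical] = storage[device].pop(physical)
    catalog[logical_name] = (target_device, physical)

before = read("loyalty")
migrate("loyalty", "102-n")
after = read("loyalty")  # same call, now served from device 102-n
```

The application-facing call is unchanged by the migration; only the catalog entry differs.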

Regardless of the manner in which a particular data storage device is accessed as part of an operation related to accessing a data set, a user may provide input specifying which data sets are the target of a particular operation. In a data processing system in an enterprise having a large number of data sets, one or more search interfaces may be provided to enable the specification of the appropriate data set. For example, the dataset selection tool may provide a user interface that provides interface elements configured to receive input specifying dataset search and selection commands.

Information that enables searching for data sets and operating on groups of data sets may be stored within the IT system 100. In this example, this information may be stored within a data set multiplexer 105, which may contain one or more metadata stores. The metadata store may store information about logical data sets and/or physical data sets, where different types of metadata provide facets to searches to be performed on the data sets. The metadata may be collected using manual or automated techniques, including techniques known in the art.

In addition, one or more repositories may store information about the set of data sets. For example, a dataset group repository 120 holding such information is shown in FIG. 1C. The information may be stored in a non-volatile computer readable medium in a manner that associates multiple types of information. For example, the related information may be stored in the same data structure or may be related, for example, by links.

This information may be shared among multiple users of the data processing system. Thus, different users may create, modify, and/or access information about the dataset group. The information may be limited in scope such that information about each dataset group may be disclosed only to users having personas within the dataset group. Alternatively or additionally, a repository storing information about the dataset groups may enforce access restrictions, thereby restricting which users may create, modify, and/or access some or all of the dataset groups.

The restriction of access to information in the repository may coincide with the scope restriction on use of the data set groups. A user may be granted access to create or modify data set groups whose scope includes that user's persona. Alternatively or additionally, access to a group may be granted to users who have, as part of their personas, roles and/or other characteristics within the scope of the data set group. However, in some embodiments, the rights to create and modify data set groups may be set separately from the scope of use of those data set groups. Different access controls for managing and for using the data set groups may enable capturing the expertise of a subset of employees in the enterprise and automatically publishing that expertise through the data processing system. For example, users having expertise regarding the appropriate data sets for use in a particular operation may be given access to create or modify data set groups that are defined for use by specifically listed users, users having a particular role, or users in an intra-enterprise group performing such operations. When other users perform these operations by selecting a data set from data set groups whose scope includes their persona, the system may automatically limit their selection of data sets to data sets previously specified by users with data expertise.
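The separation between the right to manage a group and the scope of its use may be sketched as follows. This is an illustrative assumption about how the two controls might be represented. The classes Persona and DatasetGroup, the function selectable_datasets, and the user and role names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass(frozen=True)
class Persona:
    user_id: str
    roles: frozenset = frozenset()

@dataclass
class DatasetGroup:
    name: str
    datasets: Set[str] = field(default_factory=set)
    editors: Set[str] = field(default_factory=set)      # user ids allowed to modify the group
    scope_users: Set[str] = field(default_factory=set)  # users who may use the group
    scope_roles: Set[str] = field(default_factory=set)  # roles that may use the group

    def can_edit(self, p: Persona) -> bool:
        return p.user_id in self.editors

    def in_scope(self, p: Persona) -> bool:
        return p.user_id in self.scope_users or bool(self.scope_roles & p.roles)

def selectable_datasets(groups, p: Persona) -> Set[str]:
    """Data sets a user may pick: only those in groups whose scope contains the persona."""
    result: Set[str] = set()
    for g in groups:
        if g.in_scope(p):
            result |= g.datasets
    return result

expert = Persona("dana", frozenset({"data_steward"}))
analyst = Persona("lee", frozenset({"dq_analyst"}))
group = DatasetGroup("DQ Inputs", {"loyalty.dat", "orders.dat"},
                     editors={"dana"}, scope_roles={"dq_analyst"})
```

Here the expert can edit the group without being in its scope, while the analyst can use the group without being able to change it.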

Regardless of how the access is implemented, the data processing system 104 may provide user interfaces through which to create or modify data set groups, conduct searches that return data set groups, and/or select data sets from data set groups. Examples of such user interfaces are provided in the following sections.

Representative user interface for selecting logical datasets based on groups

A data set group may be used to select one or more data sets on which to perform operations related to data access. For example, in connection with selecting a data set for performing an operation, a search interface may be presented, and data set groups may appear in the search results.

As one example, an application executed by a data processing system may be configured to access a particular data set based on user input. The dataset cart may be used to simplify this selection process. In embodiments where the application is configured as a dataflow graph, the dataset component of the dataflow graph may be configured as a data source that performs the read operation. Configuration may require searching the data set and selecting the appropriate data set. Inclusion of the data set cart in the search results may simplify the search. For example, a dataset within a dataset cart that matches a search query may not be presented alone as search results. Rather, the search results may be limited by presenting the dataset cart.
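One way of limiting search results by presenting data set carts in place of their matching members may be sketched as follows. The function search, its substring matching, and the "cart:" and "dataset:" prefixes are assumptions made for illustration.

```python
from typing import Dict, List

def search(query: str, carts: Dict[str, List[str]], loose: List[str]) -> List[str]:
    """Return cart names whose members match the query, plus matching data sets
    that are not in any cart. Matching data sets inside carts are not listed alone."""
    q = query.lower()
    results = [f"cart:{name}" for name, members in carts.items()
               if any(q in m.lower() for m in members)]
    results += [f"dataset:{d}" for d in loose if q in d.lower()]
    return results

carts = {
    "Loyalty Data": ["loyalty.dat", "loyalty_tiers.dat"],
    "Admin Data": ["loyalty.dat", "users.dat"],
    "Finance": ["ledger.dat"],
}
hits = search("loy", carts, loose=["loyalty_archive.dat", "misc.dat"])
```

Three data sets inside carts match the query, but only the two containing carts are returned for them, which shortens the list presented to the user.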

Fig. 2A illustrates a GUI 800 in a programming environment in which a dataset cart can be used to assist a user in selecting a dataset to configure an application. In this example, a user (such as user 112a or 113a of FIG. 1A) may specify components of an executable dataflow graph and interconnections between components through GUI 800. These components may represent one or more input sources, one or more output sources, and one or more operations performed on data from the inputs to generate outputs. The components representing the input sources and/or the output sources may be configurable by a user. Configuration may require specifying a data set for input or output. The configuration of these components may require user input that first selects a data set cart and then selects a data set within the selected data set cart.

Fig. 2A shows a simple diagram, omitting some information that may be displayed and interface elements associated with the displayed components for simplicity of illustration. In this example, the user has designated component 804 to process the input dataset. Component 804 may represent, for example, an operation to apply data quality rules to a selected input data set.

Component 802 represents a data source that contains an input data set. The component 802 has interface elements that a user can access to configure the component, including first selecting a data set cart and then selecting a data set within the cart to use as an input data source. Component 806 represents an output component that the user may configure to specify an output dataset that, for example, may be created to save data created in the operation represented by component 804.

As shown in fig. 2A, component 802 includes user interface elements through which a user can interact with a selection tool for selecting a dataset. These interface elements may include a field 812, which is shown here as a drop down menu box. In the state shown in FIG. 2A, the user has selected a value within field 812 that indicates that the user wishes to select a data set in the data set catalog. Link 810 is another user interface element through which a user can enter a command to proceed to the next step in the selection process of selecting a dataset from among the options in the catalog of datasets available to the user.

In response to a user selection of link 810, the data processing system may generate and present GUI 890 of FIG. 2B to the user. FIG. 2B illustrates an interface of a selection tool for selecting a data set, in this case invoked as part of a process of selecting a data set to configure component 802 of the dataflow graph of FIG. 2A. Within GUI 890, an available catalog data set is presented consistent with the user's selection of the source type (as described above in connection with FIG. 2A).

GUI 890 presents in section 855 data set carts containing data sets available for selection. If data sets that are not within a data set cart are available for selection, these data sets may also appear in the list 895. List 895 in GUI 890 includes, for example, a dataset cart (e.g., "BestCartEver") created through GUI 400 of FIG. 8A, as well as other dataset carts.

In this example, the presentation of the search results preserves the hierarchy of the data set. An icon presented next to the element in list 895 indicates whether the element is a dataset cart or a dataset. For example, the element having a "folder" icon 897 depicted next to it may be a dataset cart, while the element having a different icon 898 (shown here as a file icon) may be a dataset. Navigation graphical user interface elements are provided to enable a user to traverse the hierarchy, such as by displaying or hiding the contents of the dataset group represented by the "folder" icon. In the example of fig. 2B, GUI 890 includes navigation graphical user interface element 896. Selection of element 896 causes GUI 890 to switch between rendering and hiding data sets (e.g., logical data sets) contained in the data set cart. In this way, the user can identify and select icons at the appropriate level of the hierarchy.

Although fig. 2B illustrates a hierarchy of only two levels, in some cases a group may contain further groups, and if a data set cart containing further data set carts is expanded, the user may be presented with an interface having an internal group associated with a user interface element that also provides the user with the option of expanding in the internal group. In this way, a multi-level hierarchy may be disclosed. Regardless of the number of levels of the hierarchy presented to the user, the user may navigate through the levels of the hierarchy to visualize alternative data sets and then select a desired data set.
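Navigation of a multi-level hierarchy by expanding and collapsing groups may be sketched as follows. The class Cart, the function render, and the "[folder]" and "[file]" row labels are hypothetical stand-ins for the icons and navigation elements described above.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union

@dataclass
class Cart:
    name: str
    # Children may be nested carts or data set names.
    children: List[Union["Cart", str]] = field(default_factory=list)

def render(node: Union[Cart, str], expanded: Set[str], depth: int = 0) -> List[str]:
    """List visible rows; a cart's contents appear only if its name is in `expanded`."""
    indent = "  " * depth
    if isinstance(node, str):
        return [f"{indent}[file] {node}"]
    rows = [f"{indent}[folder] {node.name}"]
    if node.name in expanded:
        for child in node.children:
            rows += render(child, expanded, depth + 1)
    return rows

root = Cart("Marketing", [Cart("Loyalty Data", ["loyalty.dat"]), "campaigns.dat"])
collapsed = render(root, expanded=set())
partly = render(root, expanded={"Marketing"})
full = render(root, expanded={"Marketing", "Loyalty Data"})
```

Each inner group is rendered with its own expandable row, so the same routine handles any number of levels.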

In addition, the user may provide input to obtain additional information about the data set or data set group displayed via the interface. For example, GUI 900 of FIG. 3 depicts an operational state in which a user has manipulated user interface elements to control a dataset selection tool to expand certain dataset carts, including "Loyalty Data" dataset cart 920, to visualize a set of logical datasets contained in the dataset cart. GUI 900 enables a user to obtain additional information about a particular logical data set by selecting logical data set 930 in GUI 900. For example, the pop-up GUI 910 may be presented in response to a user request to view additional information about the logical data set.

GUI 910 provides additional user interface elements that a user can manipulate to obtain additional information about the dataset. Selection of the "Info" tab in the GUI 910 causes basic information about the logical data set to be presented, such as the data storage device associated with the logical data set, the type of data storage device or storage, the path to the physical data set in the data storage device and/or data storage device, links to corresponding entries in the data set catalog, and/or other information. Selecting the "view" tab in GUI 910 causes physical data associated with the logical data set (such as data in the physical data set corresponding to the logical data set) to be presented. Selecting the "record format" tab in GUI 910 causes record format information about the data set (e.g., record format information about the logical data set and/or logical entity of the logical data set) to be presented. Selecting the "Profile" tab in GUI 910 causes profile information (such as relationships with other data set carts and/or logical data sets defined in the system) to be presented. The user may view any or all of this information to assess whether the data set is suitable for the intended use.

Other mechanisms, such as a search interface, may be used to limit the number of data set carts and/or data sets that are presented to the user as candidates for selection. Referring back to FIG. 2B, GUI 890 may enable a user to enter a search query. GUI 890 may include graphical user element 892 for a user to enter a search query. In this example, the search query is specified as text. The user may specify words that appear in descriptions of the data set stored in the repository, in the names of fields included within the data set, and/or in other metadata stored for the data set. For example, FIG. 4A depicts search results for the search query "loy". The data processing system may perform a search based on the query and generate search results that include a list of data set carts and/or logical data sets selected by the data processing system based on the query. In this example, the search query matches the titles of data sets within two data set carts, and the list of data sets available for selection through the GUI 1000 is limited to the data set carts containing the two matching data sets.

Regardless of how list 895 (FIG. 2B) is generated, the selection tool may present a user interface through which the user may select from the list. In this example, the user interface element for selection is separate from the navigation user interface element. Such a configuration enables the selected entry in list 895 to be at a context-appropriate level in the hierarchy of dataset groups. In the case where the operation for which a selection is being made operates on a single data set, the selection user interface element is operable only when the user has indicated a single data set. Where it is appropriate to select a dataset group, the selection user interface element may be operable when the user has indicated a dataset cart. If either a group or a single dataset is appropriate for the operation, the selection user interface element may be operable when either a group or a single dataset element is indicated. In the example of FIG. 2A, where a user is selecting a single dataset to configure a component of the graph, the selection tool may limit the selection to the level of the hierarchy showing individual datasets.
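The context-dependent operability of the selection element may be sketched as a small decision function. The enumeration Target and the function select_enabled are hypothetical names used only for illustration.

```python
from enum import Enum

class Target(Enum):
    SINGLE = "single data set"
    GROUP = "data set group"
    EITHER = "either"

def select_enabled(operation_target: Target, indicated_is_group: bool) -> bool:
    """Whether the selection element is operable for the currently indicated entry."""
    if operation_target is Target.EITHER:
        return True
    if operation_target is Target.GROUP:
        return indicated_is_group
    return not indicated_is_group  # Target.SINGLE: only individual data sets qualify
```

A user interface could call such a function each time the indicated entry changes and enable or disable the selection element accordingly.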

As shown in FIG. 2B, the "loyalty" dataset has been designated as a choice. This may be accomplished by selecting GUI element 898 and then GUI element 845, which causes the "loyalty" dataset to be presented in portion 899 of GUI 890. Selecting GUI element 870 causes the data set identified in portion 899 to be returned by the selection tool as the user's selection for use in performing the data access operation. For example, the user may first designate a data set so that it appears in portion 899. From there, the user may invoke information about the data set, as described above, and ultimately determine whether the designated data set should be selected. Other user interface elements may enable the user to modify the designated data set before the selection tool returns the selection, including an interface element labeled "clear" that removes any data set designated in portion 899 and an interface element labeled "cancel" that ends the selection process without a selection.

In this example, the search interface is significantly simpler than the search interface in FIG. 10A, presenting fewer fields for specifying search criteria. Even with a simpler search interface, the results may be equally or more relevant than the results that the user might find through the interface of FIG. 10A, as the results may be limited to those within a dataset cart having a scope that contains the user and/or other context of the search.

The value of simplifying the selection process can be seen in connection with fig. 2C, which shows more information and user interface elements that may be present, even for the simple example of fig. 2A. FIG. 2C illustrates a GUI 875 in a programming environment in which data set selection can be performed. In this example, a user (such as user 112a or 113a of FIG. 1A) may specify components of the executable dataflow graph and interconnections between components through GUI 875. For example, a user may specify a component to perform authentication or apply data quality rules to data. The dataflow graph can include a component 882 that indicates that a dataset is to be used. The component may be configured to identify which data set is to be used for the data access operation associated with the component.

Fig. 2C illustrates a scenario in which operation(s) 884 include performing data quality rules on the selected data sources. Component 886 of the dataflow graph may represent the output of the verification operation(s).

As shown in FIG. 2C, a data set whose content is to be verified, such as "loyalty.dat", may be specified through interface elements of component 882. These interface elements may include a field 888, which here shows that the user has selected a value indicating that the data source to be selected is limited to the data sources registered in the data set catalog 107 (FIG. 1C). Link 889 is another user interface element that the user may invoke to enter further search criteria.

Selecting link 889 may trigger a selection tool to present a user interface (such as GUI 890 described above in connection with fig. 2B) through which a user may select a dataset. In this example, the "loyalty" dataset is depicted as the selected dataset in component 882 of fig. 2C. Although the interface of FIG. 2C has additional complexity relative to the interface of FIG. 2A, the data set may be selected by a data set selection tool, and the process of selecting the data set is simple.

A similar simple procedure may be used to specify multiple data sets on which the same operation is to be performed. For example, a graph applying validation rules as shown in FIG. 2A may be configured to apply these validation rules to multiple data sets. Fig. 2D illustrates GUI 800 in an operational state in which component 802 has been configured to represent multiple data sets. In this example, the configuration is achieved by user input in field 812', which indicates that the catalog dataset cart is selected as the source type.

Regardless of the source type used to configure the components representing the data input or data output, the data selection tool may be used to receive user input selecting a data set or set of data sets. In the event that a data set is selected in the context of a possible execution of an operation on multiple data sets, the data selection tool may allow the selection of an entire data set cart. Selection of the dataset cart may be performed as described above in connection with fig. 2B, but the user interface element 845 may operate when the dataset group is indicated in the list 855. FIG. 2E provides an example user interface for selecting a dataset group.

The selection of a set of data sets as a target for an operation may be used as a command for the data processing system to perform the operation on each data set in the selected data set cart. For example, the operations may include performing data quality rules for each data set included in the data set cart or performing other types of processing on the contents of each data set.
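Performing one operation on each data set of a selected cart may be sketched as follows. The rule not_empty and the function run_on_cart are hypothetical; the rule is a deliberately trivial stand-in for a data quality rule.

```python
from typing import Callable, Dict, List

def not_empty(records: List[dict]) -> bool:
    """A hypothetical data quality rule: the data set has at least one record."""
    return len(records) > 0

def run_on_cart(cart: List[str], data: Dict[str, List[dict]],
                rule: Callable[[List[dict]], bool]) -> Dict[str, bool]:
    """Apply the same rule to every data set in the selected cart, one at a time."""
    return {name: rule(data[name]) for name in cart}

data = {"loyalty.dat": [{"id": 1}], "tiers.dat": []}
report = run_on_cart(["loyalty.dat", "tiers.dat"], data, not_empty)
```

The user makes a single selection (the cart), and the result is reported per data set.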

In the example of FIG. 2E, GUI 811 lists dataset carts available for selection in section 850. The list 815 in the GUI 811 includes the dataset cart (e.g., "BestCartEver") created through the GUI 400 of FIG. 8A, as well as other dataset carts. The user may select from the list. As shown in FIG. 2E, the "BestCartEver" dataset cart has been designated as the selection. This may be accomplished by selecting GUI element 820 and then GUI element 840, which causes the "BestCartEver" dataset cart to be presented in portion 860 of GUI 811. Selecting GUI element 861 causes the "BestCartEver" dataset cart to be selected for performing the data access operation.

Thus, the selection tools described in these examples provide information and user interface elements that enable a user to efficiently make selections from a myriad of selections.

The selection interface may include other user interface elements to identify a data set or data set group for selection. For example, the user interface may accept other search criteria as input to enable a user to identify relevant datasets for operations involving access to one or more datasets or a dataset cart. The options presented to the user, whether data sets or data set carts, may be limited to the options matching the specified search criteria. In the case of dataset carts, the presented options may be limited to carts that include datasets matching the search criteria and/or carts that themselves match the specified criteria. FIG. 4A is an illustration of an exemplary graphical user interface 1000 rendered by a data processing system through which a user may select a dataset. In the illustrated state, the user has navigated through a dataset catalog as a first mechanism to limit the search, and has then entered search query text (e.g., "loy") that is to appear in a description of the dataset as a second limit on the search. The user may then select a dataset cart and/or dataset from the filtered search results for performing the operation.

In this example, even though additional flexibility is provided in specifying search objects, the search interface is significantly simpler than the search interface in FIG. 10A, presenting fewer fields for specifying search criteria. Even with a simpler search interface, the results may be equally or more relevant than the results that the user might find through the interface of FIG. 10A, as the results may be limited to those within a dataset cart having a scope that contains the user and/or other context of the search.

FIG. 4B is an illustration of the exemplary graphical user interface of FIG. 4A, rendered by the data processing system in an operational state after a search query (e.g., "loy") has been executed. A list of data sets matching the search query is presented, from which the user can select one or more data sets as targets for the operation. Search results may be limited to data sets based on the scope of the cart and the user performing the search.

FIG. 5 is an illustration of an exemplary graphical user interface 1100 rendered by a data processing system through which a user may select a dataset after performing a search query that limits a list of datasets to datasets that include fields that store email. Search results may be limited to data sets based on the scope of the cart and the user performing the search. For example, the search results may be limited to datasets in dataset carts whose scope includes the user performing the search.
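A search that combines a field criterion with cart scope may be sketched as follows. The function search_by_field, the dictionary layout, and the user names are assumptions made for illustration.

```python
from typing import Dict, List, Set

def search_by_field(field_name: str, user: str,
                    carts: Dict[str, dict], fields: Dict[str, Set[str]]) -> List[str]:
    """Data sets having `field_name`, restricted to carts whose scope includes `user`."""
    found: List[str] = []
    for cart in carts.values():
        if user not in cart["scope"]:
            continue  # cart is out of scope for this user
        for ds in cart["datasets"]:
            if field_name in fields[ds] and ds not in found:
                found.append(ds)
    return found

carts = {
    "Loyalty Data": {"scope": {"lee"}, "datasets": ["loyalty.dat", "tiers.dat"]},
    "Admin Data": {"scope": {"dana"}, "datasets": ["users.dat"]},
}
fields = {
    "loyalty.dat": {"email", "points"},
    "tiers.dat": {"tier"},
    "users.dat": {"email"},
}
```

Two users issuing the same query therefore receive different results, each limited to the carts whose scope includes them.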

Various forms of user input may be used to determine the identity of a user who uses the data processing system to create a data set cart, perform a search, and/or use or select a data set/cart as a target of an operation. For example, user input, such as text input (e.g., user identifier and/or password) using a keyboard, stylus, or other writing instrument, voice input using a microphone or other device, biometric input (e.g., fingerprint, facial pattern, voice pattern, etc.), and/or other forms of input may be used to determine the identity of the user. The identity information may be used to indicate a persona for the user.

Representative user interface for grouping logical data sets

The data processing system may provide one or more mechanisms by which a user may manage a group of data sets, such as by creating, modifying, or deleting groups. The mechanism may be a dedicated tool contained within the data processing system or may be provided through additional user interface options associated with tools or other interfaces through which a user may access data set information that is otherwise present in the data processing system. For example, an interface through which a user can search for a dataset that meets specified criteria may include user interface elements through which the user can provide input that associates the dataset included in the search results with a dataset group. Likewise, other interfaces (such as interfaces through which lineage information is presented) can be augmented with user interface elements through which a user can manage a dataset group. These user interface elements may be linked to computer executable code that accesses and/or modifies stored information about the dataset group.

FIG. 6 illustrates a Graphical User Interface (GUI) 200 generated in response to a request to view information about a dataset and/or a dataset group, in this example depicted as a dataset cart. For example, the interface may be presented as a result of the user providing a dataset search query as input and then selecting a particular dataset from the results. GUI 200 presents information about dataset 202. As shown in FIG. 6, information about the dataset "loyalty.dat" is presented. The information about the data set 202 may include information about the type of data set (e.g., file, directory, table, etc.), the directory to which the data set belongs, the data set hierarchy to which the data set belongs, and/or other information. For example, GUI 200 depicts dataset 202 as a file belonging to a directory "main" and belonging to at least three dataset hierarchies, such as "loyalty program", "detail" and "main". The hierarchy may be defined or specified by a user of the data processing system 104.

The interface may also include interface elements through which the dataset groups may be managed. In this example, GUI 200 also includes a list of dataset carts 204 that contain datasets 202. For example, user interface 200 depicts dataset carts "Loyalty Data" and "Admin Data" contain datasets 202. A request to view information about the dataset cart may result in the generation of another GUI. For example, selection of the graphical user element 206 representing a "Loyalty Data" dataset cart may result in generation of the GUI 300.

FIG. 7 illustrates an example GUI 300 generated in response to a request to view and/or change information about a dataset cart 302. However, it should be understood that the data processing system may provide alternative or additional mechanisms by which a user may invoke an interface for managing a data set cart as shown. In this example, GUI 300 presents information 340 about a "Loyalty Data" dataset cart. The information about the data set cart 302 includes the name of the data set cart, information describing the data set cart, the owner of the data set cart (e.g., the user who created the data set cart), the users who have been granted permission to modify the data set cart (e.g., permission to edit or delete the data set cart), the contents of the data set cart (e.g., information about the data sets included in the data set cart), other data set carts, logical data sets or logical entities associated with (e.g., having a relationship with) the data set cart, and/or other information. Information about users granted permission to view the dataset cart (whether when viewing information in the repository 120 or when the dataset cart appears in the results of a search conducted by the user) may be entered by user selection of the user interface element 304. For example, GUI 300 depicts a dataset cart 302 including a logical dataset "loyalty.dat" 202 and information 206 about a physical dataset corresponding to the logical dataset. As shown in GUI 300, data set cart 302 may include information about physical data sets corresponding to other logical data sets included in the data set cart. For example, the data set cart 302 contains logical data sets 310, 312 and information 314, 316 about the physical data sets corresponding to these logical data sets.

GUI 300 includes an interface element configured to receive input to change the dataset cart. For example, interface element 330, when selected by a user, may present an additional screen through which the user may specify, as a list of individuals or by role, group membership, or other characteristics of a user's persona, the users who may read, edit, delete, etc. the dataset cart. A current owner may be assigned to the data set cart. The current owner has full access to all aspects of the dataset cart. The current owner may initially be the user who created the dataset cart. The current owner of the dataset cart may thereafter delegate ownership to another user by selecting the graphical user element 355 and indicating the user or role to which ownership is to be delegated.

In some embodiments, the scope of the data set cart may be coextensive with the users authorized to read and/or edit the data set cart. In other embodiments, the scope of the dataset cart may be specified separately, identifying the users for whom the dataset cart may appear in the results of a search for data sets. A separate mechanism may be provided in an interface (such as GUI 300) to set the scope of the data set cart. For example, user interface element 304, when selected by a user having access to edit the dataset cart, may render another display screen in which the user may enter a scope, such as by identifying a particular user, group, role, or the like.

Additionally or alternatively, other parameters may be used to define the scope of the dataset cart. For example, time parameters (e.g., time of day, day of week, month of year) may be used to define the scope. In such cases, the data processing system may enforce the time-based scope by limiting the data sets and/or data set carts presented to a user searching for data sets to only those data sets or data set carts that are authorized for use at the time the search starts.
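A scope defined by time parameters may be sketched as follows, assuming a scope made up of permitted weekdays and a daily time window. The class TimeScope, the function visible_carts, and the cart names and hours are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, List, Set

@dataclass
class TimeScope:
    weekdays: Set[int]  # 0 = Monday ... 6 = Sunday, as returned by datetime.weekday()
    start: time         # inclusive start of the daily window
    end: time           # exclusive end of the daily window

    def allows(self, when: datetime) -> bool:
        return when.weekday() in self.weekdays and self.start <= when.time() < self.end

def visible_carts(carts: Dict[str, TimeScope], search_started: datetime) -> List[str]:
    """Carts authorized for use at the moment the search starts."""
    return [name for name, scope in carts.items() if scope.allows(search_started)]

carts = {
    "Loyalty Data": TimeScope({0, 1, 2, 3, 4}, time(9, 0), time(17, 0)),
    "Batch Inputs": TimeScope({5, 6}, time(0, 0), time(23, 59)),
}
```

The check is made against the time at which the search starts, so a cart that falls out of scope while results are displayed is not retroactively removed in this sketch.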

The dataset cart 302 may be updated via the GUI 300. For example, selecting the graphical user element 320 may enable a user with editing permissions to add or delete a data set from the data set cart 302.

In some cases, a user (such as user 111A of fig. 1A) may request to view and/or change information about a data set or data set cart via interfaces 200, 300 in order to define or create the data set cart.

FIG. 8A illustrates GUI 400 in a state in which a new dataset cart can be created. For example, a user (such as user 111A of FIG. 1A) may specify that a dataset cart is to be created when viewing a user interface in which information about the dataset is displayed. In this example, the user may be looking at information about the "loyalty.dat" dataset and then want to create a new dataset cart containing the "loyalty.dat" dataset. The user may select the graphical user element 402 to create a new dataset cart. Selecting the graphical user element 402 may cause the system to generate a pop-up dialog 404 in which the user may name the cart (e.g., "BestCartEver"), indicate the type of entity being created (e.g., the dataset cart), and provide a description of the dataset cart.

Selecting the graphical user element 406 may cause the system to generate a new dataset cart containing the "loyalty.dat" dataset. The system may store a representation of the newly created dataset cart. For example, an entry may be added to the repository 120 (FIG. 1C) to represent the dataset cart. In some cases, some or all of the characteristics of the dataset cart may initially be assigned default values. For example, a dataset cart may initially be assigned a scope based on the persona of the user who created it. This may be achieved, for example, by initially setting a scope that allows the dataset cart to be visible only to its creator. Regardless of how the values of the characteristics are initially assigned, one or more users may subsequently change them. Once a record of the dataset cart is created, it can be edited, such as through a user interface as shown in FIG. 6 or FIG. 7.

A dataset cart, once created, may also be updated in other ways. For example, a user may wish to add a dataset to an existing dataset cart rather than creating a new dataset cart to hold the dataset. FIG. 8B illustrates a portion 450 of the GUI 400 in which a user (such as user 111A of FIG. 1A) may choose to add the "loyalty.dat" dataset to an existing dataset cart. For example, drop down menu 455 is a user interface element that, when selected by a user, presents a list of existing dataset carts defined in the data processing system. In embodiments where dataset carts have a scope, the list may be limited to dataset carts having a scope that includes the user at that time. Selecting a particular dataset cart from the list may cause the dataset to be added to the selected dataset cart. The system may update the stored representation of the selected dataset cart accordingly.

FIG. 12 illustrates an example data structure holding the stored representation of a dataset cart (i.e., information about the dataset cart). For each dataset cart, various information may be stored. For example, the repository 120 (fig. 1C) may have such a data structure for each dataset cart. As shown in fig. 12, the data structure 1202 of the dataset cart may include a plurality of fields containing information, such as: a name field 1222 of the dataset cart, a list 1224 of identifiers of the datasets contained in the dataset cart, and one or more parameters 1226 associated with the dataset cart. Here, the parameters 1226 indicate other information that may be stored, such as text describing the dataset cart, values of one or more tags, or other types of information as described herein or otherwise used in connection with the dataset cart. In embodiments where groupings of datasets may be hierarchical, list 1224 may contain other dataset groups in lieu of or in addition to datasets.

Access information 1240 may also be stored along with information about the dataset cart. The access information may indicate users that have access to stored information about the dataset cart. The information may include an owner 1228 of the dataset cart, a list of users 1230 authorized to read information about the dataset cart, or a list of users 1232 authorized to modify information about the dataset cart. Some or all of this authorization information may be processed by other components of the data processing system to determine the scope of the dataset cart. Alternatively or additionally, other information may be included to determine the scope. For example, list 1234 may define groups within the data set cart. The list 1236 may define the roles of users authorized to access the dataset cart.
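By way of illustration only, the stored representation of FIG. 12 might be sketched in Python as follows. The class names, field names, and the `create_cart` helper are assumptions made for this sketch; only the reference numerals in the comments come from the description above. The default access values model the case in which a newly created dataset cart is visible only to its creator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessInfo:
    """Access information 1240 (field names are illustrative assumptions)."""
    owner: str                                        # owner 1228
    readers: List[str] = field(default_factory=list)  # users 1230 authorized to read
    editors: List[str] = field(default_factory=list)  # users 1232 authorized to modify
    groups: List[str] = field(default_factory=list)   # list 1234 of groups
    roles: List[str] = field(default_factory=list)    # list 1236 of authorized roles


@dataclass
class DatasetCart:
    """Data structure 1202 describing one dataset cart."""
    name: str                                         # name field 1222
    members: List[str] = field(default_factory=list)  # list 1224 of dataset or group identifiers
    parameters: Dict[str, str] = field(default_factory=dict)  # parameters 1226
    access: Optional[AccessInfo] = None


def create_cart(name: str, creator: str, first_dataset: str,
                description: str = "") -> DatasetCart:
    """Create a cart whose default scope makes it visible only to its creator."""
    access = AccessInfo(owner=creator, readers=[creator], editors=[creator])
    return DatasetCart(name, [first_dataset], {"description": description}, access)


cart = create_cart("BestCartEver", "user111A", "loyalty.dat")
```

In this sketch, later edits to the scope would amount to changing the lists held in `AccessInfo`.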

The data processing system may provide a plurality of user interfaces in which the data set and/or the data set group are indicated. Each of these interfaces may be configured to enable a user to manage a set of data sets, such as by creating a new set of data sets or adding a data set to a set of data sets. User manipulation of these interfaces may change the set of data set groups available in the data processing system, which may be implemented, for example, by adding, deleting, or changing data structures (e.g., 1202).

Fig. 9 illustrates a GUI 500 by which a user (such as user 111A of fig. 1A) may designate a dataset for inclusion in a dataset cart. GUI 500 displays lineage information associated with a dataset. The data processing system may present such information for any of a number of reasons, which need not relate to the management of dataset groups. For example, displaying technical lineage may enable a user to explore possible sources of errors identified in data in a dataset. Displaying business lineage may enable users to identify groups within an enterprise that may be affected by changes in the dataset. Regardless of why the lineage information is displayed, a user viewing such information may identify a need to manage one or more dataset groups, so integrating user interface elements that implement dataset group management into the lineage user interface can facilitate efficient operation.

For example, GUI 500 is shown displaying lineage information 502 for a "loyalty.dat" dataset. One or more components representing the datasets in the displayed lineage information can be selected and manipulated to specify that the datasets represented by the components be included in a dataset cart. In this example, selecting component 510 may cause a window 512 to be displayed, through which a user can select a graphical user interface element 514 that, when invoked, adds the dataset "loyalty_filtered" to an existing dataset cart (as shown in fig. 8B) or a newly created dataset cart (as shown in fig. 8A).

The datasets for inclusion in the dataset cart may be selected by a user (such as user 111A of fig. 1A) via a search GUI (such as GUI 600 shown in FIGS. 10A-10C). The data processing system may include a dataset search interface that supports rich combinations of search criteria. The user interface may be presented in response to a request to create a new dataset cart. Alternatively, after identifying one or more datasets through such a search, the user may specify that certain datasets returned in the search results be used in managing a dataset cart.

Through the search interface, the system may identify datasets that are available for operations related to data access by the data processing system 104. In some implementations, the search GUI 600 can include graphical user interface elements 602, 604, 606, 608 for a user to input a search query. For example, the user interface element 602 may be a text field; search results are then limited to datasets whose names, fields, and/or other associated metadata include the entered text.

The user may input other inputs through other user interface elements to define the faceted query. In such a query, a user may specify one or more values describing facets of a data set defined in the data processing system. A user interface element may be provided for each facet through which a user may indicate values stored in metadata associated with a data set defined in the data processing system. The range of values may be limited to the values of the data sets that satisfy the criteria already specified in the search interface. User interface elements 604, 606, and 608 are examples of user interface elements through which a user may specify values for facets. For example, one or more facets may correspond to attributes of the data set, such as type, owner, hierarchy, whether the data set is registered in a directory table associating information for accessing the physical data set with the logical data set, and/or other attributes.

Alternatively or additionally, the search query may be defined by entering other information through such a user interface.

The data processing system may perform a search based on the query and generate search results that include a list 610 of data sets selected by the data processing system based on the query. The faceted query may include one or more facets based on which search results may be filtered. In the illustrated example, the dataset list 610 presented in GUI 600 includes all datasets including "loyalty" in a name, field name, or dataset description. Additional facets have been designated to further filter the search results. Selection of a facet may cause search results to be filtered based on the facet.

For example, if facet 606 is selected, which indicates whether a dataset is registered in a directory table that associates information for accessing the physical dataset with the logical dataset, the search results are filtered such that only datasets registered in the directory table are presented to the user in the GUI, as shown in the example of FIG. 10B. As shown in FIG. 10B, GUI 600 presents an updated list of datasets 615 that does not include items such as items 620, 625 from list 610.
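The faceted search described above may be sketched, purely as an illustration, as a text match followed by equality filters on metadata. The dictionary layout of the dataset records, the facet keys `type` and `registered`, and the sample records are assumptions for this sketch rather than details of the described system.

```python
def search(datasets, text="", facets=None):
    """Return datasets whose name, fields, or description contain `text`
    and whose metadata equals every requested facet value."""
    facets = facets or {}
    needle = text.lower()
    results = []
    for ds in datasets:
        haystack = [ds["name"], ds.get("description", ""), *ds.get("fields", [])]
        if needle and not any(needle in h.lower() for h in haystack):
            continue  # keyword not found in name, description, or field names
        if all(ds.get(key) == value for key, value in facets.items()):
            results.append(ds)
    return results


def facet_values(results, facet):
    """Values offered for a facet, limited to datasets that already match."""
    return sorted({ds[facet] for ds in results if facet in ds}, key=str)


# Hypothetical records; `registered` stands for registration in the directory table.
records = [
    {"name": "loyalty.dat", "type": "file", "registered": True},
    {"name": "loyalty_filtered", "type": "file", "registered": False},
    {"name": "orders", "type": "table", "registered": True, "fields": ["loyalty_id"]},
    {"name": "web_logs", "type": "file", "registered": True},
]
list_610 = search(records, "loyalty")
list_615 = search(records, "loyalty", {"registered": True})
```

Here `list_615` plays the role of the updated list of FIG. 10B: it is `list_610` with the unregistered datasets removed, and `facet_values(list_610, "type")` shows how the values offered for a facet can be restricted to those of the datasets already matching.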

The user may then select one or more of the presented datasets to include in a dataset cart. A dataset cart may be created based on the selected datasets. For example, as shown in fig. 10C, the user may select the dataset "loyalty.dat" from the dataset list 615 to include in the dataset cart. In this example, the input indicating that the dataset is to be included in the dataset cart is made in multiple steps. For example, the dataset names in list 615 may form user interface element 630. Selecting user interface element 630 may open window 632 with information about the dataset associated with element 630. Window 632 may include a further "add to cart" user interface element, selection of which may open window 634 containing further user interface elements. The user interface elements in window 634 may enable a user to specify an existing dataset cart, similar to the selection described in connection with fig. 8B, or to create a new dataset cart to which the selected dataset is added, similar to the process described in connection with fig. 8A.

In the case where the data set is a logical data set, the data processing system may identify a physical data set corresponding to the logical data set and include information about the physical data set in the data set cart.
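As a minimal sketch of this step, assuming the directory table can be modeled as a mapping from logical dataset names to physical access information, the lookup might look as follows. The function name, the mapping layout, and the store and path values are all invented for illustration.

```python
def attach_physical(logical_names, directory_table):
    """Pair each logical dataset with its physical access information.

    `directory_table` maps a logical dataset name to whatever is needed to
    reach the physical dataset; names that are not registered map to None.
    """
    return [
        {"logical": name, "physical": directory_table.get(name)}
        for name in logical_names
    ]


# Hypothetical directory table entry; the store and path are invented.
directory_table = {
    "loyalty.dat": {"store": "example-store", "path": "/example/loyalty.dat"},
}
cart_members = attach_physical(["loyalty.dat", "loyalty_filtered"], directory_table)
```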

The created dataset cart may be used in a program. In some cases, the program may be an application program executed by a data processing system. In other cases, the program may be a utility of a data processing system, such as a data analysis utility configured to perform data quality analysis.

FIG. 11 is an illustration of an exemplary graphical user interface 700 rendered by a data processing system through which a user (such as user 111A of FIG. 1A) may view or change information related to a dataset group, identified herein as a technical group. In systems where the groupings of datasets are hierarchical, the top-level groupings may be identified by a different name than that used for the lower-level groupings. For example, the top-level groupings may be referred to as dataset carts. The lower-level groupings may have different names, such as technical groups. For example, FIG. 11 shows that "forwards" directory 702 is a member of technical groups 704 named "sending_ends", "tpc_customers", "tpc_date_dim" and "web_files". Some or all of the operations described herein for managing a dataset cart may be performed to manage technical groups. Under this dichotomy, a technical group can be included in a dataset cart, but not vice versa. However, such a limitation is not required in a hierarchical system.

Representative method of operation of a data processing system supporting a logical dataset group

FIG. 13 is a flowchart of an illustrative process 1300 for operating a data processing system that is operable with multiple data sets. Process 1300 may be performed by data processing system 104 described with reference to fig. 1C. Alternatively or additionally, process 1300 may include other acts including acts as described elsewhere herein in connection with other embodiments.

At act 1302, process 1300 may identify a data set that is available to perform an operation related to data access of data processing system 104. For example, a data set may be identified by performing a search based on a search query specified via GUI 600 as shown in FIG. 10A.

Process 1300 may proceed to act 1304, during which the identified dataset may be presented in a user interface (such as GUI 600 of fig. 10B). For example, FIG. 10B depicts some of the search results generated in response to execution of a search query that includes a keyword "loyalty" and facets indicating whether the dataset is registered in a directory table that associates information for accessing the physical dataset with the logical dataset.

Process 1300 may proceed to act 1306 during which a selection of one or more data sets from the identified data sets may be received. The user may select one or more of the identified data sets to include in a group (such as a data set cart). For example, as shown in fig. 10C, the user may select a dataset "loyalty.dat" from the identified datasets to include in the dataset cart. The data set may be selected for inclusion in a new data set cart or an existing data set cart.

Process 1300 may proceed to act 1308, during which a representation of a group including the selected one or more data sets may be generated and stored. Such a representation is depicted in fig. 12 and includes various information such as the name of the group, information about the datasets contained in the group, parameters associated with the datasets in the group, the owner of the group, and/or scope information associated with the group.

Process 1300 may proceed to act 1310 during which a determination may be made as to whether to perform further identification of datasets. For example, the user may specify additional or different facets for the search query. In response, for example, a different set of datasets may be identified at act 1302. A dataset may be selected from that different set to generate a new representation of a group or to update an existing representation of a group.
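Acts 1302 to 1310 may be sketched as a simple loop, shown here only as an illustration. The function `run_grouping_session`, the `choose` callback that stands in for the user interface, and the use of a plain dictionary as the repository are assumptions of this sketch.

```python
def run_grouping_session(dataset_names, queries, choose, repository):
    """Sketch of process 1300 with a dict standing in for the repository."""
    for query in queries:  # act 1310: each further query repeats the loop
        identified = [n for n in dataset_names if query in n]   # act 1302
        cart_name, selected = choose(identified)                # acts 1304 and 1306
        members = repository.setdefault(cart_name, [])          # act 1308: new or existing group
        members.extend(n for n in selected if n not in members)
    return repository


# The callback below always picks the first presented dataset for one cart.
repo = run_grouping_session(
    ["loyalty.dat", "loyalty_filtered", "orders"],
    ["loyalty", "ord"],
    lambda shown: ("BestCartEver", shown[:1]),
    {},
)
```

The first query creates the representation of "BestCartEver"; the second query updates that existing representation rather than creating another.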

FIG. 14 is a flowchart of an illustrative process 1400 for operating a data processing system configured to perform operations to access a data set. Process 1400 may be performed by data processing system 104 described with reference to fig. 1C. Alternatively or additionally, process 1400 may include other actions, including actions as described elsewhere herein in connection with other embodiments.

At act 1402, process 1400 may present a user interface configured for selection by a user of one or more data sets or data set carts to use in connection with operations related to data access of a data processing system. Examples of such user interfaces are shown in fig. 2B and 2E.

Process 1400 may proceed to act 1404, during which personas associated with a user of the data processing system (e.g., a user requesting a search of a dataset) may be identified and scope information associated with the dataset and/or a dataset group (e.g., a dataset cart) may be identified. The scope information associated with the data sets and/or groups of data sets may be defined based on personas and/or other parameters of users of the data processing system.

The process may proceed to act 1406 during which one or more data set groups may be automatically identified based at least in part on a correspondence between the user's persona and scope information associated with the automatically identified data set groups. For example, fig. 2B and 2E depict lists 815, 895 of datasets and/or dataset carts that may be generated by examining the persona (e.g., permissions) of a user requesting a search of datasets, with the result set limited to dataset carts and/or datasets having a scope that includes the user's persona.

The process may proceed to act 1408, during which an indication of the automatically identified dataset group may be rendered via the user interface. For example, when the user selects a particular dataset cart in fig. 8E, an indication of the selected dataset cart may be rendered in the second portion 860 of the user interface.
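The correspondence between a persona and scope information in acts 1404 and 1406 might be sketched as follows. This is a minimal sketch under the assumption that a persona is a user name plus a set of roles and that scope is a set of permitted users and roles; the description does not prescribe either representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    user: str
    roles: frozenset


@dataclass
class Cart:
    name: str
    scope_users: set   # users the cart is scoped to
    scope_roles: set   # roles the cart is scoped to


def in_scope(cart, persona):
    """True when the cart's scope includes the persona (act 1406)."""
    return persona.user in cart.scope_users or bool(cart.scope_roles & persona.roles)


def carts_for(persona, carts):
    """Names of the carts to render for this persona (act 1408)."""
    return [cart.name for cart in carts if in_scope(cart, persona)]


analyst = Persona("user111A", frozenset({"analyst"}))
carts = [
    Cart("BestCartEver", {"user111A"}, set()),
    Cart("Finance", set(), {"auditor"}),
    Cart("Marketing", set(), {"analyst"}),
]
visible = carts_for(analyst, carts)
```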

FIG. 15 is a flowchart of an illustrative process 1500 for operating a data processing system configured to execute a program for accessing a data set. Process 1500 may be performed by data processing system 104 described with reference to fig. 1C. Alternatively or additionally, process 1500 may include other actions, including actions as described elsewhere herein in connection with other embodiments.

At act 1502, process 1500 can receive a search query via a user interface to search a data set for use in connection with operations related to data access by a data processing system. An example of such a user interface is shown in fig. 4A.

Process 1500 may proceed to act 1504, during which a search may be performed based on the search query to generate search results. The search results may be presented in a user interface and include one or more dataset carts. At least some of the dataset carts may each include one or more of the datasets found by the search. The datasets and/or dataset carts presented in the user interface may be identified by examining the persona (e.g., permissions) of the user requesting the search, and the result set may be limited to only dataset carts and/or datasets having a scope that includes the user's persona.

The process 1500 may proceed to act 1506, during which operations may be performed on each data set included in the data set cart when the data set cart is selected in the user interface. The user interface may provide options for selecting the dataset cart as a target for the operation.
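Act 1506 may be illustrated with a short sketch in which an operation defined for a single dataset is applied to every dataset of a selected cart. The function name, the dictionary of carts, and the record counts that stand in for a data quality utility are hypothetical.

```python
def run_on_target(target, carts, operation):
    """Apply `operation` to one dataset, or to each dataset of a selected cart."""
    names = carts[target] if target in carts else [target]
    return {name: operation(name) for name in names}


# Hypothetical record counts standing in for a per-dataset analysis.
record_counts = {"loyalty.dat": 120, "loyalty_filtered": 45}
carts = {"BestCartEver": ["loyalty.dat", "loyalty_filtered"]}

report = run_on_target("BestCartEver", carts, record_counts.get)
single = run_on_target("loyalty.dat", carts, record_counts.get)
```

The caller does not need to know whether the target is a cart or a single dataset; the same call covers both cases.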

Additional embodiment details

FIG. 16 illustrates an example of a suitable computing system environment 1600 on which the techniques described herein may be implemented. The computing system environment 1600 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the technology described herein. Neither should the computing environment 1600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 1600.

The technology described herein is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the techniques described herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The computing environment may execute computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The technology described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

With reference to fig. 16, an exemplary system for implementing the techniques described herein includes a general purpose computing device in the form of a computer 1610. Components of computer 1610 may include, but are not limited to, a processing unit 1620, a system memory 1630, and a system bus 1621 that couples various system components including the system memory to the processing unit 1620. The system bus 1621 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

Computer 1610 typically includes a variety of computer-readable media. Computer readable media can be any available media that can be accessed by computer 1610 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 1610. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

The system memory 1630 includes computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) 1631 and Random Access Memory (RAM) 1632. A basic input/output system 1633 (BIOS), containing the basic routines that help to transfer information between elements within computer 1610, such as during start-up, is typically stored in ROM 1631. RAM 1632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1620. By way of example, and not limitation, FIG. 16 illustrates operating system 1634, application programs 1635, other program modules 1636, and program data 1637.

Computer 1610 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 16 illustrates a hard disk drive 1641 that reads from or writes to non-removable, nonvolatile magnetic media, a flash memory drive 1651 that reads from or writes to a removable, nonvolatile memory 1652 (such as flash memory), and an optical disk drive 1655 that reads from or writes to a removable, nonvolatile optical disk 1656 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 1641 is typically connected to the system bus 1621 through a non-removable memory interface such as interface 1640, and flash memory drive 1651 and optical disk drive 1655 are typically connected to the system bus 1621 by a removable memory interface, such as interface 1650.

The drives and their associated computer storage media discussed above and illustrated in fig. 16 provide storage of computer readable instructions, data structures, program modules and other data for the computer 1610. In FIG. 16, for example, hard disk drive 1641 is illustrated as storing operating system 1644, application programs 1645, other program modules 1646, and program data 1647. Note that these components can either be the same as or different from operating system 1634, application programs 1635, other program modules 1636, and program data 1637. Operating system 1644, application programs 1645, other program modules 1646, and program data 1647 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into computer 1610 through input devices such as a keyboard 1662 and a pointing device 1661, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 1620 through a user input interface 1660 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a Universal Serial Bus (USB). A monitor 1691 or other type of display device is also connected to the system bus 1621 via an interface, such as a video interface 1690. In addition to the monitor, computers may also include other peripheral output devices such as speakers 1697 and printer 1696, which may be connected through an output peripheral interface 1695.

Computer 1610 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 1680. The remote computer 1680 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 1610, although only a memory storage device 1681 has been illustrated in FIG. 16. The logical connections depicted in FIG. 16 include a Local Area Network (LAN) 1671 and a Wide Area Network (WAN) 1673, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 1610 is connected to the LAN 1671 through a network interface or adapter 1670. When used in a WAN networking environment, the computer 1610 typically includes a modem 1672 or other means for establishing communications over the WAN 1673, such as the Internet. The modem 1672, which may be internal or external, may be connected to the system bus 1621 via the user input interface 1660, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 1610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, fig. 16 illustrates remote application programs 1685 as residing on memory device 1681. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

The techniques described herein may be implemented in any of a variety of ways as these techniques are not limited to any particular implementation. The examples of implementation details provided herein are for illustration purposes only. Furthermore, the techniques disclosed herein may be used alone or in any suitable combination, as aspects of the techniques described herein are not limited to use with any particular technique or combination of techniques.

Thus, having described several aspects of the technology described herein, it is to be appreciated that various alterations, modifications, and improvements are possible.

For example, examples are provided in which a dataset group contains a plurality of datasets. The data processing systems described herein may be implemented to support groups with a single data set in some cases and/or to support empty groups without a data set in other cases.

As another example, an example is provided in which a dataset group is included in a result set from which a user may select. The user may select a dataset group, and then the contents of the dataset group may be presented to the user for further selection. Cases are described in which the user then selects a dataset included in the dataset group. In some cases, the dataset group may contain other dataset groups. Selecting a dataset group contained within a group may repeat the process, with the contents of the selected dataset group presented to the user for selection. Such a process may be repeated recursively for any number of levels.
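The recursion over nested dataset groups may be sketched as below. The `expand` function and the dictionary that maps a group name to its members are assumptions for illustration; the guard against revisiting a group is an addition of this sketch and is not taken from the description.

```python
def expand(group, groups, seen=None):
    """Return the datasets reachable from `group`, descending into nested groups."""
    seen = set() if seen is None else seen
    if group in seen:          # guard added in this sketch against cyclic nesting
        return []
    seen.add(group)
    datasets = []
    for member in groups[group]:
        # A member that is itself a group is expanded recursively.
        found = expand(member, groups, seen) if member in groups else [member]
        for name in found:
            if name not in datasets:
                datasets.append(name)
    return datasets


# Hypothetical two-level hierarchy: a cart containing a lower-level group.
groups = {
    "BestCartEver": ["loyalty.dat", "tech_group"],
    "tech_group": ["tpc_customers", "tpc_date_dim", "loyalty.dat"],
}
flattened = expand("BestCartEver", groups)
```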

Further, examples are provided in which the data set selection tool receives user input to specify only a single data set by stepping through one or more screens of the user interface until the user reaches a screen presenting the desired data set. In a variation of the data processing system described herein, a user may navigate through a user interface screen and select a plurality of data sets, wherein a selection tool is used in the operation of specifying the plurality of data sets.

Further, the dataset cart is described as having a scope based on the persona of a user. Other characteristics that can be evaluated at the time of use may be used to define the scope. For example, time may be used to define the scope. For example, defining the scope of dataset groups based on the day of the week may allow a search for a dataset that is updated on particular days of the week to return the dataset for the most recent such day.

As yet another example, the scope is described as limiting the number of dataset groups returned in response to a search query and increasing their relevance. In some embodiments, scope may be attached to individual datasets, such that the datasets returned in response to a search query are limited based on the scope at the time of the search.

As yet another example, a dataset group is described as having a scope. The scope may be implemented by storing and accessing scope information associated with the dataset group. In a data processing system, components other than dataset groups may also be assigned a scope. For example, some tools are limited in scope, thereby restricting their use to users having personas within the scope. In such an embodiment, the scope information of a dataset group may be set and used in the same manner as the scope information of other components.

As yet another variation, search results for datasets may be limited to dataset carts that themselves match the search query or that contain a dataset matching the search criteria. In some embodiments, the search results may include dataset carts containing datasets that match the criteria, as well as datasets that match the search criteria and are not assigned to any dataset cart. While individual datasets may be presented, search results may be limited by presenting the datasets hierarchically, such that datasets that fall within a dataset cart or other grouping are not shown separately.
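One way to picture this variation, offered only as a sketch, is a function that returns the matching dataset carts together with those matching datasets that no returned cart already covers. The function name, the substring match, and the sample data are assumptions.

```python
def grouped_results(query, dataset_names, carts):
    """Return (matching carts, matching datasets not covered by a returned cart)."""
    matches = {name for name in dataset_names if query in name}
    hit_carts = sorted(
        cart for cart, members in carts.items()
        if query in cart or matches & set(members)   # cart matches or contains a match
    )
    covered = {member for cart in hit_carts for member in carts[cart]}
    loose = sorted(matches - covered)                # shown separately, outside any cart
    return hit_carts, loose


dataset_names = ["loyalty.dat", "loyalty_filtered", "orders"]
carts = {"BestCartEver": ["loyalty.dat"], "Finance": ["orders"]}
hit_carts, loose = grouped_results("loyalty", dataset_names, carts)
```

With this sample data, "loyalty.dat" is not listed on its own because it is reachable through "BestCartEver", while "loyalty_filtered" is listed separately because it belongs to no cart.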

Further, examples are provided in which user input specifies a source type, which can distinguish between a context in which the selection should be a single dataset and a context in which the selection should be a dataset group. The context may be determined in other ways, including automatically. If the context is automatically determined, it may be based on computerized analysis of the operations to be performed on the selected one or more data sets.

As a further example of a possible variation of the disclosed embodiments, a user writing an application specifying access to a logical data set is described. In some embodiments, the user may be a human user. In other embodiments, the user may be a program with Artificial Intelligence (AI). For example, the AI may derive a data processing algorithm by processing a data set, which may then be applied to other data sets.

Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the disclosure. Further, while advantages of the techniques described herein are indicated, it should be understood that not every embodiment of the techniques described herein will include every described advantage. Some embodiments may not implement any features described herein as advantageous, and in some cases, one or more of the described features may be implemented to implement further embodiments. Accordingly, the foregoing description and drawings are by way of example only.

The above-described aspects of the technology described herein may be implemented in any of a variety of ways. For example, these aspects may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such a processor may be implemented as an integrated circuit (with one or more processors in the integrated circuit component) including commercially available integrated circuit components known in the art under the name of a CPU chip, GPU chip, microprocessor, microcontroller, or co-processor. In the alternative, the processor may be implemented in custom circuitry (such as an ASIC) or semi-custom circuitry generated by configuring a programmable logic device. As yet another alternative, the processor may be part of a larger circuit or semiconductor device, whether commercially available, semi-custom or custom made. As a specific example, some commercially available microprocessors have multiple cores such that one or a subset of the cores may constitute the processor. However, a processor may be implemented using any suitable form of circuitry.

Further, it should be appreciated that the computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. In addition, a computer may be embedded in a device that is not typically considered a computer, but that has suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone, or any other suitable portable or stationary electronic device.

Moreover, the computer may have one or more input devices and output devices. These devices may be particularly useful for presenting user interfaces. Examples of output devices that may be used to provide a user interface include printers or display screens for visual presentation of output, and speakers or other sound generating devices for audible presentation of output. Examples of input devices that may be used for the user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.

Such computers may be interconnected by one or more networks in any suitable form, including as a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol, and may include wireless networks, wired networks, or fiber optic networks.

Moreover, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. In addition, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.

In this regard, aspects of the technology described herein may be embodied as a computer-readable storage medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CDs), optical discs, digital video discs (DVDs), magnetic tapes, flash memory, circuit arrangements in field programmable gate arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments described above. As will be apparent from the foregoing examples, a computer-readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form. Such computer-readable storage media may be transportable, such that the one or more programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the techniques discussed above. As used herein, the term "computer-readable storage medium" encompasses only a non-transitory computer-readable medium that may be considered a manufacture (i.e., article of manufacture) or a machine. Alternatively or additionally, aspects of the technology described herein may be embodied as a computer-readable medium other than a computer-readable storage medium, such as a propagated signal.

The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions or processor-executable instructions that can be used to program a computer or other processor to implement aspects of the techniques described above. In addition, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the techniques described herein.

Computer-executable instructions may take the form of program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

Furthermore, the data structures may be stored in any suitable form in a computer readable medium. For simplicity of illustration, the data structure may be shown with fields related by location in the data structure. Such relationships may also be implemented by assigning locations in a computer-readable medium that convey relationships between fields for storage for the fields. However, any suitable mechanism may be used to establish relationships between information in fields of a data structure, including through the use of pointers, tags, or other mechanisms that establish relationships between data elements.
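The two ways of relating fields mentioned above can be contrasted in a short sketch. The record layout (a 4-byte identifier followed by a 16-byte name) and the key `"ds-7"` are assumptions chosen purely for illustration.

```python
import struct

# Relationship by location: the identifier and the name are related only
# because they occupy fixed, adjacent positions in one packed record.
RECORD = struct.Struct("<I16s")  # hypothetical layout: 4-byte id, 16-byte name

packed = RECORD.pack(7, b"customers")
ds_id, raw_name = RECORD.unpack(packed)
name_by_location = raw_name.rstrip(b"\x00").decode()

# Relationship by tag: the fields live in separate stores and are linked
# by a shared key rather than by where they are stored.
ids = {"ds-7": 7}
names = {"ds-7": "customers"}
name_by_tag = names["ds-7"]
```

Either representation conveys the same relationship between the fields; the choice affects only how that relationship is recovered when the data structure is read.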

The various aspects of the technology described herein may be used alone, in combination, or in a variety of arrangements not specifically described in the embodiments described in the foregoing, and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Moreover, the techniques described herein may be embodied as methods, examples of which are provided herein, including with reference to fig. 13-15. Acts performed as part of any of these methods may be ordered in any suitable manner. Thus, embodiments may be constructed in which acts are performed in a different order than shown, which may include performing some acts simultaneously, even though shown as sequential acts in the illustrative embodiments.

Further, some actions are described as being performed by an "actor" or "user." It should be appreciated that the "actor" or "user" need not be a single individual, and in some embodiments, actions attributable to the "actor" or "user" may be performed by a team of individuals and/or a combination of individuals and computer-aided tools or other mechanisms.

Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. As used herein, "including," "comprising," "having," "containing," "involving," and variations thereof are intended to encompass the items listed thereafter and equivalents thereof as well as additional items.

Claims

1. A method of enabling efficient operation of a data processing system in an environment having a plurality of data sets by enabling a set of data sets to be selected to perform an operation on each of the plurality of data sets in the set, the method comprising:

receiving a search query via a user interface to search a data set for use in connection with an operation related to data access by the data processing system;

presenting search results based on the search query in the user interface, wherein presenting the results includes presenting one or more sets of data sets, at least some of the sets of data sets each including one or more of the searched data sets;

receiving, via the user interface, a manipulation of a first set of data sets of the one or more sets of data sets presented in the user interface, wherein the user interface is configured to provide an option for selecting the first set of data sets via the user interface as a target of the operation related to data access; and

upon selection of the first data set group of the one or more data set groups presented in the user interface, performing the operation on each of the one or more data sets included in the first data set group.

2. The method of claim 1, wherein the user interface provides an option for expanding the first set of data sets to enable selection of one or more data sets of the first set of data sets via the user interface as a target of the operation related to data access, and

after the one or more data sets of the first set of data sets are selected, the operation is performed on each of the one or more data sets of the first set of data sets.

3. The method of claim 1 or any other preceding claim, wherein each of the one or more data set groups presented in the user interface has a correspondence between a persona associated with a user entering the search query via the user interface and a scope associated with the one or more data set groups.

4. The method of claim 3, wherein the search results exclude datasets that do not have metadata associated with the persona of the user.

5. The method of claim 1 or any other preceding claim, wherein performing the operation on each of one or more data sets comprises performing a data quality rule on each of the one or more data sets.

6. A method for enabling efficient operation of a data processing system in an environment having a plurality of data sets by enabling a first user to form a set of data sets and to present the set of data sets to a second user for selection in configuring operations to access one or more of the data sets, the method comprising:

receiving input from the first user through one or more first user interfaces, the input selecting one or more of the plurality of data sets for association with a group of the plurality of data set groups;

storing a representation of the plurality of data set groups;

presenting a second user interface configured for use by a second user in selecting one or more data sets for use in connection with the operation of accessing the one or more data sets, wherein the second user has a persona and the data sets have a scope based at least in part on the persona of the user, wherein presenting the second user interface comprises:

automatically identifying one or more data set groups based at least in part on a correspondence between the persona associated with the second user of the data processing system and a scope associated with the one or more automatically identified data set groups; and

rendering, in the second user interface, an indication of the one or more automatically identified data set groups.

7. The method of claim 6, wherein storing the representations of the plurality of groups comprises:

for each of the plurality of data set groups, storing information about one or more users authorized to access the group.

8. The method of claim 6 or any other preceding claim, wherein:

the one or more first user interfaces include a dataset search interface including a faceted search interface; and

facets in the faceted search interface are based on values of metadata associated with the plurality of data sets.

9. The method of claim 6 or any other preceding claim, wherein the one or more first user interfaces comprise a user interface that displays a lineage of a dataset.

10. The method of claim 6 or any other preceding claim, wherein the one or more first user interfaces comprise a user interface displaying metadata related to data sets in the plurality of data sets.

11. The method of claim 6 or any other preceding claim, further comprising:

receiving, from the second user, input specifying a group of the one or more automatically identified groups through the second user interface; and

based on the input received from the second user, performing the operation for each of the plurality of data sets within the selected group.

12. The method of claim 6 or any other preceding claim, wherein the operations comprise configuring an application for execution by the data processing system.

13. The method of claim 6 or any other preceding claim, wherein automatically identifying one or more dataset groups comprises selecting one or more dataset groups to which the second user of the data processing system has access, the automatically identifying one or more dataset groups being based at least in part on a correspondence between the persona associated with the second user and a scope associated with the one or more automatically identified dataset groups.

14. The method of claim 6 or any other preceding claim, wherein:

rendering the indication of the one or more automatically identified groups includes rendering a graphical user interface element indicating a set of data sets for each of the one or more automatically identified groups; and

the method further includes receiving, via the second user interface, a selection of a rendered graphical user interface element that indicates a group of data sets and rendering a plurality of data sets in the group on the second user interface based on the selection.

15. A method for enabling efficient operation of a data processing system in an environment having a plurality of data sets by presenting a set of data sets for selection by a user of the data processing system in configuring operations to access one or more data sets, the method comprising:

presenting a user interface configured for use by the user in selecting one or more data sets for use in connection with the operation of accessing the one or more data sets, wherein the user has a persona and the data sets have a scope based at least in part on the persona of the user, wherein presenting the user interface comprises:

automatically identifying one or more data set groups based at least in part on a correspondence between the persona associated with the user of the data processing system and a scope associated with the one or more automatically identified data set groups; and

rendering, in the user interface, an indication of the one or more automatically identified data set groups.

16. The method of claim 15, wherein the method further comprises:

receiving, through the user interface, user input specifying a group of one or more groups; and

based on the received input, rendering an indication of the data sets within the selected group.

17. The method of claim 15 or any other preceding claim, wherein the method further comprises:

based on the received input, performing the operation for each of the plurality of data sets within the selected group.

18. The method of claim 15 or any other preceding claim, wherein:

automatically identifying one or more data set groups further comprises:

receiving a search query for a dataset via the user interface; and

performing a search based on the search query to generate search results.

19. The method of claim 15 or any other preceding claim, wherein the operations comprise configuring an application for execution by the data processing system.

20. The method of claim 15 or any other preceding claim, wherein automatically identifying one or more dataset groups comprises selecting one or more dataset groups to which a user of the data processing system has access, the automatically identifying one or more dataset groups being based at least in part on a correspondence between the persona associated with the user and a scope associated with the one or more automatically identified dataset groups.

21. The method of claim 15 or any other preceding claim, wherein:

the method further includes receiving a selection of a rendered graphical user interface element that indicates a group of data sets and rendering a plurality of data sets in the group on the user interface based on the selection.

22. A method of achieving efficient operation of a data processing system in an environment having a plurality of data sets by forming a set of data sets, the method comprising:

rendering one or more first user interfaces in which a plurality of data sets are identified;

receiving, via the one or more first user interfaces, user input selecting one or more identified data sets for association with a group of the plurality of data set groups; and

storing a representation of the plurality of data set groups.

23. The method of claim 22, wherein storing the representations of the plurality of groups comprises:

24. The method of claim 22 or any other preceding claim, wherein the method further comprises:

rendering a second user interface associated with a user configuration of the data processing system to perform operations related to data access, wherein the second user interface includes a data set selection portion; and

rendering the second user interface includes presenting a representation of one or more of the plurality of data set groups in the data set selection portion.

25. The method of claim 24, wherein the method further comprises:

selecting the one or more of the plurality of data set groups for presentation in the second user interface based on the persona of the user.

26. The method of claim 24, wherein:

the second user interface includes a user interface in a program development environment; and

the operation related to data access includes configuring components in the program being developed to access a dataset or a group of datasets.

27. The method of claim 22 or any other preceding claim, wherein:

the one or more first user interfaces include a dataset search interface.

28. The method of claim 27, wherein:

the dataset search interface includes a faceted search interface; and

29. The method of claim 22 or any other preceding claim, wherein:

the one or more first user interfaces include a user interface that displays a lineage of the dataset.

30. The method of claim 22 or any other preceding claim, wherein:

the one or more first user interfaces include a user interface that displays metadata related to data sets of the plurality of data sets.

31. A method for achieving efficient operation of a data processing system in an environment having a plurality of data sets, the method comprising:

means for rendering one or more first user interfaces in which the data set is identified;

means for receiving user input through the one or more first user interfaces, the user input selecting one or more identified data sets for association with a group of the plurality of data set groups; and

means for storing a representation of the plurality of data set groups.

32. The method of claim 31, wherein the method further comprises:

means for rendering a second user interface associated with a user configuration of the data processing system to perform operations related to data access, wherein the second user interface includes a data set selection portion; and

the means for rendering the second user interface includes means for presenting a representation of one or more of the plurality of data set groups in the data set selection portion.

33. The method of claim 32, wherein the method further comprises:

means for selecting the one or more of the plurality of data set groups for presentation in the second user interface based on the persona of the user.

34. A method for creating a set of data sets in a data processing system operable with a plurality of data sets, the method comprising using at least one hardware processor to:

identifying a set of data sets available for operations to be performed by the data processing system, the operations relating to data access by the data processing system;

presenting the identified set of data sets in a first user interface;

receiving, via the first user interface, a user selection of one or more data sets from the presented identified set of data sets; and

storing a representation of a group comprising the selected one or more data sets.

35. The method of claim 34, wherein identifying the set of data sets available to perform operations related to data access of the data processing system comprises:

receiving, via a user interface, a search query specifying one or more values describing facets of the plurality of data sets defined in the data processing system; and

performing a search based on the search query to generate search results that include the set of data sets available to perform the operation.

36. The method of claim 35, wherein the search query comprises a faceted search query comprising one or more facets for filtering the search results.

37. The method of claim 36, wherein the one or more facets include facets indicating whether the data set is registered in a directory table that associates information for accessing the physical data set with the logical data set.

38. The method of claim 36 or 37, wherein:

the user interface for receiving the search query includes a plurality of fields for receiving user input identifying values of the one or more facets; and

the plurality of fields includes a field for receiving values of logical metadata, physical metadata, and/or operational metadata associated with the plurality of data sets.

39. A method as claimed in claim 34 or any other preceding claim, wherein the operation relating to data access comprises configuring a component of an application executed by the data processing system.

40. The method of claim 34 or any other preceding claim, further comprising:

receiving, via a second user interface, a command for updating the group, the command including a request to add one or more data sets to the group or a request to delete one or more data sets from the group.

41. The method of claim 34 or any other preceding claim, further comprising:

presenting, via the first user interface, metadata about a data set of the identified set of data sets in response to user input requesting metadata related to the data set.

42. The method of claim 34 or any other preceding claim, wherein:

the group is a second group; and

receiving the user selection of one or more data sets includes receiving a selection of a previously defined first group of data sets, such that the second group includes a hierarchical group of data sets.

43. A method as claimed in claim 34 or any other preceding claim, wherein storing the representation of the group comprises storing scope information for the group.

44. The method of claim 43, wherein the scope information includes an identification of one or more users authorized to access the group.

45. The method of claim 43, wherein the scope information includes an identification of one or more roles authorized to access the group.

46. The method of claim 34 or any other preceding claim, further comprising:

rendering a second user interface associated with a user configuration of the data processing system to perform the operation related to data access, wherein the second user interface includes a data set selection portion and rendering the second user interface includes presenting a representation of a group including the selected one or more data sets in the data set selection portion.

47. A data processing system, comprising:

at least one computer hardware processor; and

at least one non-transitory computer-readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform the method of any one of claims 1 to 46.

48. At least one non-transitory computer-readable medium comprising processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform the method of any one of claims 1 to 46.