CN117762404A - Configurable operator processing method and device for data mining - Google Patents

Info

Publication number
CN117762404A
Authority
CN
China
Prior art keywords
operator
configuration
parameter
parameters
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311619248.1A
Other languages
Chinese (zh)
Inventor
林培峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202311619248.1A
Publication of CN117762404A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention provides a configurable operator processing method and a device for data mining, which relate to the technical field of big data and operator development, and the method comprises the following steps: setting operator processing logic, operator parameters, front-end rendering components and component style information; constructing an operator configuration template and an algorithm model file; generating an operator page through an algorithm model file, a template engine and a front end framework according to the operator configuration template; acquiring a first configuration parameter input by a user on the operator page, checking and analyzing the first configuration parameter, and analyzing to obtain a second configuration parameter after the first configuration parameter passes the checking; generating a corresponding Spark task code and configuring task parameters according to the second configuration parameters obtained by analysis and operator processing logic in an operator configuration template; and constructing a Spark task according to the Spark task code and the task parameters, and executing the Spark task in response to the call of a preset remote call interface to obtain an operator calculation result.

Description

Configurable operator processing method and device for data mining
Technical Field
The invention relates to the technical field of big data and operator development, in particular to a configurable operator processing method and device for data mining.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In the big data field, mature solutions for massive-scale data computation generally use Spark as the underlying compute engine. In data mining, business personnel often need to clean, analyze and process massive amounts of data. These personnel are typically data analysts who understand the business meaning of the enterprise's data but lack specialized programming skills, so it is unrealistic to expect them to implement such data cleaning and processing through the Spark SQL or Spark RDD APIs.
A commonly used approach is to encapsulate Spark's basic capabilities into specific operators and to implement different processing operations through parameterized configuration. Data analysts can then quickly explore a data processing scheme by combining operators. When new computing logic needs to be supported, a separate operator is developed to support it, so updating operators is a relatively frequent operation.
In a typical operator development scenario, when a new operator is needed, a product manager or the requesting party first works out the configuration parameters the operator must support and its overall UI. Front-end and back-end engineers then agree on the interface for receiving the operator parameters, after which the front-end engineer develops the operator's page according to the parameters and the UI, and the back end implements parameter parsing and code conversion for the operator. If a requirement change adds or removes parameters or adjusts the UI, the front-end engineer has to redevelop the page and renegotiate the interface protocol with the back-end engineer. In practice, developing an operator is often time-consuming, mainly for two reasons: first, every requirement change needs development effort from both front-end and back-end engineers; second, during debugging, the front-end and back-end interfaces are often not updated in time after a parameter change, so parameters cannot be parsed correctly.
In view of the foregoing, a technical solution is needed to overcome the above-mentioned drawbacks, and to improve the processing efficiency and reduce the development cost for operator demand modification and debugging.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a configurable operator processing method and device for data mining. The operator configuration template can be designed flexibly and is compatible with various common operator types, and the front end can render pages and generate parameter parsing interfaces from the template.
In a first aspect of an embodiment of the present invention, a configurable operator processing method for data mining is provided, the method including:
setting operator processing logic, operator parameters, front-end rendering components and component style information;
constructing an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering component and the component style information;
creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
generating an operator page through an algorithm model file, a template engine and a front end framework according to the operator configuration template;
analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page;
acquiring a first configuration parameter input by a user on the operator page, checking and analyzing the first configuration parameter, checking whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the first configuration parameter passes the check, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format;
generating a corresponding Spark task code and configuring task parameters according to the second configuration parameters obtained by analysis and operator processing logic in an operator configuration template;
and constructing a Spark task according to the Spark task code and the task parameters, and executing the Spark task in response to the call of a preset remote call interface to obtain an operator calculation result.
In a second aspect of the embodiments of the present invention, a configurable operator processing apparatus for data mining is provided, the apparatus comprising:
the setting module is used for setting operator processing logic, operator parameters, front-end rendering components and component style information;
the operator configuration template module is used for constructing an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering components and the component style information;
the algorithm model file construction module is used for creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
the front-end operator page generation module is used for generating an operator page through an algorithm model file, a template engine and a front-end framework according to the operator configuration template;
the configuration description information processing module is used for analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page;
the front-end parameter verification and analysis module is used for acquiring a first configuration parameter input by a user on the operator page, verifying and analyzing the first configuration parameter, verifying whether a parameter format and a parameter value of the first configuration parameter meet set requirements, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the verification is passed, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format;
the back-end parameter analysis and code generation module is used for generating corresponding Spark task codes and configuring task parameters according to the second configuration parameters obtained through analysis and operator processing logic in the operator configuration template;
the task construction and execution module is used for constructing and obtaining a Spark task according to the Spark task code and the task parameter, and executing the Spark task in response to the call of the preset remote call interface to obtain an operator calculation result.
In a third aspect of the embodiments of the present invention, a computer device is presented, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a configurable operator processing method for data mining when executing the computer program.
In a fourth aspect of the embodiments of the present invention, a computer readable storage medium is presented, the computer readable storage medium storing a computer program which, when executed by a processor, implements a configurable operator processing method for data mining.
In a fifth aspect of the embodiments of the present invention, a computer program product is presented, the computer program product comprising a computer program which, when executed by a processor, implements a configurable operator processing method for data mining.
Compared with the prior art, the configurable operator processing method and the device for data mining have at least the following advantages:
Lowering the barrier to data processing for data mining personnel: with the configuration-based operator management method, data mining personnel can quickly develop a Spark-based visual operator by adding a new configuration node to a configuration file, without writing complex code. Non-professional developers can get started easily, which lowers the barrier to processing data.
Improving development efficiency: because the operator configuration template already contains the operator's processing logic, parameters, front-end rendering components, component styles and so on, the front end and the back end can automatically generate the corresponding pages, the parameter verification and parsing methods, the methods for receiving operator parameters, the templates for generating operator code and so on from the template. This saves developers' time and effort and improves development efficiency.
Improving the flexibility and extensibility of the system: with the configuration-based operator management method, adding a new operator only requires adding a new configuration node to the operator configuration file, without modifying the existing code, which greatly improves the flexibility and extensibility of the system.
Optimizing user experience: for data mining personnel, the operator interface and the parameter verification and parsing methods are generated automatically from the operator template, so they can concentrate on the business logic of data processing without concern for the underlying implementation. This helps improve work efficiency and the user experience.
Facilitating system maintenance and updates: with the configuration-based operator management method, modifying or updating an operator only requires modifying the corresponding configuration file, without changing the underlying code. This reduces maintenance cost and improves the stability and maintainability of the system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a configurable operator processing method for data mining according to one embodiment of the present invention.
FIG. 2 is a flow chart of a configurable operator processing method for data mining in accordance with another embodiment of the present invention.
FIG. 3 is a code example diagram of an operator configuration template according to an embodiment of the invention.
FIG. 4 is a code example diagram of an operator page of an embodiment of the present invention.
FIG. 5 is a diagram of code examples for configuration parameter verification and parsing according to an embodiment of the present invention.
FIG. 6 is a code example diagram of receiving operator parameters and generating operator code according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of a configurable operator processing device architecture for data mining in accordance with an embodiment of the present invention.
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, a configurable operator processing method and device for data mining are provided, and the configurable operator processing method and device relate to the technical field of big data and operator development. The front end can automatically generate an operator interface and automatically generate a parameter verification and analysis method according to the operator template. The back end can automatically generate a parameter interface and a code conversion template according to the template.
In the embodiments of the present invention, terms to be described are as follows:
Operator: defines the content of a data processing step and is mainly used in scenarios such as data cleaning and feature derivation, for example a join operator or a column-split operator.
Spark: a distributed computing engine based on in-memory computing, commonly used for big data computation.
JSON: a lightweight data exchange format that uses a text format completely independent of any programming language.
JSON Schema (also written JSON-Schema): a specification for describing the format of JSON data and defining constraints on it. With an agreed schema, both parties exchanging data understand the requirements and constraints on the JSON data and can validate the data against them, which ensures the correctness of the exchange.
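As a rough illustration of how an agreed schema lets both sides validate exchanged data, the sketch below hand-rolls a check for a small subset of JSON Schema keywords (type, required, properties, enum). The schema content and field names are hypothetical and do not come from the application; a real system would more likely rely on a complete JSON Schema validator.

```python
import json

# Hypothetical schema for the parameters of a "filter" operator (illustrative only).
FILTER_SCHEMA = {
    "type": "object",
    "required": ["column", "op", "value"],
    "properties": {
        "column": {"type": "string"},
        "op": {"type": "string", "enum": ["=", "<", ">"]},
        "value": {"type": "number"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"object": dict, "string": str, "number": (int, float), "boolean": bool}


def validate(data, schema):
    """Return a list of error messages; an empty list means the data is valid."""
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    if not isinstance(data, _TYPES[kind]) or (kind == "number" and isinstance(data, bool)):
        return [f"expected {kind}"]
    errors = []
    if "enum" in schema and data not in schema["enum"]:
        errors.append(f"{data!r} not in {schema['enum']}")
    if kind == "object":
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"missing required field '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(f"{key}: {e}" for e in validate(data[key], sub_schema))
    return errors


payload = json.loads('{"column": "age", "op": ">", "value": 18}')
print(validate(payload, FILTER_SCHEMA))
```

Returning a list of messages rather than raising on the first problem lets a page show every invalid field at once.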
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments thereof.
FIG. 1 is a flow diagram of a configurable operator processing method for data mining according to one embodiment of the present invention. As shown in fig. 1, the method includes:
S101, setting operator processing logic, operator parameters, front-end rendering components and component style information;
S102, constructing an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering components and the component style information;
S103, creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
S104, generating an operator page through an algorithm model file, a template engine and a front end frame according to the operator configuration template;
S105, analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page;
S106, acquiring a first configuration parameter input by a user on the operator page, checking and analyzing the first configuration parameter, checking whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the checking is passed, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format;
S107, generating a corresponding Spark task code and configuring task parameters according to the second configuration parameters obtained by analysis and operator processing logic in an operator configuration template;
S108, constructing a Spark task according to the Spark task code and the task parameters, and executing the Spark task in response to the call of a preset remote call interface to obtain an operator calculation result.
In an implementation scenario of the invention, each operator is described with JSON Schema, including the operator's parameters, the front-end component type corresponding to each parameter, the optional parameter list, the code template, and so on. Each time the service starts, the front end requests the operator list through an interface and the back end returns the operator list in JSON Schema format. After receiving the list, the front end renders each operator component and generates the parameter parsing function. The back end obtains the code translation template from the operator list, parses the parameters passed by the front end through the parameter interface, and instantiates the code template with those parameters to form the final executable Spark SQL. This keeps parameter updates on the front end and back end in sync, ensures that parameters are parsed correctly, reduces development cost and improves development efficiency.
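The back-end step of instantiating a code template with front-end parameters can be sketched as follows. This is a minimal sketch under stated assumptions: the operator list layout and the key names (params, component, options, code_template) are hypothetical, and Python's string.Template stands in for whatever template mechanism an actual implementation uses.

```python
from string import Template

# Hypothetical operator list as a back end might return it; all key names
# and the SQL text are illustrative assumptions, not taken from the application.
OPERATOR_LIST = {
    "filter": {
        "params": {
            "table": {"component": "input"},
            "column": {"component": "input"},
            "op": {"component": "select", "options": ["=", "<", ">"]},
            "value": {"component": "input"},
        },
        "code_template": "SELECT * FROM $table WHERE $column $op $value",
    }
}


def render_sql(operator_name, front_end_params):
    """Instantiate an operator's code template with parameters from the front end."""
    operator = OPERATOR_LIST[operator_name]
    # Reject parameters the operator does not declare and values outside its options.
    for name, value in front_end_params.items():
        spec = operator["params"].get(name)
        if spec is None:
            raise ValueError(f"unknown parameter: {name}")
        if "options" in spec and value not in spec["options"]:
            raise ValueError(f"invalid value for {name}: {value}")
    # Template.substitute raises KeyError when a placeholder has no value.
    return Template(operator["code_template"]).substitute(front_end_params)


sql = render_sql("filter", {"table": "users", "column": "age", "op": ">", "value": "18"})
print(sql)
```

Plain string substitution is used here only to keep the sketch short; values that come from users would need escaping or stricter validation before being placed into SQL text.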
Furthermore, when a new operator is needed, only its operator configuration needs to be added to the back-end operator list. For the specific process, refer to fig. 2; the method further comprises:
S201, when a new operator is added, adding a new configuration node to the operator configuration file;
S202, generating the corresponding operator page, task code and task parameters according to the new configuration node, and developing and deploying the new operator.
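The idea that a new operator needs only a new configuration node can be sketched as below. The configuration layout, the operator names and the SQL templates are assumptions made for illustration; the configuration text is held in a string here, whereas a real system would read it from the operator configuration file.

```python
import json

# Hypothetical operator configuration content (illustrative key names).
CONFIG_TEXT = """
{
  "operators": [
    {"name": "filter", "sql": "SELECT * FROM {table} WHERE {condition}"}
  ]
}
"""


def load_operators(config_text):
    """Build a name -> operator node mapping from the configuration text."""
    config = json.loads(config_text)
    return {node["name"]: node for node in config["operators"]}


def add_operator_node(config_text, node):
    """Return new configuration text with one more operator node appended."""
    config = json.loads(config_text)
    config["operators"].append(node)
    return json.dumps(config, indent=2)


# Adding a "sort" operator requires only a new node; the functions above are unchanged.
new_text = add_operator_node(
    CONFIG_TEXT, {"name": "sort", "sql": "SELECT * FROM {table} ORDER BY {column}"}
)
operators = load_operators(new_text)
print(sorted(operators))
print(operators["sort"]["sql"].format(table="users", column="age"))
```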
In order to more clearly explain the above-mentioned configurable operator processing method for data mining, a detailed description will be given below in connection with each step.
In one embodiment, S101, operator processing logic, operator parameters, front-end rendering components, and component style information are set.
The specific setting mode is as follows:
1. setting the functions and logic of the operators, wherein the functions of the operators at least comprise data filtering, aggregation and sorting;
2. configuring different parameter items according to the type of the parameter to be input, wherein the parameter items at least comprise filter conditions, aggregation fields and sort order;
3. designating a corresponding front-end rendering component for each parameter, wherein the front-end rendering components at least comprise an input box, a drop-down list and a single selection box (radio button);
4. when the component style is set, defining the appearance and the layout of the component respectively, which at least comprise color, font and spacing.
This step prepares for the design of the operator configuration template: because the template contains the operator's processing logic, parameters, front-end rendering components and component styles, this information must be defined and set in advance.
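One possible way to hold the four kinds of information set in S101 is a small set of data classes, sketched below. The class names, field names and default values are hypothetical; the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentStyle:
    # Appearance and layout attributes named in the text: color, font, spacing.
    color: str = "#333333"
    font: str = "sans-serif"
    spacing_px: int = 8


@dataclass
class OperatorParam:
    name: str        # e.g. a filter condition, aggregation field or sort order
    component: str   # "input", "select" (drop-down list) or "radio"
    options: list = field(default_factory=list)
    style: ComponentStyle = field(default_factory=ComponentStyle)


@dataclass
class OperatorDefinition:
    name: str
    logic: str       # "filter", "aggregate" or "sort"
    params: list = field(default_factory=list)


# A hypothetical sort operator with two parameters.
sort_operator = OperatorDefinition(
    name="sort_rows",
    logic="sort",
    params=[
        OperatorParam(name="sort_field", component="input"),
        OperatorParam(name="sort_order", component="radio", options=["asc", "desc"]),
    ],
)
print([p.name for p in sort_operator.params])
```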
In one embodiment, S102, an operator configuration template is constructed according to the operator processing logic, operator parameters, front-end rendering components, and component style information.
Designing an operator configuration template according to the processing logic and parameters of the operator; when designing the operator configuration template, defining by adopting a component provided by a template engine or a front end frame, and adding corresponding style classes or style attributes for the operator configuration template according to component style information;
binding the operator parameters with components in the operator configuration template according to the operator parameters;
and according to the front-end rendering component, rendering the operator configuration template into a front-end interface by utilizing a front-end framework or a self-defined rendering function.
Referring to FIG. 3, a code example diagram of an operator configuration template is provided in accordance with an embodiment of the present invention. An operator configuration template is created that contains primarily the operator's logic, parameters, front-end rendering components, component styles, etc. This template will be used to generate the front-end interface and back-end code.
Specifically, binding the operator parameters with the components in the operator configuration template according to the operator parameters, including:
setting the name attribute and id attribute of each form element in the operator configuration template so that they correspond to the operator parameter name;
generating an operator page according to the configuration information, name attribute, id attribute and name of operator parameter of operators with logical relations in the operator configuration template;
when a user modifies parameters in the operator configuration template, parameter values are obtained through form submission or JavaScript event monitoring.
Specifically, the operator configuration template is rendered into a front-end interface by using a front-end framework or a custom rendering function, which comprises the following steps:
optimizing the front-end interface with CSS styles.
In the actual application scene, the operator parameters can be updated or the interface is re-rendered through a monitoring mechanism, and the specific method comprises the following steps:
monitoring the change of the operator configuration template through JavaScript events: wherein, the change of parameters in the operator configuration template is monitored in real time; when a user modifies the parameters, triggering a corresponding event processing function, updating the values of operator parameters or re-rendering the interface;
and according to the parameter values in the operator configuration template, applying processing logic of the operator, transferring the parameters in the operator configuration template to the operator by calling the functions of the operator, and executing corresponding functions.
Through the process, the processing logic, parameters, front-end rendering components and component styles of operators can be combined, and the design and application of the operator configuration template are realized. The user may customize the parameters of the operator by configuring the templates and trigger the corresponding processing logic. Meanwhile, good user experience is provided through rendering and style setting of the front-end interface.
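The monitoring mechanism described above relies on JavaScript events in the browser. The same pattern, a change notification that triggers a handler which then calls the operator's function with the current parameter values, can be sketched in Python as follows. The class ParamStore, the function filter_rows and the sample rows are all hypothetical names introduced for illustration.

```python
class ParamStore:
    """Holds operator parameter values and notifies listeners when one changes."""

    def __init__(self, initial):
        self._values = dict(initial)
        self._listeners = []

    def on_change(self, callback):
        """Register a callback taking (name, value, all_values)."""
        self._listeners.append(callback)

    def set(self, name, value):
        # Skip notification when the value did not actually change.
        if self._values.get(name) == value:
            return
        self._values[name] = value
        for callback in self._listeners:
            callback(name, value, dict(self._values))


def filter_rows(rows, min_age):
    """Stand-in for an operator's processing logic."""
    return [row for row in rows if row["age"] >= min_age]


ROWS = [{"name": "a", "age": 17}, {"name": "b", "age": 30}]
results = []
store = ParamStore({"min_age": 0})
# The handler passes the current parameter values to the operator function.
store.on_change(lambda name, value, values: results.append(filter_rows(ROWS, values["min_age"])))
store.set("min_age", 18)
print(results[-1])
```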
In one embodiment, S103, an algorithm model file is created according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating the operator types and the logical relations of input and output interfaces among the operators.
The algorithm flow is drawn through the front-end interface, which handles the calling relation, the modification mode and the interface configuration of the operators. In an actual application scenario, the user connects operators with the connection tool of the front-end interface, establishing the data input/output relations among them; each connection is directional. Operator parameters are set by selecting an operator and using the functional area on the right side of the front panel. The front panel consists of a functional area, an operator library area and a drawing area. Through these operations operators are invoked, their parameters modified and their interfaces configured, so that a model is built and an algorithm model file is generated; the algorithm model file records the operator parameter configuration and the calling relations of the input/output interfaces.
The algorithm model file comprises at least: the storage path configuration for generated code, the operator's own attributes, the operator input interface configuration and the operator output interface configuration. The operator's own attributes include, but are not limited to, the operator's ID, name and category, and its parameter configuration information. The operator input interface configuration includes, but is not limited to, the operator input interface name, the ID of the upstream operator connected to this operator in the algorithm model file (the parent node ID), and the name of that upstream operator's output interface. The operator output interface configuration includes, but is not limited to, the operator output interface name, the ID of the downstream operator connected to this operator in the algorithm model file (the child node ID), and the name of that downstream operator's input interface.
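To make the listed contents concrete, the sketch below shows one hypothetical shape for an algorithm model file with two connected operators, plus a check that every input link is mirrored by an output link on the parent. All key names, identifiers and the output path are assumptions for illustration.

```python
import json

# Hypothetical model file with two connected operators; key names are assumptions.
MODEL_FILE = {
    "code_output_path": "/tmp/generated",
    "operators": [
        {
            "id": "op1", "name": "read_table", "category": "data_processing",
            "params": {"table": "users"},
            "inputs": [],
            "outputs": [{"name": "out", "child_id": "op2", "child_input": "in"}],
        },
        {
            "id": "op2", "name": "filter", "category": "comparison",
            "params": {"condition": "age > 18"},
            "inputs": [{"name": "in", "parent_id": "op1", "parent_output": "out"}],
            "outputs": [],
        },
    ],
}


def check_links(model):
    """Return (operator id, input name) pairs that lack a matching parent output link."""
    by_id = {op["id"]: op for op in model["operators"]}
    broken = []
    for op in model["operators"]:
        for link in op["inputs"]:
            parent = by_id.get(link["parent_id"])
            matched = parent is not None and any(
                out["name"] == link["parent_output"]
                and out["child_id"] == op["id"]
                and out["child_input"] == link["name"]
                for out in parent["outputs"]
            )
            if not matched:
                broken.append((op["id"], link["name"]))
    return broken


print(json.dumps(check_links(MODEL_FILE)))
```

Because each connection is stored on both ends, a consistency check of this kind can catch a model file that was edited on only one side.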
The operator is a basic element for constructing an algorithm model file, is a code set with initialization, parameter setting, checking and operation methods, and the generated target code realizes the specific function of the operator by calling the method. The operators have specified development templates, and corresponding operators can be designed according to requirements. The operators are classified into mathematical operation nodes, comparison operation nodes, logic operation nodes, statistical operation nodes, modulation sampling operation nodes, data processing nodes and control flow nodes according to functions;
The algorithm model file is read and compiled against the specified algorithm operation template file to generate code segments with the specified functions. Specifically, the initialization code and the parameter-setting code of each operator can be generated from the configuration information in the model file. When generating the operation code, the generation algorithm reads the configuration of the operator input/output interfaces in the algorithm model file, performs a level-order traversal using breadth-first traversal, determines the order of code generation from the order of the operators' operation logic in the algorithm model file, and generates the corresponding operation code. Code for outputting operator error information is also generated: if an error occurs while an operator runs, the error information is placed in the operator's operation result, and the error output code traverses the operator operation results and outputs the corresponding operator error information. The generated code is written into the specified algorithm operation template to produce the target code under the path specified in the algorithm model file, and the generated file can be run directly through a compiler to complete the corresponding algorithm operation. Users can also develop their own operators from the provided operator development template according to specific requirements, to implement the corresponding algorithms.
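The ordering step can be sketched as a breadth-first traversal over the operator connections. The version below adds in-degree counting (Kahn's algorithm), which is one way to guarantee that an operator with several parents is emitted only after all of them; the text itself specifies only breadth-first traversal, so this refinement, the function name and the sample operators are assumptions.

```python
from collections import deque


def generation_order(edges, operator_ids):
    """Order operators so that each one comes after all of its parents.

    edges is a list of (parent_id, child_id) pairs taken from the model file.
    """
    children = {op: [] for op in operator_ids}
    in_degree = {op: 0 for op in operator_ids}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1

    # Start from operators with no inputs and proceed level by level.
    queue = deque(op for op in operator_ids if in_degree[op] == 0)
    order = []
    while queue:
        op = queue.popleft()
        order.append(op)
        for child in children[op]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Any operator left unvisited sits on a cycle and can never be scheduled.
    if len(order) != len(operator_ids):
        raise ValueError("model contains a cycle")
    return order


# Hypothetical model: two readers feed a join, which feeds a filter.
ops = ["read_a", "read_b", "join", "filter"]
links = [("read_a", "join"), ("read_b", "join"), ("join", "filter")]
print(generation_order(links, ops))
```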
In one embodiment, S104, an operator page is generated through an algorithm model file, a template engine and a front end framework according to the operator configuration template.
An operator configuration template is acquired through an API, and a corresponding front-end operator page is generated according to the operator configuration template, wherein the front-end operator page comprises at least one of, or a combination of, the following interactive elements: an input box, a drop-down list and a single selection box.
Referring to FIG. 4, a code example diagram of an operator page according to an embodiment of the invention is shown. The front end obtains all available operator templates from the back end through the interface. The templates contain information about the operators, and the front end generates an operator page according to this information. In this way, data mining personnel can create and configure operators by dragging and similar actions on the front-end page.
When page generation is performed according to an operator configuration template, it can be implemented using a template engine and a front-end framework.
Specifically, an operator page is generated using the Vue.js front-end framework and the Handlebars template engine. The specific steps are as follows:
defining the parameter structure of an operator, and defining the template of the operator page using the Handlebars template engine; in this embodiment, the Handlebars template syntax is used to define form elements, while the two-way data binding (v-model) of Vue.js is used to bind parameters to the form elements.
The page rendering and interaction logic is implemented in JavaScript using the Vue.js front-end framework. A Vue instance is created with Vue.js, the parameters are bound to the page, and the form submission logic is implemented.
In the above manner, the operator page may be generated using the template engine and front end framework. In practical application, a proper template engine and a front end frame can be selected according to specific requirements, and corresponding page templates and interaction logic are designed according to the parameter structure of an operator.
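The idea of deriving form elements from the parameter structure of an operator can be sketched as follows. This is not the Vue.js and Handlebars implementation of the embodiment: it is a simplified Python illustration that renders plain HTML strings, and the template keys (operator, params, name, label, component, options) are assumed for the example only.

```python
from html import escape


def render_field(param):
    """Render one operator parameter as a labelled form control."""
    name = escape(param["name"])
    label = escape(param.get("label", param["name"]))
    component = param["component"]
    if component == "input":
        control = f'<input id="{name}" name="{name}" type="text">'
    elif component == "select":
        options = ""
        for option in param["options"]:
            value = escape(str(option))
            options += f'<option value="{value}">{value}</option>'
        control = f'<select id="{name}" name="{name}">{options}</select>'
    elif component == "radio":
        control = ""
        for option in param["options"]:
            value = escape(str(option))
            control += f'<input type="radio" name="{name}" value="{value}">{value}'
    else:
        raise ValueError(f"unknown component: {component}")
    return f'<label for="{name}">{label}</label>{control}'


def render_operator_page(template):
    """Render the whole operator form from its configuration template."""
    operator = escape(template["operator"])
    fields = "\n".join(render_field(p) for p in template["params"])
    return f'<form data-operator="{operator}">\n{fields}\n</form>'


SORT_TEMPLATE = {
    "operator": "sort",
    "params": [
        {"name": "field", "label": "Sort field", "component": "input"},
        {"name": "order", "label": "Order", "component": "select",
         "options": ["asc", "desc"]},
        {"name": "cached", "label": "Cache result", "component": "radio",
         "options": ["yes", "no"]},
    ],
}
page = render_operator_page(SORT_TEMPLATE)
print(page)
```

Because the name and id attributes are taken from the parameter names, the values submitted by the form can be mapped back to operator parameters without operator-specific page code.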
In an embodiment, S105, the preconfigured operator JSON file is parsed, and according to the corresponding relation between the operators and the configuration description information, the configuration description information corresponding to each operator is obtained, and the configuration description information corresponding to the operator is displayed on the operator page.
A pipeline is built by configuring the upstream and downstream connection information among operators, and target code that can be run directly is generated by calling an API interface; when the target code is executed, the input parameter configuration information of each operator is received, wherein the parameter configuration information and the upstream and downstream connection information are determined based on the configuration description information and the processing logic of the target service.
Specifically, configuration description information of the target operator is displayed to a user through a UI interface so as to prompt the user to configure corresponding parameters, upstream-downstream connection relations and the like according to the configuration description information and service requirements of the target service, and flexible configuration and pipeline of the target operator are achieved.
After the parameter configuration information of each operator and the upstream and downstream connection information between the operators are obtained based on the operator management component, the API interface can be called by the automatic programming component to generate the target code.
It should be noted that, code conversion templates corresponding to different operators are preconfigured. When generating the target code, the code conversion is carried out according to the code conversion template corresponding to the operators and the parameter configuration information of the operators, so as to obtain the code segments corresponding to the operators, and the splicing is carried out according to the upstream and downstream connection information between the operators so as to generate the target code corresponding to the target service.
The function realized by this code conversion template is to read a Hive table. It first judges whether the Hive link is a valid Hive link; if not, the database information is concatenated to generate a valid Hive link. Next, on the UI side the user selects which columns (partition_columns) of the data table are to be read; the partitions are selected based on the column names chosen by the user, and the table name (table_name) of the Hive table to be read is loaded. The data can then be read according to partition_cols. limit_num indicates the quantity of output to be obtained when reading; cached indicates whether the read data is cached; op_id represents the ID (operator number) of the operator. The operator number uniquely identifies the operator, which prevents the data of operators of the same type from overwriting each other. For example, an operator may be used twice in a pipeline, once in training and once in use; op_id guarantees that the variables generated by the two uses are unique, so that they do not collide and overwrite each other.
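A code conversion template of this kind can be sketched in Python with the standard string.Template class. The sketch only generates the text of a Spark read statement; it does not cover the Hive link check. The template text, the helper name render_hive_read, the df_ variable prefix, and the reading of partition_cols as a list of selected column names are assumptions made for illustration. The generated line relies on the PySpark calls spark.table, select, limit and cache.

```python
import re
from string import Template

# Hypothetical code conversion template for a "read Hive table" operator.
HIVE_READ_TEMPLATE = Template(
    "df_${op_id} = spark.table(${table_name}).select(${columns})${limit}${cache}"
)


def render_hive_read(op_id, table_name, partition_cols,
                     limit_num=None, cached=False):
    """Fill the template with the parameters configured for one operator."""
    if not re.fullmatch(r"\w+", str(op_id)):
        raise ValueError("op_id must contain only letters, digits or '_'")
    return HIVE_READ_TEMPLATE.substitute(
        op_id=op_id,
        # repr() turns the user values into quoted Python string literals.
        table_name=repr(table_name),
        columns=", ".join(repr(col) for col in partition_cols),
        limit=f".limit({int(limit_num)})" if limit_num is not None else "",
        cache=".cache()" if cached else "",
    )


train_read = render_hive_read(3, "db.sales", ["dt", "region"],
                              limit_num=100, cached=True)
serve_read = render_hive_read(7, "db.sales", ["dt", "region"])
print(train_read)
print(serve_read)
```

The two calls use the same operator type on the same table, but because op_id is part of the variable name, the generated segments assign to df_3 and df_7 and cannot overwrite each other when spliced into one target code file.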
In an embodiment, S106, a first configuration parameter input by a user on the operator page is obtained, the first configuration parameter is checked and analyzed, whether a parameter format and a parameter value of the first configuration parameter meet a set requirement is checked, after the check is passed, a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter are analyzed, and a second configuration parameter is obtained through the command line parameter, the configuration file and the JSON format.
Acquiring a first configuration parameter input by a user on an operator page, and checking whether the first configuration parameter meets a preset parameter requirement; the preset parameter requirements at least comprise parameter format requirements and parameter value requirements.
After the first configuration parameters are checked and the preset parameter requirements are met, the first configuration parameters are analyzed, and the second configuration parameters meeting the back-end processing format conditions are obtained through the command line parameters, the configuration files and the JSON format.
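The check of parameter format and parameter value can be sketched as below. The rule keys (pattern, choices, min, max) and the function names are hypothetical; the embodiment does not prescribe how the preset parameter requirements are stored, and JSON is used here as one possible back-end format for the second configuration parameters.

```python
import json
import re


def check_first_params(values, requirements):
    """Return a list of error messages; an empty list means the check passed."""
    errors = []
    for name, rule in requirements.items():
        raw = str(values.get(name, "")).strip()
        if not raw:
            errors.append(f"{name}: value is required")
            continue
        # Format requirement: the raw text must match a regular expression.
        if "pattern" in rule and not re.fullmatch(rule["pattern"], raw):
            errors.append(f"{name}: format does not match {rule['pattern']}")
            continue
        # Value requirements: membership in a set and/or a numeric range.
        if "choices" in rule and raw not in rule["choices"]:
            errors.append(f"{name}: must be one of {sorted(rule['choices'])}")
        if "min" in rule or "max" in rule:
            try:
                number = float(raw)
            except ValueError:
                errors.append(f"{name}: not a number")
                continue
            low = rule.get("min", number)
            high = rule.get("max", number)
            if not low <= number <= high:
                errors.append(f"{name}: outside [{low}, {high}]")
    return errors


def to_second_params(values, requirements):
    """Validate, then serialize the parameters for the back end."""
    errors = check_first_params(values, requirements)
    if errors:
        raise ValueError("; ".join(errors))
    cleaned = {name: str(values[name]).strip() for name in requirements}
    return json.dumps(cleaned, sort_keys=True)


REQUIREMENTS = {
    "limit_num": {"pattern": r"\d+", "min": 1, "max": 1000},
    "order": {"choices": {"asc", "desc"}},
}
second = to_second_params({"limit_num": " 50 ", "order": "desc"}, REQUIREMENTS)
print(second)  # {"limit_num": "50", "order": "desc"}
```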
When parameter analysis is carried out, the specific process is as follows: determining the format of operator parameters, wherein the format comprises command line parameters, configuration files and JSON format;
and reading and analyzing parameters by utilizing corresponding analysis codes according to the format of the operator parameters, wherein parameter analysis is performed by using a command line parameter analysis library, a configuration file analysis library or a JSON analysis library tool.
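The two steps above, determining the format and then parsing with the matching library, can be sketched with Python's standard argparse, configparser and json modules. The format labels ("cli", "config", "json"), the section name operator and the parameter names field and order are assumptions for the example.

```python
import argparse
import configparser
import json
import shlex


def parse_operator_params(text, fmt):
    """Parse operator parameters given as text in one of three formats."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "config":
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # Parameters are assumed to live in an [operator] section.
        return dict(parser["operator"])
    if fmt == "cli":
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--field", required=True)
        parser.add_argument("--order", default="asc")
        return vars(parser.parse_args(shlex.split(text)))
    raise ValueError(f"unsupported parameter format: {fmt}")


from_json = parse_operator_params('{"field": "amount", "order": "desc"}', "json")
from_config = parse_operator_params(
    "[operator]\nfield = amount\norder = desc\n", "config")
from_cli = parse_operator_params("--field amount --order desc", "cli")
print(from_json == from_config == from_cli)  # True
```

Whatever the input format, the result is the same dictionary, so later code generation does not need to know how the parameters were supplied.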
Referring to FIG. 5, a code example diagram of configuration parameter verification and parsing according to an embodiment of the present invention is shown. Parameter verification and parsing methods are generated at the front end according to the operator configuration template. These methods are used to check whether the parameters entered by the user meet the expected requirements and to parse the parameters entered by the user into a format that can be processed by the back end.
In one embodiment, S107, generating a corresponding Spark task code and configuring task parameters according to the second configuration parameters obtained by parsing and operator processing logic in the operator configuration template, where the steps include:
generating a corresponding Spark task code according to the analyzed parameters, wherein the task is constructed by utilizing an API of Spark, and the task comprises defining a data source, a conversion operation, an aggregation operation and an output operation;
and setting configuration parameters of the Spark task by using a configuration API of the Spark, wherein the configuration parameters at least comprise parallelism, memory allocation and execution mode.
Referring to FIG. 6, a code example diagram of receiving operator parameters and generating operator code according to one embodiment of the invention is shown. At the back end, a parsing method for receiving operator parameters and a template for generating operator code are generated according to the operator configuration template. These methods and templates are used to process the parameters sent by the front end and convert them into executable Spark tasks.
In one embodiment, S108, a Spark task is constructed according to the Spark task code and the task parameters, and the Spark task is executed in response to the call of the preset remote call interface to obtain the operator calculation result.
The Spark task is submitted to a Spark cluster for execution using a Spark submit script or command-line tool.
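As an illustration of how the task parameters might reach the cluster, the following Python sketch assembles a spark-submit command line. The dictionary keys and the file name job_op3.py are hypothetical, and mapping the "execution mode" to the --deploy-mode option is an assumption; the options --master, --deploy-mode, --executor-memory and --conf, and the property spark.default.parallelism, are standard Spark settings.

```python
import shlex


def build_spark_submit(app_path, task_params, app_args=()):
    """Build the argument list for submitting a generated Spark task."""
    return [
        "spark-submit",
        "--master", task_params["master"],
        # Execution mode, assumed here to mean Spark's deploy mode.
        "--deploy-mode", task_params["deploy_mode"],
        # Memory allocation per executor.
        "--executor-memory", task_params["executor_memory"],
        # Parallelism, passed as a Spark configuration property.
        "--conf", f"spark.default.parallelism={task_params['parallelism']}",
        app_path,
        *app_args,
    ]


cmd = build_spark_submit(
    "job_op3.py",
    {"master": "yarn", "deploy_mode": "cluster",
     "executor_memory": "4g", "parallelism": 200},
)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) on a machine where spark-submit is installed; the sketch itself only builds and prints the command.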
Further, (S201, S202) when a new operator needs to be added, only a new configuration node needs to be added in the operator configuration file. Therefore, the front end and the back end can generate corresponding interfaces and codes according to the new configuration nodes, so that quick development and deployment of new operators are realized.
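The extension mechanism can be sketched as follows, assuming the operator configuration file is a JSON document whose nodes carry both the front-end parameter description and a back-end code template. The node layout, the key names and the helper functions are hypothetical; the point of the sketch is that the generic functions are not changed when a node is added.

```python
import json
from string import Template

# Hypothetical operator configuration file with a single operator node.
CONFIG_TEXT = """
{"operators": [
  {"type": "filter",
   "params": [{"name": "condition", "component": "input"}],
   "code_template": "df_${op_id} = df_${upstream}.filter(${condition})"}
]}
"""

SORT_NODE = {
    "type": "sort",
    "params": [{"name": "field", "component": "select"}],
    "code_template": "df_${op_id} = df_${upstream}.orderBy(${field})",
}


def load_operators(config_text):
    """Index the configuration nodes by operator type."""
    return {node["type"]: node for node in json.loads(config_text)["operators"]}


def add_operator_node(config_text, node):
    """Return new configuration text with one more operator node."""
    config = json.loads(config_text)
    config["operators"].append(node)
    return json.dumps(config)


def generate(node, op_id, upstream, values):
    """Derive the form fields and the code segment from one node."""
    fields = [(p["name"], p["component"]) for p in node["params"]]
    literals = {p["name"]: repr(values[p["name"]]) for p in node["params"]}
    code = Template(node["code_template"]).substitute(
        op_id=op_id, upstream=upstream, **literals)
    return fields, code


extended = add_operator_node(CONFIG_TEXT, SORT_NODE)
fields, code = generate(load_operators(extended)["sort"],
                        op_id=3, upstream=2, values={"field": "amount"})
print(fields)  # [('field', 'select')]
print(code)    # df_3 = df_2.orderBy('amount')
```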
The core of the invention is the design of the operator template, which needs to be compatible with various common operator types and to support the front end in rendering pages and producing parameter parsing interfaces according to the template. The main purpose of this part is to further reduce the workload of adding operators: by automatically generating front-end and back-end code, the cost of building a new operator is reduced.
It should be noted that although the operations of the method of the present invention are described in a particular order in the above embodiments and the accompanying drawings, this does not require or imply that the operations must be performed in the particular order or that all of the illustrated operations be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Having described the method of an exemplary embodiment of the present invention, a configurable operator processing apparatus for data mining of an exemplary embodiment of the present invention is described next with reference to FIG. 7.
The implementation of the configurable operator processing apparatus for data mining may refer to the implementation of the above method, and repeated details are not described again. The term "module" or "unit" as used below may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Based on the same inventive concept, the invention also provides a configurable operator processing device for data mining, as shown in fig. 7, the device comprises:
a setting module 710, configured to set operator processing logic, operator parameters, front-end rendering components, and component style information;
an operator configuration template module 720, configured to construct an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering component and the component style information;
the algorithm model file construction module 730 is configured to create an algorithm model file according to the calling relationship, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
The front-end operator page generation module 740 is configured to generate an operator page through an algorithm model file, a template engine and a front-end framework according to the operator configuration template;
the configuration description information processing module 750 is configured to parse a preconfigured operator JSON file, obtain configuration description information corresponding to each operator according to a corresponding relationship between the operator and the configuration description information, and display the configuration description information corresponding to the operator on the operator page;
the front-end parameter checking and analyzing module 760 is configured to obtain a first configuration parameter input by a user on the operator page, check and analyze the first configuration parameter, check whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, analyze a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the check is passed, and construct a second configuration parameter according to the command line parameter, the configuration file and the JSON format;
the back-end parameter analysis and code generation module 770 is configured to generate a corresponding Spark task code and configure task parameters according to the second configuration parameters obtained by analysis and operator processing logic in the operator configuration template;
The task construction and execution module 780 is configured to construct a Spark task according to a Spark task code and task parameters, and execute the Spark task in response to a call of a preset remote call interface to obtain an operator calculation result.
Further, the device further comprises: operator configuration extension module 790;
wherein the operator configuration extension module 790 is configured to:
when a new operator is added, adding a new configuration node in the operator configuration file;
and generating corresponding operator pages, task codes and task parameters according to the new configuration nodes, and developing and deploying new operators.
Specifically, the operator configuration template module (Operator Configuration Template Module):
the main role of this module is to define and store operator configuration templates. The operator configuration templates contain processing logic, parameters, front-end rendered components, component styles, etc. of the operators. The configuration template provides basic data for the subsequent front-end and back-end modules and is used for dynamically generating operator interfaces and codes.
Algorithm model file construction module (Algorithm model file construction module):
creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating the operator types and the logical relations of input and output interfaces among the operators.
Front end operator page generation module (Front-end Operator Page Generation Module):
the module obtains all operator configuration templates through the interface and generates an operator page according to the templates. The module is responsible for applying front-end rendering components and component styles in the operator configuration templates to the generated operator pages, thereby implementing a visualized operator interface.
Configuration description information processing module (Configuration Description Information Processing Module):
analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page.
Front-end parameter verification and analysis module (Front-end Parameter Validation and Parsing Module):
the module generates a parameter verification and analysis method according to operator configuration. After the user inputs parameters on the front-end operator page, the module is responsible for verifying whether the parameters input by the user meet the specifications and analyzing the parameters so as to be transmitted to the back-end for processing.
Back-end parameter parsing and code generation template module (Back-end Parameter Parsing and Code Generation Template Module):
According to the operator configuration template, this module generates a parsing method for receiving operator parameters and a template for generating operator code. The module needs to parse the parameters passed by the front end and generate executable operator code according to the processing logic defined in the operator configuration template.
Operator configuration extension module (Operator Configuration Extension Module):
this module is used to add operator configuration. When a new operator needs to be added, only new configuration nodes need to be added on the operator configuration file. And then, the front-end and back-end modules can automatically generate corresponding operator pages, parameter verification methods and codes according to the newly added operator configuration nodes.
The interconnection and working modes between the modules are as follows:
the operator configuration template module provides basic data for the front-end operator page generation module and the back-end parameter analysis and code generation template module.
The front-end operator page generation module generates a visual operator interface after obtaining the operator configuration template and provides the visual operator interface for a user to input parameters.
After the user inputs parameters on the front-end operator page, the front-end parameter verification and analysis module verifies and analyzes the parameters and then transmits the parameters to the back-end.
The back-end parameter analysis and code generation template module receives parameters transmitted by the front end and generates executable operator codes according to the operator configuration template.
When new operators are needed, new configuration nodes are added through the operator configuration expansion module, and the front-end module and the back-end module automatically adapt to the changes to generate corresponding operator pages and codes.
In one embodiment, the setting module sets operator processing logic, operator parameters, front-end rendering components, and component style information, including:
setting functions and logic of operators, wherein the functions of the operators at least comprise data filtering, aggregation and sequencing;
different parameter items are configured according to the type of the parameter to be input, wherein the parameter items at least comprise filtering conditions, aggregation fields and ordering sequences;
designating a corresponding front-end rendering component for each parameter, wherein the front-end rendering component at least comprises an input frame, a drop-down list and a single selection frame;
when the component style is set, the appearance and the layout of the component are respectively defined, and at least comprise color, font and interval.
In one embodiment, the operator configuration template module constructs an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering component and the component style information, and includes:
designing an operator configuration template according to the processing logic and parameters of the operator; when designing the operator configuration template, defining by adopting a component provided by a template engine or a front end frame, and adding corresponding style classes or style attributes for the operator configuration template according to component style information;
Binding the operator parameters with components in the operator configuration template according to the operator parameters;
and according to the front-end rendering component, rendering the operator configuration template into a front-end interface by utilizing a front-end framework or a self-defined rendering function.
In an embodiment, binding the operator parameter with a component in the operator configuration template according to the operator parameter includes:
configuring the name attribute and the id attribute of the form elements of the operator configuration template so that they correspond to the operator parameter names;
generating an operator page according to the configuration information, name attribute, id attribute and name of operator parameter of operators with logical relations in the operator configuration template;
when a user modifies parameters in the operator configuration template, parameter values are obtained through form submission or JavaScript event monitoring.
In an embodiment, the configuration description information processing module analyzes a preconfigured operator JSON file, obtains configuration description information corresponding to each operator according to a corresponding relation between the operator and the configuration description information, and displays the configuration description information corresponding to the operator on the operator page, including:
building a pipeline by configuring the upstream and downstream connection information among operators, and generating target code that can be run directly by calling an API interface; and when the target code is executed, receiving the input parameter configuration information of each operator, wherein the parameter configuration information and the upstream and downstream connection information are determined based on the configuration description information and the processing logic of the target service.
In an embodiment, rendering the operator configuration template as a front-end interface using a front-end framework or a custom rendering function includes:
the front-end interface is optimized using CSS style.
In an embodiment, the apparatus is further for:
monitoring the change of the operator configuration template through JavaScript events: wherein, the change of parameters in the operator configuration template is monitored in real time; when a user modifies the parameters, triggering a corresponding event processing function, updating the values of operator parameters or re-rendering the interface;
and according to the parameter values in the operator configuration template, applying processing logic of the operator, transferring the parameters in the operator configuration template to the operator by calling the functions of the operator, and executing corresponding functions.
In one embodiment, the front-end operator page generating module generates an operator page through an algorithm model file, a template engine and a front-end framework according to the operator configuration template, and includes:
acquiring an operator configuration template through an API, and generating a corresponding front-end operator page according to the operator configuration template, wherein the front-end operator page comprises at least one of, or a combination of, the following interactive elements: an input box, a drop-down list and a single selection box.
In an embodiment, a front-end parameter checking and analyzing module obtains a first configuration parameter input by a user on the operator page, checks and analyzes the first configuration parameter, checks whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, analyzes a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the check is passed, and constructs a second configuration parameter through the command line parameter, the configuration file and the JSON format, including:
Acquiring a first configuration parameter input by a user on an operator page, and checking whether the first configuration parameter meets a preset parameter requirement; the preset parameter requirements at least comprise parameter format requirements and parameter value requirements.
After the first configuration parameters are checked and the preset parameter requirements are met, the first configuration parameters are analyzed, and the second configuration parameters meeting the back-end processing format conditions are obtained through the command line parameters, the configuration files and the JSON format.
In an embodiment, the apparatus is further for:
determining the format of operator parameters, wherein the format comprises command line parameters, configuration files and JSON format;
and reading and analyzing parameters by utilizing corresponding analysis codes according to the format of the operator parameters, wherein parameter analysis is performed by using a command line parameter analysis library, a configuration file analysis library or a JSON analysis library tool.
In an embodiment, the back-end parameter analysis and code generation module generates a corresponding Spark task code and configures task parameters according to the second configuration parameters obtained by analysis and operator processing logic in the operator configuration template, and includes:
generating a corresponding Spark task code according to the analyzed parameters, wherein the task is constructed by utilizing an API of Spark, and the task comprises defining a data source, a conversion operation, an aggregation operation and an output operation;
And setting configuration parameters of the Spark task by using a configuration API of the Spark, wherein the configuration parameters at least comprise parallelism, memory allocation and execution mode.
In an embodiment, the task construction and execution module constructs a Spark task according to a Spark task code and task parameters, and executes the Spark task in response to a call of a preset remote call interface to obtain an operator calculation result, including:
and submitting the Spark task to a Spark cluster for execution by utilizing a Spark submitting script or command line tool.
In a practical application scenario, the following work is needed from the development point of view:
Designing the operator configuration template: a configuration template is created for each operator, including information such as the operator's name, parameter types, parameter ranges and default values.
Front-end interface development: the operator configuration template is acquired through an API, and a corresponding front-end page, including interactive elements such as input boxes and drop-down lists, is generated according to the configuration.
Parameter checking and parsing methods: a parameter verification method is developed at the front end to ensure that the parameters input by the user meet the requirements, and a parsing method is developed to convert the parameters input by the user into a format that the back end can process.
Back-end interface development: the back end provides an API to receive the parameters passed by the front end and generates the corresponding Spark code according to the parameters.
Code conversion template: a code conversion template is designed to convert the operator parameters configured by the user into the Spark code that is actually executed.
Operator extension: when a new operator needs to be added, only a new configuration node needs to be added to the configuration file, and the core code does not need to be modified.
By using the configurable operator processing method and device for data mining, operator management can be realized.
Creating operators: operators may be created by defining an operator class or function. An operator class may inherit from the base operator class or implement custom operator logic. An operator function may define the operator logic by using specific grammars and keywords.
Registering operators: in an operator management system, created operators need to be registered so that the system can identify and use them. Operator registration may be implemented through a configuration file, command line parameters, or a registration function in the code.
Configuring operators: operator management systems typically provide a set of configuration parameters to control the behavior and performance of operators. The configuration parameters of an operator may be set through a configuration file, command line parameters, or configuration functions in the code.
Scheduling operators: operator management systems typically provide a set of scheduling policies to manage the order of execution and the concurrency between operators. The scheduling policy of an operator may be set through a configuration file, command line parameters, or a scheduling function in the code.
Monitoring operators: operator management systems typically provide a set of monitoring mechanisms to monitor the execution state and performance metrics of operators. The monitoring mechanism of an operator may be set through a configuration file, command line parameters, or a monitoring function in the code.
Destroying operators: when operators no longer need to be used, they need to be destroyed to free up resources. Operators may be destroyed through a configuration file, command line parameters, or a destruction function in the code.
The invention can effectively manage and control the creation, registration, configuration, scheduling, monitoring and destruction of operators, thereby realizing the purpose of operator management.
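The management steps listed above can be illustrated by a small in-memory registry. This is a simplified sketch under assumed names (Operator, FilterOperator, OperatorRegistry); scheduling is omitted, monitoring is reduced to a status field, and nothing here is taken from an actual operator management system.

```python
class Operator:
    """Base operator class: holds configuration and an execution status."""

    def __init__(self, **config):
        self.config = dict(config)
        self.status = "created"

    def run(self, rows):
        raise NotImplementedError

    def close(self):
        # Release resources held by the operator (none in this sketch).
        self.status = "destroyed"


class FilterOperator(Operator):
    """Keeps only the values greater than the configured threshold."""

    def run(self, rows):
        self.status = "running"
        result = [row for row in rows if row > self.config["threshold"]]
        self.status = "finished"
        return result


class OperatorRegistry:
    def __init__(self):
        self._classes = {}    # registered operator types
        self._instances = {}  # live operators, keyed by operator id

    def register(self, name, operator_class):
        self._classes[name] = operator_class

    def create(self, op_id, name, **config):
        operator = self._classes[name](**config)
        self._instances[op_id] = operator
        return operator

    def configure(self, op_id, **config):
        self._instances[op_id].config.update(config)

    def status(self, op_id):
        return self._instances[op_id].status

    def destroy(self, op_id):
        self._instances.pop(op_id).close()


registry = OperatorRegistry()
registry.register("filter", FilterOperator)
op = registry.create("op1", "filter", threshold=2)
first = op.run([1, 2, 3])
registry.configure("op1", threshold=0)
second = op.run([1, 2, 3])
state = registry.status("op1")
registry.destroy("op1")
print(first, second, state, op.status)
```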
In the invention, the core is the design of the operator template, which needs to be compatible with various common operator types and to support the front end in rendering pages and producing parameter parsing interfaces according to the template. The main purpose of this part is to further reduce the workload of adding operators: by automatically generating front-end and back-end code, the cost of building a new operator is reduced.
Thus, the most important and innovative part is the design of operator configuration templates and transcoding templates.
Designing the operator configuration template: the design of the configuration template needs to consider the universality and usability of various operators and also needs to consider different types of data mining tasks. This part is critical to the flexibility and scalability of the overall system.
Transcoding templates: designing efficient and flexible transcoding templates is a key to implementing operator functions. This is important to the performance and maintainability of the overall system, requiring technicians to have a solid Spark programming base and excellent code design capabilities.
It should be noted that while several modules of a configurable operator processing apparatus for data mining are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present invention. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Based on the foregoing inventive concept, as shown in fig. 8, the present invention further proposes a computer device 800, including a memory 810, a processor 820, and a computer program 830 stored in the memory 810 and executable on the processor 820, where the processor 820 implements the foregoing configurable operator processing method for data mining when executing the computer program 830.
Based on the foregoing inventive concept, the present invention proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned configurable operator processing method for data mining.
Based on the foregoing inventive concept, the present invention proposes a computer program product comprising a computer program which, when executed by a processor, implements a configurable operator processing method for data mining.
Compared with the prior art, the configurable operator processing method and device for data mining have at least the following advantages:
Lowering the threshold for data processing by data mining personnel: with the configuration-based operator management method, data mining personnel can quickly develop a Spark-based visual operator by adding a new configuration node to a configuration file, without writing complex code. This allows non-professional developers to get started easily and lowers the threshold for processing data.
Improving development efficiency: because the operator configuration template already contains the operator's processing logic, parameters, front-end rendering components, component styles and the like, the front end and the back end can automatically generate, from the template, the corresponding pages, the parameter verification and parsing methods, the methods for receiving and parsing operator parameters, the templates for generating operator code, and so on. This saves developers time and effort and improves development efficiency.
Improving the flexibility and extensibility of the system: the invention adopts a configuration-based operator management method, so that when a new operator needs to be added, only a new configuration node needs to be added to the operator configuration file and the existing code does not need to be modified. This greatly improves the flexibility and extensibility of the system.
Optimizing user experience: for data mining personnel, the operator interface and the parameter verification and parsing methods are generated automatically from the operator template, so that they can concentrate on the business logic of data processing without concern for the underlying technical implementation. This helps to improve work efficiency and optimize the user experience.
Facilitating system maintenance and updating: because the invention adopts a configuration-based operator management method, when an operator needs to be modified or updated, only the corresponding configuration file needs to be modified, without changing the underlying code. This not only reduces maintenance cost but also benefits the stability and maintainability of the system.
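The idea that a new operator requires only a new configuration node, with parameter checks derived from that node, can be sketched as follows. The JSON schema, the rule types (`string`, `enum`), and the functions `load_operators`, `validate` and `add_operator` are assumptions made for illustration; the source does not specify the layout of the operator configuration file.

```python
import json

# Hypothetical operator configuration file content; the schema is an assumption.
CONFIG_JSON = """
{
  "operators": {
    "sort": {
      "params": {
        "column": {"type": "string", "required": true},
        "order": {"type": "enum", "choices": ["asc", "desc"], "required": true}
      }
    }
  }
}
"""


def load_operators(text):
    """Parse the configuration file and return the operator nodes."""
    return json.loads(text)["operators"]


def validate(operators, op_name, values):
    """Check user input against the parameter rules declared in the config."""
    errors = []
    spec = operators[op_name]["params"]
    for key, rule in spec.items():
        if key not in values:
            if rule.get("required"):
                errors.append(f"{key}: missing")
            continue
        value = values[key]
        if rule["type"] == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected string")
        elif rule["type"] == "enum" and value not in rule["choices"]:
            errors.append(f"{key}: must be one of {rule['choices']}")
    for key in values:
        if key not in spec:
            errors.append(f"{key}: unknown parameter")
    return errors


def add_operator(operators, name, node):
    """Register a new operator by adding a configuration node only."""
    if name in operators:
        raise ValueError(f"operator {name!r} already exists")
    operators[name] = node
```

Here `validate` never mentions a specific operator, so registering a hypothetical `limit` operator with `add_operator` makes it checkable without touching the validation code.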
In this technical solution, the acquisition, storage, use, processing and the like of data all comply with the relevant laws and regulations.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit its scope of protection. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of protection of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. A configurable operator processing method for data mining, the method comprising:
setting operator processing logic, operator parameters, front-end rendering components and component style information;
constructing an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering component and the component style information;
creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
generating an operator page through an algorithm model file, a template engine and a front end framework according to the operator configuration template;
analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page;
acquiring a first configuration parameter input by a user on the operator page, checking and analyzing the first configuration parameter, checking whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the first configuration parameter passes the check, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format;
generating a corresponding Spark task code and configuring task parameters according to the second configuration parameters obtained by analysis and operator processing logic in an operator configuration template;
and constructing a Spark task according to the Spark task code and the task parameters, and executing the Spark task in response to the call of a preset remote call interface to obtain an operator calculation result.
2. The method of claim 1, wherein setting operator processing logic, operator parameters, front-end rendering components, and component style information comprises:
setting functions and logic of operators, wherein the functions of the operators at least comprise data filtering, aggregation and sorting;
configuring different parameter items according to the type of the parameter to be input, wherein the parameter items at least comprise filtering conditions, aggregation fields and sort orders;
designating a corresponding front-end rendering component for each parameter, wherein the front-end rendering component at least comprises an input box, a drop-down list and a single selection box;
when setting the component style, defining the appearance and the layout of each component, wherein the appearance and the layout at least comprise color, font and spacing.
3. The method of claim 1, wherein constructing an operator configuration template from the operator processing logic, operator parameters, front-end rendering components, and component style information comprises:
designing an operator configuration template according to the processing logic and parameters of the operator; when designing the operator configuration template, defining by adopting a component provided by a template engine or a front end frame, and adding corresponding style classes or style attributes for the operator configuration template according to component style information;
binding the operator parameters with components in the operator configuration template according to the operator parameters;
and according to the front-end rendering component, rendering the operator configuration template into a front-end interface by utilizing a front-end framework or a self-defined rendering function.
4. A method according to claim 3, wherein binding operator parameters with components in an operator configuration template according to the operator parameters comprises:
configuring the name attribute and id attribute of the form elements of the operator configuration template so that they correspond to the operator parameter names;
generating an operator page according to the configuration information, name attribute, id attribute and name of operator parameter of operators with logical relations in the operator configuration template;
when a user modifies parameters in the operator configuration template, parameter values are obtained through form submission or JavaScript event monitoring.
5. A method according to claim 3, wherein rendering the operator configuration template as a front-end interface using a front-end framework or a custom rendering function comprises:
optimizing the front-end interface using CSS styles.
6. A method according to claim 3, characterized in that the method further comprises:
monitoring the change of the operator configuration template through JavaScript events: wherein, the change of parameters in the operator configuration template is monitored in real time; when a user modifies the parameters, triggering a corresponding event processing function, updating the values of operator parameters or re-rendering the interface;
and according to the parameter values in the operator configuration template, applying processing logic of the operator, transferring the parameters in the operator configuration template to the operator by calling the functions of the operator, and executing corresponding functions.
7. The method of claim 1, wherein generating an operator page from the operator configuration template via an algorithm model file, a template engine, and a front end framework, comprises:
and acquiring an operator configuration template through an API, and generating a corresponding front-end operator page according to the operator configuration template, wherein the front-end operator page at least comprises one or more combined interactive elements including an input box, a drop-down list and a single selection box.
8. The method according to claim 1, wherein parsing the preconfigured operator JSON file, obtaining configuration description information corresponding to each operator according to a correspondence between operators and configuration description information, and displaying the configuration description information corresponding to each operator on the operator page, comprises:
building a pipeline by configuring upstream and downstream connection information among operators, and generating, by calling an API interface, an object code that can be run directly; and when the object code is executed, receiving input parameter configuration information of each operator, wherein the parameter configuration information and the upstream and downstream connection information are determined based on configuration description information and processing logic of the target service.
9. The method of claim 1, wherein obtaining a first configuration parameter input by a user on the operator page, checking and analyzing the first configuration parameter, checking whether a parameter format and a parameter value of the first configuration parameter meet a set requirement, and after the checking is passed, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format, wherein the method comprises the steps of:
acquiring a first configuration parameter input by a user on an operator page, and checking whether the first configuration parameter meets a preset parameter requirement; wherein, the preset parameter requirements at least comprise parameter format requirements and parameter value requirements;
after the first configuration parameters are checked and the preset parameter requirements are met, the first configuration parameters are analyzed, and the second configuration parameters meeting the back-end processing format conditions are obtained through the command line parameters, the configuration files and the JSON format.
10. The method of claim 8, wherein the method further comprises:
and reading and analyzing parameters by utilizing corresponding analysis codes according to the format of the operator parameters, wherein parameter analysis is performed by using a command line parameter analysis library, a configuration file analysis library or a JSON analysis library tool.
11. The method of claim 1, wherein generating a corresponding Spark task code and configuring task parameters according to the parsed second configuration parameters and operator processing logic in the operator configuration template, comprises:
generating a corresponding Spark task code according to the analyzed parameters, wherein the task is constructed by utilizing an API of Spark, and the task comprises defining a data source, a conversion operation, an aggregation operation and an output operation;
and setting configuration parameters of the Spark task by using a configuration API of the Spark, wherein the configuration parameters at least comprise parallelism, memory allocation and execution mode.
12. The method of claim 1, wherein constructing a Spark task according to a Spark task code and task parameters, and executing the Spark task in response to a call of a preset remote call interface, to obtain an operator calculation result, comprises:
and submitting the Spark task to a Spark cluster for execution by utilizing a Spark submitting script or command line tool.
13. The method according to claim 1, characterized in that the method further comprises:
when a new operator is added, adding a new configuration node in the operator configuration file;
and generating corresponding operator pages, task codes and task parameters according to the new configuration nodes, and developing and deploying new operators.
14. A configurable operator processing apparatus for data mining, the apparatus comprising:
the setting module is used for setting operator processing logic, operator parameters, front-end rendering components and component style information;
the operator configuration template module is used for constructing an operator configuration template according to the operator processing logic, the operator parameters, the front-end rendering components and the component style information;
the algorithm model file construction module is used for creating an algorithm model file according to the calling relation, the modification mode and the interface configuration of the operator; the algorithm model file is a configuration information set of operators with logical relations, and is used for indicating operator types and the logical relations of input and output interfaces among the operators;
the front-end operator page generation module is used for generating an operator page through an algorithm model file, a template engine and a front-end framework according to the operator configuration template;
the configuration description information processing module is used for analyzing a preconfigured operator JSON file, acquiring configuration description information corresponding to each operator according to the corresponding relation between the operators and the configuration description information, and displaying the configuration description information corresponding to the operators on the operator page;
the front-end parameter verification and analysis module is used for acquiring a first configuration parameter input by a user on the operator page, verifying and analyzing the first configuration parameter, verifying whether a parameter format and a parameter value of the first configuration parameter meet set requirements, analyzing a command line parameter, a configuration file and a JSON format corresponding to the first configuration parameter after the verification is passed, and constructing a second configuration parameter through the command line parameter, the configuration file and the JSON format;
the back-end parameter analysis and code generation module is used for generating corresponding Spark task codes and configuring task parameters according to the second configuration parameters obtained through analysis and operator processing logic in the operator configuration template;
the task construction and execution module is used for constructing and obtaining a Spark task according to the Spark task code and the task parameter, and executing the Spark task in response to the call of the preset remote call interface to obtain an operator calculation result.
15. The apparatus of claim 14, wherein the apparatus further comprises: an operator configuration expansion module;
wherein, operator configuration extension module is used for:
when a new operator is added, adding a new configuration node in the operator configuration file;
and generating corresponding operator pages, task codes and task parameters according to the new configuration nodes, and developing and deploying new operators.
16. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 13 when executing the computer program.
17. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 13.
18. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any of claims 1 to 13.
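As an informal illustration of the parameter handling recited in claims 9 and 10 (not part of the claims), the following sketch parses parameters from a command line, a configuration file and a JSON document using Python's standard `argparse`, `configparser` and `json` libraries, then merges them into one normalized set standing in for the second configuration parameter. The option names, the `[operator]` section, and the precedence order (configuration file, then JSON, then command line) are all assumptions.

```python
import argparse
import configparser
import json


def from_command_line(argv):
    """Parse hypothetical command-line options; unset options are dropped."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--operator", required=True)
    parser.add_argument("--parallelism", type=int, default=None)
    args = parser.parse_args(argv)
    return {k: v for k, v in vars(args).items() if v is not None}


def from_config_file(text):
    """Read the assumed [operator] section of an INI-style configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["operator"]) if parser.has_section("operator") else {}


def from_json(text):
    """Parse operator parameters supplied as a JSON object."""
    return json.loads(text)


def build_second_params(argv, ini_text, json_text):
    """Merge the three sources; later sources override earlier ones (assumed)."""
    merged = {}
    merged.update(from_config_file(ini_text))
    merged.update(from_json(json_text))
    merged.update(from_command_line(argv))
    # INI values arrive as strings, so normalize the numeric setting.
    if "parallelism" in merged:
        merged["parallelism"] = int(merged["parallelism"])
    return merged
```

The merged dictionary is what a back end could hand to code generation and task configuration; which settings it carries (for example parallelism or memory allocation) depends on the operator configuration and is not fixed by this sketch.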
CN202311619248.1A 2023-11-29 2023-11-29 Configurable operator processing method and device for data mining Pending CN117762404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311619248.1A CN117762404A (en) 2023-11-29 2023-11-29 Configurable operator processing method and device for data mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311619248.1A CN117762404A (en) 2023-11-29 2023-11-29 Configurable operator processing method and device for data mining

Publications (1)

Publication Number Publication Date
CN117762404A true CN117762404A (en) 2024-03-26

Family

ID=90319164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311619248.1A Pending CN117762404A (en) 2023-11-29 2023-11-29 Configurable operator processing method and device for data mining

Country Status (1)

Country Link
CN (1) CN117762404A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971236A (en) * 2024-03-31 2024-05-03 浪潮电子信息产业股份有限公司 Operator analysis method, device, equipment and medium based on lexical and grammatical analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination