CN117313683A

CN117313683A - Metadata processing method, device, server and storage medium

Info

Publication number: CN117313683A
Application number: CN202311524393.1A
Authority: CN
Inventors: 李晓娟; 贾玉武; 周莉; 秦宏伟; 桑海岩; 李大中; 宋雨伦; 倪明鉴
Original assignee: China United Network Communications Group Co Ltd; Unicom Digital Technology Co Ltd
Current assignee: China United Network Communications Group Co Ltd; Unicom Digital Technology Co Ltd
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2023-12-29

Abstract

The application provides a metadata processing method, a metadata processing device, a server and a storage medium. The method comprises the following steps: receiving a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, and each table name comprises a plurality of field names and a plurality of pieces of field description information; acquiring the field keyword corresponding to each piece of field description information to obtain a plurality of field keywords; acquiring the table keyword corresponding to each table name to obtain a plurality of table keywords; setting a plurality of category rules according to the table keywords and the field keywords; acquiring a category label for each piece of field description information according to the plurality of category rules; acquiring metadata in a triplet format according to each piece of field description information, the field name corresponding to it, and its category label; and obtaining a metadata classification model according to the plurality of metadata in the triplet format. The method improves the efficiency of metadata classification and grading.

Description

Metadata processing method, device, server and storage medium

Technical Field

The present disclosure relates to the field of big data technologies, and in particular, to a metadata processing method, a metadata processing device, a server, and a storage medium.

Background

With the development of digitalization and informatization, enterprises generate massive amounts of data every day. However, the sheer volume of data and the presence of unstructured data types make data management more complex and increase the difficulty of identifying and classifying metadata.

Currently, in the prior art, classification and grading of metadata are mainly achieved by exporting metadata to a file in a proper format and then processing the exported data by using a corresponding data processing tool or script.

However, metadata naming is generally irregular and non-uniform, so this approach requires a great deal of manual effort, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading.

Disclosure of Invention

The application provides a metadata processing method, a metadata processing device, a server and a storage medium, which are used for solving the technical problem of low efficiency in metadata classification and grading.

In a first aspect, the present application provides a metadata processing method, including:

receiving a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information;

acquiring the field keyword corresponding to each piece of field description information to obtain a plurality of field keywords;

acquiring the table keyword corresponding to each table name to obtain a plurality of table keywords;

setting a plurality of category rules according to the plurality of table keywords and the plurality of field keywords;

acquiring the category label of each piece of field description information according to the plurality of category rules;

acquiring metadata in a triplet format according to each piece of field description information, the field name corresponding to that field description information, and the category label of that field description information, so as to obtain a plurality of metadata in the triplet format; and

obtaining a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used for classifying metadata to be classified.

Optionally, in the method as described above, the obtaining a field keyword corresponding to each field description information includes: and acquiring the field keywords corresponding to the descriptive information of each field by adopting a word segmentation method and a keyword extraction method.

Optionally, in the method as described above, each table name is associated with a Chinese table name and an English table name; correspondingly, the obtaining the table keyword corresponding to each table name includes: splicing the Chinese table name and the English table name to obtain the table name; and obtaining the table keyword corresponding to each table name by adopting a word segmentation method and a keyword extraction method.

Optionally, the method as described above, wherein the setting a plurality of category rules according to the plurality of table keywords and the plurality of field keywords includes: obtaining a plurality of field keywords in each table name according to each table keyword; setting a category rule corresponding to each field keyword to obtain the category rules.

Optionally, the method as described above, the obtaining a metadata classification model according to the metadata in the multiple triples format includes: determining the metadata in the multiple triplet formats as sample data; wherein the sample data comprises a training set, a validation set, and a test set; constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model; according to the verification set, parameter adjustment is carried out on the initial metadata classification model to obtain a trained metadata classification model; and testing the trained metadata classification model according to the test set to determine the metadata classification model.

Optionally, the method as described above further comprises: acquiring any field name and field description information corresponding to the any field name; acquiring metadata in a binary group format according to any field name and field description information corresponding to the any field name, and determining the metadata in the binary group format as the metadata to be classified; and inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.

Optionally, after the inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified, the method further includes: transmitting the metadata category to a user terminal so that the user terminal can audit the metadata category to obtain an audit result; and receiving the auditing result sent by the user terminal, and if the auditing result is judged to pass the auditing, obtaining the metadata level corresponding to the metadata category according to the category and the level mapping table.

Optionally, in the method as described above, after the sending the metadata category to a user terminal so that the user terminal audits the metadata category to obtain an audit result, the method further includes: receiving the audit result sent by the user terminal, and if the audit result indicates that the audit is not passed, sending the metadata to be classified to the user terminal so that the user terminal obtains the metadata category according to the metadata to be classified; receiving the metadata category sent by the user terminal; obtaining metadata in the triplet format according to the metadata category and the metadata in the binary group format; determining the metadata in the triplet format as newly added sample data; and, when the newly added sample data meets a preset condition, optimizing the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying the metadata to be classified.

In a second aspect, the present application provides a metadata processing apparatus, including:

a receiving module, configured to receive a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information;

a first obtaining module, configured to obtain the field keyword corresponding to each piece of field description information, so as to obtain a plurality of field keywords;

a second obtaining module, configured to obtain the table keyword corresponding to each table name, so as to obtain a plurality of table keywords;

a setting module, configured to set a plurality of category rules according to the plurality of table keywords and the plurality of field keywords;

a third obtaining module, configured to obtain the category label of each piece of field description information according to the plurality of category rules;

a fourth obtaining module, configured to obtain metadata in a triplet format according to each piece of field description information, the field name corresponding to that field description information, and the category label of that field description information, so as to obtain a plurality of metadata in the triplet format; and

a fifth obtaining module, configured to obtain a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used to classify metadata to be classified.

In a third aspect, the present application provides a server comprising:

at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes computer-executable instructions stored in the memory, causing the at least one processor to perform the metadata processing method as described above in the first aspect and the various possible designs of the first aspect.

In a fourth aspect, the present application provides a computer storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the metadata processing method according to the first aspect and the various possible designs of the first aspect.

According to the metadata processing method, the metadata processing device, the server and the storage medium provided by the application, a plurality of category rules are set by acquiring the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and the category label of each piece of field description information is acquired according to the category rules; metadata in a plurality of triplet formats is then obtained according to each field name, each piece of field description information and its category label, and a metadata classification model is constructed and trained. This avoids the heavy manual effort, wasted human resources and increased time cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and therefore also the efficiency of metadata grading.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of a metadata processing system according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a metadata processing method according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a metadata processing method according to another embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a metadata processing apparatus according to an embodiment of the present application;

FIG. 5 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application.

Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

With the development of digitalization and informatization, enterprises generate massive amounts of data every day. However, the sheer volume of data and the presence of unstructured data types make data management more complex and increase the difficulty of identifying and classifying metadata. In the prior art, classification and grading of metadata are mainly achieved by exporting the metadata to a file in a suitable format and then processing the exported data with a corresponding data processing tool or script. However, metadata naming is generally irregular and non-uniform, so this approach requires a great deal of manual effort, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading.

In order to solve the above technical problems, the embodiments of the present application provide the following technical idea: because metadata naming is generally irregular and non-uniform, a great deal of manual effort is required, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading. The inventors therefore propose setting category rules according to the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and acquiring the category label of each piece of field description information according to the category rules; then obtaining metadata in a plurality of triplet formats according to each field name, each piece of field description information and its category label, and constructing and training a metadata classification model. This avoids the waste of human resources and the increased time cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and grading.

The metadata processing method aims at solving the technical problems in the prior art.

The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 1 is a schematic application scenario diagram of metadata processing provided in an embodiment of the present application. As shown in fig. 1, the application scenario includes: a data terminal 101, a server 102, and a display terminal 103.

The data terminal 101 may be a computer or a terminal device.

The server 102 may be one server or a cluster formed by a plurality of servers.

The display terminal 103 may be a platform device, a display terminal, or the like.

Referring to fig. 1, a data terminal 101 transmits a plurality of metadata to a server 102; the server 102 performs a series of processes on the plurality of metadata to construct a metadata classification model; the metadata classification model is used for classifying metadata to be classified, and sending metadata categories to the display terminal 103 for display.

Fig. 2 is a flow chart of a metadata processing method according to an embodiment of the present application, and the execution subject of the embodiment may be the server 102 in the embodiment of fig. 1, or may be another server with similar functions, which is not particularly limited herein. As shown in fig. 2, the method includes:

S201: receiving a plurality of metadata sent by the data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information.

Wherein the metadata includes: database name, database description, table name, table description, field name, field description, field type, field maximum length, and the like.

In this embodiment, after metadata is obtained, some data cleansing operations are required, including but not limited to: duplicate data removal, processing missing values, data type conversion and normalization, and the like.
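The cleansing operations listed above can be sketched as follows. This is a minimal illustration only: the record keys (`table_name`, `field_name`, `field_description`, `max_length`) and the choice to normalize field names to lowercase are assumptions made for the example, not details given in the application.

```python
from typing import Optional


def _to_int(value) -> Optional[int]:
    """Type conversion helper: return an int, or None if the value is absent or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean_metadata(records: list[dict]) -> list[dict]:
    """Deduplicate records, fill missing descriptions, and normalize names."""
    seen = set()
    cleaned = []
    for rec in records:
        table = (rec.get("table_name") or "").strip()
        # Normalization (assumed convention): trim whitespace and lowercase field names.
        field = (rec.get("field_name") or "").strip().lower()
        if not table or not field:
            continue  # a record without a table or field name is dropped
        key = (table, field)
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        cleaned.append({
            "table_name": table,
            "field_name": field,
            # Missing-value handling: a missing description becomes an empty string.
            "field_description": (rec.get("field_description") or "").strip(),
            "max_length": _to_int(rec.get("max_length")),
        })
    return cleaned
```

Keeping an empty string for a missing description is consistent with the later statement that the field description in a triplet may be null.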

S202: acquiring the field keyword corresponding to each piece of field description information to obtain a plurality of field keywords.

Specifically, a word segmentation method and a keyword extraction method are adopted to obtain the field keyword corresponding to each piece of field description information.

In this embodiment, a word segmentation technique from natural language processing is used to break the field description information into individual words, and a keyword extraction technique is then applied to the segmented words to extract keywords from the field description information.

Optionally, the word segmentation technique may be HanLP word segmentation or space word segmentation. The keyword extraction technique may be TF-IDF or TextRank.
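As a rough illustration of segmentation followed by TF-IDF keyword extraction, the sketch below uses only the Python standard library. The regular-expression tokenizer is a stand-in for a real segmenter such as HanLP and only handles English text; the function names and the `top_k` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Stand-in for a real word segmenter: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(descriptions: list[str], top_k: int = 2) -> list[list[str]]:
    """Return the top_k TF-IDF terms for each field description."""
    docs = [tokenize(d) for d in descriptions]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per description
    result = []
    for tokens in docs:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result
```

Terms that occur in every description, such as "of" or "the", receive a score of zero and therefore rank below terms that are specific to one field.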

S203: acquiring the table keyword corresponding to each table name to obtain a plurality of table keywords.

Each table name is associated with a Chinese table name and an English table name.

The table name serves as the table description information.

Specifically, the Chinese table name and the English table name are spliced to obtain the table name, and a word segmentation method and a keyword extraction method are adopted to obtain the table keyword corresponding to each table name.
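The splicing step might look like the following sketch. The separator, the splitting on whitespace and underscores, and the list of ignored prefixes are all assumptions for illustration; a real system would run a Chinese word segmenter on the Chinese part of the spliced name.

```python
import re

# Hypothetical table-name prefixes that carry no meaning as keywords.
IGNORED_TOKENS = {"t", "tb", "tbl", "dim", "fact"}


def table_keywords(chinese_name: str, english_name: str) -> list[str]:
    """Splice the two names and return candidate table keywords in order."""
    spliced = f"{chinese_name} {english_name}"
    # Simplified segmentation: split on whitespace and underscores.
    tokens = [t.lower() for t in re.split(r"[\s_]+", spliced) if t]
    keywords = []
    for token in tokens:
        if token not in IGNORED_TOKENS and token not in keywords:
            keywords.append(token)
    return keywords
```

For example, `table_keywords("用户信息表", "t_user_info")` keeps the Chinese name as one token and splits the English name into `user` and `info`.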

S204: a plurality of category rules are set based on the plurality of table keywords and the plurality of field keywords.

Specifically, according to each table keyword, obtaining a plurality of field keywords in each table name; setting a category rule corresponding to each field keyword to obtain a plurality of category rules.

Specifically, by matching each field keyword with metadata or an actual table structure in a database, a table name where each field description information is located is determined, so as to obtain field keywords corresponding to a plurality of field description information in each table name, which may involve keyword matching or pattern matching.

In this embodiment, a rule is formulated for each field keyword based on a plurality of field keywords. Alternatively, the rules may be content-based or schema-based.

S205: acquiring the category label of each piece of field description information according to the plurality of category rules.

Specifically, each field description information and the field name are matched through a formulated rule, and if the matching is successful, a corresponding category label is added for the field description information and the field name.
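A content-based rule match of this kind can be sketched as below. The rule table, its category labels and its keywords are invented for the example, and returning the first matching rule is an assumed tie-breaking policy that the application does not specify.

```python
import re
from typing import Optional

# Hypothetical category rules: label -> field keywords that trigger it.
CATEGORY_RULES = {
    "contact_information": {"phone", "mobile", "email"},
    "identity_information": {"id", "passport", "card"},
    "location_information": {"address", "city", "province"},
}


def assign_category_label(field_name: str, field_description: str) -> Optional[str]:
    """Return the label of the first rule whose keywords match, else None."""
    text = f"{field_name} {field_description}".lower()
    tokens = set(re.findall(r"[a-z0-9]+", text))
    for label, keywords in CATEGORY_RULES.items():
        if tokens & keywords:  # at least one rule keyword appears as a whole word
            return label
    return None
```

A field that matches no rule returns `None` here, leaving it unlabelled rather than guessing a category.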

S206: acquiring metadata in a triplet format according to each piece of field description information, the field name corresponding to that field description information, and its category label, so as to obtain a plurality of metadata in the triplet format.

Illustratively, the metadata in the triplet format is < category tag, field name, field description >, where the field description may be null.
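One possible in-memory representation of the triplet is a named tuple, sketched below. The type and function names are hypothetical, and skipping fields that received no label is an assumption; the application does not say how unlabelled fields are handled.

```python
from typing import NamedTuple, Optional


class MetadataTriplet(NamedTuple):
    category_label: str
    field_name: str
    field_description: str  # may be empty


def build_triplets(
    fields: list[tuple[str, Optional[str], Optional[str]]]
) -> list[MetadataTriplet]:
    """Build triplets from (field_name, field_description, category_label) rows."""
    triplets = []
    for field_name, description, label in fields:
        if label is None:
            continue  # assumed: an unlabelled field yields no training sample
        triplets.append(MetadataTriplet(label, field_name, description or ""))
    return triplets
```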

S207: obtaining a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used for classifying metadata to be classified.

Specifically, step S207 includes S2071 to S2074:

S2071: determining the plurality of metadata in the triplet format as sample data, wherein the sample data includes a training set, a validation set, and a test set.

In machine learning, data is generally divided into a training set, a validation set, and a test set. The training set is used to train the model, the validation set is used to tune the hyperparameters of the model, and the test set is used to evaluate the performance of the model.

Optionally, a certain proportion of the sample data is used as the training set and the validation set, and the remaining sample data is used as the test set. Illustratively, 60% of the sample data is selected as the training set, 20% as the validation set, and 20% as the test set.
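The illustrative 60/20/20 split can be sketched as follows. Shuffling before splitting and the fixed `seed` are assumptions added so that the example is reproducible; the application does not state how samples are assigned to each set.

```python
import random


def split_samples(samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test
```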

S2072: constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model.

Specifically, the structure of the deep learning network is constructed. This may include selecting appropriate convolution layers, pooling layers, fully connected layers, and so on, as well as setting the parameters of each layer. The training set is then input into the deep learning network model, and the model is optimized by computing the gradient of the loss function and updating the parameters, yielding the initial metadata classification model.
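To make the "compute the gradient of the loss and update the parameters" step concrete, the sketch below trains a single linear layer with a softmax output by stochastic gradient descent on the cross-entropy loss. This is deliberately not the deep network described above (it has no convolution or pooling layers); it is a stand-in chosen so the update rule fits in a few lines. The toy vocabulary, features, labels, learning rate and epoch count are all assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _logits(weights, biases, x):
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_softmax_classifier(features, labels, n_classes, lr=0.5, epochs=200):
    """Fit one linear layer with softmax output by per-sample gradient descent."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(_logits(weights, biases, x))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    logits = _logits(weights, biases, x)
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy bag-of-words features over the hypothetical vocabulary
# ["phone", "email", "address", "city"]; class 0 = contact, class 1 = location.
FEATURES = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
LABELS = [0, 0, 1, 1]
weights, biases = train_softmax_classifier(FEATURES, LABELS, n_classes=2)
```

In practice a deep learning framework would compute these gradients automatically; the manual update is shown only to expose the mechanics.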

S2073: adjusting the parameters of the initial metadata classification model according to the validation set to obtain a trained metadata classification model.

Specifically, the performance of the model is evaluated using the validation set. If the performance of the model does not improve, the hyperparameters may be adjusted in an attempt to improve it.

S2074: testing the trained metadata classification model according to the test set to determine the metadata classification model.

Specifically, the trained metadata classification model is tested with a test set, and the metadata classification model is evaluated and determined.
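The application does not name an evaluation metric. As one plausible choice, the sketch below computes plain accuracy on the test set and accepts the model when it reaches a threshold; both the metric and the `min_accuracy` value are assumptions.

```python
def accuracy(predicted, expected):
    """Fraction of test samples whose predicted category equals the true one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def accept_model(predicted, expected, min_accuracy=0.9):
    # Hypothetical acceptance criterion: the model is kept only if it is
    # accurate enough on the held-out test set.
    return accuracy(predicted, expected) >= min_accuracy
```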

As can be seen from the above, in this embodiment, a plurality of category rules are set by acquiring the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and the category label of each piece of field description information is acquired according to the category rules; metadata in a plurality of triplet formats is then obtained according to each field name, each piece of field description information and its category label, and a metadata classification model is constructed and trained. This avoids the heavy manual effort, wasted human resources and increased time cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and therefore also the efficiency of metadata grading.

Fig. 3 is a schematic flow chart of a metadata processing method according to another embodiment of the present application. The execution subject of this embodiment may be the server 102 in the embodiment of fig. 1, or may be another server with similar functions, which is not particularly limited herein. This embodiment focuses on the process of classifying metadata to be classified with the metadata classification model and acquiring the metadata level. As shown in fig. 3, the method includes:

S301: acquiring any field name and the field description information corresponding to that field name.

S302: acquiring metadata in a binary group format according to the field name and the field description information corresponding to it, and determining the metadata in the binary group format as metadata to be classified.

Illustratively, the metadata in the binary group format is < field name, field description >.

S303: inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.

Specifically, the < field name, field description > is input to the metadata classification model for prediction, and the category corresponding to the metadata is obtained.

S304: sending the metadata category to the user terminal so that the user terminal audits the metadata category to obtain an audit result.

Wherein each user terminal is associated with an auditor.

Specifically, the auditor audits whether the metadata category is correct.

S305: receiving the audit result sent by the user terminal, and if the audit result indicates that the audit is passed, obtaining the metadata level corresponding to the metadata category according to the category and level mapping table.

Wherein, the category and level mapping table refers to a table or mapping relation that maps metadata categories to corresponding levels.
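The category and level mapping table can be pictured as a simple lookup, as in the sketch below. The category names and level numbers are invented for illustration; the application does not list concrete categories or levels.

```python
from typing import Optional

# Hypothetical category-to-level mapping table.
CATEGORY_LEVEL_MAP = {
    "contact_information": 3,
    "identity_information": 4,
    "device_information": 2,
}


def metadata_level(category: str, audit_passed: bool) -> Optional[int]:
    """Return the level for an audited category, or None if it cannot be graded."""
    if not audit_passed:
        return None  # a rejected category is re-labelled by the auditor first
    return CATEGORY_LEVEL_MAP.get(category)
```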

S306: receiving the audit result sent by the user terminal, and if the audit result indicates that the audit is not passed, sending the metadata to be classified to the user terminal so that the user terminal obtains the metadata category according to the metadata to be classified.

Specifically, if the audit is not passed, the auditor is required to give the true category of the metadata.

S307: receiving the metadata category sent by the user terminal.

S308: obtaining metadata in the triplet format according to the metadata category and the metadata in the binary group format.

Specifically, the metadata in the < category label, field name, field description > triplet format is obtained according to the metadata category and the < field name, field description > metadata in the binary group format.
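Combining the auditor's category with the binary group is a small transformation, sketched below with plain tuples. The function name and the example category are hypothetical.

```python
def to_triplet(category: str, pair: tuple[str, str]) -> tuple[str, str, str]:
    """Prepend the auditor-supplied category to a (field name, field description) pair."""
    field_name, field_description = pair
    return (category, field_name, field_description)


# Example: the auditor corrects the category of a field the model got wrong.
new_sample = to_triplet("identity_information", ("id_no", "national ID number"))
```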

S309: determining the metadata in the triplet format as newly added sample data.

S310: when the newly added sample data meets a preset condition, optimizing the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying metadata to be classified.

Specifically, if the amount of newly added sample data exceeds a preset threshold, or the time elapsed since the last iteration exceeds a preset time, the newly added sample data is merged into the sample data and the metadata classification model is optimized to obtain an optimized metadata classification model.
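The preset condition can be expressed as a small predicate, as sketched below. The default threshold of 500 samples and the 30-day interval are placeholder values chosen for the example, not values from the application.

```python
from datetime import datetime, timedelta


def should_retrain(
    n_new_samples: int,
    last_trained: datetime,
    now: datetime,
    sample_threshold: int = 500,                 # hypothetical preset threshold
    max_interval: timedelta = timedelta(days=30),  # hypothetical preset time
) -> bool:
    """True when enough new samples have accumulated or too much time has passed."""
    return n_new_samples > sample_threshold or (now - last_trained) > max_interval
```

Either condition alone triggers re-optimization, matching the "or" in the description above.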

As can be seen from the above, in this embodiment, if the audit result for the metadata is that the audit is not passed, the metadata category is obtained from the user terminal; the metadata in the binary group format that failed the audit is converted into metadata in the triplet format and determined as newly added sample data, and the metadata classification model is optimized with it to obtain an optimized metadata classification model, which further improves the accuracy of the metadata classification model.

Fig. 4 is a schematic structural diagram of a metadata processing apparatus according to an embodiment of the present application. As shown in fig. 4, the metadata processing apparatus includes: a receiving module 401, a first acquiring module 402, a second acquiring module 403, a setting module 404, a third acquiring module 405, a fourth acquiring module 406, and a fifth acquiring module 407.

The receiving module 401 is configured to receive a plurality of metadata sent by a data terminal, where the plurality of metadata includes a plurality of table names, each table name includes a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information.

The first obtaining module 402 is configured to obtain field keywords corresponding to each field description information, so as to obtain a plurality of field keywords.

The second obtaining module 403 is configured to obtain a table keyword corresponding to each table name, so as to obtain a plurality of table keywords.

The setting module 404 is configured to set a plurality of category rules according to the plurality of table keywords and the plurality of field keywords.

The third obtaining module 405 is configured to obtain the category label of each piece of field description information according to the plurality of category rules.

The fourth obtaining module 406 is configured to obtain metadata in a triplet format according to each piece of field description information, the field name corresponding to that field description information, and its category label, so as to obtain a plurality of metadata in the triplet format.

The fifth obtaining module 407 is configured to obtain a metadata classification model according to the plurality of metadata in the triplet format, where the metadata classification model is used to classify metadata to be classified.

Optionally, the first obtaining module 402 is specifically configured to: and acquiring the field keywords corresponding to the descriptive information of each field by adopting a word segmentation method and a keyword extraction method.

Optionally, each table name is associated with a Chinese table name and an English table name; accordingly, the second obtaining module 403 is specifically configured to: splice the Chinese table name and the English table name to obtain the table name; and obtain the table keyword corresponding to each table name by adopting a word segmentation method and a keyword extraction method.

Optionally, the setting module 404 is specifically configured to: obtain a plurality of field keywords in each table name according to each table keyword; and set a category rule corresponding to each field keyword to obtain the plurality of category rules.

Optionally, the fifth obtaining module 407 is specifically configured to: determining the metadata in the multiple triplet formats as sample data; wherein the sample data comprises a training set, a validation set, and a test set; constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model; according to the verification set, parameter adjustment is carried out on the initial metadata classification model to obtain a trained metadata classification model; and testing the trained metadata classification model according to the test set to determine the metadata classification model.

Optionally, the method as described above, the apparatus further comprises: an input module 408, configured to obtain any field name and field description information corresponding to the any field name; acquiring metadata in a binary group format according to any field name and field description information corresponding to the any field name, and determining the metadata in the binary group format as the metadata to be classified; and inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.

Optionally, the apparatus described above further comprises: a sending module 409, configured to: send the metadata category to a user terminal, so that the user terminal audits the metadata category to obtain an audit result; and receive the audit result sent by the user terminal and, if the audit result indicates that the audit is passed, obtain the metadata level corresponding to the metadata category according to a category-to-level mapping table.
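The lookup that follows a passed audit can be sketched as a dictionary access. The table contents, the numeric levels, and the name `resolve_level` are hypothetical; the application only states that a mapping table between categories and levels is used.

```python
# Hypothetical category-to-level mapping table, for illustration only.
CATEGORY_LEVEL_MAP = {
    "contact information": 3,
    "device information": 2,
}


def resolve_level(category, audit_passed, mapping=CATEGORY_LEVEL_MAP):
    """Return the level for an audited category, or None if the audit failed."""
    if not audit_passed:
        return None  # a failed audit is handled by a separate correction flow
    # A category missing from the table raises KeyError rather than guessing a level.
    return mapping[category]
```

Because the level is derived only from the audited category, the model needs to predict categories alone, and the levels can be revised by editing the table.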

Optionally, the apparatus described above further comprises: an optimizing module 410, configured to: receive the audit result sent by the user terminal and, if the audit result indicates that the audit is not passed, send the metadata to be classified to the user terminal, so that the user terminal obtains the metadata category according to the metadata to be classified; receive the metadata category sent by the user terminal; obtain metadata in the triplet format according to the metadata category and the metadata in the binary group format; determine the metadata in the triplet format as newly added sample data; and, when the newly added sample data meets a preset condition, optimize the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying the metadata to be classified.
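The correction loop can be sketched as a buffer of newly added samples. The sketch assumes that the preset condition is a minimum number of accumulated samples and that a triplet is ordered as `(field name, field description, category)`; both are assumptions, as are the class name `FeedbackBuffer` and the default threshold.

```python
class FeedbackBuffer:
    """Collects user-corrected triplets until enough exist to optimize the model."""

    def __init__(self, threshold=100):
        self.threshold = threshold  # assumed preset condition: sample count
        self.samples = []

    def add(self, field_name, description, corrected_category):
        """Combine the binary group with the user-supplied category into a triplet."""
        self.samples.append((field_name, description, corrected_category))

    def ready(self):
        """Return True once the assumed preset condition is met."""
        return len(self.samples) >= self.threshold

    def drain(self):
        """Hand over the accumulated triplets and reset the buffer."""
        batch, self.samples = self.samples, []
        return batch
```

A caller would check `ready()` after each `add()` and, when it returns True, pass the result of `drain()` to whatever routine retrains or fine-tunes the classification model.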

Fig. 5 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application. As shown in Fig. 5, the server of the present embodiment includes: at least one processor 501 and a memory 502. The memory stores computer-executable instructions, and the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the metadata processing method described above.

Optionally, the memory 502 may be either separate from or integrated with the processor 501.

When the memory 502 is provided separately, the server further comprises a bus 503 for connecting said memory 502 and the processor 501.

An embodiment of the present application also provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the metadata processing method described above is implemented.

Embodiments of the present application also provide a computer program product, including a computer program stored in a computer storage medium, from which at least one processor can read the computer program, and the metadata processing method as above can be implemented when the at least one processor executes the computer program.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required in the present application.

It should be further noted that, although the steps in the flowchart are sequentially shown as indicated by arrows, the steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.

It should be understood that the above-described device embodiments are merely illustrative, and that the device of the present application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.

In addition, each functional unit/module in each embodiment of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together, unless otherwise specified. The integrated units/modules described above may be implemented either in hardware or in software program modules.

If the integrated units/modules are implemented in hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage element may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).

If the integrated units/modules are implemented in the form of software program modules and sold or used as a stand-alone product, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, every such combination should be considered to be within the scope of this disclosure, provided that the combined features do not contradict one another.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A metadata processing method, applied to a server, comprising:

receiving a plurality of metadata sent by a data terminal, wherein the metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information;

acquiring field keywords corresponding to each field description information to obtain a plurality of field keywords;

obtaining a table keyword corresponding to each table name to obtain a plurality of table keywords;

setting a plurality of category rules according to the table keywords and the field keywords;

acquiring a category label of each field description information according to the category rules;

acquiring metadata in a triplet format according to each field description information, the field name corresponding to each field description information, and the category label of each field description information, so as to obtain a plurality of triplet-format metadata; and

obtaining a metadata classification model according to the plurality of triplet-format metadata.

2. The method of claim 1, wherein the obtaining the field keyword corresponding to each field description information includes:

acquiring the field keyword corresponding to each field description information by using a word segmentation method and a keyword extraction method.

3. The method of claim 1, wherein each table name is associated with a Chinese table name and an English table name;

correspondingly, the obtaining the table keyword corresponding to each table name includes:

splicing the Chinese table name and the English table name to obtain the table name; and

obtaining the table keyword corresponding to each table name by using a word segmentation method and a keyword extraction method.

4. The method of claim 1, wherein setting a plurality of category rules based on the plurality of table keywords and the plurality of field keywords comprises:

obtaining a plurality of field keywords in each table name according to each table keyword;

setting a category rule corresponding to each field keyword to obtain the category rules.

5. The method of any one of claims 1-4, wherein the obtaining a metadata classification model according to the plurality of triplet-format metadata comprises:

determining the plurality of triplet-format metadata as sample data, wherein the sample data comprises a training set, a validation set, and a test set;

constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model;

adjusting the parameters of the initial metadata classification model according to the validation set to obtain a trained metadata classification model; and

testing the trained metadata classification model according to the test set to determine the metadata classification model.

6. The method as recited in claim 1, further comprising:

acquiring any field name and the field description information corresponding to that field name;

acquiring metadata in a binary group format according to the field name and the corresponding field description information, and determining the metadata in the binary group format as the metadata to be classified; and

inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.

7. The method of claim 6, further comprising, after inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified:

sending the metadata category to a user terminal, so that the user terminal audits the metadata category to obtain an audit result; and

receiving the audit result sent by the user terminal and, if the audit result indicates that the audit is passed, obtaining the metadata level corresponding to the metadata category according to a category-to-level mapping table.

8. The method of claim 7, further comprising, after the sending the metadata category to a user terminal so that the user terminal audits the metadata category to obtain an audit result:

receiving the audit result sent by the user terminal and, if the audit result indicates that the audit is not passed, sending the metadata to be classified to the user terminal, so that the user terminal obtains the metadata category according to the metadata to be classified;

receiving the metadata category sent by the user terminal;

obtaining metadata in the triplet format according to the metadata category and the metadata in the binary group format;

determining the metadata in the triplet format as newly added sample data;

when the newly added sample data meets a preset condition, optimizing the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying the metadata to be classified.

9. A metadata processing apparatus, applied to a server, comprising:

the receiving module is used for receiving a plurality of metadata sent by the data terminal, wherein the metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information;

the first acquisition module is used for acquiring field keywords corresponding to each field description information so as to obtain a plurality of field keywords;

the second acquisition module is used for acquiring the table keywords corresponding to each table name so as to obtain a plurality of table keywords;

the setting module is used for setting a plurality of category rules according to the table keywords and the field keywords;

the third acquisition module is used for acquiring the category label of each field description information according to the plurality of category rules;

the fourth acquisition module is used for acquiring metadata in a triplet format according to each field description information, the field name corresponding to each field description information, and the category label of each field description information, so as to obtain a plurality of triplet-format metadata; and

the fifth acquisition module is used for obtaining a metadata classification model according to the plurality of triplet-format metadata.

10. A server, comprising: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored in the memory to implement the metadata processing method of any one of claims 1-8.

11. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the metadata processing method of any of claims 1-8.