
CN117313683A - Metadata processing method, device, server and storage medium - Google Patents

Publication number
CN117313683A
Authority
CN
China
Prior art keywords
metadata
field
category
description information
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311524393.1A
Other languages
Chinese (zh)
Inventor
李晓娟
贾玉武
周莉
秦宏伟
桑海岩
李大中
宋雨伦
倪明鉴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd and Unicom Digital Technology Co Ltd
Priority to CN202311524393.1A
Publication of CN117313683A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/177Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a metadata processing method, a metadata processing device, a server and a storage medium. The method comprises: receiving a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, and each table name comprises a plurality of field names and a plurality of pieces of field description information; acquiring a field keyword corresponding to each piece of field description information to obtain a plurality of field keywords; acquiring a table keyword corresponding to each table name to obtain a plurality of table keywords; setting a plurality of category rules according to the table keywords and the field keywords; acquiring a category label for each piece of field description information according to the plurality of category rules; obtaining metadata in a triplet format from each piece of field description information, its corresponding field name and its category label, so as to obtain a plurality of metadata in the triplet format; and obtaining a metadata classification model according to the plurality of metadata in the triplet format. The method improves the efficiency of metadata classification and grading.

Description

Metadata processing method, device, server and storage medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a metadata processing method, a metadata processing device, a server, and a storage medium.
Background
With the development of digitalization and informatization, enterprises generate massive amounts of data every day. However, the sheer volume of data and the presence of unstructured data types make data management more complex and increase the difficulty of identifying and classifying metadata.
In the prior art, classification and grading of metadata are mainly achieved by exporting the metadata to a file in a suitable format and then processing the exported data with a corresponding data processing tool or script.
However, metadata naming is generally irregular and non-uniform, so this approach requires a great deal of manual effort, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading.
Disclosure of Invention
The application provides a metadata processing method, a metadata processing device, a server and a storage medium, which are used for solving the technical problem of low efficiency in metadata classification and grading.
In a first aspect, the present application provides a metadata processing method, including:
receiving a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information;
acquiring a field keyword corresponding to each piece of field description information to obtain a plurality of field keywords;
acquiring a table keyword corresponding to each table name to obtain a plurality of table keywords;
setting a plurality of category rules according to the plurality of table keywords and the plurality of field keywords;
acquiring a category label for each piece of field description information according to the plurality of category rules;
obtaining metadata in a triplet format from each piece of field description information, the field name corresponding to that field description information, and the category label of that field description information, so as to obtain a plurality of metadata in the triplet format; and
obtaining a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used for classifying metadata to be classified.
Optionally, in the method as described above, the acquiring a field keyword corresponding to each piece of field description information includes: acquiring the field keyword corresponding to each piece of field description information by adopting a word segmentation method and a keyword extraction method.
Optionally, in the method as described above, each table name is associated with a Chinese table name and an English table name; correspondingly, the acquiring a table keyword corresponding to each table name includes: splicing the Chinese table name and the English table name to obtain the table name; and acquiring the table keyword corresponding to each table name by adopting a word segmentation method and a keyword extraction method.
Optionally, the method as described above, wherein the setting a plurality of category rules according to the plurality of table keywords and the plurality of field keywords includes: obtaining a plurality of field keywords in each table name according to each table keyword; setting a category rule corresponding to each field keyword to obtain the category rules.
Optionally, the method as described above, the obtaining a metadata classification model according to the metadata in the multiple triples format includes: determining the metadata in the multiple triplet formats as sample data; wherein the sample data comprises a training set, a validation set, and a test set; constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model; according to the verification set, parameter adjustment is carried out on the initial metadata classification model to obtain a trained metadata classification model; and testing the trained metadata classification model according to the test set to determine the metadata classification model.
Optionally, the method as described above further comprises: acquiring any field name and field description information corresponding to the any field name; acquiring metadata in a binary group format according to any field name and field description information corresponding to the any field name, and determining the metadata in the binary group format as the metadata to be classified; and inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.
Optionally, after the inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified, the method further includes: sending the metadata category to a user terminal so that the user terminal audits the metadata category to obtain an audit result; and receiving the audit result sent by the user terminal, and if the audit result indicates that the audit is passed, obtaining the metadata level corresponding to the metadata category according to a category and level mapping table.
Optionally, in the method as described above, after the sending the metadata category to a user terminal so that the user terminal audits the metadata category to obtain an audit result, the method further includes: receiving the audit result sent by the user terminal, and if the audit result indicates that the audit is not passed, sending the metadata to be classified to the user terminal so that the user terminal obtains the metadata category according to the metadata to be classified; receiving the metadata category sent by the user terminal; obtaining metadata in the triplet format according to the metadata category and the metadata in the binary group format; determining the metadata in the triplet format as newly added sample data; and when the newly added sample data meets a preset condition, optimizing the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying the metadata to be classified.
In a second aspect, the present application provides a metadata processing apparatus, including:
a receiving module, configured to receive a plurality of metadata sent by a data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information;
a first acquisition module, configured to acquire a field keyword corresponding to each piece of field description information to obtain a plurality of field keywords;
a second acquisition module, configured to acquire a table keyword corresponding to each table name to obtain a plurality of table keywords;
a setting module, configured to set a plurality of category rules according to the plurality of table keywords and the plurality of field keywords;
a third acquisition module, configured to acquire a category label for each piece of field description information according to the plurality of category rules;
a fourth acquisition module, configured to obtain metadata in a triplet format from each piece of field description information, the field name corresponding to that field description information, and the category label of that field description information, so as to obtain a plurality of metadata in the triplet format; and
a fifth acquisition module, configured to obtain a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used for classifying metadata to be classified.
In a third aspect, the present application provides a server comprising:
at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored in the memory, causing the at least one processor to perform the metadata processing method as described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the metadata processing method according to the first aspect and the various possible designs of the first aspect.
According to the metadata processing method, the metadata processing device, the server and the storage medium, a plurality of category rules are set by acquiring the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and the category label of each piece of field description information is acquired according to the category rules; a plurality of metadata in the triplet format is then obtained from each field name, each piece of field description information and its category label, and a metadata classification model is constructed and trained. This avoids the heavy manual effort, wasted human resources, and increased time and cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and, in turn, of metadata grading.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a metadata processing system according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a metadata processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a metadata processing method according to another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a metadata processing apparatus according to an embodiment of the present application;
FIG. 5 is a schematic hardware structure of a server according to an embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
With the development of digitalization and informatization, enterprises generate massive amounts of data every day. However, the sheer volume of data and the presence of unstructured data types make data management more complex and increase the difficulty of identifying and classifying metadata. In the prior art, classification and grading of metadata are mainly achieved by exporting the metadata to a file in a suitable format and then processing the exported data with a corresponding data processing tool or script. However, metadata naming is generally irregular and non-uniform, so this approach requires a great deal of manual effort, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading.
To solve the above technical problem, the embodiments of the present application provide the following technical idea. Because metadata naming is generally irregular and non-uniform, existing approaches require a great deal of manual effort, which wastes human resources, increases time cost, and reduces the efficiency of metadata classification and grading. The inventors therefore propose setting category rules according to the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and acquiring the category label of each piece of field description information according to the category rules; a plurality of metadata in the triplet format is then obtained from each field name, each piece of field description information and its category label, and a metadata classification model is constructed and trained. This avoids the wasted human resources and increased time and cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and grading.
The metadata processing method aims at solving the technical problems in the prior art.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic application scenario diagram of metadata processing provided in an embodiment of the present application. As shown in fig. 1, the application scenario includes: a data terminal 101, a server 102, and a display terminal 103.
The data terminal 101 may be a computer or a terminal device.
The server 102 may be one server or a cluster formed by a plurality of servers.
The display terminal 103 may be a platform device, a display terminal, or the like.
Referring to fig. 1, a data terminal 101 transmits a plurality of metadata to a server 102; the server 102 performs a series of processes on the plurality of metadata to construct a metadata classification model; the metadata classification model is used for classifying metadata to be classified, and sending metadata categories to the display terminal 103 for display.
Fig. 2 is a flow chart of a metadata processing method according to an embodiment of the present application, and the execution subject of the embodiment may be the server 102 in the embodiment of fig. 1, or may be another server with similar functions, which is not particularly limited herein. As shown in fig. 2, the method includes:
S201: Receive a plurality of metadata sent by the data terminal, wherein the plurality of metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of pieces of field description information, and each field name corresponds to one piece of field description information.
The metadata includes: database name, database description, table name, table description, field name, field description, field type, field maximum length, and the like.
In this embodiment, after metadata is obtained, some data cleansing operations are required, including but not limited to: duplicate data removal, processing missing values, data type conversion and normalization, and the like.
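The cleansing operations listed above can be sketched as follows. This is a minimal illustration only: the `FieldMeta` record, the `clean` function, and the choice to normalize names to lower case and to de-duplicate on (table name, field name) are assumptions made for the example and are not specified in the application.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical record for one field of received metadata."""
    table_name: str
    field_name: str
    field_desc: Optional[str]


def clean(records: Iterable[FieldMeta]) -> List[FieldMeta]:
    """Normalize names, fill missing descriptions, and drop duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        # Normalization (assumption): trim whitespace and lower-case names.
        table = (r.table_name or "").strip().lower()
        field = (r.field_name or "").strip().lower()
        if not table or not field:
            continue  # a record without a table or field name is unusable
        # Missing value handling (assumption): a missing description becomes "".
        desc = (r.field_desc or "").strip()
        key = (table, field)
        if key in seen:
            continue  # duplicate removal: keep the first occurrence
        seen.add(key)
        cleaned.append(FieldMeta(table, field, desc))
    return cleaned
```

Keeping the first occurrence of a duplicate is an arbitrary choice here; a real pipeline might prefer the record with the most complete description.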
S202: Acquire a field keyword corresponding to each piece of field description information to obtain a plurality of field keywords.
Specifically, a word segmentation method and a keyword extraction method are adopted to obtain the field keyword corresponding to each piece of field description information.
In this embodiment, a word segmentation technique from natural language processing is used to break the field description information into individual words, and a keyword extraction technique is then used to extract keywords from the segmented field description information.
Optionally, the word segmentation technique may be HanLP word segmentation or space word segmentation, and the keyword extraction technique may be TF-IDF or TextRank.
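As a rough illustration of segmentation followed by TF-IDF keyword extraction, the sketch below scores the words of each field description against the other descriptions. The `tokenize` and `tfidf_keywords` names, the regular-expression tokenizer, and the smoothed IDF formula are assumptions for the example; the application does not fix a formula, and Chinese descriptions would need a real segmenter such as HanLP in place of `tokenize`.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Stand-in segmenter (assumption): lower-cased alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(docs: Dict[str, str], top_k: int = 2) -> Dict[str, List[str]]:
    """Return the top_k TF-IDF words for each field description.

    docs maps a field name to its description text.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many descriptions each word appears.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))
    result = {}
    for name, words in tokens.items():
        tf = Counter(words)
        scores = {
            # Smoothed IDF (assumption): log((1 + N) / (1 + df)) + 1.
            w: (count / len(words)) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = ranked[:top_k]
    return result
```

Words shared by every description (for example "of", "the") receive the lowest IDF and therefore drop out of the top-ranked keywords without an explicit stop-word list.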
S203: Acquire a table keyword corresponding to each table name to obtain a plurality of table keywords.
Each table name is associated with a Chinese table name and an English table name.
Here, the table name serves as the table description information.
Specifically, the Chinese table name and the English table name are spliced to obtain the table name, and a word segmentation method and a keyword extraction method are adopted to obtain the table keyword corresponding to each table name.
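The splicing step might look like the sketch below. The function names, the stop-word set, and the sample table names are hypothetical; the regular expression only separates Latin-script tokens from runs of Chinese characters and does not perform real Chinese word segmentation, which the application delegates to a segmenter.

```python
import re
from typing import List

# Hypothetical stop words for table names; not taken from the application.
TABLE_STOPWORDS = {"t", "tbl", "table", "info"}


def splice_table_name(cn_name: str, en_name: str) -> str:
    """Concatenate the Chinese and English table names into one string."""
    return f"{cn_name} {en_name}".strip()


def table_keywords(cn_name: str, en_name: str) -> List[str]:
    """Split the spliced name into tokens and drop stop words."""
    spliced = splice_table_name(cn_name, en_name).lower()
    # Latin alphanumeric runs, or unbroken runs of CJK characters.
    tokens = re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]+", spliced)
    return [t for t in tokens if t not in TABLE_STOPWORDS]
```

Splicing both names before extraction means a keyword present in only one of the two languages still contributes to the table keywords.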
S204: a plurality of category rules are set based on the plurality of table keywords and the plurality of field keywords.
Specifically, according to each table keyword, obtaining a plurality of field keywords in each table name; setting a category rule corresponding to each field keyword to obtain a plurality of category rules.
Specifically, by matching each field keyword with metadata or an actual table structure in a database, a table name where each field description information is located is determined, so as to obtain field keywords corresponding to a plurality of field description information in each table name, which may involve keyword matching or pattern matching.
In this embodiment, a rule is formulated for each field keyword based on a plurality of field keywords. Alternatively, the rules may be content-based or schema-based.
S205: Acquire a category label for each piece of field description information according to the plurality of category rules.
Specifically, each piece of field description information and its field name are matched against the formulated rules, and if the matching succeeds, the corresponding category label is added for that field description information and field name.
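One simple reading of keyword-based category rules is sketched below: each rule pairs a field keyword with a category, and a field is labeled by the first rule whose keyword occurs in its name or description. The `CategoryRule` type, the substring match, the first-match policy, and the sample categories are assumptions; the application only states that rules may be content-based or schema-based.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CategoryRule:
    """Hypothetical rule: a field keyword mapped to a category label."""
    keyword: str
    category: str


def label_field(field_name: str, field_desc: str,
                rules: List[CategoryRule]) -> Optional[str]:
    """Return the category of the first matching rule, or None if none match."""
    haystack = f"{field_name} {field_desc}".lower()
    for rule in rules:
        if rule.keyword.lower() in haystack:
            return rule.category
    return None
```

Fields that match no rule are returned as `None` rather than given a default label, so they can be left out of the training samples or reviewed separately.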
S206: Obtain metadata in the triplet format from each piece of field description information, the field name corresponding to that field description information, and the category label of that field description information, so as to obtain a plurality of metadata in the triplet format.
Illustratively, the metadata in the triplet format is <category label, field name, field description>, where the field description may be null.
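The triplet construction can be sketched as below. `MetadataTriplet`, `build_triplets`, and the `labeler` callback are hypothetical names; the callback stands in for whatever rule matching produced the category label, and a null description is stored as an empty string as one possible representation.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class MetadataTriplet(NamedTuple):
    """<category label, field name, field description>."""
    category: str
    field_name: str
    field_desc: str


def build_triplets(
    fields: Iterable[Tuple[str, Optional[str]]],
    labeler: Callable[[str, str], Optional[str]],
) -> List[MetadataTriplet]:
    """Turn (field name, description) pairs into labeled triplets.

    Fields for which the labeler returns None are skipped (assumption).
    """
    triplets = []
    for name, desc in fields:
        desc = desc or ""  # the field description may be null
        category = labeler(name, desc)
        if category is not None:
            triplets.append(MetadataTriplet(category, name, desc))
    return triplets
```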
S207: Obtain a metadata classification model according to the plurality of metadata in the triplet format, wherein the metadata classification model is used for classifying metadata to be classified.
Specifically, step S207 includes S2071 to S2074:
S2071: Determine the plurality of metadata in the triplet format as sample data, wherein the sample data includes a training set, a validation set, and a test set.
In machine learning, data is generally divided into a training set, a validation set, and a test set. The training set is used for training the model, the validation set is used for tuning the hyperparameters of the model, and the test set is used for evaluating the performance of the model.
Optionally, a certain proportion of the metadata is used as the training set and the validation set, and the remaining metadata is used as the test set. Illustratively, 60% of the sample data is selected as the training set, 20% as the validation set, and 20% as the test set.
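The illustrative 60/20/20 split could be implemented as in the sketch below. The function name, the shuffling, and the fixed seed are assumptions added for reproducibility; the application only gives the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], train: float = 0.6, val: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and split them into train / validation / test.

    Whatever is left after the train and validation shares becomes the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

With imbalanced categories, a stratified split per category label may be preferable so that rare categories appear in all three sets.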
S2072: Construct a deep learning network model and input the training set into it for training to obtain an initial metadata classification model.
Specifically, the structure of the deep learning network is constructed. This may include selecting appropriate convolution layers, pooling layers, fully connected layers, and the like, as well as setting the parameters of each layer. The training set is then input into the deep learning network model, and the model is optimized by computing the gradient of the loss function and updating the parameters, yielding the initial metadata classification model.
S2073: Adjust the parameters of the initial metadata classification model according to the validation set to obtain a trained metadata classification model.
Specifically, the performance of the model is evaluated using the validation set. If the performance does not improve, the hyperparameters may be adjusted in an attempt to improve it.
S2074: Test the trained metadata classification model according to the test set to determine the metadata classification model.
Specifically, the trained metadata classification model is tested with the test set, evaluated, and determined as the metadata classification model.
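The train / validate / test workflow of S2072 to S2074 is sketched below with a deliberately simple stand-in. Note the substitution: the application describes a deep learning network, whereas this example uses a small multinomial Naive Bayes text classifier so that it stays self-contained; the smoothing value `alpha` plays the role of the hyperparameter tuned on the validation set. All names here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, str]  # (category label, field name, field description)


def tokens(name: str, desc: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", f"{name} {desc}".lower())


class NaiveBayes:
    """Stand-in classifier (not the deep learning network of the application)."""

    def __init__(self, alpha: float):
        self.alpha = alpha  # additive smoothing; the tuned hyperparameter
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples: Sequence[Sample]) -> "NaiveBayes":
        for category, name, desc in samples:
            self.class_counts[category] += 1
            for w in tokens(name, desc):
                self.word_counts[category][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, name: str, desc: str) -> str:
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)  # log prior
            for w in tokens(name, desc):
                score += math.log(
                    (self.word_counts[c][w] + self.alpha)
                    / (n_words + self.alpha * len(self.vocab))
                )
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model: NaiveBayes, samples: Sequence[Sample]) -> float:
    return sum(model.predict(n, d) == c for c, n, d in samples) / len(samples)


def select_model(train, val, alphas=(0.1, 1.0, 10.0)) -> NaiveBayes:
    """Train one model per hyperparameter value; keep the best on validation."""
    candidates = [NaiveBayes(a).fit(train) for a in alphas]
    return max(candidates, key=lambda m: accuracy(m, val))
```

After `select_model` returns, `accuracy(model, test_set)` gives the held-out estimate corresponding to S2074; the test set is never used to choose `alpha`.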
As can be seen from the above, in this embodiment, a plurality of category rules are set by acquiring the table keywords corresponding to the table names and the field keywords corresponding to the field description information in each metadata, and the category label of each piece of field description information is acquired according to the category rules; a plurality of metadata in the triplet format is then obtained from each field name, each piece of field description information and its category label, and a metadata classification model is constructed and trained. This avoids the heavy manual effort, wasted human resources, and increased time and cost caused by irregular and non-uniform metadata naming, and improves the efficiency of metadata classification and, in turn, of metadata grading.
Fig. 3 is a schematic flow chart of a metadata processing method according to another embodiment of the present application. The execution subject of this embodiment may be the server 102 in the embodiment of fig. 1, or may be another server with similar functions, which is not particularly limited herein. This embodiment focuses on the process of classifying metadata to be classified with the metadata classification model and acquiring the metadata level. As shown in fig. 3, the method includes:
S301: Acquire any field name and the field description information corresponding to that field name.
S302: Obtain metadata in a binary group format from the field name and its corresponding field description information, and determine the metadata in the binary group format as the metadata to be classified.
Illustratively, the metadata in the binary group format is <field name, field description>.
S303: Input the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.
Specifically, the <field name, field description> is input into the metadata classification model for prediction, and the category corresponding to the metadata is obtained.
S304: Send the metadata category to the user terminal so that the user terminal audits the metadata category to obtain an audit result.
Each user terminal is associated with an auditor.
Specifically, the auditor checks whether the metadata category is correct.
S305: Receive the audit result sent by the user terminal, and if the audit result indicates that the audit is passed, obtain the metadata level corresponding to the metadata category according to the category and level mapping table.
The category and level mapping table refers to a table or mapping relation that maps metadata categories to their corresponding levels.
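A category and level mapping table can be as simple as a dictionary lookup gated on the audit result, as in the sketch below. The category names and level numbers are invented for illustration and do not come from the application.

```python
from typing import Dict, Optional

# Hypothetical mapping table; categories and levels are illustrative only.
CATEGORY_LEVELS: Dict[str, int] = {
    "identity information": 4,
    "contact information": 3,
    "public information": 1,
}


def metadata_level(
    category: str, audit_passed: bool, mapping: Dict[str, int] = CATEGORY_LEVELS
) -> Optional[int]:
    """Return the level for an approved category, otherwise None.

    None is also returned when the category is absent from the mapping table.
    """
    if not audit_passed:
        return None  # a rejected category goes back to the auditor instead
    return mapping.get(category)
```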
S306: Receive the audit result sent by the user terminal, and if the audit result indicates that the audit is not passed, send the metadata to be classified to the user terminal so that the user terminal obtains the metadata category according to the metadata to be classified.
Specifically, if the audit is not passed, the auditor is required to give the true category of the metadata.
S307: Receive the metadata category sent by the user terminal.
S308: Obtain metadata in the triplet format according to the metadata category and the metadata in the binary group format.
Specifically, the metadata in the <category label, field name, field description> triplet format is obtained from the metadata category and the <field name, field description> binary group.
S309: Determine the metadata in the triplet format as newly added sample data.
S310: When the newly added sample data meets a preset condition, optimize the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying metadata to be classified.
Specifically, if the amount of newly added sample data exceeds a preset threshold or the time since the last iteration exceeds a preset duration, the newly added sample data is merged into the sample data, and the metadata classification model is optimized to obtain the optimized metadata classification model.
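The feedback loop of S306 to S310 can be sketched as a small buffer that collects auditor corrections as triplets and reports when the preset condition is met. The `FeedbackBuffer` class, its default threshold and interval, and the rule that an empty buffer never triggers retraining are assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (category label, field name, field description)


@dataclass
class FeedbackBuffer:
    """Hypothetical store for newly added sample data from failed audits."""
    max_samples: int = 100                    # preset threshold (assumed value)
    max_interval_s: float = 7 * 24 * 3600.0   # preset time (assumed value)
    last_update: float = field(default_factory=time.time)
    samples: List[Triplet] = field(default_factory=list)

    def add_correction(self, category: str, field_name: str, field_desc: str) -> None:
        """Combine the auditor's category with the <field name, description> pair."""
        self.samples.append((category, field_name, field_desc))

    def should_retrain(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if not self.samples:
            return False  # assumption: nothing new to learn from
        too_many = len(self.samples) > self.max_samples
        too_old = (now - self.last_update) > self.max_interval_s
        return too_many or too_old

    def drain(self, now: Optional[float] = None) -> List[Triplet]:
        """Hand the buffered triplets over for merging into the sample data."""
        batch, self.samples = self.samples, []
        self.last_update = time.time() if now is None else now
        return batch
```

A caller would check `should_retrain()` periodically, merge the result of `drain()` into the existing sample data, and then re-run training to obtain the optimized model.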
As can be seen from the above, in this embodiment, if the audit result for the metadata is that the audit is not passed, the metadata category is obtained from the auditor; the metadata in the binary group format that failed the audit is converted into metadata in the triplet format and determined as newly added sample data, and the metadata classification model is optimized with it to obtain an optimized metadata classification model, which further improves the accuracy of the metadata classification model.
Fig. 4 is a schematic structural diagram of a metadata processing apparatus according to an embodiment of the present application. As shown in fig. 4, the metadata processing apparatus includes: a receiving module 401, a first acquiring module 402, a second acquiring module 403, a setting module 404, a third acquiring module 405, a fourth acquiring module 406, and a fifth acquiring module 407.
The receiving module 401 is configured to receive a plurality of metadata sent by a data terminal, where the plurality of metadata includes a plurality of table names, each table name includes a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information.
The first obtaining module 402 is configured to obtain field keywords corresponding to each field description information, so as to obtain a plurality of field keywords.
The second obtaining module 403 is configured to obtain a table keyword corresponding to each table name, so as to obtain a plurality of table keywords.
The setting module 404 is configured to set a plurality of category rules according to the plurality of table keywords and the plurality of field keywords.
The third obtaining module 405 is configured to obtain a category label of each field description information according to the plurality of category rules.
The fourth obtaining module 406 is configured to obtain metadata in a triplet format according to each field description information, the field name corresponding to each field description information, and the category label of each field description information, so as to obtain a plurality of metadata in the triplet format.
The fifth obtaining module 407 is configured to obtain a metadata classification model according to the plurality of metadata in the triplet format, where the metadata classification model is used to classify metadata to be classified.
Optionally, the first obtaining module 402 is specifically configured to: obtain the field keywords corresponding to each field description information by using a word segmentation method and a keyword extraction method.
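The application does not name a specific word segmentation or keyword extraction algorithm. As a minimal sketch under that caveat, the following Python code uses a regular-expression tokenizer and term frequency as stand-ins; the stop-word list and function names are hypothetical, and Chinese field descriptions would require a dedicated word segmenter in place of `segment`.

```python
import re
from collections import Counter
from typing import List

# Illustrative stop-word list only; a real deployment would use a fuller one.
STOP_WORDS = {"the", "of", "a", "an", "and", "or", "for", "to", "in"}


def segment(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens (stand-in for word segmentation)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def extract_keywords(text: str, top_k: int = 3) -> List[str]:
    """Return up to top_k most frequent non-stop-word tokens (stand-in for keyword extraction)."""
    tokens = [t for t in segment(text) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]
```

Term frequency is the simplest possible ranking; a weighting scheme that discounts words common to all field descriptions would likely yield more discriminative keywords.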
Optionally, each table name is associated with a Chinese table name and an English table name; accordingly, the second obtaining module 403 is specifically configured to: splice the Chinese table name and the English table name to obtain the table name; and obtain the table keywords corresponding to each table name by using a word segmentation method and a keyword extraction method.
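The splicing step can be sketched as follows. The function name `splice_table_name`, the space separator, and the example table names are assumptions; the application does not specify how the two names are joined.

```python
def splice_table_name(chinese_name: str, english_name: str, sep: str = " ") -> str:
    """Join the Chinese and English table names, skipping any empty part."""
    parts = [p.strip() for p in (chinese_name, english_name) if p and p.strip()]
    return sep.join(parts)


# Hypothetical example values, not taken from the application.
table_name = splice_table_name("客户信息表", "customer_info")
```

Keeping a separator between the two names lets a downstream segmenter treat the Chinese and English parts as distinct tokens.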
Optionally, the setting module 404 is specifically configured to: obtain a plurality of field keywords in each table name according to each table keyword; and set a category rule corresponding to each field keyword to obtain the plurality of category rules.
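One way to represent the category rules is a lookup keyed by a table keyword and a field keyword, as sketched below. The rule contents, the first-match policy, and the names `RULES`, `label_field`, and `build_triplet` are hypothetical; the application leaves the rule format open.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: (table keyword, field keyword) -> category label.
CategoryRules = Dict[Tuple[str, str], str]

RULES: CategoryRules = {
    ("customer", "phone"): "contact information",
    ("customer", "id"): "identity information",
    ("billing", "amount"): "financial information",
}


def label_field(table_keywords: List[str], field_keywords: List[str],
                rules: CategoryRules) -> Optional[str]:
    """Return the first matching category label, or None if no rule applies."""
    for table_kw in table_keywords:
        for field_kw in field_keywords:
            label = rules.get((table_kw, field_kw))
            if label is not None:
                return label
    return None


def build_triplet(table_keywords: List[str], field_name: str,
                  field_description: str, field_keywords: List[str],
                  rules: CategoryRules) -> Optional[Tuple[str, str, str]]:
    """Build <category label, field name, field description>, or None if unlabelled."""
    label = label_field(table_keywords, field_keywords, rules)
    if label is None:
        return None
    return (label, field_name, field_description)
```

Fields for which no rule matches return `None` here; how such fields are handled (dropped, or sent for manual labelling) is a design decision the application does not address.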
Optionally, the fifth obtaining module 407 is specifically configured to: determine the plurality of metadata in the triplet format as sample data, wherein the sample data comprises a training set, a validation set, and a test set; construct a deep learning network model, and input the training set into the deep learning network model for training to obtain an initial metadata classification model; adjust the parameters of the initial metadata classification model according to the validation set to obtain a trained metadata classification model; and test the trained metadata classification model according to the test set to determine the metadata classification model.
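The division of the sample data into a training set, a validation set, and a test set can be sketched as follows. The 8:1:1 ratio, the fixed random seed, and the function name `split_samples` are assumptions; the application does not state the proportions, and the deep learning model itself is outside the scope of this sketch.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float = 0.8,
                  val_ratio: float = 0.1,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and split them into (training, validation, test) lists."""
    if not 0 < train_ratio < 1 or not 0 <= val_ratio < 1:
        raise ValueError("ratios must lie in [0, 1)")
    if train_ratio + val_ratio >= 1:
        raise ValueError("train_ratio + val_ratio must leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

The test set takes whatever remains after the first two slices, so every sample lands in exactly one of the three sets.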
Optionally, the apparatus further comprises an input module 408, configured to: obtain any field name and the field description information corresponding to that field name; obtain metadata in a binary group (two-tuple) format according to the field name and its corresponding field description information, and determine the metadata in the binary group format as the metadata to be classified; and input the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.
Optionally, the apparatus further comprises a sending module 409, configured to: send the metadata category to a user terminal, so that the user terminal audits the metadata category to obtain an audit result; and receive the audit result sent by the user terminal, and if the audit result is determined to be that the audit is passed, obtain the metadata level corresponding to the metadata category according to a category-to-level mapping table.
Optionally, the apparatus further comprises an optimizing module 410, configured to: receive the audit result sent by the user terminal, and if the audit result is determined to be that the audit is not passed, send the metadata to be classified to the user terminal, so that the user terminal obtains the metadata category according to the metadata to be classified; receive the metadata category sent by the user terminal; obtain metadata in the triplet format according to the metadata category and the metadata in the binary group (two-tuple) format; determine the metadata in the triplet format as newly added sample data; and when the newly added sample data meets a preset condition, optimize the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model, wherein the optimized metadata classification model is used for classifying the metadata to be classified.
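The lookup of a metadata level after a passed audit can be sketched as below. The table contents, the integer level values, and the function name `level_for_audited_category` are hypothetical; the application only states that a mapping table between categories and levels is consulted.

```python
from typing import Dict, Optional

# Hypothetical category-to-level mapping table; values are illustrative only.
CATEGORY_TO_LEVEL: Dict[str, int] = {
    "identity information": 4,
    "contact information": 3,
    "financial information": 3,
}


def level_for_audited_category(category: str, audit_passed: bool,
                               mapping: Dict[str, int]) -> Optional[int]:
    """Return the metadata level if the audit passed; None if it did not."""
    if not audit_passed:
        # A failed audit is handled by the feedback path, not by the level lookup.
        return None
    if category not in mapping:
        raise KeyError(f"no level defined for category {category!r}")
    return mapping[category]
```

Raising an error for an unmapped category, rather than returning a default level, avoids silently assigning a level that the mapping table does not define.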
Fig. 5 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application. As shown in Fig. 5, the server of the present embodiment includes: at least one processor 501 and a memory 502; the memory stores computer-executable instructions; and the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the metadata processing method described above.
Optionally, the memory 502 may be separate from, or integrated with, the processor 501.
When the memory 502 is provided separately, the server further comprises a bus 503 for connecting the memory 502 and the processor 501.
The embodiment of the application also provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the metadata processing method described above is implemented.
Embodiments of the present application also provide a computer program product, including a computer program stored in a computer storage medium, from which at least one processor can read the computer program, and the metadata processing method as above can be implemented when the at least one processor executes the computer program.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required in the present application.
It should be further noted that, although the steps in the flowchart are sequentially shown as indicated by arrows, the steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.
It should be understood that the above-described device embodiments are merely illustrative, and that the device of the present application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
In addition, each functional unit/module in each embodiment of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together, unless otherwise specified. The integrated units/modules described above may be implemented either in hardware or in software program modules.
The integrated units/modules, if implemented in hardware, may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage element may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
The integrated units/modules may be stored in a computer-readable memory if implemented in the form of software program modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive (U-disk), a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be arbitrarily combined. For brevity, not all possible combinations of these technical features are described; however, all such combinations should be considered as being within the scope of the disclosure.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (11)

1. A metadata processing method, applied to a server, comprising:
receiving a plurality of metadata sent by a data terminal, wherein the metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information;
acquiring field keywords corresponding to each field description information to obtain a plurality of field keywords;
obtaining a table keyword corresponding to each table name to obtain a plurality of table keywords;
setting a plurality of category rules according to the table keywords and the field keywords;
acquiring a category label of each field description information according to the category rules;
acquiring metadata in a triplet format according to the field description information, the field name corresponding to the field description information and the category label of the field description information, so as to acquire metadata in a plurality of triplet formats;
and obtaining a metadata classification model according to the metadata in the multiple triplet formats, wherein the metadata classification model is used for classifying metadata to be classified.
2. The method of claim 1, wherein the obtaining the field keyword corresponding to each field description information includes:
acquiring the field keywords corresponding to each field description information by using a word segmentation method and a keyword extraction method.
3. The method of claim 1, wherein each table name is associated with a Chinese table name and an English table name;
correspondingly, the obtaining the table keyword corresponding to each table name includes:
splicing the Chinese table name and the English table name to obtain the table name;
and obtaining the table keywords corresponding to each table name by using a word segmentation method and a keyword extraction method.
4. The method of claim 1, wherein setting a plurality of category rules based on the plurality of table keywords and the plurality of field keywords comprises:
obtaining a plurality of field keywords in each table name according to each table keyword;
setting a category rule corresponding to each field keyword to obtain the category rules.
5. The method of any of claims 1-4, wherein the deriving a metadata classification model from the plurality of triplet-formatted metadata comprises:
determining the metadata in the multiple triplet formats as sample data; wherein the sample data comprises a training set, a validation set, and a test set;
constructing a deep learning network model, and inputting the training set into the deep learning network model for training to obtain an initial metadata classification model;
according to the verification set, parameter adjustment is carried out on the initial metadata classification model to obtain a trained metadata classification model;
and testing the trained metadata classification model according to the test set to determine the metadata classification model.
6. The method as recited in claim 1, further comprising:
acquiring any field name and field description information corresponding to the any field name;
acquiring metadata in a binary group format according to any field name and field description information corresponding to the any field name, and determining the metadata in the binary group format as the metadata to be classified;
and inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified.
7. The method of claim 6, further comprising, after inputting the metadata to be classified into the metadata classification model to output the metadata category of the metadata to be classified:
transmitting the metadata category to a user terminal so that the user terminal can audit the metadata category to obtain an audit result;
and receiving the audit result sent by the user terminal, and if the audit result is determined to be that the audit is passed, obtaining the metadata level corresponding to the metadata category according to a category-to-level mapping table.
8. The method of claim 7, further comprising, after sending the metadata category to the user terminal so that the user terminal audits the metadata category to obtain the audit result:
receiving the audit result sent by the user terminal, and if the audit result is determined to be that the audit is not passed, sending the metadata to be classified to the user terminal so that the user terminal obtains the metadata category according to the metadata to be classified;
receiving the metadata category sent by the user terminal;
obtaining metadata in the triplet format according to the metadata category and the metadata in the binary group (two-tuple) format;
determining the metadata in the triplet format as newly added sample data;
when the newly added sample data meets preset conditions, optimizing the metadata classification model according to the newly added sample data to obtain an optimized metadata classification model; the optimized metadata classification model is used for classifying the metadata to be classified.
9. A metadata processing apparatus, applied to a server, comprising:
the receiving module is used for receiving a plurality of metadata sent by the data terminal, wherein the metadata comprises a plurality of table names, each table name comprises a plurality of field names and a plurality of field description information, and each field name corresponds to one field description information;
the first acquisition module is used for acquiring field keywords corresponding to each field description information so as to obtain a plurality of field keywords;
the second acquisition module is used for acquiring the table keywords corresponding to each table name so as to obtain a plurality of table keywords;
the setting module is used for setting a plurality of category rules according to the table keywords and the field keywords;
the third acquisition module is used for acquiring the category label of each field description information according to the plurality of category rules;
a fourth obtaining module, configured to obtain metadata in a triplet format according to each field description information, the field name corresponding to each field description information, and the category label of each field description information, so as to obtain a plurality of metadata in the triplet format;
and a fifth obtaining module, configured to obtain a metadata classification model according to the metadata in the multiple triplet formats, where the metadata classification model is used to classify metadata to be classified.
10. A server, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the metadata processing method of any one of claims 1-8.
11. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the metadata processing method of any of claims 1-8.
CN202311524393.1A 2023-11-15 2023-11-15 Metadata processing method, device, server and storage medium Pending CN117313683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524393.1A CN117313683A (en) 2023-11-15 2023-11-15 Metadata processing method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN117313683A true CN117313683A (en) 2023-12-29

Family

ID=89242965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311524393.1A Pending CN117313683A (en) 2023-11-15 2023-11-15 Metadata processing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN117313683A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556050A (en) * 2024-01-12 2024-02-13 长春吉大正元信息技术股份有限公司 Data classification and classification method and device, electronic equipment and storage medium
CN117556050B (en) * 2024-01-12 2024-04-12 长春吉大正元信息技术股份有限公司 Data classification and classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
KR102576320B1 (en) Apparatus for amplifying training dataset for deep learning based generative ai system and method thereof
CN111539612B (en) Training method and system of risk classification model
CN104063314A (en) Test data automatic generation device and test data automatic generation method
WO2022154897A1 (en) Classifier assistance using domain-trained embedding
CN103885966A (en) Question and answer interaction method and system of electronic commerce transaction platform
CN117556369B (en) Power theft detection method and system for dynamically generated residual error graph convolution neural network
JP2020113044A (en) Data expansion program, data expansion method, and data expansion device
CN117313683A (en) Metadata processing method, device, server and storage medium
CN114329455B (en) User abnormal behavior detection method and device based on heterogeneous graph embedding
CN115170874A (en) Self-distillation implementation method based on decoupling distillation loss
CN110472659A (en) Data processing method, device, computer readable storage medium and computer equipment
CN112015895B (en) Patent text classification method and device
CN110929085B (en) System and method for processing electric customer service message generation model sample based on meta-semantic decomposition
CN116860583A (en) Database performance optimization method and device, storage medium and electronic equipment
CN118331890B (en) Data batch generation method for defining large language model based on token training
CN116187299B (en) Scientific and technological project text data verification and evaluation method, system and medium
CN117235366B (en) Collaborative recommendation method and system based on content relevance
CN113987309B (en) Personal privacy data identification method and device, computer equipment and storage medium
US11398161B1 (en) Systems and methods for detecting unusually frequent exactly matching and nearly matching test responses
CN117520548A (en) Metadata processing method, device, server and storage medium
CN117235521A (en) Processing method and electronic equipment
KR20230172283A (en) Device and Method for Generating Training Data of Language Model
KR20230146399A (en) Apparatus for conversation clustering and control method thereof
White et al. Investigation into the application of data mining techniques to classification of call centre data
CN118261567A (en) Report generation and risk early warning method, device and equipment based on large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination