
CN113298145A - Label filling method and device - Google Patents


Info

Publication number
CN113298145A
Authority
CN
China
Prior art keywords
label
labels
original
determining
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110567882.XA
Other languages
Chinese (zh)
Inventor
陈靖
刘伟煜
刘义
王磊
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202110567882.XA
Publication of CN113298145A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a label filling method and device. The method comprises: acquiring an original label of a target object, wherein the original label is a current label of a user portrait of the target object; determining at least one label set based on the original label; and determining a target label based on the at least one label set, wherein the target label is used to fill the user portrait of the target object. The invention solves the technical problem in the related art that labels acquired for filling a user portrait have low accuracy.

Description

Label filling method and device
Technical Field
The invention relates to the field of label filling, in particular to a label filling method and a label filling device.
Background
At present, owing to limitations of data sources, existing user label systems contain many missing labels; that is, from the perspective of an individual user, the user's labels are incomplete. Incomplete user labels not only yield a fragmented user portrait but also reduce the accuracy of subsequent data analysis and modeling based on that portrait. Here, user portrait refers to the process by which, in the era of big data, an enterprise cleans, clusters, and analyzes massive amounts of data, abstracts the data into labels, and then uses those labels to characterize the user.
At present, user portraits are generally completed by automatically filling in labels: users are divided into similar user groups according to their labels, and a user's labels are filled in according to the labels of other users in the same group. However, labels filled in this way have low accuracy.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
An embodiment of the invention provides a label filling method and device, to at least solve the technical problem in the related art that labels acquired for filling a user portrait have low accuracy.
According to an aspect of an embodiment of the present invention, there is provided a label filling method, including: acquiring an original label of a target object, wherein the original label is a current label of a user portrait of the target object; determining at least one label set based on the original label; and determining a target label based on the at least one label set, wherein the target label is used to fill the user portrait of the target object.
Optionally, the at least one label set comprises a first label set, and determining at least one label set based on the original label comprises: classifying the original label based on a first preset model to obtain a classification result, wherein the first preset model is obtained by training on a first sample; and determining the first label set based on the classification result, wherein the first label set is a set of labels in the same category as the original label.
Optionally, the at least one label set comprises a second label set, and the method further comprises: performing semantic analysis on the original label based on a second preset model to obtain a first analysis result, wherein the second preset model is obtained by training on a second sample; and determining the second label set based on the first analysis result, wherein the second label set is a set of labels whose cosine similarity to the original label is greater than a preset similarity.
Optionally, the at least one label set comprises a third label set, and the method further comprises: performing association analysis on the original label based on a third preset model to obtain a second analysis result, wherein the third preset model is obtained by training on a third sample; and determining the third label set based on the second analysis result, wherein the third label set is a set of labels whose degree of association with the original label is greater than a preset degree of association.
Optionally, determining the target label based on the at least one label set comprises: de-duplicating the labels in the first label set, the second label set, and the third label set to obtain the target label.
Optionally, the method further comprises: determining a target matrix from a user label library, wherein the target matrix describes the correspondence between users and labels; determining a label vector corresponding to each label by using the target matrix; clustering the label vectors to obtain the first sample, wherein the first sample describes the category corresponding to each label; and training a first initial model by using the first sample to obtain the first preset model.
Optionally, the method further comprises: determining a word vector of each label from a user label library; determining the cosine similarity between each label and the other labels based on the word vector of that label and the word vectors of the other labels in the user label library; determining the second sample based on the cosine similarity between each label and the other labels, wherein the second sample describes each label together with the other labels whose cosine similarity to it is greater than the preset similarity; and training a second initial model by using the second sample to obtain the second preset model.
Optionally, the method further comprises: determining, from a user label library, the number of occurrences of each label; determining the degree of association between each label and the other labels based on the number of occurrences of that label and of the other labels; determining the third sample based on the degree of association between each label and the other labels, wherein the third sample describes each label together with the other labels whose degree of association with it is greater than the preset degree of association; and training a third initial model by using the third sample to obtain the third preset model.
According to an aspect of an embodiment of the present invention, there is also provided a label filling device, including: an acquisition module, configured to acquire an original label of a target object, wherein the original label is a current label of a user portrait of the target object; a first determining module, configured to determine at least one label set based on the original label; and a second determining module, configured to determine a target label based on the at least one label set, wherein the target label is used to fill the user portrait of the target object.
According to another aspect of the embodiments of the present invention, there is also provided a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to execute the above-mentioned label filling method.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above described label filling method.
In the embodiment of the invention, an original label of a target object is first acquired, wherein the original label is a current label of a user portrait of the target object; at least one label set is then determined based on the original label, and a target label is determined based on the at least one label set, wherein the target label is used to fill the user portrait of the target object. A target label of higher accuracy is thus obtained from the original label. Because a label rarely changes once it has been generated, the association between labels is generally stable and does not change greatly. The prior art determines labels through an unstable relationship between users and labels, which leads to low label accuracy; by contrast, relying on the stable relationship between labels improves the accuracy of the determined target label, thereby solving the technical problem in the related art that labels acquired for filling a user portrait have low accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a label filling method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another label filling method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a label filling device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided a label filling method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of a label filling method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102, acquiring an original label of the target object.
The original label is a current label of the user portrait of the target object.
The target object is the user whose labels are to be filled; the original labels are the labels currently present in the target object's user portrait.
The user portrait may be a labeled user model abstracted from information such as gender, age, occupation, name, user preferences, lifestyle, and user behavior. Determining the user portrait is in effect labeling the user, where a label is a highly refined feature identifier obtained by analyzing the user's attribute information.
In an alternative embodiment, the labels of the target object may be filled according to the requirements of the scenario and the number of original labels of the target object.
For example, a financial product is to be recommended to a user based on 10 labels of the user, but the user portrait contains only 8 labels and 2 are missing; in this case, the 2 labels to be filled need to be determined from the 8 existing labels.
Step S104, based on the original label, at least one label set is determined.
The above-mentioned label set may be a set of labels of the same category as the original labels, a set of labels having the same semantic meaning as the original labels, or a set of labels having strong correlation with the original labels.
In an alternative embodiment, the original labels may be input into a pre-trained model to output at least one label set.
In another optional embodiment, a model may be trained on the users in a user tag library and their corresponding tags. Tag samples are obtained and subjected to cluster analysis to obtain the category to which each tag sample belongs; the model is then trained on the tag samples and their categories, so that it can determine the category of an original tag, and other tags in the same category can be determined from that category.
In clustering, the classes into which the data is to be divided are not known in advance. Clustering is the process of partitioning data into classes, or clusters, such that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar. For example, cluster analysis is used to group people in people recommendation: people in the same cluster are considered to have similar attributes or behaviors and can be recommended to one another.
Further, association analysis may be performed on the tag samples to determine the association relationships among them, and the model may be trained on the tag samples and those relationships, so that it can determine other tags associated with the original tag.
Association analysis, also known as association rule mining, discovers associations between sets of items in large amounts of data. A typical example is shopping basket analysis, which analyzes customers' purchasing habits by discovering associations between the different items they place in their baskets; knowing which items are frequently purchased together can help a retailer formulate marketing strategies. Other applications include price list design, merchandise promotion, merchandise placement, and customer segmentation based on purchasing patterns.
Furthermore, semantic analysis may be performed on the tag samples to determine their word vectors, and the model may be trained on those word vectors so that it can analyze the semantics of the original tag and determine, from the analysis result, other tags semantically similar to the original tag.
The semantic analysis may be implemented with word2vec (word to vector). A word2vec model maps each word to a vector that can represent relationships between words; the vector is taken from the hidden layer of a neural network. Converting tag samples into word vectors with word2vec facilitates matching a tag sample against other samples to determine other tag samples similar to it.
Step S106, determining a target label based on the at least one label set.
The target label is used to fill the user portrait of the target object.
In an alternative embodiment, the target tag may be determined from the at least one tag set according to the number of tags missing from the user portrait of the target object. For example, 10 tags are required in the user portrait but only 8 are currently present; in this case, two tags may be randomly selected from the at least one tag set as the target tags, and the user portrait of the target object may be filled with these two tags.
In another alternative embodiment, deduplication processing may be performed on all the obtained tags in the tag set, that is, duplicate tags are deleted, and the remaining tags are determined as target tags.
Further, the target label can be determined from the label set obtained after the de-duplication processing according to the number of missing labels of the user portrait of the target object.
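The selection described above can be sketched as follows. This is a minimal illustration rather than the method of the application: the function name `pick_missing_tags`, the `seed` parameter, and the decision to exclude tags the user already has are assumptions made for the example.

```python
import random


def pick_missing_tags(original_tags, candidate_tags, required_count, seed=None):
    """Randomly pick as many candidate tags as the portrait is missing.

    Hypothetical helper: candidates are de-duplicated first, and tags the
    user already has are excluded (an assumption of this sketch).
    """
    missing = max(required_count - len(original_tags), 0)
    owned = set(original_tags)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    pool = [tag for tag in dict.fromkeys(candidate_tags) if tag not in owned]
    rng = random.Random(seed)
    return rng.sample(pool, min(missing, len(pool)))


# A portrait that needs 10 tags but currently holds 8.
original = [f"tag{i}" for i in range(8)]
candidates = ["tag3", "high price sensitivity", "platform active user",
              "high price sensitivity", "low income"]
print(pick_missing_tags(original, candidates, required_count=10, seed=0))
```

Fixing `seed` only makes the random choice repeatable; leaving it as `None` gives a different selection on each call.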
Through the above steps, an original label of a target object is first acquired, wherein the original label is a current label of a user portrait of the target object; at least one label set is then determined based on the original label, and a target label is determined based on the at least one label set, wherein the target label is used to fill the user portrait of the target object. A target label of higher accuracy is thus obtained from the original label. Because a label rarely changes once it has been generated, the association between labels is generally stable and does not change greatly. The prior art determines labels through an unstable relationship between users and labels, which leads to low label accuracy; by contrast, relying on the stable relationship between labels improves the accuracy of the determined target label, thereby solving the technical problem in the related art that labels acquired for filling a user portrait have low accuracy.
Optionally, the at least one label set comprises a first label set, and determining at least one label set based on the original label comprises: classifying the original label based on a first preset model to obtain a classification result, wherein the first preset model is obtained by training on a first sample; and determining the first label set based on the classification result, wherein the first label set is a set of labels in the same category as the original label.
The first preset model may be a neural network model capable of determining the category of the original label.
In an alternative embodiment, the original tag may be classified by the first preset model to obtain its category, and a first tag set of other tags in that category may then be determined. Because the tags in the first tag set are in the same category as the original tag, a target tag determined from the first tag set is also in that category, which helps ensure the accuracy of the tags used to fill the user portrait of the target object.
In another alternative embodiment, the tag vector of each tag may be determined from the user tag library, the tag vectors may be clustered to obtain a plurality of clusters, namely the first sample, and the first preset model may be obtained by training on the first sample.
Optionally, the at least one label set comprises a second label set, and the method further comprises: performing semantic analysis on the original label based on a second preset model to obtain a first analysis result, wherein the second preset model is obtained by training on a second sample; and determining the second label set based on the first analysis result, wherein the second label set is a set of labels whose cosine similarity to the original label is greater than a preset similarity.
The second preset model may be a neural network model capable of determining the semantics of the original tag.
In an optional embodiment, semantic analysis may be performed on the original tag by the second preset model to obtain a word vector of the original tag, and the word vector of the original tag is then matched against the word vectors of other tags. The higher the degree of matching, the higher the similarity between the original tag and the other tag, so that a second tag set of other tags semantically similar to the original tag can be obtained.
In another alternative embodiment, a word vector of each tag may be determined from the user tag library, that is, the second sample, and the second preset model may be obtained by training on the second sample.
Optionally, the at least one label set comprises a third label set, and the method further comprises: performing association analysis on the original label based on a third preset model to obtain a second analysis result, wherein the third preset model is obtained by training on a third sample; and determining the third label set based on the second analysis result, wherein the third label set is a set of labels whose degree of association with the original label is greater than a preset degree of association.
In an optional embodiment, association analysis may be performed on the original tag by the third preset model to obtain the support, confidence, and lift between the original tag and other tags; the degree of association between the original tag and the other tags is determined from the support, confidence, and lift, and a third tag set of other tags strongly associated with the original tag is then determined according to the degree of association.
In another alternative embodiment, the support, confidence, and lift between each tag and the other tags may be determined from the user tag library, that is, the third sample, and the third preset model may be obtained by training on the third sample.
Optionally, determining the target label based on the at least one label set comprises: de-duplicating the labels in the first label set, the second label set, and the third label set to obtain the target label.
In an alternative embodiment, the first tag set, the second tag set, and the third tag set may be merged, duplicate tags in the merged tag set may be removed, and the remaining tags may be used as target tags to fill the user portrait of the target object.
Illustratively, the merged tag set includes "platform active user", "high price sensitivity", and "Yangtze River Delta", and the tags in the merged tag set serve as candidates for the tags to be filled, so that missing tags can be filled. For example, user A has the three tags "platform active user", "high price sensitivity", and "Yangtze River Delta" but is missing a value in the activity participation column; user A may then be labeled with "high activity participation".
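The merge-and-de-duplicate step can be sketched as below. The function name `merge_tag_sets` and the sample tag sets are hypothetical; preserving first-seen order is a choice of this sketch and is not stated in the application.

```python
def merge_tag_sets(*tag_sets):
    """Merge several tag sets and drop repeated tags, keeping first-seen order."""
    merged = []
    seen = set()
    for tag_set in tag_sets:
        for tag in tag_set:
            if tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged


# Hypothetical outputs of the three preset models for one original tag.
first_set = ["high price sensitivity"]
second_set = ["platform active user", "high price sensitivity"]
third_set = ["high activity participation", "platform active user"]

print(merge_tag_sets(first_set, second_set, third_set))
```

Lists are used instead of Python sets so that the result is deterministic; if order does not matter, `set().union(first_set, second_set, third_set)` gives the same membership.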
Optionally, the method further comprises: determining a target matrix from a user label library, wherein the target matrix describes the correspondence between users and labels; determining a label vector corresponding to each label by using the target matrix; clustering the label vectors to obtain the first sample, wherein the first sample describes the category corresponding to each label; and training a first initial model by using the first sample to obtain the first preset model.
In an alternative embodiment, part of the raw data in the user tag library may be processed into a user-tag labeling matrix whose rows are users and whose columns are tags; a value of 1 indicates that the user has the tag, and 0 that the user does not. Cluster analysis is performed with the tag vectors of the matrix as features, dividing the tags into several clusters, namely the first sample.
The user tag library may be an N×M user tag record table, where N is the number of users, on the order of millions, and M is the number of tags, on the order of thousands.
In an alternative embodiment, the target matrix may be determined from a library of user tags, as shown in Table 1.
TABLE 1
[Table 1 appears as an image (Figure BDA0003081432380000081) in the original publication and is not reproduced here.]
Each row in Table 1 represents the tags owned by one user: a value of 1 indicates that the user has the tag, and 0 that the user does not. For example, user 1 has the tags "student", "high price sensitivity", "Yangtze River Delta", "female", "platform active user", and "low income", but does not have the "high activity participation" tag.
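A 0/1 matrix with this layout (one row per user, one column per tag) might be built as sketched below. The records, the user identifiers, and the helper names `build_user_tag_matrix` and `tag_vector` are illustrative assumptions and do not come from Table 1.

```python
def build_user_tag_matrix(user_tags):
    """Return (users, tags, matrix) where matrix[i][j] is 1 if users[i] has tags[j]."""
    users = sorted(user_tags)
    tags = sorted({tag for owned in user_tags.values() for tag in owned})
    matrix = [[1 if tag in user_tags[user] else 0 for tag in tags] for user in users]
    return users, tags, matrix


def tag_vector(matrix, tags, tag):
    """Extract the column of one tag: an N x 1 vector over all N users."""
    column = tags.index(tag)
    return [row[column] for row in matrix]


# Hypothetical user tag records.
records = {
    "user1": {"student", "low income", "platform active user"},
    "user2": {"student", "high activity participation"},
    "user3": {"platform active user", "high activity participation"},
}
users, tags, matrix = build_user_tag_matrix(records)
print(tags)
print(matrix)
print(tag_vector(matrix, tags, "student"))
```

For a library with millions of users and thousands of tags, a sparse representation would be more practical than nested lists; the dense form is used here only for readability.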
In an alternative embodiment, a tag vector corresponding to each tag may be determined from the target matrix, where each tag in the matrix corresponds to an N × 1 vector. Cluster analysis is performed with the tag vectors as features, dividing the tags into several clusters. For example, if after clustering the two tags "high activity participation" and "high price sensitivity" fall into the same cluster, then when the original tag is "high activity participation", the target tag for filling may be determined to be "high price sensitivity".
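The application does not name a clustering algorithm. As one possible illustration, the sketch below uses simple leader clustering (a single pass in which a tag joins the first cluster whose leader vector is similar enough) with cosine similarity; the algorithm choice, the 0.7 threshold, the toy vectors, and all function names are assumptions. The sketch also looks up cluster members directly instead of training a model on the clusters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def leader_cluster(tag_vectors, threshold):
    """Single-pass leader clustering: join the first cluster whose leader is close."""
    clusters = []  # each entry: (leader_vector, [member tags])
    for tag, vector in tag_vectors.items():
        for leader, members in clusters:
            if cosine(leader, vector) >= threshold:
                members.append(tag)
                break
        else:
            clusters.append((vector, [tag]))
    return [members for _, members in clusters]


def same_cluster_tags(clusters, original_tag):
    """Return the other tags that share a cluster with original_tag."""
    for members in clusters:
        if original_tag in members:
            return [tag for tag in members if tag != original_tag]
    return []


# Toy N x 1 tag vectors over N = 5 hypothetical users.
tag_vectors = {
    "high activity participation": [1, 1, 1, 0, 0],
    "high price sensitivity": [1, 1, 0, 0, 0],
    "low income": [0, 0, 0, 1, 1],
}
clusters = leader_cluster(tag_vectors, threshold=0.7)
print(same_cluster_tags(clusters, "high activity participation"))
```

Leader clustering depends on the order in which tags are visited; k-means or hierarchical clustering would be reasonable alternatives under the same description.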
Optionally, the method further comprises: determining a word vector of each label from a user label library; determining the cosine similarity between each label and the other labels based on the word vector of that label and the word vectors of the other labels in the user label library; determining the second sample based on the cosine similarity between each label and the other labels, wherein the second sample describes each label together with the other labels whose cosine similarity to it is greater than the preset similarity; and training a second initial model by using the second sample to obtain the second preset model.
In an alternative embodiment, the trained second preset model may be a word2vec model, an LDA model (Latent Dirichlet Allocation, a three-layer Bayesian probability model), or an LSA model (Latent Semantic Analysis, an information retrieval model), which is not limited herein. Taking the word2vec model as an example, the word2vec word vectors of all tags are obtained from the model, and the cosine similarity between tags is then determined; a preset number of tags with the highest cosine similarity to the original tag are selected and added to the second tag set, where the preset number may be set according to the actual situation and is not limited here.
For example, if the tag with the highest cosine similarity to "high activity participation" is computed to be "platform active user", then "platform active user" may be added to the second tag set.
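The top-k selection by cosine similarity can be sketched as follows. The three-dimensional vectors are made-up stand-ins for word2vec output (no model is trained here), and the names `cosine_similarity` and `most_similar_tags` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_tags(original_tag, word_vectors, top_k):
    """Return the top_k tags ranked by cosine similarity to original_tag."""
    target = word_vectors[original_tag]
    scored = [
        (cosine_similarity(target, vector), tag)
        for tag, vector in word_vectors.items()
        if tag != original_tag
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in scored[:top_k]]


# Toy vectors standing in for word vectors of the tags (assumed values).
toy_vectors = {
    "high activity participation": [0.9, 0.1, 0.3],
    "platform active user": [0.8, 0.2, 0.4],
    "low income": [0.1, 0.9, 0.2],
}
second_tag_set = most_similar_tags("high activity participation", toy_vectors, top_k=1)
print(second_tag_set)
```

To follow the threshold variant in the text instead of a fixed count, the slice `scored[:top_k]` could be replaced by a filter that keeps scores greater than the preset similarity.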
Optionally, the method further comprises: determining, from a user label library, the number of occurrences of each label; determining the degree of association between each label and the other labels based on the number of occurrences of that label and of the other labels; determining the third sample based on the degree of association between each label and the other labels, wherein the third sample describes each label together with the other labels whose degree of association with it is greater than the preset degree of association; and training a third initial model by using the third sample to obtain the third preset model.
The degree of association may be expressed by the support, confidence, and lift between two tags; it reflects the intrinsic association between them, and the third tag set can be determined from the original tag according to the degree of association. Specifically, the support, confidence, and lift between two tags may be determined with the Apriori algorithm or the FP-growth algorithm.
Illustratively, the top label in table 2 is a region label of the user, the region label values include long triangle, bead triangle, middle province, and forming Yu, the bottom label is an activity participation label, and the activity participation label values include high activity participation, middle activity participation, and low activity participation. For example, the total number of users is 100 ten thousand, the number of customers whose region labels are long triangles is 12 ten thousand, the number of customers whose region labels are high activity participation is 21 ten thousand, and the number of customers whose region labels are long triangles and at the same time are high activity participation is 8 ten thousand. The support degree of { long triangle and high activity participation degree } obtained by the FP-growth calculation is 8/100-0.08. The confidence of the label association rule { long triangle } - > { high activity engagement } is 0.08/(12/100) } 0.67. The promotion degree of the label association rule { long triangle } - > { high activity participation } is 0.67/(21/100) } 3.2. In the same way, the support degree, the confidence degree and the promotion degree of the association rule formed by other region labels and activity participation degree labels in the table can be obtained:
front label Back label Degree of support Confidence level Degree of lifting
Long triangle High activity engagement 0.08 0.67 3.2
Bead triangle High activity engagement 0.05 0.84 2.1
Middle part of the design reside in High activity engagement 0.03 0.71 1.3
Long triangle Middle activity engagement 0.04 0.54 1.4
Cheng Yu Low activity engagement 0.04 0.50 1.7
Thresholds for support, confidence and boost may be set at 0.05, 0.6 and 2.5, respectively. In the table, association rules { long triangle } - > { high activity participation } and { bead triangle } - > { high activity participation } satisfy requirements for support and confidence, but only the promotion degree of { long triangle } - > { high activity participation } is greater than a threshold value, and the association belongs to effective strong association.
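The arithmetic of the worked example and the threshold check can be reproduced with a minimal sketch such as the following. The function names `rule_metrics` and `is_strong_rule` are hypothetical, and the choice of "greater than or equal" for support and confidence versus "strictly greater" for lift is an assumption inferred from the example, not something the text states.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule {antecedent} -> {consequent}."""
    support = n_both / n_total                      # share of users carrying both labels
    confidence = support / (n_antecedent / n_total)  # support divided by antecedent share
    lift = confidence / (n_consequent / n_total)     # confidence divided by consequent share
    return support, confidence, lift


def is_strong_rule(support, confidence, lift,
                   min_support=0.05, min_confidence=0.6, min_lift=2.5):
    """Apply the three thresholds quoted in the text (comparison operators are assumed)."""
    return support >= min_support and confidence >= min_confidence and lift > min_lift


# Counts from the worked example, in units of 10,000 users.
support, confidence, lift = rule_metrics(100, 12, 21, 8)
print(round(support, 2), round(confidence, 2), round(lift, 1))
print(is_strong_rule(support, confidence, lift))
```

With the table values, only the first rule passes all three thresholds; the second rule fails on lift alone.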
A preferred embodiment of the present invention will be described in detail with reference to fig. 2. As shown in fig. 2, the method may include the steps of:
Step S201, acquiring an original label of a target object;
Step S202, classifying the original label based on a first preset model to obtain a classification result;
Step S203, determining a first label set based on the classification result;
Step S204, performing semantic analysis on the original label based on a second preset model to obtain a first analysis result;
Step S205, determining a second label set based on the first analysis result;
Step S206, performing association analysis on the original label based on a third preset model to obtain a second analysis result;
Step S207, determining a third label set based on the second analysis result;
Step S208, removing repeated labels from the first label set, the second label set and the third label set, and determining the remaining labels as target labels.
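Step S208 can be sketched as an order-preserving merge of the three candidate sets. This is only an illustration: the function name `merge_label_sets` and the example label strings are hypothetical, and the three input lists stand in for the outputs of the preset models, which are not reproduced here.

```python
def merge_label_sets(first_set, second_set, third_set):
    """Concatenate the three candidate sets and drop repeated labels, keeping first-seen order."""
    seen = set()
    target_labels = []
    for label in [*first_set, *second_set, *third_set]:
        if label not in seen:
            seen.add(label)
            target_labels.append(label)
    return target_labels


# Hypothetical outputs of the classification, semantic and association steps.
first = ["frequent traveler", "high spender"]
second = ["high spender", "business trip"]
third = ["business trip", "frequent traveler", "high activity participation"]
print(merge_label_sets(first, second, third))
```

Keeping first-seen order is a design choice made here for readability; a plain set union would serve equally well if the order of the target labels does not matter.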
The above steps introduce cluster analysis, association analysis and semantic analysis to extract the latent associations among the user labels in a user portrait, so that user labels are filled and expanded along multiple dimensions. Starting from one or a few existing labels of a user, more associated labels can be derived to fill in the missing labels, which provides a richer and more complete user label system for subsequent precision marketing and gives the scheme high practicability and commercial value. In addition, missing labels can be filled in automatically without requiring the user to supply further information, which costs less and is more efficient than manual filling.
It should be noted that existing automatic label filling techniques fill labels from the user dimension: a tree model is trained with the labels a user already has as features, and the tree model is then used to automatically fill the missing labels of users of the same category. The present label filling method can be used together with such existing label filling methods to fill the user portrait, thereby improving the comprehensiveness and accuracy of the user portrait.
Example 2
According to the embodiment of the present invention, a tag filling apparatus is further provided, which can execute the tag filling method in the foregoing embodiment, and the specific implementation manner and the preferred application scenario are the same as those in the foregoing embodiment, and are not described herein again.
Fig. 3 is a schematic view of a label filling apparatus according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
an obtaining module 32, configured to obtain an original tag of a target object, where the original tag is a current tag of a user portrait of the target object;
a first determining module 34 for determining at least one label set based on the original label;
a second determining module 36, configured to determine a target label based on the at least one label set, wherein the target label is used to populate the user portrait of the target object.
Optionally, the at least one label set comprises a first label set, and the first determining module comprises: a classification unit, configured to classify the original label based on a first preset model to obtain a classification result, wherein the first preset model is obtained by training on a first sample; and a first determining unit, configured to determine the first label set based on the classification result, wherein the first label set is used to characterize a set of labels in the same category as the original label.
Optionally, the at least one label set comprises a second label set, and the apparatus further comprises: a first analysis unit, configured to perform semantic analysis on the original label based on a second preset model to obtain a first analysis result, wherein the second preset model is obtained by training on a second sample; and a second determining unit, configured to determine the second label set based on the first analysis result, wherein the second label set is used to characterize a set of labels whose cosine similarity with the original label is greater than a preset similarity.
Optionally, the at least one label set comprises a third label set, and the apparatus further comprises: a second analysis unit, configured to perform association analysis on the original label based on a third preset model to obtain a second analysis result, wherein the third preset model is obtained by training on a third sample; and a third determining unit, configured to determine the third label set based on the second analysis result, wherein the third label set is used to characterize a set of labels whose association degree with the original label is greater than a preset association degree.
Optionally, the second determining module includes: and the processing unit is used for carrying out de-duplication processing on the labels in the first label set, the second label set and the third label set to obtain the target label.
Optionally, the apparatus further comprises: the third determining module is used for determining a target matrix from the user tag library, wherein the target matrix is used for describing the corresponding relation between the user and the tag; the fourth determining module is used for determining a label vector corresponding to each label by using the target matrix; the clustering module is used for clustering the label vectors to obtain a first sample, wherein the first sample is used for describing the category corresponding to each label; and the first training module is used for training the first initial model by using the first sample to obtain a first preset model.
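The path from the target matrix to per-label categories can be illustrated with the following minimal sketch. It assumes a 0/1 users-by-labels matrix and uses plain k-means as the clustering algorithm; both are assumptions made for illustration, and the function names and the toy matrix are hypothetical. Training the first initial model on the resulting sample is not shown.

```python
def label_vectors(user_label_matrix):
    """Transpose a users-by-labels 0/1 matrix so that each row describes one label."""
    return [list(column) for column in zip(*user_label_matrix)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids so the run is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each label vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy target matrix: 4 users (rows) by 4 labels (columns); 1 means the user carries the label.
matrix = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
]
categories = kmeans(label_vectors(matrix), k=2)
print(categories)  # one category index per label, i.e. the content of the first sample
```

Labels that tend to be carried by the same users end up with the same category index, which is the information the first sample is described as holding.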
Optionally, the apparatus further comprises: a fifth determining module, configured to determine a word vector for each tag from the user tag library; the fifth determining module is further used for determining cosine similarity between each label and other labels based on the word vector of each label and the word vectors of other labels in the user label library; the fifth determining module is further configured to determine a second sample based on cosine similarity between each tag and other tags, where the second sample is used to describe each tag and other tags whose cosine similarity is greater than the preset similarity; and the second training module is used for training the second initial model by using the second sample to obtain a second preset model.
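The cosine-similarity screening used to build the second sample can be sketched as follows. The word vectors, label names and threshold value are made up for illustration, and the function names are hypothetical; how the word vectors are obtained from the user label library is not covered.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_labels(word_vectors, label, min_similarity):
    """Return the other labels whose cosine similarity with `label` exceeds the threshold."""
    target = word_vectors[label]
    return [
        other for other, vector in word_vectors.items()
        if other != label and cosine_similarity(target, vector) > min_similarity
    ]


# Hypothetical two-dimensional word vectors for three labels.
word_vectors = {
    "frequent traveler": [1.0, 0.0],
    "business trip": [0.9, 0.1],
    "pet owner": [0.0, 1.0],
}
print(similar_labels(word_vectors, "frequent traveler", min_similarity=0.8))
```

Each label paired with the list returned for it corresponds to one entry of the second sample as described above.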
Optionally, the apparatus further comprises: a sixth determining module, configured to determine the number of labels of each label from the user label library; the sixth determining module is further configured to determine the association degree between each tag and other tags based on the number of labels of each tag and the number of labels of other tags; the sixth determining module is further configured to determine a third sample based on the association degree between each tag and other tags, where the third sample is used to describe each tag and other tags whose association degree is greater than the preset association degree; and the third training module is used for training a third initial model by using a third sample to obtain a third preset model.
Example 3
The embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments shown in fig. 1 to fig. 3, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 1 to fig. 3, which is not described herein again.
Example 4
According to an embodiment of the present invention, there is also provided an electronic device including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the label filling method in embodiment 1 above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A label filling method, comprising:
acquiring an original label of a target object, wherein the original label is a current label of a user portrait of the target object;
determining at least one set of tags based on the original tags;
determining a target label based on the at least one label set, wherein the target label is used to populate the user portrait of the target object.
2. The method of claim 1, wherein the at least one label set comprises a first label set, and determining the at least one label set based on the original label comprises:
classifying the original labels based on a first preset model to obtain a classification result, wherein the first preset model is obtained based on first sample training;
determining the first label set based on the classification result, wherein the first label set is used for characterizing a set of labels in the same category as the original label.
3. The method of claim 2, wherein the at least one label set comprises a second label set, the method further comprising:
performing semantic analysis on the original label based on a second preset model to obtain a first analysis result, wherein the second preset model is obtained based on second sample training;
and determining the second label set based on the first analysis result, wherein the second label set is used for representing a set of labels with cosine similarity greater than a preset similarity with the original label.
4. The method of claim 3, wherein the at least one label set comprises a third label set, the method further comprising:
performing association analysis on the original label based on a third preset model to obtain a second analysis result, wherein the third preset model is obtained based on third sample training;
and determining the third label set based on the second analysis result, wherein the third label set is used for characterizing the set of labels with the association degree with the original label being greater than the preset association degree.
5. The method of claim 4, wherein determining the target label based on the at least one label set comprises:
and performing de-duplication processing on the labels in the first label set, the second label set and the third label set to obtain the target label.
6. The method of claim 5, further comprising:
determining a target matrix from a user tag library, wherein the target matrix is used for describing the corresponding relationship between users and tags;
determining a label vector corresponding to each label by using the target matrix;
clustering the label vectors to obtain the first sample, wherein the first sample is used for describing a category corresponding to each label;
and training a first initial model by using the first sample to obtain the first preset model.
7. The method of claim 6, further comprising:
determining a word vector of each label from a user label library;
determining cosine similarity between each label and other labels in the user label library based on the word vector of each label and the word vectors of the other labels;
determining a second sample based on cosine similarity between each label and the other labels, wherein the second sample is used for describing each label and the other labels of which the cosine similarity is greater than the preset similarity;
and training a second initial model by using the second sample to obtain the second preset model.
8. The method of claim 7, further comprising:
determining the number of labels of each label from a user label library;
determining the association degree between each label and other labels based on the number of labels of each label and the number of labels of other labels;
determining a third sample based on the association degree between each label and the other labels, wherein the third sample is used for describing each label and the other labels with the association degree larger than the preset association degree;
and training a third initial model by using the third sample to obtain the third preset model.
9. A label filling apparatus, comprising:
an acquisition module for acquiring an original label of a target object, wherein the original label is a current label of a user portrait of the target object;
a first determining module for determining at least one label set based on the original label;
a second determining module for determining a target label based on the at least one label set, wherein the target label is used to populate the user portrait of the target object.
10. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the steps of the label filling method according to any of claims 1 to 8.
11. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the label filling method according to any of claims 1 to 8.
CN202110567882.XA 2021-05-24 2021-05-24 Label filling method and device Pending CN113298145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110567882.XA CN113298145A (en) 2021-05-24 2021-05-24 Label filling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110567882.XA CN113298145A (en) 2021-05-24 2021-05-24 Label filling method and device

Publications (1)

Publication Number Publication Date
CN113298145A true CN113298145A (en) 2021-08-24

Family

ID=77324421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110567882.XA Pending CN113298145A (en) 2021-05-24 2021-05-24 Label filling method and device

Country Status (1)

Country Link
CN (1) CN113298145A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108451A (en) * 2017-12-27 2018-06-01 合肥美的智能科技有限公司 The group of subscribers portrait acquisition methods and device of group
CN109102157A (en) * 2018-07-11 2018-12-28 交通银行股份有限公司 A kind of bank's work order worksheet processing method and system based on deep learning
CN109934281A (en) * 2019-03-08 2019-06-25 电子科技大学 An unsupervised training method for binary classification network
CN110674144A (en) * 2019-08-14 2020-01-10 深圳壹账通智能科技有限公司 User portrait generation method and device, computer equipment and storage medium
CN111538751A (en) * 2020-03-23 2020-08-14 重庆特斯联智慧科技股份有限公司 Tagged user portrait generation system and method for Internet of things data
CN111813982A (en) * 2020-07-23 2020-10-23 中原工学院 Data processing method and device for subspace clustering algorithm based on spectral clustering

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538020A (en) * 2021-07-05 2021-10-22 深圳索信达数据技术有限公司 Method and device for acquiring guest group feature association degree, storage medium and electronic device
CN113538020B (en) * 2021-07-05 2024-03-26 深圳索信达数据技术有限公司 Method and device for acquiring association degree of group of people features, storage medium and electronic device
CN114445146A (en) * 2022-01-30 2022-05-06 北京火山引擎科技有限公司 Label filling method and related equipment thereof
CN114661969A (en) * 2022-02-28 2022-06-24 上海钐昆网络科技有限公司 A method, device, device and storage medium for optimizing user labeling system
CN118568513A (en) * 2024-05-24 2024-08-30 珠海市卓轩科技有限公司 AI-based user portrait construction method, system, equipment and medium
CN118982384A (en) * 2024-10-22 2024-11-19 浙江广电新媒体有限公司 Intelligent marketing method and system based on user portrait

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210824