CN110264318A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN110264318A
- Application number: CN201910563737.7A
- Authority: CN (China)
- Prior art keywords: product, sample, keyword, category, data
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/12—Hotels or restaurants
Abstract
Embodiments of the present disclosure provide a data processing method and device, electronic equipment, and a storage medium. The method comprises: acquiring sample data, wherein the sample data comprises a text description of a sample product and a category to which the sample product belongs; extracting keywords from the text description; determining the importance degree of each keyword; and training a product identification model using feature data of the sample product and the category to which the sample product belongs, wherein the feature data includes the importance degree of the keywords corresponding to the sample product. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences the identification of products under a given category. This can improve the accuracy of product category identification, and the model can distinguish even different products that have similar text descriptions.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, more and more products appear on online operation platforms. To capture the similarities and differences among products, an online operation platform usually generates portrait (profile) data for each product, which facilitates classifying and identifying products in scenarios such as retrieval. However, products are highly varied: the same product may have different text descriptions (for example, different product names), and different products may have the same or similar text descriptions. Product portrait data is therefore usually produced by manually screening keywords, and because people differ in their ability to abstract and summarize, the resulting errors are large. As a result, labeling product portrait data is time-consuming and labor-intensive, and its accuracy is low.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method and device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
acquiring sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
extracting keywords from the text description;
determining the importance degree of the keyword;
training a product identification model by using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
With reference to the first aspect, in a first implementation manner of the first aspect, the obtaining sample data includes:
acquiring text descriptions of a plurality of sample products in a preset category;
and performing de-duplication processing on the text descriptions of the sample products.
With reference to the first aspect and/or the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the performing de-duplication processing on the text descriptions of the plurality of sample products includes:
and uniformly mapping a plurality of different text descriptions corresponding to the same sample product into the same text description.
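As a non-limiting illustration, the uniform mapping described above might be sketched as follows. The alias table `ALIASES`, the normalization rule, and the function names are hypothetical and are not taken from the disclosure; in practice the mapping from variant descriptions to a canonical description could be built in other ways.

```python
import re
from collections import OrderedDict

# Hypothetical alias table: variant descriptions of one product -> one canonical description.
ALIASES = {
    "tomato scrambled eggs": "scrambled eggs with tomato",
    "tomato and egg stir-fry": "scrambled eggs with tomato",
}


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def canonicalize(text: str) -> str:
    """Map a description to its canonical form; unknown descriptions map to themselves."""
    key = normalize(text)
    return ALIASES.get(key, key)


def deduplicate(descriptions):
    """Return canonical descriptions in first-seen order, without duplicates."""
    return list(OrderedDict.fromkeys(canonicalize(d) for d in descriptions))
```

Keeping the first-seen order makes the de-duplicated sample set reproducible across runs, which is convenient when the result is later used as training data.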
With reference to the first aspect, the first implementation manner of the first aspect, and/or the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the extracting keywords in the text description includes:
performing word segmentation on the text description;
and determining, as the keywords, the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, and/or the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the determining, as the keyword, a segmented word whose correlation with a category to which the sample product belongs is higher than a preset threshold includes:
and determining the correlation between the segmented word and the category to which the sample product belongs by using a chi-square test of independence.
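As a non-limiting illustration, the chi-square statistic for one segmented word and one category can be computed from a 2x2 contingency table (word present or absent, sample inside or outside the category). The sketch below assumes each sample is represented as a set of segmented words plus a category label; the function names and the threshold value are assumptions, not part of the disclosure.

```python
def chi_square(term, category, samples):
    """Chi-square statistic of independence between a term and a category.

    samples: list of (set_of_segmented_words, category_label).
    """
    a = b = c = d = 0
    for segments, label in samples:
        has_term = term in segments
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, sample in category
        elif has_term:
            b += 1  # term present, sample in another category
        elif in_category:
            c += 1  # term absent, sample in category
        else:
            d += 1  # term absent, sample in another category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - b * c) ** 2 / denominator


def select_keywords(segments, category, samples, threshold):
    """Keep the segmented words whose statistic exceeds the preset threshold."""
    return {s for s in segments if chi_square(s, category, samples) > threshold}
```

Note that the statistic measures dependence in either direction, so a word that is strongly associated with the absence of a category also scores high; an implementation may want to additionally check the sign of `a * d - b * c`.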
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and/or the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the determining the importance degree of the keyword includes:
and determining the TF-IDF value of the keyword as the importance degree of the keyword.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, and/or the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the determining the TF-IDF value of the keyword as the importance degree of the keyword includes:
determining a TF-IDF value of the keyword under a category to which the sample product belongs;
and when the keyword corresponds to a plurality of TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance degree of the keyword.
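As a non-limiting illustration, one way to obtain a per-category TF-IDF value and then select the smallest one is sketched below. It assumes that all descriptions under one category are pooled into a single "document" and that a smoothed IDF is used; the disclosure does not fix either choice at this point, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_by_category(samples):
    """Return {term: {category: tf-idf}}.

    samples: iterable of (list_of_keywords, category_label).
    Assumption: each category's pooled keywords form one document.
    """
    counts = defaultdict(Counter)
    for keywords, category in samples:
        counts[category].update(keywords)

    n_categories = len(counts)
    document_frequency = Counter()
    for term_counts in counts.values():
        document_frequency.update(term_counts.keys())

    scores = defaultdict(dict)
    for category, term_counts in counts.items():
        total = sum(term_counts.values())
        for term, count in term_counts.items():
            tf = count / total
            # Smoothed IDF (an assumption) so that no term gets a zero weight.
            idf = math.log((1 + n_categories) / (1 + document_frequency[term])) + 1.0
            scores[term][category] = tf * idf
    return scores


def importance(term, scores):
    """Smallest TF-IDF value of the term across the categories it appears in."""
    return min(scores[term].values())
```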
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and/or the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the determining the importance degree of the keyword includes:
and when the correlation between every segmented word corresponding to the sample product and the category to which the sample product belongs is lower than the preset threshold, taking a default value as the importance degree of the keyword corresponding to the sample product.
In a second aspect, a product identification method is provided in an embodiment of the present disclosure.
Specifically, the product identification method includes:
acquiring text description of a product to be identified;
extracting keywords of the text description;
determining the importance degree of the keyword;
inputting the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product identification model is trained using the method of the first aspect.
With reference to the second aspect, in a first implementation manner of the second aspect, the extracting keywords in the text description includes:
performing word segmentation on the text description;
and matching each segmented word against a keyword set to determine whether the segmented word is a keyword.
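As a non-limiting illustration, matching segmented words against a keyword set might look like the sketch below. Whitespace splitting stands in for a real word segmenter (Chinese text would need a dictionary-based or statistical segmenter), and the function names are hypothetical.

```python
def segment(text):
    """Stand-in segmenter (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def extract_keywords(text, keyword_set):
    """Return the segmented words found in keyword_set, in first-seen order."""
    seen = set()
    keywords = []
    for word in segment(text):
        if word in keyword_set and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

Using a set for `keyword_set` keeps each membership check constant-time, which matters when the keyword set collected during training is large.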
In a third aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a first obtaining module configured to obtain sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
a first extraction module configured to extract keywords in the text description;
a first determination module configured to determine a degree of importance of the keyword;
a training module configured to train a product identification model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
In a fourth aspect, a product identification device is provided in embodiments of the present disclosure.
Specifically, the product identification device includes:
a second obtaining module configured to obtain a text description of the product to be identified;
a second extraction module configured to extract keywords of the text description;
a second determination module configured to determine a degree of importance of the keyword;
an identification module configured to input the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product identification model is obtained by training with the device of the third aspect.
These functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the data processing apparatus and/or the product identification apparatus includes a memory and a processor, the memory is used for storing one or more computer instructions for supporting the data processing apparatus and/or the product identification apparatus to execute the data processing method and/or the product identification method, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus and/or the product identifying apparatus may further comprise a communication interface for the data processing apparatus and/or the product identifying apparatus to communicate with other devices or a communication network.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
extracting keywords from the text description;
determining the importance degree of the keyword;
training a product identification model by using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
With reference to the fifth aspect, in a first implementation manner of the fifth aspect, the obtaining sample data includes:
acquiring text descriptions of a plurality of sample products in a preset category;
and performing de-duplication processing on the text descriptions of the sample products.
With reference to the fifth aspect and/or the first implementation manner of the fifth aspect, in a second implementation manner of the fifth aspect, the performing de-duplication processing on the text descriptions of the plurality of sample products includes:
and uniformly mapping a plurality of different text descriptions corresponding to the same sample product into the same text description.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, and/or the second implementation manner of the fifth aspect, in a third implementation manner of the fifth aspect, the extracting keywords in the text description includes:
performing word segmentation on the text description;
and determining, as the keywords, the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, the second implementation manner of the fifth aspect, and/or the third implementation manner of the fifth aspect, in a fourth implementation manner of the fifth aspect, the determining, as the keyword, a segmented word whose correlation with the category to which the sample product belongs is higher than a preset threshold includes:
and determining the correlation between the segmented word and the category to which the sample product belongs by using a chi-square test of independence.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, the second implementation manner of the fifth aspect, the third implementation manner of the fifth aspect, and/or the fourth implementation manner of the fifth aspect, in a fifth implementation manner of the fifth aspect, the determining the importance degree of the keyword includes:
and determining the TF-IDF value of the keyword as the importance degree of the keyword.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, the second implementation manner of the fifth aspect, the third implementation manner of the fifth aspect, the fourth implementation manner of the fifth aspect, and/or the fifth implementation manner of the fifth aspect, in a sixth implementation manner of the fifth aspect, the determining the TF-IDF value of the keyword as the importance degree of the keyword includes:
determining a TF-IDF value of the keyword under a category to which the sample product belongs;
and when the keyword corresponds to a plurality of TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance degree of the keyword.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, the second implementation manner of the fifth aspect, the third implementation manner of the fifth aspect, the fourth implementation manner of the fifth aspect, the fifth implementation manner of the fifth aspect, and/or the sixth implementation manner of the fifth aspect, in a seventh implementation manner of the fifth aspect, the determining the importance degree of the keyword includes:
and when the correlation between every segmented word corresponding to the sample product and the category to which the sample product belongs is lower than the preset threshold, taking a default value as the importance degree of the keyword corresponding to the sample product.
In a sixth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring text description of a product to be identified;
extracting keywords of the text description;
determining the importance degree of the keyword;
inputting the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product identification model is obtained by training with the electronic device of the fifth aspect.
With reference to the sixth aspect, in a first implementation manner of the sixth aspect, the extracting keywords in the text description includes:
performing word segmentation on the text description;
and matching each segmented word against a keyword set to determine whether the segmented word is a keyword.
In a seventh aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for a data processing apparatus and/or a product identification apparatus, which includes computer instructions for performing any of the methods described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
According to the data processing method, the text description of the sample product and the category to which the sample product belongs are obtained, the keywords of the text description are extracted, the importance degree of the extracted keywords under the category to which the sample product belongs is determined, and a product identification model is then trained on feature data that includes the importance degree of the keywords, together with the category to which the sample product belongs. A product identification model trained in this way can learn, from the text description of a product, how strongly each keyword in the description influences the identification of products under a given category. This can improve the accuracy of product category identification, and the model can distinguish even different products that have similar text descriptions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S101 according to the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S102 according to the embodiment shown in FIG. 1;
FIG. 4 is a flow diagram illustrating a portion of determining importance of keywords in accordance with the embodiment shown in FIG. 1;
FIG. 5 illustrates a flow diagram of a method of product identification according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of step S502 according to the embodiment shown in FIG. 5;
FIG. 7 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of the first obtaining module 701 according to the embodiment shown in FIG. 7;
FIG. 9 illustrates a block diagram of the first extraction module 702 according to the embodiment illustrated in FIG. 7;
FIG. 10 is a block diagram illustrating a structure of a portion for determining importance of a keyword according to an embodiment of the present disclosure;
FIG. 11 illustrates a block diagram of a product recognition apparatus according to an embodiment of the present disclosure;
FIG. 12 is a block diagram illustrating a second extraction module 1102 according to the embodiment shown in FIG. 11;
FIG. 13 is a schematic structural diagram of an electronic device suitable for implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the data processing method includes the following steps:
in step S101, sample data is acquired; wherein the sample data comprises a textual description of a sample product and a category of the sample product;
in step S102, extracting keywords in the text description;
in step S103, determining the importance degree of the keyword under the category;
in step S104, training a product identification model using the feature data of the sample product and the category; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product under the category.
In this embodiment, the sample product may be a product currently related to the online platform, such as dishes on a take-away ordering platform, clothes on an e-commerce platform, living goods, household goods, and the like. The textual description of the sample product includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. For example, the textual description of the dish on the takeaway ordering platform may include the name of the dish, the food material of the dish, the method of doing the dish, and so on.
In some embodiments, the textual description of the sample product may also include a textual description of an operator to whom the sample product belongs. Under the general condition, the product types operated by one operator are relatively similar, even some operators only operate products of one type, so that when the product identification model is trained, the data of the operator is also used as input data, the product identification model learns the characteristics which can influence the product types from the operator data, and the identification accuracy of the product identification model is further improved.
The data of the operator to which the sample product belongs may include, but is not limited to, the name of the operator, the main operating scope, the large scope to which the product operated by the operator belongs (such as a cuisine in the catering industry), and the like.
The type of the sample product can be determined according to the existing data of the online platform, and can also be manually marked. For example, sample data may be collected from existing products of the online platform, and the online platform generally has its own classification of the products, so that sample data required for the training may be obtained by collecting text descriptions related to the products under each category.
For each piece of sample data obtained, one or more keywords can be extracted from the text description of the sample product, and the importance degree of each keyword is then determined. The importance degree indicates how much the keyword contributes to product identification: a keyword that plays an important role in product identification has a higher importance degree, and a keyword that does not has a lower importance degree.
The importance degree of the keyword may be determined in advance by counting the number of times that the keyword appears in the text descriptions of all sample products in the same category, for example, if the number of times that a certain keyword appears in the text descriptions of all sample products in the same category is large, the importance degree of the keyword may be considered to be high, and if the number of times that the keyword appears is small, the importance degree of the keyword may be considered to be low.
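The count-based variant described above can be sketched as follows. This is a minimal illustration that assumes each text description has already been segmented into a list of words; the function name `count_importance` and the sample data are hypothetical and do not come from the disclosure.

```python
from collections import Counter


def count_importance(descriptions_by_category):
    """Count how often each word occurs across all descriptions of a category.

    descriptions_by_category: dict mapping category -> list of token lists.
    Returns a dict mapping category -> Counter of word occurrences.
    """
    importance = {}
    for category, descriptions in descriptions_by_category.items():
        counts = Counter()
        for tokens in descriptions:
            counts.update(tokens)
        importance[category] = counts
    return importance


# Hypothetical sample data, for illustration only.
samples = {
    "wine": [["beer", "cold"], ["beer", "rice wine"]],
    "dessert": [["macaron", "gift box"]],
}
importance = count_importance(samples)
```

Under this scheme a larger count within a category is read as a higher importance degree for that category.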
The product identification model may employ an xgboost model, a GBDT model, a neural network model, or the like. When the product identification model is trained, the importance degree can be converted into a vector form, and a plurality of vectors corresponding to the keywords are combined to form input data of the model. In each iteration cycle process, the characteristic data in one sample data is used as the input of the product identification model, after the output result of the product identification model is obtained, the output result can be compared with the class to which the sample product in the sample data belongs, and then the model parameters of the product identification model are updated, so that the output result of the product identification model is closer to the class to which the sample product belongs. After training of a large amount of sample data, model parameters of the product identification model are continuously updated, and after training is finished, the product identification model can provide a relatively accurate output result aiming at input data.
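The disclosure does not specify how the importance degrees are arranged into vectors. One possible layout, shown here purely as an assumption, is a fixed-length vector with one slot per keyword in the keyword set; `build_feature_vector`, the vocabulary, and the padding value are hypothetical.

```python
def build_feature_vector(keyword_importance, vocabulary, default=0.0):
    """Lay out importance values in a fixed keyword order.

    keyword_importance: dict mapping keyword -> importance for one sample.
    vocabulary: ordered list of all keywords (the keyword set).
    default: value used for keywords the sample does not contain (assumed).
    """
    return [keyword_importance.get(word, default) for word in vocabulary]


vocabulary = ["beer", "rice wine", "macaron"]  # hypothetical keyword set
sample = {"beer": 1.68, "rice wine": 0.42}     # keywords found in one sample
features = build_feature_vector(sample, vocabulary)
```

A vector of this kind, paired with the category label of the sample, could then be passed to whichever classifier is chosen (for example an xgboost or GBDT model); the training call itself is not shown here.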
According to the data processing method, the text description of the sample product and the category to which the product belongs are obtained, the keywords of the text description are extracted, the importance degree of the extracted keywords is determined, and then the product identification model is trained according to the feature data including the importance degree of the keywords and the category to which the product belongs. The product identification model trained in the mode can learn the influence degree of the keywords in the text description on the product identification under the product category from the text description of the product, the accuracy of the product identification can be improved, and even different products with similar text descriptions can be identified by the product identification model.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S101, namely the step of obtaining sample data, further includes the following steps:
in step S201, obtaining text descriptions of a plurality of sample products in a preset category;
in step S202, a text description of a plurality of the sample products is subjected to a deduplication process.
In this optional implementation manner, when collecting sample data, the text descriptions of a plurality of sample products may be obtained from a plurality of preset categories for which the online platform already has classification data. The textual description may include, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. To avoid collecting duplicate sample products, the textual descriptions may be deduplicated.
In an optional implementation manner of this embodiment, the step S202, that is, the step of performing deduplication processing on the text descriptions of the plurality of sample products, further includes the following steps:
and uniformly mapping a plurality of different text descriptions corresponding to the same sample product into the same text description.
In this alternative implementation, there may be a plurality of sample products under the same preset category that are in fact the same product even though their text descriptions differ. For example, on the take-away ordering platform, different merchants may upload menus that list tomato fried eggs under different names; these entries are substantially the same product, so they can be uniformly mapped to the same product name. Of course, it is understood that other content in the text description may also be mapped uniformly.
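The uniform mapping can be sketched as a lookup table followed by duplicate removal. This is only an illustration: the disclosure does not say how the mapping is built, and the alias table, the dish names, and the function `deduplicate` are assumptions.

```python
def deduplicate(descriptions, alias_map):
    """Map alias names onto one canonical name, then drop repeats (order kept)."""
    seen = set()
    unique = []
    for text in descriptions:
        canonical = alias_map.get(text, text)  # unmapped names stay unchanged
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


# Hypothetical alias table: two names for the same dish.
alias_map = {"tomato scrambled eggs": "tomato fried eggs"}
names = ["tomato fried eggs", "tomato scrambled eggs", "braised eggplant"]
unique_names = deduplicate(names, alias_map)
```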
In an optional implementation manner of this embodiment, as shown in fig. 3, the step S102, namely the step of extracting the keywords in the text description, further includes the following steps:
in step S301, performing word segmentation on the text description;
in step S302, a segmented word having a correlation higher than a preset threshold with the category to which the sample product belongs is determined as the keyword.
In this optional implementation manner, when extracting the keywords in the text description, the text description is first segmented, and the keywords are then determined according to the relevance of each segmented word to the category to which the sample product belongs. For example, on the takeaway ordering platform, the segmented word "pan" obtained from "stir-fried eggs with tomatoes" is not important for identifying the category of the dish, that is, its relevance to the category is low, so it may be removed rather than used as a keyword. The preset threshold may be set according to actual conditions and is not limited herein. Extracting keywords from the word segmentation results of the text description eliminates segmented words with low relevance, which avoids the low training efficiency caused by an excessively large feature dimension when the product identification model is subsequently trained.
In an optional implementation manner of this embodiment, the step S302 of determining, as the keyword, a segmented word having a correlation with the category to which the sample product belongs higher than a preset threshold, further includes the following steps:
and determining the relevance of the segmented word to the category to which the sample product belongs by using a chi-square test of independence.
In this alternative implementation, the chi-square test of independence can determine the association and dependency between two categorical variables. Therefore, in the embodiment of the present disclosure, text descriptions of sample products in different preset categories are collected and segmented; then, for each preset category, the relevance between each segmented word obtained from the collected text descriptions and the preset category may be determined by the chi-square test of independence, and a segmented word whose relevance is higher than a preset threshold is determined as a keyword.
After a large amount of sample data is collected, keywords under the different preset categories are extracted from the text descriptions of the sample products by the chi-square test of independence to form a keyword set. The chi-square test of independence is prior art and is not described herein.
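As a rough sketch of how such a relevance score could be computed, the code below uses the standard chi-square statistic for a 2x2 table (word present or absent, sample inside or outside the category). The disclosure does not describe how the table is built, so counting per text description is an assumption, and the function names and threshold value are hypothetical.

```python
def chi_square(word, category, samples):
    """Chi-square statistic for a 2x2 table of (word present?) x (in category?).

    samples: list of (category, set_of_tokens) pairs.
    """
    a = b = c = d = 0
    for sample_category, tokens in samples:
        has_word = word in tokens
        in_category = sample_category == category
        if has_word and in_category:
            a += 1  # word present, in category
        elif has_word:
            b += 1  # word present, other category
        elif in_category:
            c += 1  # word absent, in category
        else:
            d += 1  # word absent, other category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty: no evidence of association
    return n * (a * d - b * c) ** 2 / denominator


def extract_keywords(category, samples, threshold):
    """Keep words seen in the category whose statistic exceeds the threshold."""
    candidates = set()
    for sample_category, tokens in samples:
        if sample_category == category:
            candidates |= tokens
    return sorted(w for w in candidates if chi_square(w, category, samples) > threshold)


# Hypothetical segmented descriptions.
samples = [
    ("wine", {"beer", "cold"}),
    ("wine", {"beer"}),
    ("dessert", {"macaron", "cold"}),
    ("dessert", {"macaron"}),
]
wine_keywords = extract_keywords("wine", samples, threshold=1.0)
```

Note that the statistic measures the strength of association but not its direction, which is why the sketch only considers words that actually occur in the category.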
In an optional implementation manner of this embodiment, the step S103, namely, the step of determining the importance degree of the keyword, further includes the following steps:
and determining the TF-IDF value of the keyword as the importance degree of the keyword.
In this alternative implementation, TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting technique commonly used in information retrieval and data mining, where TF denotes term frequency and IDF denotes inverse document frequency. The TF-IDF value of a keyword may be understood as follows: if the frequency TF with which the current keyword appears in the text descriptions of all sample products in the current preset category is high, and the keyword rarely appears in the text descriptions of sample products in other preset categories, the keyword can be considered to have good category-distinguishing capability and to be suitable for classification. The keyword can therefore be considered important relative to the current preset category, and its TF-IDF value can be used to measure that importance.
The TF-IDF values of the keywords can be obtained in advance by collecting the text descriptions of all sample products in the preset categories on the online platform, extracting the keywords from the text descriptions, and determining the TF-IDF value of each keyword. As described above, each keyword in the keyword set formed from the sample data may also be associated with its TF-IDF value, so that during online identification the keyword set and the corresponding TF-IDF values can be used directly to obtain the keywords of the product to be identified and their TF-IDF values.
In this embodiment, the TF value of a keyword may be obtained by dividing the number of times the keyword appears in the text descriptions of all sample products in the preset category by the number of text descriptions (i.e., the number of sample products in the preset category). The IDF of the keyword may be determined from the number of preset categories in which the keyword appears and the total number of preset categories, using the formula IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if there are 5 preset categories in total and keyword A appears in the text descriptions of sample products in preset category 1, preset category 2, and preset category 3, the keyword appears in three preset categories, and its IDF is therefore log(5/3).
The TF-IDF value of a keyword is the product of the TF value and the IDF value of the keyword.
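The category-level TF and IDF definitions above can be sketched as follows. The text does not state the base of the logarithm, so the natural logarithm is assumed here; the function name and the sample data are hypothetical.

```python
import math


def tf_idf_by_category(descriptions_by_category):
    """Return {category: {word: TF * IDF}} using the definitions in the text.

    TF  = occurrences of the word in the category / number of descriptions
          in the category.
    IDF = log(n / m), n = number of categories, m = number of categories in
          which the word appears (natural log assumed).
    """
    n = len(descriptions_by_category)
    categories_with_word = {}
    for descriptions in descriptions_by_category.values():
        words = {word for tokens in descriptions for word in tokens}
        for word in words:
            categories_with_word[word] = categories_with_word.get(word, 0) + 1

    scores = {}
    for category, descriptions in descriptions_by_category.items():
        counts = {}
        for tokens in descriptions:
            for word in tokens:
                counts[word] = counts.get(word, 0) + 1
        scores[category] = {
            word: (count / len(descriptions)) * math.log(n / categories_with_word[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical segmented descriptions grouped by preset category.
data = {
    "wine": [["beer", "cold"], ["beer"]],
    "dessert": [["macaron", "cold"]],
}
scores = tf_idf_by_category(data)
```

In this toy data "cold" appears in every category, so its IDF is log(2/2) = 0 and it carries no weight, while "beer" appears only under "wine" and scores 1.0 × log(2).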
For example, the TF-IDF values of the keywords under each preset category in the sample data collected from one takeaway ordering platform are as follows:
[ "fried egg", "braised meat", "home style", "cold and dressed with sauce", "braised eggplant", "palace chicken", "diced", "sugar and vinegar", "shredded meat", "jelly", "potato", "mugwort", "bean curd", "kidney", "tomato", "inner ridge", "spelling", "dish", "small" ] home style vegetable [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.06]
[ "beer", "rice wine", "Yanjing beer", "snowflake", "wheat", "Harbin", "tin", "Skyo", "protoplasm", "Islands", "wine egg", "courage", "bizard", "Belgium", "refreshing", "king", "white", "sweet osmanthus", "Xiao", "liqueur" ] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
[ "roast meat", "crispy skin", "roast", "roasted", "crusty", "sausage", "roast string", "barbeque", "Orleans", "brazilian", "tendon", "Cumin", "Salix", "honeydew", "salad", "New Orleans", "Chicken", "tendon", "croissant", "baked bread", "roasted", "Turkey" ] roast meat [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.05]
[ "braised in brown sauce", "chicken", "rice", "extreme hot", "tofu skin", "golden mushroom", "Wu' o", "abalone", "of", "chicken small", "potato block", "not", "earthenware", "dry pot", "big hot", "give tofu skin" ] yellow braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
[ "teppanyaki", "kebab", "whisker", "pungency", "big chicken chop", "select", "squid", "old dry mother", "pancake", "iron plate", "fish", "eggplant", "juice", "tofu", "sesame cake", "egg" ] teppanyaki [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
[ "meatball", "beef meatball", "pellet", "meat meatball", "cabbage", "stew", "soup", "meatball", "urination", "white gourd", "casserole", "cooking", "octopus", "croquette", "delicacy", "handmade", "four happiness", "vegetarian", "vermicelli", "meatball" ] meatballs [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
[ "dessert", "foreign exchange", "double poetry", "macarons", "gift box", "afternoon tea", "Rui +", "Rubi", "Brown", "Nib", "butter", "curl" ] dessert [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
In each entry above, the first part lists the keywords extracted for the dish category, such as "fried egg" and "braised meat"; the middle part is the name of the dish category, such as "home style vegetable"; and the last part lists the corresponding TF-IDF values of these keywords, such as "0.34, 0.33".
For example, for the dish "Kung Pao chicken with rice", the keywords and feature data shown in the following table can be extracted from the relevant text description:
Dish name: Kung Pao chicken with rice; restaurant: millions; cuisine: Beijing cuisine; main business: snacks
Kung Pao | Diced chicken | ....... | Snacks | ....... |
0.3 | 0.5 | ....... | 1 | ....... |
The first row of the table lists the keywords, and the second row lists the feature data corresponding to the dish "Kung Pao chicken with rice", namely the TF-IDF value corresponding to each keyword.
In an optional implementation manner of this embodiment, as shown in fig. 4, the step of determining the TF-IDF value of the keyword as the importance degree of the keyword further includes the following steps:
in step S401, determining the TF-IDF value of the keyword under the category to which the sample product belongs;
in step S402, when the keyword corresponds to a plurality of TF-IDF values in different categories, selecting the smallest TF-IDF value as the importance degree of the keyword.
In this alternative implementation, if the same keyword occurs in multiple preset categories, a TF-IDF value of the keyword can be calculated for each of those preset categories; for the sake of uniformity, the minimum TF-IDF value can be selected as the importance degree of the keyword.
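Selecting the minimum value across categories can be sketched in a few lines. The per-category scores below are made-up numbers for illustration, and `select_importance` is a hypothetical helper name.

```python
def select_importance(scores_by_category):
    """For each keyword, keep the smallest TF-IDF value seen in any category."""
    importance = {}
    for scores in scores_by_category.values():
        for word, value in scores.items():
            importance[word] = min(value, importance.get(word, value))
    return importance


# Hypothetical per-category TF-IDF values.
scores_by_category = {
    "home style vegetable": {"tofu": 0.05, "potato": 0.07},
    "teppanyaki": {"tofu": 0.25},
}
importance = select_importance(scores_by_category)
```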
In an optional implementation manner of this embodiment, the step S103, namely, the step of determining the importance degree of the keyword, further includes the following steps:
and when the relevance of every segmented word corresponding to the sample product to the category to which the sample product belongs is lower than a preset threshold, taking a default value as the importance degree of the keyword corresponding to the sample product.
In this optional implementation manner, segmented words with low relevance to the category to which the sample product belongs are removed when the keywords in the text description are extracted. If the relevance of every segmented word in the text description of a sample product to the category to which the sample product belongs is lower than the preset threshold, the importance degree of the keyword in the feature data of the sample product may be set to a default value. The default value may be set according to actual conditions and is not limited herein.
Fig. 5 illustrates a flow diagram of a product identification method according to an embodiment of the present disclosure. As shown in fig. 5, the product identification method includes the steps of:
in step S501, a text description of a product to be identified is acquired;
in step S502, extracting keywords of the text description;
in step S503, the importance degree of the keyword is determined;
in step S504, the importance degree of the keyword is input into a product recognition model trained in advance to recognize the product to be recognized; and the product identification model is obtained by utilizing the data processing method for training.
In this embodiment, the product to be identified may be a product currently related to the online platform, such as dishes on a take-out ordering platform, clothing on an e-commerce platform, living goods, household goods, and the like. The textual description of the product to be identified includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. For example, the textual description of the dish on the takeaway ordering platform may include the name of the dish, the food material of the dish, the method of doing the dish, and so on.
In some embodiments, the textual description of the product to be identified may also include a textual description of the operator to which the product to be identified belongs. In general, the products offered by one operator fall into relatively similar categories, and some operators offer products of only one category. Therefore, when the product identification model is trained, the operator data is also used as input data, so that the model learns from the operator data the features that can influence the product category, which further improves its identification accuracy.
The data of the operator to which the product to be identified belongs may include, but is not limited to, the name of the operator, the main operation range, the large range to which the product operated by the operator belongs (such as a cuisine in the catering industry), and the like.
For a product to be identified, one or more keywords can be extracted from its text description, and the importance degree of each keyword is then determined. The importance degree indicates how much the keyword contributes to product identification: a keyword that plays an important role in product identification has a higher importance degree, and a keyword that does not has a lower importance degree.
The product identification model is obtained by training the data processing method, so the specific details of the product identification model can be referred to the above description of the data processing method, and are not described herein again.
In an optional implementation manner of this embodiment, as shown in fig. 6, the step S502, namely the step of extracting the keywords in the text description, further includes the following steps:
in step S601, performing word segmentation on the text description;
in step S602, the segmentation is matched with a keyword set, and it is determined whether the segmentation is a keyword.
In this optional implementation manner, as described for the above data processing method, during training the keywords in the text descriptions of all collected sample products are extracted to form a keyword set, and the importance degrees of these keywords are determined in subsequent steps. Therefore, after the training of the product identification model is completed, the keyword set can be retained. The segmented words obtained from the text description of the product to be identified are matched against the keyword set; if a segmented word matches, it can be determined as a keyword of the product to be identified, and its importance degree can likewise be determined directly.
For determining the keywords and determining the importance degree, reference may be made to the above description of the data processing method, which is not described herein again.
It should be noted that, when no segmented word in the text description of the product to be identified matches the keyword set, the importance degree of the keyword corresponding to the product to be identified may be set to a default value.
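The matching step and the default-value fallback can be sketched as follows, assuming the keyword set retained from training is stored as a dictionary from keyword to importance degree. The function name, the keyword strings, and the default of 0.0 are assumptions; the text leaves the default value open.

```python
def importance_for_product(tokens, keyword_importance, default=0.0):
    """Match segmented words against the retained keyword set.

    Returns (keywords, values). If no segmented word matches, the keyword
    list is empty and a single default importance value is returned.
    """
    keywords = [word for word in tokens if word in keyword_importance]
    if not keywords:
        return [], [default]
    return keywords, [keyword_importance[word] for word in keywords]


# Hypothetical retained keyword set with importance degrees.
keyword_importance = {"Kung Pao": 0.3, "diced chicken": 0.5}

matched = importance_for_product(["Kung Pao", "diced chicken", "rice"], keyword_importance)
unmatched = importance_for_product(["rice"], keyword_importance)
```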
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 7 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 7, the data processing apparatus includes:
a first obtaining module 701 configured to obtain sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
a first extraction module 702 configured to extract keywords in the text description;
a first determining module 703 configured to determine the importance degree of the keyword;
a training module 704 configured to train a product recognition model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
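When the apparatus is implemented in software, the four modules could be composed as in the sketch below. This is one possible arrangement under stated assumptions, not the disclosed implementation: the class name, the callable signatures, and the stub functions in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataProcessingApparatus:
    obtain: Callable[[], list]               # first obtaining module 701
    extract: Callable[[str], list]           # first extraction module 702
    determine: Callable[[list, str], list]   # first determining module 703
    train: Callable[[list, list], Any]       # training module 704

    def run(self):
        """Chain the modules: obtain samples, build features, train the model."""
        features, labels = [], []
        for text, category in self.obtain():
            keywords = self.extract(text)
            features.append(self.determine(keywords, category))
            labels.append(category)
        return self.train(features, labels)


# Stub modules for illustration only; a real system would plug in word
# segmentation, importance calculation, and an actual classifier.
apparatus = DataProcessingApparatus(
    obtain=lambda: [("beer cold", "wine")],
    extract=lambda text: text.split(),
    determine=lambda keywords, category: [1.0 for _ in keywords],
    train=lambda features, labels: (features, labels),
)
result = apparatus.run()
```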
In this embodiment, the sample product may be a product currently related to the online platform, such as dishes on a take-away ordering platform, clothes on an e-commerce platform, living goods, household goods, and the like. The textual description of the sample product includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. For example, the textual description of the dish on the takeaway ordering platform may include the name of the dish, the food material of the dish, the method of doing the dish, and so on.
In some embodiments, the textual description of the sample product may also include a textual description of the operator to which the sample product belongs. In general, the products offered by one operator fall into relatively similar categories, and some operators offer products of only one category. Therefore, when the product identification model is trained, the operator data is also used as input data, so that the model learns from the operator data the features that can influence the product category, which further improves its identification accuracy.
The data of the operator to which the sample product belongs may include, but is not limited to, the name of the operator, the main operating scope, the large scope to which the product operated by the operator belongs (such as a cuisine in the catering industry), and the like.
The type of the sample product can be determined according to the existing data of the online platform, and can also be manually marked. For example, sample data may be collected from existing products of the online platform, and the online platform generally has its own classification of the products, so that sample data required for the training may be obtained by collecting text descriptions related to the products under each category.
For each piece of sample data obtained, one or more keywords can be extracted from the text description of the sample product, and the importance degree of each keyword is then determined. The importance degree indicates how much the keyword contributes to product identification: a keyword that plays an important role in product identification has a higher importance degree, and a keyword that does not has a lower importance degree.
The importance degree of the keyword may be determined in advance by counting the number of times that the keyword appears in the text descriptions of all sample products in the same category, for example, if the number of times that a certain keyword appears in the text descriptions of all sample products in the same category is large, the importance degree of the keyword may be considered to be high, and if the number of times that the keyword appears is small, the importance degree of the keyword may be considered to be low.
The product identification model may employ an xgboost model, a GBDT model, a neural network model, or the like. When the product identification model is trained, the importance degree can be converted into a vector form, and a plurality of vectors corresponding to the keywords are combined to form input data of the model. In each iteration cycle process, the characteristic data in one sample data is used as the input of the product identification model, after the output result of the product identification model is obtained, the output result can be compared with the class to which the sample product in the sample data belongs, and then the model parameters of the product identification model are updated, so that the output result of the product identification model is closer to the class to which the sample product belongs. After training of a large amount of sample data, model parameters of the product identification model are continuously updated, and after training is finished, the product identification model can provide a relatively accurate output result aiming at input data.
In the data processing device of the embodiment of the disclosure, the text description of the sample product and the category to which the product belongs are obtained, the keywords of the text description are extracted, the importance degree of the extracted keywords is determined, and then the product identification model is trained according to the feature data including the importance degree of the keywords and the category to which the product belongs. The product identification model trained in the mode can learn the influence degree of the keywords in the text description on the product identification under the product category from the text description of the product, the accuracy of the product identification can be improved, and even different products with similar text descriptions can be identified by the product identification model.
In an optional implementation manner of this embodiment, as shown in fig. 8, the first obtaining module 701 includes:
a first obtaining sub-module 801 configured to obtain text descriptions of a plurality of sample products in a preset category;
a deduplication sub-module 802 configured to perform deduplication processing on textual descriptions of a plurality of the sample products.
In this optional implementation manner, when collecting sample data, the text descriptions of a plurality of sample products may be obtained from a plurality of preset categories for which the online platform already has classification data. The textual description may include, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. To avoid collecting duplicate sample products, the textual descriptions may be deduplicated.
In an optional implementation manner of this embodiment, the duplication elimination sub-module 802 includes:
a mapping sub-module configured to uniformly map a plurality of different text descriptions corresponding to the same sample product into the same text description.
In this alternative implementation, there may be a plurality of sample products under the same preset category that are in fact the same product even though their text descriptions differ. For example, on the take-away ordering platform, different merchants may upload menus that list tomato fried eggs under different names; these entries are substantially the same product, so they can be uniformly mapped to the same product name. Of course, it is understood that other content in the text description may also be mapped uniformly.
In an optional implementation manner of this embodiment, as shown in fig. 9, the first extracting module 702 includes:
a first word segmentation sub-module 901 configured to segment the text description;
a first determining sub-module 902 configured to determine, as the keyword, a segmented word having a correlation higher than a preset threshold with respect to a category to which the sample product belongs.
In this optional implementation manner, when extracting the keywords in the text description, the text description is first segmented, and the keywords are then determined according to the relevance of each segmented word to the category to which the sample product belongs. For example, on the takeaway ordering platform, the segmented word "pan" obtained from "stir-fried eggs with tomatoes" is not important for identifying the category of the dish, that is, its relevance to the category is low, so it may be removed rather than used as a keyword. The preset threshold may be set according to actual conditions and is not limited herein. Extracting keywords from the word segmentation results of the text description eliminates segmented words with low relevance, which avoids the low training efficiency caused by an excessively large feature dimension when the product identification model is subsequently trained.
In an optional implementation manner of this embodiment, the first determining sub-module 902 includes:
a second determining sub-module configured to determine the relevance of the segmented word to the category to which the sample product belongs by using a chi-square test of independence.
In this alternative implementation, the chi-square test of independence can determine the association and dependency between two categorical variables. Therefore, in the embodiment of the present disclosure, text descriptions of sample products in different preset categories are collected and segmented; then, for each preset category, the relevance between each segmented word obtained from the collected text descriptions and the preset category may be determined by the chi-square test of independence, and a segmented word whose relevance is higher than a preset threshold is determined as a keyword.
After a large amount of sample data is collected, keywords under the different preset categories are extracted from the text descriptions of the sample products by the chi-square test of independence to form a keyword set. The chi-square test of independence is prior art and is not described herein.
In an optional implementation manner of this embodiment, the first determining module 703 includes:
a third determination submodule configured to determine the TF-IDF value of the keyword as the degree of importance of the keyword.
In this optional implementation, TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting technique commonly used in information retrieval and data mining, where TF denotes term frequency and IDF denotes inverse document frequency. The TF-IDF value of a keyword can be interpreted as follows: if the current keyword appears with high frequency (TF) in the text descriptions of all sample products in the current preset category and rarely appears in the text descriptions of sample products in other preset categories, the keyword has good category-distinguishing capability and is suitable for classification. The keyword can therefore be considered important relative to the current preset category, and its TF-IDF value can be used to measure its importance.
The TF-IDF values of the keywords can be obtained in advance by collecting the text descriptions of all sample products in the preset categories on an online platform, extracting the keywords from the text descriptions, and determining the TF-IDF value of each keyword. As described above, each keyword in the keyword set formed from the sample data may also be associated with its TF-IDF value, so that during online identification the keyword set and the corresponding TF-IDF values can be used directly to obtain the keywords and TF-IDF values for the product to be identified.
In this embodiment, the TF value of a keyword may be obtained by dividing the number of times that the keyword appears in the text descriptions of all sample products in the preset category by the number of text descriptions (i.e., the number of all sample products in the preset category). The IDF of the keyword may be determined from the number of preset categories whose sample-product text descriptions contain the keyword and the total number of preset categories; the calculation formula is IDF = log(n/m), where n is the total number of preset categories and m is the number of preset categories in which the keyword appears. For example, if keyword A appears in the text descriptions of the sample products in preset category 1, preset category 2, and preset category 3, and there are 5 preset categories in total, the keyword appears in three preset categories, and thus the IDF of the keyword is log(5/3).
The TF-IDF value of a keyword is the product of the TF value and the IDF value of the keyword.
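The category-level TF and IDF definitions above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the toy data are hypothetical, and the natural logarithm is assumed because the source does not state the base of the logarithm.

```python
import math
from collections import Counter, defaultdict


def tf_idf_by_category(samples):
    """samples: list of (keywords_in_description, category) pairs.

    Returns {category: {keyword: tf_idf}} where
      TF  = occurrences of the keyword in the category's descriptions
            / number of descriptions in the category
      IDF = log(n / m), n = total number of categories,
            m = number of categories in which the keyword appears.
    """
    counts = defaultdict(Counter)  # category -> keyword occurrence counts
    n_docs = Counter()             # category -> number of descriptions
    for words, category in samples:
        counts[category].update(words)
        n_docs[category] += 1

    n = len(counts)                # total number of preset categories
    cats_with = Counter()          # keyword -> number of categories containing it
    for counter in counts.values():
        cats_with.update(counter.keys())

    result = {}
    for category, counter in counts.items():
        result[category] = {
            word: (cnt / n_docs[category]) * math.log(n / cats_with[word])
            for word, cnt in counter.items()
        }
    return result


# Hypothetical toy data with two preset categories.
samples = [
    (["tomato", "egg"], "home dishes"),
    (["egg", "tofu"], "home dishes"),
    (["beer", "egg"], "wine"),
    (["beer"], "wine"),
]
scores = tf_idf_by_category(samples)

# IDF from the worked example in the text: 3 of 5 categories contain the keyword.
idf_example = math.log(5 / 3)
```

Note that a keyword appearing in every preset category ("egg" in the toy data) receives an IDF of log(1) = 0, so its TF-IDF value is zero regardless of how often it occurs.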
For example, the TF-IDF values of the keywords under each preset category in the sample data collected from a takeaway ordering platform are as follows:
[ "fried egg", "braised in soy sauce", "home", "cold mix", "braised eggplant", "palace chicken", "diced", "sugar and vinegar", "shredded fish", "agaric", "potato", "mugwort", "tofu", "kidney", "tomato", "inner ridge", "jigsaw", "dish", "small" ] home vegetables [0.34, 0.33, 0.28, 0.26, 0.22, 0.19, 0.17, 0.13, 0.12, 0.1, 0.07, 0.08, 0.05, 0.07, 0.06, 0.07, 0.05, 0.05, 0.06]
[ "beer", "rice wine", "Yanjing beer", "snowflake", "wheat", "Harbin", "can", "Skyo", "Naja", "protoplasm", "Qingdao", "egg wine", "courage", "Belgium", "refreshing", "king", "white", "sweet osmanthus", "Xiao", "liqueur" ] wine [1.68, 0.42, 0.37, 0.37, 0.22, 0.19, 0.18, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.12, 0.1, 0.1, 0.11, 0.07]
[ "roast meat", "crispy skin", "roast", "roasted", "crusty", "sausage", "kebab", "Orleans", "muscle", "tendon", "cumin", "Salix", "honeydew", "salad", "New Orleans", "chicken", "tendon", "croissant", "baked bread", "roasted", "Turkey" ] roast meat [1.57, 0.21, 0.19, 0.18, 0.15, 0.14, 0.11, 0.12, 0.1, 0.1, 0.1, 0.08, 0.08, 0.09, 0.07, 0.08, 0.06, 0.08, 0.06, 0.05]
[ "braised", "chicken", "rice", "extremely spicy", "tofu skin", "golden mushroom", "Wu Ji", "abalone", "of", "chicken meat", "potato piece", "not yet", "casserole", "thousand pot", "big spicy", "giving skin of beans" ] braised chicken rice [1.81, 1.26, 0.4, 0.14, 0.13, 0.12, 0.1, 0.09, 0.1, 0.08, 0.07, 0.07, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
[ "iron plate roasting", "skewered meat", "whisker", "pungency", "big chicken chop", "select", "squid", "old dry mother", "pancake", "iron plate", "fish", "eggplant", "juice", "tofu", "sesame cake", "egg", "teppanyak [2.94, 0.43, 0.43, 0.41, 0.4, 0.37, 0.33, 0.32, 0.32, 0.29, 0.28, 0.27, 0.25, 0.25, 0.23, 0.21]
[ "meatball", "beef meatball", "pellet", "stew", "soup", "meatball", "urination", "white gourd", "casserole", "cooking", "octopus", "croquette", "delicacy", "handmade", "four happiness", "vegetarian", "vermicelli", "meatball" ] meatballs [1.14, 0.35, 0.19, 0.19, 0.18, 0.16, 0.14, 0.12, 0.11, 0.1, 0.11, 0.1, 0.09, 0.07, 0.07, 0.07, 0.06, 0.07, 0.06, 0.05]
[ "dessert", "outman", "double poetry" macarons "," gift box "," afternoon tea "," Rui + - "," Rubi "," Brown "," Nib "," butter "," curl "] dessert [2.82, 0.66, 0.66, 0.62, 0.6, 0.6, 0.58, 0.55, 0.54, 0.53, 0.5, 0.42]
In each entry above, the first part lists the keywords extracted for the dish category, such as "fried egg" and "braised in soy sauce"; the middle part is the name of the dish category, such as "home vegetables"; and the last part lists the corresponding TF-IDF values of these keywords, such as 0.34 and 0.33.
For example, for the dish "Kung Pao chicken over rice", the keywords and feature data shown in the following table can be extracted from the relevant text description:
Dish name: Kung Pao chicken over rice; restaurant: millions; cuisine: Beijing cuisine; main business: snack food
Kung Pao | Diced chicken | ... | Snack food | ... |
0.3 | 0.5 | ... | 1 | ... |
The first row of the table lists the keywords, and the second row lists the feature data corresponding to the dish "Kung Pao chicken over rice", namely the TF-IDF value corresponding to each keyword.
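The table above pairs each keyword with a feature value for one dish. One way to turn such a table into a fixed-length input for a model is sketched below. The function name, the keyword ordering, the weights, and the choice of 0.0 for keywords that do not occur in the description are assumptions made for illustration and are not taken from the source.

```python
def build_feature_vector(segments, keyword_weights, vocabulary):
    """Map the segmented words of one product to a fixed-length vector.

    segments:        segmented words from the product's text description
    keyword_weights: {keyword: importance value, e.g. TF-IDF}
    vocabulary:      ordered list of all keywords in the keyword set
    """
    present = set(segments)
    # Assumption: keywords absent from the description contribute 0.0.
    return [keyword_weights[k] if k in present else 0.0 for k in vocabulary]


# Hypothetical keyword set and weights.
vocabulary = ["diced chicken", "tofu", "snack food"]
keyword_weights = {"diced chicken": 0.5, "tofu": 0.2, "snack food": 1.0}

# Segmented words from the dish description plus operator data.
segments = ["diced chicken", "rice", "snack food"]
features = build_feature_vector(segments, keyword_weights, vocabulary)
```

Keeping the keyword order fixed means every product is described by a vector of the same length, so the vectors can be compared or passed to the same model.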
In an optional implementation manner of this embodiment, as shown in fig. 10, the third determining sub-module includes:
a fourth determination submodule 1001 configured to determine a TF-IDF value of the keyword under the category to which the sample product belongs;
a selecting sub-module 1002 configured to select the smallest TF-IDF value as the importance degree of the keyword when the keyword corresponds to a plurality of TF-IDF values under different categories.
In this optional implementation, if the same keyword appears in multiple preset categories, a TF-IDF value of the keyword can be calculated for each of those preset categories; for the sake of uniformity, the minimum TF-IDF value can be selected as the importance degree of the keyword.
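The minimum-selection rule described above can be sketched in a few lines. The function name and the numeric values are illustrative assumptions; the input is assumed to be a mapping from category to per-keyword TF-IDF values.

```python
def merge_importance(tfidf_by_category):
    """Collapse per-category TF-IDF values into one value per keyword.

    tfidf_by_category: {category: {keyword: tf_idf}}
    When a keyword has values under several categories, the smallest
    value is kept as its importance degree.
    """
    importance = {}
    for scores in tfidf_by_category.values():
        for word, value in scores.items():
            importance[word] = min(value, importance.get(word, value))
    return importance


# Illustrative values: "tofu" appears under two categories.
tfidf_by_category = {
    "home vegetables": {"tofu": 0.05, "tomato": 0.06},
    "iron plate roasting": {"tofu": 0.25, "squid": 0.33},
}
importance = merge_importance(tfidf_by_category)
```

After merging, each keyword in the keyword set carries exactly one importance value, which is what allows the keyword set to be reused directly at identification time.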
In an optional implementation manner of this embodiment, the first determining module 703 includes:
a fifth determining sub-module configured to take a default value as the importance degree of the keyword corresponding to the sample product when the relevance of all the segmented words corresponding to the sample product to the category to which the sample product belongs is lower than a preset threshold.
In this optional implementation, when keywords are extracted from the text description, segmented words with low relevance to the category to which the sample product belongs are removed. If the relevance of every segmented word in the text description of a sample product to that category is lower than the preset threshold, the importance degree of the keyword in the feature data of the sample product may be set to a default value. The default value may be set according to the actual situation and is not limited herein.
Fig. 11 shows a block diagram of a product identification device according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 11, the product recognition apparatus includes:
a second obtaining module 1101 configured to obtain a text description of the product to be identified;
a second extraction module 1102 configured to extract keywords of the text description;
a second determining module 1103 configured to determine the importance degree of the keyword;
a recognition module 1104 configured to input the importance degree of the keyword into a pre-trained product recognition model so as to recognize the product to be recognized; wherein the product recognition model is trained by the data processing device.
In this embodiment, the product to be identified may be a product currently related to the online platform, such as dishes on a take-out ordering platform, clothing on an e-commerce platform, living goods, household goods, and the like. The textual description of the product to be identified includes, but is not limited to, textual descriptions of attributes such as product name, product material, manufacturing process, efficacy, size, quantity, and the like. For example, the textual description of the dish on the takeaway ordering platform may include the name of the dish, the food material of the dish, the method of doing the dish, and so on.
In some embodiments, the text description of the product to be identified may also include a text description of the operator to which the product belongs. In general, the product types handled by one operator are relatively similar, and some operators handle products of only one type. Therefore, operator data is also used as input data when the product identification model is trained, so that the model learns from the operator data the characteristics that can influence the product category, which further improves the identification accuracy of the product identification model.
The data of the operator to which the product to be identified belongs may include, but is not limited to, the name of the operator, the main operation range, the large range to which the product operated by the operator belongs (such as a cuisine in the catering industry), and the like.
For a product to be identified, one or more keywords can be extracted from its text description, and the importance degree of each keyword is then determined. The importance degree indicates how much the keyword contributes to product identification: the more important the role a keyword plays in product identification, the higher its importance degree, and the less important its role, the lower its importance degree.
The product identification model is obtained by training the data processing device, so the specific details of the product identification model can be referred to the above description of the data processing device, and are not described herein again.
In an optional implementation manner of this embodiment, as shown in fig. 12, the second extracting module 1102 includes:
a second word segmentation sub-module 1201 configured to segment the text description;
a matching sub-module 1202 configured to match the segmented word with a keyword set and determine whether the segmented word is a keyword.
In this optional implementation, as described for the data processing apparatus above, during training the keywords in the text descriptions of all collected sample products are extracted to form a keyword set corresponding to the sample products, and the importance degrees of these keywords are determined in subsequent steps. Therefore, after training of the product recognition model is completed, the keyword set can be retained. The segmented words obtained from the text description of the product to be recognized are matched against the keyword set; if a segmented word matches, it is determined to be a keyword of the product to be recognized, and its importance degree can also be determined directly.
The determination of the keywords and the determination of the importance degree can be referred to the above description of the data processing apparatus, and are not described herein again.
It should be noted that, when no segmented word in the text description of the product to be recognized matches the keyword set, the importance degree of the keyword corresponding to the product to be recognized may be set to a default value.
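The matching step and the default-value fallback described above can be sketched as follows. The function names, the sample keyword set, and the default of 0.0 are assumptions; the source leaves the default value open, and the segmentation itself is assumed to have been done already.

```python
DEFAULT_IMPORTANCE = 0.0  # assumed default; the source does not fix a value


def match_keywords(segments, keyword_importance):
    """Return {keyword: importance} for segmented words found in the keyword set.

    keyword_importance: {keyword: importance} retained from training.
    """
    return {s: keyword_importance[s] for s in segments if s in keyword_importance}


def importance_values(segments, keyword_importance, default=DEFAULT_IMPORTANCE):
    """Importance degrees for one product, falling back to the default
    value when no segmented word matches the keyword set."""
    matched = match_keywords(segments, keyword_importance)
    return list(matched.values()) if matched else [default]


# Hypothetical keyword set retained after training.
keyword_importance = {"tomato": 0.06, "fried egg": 0.34, "beer": 1.68}

matched = match_keywords(["tomato", "fried egg", "pan"], keyword_importance)
fallback = importance_values(["mystery", "pan"], keyword_importance)
```

Because the keyword set and importance values are looked up rather than recomputed, no chi-square or TF-IDF calculation is needed at identification time.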
The disclosed embodiment also provides an electronic device, as shown in fig. 13, including a processor 1301; and memory 1302 communicatively coupled to the processor 1301; wherein the memory 1302 stores instructions executable by the processor 1301, the instructions being executable by the processor 1301 to implement:
acquiring sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
extracting keywords in the text description;
determining the importance degree of the keyword;
training a product identification model by using the characteristic data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
Wherein, obtaining sample data comprises:
acquiring text descriptions of a plurality of sample products in a preset category;
and performing de-duplication processing on the text descriptions of the sample products.
Wherein de-duplicating the textual descriptions of the plurality of sample products comprises:
and uniformly mapping a plurality of different text descriptions corresponding to the same sample product into the same text description.
Wherein, extracting the keywords in the text description comprises:
segmenting the text description;
and determining the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold as the keywords.
Determining the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold as the keywords comprises:
determining the correlation between the segmented words and the category to which the sample product belongs by using a chi-square independence test.
Wherein determining the importance degree of the keyword comprises:
and determining the TF-IDF value of the keyword as the importance degree of the keyword.
Determining the TF-IDF value of the keyword as the importance degree of the keyword comprises:
determining a TF-IDF value of the keyword under the category to which the sample product belongs;
and when the keyword corresponds to a plurality of TF-IDF values under different categories, selecting the smallest TF-IDF value as the importance degree of the keyword.
Wherein determining the importance of the keyword comprises:
and when the relevance of all the segmented words corresponding to the sample product to the category to which the sample product belongs is lower than a preset threshold value, taking a default value as the importance degree of the keyword corresponding to the sample product.
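The de-duplication step listed above, in which several different text descriptions of the same sample product are mapped to one description, might be sketched as follows. The alias table, the normalization (lower-casing and collapsing whitespace), and the function name are assumptions for illustration; the source does not say how equivalent descriptions are detected.

```python
def deduplicate(descriptions, aliases):
    """Map variant descriptions to one canonical description and drop repeats.

    descriptions: raw text descriptions of sample products
    aliases:      {normalized variant: canonical description} (hypothetical table)
    """
    seen = set()
    unique = []
    for text in descriptions:
        # Normalize case and whitespace before the alias lookup.
        key = " ".join(text.lower().split())
        canonical = aliases.get(key, key)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


# Hypothetical variants of the same dish name.
aliases = {"scrambled eggs with tomato": "tomato fried egg"}
descriptions = [
    "Tomato fried egg",
    "tomato  fried egg",
    "Scrambled eggs with tomato",
    "Braised chicken rice",
]
unique_descriptions = deduplicate(descriptions, aliases)
```

The order of first appearance is preserved, so the result can still be aligned with the category labels collected alongside the descriptions.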
The present implementations also provide an electronic device comprising a memory and a processor; wherein,
the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring text description of a product to be identified;
extracting keywords of the text description;
determining the importance degree of the keyword;
inputting the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product identification model is trained by using the electronic device shown in fig. 13.
Wherein, extracting the keywords in the text description comprises:
segmenting the text description;
and matching the segmented word with a keyword set to determine whether the segmented word is a keyword.
Specifically, the processor 1301 and the memory 1302 may be connected by a bus or in other manners, and fig. 13 illustrates an example of connection by a bus. Memory 1302, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 1301 executes various functional applications of the apparatus and data processing by running nonvolatile software programs, instructions, and modules stored in the memory 1302, that is, implements the above-described method in the embodiments of the present disclosure.
The memory 1302 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions; the storage data area may store historical data of shipping network traffic, and the like. Further, the memory 1302 may include high speed random access memory and may also include non-volatile memory, such as a magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the electronic device optionally includes a communication component 1303, and the memory 1302 optionally includes memory remotely located from the processor 1301, which may be connected to an external device through the communication component 1303. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 1302, which when executed by the one or more processors 1301, perform the methods described above in the embodiments of the present disclosure.
The product can execute the method provided by the embodiment of the disclosure, has corresponding functional modules and beneficial effects of the execution method, and reference can be made to the method provided by the embodiment of the disclosure for technical details which are not described in detail in the embodiment.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only of the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned features, but also encompasses other technical solutions formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Claims (10)
1. A data processing method, comprising:
acquiring sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
extracting keywords in the text description;
determining the importance degree of the keyword;
training a product identification model by using the characteristic data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
2. The method of claim 1, wherein obtaining sample data comprises:
acquiring text descriptions of a plurality of sample products in a preset category;
and performing de-duplication processing on the text descriptions of the sample products.
3. The method of claim 1, wherein de-duplicating the textual descriptions of the plurality of sample products comprises:
and uniformly mapping a plurality of different text descriptions corresponding to the same sample product into the same text description.
4. The method according to any one of claims 1-3, wherein extracting keywords from the textual description comprises:
segmenting the text description;
and determining the segmented words whose correlation with the category to which the sample product belongs is higher than a preset threshold value as the keywords.
5. A method of product identification, comprising:
acquiring text description of a product to be identified;
extracting keywords of the text description;
determining the importance degree of the keyword;
inputting the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product recognition model is trained using the method of any one of claims 1-4.
6. A data processing apparatus, comprising:
a first obtaining module configured to obtain sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
a first extraction module configured to extract keywords in the text description;
a first determination module configured to determine a degree of importance of the keyword;
a training module configured to train a product recognition model using the feature data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
7. A product identification device, comprising:
the second acquisition module is configured to acquire a text description of the product to be identified;
a second extraction module configured to extract keywords of the textual description;
a second determination module configured to determine a degree of importance of the keyword;
the recognition module is configured to input the importance degree of the keyword into a pre-trained product recognition model so as to recognize the product to be recognized; wherein the product recognition model is trained using the apparatus of claim 6.
8. An electronic device comprising a memory and a processor; wherein,
the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring sample data; wherein the sample data comprises a textual description of a sample product and a category to which the sample product belongs;
extracting keywords in the text description;
determining the importance degree of the keyword;
training a product identification model by using the characteristic data of the sample product and the category to which the sample product belongs; wherein the feature data includes the degree of importance of the keyword corresponding to the sample product.
9. An electronic device comprising a memory and a processor; wherein,
the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring text description of a product to be identified;
extracting keywords of the text description;
determining the importance degree of the keyword;
inputting the importance degree of the keyword into a pre-trained product identification model so as to identify the product to be identified; wherein the product recognition model is trained using the electronic device of claim 8.
10. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910563737.7A CN110264318A (en) | 2019-06-26 | 2019-06-26 | Data processing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110264318A true CN110264318A (en) | 2019-09-20 |
Family
ID=67921955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910563737.7A Pending CN110264318A (en) | 2019-06-26 | 2019-06-26 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110264318A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110264318A (en) | Data processing method and device, electronic equipment and storage medium | |
Chen et al. | Cross-modal recipe retrieval with rich food attributes | |
US20220005376A1 (en) | Systems and methods to mimic target food items using artificial intelligence | |
US11823042B2 (en) | System for measuring food weight | |
CN106503442A (en) | Menu recommendation method and device | |
CN107067293A (en) | Merchant category method, device and electronic equipment | |
Morol et al. | Food recipe recommendation based on ingredients detection using deep learning | |
CN110851571B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
Sudo et al. | Estimating nutritional value from food images based on semantic segmentation | |
CN108596789B (en) | Dish standardization method | |
Kitamura et al. | Image processing based approach to food balance analysis for personal food logging | |
CN110322323A (en) | Entity display method, entity display device, storage medium and electronic equipment | |
Park et al. | Adapting a standardised international 24 h dietary recall methodology (GloboDiet software) for research and dietary surveillance in Korea | |
CN110968748A (en) | Electronic menu processing method, device and system | |
Amano et al. | Food category representatives: Extracting categories from meal names in food recordings and recipe data | |
CN109472025B (en) | Dish name extraction method and device | |
EP3848870A1 (en) | Nutritional value calculation of a dish | |
CN118193523A (en) | Intelligent cooking method and intelligent cooking device based on cooking AI large model and RAG system | |
Tachibana et al. | Extraction of naming concepts based on modifiers in recipe titles | |
Yanai et al. | Large-scale twitter food photo mining and its applications | |
CN114218415A (en) | Cooking recipe display method and device | |
Prajena et al. | Indonesian Traditional Food Image Recognition using Convolutional Neural Network | |
Lim et al. | Explainable artificial intelligence in oriental food recognition using convolutional neural network | |
CN117541359B (en) | Dining recommendation method and system based on preference analysis | |
CN115797924A (en) | Training method and image retrieval method of model for food image classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190920 |