WO2022126901A1 - Commodity recommendation method and related device thereof - Google Patents

Info

Publication number
WO2022126901A1
Authority
WO
WIPO (PCT)
Prior art keywords
commodity
node
embedding vector
sequence
attribute
Prior art date
Application number
PCT/CN2021/082934
Other languages
French (fr)
Chinese (zh)
Inventor
陈浩
谯轶轩
高鹏
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022126901A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a product recommendation method, device, computer equipment and storage medium.
  • Graph embedding is one of the hot research fields in recommender systems and graph social networks in recent years. By learning the information contained in the nodes in the graph, the nodes are mapped to a quantifiable space, that is, the node embedding in the graph is generated. In downstream tasks, the complex relationship between nodes in the network can be more deeply understood by performing tasks such as similarity calculation, classification, and clustering on node embedding.
  • The historical browsing order records of the products can be obtained first, and a graph network of commodity nodes can be constructed from those records. Then, using the random walk method, the topology between the nodes in the constructed commodity node graph network is transformed into a text-like sequence structure. Next, the word2vec model combined with negative sampling is used to embed the generated sequences into a new space and to mine the structural relationship of the browsing order between the nodes in the graph network, thereby determining the embedding vector of each commodity node in the commodity node graph network. Finally, the determined embedding vector of each commodity node is used to recommend similar commodities.
  • The commodity node embedding vectors determined in this way can support commodity recommendation to a certain extent. However, the inventor found that an embedding vector determined only by the above means contains only the order information between commodity nodes, so the information it carries is limited and the resulting recommendation effect is not significant.
  • the purpose of the embodiments of the present application is to provide a method, apparatus, computer equipment and storage medium for recommending products, which are mainly used to solve the technical problem of poor product recommendation effect due to the single information contained in the embedded vector of existing product nodes.
  • the embodiment of the present application provides a method for recommending commodities, which adopts the following technical solutions:
  • the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and the commodity attribute node embedding vector are obtained by calculation;
  • the product corresponding to the comprehensive embedding vector close to the comprehensive embedding vector of the target product is the recommended product determined for the target product.
  • the embodiment of the present application also provides a product recommendation device, which adopts the following technical solutions:
  • an acquisition unit used to acquire historical commodity browsing records of multiple users
  • a generating unit generating a sequence of commodity nodes and a sequence of commodity attribute nodes according to the historical commodity browsing records
  • the training unit is used for combining the negative sampling strategy, using the commodity node sequence and the commodity attribute node sequence as training corpus, respectively inputting the word2vec model, and calculating the commodity node embedding vector and the commodity attribute node embedding vector;
  • a computing unit configured to perform vector splicing or vector summation operations on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
  • a response unit used to determine the target commodity accessed by the user
  • the recommendation unit is configured to determine the product corresponding to the comprehensive embedding vector similar to the comprehensive embedding vector of the target product, which is the recommended product determined for the target product.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the processor implements the following steps of a commodity recommendation method:
  • the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and the commodity attribute node embedding vector are obtained by calculation;
  • the product corresponding to the comprehensive embedding vector that is similar to the comprehensive embedding vector of the target product is the recommended product determined for the target product.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • a computer-readable storage medium wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the following commodity recommendation method are implemented:
  • the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and the commodity attribute node embedding vector are obtained by calculation;
  • the product corresponding to the comprehensive embedding vector that is similar to the comprehensive embedding vector of the target product is the recommended product determined for the target product.
  • a commodity node sequence and a commodity attribute node sequence are further generated according to the historical commodity browsing records. Then, combined with the negative sampling strategy, the commodity node sequence and commodity attribute node sequence are used as training corpus, and input into the word2vec model respectively, and the commodity node embedding vector and commodity attribute node embedding vector are calculated. Finally, perform vector splicing or vector summation operations on the commodity node embedding vector and commodity attribute node embedding vector to obtain the comprehensive embedding vector corresponding to each commodity.
  • the commodity recommendation method that integrates commodity attribute information proposed in this application can better combine the structural information and attribute information of the nodes in the graph network.
  • the commodity comprehensive embedding vector calculated in this application not only contains the user's behavior information, but also contains the content information of the commodity itself, which can reflect the property information of the commodity itself.
  • Compared with the embedding vector in the prior art, it can achieve a more accurate product recommendation effect.
  • FIG. 1 is a flowchart of an embodiment of a product recommendation method of the present application
  • Fig. 2 is a flow chart of a specific implementation manner of step S120 in Fig. 1;
  • FIG. 3 is a flowchart of another embodiment of a product recommendation method of the present application.
  • FIG. 4 is a schematic diagram of an embodiment of a product recommendation device of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of the generating unit 420
  • FIG. 6 is a schematic diagram of an embodiment of a computer device of the present application.
  • Embedding refers to using a low-dimensional vector to represent an object, which can be a word, a commodity, or a movie, etc.
  • the nature of the embedding vector is to make the objects corresponding to the vectors with similar distances have similar meanings. For example, the distance between embedding (Avengers) and embedding (Iron Man) will be very close, but embedding (Avengers) and embedding (Gone with the Wind) distance will be farther.
  • the Google team published a tool for converting words into vector form, the word2vec (word to vector) algorithm.
  • a unique multi-dimensional word vector can be mapped for each word included in the training corpus after the training, thereby simplifying the processing of the text content.
  • word2vec can be used in semantic analysis scenarios.
  • the word vectors corresponding to the two words can be determined.
  • word2vec provides efficient implementations of the continuous bag-of-words (CBOW) and skip-gram architectures for computing word vectors.
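  • As a minimal illustration of the skip-gram idea, the sketch below enumerates the (center, context) training pairs that a skip-gram model would see for one sequence. The function name `skipgram_pairs` and the `window` parameter are hypothetical names chosen for illustration; this is not the word2vec implementation itself, only the pair-generation step.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every position in the sequence.

    Each token is paired with the tokens at most `window` positions
    to its left and right, as in the skip-gram formulation.
    """
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# Illustrative sequence of commodity identifiers (made-up values).
pairs = skipgram_pairs(["item1", "item6", "item3"], window=1)
```

  • The same routine applies unchanged to a sequence of commodity nodes or of category nodes, since both are treated as "sentences" of tokens.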
  • an embodiment of the present application proposes a method for recommending products.
  • The final network embedding vector contains not only the user's behavior information but also the content information of the product itself, so it can characterize the nature of the commodity itself to a greater extent.
  • Referring to FIG. 1, a flowchart of an embodiment of a product recommendation method of the present application is shown.
  • the described method for recommending products includes the following steps:
  • Step S110 obtaining historical commodity browsing records of multiple users.
  • The electronic device on which the commodity recommendation method runs (for example, a server or terminal device) can receive the users' historical commodity browsing records transmitted from an external device through a wired or wireless connection, or can actively collect those records itself.
  • The above wireless connection methods may include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and ultra-wideband (UWB) connections, as well as other wireless connection methods currently known or developed in the future.
  • the above-mentioned commodities may cover various concepts, for example, may be actual commodities in a shopping website, or may be virtual commodities such as movies, music, videos, or games, which are not specifically limited in this embodiment.
  • the obtained historical commodity browsing records may specifically include commodity name or commodity ID information of the commodity, and attribute information of the commodity.
  • the attribute information of the product, or the label information of the product can be classified in various ways according to different classification standards and classification levels.
  • commodities can be classified according to attributes and characteristics such as usage, raw materials, production methods, chemical composition, and usage status, and there are no specific restrictions on the classification standards or classification levels corresponding to the attributes of specific commodities.
  • The historical commodity browsing records may be collected by means of event tracking ("buried points") and a browsing time threshold, which trigger the operation of collecting the user's current browsing record.
  • The historical commodity browsing records may also be acquired only for commodities under a certain commodity attribute.
  • the above-mentioned historical commodity browsing records may also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Step S120 Generate a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records.
  • the historical commodity browsing records of multiple users recorded in the system can be counted to determine the commodity node set included in the historical commodity browsing records.
  • the commodity node set is processed by the commodity recommendation method of the graph network, and the commodity node sequence is output.
  • the classification standard corresponds to a set of commodity attributes, and the commodity node sequence is converted into a commodity attribute node sequence.
  • An example of a commodity node sequence is commodity 1-commodity 6-commodity 3...commodity N, and an example of a commodity attribute node sequence is category 1-category 4-category 3...category N. The length of each sequence, that is, the number of nodes it includes, can be set by the user in advance.
  • step S130 combining the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input into the word2vec model, and the commodity node embedding vector and commodity attribute node embedding vector are calculated.
  • The sequence of commodity nodes and the sequence of commodity attribute nodes are used as training corpora and input into the word2vec model respectively. The commodity node embedding vector corresponding to each commodity and the corresponding at least one commodity attribute category embedding vector are then calculated.
  • The negative sampling strategy formulated in this application is shown in Table 1:
  • the structure of the graph network represents the behavior information of the user clicking on the product, and the side information represents the attribute information of the product.
  • the number of commodity nodes is generally much more than that of category nodes.
  • ⁇ r % commodity nodes and (1- ⁇ r )% category nodes are selected to participate in its training, and because there are fewer category nodes, in order to fully train them without losing their value
  • this application selects ⁇ c % category nodes and (1- ⁇ c )% commodity nodes to participate in training, where ⁇ r > ⁇ c .
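  • The mixed negative sampling described above can be sketched as follows. This is an illustration under stated assumptions, not the strategy of Table 1: it assumes that the first ratio applies when the center node is a commodity node and the second when it is a category node, that negatives are drawn uniformly with replacement, and that the ratios are given as fractions rather than percentages. The names `sample_negatives`, `ratio_r`, and `ratio_c`, and the default values 0.8 and 0.6, are hypothetical.

```python
import random
from typing import List, Optional


def sample_negatives(center_is_commodity: bool,
                     commodity_pool: List[str],
                     category_pool: List[str],
                     k: int,
                     ratio_r: float = 0.8,  # assumed share of commodity negatives for a commodity center
                     ratio_c: float = 0.6,  # assumed share of category negatives for a category center
                     rng: Optional[random.Random] = None) -> List[str]:
    """Draw k negative samples mixing commodity nodes and category nodes."""
    rng = rng or random.Random()
    if center_is_commodity:
        n_commodity = round(k * ratio_r)
    else:
        n_commodity = k - round(k * ratio_c)
    n_category = k - n_commodity
    # Uniform sampling with replacement; a production sampler would usually
    # weight by node frequency and exclude the positive context node.
    negatives = [rng.choice(commodity_pool) for _ in range(n_commodity)]
    negatives += [rng.choice(category_pool) for _ in range(n_category)]
    return negatives
```

  • With `ratio_r` greater than `ratio_c`, category nodes appear among the negatives for both kinds of center node, which is one way to give the less numerous category nodes more training signal.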
  • Step S140 Perform vector splicing or vector summation operations on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity.
  • the comprehensive embedding vector corresponding to any commodity node ri can be calculated as:
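  • The two combination options named in step S140, vector splicing (concatenation) and vector summation, can be sketched as below. The sketch assumes a single attribute embedding per commodity and plain Python lists as vectors; the function names `splice` and `vector_sum` are hypothetical.

```python
from typing import List, Sequence


def splice(item_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector splicing: concatenate the two embeddings end to end."""
    return list(item_vec) + list(attr_vec)


def vector_sum(item_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector summation: element-wise sum; both embeddings must share a dimension."""
    if len(item_vec) != len(attr_vec):
        raise ValueError("summation requires embeddings of equal dimension")
    return [a + b for a, b in zip(item_vec, attr_vec)]
```

  • Splicing keeps the two sources of information in separate coordinates at the cost of a longer vector, while summation keeps the dimension fixed but requires both embeddings to be trained with the same dimensionality.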
  • Step S150: Determine the target commodity accessed by the user.
  • The target commodity may be the commodity currently being browsed by the user, or may be a commodity in the user's recent browsing record.
  • Step S160: Determine a product corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target product as a recommended product determined for the target product.
  • The comprehensive embedding vector of the commodity calculated above contains not only the user's behavior information but also the content information of the commodity itself, so it can reflect the properties of the commodity itself. Therefore, the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity can be determined by calculation and taken as the recommended commodities for the target commodity.
  • Using the commodity node embedding vector and the commodity attribute node embedding vector together to perform commodity recommendation may include: calculating the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities except the target commodity.
  • The similarity is the cosine similarity. The larger the cosine similarity, the closer the corresponding recommended product is to the target product. Therefore, a preset threshold or a parameter N can be set to filter and determine the recommended products for the target product.
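  • A minimal sketch of this similarity-based selection follows. It assumes the comprehensive embedding vectors are held in a dictionary keyed by commodity ID; the names `cosine` and `recommend` and the parameters `top_n` and `threshold` are hypothetical, and the vectors used in any example are made up.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target: str,
              embeddings: Dict[str, Sequence[float]],
              top_n: Optional[int] = None,
              threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Score every other commodity against the target, then filter and rank."""
    scores = [(item, cosine(embeddings[target], vec))
              for item, vec in embeddings.items() if item != target]
    if threshold is not None:
        scores = [(item, s) for item, s in scores if s > threshold]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores if top_n is None else scores[:top_n]
```

  • The threshold and the top-N cutoff can be used separately or together; which one suits a given catalog is a tuning decision outside the scope of this sketch.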
  • Using the commodity node embedding vector and the commodity attribute node embedding vector together to perform commodity recommendation may also include: determining the target commodity attribute corresponding to the target commodity; calculating the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities under the target commodity attribute; and taking the commodities whose similarity is greater than the preset threshold, or which rank in the top N by similarity, as the recommended commodities determined for the target commodity.
  • By first determining the target commodity attribute corresponding to the target commodity, the similarity only needs to be calculated between the comprehensive embedding vector of the target commodity and the comprehensive embedding vector of each other commodity under that attribute. The recommended commodities for the target commodity are then determined from these similarities, which reduces the computational load.
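  • The candidate-narrowing step can be sketched as below. It assumes a dictionary from commodity ID to its set of attribute categories and treats "under the target commodity attribute" as "sharing at least one category with the target"; that interpretation and the name `candidates_sharing_attribute` are assumptions for illustration.

```python
from typing import Dict, Set


def candidates_sharing_attribute(target: str,
                                 attr_dict: Dict[str, Set[str]]) -> Set[str]:
    """Return the commodities (other than the target) that share at least
    one attribute category with the target commodity."""
    target_attrs = attr_dict[target]
    return {item for item, attrs in attr_dict.items()
            if item != target and attrs & target_attrs}
```

  • Similarity then only has to be computed against this reduced set, so the cost scales with the size of the shared categories rather than with the whole catalog.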
  • a commodity node sequence and a commodity attribute node sequence are further generated according to the historical commodity browsing records. Then, combined with the negative sampling strategy, the commodity node sequence and commodity attribute node sequence are used as training corpus, and input into the word2vec model respectively, and the commodity node embedding vector and commodity attribute node embedding vector are calculated. Finally, perform vector splicing or vector summation operations on the commodity node embedding vector and commodity attribute node embedding vector to obtain the comprehensive embedding vector corresponding to each commodity.
  • the commodity recommendation method that integrates commodity attribute information proposed in this application can better combine the structural information and attribute information of the nodes in the graph network.
  • The comprehensive embedding vector of the commodity calculated in this application contains not only the user's behavior information but also the content information of the commodity itself, so it reflects more of the nature of the commodity itself. Therefore, using the comprehensive embedding vector calculated in the embodiments of this application for commodity recommendation can improve the accuracy of commodity recommendation.
  • FIG. 2 is a schematic diagram of an embodiment of step S120 shown in FIG. 1 , which may include:
  • step S121 the historical commodity browsing records are counted, and a graph network of commodity nodes is constructed.
  • the commodity set r in the historical commodity browsing record is selected as a graph node of the graph network
  • the set c of attribute information in the historical commodity browsing record is selected as the edge information of the graph network to be constructed.
  • The final graph network is generated by counting the historical commodity browsing records of all users and selecting nodes whose edge weights are greater than the preset value w, where V represents the set of commodity nodes in the graph, E represents the set of edges between items in the graph, e_ij ∈ E, e_ij > w, and w is a positive integer.
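  • One way to realize this construction is sketched below. It assumes each user's history is an ordered list of commodity IDs and that the weight of a directed edge (i, j) is the number of times j was browsed immediately after i across all users; the source does not spell out how the weight is counted, so this definition and the name `build_graph` are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_graph(sessions: List[List[str]], w: int) -> Dict[Tuple[str, str], int]:
    """Count consecutive browse transitions and keep edges with weight > w."""
    counts: Counter = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore immediate repeats of the same commodity
                counts[(src, dst)] += 1
    return {edge: c for edge, c in counts.items() if c > w}
```

  • Raising `w` prunes rarely observed transitions, which shrinks the graph and removes noisy edges at the cost of dropping long-tail commodities.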
  • Step S122 constructing a commodity attribute dictionary according to the graph network, where the commodity attribute dictionary includes corresponding relationships between different commodity nodes and different commodity attributes.
  • the attribute information of the commodities in the graph network is counted, and the mapping dictionary D from the commodity set r to the category set c is generated according to the node information in the graph, namely:
  • each commodity may correspond to one or more categories.
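  • A mapping dictionary of this kind can be sketched as follows, assuming the browsing records have been reduced to (commodity ID, category) pairs. The record layout and the name `build_attribute_dict` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_attribute_dict(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each commodity ID to the set of categories observed for it."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for commodity, category in records:
        mapping[commodity].add(category)
    return dict(mapping)
```

  • Using a set as the value accommodates commodities that belong to more than one category and makes repeated records harmless.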
  • Step S123 converting the graph network into a sequence of commodity nodes by random walk.
  • the deepwalk algorithm is used for reference, and the commodity node sequence is obtained according to the commodity node set included in the graph network in a random walk manner.
  • the Deepwalk algorithm is a graph-structured data mining algorithm that combines random walk and word2vec algorithms. Random walk refers to randomly taking a node in the graph network as the starting point to generate a sequence of commodity nodes with a preset random walk sequence length.
  • Specifically, converting the graph network into a sequence of commodity nodes through a random walk may include: normalizing the out-degrees of the commodity nodes in the graph network in turn to determine the out-degree probability of each commodity node; and generating the commodity node sequence by random walk according to the out-degree probability.
  • graphs can be divided into directed graphs and undirected graphs. All edges of a directed graph have a direction, that is, a direction from vertex to vertex is determined; while all edges of an undirected graph are bidirectional, that is, two vertices connected by an undirected edge can reach each other.
  • An undirected graph can be regarded as a directed graph in which every edge is replaced by two directed edges, one in each direction.
  • the degree of a vertex refers to the number of edges connected to the vertex.
  • the number of out-edges of a vertex is called the out-degree of the vertex
  • the number of in-edges of a vertex is called the in-degree of the vertex.
  • random walk is performed according to the out-degree probability obtained after out-degree normalization, and a graph network that better reflects the closeness of commodity nodes can be obtained.
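  • The walk can be sketched as below. The sketch interprets "out-degree normalization" as dividing each outgoing edge weight of a node by the total outgoing weight of that node, which yields a transition probability per neighbor; this interpretation, the adjacency layout, and the names `transition_probs` and `random_walk` are assumptions.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: outgoing edge weight}


def transition_probs(graph: Graph) -> Graph:
    """Normalize each node's outgoing weights so that they sum to 1."""
    probs: Graph = {}
    for node, nbrs in graph.items():
        total = sum(nbrs.values())
        probs[node] = {n: wt / total for n, wt in nbrs.items()} if total > 0 else {}
    return probs


def random_walk(probs: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Walk up to `length` nodes from `start`, stopping early at a node with no out-edges."""
    walk = [start]
    while len(walk) < length:
        nbrs = probs.get(walk[-1])
        if not nbrs:
            break
        nodes = list(nbrs)
        weights = list(nbrs.values())
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk
```

  • Repeating the walk from many randomly chosen start nodes produces the set of commodity node sequences used as the training corpus.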
  • steps S122 and S123 do not necessarily require an execution order.
  • Step S124 based on the commodity attribute dictionary, convert the commodity node sequence into a commodity attribute node sequence.
  • The commodity node sequence set S_r in the above step S123 is converted into the corresponding category node sequence set S_c; that is, for any commodity sequence L_r in S_r, its corresponding category sequence L_c is:
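  • The conversion can be sketched as a per-node dictionary lookup. The sketch assumes each commodity maps to exactly one category under the classification standard in use; how a commodity with several categories is handled is not recoverable from the text here, so that case is left out. The name `to_category_sequence` is hypothetical.

```python
from typing import Dict, List


def to_category_sequence(commodity_seq: List[str],
                         attr_dict: Dict[str, str]) -> List[str]:
    """Replace every commodity node in the sequence by its category node."""
    return [attr_dict[commodity] for commodity in commodity_seq]


def to_category_sequences(commodity_seqs: List[List[str]],
                          attr_dict: Dict[str, str]) -> List[List[str]]:
    """Convert a whole set of commodity sequences, preserving order and length."""
    return [to_category_sequence(seq, attr_dict) for seq in commodity_seqs]
```

  • Because the mapping is applied position by position, each category sequence has the same length as the commodity sequence it came from.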
  • The traditional architecture of word2vec combined with negative sampling is adopted, and the structural information and attribute information of the commodity nodes are integrated on that basis. This has the advantages of a simple model structure, few parameters, and short training time, so it can be widely applied in real-time graph network scenarios.
  • FIG. 3 is a schematic diagram of another embodiment of a product recommendation method in the embodiment of the present application, which may include:
  • Step S310 obtaining historical commodity browsing records of multiple users.
  • step S310 is similar to step S110 shown in FIG. 1 , and details are not described here.
  • step S320 the historical commodity browsing records are counted, and a graph network of commodity nodes is constructed.
  • step S320 is similar to step S121 shown in FIG. 2 , and details are not described here.
  • Step S330 constructing corresponding commodity attribute dictionaries for different classification standards preset by the graph network respectively, wherein each classification standard is preset corresponding to a commodity attribute set.
  • Classification standard 1 and classification standard 2 can be set. Classification standard 1 can include a set of book categories, such as {city, romance, martial arts, fantasy, suspense, game, reasoning}, and classification standard 2 can be a collection of book author names, such as {Lu Yao, Guo Jingming, Higashino Keigo, Natsume Soseki, Salinger, Shakespeare, ...}, and so on.
  • corresponding commodity attribute dictionaries under various classification systems can be constructed.
  • For the process of constructing the commodity attribute dictionary under each classification system, reference may be made to step S122 in FIG. 2.
  • Step S340 converting the graph network into a sequence of commodity nodes by random walk.
  • step S340 is similar to step S123 shown in FIG. 2 , and details are not described here.
  • Step S350 based on the commodity attribute dictionary, convert the commodity node sequence into commodity attribute node sequences corresponding to different classification standards.
  • After commodity attribute dictionaries for multiple classification systems have been generated in step S330, converting the commodity node sequence into commodity attribute node sequences also yields the commodity attribute node sequences corresponding to the different classification standards.
  • For the process of converting the commodity node sequence into the commodity attribute node sequence, reference may be made to step S124 in FIG. 2, and details are not described here.
  • step S360 combining the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and commodity attribute node embedding vector under different classification standards are calculated.
  • After the commodity attribute sequences of multiple classification systems have been obtained in step S350, using these sequences as training corpora to calculate the commodity attribute node embedding vectors also yields the commodity attribute node embedding vectors under the different classification systems. For the specific calculation process of obtaining the commodity node embedding vector and the commodity attribute node embedding vectors under different classification standards, reference may be made to step S130 in FIG. 1, which will not be repeated here.
  • Step S370: Perform a vector weighted sum operation on the commodity node embedding vector and the commodity attribute node embedding vectors, according to the preset weight corresponding to the commodity node embedding vector and the preset weights corresponding to the different classification standards, to obtain the comprehensive embedding vector corresponding to each commodity.
  • Corresponding weight values are set for the commodity node embedding vector and for the commodity attribute node embedding vectors under the different classification standards. A vector weighted summation is then performed on the commodity node embedding vector of each commodity and its commodity attribute node embedding vectors under at least one classification standard to obtain the comprehensive embedding vector corresponding to each commodity.
  • the comprehensive embedding vector may be obtained by combining commodity node embedding vector A, classification standard 1 commodity attribute node embedding vector B, and classification standard 2 commodity attribute node embedding vector C.
  • the priority order of importance is B-A-C.
  • Weight parameters of different sizes can be set for A, B, and C respectively, such as 0.8, 0.9, and 0.6. When the comprehensive embedding vector of a certain book is calculated, the A, B, and C corresponding to the book are first determined, each embedding vector is then multiplied by its corresponding weight, and the final comprehensive embedding vector of the book is obtained by weighted combination.
  • Step S390: Determine a product corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target product as a recommended product determined for the target product.
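  • The weighted combination of A, B, and C can be sketched as below, using the example weights 0.8, 0.9, and 0.6 from the text. The sketch assumes all three embeddings have the same dimension; the function name `weighted_sum` is hypothetical and the vectors in any example are made up.

```python
from typing import List, Sequence


def weighted_sum(vectors: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Return sum_k weights[k] * vectors[k], computed element-wise."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must share one dimension")
    return [sum(wt * v[k] for v, wt in zip(vectors, weights)) for k in range(dim)]


# A: commodity node embedding, B and C: attribute node embeddings under
# classification standards 1 and 2 (toy two-dimensional values).
A, B, C = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
combined = weighted_sum([A, B, C], [0.8, 0.9, 0.6])
```

  • Giving B the largest weight reflects the stated importance order B-A-C; the weights are not required to sum to 1 when cosine similarity is used downstream, since cosine similarity ignores overall scale.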
  • steps S380-S390 are similar to the foregoing steps S150-S160, and are not repeated here.
  • Considering that, in actual application scenarios, the embedding vectors of commodity attribute nodes corresponding to different classification standards affect the comprehensive embedding vector of the product to different degrees, corresponding weight values are set for the commodity node embedding vector and for the commodity attribute node embedding vectors under the different classification standards. The comprehensive embedding vector finally calculated for the product can therefore better represent the essential properties of the product in different application scenarios, and using it for product recommendation can improve the accuracy of product recommendation.
  • The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.
  • the present application provides an embodiment of a product recommendation device.
  • the device embodiment corresponds to the product recommendation method embodiment shown in FIG. 1.
  • the device can specifically be applied to various electronic devices.
  • the apparatus 400 for determining a network embedding vector includes:
  • an obtaining unit 410, configured to obtain historical commodity browsing records of multiple users;
  • a generating unit 420, configured to generate a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
  • a training unit 430, configured to use the commodity node sequence and the commodity attribute node sequence as training corpora, input each into the word2vec model in combination with the negative sampling strategy, and calculate the commodity node embedding vector and the commodity attribute node embedding vector;
  • a computing unit 440, configured to perform a vector splicing (concatenation) or vector summation operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
  • a response unit 450, configured to determine the target commodity accessed by the user;
  • a recommending unit 460, configured to determine the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
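  • the two combination options handled by the computing unit 440 differ in the shape of the result. The sketch below is illustrative only; the function names `splice` and `vector_sum` and the example values are assumptions, not names from the application.

```python
from typing import List, Sequence


def splice(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector splicing: concatenate the two embeddings end to end."""
    return list(node_vec) + list(attr_vec)


def vector_sum(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector summation: add the two embeddings element-wise."""
    if len(node_vec) != len(attr_vec):
        raise ValueError("summation requires embeddings of equal dimension")
    return [a + b for a, b in zip(node_vec, attr_vec)]


node_embedding = [1.0, 2.0]       # hypothetical commodity node embedding
attribute_embedding = [3.0, 4.0]  # hypothetical commodity attribute node embedding
spliced = splice(node_embedding, attribute_embedding)     # dimension 4
summed = vector_sum(node_embedding, attribute_embedding)  # dimension 2
```

  • splicing keeps the two sources of information in separate coordinates at the cost of a longer vector, while summation keeps the dimension fixed but needs both embeddings to be trained with the same dimension.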
  • FIG. 5 is a schematic diagram of an embodiment of the generating unit 420, which may include:
  • the first construction subunit 421 is used to count the historical commodity browsing records and construct a graph network of commodity nodes
  • the second construction subunit 422 is configured to construct a commodity attribute dictionary according to the graph network, and the commodity attribute dictionary includes the corresponding relationship between different commodity nodes and different commodity attributes;
  • the first conversion subunit 423 is used to convert the graph network into a sequence of commodity nodes by random walk;
  • the second conversion subunit 424 is configured to convert the sequence of commodity nodes into a sequence of commodity attribute nodes based on the commodity attribute dictionary.
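  • the conversion performed by the second conversion subunit 424 amounts to a dictionary lookup per node. A minimal sketch, assuming a flat dictionary with one attribute per commodity; the item and attribute names are invented for illustration:

```python
from typing import Dict, List

# Hypothetical commodity attribute dictionary: commodity node -> commodity attribute.
attribute_dict: Dict[str, str] = {
    "item_1": "phone",
    "item_2": "phone",
    "item_3": "laptop",
}


def to_attribute_sequence(node_sequence: List[str],
                          attributes: Dict[str, str]) -> List[str]:
    """Replace every commodity node in a sequence by its attribute."""
    return [attributes[node] for node in node_sequence]


node_sequence = ["item_1", "item_3", "item_2"]
attribute_sequence = to_attribute_sequence(node_sequence, attribute_dict)
```

  • because several commodities can share one attribute, the attribute sequences draw on a much smaller vocabulary than the node sequences.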
  • the first conversion subunit 423 is specifically configured to normalize the out-degrees of the commodity nodes in the graph network in turn to determine the out-degree probability of each commodity node, and to generate the commodity node sequence by a random walk according to the out-degree probabilities.
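  • one plausible reading of this step is that the outgoing edge weights of each node are normalized into transition probabilities, which then drive the walk. The sketch below follows that reading; it is an assumption rather than the applicant's exact procedure, and the names `transition_probabilities` and `random_walk` and the toy graph are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {successor: edge weight}


def transition_probabilities(graph: Graph) -> Graph:
    """Normalize each node's outgoing edge weights so that they sum to 1."""
    probs: Graph = {}
    for node, edges in graph.items():
        total = sum(edges.values())
        probs[node] = {n: w / total for n, w in edges.items()} if total > 0 else {}
    return probs


def random_walk(probs: Graph, start: str, length: int,
                rng: random.Random) -> List[str]:
    """Walk up to `length` nodes, choosing each step by the normalized weights."""
    walk = [start]
    while len(walk) < length:
        edges = probs.get(walk[-1], {})
        if not edges:  # dead end: the node has no outgoing edges
            break
        successors = list(edges)
        step = rng.choices(successors, weights=[edges[s] for s in successors], k=1)[0]
        walk.append(step)
    return walk


graph: Graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 2.0}, "c": {}}
probs = transition_probabilities(graph)
walk = random_walk(probs, "a", 6, random.Random(0))
```

  • running many such walks from different start nodes yields the text-like corpus of commodity node sequences that is later fed to word2vec.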
  • the second construction subunit 422 is specifically configured to construct corresponding commodity attribute dictionaries according to the different classification standards preset for the graph network, wherein each classification standard corresponds to a preset commodity attribute set;
  • the second conversion subunit 424 is specifically configured to convert the commodity node sequence into commodity attribute node sequences corresponding to the different classification standards based on the commodity attribute dictionaries;
  • the training unit 430 is specifically configured to use the commodity node sequence and the commodity attribute node sequences as training corpora, input each into the word2vec model in combination with the negative sampling strategy, and calculate the commodity node embedding vectors and the commodity attribute node embedding vectors under the different classification standards;
  • the calculation unit 440 is specifically configured to perform a weighted vector sum of the commodity node embedding vector and the commodity attribute node embedding vectors, according to the preset weight of the commodity node embedding vector and the preset weights of the different classification standards, to obtain the comprehensive embedding vector corresponding to each commodity.
  • the first construction subunit 421 is specifically configured to: compile statistics on the historical commodity browsing records and select the set of commodities in the records as the graph nodes of the graph network to be constructed; determine the edge weights between commodity nodes according to the statistical results; and select the commodity nodes whose edge weights are greater than a preset threshold to construct the graph network of commodity nodes.
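  • the graph construction can be sketched as counting consecutive browsing pairs and discarding rare ones. The sketch assumes that the edge weight is the number of times one commodity was browsed directly after another, which the text does not state explicitly; `build_graph` and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_graph(sessions: List[List[str]], threshold: int) -> Dict[str, Dict[str, int]]:
    """Count consecutive browsing pairs and keep edges whose count exceeds the threshold."""
    counts: Counter = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore repeated views of the same commodity
                counts[(src, dst)] += 1
    graph: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (src, dst), count in counts.items():
        if count > threshold:
            graph[src][dst] = count
    return dict(graph)


# Hypothetical browsing histories, one list per user, in browsing order.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["c", "a"]]
commodity_graph = build_graph(sessions, threshold=1)
```

  • the threshold removes transitions that occurred only incidentally, so the random walks concentrate on browsing patterns shared by several users.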
  • the apparatus 400 for determining the network embedding vector may further include:
  • a recommended commodity determination unit, configured to determine the target commodity accessed by the user, and to determine the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
  • the recommended commodity determination unit includes:
  • a first similarity calculation subunit, configured to calculate the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities except the target commodity;
  • a first recommended commodity determination subunit, configured to determine, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors have a similarity greater than a preset threshold, or the commodities corresponding to the top N comprehensive embedding vectors when ranked by similarity.
  • N is a positive integer.
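  • the threshold and top-N selection rules can be sketched as below. Cosine similarity is used because the description names it as the similarity measure for word2vec vectors; the function names, the parameter names `top_n` and `threshold`, and the two-dimensional embeddings are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target: str, embeddings: Dict[str, Sequence[float]],
              top_n: Optional[int] = None,
              threshold: Optional[float] = None) -> List[str]:
    """Rank all other commodities by similarity, then apply threshold and/or top N."""
    scored = [(name, cosine(embeddings[target], vec))
              for name, vec in embeddings.items() if name != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [name for name, _ in scored]


# Hypothetical comprehensive embedding vectors.
embeddings = {"t": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
```

  • for example, `recommend("t", embeddings, top_n=1)` selects by rank, while `recommend("t", embeddings, threshold=0.5)` selects by a similarity cut-off.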
  • the recommended commodity determination unit includes:
  • a first commodity attribute determination subunit configured to determine the target commodity attribute corresponding to the target commodity
  • a second similarity calculation subunit, configured to calculate the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities that have the target commodity attribute;
  • a second recommended commodity determination subunit, configured to determine, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors have a similarity greater than the preset threshold, or the commodities corresponding to the top N comprehensive embedding vectors when ranked by similarity.
  • N is a positive integer.
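  • in this second variant, the only change is the candidate pool: similarity is computed only against commodities that share the target commodity's attribute. A minimal sketch of that filtering step, with a hypothetical dictionary and function name:

```python
from typing import Dict, List


def same_attribute_candidates(target: str, attributes: Dict[str, str]) -> List[str]:
    """Return all other commodities whose attribute equals the target's attribute."""
    target_attribute = attributes[target]
    return [item for item, attribute in attributes.items()
            if attribute == target_attribute and item != target]


# Hypothetical commodity attribute dictionary.
book_attributes = {"book_1": "fiction", "book_2": "fiction", "book_3": "history"}
candidates = same_attribute_candidates("book_1", book_attributes)
```

  • restricting the pool this way reduces the number of similarity computations and keeps recommendations within the same category as the target commodity.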
  • after acquiring the historical commodity browsing records of multiple users, the device 400 for determining the network embedding vector generates commodity node sequences and commodity attribute node sequences according to the historical commodity browsing records. Then, in combination with the negative sampling strategy, the commodity node sequences and commodity attribute node sequences are used as training corpora and input into the word2vec model respectively, and the commodity node embedding vectors and commodity attribute node embedding vectors are calculated. Finally, vector splicing or vector summation is performed on the commodity node embedding vector and the commodity attribute node embedding vector to obtain the comprehensive embedding vector corresponding to each commodity.
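  • to make the negative sampling strategy concrete, the sketch below shows how skip-gram training samples might be drawn from one node sequence: each (center, context) pair inside a window is a positive example and is paired with a few nodes that do not appear in that window. This is a simplification for illustration. It draws negatives uniformly, whereas the original word2vec draws them from a smoothed frequency distribution, and the function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, List[str]]  # (center, positive context, negative nodes)


def skipgram_samples(sequence: Sequence[str], window: int, num_negative: int,
                     vocabulary: Sequence[str], rng: random.Random) -> List[Sample]:
    """Build (center, context, negatives) training samples from one sequence."""
    samples: List[Sample] = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        context = {sequence[j] for j in range(lo, hi) if j != i}
        # Negatives: nodes that are neither the center nor inside its window.
        pool = [w for w in vocabulary if w != center and w not in context]
        for j in range(lo, hi):
            if j == i:
                continue
            negatives = rng.sample(pool, min(num_negative, len(pool)))
            samples.append((center, sequence[j], negatives))
    return samples


samples = skipgram_samples(["a", "b", "c"], window=1, num_negative=2,
                           vocabulary=["a", "b", "c", "d", "e"],
                           rng=random.Random(0))
```

  • the same procedure applies unchanged to commodity attribute node sequences, since both kinds of sequence are treated as sentences of a training corpus.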
  • the commodity recommendation method that integrates commodity attribute information proposed in this application can better combine the structural information and attribute information of the nodes in the graph network.
  • the commodity comprehensive embedding vector calculated by this application not only contains the user's behavior information, but also contains the content information of the commodity itself, which can reflect the property information of the commodity itself.
  • FIG. 6 is a block diagram of the basic structure of a computer device according to this embodiment.
  • the computer device 6 includes a memory 601, a processor 602, and a network interface 603 that communicate with each other through a system bus. It should be pointed out that only the computer device 6 with components 601-603 is shown in the figure, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), and embedded devices.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 601 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • the memory 601 may be an internal storage unit of the computer device 6 , such as a hard disk or a memory of the computer device 6 .
  • the memory 601 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 601 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 601 is generally used to store the operating system and various application software installed on the computer device 6 , such as computer-readable instructions of the above-mentioned method for recommending products.
  • the memory 601 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 602 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 602 is typically used to control the overall operation of the computer device 6 .
  • the processor 602 is configured to execute the computer-readable instructions stored in the memory 601 or to process data, for example to execute the computer-readable instructions of the commodity recommendation method.
  • the network interface 603 may include a wireless network interface or a wired network interface, and the network interface 603 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • the present application also provides another implementation, namely a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores computer-readable instructions that are executable by at least one processor, so that the at least one processor performs the steps of the commodity recommendation method described above.
  • the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the commodity recommendation method described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application relate to the field of artificial intelligence and to a commodity recommendation method. The method mainly comprises: after obtaining a historical commodity browsing record, separately determining a commodity node embedding vector and a commodity attribute node embedding vector, then summing or splicing the two embedding vectors to obtain a comprehensive embedding vector of a commodity node, and performing commodity recommendation by using the comprehensive embedding vector. Since the comprehensive embedding vector represents both the behavior information and the attribute information of a commodity, it carries richer information than the prior art and can therefore achieve more accurate commodity recommendation. The present application further provides a commodity recommendation apparatus, a computer device, and a storage medium. In addition, the present application further relates to blockchain technology: the historical commodity browsing record of a user can be stored in a blockchain, thereby improving the security and stability of its storage.

Description

Commodity recommendation method and related device thereof
This application is based on, and claims priority to, Chinese invention patent application No. 202011504942.5, filed on December 18, 2020 and titled "A Commodity Recommendation Method and Related Equipment".
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a commodity recommendation method, apparatus, computer device, and storage medium.
Background
Graph embedding has been one of the popular research areas in recommender systems and graph-based social networks in recent years. By learning the information contained in the nodes of a graph, the nodes are mapped into a quantifiable space; that is, an embedding is generated for each node in the graph. In downstream tasks, performing similarity calculation, classification, clustering, and similar operations on the node embeddings allows a deeper understanding of the complex relationships between nodes in the network.
When existing graph embedding technology is applied to commodity recommendation, the historical browsing order records of commodities are first obtained, and a graph network of commodity nodes is constructed from them. Then, by means of random walks, the topology among the nodes of the constructed commodity node graph network is converted into a text-like sequence structure. Next, the word2vec model combined with negative sampling embeds the generated sequences into a new space and mines the structural relationships of browsing order among the nodes of the graph network, thereby determining the embedding vector of each commodity node. Finally, the determined embedding vectors of the commodity nodes are used to recommend similar commodities.
The commodity node embedding vectors determined in this way can support commodity recommendation to a certain extent. However, the inventors found that embedding vectors determined solely by the above means contain only the order information between commodity nodes. Because this information is so limited, the resulting commodity recommendations are not very effective.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a commodity recommendation method, apparatus, computer device, and storage medium, mainly to solve the technical problem that existing commodity node embedding vectors contain limited information and therefore yield poor recommendation results.
To solve the above technical problem, an embodiment of the present application provides a commodity recommendation method that adopts the following technical solution:
obtaining historical commodity browsing records of multiple users;
generating a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
in combination with a negative sampling strategy, using the commodity node sequence and the commodity attribute node sequence as training corpora, inputting each into a word2vec model, and calculating a commodity node embedding vector and a commodity attribute node embedding vector;
performing a vector splicing or vector summation operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
determining the target commodity accessed by a user;
determining the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
To solve the above technical problem, an embodiment of the present application further provides a commodity recommendation apparatus that adopts the following technical solution:
an obtaining unit, configured to obtain historical commodity browsing records of multiple users;
a generating unit, configured to generate a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
a training unit, configured to use, in combination with a negative sampling strategy, the commodity node sequence and the commodity attribute node sequence as training corpora, input each into a word2vec model, and calculate a commodity node embedding vector and a commodity attribute node embedding vector;
a computing unit, configured to perform a vector splicing or vector summation operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
a response unit, configured to determine the target commodity accessed by a user;
a recommending unit, configured to determine the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
To solve the above technical problem, an embodiment of the present application further provides a computer device that adopts the following technical solution:
A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps of a commodity recommendation method:
obtaining historical commodity browsing records of multiple users;
generating a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
in combination with a negative sampling strategy, using the commodity node sequence and the commodity attribute node sequence as training corpora, inputting each into a word2vec model, and calculating a commodity node embedding vector and a commodity attribute node embedding vector;
performing a vector splicing or vector summation operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
determining the target commodity accessed by a user;
determining the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium that adopts the following technical solution:
A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps of a commodity recommendation method:
obtaining historical commodity browsing records of multiple users;
generating a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
in combination with a negative sampling strategy, using the commodity node sequence and the commodity attribute node sequence as training corpora, inputting each into a word2vec model, and calculating a commodity node embedding vector and a commodity attribute node embedding vector;
performing a vector splicing or vector summation operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
determining the target commodity accessed by a user;
determining the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
In the embodiments of the present application, after the historical commodity browsing records of multiple users are obtained, a commodity node sequence and a commodity attribute node sequence are generated from them. Then, in combination with the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpora and input into the word2vec model respectively, and the commodity node embedding vector and the commodity attribute node embedding vector are calculated. Finally, vector splicing or vector summation is performed on the commodity node embedding vector and the commodity attribute node embedding vector to obtain the comprehensive embedding vector corresponding to each commodity.
In summary, compared with calculating commodity embedding vectors by the traditional graph embedding method, the commodity recommendation method proposed in this application, which integrates commodity attribute information, better combines the structural information and the attribute information of the nodes in the graph network. In the similar-commodity scenario of a recommender system, the comprehensive embedding vector calculated in this application contains not only the users' behavior information but also the content information of the commodity itself, and so reflects more of the commodity's own properties. It can therefore achieve more accurate commodity recommendation than the embedding vectors of the prior art.
Description of Drawings
To illustrate the solutions of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of a commodity recommendation method of the present application;
FIG. 2 is a flowchart of a specific implementation of step S120 in FIG. 1;
FIG. 3 is a flowchart of another embodiment of a commodity recommendation method of the present application;
FIG. 4 is a schematic diagram of an embodiment of a commodity recommendation apparatus of the present application;
FIG. 5 is a schematic diagram of an embodiment of the generating unit 420;
FIG. 6 is a schematic diagram of an embodiment of a computer device of the present application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are intended only to describe specific embodiments and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects rather than to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To help those skilled in the art better understand the solutions of the present application, some terms used in the embodiments are briefly explained below.
An embedding represents an object, such as a word, a commodity, or a movie, as a low-dimensional vector. The key property of embedding vectors is that objects whose vectors are close together have similar meanings. For example, embedding(Avengers) and embedding(Iron Man) are very close to each other, whereas embedding(Avengers) and embedding(Gone with the Wind) are farther apart.
Building on the idea of embeddings, a team at Google published a tool for converting words into vectors, namely the word2vec (word to vector) algorithm. After a large body of text is input into the word2vec model as a training corpus and training completes, every word in the corpus is mapped to a unique multi-dimensional word vector, so that processing of text content is reduced to vector operations in a vector space. For example, word2vec is commonly used in semantic analysis: after two words are input into a pre-trained word2vec model, the word vector of each word can be determined. The semantic similarity of the two words is then determined from their similarity in the vector space, that is, from the cosine similarity between the two word vectors. The greater the cosine similarity, the closer the two words are in meaning. word2vec provides efficient implementations of the continuous bag-of-words (CBOW) and skip-gram models for computing word vectors.
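For reference, the cosine similarity mentioned above is the standard one. Writing the two word vectors as u and v (symbols introduced here for illustration, not taken from the application), it is defined as:

```latex
\operatorname{sim}(u, v) = \cos\theta
  = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_{i} u_i v_i}{\sqrt{\sum_{i} u_i^{2}} \, \sqrt{\sum_{i} v_i^{2}}}
```

The value lies between -1 and 1, and it depends only on the angle between the vectors, not on their lengths.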
Based on the above and on the word2vec model, the embodiments of the present application propose a commodity recommendation method. The network embedding vector that is finally determined contains not only the users' behavior information but also the content information of the commodity itself, and can therefore characterize the nature of the commodity to a greater extent.
Referring to FIG. 1, a flowchart of an embodiment of a commodity recommendation method of the present application is shown. The commodity recommendation method includes the following steps:
Step S110: obtain historical commodity browsing records of multiple users.
In this embodiment, the electronic device on which the commodity recommendation method runs (for example, a server or a terminal device) may receive the users' historical commodity browsing records from an external device over a wired or wireless connection, or may actively collect them. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, an ultra wide band (UWB) connection, and other wireless connections now known or developed in the future.
The commodities mentioned above may cover various concepts: they may be physical commodities on a shopping website, or virtual commodities such as movies, music, videos, or games; this embodiment imposes no specific limitation. The obtained historical commodity browsing records may include the commodity name or commodity ID of each commodity, together with its attribute information. The attribute information of a commodity, also called its label information, can be divided in many ways according to different classification standards and classification levels. For example, commodities are commonly classified by attributes and characteristics such as use, raw material, production method, chemical composition, and state of use. The classification standard or classification level corresponding to the attributes of a particular commodity is not specifically limited here.
In some possible implementations, the historical commodity browsing records may be collected using event tracking (data buried points) together with a browsing-time threshold. For example, a rule may be set so that when a user stays on the browsing page of a commodity for a relatively long time, collection of that browsing record is triggered.
In some possible implementations, the historical commodity browsing records may be obtained only for commodities with a certain commodity attribute, for example only the browsing records of electronic products or books, or only those of products developed by a certain manufacturer.
In some possible implementations, it should be emphasized that, to further ensure the privacy and security of the historical commodity browsing records, the records may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Step S120: generate a commodity node sequence and a commodity attribute node sequence from the historical commodity browsing records.
In this embodiment, after the historical commodity browsing records of multiple users recorded in the system are obtained, statistics are computed over them to determine the set of commodity nodes they contain. The commodity node set is then processed with the graph-network-based method to output commodity node sequences. Finally, according to a preset classification standard for commodity attributes, where the classification standard corresponds to a set of commodity attributes, each commodity node sequence is converted into a commodity attribute node sequence. An example of a commodity node sequence is commodity 1 - commodity 6 - commodity 3 ... commodity N; an example of a commodity attribute node sequence is category 1 - category 4 - category 3 ... category N. The length of each sequence, that is, the number of nodes it contains, can be set by the user in advance.
Step S130: using a negative sampling strategy, input the commodity node sequence and the commodity attribute node sequence into the word2vec model as training corpora, and compute commodity node embedding vectors and commodity attribute node embedding vectors.
In this embodiment, after the commodity node sequences and commodity attribute node sequences are obtained, they are input into the word2vec model as training corpora, following the commonly used word2vec training procedure combined with a negative sampling strategy. The result is, for each commodity, its commodity node embedding vector and at least one corresponding commodity attribute category embedding vector.
In some possible implementations, in order to fuse the structural information of commodities in the graph network with their attribute information, so that commodity nodes and category nodes are mapped into a unified embedding space, the negative sampling strategy formulated in this application is as shown in Table 1 below:
Figure PCTCN2021082934-appb-000001
Table 1
In a typical scenario, the structure of the graph network represents the behavior information of users clicking on commodities, and the side information represents the attribute information of the commodities. In recommendation scenarios in particular, commodity nodes generally far outnumber category nodes, yet training should embed commodity nodes and category nodes in the same space so that they influence one another, interact, and are both learned adequately. Therefore, when constructing the positive and negative samples for commodity nodes, θ_r% commodity nodes and (1-θ_r)% category nodes are selected to participate in training. Because there are fewer category nodes, to train them adequately without losing their own attribute information, θ_c% category nodes and (1-θ_c)% commodity nodes are selected to participate in their training, where θ rc.
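The mixing of node types described above might be sketched as follows. Table 1 is not reproduced in this text, so this is only one possible reading and not the strategy of the application: the function `sample_negatives`, the uniform sampling, and the use of a fraction `theta` in [0, 1] in place of the percentages θ_r and θ_c are all assumptions made for illustration.

```python
import random

def sample_negatives(target, same_type, other_type, theta, k, rng):
    """Draw k negative samples for `target`.

    Assumption: each negative is a node of the target's own type with
    probability `theta`, otherwise a node of the other type. Sampling is
    uniform here, which is a simplification.
    """
    same_pool = [node for node in same_type if node != target]
    negatives = []
    for _ in range(k):
        pool = same_pool if rng.random() < theta else other_type
        negatives.append(rng.choice(pool))
    return negatives

# Hypothetical node names, invented for illustration.
commodity_nodes = ["r1", "r2", "r3"]
category_nodes = ["c1", "c2"]
rng = random.Random(1)

# For a commodity node, theta plays the role of theta_r.
for_commodity = sample_negatives("r1", commodity_nodes, category_nodes, 0.8, 5, rng)
# For a category node the roles swap and theta plays the role of theta_c.
for_category = sample_negatives("c1", category_nodes, commodity_nodes, 0.3, 5, rng)
```

Calling the same function with the two pools swapped is a design choice of this sketch; it keeps the commodity-node and category-node cases symmetric.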
Step S140: perform vector concatenation or vector summation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector for each commodity.
In this embodiment, after training yields the commodity node embedding vectors E_r and the category node embedding vectors E_c, the comprehensive embedding vector of any commodity node r_i is computed as:
E(r_i) = E_r(r_i) || E_c(r_i)
where || denotes vector concatenation or vector summation.
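The two readings of the || operator can be shown side by side. The sketch below is illustrative only; the function name `combine` and the `mode` argument are hypothetical, and the vectors are made up.

```python
def combine(e_r, e_c, mode="concat"):
    """Combine a commodity node embedding with a category node embedding."""
    if mode == "concat":
        # Concatenation: the result has len(e_r) + len(e_c) dimensions.
        return list(e_r) + list(e_c)
    if mode == "sum":
        # Summation: both vectors must have the same dimension.
        if len(e_r) != len(e_c):
            raise ValueError("summation needs vectors of equal dimension")
        return [a + b for a, b in zip(e_r, e_c)]
    raise ValueError(f"unknown mode: {mode}")

E_r_ri = [1.0, 2.0]  # hypothetical E_r(r_i)
E_c_ri = [3.0, 4.0]  # hypothetical E_c(r_i)
concatenated = combine(E_r_ri, E_c_ri, mode="concat")
summed = combine(E_r_ri, E_c_ri, mode="sum")
```

Concatenation keeps the two sources of information in separate dimensions, while summation requires that both embeddings share a dimension, which is one reason to train them into a unified space.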
Step S150: determine the target commodity accessed by the user.
In this embodiment, the target commodity may be a commodity the user is currently browsing, or a commodity in the user's recent browsing records.
Step S160: determine the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
In this embodiment, the comprehensive embedding vector computed above contains not only the users' behavior information but also the content information of the commodity itself, and thus better reflects the nature of the commodity. The commodities whose comprehensive embedding vectors are close to that of the target commodity can therefore be determined by computation and taken as the recommended commodities for the target commodity.
In some possible implementations, performing commodity recommendation using both the commodity node embedding vectors and the commodity attribute node embedding vectors may include: computing the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vector of every other commodity; and determining, as the recommended commodities for the target commodity, the commodities whose similarity is greater than a preset threshold, or the commodities whose comprehensive embedding vectors rank in the top N by similarity. In this embodiment, the similarity is cosine similarity: the larger its value, the closer the recommended commodity is to the target commodity. The recommended commodities can therefore be selected by setting the preset threshold or the parameter N.
In some possible implementations, performing commodity recommendation using both the commodity node embedding vectors and the commodity attribute node embedding vectors may include: determining the target commodity attribute corresponding to the target commodity; computing the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vector of every other commodity under the target commodity attribute; and taking, as the recommended commodities for the target commodity, the commodities whose similarity is greater than a preset threshold, or the commodities whose comprehensive embedding vectors rank in the top N by similarity. In this embodiment, because the target commodity attribute is determined first and similarity is computed only against the other commodities under that attribute, the amount of computation is reduced.
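Both selection variants (threshold or top N, over all commodities or only those sharing the target's attribute) can be sketched with one function. This is a minimal illustration under assumptions: the names `recommend`, `top_n`, `threshold`, and `candidates` are hypothetical, and the embeddings are invented.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(target, embeddings, top_n=None, threshold=None, candidates=None):
    """Rank other commodities by cosine similarity to `target`.

    `candidates` optionally restricts the search, for example to the
    commodities under the target's attribute.
    """
    pool = embeddings if candidates is None else candidates
    scored = [
        (cosine(embeddings[target], embeddings[item]), item)
        for item in pool
        if item != target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [item for _, item in scored]

# Hypothetical comprehensive embedding vectors.
E = {
    "book_a": [1.0, 0.0, 0.2],
    "book_b": [0.9, 0.1, 0.3],
    "book_c": [0.0, 1.0, 0.0],
    "phone_x": [0.8, 0.0, 0.1],
}
same_attribute = ["book_a", "book_b", "book_c"]

over_all = recommend("book_a", E, top_n=1)
within_attribute = recommend("book_a", E, top_n=1, candidates=same_attribute)
above_threshold = recommend("book_a", E, threshold=0.5)
```

Restricting `candidates` shrinks the number of similarity computations from all commodities to those under one attribute, which is the saving the second variant describes.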
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
In the embodiments of the present application, after the historical commodity browsing records of multiple users are obtained, commodity node sequences and commodity attribute node sequences are generated from them. Then, using a negative sampling strategy, the commodity node sequences and commodity attribute node sequences are input into the word2vec model as training corpora to compute commodity node embedding vectors and commodity attribute node embedding vectors. Finally, vector concatenation or vector summation is performed on the commodity node embedding vector and the commodity attribute node embedding vector to obtain the comprehensive embedding vector of each commodity.
In summary, compared with computing commodity embedding vectors by conventional graph embedding methods, the commodity recommendation method proposed in this application, which fuses commodity attribute information, better combines the structural information and the attribute information of the nodes in the graph network. In the similar-commodity scenario of a recommendation system, the comprehensive embedding vector computed by this application contains not only the users' behavior information but also the content information of the commodity itself, and so better reflects the nature of the commodity. Using the comprehensive embedding vectors computed in the embodiments of the present application for commodity recommendation can therefore improve recommendation accuracy.
In some possible implementations, referring to FIG. 2, which is a schematic diagram of an embodiment of step S120 shown in FIG. 1, step S120 may include:
Step S121: compute statistics over the historical commodity browsing records and construct a graph network of commodity nodes.
In this embodiment, after the historical commodity browsing records of multiple users recorded in the system are obtained, statistics are computed over them and a graph network of commodity nodes is constructed. The commodity set r in the historical commodity browsing records is taken as the nodes of the graph, and the set c of attribute information in the records is taken as the side information of the graph network to be constructed.
In a recommendation system, the commodities a user clicks on in one day can be expressed as R = {r_1, r_2, ..., r_i, ..., r_n}, where r_i denotes the i-th commodity in the commodity set r and n denotes the number of commodities the user clicked. For any two nodes in R, the weight of the edge between them in the graph is increased by 1. The historical commodity browsing records of all users are counted in this way, and the nodes joined by edges whose weight is greater than a preset value w are selected to generate the final graph network, where V denotes the set of commodity nodes in the graph,
Figure PCTCN2021082934-appb-000002
E denotes the set of edges between commodities in the graph, e_ij ∈ E, e_ij > w, and w is a positive integer.
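The edge counting and thresholding described above can be sketched as follows. This is a minimal illustration under assumptions: edges are treated as undirected, a commodity repeated within one record is counted once, and the names `build_item_graph` and `sessions` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_item_graph(sessions, w):
    """Count co-click edge weights and keep the edges with weight > w.

    sessions: iterable of click records R = [r_1, ..., r_n].
    Returns (V, E), where E maps an undirected edge (a, b) with a < b
    to its weight and V is the set of nodes on the kept edges.
    """
    weights = Counter()
    for clicks in sessions:
        # Every unordered pair of distinct commodities in one record adds 1.
        for a, b in combinations(sorted(set(clicks)), 2):
            weights[(a, b)] += 1
    edges = {pair: count for pair, count in weights.items() if count > w}
    nodes = {item for pair in edges for item in pair}
    return nodes, edges

# Hypothetical click records, invented for illustration.
sessions = [["r1", "r2", "r3"], ["r1", "r2"], ["r2", "r3"], ["r1", "r2", "r4"]]
V, E = build_item_graph(sessions, w=1)
```

With `w=1`, only the pairs that co-occur in at least two records survive, so the rarely clicked `r4` drops out of the graph.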
Step S122: construct a commodity attribute dictionary from the graph network, the commodity attribute dictionary containing the correspondence between commodity nodes and commodity attributes.
In this embodiment, the attribute information of the commodities in the graph network is collected according to a preset classification standard or classification level, and, based on the node information in the graph, a mapping dictionary D from the commodity set r to the category set c is generated, namely:
D(r_i) = c_j
which indicates that the i-th commodity in the commodity set r belongs to the j-th category in the category set c. Note that in this mapping dictionary each commodity may correspond to one or more categories.
Step S123: convert the graph network into commodity node sequences by random walk.
In this embodiment, drawing on the DeepWalk algorithm, random walks over the commodity node set of the graph network are used to obtain commodity node sequences. DeepWalk is a graph-structured data mining algorithm that combines random walk with word2vec. A random walk starts from a randomly chosen node of the graph network and generates a commodity node sequence of a preset length.
In some possible implementations, converting the graph network into commodity node sequences by random walk may include: normalizing the out-degree of each commodity node in the graph network in turn to determine the out-degree probability of each commodity node; and generating the commodity node sequences by random walk according to the out-degree probabilities.
Specifically, the out-degrees of the commodity nodes in the graph network are first normalized to obtain the out-degree probability between each commodity node and its adjacent commodity nodes. The larger the out-degree probability of an edge, the more closely the two commodity nodes it joins are related. Random walks based on these out-degree probabilities then convert the commodity nodes of the graph network into text-like commodity sequences L_r = [r_a, ..., r_k, ..., r_l], where r_a, r_k, r_l ∈ r are commodity nodes of the random walk sequence L_r and Length(L_r) is the given length of each random walk sequence. Different initial commodity nodes are selected from the graph network and the walk is repeated K times to generate the commodity node sequence set S_r, where K is a positive integer. In general, graphs are divided into directed and undirected graphs. Every edge of a directed graph has a direction, pointing from one vertex to another, whereas every edge of an undirected graph is bidirectional, so the two vertices it joins can reach each other. In some problems an undirected graph can be treated as a graph in which every edge consists of two directed edges, one in each direction. The degree of a vertex is the number of edges connected to it. For a directed graph in particular, the number of edges leaving a vertex is its out-degree and the number of edges entering it is its in-degree. In this embodiment, performing the random walk according to the out-degree probabilities obtained by normalization better reflects how closely the commodity nodes in the graph network are related.
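The walk procedure can be sketched as below. This is an illustration under assumptions, not the implementation of the application: "out-degree normalization" is read here as normalizing each node's outgoing edge weights, each undirected edge is treated as two directed edges, and the names `transition_probs` and `random_walk` are hypothetical.

```python
import random

def transition_probs(edges):
    """Normalize each node's outgoing edge weights into probabilities."""
    out = {}
    for (a, b), weight in edges.items():
        # An undirected edge is treated as two directed edges.
        out.setdefault(a, {})[b] = weight
        out.setdefault(b, {})[a] = weight
    return {
        node: {nbr: wt / sum(nbrs.values()) for nbr, wt in nbrs.items()}
        for node, nbrs in out.items()
    }

def random_walk(probs, start, length, rng):
    """Generate one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = probs.get(walk[-1])
        if not nbrs:
            break  # no outgoing edge to follow
        nodes = list(nbrs)
        step = rng.choices(nodes, weights=[nbrs[n] for n in nodes], k=1)[0]
        walk.append(step)
    return walk

# Hypothetical weighted edges, invented for illustration.
edges = {("r1", "r2"): 3, ("r2", "r3"): 2, ("r1", "r3"): 1}
probs = transition_probs(edges)
rng = random.Random(0)
# One walk per starting node; the result plays the role of S_r.
S_r = [random_walk(probs, start, length=5, rng=rng) for start in sorted(probs)]
```

From `r1`, the walk moves to `r2` with probability 0.75 and to `r3` with probability 0.25, so heavily co-clicked pairs appear next to each other more often in the generated sequences.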
It should be noted that steps S122 and S123 need not be executed in any particular order.
Step S124: convert the commodity node sequences into commodity attribute node sequences based on the commodity attribute dictionary.
In this embodiment, based on the mapping dictionary D constructed in step S122, the commodity node sequence set S_r obtained in step S123 is converted into the corresponding category node sequence set S_c. That is, for any commodity sequence L_r in S_r, the corresponding category sequence L_c is:
Figure PCTCN2021082934-appb-000003
where
Figure PCTCN2021082934-appb-000004
are the category nodes in the sequence L_c.
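The conversion from S_r to S_c can be sketched as a dictionary lookup per node. The exact formula appears only as a figure in the source, so this is an assumed reading: each commodity is replaced by a category from D, and where a commodity has several categories the first one is used. The name `to_category_sequence` and the sample data are hypothetical.

```python
def to_category_sequence(item_sequence, D):
    """Map each commodity node of L_r to a category node via dictionary D."""
    # Assumption: when a commodity has several categories, use the first.
    return [D[item][0] for item in item_sequence]

# Hypothetical mapping dictionary and commodity sequences.
D = {"r1": ["c1"], "r2": ["c1", "c3"], "r3": ["c2"]}
S_r = [["r1", "r2", "r3"], ["r3", "r1"]]
S_c = [to_category_sequence(L_r, D) for L_r in S_r]
```

Each category sequence has the same length as the commodity sequence it came from, so the two corpora stay aligned position by position.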
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
The training process of this application uses the conventional word2vec architecture with negative sampling and, on that basis, incorporates both the structural information and the attribute information of the commodity nodes in the network. The model structure is simple, the number of parameters is small, and the training time is short, so the approach is well suited to, and can be widely applied in, real-time graph network scenarios.
In a possible implementation, referring to FIG. 3, which is a schematic diagram of another embodiment of the commodity recommendation method in the embodiments of the present application, the method may include:
Step S310: obtain historical commodity browsing records of multiple users.
It should be noted that step S310 is similar to step S110 shown in FIG. 1 and is not described again here.
Step S320: compute statistics over the historical commodity browsing records and construct a graph network of commodity nodes.
It should be noted that step S320 is similar to step S121 shown in FIG. 2 and is not described again here.
Step S330: construct, from the graph network, a corresponding commodity attribute dictionary for each of several preset classification standards, where each classification standard is preset to correspond to one set of commodity attributes.
In this embodiment, multiple classification standards can be set. For books, for example, classification standard 1 and classification standard 2 can be set: classification standard 1 may be a set of book categories, such as {urban, romance, martial arts, fantasy, suspense, game, mystery}, and classification standard 2 may be a set of author names, such as {Lu Yao, Guo Jingming, Higashino Keigo, Natsume Soseki, Salinger, Shakespeare, ...}. Commodity attribute dictionaries under multiple classification systems can thus be constructed. For the process of constructing the commodity attribute dictionary under each classification system, refer to step S122 in FIG. 2; details are not repeated here.
Step S340: convert the graph network into commodity node sequences by random walk.
It should be noted that step S340 is similar to step S123 shown in FIG. 2 and is not described again here.
Step S350: convert the commodity node sequences into commodity attribute node sequences corresponding to the different classification standards, based on the commodity attribute dictionaries.
In this embodiment, following step S330, when commodity attribute dictionaries under multiple classification systems have been generated, converting a commodity node sequence into commodity attribute node sequences yields one commodity attribute node sequence for each classification standard. For the process of converting a commodity node sequence into a commodity attribute node sequence, refer to step S124 in FIG. 2; details are not repeated here.
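With several classification standards, the same lookup is applied once per dictionary. The sketch below is illustrative: the function `to_attribute_sequences`, the keys `genre` and `author`, and the assignment of books to genres and authors are all invented for the example.

```python
def to_attribute_sequences(item_sequence, dictionaries):
    """Return one attribute node sequence per classification standard."""
    return {
        standard: [mapping[item] for item in item_sequence]
        for standard, mapping in dictionaries.items()
    }

# Hypothetical dictionaries for two classification standards.
dictionaries = {
    "genre": {"book1": "mystery", "book2": "romance"},
    "author": {"book1": "Higashino Keigo", "book2": "Guo Jingming"},
}
sequences = to_attribute_sequences(["book1", "book2", "book1"], dictionaries)
```

Each resulting sequence can then serve as its own training corpus, giving one set of attribute node embedding vectors per classification standard.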
Step S360: using a negative sampling strategy, input the commodity node sequences and the commodity attribute node sequences into the word2vec model as training corpora, and compute commodity node embedding vectors and commodity attribute node embedding vectors under the different classification standards.
In this embodiment, following step S350, when commodity attribute sequences under multiple classification systems have been obtained, using them as training corpora to compute commodity attribute node embedding vectors yields the commodity attribute node embedding vectors under each classification system. For the process of computing the commodity node embedding vectors and the commodity attribute node embedding vectors under the different classification standards, refer to step S130 in FIG. 1; details are not repeated here.
Step S370: according to preset weights for the commodity node embedding vector and for the different classification standards, perform a weighted vector summation of the commodity node embedding vector and the commodity attribute node embedding vectors to obtain the comprehensive embedding vector of each commodity.
In this embodiment, considering that in actual application scenarios the commodity attribute node embedding vectors of different classification standards influence a commodity's comprehensive embedding vector to different degrees, a weight can be set for the commodity node embedding vector and for the commodity attribute node embedding vector under each classification standard that are to be merged into the comprehensive embedding vector. A weighted vector summation is then performed over each commodity's commodity node embedding vector and its commodity attribute node embedding vectors under at least one classification standard to obtain the comprehensive embedding vector of that commodity.
For example, following the example in step S330, the comprehensive embedding vector may be obtained by merging the commodity node embedding vector A, the commodity attribute node embedding vector B under classification standard 1, and the commodity attribute node embedding vector C under classification standard 2. If, in an actual book recommendation scenario, the order of importance is B, then A, then C, weight parameters of different sizes can be preset for A, B, and C, for example 0.8, 0.9, and 0.6. When the comprehensive embedding vector of a book is later computed, the book's A, B, and C are first determined, each embedding vector is multiplied by its weight, and the weighted results are merged to give the book's final comprehensive embedding vector.
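The weighted merge in this example can be written out directly. The weights 0.8, 0.9, and 0.6 come from the text; the two-dimensional vectors and the function name `weighted_sum` are made up for illustration, and all vectors are assumed to share one dimension.

```python
def weighted_sum(vectors, weights):
    """Element-wise weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [
        sum(weight * vec[i] for vec, weight in zip(vectors, weights))
        for i in range(dim)
    ]

A = [1.0, 0.0]  # hypothetical commodity node embedding vector
B = [0.0, 1.0]  # hypothetical attribute vector, classification standard 1
C = [1.0, 1.0]  # hypothetical attribute vector, classification standard 2
E_book = weighted_sum([A, B, C], [0.8, 0.9, 0.6])
```

Here the first component is 0.8 + 0.6 and the second is 0.9 + 0.6, so B, the highest-weighted vector, contributes most to the dimension it occupies.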
Step S380: determine the target commodity accessed by the user.
Step S390: determine the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity as the recommended commodities for the target commodity.
It should be noted that steps S380-S390 are similar to the foregoing steps S150-S160 and are not described again here.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
In the embodiments of the present application, considering that in actual application scenarios the commodity attribute node embedding vectors of different classification standards influence a commodity's comprehensive embedding vector to different degrees, a weight is set for the commodity node embedding vector and for the commodity attribute node embedding vector under each classification standard that are to be merged into the comprehensive embedding vector. The comprehensive embedding vector finally computed for a commodity therefore better characterizes the essential nature of the commodity in different application scenarios, and using the comprehensive embedding vectors computed in the embodiments of the present application for commodity recommendation can improve recommendation accuracy.
本申请可用于众多通用或专用的计算机系统环境或配置中。例如:个人计算机、服务器计算机、手持设备或便携式设备、平板型设备、多处理器系统、基于微处理器的系统、置顶盒、可编程的消费电子设备、网络PC、小型计算机、大型计算机、包括以上任何系统或设备的分布式计算环境等等。本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本申请,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。The present application may be used in numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, including A distributed computing environment for any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), or the like.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to FIG. 4, as an implementation of the method shown in FIG. 1, the present application provides an embodiment of a commodity recommendation device. This device embodiment corresponds to the commodity recommendation method embodiment shown in FIG. 1, and the device can be applied to various electronic devices.
如图4所示,本实施例所述的网络嵌入向量的确定装置400包括:As shown in FIG. 4 , the apparatus 400 for determining a network embedding vector according to this embodiment includes:
获取单元410,用于获取多个用户的历史商品浏览记录;an obtaining unit 410, configured to obtain historical commodity browsing records of multiple users;
a generating unit 420, configured to generate a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
a training unit 430, configured to use the commodity node sequence and the commodity attribute node sequence as training corpora in combination with a negative sampling strategy, input them separately into a word2vec model, and compute the commodity node embedding vector and the commodity attribute node embedding vector;
计算单元440,用于对所述商品节点嵌入向量和所述商品属性节点嵌入向量进行向量拼接或者向量求和操作,得到每个商品对应的综合嵌入向量;A computing unit 440, configured to perform a vector splicing or vector sum operation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
响应单元450,用于确定用户访问的目标商品;a response unit 450, configured to determine the target commodity accessed by the user;
推荐单元460,用于确定与该目标商品的综合嵌入向量相近的综合嵌入向量所对应的商品,为针对该目标商品所确定的推荐商品。The recommending unit 460 is configured to determine a product corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target product, which is a recommended product determined for the target product.
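The two combination operations named for the computing unit 440, vector concatenation and vector summation, can be sketched as below. This is an illustration under stated assumptions rather than the application's implementation: the function names `concat` and `elementwise_sum` are hypothetical, and summation is assumed to require vectors of equal dimension.

```python
from typing import List, Sequence


def concat(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Concatenate the two embeddings; the result has dimension len(node) + len(attr)."""
    return list(node_vec) + list(attr_vec)


def elementwise_sum(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Add the two embeddings element by element; both must have the same dimension."""
    if len(node_vec) != len(attr_vec):
        raise ValueError("summation requires vectors of equal dimension")
    return [a + b for a, b in zip(node_vec, attr_vec)]


node_embedding = [1.0, 2.0]   # toy commodity node embedding
attr_embedding = [3.0, 4.0]   # toy commodity attribute node embedding
print(concat(node_embedding, attr_embedding))
print(elementwise_sum(node_embedding, attr_embedding))
```

Concatenation keeps the node and attribute information in separate coordinates at the cost of a larger vector, while summation keeps the dimension fixed but mixes the two sources.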
在一些可能的实现方式中,具体参照图4,图4为生成单元420的一个实施例示意图,可以包括:In some possible implementations, referring specifically to FIG. 4 , FIG. 4 is a schematic diagram of an embodiment of the generating unit 420, which may include:
第一构建子单元421,用于统计所述历史商品浏览记录,构建商品节点的图网络;The first construction subunit 421 is used to count the historical commodity browsing records and construct a graph network of commodity nodes;
第二构建子单元422,用于根据所述图网络构建商品属性词典,所述商品属性词典包括不同商品节点和不同商品属性的对应关系;The second construction subunit 422 is configured to construct a commodity attribute dictionary according to the graph network, and the commodity attribute dictionary includes the corresponding relationship between different commodity nodes and different commodity attributes;
第一转换子单元423,用于通过随机游走方式将所述图网络转换为商品节点序列;The first conversion subunit 423 is used to convert the graph network into a sequence of commodity nodes by random walk;
第二转换子单元424,用于基于所述商品属性词典,将所述商品节点序列转换为商品属性节点序列。The second conversion subunit 424 is configured to convert the sequence of commodity nodes into a sequence of commodity attribute nodes based on the commodity attribute dictionary.
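The conversion performed by the second conversion subunit 424 amounts to a dictionary lookup per node. The sketch below assumes the commodity attribute dictionary maps each commodity node to exactly one attribute; the identifiers `book_1`, `fiction`, and so on are made up for illustration.

```python
from typing import Dict, List, Sequence


def to_attribute_sequence(node_sequence: Sequence[str],
                          attribute_dict: Dict[str, str]) -> List[str]:
    """Replace every commodity node in the sequence by its attribute."""
    return [attribute_dict[node] for node in node_sequence]


# Hypothetical commodity attribute dictionary: commodity node -> attribute.
attribute_dict = {"book_1": "fiction", "book_2": "history", "book_3": "fiction"}

# A commodity node sequence, for example one produced by a random walk.
node_sequence = ["book_1", "book_2", "book_3", "book_1"]
print(to_attribute_sequence(node_sequence, attribute_dict))
```

With several classification standards, the same function would be called once per dictionary, giving one attribute node sequence per standard.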
在一些可能的实现方式中,第一转换子单元,具体用于对所述图网络中的商品节点的出度依次进行归一化处理,确定每个商品节点的出度概率;根据所述出度概率采用随机游走方式,生成所述商品节点序列。In some possible implementations, the first conversion subunit is specifically configured to perform normalization processing on the out-degrees of commodity nodes in the graph network in turn, and determine the out-degree probability of each commodity node; The degree probability adopts a random walk method to generate the commodity node sequence.
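One way to read the out-degree normalization and random walk described above is sketched here. It assumes a weighted directed graph stored as nested dictionaries and interprets the "out-degree probability" as each outgoing edge weight divided by the sum of the node's outgoing edge weights; that interpretation, and the names `transition_probabilities` and `random_walk`, are assumptions rather than details stated in the application.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def transition_probabilities(graph: Graph) -> Graph:
    """Normalize each node's outgoing edge weights so that they sum to 1."""
    probs: Graph = {}
    for node, out_edges in graph.items():
        total = sum(out_edges.values())
        probs[node] = ({nbr: w / total for nbr, w in out_edges.items()}
                       if total > 0 else {})
    return probs


def random_walk(probs: Graph, start: str, length: int,
                rng: random.Random) -> List[str]:
    """Walk from `start`, choosing each next node by its transition probability."""
    walk = [start]
    while len(walk) < length:
        out = probs.get(walk[-1], {})
        if not out:  # node without outgoing edges: stop early
            break
        neighbors = list(out)
        weights = [out[n] for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 2.0}, "c": {}}
probs = transition_probabilities(graph)
print(random_walk(probs, "a", 5, random.Random(0)))
```

Running several walks from every node would yield the set of commodity node sequences used as the training corpus.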
在一些可能的实现方式中,In some possible implementations,
第二构建子单元,具体用于根据所述图网络为预设的不同分类标准分别构建对应的商品属性词典,其中每种分类标准分别预先设置对应一种商品属性集合;The second constructing subunit is specifically configured to construct corresponding commodity attribute dictionaries according to different classification standards preset by the graph network, wherein each classification standard is preset to correspond to a commodity attribute set;
第二转换子单元,具体用于基于所述商品属性词典,将所述商品节点序列转换为不同分类标准对应的商品属性节点序列;a second conversion subunit, specifically configured to convert the commodity node sequence into commodity attribute node sequences corresponding to different classification standards based on the commodity attribute dictionary;
the training unit 430 is specifically configured to use the commodity node sequence and the commodity attribute node sequences as training corpora in combination with a negative sampling strategy, input them separately into the word2vec model, and compute the commodity node embedding vector and the commodity attribute node embedding vectors under the different classification standards;
the computing unit 440 is specifically configured to perform a weighted vector summation on the commodity node embedding vector and the commodity attribute node embedding vectors, according to the preset weight corresponding to the commodity node embedding vector and the preset weights corresponding to the different classification standards, to obtain the comprehensive embedding vector corresponding to each commodity.
In some possible implementations, the first construction subunit 421 is specifically configured to: compile statistics on the historical commodity browsing records and select the set of commodities in those records as the nodes of the graph network to be constructed; determine the edge weights between commodity nodes according to the statistical results; and select the commodity nodes whose edge weights are greater than a preset threshold to construct the graph network of commodity nodes.
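The graph construction above can be sketched as follows. The application does not state how edge weights are derived from the statistics, so this sketch assumes that the weight of the edge from commodity a to commodity b is the number of times b is browsed immediately after a within a user's history; the function name `build_graph` and the sample histories are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_graph(histories: Sequence[Sequence[str]],
                threshold: float) -> Dict[str, Dict[str, float]]:
    """Build a weighted directed graph, keeping only edges with weight > threshold."""
    counts: Counter = Counter()
    for history in histories:
        # Count each consecutive (previous, next) pair in one user's history.
        for a, b in zip(history, history[1:]):
            if a != b:
                counts[(a, b)] += 1
    graph: Dict[str, Dict[str, float]] = {}
    for (a, b), weight in counts.items():
        if weight > threshold:
            graph.setdefault(a, {})[b] = float(weight)
    return graph


histories = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["c", "a"]]
print(build_graph(histories, threshold=1))
```

Raising the threshold removes rarely observed transitions, which keeps the random walks on edges that are backed by repeated user behavior.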
在一些可能的实现方式中,网络嵌入向量的确定装置400还可以包括:In some possible implementations, the apparatus 400 for determining the network embedding vector may further include:
推荐商品确定单元,用于确定用户访问的目标商品;确定与所述目标商品的综合嵌入向量相近的综合嵌入向量所对应的商品,为针对所述目标商品所确定的推荐商品。The recommended commodity determination unit is used for determining the target commodity accessed by the user; determining the commodity corresponding to the integrated embedding vector close to the integrated embedding vector of the target commodity is the recommended commodity determined for the target commodity.
推荐商品确定单元,包括:Recommended product determination unit, including:
第一相似度计算子单元,用于计算所述目标商品的综合嵌入向量与除所述目标商品外的其他所有商品的综合嵌入向量之间的相似度;a first similarity calculation subunit, configured to calculate the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities except the target commodity;
a first recommended commodity determination subunit, configured to determine, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
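The similarity calculation and selection described for these two subunits can be sketched as below. The application does not name a similarity measure here, so cosine similarity is an assumption, as are the function names `cosine` and `recommend` and the toy embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target: str, embeddings: Dict[str, Sequence[float]],
              top_n: Optional[int] = None,
              threshold: Optional[float] = None) -> List[str]:
    """Rank all other commodities by similarity, then filter by threshold and/or top N."""
    scored = [(item, cosine(embeddings[target], vec))
              for item, vec in embeddings.items() if item != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [item for item, _ in scored]


embeddings = {"t": [1.0, 0.0], "x": [0.9, 0.1], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
print(recommend("t", embeddings, top_n=2))
print(recommend("t", embeddings, threshold=0.5))
```

Scoring against every other commodity is linear in the catalogue size per request; a large catalogue would likely need an approximate nearest-neighbor index, which is outside the scope of this sketch.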
在一些可能的实现方式中,推荐商品确定单元,包括:In some possible implementations, the recommended commodity determination unit includes:
第一商品属性确定子单元,用于确定所述目标商品对应的目标商品属性;a first commodity attribute determination subunit, configured to determine the target commodity attribute corresponding to the target commodity;
a second similarity calculation subunit, configured to calculate the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities, except the target commodity, that belong to the target commodity attribute;
a second recommended commodity determination subunit, configured to take, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
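The attribute-restricted variant above differs from the unrestricted one only in its candidate set. The sketch below again assumes cosine similarity and a dictionary mapping each commodity to one attribute; the name `recommend_within_attribute` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_within_attribute(target: str,
                               embeddings: Dict[str, Sequence[float]],
                               attribute_dict: Dict[str, str],
                               top_n: int) -> List[str]:
    """Return the top-N most similar commodities sharing the target's attribute."""
    target_attr = attribute_dict[target]
    candidates = [item for item in embeddings
                  if item != target and attribute_dict.get(item) == target_attr]
    candidates.sort(key=lambda item: cosine(embeddings[target], embeddings[item]),
                    reverse=True)
    return candidates[:top_n]


embeddings = {"t": [1.0, 0.0], "x": [0.9, 0.1], "y": [1.0, 0.0], "z": [0.0, 1.0]}
attribute_dict = {"t": "fiction", "x": "fiction", "y": "history", "z": "fiction"}
print(recommend_within_attribute("t", embeddings, attribute_dict, top_n=2))
```

In the sample data, `y` has the same embedding as the target but a different attribute, so it is excluded before any similarity is computed; restricting the candidates this way also reduces the number of comparisons.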
与现有技术相比,本申请实施例主要有以下有益效果:Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
本申请实施例中,网络嵌入向量的确定装置400在获取到多个用户的历史商品浏览记录后,进而根据历史商品浏览记录生成商品节点序列和商品属性节点序列。然后,结合负采样策略,将商品节点序列和商品属性节点序列作为训练语料,分别输入word2vec模型,计算得到商品节点嵌入向量和商品属性节点嵌入向量。最后,对商品节点嵌入向量和商品属性节点嵌入向量进行向量拼接或者向量求和操作,得到每个商品对应的综合嵌入向量。In the embodiment of the present application, after acquiring the historical commodity browsing records of multiple users, the device 400 for determining the network embedding vector further generates commodity node sequences and commodity attribute node sequences according to the historical commodity browsing records. Then, combined with the negative sampling strategy, the commodity node sequence and commodity attribute node sequence are used as training corpus, and input into the word2vec model respectively, and the commodity node embedding vector and commodity attribute node embedding vector are calculated. Finally, perform vector splicing or vector summation operations on the commodity node embedding vector and commodity attribute node embedding vector to obtain the comprehensive embedding vector corresponding to each commodity.
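To make the role of negative sampling in the training step more concrete, the sketch below generates skip-gram training samples from node sequences: each (center, context) pair inside a window is a positive sample, and for each one a few "negative" nodes are drawn at random. It does not train a model. Drawing negatives from the unigram distribution raised to the power 0.75 follows the original word2vec design; the window size, the number of negatives, and the name `sgns_samples` are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, int]  # (center node, other node, label: 1 positive / 0 negative)


def sgns_samples(sequences: Sequence[Sequence[str]], window: int,
                 num_negative: int, rng: random.Random) -> List[Sample]:
    """Generate skip-gram samples with negative sampling from node sequences."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = sorted(counts)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct nodes to draw negatives")
    # Noise distribution: unigram frequency raised to the power 0.75.
    noise_weights = [counts[tok] ** 0.75 for tok in vocab]
    samples: List[Sample] = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                samples.append((center, seq[j], 1))
                drawn = 0
                while drawn < num_negative:
                    neg = rng.choices(vocab, weights=noise_weights, k=1)[0]
                    if neg != seq[j]:  # do not reuse the true context as a negative
                        samples.append((center, neg, 0))
                        drawn += 1
    return samples


walks = [["a", "b", "c"]]
for sample in sgns_samples(walks, window=1, num_negative=2, rng=random.Random(0)):
    print(sample)
```

The same routine applies unchanged to commodity attribute node sequences, since both kinds of sequence are treated as sentences of tokens.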
In summary, compared with computing commodity embedding vectors using traditional graph embedding methods, the commodity recommendation method proposed in this application, which fuses commodity attribute information, better combines the structural information and the attribute information of the nodes in the graph network. In the similar-commodity scenario of a recommendation system, the comprehensive commodity embedding vector computed by this application contains not only user behavior information but also content information about the commodity itself, and so better reflects the commodity's own properties.
为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图6,图6为本实施例计算机设备基本结构框图。To solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 6 for details. FIG. 6 is a block diagram of the basic structure of a computer device according to this embodiment.
The computer device 6 includes a memory 601, a processor 602, and a network interface 603 that are communicatively connected to one another through a system bus. Note that the figure shows only the computer device 6 with components 601-603; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
所述计算机设备可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述计算机设备可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。The computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment. The computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
所述存储器601至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,所述存储器601可以是所述计算机设备6的内部存储单元,例如该计算机设备6的硬盘或内存。在另一些实施例中,所述存储器601也可以是所述计算机设备6的外部存储设备,例如该计算机设备6上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,所述存储器601还可以既包括所述计算机设备6的内部存储单元也包括其外部存储设备。本实施例中,所述存储器601通常用于存储安装于所述计算机设备6的操作系统和各类应用软件,例如上述一种商品推荐方法的计算机可读指令等。此外,所述存储器601还可以用于暂时地存储已经输出或者将要输出的各类数据。The memory 601 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc. In some embodiments, the memory 601 may be an internal storage unit of the computer device 6 , such as a hard disk or a memory of the computer device 6 . In other embodiments, the memory 601 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Of course, the memory 601 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 601 is generally used to store the operating system and various application software installed on the computer device 6 , such as computer-readable instructions of the above-mentioned method for recommending products. In addition, the memory 601 can also be used to temporarily store various types of data that have been output or will be output.
所述处理器602在一些实施例中可以是中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器602通常用于控制所述计算机设备6的总体操作。本实施例中,所述处理器602用于运行所述存储器601中存储的计算机可读指令或者处理数据,例如运行所述一种商品推荐方法的计算机可读指令。The processor 602 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. The processor 602 is typically used to control the overall operation of the computer device 6 . In this embodiment, the processor 602 is configured to execute computer-readable instructions stored in the memory 601 or process data, for example, computer-readable instructions for executing the method for recommending a commodity.
所述网络接口603可包括无线网络接口或有线网络接口,该网络接口603通常用于在所述计算机设备6与其他电子设备之间建立通信连接。The network interface 603 may include a wireless network interface or a wired network interface, and the network interface 603 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
本申请还提供了另一种实施方式,即提供一种计算机可读存储介质,所述计算机可读存储介质可以是非易失性,也可以是易失性,所述计算机可读存储介质存储有计算机可读指令,所述计算机可读指令可被至少一个处理器执行,以使所述至少一个处理器执行如上述一种商品推荐方法的步骤。The present application also provides another implementation manner, that is, to provide a computer-readable storage medium, the computer-readable storage medium may be non-volatile or volatile, and the computer-readable storage medium stores Computer-readable instructions, executable by at least one processor, to cause the at least one processor to perform the steps of a method for recommending an item as described above.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的商品推荐方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course hardware can also be used, but in many cases the former is better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the commodity recommendation method described in the various embodiments of this application.
Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application is understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. 一种商品推荐方法,包括下述步骤:A product recommendation method, comprising the following steps:
    获取多个用户的历史商品浏览记录;Get the historical commodity browsing records of multiple users;
    根据所述历史商品浏览记录生成商品节点序列和商品属性节点序列;Generate a sequence of commodity nodes and a sequence of commodity attribute nodes according to the historical commodity browsing records;
    结合负采样策略,将所述商品节点序列和所述商品属性节点序列作为训练语料,分别输入word2vec模型,计算得到商品节点嵌入向量和商品属性节点嵌入向量;Combined with the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and the commodity attribute node embedding vector are obtained by calculation;
    对所述商品节点嵌入向量和所述商品属性节点嵌入向量进行向量拼接或者向量求和操作,得到每个商品对应的综合嵌入向量;Perform vector splicing or vector summation operations on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
    确定用户访问的目标商品;Determine the target product that the user visits;
    确定与所述目标商品的综合嵌入向量相近的综合嵌入向量所对应的商品,为针对所述目标商品所确定的推荐商品。It is determined that the product corresponding to the comprehensive embedding vector that is similar to the comprehensive embedding vector of the target product is the recommended product determined for the target product.
  2. 根据权利要求1所述的商品推荐方法,其中,所述根据所述历史商品浏览记录生成商品节点序列和商品属性节点序列,包括:The product recommendation method according to claim 1, wherein the generating a product node sequence and a product attribute node sequence according to the historical product browsing records comprises:
    统计所述历史商品浏览记录,构建商品节点的图网络;Counting the historical commodity browsing records, and constructing a graph network of commodity nodes;
    根据所述图网络构建商品属性词典,所述商品属性词典包括不同商品节点和不同商品属性的对应关系;constructing a commodity attribute dictionary according to the graph network, the commodity attribute dictionary including the corresponding relationship between different commodity nodes and different commodity attributes;
    通过随机游走方式将所述图网络转换为商品节点序列;Convert the graph network into a sequence of commodity nodes through random walks;
    基于所述商品属性词典,将所述商品节点序列转换为商品属性节点序列。Based on the commodity attribute dictionary, the commodity node sequence is converted into a commodity attribute node sequence.
  3. 根据权利要求2所述的商品推荐方法,其中,所述通过随机游走方式将所述图网络转换为商品节点序列,包括:The product recommendation method according to claim 2, wherein the converting the graph network into a sequence of product nodes by a random walk method comprises:
    对所述图网络中的商品节点的出度依次进行归一化处理,确定每个商品节点的出度概率;Normalize the out-degrees of commodity nodes in the graph network in turn to determine the out-degree probability of each commodity node;
    根据所述出度概率采用随机游走方式,生成所述商品节点序列。According to the out-degree probability, the commodity node sequence is generated by adopting a random walk manner.
  4. 根据权利要求2所述的商品推荐方法,其中,The product recommendation method according to claim 2, wherein,
    所述根据所述图网络构建商品属性词典,包括:The building a commodity attribute dictionary according to the graph network includes:
    根据所述图网络为预设的不同分类标准分别构建对应的商品属性词典,其中每种分类标准分别预先设置对应一种商品属性集合;Construct corresponding commodity attribute dictionaries for different preset classification standards according to the graph network, wherein each classification standard is preset to correspond to a commodity attribute set;
    所述基于所述商品属性词典,将所述商品节点序列转换为商品属性节点序列,包括:The converting the commodity node sequence into a commodity attribute node sequence based on the commodity attribute dictionary includes:
    基于所述商品属性词典,将所述商品节点序列转换为不同分类标准对应的商品属性节点序列;Based on the commodity attribute dictionary, converting the commodity node sequence into commodity attribute node sequences corresponding to different classification standards;
    所述结合负采样策略,将所述商品节点序列和所述商品属性节点序列作为训练语料,分别输入word2vec模型,计算得到商品节点嵌入向量和商品属性节点嵌入向量,包括:In combination with the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and commodity attribute node embedding vector are calculated and obtained, including:
    结合负采样策略,将所述商品节点序列和所述商品属性节点序列作为训练语料,分别输入word2vec模型,计算得到商品节点嵌入向量和不同分类标准下的商品属性节点嵌入向量;Combined with the negative sampling strategy, the commodity node sequence and the commodity attribute node sequence are used as training corpus, respectively input the word2vec model, and the commodity node embedding vector and the commodity attribute node embedding vector under different classification standards are calculated to obtain;
    所述对所述商品节点嵌入向量和所述商品属性节点嵌入向量进行向量拼接或者向量求和操作,得到每个商品对应的综合嵌入向量,包括:The vector splicing or vector sum operation is performed on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity, including:
    根据预先设置的所述商品节点嵌入向量对应的权重和不同分类标准对应的权重,对所述商品节点嵌入向量和所述商品属性节点嵌入向量,进行向量加权求和操作,得到每个商品对应的所述综合嵌入向量。According to the preset weights corresponding to the commodity node embedding vectors and the weights corresponding to different classification standards, perform a vector weighted sum operation on the commodity node embedding vectors and the commodity attribute node embedding vectors to obtain the corresponding value of each commodity. The synthetic embedding vector.
  5. 根据权利要求2-4中任一项所述的商品推荐方法,其中,所述统计所述历史商品浏览记录,构建商品节点的图网络,包括:The product recommendation method according to any one of claims 2-4, wherein the counting the historical product browsing records and constructing a graph network of product nodes, comprising:
    统计所述历史商品浏览记录,选取历史商品浏览记录中的商品集合为待构建的图网络的图中节点;Counting the historical commodity browsing records, and selecting the commodity collection in the historical commodity browsing records as the graph nodes of the graph network to be constructed;
    根据统计结果确定各个商品节点之间的边权重;Determine the edge weight between each commodity node according to the statistical result;
    选取边权重大于预设阈值的商品节点构建商品节点的图网络。Select commodity nodes whose edge weights are greater than a preset threshold to construct a graph network of commodity nodes.
  6. The commodity recommendation method according to any one of claims 1-4, wherein determining the commodity corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target commodity as the recommended commodity determined for the target commodity comprises:
    计算所述目标商品的综合嵌入向量与除所述目标商品外的其他所有商品的综合嵌入向量之间的相似度;Calculate the similarity between the comprehensive embedding vector of the target product and the comprehensive embedding vector of all other products except the target product;
    determining, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
  7. The commodity recommendation method according to any one of claims 1-4, wherein determining the commodity corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target commodity as the recommended commodity determined for the target commodity comprises:
    确定所述目标商品对应的目标商品属性;determining the target commodity attribute corresponding to the target commodity;
    calculating the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all other commodities, except the target commodity, that belong to the target commodity attribute;
    taking, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
  8. 一种商品推荐装置,包括:A product recommendation device, comprising:
    获取单元,用于获取多个用户的历史商品浏览记录;an acquisition unit, used to acquire historical commodity browsing records of multiple users;
    a generating unit, configured to generate a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
    训练单元,用于结合负采样策略,将所述商品节点序列和所述商品属性节点序列作为训练语料,分别输入word2vec模型,计算得到商品节点嵌入向量和商品属性节点嵌入向量;The training unit is used for combining the negative sampling strategy, using the commodity node sequence and the commodity attribute node sequence as training corpus, respectively inputting the word2vec model, and calculating the commodity node embedding vector and the commodity attribute node embedding vector;
    计算单元,用于对所述商品节点嵌入向量和所述商品属性节点嵌入向量进行向量拼接或者向量求和操作,得到每个商品对应的综合嵌入向量;a computing unit, configured to perform vector splicing or vector summation operations on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector corresponding to each commodity;
    响应单元,用于确定用户访问的目标商品;A response unit, used to determine the target commodity accessed by the user;
    推荐单元,用于确定与所述目标商品的综合嵌入向量相近的综合嵌入向量所对应的商品,为针对所述目标商品所确定的推荐商品。A recommending unit, configured to determine a product corresponding to a comprehensive embedding vector similar to the comprehensive embedding vector of the target product, which is a recommended product determined for the target product.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the following steps of a commodity recommendation method:
    acquiring historical commodity browsing records of a plurality of users;
    generating a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
    using the commodity node sequence and the commodity attribute node sequence as training corpora and, in combination with a negative sampling strategy, inputting them separately into a word2vec model to compute commodity node embedding vectors and commodity attribute node embedding vectors;
    performing vector concatenation or vector summation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector for each commodity;
    determining a target commodity accessed by a user; and
    determining, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity.
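The two combination options named in claim 9 can be sketched as follows. This is only an illustration: the function names `concatenate` and `elementwise_sum` and the sample vectors are hypothetical, and plain Python lists stand in for whatever vector type an implementation would use. Summation needs both vectors to have the same dimension, while concatenation does not.

```python
from typing import List, Sequence


def concatenate(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector concatenation: the result has len(node_vec) + len(attr_vec) entries."""
    return list(node_vec) + list(attr_vec)


def elementwise_sum(node_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """Vector summation: both vectors must share the same dimension."""
    if len(node_vec) != len(attr_vec):
        raise ValueError("summation requires vectors of equal dimension")
    return [a + b for a, b in zip(node_vec, attr_vec)]


# Hypothetical embeddings for one commodity.
node_vec = [1.0, 2.0]
attr_vec = [3.0, 4.0]
concatenated = concatenate(node_vec, attr_vec)
summed = elementwise_sum(node_vec, attr_vec)
```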
  10. The computer device according to claim 9, wherein generating the commodity node sequence and the commodity attribute node sequence according to the historical commodity browsing records comprises:
    compiling statistics on the historical commodity browsing records and constructing a graph network of commodity nodes;
    constructing a commodity attribute dictionary according to the graph network, the commodity attribute dictionary including the correspondence between different commodity nodes and different commodity attributes;
    converting the graph network into the commodity node sequence by means of a random walk; and
    converting the commodity node sequence into the commodity attribute node sequence based on the commodity attribute dictionary.
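The final step of claim 10 amounts to a dictionary lookup applied to every node in a walk. The sketch below assumes the commodity attribute dictionary is a plain mapping from commodity node to a single attribute; the function name `to_attribute_sequence`, the commodity IDs, and the category labels are hypothetical and are not taken from the application.

```python
from typing import Dict, List


def to_attribute_sequence(node_sequence: List[str],
                          attribute_dict: Dict[str, str]) -> List[str]:
    """Replace each commodity node with its attribute from the dictionary."""
    return [attribute_dict[node] for node in node_sequence]


# Hypothetical commodity attribute dictionary (commodity node -> category).
attribute_dict = {"item_a": "phone", "item_b": "phone_case", "item_c": "charger"}

# A commodity node sequence, for example one produced by a random walk.
node_sequence = ["item_a", "item_b", "item_a", "item_c"]
attribute_sequence = to_attribute_sequence(node_sequence, attribute_dict)
```

Because the attribute sequence keeps the order of the node sequence, both sequences can serve as parallel training corpora.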
  11. The computer device according to claim 10, wherein converting the graph network into the commodity node sequence by means of a random walk comprises:
    normalizing, in turn, the out-degrees of the commodity nodes in the graph network to determine an out-degree probability for each commodity node; and
    generating the commodity node sequence by a random walk according to the out-degree probabilities.
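A minimal sketch of the random walk in claim 11 follows. It rests on one interpretation: "normalizing the out-degrees" is read as dividing each node's outgoing edge weights by their total, which gives transition probabilities. The graph representation, the function names, and the sample weights are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def transition_probabilities(graph: Graph) -> Graph:
    """Normalize each node's outgoing edge weights so that they sum to 1."""
    probs: Graph = {}
    for node, edges in graph.items():
        total = sum(edges.values())
        probs[node] = {nbr: w / total for nbr, w in edges.items()} if total > 0 else {}
    return probs


def random_walk(probs: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Walk at most `length` nodes, choosing each next node by its probability."""
    walk = [start]
    while len(walk) < length:
        edges = probs.get(walk[-1], {})
        if not edges:  # dead end: the current node has no outgoing edges
            break
        neighbors = list(edges)
        weights = [edges[n] for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical weighted, directed commodity graph.
graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"a": 2.0}, "c": {}}
probs = transition_probabilities(graph)
walk = random_walk(probs, "a", 5, random.Random(0))
```

Running one or more walks from every node would yield the set of commodity node sequences used as training corpus.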
  12. The computer device according to claim 10, wherein
    constructing the commodity attribute dictionary according to the graph network comprises:
    constructing, according to the graph network, a corresponding commodity attribute dictionary for each of a plurality of preset classification standards, wherein each classification standard is preset to correspond to one set of commodity attributes;
    converting the commodity node sequence into the commodity attribute node sequence based on the commodity attribute dictionary comprises:
    converting, based on the commodity attribute dictionaries, the commodity node sequence into commodity attribute node sequences corresponding to the different classification standards;
    using the commodity node sequence and the commodity attribute node sequence as training corpora and, in combination with the negative sampling strategy, inputting them separately into the word2vec model to compute the commodity node embedding vectors and the commodity attribute node embedding vectors comprises:
    using the commodity node sequence and the commodity attribute node sequences as training corpora and, in combination with the negative sampling strategy, inputting them separately into the word2vec model to compute the commodity node embedding vectors and the commodity attribute node embedding vectors under the different classification standards; and
    performing vector concatenation or vector summation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain the comprehensive embedding vector for each commodity comprises:
    performing a weighted vector summation on the commodity node embedding vector and the commodity attribute node embedding vectors, according to a preset weight for the commodity node embedding vector and preset weights for the different classification standards, to obtain the comprehensive embedding vector for each commodity.
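The weighted summation at the end of claim 12 can be illustrated as below. The sketch assumes that all embedding vectors share one dimension; the two classification standards (category and brand), the weights, and the function name `weighted_sum` are hypothetical examples rather than values from the application.

```python
from typing import List, Sequence


def weighted_sum(vectors: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Return sum_k weights[k] * vectors[k], computed per dimension."""
    if len(vectors) != len(weights):
        raise ValueError("one weight is required per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share the same dimension")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


# Hypothetical embeddings of one commodity.
node_vec = [1.0, 0.0]      # commodity node embedding
category_vec = [0.0, 2.0]  # attribute embedding under a "category" standard
brand_vec = [4.0, 4.0]     # attribute embedding under a "brand" standard

# Hypothetical preset weights: node 0.5, category 0.25, brand 0.25.
combined = weighted_sum([node_vec, category_vec, brand_vec], [0.5, 0.25, 0.25])
```

Raising the weight of one classification standard pulls commodities that share that attribute closer together in the combined space.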
  13. The computer device according to any one of claims 10-12, wherein compiling statistics on the historical commodity browsing records and constructing the graph network of commodity nodes comprises:
    compiling statistics on the historical commodity browsing records and selecting the set of commodities in the historical commodity browsing records as the nodes of the graph network to be constructed;
    determining the edge weights between the commodity nodes according to the statistical results; and
    selecting the commodity nodes whose edge weights are greater than a preset threshold to construct the graph network of commodity nodes.
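The graph construction of claim 13 might look like the following sketch. The claim does not say which statistic defines an edge weight, so the code assumes a common choice: the number of times one commodity is browsed immediately after another across all users. The function name `build_graph` and the sample histories are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_graph(histories: List[List[str]], threshold: int) -> Dict[str, Dict[str, int]]:
    """Count consecutive views (src -> dst) and keep edges above the threshold."""
    counts: Counter = Counter()
    for history in histories:
        for src, dst in zip(history, history[1:]):
            if src != dst:  # ignore repeated views of the same commodity
                counts[(src, dst)] += 1

    graph: Dict[str, Dict[str, int]] = {}
    for (src, dst), weight in counts.items():
        if weight > threshold:  # keep only edges heavier than the threshold
            graph.setdefault(src, {})[dst] = weight
    return graph


# Hypothetical browsing histories of three users.
histories = [["a", "b", "c"], ["a", "b"], ["b", "c", "a"]]
graph = build_graph(histories, threshold=1)
```

Here the edge from `c` to `a` occurs only once and is dropped, which shows how the threshold filters out incidental transitions.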
  14. The computer device according to any one of claims 9-12, wherein determining, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity comprises:
    calculating the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all commodities other than the target commodity; and
    determining, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
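The top-N selection in claim 14 can be sketched as follows. The claim does not name a similarity measure; cosine similarity is assumed here as one common choice. The function names, commodity labels, and embedding values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target: str, embeddings: Dict[str, List[float]], top_n: int) -> List[str]:
    """Rank every other commodity by similarity to the target and keep the top N."""
    scores = {
        item: cosine_similarity(embeddings[target], vec)
        for item, vec in embeddings.items()
        if item != target
    }
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:top_n]


# Hypothetical comprehensive embedding vectors.
embeddings = {
    "phone": [1.0, 0.0],
    "case": [0.9, 0.1],
    "sofa": [0.0, 1.0],
    "charger": [0.7, 0.7],
}
recommended = recommend("phone", embeddings, top_n=2)
```

The threshold variant of the claim would instead keep every commodity whose score exceeds a preset value, with no fixed list length.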
  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of a commodity recommendation method:
    acquiring historical commodity browsing records of a plurality of users;
    generating a commodity node sequence and a commodity attribute node sequence according to the historical commodity browsing records;
    using the commodity node sequence and the commodity attribute node sequence as training corpora and, in combination with a negative sampling strategy, inputting them separately into a word2vec model to compute commodity node embedding vectors and commodity attribute node embedding vectors;
    performing vector concatenation or vector summation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain a comprehensive embedding vector for each commodity;
    determining a target commodity accessed by a user; and
    determining, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity.
  16. The computer-readable storage medium according to claim 15, wherein generating the commodity node sequence and the commodity attribute node sequence according to the historical commodity browsing records comprises:
    compiling statistics on the historical commodity browsing records and constructing a graph network of commodity nodes;
    constructing a commodity attribute dictionary according to the graph network, the commodity attribute dictionary including the correspondence between different commodity nodes and different commodity attributes;
    converting the graph network into the commodity node sequence by means of a random walk; and
    converting the commodity node sequence into the commodity attribute node sequence based on the commodity attribute dictionary.
  17. The computer-readable storage medium according to claim 16, wherein converting the graph network into the commodity node sequence by means of a random walk comprises:
    normalizing, in turn, the out-degrees of the commodity nodes in the graph network to determine an out-degree probability for each commodity node; and
    generating the commodity node sequence by a random walk according to the out-degree probabilities.
  18. The computer-readable storage medium according to claim 16, wherein
    constructing the commodity attribute dictionary according to the graph network comprises:
    constructing, according to the graph network, a corresponding commodity attribute dictionary for each of a plurality of preset classification standards, wherein each classification standard is preset to correspond to one set of commodity attributes;
    converting the commodity node sequence into the commodity attribute node sequence based on the commodity attribute dictionary comprises:
    converting, based on the commodity attribute dictionaries, the commodity node sequence into commodity attribute node sequences corresponding to the different classification standards;
    using the commodity node sequence and the commodity attribute node sequence as training corpora and, in combination with the negative sampling strategy, inputting them separately into the word2vec model to compute the commodity node embedding vectors and the commodity attribute node embedding vectors comprises:
    using the commodity node sequence and the commodity attribute node sequences as training corpora and, in combination with the negative sampling strategy, inputting them separately into the word2vec model to compute the commodity node embedding vectors and the commodity attribute node embedding vectors under the different classification standards; and
    performing vector concatenation or vector summation on the commodity node embedding vector and the commodity attribute node embedding vector to obtain the comprehensive embedding vector for each commodity comprises:
    performing a weighted vector summation on the commodity node embedding vector and the commodity attribute node embedding vectors, according to a preset weight for the commodity node embedding vector and preset weights for the different classification standards, to obtain the comprehensive embedding vector for each commodity.
  19. The computer-readable storage medium according to any one of claims 16-18, wherein compiling statistics on the historical commodity browsing records and constructing the graph network of commodity nodes comprises:
    compiling statistics on the historical commodity browsing records and selecting the set of commodities in the historical commodity browsing records as the nodes of the graph network to be constructed;
    determining the edge weights between the commodity nodes according to the statistical results; and
    selecting the commodity nodes whose edge weights are greater than a preset threshold to construct the graph network of commodity nodes.
  20. The computer-readable storage medium according to any one of claims 16-18, wherein determining, as the recommended commodities for the target commodity, the commodities whose comprehensive embedding vectors are close to the comprehensive embedding vector of the target commodity comprises:
    calculating the similarity between the comprehensive embedding vector of the target commodity and the comprehensive embedding vectors of all commodities other than the target commodity; and
    determining, as the recommended commodities for the target commodity, the commodities corresponding to the comprehensive embedding vectors whose similarity is greater than a preset threshold or whose similarity ranks in the top N, where N is a positive integer.
PCT/CN2021/082934 2020-12-18 2021-03-25 Commodity recommendation method and related device thereof WO2022126901A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011504942.5A CN112633973B (en) 2020-12-18 2020-12-18 Commodity recommendation method and related equipment thereof
CN202011504942.5 2020-12-18

Publications (1)

Publication Number Publication Date
WO2022126901A1 (en) 2022-06-23

Family

ID=75317141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082934 WO2022126901A1 (en) 2020-12-18 2021-03-25 Commodity recommendation method and related device thereof

Country Status (2)

Country Link
CN (1) CN112633973B (en)
WO (1) WO2022126901A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308684A (en) * 2023-05-18 2023-06-23 和元达信息科技有限公司 Online shopping platform store information pushing method and system
CN117611245A (en) * 2023-12-14 2024-02-27 浙江博观瑞思科技有限公司 Data analysis management system and method for planning E-business operation activities
CN118429017A (en) * 2024-07-03 2024-08-02 厦门市一码当先信息科技有限公司 Analysis method and system for realizing advertisement putting effect based on clustering algorithm

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633973B (en) * 2020-12-18 2024-07-16 平安科技(深圳)有限公司 Commodity recommendation method and related equipment thereof
CN113254782B (en) * 2021-06-15 2023-05-05 济南大学 Question-answering community expert recommendation method and system
CN115509734A (en) * 2021-06-23 2022-12-23 华为技术有限公司 Data processing method, system and related equipment
CN113496432B (en) * 2021-07-06 2024-09-13 北京爱笔科技有限公司 Mining method, device, equipment and storage medium for entity to be recommended
CN113781158B (en) * 2021-08-23 2024-07-12 湖南大学 Commodity combination recommendation method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295124A1 (en) * 2018-03-26 2019-09-26 DoorDash, Inc. Dynamic predictive similarity grouping based on vectorization of merchant data
CN111639989A (en) * 2020-04-28 2020-09-08 上海风秩科技有限公司 Commodity recommendation method and readable storage medium
CN111695960A (en) * 2019-03-12 2020-09-22 阿里巴巴集团控股有限公司 Object recommendation system, method, electronic device and storage medium
CN111815403A (en) * 2020-06-19 2020-10-23 北京石油化工学院 Commodity recommendation method and device and terminal equipment
CN112633973A (en) * 2020-12-18 2021-04-09 平安科技(深圳)有限公司 Commodity recommendation method and related equipment thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191240B (en) * 2018-08-14 2021-06-08 北京九狐时代智能科技有限公司 Method and device for recommending commodities

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308684A (en) * 2023-05-18 2023-06-23 和元达信息科技有限公司 Online shopping platform store information pushing method and system
CN116308684B (en) * 2023-05-18 2023-08-11 和元达信息科技有限公司 Online shopping platform store information pushing method and system
CN117611245A (en) * 2023-12-14 2024-02-27 浙江博观瑞思科技有限公司 Data analysis management system and method for planning E-business operation activities
CN117611245B (en) * 2023-12-14 2024-05-31 浙江博观瑞思科技有限公司 Data analysis management system and method for planning E-business operation activities
CN118429017A (en) * 2024-07-03 2024-08-02 厦门市一码当先信息科技有限公司 Analysis method and system for realizing advertisement putting effect based on clustering algorithm

Also Published As

Publication number Publication date
CN112633973B (en) 2024-07-16
CN112633973A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2022126901A1 (en) Commodity recommendation method and related device thereof
CN112148987B (en) Message pushing method based on target object activity and related equipment
JP5615931B2 (en) Clustering method and system
US9122989B1 (en) Analyzing website content or attributes and predicting popularity
Xu et al. Improving user recommendation by extracting social topics and interest topics of users in uni-directional social networks
Xu et al. Interdisciplinary topics of information science: a study based on the terms interdisciplinarity index series
WO2019041521A1 (en) Apparatus and method for extracting user keyword, and computer-readable storage medium
CN107895277A (en) Method, electronic installation and the medium of push loan advertisement in the application
CN104598539B (en) A kind of internet event temperature computational methods and terminal
CN114138985B (en) Text data processing method and device, computer equipment and storage medium
CN112131261B (en) Community query method and device based on community network and computer equipment
WO2021175021A1 (en) Product push method and apparatus, computer device, and storage medium
KR20220034701A (en) Tag-based content recommendation method and server performing the same
Pan et al. Recommendation of crowdsourcing tasks based on word2vec semantic tags
Ye et al. Crowdsourcing-enhanced missing values imputation based on Bayesian network
Wang et al. ST-SAGE: A spatial-temporal sparse additive generative model for spatial item recommendation
CN112507170A (en) Data asset directory construction method based on intelligent decision and related equipment thereof
CN114219664B (en) Product recommendation method, device, computer equipment and storage medium
CN112257959A (en) User risk prediction method and device, electronic equipment and storage medium
JP2023554210A (en) Sort model training method and apparatus for intelligent recommendation, intelligent recommendation method and apparatus, electronic equipment, storage medium, and computer program
Shaowen et al. An improved collaborative filtering recommendation algorithm
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
Chen et al. Inferring tag co-occurrence relationship across heterogeneous social networks
Deng et al. A multiuser identification algorithm based on internet of things
CN115099875A (en) Data classification method based on decision tree model and related equipment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21904841; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 21904841; Country of ref document: EP; Kind code of ref document: A1