CN113486235A - Method and system for identifying user interest - Google Patents
- Publication number
- CN113486235A (application number CN202110601742.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- time sequence
- behavior data
- product
- interest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a method and a system for identifying user interest. The method comprises the following steps: acquiring user behavior logs on each channel to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data in time order to obtain a user time sequence behavior data set; extracting time sequence environmental characteristics from the user time sequence behavior data set, and matching label characteristics according to the user time sequence behavior data set; and inputting the time sequence environmental characteristics and the label characteristics into an interest identification model, which outputs the interest intensity of the user in a product. In the interest identification model, a long short-term memory (LSTM) network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and system for identifying user interests.
Background
With the development of technology, a user in a single scenario often leaves data on several media, and the data from these different sources describe the user's interest in and preference for a product along various dimensions. For example, in a product marketing scenario, a user often leaves data on media such as telephone calls, short messages and webpages, and the data from each medium covers data types such as plain text, images, video or voice. In the related art, user interest is identified from data in a single format generated by the user on a single medium. Because the data source is single and not comprehensive, the identified user interest is often inaccurate.
No effective solution has yet been proposed for the problem in the related art that the identified user interest is inaccurate.
Disclosure of Invention
The embodiment of the application provides a method and a system for identifying user interests, which are beneficial to improving the accuracy of the identified user interests.
In a first aspect, an embodiment of the present application provides a method for identifying a user interest, where the method includes:
acquiring user behavior logs on each channel to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data according to a time sequence to obtain a user time sequence behavior data set;
extracting time sequence environmental characteristics according to the user time sequence behavior data set, and matching label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics;
inputting the time sequence environmental characteristics and the label characteristics into an interest identification model, and outputting the interest intensity of the user in the product, wherein in the interest identification model, a long short-term memory (LSTM) network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product.
In some of these embodiments, the user time sequence behavior data set includes voice data, and before the extracting of the time sequence environmental characteristics, the method includes: preprocessing the user time sequence behavior data set, wherein the preprocessing comprises: converting the voice data to obtain text data.
In some embodiments, the user time sequence behavior data set includes a product identifier, and after the converting of the voice data, the preprocessing further includes: supplementing product attributes into the user time sequence behavior data set according to the product identifier, wherein the product identifier comprises a product number or a product name.
In some embodiments, the user time-series behavior data set further includes short message data, and the process of extracting text features includes:
acquiring text data in the user time sequence behavior data set, wherein the text data comprises the short message data and the text data after the voice data conversion processing;
performing word segmentation processing on the text data to obtain segmented words;
tagging each word with a part-of-speech tag to obtain the text features;
representing the text features with one-hot codes.
In some embodiments, the set of user time series behavior data further includes communication data, and the process of extracting the communication features includes:
acquiring the communication data in the user time sequence behavior data set, wherein the communication data comprises a ringing duration, a call duration, a speaking duration after connection, an average speaking rate per conversation, or a total speaking length;
segmenting the communication data to obtain the communication characteristics;
representing the communication characteristics with one-hot codes.
In some embodiments, the user time-series behavior data set further includes a web page record, and the extracting the web page features includes:
acquiring the webpage record in the user time sequence behavior data set, wherein the webpage record comprises page stay time or page operation number;
segmenting the webpage record to obtain the webpage characteristics;
representing the webpage characteristics with one-hot codes.
In some of these embodiments, the process of matching product label characteristics comprises:
acquiring the product identification in the user time sequence behavior data set;
matching product label features according to a predefined association relationship between the product identification and the product label features, wherein the product label features comprise the name, the number, the activity time or the promotion region range of the product;
the product label features are represented by a one-hot code.
In some embodiments, the user time series behavior data set includes a user identifier, and the process of matching the user tag characteristics includes:
acquiring the user identification in the user time sequence behavior data set;
matching user tag characteristics according to a predefined association relationship between a user identifier and the user tag characteristics, wherein the user tag characteristics comprise the age, the gender, the region or the occupation of the user;
the user tag feature is represented by a one-hot code.
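The label matching described above amounts to a lookup in a predefined association table followed by one-hot encoding of each categorical attribute. The sketch below illustrates this for user tags under stated assumptions: the table `USER_TAGS`, the category lists in `CATEGORIES`, and the `"unknown"` fallback are all hypothetical and are not specified by the application.

```python
# Hypothetical predefined association: user identifier -> user tag attributes.
USER_TAGS = {"u1": {"gender": "female", "region": "east"}}

# Hypothetical closed category lists; "unknown" is an assumed fallback value.
CATEGORIES = {
    "gender": ["female", "male", "unknown"],
    "region": ["east", "north", "south", "west", "unknown"],
}


def user_tag_vector(user_id):
    """Look up a user's tags and concatenate one one-hot block per attribute."""
    tags = USER_TAGS.get(user_id, {})
    vec = []
    for field, values in CATEGORIES.items():
        value = tags.get(field, "unknown")
        vec.extend(1 if value == candidate else 0 for candidate in values)
    return vec


print(user_tag_vector("u1"))
```

Product label features could be matched the same way, keyed by product identifier instead of user identifier.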
In a second aspect, an embodiment of the present application provides a system for identifying user interests, where the system includes:
the acquisition module is used for acquiring user behavior logs on various channels to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data according to a time sequence to obtain a user time sequence behavior data set;
the extraction module is used for extracting time sequence environmental characteristics according to the user time sequence behavior data set and matching label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics;
and the output module is used for inputting the time sequence environmental characteristics and the label characteristics into an interest identification model and outputting the interest intensity of the user in the product, wherein in the interest identification model, a long short-term memory (LSTM) network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product.
In some of these embodiments, the set of user temporal behavior data includes speech data, the system further comprising:
a preprocessing module, configured to preprocess the user time sequence behavior data set, where the preprocessing includes: converting the voice data to obtain text data.
Compared with the related art, the method for identifying user interest acquires user behavior logs on each channel to obtain user behavior data, summarizes the behavior data of the same user, and arranges the behavior data in time order to obtain the user time sequence behavior data set. It then extracts time sequence environmental characteristics from the user time sequence behavior data set and matches label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics. Finally, it inputs the time sequence environmental characteristics and the label characteristics into an interest identification model, which outputs the interest intensity of the user in a product. In the interest identification model, a long short-term memory (LSTM) network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product. This solves the problem in the related art that the identified user interest is inaccurate and improves the accuracy of user interest identification.
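One way to read the model structure described above is as the following sketch. The symbols, the use of the final LSTM hidden state, and the sigmoid output are assumptions made for illustration; the application only states that an LSTM represents the time sequence behavior characteristics, a CNN represents the label characteristics, and the two are fully connected.

```latex
\begin{aligned}
h_T &= \mathrm{LSTM}(x_1, x_2, \dots, x_T)
  && \text{($x_t$: one-hot time sequence environmental features at step $t$)} \\
c &= \mathrm{CNN}(u, p)
  && \text{($u$, $p$: one-hot user and product label features)} \\
s &= \sigma\!\left(W\,[\,h_T \,;\, c\,] + b\right)
  && \text{($s$: interest intensity; $\sigma$, $W$, $b$ assumed)}
\end{aligned}
```

Here $[\,h_T \,;\, c\,]$ denotes concatenation of the two representations before the fully connected layer.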
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an application environment of a method for identifying user interest according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of user interest identification according to a first embodiment of the present application;
FIG. 3 is a flow chart of pre-processing a user time series behavior data set according to a second embodiment of the present application;
FIG. 4 is a flow chart of extracting text features according to a third embodiment of the present application;
FIG. 5 is a flow chart of extracting communication characteristics according to a fourth embodiment of the present application;
FIG. 6 is a flowchart of extracting features of a web page according to a fifth embodiment of the present application;
FIG. 7 is a flow chart of matching product label characteristics according to a sixth embodiment of the present application;
FIG. 8 is a flow chart of matching user tag characteristics according to a seventh embodiment of the present application;
FIG. 9 is a schematic diagram of an interest recognition model according to an eighth embodiment of the present application;
FIG. 10 is a block diagram of a system for user interest identification according to a ninth embodiment of the present application;
FIG. 11 is a block diagram of a system for user interest recognition according to a tenth embodiment of the present application;
FIG. 12 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The method for identifying user interest provided by the present application can be applied to the application environment shown in FIG. 1, which is a schematic diagram of an application environment of the method according to an embodiment of the present application. As shown in FIG. 1, the terminal 101 can be a salesperson's smartphone, through which the salesperson can talk with a user or exchange short messages with the user, so that the salesperson can introduce a product and receive the user's feedback on it. The terminal 101 can also be the user's notebook computer, through which the user can open the webpage of a product of interest to view its details. The terminal 101 communicates with the server 102, and the server 102 obtains data from the terminal 101. The server 102 runs the method for identifying user interest provided by the present application, so that it can determine the user's interest intensity in the product from the obtained data. The terminal 101 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 102 may be implemented as an independent server or as a server cluster composed of a plurality of servers.
The present embodiment provides a method for identifying a user interest, and fig. 2 is a flowchart of a method for identifying a user interest according to a first embodiment of the present application, as shown in fig. 2, the flowchart includes the following steps:
step S201, obtaining user behavior logs on each channel to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data in time order to obtain a user time sequence behavior data set, where, for example, the user behavior logs on each channel can be obtained by event tracking (instrumenting each channel with tracking points);
step S202, extracting time sequence environmental characteristics according to a user time sequence behavior data set, and matching label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics;
step S203, inputting time sequence environment characteristics and label characteristics into an interest identification model, and outputting the interest intensity of a user on a product, wherein in the interest identification model, a Long Short Term Memory Network (LSTM) is adopted to represent time sequence behavior characteristics, a Convolutional Neural Network (CNN) is adopted to represent label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user on the product.
Through steps S201 to S203, the present embodiment addresses the problem in the related art that a single data source makes the identified user interest inaccurate. The embodiment obtains user behavior logs on each channel, sorts the data of the same user in time order, extracts the time sequence environmental characteristics, matches the label characteristics of the product and the user, and inputs the time sequence environmental characteristics and the label characteristics into the interest identification model. Because the data come from every channel, the data source is rich and comprehensive. In the interest identification model, the time sequence behavior characteristics are all related to time order, so a long short-term memory (LSTM) network is adopted to represent them; a convolutional neural network is adopted to represent the label characteristics; and the label characteristics and the time sequence behavior characteristics are fully connected. The interest intensity of the user in the product can therefore be determined more accurately, which solves the problem in the related art that the identified user interest is inaccurate and improves the accuracy of user interest identification.
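The aggregation in step S201 can be sketched as grouping log records by user and sorting each group by timestamp. This is a minimal illustration only: the record fields (`user_id`, `channel`, `ts`, `payload`) and the sample logs are assumptions, not a format defined by the application.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records collected from three channels; field names are assumed.
logs = [
    {"user_id": "u1", "channel": "web", "ts": "2021-05-01T10:05:00", "payload": "viewed product P001"},
    {"user_id": "u2", "channel": "sms", "ts": "2021-05-01T09:00:00", "payload": "please call me back"},
    {"user_id": "u1", "channel": "phone", "ts": "2021-05-01T09:30:00", "payload": "call recording reference"},
]


def build_time_series(records):
    """Group records by user, then sort each user's records by timestamp."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user_id"]].append(rec)
    for recs in by_user.values():
        recs.sort(key=lambda r: datetime.fromisoformat(r["ts"]))
    return dict(by_user)


series = build_time_series(logs)
for user, recs in series.items():
    print(user, [r["channel"] for r in recs])
```

Each value in `series` plays the role of one user's time sequence behavior data set, with records from different channels interleaved in time order.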
Further, the user time series behavior data set includes voice data, short message data, and a web page record, where the data includes a product identifier, and in some embodiments, before extracting the time series environment feature, the user time series behavior data set is preprocessed, fig. 3 is a flowchart of preprocessing the user time series behavior data set according to a second embodiment of the present application, and as shown in fig. 3, the flowchart includes the following steps:
step S301, performing conversion processing on the voice data to obtain text data, for example, processing the voice data generated by a telephone call through Automatic Speech Recognition (ASR for short) to obtain text data, where for a call that was not connected, the corresponding text content may be marked as "not connected", and for voice data that yields no content after ASR conversion, the corresponding text content may be marked as "empty text";
step S302, performing error correction processing on the text data, for example, correcting errors caused by homophones, near-homophones, fuzzy pronunciations and abnormal pronunciations in the text data, wherein the text data comprises the text data after ASR processing and the short message data in the user time sequence behavior data set;
step S303, supplementing product attributes to the user time sequence behavior data set according to the product identifiers in the data, wherein the product identifiers comprise product numbers or product names, and the product attributes can be predefined; for example, when introducing a product over the telephone, a salesperson may tell the customer only the product number, and the product attributes corresponding to that product identifier can then be supplemented to the user time sequence behavior data set, so that more data support is available for the subsequent interest identification model to determine the user interest intensity;
step S304, determining data grade information according to the webpage records in the user time sequence behavior data set, and supplementing the data grade information to the user time sequence behavior data set so as to convert the complex behavior data of the user on the webpage into more uniform grade information, for example, determining the stay time grade to which the data belongs according to the stay time data of the user on the webpage, and supplementing the stay time grade to the user time sequence behavior data set.
Through steps S301 to S304, in this embodiment, after the user time series behavior data set is obtained, before the time series environment feature is extracted from the user time series behavior data set, the data in the user time series behavior data set is preprocessed, so that semantic errors in the data can be reduced, and the data content is enriched, thereby further improving the accuracy of user interest identification.
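Parts of this preprocessing can be sketched in a few lines. The example below covers the placeholder texts of step S301, the attribute supplementing of step S303 and the dwell time level of step S304; it does not perform speech recognition or the error correction of step S302. The record fields, the `PRODUCT_ATTRIBUTES` table and the dwell time thresholds are all hypothetical.

```python
# Hypothetical catalogue mapping product identifiers to predefined attributes.
PRODUCT_ATTRIBUTES = {"P001": {"name": "small loan", "term_months": 12}}


def text_for_call(connected, asr_text):
    """Return the text stored for a call record (S301), given assumed ASR output."""
    if not connected:
        return "not connected"
    if not asr_text.strip():
        return "empty text"
    return asr_text


def dwell_level(seconds):
    """Map raw page dwell time to a coarse level (S304); thresholds are assumed."""
    if seconds < 10:
        return "short"
    if seconds < 60:
        return "medium"
    return "long"


def preprocess(record):
    """Return a copy of the record enriched with text, attributes and levels."""
    out = dict(record)
    if record.get("channel") == "phone":
        out["text"] = text_for_call(record.get("connected", False), record.get("asr_text", ""))
    if record.get("product_id") in PRODUCT_ATTRIBUTES:
        out["product_attributes"] = PRODUCT_ATTRIBUTES[record["product_id"]]
    if "dwell_seconds" in record:
        out["dwell_level"] = dwell_level(record["dwell_seconds"])
    return out


print(preprocess({"channel": "web", "product_id": "P001", "dwell_seconds": 75}))
```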
Optionally, in some embodiments, fig. 4 is a flowchart of extracting text features according to a third embodiment of the present application, and as shown in fig. 4, the flowchart includes the following steps:
step S401, acquiring text data in a user time sequence behavior data set, wherein the text data comprises short message data and text data after voice data conversion processing;
step S402, performing word segmentation processing on the text data to obtain segmented words, optionally by using a dictionary in combination with a word segmentation tool;
step S403, tagging each word with a part-of-speech tag to obtain text characteristics, wherein the part-of-speech tags can be predefined; for example, if word segmentation of the user time sequence behavior data set of a loan product yields five thousand words, each of those five thousand words can be given a part-of-speech tag associated with small loans;
step S404, representing the text characteristics with one-hot codes (One-Hot), thereby converting the text characteristics from a text format to a numerical format so that their format meets the interest recognition model's requirements for input data.
Through steps S401 to S404, in this embodiment, after the text data in the user time series behavior data set is acquired, word segmentation processing is performed on the text data, a part-of-speech tag is added to a word, and text features are extracted from complex text contents, so that a data basis can be provided for a subsequent interest recognition model to determine the user interest strength.
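Steps S402 to S404 can be illustrated with a toy pipeline: segment the text, look up a predefined tag for each word, and one-hot encode the tag. The sketch makes several simplifying assumptions: whitespace splitting stands in for a real word segmentation tool, and the `TAGS` dictionary and the `"other"` fallback tag are hypothetical.

```python
# Hypothetical predefined tag dictionary: word -> tag.
TAGS = {"loan": "product", "interest": "finance", "rate": "finance", "yes": "affirm"}

# Fixed tag vocabulary; "other" is an assumed fallback for untagged words.
TAG_VOCAB = sorted(set(TAGS.values()) | {"other"})


def segment(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def one_hot(index, size):
    """Return a list of zeros of the given size with a single 1 at index."""
    vec = [0] * size
    vec[index] = 1
    return vec


def text_features(text):
    """Return one one-hot tag vector per segmented word."""
    features = []
    for word in segment(text):
        tag = TAGS.get(word, "other")
        features.append(one_hot(TAG_VOCAB.index(tag), len(TAG_VOCAB)))
    return features


print(TAG_VOCAB)
print(text_features("Loan rate please"))
```

For Chinese text, the `segment` function would need to be replaced by an actual segmentation tool, since words are not separated by spaces.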
Optionally, in some embodiments, the user time-series behavior data set further includes communication data, and fig. 5 is a flowchart of extracting communication features according to a fourth embodiment of the present application, as shown in fig. 5, where the flowchart includes the following steps:
step S501, acquiring communication data in the user time sequence behavior data set, wherein the communication data comprises a ringing duration, a call duration, a speaking duration after connection, an average speaking rate per conversation, or a total speaking length;
step S502, carrying out segmentation processing on communication data to obtain communication characteristics;
step S503, representing the communication characteristics with one-hot codes, thereby converting the communication characteristics to a numerical format that meets the interest recognition model's requirements for input data.
Through steps S501 to S503, this embodiment exploits the fact that, during telephone communication, a user often responds more quickly and asks more questions about a product of interest. The embodiment extracts the communication data from the user time sequence behavior data set, where the communication data includes, but is not limited to, the ringing duration, the call duration, the speaking duration after connection, the average speaking rate per conversation, or the total speaking length, and segments the communication data to obtain the communication characteristics, thereby providing a data basis for the subsequent interest recognition model to determine the user interest intensity.
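The segmentation in step S502 can be read as binning a continuous measurement into intervals, after which step S503 one-hot encodes the interval index. The sketch below takes that reading; the bin edges in `CALL_DURATION_EDGES` are assumed values chosen only for illustration.

```python
from bisect import bisect_right

# Assumed bin edges in seconds; they split call duration into four segments:
# [0, 30), [30, 120), [120, 300), [300, +inf).
CALL_DURATION_EDGES = [30, 120, 300]


def segment_index(value, edges):
    """Return the index of the segment that contains value."""
    return bisect_right(edges, value)


def one_hot_segment(value, edges):
    """One-hot encode the segment of value; the vector has len(edges) + 1 slots."""
    vec = [0] * (len(edges) + 1)
    vec[segment_index(value, edges)] = 1
    return vec


print(one_hot_segment(45, CALL_DURATION_EDGES))
```

The same helper could be applied to webpage records such as page dwell time or page operation count, with edges chosen per field.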
Optionally, in some embodiments, fig. 6 is a flowchart of extracting features of a web page according to a fifth embodiment of the present application, and as shown in fig. 6, the flowchart includes the following steps:
step S601, acquiring a webpage record in a user time sequence behavior data set, wherein the webpage record comprises a page stay time, a stay time grade or a page operation number;
step S602, the webpage record is segmented to obtain webpage characteristics;
step S603, representing the webpage characteristics with one-hot codes, converting them from a text format to a numerical format so that their format meets the format requirement of the interest recognition model for input data.
Through steps S601 to S603: because a user viewing product details on a web page often spends longer on a product of interest and views more pages, this embodiment extracts the web page record from the user time sequence behavior data set, where the web page record includes, but is not limited to, a page dwell time, a dwell time level, or a page operation number, and processes it in segments to obtain web page features, which provides a data basis for the subsequent interest recognition model to determine the user interest strength.
Optionally, in some embodiments, fig. 7 is a flowchart of matching product label features according to a sixth embodiment of the present application, and as shown in fig. 7, the flowchart includes the following steps:
step S701, acquiring a product identifier in a user time sequence behavior data set;
step S702, matching product label characteristics according to a predefined association relation between the product identification and the product label characteristics, wherein the product label characteristics comprise the name, the number, the activity time or the promotion region range of a product; the part-of-speech label corresponding to the product identification can also be used as a product label characteristic;
and step S703, representing the product label characteristics with one-hot codes, converting them from a text format to a numerical format so that their format meets the format requirement of the interest recognition model for input data.
Through steps S701 to S703, the present embodiment supplements the user time series behavior data set with product tag features, giving the subsequent interest recognition model more data support for determining the user interest strength.
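As a rough sketch of steps S702 and S703, matching can be viewed as a table lookup followed by one-hot encoding over the known tag values. The association table, its fields, and the product identifiers below are invented for illustration and do not come from the application.

```python
# Hypothetical association table from product identifier to tag features.
PRODUCT_TAGS = {
    "P001": {"name": "Plan A", "region": "north"},
    "P002": {"name": "Plan B", "region": "south"},
}


def build_vocabulary(table, field):
    """Collect the distinct values of one tag field in a stable order."""
    return sorted({tags[field] for tags in table.values()})


def match_and_encode(product_id, table, field):
    """Look up one tag field for a product (S702) and one-hot encode it (S703).

    An unknown product identifier yields an all-zero vector.
    """
    vocabulary = build_vocabulary(table, field)
    vector = [0] * len(vocabulary)
    tags = table.get(product_id)
    if tags is not None:
        vector[vocabulary.index(tags[field])] = 1
    return vector
```

Returning an all-zero vector for an unmatched identifier is one possible design choice; the application does not say how unmatched identifiers are handled. The same lookup pattern would apply to matching user tag features from a user identifier.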
Optionally, in some embodiments, fig. 8 is a flowchart of matching user tag features according to a seventh embodiment of the present application, and as shown in fig. 8, the flowchart includes the following steps:
step S801, acquiring a user identifier in a user time sequence behavior data set;
step S802, matching user tag characteristics according to the association relation between the predefined user identification and the user tag characteristics, wherein the user tag characteristics comprise the age, the gender, the region or the occupation of the user;
and step S803, representing the user label characteristics by using the one-hot code, and converting the user label characteristics from a text format to a numerical format to enable the format of the user label characteristics to meet the format requirement of the interest recognition model on input data.
Through steps S801 to S803, the present embodiment supplements the user time series behavior data set with user tag features, giving the subsequent interest recognition model more data support for determining the user interest strength.
In some embodiments, the interest intensity may be divided into three levels to obtain an interest label for each level. Fig. 9 is a schematic diagram of an interest recognition model according to an eighth embodiment of the present application. As shown in fig. 9, the objective of the interest recognition model is to recognize the user's interest intensity in the corresponding product at each time sequence stage (e.g., time 1, time 2, …, time N), so as to obtain how the user's interest intensity changes over time. After the interest intensity is divided into three levels in the interest recognition model, the following three interest labels are obtained: strong rejection, no intention, and intention. A long short-term memory network (LSTM), a type of recurrent neural network (RNN), is used to represent the time sequence behavior features, and a convolutional neural network (CNN) is used to represent the label features. Finally, the label features and the time sequence behavior features are fully connected, and the interest label of the user for the product is determined according to the fully connected data, the weights, and the standard data.
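The final fusion stage can be sketched in plain Python under strong simplifying assumptions. The LSTM and CNN encoders are not implemented here: `sequence_repr` and `label_repr` are hypothetical stand-ins for their output vectors, and the softmax output layer and the weights are illustrative choices rather than details stated in the application.

```python
import math

INTEREST_LABELS = ["strong rejection", "no intention", "intention"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(inputs, weights, biases):
    """Compute one dense layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def classify(sequence_repr, label_repr, weights, biases):
    """Concatenate both representations and map them to an interest label."""
    fused = list(sequence_repr) + list(label_repr)
    probabilities = softmax(fully_connected(fused, weights, biases))
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return INTEREST_LABELS[best], probabilities
```

In a trained model the weights would be learned from the labeled standard data; here they must be supplied by the caller.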
Optionally, when the standard data of the model is established, a part of the preprocessed data may be extracted, marked with interest labels according to marking rules, and manually checked; the checked data can then be used as the standard data. For example, the marking rules for voice data may be: if the voice data contains words of obvious rejection (such as profanity, do-not-disturb requests, or direct refusals), the voice data is marked with the strong rejection label; if the call duration of the voice data is less than a certain duration (such as 15 seconds), it is marked with the no intention label; and if the voice data contains words of obvious acceptance (such as 'ok'), it is marked with the intention label. The marking rules for web page records may be: marking according to the overall viewing duration, the number of pages viewed by the user, whether the page is dragged or scrolled, the time spent viewing a specific page (such as a product introduction page), and the like. The marking results are then submitted for manual review to obtain accurate interest labels, which provides accurate standard data for subsequent model training.
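The voice-data marking rules could be sketched as a small rule function. The rejection phrases, the order in which the rules are checked, and the `None` result for unmatched records are assumptions for illustration; only the 15-second example threshold and the 'ok' example come from the text.

```python
import re

# Hypothetical phrase lists; the application only gives categories and 'ok'.
REJECTION_PHRASES = ("do not disturb", "not interested")
ACCEPTANCE_WORDS = {"ok"}
MIN_CALL_SECONDS = 15  # example threshold mentioned in the text


def label_voice_record(transcript, call_seconds):
    """Apply the marking rules to one transcribed call.

    Returns an interest label, or None when no rule matches, in which case
    the record is left entirely to manual review.
    """
    text = transcript.lower()
    if any(phrase in text for phrase in REJECTION_PHRASES):
        return "strong rejection"
    if call_seconds < MIN_CALL_SECONDS:
        return "no intention"
    # Match whole words so that 'ok' is not found inside words like 'look'.
    words = set(re.findall(r"[a-z']+", text))
    if words & ACCEPTANCE_WORDS:
        return "intention"
    return None
```

The labels produced this way are only candidates; as described above, they would still be checked manually before being used as standard data.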
This embodiment also provides a system for identifying user interest, fig. 10 is a block diagram of a system for identifying user interest according to a ninth embodiment of the present application, and as shown in fig. 10, the system includes:
an obtaining module 1001, configured to obtain user behavior logs on various channels, obtain user behavior data, summarize behavior data of the same user, and arrange the behavior data according to a time sequence to obtain a user time sequence behavior data set;
the extracting module 1002 is configured to extract time sequence environmental features according to the user time sequence behavior data set, and match tag features according to the user time sequence behavior data set, where the time sequence environmental features include text features, communication features, and web page features, and the tag features include user tag features and product tag features;
the output module 1003 is configured to input the time sequence environmental features and the tag features into the interest recognition model, and output the interest intensity of the user in the product, where in the interest recognition model, a long short-term memory network is used to represent the time sequence behavior features, a convolutional neural network is used to represent the tag features, and the tag features and the time sequence behavior features are fully connected to determine the interest intensity of the user in the product.
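What the obtaining module 1001 does, summarizing the behavior data of the same user and arranging it in time order, can be sketched as follows. The record keys (`user_id`, `timestamp`, `channel`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_time_series(records):
    """Group behavior records by user and sort each group by timestamp.

    `records` is an iterable of dicts with the assumed keys
    'user_id', 'timestamp', and 'channel'.
    """
    by_user = defaultdict(list)
    for record in records:
        by_user[record["user_id"]].append(record)
    return {
        user_id: sorted(events, key=lambda r: r["timestamp"])
        for user_id, events in by_user.items()
    }
```

For example, records from a call log and a web log for the same user would be merged into one list ordered by timestamp, regardless of the channel they came from.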
In some embodiments, the time-series behavior data set of the user includes voice data, fig. 11 is a block diagram of a system for identifying user interest according to a tenth embodiment of the present application, and as shown in fig. 11, the system further includes:
the preprocessing module 1101 is configured to preprocess the user time-series behavior data set, where the preprocessing includes: converting the voice data to obtain text data.
In one embodiment, an electronic device is provided, which may be a server. Fig. 12 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in fig. 12, the electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting to and communicating with an external terminal through a network. The computer program is executed by the processor to implement a method of user interest identification.
Those skilled in the art will appreciate that the structure shown in fig. 12 is a block diagram of only the part of the structure relevant to the present disclosure and does not limit the electronic device to which the present disclosure may be applied; a particular electronic device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A method of user interest identification, the method comprising:
acquiring user behavior logs on each channel to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data according to a time sequence to obtain a user time sequence behavior data set;
extracting time sequence environmental characteristics according to the user time sequence behavior data set, and matching label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics;
inputting the time sequence environmental characteristics and the label characteristics into an interest identification model, and outputting the interest intensity of the user in the product, wherein in the interest identification model, a long short-term memory network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product.
2. The method of claim 1, wherein the user time sequence behavior data set comprises voice data, and wherein before the extracting the time sequence environmental characteristics, the method comprises: preprocessing the user time sequence behavior data set, wherein the preprocessing comprises: converting the voice data to obtain text data.
3. The method of claim 2, wherein the user time sequence behavior data set comprises a product identification, and after the converting the voice data, the preprocessing further comprises: supplementing product attributes into the user time sequence behavior data set according to the product identification, wherein the product identification comprises a product number or a product name.
4. The method of claim 2, wherein the set of user time series behavior data further comprises short message data, and the extracting text features comprises:
acquiring text data in the user time sequence behavior data set, wherein the text data comprises the short message data and the text data after the voice data conversion processing;
performing word segmentation processing on the text data to obtain segmented words;
a part-of-speech tag is marked on the word to obtain the text characteristic;
the text features are represented by one-hot codes.
5. The method of claim 1, wherein the set of user time series behavior data further comprises communication data, and wherein the extracting the communication characteristics comprises:
acquiring the communication data in the user time sequence behavior data set, wherein the communication data comprises a ringing duration, a call duration, a talk duration after connection, an average speaking rate per conversation or a total speech length;
carrying out segmentation processing on the communication data to obtain the communication characteristics;
the communication characteristics are represented by a one-hot code.
6. The method of claim 1, wherein the user time series behavior data set further comprises a web page record, and the extracting the web page feature comprises:
acquiring the webpage record in the user time sequence behavior data set, wherein the webpage record comprises page stay time or page operation number;
carrying out segmentation processing on the webpage record to obtain the webpage characteristics;
and expressing the webpage characteristics by using the one-hot codes.
7. The method of claim 3, wherein the process of matching product label characteristics comprises:
acquiring the product identification in the user time sequence behavior data set;
matching product label features according to a predefined association relation between the product identification and the product label features, wherein the product label features comprise the name, the number, the activity time or the promotion region range of the product;
the product label features are represented by a one-hot code.
8. The method of claim 1, wherein the user time series behavior data set comprises a user identification, and wherein the process of matching user tag characteristics comprises:
acquiring the user identification in the user time sequence behavior data set;
matching user tag characteristics according to a predefined association relationship between a user identifier and the user tag characteristics, wherein the user tag characteristics comprise the age, the gender, the region or the occupation of the user;
the user tag feature is represented by a one-hot code.
9. A system for user interest identification, the system comprising:
the acquisition module is used for acquiring user behavior logs on various channels to obtain user behavior data, summarizing the behavior data of the same user, and arranging the behavior data according to a time sequence to obtain a user time sequence behavior data set;
the extraction module is used for extracting time sequence environmental characteristics according to the user time sequence behavior data set and matching label characteristics according to the user time sequence behavior data set, wherein the time sequence environmental characteristics comprise text characteristics, communication characteristics and webpage characteristics, and the label characteristics comprise user label characteristics and product label characteristics;
and the output module is used for inputting the time sequence environmental characteristics and the label characteristics into an interest identification model and outputting the interest intensity of the user in the product, wherein in the interest identification model, a long short-term memory network is adopted to represent the time sequence behavior characteristics, a convolutional neural network is adopted to represent the label characteristics, and the label characteristics and the time sequence behavior characteristics are fully connected to determine the interest intensity of the user in the product.
10. The system of claim 9, wherein the set of user temporal behavior data comprises voice data, the system further comprising:
a preprocessing module, configured to preprocess the user time sequence behavior data set, where the preprocessing includes: converting the voice data to obtain text data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110601742.XA CN113486235A (en) | 2021-05-31 | 2021-05-31 | Method and system for identifying user interest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113486235A (en) | 2021-10-08 |
Family
ID=77933802
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110601742.XA Pending CN113486235A (en) | 2021-05-31 | 2021-05-31 | Method and system for identifying user interest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113486235A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103235823A (en) * | 2013-05-06 | 2013-08-07 | 上海河广信息科技有限公司 | Method and system for determining current interest of users according to related web pages and current behaviors |
CN110781407A (en) * | 2019-10-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | User label generation method and device and computer readable storage medium |
CN111340112A (en) * | 2020-02-26 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Classification method, classification device and server |
CN111784455A (en) * | 2020-06-30 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Article recommendation method and recommendation equipment |
CN111985247A (en) * | 2020-08-31 | 2020-11-24 | 华侨大学 | Microblog user interest identification method and system based on multi-granularity text feature representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||