
CN105975497A - Automatic microblog topic recommendation method and device - Google Patents

Automatic microblog topic recommendation method and device

Info

Publication number
CN105975497A
CN105975497A
Authority
CN
China
Prior art keywords
text
topic
content
microblogging
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610268830.1A
Other languages
Chinese (zh)
Inventor
徐华
李佳
邓俊辉
孙晓民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201610268830.1A priority Critical patent/CN105975497A/en
Publication of CN105975497A publication Critical patent/CN105975497A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an automatic microblog topic recommendation method and device. The method comprises the following steps: processing the text content of microblogs on the basis of a neural network model to obtain feature vectors; classifying the text content of the microblogs with a softmax classifier according to the feature vectors to obtain topic classes; and automatically recommending topics for microblogs without topics according to the topic classes. The method can help users and the microblog management platform manage massive amounts of microblog content. The invention further discloses an automatic microblog topic recommendation device.

Description

Automatic microblog topic recommendation method and device
Technical field
The present invention relates to the fields of computer application technology and social networks, and in particular to an automatic microblog topic recommendation method and device.
Background art
Text representation is a crucial step in tasks such as web search, information filtering, and sentiment analysis. In traditional machine learning methods, it usually takes the form of feature representation. The most common feature representation method in text learning is the bag-of-words model. In the bag-of-words paradigm, the most common features are words, bigrams, multi-word phrases (n-grams), and some manually extracted template features. After a text has been represented in the form of features, conventional models often use methods such as term frequency, mutual information, PLSA (Probabilistic Latent Semantic Analysis), and LDA (Latent Dirichlet Allocation, a document topic generation model) to select the most effective features. However, when traditional methods represent text, they ignore contextual information and also lose word-order information.
In recent years, pre-trained word vectors and deep neural network models have brought new ideas to natural language processing. With the help of word vectors, methods of compositional semantics have been proposed to represent the semantics of a text. A recurrent neural network can build the semantics of a text in O(n) time. The model processes the whole document word by word and stores all of the preceding semantics in a hidden layer of fixed size. The advantage of a recurrent neural network is that it can capture contextual information well and model long-range context. However, a recurrent neural network is a biased model: in a forward recurrent neural network, for example, later words in the text dominate over earlier words. Because of this semantic bias, when a recurrent neural network builds the semantics of a whole text, it incorporates more information from the latter part of the text. Yet not all texts place their emphasis at the end, which may affect the accuracy of the generated semantic representation.
To solve the problem of semantic bias, it has been proposed to build text semantics with convolutional neural networks. A convolutional neural network can use max pooling to pick out the most useful text fragments, and its complexity is also O(n). Convolutional neural networks therefore have considerable potential for building text semantics. However, existing convolutional neural network models always use fairly simple convolution kernels, such as fixed windows. With this kind of model, determining the window size is a key issue. When the window is too small, too little contextual information may be retained and words cannot be characterized accurately; when the window is too large, there are too many parameters and model optimization becomes harder. It is therefore necessary to consider how to build a model that captures contextual information better while reducing the difficulty introduced by choosing the window size.
Research on short-text topics has long attracted attention, and dividing text information into topics accurately is a problem that needs to be solved.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the present invention is to propose an automatic microblog topic recommendation method. The method can help users and microblog management platforms manage massive amounts of microblog content.
A second object of the present invention is to propose an automatic microblog topic recommendation device.
To achieve the above objects, an automatic microblog topic recommendation method according to an embodiment of the first aspect of the present invention includes: processing the text content of a microblog based on a neural network model to obtain a feature vector; classifying the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class; and automatically recommending topics for microblogs that do not contain a topic according to the topic class.
In the automatic microblog topic recommendation method of the embodiment of the present invention, the text content of a microblog is first processed based on a neural network model to obtain a feature vector, the text content of the microblog is then classified by a softmax classifier according to the feature vector to obtain a topic class, and finally topics are automatically recommended for microblogs that do not contain a topic according to the topic class. The method can help users and microblog management platforms manage massive amounts of microblog content.
In some examples, the neural network model includes a convolutional neural network model and a recurrent neural network model.
In some examples, processing the text content of the microblog based on the neural network model to obtain the feature vector specifically includes: removing junk information from the text content of the microblog and removing useless stop words according to a stop-word list to obtain new text content; performing a convolution operation on the sentences of the new text content to obtain the local features of each basic unit in the sentences, and applying a max operation to the local features to obtain the feature vectors of the sentences; and finally processing the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog.
In some examples, the junk information includes @ mentions, URL information, and picture information.
To achieve the above objects, an automatic microblog topic recommendation device according to an embodiment of the second aspect of the present invention includes: a processing module for processing the text content of a microblog based on a neural network model to obtain a feature vector; a classification module for classifying the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class; and an automatic recommendation module for automatically recommending topics for microblogs that do not contain a topic according to the topic class.
In the automatic microblog topic recommendation device of the embodiment of the present invention, the processing module first processes the text content of a microblog based on a neural network model to obtain a feature vector, the classification module then classifies the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class, and finally the automatic recommendation module automatically recommends topics for microblogs that do not contain a topic according to the topic class. The device can help users and microblog management platforms manage massive amounts of microblog content.
In some examples, the neural network model includes a convolutional neural network model and a recurrent neural network model.
In some examples, the processing module is specifically configured to: remove junk information from the text content of the microblog and remove useless stop words according to a stop-word list to obtain new text content; perform a convolution operation on the sentences of the new text content to obtain the local features of each basic unit in the sentences, and apply a max operation to the local features to obtain the feature vectors of the sentences; and finally process the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog.
In some examples, the junk information includes @ mentions, URL information, and picture information.
Additional aspects and advantages of the present invention will be set forth in part in the following description; they will in part become apparent from the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of an automatic microblog topic recommendation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a convolutional layer of a convolutional neural network model according to an embodiment of the present invention;
Fig. 3 is a flowchart of building text semantics with a recurrent neural network model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the feature vector of the text content of a microblog according to an embodiment of the present invention;
Fig. 5 is a flowchart of an automatic microblog topic recommendation method according to a specific embodiment of the present invention; and
Fig. 6 is a schematic diagram of an automatic microblog topic recommendation device according to an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements having identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting the present invention.
Fig. 1 is a flowchart of an automatic microblog topic recommendation method according to an embodiment of the present invention.
As shown in Fig. 1, the automatic microblog topic recommendation method may include the following steps.
S101: processing the text content of a microblog based on a neural network model to obtain a feature vector.
It should be noted that, in some examples, the neural network model may include a convolutional neural network model and a recurrent neural network model.
It can be understood that the high fault tolerance and strong nonlinear descriptive power of neural networks have led to their wide study and application, and the convolutional neural network is one of the outstanding connectionist models among them. A convolutional neural network is a multi-layer neural network in which each layer is composed of multiple two-dimensional planes and each plane is composed of multiple independent neurons. The network contains computation layers and feature extraction layers. Usually, the input of each neuron is connected to a local receptive field of the previous layer and extracts the features of that local region; once a local feature has been extracted, its positional relationship to the other features is determined. Each computation layer is composed of multiple feature maps, each feature map is a plane, and the neurons in a plane share the same weights. Because the neurons on each map share weights, the number of free parameters of the network is reduced, which in turn reduces the time complexity of selecting network parameters. The output connection values of the neurons in the network satisfy the "maximum detection hypothesis": within the set of neurons in a certain small region, only the neuron with the maximum output strengthens its output value. According to this hypothesis, only one neuron is strengthened. The units of a convolutional neural network are exactly such maximum-output units, and they also suppress the strengthening of neighboring units. Besides the input and output layers, a convolutional neural network also has convolutional layers, subsampling layers, and fully connected layers; the convolutional and subsampling layers contain several feature maps, each layer has multiple planes, and the neuron weights are continually adjusted during training. Neurons in the same plane have identical weights, which gives the network a degree of shift and rotation invariance. Because the weights are shared, the mapping from one plane to the next can be regarded as a convolution operation. From one hidden layer to the next, the spatial resolution decreases while the number of planes per layer increases, which allows more feature information to be detected. In a convolutional layer, the feature map of the previous layer is convolved with a learnable kernel, and the result of the convolution, after an activation function, forms the neurons of that layer and thus constitutes the feature map of that layer. An example of a convolutional layer is shown in Fig. 2. A convolutional neural network achieves shift, scaling, and distortion invariance through three mechanisms: local receptive fields, weight sharing, and subsampling. Local receptive fields mean that each neuron of a layer is connected only to the neural units in a small neighborhood of the previous layer, so that each neuron can extract elementary features; weight sharing gives the convolutional neural network fewer parameters, so that it needs relatively little training data and time; and subsampling reduces the resolution of the features, achieving invariance to shifts, scaling, and other forms of distortion.
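For illustration only (not part of the claimed method), the following minimal NumPy sketch shows the weight-sharing and subsampling ideas described above: one kernel is reused at every position of a local receptive field, and a max-based subsampling step then halves the resolution of the resulting feature map. The input values, kernel values, and pooling width are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def conv1d_shared(x, kernel):
    """Slide one shared kernel over the input: every output position reuses
    the same weights (weight sharing over a local receptive field)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def subsample(feature_map, pool=2):
    """Halve the resolution by taking the max of each non-overlapping pair."""
    trimmed = feature_map[:len(feature_map) // pool * pool]
    return trimmed.reshape(-1, pool).max(axis=1)

x = np.array([0.1, 0.4, 0.3, 0.9, 0.2, 0.7, 0.5, 0.6])  # toy 1-D input
kernel = np.array([0.5, -0.2, 0.3])                      # one shared learnable kernel
fmap = np.tanh(conv1d_shared(x, kernel))                 # activation after convolution
print(subsample(fmap))                                   # lower-resolution feature map
```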
In addition, a recurrent neural network can build the semantics of a text within O(n) time. The model processes the whole document word by word and stores all of the preceding semantics in a hidden layer of fixed size. The advantage of a recurrent neural network is that it can capture contextual information well and model long-range context. However, a recurrent neural network is a biased model: in a forward recurrent neural network, later words in the text dominate over earlier words. Because of this semantic bias, when a recurrent neural network builds the semantics of a whole text, it incorporates more information from the latter part of the text. When obtaining semantic representations of sentences and documents, it is tempting to model documents directly according to the distributional hypothesis of words. However, if the distributional hypothesis is used to generate vector representations of sentences or documents directly, a severe sparsity problem is encountered: if each sentence is treated as a whole and a word-vector model is used to train sentence representations, the training result has no statistical significance, because most sentences occur only once. On the other hand, the distributional hypothesis is a hypothesis about word meaning, and whether it is effective to obtain the semantics of sentences and documents through context in this way remains open to discussion. New ideas are therefore needed to model sentences and documents, and the recurrent neural network is a very promising model. The recurrent neural network was first proposed by Elman et al. in 1990. The core of the model is to feed in the words of the text one by one in a loop while maintaining a hidden layer that retains all of the preceding information. A recurrent neural network is a special case of a recursive neural network; it can be regarded as corresponding to a tree in which the right subtree of every non-leaf node is a leaf node. This special structure gives the recurrent neural network two characteristics. First, because the network structure is fixed, the model can build the semantics of a text in O(n) time, which makes the recurrent neural network more efficient at modeling text semantics. Second, from the point of view of the network structure, a recurrent neural network is very deep: the network has as many layers as the sentence has words. Therefore, when a recurrent neural network is trained with traditional methods, gradient decay or gradient explosion occurs, and the model needs special methods to carry out the optimization process. The process by which a recurrent neural network builds text semantics is shown in Fig. 3: each word is combined with the hidden layer representing the preceding context into a new hidden layer, and the computation loops from the first word of the text to the last. After all words have been fed into the model, the hidden layer corresponding to the last word represents the semantics of the whole text. In terms of optimization, recurrent neural networks also differ slightly from other network structures. In an ordinary neural network, backpropagation can be computed directly using the chain rule of derivatives. In a recurrent neural network, however, the weight matrix H from one hidden layer to the next is reused, so differentiating with respect to the weight matrix directly is very difficult. The simplest optimization method for a recurrent neural network is backpropagation through time (BPTT). The method first unrolls the network and then, for each labeled sample, updates the hidden layers one by one with the general network backpropagation technique, repeatedly updating the shared weight matrix H. Because of gradient decay, when BPTT is used to optimize a recurrent neural network, it propagates only through a fixed number of layers. To solve the gradient decay problem, Hochreiter and Schmidhuber proposed the LSTM model in 1997. This model introduces memory cells that can preserve long-range information and is a commonly used optimization scheme for recurrent neural networks.
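For illustration only, the following minimal NumPy sketch shows the recurrent update described above: each word vector is combined with the hidden state that summarizes the preceding context, the same weight matrices are reused at every step, and the final hidden state serves as the fixed-size semantic vector of the text. All dimensions and weights are illustrative assumptions; a practical implementation would use a trained recurrent network, typically an LSTM as noted above.

```python
import numpy as np

def rnn_encode(word_vectors, W_x, W_h, b):
    """Simple forward RNN: each word is combined with the hidden state that
    summarizes the preceding context; the last hidden state represents the text."""
    h = np.zeros(W_h.shape[0])
    for x in word_vectors:                      # loop from the first word to the last
        h = np.tanh(W_x @ x + W_h @ h + b)      # shared weights W_x, W_h at every step
    return h                                    # fixed-size semantic vector

rng = np.random.default_rng(0)
dim_word, dim_hidden = 4, 6
words = [rng.normal(size=dim_word) for _ in range(5)]    # toy word vectors
W_x = rng.normal(scale=0.1, size=(dim_hidden, dim_word))
W_h = rng.normal(scale=0.1, size=(dim_hidden, dim_hidden))
b = np.zeros(dim_hidden)
print(rnn_encode(words, W_x, W_h, b))
```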
Specifically, in some examples, processing the text content of the microblog based on the neural network model to obtain the feature vector specifically includes: removing the junk information from the text content of the microblog and removing useless stop words according to a stop-word list to obtain new text content; performing a convolution operation on the sentences of the new text content to obtain the local features of each basic unit in the sentences, and applying a max operation to the local features to obtain the feature vectors of the sentences; and finally processing the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog. The feature vector of the text content of the microblog is shown in Fig. 4.
More specifically, in some examples, the junk information includes @ mentions, URL information, and picture information. It can be understood that junk information such as @ mentions, URLs, and pictures is removed from the microblog text, the Chinese microblog content is then segmented into words, and useless stop words are removed according to a Chinese stop-word list.
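As a sketch of this cleaning step only, the snippet below removes @ mentions, URLs, and picture placeholders with regular expressions, segments the Chinese text with the jieba library, and drops stop words. The choice of jieba, the regular expressions, the placeholder markers, and the toy stop-word list are illustrative assumptions; the patent does not prescribe a particular segmenter or marker format.

```python
import re
import jieba  # a common Chinese word-segmentation library, used here for illustration

def clean_microblog(text, stopwords):
    """Strip @ mentions, URLs, and picture placeholders, then segment and
    drop stop words, returning the cleaned token list."""
    text = re.sub(r"@\S+", "", text)               # @ mentions
    text = re.sub(r"https?://\S+", "", text)        # URLs
    text = re.sub(r"\[图片\]|\[组图\]", "", text)     # picture placeholders (assumed markers)
    tokens = jieba.lcut(text)                       # Chinese word segmentation
    return [t for t in tokens if t.strip() and t not in stopwords]

stopwords = {"的", "了", "是"}                       # toy stop-word list
print(clean_microblog("@某人 今天天气真好 http://t.cn/xyz [图片]", stopwords))
```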
It should be noted that the useful information of the microblog text data refers to the microblog content.
The local features of each basic unit in a sentence are obtained by performing a convolution operation on the sentences of the new text content, and a max operation is applied to the local features to obtain the sentence feature vector. It can be understood that this amounts to sentence-level vector representation learning of the microblog content. Given a sentence x containing N basic units (r1, r2, ..., rN), the basic unit of a character-level sentence is a single character, and the basic unit of a word-level sentence is a word obtained after segmentation. Two main problems arise when computing sentence-level features: different sentences have different lengths, and the important information may appear at any position in the sentence. Building the sentence model with convolutional layers and computing sentence-level features solves both problems. The convolution operation yields the local feature of each basic unit ri within the sentence, and a max operation is then applied to the resulting local features, producing a sentence feature vector of fixed length. For a sentence x containing N basic units (r1, r2, ..., rN), the convolutional layer performs a matrix-vector operation on each contiguous window of size k. Different convolution window sizes k capture different local information; a suitable value of k is determined in a preliminary experimental stage. The sentence feature vectors generated by all convolutional layers are concatenated to obtain a new sentence feature vector.
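A minimal NumPy sketch of this sentence-level step is given below: contiguous windows of size k over the basic-unit vectors are transformed by a convolution kernel, a max operation is applied over all positions, and the pooled features from several window sizes are concatenated into one fixed-length sentence vector. The kernels here are random stand-ins for trained parameters, and the dimensions and window sizes are illustrative assumptions.

```python
import numpy as np

def sentence_vector(unit_vectors, kernels):
    """Convolve windows of size k over the basic-unit vectors, max-pool each
    feature map over all positions, and concatenate across window sizes."""
    pooled = []
    for W in kernels:                                  # W has shape (n_filters, k * dim)
        k = W.shape[1] // unit_vectors.shape[1]
        windows = [unit_vectors[i:i + k].reshape(-1)   # contiguous window of size k
                   for i in range(len(unit_vectors) - k + 1)]
        fmap = np.tanh(np.stack(windows) @ W.T)        # local feature at each position
        pooled.append(fmap.max(axis=0))                # max over positions -> fixed length
    return np.concatenate(pooled)                      # concatenated sentence feature vector

rng = np.random.default_rng(1)
dim, n_filters = 8, 4
units = rng.normal(size=(6, dim))                      # 6 basic units r1..r6 (toy vectors)
kernels = [rng.normal(scale=0.1, size=(n_filters, k * dim)) for k in (2, 3)]
print(sentence_vector(units, kernels).shape)           # (8,) = 2 window sizes x 4 filters
```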
S102: classifying the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class.
It should be noted that the classifier may be, but is not limited to, a softmax classifier.
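For illustration, softmax classification over a microblog-level feature vector might look like the following sketch; the weight matrix, bias, and topic labels are illustrative assumptions standing in for a trained model rather than parameters specified by the patent.

```python
import numpy as np

def softmax_classify(feature_vector, W, b, topics):
    """Score every topic class with a linear layer, normalize with softmax,
    and return the most probable topic plus its probability."""
    logits = W @ feature_vector + b
    exp = np.exp(logits - logits.max())                # subtract max for numerical stability
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return topics[best], float(probs[best])

rng = np.random.default_rng(2)
topics = ["sports", "entertainment", "technology"]     # illustrative topic classes
feat = rng.normal(size=12)                             # microblog-level feature vector
W, b = rng.normal(scale=0.1, size=(len(topics), 12)), np.zeros(len(topics))
print(softmax_classify(feat, W, b, topics))
```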
S103: automatically recommending topics for microblogs that do not contain a topic according to the topic class.
In the automatic microblog topic recommendation method of the embodiment of the present invention, the text content of a microblog is first processed based on a neural network model to obtain a feature vector, the text content of the microblog is then classified by a softmax classifier according to the feature vector to obtain a topic class, and finally topics are automatically recommended for microblogs that do not contain a topic according to the topic class. The method can help users and microblog management platforms manage massive amounts of microblog content.
For example, as shown in Fig. 5, based on the recurrent-neural-network-based automatic microblog topic recommendation method of the present invention, an automatic topic recommendation system for Sina Weibo has been developed. The system recommends topics for new microblog content posted by users in two stages. The first is the automatic preprocessing stage: the original microblog content is cleaned, and the microblog-level vector representation is then obtained with the convolutional neural network and the recurrent neural network. The second is the topic recommendation stage: the system invokes the trained softmax classification model, performs topic classification with the microblog vector representation as the feature, and recommends the topic class to the user. The recommendation results of this system can help users and the microblog platform manage massive microblog data effectively.
To help those skilled in the art understand the automatic microblog topic recommendation method more clearly, an illustration is given below in conjunction with Fig. 6: for a new microblog, the sentence-level vector representation is first obtained with the convolutional neural network, the microblog-level vector representation is then learned with the recurrent neural network, topic classification is then performed with the trained model, and the topic class is automatically recommended to the user.
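Tying the pieces together, the following sketch shows one possible inference path corresponding to this walkthrough. It reuses the hypothetical helper functions from the earlier sketches (clean_microblog, sentence_vector, rnn_encode, softmax_classify) and assumes that trained, shape-compatible parameters and an embedding table are supplied by the caller; for brevity the whole microblog is treated as a single sentence.

```python
import numpy as np

def recommend_topic(raw_text, stopwords, embeddings, conv_kernels,
                    W_x, W_h, b_h, W_out, b_out, topics):
    """End-to-end inference sketch: clean the text, build a sentence vector with
    the convolutional step, encode it with the recurrent step, and classify the
    result with softmax to obtain the topic to recommend."""
    tokens = clean_microblog(raw_text, stopwords)                  # preprocessing
    dim = next(iter(embeddings.values())).shape[0]
    unit_vectors = np.stack([embeddings.get(t, np.zeros(dim)) for t in tokens])
    sent_vec = sentence_vector(unit_vectors, conv_kernels)         # CNN: sentence-level vector
    doc_vec = rnn_encode([sent_vec], W_x, W_h, b_h)                # RNN: microblog-level vector
    return softmax_classify(doc_vec, W_out, b_out, topics)         # softmax: topic class
```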
In the automatic microblog topic recommendation method of the embodiment of the present invention, the text content of a microblog is first processed based on a neural network model to obtain a feature vector, the text content of the microblog is then classified by a softmax classifier according to the feature vector to obtain a topic class, and finally topics are automatically recommended for microblogs that do not contain a topic according to the topic class. The method can help users and microblog management platforms manage massive amounts of microblog content.
Corresponding to the automatic microblog topic recommendation method provided by the above embodiments, an embodiment of the present invention further provides an automatic microblog topic recommendation device. Since the automatic microblog topic recommendation device provided by the embodiment of the present invention has the same or similar technical features as the automatic microblog topic recommendation method provided by the above embodiments, the foregoing embodiments of the automatic microblog topic recommendation method also apply to the automatic microblog topic recommendation device provided by this embodiment and are not described in detail here. As shown in Fig. 6, the automatic microblog topic recommendation device may include a processing module 110, a classification module 120, and an automatic recommendation module 130.
The processing module 110 is configured to process the text content of a microblog based on a neural network model to obtain a feature vector.
The classification module 120 classifies the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class.
The automatic recommendation module 130 is configured to automatically recommend topics for microblogs that do not contain a topic according to the topic class.
In some examples, the neural network model includes a convolutional neural network model and a recurrent neural network model.
In some examples, the processing module 110 is specifically configured to: remove the junk information from the text content of the microblog and remove useless stop words according to a stop-word list to obtain new text content; perform a convolution operation on the sentences of the new text content to obtain the local features of each basic unit in the sentences, and apply a max operation to the local features to obtain the feature vectors of the sentences; and finally process the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog.
In some examples, the junk information includes @ mentions, URL information, and picture information.
In the automatic microblog topic recommendation device of the embodiment of the present invention, the processing module first processes the text content of a microblog based on a neural network model to obtain a feature vector, the classification module then classifies the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class, and finally the automatic recommendation module automatically recommends topics for microblogs that do not contain a topic according to the topic class. The device can help users and microblog management platforms manage massive amounts of microblog content.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or as implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise explicitly and specifically defined.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in conjunction with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine the different embodiments or examples described in this specification and the features of different embodiments or examples.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
Those skilled in the art can understand that all or part of the steps carried out by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
Although embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (8)

1. An automatic microblog topic recommendation method, characterized by comprising:
processing the text content of a microblog based on a neural network model to obtain a feature vector;
classifying the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class; and
automatically recommending topics for microblogs that do not contain a topic according to the topic class.
2. The automatic microblog topic recommendation method according to claim 1, characterized in that the neural network model comprises: a convolutional neural network model and a recurrent neural network model.
3. The automatic microblog topic recommendation method according to claim 1, characterized in that processing the text content of the microblog based on the neural network model to obtain the feature vector specifically comprises:
removing junk information from the text content of the microblog, and removing useless stop words according to a stop-word list to obtain new text content;
performing a convolution operation on the sentences of the new text content to obtain local features of each basic unit in the sentences, and applying a max operation to the local features to obtain the feature vectors of the sentences; and
finally processing the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog.
4. The automatic microblog topic recommendation method according to claim 3, characterized in that the junk information comprises: @ mentions, URL information, and picture information.
5. An automatic microblog topic recommendation device, characterized by comprising:
a processing module, configured to process the text content of a microblog based on a neural network model to obtain a feature vector;
a classification module, configured to classify the text content of the microblog with a softmax classifier according to the feature vector to obtain a topic class; and
an automatic recommendation module, configured to automatically recommend topics for microblogs that do not contain a topic according to the topic class.
6. The automatic microblog topic recommendation device according to claim 5, characterized in that the neural network model comprises: a convolutional neural network model and a recurrent neural network model.
7. The automatic microblog topic recommendation device according to claim 5, characterized in that the processing module is specifically configured to:
remove junk information from the text content of the microblog, and remove useless stop words according to a stop-word list to obtain new text content;
perform a convolution operation on the sentences of the new text content to obtain local features of each basic unit in the sentences, and apply a max operation to the local features to obtain the feature vectors of the sentences; and
finally process the feature vectors of the sentences with a recurrent neural network to obtain the feature vector of the text content of the microblog.
8. The automatic microblog topic recommendation device according to claim 7, characterized in that the junk information comprises: @ mentions, URL information, and picture information.
CN201610268830.1A 2016-04-27 2016-04-27 Automatic microblog topic recommendation method and device Pending CN105975497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610268830.1A CN105975497A (en) 2016-04-27 2016-04-27 Automatic microblog topic recommendation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610268830.1A CN105975497A (en) 2016-04-27 2016-04-27 Automatic microblog topic recommendation method and device

Publications (1)

Publication Number Publication Date
CN105975497A true CN105975497A (en) 2016-09-28

Family

ID=56993169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610268830.1A Pending CN105975497A (en) 2016-04-27 2016-04-27 Automatic microblog topic recommendation method and device

Country Status (1)

Country Link
CN (1) CN105975497A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
CN107273348A (en) * 2017-05-02 2017-10-20 深圳大学 The topic and emotion associated detecting method and device of a kind of text
CN107832047A (en) * 2017-11-27 2018-03-23 北京理工大学 A kind of non-api function argument based on LSTM recommends method
CN108021934A (en) * 2017-11-23 2018-05-11 阿里巴巴集团控股有限公司 The method and device of more key element identifications
CN108038414A (en) * 2017-11-02 2018-05-15 平安科技(深圳)有限公司 Character personality analysis method, device and storage medium based on Recognition with Recurrent Neural Network
CN108694202A (en) * 2017-04-10 2018-10-23 上海交通大学 Configurable Spam Filtering System based on sorting algorithm and filter method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887443A (en) * 2009-05-13 2010-11-17 华为技术有限公司 Method and device for classifying texts
CN102082619A (en) * 2010-12-27 2011-06-01 中国人民解放军理工大学通信工程学院 Transmission adaptive method based on double credible evaluations
CN102402621A (en) * 2011-12-27 2012-04-04 浙江大学 Image retrieval method based on image classification
CN103617230A (en) * 2013-11-26 2014-03-05 中国科学院深圳先进技术研究院 Method and system for advertisement recommendation based microblog
CN104572892A (en) * 2014-12-24 2015-04-29 中国科学院自动化研究所 Text classification method based on cyclic convolution network
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
CN104899298A (en) * 2015-06-09 2015-09-09 华东师范大学 Microblog sentiment analysis method based on large-scale corpus characteristic learning
CN105447179A (en) * 2015-12-14 2016-03-30 清华大学 Microblog social network based topic automated recommendation method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887443A (en) * 2009-05-13 2010-11-17 华为技术有限公司 Method and device for classifying texts
CN102082619A (en) * 2010-12-27 2011-06-01 中国人民解放军理工大学通信工程学院 Transmission adaptive method based on double credible evaluations
CN102402621A (en) * 2011-12-27 2012-04-04 浙江大学 Image retrieval method based on image classification
CN103617230A (en) * 2013-11-26 2014-03-05 中国科学院深圳先进技术研究院 Method and system for advertisement recommendation based microblog
CN104572892A (en) * 2014-12-24 2015-04-29 中国科学院自动化研究所 Text classification method based on cyclic convolution network
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
CN104899298A (en) * 2015-06-09 2015-09-09 华东师范大学 Microblog sentiment analysis method based on large-scale corpus characteristic learning
CN105447179A (en) * 2015-12-14 2016-03-30 清华大学 Microblog social network based topic automated recommendation method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PUYANG XU, RUHI SARIKAYA: "Contextual domain classification in spoken language understanding systems using recurrent neural network", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
刘龙飞, 杨亮: "Sentiment analysis of microblogs based on convolutional neural networks" (基于卷积神经网络的微博情感倾向性分析), Journal of Chinese Information Processing (中文信息学报) *
张剑, 屈丹: "Recurrent neural network language model based on word vector features" (基于词向量特征的循环神经网络语言模型), Pattern Recognition and Artificial Intelligence (模式识别与人工智能) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN106844765B (en) * 2017-02-22 2019-12-20 中国科学院自动化研究所 Significant information detection method and device based on convolutional neural network
CN108694202A (en) * 2017-04-10 2018-10-23 上海交通大学 Configurable Spam Filtering System based on sorting algorithm and filter method
CN107273348A (en) * 2017-05-02 2017-10-20 深圳大学 The topic and emotion associated detecting method and device of a kind of text
CN107273348B (en) * 2017-05-02 2020-12-18 深圳大学 Topic and emotion combined detection method and device for text
CN108038414A (en) * 2017-11-02 2018-05-15 平安科技(深圳)有限公司 Character personality analysis method, device and storage medium based on Recognition with Recurrent Neural Network
WO2019085329A1 (en) * 2017-11-02 2019-05-09 平安科技(深圳)有限公司 Recurrent neural network-based personal character analysis method, device, and storage medium
CN108021934A (en) * 2017-11-23 2018-05-11 阿里巴巴集团控股有限公司 The method and device of more key element identifications
CN108021934B (en) * 2017-11-23 2022-03-04 创新先进技术有限公司 Method and device for recognizing multiple elements
CN107832047A (en) * 2017-11-27 2018-03-23 北京理工大学 A kind of non-api function argument based on LSTM recommends method
CN107832047B (en) * 2017-11-27 2018-11-27 北京理工大学 A kind of non-api function argument recommended method based on LSTM

Similar Documents

Publication Publication Date Title
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN110222178B (en) Text emotion classification method and device, electronic equipment and readable storage medium
CN108763326B (en) Emotion analysis model construction method of convolutional neural network based on feature diversification
Cao et al. A joint model for word embedding and word morphology
CN105975497A (en) Automatic microblog topic recommendation method and device
CN107092596A (en) Text emotion analysis method based on attention CNNs and CCR
Alwehaibi et al. Comparison of pre-trained word vectors for arabic text classification using deep learning approach
Fahad et al. Inflectional review of deep learning on natural language processing
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
Wahid et al. Cricket sentiment analysis from Bangla text using recurrent neural network with long short term memory model
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
KR20190063978A (en) Automatic classification method of unstructured data
CN107066445A (en) The deep learning method of one attribute emotion word vector
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
Al Wazrah et al. Sentiment analysis using stacked gated recurrent unit for arabic tweets
CN106919557A (en) A kind of document vector generation method of combination topic model
Pan et al. Deep neural network-based classification model for Sentiment Analysis
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN106446147A (en) Emotion analysis method based on structuring features
Khatun et al. Authorship Attribution in Bangla literature using Character-level CNN
CN108733675A (en) Affective Evaluation method and device based on great amount of samples data
CN110321918A (en) The method of public opinion robot system sentiment analysis and image labeling based on microblogging

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160928

RJ01 Rejection of invention patent application after publication