CN108647225A - Method and system for automatically mining public opinion on the e-commerce gray/black market - Google Patents
- Publication number
- CN108647225A (application number CN201810249344.4A)
- Authority
- CN
- China
- Prior art keywords
- black
- word
- e-commerce
- gray/black market
- website
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method and system for automatically mining public opinion on the e-commerce gray/black market. The method comprises the following steps: (1) searching a search engine with seed black words as keywords, and crawling the text data and site information data of the websites in the search results; (2) preprocessing the text data and identifying black words in the preprocessed text data; (3) analyzing the site information data to identify gray/black-market websites; (4) adding the identified black words to a black dictionary, and adding the identified gray/black-market websites to a gray/black-market website library; (5) repeating steps (1) to (4) with the black words obtained in step (2) as seed black words. The method allows e-commerce cheating to be discovered, flagged and countered in a timely manner, and enables real-time, multi-dimensional monitoring of the e-commerce gray/black market.
Description
Technical field
The present invention relates to the field of security technology for the network gray/black market, and in particular to a method and system for automatically mining public opinion on the e-commerce gray/black market.
Background
The network gray/black market is one of the major problems endangering the security of the Internet ecosystem. Conventional techniques can detect cheating such as black-hat SEO, fake reviews, artificial traffic and social-network spam, but as cheating behavior keeps changing and shifting, existing models and methods quickly lose their applicability. Analyzing and monitoring the network gray/black market in real time, using massive amounts of security-related external public-opinion text, therefore helps to discover and strike the gray/black market at its source.
Unlike the traditional network gray/black market, the e-commerce gray/black market is a new form that has emerged over the last ten years. It generally refers to cheating that violates the rules of e-commerce platforms, such as fake transactions, order brushing (fake orders) and traffic brushing. From account trading, to order-brushing platforms, to supporting services such as empty-parcel express delivery, the e-commerce gray/black market has formed a complete industrial chain. As anti-cheating mechanisms in e-commerce improve, e-commerce cheating is also becoming more specialized. Enterprises, however, still rely largely on manual analysis of the distribution and current state of the gray/black market, which cannot cope with its growing scale.
Summary of the invention
In view of the industrialized, specialized and large-scale nature of the e-commerce gray/black market, and the fact that corpora and knowledge bases for it are almost nonexistent, the present invention provides a method and system for automatically mining public opinion on the e-commerce gray/black market. It can discover, flag and counter e-commerce cheating in a timely manner, and enables real-time, multi-dimensional monitoring of the e-commerce gray/black market.
The present invention provides the following technical solution:
A method for automatically mining public opinion on the e-commerce gray/black market, comprising the following steps:
(1) searching a search engine with seed black words as keywords, and crawling the text data and site information data of the websites in the search results;
(2) preprocessing the text data and identifying black words in the preprocessed text data;
(3) analyzing the site information data to identify gray/black-market websites;
(4) adding the identified black words to a black dictionary, and adding the identified gray/black-market websites to a gray/black-market website library;
(5) repeating steps (1) to (4) with the black words obtained in step (2) as seed black words.
Preferably, the number of seed black words is no fewer than 10.
The more seed black words there are, the more analysis data is obtained, and the more black words and gray/black-market websites are identified in steps (2) and (3). However, when there are too many seed black words, the amount of data to analyze becomes too large and the subsequent computation too heavy, which lowers the efficiency of black-word acquisition. Preferably, the number of seed black words is 10 to 50.
The seed black words are manually identified words related to the e-commerce gray/black market.
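The iteration over steps (1) to (5) can be sketched as a simple loop. This is a minimal illustration, not the patented implementation: the function name `mine` and the three callables (`crawl`, `find_black_words`, `is_black_site`) are hypothetical placeholders for the crawler, the black-word identifier and the website classifier, and stopping when no new words appear is an added assumption.

```python
from typing import Callable, Iterable, List, Set, Tuple

def mine(seed_words: Iterable[str],
         crawl: Callable[[str], List[Tuple[str, str]]],
         find_black_words: Callable[[str], Set[str]],
         is_black_site: Callable[[str], bool],
         rounds: int = 3) -> Tuple[Set[str], Set[str]]:
    """Repeat steps (1)-(4), reseeding with newly found black words (step 5)."""
    black_dictionary: Set[str] = set(seed_words)
    black_sites: Set[str] = set()
    seeds: Set[str] = set(black_dictionary)
    for _ in range(rounds):
        found: Set[str] = set()
        for keyword in sorted(seeds):
            # Step (1): crawl returns (site URL, page text) pairs for a keyword.
            for url, text in crawl(keyword):
                found |= find_black_words(text)      # step (2)
                if is_black_site(url):               # step (3)
                    black_sites.add(url)             # step (4), website library
        new_words = found - black_dictionary
        if not new_words:                            # assumption: stop when nothing new
            break
        black_dictionary |= new_words                # step (4), black dictionary
        seeds = new_words                            # step (5)
    return black_dictionary, black_sites
```

Reseeding only with the newly found words keeps each round from repeating earlier searches; the number of rounds is left to the caller, matching the statement that the number of repetitions depends on the user's needs.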
In step (2), the preprocessing comprises deduplication, sentence splitting, word segmentation, part-of-speech tagging and filtering of the text data, and includes the following steps:
Deduplicating the text data by means of text relevance calculation;
Splitting the deduplicated text data into independent sentences, using the Chinese or English forms of the comma (,), full stop (.), question mark (?), exclamation mark (!), colon (:) or semicolon (;) as separators;
Performing Chinese word segmentation on each sentence to divide it into a word sequence;
Performing part-of-speech tagging on each word and removing the function words.
After this preprocessing, the crawled text data is converted into sentences with less noise (meaningless function words removed). Each sentence is represented by a word sequence with part-of-speech tags, which reduces the subsequent vocabulary-processing load.
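A minimal sketch of the deduplication, sentence-splitting and filtering steps follows. It makes several simplifying assumptions: deduplication is done by exact match after removing whitespace (the text describes a relevance calculation instead), word segmentation and tagging are assumed to come from an external tool and are not shown, and the function-word tag set `FUNCTION_WORD_TAGS` is a hypothetical choice.

```python
import re
from typing import List, Tuple

# Chinese and English comma, full stop, question mark, exclamation mark, colon, semicolon.
SENTENCE_SEPARATORS = re.compile(r"[,，.。?？!！:：;；]+")

# Assumed tags for conjunction, adverb, preposition, auxiliary word, interjection.
FUNCTION_WORD_TAGS = {"c", "d", "p", "u", "e"}

def deduplicate(texts: List[str]) -> List[str]:
    """Keep the first copy of each text; exact match after stripping whitespace."""
    seen = set()
    unique = []
    for text in texts:
        key = re.sub(r"\s+", "", text)
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

def split_sentences(text: str) -> List[str]:
    """Split on the separator punctuation and drop empty fragments."""
    return [s.strip() for s in SENTENCE_SEPARATORS.split(text) if s.strip()]

def drop_function_words(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Remove (word, tag) pairs whose tag marks a function word."""
    return [(word, tag) for word, tag in tagged if tag not in FUNCTION_WORD_TAGS]
```

Splitting on every full stop also breaks decimals and URLs apart, so a production pipeline would likely need a more careful rule for those cases.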
In step (2), identifying black words from the preprocessed text data comprises:
(i) taking part of the corpus from the preprocessed text data and manually labeling the word types, to serve as training samples and validation samples;
(ii) initializing the word vector of each word in the training samples; feeding the training samples into a bidirectional long short-term memory network (Bi-LSTM) for vector computation to obtain output vectors;
(iii) using the output vectors as the input of a conditional random field (CRF), and computing the probability that each word belongs to each word type;
(iv) updating the network parameters of the Bi-LSTM and the CRF with the stochastic gradient descent algorithm;
(v) testing the accuracy of the Bi-LSTM and the CRF on the validation set; if the accuracy meets the requirement, training ends; otherwise training continues;
(vi) using the Bi-LSTM and the CRF to predict on the preprocessed text data and identify black words.
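For the prediction in step (vi), a CRF layer is commonly decoded with the Viterbi algorithm, which picks the label sequence with the highest total score. The sketch below is an illustration under assumptions, not the model described here: `emissions` stands in for the per-word scores produced by the Bi-LSTM, `transitions` for learned label-to-label scores, and missing transitions are treated as 0.

```python
from typing import Dict, List, Sequence, Tuple

def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence for one sentence."""
    if not emissions:
        return []
    # best[i][y]: best score of any label path that ends in label y at position i.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

The transition scores are what distinguish this from labeling each word independently: a favorable transition can outweigh a slightly lower per-word score.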
A bidirectional long short-term memory network (Bi-LSTM) is an LSTM that processes a sequence in both directions. LSTM (long short-term memory) replaces the neurons of a basic RNN with three interacting gates (input gate, output gate, forget gate) and one memory cell. New input can change the historical state stored in the network only when the input gate is open; the stored historical state can be accessed, and influence subsequent output, only when the output gate is open; and the forget gate is used to clear previously stored historical information. LSTM lets earlier input information be carried forward, so it can learn long-term dependencies, and it has been applied very successfully to tasks such as part-of-speech tagging and named entity recognition. The input of an LSTM is unidirectional, so it only considers the influence of preceding context on what follows. The basic idea of Bi-LSTM is to train one LSTM over the sequence in the forward direction and one in the backward direction, and then combine the outputs of the two models, so that every node in the sequence can draw on the full context.
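For reference, the gate mechanism described above is usually written as follows. This is the standard LSTM formulation rather than anything specific to this document; the symbols $W$, $U$, $b$ (weights and biases), $x_t$ (input word vector), $h_t$ (hidden state) and $c_t$ (memory cell) are notation introduced here.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}\\
h_t^{\mathrm{Bi}} &= [\overrightarrow{h}_t \,;\, \overleftarrow{h}_t] && \text{(forward and backward outputs concatenated)}
\end{aligned}
```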
In step (i), the word types include the following classes:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market item (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black words (OB);
(h) other words (other);
Words of types (c), (d), (e), (f) and (g) are black words.
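Given a sentence whose words already carry these type labels, selecting the black words is a filter on types (c) to (g). A minimal sketch, in which the function name and the example labels are illustrative assumptions:

```python
from typing import List, Tuple

# Types (c)-(g): person, item, platform, behavior, other black words.
BLACK_WORD_TYPES = {"ECP", "ECI", "ECL", "ECA", "OB"}

def extract_black_words(tagged_sentence: List[Tuple[str, str]]) -> List[str]:
    """Return the words whose predicted type is one of the black-word types."""
    return [word for word, word_type in tagged_sentence if word_type in BLACK_WORD_TYPES]

# Hypothetical labels for one preprocessed sentence.
example = [("Taobao", "ecn"), ("order-brushing", "ECA"), ("earn", "ecv"), ("commission", "other")]
black_words = extract_black_words(example)
```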
The ratio of the total number of samples in the training set to that in the validation set is 2~9:1; most preferably, the ratio is 9:1.
In step (iv), the network parameters are updated using stochastic gradient descent (SGD). The initial learning rate is 0.002. After every 5 training passes, the model's loss is computed on the validation set; if the loss has not decreased, the learning rate is reduced by one tenth to prevent over-fitting.
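The learning-rate rule can be sketched as below. Two points are assumptions: "reduced by one tenth" is read as multiplying the rate by 0.9 (exposed as the `decay` parameter, so 0.1 can be used if the intended meaning is "to one tenth"), and "has not decreased" is read as a comparison with the loss at the previous check. `train_once` and `validation_loss` are hypothetical callables standing in for one training pass and one loss evaluation on the validation (verification) set.

```python
from typing import Callable, List

def run_schedule(train_once: Callable[[float], None],
                 validation_loss: Callable[[], float],
                 rounds: int,
                 initial_lr: float = 0.002,
                 check_every: int = 5,
                 decay: float = 0.9) -> List[float]:
    """Train for `rounds` passes and return the learning rate after each pass."""
    lr = initial_lr
    previous_loss = float("inf")
    history = []
    for step in range(1, rounds + 1):
        train_once(lr)
        if step % check_every == 0:
            loss = validation_loss()
            if loss >= previous_loss:   # loss did not decrease since the last check
                lr *= decay
            previous_loss = loss
        history.append(lr)
    return history
```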
In step (3), identifying gray/black-market websites comprises:
(3-1) manually labeling the websites in part of the search results, and building a training set and a validation set;
(3-2) extracting the URL features, text features and HTML features of the training samples;
(3-3) using the count of each non-numeric feature as its numeric value, and normalizing the training samples and validation samples;
(3-4) training an SVM model with the normalized training samples as its input;
(3-5) using the trained SVM model to predict on suspected websites and identify gray/black-market websites.
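Step (3-3) can be illustrated with a small sketch. The text does not say which normalization is used, so min-max scaling is an assumption here, as is fitting the ranges on the training samples and reusing them for the validation samples. A non-numeric feature such as a keyword list is replaced by its item count. The SVM training itself is not shown.

```python
from typing import List, Sequence, Tuple, Union

Feature = Union[int, float, Sequence[str]]

def to_numeric(sample: Sequence[Feature]) -> List[float]:
    """Replace each non-numeric feature (e.g. a keyword list) by its count."""
    return [float(v) if isinstance(v, (int, float)) else float(len(v)) for v in sample]

def fit_min_max(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-feature minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows: List[List[float]],
                  lows: List[float],
                  highs: List[float]) -> List[List[float]]:
    """Scale each feature to [0, 1] on the training range; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]
```

Values outside the training range are not clipped in this sketch, so a validation sample can fall slightly below 0 or above 1.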
In step (3-1), manually labeling a website means manually determining whether it is a normal website or a gray/black-market website (an activity platform, contact network, software tool, etc. of the e-commerce gray/black market) and labeling it accordingly.
In step (3-2), the URL features include URL depth, URL length and domain-name length. The text features include the keywords of the web page content, the average word count per page and the number of pages, where the keywords of the web page content are the 10 words with the highest TF-IDF values. The HTML features include the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags and the number of button tags.
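The URL and HTML features can be computed with the Python standard library alone. The sketch below rests on assumptions the text does not spell out: URL depth is taken to be the number of non-empty path segments, an external link is an `<a>` whose host differs from the page's own host, and JavaScript tags are counted as `<script>` elements. The feature names are illustrative.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urlparse

def url_features(url: str) -> Dict[str, int]:
    """URL depth (path segments), URL length and domain-name length."""
    parsed = urlparse(url)
    depth = len([part for part in parsed.path.split("/") if part])
    return {"url_depth": depth,
            "url_length": len(url),
            "domain_length": len(parsed.hostname or "")}

class TagCounter(HTMLParser):
    """Counts hyperlinks, external links, images, scripts and buttons."""

    def __init__(self, own_host: Optional[str]) -> None:
        super().__init__()
        self.own_host = own_host
        self.counts = {"links": 0, "external_links": 0,
                       "images": 0, "scripts": 0, "buttons": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.counts["links"] += 1
            host = urlparse(dict(attrs).get("href") or "").hostname
            if host and host != self.own_host:
                self.counts["external_links"] += 1
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
        elif tag == "button":
            self.counts["buttons"] += 1

def html_features(html: str, page_url: str) -> Dict[str, int]:
    """Parse the page and return the tag counts."""
    counter = TagCounter(urlparse(page_url).hostname)
    counter.feed(html)
    return counter.counts
```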
The black words and gray/black-market websites obtained in steps (2) and (3) are added to the black dictionary and the gray/black-market website library, respectively. As needed, the user can take the newly obtained black words as seed black words and repeat steps (1) to (4); the number of repetitions depends on the user's needs. This achieves automatic mining of public opinion on the e-commerce gray/black market.
The user can analyze the cheating patterns of the e-commerce gray/black market from the black words in the black dictionary and design corresponding anti-cheating measures. The websites in the gray/black-market website library can be reported to the relevant authorities for handling.
The present invention also provides a system for automatically mining public opinion on the e-commerce gray/black market, comprising:
A data acquisition module, which searches a search engine with seed black words as keywords and crawls the text data and site information data of the websites in the search results;
An analysis module, which preprocesses the text data and identifies black words in the preprocessed text data, and which analyzes the site information data to identify gray/black-market websites;
An expansion module, comprising a black dictionary and a gray/black-market website library, which adds the obtained black words to the black dictionary and sends them to the data acquisition module as seed black words, and which adds the gray/black-market websites to the gray/black-market website library.
Compared with the prior art, the beneficial effects of the present invention are:
(1) external information from multiple channels can be obtained in real time to build an up-to-date corpus of the e-commerce gray/black market;
(2) black words and websites related to the e-commerce gray/black market can be identified to build an e-commerce gray/black-market information base, which benefits the subsequent detection of e-commerce cheating and the eradication of the gray/black market;
(3) automatic mining of the e-commerce gray/black market is achieved, which saves the cost of manual analysis and facilitates large-scale deployment and implementation.
Description of the drawings
Fig. 1 is an architecture diagram of the system for automatically mining public opinion on the e-commerce gray/black market;
Fig. 2 is a flow chart of the method for automatically mining public opinion on the e-commerce gray/black market;
Fig. 3 is a flow chart of black-word identification;
Fig. 4 is a structural diagram of the black-word identification model based on natural-language sequence labeling.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be noted that the embodiments described below are intended to facilitate understanding of the invention and do not limit it in any way.
The architecture of the system for automatically mining public opinion on the e-commerce gray/black market is shown in Fig. 1. It mainly comprises a crawler module, a black-word identification module, a black-market website identification module and a gray/black-market expansion module.
The crawler module crawls Internet text containing the seed black words and, through text relevance calculation and text deduplication, obtains an e-commerce gray/black-market public-opinion corpus; it also crawls the site information of suspected websites containing the seed black words;
The public-opinion sources include: microblogs, news, forums, Tieba, WeChat, government websites, video websites and others;
The black-word identification module uses natural-language sequence labeling to identify the black words in the e-commerce gray/black-market public-opinion corpus;
The black-market website identification module extracts the features of suspected websites and uses a trained classification model to identify gray/black-market websites;
The black-market expansion module uses the obtained black words as query keywords and automatically sends query requests to a search engine. The crawler module crawls the text, related searches and related recommendations in the search results to expand the black dictionary, and crawls the URLs (uniform resource locators) and HTML information of the websites in the search results to expand the gray/black-market website list.
The method for automatically mining public opinion on the e-commerce gray/black market, based on the system described above, comprises the following steps, as shown in Fig. 2:
(1) taking existing black words in the black dictionary as seed black words, searching a search engine (such as Baidu) with the seed black words as keywords, and crawling the site information and text of the websites in the search results;
(2) preprocessing the crawled text, including deduplication, sentence splitting, word segmentation and part-of-speech tagging, as follows:
The text is split into independent sentences, using Chinese or English punctuation marks such as ", . ! : ;" as separators.
For each independent sentence, the Chinese word segmentation function provided by the Language Technology Platform (LTP) is used to divide the sentence into a word sequence. For example, "after the brusher completes the task, a commission can be earned" is segmented into "brusher, completes, task, after, then, can, earn, commission".
Part-of-speech tagging labels each word in a sentence with its part-of-speech class. There are 28 classes, including noun, verb, adjective, conjunction, adverb, preposition, auxiliary word, interjection, person name, place name, prefix and suffix. After a sentence has been segmented, the part-of-speech tagging function of LTP is used to tag the words in the sentence, for example "brusher/n, completes/v, task/n, after/dn, then/d, can/v, earn/v, commission/n", where n denotes a general noun, v a general verb, d an adverb and dn a direction noun.
After part-of-speech tagging, the five classes of function words (conjunctions, adverbs, prepositions, auxiliary words and interjections) are removed, and only content words with actual meaning are retained, to reduce the vocabulary-processing load.
After the above preprocessing, all text is converted into sentences with less noise (meaningless function words removed). Each sentence is represented by a word sequence with part-of-speech tags, for example "Taobao/ni order-brushing/v earn/v commission/n", where n denotes a noun, v a verb and ni an organization name.
(3) identifying the black words in the text by natural-language sequence labeling, comprising the following steps (see Fig. 3):
(3-1) manually labeling the word type of the words in the preprocessed text corpus, and building a training set and a validation set; the training set contains 2700 sentences and the validation set contains 300 sentences;
The words are divided into the following types:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market item (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black words (OB);
(h) other words (other);
Words of types (c), (d), (e), (f) and (g) are black words;
(3-2) the sentences in the training set are used as the input sequences of the bidirectional long short-term memory network (Bi-LSTM); the Bi-LSTM network contains 4 bidirectional LSTM layers, as shown in Fig. 4;
(3-3) the word vector of each word in a sentence is initialized using word2vec, with a word-vector dimension of 200;
(3-4) the forward LSTM layers and the backward LSTM layers perform state propagation and vector computation; the output vectors of the Bi-LSTM are used as the input of the conditional random field (CRF), which computes the probability that each word belongs to each type;
Updating the network parameters: the parameters are updated using stochastic gradient descent (SGD). The initial learning rate is 0.002. After every 5 training passes, the model's loss is computed on the validation set; if the loss has not decreased, the learning rate is reduced by one tenth to prevent over-fitting;
(3-5) the model is verified on the validation set; if the model's accuracy meets the requirement, training ends; otherwise the process returns to (3-4) and training continues;
(3-6) after training ends, the trained model is used to extract the words of black-word types from unlabeled text.
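The stopping check in step (3-5) can be illustrated with a word-level accuracy computed over the validation (verification) sentences. This is a sketch under assumptions: the text does not say how the accuracy is measured or what value counts as meeting the requirement, so both the per-word definition and the `required` threshold of 0.9 are hypothetical.

```python
from typing import List

def tagging_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of words whose predicted type equals the manually labeled type."""
    total = 0
    correct = 0
    for predicted_sentence, gold_sentence in zip(predicted, gold):
        for p, g in zip(predicted_sentence, gold_sentence):
            total += 1
            correct += int(p == g)
    return correct / total if total else 0.0

def should_stop(accuracy: float, required: float = 0.9) -> bool:
    """End training once the accuracy meets the (hypothetical) requirement."""
    return accuracy >= required
```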
(4) identifying black-market websites, comprising the following steps:
(4-1) manually labeling the site information crawled with the seed black words, that is, marking each website as either a black-market website or a normal website, and building a training set and a validation set; the training set contains 576 black-market websites and 2424 normal websites, and the validation set contains 126 black-market websites and 374 normal websites;
(4-2) extracting URL features, text features and HTML features from the websites in the training set. The extracted features include:
1. Text features: the keywords of the web page content, the average word count per page and the number of pages, where the content keywords are the 10 words with the highest TF-IDF values;
2. HTML features: the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags and the number of button tags;
3. URL features: URL depth, URL length and domain-name length.
(4-3) normalizing the feature values; for a non-numeric feature, the count of that feature is used as its value;
(4-4) training an SVM model with the normalized feature values as its input;
(4-5) using the trained SVM model to predict on suspected websites and identify gray/black-market websites.
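The content keywords used as a text feature in step (4-2), the 10 words with the highest TF-IDF values, can be computed as sketched below. The exact TF-IDF variant is not given in the text; this sketch assumes term frequency normalized by document length and an inverse document frequency of log(N / df), and it breaks ties alphabetically. The input is assumed to be already segmented into words.

```python
import math
from collections import Counter
from typing import List

def top_keywords(documents: List[List[str]], index: int, k: int = 10) -> List[str]:
    """Return the k words of documents[index] with the highest TF-IDF values."""
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    n = len(documents)
    counts = Counter(documents[index])
    total = sum(counts.values())
    scores = {word: (count / total) * math.log(n / doc_freq[word])
              for word, count in counts.items()}
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]
```

Words that occur in every page get a score of 0 under this variant, so site-wide boilerplate terms fall to the bottom of the ranking.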
(5) The identified black words and black-market websites are added to the black dictionary and the black-market website library. As needed, the user can take the newly identified black words as seed black words, crawl the text and site information containing the seed black words, and continue to identify black words and black-market websites.
The embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific embodiment of the invention and is not intended to limit it; any modification, supplement or equivalent replacement made within the spirit of the invention shall fall within its scope of protection.
Claims (10)
1. A method for automatically mining public opinion on the e-commerce gray/black market, characterized by comprising the following steps:
(1) searching a search engine with seed black words as keywords, and crawling the text data and site information data of the websites in the search results;
(2) preprocessing the text data and identifying black words in the preprocessed text data;
(3) analyzing the site information data to identify gray/black-market websites;
(4) adding the identified black words to a black dictionary, and adding the identified gray/black-market websites to a gray/black-market website library;
(5) repeating steps (1) to (4) with the black words obtained in step (2) as seed black words.
2. The method according to claim 1, characterized in that the number of seed black words is no fewer than 10.
3. The method according to claim 1, characterized in that the preprocessing in step (2) comprises:
deduplicating the text data by means of text relevance calculation;
splitting the deduplicated text data into independent sentences, using the Chinese or English forms of the comma, full stop, question mark, exclamation mark, colon or semicolon as separators;
performing Chinese word segmentation on each sentence to divide it into a word sequence;
performing part-of-speech tagging on each word and removing the function words.
4. The method according to claim 1 or 3, characterized in that, in step (2), identifying black words from the preprocessed text data comprises:
(i) taking part of the corpus from the preprocessed text data and manually labeling the word types, to serve as training samples and validation samples;
(ii) initializing the word vector of each word in the training samples; feeding the training samples into a bidirectional long short-term memory network (Bi-LSTM) for vector computation to obtain output vectors;
(iii) using the output vectors as the input of a conditional random field (CRF), and computing the probability that each word belongs to each word type;
(iv) updating the network parameters of the Bi-LSTM and the CRF with the stochastic gradient descent algorithm;
(v) testing the accuracy of the Bi-LSTM and the CRF on the validation set; if the accuracy meets the requirement, training ends; otherwise training continues;
(vi) using the Bi-LSTM and the CRF to predict on the preprocessed text data and identify black words.
5. The method according to claim 4, characterized in that, in step (i), the word types include the following classes:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market item (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black words (OB);
(h) other words (other);
wherein words of types (c), (d), (e), (f) and (g) are black words.
6. The method according to claim 4, characterized in that the ratio of the total number of samples in the training set to that in the validation set is 2~9:1.
7. The method according to claim 4, characterized in that, in step (iv), the network parameters are updated using stochastic gradient descent; the initial learning rate is 0.002; after every 5 training passes, the model's loss is computed on the validation set, and if the loss has not decreased, the learning rate is reduced by one tenth.
8. The method according to claim 1, characterized in that, in step (3), identifying gray/black-market websites comprises:
(3-1) manually labeling the websites in part of the search results, and building a training set and a validation set;
(3-2) extracting the URL features, text features and HTML features of the training samples;
(3-3) using the count of each non-numeric feature as its numeric value, and normalizing the training samples and validation samples;
(3-4) training an SVM model with the normalized training samples as its input;
(3-5) using the trained SVM model to predict on suspected websites and identify gray/black-market websites.
9. The method according to claim 8, characterized in that, in step (3-2), the URL features include URL depth, URL length and domain-name length; the text features include the keywords of the web page content, the average word count per page and the number of pages; the HTML features include the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags and the number of button tags.
10. A system for automatically mining public opinion on the e-commerce gray/black market, characterized by comprising:
a data acquisition module, which searches a search engine with seed black words as keywords and crawls the text data and site information data of the websites in the search results;
an analysis module, which preprocesses the text data and identifies black words in the preprocessed text data, and which analyzes the site information data to identify gray/black-market websites;
an expansion module, comprising a black dictionary and a gray/black-market website library, which adds the obtained black words to the black dictionary and sends them to the data acquisition module as seed black words, and which adds the gray/black-market websites to the gray/black-market website library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810249344.4A CN108647225A (en) | 2018-03-23 | 2018-03-23 | Method and system for automatically mining public opinion on the e-commerce gray/black market |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810249344.4A CN108647225A (en) | 2018-03-23 | 2018-03-23 | Method and system for automatically mining public opinion on the e-commerce gray/black market |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647225A true CN108647225A (en) | 2018-10-12 |
Family
ID=63744472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810249344.4A Pending CN108647225A (en) | 2018-03-23 | 2018-03-23 | Method and system for automatically mining public opinion on the e-commerce gray/black market |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647225A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947913A (en) * | 2019-01-26 | 2019-06-28 | 浙江乾冠信息安全研究院有限公司 | A kind of grey black produces the keyword lookup method of popularization |
CN110162621A (en) * | 2019-02-22 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Disaggregated model training method, abnormal comment detection method, device and equipment |
CN110321554A (en) * | 2019-06-03 | 2019-10-11 | 任子行网络技术股份有限公司 | Bad text detection method and device based on Bi-LSTM |
CN110442775A (en) * | 2019-08-13 | 2019-11-12 | 杭州安恒信息技术股份有限公司 | Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address |
CN110516024A (en) * | 2019-08-30 | 2019-11-29 | 百度在线网络技术(北京)有限公司 | Map search result shows method, apparatus, equipment and storage medium |
CN111078978A (en) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Web credit website entity identification method and system based on website text content |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
CN112417148A (en) * | 2020-11-11 | 2021-02-26 | 北京京航计算通讯研究所 | Urban waterlogging public opinion result obtaining method and device |
CN112990980A (en) * | 2021-04-09 | 2021-06-18 | 厦门市美亚柏科信息股份有限公司 | Evidence obtaining data-based black grey product advertisement identification method and system |
CN113239254A (en) * | 2021-04-27 | 2021-08-10 | 国家计算机网络与信息安全管理中心 | Card issuing platform-oriented active discovery method and device |
CN113536032A (en) * | 2020-04-10 | 2021-10-22 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Video sequence information mining system, method and application thereof |
CN113887328A (en) * | 2021-09-10 | 2022-01-04 | 天津理工大学 | Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102708096A (en) * | 2012-05-29 | 2012-10-03 | 代松 | Network intelligence public sentiment monitoring system based on semantics and work method thereof |
CN102855320A (en) * | 2012-09-04 | 2013-01-02 | 珠海市君天电子科技有限公司 | Method and device for collecting keyword related URL (uniform resource locator) by search engine |
CN103020123A (en) * | 2012-11-16 | 2013-04-03 | 中国科学技术大学 | Method for searching bad video website |
US20150052098A1 (en) * | 2012-04-05 | 2015-02-19 | Thomson Licensing | Contextually propagating semantic knowledge over large datasets |
CN104516903A (en) * | 2013-09-29 | 2015-04-15 | 北大方正集团有限公司 | Keyword extension method and system and classification corpus labeling method and system |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107800679A (en) * | 2017-05-22 | 2018-03-13 | 湖南大学 | Detection method for counterfeit academic journal websites |
2018-03-23: CN application CN201810249344.4A (published as CN108647225A), status: Pending
Non-Patent Citations (1)
Title |
---|
ZHIHENG HUANG: "Bidirectional LSTM-CRF Models for Sequence Tagging", HTTPS://ARXIV.ORG/ * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947913A (en) * | 2019-01-26 | 2019-06-28 | 浙江乾冠信息安全研究院有限公司 | Keyword search method for black and grey industry promotion |
CN111581959A (en) * | 2019-01-30 | 2020-08-25 | 北京京东尚科信息技术有限公司 | Information analysis method, terminal and storage medium |
CN110162621A (en) * | 2019-02-22 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Classification model training method, abnormal comment detection method, device and equipment |
CN110321554A (en) * | 2019-06-03 | 2019-10-11 | 任子行网络技术股份有限公司 | Bad text detection method and device based on Bi-LSTM |
CN110442775A (en) * | 2019-08-13 | 2019-11-12 | 杭州安恒信息技术股份有限公司 | Method, device and electronic equipment for acquiring publicity addresses of multi-level marketing websites |
CN110516024A (en) * | 2019-08-30 | 2019-11-29 | 百度在线网络技术(北京)有限公司 | Map search result display method, device, equipment and storage medium |
CN110516024B (en) * | 2019-08-30 | 2022-05-20 | 百度在线网络技术(北京)有限公司 | Map search result display method, device, equipment and storage medium |
CN111078978A (en) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Web credit website entity identification method and system based on website text content |
CN111078978B (en) * | 2019-11-29 | 2024-02-27 | 上海观安信息技术股份有限公司 | Network credit website entity identification method and system based on website text content |
CN113536032A (en) * | 2020-04-10 | 2021-10-22 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Video sequence information mining system, method and application thereof |
CN112417148A (en) * | 2020-11-11 | 2021-02-26 | 北京京航计算通讯研究所 | Urban waterlogging public opinion result obtaining method and device |
CN112990980A (en) * | 2021-04-09 | 2021-06-18 | 厦门市美亚柏科信息股份有限公司 | Black and grey industry advertisement identification method and system based on forensic data |
CN113239254A (en) * | 2021-04-27 | 2021-08-10 | 国家计算机网络与信息安全管理中心 | Card issuing platform-oriented active discovery method and device |
CN113887328A (en) * | 2021-09-10 | 2022-01-04 | 天津理工大学 | Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647225A (en) | Automatic public opinion mining method and system for the e-commerce black and grey industry | |
CN110427623B (en) | Semi-structured document knowledge extraction method and device, electronic equipment and storage medium | |
CN107330011B (en) | Multi-strategy fusion named entity recognition method and device | |
WO2018028077A1 (en) | Deep learning based method and device for chinese semantics analysis | |
CN110489523B (en) | Fine-grained emotion analysis method based on online shopping evaluation | |
CN110516067A (en) | Public sentiment monitoring method, system and storage medium based on topic detection | |
CN108984530A (en) | Detection method and detection system for network sensitive content | |
CN107220386A (en) | Information pushing method and device | |
CN110222178A (en) | Text sentiment classification method, device, electronic equipment and readable storage medium | |
CN108846017A (en) | End-to-end classification method for large-scale news text based on Bi-GRU and word vectors | |
CN106919673A (en) | Text emotion analysis system based on deep learning | |
CN107038480A (en) | Text sentiment classification method based on convolutional neural networks | |
CN106599032A (en) | Text event extraction method combining sparse coding and structured perceptron | |
CN103853824A (en) | In-text advertisement releasing method and system based on deep semantic mining | |
CN106096664A (en) | Sentiment analysis method based on social network data | |
CN106294324A (en) | Machine learning sentiment analysis device based on natural language parse trees | |
CN110287314B (en) | Long text reliability assessment method and system based on unsupervised clustering | |
CN103593431A (en) | Internet public opinion analyzing method and device | |
CN105740382A (en) | Aspect classification method for short comment texts | |
CN114417851B (en) | Emotion analysis method based on keyword weighted information | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN113742733A (en) | Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device | |
CN110851593A (en) | Complex value word vector construction method based on position and semantics | |
CN113255360A (en) | Document rating method and device based on hierarchical self-attention network | |
CN114169447B (en) | Event detection method based on self-attention convolution bidirectional gating cyclic unit network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20181012 |