CN108647225A - Method and system for automatically mining e-commerce gray/black-market public opinion - Google Patents

Method and system for automatically mining e-commerce gray/black-market public opinion

Info

Publication number
CN108647225A
Authority
CN
China
Prior art keywords
black
word
electric business
grey black
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810249344.4A
Other languages
Chinese (zh)
Inventor
纪守领
刘倩君
陈建海
伍鸣
伍一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201810249344.4A
Publication of CN108647225A
Legal status: Pending


Abstract

The invention discloses a method and system for automatically mining e-commerce gray/black-market public opinion. The method comprises the following steps: (1) searching with a search engine using seed black terms as keywords, and crawling the text data and site information data of the websites in the search results; (2) preprocessing the text data and identifying black terms in the preprocessed text data; (3) analyzing the site information data to identify gray/black-market websites; (4) adding the black terms obtained to a black lexicon, and adding the gray/black-market websites obtained to a gray/black-market website library; (5) repeating steps (1) to (4) with the black terms obtained in step (2) as the seed black terms. The method makes it possible to discover, give early warning of, and curb e-commerce cheating in a timely manner, and to monitor the e-commerce gray/black market in real time along multiple dimensions.

Description

Method and system for automatically mining e-commerce gray/black-market public opinion
Technical field
The present invention relates to the field of security technology for the online gray/black market, and in particular to a method and system for automatically mining e-commerce gray/black-market public opinion.
Background
The online gray/black market is one of the major threats to the security of the Internet ecosystem. Conventional techniques can detect cheating such as black-hat SEO, fake reviews, artificial traffic, and spam messages on social networks, but because cheating behavior keeps changing and migrating, existing models and methods quickly lose their applicability. Analyzing and monitoring the online gray/black market in real time using massive amounts of security-related external public-opinion text therefore helps to discover and combat it at the source.
Unlike the traditional online gray/black market, the e-commerce gray/black market is a new form that has emerged over roughly the last ten years. It generally refers to cheating that violates the rules of e-commerce platforms, such as fake transactions, fake orders ("brushing"), and fake traffic. From account trading, to gray/black-market platforms for brushing and similar services, to supporting infrastructure services such as empty-parcel express delivery, the e-commerce gray/black market has formed a complete industry chain. As anti-cheating mechanisms in e-commerce have matured, e-commerce cheating has also become more specialized. However, enterprises still analyze the distribution and current state of the gray/black market largely by hand, which cannot keep up with its growing scale.
Summary of the invention
In view of the industrialized, specialized, and large-scale nature of the e-commerce gray/black market, and the fact that corpora and knowledge bases for it are almost nonexistent, the present invention provides a method and system for automatically mining e-commerce gray/black-market public opinion, which can discover, give early warning of, and curb e-commerce cheating in a timely manner and monitor the e-commerce gray/black market in real time along multiple dimensions.
The present invention provides the following technical solution:
A method for automatically mining e-commerce gray/black-market public opinion, comprising the following steps:
(1) searching with a search engine using seed black terms as keywords, and crawling the text data and site information data of the websites in the search results;
(2) preprocessing the text data and identifying black terms in the preprocessed text data;
(3) analyzing the site information data to identify gray/black-market websites;
(4) adding the black terms obtained to a black lexicon, and adding the gray/black-market websites obtained to a gray/black-market website library;
(5) repeating steps (1) to (4) with the black terms obtained in step (2) as the seed black terms.
Preferably, the number of seed black terms is no less than 10.
The more seed black terms there are, the more data is collected for analysis and the more black terms and gray/black-market websites are obtained in steps (2) and (3). When there are too many seed black terms, however, the volume of collected data becomes so large that the subsequent analysis is computationally expensive, which lowers the efficiency of black-term acquisition. Preferably, the number of seed black terms is 10 to 50.
The seed black terms are manually identified words related to the e-commerce gray/black market.
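Steps (1) to (5) form a bootstrapping loop, which can be sketched as follows. This is a minimal illustration and not the patented implementation: the three callables (`search_and_crawl`, `find_black_terms`, `find_black_sites`) are hypothetical placeholders for the crawler, the term recognizer, and the website classifier, and the choice to reuse only previously unseen terms as the next seeds is an assumption made here to keep the loop from repeating searches.

```python
from typing import Callable, Iterable, List, Set, Tuple


def mine(seed_terms: Iterable[str],
         search_and_crawl: Callable[[str], Tuple[List[str], List[str]]],
         find_black_terms: Callable[[List[str]], Set[str]],
         find_black_sites: Callable[[List[str]], Set[str]],
         rounds: int = 3) -> Tuple[Set[str], Set[str]]:
    """Iteratively expand a black lexicon and a website library from seeds."""
    lexicon: Set[str] = set(seed_terms)
    site_library: Set[str] = set()
    frontier: Set[str] = set(lexicon)          # terms to search in this round
    for _ in range(rounds):
        if not frontier:
            break                              # nothing new to search for
        new_terms: Set[str] = set()
        for term in sorted(frontier):
            texts, sites = search_and_crawl(term)      # step (1)
            new_terms |= find_black_terms(texts)       # step (2)
            site_library |= find_black_sites(sites)    # steps (3) and (4)
        frontier = new_terms - lexicon         # assumption: only unseen terms become seeds
        lexicon |= new_terms                   # step (4)
    return lexicon, site_library
```

The `rounds` parameter reflects the statement that the number of repetitions is left to the user.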
In step (2), the preprocessing comprises deduplication, sentence splitting, word segmentation, part-of-speech tagging, and filtering of the text data, and includes the following steps:
deduplicating the text data by means of a text relevance calculation;
splitting the deduplicated text data into individual sentences, using the Chinese or English forms of the comma (,), period (.), question mark (?), exclamation mark (!), colon (:), or semicolon (;) as separators;
performing Chinese word segmentation on each sentence to divide it into a word sequence;
performing part-of-speech tagging on each word and removing the function words.
After this preprocessing, the crawled text data has been converted into less noisy sentences (with meaningless function words removed). Each sentence is represented as a word sequence with part-of-speech tags, which reduces the amount of vocabulary to be processed later.
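The deduplication and sentence-splitting steps can be sketched with the Python standard library. The separator set follows the punctuation listed above in both Chinese and English forms. The deduplication shown here is exact-match only, which is a simplification assumed for illustration; the text describes deduplication by text relevance calculation without giving details.

```python
import re
from typing import Iterable, List

# Comma, period, question mark, exclamation mark, colon, semicolon
# in both English and Chinese (full-width) forms.
_SEPARATORS = re.compile(r"[,，.。?？!！:：;；]+")


def deduplicate(texts: Iterable[str]) -> List[str]:
    """Drop exact duplicates (after trimming), keeping first-seen order."""
    seen = set()
    unique = []
    for text in texts:
        key = text.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at any of the separator characters."""
    return [part.strip() for part in _SEPARATORS.split(text) if part.strip()]
```

Word segmentation and part-of-speech tagging would follow on each returned sentence; those steps depend on an external Chinese NLP toolkit and are not shown.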
In step (2), identifying black terms in the preprocessed text data comprises:
(i) taking part of the corpus from the preprocessed text data and, after manually labeling the word types, using it as training samples and validation samples;
(ii) initializing the word vector of each word in the training samples, and feeding the training samples into a bidirectional long short-term memory network (Bi-LSTM) for vector computation to obtain output vectors;
(iii) using the output vectors as the input of a conditional random field (CRF) and computing the probability of each word belonging to each word type;
(iv) updating the network parameters of the Bi-LSTM and the CRF using the stochastic gradient descent algorithm;
(v) testing the accuracy of the Bi-LSTM and the CRF on the validation set; if the accuracy meets the requirement, training ends, otherwise training continues;
(vi) using the Bi-LSTM and the CRF to make predictions on the preprocessed text data and identify the black terms.
The bidirectional long short-term memory network (Bi-LSTM) is a bidirectional variant of the LSTM (long short-term memory) network. An LSTM replaces the neurons of a basic RNN with three interacting gates (an input gate, an output gate, and a forget gate) and one memory cell. New input can change the historical state stored in the network only when the input gate is open; the stored historical state can be accessed, and can influence later outputs, only when the output gate is open; and the forget gate clears previously stored historical information. Because an LSTM lets previously input information propagate forward, it can learn long-term dependencies, and it has performed very well in areas such as part-of-speech tagging and named entity recognition. The input of an LSTM is unidirectional, so only the influence of the preceding context on what follows is considered. The basic idea of the Bi-LSTM is to train one LSTM over the sequence in the forward direction and one in the backward direction and then combine the outputs of the two models, so that every node in the sequence can draw on the full context.
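For the prediction in step (vi), a linear-chain CRF is commonly decoded with the Viterbi algorithm, which picks the label sequence with the highest total score. The text does not name the decoding procedure, so the sketch below is an assumption about how that step could be carried out. The per-word `emissions` stand in for scores derived from the Bi-LSTM outputs, and `transitions` for learned label-to-label scores; both are hypothetical inputs here.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[i][label] is the score of `label` for word i;
    transitions[(prev, cur)] is the score of moving from prev to cur
    (missing pairs count as 0.0).
    """
    if not emissions:
        return []
    # best[i][label]: best score of any path ending in `label` at word i
    best = [{lab: emissions[0][lab] for lab in labels}]
    back = []  # back[i][label]: previous label on that best path
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for cur in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            row[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

The transition scores are what let the CRF prefer label sequences that are consistent as a whole rather than choosing each word's label independently.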
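The gate mechanism described above corresponds to the standard LSTM update equations, given here in their widely used textbook form. The text does not state these formulas, so the notation is an assumption: $x_t$ is the input word vector at position $t$, $h_t$ the hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned parameters.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
h_t^{\mathrm{bi}} &= [\overrightarrow{h}_t ; \overleftarrow{h}_t] && \text{(Bi-LSTM output)}
\end{aligned}
```

The last line expresses the combination step: the forward and backward hidden states at each position are concatenated, so the representation of each word reflects both its left and right context.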
In step (i), the word types comprise the following classes:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market thing (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black term (OB);
(h) other word (other);
Words of types (c), (d), (e), (f), and (g) are black terms.
The ratio of the total number of samples in the training set to that in the validation set is 2~9:1; most preferably, the ratio is 9:1.
In step (iv), the network parameters are updated using stochastic gradient descent (SGD) with an initial learning rate of 0.002. After every 5 rounds of training, the model's loss is computed on the validation set; if the loss has not decreased, the learning rate is reduced by one tenth to prevent overfitting.
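Given a labeled word sequence, selecting the black terms is a simple filter on the five black types. The sketch below assumes the model output is available as `(word, label)` pairs; the example labels in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Types (c) to (g) from the list above are treated as black terms.
BLACK_TYPES = {"ECP", "ECI", "ECL", "ECA", "OB"}


def extract_black_terms(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the distinct words whose label is a black type, in first-seen order."""
    seen = set()
    result = []
    for word, label in tagged:
        if label in BLACK_TYPES and word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

For instance, a sequence labeled `[("Taobao", "ecn"), ("brushing", "ECA"), ("commission", "ecn")]` would yield `["brushing"]`, since the context types ecn and ecv are not black types.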
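The learning-rate rule can be sketched as a small function over the sequence of validation losses measured at each check (one check per 5 rounds of training). Two points are assumptions made here: "reduced by one tenth" is read as multiplying the rate by 0.9 (pass `decay=0.1` for the alternative reading of reducing it to one tenth), and "has not decreased" is read as a comparison with the previous check.

```python
from typing import Iterable, List


def learning_rate_schedule(val_losses: Iterable[float],
                           initial_lr: float = 0.002,
                           decay: float = 0.9) -> List[float]:
    """Return the learning rate in effect after each validation check.

    The rate is multiplied by `decay` whenever the validation loss
    fails to drop below the loss seen at the previous check.
    """
    lr = initial_lr
    previous = float("inf")
    history = []
    for loss in val_losses:
        if loss >= previous:      # loss did not decrease since the last check
            lr *= decay
        previous = loss
        history.append(lr)
    return history
```

Lowering the step size when validation loss stalls is a common way to keep training from oscillating or fitting noise in the training set.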
In step (3), identifying gray/black-market websites comprises:
(3-1) manually labeling the websites in part of the search results to build a training set and a validation set;
(3-2) extracting the URL features, text features, and HTML features of the training samples;
(3-3) taking the count of each non-numeric feature as its numeric value, and normalizing the training samples and validation samples;
(3-4) training an SVM model with the normalized training samples as its input;
(3-5) using the trained SVM model to make predictions on suspicious websites and identify gray/black-market websites.
Manual labeling in step (3-1) means manually determining whether a website is a normal website or a gray/black-market website (an activity platform, contact network, software tool, or the like for the e-commerce gray/black market) and labeling it accordingly.
In step (3-2), the URL features include URL depth, URL length, and domain name length. The text features include the keywords of the page content, the average number of words per page, and the number of pages, where the keywords of the page content are the 10 words with the highest TF-IDF values. The HTML features include the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags, and the number of button tags.
The black terms and gray/black-market websites obtained in steps (2) and (3) are added to the black lexicon and the gray/black-market website library, respectively. As needed, the user can take the newly obtained black terms as seed black terms and repeat steps (1) to (4); the number of repetitions depends on the user's needs. This achieves automatic mining of e-commerce gray/black-market public opinion.
Based on the black terms in the black lexicon, the user can analyze how e-commerce gray/black-market cheating works and design corresponding anti-cheating measures; the websites in the gray/black-market website library can be reported to the relevant authorities for handling.
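The URL and HTML features can be computed with the Python standard library, as in the sketch below. The exact definitions are assumptions, since the text only names the features: URL depth is taken as the number of non-empty path segments, an external link as a hyperlink whose host differs from the page's host, and a JavaScript tag as a `script` element.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urlparse


def url_features(url: str) -> Dict[str, int]:
    """URL depth (path segments), URL length, and domain name length."""
    parsed = urlparse(url)
    depth = len([segment for segment in parsed.path.split("/") if segment])
    return {"url_depth": depth,
            "url_length": len(url),
            "domain_length": len(parsed.hostname or "")}


class _TagCounter(HTMLParser):
    """Counts the tags that the HTML features are based on."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.counts = {"links": 0, "external_links": 0,
                       "images": 0, "scripts": 0, "buttons": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.counts["links"] += 1
                host = urlparse(href).hostname
                # Relative links have no host and count as internal.
                if host and host != self.page_host:
                    self.counts["external_links"] += 1
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
        elif tag == "button":
            self.counts["buttons"] += 1


def html_features(html: str, page_url: str) -> Dict[str, int]:
    """Hyperlink, external-link, image, script, and button tag counts."""
    counter = _TagCounter(urlparse(page_url).hostname)
    counter.feed(html)
    counter.close()
    return counter.counts
```

The resulting counts are already numeric, so they can be passed directly to the normalization step before classification.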
The present invention also provides a system for automatically mining e-commerce gray/black-market public opinion, comprising:
a data acquisition module, which searches with a search engine using seed black terms as keywords and crawls the text data and site information data of the websites in the search results;
an analysis module, which preprocesses the text data and identifies black terms in the preprocessed text data, and which analyzes the site information data to identify gray/black-market websites;
an expansion module, which comprises a black lexicon and a gray/black-market website library, adds the black terms obtained to the black lexicon and sends them to the data acquisition module as seed black terms, and adds the gray/black-market websites to the gray/black-market website library.
Compared with the prior art, the beneficial effects of the present invention are:
(1) external information can be obtained in real time through multiple channels to build an up-to-date corpus of the e-commerce gray/black market;
(2) black terms and websites related to the e-commerce gray/black market can be identified to build an e-commerce gray/black-market information base, which facilitates the subsequent detection of e-commerce cheating and the eradication of the gray/black market;
(3) the e-commerce gray/black market can be mined automatically, saving the cost of manual analysis and facilitating large-scale deployment and implementation.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of the system for automatically mining e-commerce gray/black-market public opinion;
Fig. 2 is a flow diagram of the method for automatically mining e-commerce gray/black-market public opinion;
Fig. 3 is a schematic diagram of the black-term identification process;
Fig. 4 is a schematic diagram of the structure of the black-term identification model based on natural-language sequence labeling.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be noted that the embodiments described below are intended to aid understanding of the invention and do not limit it in any way.
As shown in Fig. 1, the architecture of the system for automatically mining e-commerce gray/black-market public opinion mainly comprises a crawler module, a black-term identification module, a black-market website identification module, and a gray/black-market expansion module.
The crawler module crawls Internet text containing the seed black terms and, through text relevance calculation and text deduplication, obtains the e-commerce gray/black-market public-opinion corpus; it also crawls the site information of suspicious websites containing the seed black terms.
The sources of public-opinion information include microblogs, news, forums, mhkc, WeChat, government websites, video websites, and others.
The black-term identification module uses natural-language sequence labeling to identify the black terms in the e-commerce gray/black-market public-opinion corpus.
The black-market website identification module extracts the features of suspicious websites and uses a trained classification model to identify gray/black-market websites.
The gray/black-market expansion module uses the black terms obtained as query keywords and automatically sends query requests to a search engine. The crawler module crawls the text, related searches, and related recommendations in the search results to expand the black lexicon, and crawls the URLs (uniform resource locators) and HTML information of the websites in the search results to expand the gray/black-market website list.
The method for automatically mining e-commerce gray/black-market public opinion, based on the system described above, comprises the following steps, as shown in Fig. 2:
(1) Taking the black terms already in the black lexicon as seed black terms, search with a search engine (such as Baidu) using the seed black terms as keywords, and crawl the site information and text of the websites in the search results.
(2) text crawled is pre-processed, including carries out duplicate removal, subordinate sentence, participle and part-of-speech tagging, it is specific as follows:
With Chinese form or English form ",.!:;" etc. punctuation marks be separator, by text segmentation at independent sentence Son.
To each independent sentence, carried using language technology platform (Language Technology Platform, LTP) The Chinese word segmentation function of confession, is divided into sequence of terms by sentence, and such as " can earn commission after brush hand completion task " is divided into " brush Hand is completed, task, after, just, can be with earning, commission ".
Part-of-speech tagging be in sentence each word mark part of speech classification, part of speech classification include noun, verb, adjective, 28 class such as conjunction, adverbial word, preposition, auxiliary word, interjection, name, place name, prefix, suffix segments sentence and then utilizes language The part-of-speech tagging function of technology platform, to the word progress part-of-speech tagging in sentence, such as " brush hand/n, completion/v, task/n, Afterwards/dn, just/d, can be with/v, and earning/v, commission/n ", wherein n indicate that termini generales, v indicate that general verb, d indicate adverbial word, dn Indicate direction noun.
After part-of-speech tagging, by conjunction, adverbial word, preposition, auxiliary word, interjection this five classes function word rejecting, only retaining has practical meaning The notional word of justice, to reduce vocabulary treating capacity.
By pre-processing above, all texts are converted into the sentence of noise smaller (eliminating meaningless function word), each Sentence is indicated by the sequence of terms with part-of-speech tagging, such as " Taobao/ni brush list/v earnings/v commissions/n ", wherein n tables Show that noun, v indicate that verb, ni indicate institution term.
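Removing function words after tagging amounts to filtering the tagged sequence by tag. The sketch below assumes the tagger's output is available as `(word, tag)` pairs and that the five function-word classes carry the single-letter tags `c`, `d`, `p`, `u`, and `e`; only `d` (adverb) is confirmed by the example in the text, and the other four letters are an assumption about the tag set.

```python
from typing import Iterable, List, Tuple

# Assumed tags for conjunction, adverb, preposition, auxiliary word, interjection.
FUNCTION_WORD_TAGS = {"c", "d", "p", "u", "e"}


def drop_function_words(tagged: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only content words, preserving the original order."""
    return [(word, tag) for word, tag in tagged if tag not in FUNCTION_WORD_TAGS]


# English glosses of the tagged example sentence from the text.
sentence = [("brusher", "n"), ("complete", "v"), ("task", "n"), ("after", "dn"),
            ("then", "d"), ("can", "v"), ("earn", "v"), ("commission", "n")]
content_words = drop_function_words(sentence)
```

In this example only the adverb is dropped; the remaining seven words keep their tags and order.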
(3) Identify the black terms in the text using natural-language sequence labeling, comprising the following steps (see Fig. 3):
(3-1) Manually label the word types of the words in the preprocessed text corpus and build a training set and a validation set; the training set contains 2700 sentences and the validation set contains 300 sentences.
Words are divided into the following types:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market thing (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black term (OB);
(h) other word (other);
Words of types (c), (d), (e), (f), and (g) are black terms.
(3-2) Feed the sentences in the training set into the bidirectional long short-term memory network (Bi-LSTM) as input sequences; the Bi-LSTM network comprises 4 bidirectional LSTM layers, as shown in Fig. 4.
(3-3) Initialize the word vector of each word in a sentence using word2vec; the word vector dimension is 200.
(3-4) The forward LSTM layers and backward LSTM layers perform state propagation and vector computation; the output vectors of the Bi-LSTM are used as the input of the conditional random field (CRF), which computes the probability of each word belonging to each type.
Update the network parameters: the parameters are updated using stochastic gradient descent (SGD) with an initial learning rate of 0.002. After every 5 rounds of training, the model's loss is computed on the validation set; if the loss has not decreased, the learning rate is reduced by one tenth to prevent overfitting.
(3-5) Validate the model on the validation set; if the model's accuracy meets the requirement, end training, otherwise return to (3-4) and continue training.
(3-6) After training ends, use the trained model to extract the words of the black-term types from the unlabeled text.
(4) Identify black-market websites, comprising the following steps:
(4-1) Manually label the site information crawled with the seed black terms, that is, mark each website as a black-market website or a normal website, and build a training set and a validation set. The training set contains 576 black-market websites and 2424 normal websites; the validation set contains 126 black-market websites and 374 normal websites.
(4-2) Extract the URL features, text features, and HTML features of the websites in the training set. The extracted features are:
1. Text features: the keywords of the page content, the average number of words per page, and the number of pages, where the keywords of the page content are the 10 words with the highest TF-IDF values;
2. HTML features: the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags, and the number of button tags;
3. URL features: URL depth, URL length, and domain name length.
(4-3) Normalize the feature values; for a non-numeric feature, the count of the feature is used as its value.
(4-4) Train an SVM model with the normalized feature values as its input.
(4-5) Use the trained SVM model to make predictions on suspicious websites and identify gray/black-market websites.
(5) Add the identified black terms and black-market websites to the black lexicon and the black-market website library. As needed, the user can take the newly identified black terms as seed black terms, crawl the text and site information containing them, and continue to identify further black terms and black-market websites.
The embodiments above describe the technical solution and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the invention and are not intended to limit it; any modification, addition, or equivalent replacement made within the spirit of the invention shall fall within its scope of protection.
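Selecting the top TF-IDF keywords of a page can be sketched in pure Python. TF-IDF has several variants and the text does not say which is used; the smoothed inverse document frequency below is one common choice and should be read as an assumption. Pages are assumed to be already segmented into word lists.

```python
import math
from collections import Counter
from typing import List, Sequence


def top_keywords(doc_tokens: Sequence[str],
                 corpus_tokens: Sequence[Sequence[str]],
                 k: int = 10) -> List[str]:
    """Return the k words of one page with the highest TF-IDF scores."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))           # count each word once per page
    term_freq = Counter(doc_tokens)
    total = len(doc_tokens) or 1
    scores = {
        # term frequency times smoothed inverse document frequency
        word: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
        for word, count in term_freq.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Words that appear on nearly every crawled page receive a low weight, so the selected keywords tend to be those that distinguish a page from the rest of the corpus.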
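The feature preparation in (4-3) can be sketched as follows. Non-numeric features (for example a keyword list) are replaced by their count, as stated above. The text does not say which normalization is used, so min-max scaling to the range 0 to 1 is an assumption. Training the SVM itself would require a machine-learning library and is not shown.

```python
from typing import List, Sequence


def to_numeric(value) -> float:
    """Use the element count for collection-valued features, else the number itself."""
    if isinstance(value, (list, tuple, set)):
        return float(len(value))
    return float(value)


def min_max_normalize(rows: Sequence[Sequence]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns become 0.0."""
    matrix = [[to_numeric(v) for v in row] for row in rows]
    if not matrix:
        return []
    columns = list(zip(*matrix))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]
```

In practice the minimum and maximum computed on the training set would also be applied to the validation set and to new websites, so that all inputs to the classifier share the same scale.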

Claims (10)

1. A method for automatically mining e-commerce gray/black-market public opinion, characterized by comprising the following steps:
(1) searching with a search engine using seed black terms as keywords, and crawling the text data and site information data of the websites in the search results;
(2) preprocessing the text data and identifying black terms in the preprocessed text data;
(3) analyzing the site information data to identify gray/black-market websites;
(4) adding the black terms obtained to a black lexicon, and adding the gray/black-market websites obtained to a gray/black-market website library;
(5) repeating steps (1) to (4) with the black terms obtained in step (2) as the seed black terms.
2. The method for automatically mining e-commerce gray/black-market public opinion according to claim 1, characterized in that the number of seed black terms is no less than 10.
3. The method for automatically mining e-commerce gray/black-market public opinion according to claim 1, characterized in that the preprocessing in step (2) comprises:
deduplicating the text data by means of a text relevance calculation;
splitting the deduplicated text data into individual sentences, using the Chinese or English forms of the comma, period, question mark, exclamation mark, colon, or semicolon as separators;
performing Chinese word segmentation on each sentence to divide it into a word sequence;
performing part-of-speech tagging on each word and removing the function words.
4. The method for automatically mining e-commerce gray/black-market public opinion according to claim 1 or 3, characterized in that, in step (2), identifying black terms in the preprocessed text data comprises:
(i) taking part of the corpus from the preprocessed text data and, after manually labeling the word types, using it as training samples and validation samples;
(ii) initializing the word vector of each word in the training samples, and feeding the training samples into a bidirectional long short-term memory network (Bi-LSTM) for vector computation to obtain output vectors;
(iii) using the output vectors as the input of a conditional random field (CRF) and computing the probability of each word belonging to each word type;
(iv) updating the network parameters of the Bi-LSTM and the CRF using the stochastic gradient descent algorithm;
(v) testing the accuracy of the Bi-LSTM and the CRF on the validation set; if the accuracy meets the requirement, training ends, otherwise training continues;
(vi) using the Bi-LSTM and the CRF to make predictions on the preprocessed text data and identify the black terms.
5. The method for automatically mining e-commerce gray/black-market public opinion according to claim 4, characterized in that, in step (i), the word types comprise the following classes:
(a) e-commerce context noun (ecn);
(b) e-commerce context verb (ecv);
(c) e-commerce gray/black-market person (ECP);
(d) e-commerce gray/black-market thing (ECI);
(e) e-commerce gray/black-market platform (ECL);
(f) e-commerce gray/black-market behavior (ECA);
(g) other black term (OB);
(h) other word (other);
wherein words of types (c), (d), (e), (f), and (g) are black terms.
6. The method for automatically mining e-commerce gray/black-market public opinion according to claim 4, characterized in that the ratio of the total number of samples in the training set to that in the validation set is 2~9:1.
7. The method for automatically mining e-commerce gray/black-market public opinion according to claim 4, characterized in that, in step (iv), the network parameters are updated using stochastic gradient descent with an initial learning rate of 0.002; after every 5 rounds of training, the model's loss is computed on the validation set, and if the loss has not decreased, the learning rate is reduced by one tenth.
8. The method for automatically mining e-commerce gray/black-market public opinion according to claim 1, characterized in that, in step (3), identifying gray/black-market websites comprises:
(3-1) manually labeling the websites in part of the search results to build a training set and a validation set;
(3-2) extracting the URL features, text features, and HTML features of the training samples;
(3-3) taking the count of each non-numeric feature as its numeric value, and normalizing the training samples and validation samples;
(3-4) training an SVM model with the normalized training samples as its input;
(3-5) using the trained SVM model to make predictions on suspicious websites and identify gray/black-market websites.
9. The method for automatically mining e-commerce gray/black-market public opinion according to claim 8, characterized in that, in step (3-2), the URL features include URL depth, URL length, and domain name length; the text features include the keywords of the page content, the average number of words per page, and the number of pages; and the HTML features include the number of hyperlinks, the number of external links, the number of image tags, the number of JavaScript tags, and the number of button tags.
10. A system for automatically mining e-commerce gray/black-market public opinion, characterized by comprising:
a data acquisition module, which searches with a search engine using seed black terms as keywords and crawls the text data and site information data of the websites in the search results;
an analysis module, which preprocesses the text data and identifies black terms in the preprocessed text data, and which analyzes the site information data to identify gray/black-market websites;
an expansion module, which comprises a black lexicon and a gray/black-market website library, adds the black terms obtained to the black lexicon and sends them to the data acquisition module as seed black terms, and adds the gray/black-market websites to the gray/black-market website library.
CN201810249344.4A 2018-03-23 2018-03-23 Method and system for automatically mining e-commerce gray/black-market public opinion Pending CN108647225A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810249344.4A CN108647225A (en) 2018-03-23 2018-03-23 Method and system for automatically mining e-commerce gray/black-market public opinion

Publications (1)

Publication Number Publication Date
CN108647225A true CN108647225A (en) 2018-10-12

Family

ID=63744472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810249344.4A Pending CN108647225A (en) 2018-03-23 2018-03-23 Method and system for automatically mining e-commerce gray/black-market public opinion

Country Status (1)

Country Link
CN (1) CN108647225A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947913A (en) * 2019-01-26 2019-06-28 浙江乾冠信息安全研究院有限公司 A kind of grey black produces the keyword lookup method of popularization
CN110162621A (en) * 2019-02-22 2019-08-23 腾讯科技(深圳)有限公司 Disaggregated model training method, abnormal comment detection method, device and equipment
CN110321554A (en) * 2019-06-03 2019-10-11 任子行网络技术股份有限公司 Bad text detection method and device based on Bi-LSTM
CN110442775A (en) * 2019-08-13 2019-11-12 杭州安恒信息技术股份有限公司 Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address
CN110516024A (en) * 2019-08-30 2019-11-29 百度在线网络技术(北京)有限公司 Map search result shows method, apparatus, equipment and storage medium
CN111078978A (en) * 2019-11-29 2020-04-28 上海观安信息技术股份有限公司 Web credit website entity identification method and system based on website text content
CN111581959A (en) * 2019-01-30 2020-08-25 北京京东尚科信息技术有限公司 Information analysis method, terminal and storage medium
CN112417148A (en) * 2020-11-11 2021-02-26 北京京航计算通讯研究所 Urban waterlogging public opinion result obtaining method and device
CN112990980A (en) * 2021-04-09 2021-06-18 厦门市美亚柏科信息股份有限公司 Evidence obtaining data-based black grey product advertisement identification method and system
CN113239254A (en) * 2021-04-27 2021-08-10 国家计算机网络与信息安全管理中心 Card issuing platform-oriented active discovery method and device
CN113536032A (en) * 2020-04-10 2021-10-22 天津职业技术师范大学(中国职业培训指导教师进修中心) Video sequence information mining system, method and application thereof
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Semantics-based intelligent network public opinion monitoring system and working method thereof
CN102855320A (en) * 2012-09-04 2013-01-02 珠海市君天电子科技有限公司 Method and device for collecting keyword related URL (uniform resource locator) by search engine
CN103020123A (en) * 2012-11-16 2013-04-03 中国科学技术大学 Method for searching bad video website
US20150052098A1 (en) * 2012-04-05 2015-02-19 Thomson Licensing Contextually propagating semantic knowledge over large datasets
CN104516903A (en) * 2013-09-29 2015-04-15 北大方正集团有限公司 Keyword extension method and system and classification corpus labeling method and system
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107800679A (en) * 2017-05-22 2018-03-13 湖南大学 Method for detecting counterfeit academic journal websites

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150052098A1 (en) * 2012-04-05 2015-02-19 Thomson Licensing Contextually propagating semantic knowledge over large datasets
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Semantics-based intelligent network public opinion monitoring system and working method thereof
CN102855320A (en) * 2012-09-04 2013-01-02 珠海市君天电子科技有限公司 Method and device for collecting keyword related URL (uniform resource locator) by search engine
CN103020123A (en) * 2012-11-16 2013-04-03 中国科学技术大学 Method for searching bad video website
CN104516903A (en) * 2013-09-29 2015-04-15 北大方正集团有限公司 Keyword extension method and system and classification corpus labeling method and system
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107800679A (en) * 2017-05-22 2018-03-13 湖南大学 Method for detecting counterfeit academic journal websites

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIHENG HUANG: "Bidirectional LSTM-CRF Models for Sequence Tagging", 《HTTPS://ARXIV.ORG/》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947913A (en) * 2019-01-26 2019-06-28 浙江乾冠信息安全研究院有限公司 Keyword lookup method for gray and black industry promotion
CN111581959A (en) * 2019-01-30 2020-08-25 北京京东尚科信息技术有限公司 Information analysis method, terminal and storage medium
CN110162621A (en) * 2019-02-22 2019-08-23 腾讯科技(深圳)有限公司 Classification model training method, abnormal comment detection method, device and equipment
CN110321554A (en) * 2019-06-03 2019-10-11 任子行网络技术股份有限公司 Bad text detection method and device based on Bi-LSTM
CN110442775A (en) * 2019-08-13 2019-11-12 杭州安恒信息技术股份有限公司 Method, device and electronic equipment for acquiring promotion addresses of multi-level marketing websites
CN110516024A (en) * 2019-08-30 2019-11-29 百度在线网络技术(北京)有限公司 Map search result display method, device, equipment and storage medium
CN110516024B (en) * 2019-08-30 2022-05-20 百度在线网络技术(北京)有限公司 Map search result display method, device, equipment and storage medium
CN111078978A (en) * 2019-11-29 2020-04-28 上海观安信息技术股份有限公司 Network credit website entity identification method and system based on website text content
CN111078978B (en) * 2019-11-29 2024-02-27 上海观安信息技术股份有限公司 Network credit website entity identification method and system based on website text content
CN113536032A (en) * 2020-04-10 2021-10-22 天津职业技术师范大学(中国职业培训指导教师进修中心) Video sequence information mining system, method and application thereof
CN112417148A (en) * 2020-11-11 2021-02-26 北京京航计算通讯研究所 Urban waterlogging public opinion result obtaining method and device
CN112990980A (en) * 2021-04-09 2021-06-18 厦门市美亚柏科信息股份有限公司 Evidence obtaining data-based black grey product advertisement identification method and system
CN113239254A (en) * 2021-04-27 2021-08-10 国家计算机网络与信息安全管理中心 Card issuing platform-oriented active discovery method and device
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN

Similar Documents

Publication Publication Date Title
CN108647225A (en) Automatic mining method and system for e-commerce gray and black industry public opinion
CN110427623B (en) Semi-structured document knowledge extraction method and device, electronic equipment and storage medium
CN107330011B (en) Multi-strategy fusion named entity recognition method and device
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
CN110489523B (en) Fine-grained emotion analysis method based on online shopping evaluation
CN110516067A (en) Public sentiment monitoring method, system and storage medium based on topic detection
CN108984530A (en) Detection method and detection system for sensitive network content
CN107220386A (en) Information-pushing method and device
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium
CN108846017A (en) End-to-end classification method for large-scale news text based on Bi-GRU and word vectors
CN106919673A (en) Text sentiment analysis system based on deep learning
CN107038480A (en) Text sentiment classification method based on convolutional neural networks
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN103853824A (en) In-text advertisement releasing method and system based on deep semantic mining
CN106096664A (en) Sentiment analysis method based on social network data
CN106294324A (en) Machine learning sentiment analyzer based on natural language parse trees
CN110287314B (en) Long text reliability assessment method and system based on unsupervised clustering
CN103593431A (en) Internet public opinion analyzing method and device
CN105740382A (en) Aspect classification method for short comment texts
CN114417851B (en) Emotion analysis method based on keyword weighted information
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN110851593A (en) Complex value word vector construction method based on position and semantics
CN113255360A (en) Document rating method and device based on hierarchical self-attention network
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-10-12