CN102402566A - Web user behavior analysis method based on Chinese webpage automatic classification technology - Google Patents
- Publication number
- CN102402566A (application number CN2011102278003A)
- Authority
- CN
- China
- Prior art keywords
- classification
- webpage
- user
- web
- chinese
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a web user behavior analysis method based on automatic classification of Chinese web pages. It uses the naive Bayes classification method, automatically inferring the category of a web page browsed by a web user from the category probability and the joint distribution probability of the feature terms, and analyzes the user's browsing habits on the basis of the web page classification to obtain a user behavior analysis result. The key technique of the invention is the construction of a dynamic training set that is updated automatically according to a classification accuracy index, so that the training set is timely and representative. The method is divided into four modules: a data processing module, a feature extraction module, a web page classification module, and a user behavior analysis module. The data processing module acquires the user's basic information and the source code of the web pages the user browses, and extracts the Chinese text from the source code. The feature extraction module screens out feature terms that describe the category characteristics of a web page and represents them in vector form.
Description
Technical field
The invention provides a web user behavior analysis method based on automatic classification of Chinese web pages. It uses the naive Bayes classification method: the category of a web page browsed by a web user is inferred automatically from the category probability and the joint distribution probability of the feature terms, and the user's browsing habits are analyzed on the basis of the web page classification to obtain a user behavior analysis result. The key technique of the invention is the construction of a dynamic training set that is updated automatically according to a classification accuracy index, which makes the training set more timely and representative. The method relates to fields such as artificial intelligence, user behavior analysis, web page classification, and network management.
Background of invention
The rapid development of the Internet has brought a sharp increase in the number of users, and users' demands on the network keep rising. Analyzing the composition of the user population and its habits and preferences, in order to provide users with more personalized services, has become an important research direction. Moreover, as services diversify, research on the Internet and on user behavior is also an important basis for network planning, design, and management.
When collecting data for user behavior analysis, we can obtain the URLs of the websites a user visits, but we do not know which category these URLs belong to, so each URL must be mapped to a concrete semantic category (such as sports, finance, or military affairs). Once a complete, accurate, and dynamic automatic web page classification system is established, the category of a page can be obtained from its URL. Knowing the categories of the visited sites, one can analyze web traffic in depth, mine users' network behavior, and learn their behavioral habits and preference trends, which provides a basis for personalized services.
Summary of the invention:
Technical problem: the invention provides a web user behavior analysis method based on automatic classification of Chinese web pages. It uses the naive Bayes classification method: the category of a web page browsed by a web user is inferred automatically from the category probability and the joint distribution probability of the feature terms, and the user's browsing habits are analyzed on the basis of the web page classification to obtain a user behavior analysis result. The key technique of the invention is the construction of a dynamic training set. An index for evaluating classification accuracy and a threshold are set; after each classification is completed, the accuracy index of the classification result is computed; if the accuracy index is greater than the threshold, the training set is updated automatically by adding the web page vector of the classified page to the corresponding category of the training set. Compared with the fixed training sets used previously, the dynamic training set is more timely and representative, which can make the classification results more accurate.
Technical solution: the invention proposes a web user behavior analysis method based on automatic classification of Chinese web pages, implemented by the following steps:
(1) Data acquisition. Information is collected according to requirements, mainly the basic information of the web user and the URLs of the web pages the user browses.
(2) Web page source extraction. The source code of the web page is obtained from its URL, and HTML tags, images, client-side scripts, and other non-Chinese content are removed, leaving only the plain Chinese text.
(3) Word segmentation. The bidirectional maximum matching method is used: by matching against the entries of a Chinese dictionary, the content of the Chinese web text is cut into a set of terms.
(4) Keyword screening. Keyword screening consists of two steps: screening by term length and removing duplicate keywords. First, term length is restricted to the range 2 to 4; terms outside this range contribute little to classification or even interfere with it, so they are discarded. Then, a term that occurs repeatedly in a text is recorded only once together with its word frequency, which improves computation speed and reduces counting errors.
(5) Determine feature terms. The relationship between a Chinese keyword in a web page and a category follows a χ² distribution, so the χ² statistic is used to determine the feature terms. First the frequency of each keyword in each category is computed, then the statistic is computed with the χ² formula, and finally the 1000 keywords with the largest statistic are selected as feature terms.
(6) Web page vector representation. The selected feature terms and their word frequencies are recorded and represented in vector form. Each element of the web page vector corresponds to a feature term, and its value is the word frequency of that feature term in the text of the web page.
(7) Web page classification with the naive Bayes method. The category probability is used as the prior probability and the joint distribution probability of the feature terms as the conditional probability; the posterior probability is then obtained from Bayes' theorem. The category with the largest posterior probability is selected as the category of the page to be classified.
(8) Update the training set. An index for evaluating the accuracy of the classification result and a threshold are set. After each classification is completed, the accuracy index of the classification result is computed; if it is greater than the threshold, the training set is updated by adding the web page vector of the classified page to the corresponding category of the training set. Otherwise, the training set is left unchanged.
(9) Web user behavior analysis. Different query conditions are constructed. By combining the user's basic information with the category information of the browsed web pages, one obtains the distribution of the types of web pages browsed by users under different conditions. From this information the behavioral habits and preference trends of web users can be derived, which helps to provide more personalized services.
Beneficial effect
With the web user behavior analysis method based on automatic classification of Chinese web pages, the following can be achieved:
(1) The training set is updated automatically according to the classification accuracy index; compared with the fixed training sets used previously, the dynamic training set is more timely and representative.
(2) With the training set updated in real time, web pages are classified automatically by the naive Bayes method, and the classification results are more accurate.
(3) Based on the web page classification results combined with the user's basic information, the behavior of web users can be mined and analyzed in greater depth, so that the analysis results come closer to the users' actual behavioral habits and preferences.
Description of drawings
Fig. 1 is a block diagram of the modules of the present invention.
Embodiment
The technical solution of the invention is described in detail below with reference to the accompanying drawing:
The invention provides a web user behavior analysis method based on automatic classification of Chinese web pages. It uses the naive Bayes classification method: the category of a web page browsed by a web user is inferred automatically from the category probability and the joint distribution probability of the feature terms, and the user's browsing habits are analyzed on the basis of the web page classification to obtain a user behavior analysis result. The concrete steps are as follows:
(1) Data acquisition. Information is collected according to requirements, mainly the basic information of the web user and the URLs of the web pages the user browses. The user's basic information comprises the user's IP address, home location, time of browsing, number of IP packet bytes received, number of IP packet bytes sent, number of IP packets received, and number of IP packets sent.
(2) Web page source extraction. Extracting the web page source means locating the web text through its URL and reading its content. The web text contains a large number of HTML tags, images, and client-side scripts alongside the text, so it must be preprocessed during extraction: the HTML tags, images, and client-side scripts are removed, and in the end only the plain Chinese text is kept.
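The preprocessing in step (2) can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `extract_chinese_text` is hypothetical, and it assumes that "Chinese text" means characters in the CJK Unified Ideographs range U+4E00 to U+9FFF, which the patent does not specify. A real HTML parser would be more robust than regular expressions.

```python
import re

# Assumption: script/style blocks are dropped together with their content.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag, e.g. <p>, <img src="...">.
_TAG = re.compile(r"<[^>]+>")
# Assumption: "Chinese" is approximated by the range U+4E00..U+9FFF.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def extract_chinese_text(html: str) -> list[str]:
    """Return the runs of Chinese characters found in an HTML source string."""
    no_scripts = _SCRIPT_STYLE.sub(" ", html)   # remove client-side scripts
    no_tags = _TAG.sub(" ", no_scripts)         # remove HTML tags (incl. images)
    # Splitting on everything non-Chinese also cuts at punctuation,
    # digits, and English text.
    return [run for run in _NON_CHINESE.split(no_tags) if run]


if __name__ == "__main__":
    page = '<html><script>var a = "测试";</script><body><p>体育新闻, 2011 NBA</p><img src="a.png">财经</body></html>'
    print(extract_chinese_text(page))  # ['体育新闻', '财经']
```

Because the split happens at every non-Chinese character, the output is already a list of Chinese character strings that a word segmenter can process one by one.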
(3) Word segmentation. Because written Chinese has no explicit separator between words, the individual terms in the text stream must be separated. With the support of a Chinese dictionary, the content of the Chinese web text is cut into a vector of terms by matching against the entries in the dictionary. The main idea is as follows:
1) Pre-segmentation. Non-Chinese symbols such as punctuation, digits, and English text are used to cut a sentence into several Chinese character strings;
2) The bidirectional maximum matching method, which combines forward maximum matching (MM) and reverse maximum matching (RMM), is used as the basic segmentation method. Both directions match the longest possible word: starting from the head of the sentence, words are cut off one after another until the Chinese character sequence to be segmented is empty. The result of each cut is the longest character string that matches successfully.
The steps of the bidirectional maximum matching method are as follows:
1. Take the first 6 Chinese characters of the current character sequence of the sentence as the matching field and look it up in the dictionary. If the dictionary contains such an entry, the match succeeds: the matching field is cut from the current character sequence as a word and put into the term set, and step 1 is executed again. Otherwise, go to step 2;
2. Remove one Chinese character from the tail of the matching field to form a new matching field and match it against the dictionary entries again. If the match succeeds, the new matching field is cut from the current character sequence as a word and put into the term set; otherwise step 2 is repeated. If matching has still failed when only a single Chinese character remains, that character is cut from the current character sequence and put into the term set;
3. If the current character sequence of the text is not empty, go to step 1; otherwise finish.
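Steps 1 to 3 above can be sketched as a forward maximum matching pass. This is a minimal sketch under stated assumptions: the function name `forward_max_match` and the toy dictionary are hypothetical, and only the forward direction is shown, because the text does not say how the forward and reverse results are combined.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 6) -> list[str]:
    """Segment a Chinese character string by forward maximum matching."""
    terms = []
    pos = 0
    while pos < len(text):                      # step 3: repeat until nothing is left
        # Step 1: start with a matching field of up to max_len characters.
        size = min(max_len, len(text) - pos)
        # Step 2: drop one trailing character at a time until the field
        # is a dictionary entry or only a single character remains.
        while size > 1 and text[pos:pos + size] not in dictionary:
            size -= 1
        terms.append(text[pos:pos + size])      # a lone character is kept as-is
        pos += size
    return terms


if __name__ == "__main__":
    toy_dictionary = {"用户", "行为", "分析", "用户行为"}   # hypothetical entries
    print(forward_max_match("用户行为分析方法", toy_dictionary))
    # ['用户行为', '分析', '方', '法']
```

The longest entry wins, so "用户行为" is cut as one term rather than as "用户" followed by "行为"; characters that match nothing fall through as single-character terms.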
(4) Keyword screening. Terms of unsuitable length and repeated terms are removed. The concrete steps are as follows:
1. Screening by term length. The length of all terms is restricted to the range 2 to 4; terms outside this range are considered to contribute little to classification or even to interfere with it, and are discarded;
2. Enforcing term uniqueness. A term that is repeated within a text is recorded only once, together with its word frequency. Every term thus appears only once in the overall vocabulary of the text, which improves computation speed and reduces counting errors.
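The two screening steps can be sketched with a frequency counter. This is a minimal sketch: the function name `screen_terms` is hypothetical, and term length is assumed to be measured in characters, which the text does not state explicitly.

```python
from collections import Counter


def screen_terms(terms: list[str], min_len: int = 2, max_len: int = 4) -> Counter:
    """Keep terms whose length is in [min_len, max_len]; record each once with its frequency."""
    # Counter stores every distinct term once, with its number of occurrences.
    return Counter(t for t in terms if min_len <= len(t) <= max_len)


if __name__ == "__main__":
    counts = screen_terms(["用户行为", "分析", "方", "法", "分析"])
    print(counts["分析"], counts["用户行为"], "方" in counts)  # 2 1 False
```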
(5) Determine feature terms. The relationship between a Chinese keyword in a web page and a category follows a χ² distribution, so the χ² statistic is used to determine the feature terms. The higher the statistic, the weaker the independence between the keyword and the category and the stronger their correlation, that is, the more the keyword contributes to that category. The χ² statistic is computed as follows:
[formula not reproduced in this text]
where $N_{ij}$ is the frequency with which keyword i occurs in category j, $N_{ij'}$ is the frequency with which keyword i occurs in the categories other than j, $N_{i'j}$ is the frequency with which all terms other than keyword i occur in category j, $N_{i'j'}$ is the frequency with which all terms other than keyword i occur in the categories other than j, and N is the total frequency of all keywords.
The feature terms are determined by the following steps:
1. Compute the frequency with which each keyword occurs in each category, then sum all the frequencies;
2. Compute the four relationship frequencies for each keyword and category pair, then obtain the χ² statistic of each keyword i for category j using the χ² formula;
3. Sort all χ² statistics in descending order and take the first 1000 keywords as feature terms, which completes the determination of the feature terms;
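The feature selection steps can be sketched as follows. Since the formula itself is not reproduced in this text, the sketch assumes the standard 2x2 contingency-table form, χ² = N (N_ij N_i'j' - N_ij' N_i'j)² / ((N_ij + N_ij')(N_i'j + N_i'j')(N_ij + N_i'j)(N_ij' + N_i'j')), which matches the four quantities defined above. Scoring a keyword by its maximum χ² over all categories is also an assumption, and the function names are hypothetical.

```python
def chi_square_scores(freq: dict[str, dict[str, int]]) -> dict[str, float]:
    """freq[category][keyword] -> frequency. Returns the best chi-square score per keyword."""
    total = sum(sum(counts.values()) for counts in freq.values())          # N
    cat_totals = {cat: sum(counts.values()) for cat, counts in freq.items()}
    term_totals: dict[str, int] = {}
    for counts in freq.values():
        for term, n in counts.items():
            term_totals[term] = term_totals.get(term, 0) + n

    scores = {}
    for term, term_total in term_totals.items():
        best = 0.0
        for cat, counts in freq.items():
            n_ij = counts.get(term, 0)            # keyword i in category j
            n_ij_ = term_total - n_ij             # keyword i in other categories
            n_i_j = cat_totals[cat] - n_ij        # other terms in category j
            n_i_j_ = total - n_ij - n_ij_ - n_i_j  # other terms in other categories
            denom = (n_ij + n_ij_) * (n_i_j + n_i_j_) * (n_ij + n_i_j) * (n_ij_ + n_i_j_)
            if denom:
                chi2 = total * (n_ij * n_i_j_ - n_ij_ * n_i_j) ** 2 / denom
                best = max(best, chi2)            # assumption: keep the max over categories
        scores[term] = best
    return scores


def select_features(freq: dict[str, dict[str, int]], k: int = 1000) -> list[str]:
    """Return the k keywords with the largest chi-square score, in descending order."""
    scores = chi_square_scores(freq)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    toy = {"sports": {"比赛": 8, "新闻": 2}, "finance": {"股票": 9, "新闻": 1}}
    print(select_features(toy, k=2))  # ['股票', '比赛']
```

In the toy data "新闻" occurs in both categories and gets a low score, so it is the keyword dropped when only two features are kept.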
(6) Web page vector representation. The selected feature terms and their word frequencies are recorded and represented in vector form. Each element of the web page vector corresponds to a feature term, and its value is the word frequency of that feature term in the text of the web page.
(7) Web page classification with the naive Bayes method. The naive Bayes classification method infers the category of a document from the category probability and the joint distribution probability of the feature terms. The category probability is used as the prior probability and the joint distribution probability of the feature terms as the conditional probability; the posterior probability is then obtained from Bayes' theorem. The category with the largest posterior probability is selected as the category of the page to be classified.
The principle and steps of the naive Bayes classification method are described in detail below.
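The vector representation in step (6) amounts to looking up each feature term's frequency in a fixed order. A minimal sketch, with the hypothetical name `to_vector` and a toy feature list:

```python
def to_vector(term_counts: dict[str, int], features: list[str]) -> list[int]:
    """Map a page's term frequencies onto the fixed, ordered list of feature terms."""
    # Feature terms absent from the page get frequency 0.
    return [term_counts.get(f, 0) for f in features]


if __name__ == "__main__":
    features = ["用户行为", "股票", "分析"]          # hypothetical feature terms
    print(to_vector({"分析": 2, "用户行为": 1}, features))  # [1, 0, 2]
```

Keeping the feature order fixed means every page vector has the same length and the same meaning per position, which the classifier relies on.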
Let $C = \{c_1, c_2, \ldots, c_k\}$ be the set of categories and $D = \{d_1, d_2, \ldots, d_n\}$ the training set, where $n_j$ denotes the number of documents in the training set that belong to category $c_j$, $n$ denotes the total number of training samples, and $d$ is the document to be classified. Using Laplace probability estimation, the prior probability $P(c_j)$ of category $c_j$ is obtained as follows:
[formula not reproduced in this text]
The document $d$ to be classified consists of the feature terms it contains, i.e. $d = (w_1, w_2, \ldots, w_m)$. Laplace probability estimation is used to compute the probability that feature term $w_i$ belongs to category $c_j$, and the conditional probability $P(d \mid c_j)$ then follows from the feature independence assumption, as follows:
[formula not reproduced in this text]
where $TF(w_i \mid c_j)$ denotes the total word frequency of feature term $w_i$ over all documents of category $c_j$, and $V$ denotes the total number of feature terms in the web page vector.
Because $P(d)$ is the same constant for every category, by Bayes' theorem selecting the category with the largest posterior probability is equivalent to selecting the category that maximizes the product $P(c_j) P(d \mid c_j)$.
The concrete steps are as follows:
1. Count the number of documents in the training set that belong to each category, and the total number of samples in the training set;
2. Compute the prior probability with the prior probability formula;
3. Count the feature terms in the text to be classified, and compute the word frequency of each feature term in each category;
4. Compute the conditional probability for each category with the conditional probability formula;
5. For each category, compute the product of the prior probability and the conditional probability;
6. Select the category with the largest product of prior probability and conditional probability as the category of the document to be classified.
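The six steps can be sketched as follows. The prior and conditional formulas are not reproduced in this text, so the sketch assumes the usual Laplace-smoothed estimates, P(c_j) = (1 + n_j) / (k + n) and P(w_i | c_j) = (1 + TF(w_i | c_j)) / (V + sum over s of TF(w_s | c_j)). It also assumes that each feature term contributes once per occurrence (the multinomial model) and works with logarithms to avoid underflow; the function names `train` and `classify` are hypothetical.

```python
import math


def train(training_set: list[tuple[list[int], str]], categories: list[str]):
    """training_set holds (web page vector, category) pairs. Returns log prior and log conditional tables."""
    n, k = len(training_set), len(categories)
    v = len(training_set[0][0])                       # V: number of feature terms
    doc_counts = {c: 0 for c in categories}           # n_j
    tf = {c: [0] * v for c in categories}             # TF(w_i | c_j)
    for vector, cat in training_set:
        doc_counts[cat] += 1
        for i, count in enumerate(vector):
            tf[cat][i] += count
    # Assumed Laplace estimate of the prior: (1 + n_j) / (k + n).
    log_prior = {c: math.log((1 + doc_counts[c]) / (k + n)) for c in categories}
    # Assumed Laplace estimate of the per-term conditional probability.
    log_cond = {}
    for c in categories:
        denom = v + sum(tf[c])
        log_cond[c] = [math.log((1 + x) / denom) for x in tf[c]]
    return log_prior, log_cond


def classify(vector: list[int], log_prior, log_cond):
    """Return (best category, log of P(c_j) * P(d | c_j) for every category)."""
    scores = {
        c: log_prior[c] + sum(cnt * lp for cnt, lp in zip(vector, log_cond[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    data = [([3, 0, 1], "sports"), ([2, 0, 0], "sports"), ([0, 4, 1], "finance")]
    prior, cond = train(data, ["sports", "finance"])
    print(classify([2, 0, 1], prior, cond)[0])  # sports
```

Maximizing the sum of logarithms selects the same category as maximizing the product of prior and conditional probability, because the logarithm is monotonic.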
(8) Update the training set. First, an index for evaluating classification accuracy and a threshold are set. The accuracy index ES is computed as follows:
[formula not reproduced in this text]
where $P(c_i) P(d \mid c_i)$ is the product of the prior probability and the conditional probability of the category assigned to the document being classified, and $P(c_s) P(d \mid c_s)$ is the product of the prior probability and the conditional probability of each of the other categories.
The threshold Threshold is computed as follows:
[formula not reproduced in this text]
where $n$ denotes the total number of training samples, $n_i$ denotes the number of documents in the training set that belong to category $c_i$, and $P(c_i) P(d \mid c_i)$ is the product of the prior probability and the conditional probability of each category.
After each classification is completed, the accuracy index of the classification result is computed with the formula. If the accuracy index ES of the classification result is greater than the threshold Threshold, the training set is updated by adding the web page vector of the classified page, together with its category, to the training set. Otherwise, the training set is left unchanged.
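The update rule itself is a simple comparison. Because the formulas for ES and Threshold are not reproduced in this text, the sketch below takes both values as inputs instead of computing them; the function name `update_training_set` and the numbers in the example are hypothetical.

```python
def update_training_set(
    training_set: list[tuple[list[int], str]],
    vector: list[int],
    category: str,
    es: float,
    threshold: float,
) -> bool:
    """Append (vector, category) to the training set only if ES exceeds the threshold."""
    # es and threshold must be computed elsewhere; their formulas are
    # not available in this text.
    if es > threshold:
        training_set.append((vector, category))
        return True
    return False          # otherwise the training set stays unchanged


if __name__ == "__main__":
    data = [([3, 0, 1], "sports")]
    print(update_training_set(data, [2, 0, 1], "sports", es=0.9, threshold=0.5))  # True
    print(len(data))  # 2
```

After a successful update, the prior and conditional probabilities would need to be recomputed from the enlarged training set before the next classification.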
(9) Web user behavior analysis. Different query conditions are constructed and, by combining the user's basic information with the category information of the browsed web pages, the user's behavioral habits are analyzed. The user's basic information comprises the user's IP address, home location, time of browsing, number of IP packet bytes received, number of IP packet bytes sent, number of IP packets received, and number of IP packets sent; adding the web page category gives 8 independent conditions. By the principle of permutations and combinations, combining these 8 independent conditions yields $2^8$ compound conditions. After the conditions with no practical meaning are removed, 26 conditions of practical value remain. Querying by these 26 conditions gives the distribution of the types of web pages browsed by users under different conditions. From this information the behavioral habits and preference trends of web users can be derived, which helps to provide more personalized services.
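The count of compound conditions can be checked by enumerating every subset of the 8 independent conditions. This is a minimal sketch: the English field names are hypothetical labels for the conditions listed above, and the text does not say which 26 combinations are kept, so no filtering is shown.

```python
from itertools import combinations

# Hypothetical labels for the 8 independent conditions named in the text.
CONDITIONS = [
    "ip_address", "home_location", "browse_time",
    "bytes_received", "bytes_sent",
    "packets_received", "packets_sent",
    "page_category",
]


def compound_conditions(conditions: list[str]) -> list[tuple[str, ...]]:
    """Enumerate every subset of the conditions, including the empty one."""
    return [
        combo
        for r in range(len(conditions) + 1)
        for combo in combinations(conditions, r)
    ]


if __name__ == "__main__":
    print(len(compound_conditions(CONDITIONS)))  # 256
```

The figure of 2^8 = 256 includes the empty combination, which corresponds to a query with no condition at all.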
Claims (1)
1. A web user behavior analysis method based on automatic classification of Chinese web pages, characterized in that the method comprises the following steps:
(1) Data acquisition. Information is collected according to requirements, mainly the basic information of the web user and the URLs of the web pages the user browses.
(2) Web page source extraction. The source code of the web page is obtained from its URL, and HTML tags, images, client-side scripts, and other non-Chinese content are removed, leaving only the plain Chinese text.
(3) Word segmentation. The bidirectional maximum matching method is used: by matching against the entries of a Chinese dictionary, the content of the Chinese web text is cut into a set of terms.
(4) Keyword screening. Keyword screening consists of two steps: screening by term length and removing duplicate keywords. First, term length is restricted to the range 2 to 4; terms outside this range contribute little to classification or even interfere with it, so they are discarded. Then, a term that occurs repeatedly in a text is recorded only once together with its word frequency, which improves computation speed and reduces counting errors.
(5) Determine feature terms. The relationship between a Chinese keyword in a web page and a category follows a χ² distribution, so the χ² statistic is used to determine the feature terms. First the frequency of each keyword in each category is computed, then the statistic is computed with the χ² formula, and finally the 1000 keywords with the largest statistic are selected as feature terms.
(6) Web page vector representation. The selected feature terms and their word frequencies are recorded and represented in vector form. Each element of the web page vector corresponds to a feature term, and its value is the word frequency of that feature term in the text of the web page.
(7) Web page classification with the naive Bayes method. The category probability is used as the prior probability and the joint distribution probability of the feature terms as the conditional probability; the posterior probability is then obtained from Bayes' theorem. The category with the largest posterior probability is selected as the category of the page to be classified.
(8) Update the training set. An index for evaluating the accuracy of the classification result and a threshold are set. After each classification is completed, the accuracy index of the classification result is computed; if it is greater than the threshold, the training set is updated by adding the web page vector of the classified page to the corresponding category of the training set. Otherwise, the training set is left unchanged.
(9) Web user behavior analysis. Different query conditions are constructed. By combining the user's basic information with the category information of the browsed web pages, one obtains the distribution of the types of web pages browsed by users under different conditions. From this information the behavioral habits and preference trends of web users can be derived, which helps to provide more personalized services.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102278003A CN102402566A (en) | 2011-08-09 | 2011-08-09 | Web user behavior analysis method based on Chinese webpage automatic classification technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102278003A CN102402566A (en) | 2011-08-09 | 2011-08-09 | Web user behavior analysis method based on Chinese webpage automatic classification technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102402566A true CN102402566A (en) | 2012-04-04 |
Family
ID=45884777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011102278003A Pending CN102402566A (en) | 2011-08-09 | 2011-08-09 | Web user behavior analysis method based on Chinese webpage automatic classification technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102402566A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120404 |