
CN104346411A - Method and equipment for clustering multiple manuscripts - Google Patents

Info

Publication number
CN104346411A
Authority
CN
China
Prior art keywords
manuscript
contribution
manuscripts
keywords
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310346857.4A
Other languages
Chinese (zh)
Other versions
CN104346411B (en)
Inventor
王露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201310346857.4A
Publication of CN104346411A
Application granted
Publication of CN104346411B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and equipment for clustering multiple manuscripts. The method comprises the steps of establishing a manuscript classification space according to the classes of a news classification method; extracting keywords from each manuscript; establishing manuscript coordinates according to the frequencies of the extracted keywords so as to map each manuscript to a point in the manuscript classification space; calculating the distances between the manuscripts; and assigning manuscripts whose distance is smaller than a distance threshold to the same class. With this method, a large number of news manuscripts can be clustered automatically, which saves manpower.

Description

Method and equipment for clustering multiple manuscripts
Technical field
The present application relates to a method and equipment for clustering multiple manuscripts.
Background
In today's society, the volume of information is growing geometrically. Large numbers of records, research documents, and manuscripts are produced every day in fields such as news, history, and science and technology, and these manuscripts sometimes need to be classified.
For example, newspaper offices and news websites receive a large number of news manuscripts every day and may need to classify them in order to report more accurately. Because news manuscripts are highly time-sensitive, classifying them as early as possible is very important. If all manuscripts are classified manually, the workload is heavy and the timeliness of the news is difficult to guarantee. If a large number of news manuscripts are first divided into a few classes by automatic clustering and then adjusted manually, a great deal of manual work can be saved.
Therefore, there is a need for a method and equipment for automatically clustering multiple manuscripts.
Summary of the invention
To solve the above problem, the present application provides a method and equipment for clustering multiple manuscripts, so that a large number of news manuscripts can be clustered automatically and manpower can be saved.
According to a first aspect of the present application, a method for clustering multiple manuscripts is provided, comprising:
establishing a manuscript classification space according to the classes of a news classification method;
extracting keywords from each manuscript;
establishing manuscript coordinates according to the frequencies of the extracted keywords, thereby mapping each manuscript to a point in the manuscript classification space;
calculating the distances, in the manuscript classification space, between the point to which a first manuscript among the multiple manuscripts is mapped and the points to which the other manuscripts are mapped, and determining whether any of the calculated distances is less than a predetermined first distance threshold; and
if so, determining that each manuscript corresponding to a distance less than the first distance threshold belongs to the same first class as the first manuscript.
According to a second aspect of the present application, equipment for clustering multiple manuscripts is provided, comprising:
an establishing module configured to establish a manuscript classification space according to the classes of a news classification method;
an extraction module configured to extract keywords from each manuscript;
a mapping module configured to establish manuscript coordinates according to the frequencies of the extracted keywords, thereby mapping each manuscript to a point in the manuscript classification space;
a calculation module configured to calculate the distances, in the manuscript classification space, between the point to which a first manuscript among the multiple manuscripts is mapped and the points to which the other manuscripts are mapped, and to determine whether any of the calculated distances is less than a predetermined first distance threshold; and
a clustering module configured to determine that each manuscript corresponding to a distance less than the first distance threshold belongs to the same first class as the first manuscript.
Brief description of the drawings
Fig. 1 is a flowchart of a method for clustering multiple manuscripts according to an embodiment of the present application; and
Fig. 2 is a schematic diagram of equipment for clustering multiple manuscripts according to an embodiment of the present application.
Detailed description of embodiments
The method and equipment for clustering multiple manuscripts according to embodiments of the present application are described in detail below with reference to the embodiments and the accompanying drawings.
In this application, "news classification method" refers to a method of classifying manuscripts by type of news. For example, manuscripts can be classified as finance, sports, science and technology, politics, entertainment, and so on; the sports class can be further divided into football, basketball, tennis, gymnastics, and so on.
In this application, "manuscript classification space" refers to the space whose dimensions are the classes defined by the news classification method.
In this application, an entry in the "irrelevant lexicon" is a term that appears frequently in manuscripts but is irrelevant to classification under the news classification method, for example "we" or "but".
In this application, "clustering" refers to grouping manuscripts that have a certain correlation into the same class, for example dividing multiple manuscripts into a politics class, an entertainment class, and so on.
First, a method for clustering multiple manuscripts according to an embodiment of the present application is described with reference to Fig. 1.
In step 101, a manuscript classification space is established according to the classes of the news classification method.
In an exemplary embodiment, the news classification method can include multiple classes such as finance, sports, science and technology, politics, and entertainment. In some embodiments, each of these classes can serve as one dimension of the manuscript classification space. For example, if the news classification method includes n classes, the manuscript classification space can be an n-dimensional space, and the coordinates of a point in it can be expressed as (T1, W1, T2, W2, ..., Tn, Wn), where Ti is the i-th dimension of the manuscript classification space, corresponding to the i-th class of the news classification method, and Wi is the weight of Ti.
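The coordinate form described for step 101 can be illustrated with a minimal Python sketch. The class list and the helper names `empty_coordinate` and `as_pairs` are assumptions made for illustration only; the application does not prescribe any particular data structure.

```python
# Hypothetical class list; the application names these classes only as examples.
CATEGORIES = ["finance", "sports", "technology", "politics", "entertainment"]

def empty_coordinate(categories):
    """Return one weight per class (dimension), all initialised to zero."""
    return {category: 0 for category in categories}

def as_pairs(coordinate, categories):
    """Flatten a coordinate into alternating (dimension, weight) entries."""
    flat = []
    for category in categories:
        flat.extend([category, coordinate[category]])
    return tuple(flat)

point = empty_coordinate(CATEGORIES)
point["sports"] = 14  # illustrative weight for the sports dimension
print(as_pairs(point, CATEGORIES))
```

Keeping the class order in a separate list makes the flattened form stable, so that two manuscripts can later be compared dimension by dimension.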
In step 102, keywords are extracted from each manuscript.
In some embodiments, step 102 can include: calculating the frequency with which each entry appears in the manuscript and taking the entries whose frequency is higher than a preset frequency threshold (for example, five occurrences) as candidate keywords; and removing irrelevant candidate keywords (for example, common words such as "we" and "but") to obtain the keywords.
In an exemplary embodiment, an irrelevant lexicon can be preset. It records words that appear frequently in manuscripts but cannot be used for classification under the news classification method, such as "we" and "but". For example, after the candidate keywords "we" and "football" are obtained, the entry "we" is found in the preset irrelevant lexicon and is therefore removed from the candidate keywords, whereas the entry "football" is not found in the preset irrelevant lexicon and is therefore determined to be a keyword.
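The two-stage extraction of step 102 might be sketched as follows. This is a minimal sketch under stated assumptions: the manuscript is assumed to be already split into terms (word segmentation is not described in the application), and the lexicon contents, the function name `extract_keywords`, and the default threshold of five are illustrative.

```python
from collections import Counter

# Hypothetical irrelevant lexicon; the application gives "we" and "but" as examples.
IRRELEVANT_TERMS = {"we", "but"}

def extract_keywords(terms, frequency_threshold=5, irrelevant_terms=IRRELEVANT_TERMS):
    """Return {keyword: frequency} for one manuscript given as a list of terms."""
    counts = Counter(terms)
    # Stage 1: entries that occur more often than the threshold become candidates.
    candidates = {t: n for t, n in counts.items() if n > frequency_threshold}
    # Stage 2: candidates found in the irrelevant lexicon are removed.
    return {t: n for t, n in candidates.items() if t not in irrelevant_terms}

terms = ["we"] * 9 + ["football"] * 8 + ["referee"] * 2
keywords = extract_keywords(terms)
print(keywords)
```

Here "we" passes the frequency test but is removed by the lexicon check, "referee" never becomes a candidate, and only "football" remains.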
In step 103, manuscript coordinates are established according to the frequencies of the extracted keywords, so that each manuscript is mapped to a point in the manuscript classification space.
In an exemplary embodiment, establishing the manuscript coordinates according to the frequencies of the extracted keywords can include: assigning each keyword extracted from the manuscript to a class of the news classification method, that is, to the corresponding dimension Ti of the manuscript classification space; and calculating the sum of the frequencies of all keywords in the manuscript that belong to that class and using this frequency sum as the manuscript's value Wi in that dimension, so that each manuscript is mapped to a point in the manuscript classification space.
For example, suppose "football" appears 8 times, "tennis" 6 times, and "dollar" 6 times in a manuscript. Under the news classification method, "football" and "tennis" both belong to the sports class and "dollar" belongs to the economy class. In the manuscript classification space, the manuscript's value in the sports dimension is therefore 8 + 6 = 14, and its value in the economy dimension is 6. In this way, each manuscript can be mapped to a point in the manuscript classification space.
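The football, tennis, and dollar example can be reproduced with a short sketch. The keyword-to-class table `KEYWORD_CATEGORY` and the function name `manuscript_coordinate` are hypothetical; how keywords are assigned to classes in practice is not detailed in the application.

```python
# Hypothetical keyword-to-class table for the example in the text.
KEYWORD_CATEGORY = {"football": "sports", "tennis": "sports", "dollar": "economy"}
CATEGORIES = ["economy", "sports"]

def manuscript_coordinate(keyword_counts, keyword_category, categories):
    """Sum keyword frequencies per class to obtain the value in each dimension."""
    coordinate = {category: 0 for category in categories}
    for keyword, count in keyword_counts.items():
        category = keyword_category.get(keyword)
        if category in coordinate:  # keywords without a known class are skipped
            coordinate[category] += count
    return coordinate

counts = {"football": 8, "tennis": 6, "dollar": 6}
coordinate = manuscript_coordinate(counts, KEYWORD_CATEGORY, CATEGORIES)
print(coordinate)
```

The result has 14 in the sports dimension (8 + 6) and 6 in the economy dimension, matching the worked example.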
In an exemplary embodiment, when the frequency values W1 through Wn of a manuscript in every dimension are all less than the frequency threshold, the manuscript is placed in "unclassified".
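The "unclassified" rule amounts to a single check over all dimensions. In the sketch below, the function name `is_unclassified` is illustrative, and reusing the value five from the keyword-extraction example as the threshold is an assumption; the application does not state which threshold value applies here.

```python
def is_unclassified(coordinate, frequency_threshold=5):
    """True when the manuscript's value in every dimension is below the threshold."""
    return all(weight < frequency_threshold for weight in coordinate.values())

print(is_unclassified({"sports": 2, "economy": 0}))   # every value is below 5
print(is_unclassified({"sports": 14, "economy": 6}))  # at least one value reaches 5
```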
In step 104, the distances, in the manuscript classification space, between the point to which a first manuscript among the multiple manuscripts is mapped and the points to which the other manuscripts are mapped are calculated, and it is determined whether any of the calculated distances is less than a predetermined first distance threshold.
In an exemplary embodiment, the first manuscript can be determined according to the time at which each manuscript was created; for example, the earliest manuscript can be taken as the first manuscript.
In another exemplary embodiment, a manuscript is selected at random from the multiple manuscripts as the first manuscript.
In another exemplary embodiment, a staff member selects a manuscript from the multiple manuscripts as the first manuscript.
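The three ways of choosing the first manuscript can be sketched as one selection function. The record layout (dictionaries with `id` and `created` fields) and the names `pick_first_manuscript`, `strategy`, and `staff_choice` are assumptions for illustration and do not come from the application.

```python
import random
from datetime import datetime

def pick_first_manuscript(manuscripts, strategy="earliest", rng=None, staff_choice=None):
    """Pick the first manuscript by creation time, at random, or by staff choice."""
    if not manuscripts:
        raise ValueError("no manuscripts to choose from")
    if strategy == "earliest":
        return min(manuscripts, key=lambda m: m["created"])
    if strategy == "random":
        return (rng or random).choice(manuscripts)
    if strategy == "staff":
        for manuscript in manuscripts:
            if manuscript["id"] == staff_choice:
                return manuscript
        raise ValueError("staff choice is not among the manuscripts")
    raise ValueError(f"unknown strategy: {strategy}")

manuscripts = [
    {"id": "a", "created": datetime(2013, 8, 9, 10, 30)},
    {"id": "b", "created": datetime(2013, 8, 9, 8, 15)},
    {"id": "c", "created": datetime(2013, 8, 9, 9, 0)},
]
print(pick_first_manuscript(manuscripts)["id"])
```

Passing a seeded `random.Random` instance as `rng` makes the random strategy reproducible, which can be convenient when the clustering result needs to be compared across runs.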
In step 105, if it is determined that any of the calculated distances is less than the predetermined first distance threshold, each manuscript corresponding to a distance less than the first distance threshold is determined to belong to the same first class as the first manuscript.
In an exemplary embodiment, the first distance threshold can be set by an operator according to actual needs. For example, if the first distance threshold is set relatively small, the manuscripts within one class are relatively strongly correlated; conversely, if the first distance threshold is set relatively large, the manuscripts within one class are relatively weakly correlated.
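Steps 104 and 105 can be sketched as below. The application does not name a distance metric, so Euclidean distance (via `math.dist`, available from Python 3.8) is an assumption here, as are the function name `first_class` and the sample points.

```python
import math

def first_class(points, first_index, threshold):
    """Return indices of the first manuscript and of every manuscript closer than threshold."""
    members = [first_index]
    for index, point in enumerate(points):
        if index == first_index:
            continue
        # Assumed metric: Euclidean distance in the manuscript classification space.
        if math.dist(points[first_index], point) < threshold:
            members.append(index)
    return members

# Points as (sports, economy) values; purely illustrative numbers.
points = [(14, 6), (12, 5), (0, 20)]
print(first_class(points, first_index=0, threshold=5.0))
print(first_class(points, first_index=0, threshold=25.0))
```

With a threshold of 5.0 only the second point joins the first manuscript's class; raising the threshold to 25.0 also admits the third point, which illustrates how a larger threshold yields looser classes.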
In some embodiments, the method for clustering multiple manuscripts can further include the following steps:
determining whether the multiple manuscripts include manuscripts that have not been assigned to the first class and, if so, selecting a second manuscript from among them and calculating the distances, in the manuscript classification space, between the point to which the second manuscript is mapped and the points to which the other manuscripts not assigned to the first class are mapped;
determining whether any of the calculated distances is less than a predetermined second distance threshold; and
if so, determining that each manuscript corresponding to a distance less than the second distance threshold belongs to the same second class as the second manuscript.
In some embodiments, similar steps can be repeated until all manuscripts are classified. That is: it is determined whether the multiple manuscripts include manuscripts that have not been assigned to the first class or the second class; if so, a third manuscript is selected from among them, and the distances, in the manuscript classification space, between the point to which the third manuscript is mapped and the points to which the other manuscripts not assigned to the first class or the second class are mapped are calculated; it is determined whether any of the calculated distances is less than a predetermined third distance threshold; and if so, each manuscript corresponding to a distance less than the third distance threshold is determined to belong to the same third class as the third manuscript. This continues until all manuscripts are classified.
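The repeated selection of a second, third, and further manuscript can be sketched as a loop over the manuscripts that are still unassigned. Several choices below are assumptions rather than part of the application: Euclidean distance, a single threshold reused for every round (the text allows separate first, second, and third thresholds), taking the first unassigned manuscript as the next seed, and letting a seed with no close neighbours form a class of its own.

```python
import math

def cluster(points, threshold):
    """Group point indices into classes by repeatedly seeding from unassigned points."""
    unassigned = list(range(len(points)))
    classes = []
    while unassigned:
        seed = unassigned.pop(0)  # assumed rule: next seed is the first unassigned point
        members = [seed]
        remaining = []
        for index in unassigned:
            if math.dist(points[seed], points[index]) < threshold:
                members.append(index)
            else:
                remaining.append(index)
        classes.append(members)
        unassigned = remaining
    return classes

points = [(14, 6), (0, 20), (12, 5), (1, 19)]
print(cluster(points, threshold=5.0))
```

Each manuscript is compared only with the seed of the current round, so the number of distance computations stays at most proportional to the number of manuscripts times the number of classes.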
With the above method for clustering multiple manuscripts according to embodiments of the present application, a large number of manuscripts can be clustered automatically, which saves manpower.
Next, equipment for clustering multiple manuscripts according to an embodiment of the present application is described with reference to Fig. 2.
As shown in the figure, the equipment can include the following components.
An establishing module 201, which can establish the manuscript classification space according to the classes of the news classification method. In some embodiments, the establishing module 201 can establish the manuscript classification space according to the classes used in past classification of entries.
An extraction module 202, which can extract keywords from each manuscript. In some embodiments, the extraction module 202 can include a counting component and a deleting component. The counting component can count the frequency with which each entry appears in the manuscript and take the entries whose frequency is higher than the frequency threshold as candidate keywords. The deleting component can delete irrelevant entries from the candidate keywords to obtain the keywords. In an exemplary embodiment, the deleting component can determine whether a candidate keyword is present in the preset irrelevant lexicon; if so, the candidate keyword is deleted, and the remaining candidate keywords are taken as keywords.
A mapping module 203, which can establish manuscript coordinates according to the frequencies of the extracted keywords, so that each manuscript is mapped to a point in the manuscript classification space. In some embodiments, the mapping module 203 can include an attribution component and a summing component. The attribution component can assign each extracted keyword to a class of the news classification method, that is, to a dimension of the manuscript classification space. The summing component can calculate the sum of the frequencies with which all keywords in a class (that is, in a given dimension) appear in the manuscript, and use this frequency sum as the manuscript's value in the dimension of the manuscript classification space corresponding to that class.
A calculation module 204, configured to calculate the distances, in the manuscript classification space, between the point to which a first manuscript among the multiple manuscripts is mapped and the points to which the other manuscripts are mapped, and to determine whether any of the calculated distances is less than a predetermined first distance threshold.
A clustering module 205, configured to determine that each manuscript corresponding to a distance less than the first distance threshold belongs to the same first class as the first manuscript. In some embodiments, the clustering module 205 can also determine whether the multiple manuscripts include manuscripts that have not been assigned to the first class; if so, select a second manuscript from among them and calculate the distances, in the manuscript classification space, between the point to which the second manuscript is mapped and the points to which the other manuscripts not assigned to the first class are mapped; determine whether any of the calculated distances is less than a predetermined second distance threshold; and, if so, determine that each manuscript corresponding to a distance less than the second distance threshold belongs to the same second class as the second manuscript. This continues until all manuscripts are classified.
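One possible way to compose modules 201 to 205 in software is sketched below. It is a sketch under stated assumptions, not the equipment as claimed: the class name `ClusteringEquipment`, its method names, the use of Euclidean distance, a single distance threshold for all rounds, and manuscripts supplied as pre-split term lists are all illustrative choices.

```python
import math
from collections import Counter

class ClusteringEquipment:
    """Hypothetical composition of modules 201-205; all names are illustrative."""

    def __init__(self, keyword_category, irrelevant_terms,
                 frequency_threshold, distance_threshold):
        self.keyword_category = keyword_category
        self.irrelevant_terms = irrelevant_terms
        self.frequency_threshold = frequency_threshold
        self.distance_threshold = distance_threshold

    def establish_space(self):
        # Module 201: one dimension per class known to the keyword table.
        return sorted(set(self.keyword_category.values()))

    def extract_keywords(self, terms):
        # Module 202: counting component, then deleting component.
        counts = Counter(terms)
        return {t: n for t, n in counts.items()
                if n > self.frequency_threshold and t not in self.irrelevant_terms}

    def map_to_point(self, keywords, dimensions):
        # Module 203: attribution component, then summing component.
        sums = dict.fromkeys(dimensions, 0)
        for term, count in keywords.items():
            category = self.keyword_category.get(term)
            if category in sums:
                sums[category] += count
        return tuple(sums[d] for d in dimensions)

    def distances_from(self, points, seed, others):
        # Module 204: assumed Euclidean distance from the seed to each other point.
        return {i: math.dist(points[seed], points[i]) for i in others}

    def cluster(self, points):
        # Module 205: repeat seeding until every manuscript has a class.
        unassigned = list(range(len(points)))
        classes = []
        while unassigned:
            seed, rest = unassigned[0], unassigned[1:]
            distances = self.distances_from(points, seed, rest)
            members = [seed] + [i for i in rest if distances[i] < self.distance_threshold]
            classes.append(members)
            unassigned = [i for i in rest if i not in members]
        return classes

    def run(self, manuscripts):
        dimensions = self.establish_space()
        points = [self.map_to_point(self.extract_keywords(terms), dimensions)
                  for terms in manuscripts]
        return self.cluster(points)

equipment = ClusteringEquipment(
    keyword_category={"football": "sports", "tennis": "sports", "dollar": "economy"},
    irrelevant_terms={"we", "but"},
    frequency_threshold=5,
    distance_threshold=5.0,
)
manuscripts = [
    ["football"] * 8 + ["tennis"] * 6 + ["dollar"] * 6 + ["we"] * 10,
    ["dollar"] * 12 + ["we"] * 7,
    ["football"] * 7 + ["tennis"] * 6 + ["dollar"] * 7,
]
print(equipment.run(manuscripts))
```

In this example the first and third manuscripts map to nearby points and share a class, while the second, which mentions only "dollar", ends up in a class of its own.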
It should be understood that the above embodiments are merely exemplary and are not intended to limit the scope of the present application. Those skilled in the art can make various modifications and improvements without departing from the spirit and essence of the present application, and such modifications and improvements shall also be regarded as falling within the scope of protection of the present application.

Claims (10)

1. A method for clustering a plurality of manuscripts, comprising:
establishing a manuscript classification space according to the classes of a news classification method;
extracting keywords from each manuscript;
establishing manuscript coordinates according to the frequencies of the extracted keywords, thereby mapping each manuscript to a point in the manuscript classification space;
calculating the distances, in the manuscript classification space, between the point to which a first manuscript among the plurality of manuscripts is mapped and the points to which the other manuscripts are mapped, and determining whether any of the calculated distances is less than a predetermined first distance threshold; and
if so, determining that each manuscript corresponding to a distance less than the first distance threshold belongs to the same first class as the first manuscript.
2. The method according to claim 1, wherein the step of extracting keywords from each manuscript comprises:
calculating the frequency with which each entry appears in the manuscript, and taking entries whose frequency is higher than a frequency threshold as candidate keywords; and
removing irrelevant entries from the candidate keywords to obtain the keywords.
3. The method according to claim 2, wherein the step of removing irrelevant entries from the candidate keywords to obtain the keywords comprises:
determining whether a candidate keyword is present in a preset irrelevant lexicon; and
if so, removing the candidate keyword; if not, taking the candidate keyword as a keyword.
4. The method according to claim 1, wherein the step of establishing a manuscript classification space according to the classes of a news classification method comprises:
taking each class of the news classification method as a dimension of the manuscript classification space.
5. The method according to claim 1, wherein the step of establishing manuscript coordinates according to the frequencies of the extracted keywords comprises:
assigning the extracted keywords to the classes of the news classification method; and
calculating the sum of the frequencies with which all keywords in a class appear in the manuscript, and using the frequency sum as the manuscript's value in the dimension of the manuscript classification space corresponding to that class.
6. The method according to claim 1, wherein the first manuscript is determined according to the time at which the manuscripts were created, or is selected at random, or is selected by a staff member.
7. The method according to claim 6, further comprising:
determining whether the plurality of manuscripts include manuscripts that have not been assigned to the first class and, if so, selecting a second manuscript from among them and calculating the distances, in the manuscript classification space, between the point to which the second manuscript is mapped and the points to which the other manuscripts not assigned to the first class are mapped;
determining whether any of the calculated distances is less than a predetermined second distance threshold; and
if so, determining that each manuscript corresponding to a distance less than the second distance threshold belongs to the same second class as the second manuscript.
8. Equipment for clustering a plurality of manuscripts, comprising:
an establishing module configured to establish a manuscript classification space according to the classes of a news classification method;
an extraction module configured to extract keywords from each manuscript;
a mapping module configured to establish manuscript coordinates according to the frequencies of the extracted keywords, thereby mapping each manuscript to a point in the manuscript classification space;
a calculation module configured to calculate the distances, in the manuscript classification space, between the point to which a first manuscript among the plurality of manuscripts is mapped and the points to which the other manuscripts are mapped, and to determine whether any of the calculated distances is less than a predetermined first distance threshold; and
a clustering module configured to determine that each manuscript corresponding to a distance less than the first distance threshold belongs to the same first class as the first manuscript.
9. The equipment according to claim 8, wherein the extraction module comprises:
a counting component configured to count the frequency with which each entry appears in the manuscript and to take entries whose frequency is higher than a frequency threshold as candidate keywords; and
a deleting component configured to delete irrelevant entries from the candidate keywords to obtain the keywords.
10. The equipment according to claim 8, wherein the mapping module comprises:
an attribution component configured to assign the extracted keywords to the classes of the news classification method; and
a summing component configured to calculate the sum of the frequencies with which all keywords in a class appear in the manuscript, and to use the frequency sum as the manuscript's value in the dimension of the manuscript classification space corresponding to that class.
CN201310346857.4A 2013-08-09 2013-08-09 Method and equipment for clustering multiple manuscripts Expired - Fee Related CN104346411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310346857.4A CN104346411B (en) 2013-08-09 2013-08-09 Method and equipment for clustering multiple manuscripts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310346857.4A CN104346411B (en) 2013-08-09 2013-08-09 Method and equipment for clustering multiple manuscripts

Publications (2)

Publication Number Publication Date
CN104346411A true CN104346411A (en) 2015-02-11
CN104346411B CN104346411B (en) 2018-11-06

Family

ID=52502023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310346857.4A Expired - Fee Related CN104346411B (en) 2013-08-09 2013-08-09 Method and equipment for clustering multiple manuscripts

Country Status (1)

Country Link
CN (1) CN104346411B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243118A (en) * 2015-09-29 2016-01-13 武汉传神信息技术有限公司 Manuscript data classification method
CN105760526A (en) * 2016-03-01 2016-07-13 网易(杭州)网络有限公司 News classification method and device
CN108536695A (en) * 2017-03-02 2018-09-14 北京嘀嘀无限科技发展有限公司 A kind of polymerization and device of geographical location information point
CN109063983A (en) * 2018-07-18 2018-12-21 北京航空航天大学 A kind of natural calamity loss real time evaluating method based on social media data
CN111209390A (en) * 2020-01-06 2020-05-29 北大方正集团有限公司 News presentation method and system, computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050071365A1 (en) * 2003-09-26 2005-03-31 Jiang-Liang Hou Method for keyword correlation analysis
CN101079026A (en) * 2007-07-02 2007-11-28 北京百问百答网络技术有限公司 Text similarity, acceptation similarity calculating method and system and application system
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News Event Detection Method Based on Metadata Analysis
CN101694670A (en) * 2009-10-20 2010-04-14 北京航空航天大学 Chinese Web document online clustering method based on common substrings
CN102346753A (en) * 2010-08-01 2012-02-08 青岛理工大学 Semi-supervised text clustering method and device fusing pairwise constraints and keywords
CN102855312A (en) * 2012-08-24 2013-01-02 武汉大学 Domain-and-theme-oriented Web service clustering method

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243118A (en) * 2015-09-29 2016-01-13 武汉传神信息技术有限公司 Manuscript data classification method
CN105760526A (en) * 2016-03-01 2016-07-13 网易(杭州)网络有限公司 News classification method and device
CN105760526B (en) * 2016-03-01 2019-05-07 网易(杭州)网络有限公司 A kind of method and apparatus of news category
CN108536695A (en) * 2017-03-02 2018-09-14 北京嘀嘀无限科技发展有限公司 A kind of polymerization and device of geographical location information point
CN108536695B (en) * 2017-03-02 2021-06-04 北京嘀嘀无限科技发展有限公司 Aggregation method and device of geographic position information points
CN109063983A (en) * 2018-07-18 2018-12-21 北京航空航天大学 A kind of natural calamity loss real time evaluating method based on social media data
CN109063983B (en) * 2018-07-18 2022-06-21 北京航空航天大学 A real-time assessment method of natural disaster losses based on social media data
CN111209390A (en) * 2020-01-06 2020-05-29 北大方正集团有限公司 News presentation method and system, computer readable storage medium
CN111209390B (en) * 2020-01-06 2023-09-05 新方正控股发展有限责任公司 News display method and system and computer readable storage medium

Also Published As

Publication number Publication date
CN104346411B (en) 2018-11-06

Similar Documents

Publication Publication Date Title
KR102080362B1 (en) Query expansion
CN106570144A (en) Method and apparatus for recommending information
CN104346411A (en) Method and equipment for clustering multiple manuscripts
CN103902597A (en) Method and device for determining search relevant categories corresponding to target keywords
CN107180093A (en) Information search method and device and ageing inquiry word recognition method and device
WO2017113592A1 (en) Model generation method, word weighting method, apparatus, device and computer storage medium
CN103176962A (en) Statistical method and statistical system of text similarity
CN111324801B (en) Hot event discovery method in judicial field based on hot words
WO2017075912A1 (en) News events extracting method and system
WO2017084205A1 (en) Network user identity authentication method and system
CN104239321B (en) A kind of data processing method and device of Search Engine-Oriented
CN103268330A (en) Method of Extracting User Interest Based on Image Content
CN107832467A (en) A kind of microblog topic detecting method based on improved Single pass clustering algorithms
CN110928986A (en) Legal evidence sorting and recommending method, device, equipment and storage medium
CN102855245A (en) Image similarity determining method and image similarity determining equipment
CN107085568A (en) A kind of text similarity method of discrimination and device
CN103218368B (en) A kind of method and apparatus excavating hot word
CN116226103A (en) Method for detecting government data quality based on FPGrow algorithm
CN110348717B (en) Base station value scoring method and device based on grid granularity
CN108182294A (en) A kind of film based on frequent item set growth algorithm recommends method and system
CN108875050B (en) Text-oriented digital evidence-obtaining analysis method and device and computer readable medium
CN104462347A (en) Keyword classifying method and device
CN103092838B (en) A kind of method and device for obtaining English words
CN104778202B (en) The analysis method and system of event evolutionary process based on keyword
CN105512270B (en) Method and device for determining related objects

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220617

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 5 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181106