CN104239314A - Search word expanding method and system - Google Patents
- Publication number: CN104239314A
- Application number: CN201310231653.6A
- Authority: CN (China)
- Prior art keywords: word, popular, label, popular word, query expansion
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/2448—Query languages for particular applications; for extensibility, e.g. user defined types
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a search word expanding method and system. The method comprises the steps that one or more label words are set for each general term to form a label word dictionary; weighting is conducted on each label word where each second general term belongs according to the relationships between the second general terms around the first general terms, and sorting is conducted on the label words; a preset number of high-weight label words are extracted from all the label words where the second general terms belong to serve as the expansion range of the first general terms; when the first general terms are input to be used as search words, the general terms corresponding to the label words in the expansion range of the first general terms are displayed; needed general terms are selected from the general terms corresponding to the label words in the expansion range to be used as expanded search words of the search words. According to the technical scheme, the search word expanding method and system can help a user obtain search words meeting information search targets, and therefore the information search efficiency is improved.
Description
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a method and system for expanding search terms.
Background
With the arrival of the information age, people are immersed in an ocean of information. Faced with this mass of information, they are often at a loss and find it hard to locate what they need in a short time. Advances in computer and network technology have helped information retrieval to some extent: users can build their own search strategy and use these technologies to obtain the right information.
A search strategy is the systematic arrangement, based on an analysis of the search question, of the data sources to be searched, the search terms, the logical relationships between those terms, and the search steps. In the narrow sense, the search strategy is the search expression, that is, the combination of search terms and operators.
The first step in any retrieval is to clarify the search need; if this step is muddled, the final results cannot be correct. Users are not always clear about their own needs, especially latent or vague ones, so the need must be analyzed to arrive at a complete and explicit statement.
To build a complete and explicit search expression, the user must first find suitable search terms. Before searching, however, users often have only a superficial grasp of the subject area and know only a few concepts, and building an accurate search expression from these preliminary concepts is very difficult for them.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method and system for expanding search terms that helps users obtain search terms matching their information retrieval goal, thereby improving retrieval efficiency.
One embodiment of the present invention provides a method for expanding search terms, comprising the following steps:
Assigning at least one label word to each popular word to form a label word dictionary;
Weighting each label word to which each second popular word belongs, according to the relationship between a first popular word and the second popular words surrounding it;
Sorting all the label words to which the second popular words belong by label word weight;
Extracting a preset number of the highest-weight label words from all the label words to which the second popular words belong, as the expansion range of the first popular word;
When the first popular word is input as a search term, displaying the popular words corresponding to the label words within the expansion range of the first popular word;
Selecting the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term.
Preferably, the method further comprises the following steps:
Weighting each second popular word according to the relationship between the first popular word and the second popular words surrounding it;
Arranging the popular words corresponding to each label word within the expansion range of the first popular word in order of popular word weight.
Preferably, each second popular word, and each label word to which it belongs, is weighted according to the frequency with which the second popular word occurs around the first popular word and/or its distance from the first popular word.
Preferably, the label word dictionary is generated through human-computer interaction.
Preferably, the method further comprises the following step:
Displaying the source file information for the popular words corresponding to the label words within the expansion range, and selecting the required popular words according to that source file information, as query expansion words for the search term.
Preferably, the source files searched comprise at least one data source.
Preferably, the data source is news, forums and/or microblogs.
Preferably, the data source is data from different technical fields or different business fields.
Another embodiment of the present invention provides a system for expanding search terms, comprising a tag unit, a label word dictionary unit, a weighting unit, a sorting unit, an input unit and a selection unit, wherein:
The tag unit assigns at least one label word to each popular word;
The label word dictionary unit stores the popular words and their corresponding label words;
The weighting unit weights each second popular word, and each label word to which it belongs, according to the relationship between the first popular word and the second popular words surrounding it;
The sorting unit sorts all the label words to which the second popular words belong by label word weight, extracts a preset number of the highest-weight label words from them as the expansion range of the first popular word, and arranges the popular words corresponding to each label word within that expansion range in order of popular word weight;
The input unit receives the first popular word as a search term;
The selection unit selects the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term.
Preferably, the system further comprises a source file storage unit, which stores the source file information for the popular words corresponding to the label words within the expansion range;
The selection unit then selects the required popular words according to that source file information, as query expansion words for the search term.
With the technical solution of the present invention, a small number of search terms can be expanded into closely related search terms, which helps the user build a complete and accurate search expression and improves retrieval efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the search term expansion method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the search term expansion system provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments of the invention are not, however, limited to these.
The main idea of the technical solution is, for a given word, to find related words from several different aspects; when building a search expression, the user can then pick the words they need from these related words as query expansion words. Here the words themselves are called popular words, and the words that name the different aspects are called label words.
Fig. 1 is a flowchart of the search term expansion method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: assign one or more label words to each popular word to form a label word dictionary.
The label word dictionary collects, on the one hand, a very large vocabulary drawn from various data sources, including news, forums and/or microblogs, or data from different technical or business fields. Once segmented out of the data source, these words become popular words. On the other hand, a number of words are defined, according to the nature of the popular words, to represent popular word attributes; these are the label words. Assigning each popular word one or more corresponding label words yields the label word dictionary, which can be generated through human-computer interaction.
For example, the popular word "technology" can be tagged with the two label words "product information" and "license"; the popular word "study course" can be tagged with "product information" and "teaching"; and the popular word "Siemens" can be tagged with "industry control brand" and "business name".
As these examples show, one popular word can correspond to one or more label words and, conversely, one label word can correspond to one or more popular words.
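The many-to-many relationship described in step 101 can be sketched as a plain mapping plus its inverse. This is only an illustration under stated assumptions: the patent does not prescribe a data structure, and the names `LABEL_DICT`, `invert` and `LABEL_TO_WORDS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory label word dictionary: popular word -> label words.
# The entries are the examples given in the text.
LABEL_DICT = {
    "technology": ["product information", "license"],
    "study course": ["product information", "teaching"],
    "Siemens": ["industry control brand", "business name"],
}


def invert(label_dict):
    """Build the reverse mapping: label word -> popular words carrying it."""
    label_to_words = defaultdict(list)
    for popular_word, label_words in label_dict.items():
        # Step 101 requires at least one label word per popular word.
        if not label_words:
            raise ValueError(f"{popular_word!r} needs at least one label word")
        for label_word in label_words:
            label_to_words[label_word].append(popular_word)
    return dict(label_to_words)


LABEL_TO_WORDS = invert(LABEL_DICT)
print(LABEL_TO_WORDS["product information"])  # ['technology', 'study course']
```

Keeping both directions available is a design convenience: the forward mapping is needed when weighting label words, and the inverse when listing the popular words under a label word.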
Step 102: weight each second popular word, and each label word to which it belongs, according to the relationship between the first popular word and the second popular words surrounding it.
This step finds the words related to a popular word and quantifies how strongly they are related.
Here the first popular word is any popular word in the data source, and the second popular words are the other words that appear around it in the data source, both before and after it.
The relationship between the first popular word and a surrounding second popular word can be determined in several ways, for example from the position (distance) at which the second popular word appears relative to the first popular word in the data source, or from the frequency with which it appears around the first popular word.
By collecting these quantitative indicators from the data source, each second popular word can be given a weight, and each label word to which a second popular word belongs can likewise be weighted.
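The text names frequency and distance as the indicators but gives no formula. One possible way to combine them, offered purely as an assumption, is to let every co-occurrence contribute the inverse of its distance, and to let a label word accumulate the weights of all second popular words that carry it:

```latex
w(p_2 \mid p_1) = \sum_{o \in O(p_1, p_2)} \frac{1}{d(o)},
\qquad
W(\ell \mid p_1) = \sum_{p_2 \,:\, \ell \in L(p_2)} w(p_2 \mid p_1)
```

Here $p_1$ is the first popular word, $p_2$ a second popular word, $O(p_1, p_2)$ the set of occurrences of $p_2$ within some window around $p_1$, $d(o) \ge 1$ the distance in words for occurrence $o$, and $L(p_2)$ the set of label words assigned to $p_2$. All of these symbols are introduced here for illustration only; more frequent and closer words receive larger weights, which is the behavior the step describes.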
For example, suppose "Siemens technology" and "Siemens study course" appear in the data source. By counting the positions at which the second popular words "technology" and "study course" appear around the first popular word "Siemens", and/or how often they appear, the two second popular words "technology" and "study course" can be weighted. At the same time, the label words "product information" and "license" corresponding to "technology" can be weighted, as can the label words "product information" and "teaching" corresponding to "study course". Because "technology" and "study course" both correspond to the label word "product information", the weight of "product information" derives from the relationships of both "technology" and "study course" with "Siemens".
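The weighting in this example can be sketched as follows. The scoring rule (each co-occurrence adds the inverse of its distance), the `window` parameter, the function name `weigh` and the toy corpus are all assumptions made for illustration; the patent only states that frequency and/or distance are used. The corpus is assumed to be already segmented into popular words.

```python
from collections import defaultdict

LABEL_DICT = {
    "technology": ["product information", "license"],
    "study course": ["product information", "teaching"],
    "Siemens": ["industry control brand", "business name"],
}


def weigh(token_lists, first_word, label_dict, window=3):
    """Weight second popular words and their label words around first_word."""
    word_weights = defaultdict(float)
    label_weights = defaultdict(float)
    for tokens in token_lists:
        for i, token in enumerate(tokens):
            if token != first_word:
                continue
            # Look at words both before and after the first popular word.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbor = tokens[j]
                if neighbor == first_word or neighbor not in label_dict:
                    continue
                score = 1.0 / abs(j - i)  # closer occurrences count more
                word_weights[neighbor] += score
                # Every label word of the neighbor inherits the same score.
                for label_word in label_dict[neighbor]:
                    label_weights[label_word] += score
    return dict(word_weights), dict(label_weights)


# Toy segmented corpus: "Siemens study course" occurs twice.
corpus = [
    ["Siemens", "technology"],
    ["Siemens", "study course"],
    ["Siemens", "study course"],
]
word_w, label_w = weigh(corpus, "Siemens", LABEL_DICT)
print(word_w)   # {'technology': 1.0, 'study course': 2.0}
print(label_w)  # {'product information': 3.0, 'license': 1.0, 'teaching': 2.0}
```

Note how "product information" collects weight from both "technology" and "study course", as the paragraph above describes.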
Step 103: sort all the label words to which the second popular words belong by label word weight.
In the example above, this means sorting "product information", "license" and "teaching" by the weights they have obtained.
Step 104: extract a preset number of the highest-weight label words from all the label words to which the second popular words belong, as the expansion range of the first popular word.
Many second popular words appear around the first popular word in the data source, and each corresponds to one or more label words, so the first popular word is associated with a large number of label words. For practicality, only a certain number of them, for example 10 or 20, are selected as the expansion range of the first popular word; these are the label words that best reflect the nature of the first popular word.
For example, for the first popular word "Siemens", the two label words "product information" and "teaching" are selected from the three label words "product information", "license" and "teaching" as the expansion range.
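Steps 103 and 104 amount to a sort followed by a cut-off. A minimal sketch, assuming the label word weights are already available as a mapping (the numeric weights below are invented for illustration, and `expansion_range` is a hypothetical name):

```python
def expansion_range(label_weights, top_n):
    """Return the top_n label words, highest weight first."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(label_weights.items(), key=lambda item: (-item[1], item[0]))
    return [label_word for label_word, _ in ranked[:top_n]]


# Illustrative weights for the label words surrounding "Siemens".
weights = {"product information": 3.0, "license": 1.0, "teaching": 2.0}
print(expansion_range(weights, top_n=2))  # ['product information', 'teaching']
```

The preset number corresponds to `top_n`; the text suggests values such as 10 or 20 for real data.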
Step 105: arrange the popular words corresponding to each label word within the expansion range of the first popular word in order of popular word weight.
As noted above, one popular word can correspond to one or more label words and one label word can correspond to one or more popular words, so each label word within the expansion range of the first popular word likewise corresponds to several popular words. For example, the label word "product information" corresponds to the two popular words "technology" and "study course", which can be sorted by their weights; under a given label word, a popular word with a higher weight reflects the nature of the first popular word better than one with a lower weight.
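Step 105 can be sketched as grouping the weighted popular words under each label word in the expansion range and sorting each group. The function name `words_by_label` and the numeric weights are assumptions for illustration only.

```python
def words_by_label(expansion_labels, label_dict, word_weights):
    """Group weighted popular words under each label word, heaviest first."""
    grouped = {}
    for label_word in expansion_labels:
        # Only popular words that were actually weighted (i.e. seen around
        # the first popular word) are candidates.
        members = [
            word for word, labels in label_dict.items()
            if label_word in labels and word in word_weights
        ]
        grouped[label_word] = sorted(
            members, key=lambda word: (-word_weights[word], word)
        )
    return grouped


label_dict = {
    "technology": ["product information", "license"],
    "study course": ["product information", "teaching"],
    "Siemens": ["industry control brand", "business name"],
}
word_weights = {"technology": 1.0, "study course": 2.0}  # invented weights
grouped = words_by_label(["product information", "teaching"], label_dict, word_weights)
print(grouped)
# {'product information': ['study course', 'technology'], 'teaching': ['study course']}
```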
Step 106: when the first popular word is input as a search term, display the popular words corresponding to the label words within the expansion range of the first popular word. The label words are presented in order of label word weight, and the popular words under each label word are in turn arranged by their own weights.
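The two-level ordering of step 106 can be sketched as a small rendering function. The output format (one text line per label word) and the name `render_suggestions` are assumptions; the patent does not specify how the list is presented.

```python
def render_suggestions(label_weights, word_weights, label_to_words):
    """Order label words by weight, then popular words within each label word."""
    lines = []
    for label_word in sorted(label_weights, key=lambda l: (-label_weights[l], l)):
        words = sorted(
            label_to_words[label_word], key=lambda w: (-word_weights[w], w)
        )
        lines.append(f"{label_word}: {', '.join(words)}")
    return lines


# Invented weights for the expansion range of "Siemens".
label_weights = {"teaching": 2.0, "product information": 3.0}
word_weights = {"technology": 1.0, "study course": 2.0}
label_to_words = {
    "product information": ["technology", "study course"],
    "teaching": ["study course"],
}
for line in render_suggestions(label_weights, word_weights, label_to_words):
    print(line)
# product information: study course, technology
# teaching: study course
```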
Step 107: select the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term.
For example, suppose a user wants to learn about Siemens technology but knows nothing about it. The user first chooses a data source, for example forum data, and then enters "Siemens" as the search term. The system returns the words for the aspects most relevant to "Siemens": for instance, the label word "product information" contains "technology" and "study course", and the label word "teaching" contains "study course". According to their needs, the user can then add "study course" as a query expansion word.
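Once the user has picked expansion words, they can be combined with the original search term into a search expression. The sketch below assumes a simple Boolean `AND` between terms and double quotes around multi-word terms; neither the operator nor the quoting convention comes from the patent, and `build_query` is a hypothetical helper.

```python
def build_query(search_term, offered_words, selected_words, operator="AND"):
    """Join the search term and the chosen expansion words into one expression."""
    # Only words that were actually displayed may be selected.
    rejected = [word for word in selected_words if word not in offered_words]
    if rejected:
        raise ValueError(f"not among the displayed popular words: {rejected}")
    terms = [search_term, *selected_words]
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return f" {operator} ".join(quoted)


offered = ["study course", "technology"]
query = build_query("Siemens", offered, ["study course"])
print(query)  # Siemens AND "study course"
```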
If the user is concerned that these query expansion words may be unrelated to the original search term and lead to errors, the source file information for the popular words corresponding to the label words within the expansion range can be displayed. From this source file information the user can judge whether the expansion words are related to the original search term and select the required popular words as query expansion words for the search term.
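What "source file information" contains is not detailed in the text. One plausible reading, shown here only as an assumption, is the name of each source file in which the candidate word co-occurs with the search term, together with a short excerpt around the candidate. The function `source_contexts`, the `width` parameter and the sample documents are all hypothetical.

```python
def source_contexts(documents, search_term, candidate, width=20):
    """List (source name, excerpt) pairs where candidate co-occurs with search_term."""
    hits = []
    for name, text in documents.items():
        if search_term in text and candidate in text:
            pos = text.find(candidate)
            start = max(0, pos - width)
            end = min(len(text), pos + len(candidate) + width)
            hits.append((name, text[start:end]))
    return hits


# Invented sample documents keyed by a made-up source name.
documents = {
    "forum/1": "A Siemens study course for PLC beginners",
    "news/7": "Weather report for Tuesday",
}
hits = source_contexts(documents, "Siemens", "study course")
for name, excerpt in hits:
    print(name, "->", excerpt)
```

Showing the excerpt lets the user confirm that the suggested word really relates to the search term before adding it to the search expression.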
To implement the above flow, another embodiment of the present invention provides a system for expanding search terms. As shown in Fig. 2, the system comprises a tag unit 201, a label word dictionary unit 202, a weighting unit 203, a sorting unit 204, an input unit 205, a selection unit 206 and a source file storage unit 207.
The tag unit assigns one or more label words to each popular word.
The label word dictionary unit stores the popular words and their corresponding label words.
The weighting unit weights each second popular word, and each label word to which it belongs, according to the relationship between the first popular word and the second popular words surrounding it.
The sorting unit sorts all the label words to which the second popular words belong by label word weight, extracts a preset number of the highest-weight label words from them as the expansion range of the first popular word, and arranges the popular words corresponding to each label word within that expansion range in order of popular word weight.
The input unit receives the first popular word as a search term.
The source file storage unit stores the source file information for the popular words corresponding to the label words within the expansion range.
The selection unit selects the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term; alternatively, it further selects the required popular words according to the source file information, as query expansion words for the search term.
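The division into units can be sketched as a single class whose methods stand in for units 201 to 207. This is a hypothetical composition, not the patented implementation: the class and method names, the inverse-distance scoring, the `window` size and the sample documents are all assumptions, and the input is assumed to be already segmented into popular words.

```python
from collections import defaultdict


class QueryExpansionSystem:
    """Hypothetical composition of units 201-207; names are illustrative."""

    def __init__(self, top_n=10, window=3):
        self.label_dict = {}              # label word dictionary unit (202)
        self.sources = defaultdict(list)  # source file storage unit (207)
        self.top_n = top_n
        self.window = window

    def tag(self, popular_word, label_words):
        """Tag unit (201): assign one or more label words to a popular word."""
        if not label_words:
            raise ValueError("each popular word needs at least one label word")
        self.label_dict[popular_word] = list(label_words)

    def weigh(self, documents, first_word):
        """Weighting unit (203): inverse-distance co-occurrence weights (assumed)."""
        word_w, label_w = defaultdict(float), defaultdict(float)
        for name, tokens in documents.items():
            for i, token in enumerate(tokens):
                if token != first_word:
                    continue
                lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
                for j in range(lo, hi):
                    other = tokens[j]
                    if other == first_word or other not in self.label_dict:
                        continue
                    score = 1.0 / abs(j - i)
                    word_w[other] += score
                    for label_word in self.label_dict[other]:
                        label_w[label_word] += score
                    # Remember which source files support this suggestion.
                    if name not in self.sources[(first_word, other)]:
                        self.sources[(first_word, other)].append(name)
        return word_w, label_w

    def rank(self, word_w, label_w):
        """Sorting unit (204): top-N label words, popular words ordered by weight."""
        top = sorted(label_w, key=lambda l: (-label_w[l], l))[: self.top_n]
        return {
            label_word: sorted(
                (w for w in word_w if label_word in self.label_dict[w]),
                key=lambda w: (-word_w[w], w),
            )
            for label_word in top
        }

    def expand(self, documents, search_term):
        """Input unit (205): accept the search term and return suggestions."""
        return self.rank(*self.weigh(documents, search_term))

    def select(self, suggestions, chosen):
        """Selection unit (206): keep only choices that were actually offered."""
        offered = {w for words in suggestions.values() for w in words}
        return [w for w in chosen if w in offered]


system = QueryExpansionSystem(top_n=2)
system.tag("technology", ["product information", "license"])
system.tag("study course", ["product information", "teaching"])
system.tag("Siemens", ["industry control brand", "business name"])
docs = {
    "forum/1": ["Siemens", "technology"],
    "forum/2": ["Siemens", "study course"],
    "forum/3": ["Siemens", "study course"],
}
suggestions = system.expand(docs, "Siemens")
print(suggestions)
print(system.select(suggestions, ["study course", "unrelated"]))
print(system.sources[("Siemens", "study course")])
```

In a real deployment the units would likely be separate components backed by persistent storage; folding them into one class keeps the sketch short.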
With the technical solution of the present invention, the relationships between words are quantified as weights, so that a small number of search terms can be expanded into closely related search terms, which helps the user build a complete and accurate search expression and improves retrieval efficiency.
The embodiments described above are preferred embodiments of the present invention, but the embodiments of the invention are not limited to them. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the scope of protection of the present invention.
Claims (10)
1. A method for expanding search terms, characterized in that it comprises the following steps:
Assigning at least one label word to each popular word to form a label word dictionary;
Weighting each label word to which each second popular word belongs, according to the relationship between a first popular word and the second popular words surrounding it;
Sorting all the label words to which the second popular words belong by label word weight;
Extracting a preset number of the highest-weight label words from all the label words to which the second popular words belong, as the expansion range of the first popular word;
When the first popular word is input as a search term, displaying the popular words corresponding to the label words within the expansion range of the first popular word;
Selecting the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term.
2. The method for expanding search terms according to claim 1, characterized in that it further comprises the following steps:
Weighting each second popular word according to the relationship between the first popular word and the second popular words surrounding it;
Arranging the popular words corresponding to each label word within the expansion range of the first popular word in order of popular word weight.
3. The method for expanding search terms according to claim 2, characterized in that each second popular word, and each label word to which it belongs, is weighted according to the frequency with which the second popular word occurs around the first popular word and/or its distance from the first popular word.
4. The method for expanding search terms according to claim 1, characterized in that the label word dictionary is generated through human-computer interaction.
5. The method for expanding search terms according to claim 1, characterized in that it further comprises the following step:
Displaying the source file information for the popular words corresponding to the label words within the expansion range, and selecting the required popular words according to that source file information, as query expansion words for the search term.
6. The method for expanding search terms according to claim 1 or 5, characterized in that the source files searched comprise at least one data source.
7. The method for expanding search terms according to claim 6, characterized in that the data source is news, forums and/or microblogs.
8. The method for expanding search terms according to claim 6, characterized in that the data source is data from different technical fields or different business fields.
9. A system for expanding search terms, characterized in that it comprises a tag unit, a label word dictionary unit, a weighting unit, a sorting unit, an input unit and a selection unit, wherein:
The tag unit assigns at least one label word to each popular word;
The label word dictionary unit stores the popular words and their corresponding label words;
The weighting unit weights each second popular word, and each label word to which it belongs, according to the relationship between the first popular word and the second popular words surrounding it;
The sorting unit sorts all the label words to which the second popular words belong by label word weight, extracts a preset number of the highest-weight label words from them as the expansion range of the first popular word, and arranges the popular words corresponding to each label word within that expansion range in order of popular word weight;
The input unit receives the first popular word as a search term;
The selection unit selects the required popular words from the popular words corresponding to the label words within the expansion range, as query expansion words for the search term.
10. The system for expanding search terms according to claim 9, characterized in that it further comprises a source file storage unit, which stores the source file information for the popular words corresponding to the label words within the expansion range;
The selection unit selects the required popular words according to that source file information, as query expansion words for the search term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310231653.6A CN104239314B (en) | 2013-06-09 | 2013-06-09 | A kind of method and system of query expansion word |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310231653.6A CN104239314B (en) | 2013-06-09 | 2013-06-09 | A kind of method and system of query expansion word |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104239314A true CN104239314A (en) | 2014-12-24 |
CN104239314B CN104239314B (en) | 2018-01-19 |
Family ID=52227405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310231653.6A Expired - Fee Related CN104239314B (en) | 2013-06-09 | 2013-06-09 | A kind of method and system of query expansion word |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104239314B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897290A (en) * | 2015-12-17 | 2017-06-27 | 中国移动通信集团上海有限公司 | A kind of method and device for setting up keyword models |
CN108228643A (en) * | 2016-12-21 | 2018-06-29 | 北京视联动力国际信息技术有限公司 | A kind of search method and system |
CN113742459A (en) * | 2021-11-05 | 2021-12-03 | 北京世纪好未来教育科技有限公司 | Vocabulary display method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100070506A1 (en) * | 2008-03-18 | 2010-03-18 | Korea Advanced Institute Of Science And Technology | Query Expansion Method Using Augmented Terms for Improving Precision Without Degrading Recall |
CN102375885A (en) * | 2011-10-21 | 2012-03-14 | 北京百度网讯科技有限公司 | Method and device for providing search suggestions corresponding to query sequence |
CN102622358A (en) * | 2011-01-27 | 2012-08-01 | 天脉聚源(北京)传媒科技有限公司 | Method and system for information searching |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897290A (en) * | 2015-12-17 | 2017-06-27 | 中国移动通信集团上海有限公司 | A kind of method and device for setting up keyword models |
CN106897290B (en) * | 2015-12-17 | 2020-04-24 | 中国移动通信集团上海有限公司 | Method and device for establishing keyword model |
CN108228643A (en) * | 2016-12-21 | 2018-06-29 | 北京视联动力国际信息技术有限公司 | A kind of search method and system |
CN113742459A (en) * | 2021-11-05 | 2021-12-03 | 北京世纪好未来教育科技有限公司 | Vocabulary display method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104239314B (en) | 2018-01-19 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 300090 Tianjin City Huayuan Industrial Zone Rong Yuan Road No. 1 North B room 322-323; Applicant after: Tianjin mass information technology Limited by Share Ltd; Address before: Beijing version information No. 3 port 100029 Beijing city Xicheng District Yumin Road two; Applicant before: Tianjin Hylanda Information Technology Co.,Ltd. |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180119; Termination date: 20200609 |