CN109033244B - Search result ordering method and device - Google Patents
- Publication number
- CN109033244B (application CN201810729232.9A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- question
- search
- answer
- relevance
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a search result ranking method and device. The method comprises the following steps: obtaining a user request and candidate results from a first ranking result, wherein the user request comprises a search question and the candidate results comprise candidate questions and the candidate answers corresponding to each candidate question; obtaining a first relevance index between the search question and the candidate questions; obtaining a second relevance index between the search question and the candidate answers; and reordering the first ranking result according to the first relevance index and the second relevance index to obtain a second ranking result. Because more specific relevance indexes are taken into account in the second ranking, the result is not limited by a single ranking method, and accurate answer ranking and the handling of specific questions can be provided better and more conveniently.
Description
Technical Field
The invention relates to the technical field of automatic question answering, in particular to a search result ordering method and a search result ordering device.
Background
With the rapid development of the Internet, a large number of search needs related to medical knowledge have emerged. Medical intelligent question-answering services have been developed to serve these needs.
In medical automatic question answering, because of the specificity of medicine and the rigor required of answers, the prevailing approach is to give answers by ranking existing answer content by relevance. However, methods relying on a single relevance ranking lack a comprehensive measure of question-answer relevance because of their one-sidedness and limitations, and it is difficult for them to give an accurate ranking result. Moreover, question-answering methods from other fields cannot be directly extended to the medical field.
Scheme (1) ranks based on the search question and the candidate questions, ignoring the key information contained in the answers; a good ranking result then depends heavily on the quality of the questions and answers in the original question-answer library.
Scheme (2) ranks based on the search question and the candidate answers, ignoring the key information contained in the questions; in the medical field a slight deviation in the question can lead to a completely different answer, so the ranking can be inaccurate.
Scheme (3) ranks based on a combination of question and answer. Although it includes the information of both, a single ranking method has a strong bias in its results and cannot handle relatively complex scenarios in medical intelligent question answering.
Disclosure of Invention
The embodiment of the invention provides a search result ordering method and a search result ordering device, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a search result ranking method, including:
obtaining a user request and candidate results from a first ranking result, wherein the user request comprises a search question, and the candidate results comprise candidate questions and the candidate answers corresponding to each candidate question;
acquiring a first correlation index of the search question and the candidate question;
acquiring a second relevance index of the search question and the candidate answer;
and reordering the first ordering result according to the first relevance index and the second relevance index to obtain a second ordering result.
With reference to the first aspect, in a first implementation manner of the first aspect, the reordering the first sorting result according to the first correlation index and the second correlation index to obtain a second sorting result includes:
determining a candidate question-answer group included in a high-priority list according to the first relevance index;
determining a candidate question-answer group included in a low-priority list according to the second relevance index;
and merging the candidate question-answer groups in the high-priority list and the candidate question-answer groups in the low-priority list in the order of high priority first and low priority after, to obtain the second ranking result.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the determining, according to the first relevance indicator, a candidate question-answer group included in the high-priority list includes:
if at least one first relevance index of a candidate set of questions and answers is above a set threshold, adding the candidate set of questions and answers to a high priority list.
With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, the determining, according to the second relevance indicator, a candidate question-answer group included in the low-priority list includes:
if at least one second correlation index of a candidate set of questions and answers is higher than a set threshold, adding the candidate set of questions and answers to a low priority list.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the obtaining a first relevance indicator of the search question and the candidate question includes at least one of the following manners:
calculating word-level TF-IDF similarity between the search question and the candidate question;
calculating character-level TF-IDF similarity between the search question and the candidate question;
calculating Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
calculating deep question similarity between the search question and the candidate question;
calculating word vector similarity between the search question and the candidate question;
and calculating latent semantic indexing similarity between the search question and the candidate question.
With reference to the first aspect, in a fifth implementation manner of the first aspect, the obtaining a second relevance indicator of the search question and the candidate answer includes at least one of the following manners:
calculating deep question-answer relevance between the search question and the candidate answer;
calculating word-level TF-IDF relevance between the search question and the candidate answer;
calculating character-level TF-IDF relevance between the search question and the candidate answer;
calculating Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
calculating word vector relevance between the search question and the candidate answer;
and calculating latent semantic indexing relevance between the search question and the candidate answer.
In a second aspect, an embodiment of the present invention provides a search result ranking apparatus, including:
a first ranking module, configured to acquire a user request and candidate results from a first ranking result, wherein the user request comprises a search question, and the candidate results comprise candidate questions and the candidate answers corresponding to each candidate question;
a first correlation module, configured to obtain a first correlation index between the search question and the candidate question;
the second correlation module is used for acquiring a second correlation index of the search question and the candidate answer;
and the second sorting module is used for re-sorting the first sorting result according to the first relevance index and the second relevance index to obtain a second sorting result.
With reference to the second aspect, in a first implementation manner of the second aspect, the second sorting module includes:
the high-priority submodule is used for determining a candidate question-answer group included in a high-priority list according to the first correlation index;
the low-priority submodule is used for determining a candidate question-answer group included in a low-priority list according to the second correlation index;
and the merging and sorting submodule is used for merging the candidate question-answer groups in the high-priority list and the candidate question-answer groups in the low-priority list in the order of high priority first and low priority after, so as to obtain the second ranking result.
With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the high-priority sub-module is further configured to add a candidate question-answer group into the high-priority list if at least one first correlation index of the candidate question-answer group is higher than a set threshold.
With reference to the first implementation manner of the second aspect, in a third implementation manner of the second aspect, the low-priority sub-module is further configured to add a candidate question-answer group to a low-priority list if at least one second correlation index of the candidate question-answer group is higher than a set threshold.
With reference to the second aspect, in a fourth implementation manner of the second aspect, the first relevance module includes at least one of the following sub-modules:
a first word-level sub-module for calculating word-level TF-IDF similarity of the search question and the candidate question;
a first character level sub-module for calculating character level TF-IDF similarity of the search question to the candidate question;
a first Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question sub-module, configured to calculate deep question similarity between the search question and the candidate question;
a first word vector sub-module, configured to calculate word vector similarity between the search question and the candidate question;
a first latent semantic indexing sub-module, configured to calculate latent semantic indexing similarity between the search question and the candidate question.
With reference to the second aspect, in a fifth implementation manner of the second aspect, the second relevance module includes at least one of the following sub-modules:
the deep question-answer module is used for calculating the deep question-answer correlation between the search question and the candidate answer;
a second word-level sub-module for calculating word-level TF-IDF correlations of the search question and the candidate answers;
a second character level sub-module for calculating a character level TF-IDF correlation of the search question with the candidate answer;
a second Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
a second word vector sub-module, configured to calculate word vector relevance between the search question and the candidate answer;
and a second latent semantic indexing sub-module, configured to calculate latent semantic indexing relevance between the search question and the candidate answer.
In a third aspect, an embodiment of the present invention provides a search result ranking apparatus, where functions of the apparatus may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the search result ranking apparatus includes a processor and a memory, the memory is used for storing programs supporting the search result ranking apparatus to execute the search result ranking method, and the processor is configured to execute the programs stored in the memory. The search result ranking apparatus may further comprise a communication interface for communicating the search result ranking apparatus with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a search result ranking apparatus, which includes a program for executing the search result ranking method.
One of the above technical solutions has the following advantage or beneficial effect: reordering on top of the primary ranking effectively avoids the narrowness, one-sidedness and limitation of the extracted relevance features.
Another of the above technical solutions has the following advantage or beneficial effect: reordering is a core module in medical intelligent question answering. Adding a reordering module further optimizes the ranking of medical intelligent question-answering results. In other words, on the basis of the already ranked answers, the positions of some results are adjusted so that more suitable answers move forward and less suitable answers move backward, thereby optimizing the ranking result.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 is a flowchart of a search result ranking method according to an embodiment of the present invention.
FIG. 2 is a flowchart of a search result ranking method according to an embodiment of the present invention.
FIG. 3 is a flowchart of a search result ranking method according to an embodiment of the present invention.
Fig. 4 is a block diagram of a search result ranking apparatus according to an embodiment of the present invention.
Fig. 5 is a block diagram of a search result ranking apparatus according to an embodiment of the present invention.
Fig. 6 is a block diagram of a search result ranking apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
FIG. 1 is a flowchart of a search result ranking method according to an embodiment of the present invention.
As shown in fig. 1, the search result ranking method may include the steps of:
step S110, obtaining a user request and candidate results from the first ranking result, where the user request includes search questions, and the candidate results include candidate questions and candidate answers corresponding to each candidate question.
And step S120, acquiring a first correlation index of the search question and the candidate question.
Step S130, a second correlation index of the search question and the candidate answer is obtained.
Step S140, according to the first relevance index and the second relevance index, the first sorting result is reordered to obtain a second sorting result.
In the field of intelligent question answering, a user can input the question to be asked (i.e., the search question) into a search engine according to his or her needs. The candidate results retrieved for the search question may include a number of question-answer groups (candidate questions and their corresponding candidate answers). These candidate question-answer groups are first ranked preliminarily in one of several ways, for example: 1) preliminary ranking based on the questions: encode the candidate questions and the search question, and rank by the similarity between the candidate questions and the search question; 2) preliminary ranking based on the questions and answers: encode the candidate answers and the search question, and rank by the similarity between the candidate answers and the search question; 3) preliminary ranking based on a combination of question and answer: encode the candidate questions, the candidate answers and the search question, and rank by the combined similarity. A minimal illustration of approach 1) is sketched below.
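The sketch below encodes the search question and the candidate questions with a plain TF-IDF vectorizer and ranks the question-answer groups by cosine similarity; the encoder, function names and data layout are assumptions for illustration, not the encoding the patent prescribes.

```python
# Hypothetical sketch of preliminary ranking approach 1): encode the search question
# and each candidate question, then rank question-answer groups by cosine similarity.
# A TF-IDF vectorizer stands in for whatever encoder the production system uses; for
# Chinese text a word segmenter would normally be supplied as the analyzer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def preliminary_rank(search_question, qa_groups):
    """qa_groups: list of (candidate_question, candidate_answer) pairs."""
    questions = [q for q, _ in qa_groups]
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([search_question] + questions)
    # similarity of the search question (row 0) to every candidate question
    sims = cosine_similarity(matrix[0], matrix[1:]).ravel()
    order = sims.argsort()[::-1]  # highest similarity first
    return [qa_groups[i] for i in order]
```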
After the first sorting, a first sorting result may be obtained, from which the user request and a plurality of candidate results may be obtained. The user request may include a search question input by the user, and each candidate result may include one candidate question and one or more corresponding candidate answers.
For a plurality of candidate results in the first ranking result, a first relevance index of the search question and the candidate question and a second relevance index of the search question and the candidate answer can be calculated, and the plurality of candidate results are ranked by combining the two indexes, so that a ranking result which is more relevant to the search question and accurate is obtained.
In one possible implementation, as shown in fig. 2, step S140 includes:
and step S210, determining a candidate question-answer group included in the high-priority list according to the first relevance index.
And step S220, determining a candidate question-answer group included in the low-priority list according to the second relevance index.
Step S230, merging the candidate question-answer groups in the high-priority list and the low-priority list according to the order of the high-priority preceding and the low-priority succeeding, so as to obtain the second sorting result.
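A minimal sketch of steps S210 to S230 follows, assuming the relevance indexes have already been computed for every question-answer group; the data layout and the use of a single threshold per list are illustrative simplifications (the patent allows a separate threshold per indicator).

```python
# Hypothetical sketch of the reordering in steps S210-S230: split the primary ranking
# into a high-priority list (strong question-question match) and a low-priority list
# (answer-based match), then concatenate them with the high-priority list first.
def rerank(first_ranking, first_indicators, second_indicators,
           high_threshold, low_threshold):
    """first_ranking: list of (candidate_question, candidate_answer) in primary order.
    first_indicators / second_indicators: per-group lists of question-question and
    question-answer indicator values, aligned with first_ranking."""
    high_priority, low_priority = [], []
    for i, qa in enumerate(first_ranking):
        if any(v > high_threshold for v in first_indicators[i]):
            high_priority.append(qa)   # step S210: strong question-question relevance
        elif any(v > low_threshold for v in second_indicators[i]):
            low_priority.append(qa)    # step S220: question-answer relevance only
    return high_priority + low_priority  # step S230: high priority first, low after
```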
In one possible implementation, step S120 includes at least one of the following ways:
calculating word-level TF-IDF (Term Frequency-Inverse Document Frequency) similarity between the search question and the candidate question;
calculating character-level TF-IDF similarity between the search question and the candidate question;
calculating Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
calculating deep question similarity between the search question and the candidate question;
calculating word vector similarity between the search question and the candidate question;
and calculating latent semantic indexing similarity between the search question and the candidate question.
For example, the search question and the candidate question may be segmented into words, and word-level TF-IDF similarity then calculated from the segmentation results. The search question and the candidate question may be split into individual characters, and character-level TF-IDF similarity calculated from that split. The pinyin of the search question and of the candidate question may be obtained separately, and Chinese-character-pinyin-level TF-IDF similarity then calculated from the pinyin.
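The sketch below illustrates the word-level and character-level variants, using jieba for word segmentation and scikit-learn's TF-IDF vectorizer; both library choices are assumptions for illustration, and in practice the vectorizer would be fitted on the whole question-answer corpus so that the IDF weights are meaningful.

```python
# Hypothetical sketch of word-level and character-level TF-IDF similarity between a
# search question and a candidate question.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_similarity(text_a, text_b, analyzer):
    """Cosine similarity of the two texts under the given tokenization."""
    vectorizer = TfidfVectorizer(analyzer=analyzer)
    matrix = vectorizer.fit_transform([text_a, text_b])
    return cosine_similarity(matrix[0], matrix[1])[0, 0]

def word_level_similarity(q_search, q_cand):
    return tfidf_similarity(q_search, q_cand, analyzer=jieba.lcut)  # word tokens

def char_level_similarity(q_search, q_cand):
    return tfidf_similarity(q_search, q_cand, analyzer=list)        # single characters
```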
Calculating similarity on Chinese-character-pinyin-level TF-IDF has the following advantages:
Pinyin is one of the important differences between Chinese and English, and each Chinese text corresponds uniquely to a pinyin sequence. Most users use a pinyin input method as their Chinese input tool, that is, they first type the pinyin of a character and then choose among the Chinese characters that share that pinyin. This operation makes wrong selections possible: different words can share the same pinyin (for example, the words for "life" and "sacred fire" are both written "shenghuo"), so the user may pick the wrong homophone. In addition, because pinyin input is so common, a user sometimes knows only the pronunciation of a character and not how it is written, which also affects the accuracy of the input. In a medical intelligent question-answering scenario, the medical search requests entered by Internet users are often not normative text and may contain many literal errors. Therefore, representing the text by its pinyin and then computing text similarity can, to some extent, reduce the influence of miswritten characters.
Chinese-character-pinyin TF-IDF may be calculated at the character level. For example, for a text S that includes Chinese characters, the Chinese characters in S are converted to their pinyin representations (ignoring tone), while the non-Chinese characters in S keep their original form. The pinyin of each individual Chinese character is treated as a single token. For example, the Chinese text for "cough with profuse sputum" is converted into the four tokens "ke", "sou", "tan" and "duo". The pinyin-level TF-IDF similarity can then be computed using character IDF features, text TF-IDF features, cosine similarity, and the like.
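A sketch of this pinyin-level representation is shown below, reusing the tfidf_similarity helper from the previous sketch; pypinyin is an illustrative choice of conversion library (its lazy_pinyin function returns toneless pinyin), not one mandated by the patent.

```python
# Hypothetical sketch of pinyin-level TF-IDF similarity: Chinese characters are
# replaced by their toneless pinyin (one token per character), while non-Chinese
# substrings are passed through unchanged.
from pypinyin import lazy_pinyin

def to_pinyin_tokens(text):
    # e.g. the text for "cough with profuse sputum" -> ["ke", "sou", "tan", "duo"]
    return lazy_pinyin(text)

def pinyin_level_similarity(q_search, q_cand):
    return tfidf_similarity(q_search, q_cand, analyzer=to_pinyin_tokens)
```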
Deep question similarity may also be referred to as deep QQ similarity. To implement deep QQ similarity, a number of other questions Q' similar to each question Q may be obtained by question clustering or similar means, and a pairwise learning-to-rank scheme is used for training. The search question and the candidate question are then fed into the trained model to obtain the deep QQ similarity.
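A rough sketch of such pairwise training follows; the bag-of-words encoder, margin value and the way training triples are formed are assumptions for illustration, since the patent does not specify the network architecture or loss.

```python
# Hypothetical sketch of pairwise (learning-to-rank) training for deep QQ similarity:
# for each question Q, a similar question Q_pos (e.g. from question clustering) and a
# dissimilar question Q_neg are scored, and the model is trained so that the similar
# pair scores higher than the dissimilar pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QQScorer(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, dim)  # averaged bag-of-words encoder

    def forward(self, q_ids, cand_ids):
        # cosine similarity between the encoded question and the encoded candidate
        return F.cosine_similarity(self.encoder(q_ids), self.encoder(cand_ids))

def pairwise_step(model, optimizer, q, q_pos, q_neg, margin=0.2):
    loss_fn = nn.MarginRankingLoss(margin=margin)
    score_pos = model(q, q_pos)
    score_neg = model(q, q_neg)
    target = torch.ones_like(score_pos)  # the positive pair should score higher
    loss = loss_fn(score_pos, score_neg, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```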
In one possible implementation, step S130 includes at least one of the following ways:
calculating deep question-answer relevance between the search question and the candidate answer;
calculating word-level TF-IDF relevance between the search question and the candidate answer;
calculating character-level TF-IDF relevance between the search question and the candidate answer;
calculating Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
calculating word vector relevance between the search question and the candidate answer;
and calculating latent semantic indexing relevance between the search question and the candidate answer.
Deep QA (question-answer) relevance can mine the semantic relation between the user's search question Q and a candidate answer A. Deep learning is used to calculate the relevance between the search question Q and the candidate answer A, which is then used to adjust the ranking result obtained from question-question similarity.
For example, in a medical intelligent question-answering scenario, in addition to matching the text similarity between the user's search question Q_u and a candidate question Q_i, ranking accuracy can be further improved by matching the association between questions and answers. On the one hand, two questions may be completely different in their textual description but semantically identical or very similar. If the answers to these two questions are the same or very similar, then even when Q_u and Q_i cannot be fully matched, a match can still be made between Q_u and A_i. On the other hand, question-answer groups in the question-answer repository may be mismatched; making every question and answer in the repository match perfectly is difficult or very costly, so a Q_i in the repository may not strictly match its corresponding A_i. In this case, the ranking result can also be fine-tuned through deep QA relevance.
In one possible implementation, determining the set of candidate questions and answers included in the high priority list according to the first relevance indicator includes:
if at least one first relevance index of a candidate set of questions and answers is above a set threshold, adding the candidate set of questions and answers to a high priority list.
In one possible implementation, determining the set of candidate questions and answers included in the low-priority list according to the second relevance indicator includes:
if at least one second correlation index of a candidate set of questions and answers is higher than a set threshold, adding the candidate set of questions and answers to a low priority list.
Each relevance index may have its own set threshold, and the thresholds of different indexes may differ. The first relevance index mainly reflects the textual similarity between questions; the second relevance index mainly reflects the relevance between the question and the answer. In the embodiment of the present invention, the number and types of first and second relevance indexes may be chosen according to the actual application scenario. The relevance indexes between the search question and each candidate question-answer group are then compared against their thresholds to sort the candidate question-answer groups into the different priority lists.
In one example, the indexes may be compared in sequence: first compare one index and put the question-answer groups that satisfy its condition into the corresponding priority list; then compare the remaining question-answer groups against another index, and so on.
For example, if 10 out of 100 question-answer groups have word-level similarity to the search question above the set threshold, those 10 groups are added to the high-priority result list. The character-level similarity between the remaining 90 groups and the search question is then compared against its threshold, and, say, another 20 groups are added to the high-priority list; and so on, which is not described further here.
In another example, multiple metrics may be compared separately and then de-duplicated.
For example, the word-level similarity between each of 100 question-answer groups and the search question is compared against its threshold, and the groups above the threshold are selected; the character-level similarity of the same 100 groups is also compared against its threshold, and further groups are selected, giving, say, 40 selected groups in total. After deduplication, 30 distinct groups remain, and these 30 groups are added to the high-priority list (or they are added to the list first and then deduplicated).
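The sketch below illustrates this variant: each index independently nominates question-answer groups above its own threshold, and the union of nominations, without duplicates, forms the high-priority list; the data layout is an assumption for illustration.

```python
# Hypothetical sketch of "compare each index separately, then deduplicate".
def select_high_priority(indicator_values, thresholds):
    """indicator_values: dict indicator_name -> list of scores, one per QA group.
    thresholds: dict indicator_name -> threshold for that indicator.
    Returns indices (into the first ranking result) of high-priority groups."""
    selected, seen = [], set()
    for name, scores in indicator_values.items():
        for idx, score in enumerate(scores):
            if score > thresholds[name] and idx not in seen:
                seen.add(idx)        # deduplicate: each group is added at most once
                selected.append(idx)
    return selected
```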
Optimizing and adjusting the ranking through multiple relevance indexes moves more suitable answers forward and less suitable answers backward, thereby improving the ranking result.
In one example, based on the above similarities between the search question Q_u and the candidate questions Q_i and the relevance between the search question Q_u and the candidate answers A_i, the method shown in FIG. 3 processes each question-answer group (Q_i, A_i) in the previous ranking result in order from front to back, and comprises the following steps:
step S301, calculating QuAnd QiIf the similarity is above a certain threshold, will be (Q)i,Ai) Add to high priority results list; if the similarity is lower than a certain threshold, discarding the question-answer group; otherwise, the process proceeds to step S302.
For example, two thresholds Y1 and Y2 may be set for word-level TF-IDF similarity, with Y1 greater than Y2. If the word-level TF-IDF similarity between Q_u and Q_i is greater than Y1, the question-answer group is put into the high-priority list. If it is less than Y2, the question-answer group is discarded; irrelevant question-answer groups can thus be eliminated, reducing the number of subsequent comparisons. Question-answer groups whose similarity falls between Y2 and Y1 go on to be compared on the other relevance indexes. The threshold settings and comparisons for the other relevance indexes in this example are similar and are not repeated below.
Step S302: calculate the character-level TF-IDF similarity between Q_u and Q_i; if the similarity is above a certain threshold, add (Q_i, A_i) to the high-priority result list; if it is below a certain threshold, discard the question-answer group; otherwise, proceed to step S303.
Step S303: calculate the Chinese-character-pinyin TF-IDF similarity between Q_u and Q_i; if the similarity is above a certain threshold, add (Q_i, A_i) to the high-priority result list; if it is below a certain threshold, discard the question-answer group; otherwise, proceed to step S304.
Step S304: calculate the deep QQ similarity between Q_u and Q_i; if the similarity is above a certain threshold, add (Q_i, A_i) to the high-priority result list; if it is below a certain threshold, discard the question-answer group; otherwise, proceed to step S305.
Step S305: calculate the word vector similarity between Q_u and Q_i; if the similarity is above a certain threshold, add (Q_i, A_i) to the high-priority result list; if it is below a certain threshold, discard the question-answer group; otherwise, proceed to step S306.
Step S306: calculate the LSI (Latent Semantic Indexing) similarity between Q_u and Q_i; if the similarity is above a certain threshold, add (Q_i, A_i) to the high-priority result list; if it is below a certain threshold, discard the question-answer group; otherwise, proceed to step S307.
Step S307: calculate the deep QA relevance between Q_u and A_i; if the maximum deep relevance among the candidate results that did not enter the high-priority result list is above a certain threshold, add the candidate result to the low-priority result list; otherwise, set the low-priority result list to empty. Proceed to step S308.
Step S308: merge the two result lists with the high-priority list first and the low-priority list after; the merged result is the final ranking result.
It should be noted that the order of steps S301 to S308 may be adjusted as needed; different indexes of similarity and relevance between the user's search question and the candidate questions, and between the user's search question and the candidate answers, may be selected for reordering according to the actual application scenario, which is not limited by the embodiment of the present invention.
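A condensed sketch of the cascade of FIG. 3 is given below. Each question-question index is tried in turn with an upper threshold Y1 (promote) and a lower threshold Y2 (discard), groups falling in between move on to the next index, and groups that survive every check are kept only if their deep QA relevance is high enough; the per-group handling of the low-priority list in step S307 is simplified here, and the index functions and thresholds are placeholders rather than the patent's own values.

```python
# Hypothetical sketch of the reordering cascade (steps S301-S308).
def cascade_rerank(q_user, qa_groups, qq_indicators, qa_relevance, qa_threshold):
    """qq_indicators: list of (similarity_fn, y1, y2) tried in order (steps S301-S306).
    qa_relevance: function(question, answer) -> deep QA relevance (step S307)."""
    high_priority, undecided = [], []
    for q_i, a_i in qa_groups:
        for similarity_fn, y1, y2 in qq_indicators:
            score = similarity_fn(q_user, q_i)
            if score > y1:
                high_priority.append((q_i, a_i))  # confident question-question match
                break
            if score < y2:
                break                             # discard this question-answer group
        else:
            undecided.append((q_i, a_i))          # between Y2 and Y1 on every index
    low_priority = [(q, a) for q, a in undecided
                    if qa_relevance(q_user, a) > qa_threshold]      # step S307
    return high_priority + low_priority           # step S308: high first, low after
```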
The embodiment of the invention adds a reordering step after the primary ranking, which effectively addresses the problem that, in a medical intelligent question-answering scenario, the result of the primary ranking is incomplete (one-sided, limited, and so on) and an accurate ranking is difficult to give. Multiple specific relevance indexes can be added in the reordering, so that more factors are integrated into the ranking; accurate answer ranking can thus be given better and more conveniently, and specific medical questions can be handled.
Fig. 4 is a block diagram of a search result ranking apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes:
a first sorting module 41, configured to obtain a user request and candidate results from a first sorting result, where the user request includes search questions, and the candidate results include candidate questions and candidate answers corresponding to each candidate question;
a first correlation module 42, configured to obtain a first correlation index between the search question and the candidate question;
a second relevance module 43, configured to obtain a second relevance index of the search question and the candidate answer;
and a second sorting module 45, configured to reorder the first sorting result according to the first relevance indicator and the second relevance indicator, so as to obtain a second sorting result.
In a possible implementation manner, the second sorting module 45 further includes:
a high priority sub-module 451 for determining a set of candidate questions and answers included in a high priority list according to the first relevance indicator;
a low priority sub-module 452 configured to determine a candidate question-answer group included in the low priority list according to the second relevance indicator;
the merge sorting submodule 453 is configured to merge the candidate question-answer groups in the high-priority list and the candidate question-answer groups in the low-priority list in the order of high priority first and low priority after, so as to obtain the second ranking result.
In one possible implementation, the high priority sub-module 451 is further configured to add a candidate set of questions to the high priority list if at least one first correlation index of the candidate set of questions is above a set threshold.
In one possible implementation, the low priority sub-module 452 is further configured to add a candidate set of questions to the low priority list if at least one second correlation index of the candidate set of questions is above a set threshold.
In one possible implementation, the first correlation module 42 includes at least one of the following sub-modules:
a first word-level sub-module for calculating word-level TF-IDF similarity of the search question and the candidate question;
a first character level sub-module for calculating character level TF-IDF similarity of the search question to the candidate question;
a first Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question sub-module, configured to calculate deep question similarity between the search question and the candidate question;
a first word vector sub-module, configured to calculate word vector similarity between the search question and the candidate question;
a first latent semantic indexing sub-module, configured to calculate latent semantic indexing similarity between the search question and the candidate question.
In one possible implementation, the second relevance module 43 includes at least one of the following sub-modules:
the deep question-answer module is used for calculating the deep question-answer correlation between the search question and the candidate answer;
a second word-level sub-module for calculating word-level TF-IDF correlations of the search question and the candidate answers;
a second character level sub-module for calculating a character level TF-IDF correlation of the search question with the candidate answer;
a second Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
a second word vector sub-module, configured to calculate word vector relevance between the search question and the candidate answer;
a second latent semantic indexing sub-module, configured to calculate latent semantic indexing relevance between the search question and the candidate answer.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 6 is a block diagram of a search result sorting apparatus according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes: a memory 910 and a processor 920, the memory 910 having stored therein computer programs operable on the processor 920. The processor 920 implements the search result ranking method in the above embodiments when executing the computer program. The number of the memory 910 and the processor 920 may be one or more.
The device also includes:
and a communication interface 930 for communicating with an external device to perform data interactive transmission.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, the memory 910, the processor 920 and the communication interface 930 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present invention, and these should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (12)
1. A method for ranking search results, comprising:
acquiring a user request and candidate results from a first ranking result, wherein the first ranking result is obtained by preliminarily ranking candidate question-answer groups in one of multiple ways; the user request comprises a search question, and the candidate results comprise candidate questions and the candidate answers corresponding to the candidate questions;
acquiring a first correlation index of the search question and the candidate question;
acquiring a second relevance index of the search question and the candidate answer;
reordering the first ordering result according to the first relevance index and the second relevance index to obtain a second ordering result;
wherein, according to the first correlation index and the second correlation index, reordering the first ordering result to obtain a second ordering result, comprising:
determining a candidate question-answer group included in a high-priority list according to the first relevance index; wherein the first relevance indicator comprises a plurality of relevance indicators; the determining of the set of candidate questions and answers included in the high priority list includes: for the plurality of relevance indicators, adding a set of candidate questions and answers with relevance above a high threshold to the high priority list and discarding a set of candidate questions and answers with relevance below a low threshold;
determining a candidate question-answer group included in a low-priority list according to the second relevance index;
and merging the candidate question-answer groups in the high-priority list and the candidate question-answer groups in the low-priority list in the order of high priority first and low priority after, to obtain the second ranking result.
2. The method of claim 1, wherein determining the set of question-answers candidates included in the high-priority list based on the first relevance metric comprises:
if at least one first relevance index of a candidate set of questions and answers is above a set threshold, adding the candidate set of questions and answers to a high priority list.
3. The method of claim 1, wherein determining the set of question-answers candidates included in the low-priority list based on the second relevance metric comprises:
if at least one second correlation index of a candidate set of questions and answers is higher than a set threshold, adding the candidate set of questions and answers to a low priority list.
4. The method of claim 1, wherein obtaining a first relevance indicator for the search question and the candidate question comprises at least one of:
calculating word-level TF-IDF similarity of the search question and the candidate question;
calculating the character level TF-IDF similarity of the search question and the candidate question;
calculating Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
calculating deep question similarity between the search question and the candidate question;
calculating word vector similarity between the search question and the candidate question;
calculating latent semantic indexing similarity between the search question and the candidate question.
5. The method of claim 1, wherein obtaining a second relevance indicator for the search question and the candidate answer comprises at least one of:
calculating a deep question-answer correlation of the search question and the candidate answer;
calculating a word-level TF-IDF relevance of the search question and the candidate answer;
calculating a character-level TF-IDF relevance of the search question to the candidate answer;
calculating Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
calculating word vector relevance between the search question and the candidate answer;
calculating latent semantic indexing relevance between the search question and the candidate answer.
6. A search result ranking apparatus, comprising:
a first ranking module, configured to acquire a user request and candidate results from a first ranking result, wherein the first ranking result is obtained by preliminarily ranking candidate question-answer groups in one of multiple ways; the user request comprises a search question, and the candidate results comprise candidate questions and the candidate answers corresponding to the candidate questions;
a first correlation module, configured to obtain a first correlation index between the search question and the candidate question;
the second correlation module is used for acquiring a second correlation index of the search question and the candidate answer;
the second sorting module is used for re-sorting the first sorting result according to the first relevance index and the second relevance index to obtain a second sorting result;
wherein the second sorting module comprises:
the high-priority submodule is used for determining a candidate question-answer group included in a high-priority list according to the first correlation index; wherein the first relevance indicator comprises a plurality of relevance indicators; the determining of the set of candidate questions and answers included in the high priority list includes: for the plurality of relevance indicators, adding a set of candidate questions and answers with relevance above a high threshold to the high priority list and discarding a set of candidate questions and answers with relevance below a low threshold;
the low-priority submodule is used for determining a candidate question-answer group included in a low-priority list according to the second correlation index;
and the merging and sorting submodule is used for merging the candidate question-answer groups in the high-priority list and the candidate question-answer groups in the low-priority list in the order of high priority first and low priority after, so as to obtain the second ranking result.
7. The apparatus of claim 6 wherein the high priority sub-module is further configured to add a set of candidate questions to a high priority list if at least one first correlation index of the set of candidate questions is above a set threshold.
8. The apparatus of claim 6 wherein the low priority sub-module is further configured to add a set of candidate questions to a low priority list if at least one second correlation index of the set of candidate questions is above a set threshold.
9. The apparatus of claim 6, wherein the first correlation module comprises at least one of the following sub-modules:
a first word-level sub-module for calculating word-level TF-IDF similarity of the search question and the candidate question;
a first character level sub-module for calculating character level TF-IDF similarity of the search question to the candidate question;
a first Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question sub-module, configured to calculate deep question similarity between the search question and the candidate question;
a first word vector sub-module, configured to calculate word vector similarity between the search question and the candidate question;
a first latent semantic indexing sub-module, configured to calculate latent semantic indexing similarity between the search question and the candidate question.
10. The apparatus of claim 6, wherein the second relevance module comprises at least one of the following sub-modules:
the deep question-answer module is used for calculating the deep question-answer correlation between the search question and the candidate answer;
a second word-level sub-module for calculating word-level TF-IDF correlations of the search question and the candidate answers;
a second character level sub-module for calculating a character level TF-IDF correlation of the search question with the candidate answer;
a second Chinese-character-pinyin-level sub-module, configured to calculate Chinese-character-pinyin-level TF-IDF relevance between the search question and the candidate answer;
a second word vector sub-module, configured to calculate word vector relevance between the search question and the candidate answer;
and a second latent semantic indexing sub-module, configured to calculate latent semantic indexing relevance between the search question and the candidate answer.
11. An apparatus for ranking search results, the apparatus comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
12. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810729232.9A CN109033244B (en) | 2018-07-05 | 2018-07-05 | Search result ordering method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033244A CN109033244A (en) | 2018-12-18 |
CN109033244B true CN109033244B (en) | 2020-10-16 |
Family
ID=65522449
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810729232.9A Active CN109033244B (en) | 2018-07-05 | 2018-07-05 | Search result ordering method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033244B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851484A (en) * | 2019-11-13 | 2020-02-28 | 北京香侬慧语科技有限责任公司 | Method and device for obtaining multi-index question answers |
CN110825864A (en) * | 2019-11-13 | 2020-02-21 | 北京香侬慧语科技有限责任公司 | Method and device for obtaining answers to questions |
CN113761084B (en) * | 2020-06-03 | 2023-08-08 | 北京四维图新科技股份有限公司 | POI search ranking model training method, ranking device, method and medium |
CN112784600B (en) * | 2021-01-29 | 2024-01-16 | 北京百度网讯科技有限公司 | Information ordering method, device, electronic equipment and storage medium |
CN115552393A (en) | 2021-04-29 | 2022-12-30 | 京东方科技集团股份有限公司 | Question and answer processing method and device, electronic equipment and computer readable storage medium |
CN113326420B (en) | 2021-06-15 | 2023-10-27 | 北京百度网讯科技有限公司 | Question retrieval method, device, electronic equipment and medium |
CN115203598B (en) * | 2022-07-20 | 2023-09-19 | 贝壳找房(北京)科技有限公司 | Information ordering method in real estate field, electronic equipment and storage medium |
CN116013488B (en) * | 2023-03-27 | 2023-06-02 | 中国人民解放军总医院第六医学中心 | Intelligent security management system for medical records with self-adaptive data rearrangement function |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8412514B1 (en) * | 2005-10-27 | 2013-04-02 | At&T Intellectual Property Ii, L.P. | Method and apparatus for compiling and querying a QA database |
CN108153876A (en) * | 2017-12-26 | 2018-06-12 | 爱因互动科技发展(北京)有限公司 | Intelligent answer method and system |
CN108170739A (en) * | 2017-12-18 | 2018-06-15 | 深圳前海微众银行股份有限公司 | Problem matching process, terminal and computer readable storage medium |
- 2018-07-05: CN application CN201810729232.9A, granted as CN109033244B (active)
Also Published As
Publication number | Publication date |
---|---|
CN109033244A (en) | 2018-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |