CN104750819B - The Biomedical literature search method and system of a kind of word-based grading sorting algorithm - Google Patents

Info

Publication number
CN104750819B
Authority
CN
China
Prior art keywords
vocabulary
candidate
query
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510147696.5A
Other languages
Chinese (zh)
Other versions
CN104750819A (en
Inventor
徐博
林鸿飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201510147696.5A priority Critical patent/CN104750819B/en
Publication of CN104750819A publication Critical patent/CN104750819A/en
Application granted granted Critical
Publication of CN104750819B publication Critical patent/CN104750819B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A biomedical literature retrieval method and system based on a word grading sorting algorithm. The retrieval method comprises: S1, a search engine query extraction step; S2, a candidate expansion term extraction step; S3, a feature extraction and labeling step for the candidate expansion terms; S4, a training step for the candidate expansion term ranking model; S5, an online search engine query and extraction step; S6, an online candidate expansion term retrieval, feature extraction, and scoring step; S7, a query result return step. The retrieval system comprises a search engine query extraction module, a candidate expansion term extraction module, a feature extraction and labeling module for candidate expansion terms, a candidate expansion term ranking model training module, a query reconstruction module, and a query result return module. Working from the angle of query expansion, the invention uses the word grading sorting algorithm and the dictionary resources inherent to the biomedical field to select the specialized terms that best express the user's information need, completes the retrieval task, and improves retrieval performance.

Description

A biomedical literature retrieval method and system based on a word grading sorting algorithm
Technical field
The present invention relates to the fields of data mining and search engine technology, and in particular to a biomedical literature retrieval method and system based on a word grading sorting algorithm.
Background
In recent years, with the rapid development of biomedicine, biomedical research has produced many valuable results. These results not only help treat diseases that once seemed incurable but, in the longer term, also advance and deepen humanity's understanding of itself.
However, as the volume of biomedical literature grows rapidly, the amount of related information is growing exponentially. This mass of literature and information makes it hard for biomedical researchers and practitioners to obtain what they need, and traditional manual information gathering is no longer adequate. Information retrieval techniques and methods are therefore needed to help these users obtain the required information.
Traditional information retrieval techniques rank documents or web pages by relevance to the query a user submits and return the ranked results to the user. Applied directly to biomedical literature retrieval, however, these methods rarely achieve good retrieval performance, because they do not fully account for the inherent characteristics of the biomedical field. For example, the field has a large specialized vocabulary, and these terms often have many synonyms and abbreviations. Taking these characteristics into account in traditional retrieval methods would further improve the performance of biomedical information retrieval.
Query expansion is one of the key techniques of traditional information retrieval. Starting from the original query submitted by the user, it supplements and refines the query according to the user's search intent, producing a query that better matches that intent and improving retrieval performance. Existing query expansion methods fall into two broad classes. The first is based on the document collection: it takes all or part of the document collection as its object of study and extracts query-related content from it to improve the original query. The second is based on external expansion resources, which mainly include dictionary resources, retrieval system query logs, anchor text, and Wikipedia. Many studies show that improving the original query with external expansion resources accomplishes the query expansion task better and in turn improves retrieval performance.
Because the biomedical field has many domain resources such as dictionaries, making full use of these resources during retrieval to supplement and refine the user's query is very likely to improve retrieval performance.
To build a literature retrieval system for the biomedical field, one must first understand the field's characteristics and resources. Biomedical documents contain a large number of specialized terms, and these terms have many synonyms and abbreviations, which poses a major challenge for building a retrieval system. Take the drug acetaminophen as an example: its English name is paracetamol; in the international standard drug classification it is listed as paracetamol (acetaminophen); and in medicinal chemistry it is referred to as C8H9NO2 or N02BE01. With this many names, a search on only one of them is unlikely to retrieve all the relevant documents. Fortunately, the biomedical field also has many established knowledge bases and resources, such as Medical Subject Headings (MeSH) and the Gene Ontology (GO). Making full use of these resources during retrieval would greatly improve the performance of biomedical literature retrieval.
Learning to rank is the general name for a family of supervised learning algorithms used to rank documents in information retrieval. Its main characteristic is that it uses machine learning to solve the ranking problem in information retrieval and achieves good ranking performance. Because a ranking problem can also be viewed as the problem of selecting the best item, learning-to-rank algorithms have in recent years been applied to a number of other tasks, such as recommending items to users in recommender systems based on the history of users and items.
Summary of the invention
The object of the invention is to provide a biomedical literature retrieval method and system, based on a word grading sorting algorithm, that gives users more accurate biomedical literature, meets their information needs more effectively, and effectively supplements and refines their queries.
The technical solution adopted by the invention to solve the problems of the prior art is a biomedical literature retrieval method based on a word grading sorting algorithm, comprising an offline training stage and an online query stage. The offline training stage comprises the following steps:
S1, search engine query extraction step: from the historical query records of a search engine, extract multiple queries and the top N result documents returned for each query, and collect the queries and their result documents into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: using a biomedical resource, extract the specialized terms from the top N result documents of each query in the query pool, and count the number of occurrences, or the weighted sum of occurrences, of each specialized term in those result documents; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest count or weighted sum as candidate expansion terms, where M is a natural number;
S3, feature extraction and labeling step for the candidate expansion terms:
Feature extraction and labeling of the candidate expansion terms are carried out together. The relevance label of a candidate expansion term is obtained by comparing the retrieval performance of the original query with the retrieval performance obtained when the candidate expansion term is added to the original query. The evaluation metrics for retrieval performance include precision, average precision, NDCG, and MRR. The relevance label is assigned as follows:
$$\mathrm{label} = \begin{cases} 1, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) > \mathrm{eval}(\mathrm{query}) \\ 0, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) \le \mathrm{eval}(\mathrm{query}) \end{cases}$$
where eval() is the evaluation function that measures retrieval performance, eval(query+term) is its score when the candidate expansion term term is added to the query query, and eval(query) is its score for the query query alone. A label of 1 means the candidate expansion term is relevant to the query; a label of 0 means it is irrelevant;
Feature extraction for the candidate expansion terms draws on the biomedical resource and on the top N result documents returned for the queries in the query pool. It extracts the distribution of each candidate expansion term in those documents, its distribution in the biomedical resource, and its relevance to the original query, in preparation for training the ranking model. After the features of a candidate expansion term have been extracted, all feature values are normalized to the interval [0, 1] as follows:
$$\mathrm{value}' = \frac{\mathrm{value} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}$$
where value is the raw value of a feature, value' is its normalized value, and minValue and maxValue are the minimum and maximum values of that feature;
S4, candidate expansion term ranking model training step: from the relevance labels and the features of the candidate expansion terms, train the weight of each feature with the word grading sorting algorithm. The specific steps are: select one candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled irrelevant to form a word group, and select a number of such word groups as training samples; randomly assign an initial weight to each feature of the candidate expansion terms, and rank the candidate expansion terms in each word group by their feature-weighted scores; from the ranking results of the word groups, compute the global ranking loss, and adjust the weight of each feature dimension dynamically according to the gradient of the loss function. The ranking loss is computed from the per-group losses, where NumSample is the number of word groups and loss_i is the loss of the i-th word group; loss_i is obtained from the rank position of the relevant expansion term, and the higher that position, the smaller the loss. This process is repeated iteratively until the overall loss falls below a threshold or the specified number of iterations is reached, at which point training is complete; the feature weights finally obtained serve as the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, retrieve the top N1 query results; using the biomedical resource, extract the specialized terms in the top N1 results and their features, where N1 is a natural number;
S6, online candidate expansion term retrieval, feature extraction, and scoring step: for the new query, apply the candidate expansion term extraction method and the feature extraction method of offline steps S2 and S3, using the biomedical resource, to the specialized terms in the top N1 results and their features, obtaining the online candidate expansion terms; the extracted features measure the importance of each candidate expansion term in the expanded query. Using the feature weights trained in step S4, score the online candidate expansion terms, and add the K1 highest-scoring candidate expansion terms to the new query to form the expanded query, where K1 is a natural number;
For an online candidate expansion term extracted with the biomedical resource, its score is
$$\sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, a_i is the weight of the i-th feature in the ranking model, and feature_i(term) is the value of the i-th feature of the online candidate expansion term term;
The online candidate expansion terms are ranked by score, and the top K1 are added to the new query as expansion terms. The weight of each added term in the expanded query is expressed in terms of a sign function sign and the weight weight_original, where sign = 1 when the online candidate expansion term already appears in the new query submitted online and sign = 0 otherwise, and weight_original is the weight of the new query in the expanded query;
S7, query result return step: retrieve with the expanded query and return the retrieval results to the user.
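The scoring and selection in step S6 can be sketched as a weighted sum over normalized features followed by a top-K1 cut. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout, and the alphabetical tie-break are assumptions, and the weighting of the added terms inside the expanded query is left out because its formula is not given in this text.

```python
def term_score(weights, features):
    """Weighted feature sum: sum over i of a_i * feature_i(term)."""
    return sum(a * f for a, f in zip(weights, features))


def select_expansion_terms(weights, candidates, k1):
    """Return the k1 highest-scoring candidate expansion terms.

    candidates maps each term to its normalized feature vector
    (hypothetical layout). Ties are broken alphabetically so the
    result is deterministic.
    """
    ranked = sorted(
        candidates,
        key=lambda term: (-term_score(weights, candidates[term]), term),
    )
    return ranked[:k1]


# Illustrative values only.
weights = [0.5, 0.3, 0.2]
candidates = {
    "paracetamol": [1.0, 0.8, 0.5],
    "analgesic": [0.4, 0.9, 0.1],
    "tablet": [0.1, 0.1, 0.1],
}
expansion = select_expansion_terms(weights, candidates, k1=2)
```

The selected terms would then be appended to the user's query before the retrieval of step S7.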
In step S2, the weighted sum of the occurrences of a specialized term in the result documents is
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{N} \mathrm{count}_i \cdot d_i$$
where count_i is the number of times the term occurs in the i-th document and d_i is the decay factor of the i-th document.
In step S3, the evaluation function eval() is the average precision, i.e.:
$$\mathrm{eval}(\mathrm{query}) = \frac{1}{\mathrm{RelDoc}_{\mathrm{query}}} \sum_{i=1}^{\mathrm{RelDoc}_{\mathrm{query}}} \frac{i}{\mathrm{rank}(i)}$$
where RelDoc_query is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list.
In step S1, when no historical query records exist, queries and their result records are produced manually by constructing biomedical queries and running a retrieval method; the retrieval method uses the vector space model, the BM25 retrieval model, or a language model with one of several smoothing methods.
The loss of each word group in step S4 is computed from rank_i, where rank_i is the position of the relevant candidate expansion term in the ranked list of its word group.
A biomedical resource is a dictionary or knowledge base containing biomedical specialized terms.
The features of a candidate expansion term include: the frequency (TF) with which the term occurs in the result documents; the TF-IDF value of the term; the number of documents in which the term co-occurs with the original query; the number of times the term co-occurs with the original query within the same text window; the number of times the term occurs in the biomedical resource; and, within the biomedical resource, the number of term concepts that contain the candidate expansion term, together with the inclusion relations between biomedical term concepts.
A biomedical literature retrieval system based on a word grading sorting algorithm comprises an offline training part and an online retrieval part. The offline training part comprises the following modules:
Search engine query extraction module: extracts, from the historical query records of a search engine, multiple queries and the top N result documents returned for each query, and collects the queries and their result documents into a query pool, where N is a natural number;
Candidate expansion term extraction module: given a user query, uses the resources inherent to the biomedical field to extract specialized terms from the top N result documents obtained by the search engine query extraction module, and records the frequency, or the weighted sum of occurrences, of each specialized term in those documents; sorts the specialized terms in descending order of occurrence count or weighted sum and selects the M most frequent specialized terms as candidate expansion terms, where M is a natural number;
Feature extraction and labeling module for candidate expansion terms: extracts the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and labels the relevance of each candidate expansion term according to its effect on retrieval performance;
Candidate expansion term ranking model training module: after the features of the candidate expansion terms have been extracted and their relevance labeled, uses the word grading sorting algorithm to train the term ranking model and obtain the weight of each feature of the candidate expansion terms. It selects one candidate expansion term labeled relevant by the feature extraction and labeling module and several candidate expansion terms labeled irrelevant to form a word group, and selects a number of such word groups as training samples; randomly assigns an initial weight to each feature of the candidate expansion terms, and ranks the candidate expansion terms in each word group by their feature-weighted scores; from the ranking results of the word groups, computes the global ranking loss and adjusts the weight of each feature dimension dynamically according to the gradient of the loss function. The ranking loss is computed from the per-group losses, where NumSample is the number of word groups and loss_i is the loss of the i-th word group; loss_i is obtained from the rank position of the relevant expansion term, and the higher that position, the smaller the loss. This process is repeated iteratively until the overall loss falls below a threshold or the specified number of iterations is reached, at which point training is complete; the feature weights finally obtained serve as the trained ranking model;
The online retrieval part comprises:
Query reconstruction module: extracts the specialized terms for a new query and scores the candidate expansion terms. It comprises an online search engine query extraction module and an online candidate expansion term retrieval, feature extraction, and scoring module. The online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user and, using the biomedical resource, extracts the specialized terms in the top N1 results and their features, where N1 is a natural number. The online candidate expansion term retrieval, feature extraction, and scoring module uses the weighted scores of the candidate expansion terms output by the term ranking model to compute the corresponding weights and adds the terms to the original query, yielding the expanded query;
Query result return module: returns to the user the result documents retrieved with the expanded query.
The beneficial effects of the invention are as follows. Working mainly from the angle of query expansion, the invention uses the word grading sorting algorithm and resources such as the dictionaries inherent to the biomedical field to select, during query expansion, the specialized terms that best express the user's information need. This completes the retrieval task more effectively and gives the user results that better match that need. The invention uses resources from the biomedical field to supplement and refine the original query and thereby improves retrieval performance. On the literature collection of the TREC Genomics task, with the traditional BM25 retrieval model as the baseline retrieval model, a literature retrieval accuracy of 25.62% is obtained; retrieving with the method and system of the invention on the same basis yields 26.30%, an absolute gain of 0.68 percentage points. The retrieval method of the invention can therefore effectively retrieve the biomedical literature most relevant to the user's query and improve user satisfaction.
Brief description of the drawings
Fig. 1 is a flow chart of the retrieval method of the invention;
Fig. 2 is a schematic diagram of the logical structure of the retrieval system of the invention.
Detailed description of the embodiments
The invention is described below with reference to the drawings and specific embodiments:
Fig. 1 is a flow chart of the biomedical literature retrieval method based on a word grading sorting algorithm. The method comprises an offline training stage and an online query stage. The offline training stage comprises the following steps:
S1, search engine query extraction step: from the historical query records of a search engine, extract multiple queries and the top N result documents returned for each query, and collect the queries and their result documents into a query pool, where N is a natural number. In this embodiment, N = 10;
Here, the historical query records of a search engine mainly refer to the query history and the corresponding query results recorded by a retrieval system for biomedical literature. These queries and their results are used to train the ranking model offline.
When no relevant historical query records exist, queries and their retrieval results can be recorded manually by constructing biomedical queries and running retrieval. The retrieval method can be any of the ranking models of traditional information retrieval, including but not limited to the vector space model, the BM25 retrieval model, and language models with different smoothing methods.
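As a minimal sketch of step S1, the query pool can be held as a mapping from each query to its top N result documents. The function name `build_query_pool` and the shape of the log (pairs of a query string and a ranked list of document IDs) are assumptions made for illustration; the source does not prescribe a data structure.

```python
def build_query_pool(query_log, n=10):
    """Collect each logged query with its top-n result documents.

    query_log is an iterable of (query, ranked_doc_ids) pairs, a
    hypothetical log format. n defaults to 10, the value of N used in
    the embodiment.
    """
    pool = {}
    for query, ranked_doc_ids in query_log:
        # Keep only the n highest-ranked documents for this query.
        pool[query] = list(ranked_doc_ids)[:n]
    return pool


# Illustrative log with a single query and 15 ranked results.
log = [("acetaminophen fever", ["d%d" % i for i in range(15)])]
pool = build_query_pool(log)
```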
S2, candidate expansion term extraction step: using a biomedical resource, extract the specialized terms from the top N result documents of each query in the query pool, and count the number of occurrences, or the weighted sum of occurrences, of each specialized term in those result documents; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest count or weighted sum as candidate expansion terms, where M is a natural number;
Here, a biomedical resource is a resource such as a dictionary or knowledge base containing biomedical specialized terms, including but not limited to Medical Subject Headings (MeSH), the Gene Ontology (GO), and the Metathesaurus, Semantic Network, and SPECIALIST Lexicon and Lexical Tools released by the Unified Medical Language System (UMLS).
Taking MeSH as the biomedical resource used by the invention as an example, the specialized terms in the top N result documents of a query are extracted, and each extracted term is associated with its number of occurrences, or weighted sum of occurrences, in the documents. For example, the weighted sum of occurrences of a specialized term term in the top N documents is computed as
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{N} \mathrm{count}_i \cdot d_i$$
where count_i is the number of times the term occurs in the i-th document and d_i is the decay factor of the i-th document. The weighted sum weights the term frequencies found in different documents so that frequencies in higher-ranked documents carry more weight, and terms contained in lower-ranked documents receive lower scores. The selected specialized terms are sorted from high to low by their occurrence count count(term), or by their score(term) value, and the top M terms are selected as candidate expansion terms. In this embodiment M is 150.
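The weighted count and the top-M selection of step S2 can be sketched as follows. The source does not state the form of the decay factor d_i, so the default `1 / i` used here is purely an assumption; the function names and the input layout (one list of extracted specialized terms per result document, in rank order) are likewise hypothetical.

```python
from collections import defaultdict


def weighted_term_scores(docs_terms, decay=None):
    """Sum count_i * d_i for every specialized term.

    docs_terms lists, in rank order, the specialized terms found in
    each result document (repeats allowed).
    """
    if decay is None:
        # Assumed decay factor: d_i = 1 / i for the i-th ranked document.
        decay = lambda i: 1.0 / i
    scores = defaultdict(float)
    for i, terms in enumerate(docs_terms, start=1):
        d_i = decay(i)
        for term in terms:
            # Adding d_i once per occurrence totals count_i * d_i.
            scores[term] += d_i
    return dict(scores)


def top_m_candidates(scores, m):
    """Return the m highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:m]]


docs_terms = [["aspirin", "fever", "aspirin"], ["fever"], ["pain", "aspirin"]]
scores = weighted_term_scores(docs_terms)
candidates = top_m_candidates(scores, m=2)
```

Passing `decay=lambda i: 1.0` reduces the score to the plain occurrence count, the other ordering the step allows.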
S3, feature extraction and relevance labeling step for the candidate expansion terms:
Feature extraction and labeling of the candidate expansion terms are carried out together. The relevance label of a candidate expansion term is obtained by comparing the retrieval performance of the original query with the retrieval performance obtained when the expansion term is added to the original query. The idea behind the labeling is this: a single candidate expansion term is added to the original query and retrieval is run; if the performance of the retrieval results improves, the expansion term is labeled as relevant to the original query. The evaluation metrics for retrieval performance include but are not limited to precision (Precision), mean average precision (MAP), NDCG, and MRR. The label is assigned as follows:
$$\mathrm{label} = \begin{cases} 1, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) > \mathrm{eval}(\mathrm{query}) \\ 0, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) \le \mathrm{eval}(\mathrm{query}) \end{cases}$$
where eval() is the evaluation function that measures retrieval performance, eval(query+term) is its score when the candidate expansion term term is added to the given query query, and eval(query) is its score for the given query query alone. When the evaluation score of retrieving with the original query plus a candidate term is greater than the score of retrieving with the original query alone, the candidate expansion term is labeled 1, meaning the term is relevant to the original query. When the score with the candidate term added is not greater than the score of the original query alone, the candidate expansion term is labeled 0, meaning the term is irrelevant to the original query.
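The labeling rule reduces to a single comparison of two evaluation scores. The sketch below assumes that a query is a list of terms and that `eval_fn` is any callable returning a retrieval score for such a list; both are illustrative choices, and the toy evaluation function merely stands in for a real retrieval run scored with, for example, average precision.

```python
def label_candidate(eval_fn, query, term):
    """Return 1 if adding term to the query raises the score, else 0."""
    expanded = list(query) + [term]
    return 1 if eval_fn(expanded) > eval_fn(query) else 0


def toy_eval(query_terms):
    """Stand-in for running retrieval and scoring the ranked results."""
    return 0.5 if "paracetamol" in query_terms else 0.3


relevant = label_candidate(toy_eval, ["acetaminophen"], "paracetamol")
irrelevant = label_candidate(toy_eval, ["acetaminophen"], "tablet")
```

Because the comparison is strict, a term that leaves the score unchanged is labeled 0, which matches the "not greater than" case described above.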
In this embodiment, the evaluation function eval() is the average precision, i.e.:
$$\mathrm{eval}(\mathrm{query}) = \frac{1}{\mathrm{RelDoc}_{\mathrm{query}}} \sum_{i=1}^{\mathrm{RelDoc}_{\mathrm{query}}} \frac{i}{\mathrm{rank}(i)}$$
where RelDoc_query is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list. For example, rank(3) = 5 means that the third relevant document appears at the fifth position of the ranked result list.
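Assuming the standard definition of average precision (the mean, over all relevant documents, of i / rank(i), with relevant documents that are never retrieved contributing zero), the evaluation function can be sketched as below. The function name and the input format, a list of the 1-based rank positions of the retrieved relevant documents, are assumptions for illustration.

```python
def average_precision(relevant_ranks, num_relevant):
    """Average precision for one query.

    relevant_ranks: 1-based positions of the relevant documents that
        appear in the ranked result list.
    num_relevant: total number of relevant documents (RelDoc_query).
    """
    if num_relevant == 0:
        return 0.0
    ranks = sorted(relevant_ranks)
    # The i-th relevant document found at position rank(i) adds i / rank(i).
    return sum(i / rank for i, rank in enumerate(ranks, start=1)) / num_relevant


# Relevant documents at positions 1, 3 and 5, so rank(3) = 5.
ap = average_precision([1, 3, 5], num_relevant=3)
```

Here `ap` equals (1/1 + 2/3 + 3/5) / 3.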
Feature extraction for the candidate expansion terms draws on the biomedical resource and on the top N result documents returned for the queries in the query pool. It extracts the distribution of each candidate expansion term in those documents, its distribution in the biomedical resource, and its relevance to the original query, in preparation for training the ranking model. After the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]. The normalization is:
$$\mathrm{value}' = \frac{\mathrm{value} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}$$
where value is the raw value of a feature, value' is its normalized value, and minValue and maxValue are the minimum and maximum values of that feature.
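A minimal sketch of this min-max normalization for a single feature follows. Mapping a constant feature (maxValue equal to minValue) to 0.0 is an assumption added here to avoid division by zero; the source does not discuss that case.

```python
def min_max_normalize(values):
    """Scale the values of one feature to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed handling: a constant feature carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2, 4, 6])
```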
The features of an expansion term specifically include:
1. The frequency (TF) with which the candidate expansion term occurs in the result documents. This feature is obtained by counting the occurrences of the specialized term in the result documents.
2. The TF-IDF value of the candidate expansion term. TF-IDF is one of the classical models of information retrieval and measures the relative importance of a term. It is computed by the following formula:
where count(term) is the number of times the candidate expansion term occurs in the i-th result document, TotalDoc is the total number of documents in the training data, and df(term) is the number of documents in which the candidate expansion term occurs.
3. The number of documents in which the candidate expansion term and the original query occur together. This feature measures the relevance between the original query and the candidate expansion term.
4. The number of times the candidate expansion term and the original query occur together within one text window. This feature measures the relevance between the query words of the original query and the candidate expansion term within a limited range. The text window is the number of words separating the expansion term from the original query word within a single document in which both occur.
5. The number of times the candidate expansion term occurs in a biomedical resource such as MeSH. This feature measures the distribution of the candidate expansion term in the biomedical resource.
6. The number of term concepts in a biomedical resource such as MeSH that contain the candidate expansion term. Biomedical technical term concepts often stand in containment relations to one another, so this feature likewise measures the importance of a candidate term in the biomedical resource.
Among the candidate expansion term features extracted above, features 1 and 2 measure the distribution of the candidate expansion term in the document collection; features 3 and 4 measure the relevance between the candidate expansion term and the original query; and features 5 and 6 measure the distribution of the candidate expansion term in the biomedical resource. The expansion term features of the present invention include, but are not limited to, the features above. Extracting these multiple features provides the input to the word grading sorting algorithm and gives a better measure of the importance of each candidate expansion term.
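The document-based features in the list above (features 1, 3 and 4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: documents are assumed to be pre-tokenized lists of words, co-occurrence with "the original query" is assumed to mean co-occurrence with at least one query word, and the function names and the default window size are hypothetical.

```python
def term_frequency(term, docs):
    """Feature 1: total occurrences of the term in the result documents."""
    return sum(doc.count(term) for doc in docs)


def cooccurrence_docs(term, query_terms, docs):
    """Feature 3: documents containing the term and at least one query word.

    Assumption: matching any single query word counts as co-occurrence.
    """
    return sum(
        1 for doc in docs
        if term in doc and any(q in doc for q in query_terms)
    )


def window_cooccurrence(term, query_terms, docs, window=5):
    """Feature 4: occurrences of the term within `window` words of a query word."""
    hits = 0
    for doc in docs:
        query_positions = [i for i, tok in enumerate(doc) if tok in query_terms]
        for i, tok in enumerate(doc):
            if tok == term and any(abs(i - j) <= window for j in query_positions):
                hits += 1
    return hits


if __name__ == "__main__":
    docs = [
        ["mad", "cow", "disease", "prions", "cause", "disease"],
        ["prions", "are", "proteins"],
        ["cow", "milk"],
    ]
    query = {"mad", "cow"}
    print(term_frequency("prions", docs))
    print(cooccurrence_docs("prions", query, docs))
    print(window_cooccurrence("prions", query, docs, window=2))
```

Features 5 and 6 would need a lookup against a resource such as MeSH and are left out of the sketch.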
S4, candidate expansion term ranking model training step: taking as input the relevance labels and features of the candidate expansion terms obtained in step S3, the ranking model of the word grading sorting algorithm is trained to obtain a weight for each feature. Specifically: select one candidate expansion term labeled relevant in step S3 (label 1) and several candidate expansion terms labeled irrelevant (label 0) to form one word group, and select a number of such word groups as training samples. Randomly assign initial weights to the features of the candidate expansion terms, and rank the expansion terms within each word group by their feature-weighted scores. From the ranking result of each word group, compute the global ranking loss, and dynamically adjust the weight of each feature dimension according to the gradient of the loss function. The ranking loss is defined in terms of NumSample, the number of word groups of candidate expansion terms, and loss_i, the loss value of the i-th word group. The loss value of a group is computed from the ranked position of its relevant expansion term: the higher the position, the smaller the loss. The preceding process is iterated until the overall loss falls below a threshold or the specified number of iterations is reached, at which point training is complete, and the feature weights finally selected constitute the trained ranking model. In this embodiment, training stops after 100 iterations.
In this embodiment the loss value is defined in terms of rank_i, the position of the relevant candidate expansion term in the ranked list of its word group: the loss is 0 when the term ranks first and reaches its maximum when the term ranks last. The loss value may also be computed by other formulas and is not limited to this one.
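The training procedure of step S4 can be sketched as below. Several parts are assumptions, because the loss formulas are not reproduced in this text: the per-group loss is taken to be (rank - 1) / (group size - 1), which is 0 at first place and largest at last place as described; the global loss is taken to be the mean over groups; and the term score is taken to be a linear weighted sum. A rank-position loss is piecewise constant and has no useful gradient, so the sketch substitutes a pairwise logistic surrogate for the weight updates. That surrogate is a swapped-in technique, not the patent's loss function. All names are hypothetical.

```python
import math
import random


def score(weights, features):
    """Feature-weighted score of one candidate term (assumed linear)."""
    return sum(w * f for w, f in zip(weights, features))


def group_loss(weights, relevant, irrelevant):
    """Assumed rank loss: 0 if the relevant term ranks first, 1 if last."""
    s_rel = score(weights, relevant)
    # 1-based rank of the relevant term; ties count against it.
    rank = 1 + sum(1 for feats in irrelevant if score(weights, feats) >= s_rel)
    return (rank - 1) / len(irrelevant)


def total_loss(weights, groups):
    """Mean group loss over all word groups (averaging is an assumption)."""
    return sum(group_loss(weights, r, irr) for r, irr in groups) / len(groups)


def train(groups, num_features, lr=0.1, max_iter=100, threshold=1e-6, seed=0):
    """Gradient descent on a pairwise logistic surrogate of the rank loss."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(num_features)]  # random initial weights
    for _ in range(max_iter):  # the embodiment stops after 100 iterations
        if total_loss(weights, groups) <= threshold:
            break
        grad = [0.0] * num_features
        for relevant, irrelevant in groups:
            s_rel = score(weights, relevant)
            for feats in irrelevant:
                # d/dw of log(1 + exp(-(s_rel - s_irr)))
                margin = s_rel - score(weights, feats)
                coeff = -1.0 / (1.0 + math.exp(margin))
                for k in range(num_features):
                    grad[k] += coeff * (relevant[k] - feats[k])
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Each group: (features of the relevant term, features of irrelevant terms).
    groups = [
        ([0.9, 0.2], [[0.1, 0.8], [0.2, 0.5]]),
        ([0.8, 0.1], [[0.3, 0.9], [0.0, 0.4]]),
    ]
    learned = train(groups, num_features=2)
    print(learned, total_loss(learned, groups))
```

In this toy data the first feature separates relevant from irrelevant terms, so training drives its weight up until every relevant term ranks first in its group.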
In the ranking model, the final score of an expansion term is computed by the following formula:
where FeatureNum is the total number of features, a_i is the weight of the i-th feature, and feature_i(term) is the value of the i-th feature of candidate term term. The ranking model obtained by this training can then be used to select relevant expansion terms for test queries. The steps above are completed offline.
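Given the symbol definitions above and the feature-weighted scoring described in step S4, one plausible reading of the scoring formula is a linear weighted sum of the normalized feature values. The form below is an assumption inferred from those definitions, not a quotation of the patent's formula:

```latex
\mathrm{score}(\mathit{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathit{term})
```

Under this reading, a term whose normalized features are (0.5, 1.0) scored with weights (0.4, 0.2) would receive 0.4 · 0.5 + 0.2 · 1.0 = 0.4.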
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, retrieve the top N1 query results; extract the specialized terms in the top N1 retrieval results and their features according to the biomedical resource, where N1 is a natural number;
Note that this step runs online: after the user submits a query to the biomedical literature search engine, the method automatically obtains the N1 top-ranked results of the initial retrieval and uses them for processing such as expanding the user's query. This processing is transparent to the user.
S6, online candidate expansion term retrieval, feature extraction and scoring step: for the new query, apply the candidate expansion term extraction method and the feature extraction method of offline steps S2-S3, using the biomedical resource, to extract the specialized terms in the top N1 retrieval results and their features, obtaining the online-stage candidate expansion terms; the extracted features measure the importance of a candidate expansion term in the expanded query. Score the online-stage candidate expansion terms with the feature weights trained in step S4, and add the K1 highest-scoring online-stage candidate expansion terms to the new query submitted online to form the online-stage expanded query, where K1 is a natural number;
For an online-stage candidate expansion term identified and extracted using the biomedical resource, the score is defined in terms of the following quantities: FeatureNum is the total number of features, a_i is the weight of the i-th feature in the ranking model, and feature_i(term) is the value of the i-th feature of online-stage candidate expansion term term;
The online-stage candidate expansion terms are ranked by score, and the K1 top-ranked terms are added to the new query. The weight of an added term in the expanded query is expressed in terms of sign and weight_original, where sign is a sign function that takes the value 1 when the online-stage candidate expansion term already appears in the new query submitted online and 0 otherwise, and weight_original is the weight of the new query submitted online within the expanded query;
The final expanded query has the following form:
(weight_1 query_original weight_2 (w_1 term_1 w_2 term_2 … w_K term_K))
where weight_1 is the weight of the new query submitted online within the expanded query, weight_2 is the overall weight of all newly added expansion terms within the expanded query, w_1, w_2, …, w_K are the score weights of expansion terms term_1, term_2, …, term_K, and K is the number of expansion terms finally selected. In this embodiment weight_1 is 0.5, weight_2 is 0.5, and K is 50.
S7, query result return step: retrieve with the expanded query and return the retrieval results to the user, completing the retrieval process.
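The expanded query form given in step S6 can be assembled as in the following sketch. It assumes the candidate terms have already been scored, and it normalizes the scores of the selected terms so that they sum to 1. That normalization, the three-decimal formatting and the name `build_expanded_query` are assumptions for illustration. The sign-based weight adjustment of step S6 is not modeled, because its formula is not reproduced in this text.

```python
def build_expanded_query(original_query, scored_terms, k=50,
                         weight1=0.5, weight2=0.5):
    """Return '(weight1 query weight2(w1 term1 ... wK termK))' as a string.

    scored_terms maps each candidate expansion term to its ranking score.
    The defaults (0.5, 0.5, K = 50) are the values used in the embodiment.
    """
    # Keep the K highest-scoring candidates.
    top = sorted(scored_terms.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(s for _, s in top)
    parts = []
    for term, s in top:
        # Assumption: term weights are the scores normalized to sum to 1.
        w = s / total if total > 0 else 1.0 / len(top)
        parts.append(f"{w:.3f} {term}")
    return f"({weight1} {original_query} {weight2}({' '.join(parts)}))"


if __name__ == "__main__":
    scores = {"prions": 3.0, "spongiform": 1.0, "cause": 0.5}
    print(build_expanded_query("mad cow disease", scores, k=2))
```

The resulting string would still have to be translated into the weighted-query syntax of whichever search engine is used.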
Corresponding to the method above, the present invention also provides a biomedical literature retrieval system based on the word grading sorting algorithm. Fig. 2 shows the logical structure of the system.
A biomedical literature retrieval system based on the word grading sorting algorithm comprises an offline training part and an online retrieval part. The offline training part comprises the following modules:
Search engine query extraction module: extracts, from the historical query log of the search engine, multiple queries and the top N query result documents obtained for each query, and collects the queries and query result documents into a query pool, where N is a natural number. The search engine query extraction module can retrieve the biomedical literature associated with a user's query and return the results to the user; the system's internal computations and operations, such as query expansion, are transparent to the user and cannot be seen.
Candidate expansion term extraction module: given a user query, uses resources intrinsic to the biomedical domain to extract specialized terms from the top N query result documents obtained by the search engine query extraction module, and records the number of times (frequency) each specialized term occurs in the query result documents or the weighted sum of its occurrence counts. The specialized terms are sorted in descending order of occurrence count or weighted sum, and the M highest are selected as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: extracts the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and labels the relevance of each candidate expansion term according to its effect on retrieval performance. During offline training, the relevance labels and features of the candidate expansion terms serve as input to the word grading sorting algorithm; during online querying, the module extracts the feature information associated with the candidate expansion terms.
Candidate expansion term ranking model training module: after the features of the candidate expansion terms have been extracted and their relevance labeled, uses the word grading sorting algorithm to train the term ranking model, which outputs a weight for each feature of the candidate expansion terms. These weights can be used to measure the importance of expansion terms for unseen queries. Specifically: select one candidate expansion term labeled relevant by the feature extraction and labeling module and several candidate expansion terms labeled irrelevant to form one word group, and select a number of such word groups as training samples. Randomly assign initial weights to the features of the candidate expansion terms, and rank the expansion terms within each word group by their feature-weighted scores. From the ranking result of each word group, compute the global ranking loss, and dynamically adjust the weight of each feature dimension according to the gradient of the loss function. The ranking loss is defined in terms of NumSample, the number of word groups of candidate expansion terms, and loss_i, the loss value of the i-th word group; the loss value is computed from the ranked position of the relevant expansion term, and the higher the position, the smaller the loss. The preceding process is iterated until the overall loss falls below a threshold or the specified number of iterations is reached, at which point training is complete, and the feature weights finally selected constitute the trained ranking model.
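The selection performed by the candidate expansion term extraction module (rank specialized terms by occurrence count or by a weighted sum of counts, keep the top M) can be sketched as follows. The weighted sum is assumed to be the per-document count multiplied by a per-document decay factor and summed over documents. The decay values in the example are made up, and the function names are hypothetical.

```python
def weighted_occurrence(counts, decay):
    """Sum of per-document counts, each scaled by that document's decay factor.

    Assumption: the weighted sum has the form sum_i count_i * d_i.
    """
    return sum(c * d for c, d in zip(counts, decay))


def top_m_candidates(term_counts, decay, m):
    """Sort terms by weighted occurrence in descending order and keep the first m."""
    ranked = sorted(
        term_counts,
        key=lambda t: weighted_occurrence(term_counts[t], decay),
        reverse=True,
    )
    return ranked[:m]


if __name__ == "__main__":
    # Occurrences of each term in the top 3 result documents (made-up numbers).
    term_counts = {"prions": [2, 1, 0], "cause": [0, 0, 3], "cow": [1, 1, 1]}
    decay = [1.0, 0.5, 0.25]  # hypothetical: lower-ranked documents count less
    print(top_m_candidates(term_counts, decay, m=2))
```

Setting every decay factor to 1 reduces the weighted sum to the plain occurrence count, which is the other ranking criterion the module allows.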
The online retrieval part comprises:
Query reconstruction module: extracts specialized terms for a new query and scores the candidate expansion terms. It comprises an online search engine query extraction module and an online candidate expansion term retrieval, feature extraction and scoring module. The online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user and extracts the specialized terms in the top N1 retrieval results and their features according to the biomedical resource, where N1 is a natural number. The online candidate expansion term retrieval, feature extraction and scoring module uses the feature weights output by the term ranking model to score the candidate expansion terms, computes the corresponding weights, and adds the terms to the original query to obtain the expanded query.
Query result return module: returns to the user the result documents retrieved with the expanded query. The results the user receives are in fact the results of the submitted query after query expansion, and the query expansion process is invisible to the user.
Following the description of the method and system embodiments above, a specific example is now given. Assume in this embodiment that the ranking model has already been trained on historical data. When the user submits a new query "mad cow disease", the system first selects candidate expansion terms according to term frequency information in the top-ranked documents of the initial retrieval. The 10 top-ranked candidate expansion terms and their relevance labels are shown in the following table:
Ranking | Term | Relevance
1 | Disease | Relevant
2 | Prions | Relevant
3 | Cause | Irrelevant
4 | Infectious | Relevant
5 | Conversion | Irrelevant
6 | Cow | Relevant
7 | Spongiform | Relevant
8 | Fatal | Irrelevant
9 | Encephalopathies | Relevant
10 | Mad | Relevant
As the table above shows, 3 of the top 10 candidate expansion terms are irrelevant; adding them directly to the original query would harm retrieval performance. Next, features related to the candidate expansion terms are extracted from the documents and from the biomedical dictionary MeSH, the weight of each feature is obtained from the ranking model, and all candidate expansion terms are scored and re-ranked.
The top 10 expansion terms finally selected after re-ranking are shown in the following table. As the table shows, after re-ranking the 10 top-ranked terms in the expanded query are all relevant terms. Adding these terms to the original query, with their normalized ranking scores as weights, and then retrieving can further improve retrieval performance.
The embodiments above explain and illustrate the biomedical literature retrieval method and system based on the word grading sorting algorithm provided by the present invention. The method and system use resources such as biomedical knowledge bases to expand the original query submitted by the user and apply the word grading sorting algorithm to measure the importance of expansion terms during expansion. Query expansion supplements and refines the user's query, which helps ensure the accuracy of the query results and better meets the user's information need.
The above is a further detailed description of the present invention in connection with specific preferred technical solutions, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the protection scope of the invention.

Claims (8)

1. A biomedical literature retrieval method based on a word grading sorting algorithm, characterized by comprising an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: from the historical query log of the search engine, extract multiple queries and the top N query result documents obtained for each query, and collect the queries and query result documents into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: according to the biomedical resource, extract the specialized terms in the top N query result documents of each query in the query pool, and count the number of times each specialized term occurs in the query result documents or the weighted sum of its occurrence counts; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest occurrence count or the highest weighted sum as candidate expansion terms, where M is a natural number;
S3, candidate expansion term feature extraction and labeling step:
feature extraction and labeling of the candidate expansion terms are carried out simultaneously; the relevance of a candidate expansion term is labeled by comparing the retrieval performance of the original query with the retrieval performance obtained when the candidate expansion term is added to the original query; the evaluation metrics for retrieval performance include precision, average precision, NDCG and MRR; the relevance label is defined as follows:
$$\mathrm{label} = \begin{cases} 1, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) > \mathrm{eval}(\mathrm{query}) \\ 0, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) \le \mathrm{eval}(\mathrm{query}) \end{cases}$$
where eval() is the evaluation metric function for retrieval performance; eval(query + term) is the score of the metric when candidate expansion term term is added to query query; eval(query) is the score of the metric when retrieving with query alone; a label of 1 indicates that the candidate expansion term is relevant to query, and a label of 0 indicates that it is irrelevant;
feature extraction for the candidate expansion terms extracts, from the biomedical resource and from the top N query result documents returned for the queries in the query pool, the distribution of each candidate expansion term in the result documents, its distribution in the biomedical resource, and its relevance to the original query, in preparation for training the ranking model; after the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]; the normalization is as follows:
$$\mathrm{newFeatureValue} = \frac{\mathrm{oldFeatureValue} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}$$
where minValue and maxValue are respectively the minimum and maximum values of the feature in question;
S4, candidate expansion term ranking model training step: from the relevance labels and features of the candidate expansion terms, train with the word grading sorting algorithm to obtain a weight for each feature, specifically: select one candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled irrelevant to form one word group, and select a number of such word groups as training samples; randomly assign initial weights to the features of the candidate expansion terms, and rank the expansion terms within each word group by their feature-weighted scores; from the ranking result of each word group, compute the global ranking loss, and dynamically adjust the weight of each feature dimension according to the gradient of the loss function, the ranking loss being defined in terms of NumSample, the number of word groups of candidate expansion terms, and loss_i, the loss value of the i-th word group, which is computed from the ranked position of the relevant expansion term such that the higher the position, the smaller the loss; iterate the preceding process until the overall loss falls below a threshold or the specified number of iterations is reached, at which point training is complete, the feature weights finally selected constituting the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, retrieve the top N1 query results; extract the specialized terms in the top N1 retrieval results and their features according to the biomedical resource, where N1 is a natural number;
S6, online candidate expansion term retrieval, feature extraction and scoring step: for the new query, apply the candidate expansion term extraction method and the feature extraction method of offline steps S2-S3, using the biomedical resource, to extract the specialized terms in the top N1 retrieval results and their features, obtaining the online-stage candidate expansion terms, the extracted features measuring the importance of a candidate expansion term in the expanded query; score the online-stage candidate expansion terms with the feature weights trained in step S4, and add the K1 highest-scoring candidate expansion terms to the new query submitted online to form the expanded query, where K1 is a natural number;
for an online-stage candidate expansion term identified and extracted using the biomedical resource, the score is defined in terms of the following quantities: FeatureNum is the total number of features, a_i is the weight of the i-th feature in the ranking model, and feature_i(term) is the value of the i-th feature of online-stage candidate expansion term term;
the online-stage candidate expansion terms are ranked by score, and the K1 top-ranked terms are added as expansion terms to the new query submitted online; the weight of an added term in the expanded query is expressed in terms of sign and weight_original, where sign is a sign function that takes the value 1 when the online-stage candidate expansion term already appears in the new query submitted online and 0 otherwise, and weight_original is the weight of the new query submitted online within the expanded query;
S7, query result return step: retrieve with the expanded query and return the retrieval results to the user.
2. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S2, the weighted sum of the occurrence counts of a specialized term in the query result documents is computed from count_i, the number of times the term occurs in the i-th document, and d_i, the decay factor of the i-th document.
3. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S3, the evaluation metric function eval() is the average precision function, namely:
$$\mathrm{eval}_{MAP} = \frac{1}{\mathrm{RelDoc}_{query}} \cdot \sum_{i=1}^{\mathrm{RelDoc}_{query}} \frac{i}{\mathrm{rank}(i)}$$
where RelDoc_query is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked list of result documents.
4. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S1, when no historical query log is available, queries and their result records are produced artificially by constructing biomedical queries and a retrieval method; the retrieval method uses a vector space model, the BM25 retrieval model, or a language model based on one of various smoothing methods.
5. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that the loss value in step S4 is defined in terms of rank_i, the position of the relevant candidate expansion term in the ranked list of its word group.
6. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that the biomedical resource is a dictionary or knowledge base containing biomedical specialized terms.
7. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that the features of a candidate expansion term include: the frequency (TF) with which the candidate expansion term occurs in the result documents; the TF-IDF value of the candidate expansion term; the number of documents in which the candidate expansion term and the original query occur together; the number of times the candidate expansion term and the original query occur together within one text window; the number of times the candidate expansion term occurs in the biomedical resource; the number of term concepts in the biomedical resource that contain the candidate expansion term; and the containment relations among biomedical technical term concepts.
8. a kind of Biomedical literature searching system of word-based grading sorting algorithm, it is characterised in that including off-line training portion Point and on-line search part;The off-line training part is included with lower part:
Search engine inquiry extraction module:For being recorded according to the historical query of search engine, more group pollings and each are extracted The preceding N bars Query Result document obtained in inquiry;And by inquiry and Query Result document collection into an inquiry pond, wherein N For natural number;
Candidate extends vocabulary extraction module:For when given user inquires about, using the intrinsic resource of biomedical sector, searching In the top n Query Result document that rope engine queries extraction module obtains, extraction obtains specialized vocabulary, and the specialized vocabulary is existed The frequency or the weighted sum of occurrence number occurred in Query Result document is recorded;Tied according to each specialized vocabulary in inquiry The number occurred in fruit document or the weighted sum descending arrangement of occurrence number, select occurrence number M specialized vocabulary of highest Vocabulary is extended as candidate, wherein M is natural number;
Candidate extends feature extraction and the labeling module of vocabulary:For the candidate obtained by being extended in candidate in vocabulary extraction module Associated feature is extracted in extension vocabulary, and influence of the vocabulary for retrieval performance is extended according to candidate, mark candidate expands Open up the degree of correlation of vocabulary;
Candidate extends vocabulary order models training module:For utilizing word grading sorting algorithm, it is special to extend vocabulary in extraction candidate After the mark candidate that seeks peace extends vocabulary degree of correlation, training vocabulary order models obtain the power that candidate extends each feature of vocabulary Weight values:If one candidate of selection extend vocabulary feature extraction and labeling module in be noted as correlation candidate extend vocabulary and It is dry to be marked as incoherent one word packet of candidate's extension vocabulary composition, select some such words to be grouped and be used as training sample This;The random feature for each of which candidate's expansion word assigns initial weight, and each word is grouped by characteristic weighing score Interior correlation candidate extension vocabulary is ranked up;The ranking results being grouped according to each word, global weight loss is calculated, according to damage The Grad dynamic for losing function adjusts the weight of every one-dimensional characteristic, wherein sequence loss is:Wherein NumSample is that candidate extends the quantity that vocabulary is grouped, loss in word packetiFor the penalty values of each word packet, the penalty values Obtained by the sorting position for calculating related expanding vocabulary, the more forward corresponding penalty values of sorting position are smaller;Changed by circulation Dai Shangyi processes, completion is trained until overall loss value is less than a certain threshold value or reaches the iterations specified, by final choice Characteristic value as training complete order models;
The online search part includes:
Query reconstruction module: used to extract the specialized terms for a new query and to score the candidate expansion terms; it includes an online search engine query extraction module and an online candidate expansion term retrieval, feature extraction and scoring module, wherein the online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user and, using biomedical resources, extracts the specialized terms in the top N1 retrieval results together with their features, N1 being a natural number; the online candidate expansion term retrieval, feature extraction and scoring module uses the feature weight values output by the term ranking model to compute the weight of each candidate expansion term and adds the terms to the original query to obtain the expanded query;
Query result return module: used to return to the user the result documents retrieved with the expanded query.
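The scoring and query reconstruction step of the online part can be illustrated with a short Python sketch. It assumes the candidate terms and their feature vectors have already been extracted from the top N1 results; the function names, the `top_k` cutoff and the weight of 1.0 given to the original query terms are hypothetical choices that the text does not specify.

```python
def score_terms(candidates, weights):
    """Feature-weighted score for each candidate expansion term.

    `candidates` maps a term to its feature vector; `weights` are the
    per-feature weights produced by the trained ranking model.
    """
    return {
        term: sum(w * x for w, x in zip(weights, features))
        for term, features in candidates.items()
    }


def expand_query(query_terms, candidates, weights, top_k=2):
    """Append the `top_k` best-scoring candidate terms to the original query.

    Returns a list of (term, weight) pairs. Original terms keep a weight
    of 1.0 (an assumption); expansion terms carry their model score.
    """
    scores = score_terms(candidates, weights)
    ranked = sorted(
        (t for t in scores if t not in query_terms),
        key=lambda t: scores[t],
        reverse=True,
    )
    expanded = [(t, 1.0) for t in query_terms]
    expanded += [(t, scores[t]) for t in ranked[:top_k]]
    return expanded
```

The weighted term list returned by `expand_query` would then be submitted to the search engine as the expanded query; how the weights are encoded depends on the query syntax of the engine in use.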
CN201510147696.5A 2015-03-31 2015-03-31 The Biomedical literature search method and system of a kind of word-based grading sorting algorithm Active CN104750819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510147696.5A CN104750819B (en) 2015-03-31 2015-03-31 The Biomedical literature search method and system of a kind of word-based grading sorting algorithm

Publications (2)

Publication Number Publication Date
CN104750819A (en) 2015-07-01
CN104750819B (en) 2018-01-23

Family

ID=53590503

Country Status (1)

Country Link
CN (1) CN104750819B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant