CN112035599B - Query method and device based on vertical search, computer equipment and storage medium - Google Patents

Info

Publication number
CN112035599B
Authority
CN
China
Prior art keywords
query
statement
sentence
keyword
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011229548.5A
Other languages
Chinese (zh)
Other versions
CN112035599A (en
Inventor
李加庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Sushang Bank Co ltd
Original Assignee
Nanjing Xingyun Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingyun Digital Technology Co Ltd filed Critical Nanjing Xingyun Digital Technology Co Ltd
Priority to CN202011229548.5A priority Critical patent/CN112035599B/en
Publication of CN112035599A publication Critical patent/CN112035599A/en
Application granted granted Critical
Publication of CN112035599B publication Critical patent/CN112035599B/en
Priority to CA3177671A priority patent/CA3177671A1/en
Priority to CA3138556A priority patent/CA3138556A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/319Inverted lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Low-Molecular Organic Synthesis Reactions Using Catalysts (AREA)

Abstract

The invention discloses a vertical-search-based query method and device, computer equipment, and a storage medium. The method comprises: performing regular-expression matching on a received initial query statement, acquiring a first statement in the initial query statement that satisfies a matching rule, and determining a first attribute category corresponding to the first statement; preprocessing a second statement in the initial query statement that does not satisfy the matching rule to obtain keywords corresponding to the second statement; classifying each keyword with a pre-trained classification model to obtain a second attribute category for each keyword; generating a target query statement from the first statement, the first attribute category, the keywords, and the second attribute categories; and calling a preset search engine interface and matching a query result according to the target query statement. The method thereby identifies the search intention of a query statement for a vertical search engine, improving query efficiency and user experience.

Description

Query method and device based on vertical search, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a vertical search based query method, apparatus, computer device, and storage medium.
Background
At present, search technologies are widely applied in various fields, and with the continuous increase of the scale of various information data, in order to use internal data resources more efficiently, enterprises with corresponding resources and capabilities tend to build vertical search engines to provide high-quality information retrieval services for internal and external clients according to specific application scenarios.
A vertical search engine receives keywords input by a user, queries the inverted index, calculates the relevance between the indexed content and the input keywords, and returns the search results sorted by relevance from high to low. On the one hand, internal enterprise data is usually multi-dimensional, queries often need to search across several dimensions, and a user's input often contains keywords belonging to several attribute dimensions. On the other hand, a good vertical search engine needs not only to provide a data query function but also to support multi-dimensional retrieval from a single input, so as to improve the accuracy of the query result. The vertical search engine is therefore required to intelligently identify the query keywords input by the user and the data attribute fields to which they belong, providing support for further optimizing the search query statement and thereby improving the accuracy of the search results and the search experience.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a query method, an apparatus, a computer device, and a storage medium based on vertical search, which support intention identification of multi-field search of a search engine in the vertical field, and further improve accuracy of search results and search experience.
In order to solve one or more technical problems, the invention adopts the technical scheme that:
in a first aspect, a vertical search based query method is provided, which includes the following steps:
performing regular-expression matching on a received initial query statement, acquiring a first statement in the initial query statement that satisfies a matching rule, and determining a first attribute category corresponding to the first statement;
preprocessing a second statement in the initial query statement that does not satisfy the matching rule to obtain keywords corresponding to the second statement, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category;
and calling a preset search engine interface, and matching a query result according to the target query statement.
In some embodiments, the preprocessing a second statement that does not satisfy a matching rule in the initial query statement, and the obtaining a keyword corresponding to the second statement includes:
performing word segmentation on the second statement, which does not satisfy the matching rule in the initial query statement, to obtain a word segmentation result;
and determining the keywords of the second statement according to the word segmentation result and a preset rule.
In some embodiments, before performing the word segmentation processing on the second sentence which does not satisfy the matching rule in the initial query sentence, the method includes:
and denoising the second statement to remove the noise characters in the second statement.
In some embodiments, the generating a target query statement from the first statement, the first attribute category, the keyword, and the second attribute category comprises:
generating data pairs based respectively on the first statement and its corresponding first attribute category, and on the keyword and its corresponding second attribute category;
and generating a target query statement according to the data pair and a preset index rule of a search engine.
In some embodiments, the method further comprises a training process of the classification model, comprising:
acquiring training data according to a service scene;
and training a preset classifier by using the training data to obtain a trained classification model.
In some embodiments, the pre-set classifier comprises a logistic regression classifier or a support vector machine classifier.
In a second aspect, a vertical search based query device is provided, the device comprising:
the matching module is used for performing regular matching on the received initial query statement, acquiring a first statement which meets a matching rule in the initial query statement, and determining a first attribute category corresponding to the first statement;
the acquisition module is used for preprocessing a second statement which does not meet the matching rule in the initial query statement to acquire a keyword corresponding to the second statement;
the classification module is used for classifying each keyword by using a pre-trained classification model to acquire a second attribute category of each keyword;
a generation module, configured to generate a target query statement according to the first statement, the first attribute category, the keyword, and the second attribute category;
and the query module is used for calling a preset search engine interface and matching a query result according to the target query statement.
In a third aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the following steps are implemented:
performing regular-expression matching on a received initial query statement, acquiring a first statement in the initial query statement that satisfies a matching rule, and determining a first attribute category corresponding to the first statement;
preprocessing a second statement in the initial query statement that does not satisfy the matching rule to obtain keywords corresponding to the second statement, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category;
and calling a preset search engine interface, and matching a query result according to the target query statement.
In a fourth aspect, there is provided a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
performing regular-expression matching on a received initial query statement, acquiring a first statement in the initial query statement that satisfies a matching rule, and determining a first attribute category corresponding to the first statement;
preprocessing a second statement in the initial query statement that does not satisfy the matching rule to obtain keywords corresponding to the second statement, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category;
and calling a preset search engine interface, and matching a query result according to the target query statement.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
The vertical-search-based query method, device, computer equipment, and storage medium provided by the embodiments of the present invention perform regular-expression matching on a received initial query statement to obtain a first statement that satisfies a matching rule and determine the first attribute category corresponding to that first statement; preprocess a second statement that does not satisfy the matching rule to obtain its keywords; classify each keyword with a pre-trained classification model to obtain a second attribute category for each keyword; generate a target query statement from the first statement, the first attribute category, the keywords, and the second attribute categories; and call a preset search engine interface to match a query result according to the target query statement. Search intention identification of query statements is thereby realized for a vertical search engine, improving query efficiency and user experience.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating the device components of a search intent recognition system in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating training of a text model in accordance with an illustrative embodiment;
FIG. 3 is a flowchart illustrating identifying an attribute category for a keyword in accordance with an illustrative embodiment;
FIG. 4 is a flow diagram illustrating a vertical search based query method in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a vertical search based querying device, according to an example embodiment;
FIG. 6 is a schematic diagram illustrating an internal architecture of a computer device, according to an example embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As described in the background, in a specific field it is generally necessary to build the data of that vertical field, such as structured business data (which commonly includes text-type data), into a vertical search engine in order to better provide query and search services over that business data. Once the search engine is built, its efficient indexing technology can be used to provide a business-data query function to the business. A good vertical search engine needs not only to provide a data query function but also to support multi-dimensional retrieval from a single input. This requires that the vertical search engine be able to intelligently identify the query terms entered by the user and the data attribute fields to which they belong, so as to provide support for further optimizing the search query statement.
To solve the above problems, the embodiment of the present invention provides a query method based on vertical search that includes a text recognition method for a vertical search engine. Based on structured and unstructured business scenario data from the vertical field, after cleaning, a multi-category text classification model is trained over the field attributes that users can search. The model performs short-text attribute recognition on the keywords input by the user, so that the search engine can search the appropriate fields. Take an enterprise information search engine as an example: the information is organized around enterprises and includes text information such as the enterprise name and the legal person name, string information such as the registration number and the unified social credit code, numerical information such as registered capital, and other information. To query a given enterprise, the vertical search engine supports text search on the enterprise name or legal person information and also supports exact string matching on the registration number and the unified social credit code.
Fig. 1 is an architecture diagram illustrating a search intention recognition system according to an exemplary embodiment. Referring to Fig. 1, the system is composed of at least a memory, a system bus, a processor, and a network; the memory may comprise a plurality of storage media, such as RAM, and its specific properties are not limited herein.
Specifically, the above scheme can be realized by the following steps:
Step one: constructing a multi-class text classification model for a vertical search engine based on business scenario data. In the embodiment of the present invention, this step includes the following processes:
(1) business scenario data preparation and extraction
Specifically, the business data for the vertical search engine to be built is first analyzed against the engine's business scenario requirements. Data is then extracted from the relevant databases to obtain structured data, which both prepares the index data for the vertical search engine and provides training data for model training.
(2) Determining search dimension targets
The search fields of the vertical search engine are determined, i.e., the fields that the engine is intended to expose to user search queries. For example, a search engine system for querying housing-listing information needs to provide a query function over multiple fields, such as the residential community name and the listing agent name. All data under these fields is taken from the structured data obtained in the previous step, and each field's label is attached to the corresponding field data, forming multi-category labeled data.
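The labeling described above can be sketched as follows. This is a minimal illustration only: the record layout, the field names (`community_name`, `agent_name`), the label strings, and the sample rows are all hypothetical and do not come from the patent.

```python
# Hypothetical structured rows extracted from a housing-listing database.
records = [
    {"community_name": "Sunrise Garden", "agent_name": "Li Lei"},
    {"community_name": "Riverside Court", "agent_name": "Han Mei"},
]

# Searchable fields mapped to the label used as the class name (assumed names).
SEARCH_FIELDS = {
    "community_name": "community name",
    "agent_name": "agent name",
}


def build_labeled_data(rows, fields):
    """Return (text, label) pairs, one per non-empty searchable field value."""
    labeled = []
    for row in rows:
        for field, label in fields.items():
            value = row.get(field)
            if value:
                labeled.append((value, label))
    return labeled


labeled_data = build_labeled_data(records, SEARCH_FIELDS)
```

Each pair carries the field label as its class, which is the form a multi-category classifier can later be trained on.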
(3) Feature extraction, model training, and related processes
FIG. 2 is a flow diagram illustrating training of a text model according to an exemplary embodiment. Referring to FIG. 2, features of the corresponding fields are selected according to the above data format. In a specific implementation, the relevant text content fields can be segmented by character or by word, and features of the resulting tokens are extracted to generate feature vectors that serve as model training data, for example TF-IDF feature vectors. For training, a classifier from the scikit-learn machine learning library can be selected, such as a logistic regression classifier or a support vector machine classifier, or another classifier can be constructed.
Take the construction of a classification model for enterprise names and person names as an example. The model is mainly used to distinguish queries for company names from queries for natural persons, such as legal persons, actual controllers, or senior executives, so that different query logic can be applied (such as enterprise name search, legal person search, senior executive search, and combined search). The classification model is constructed as follows:
First, training data is extracted from a business database or a search engine. The data consists of a list of enterprise names and a list of person names, labeled 'enterprise name' and 'person name' respectively, in the following form:
enterprise name: [
'Beijing company of furniture',
'Shenzhen software Limited',
……,
'Hubei star mechanical plant'
]
person name: [
'any of a',
'Zhang',
'Chen',
……,
'Li'
]
Second, a data set is constructed, examples of which are as follows:
('furniture of Beijing city company', 'enterprise name')
('Shenzhen software Limited', 'enterprise name')
……
('Hebei xi mechanical plant', 'enterprise name')
('ren's word', 'person name')
('Zhang xing', 'person name')
……
('Lixing', 'person name')
Then, the data set is segmented character by character, with the following result:
('Beijing city house with limited company', 'enterprise name')
('Shenzhen star software restricted company', 'enterprise name')
……
('Zhang xing', 'person name')
……
('Lixing', 'person name')
Next, the data set is shuffled and split into a training set and a test set, for example in a 4:1 ratio. The scikit-learn machine learning library is used to extract TF-IDF text vectors and generate the TF-IDF matrix of the training set, and a classifier (such as naive Bayes, logistic regression, or a support vector machine) is selected and trained on it to obtain the classification model;
and finally, the classifier's predictive ability is tested: the model is evaluated on the test set generated in the previous step to assess how practical the classifier is.
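The training flow above could look roughly like the following scikit-learn sketch. The sample names are invented toy data, far too few for a usable model, and the choices of `analyzer="char"` (character-level segmentation), logistic regression, and a fixed `random_state` are assumptions made for illustration rather than settings given in the patent.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy data; a real model would be trained on the full business lists.
enterprise_names = ["北京未来家具有限公司", "深圳蓝天软件有限公司", "湖北长江机械厂",
                    "南京星辰科技有限公司", "上海远景贸易有限公司"]
person_names = ["张三", "李四", "王五", "陈六", "赵七"]

texts = enterprise_names + person_names
labels = (["enterprise name"] * len(enterprise_names)
          + ["person name"] * len(person_names))

# Random 4:1 split into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# analyzer="char" segments each string by character before computing TF-IDF.
model = make_pipeline(TfidfVectorizer(analyzer="char"), LogisticRegression())
model.fit(X_train, y_train)

# Evaluate on the held-out test set, then classify a new keyword.
accuracy = model.score(X_test, y_test)
prediction = model.predict(["未来科技"])[0]
```

Swapping `LogisticRegression()` for `sklearn.svm.LinearSVC()` or `sklearn.naive_bayes.MultinomialNB()` would cover the other classifier options the text mentions without changing the rest of the pipeline.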
Step two: identifying the search intention of the received initial query statement and generating a target query statement.
Specifically, Fig. 3 is a flowchart illustrating the identification of keyword attribute categories according to an exemplary embodiment. Referring to Fig. 3, in an embodiment of the present invention, regular-expression matching is first performed on the input initial query statement, and the parts that match are used directly for keyword search. The parts that do not match undergo text character cleaning: taking text input as an example, noise characters such as useless characters and punctuation are removed, Chinese word segmentation is performed, and the list of keywords contained in the initial query statement is extracted.
Next, the classification model obtained in the above steps is called on each keyword to obtain the keyword's attribute category. The combination of one or more keyword dimensions serves as the basis for judging the user's search intention; according to that intention, the search terms are further processed (error correction, association, and the like), and (keyword, attribute) data pairs are output.
In the embodiment of the invention, the vertical search engine accepts character input in any form. The query input string (i.e., the initial query statement) therefore needs to be preprocessed so that different kinds of input are distinguished and the attribute of each input string is determined and output.
Examples are as follows:
Step 201: after receiving the initial query statement, perform regular-expression matching on it to judge whether it conforms to a code format such as a registration code or an enterprise credit code. If so, mark the string with the corresponding code attribute and output it. Otherwise, proceed to step 202.
For example:
(1) Input '91320000608950986L'; the output is 'unified social credit code'.
(2) Input 'future technology'; processing proceeds to the next step.
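Step 201 can be sketched as a small rule table of regular expressions. The two patterns below are simplified assumptions (18 upper-case alphanumeric characters for a unified social credit code, 15 digits for a registration number); the patent does not give the actual expressions, and a production rule would likely be stricter.

```python
import re

# Hypothetical rule table: (attribute category, compiled pattern).
CODE_RULES = [
    ("unified social credit code", re.compile(r"[0-9A-Z]{18}")),
    ("registration number", re.compile(r"\d{15}")),
]


def match_code(query):
    """Return the code attribute of the query, or None if no rule matches."""
    text = query.strip().upper()
    for category, pattern in CODE_RULES:
        # fullmatch requires the whole string to fit the pattern.
        if pattern.fullmatch(text):
            return category
    return None


first = match_code("91320000608950986L")
second = match_code("future technology")
```

A `None` result corresponds to the "otherwise" branch, where the input is handed on to step 202.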
Step 202: input the preprocessed initial query statement into the text classifier, which outputs the corresponding predicted attribute category.
For example:
Input 'future technology'; the classifier outputs 'enterprise name'.
Input 'Zhang San'; the classifier outputs 'person name'.
Step three: constructing a target query statement, calling a preset search engine interface, and matching a query result according to the target query statement.
Specifically, based on the keyword attribute pair obtained in the previous step, a query statement (i.e., a target query statement) adapted to the data index of the underlying search engine is constructed, and a unified interface of the search engine is called to obtain a query data result.
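One way to picture this step is a mapping from attribute categories to index fields. The sketch below assumes an Elasticsearch-style bool query purely for illustration; the patent does not name the underlying search engine, and the index field names and the `INDEX_RULES` table are hypothetical.

```python
# Hypothetical index rule: attribute category -> (index field, clause type).
INDEX_RULES = {
    "unified social credit code": ("credit_code", "term"),  # exact string match
    "registration number": ("reg_no", "term"),
    "enterprise name": ("company_name", "match"),           # full-text match
    "person name": ("legal_person", "match"),
}


def build_target_query(pairs):
    """Turn (keyword, attribute) pairs into one query body for the engine."""
    clauses = []
    for keyword, attribute in pairs:
        rule = INDEX_RULES.get(attribute)
        if rule is None:
            continue  # no index field is configured for this attribute
        field, clause_type = rule
        clauses.append({clause_type: {field: keyword}})
    return {"query": {"bool": {"must": clauses}}}


query = build_target_query([
    ("91320000608950986L", "unified social credit code"),
    ("future technology", "enterprise name"),
])
```

Keeping the field mapping in a table means a new searchable attribute only needs a new entry, not a change to the query-building code.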
As a preferred implementation manner, in the embodiment of the present invention, a search intention identification system and an apparatus for searching for enterprise information may also be constructed in advance based on the search intention identification module, so as to support query input of multiple attributes during enterprise information search, and adapt to retrieval of different attribute information according to the attribute category returned by the search intention identification module.
Example two
FIG. 4 is a flow diagram illustrating a vertical search based query method according to an exemplary embodiment, and referring to FIG. 4, the method includes the steps of:
S1: performing regular-expression matching on the received initial query statement, acquiring a first statement in the initial query statement that satisfies the matching rule, and determining a first attribute category corresponding to the first statement.
S2: preprocessing a second statement in the initial query statement that does not satisfy the matching rule to obtain keywords corresponding to the second statement, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword.
Specifically, in the embodiment of the present invention, character input in any form is accepted, that is, the initial query statement is not restricted. The query input string therefore needs to be preprocessed so that different kinds of input are distinguished and the attribute of each input string is determined and output.
Specifically, to improve the precision and efficiency of the search query, in the embodiment of the present invention the user's search intention is identified from the received initial query statement. In a specific implementation, the keywords contained in the second statement may be extracted first, and each keyword is then classified with the pre-trained classification model to obtain its attribute category.
S3: and generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category.
Specifically, based on the first statement, the first attribute category, the keyword, and the corresponding second attribute category obtained in the above steps, a query statement adapted to the data index of the underlying search engine is constructed.
S4: and calling a preset search engine interface, and matching a query result according to the target query statement.
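Steps S1 to S4 can be strung together as in the sketch below. Everything here is a stand-in chosen for illustration: the query is assumed to be whitespace-separated, the code pattern is a simplified guess, `classify_keyword` is a hard-coded toy rule in place of the trained classification model, and `search` merely echoes its input instead of calling a real search engine interface.

```python
import re

# Simplified, assumed format for a unified social credit code.
CODE_RULES = {"unified social credit code": re.compile(r"[0-9A-Z]{18}")}


def split_by_rules(initial_query):
    """S1: return (first-statement pairs, remaining unmatched parts)."""
    matched, rest = [], []
    for part in initial_query.split():
        for category, pattern in CODE_RULES.items():
            if pattern.fullmatch(part):
                matched.append((part, category))
                break
        else:
            rest.append(part)  # no rule matched this part
    return matched, rest


def classify_keyword(keyword):
    """S2 stand-in for the trained model: a toy suffix rule, not a classifier."""
    return "enterprise name" if keyword.endswith(("公司", "科技")) else "person name"


def build_target_query(pairs):
    """S3: one clause per (keyword, attribute) pair."""
    return [{"attribute": attribute, "value": keyword} for keyword, attribute in pairs]


def search(target_query):
    """S4 stand-in for the preset search engine interface."""
    return {"received": target_query, "hits": []}


def run_query(initial_query):
    first_pairs, second_parts = split_by_rules(initial_query)
    # Strip noise characters from each unmatched part before classifying it.
    keywords = [re.sub(r"[^\w]", "", part) for part in second_parts]
    second_pairs = [(kw, classify_keyword(kw)) for kw in keywords if kw]
    return search(build_target_query(first_pairs + second_pairs))


result = run_query("91320000608950986L 未来科技 张三")
```

The point of the sketch is the data flow: rule-matched parts bypass the classifier, while everything else is cleaned, classified, and merged into the same list of pairs before the query is built.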
As a preferred implementation manner, in an embodiment of the present invention, the preprocessing the second statement that does not satisfy the matching rule in the initial query statement, and acquiring the keyword corresponding to the second statement includes:
performing word segmentation on the second statement, which does not satisfy the matching rule in the initial query statement, to obtain a word segmentation result;
and determining the keywords of the second statement according to the word segmentation result and a preset rule.
Specifically, in the embodiment of the present invention, a keyword matching rule is predefined, and the segmentation result is matched according to the keyword matching rule, so that the segmentation result meeting the requirement is obtained as the keyword.
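As a rough illustration of such a keyword matching rule, the sketch below keeps a token only if it is long enough, is not a stop word, and has not been seen already. The stop-word list and the minimum length are hypothetical; the patent does not specify what the preset rule contains.

```python
# Hypothetical preset rule: drop stop words and tokens shorter than two characters.
STOP_WORDS = {"的", "和", "查询", "搜索"}
MIN_LENGTH = 2


def select_keywords(tokens):
    """Filter a word segmentation result down to the tokens kept as keywords."""
    keywords = []
    for token in tokens:
        if (len(token) >= MIN_LENGTH
                and token not in STOP_WORDS
                and token not in keywords):  # keep first occurrence only
            keywords.append(token)
    return keywords


keywords = select_keywords(["查询", "未来科技", "的", "张三", "未来科技"])
```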
As a preferred implementation manner, in the embodiment of the present invention, before performing word segmentation processing on a second statement that does not satisfy a matching rule in the initial query statement, the method includes:
and denoising the second statement to remove the noise characters in the second statement.
Specifically, to improve query efficiency and accuracy, in the embodiment of the present invention the second statement, which does not satisfy the matching rule in the initial query statement, may additionally be denoised to remove its noise characters, for example useless characters and punctuation.
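A minimal denoising sketch follows, assuming that "noise characters" means anything that is neither a letter, digit, underscore, nor whitespace. That definition is an assumption; the patent only mentions useless characters and punctuation.

```python
import re


def denoise(text):
    """Replace punctuation and symbols with spaces, then normalize whitespace."""
    # In Python 3, \w is Unicode-aware, so Chinese characters are kept while
    # ASCII and full-width punctuation are replaced.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()


cleaned_query = denoise("  未来科技！！，张三? ")
```

Replacing noise with a space rather than deleting it keeps adjacent terms apart, so the later word segmentation step does not see them fused into one string.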
As a preferred implementation manner, in an embodiment of the present invention, the generating a target query statement according to the first statement, the first attribute category, the keyword, and the second attribute category includes:
generating data pairs based respectively on the first statement and its corresponding first attribute category, and on the keyword and its corresponding second attribute category;
and generating a target query statement according to the data pair and a preset index rule of a search engine.
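By way of non-limiting illustration, the generation of data pairs and of the target query statement may be sketched as follows. The `field:"value"` syntax joined by `AND` is an assumed index rule, loosely modeled on common full-text query syntaxes; the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def build_data_pairs(first_statements: List[Tuple[str, str]],
                     keywords: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (text, attribute category) items into (category, text) data pairs."""
    return [(category, text) for text, category in first_statements + keywords]


def build_target_query(pairs: List[Tuple[str, str]]) -> str:
    """Render data pairs with the assumed rule: field:"value" joined by AND."""
    # Escaping of quotes inside values is omitted for brevity.
    return " AND ".join(f'{field}:"{value}"' for field, value in pairs)


pairs = build_data_pairs(
    [("42", "size")],                            # first statement, first attribute category
    [("nike", "brand"), ("shoes", "category")],  # keywords, second attribute categories
)
print(build_target_query(pairs))
# prints size:"42" AND brand:"nike" AND category:"shoes"
```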
As a preferred implementation manner, in an embodiment of the present invention, the method further includes a training process of the classification model, including:
acquiring training data according to a service scene;
and training a preset classifier by using the training data to obtain a trained classification model.
As a preferred implementation manner, in an embodiment of the present invention, the preset classifier includes a logistic regression classifier or a support vector machine classifier.
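By way of non-limiting illustration, training a logistic regression classifier for keywords may be sketched as follows. The character-bigram features, the two attribute categories ("brand" and "category"), the toy training data, and the omission of an intercept term are all assumptions of this example; in practice the training data would be collected from the service scene and a library implementation would normally be used.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical training data: keyword -> 1 for "brand", 0 for "category".
SAMPLES: List[Tuple[str, int]] = [
    ("nike", 1), ("adidas", 1), ("puma", 1),
    ("shoes", 0), ("shirt", 0), ("socks", 0),
]


def bigrams(word: str) -> List[str]:
    """Character bigrams used as features, e.g. 'nike' -> ['ni', 'ik', 'ke']."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def train(samples: List[Tuple[str, int]], epochs: int = 200,
          lr: float = 0.5) -> Dict[str, float]:
    """Fit a binary logistic regression (no intercept) by batch gradient ascent."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        grad: Dict[str, float] = {}
        for word, label in samples:
            feats = bigrams(word)
            z = sum(weights.get(f, 0.0) for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "brand"
            for f in feats:
                grad[f] = grad.get(f, 0.0) + (label - p)
        for f, g in grad.items():
            weights[f] = weights.get(f, 0.0) + lr * g
    return weights


def predict(weights: Dict[str, float], word: str) -> str:
    """Classify a keyword; a score of exactly zero falls back to 'category'."""
    z = sum(weights.get(f, 0.0) for f in bigrams(word))
    return "brand" if z > 0 else "category"


weights = train(SAMPLES)
print(predict(weights, "nike"), predict(weights, "shorts"))
```

A support vector machine classifier could be substituted for `train` and `predict` without changing the rest of the flow.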
Fig. 5 is a schematic structural diagram illustrating a vertical search based query device according to an exemplary embodiment, where the device includes:
the matching module is used for performing regular matching on the received initial query statement, acquiring a first statement which meets a matching rule in the initial query statement, and determining a first attribute category corresponding to the first statement;
the acquisition module is used for preprocessing a second statement which does not meet the matching rule in the initial query statement to acquire a keyword corresponding to the second statement;
the classification module is used for classifying each keyword by using a pre-trained classification model to acquire a second attribute category of each keyword;
a generation module, configured to generate a target query statement according to the first statement, the first attribute category, the keyword, and the second attribute category;
and the query module is used for calling a preset search engine interface and matching a query result according to the target query statement.
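By way of non-limiting illustration, the role of the matching module may be sketched as follows: regular expressions pick out the parts of the initial query statement that satisfy a matching rule, each with its first attribute category, and the remainder is passed on as the second statement. The rule table, the attribute categories, and the choice to keep only the captured value as the first statement are assumptions of this example.

```python
import re
from typing import List, Tuple

# Hypothetical matching rules: attribute category -> regular expression.
MATCHING_RULES = {
    "size": re.compile(r"\bsize\s*(\d+)\b", re.IGNORECASE),
    "price": re.compile(r"\bunder\s*\$?(\d+)\b", re.IGNORECASE),
}


def regular_match(query: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return ([(first statement, first attribute category)], second statement)."""
    matched: List[Tuple[str, str]] = []
    remainder = query
    for category, pattern in MATCHING_RULES.items():
        # Record every hit for this rule, then cut the hits out of the query.
        matched.extend((m.group(1), category) for m in pattern.finditer(remainder))
        remainder = pattern.sub(" ", remainder)
    return matched, " ".join(remainder.split())


first, second = regular_match("red nike shoes size 42 under $100")
print(first)   # prints [('42', 'size'), ('100', 'price')]
print(second)  # prints red nike shoes
```

The second statement returned here is what the acquisition module and the classification module would go on to process.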
As a preferred implementation manner, in an embodiment of the present invention, the obtaining module includes:
the word segmentation unit is used for performing word segmentation processing on a second sentence which does not meet the matching rule in the initial query sentence to obtain a word segmentation result;
and the matching unit is used for determining the keywords of the second sentence according to the word segmentation result and a preset rule.
As a preferred implementation manner, in an embodiment of the present invention, the apparatus further includes:
and the denoising module is used for denoising the second statement and removing the noise characters in the second statement.
As a preferred implementation manner, in an embodiment of the present invention, the generating module is specifically configured to:
generating data pairs respectively based on the first sentence and the corresponding first attribute category, the keyword and the corresponding second attribute category;
and generating a target query statement according to the data pair and a preset index rule of a search engine.
As a preferred implementation manner, in an embodiment of the present invention, the apparatus further includes:
the training module is used for acquiring training data according to the service scene; and training a preset classifier by using the training data to obtain a trained classification model.
As a preferred implementation manner, in an embodiment of the present invention, the preset classifier includes a logistic regression classifier or a support vector machine classifier.
Fig. 6 is a schematic diagram illustrating an internal configuration of a computer device according to an exemplary embodiment. As shown in Fig. 6, the computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the vertical search based query method.
Those skilled in the art will appreciate that the configuration shown in Fig. 6 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing devices to which aspects of the present invention may be applied, and that a particular computing device may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
As a preferred implementation manner, in an embodiment of the present invention, the computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the following steps when executing the computer program:
performing regular matching on received initial query sentences, acquiring first sentences which meet matching rules in the initial query sentences, and determining first attribute categories corresponding to the first sentences;
preprocessing a second sentence which does not meet the matching rule in the initial query sentence, obtaining keywords corresponding to the second sentence, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category;
and calling a preset search engine interface, and matching a query result according to the target query statement.
As a preferred implementation manner, in the embodiment of the present invention, when the processor executes the computer program, the following steps are further implemented:
performing word segmentation processing on a second sentence which does not meet the matching rule in the initial query sentence to obtain a word segmentation result;
and determining the keywords of the second sentence according to the word segmentation result and a preset rule.
As a preferred implementation manner, in the embodiment of the present invention, when the processor executes the computer program, the following steps are further implemented:
and denoising the second sentence to remove the noise characters in the second sentence.
As a preferred implementation manner, in the embodiment of the present invention, when the processor executes the computer program, the following steps are further implemented:
generating data pairs respectively based on the first sentence and the corresponding first attribute category, the keyword and the corresponding second attribute category;
and generating a target query statement according to the data pair and a preset index rule of a search engine.
As a preferred implementation manner, in the embodiment of the present invention, when the processor executes the computer program, the following steps are further implemented:
acquiring training data according to a service scene;
and training a preset classifier by using the training data to obtain a trained classification model.
In an embodiment of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps:
performing regular matching on received initial query sentences, acquiring first sentences which meet matching rules in the initial query sentences, and determining first attribute categories corresponding to the first sentences;
preprocessing a second sentence which does not meet the matching rule in the initial query sentence to obtain a keyword corresponding to the second sentence, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
generating a target query statement according to the first statement, the first attribute category, the keyword and the second attribute category;
and calling a preset search engine interface, and matching a query result according to the target query statement.
As a preferred implementation manner, in the embodiment of the present invention, when executed by the processor, the computer program further implements the following steps:
performing word segmentation processing on a second sentence which does not meet the matching rule in the initial query sentence to obtain a word segmentation result;
and determining the keywords of the second sentence according to the word segmentation result and a preset rule.
As a preferred implementation manner, in the embodiment of the present invention, when executed by the processor, the computer program further implements the following steps:
and denoising the second sentence to remove the noise characters in the second sentence.
As a preferred implementation manner, in the embodiment of the present invention, when executed by the processor, the computer program further implements the following steps:
generating data pairs respectively based on the first sentence and the corresponding first attribute category, the keyword and the corresponding second attribute category;
and generating a target query statement according to the data pair and a preset index rule of a search engine.
As a preferred implementation manner, in the embodiment of the present invention, when executed by the processor, the computer program further implements the following steps:
acquiring training data according to a service scene;
and training a preset classifier by using the training data to obtain a trained classification model.
In summary, the technical solution provided by the embodiment of the present invention has the following beneficial effects:
The query method, device, computer equipment and storage medium based on vertical search provided by the embodiments of the present invention perform regular matching on a received initial query sentence to obtain a first sentence satisfying a matching rule in the initial query sentence, and determine a first attribute category corresponding to the first sentence. A second sentence not satisfying the matching rule in the initial query sentence is preprocessed to obtain the keywords corresponding to the second sentence, and each keyword is classified by using a pre-trained classification model to obtain a second attribute category of each keyword. A target query sentence is generated according to the first sentence, the first attribute category, the keywords and the second attribute category, a preset search engine interface is called, and a query result is matched according to the target query sentence. In this way, search intention identification of the query sentence is implemented for a vertical search engine, which improves query efficiency and user experience.
It should be noted that, when the vertical search based query device provided in the foregoing embodiment triggers the query service, the division into the above functional modules is described only as an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the vertical search based query device provided by the above embodiment and the vertical search based query method embodiment belong to the same concept, that is, the device is based on the method; its specific implementation process is detailed in the method embodiment and is not described here again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A vertical search based query method, the method comprising:
performing regular matching on received initial query sentences, acquiring first sentences which meet matching rules in the initial query sentences, and determining first attribute categories corresponding to the first sentences;
preprocessing a second sentence which does not meet the matching rule in the initial query sentence to obtain a keyword corresponding to the second sentence, and classifying each keyword by using a pre-trained classification model to obtain a second attribute category of each keyword;
performing error correction and association processing on the initial query statement according to the keyword and the corresponding second attribute category to obtain a processing result, and respectively generating a data pair based on the processing result, the first statement and the corresponding first attribute category, the keyword and the corresponding second attribute category;
generating a target query statement according to the data pair and a preset index rule of a search engine;
and calling a preset search engine interface, and matching a query result according to the target query statement.
2. The vertical search based query method according to claim 1, wherein the preprocessing a second sentence which does not satisfy a matching rule in the initial query sentence, and the obtaining a keyword corresponding to the second sentence comprises:
performing word segmentation processing on a second sentence which does not meet the matching rule in the initial query sentence to obtain a word segmentation result;
and determining the keywords of the second sentence according to the word segmentation result and a preset rule.
3. The vertical search based query method of claim 2, wherein before performing the word segmentation processing on the second sentence which does not satisfy the matching rule in the initial query sentence, the method comprises:
and denoising the second sentence to remove the noise characters in the second sentence.
4. The vertical search based query method according to any one of claims 1 to 3, wherein the method further comprises a training process of a classification model, comprising:
acquiring training data according to a service scene;
and training a preset classifier by using the training data to obtain a trained classification model.
5. The vertical search based query method of claim 4, wherein the pre-set classifier comprises a logistic regression classifier or a support vector machine classifier.
6. A vertical search based query device, the device comprising:
the matching module is used for performing regular matching on the received initial query statement, acquiring a first statement which meets a matching rule in the initial query statement, and determining a first attribute category corresponding to the first statement;
the acquisition module is used for preprocessing a second statement which does not meet the matching rule in the initial query statement to acquire a keyword corresponding to the second statement;
the classification module is used for classifying each keyword by using a pre-trained classification model to acquire a second attribute category of each keyword;
a generating module, configured to perform error correction and association processing on the initial query statement according to the keyword and the corresponding second attribute category, obtain a processing result, and generate a data pair based on the processing result, the first statement, the corresponding first attribute category, the keyword, and the corresponding second attribute category, respectively; generating a target query statement according to the data pair and a preset index rule of a search engine;
and the query module is used for calling a preset search engine interface and matching a query result according to the target query statement.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 5 are implemented when the computer program is executed by the processor.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202011229548.5A 2020-11-06 2020-11-06 Query method and device based on vertical search, computer equipment and storage medium Active CN112035599B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011229548.5A CN112035599B (en) 2020-11-06 2020-11-06 Query method and device based on vertical search, computer equipment and storage medium
CA3177671A CA3177671A1 (en) 2020-11-06 2021-11-08 Enquiring method and device based on vertical search, computer equipment and storage medium
CA3138556A CA3138556A1 (en) 2020-11-06 2021-11-08 Apparatuses, storage medium and method of querying data based on vertical search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011229548.5A CN112035599B (en) 2020-11-06 2020-11-06 Query method and device based on vertical search, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112035599A CN112035599A (en) 2020-12-04
CN112035599B true CN112035599B (en) 2021-08-27

Family

ID=73572806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011229548.5A Active CN112035599B (en) 2020-11-06 2020-11-06 Query method and device based on vertical search, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112035599B (en)
CA (2) CA3177671A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818005B (en) * 2021-02-03 2024-02-02 北京清科慧盈科技有限公司 Structured data searching method, device, equipment and storage medium
CN113254587B (en) * 2021-05-31 2023-10-13 北京奇艺世纪科技有限公司 Search text recognition method and device, computer equipment and storage medium
CN113590919A (en) * 2021-07-29 2021-11-02 小船出海教育科技(北京)有限公司 Search request processing method and device, electronic equipment and computer readable medium
CN114943234B (en) * 2022-06-27 2024-03-19 企查查科技股份有限公司 Enterprise name linking method, enterprise name linking device, computer equipment and storage medium
CN115563167B (en) * 2022-12-02 2023-03-31 浙江大华技术股份有限公司 Data query method, electronic device and computer-readable storage medium
CN117763109B (en) * 2023-12-21 2024-06-11 湖南领众档案管理有限公司 Data checking method for file full-text retrieval
CN117519702B (en) * 2023-12-29 2024-03-19 冠骋信息技术(苏州)有限公司 Search page design method and system based on low code collocation
CN118568056B (en) * 2024-08-01 2024-10-08 浪潮通用软件有限公司 Electronic archive retrieval method, device and medium based on dynamically constructed prompt words

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915380A (en) * 2012-11-19 2013-02-06 北京奇虎科技有限公司 Method and system for carrying out searching on data
CN110020063B (en) * 2017-07-18 2021-09-03 北京京东尚科信息技术有限公司 Vertical search method and system
CN107577755B (en) * 2017-08-31 2020-06-19 江西博瑞彤芸科技有限公司 Searching method
CN107958406A (en) * 2017-11-30 2018-04-24 北京小度信息科技有限公司 Inquire about acquisition methods, device and the terminal of data

Also Published As

Publication number Publication date
CA3177671A1 (en) 2022-05-06
CN112035599A (en) 2020-12-04
CA3138556A1 (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 834, Yingying building, No.99, Tuanjie Road, yanchuangyuan, Jiangbei new district, Nanjing, Jiangsu Province

Applicant after: Nanjing Xingyun Digital Technology Co.,Ltd.

Address before: Room 834, Yingying building, No.99, Tuanjie Road, yanchuangyuan, Jiangbei new district, Nanjing, Jiangsu Province

Applicant before: Suning financial technology (Nanjing) Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240618

Address after: The 7th, 8th, 9th, 27th, 28th, and 29th floors of Building 4, No. 248 Lushan Road, Jianye District, Nanjing City, Jiangsu Province, 210000, and the 1st and 2nd floors of the podium of Building 4

Patentee after: Jiangsu Sushang Bank Co.,Ltd.

Country or region after: China

Address before: Room 834, Yingying building, No.99, Tuanjie Road, yanchuangyuan, Jiangbei new district, Nanjing, Jiangsu Province

Patentee before: Nanjing Xingyun Digital Technology Co.,Ltd.

Country or region before: China