CN107621892B - Method and device for acquiring information - Google Patents

Method and device for acquiring information

Info

Publication number
CN107621892B
Authority
CN
China
Prior art keywords
pinyin
information
entry
query information
entries
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710970949.8A
Other languages
Chinese (zh)
Other versions
CN107621892A (en
Inventor
肖求根
詹金波
郑利群
邓卓彬
陈丽然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710970949.8A
Publication of CN107621892A
Application granted
Publication of CN107621892B
Active legal status
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

Embodiments of the application disclose a method and a device for acquiring information. One embodiment of the method comprises: receiving query information, wherein the query information comprises pinyin query information and/or semantic query information, the pinyin query information is used for querying entries corresponding to the pinyin query information, the semantic query information is used for querying entries corresponding to the semantic query information through word vector similarity, and the word vector similarity is represented by the distance between entries in a vector space; querying candidate entries corresponding to the query information from an entry library; and sorting and displaying the candidate entries. The method and the device can acquire candidate entries that meet the requirements of both the pinyin query information and the semantic query information, and improve the accuracy with which the user obtains entries.

Description

Method and device for acquiring information
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to the technical field of information acquisition, and particularly relates to a method and a device for acquiring information.
Background
With the development of science and technology, information is transmitted more and more frequently. People can carry out data connection with various intelligent equipment through the network, and then realize the mutual transmission of information, have improved the informationization level of people's work and life.
The smart device may be installed with an information input application (e.g., an input method), and typically, the user obtains corresponding text information by spelling pinyin information in the information input application, and then stores the text information or interacts with others.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for acquiring information, so as to solve the technical problems mentioned in the above background.
In a first aspect, an embodiment of the present application provides a method for acquiring information, where the method includes: receiving query information, wherein the query information comprises pinyin query information and/or semantic query information, the pinyin query information is used for querying entries corresponding to the pinyin query information, the semantic query information is used for querying entries corresponding to the semantic query information through word vector similarity, and the word vector similarity is represented by the distance between entries in a vector space; querying candidate entries corresponding to the query information from an entry library; and sorting and displaying the candidate entries.
In some embodiments, the pinyin query information includes a pinyin to be queried and tone conditions corresponding to the pinyin to be queried; the pinyin to be queried is used for querying entries matching the pinyin to be queried, the tone conditions are used for defining the pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third and/or fourth tone.
In some embodiments, when the pinyin to be queried contains pinyin information, the pinyin to be queried includes initials and/or finals, and querying the candidate entries corresponding to the query information from the entry library includes: querying, through a pinyin index, initial entries corresponding to the initials and/or finals contained in the pinyin to be queried from the entry library, wherein the pinyin index is used for querying the initial entries corresponding to the pinyin to be queried from the entry library; and determining candidate entries from the initial entries through a tone index, wherein the tone index is used for querying the candidate entries corresponding to the tone conditions from the initial entries.
In some embodiments, when the pinyin to be queried contains text information, the pinyin information of the text information is used as the pinyin to be queried, and querying the candidate entries corresponding to the query information from the entry library includes: querying, through the pinyin index, initial entries corresponding to the pinyin information of the text information contained in the pinyin to be queried from the entry library, wherein the pinyin index is used for querying the initial entries corresponding to the pinyin to be queried from the entry library; and determining candidate entries from the initial entries through the tone index, wherein the tone index is used for querying the candidate entries corresponding to the tone conditions from the initial entries.
In some embodiments, the method further includes a step of constructing the entry library, the pinyin index, and the tone index, and this step includes: obtaining a corpus, and dividing the corpus into entries to obtain the entry library; and constructing the pinyin index and the tone index from the pinyin information of the entries.
In some embodiments, sorting and displaying the candidate entries includes: sorting and displaying the candidate entries according to the frequency with which the candidate entries appear in the corpus or the word vector similarity between the candidate entries and the semantic query information.
In a second aspect, an embodiment of the present application provides an apparatus for acquiring information, where the apparatus includes: an information receiving unit, configured to receive query information, wherein the query information comprises pinyin query information and/or semantic query information, the pinyin query information is used for querying entries corresponding to the pinyin query information, the semantic query information is used for querying entries corresponding to the semantic query information through word vector similarity, and the word vector similarity is represented by the distance between entries in a vector space; a candidate entry query unit, configured to query candidate entries corresponding to the query information from an entry library; and a display unit, configured to sort and display the candidate entries.
In some embodiments, the pinyin query information includes a pinyin to be queried and tone conditions corresponding to the pinyin to be queried; the pinyin to be queried is used for querying entries matching the pinyin to be queried, the tone conditions are used for defining the pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third and/or fourth tone.
In some embodiments, when the pinyin to be queried includes pinyin information, the pinyin to be queried includes initials and/or finals, and the candidate entry query unit includes: a first initial entry obtaining subunit, configured to query, from an entry library, an initial entry corresponding to an initial and/or a final included in a pinyin to be queried through the pinyin index, where the pinyin index is used to query, from the entry library, the initial entry corresponding to the pinyin to be queried; and the first candidate entry obtaining subunit is used for determining the candidate entries from the initial entries through the tone indexes, and the tone indexes are used for inquiring the candidate entries corresponding to the tone conditions from the initial entries.
In some embodiments, when the pinyin to be queried contains text information, the pinyin information of the text information is used as the pinyin to be queried, and the candidate entry querying unit includes: a second initial entry obtaining subunit, configured to query, from the entry library, an initial entry corresponding to pinyin information of text information included in a pinyin to be queried through the pinyin index, where the pinyin index is used to query, from the entry library, the initial entry corresponding to the pinyin to be queried; and the second candidate entry obtaining subunit is used for determining the candidate entries from the initial entries through the tone indexes, and the tone indexes are used for inquiring the candidate entries corresponding to the tone conditions from the initial entries.
In some embodiments, the apparatus further includes a construction unit configured to construct the entry library, the pinyin index, and the tone index, where the construction unit includes: an entry library construction subunit, configured to obtain a corpus and divide the corpus into entries to obtain the entry library; and an index construction subunit, configured to construct the pinyin index and the tone index from the pinyin information of the entries.
In some embodiments, the display unit is configured to sort and display the candidate entries according to the frequency with which the candidate entries appear in the corpus or the word vector similarity between the candidate entries and the semantic query information.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the method for obtaining information of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for acquiring information of the first aspect.
According to the method and the device for acquiring information, the candidate entries are selected from the entry library through the pinyin query information and/or the semantic query information contained in the query information, and the candidate entries meeting the requirements of the pinyin query information and the semantic query information at the same time can be acquired; and then, the candidate entries are sorted and displayed, so that the accuracy of obtaining the entries by the user is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for obtaining information according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for obtaining information according to the present application;
FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for obtaining information according to the present application;
FIG. 5 is a block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for obtaining information or the apparatus for obtaining information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various information processing applications, such as a web browser application, an input method application, an information editing application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting information processing, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example, a server that performs data processing on query information sent from the terminal devices 101, 102, and 103 to obtain corresponding candidate entries. After receiving the query information, the server 105 searches and displays the corresponding candidate entry in the entry library according to the pinyin query information and/or the semantic query information contained in the query information.
It should be noted that the method for acquiring information provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103 individually, or may also be executed by the terminal devices 101, 102, and 103 and the server 105 together. Accordingly, the means for acquiring information may be provided in the terminal devices 101, 102, 103, or in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for obtaining information in accordance with the present application is shown. The method for acquiring information comprises the following steps:
Step 201, receiving query information.
In the present embodiment, the electronic device on which the method for acquiring information runs (e.g., the server 105 shown in fig. 1) may receive query information, through a wired or wireless connection, from the terminal devices 101, 102, 103 that the user uses to acquire information. The query information comprises pinyin query information and/or semantic query information. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
When a user needs to acquire text information meeting a specific requirement (for example, the text information may be rhyming words, hypernyms, and the like), query information may be sent to the server 105 through the terminal devices 101, 102, and 103. When the user needs information that meets a rhyming requirement, the query information may comprise pinyin query information; when the user needs near-synonyms of some specific terms, the query information may comprise semantic query information (i.e., the semantic query information contains the specific terms); when the user needs information that meets both the rhyming requirement and the near-synonym requirement, the query information may comprise both pinyin query information and semantic query information. The pinyin query information is used for querying entries corresponding to the pinyin query information; the semantic query information is used for querying entries corresponding to the semantic query information through word vector similarity, and the word vector similarity is represented by the distance between entries in a vector space.
In some optional implementation manners of this embodiment, the pinyin query information includes a pinyin to be queried and tone conditions corresponding to the pinyin to be queried; the pinyin to be queried is used to query entries matching the pinyin to be queried, the tone conditions are used to define the pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third, and/or fourth tone.
Take again the above example of information meeting a rhyming requirement. To satisfy a specified rhyming requirement, the pinyin query information input by the user generally comprises the pinyin to be queried and the tone conditions corresponding to it. For example, the pinyin to be queried may be: jiangnan; the corresponding tone conditions may be: first tone, first tone. The pinyin query information containing the pinyin to be queried and the tone conditions may take forms such as {jiangnan|11} or {jiang1|nan1}. Here, {jiangnan|11} indicates that the tone condition of the pinyin "jiang" is the first tone and the tone condition of the pinyin "nan" is the first tone. {jiang_1|nan_1} has the same meaning as {jiangnan|11} and differs only in form. That is, the pinyin to be queried and the tone conditions may take the form {pinyin to be queried|tone condition} or {pinyin to be queried_tone condition|pinyin to be queried_tone condition}. The entries corresponding to the pinyin query information may be: jiangnan, and the like. When the pinyin to be queried is unchanged and the tone condition is changed, the entries corresponding to the pinyin query information may be: jiang nan, etc.
Optionally, when the tone condition is not limited, the entries corresponding to the pinyin query information may be entries whose pinyin is the same as the pinyin to be queried. When the pinyin to be queried is not limited, the entries corresponding to the pinyin query information may be entries whose tones match the tone conditions.
It should be noted that, according to actual needs, the forms of the pinyin to be queried and the tone condition may be various types, and are not limited to the above { pinyin to be queried | tone condition } or { pinyin to be queried _ tone condition | pinyin to be queried _ tone condition }, which is not described herein again.
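As an illustration of the per-syllable query form described above, the following is a minimal Python sketch, not the implementation of the present application. The function name parse_pinyin_query and the regular expression are assumptions made for this example. It accepts both the {jiang1|nan1} and {jiang_1|nan_1} spellings and treats a syllable without a tone digit as having no tone condition. The combined form {jiangnan|11} is deliberately not handled, because it would first require segmenting the pinyin string into syllables.

```python
import re
from typing import List, Optional, Tuple

# One syllable: lowercase letters, an optional underscore, an optional tone digit 1-4.
# (This pattern is an assumption for the sketch, not a format fixed by the source.)
_SYLLABLE = re.compile(r"^([a-z]+)_?([1-4])?$")


def parse_pinyin_query(query: str) -> List[Tuple[str, Optional[int]]]:
    """Parse '{jiang1|nan1}' or '{jiang_1|nan_1}' into [(pinyin, tone), ...].

    A syllable without a tone digit yields tone None (tone not constrained).
    """
    body = query.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    parsed = []
    for part in body.split("|"):
        match = _SYLLABLE.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized syllable: {part!r}")
        pinyin, tone = match.groups()
        parsed.append((pinyin, int(tone) if tone else None))
    return parsed
```

Returning None for a missing tone digit lets the same structure express the case in which the tone condition is not limited.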
Step 202, querying candidate terms corresponding to the query information from a term library.
Upon receiving the query information, the server 105 may query the candidate entries corresponding to the query information from the entry library.
In some optional implementation manners of this embodiment, when the pinyin to be queried includes pinyin information, the pinyin to be queried includes initials and/or finals, and querying the candidate entry corresponding to the query information from the entry library may include the following steps:
First, querying, through the pinyin index, the initial entries corresponding to the initials and/or finals contained in the pinyin to be queried from the entry library.
The pinyin to be queried can simultaneously comprise the initial consonant and the final, and also can only comprise the initial consonant or only comprise the final, which is determined according to the actual requirement.
When the pinyin to be queried contains pinyin information, the server 105 may query the initial entry corresponding to the initial consonant and/or the final contained in the pinyin to be queried from the entry library through the pinyin index. The pinyin index is used for inquiring the initial entry corresponding to the pinyin to be inquired from the entry library.
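Matching on only an initial or only a final presupposes that each syllable can be split into those two parts. The sketch below shows one possible way to do this in Python; it is an illustrative assumption, not the method of the present application. The names INITIALS, split_syllable, and syllable_matches are hypothetical, and treating "y" and "w" as initials is a simplification of the Hanyu Pinyin spelling rules.

```python
from typing import Optional, Tuple

# Hanyu Pinyin initials; two-letter initials come first so that "zh" wins over "z".
# Treating "y" and "w" as initials is a simplification made for this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "ai"


def syllable_matches(syllable: str,
                     initial: Optional[str] = None,
                     final: Optional[str] = None) -> bool:
    """Check a syllable against an optional initial and an optional final."""
    got_initial, got_final = split_syllable(syllable)
    return ((initial is None or got_initial == initial)
            and (final is None or got_final == final))
```

A rhyming query would typically constrain only the final, for example syllable_matches("nan", final="an").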
Second, determining candidate entries from the initial entries through the tone index.
Once the initial entries are obtained, the entries meeting the tone conditions are screened out from them through the tone index and serve as candidate entries. The tone index is used for querying the candidate entries corresponding to the tone conditions from the initial entries.
Therefore, the entry meeting the requirement of the pinyin to be inquired can be obtained through the pinyin to be inquired, the tone of the obtained entry can be further selected, and the accuracy and pertinence of obtaining the entry are improved.
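The two-stage lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not specify how the indexes are stored, so the dictionary layouts of PINYIN_INDEX and TONE_INDEX, the sample entries, and the function name query_candidates are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical pinyin index: toneless syllable sequence -> entries with that pinyin.
PINYIN_INDEX: Dict[Tuple[str, ...], Set[str]] = {
    ("shi", "ji"): {"世纪", "实际", "时机", "诗集"},
    ("jiang", "nan"): {"江南"},
}

# Hypothetical tone index: tone sequence -> entries pronounced with those tones.
TONE_INDEX: Dict[Tuple[int, ...], Set[str]] = {
    (4, 4): {"世纪"},
    (2, 4): {"实际"},
    (2, 1): {"时机"},
    (1, 2): {"诗集", "江南"},
}


def query_candidates(syllables: Sequence[str],
                     tones: Optional[Sequence[int]] = None) -> Set[str]:
    """Stage 1: pinyin index gives the initial entries.
    Stage 2: the tone index narrows them to the candidate entries."""
    initial_entries = PINYIN_INDEX.get(tuple(syllables), set())
    if tones is None:  # tone condition not limited
        return set(initial_entries)
    return initial_entries & TONE_INDEX.get(tuple(tones), set())
```

For example, query_candidates(["shi", "ji"], [2, 4]) keeps only the entry pronounced with the second and fourth tones. Because the second stage is a set intersection, consulting the tone index first and the pinyin index second would give the same result.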
In some optional implementation manners of this embodiment, when the pinyin to be queried includes text information, the pinyin information of the text information is used as the pinyin to be queried, and querying the candidate entry corresponding to the query information from the entry library may include the following steps:
First, querying, through the pinyin index, the initial entries corresponding to the pinyin information of the text information contained in the pinyin to be queried from the entry library.
When the pinyin to be queried contains text information, the pinyin information of the text input by the user is taken as the pinyin to be queried. The server 105 then queries the corresponding initial entries from the entry library by using the pinyin index. The pinyin index is used for querying the initial entries corresponding to the pinyin to be queried from the entry library.
Second, determining candidate entries from the initial entries through the tone index.
As above, once the initial entries are obtained, the entries meeting the tone conditions are screened out from them through the tone index and serve as candidate entries. The tone index is used for querying the candidate entries corresponding to the tone conditions from the initial entries.
It should be noted that, in practice, the server 105 may also obtain the entries from the entry library through the tone condition, and then screen out the candidate entries through the pinyin to be queried.
In some optional implementation manners of this embodiment, the method may further include a step of constructing a vocabulary entry library, a pinyin index, and a tone index, where the step of constructing the vocabulary entry library, the pinyin index, and the tone index may include the following steps:
First, obtaining a corpus and dividing the corpus into entries to obtain the entry library.
The entry library contains entries. The entries in the entry library of the present application may come from a specified corpus (e.g., an article, a book, etc.) or from a broader corpus (e.g., a dictionary, etc.), as determined by actual needs.
After obtaining the corpus, the server 105 may divide the content of the corpus into entries according to usage habits, professional field, and the like, thereby obtaining the entry library corresponding to the corpus.
After the entry library is obtained, a corresponding word vector can be set for each entry. The word vectors can be set by taking into account the above-mentioned classification of the entries and other factors, and the word vectors of entries belonging to the same classification are set to similar values.
Second, constructing the pinyin index and the tone index from the pinyin information of the entries.
After obtaining the entries, the server 105 may obtain pinyin information and tone information of each entry, and obtain pinyin indexes and tone indexes corresponding to the entry library.
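The construction step can be sketched in Python as follows. This is an assumption-laden illustration rather than the construction procedure of the present application: the corpus is assumed to be already divided into entries, the PRONUNCIATIONS table stands in for whatever pinyin source an implementation would use, and the function name build_library_and_indexes is hypothetical. The sketch also records how often each entry occurs, since that frequency can later be used for sorting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical pronunciation table: entry -> [(toneless pinyin, tone), ...] per character.
PRONUNCIATIONS: Dict[str, List[Tuple[str, int]]] = {
    "世纪": [("shi", 4), ("ji", 4)],
    "实际": [("shi", 2), ("ji", 4)],
    "时机": [("shi", 2), ("ji", 1)],
    "江南": [("jiang", 1), ("nan", 2)],
}


def build_library_and_indexes(corpus_entries: Iterable[str]):
    """Build the entry library (with corpus frequencies), a pinyin index and a tone index.

    `corpus_entries` is assumed to be the corpus already divided into entries;
    entries without a known pronunciation are skipped.
    """
    frequency = Counter(e for e in corpus_entries if e in PRONUNCIATIONS)
    pinyin_index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    tone_index: Dict[Tuple[int, ...], Set[str]] = defaultdict(set)
    for entry in frequency:
        syllables = PRONUNCIATIONS[entry]
        pinyin_index[tuple(p for p, _ in syllables)].add(entry)
        tone_index[tuple(t for _, t in syllables)].add(entry)
    return frequency, dict(pinyin_index), dict(tone_index)
```

Keying both indexes by tuples keeps multi-character entries distinct from single-character ones and makes each lookup a single dictionary access.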
In some optional implementation manners of this embodiment, the querying the candidate entry corresponding to the query information from the entry library may further include: and querying candidate entries corresponding to the semantic query information from the entry library.
In practice, to meet needs such as filling in words when writing a composition, near-synonyms of a certain category or of a certain entry may need to be acquired; in this case, entries can be acquired by means of semantic query information.
When the query information contains semantic query information, the server 105 may search the entry library for candidate entries whose semantics are the same as or similar to those of the semantic query information. For example, the server 105 acquires the word vector of each entry in the entry library, calculates the distance in the vector space between the word vector of the semantic query information and the word vector of each entry, and uses that distance as the word vector similarity between the two.
It should be noted that if the semantic query information is also contained in the entry base, the word vector similarity can be directly calculated; and if the semantic query information is not contained in the vocabulary entry base, setting a word vector for the semantic query information according to the classification information of the semantic query information, and then calculating the similarity of the word vector.
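A minimal sketch of the distance-based semantic lookup follows. The source states only that similarity is represented by distance in a vector space, so the choice of Euclidean distance, the max_distance threshold, the toy two-dimensional vectors, the English placeholder entries, and the function name semantic_candidates are all assumptions made for illustration. The sketch covers only the case in which the query term already has a word vector in the entry library.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy word vectors; real vectors would have many more dimensions.
WORD_VECTORS: Dict[str, Sequence[float]] = {
    "cloud disk": (0.9, 0.1),
    "network drive": (0.8, 0.2),
    "apple": (0.1, 0.9),
}


def semantic_candidates(query_term: str,
                        vectors: Dict[str, Sequence[float]],
                        max_distance: float) -> List[str]:
    """Return entries within `max_distance` of the query term, nearest first.

    A smaller Euclidean distance is read as a higher word vector similarity.
    """
    query_vector = vectors[query_term]
    scored = [(math.dist(query_vector, vector), entry)
              for entry, vector in vectors.items()
              if entry != query_term]
    return [entry for distance, entry in sorted(scored) if distance <= max_distance]
```

With these toy vectors, "network drive" lies close to "cloud disk" while "apple" lies far away, so only the former passes a threshold of 0.5.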
It should be noted that the query information of the present application may include only pinyin query information or only semantic query information, or may include both pinyin query information and semantic query information. When the query information includes both, the entries corresponding to the pinyin query information may be obtained first, and those entries are then screened through the semantic query information to obtain the candidate entries. Similarly, the entries corresponding to the semantic query information may be obtained first, and those entries are then screened through the pinyin query information to obtain the candidate entries.
Step 203, the candidate entries are sorted and displayed.
After the candidate entries are obtained, the candidate entries may be sorted and displayed according to conditions such as alphabetical order, frequency of use, and the like.
In some optional implementation manners of this embodiment, sorting and displaying the candidate entries may include: sorting and displaying the candidate entries according to the frequency with which the candidate entries appear in the corpus or the word vector similarity between the candidate entries and the semantic query information.
When the query information contains only pinyin query information, or when it contains both pinyin query information and semantic query information and entries are first obtained according to the semantic query information and candidate entries are then selected from them through the pinyin query information, the candidate entries can be sorted and displayed according to the frequency with which they appear in the corpus. Similarly, when the query information contains only semantic query information, or when it contains both and entries are first obtained according to the pinyin query information and candidate entries are then selected from them through the semantic query information, the candidate entries can be sorted and displayed according to the word vector similarity between the candidate entries and the semantic query information.
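The two sorting criteria can be sketched as follows. The function names rank_by_frequency and rank_by_similarity, the input dictionaries, and the alphabetical tie-break are assumptions made for this illustration; the source does not say how ties are ordered.

```python
from typing import Dict, Iterable, List


def rank_by_frequency(candidates: Iterable[str],
                      frequency: Dict[str, int]) -> List[str]:
    """Most frequent entry in the corpus first; ties broken by entry text (assumption)."""
    return sorted(candidates, key=lambda entry: (-frequency.get(entry, 0), entry))


def rank_by_similarity(candidates: Iterable[str],
                       distance_to_query: Dict[str, float]) -> List[str]:
    """Smallest vector-space distance (highest similarity) first; ties broken by entry text."""
    return sorted(candidates, key=lambda entry: (distance_to_query[entry], entry))
```

Either function can be applied to the candidate set regardless of which filter (pinyin or semantic) produced it.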
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for acquiring information according to the present embodiment. In the application scenario of fig. 3, the query information of the user includes both pinyin query information ({ ai3| u4}) and semantic query information (cloud disk). The server 105 queries the candidate entries corresponding to the query information from the current entry library, sorts the candidate entries, and displays the sorted candidate entries as shown in fig. 3.
The method provided by the above embodiment of the application selects the candidate entry from the entry library through the pinyin query information and/or the semantic query information contained in the query information, and can acquire the candidate entry simultaneously meeting the requirements of the pinyin query information and the semantic query information; and then, the candidate entries are sorted and displayed, so that the accuracy of obtaining the entries by the user is improved.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for acquiring information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for acquiring information of the present embodiment may include: an information receiving unit 401, a candidate entry searching unit 402, and a display unit 403. The information receiving unit 401 is configured to receive query information, where the query information includes pinyin query information and/or semantic query information, where the pinyin query information is used to query terms corresponding to the pinyin query information, the semantic query information is used to query terms corresponding to the semantic query information through term vector similarity, and the term vector similarity is represented by a distance of the terms in a vector space; the candidate entry query unit 402 is configured to query candidate entries corresponding to the query information from the entry library; the display unit 403 is used for sorting and displaying the candidate entries.
In some optional implementations of this embodiment, the pinyin query information may include a pinyin to be queried and tone conditions corresponding to the pinyin to be queried, where the pinyin to be queried is used to query entries matching the pinyin to be queried, the tone conditions are used to constrain the pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third, and/or fourth tone.
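As a minimal sketch of how a tone condition might be represented and checked, the following assumes a numbered-pinyin notation in which a syllable is followed by an optional tone digit from 1 to 4 (for example "ai3"). The function names `parse_syllable` and `satisfies_tone` are hypothetical and do not come from the application.

```python
import re
from typing import Optional, Set, Tuple

# Assumed notation: lowercase letters followed by an optional tone digit 1-4.
_SYLLABLE = re.compile(r"^([a-z]+)([1-4])?$")


def parse_syllable(text: str) -> Tuple[str, Optional[int]]:
    """Split 'ai3' into ('ai', 3); a syllable without a digit has tone None."""
    match = _SYLLABLE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a numbered-pinyin syllable: {text!r}")
    base, tone = match.groups()
    return base, int(tone) if tone else None


def satisfies_tone(entry_syllable: str, allowed_tones: Set[int]) -> bool:
    """Check an entry syllable against a tone condition.

    An empty set is treated here as 'no tone restriction' (an assumption).
    """
    _, tone = parse_syllable(entry_syllable)
    return not allowed_tones or tone in allowed_tones
```

Representing the tone condition as a set allows "and/or" combinations such as "third or fourth tone" to be expressed as `{3, 4}`.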
In some optional implementation manners of this embodiment, when the pinyin to be queried includes pinyin information, the pinyin to be queried includes initials and/or finals, and the candidate entry querying unit 402 may include: a first initial entry obtaining sub-unit (not shown in the drawing) and a first candidate entry obtaining sub-unit (not shown in the drawing). The first initial entry obtaining subunit is used for inquiring an initial entry corresponding to an initial consonant and/or a final contained in the pinyin to be inquired from the entry library through the pinyin index, and the pinyin index is used for inquiring an initial entry corresponding to the pinyin to be inquired from the entry library; the first candidate entry obtaining subunit is configured to determine, from the initial entries, candidate entries according to the tone index, where the tone index is used to query, from the initial entries, candidate entries corresponding to tone conditions.
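The two-stage lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the entry library, the index layout (keys of the form `(position, kind, value)` and `(position, tone)`), the restriction to entries with the same number of syllables as the query, and all function names are hypothetical. Every syllable is assumed to carry a tone digit from 1 to 4.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical entry library: entry -> numbered pinyin of each character.
ENTRY_LIBRARY: Dict[str, List[str]] = {
    "百度": ["bai3", "du4"],
    "海路": ["hai3", "lu4"],
    "白兔": ["bai2", "tu4"],
    "采购": ["cai3", "gou4"],
}

# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# One condition per character: (initial or None, final or None, allowed tones).
Condition = Tuple[Optional[str], Optional[str], Set[int]]


def split_syllable(syllable: str) -> Tuple[str, str, int]:
    """Split 'hai3' into ('h', 'ai', 3); assumes a trailing tone digit."""
    base, tone = syllable[:-1], int(syllable[-1])
    for initial in INITIALS:
        if base.startswith(initial):
            return initial, base[len(initial):], tone
    return "", base, tone


def build_indexes(library: Dict[str, List[str]]):
    """Build a pinyin index and a tone index keyed by character position."""
    pinyin_index: Dict[tuple, Set[str]] = defaultdict(set)
    tone_index: Dict[tuple, Set[str]] = defaultdict(set)
    for entry, syllables in library.items():
        for pos, syllable in enumerate(syllables):
            initial, final, tone = split_syllable(syllable)
            pinyin_index[(pos, "initial", initial)].add(entry)
            pinyin_index[(pos, "final", final)].add(entry)
            tone_index[(pos, tone)].add(entry)
    return pinyin_index, tone_index


def query_candidates(conditions: List[Condition], library, pinyin_index, tone_index):
    """Stage 1: pinyin index -> initial entries. Stage 2: tone index -> candidates."""
    # Assumption: only entries with as many syllables as the query are considered.
    initial_entries = {e for e, s in library.items() if len(s) == len(conditions)}
    for pos, (initial, final, _) in enumerate(conditions):
        if initial is not None:
            initial_entries &= pinyin_index.get((pos, "initial", initial), set())
        if final is not None:
            initial_entries &= pinyin_index.get((pos, "final", final), set())
    candidates = set(initial_entries)
    for pos, (_, _, tones) in enumerate(conditions):
        if tones:  # an empty set means no tone restriction at this position
            allowed = set().union(*(tone_index.get((pos, t), set()) for t in tones))
            candidates &= allowed
    return initial_entries, candidates
```

With the toy library above, a query for the finals "ai" and "u" yields three initial entries, and the tone conditions (third tone, then fourth tone) remove "白兔" because its first syllable carries the second tone.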
In some optional implementations of this embodiment, when the pinyin to be queried includes text information, the pinyin information of the text information is used as the pinyin to be queried, and the candidate entry querying unit 402 may include: a second initial entry obtaining subunit (not shown in the drawing) and a second candidate entry obtaining subunit (not shown in the drawing). The second initial entry obtaining subunit is configured to query, from the entry library through the pinyin index, an initial entry corresponding to the pinyin information of the text information contained in the pinyin to be queried, where the pinyin index is used to query, from the entry library, an initial entry corresponding to the pinyin to be queried; the second candidate entry obtaining subunit is configured to determine candidate entries from the initial entries through the tone index, where the tone index is used to query, from the initial entries, candidate entries corresponding to the tone conditions.
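A minimal sketch of the text-to-pinyin step, assuming a small hypothetical character table `CHAR_TO_PINYIN` in place of a full character-to-pinyin dictionary. Polyphonic characters, which have more than one reading, are not handled here.

```python
from typing import Dict, List

# Hypothetical lookup table; a real system would need a complete dictionary.
CHAR_TO_PINYIN: Dict[str, str] = {
    "百": "bai3",
    "度": "du4",
    "海": "hai3",
    "路": "lu4",
}


def normalize_query(tokens: List[str]) -> List[str]:
    """Replace text tokens with their pinyin so the query is pinyin-only."""
    normalized = []
    for token in tokens:
        if token.isascii():
            # Assumption: ASCII tokens are already pinyin, e.g. "u4".
            normalized.append(token.lower())
        elif token in CHAR_TO_PINYIN:
            normalized.append(CHAR_TO_PINYIN[token])
        else:
            raise KeyError(f"no pinyin known for {token!r}")
    return normalized
```

After normalization, the query can be passed to the same pinyin index and tone index lookup as a query that was entered as pinyin.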
In some optional implementations of this embodiment, the apparatus 400 for acquiring information may further include a constructing unit (not shown in the figure) for constructing the entry library, the pinyin index, and the tone index, where the constructing unit may include: an entry library constructing subunit (not shown in the figure) and an index constructing subunit (not shown in the figure). The entry library constructing subunit is configured to acquire a corpus and divide the corpus into entries to obtain the entry library; the index constructing subunit is configured to construct the pinyin index and the tone index according to the pinyin information of the entries.
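The construction step might look like the sketch below. It assumes the corpus has already been segmented into whitespace-separated entries (real Chinese text would need a word segmenter), and it uses a hypothetical lexicon `ENTRY_PINYIN` for the pinyin of each entry. For brevity the indexes are keyed by whole-entry tuples; the key granularity is a design choice that the application does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical lexicon: entry -> numbered pinyin of each character.
ENTRY_PINYIN: Dict[str, List[str]] = {
    "百度": ["bai3", "du4"],
    "海路": ["hai3", "lu4"],
    "白兔": ["bai2", "tu4"],
}


def build_entry_library(corpus_lines: Iterable[str]) -> Counter:
    """Divide the corpus into entries and record how often each one occurs."""
    counts: Counter = Counter()
    for line in corpus_lines:
        counts.update(line.split())  # assumption: entries are space-separated
    return counts


def build_indexes(entries: Iterable[str], lexicon: Dict[str, List[str]]):
    """Build a pinyin index (toneless syllables) and a tone index (tone pattern)."""
    pinyin_index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    tone_index: Dict[Tuple[int, ...], Set[str]] = defaultdict(set)
    for entry in entries:
        syllables = lexicon.get(entry)
        if syllables is None:
            continue  # entries without known pinyin are skipped in this sketch
        pinyin_index[tuple(s[:-1] for s in syllables)].add(entry)
        tone_index[tuple(int(s[-1]) for s in syllables)].add(entry)
    return pinyin_index, tone_index
```

Keeping the occurrence counts alongside the entries means the same structure can later serve frequency-based sorting.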
In some optional implementations of this embodiment, the display unit 403 may be configured to sort and display the candidate entries according to the frequency with which the candidate entries appear in the corpus or the word vector similarity between the candidate entries and the semantic query information.
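The two sorting criteria can be sketched as follows. Cosine similarity is used here as one common way to compare word vectors; the application only states that similarity is represented by distance in a vector space, so this choice is an assumption, as are the function names and the toy two-dimensional vectors in the usage example.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    candidates: List[str],
    frequency: Dict[str, int],
    vectors: Dict[str, Sequence[float]],
    query_vector: Optional[Sequence[float]] = None,
) -> List[str]:
    """Sort by similarity to the semantic query if given, else by corpus frequency."""
    if query_vector is None:
        return sorted(candidates, key=lambda e: frequency.get(e, 0), reverse=True)
    return sorted(
        candidates,
        key=lambda e: cosine_similarity(vectors[e], query_vector),
        reverse=True,
    )
```

For example, with frequencies `{"百度": 3, "海路": 1}` and toy vectors `{"百度": [0.1, 0.9], "海路": [0.9, 0.1]}`, sorting without a semantic query places "百度" first, while a query vector of `[1.0, 0.0]` places "海路" first.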
The present embodiment further provides a server, including: one or more processors; a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the above-described method for obtaining information.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which program, when being executed by a processor, carries out the above-mentioned method for acquiring information.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown. The server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an information receiving unit, a candidate entry querying unit, and a display unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the display unit may also be described as a "unit for displaying candidate entries".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive query information, where the query information includes pinyin query information and/or semantic query information, the pinyin query information is used to query entries corresponding to the pinyin query information, the semantic query information is used to query entries corresponding to the semantic query information through word vector similarity, and the word vector similarity is represented by the distance between entries in a vector space; query candidate entries corresponding to the query information from an entry library; and sort and display the candidate entries.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for obtaining information, the method comprising:
receiving query information, wherein the query information comprises pinyin query information and/or semantic query information, the pinyin query information is used for querying terms corresponding to the pinyin query information, the semantic query information is used for querying terms corresponding to the semantic query information through term vector similarity, and the term vector similarity is represented through the distance of the terms in a vector space;
inquiring candidate entries corresponding to the inquiry information from an entry library;
sequencing and displaying the candidate entries;
if the semantic query information is contained in the entry library, calculating the word vector similarity of the entry corresponding to the semantic query information; if the semantic query information is not contained in the entry library, generating word vectors of the semantic query information from the classification information of the semantic query information, and then calculating word vector similarity; the term library is obtained by dividing the obtained corpus into terms.
2. The method according to claim 1, wherein the pinyin query information includes a pinyin to be queried and tone conditions corresponding to the pinyin to be queried, the pinyin to be queried is used for querying a term matching the pinyin to be queried, the tone conditions are used for defining the pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third and/or fourth tone.
3. The method according to claim 2, wherein when the pinyin to be queried contains pinyin information, the pinyin to be queried contains initials and/or finals, and
the querying the candidate entries corresponding to the query information from the entry library comprises:
inquiring initial entries corresponding to initial consonants and/or vowels contained in pinyin to be inquired from an entry library through a pinyin index, wherein the pinyin index is used for inquiring the initial entries corresponding to the pinyin to be inquired from the entry library;
and determining candidate entries from the initial entries through a tone index, wherein the tone index is used for inquiring the candidate entries corresponding to tone conditions from the initial entries.
4. The method as claimed in claim 2, wherein when the pinyin to be queried contains text information, the pinyin information of the text information is used as the pinyin to be queried, and
the querying the candidate entries corresponding to the query information from the entry library comprises:
inquiring an initial entry corresponding to pinyin information of character information contained in pinyin to be inquired from an entry library through a pinyin index, wherein the pinyin index is used for inquiring the initial entry corresponding to the pinyin to be inquired from the entry library;
and determining candidate entries from the initial entries through a tone index, wherein the tone index is used for inquiring the candidate entries corresponding to tone conditions from the initial entries.
5. The method according to any one of claims 1 to 4, wherein the method further comprises the step of constructing a vocabulary entry base, a pinyin index and a tone index, and the step of constructing the vocabulary entry base, the pinyin index and the tone index comprises:
obtaining a corpus, and dividing the corpus into entries to obtain an entry library;
and constructing a pinyin index and a tone index through the pinyin information of the entries.
6. The method of claim 5, wherein the ranking and displaying the candidate terms comprises:
and sequencing and displaying the candidate entries according to the frequency of the candidate entries appearing in the corpus or the word vector similarity between the candidate entries and the semantic query information.
7. An apparatus for obtaining information, the apparatus comprising:
the information receiving unit is used for receiving query information, wherein the query information comprises pinyin query information and/or semantic query information, the pinyin query information is used for querying terms corresponding to the pinyin query information, the semantic query information is used for querying terms corresponding to the semantic query information through term vector similarity, and the term vector similarity is represented through the distance of the terms in a vector space;
the candidate entry query unit is used for querying candidate entries corresponding to the query information from the entry library;
the display unit is used for sequencing and displaying the candidate entries;
if the semantic query information is contained in the entry library, calculating the word vector similarity of the entry corresponding to the semantic query information; if the semantic query information is not contained in the entry library, generating word vectors of the semantic query information from the classification information of the semantic query information, and then calculating word vector similarity; the term library is obtained by dividing the obtained corpus into terms.
8. The apparatus according to claim 7, wherein the pinyin query information includes a pinyin to be queried and tone conditions corresponding to the pinyin to be queried, the pinyin to be queried is used to query terms matching the pinyin to be queried, the tone conditions are used to define pronunciation of the pinyin to be queried, and the tone conditions include the first, second, third and/or fourth tone.
9. The apparatus according to claim 8, wherein when the pinyin to be queried contains pinyin information, the pinyin to be queried contains initials and/or finals, and
the candidate entry query unit includes:
the first initial entry obtaining subunit is used for inquiring an initial entry corresponding to an initial consonant and/or a final contained in the pinyin to be inquired from an entry library through a pinyin index, and the pinyin index is used for inquiring the initial entry corresponding to the pinyin to be inquired from the entry library;
and the first candidate entry obtaining subunit is used for determining the candidate entries from the initial entries through the tone indexes, and the tone indexes are used for inquiring the candidate entries corresponding to the tone conditions from the initial entries.
10. The apparatus as claimed in claim 8, wherein when the pinyin to be queried contains text information, the pinyin information of the text information is used as the pinyin to be queried, and
the candidate entry query unit includes:
a second initial entry obtaining subunit, configured to query, from the entry library, an initial entry corresponding to pinyin information of text information included in a pinyin to be queried through a pinyin index, where the pinyin index is used to query, from the entry library, the initial entry corresponding to the pinyin to be queried;
and the second candidate entry obtaining subunit is used for determining the candidate entries from the initial entries through the tone indexes, and the tone indexes are used for inquiring the candidate entries corresponding to the tone conditions from the initial entries.
11. The apparatus according to any one of claims 7-10, wherein the apparatus further comprises a construction unit for constructing a vocabulary entry base, a pinyin index, and a tone index, the construction unit comprising:
the vocabulary entry base building subunit is used for acquiring the corpus and dividing the corpus into vocabulary entries to obtain a vocabulary entry base;
and the index constructing subunit is used for constructing a pinyin index and a tone index according to the pinyin information of the vocabulary entry.
12. The apparatus of claim 11, wherein the display unit is configured to:
sort and display the candidate entries according to the frequency of the candidate entries appearing in the corpus or the word vector similarity between the candidate entries and the semantic query information.
13. A server, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN201710970949.8A 2017-10-18 2017-10-18 Method and device for acquiring information Active CN107621892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710970949.8A CN107621892B (en) 2017-10-18 2017-10-18 Method and device for acquiring information

Publications (2)

Publication Number Publication Date
CN107621892A CN107621892A (en) 2018-01-23
CN107621892B true CN107621892B (en) 2021-03-09

Family

ID=61092670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710970949.8A Active CN107621892B (en) 2017-10-18 2017-10-18 Method and device for acquiring information

Country Status (1)

Country Link
CN (1) CN107621892B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410918B (en) * 2018-10-15 2020-01-24 百度在线网络技术(北京)有限公司 Method and device for acquiring information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050240A (en) * 2014-05-26 2014-09-17 北京奇虎科技有限公司 Method and device for determining categorical attribute of search query word

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100837750B1 (en) * 2006-08-25 2008-06-13 엔에이치엔(주) Method for searching chinese language using tone signs and system for executing the method
JP2008059169A (en) * 2006-08-30 2008-03-13 Casio Comput Co Ltd Chinese example sentence retrieval apparatus and program for process of retrieving chinese example sentence
CN101539428A (en) * 2009-04-28 2009-09-23 北京四维图新科技股份有限公司 Searching method with first letter of pinyin and intonation in navigation system and device thereof
CN102147796B (en) * 2010-02-05 2014-10-15 阿里巴巴集团控股有限公司 Vocabulary searching method and device
CN106126494B (en) * 2016-06-16 2018-12-28 上海智臻智能网络科技股份有限公司 Synonym finds method and device, data processing method and device

Also Published As

Publication number Publication date
CN107621892A (en) 2018-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant