Nothing Special   »   [go: up one dir, main page]

CN109241360B - Matching method and device of combined character strings and electronic equipment - Google Patents

Matching method and device of combined character strings and electronic equipment Download PDF

Info

Publication number
CN109241360B
CN109241360B CN201810954338.9A CN201810954338A CN109241360B CN 109241360 B CN109241360 B CN 109241360B CN 201810954338 A CN201810954338 A CN 201810954338A CN 109241360 B CN109241360 B CN 109241360B
Authority
CN
China
Prior art keywords
character string
string
combined
basic
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810954338.9A
Other languages
Chinese (zh)
Other versions
CN109241360A (en
Inventor
周书恒
祝慧佳
赵智源
郭亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201810954338.9A priority Critical patent/CN109241360B/en
Publication of CN109241360A publication Critical patent/CN109241360A/en
Application granted granted Critical
Publication of CN109241360B publication Critical patent/CN109241360B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification provides a matching method and device for a combined character string and electronic equipment, wherein the method comprises the following steps: acquiring a main character string and a combined character string; constructing index information of the combined character string and a basic substring; wherein the basic substring is obtained by splitting the combined string; performing mode matching on the main character string based on the basic sub character string to obtain a matched character string; determining a target combined character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.

Description

Matching method and device of combined character strings and electronic equipment
Technical Field
The embodiment of the specification relates to the technical field of internet, in particular to a matching method and device of a combined character string and electronic equipment.
Background
A matching method of character strings is a method for matching whether or not a given substring (hereinafter, abbreviated as a substring) exists from a main character string (hereinafter, abbreviated as a main string). For example: the main string is: zifuchuanpipe; the substrings are: at pipe, the matching goals are: whether the 'pipe' exists in the 'zifuchuanpipe' is matched, if so, the matched character string is output, and if not, null is output (which indicates that no substring exists in the main string); obviously, in this example, the sub-string "pipe" is included in the main string, the matched character string "pipe" may be output. In general, the above-described matching manner for a single substring may be referred to as pattern matching, e.g., such a form that matches substring B from primary string a. Similarly, for matching multiple single substrings, pattern matching is performed once for each substring. For example, substring B, substring C, substring D are matched from main string a, respectively.
In the related art, a preset matching algorithm may be generally used to improve the efficiency of string matching, for example, an Aho-corestick algorithm (which may be generally referred to as AC automata in the industry), a Boyer-Moore algorithm, a Horspool algorithm, etc. may be used. However, the above-described matching algorithm is more efficient for pattern matching, and is less efficient for matching of a combination string. The combination character string refers to the case that multiple substrings are matched at the same time, for example, whether a matched substring B and a substring C exist at the same time from the main string A, the substring B and the substring C can be marked as a substring B ^ substring C, wherein ^ represents 'and'.
Therefore, an efficient matching scheme for combining strings is needed.
Disclosure of Invention
The embodiment of the specification provides a matching method and device for a combined character string and an electronic device, and a buried point analysis method and device and an electronic device:
according to a first aspect of embodiments of the present specification, there is provided a matching method of a combined character string, the method including:
acquiring a main character string and a combined character string;
constructing index information of the combined character string and a basic substring; wherein the basic substring is obtained by splitting the combined string;
performing mode matching on the main character string based on the basic sub character string to obtain a matched character string;
determining a target combined character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
According to a second aspect of embodiments herein, there is provided a matching apparatus for combining character strings, the apparatus including:
an acquisition unit that acquires a main character string and a combination character string;
establishing an index unit, and establishing index information of the basic sub-character string aiming at the combined character string; wherein the basic substring is obtained by splitting the combined string;
the mode matching unit is used for carrying out mode matching on the main character string based on the basic sub character string to obtain a matched character string;
the combination matching unit is used for determining a target combination character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
According to a fifth aspect of embodiments herein, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured as any one of the above matching methods for a combined character string.
The embodiment of the specification provides a matching scheme of a combined character string, and information construction is carried out on a basic sub-string of the combined character string by utilizing an indexing technology (such as inverted index); then, performing pattern matching on the basic substrings by using an AC (alternating current) automatic machine, and finally screening matched character strings based on index information so as to obtain target combined character strings existing in the main string; the efficiency of the matching of the combined character strings is effectively improved.
Drawings
Fig. 1 is a flowchart of a matching method for a combined character string provided in an embodiment of the present specification;
FIG. 2 is a diagram illustrating a string matching process provided in an embodiment of the present disclosure;
fig. 3 is a hardware configuration diagram of a matching apparatus for combining character strings according to an embodiment of the present specification;
fig. 4 is a block diagram of a matching apparatus for combining character strings according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
As described above, in the related art, a preset matching algorithm may be used to improve the efficiency of string matching, for example, an Aho-corestick algorithm (which may be generally referred to as AC automata in the art), a Boyer-Moore algorithm, a Horspool algorithm, or the like may be used.
In order to compare the efficiency of matching the character strings, the efficiency of matching the character strings can be generally determined according to an index of time complexity. Taking an AC automaton as an example, when pattern matching is performed, the time complexity is O (n + m + z), n represents the length of a main string, m represents the length of all substrings, and z represents the number of matches.
Taking an AC automaton as an example, when performing combination matching, the following steps need to be performed:
step 1: and splitting the combined character string, taking the obtained single substring as a basic substring (provided with y basic substrings in total), and obtaining matched characters (provided with x matched substrings in total) matched from the main string by using an AC automaton.
For example, the main string is "zifuchuanpipe", and the combined string is "zi ^ ch", "ua ^ fa", "np ^ uc"; then the base substring may be zi, ch, ua, fa, np, uc (6), the matching substring may be zi, ch, ua, np, uc (5), and the base substring fa does not match.
Step 2: and (3) detecting whether the combined character string can be combined by the characters obtained in the step (1) or not. Therefore, the temporal complexity of the combination matching is 0(x × y).
Wherein x is equal to z under the aforementioned pattern matching, and y is equal to O (n + m + z) under the aforementioned pattern matching; therefore, after converting the combination matching into the aforementioned matching pattern, the time complexity is 0(x × y) ═ 0(z × O (n + m + z)); it can be seen that as the number of the combined character strings increases, the number of the basic substrings may increase, and finally, the time complexity increases continuously; efficiency is severely impacted by using ac automata directly for string matching.
In order to solve the above problem, the present specification provides a matching method of a combined character string, which may be described below with reference to an example shown in fig. 1, and may include the steps of:
step 110: a main character string and a combined character string are obtained.
Step 120: constructing index information of the combined character string and a basic substring; and splitting the basic substring by the combined string to obtain the basic substring.
Step 130: and carrying out mode matching on the main character string based on the basic sub character string to obtain a matched character string.
Step 140: determining a target combined character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
The embodiments provided in the present specification can be applied to a device for performing string matching, such as a client or a server.
The client may refer to a client device in hardware, such as a desktop computer, a laptop computer, a tablet computer, a smart phone, a handheld computer, a personal digital assistant ("PDA"), or any other wired or wireless processor-driven device.
The client may also refer to an application client on software, such as an instant messaging APP, an electronic wallet APP, and the like, and these application clients may have a requirement for matching character strings, such as retrieving chat records in the instant messaging APP, searching for merchant information in the electronic wallet APP, and the like.
The client also can be a soft and hard combined client, such as a smart phone with an electronic wallet APP installed.
In an embodiment, the matching of the character strings may be performed locally by the client, that is, the client may perform the matching of the character strings based on a local computing resource. In practical applications, after determining the main character string, the user may input the combined character string, so that the client automatically performs matching of the combined character string.
In an embodiment, the server may include a server for character string matching, a server cluster, or a cloud platform constructed based on the server cluster.
In one embodiment, the server may respond to a string matching request initiated by the client. In practical application, after determining the main character string, the user can input the combined character string, the client can further send the main character string and the combined character string to the server, and the server completes matching of the combined character string.
In an embodiment, the Index information may include Inverted Index (Inverted Index) information. Namely: the index information for constructing the combined string and the basic sub-string may specifically include:
establishing inverted index information of the combined character string and the basic substring based on the HashMap;
and counting the number of basic substrings in the combined string, and recording the number into the inverted index information.
HashMap (HashMap) is a way to efficiently establish a mapping relationship.
The inverted index is a database retrieval mode with high efficiency, and is particularly suitable for retrieval under mass data. There are many specific ways for inverted indexes, and in a broad sense, a search server based on Lucene (full text search engine) can be regarded as using inverted indexes; for example including ES
(ElasticSearch)、Solr。
It should be noted that the use of the inverted index is only one implementation manner provided in this specification, and any other manner that can establish index information in practical application may be used as an implementation manner of the embodiment in this specification.
In one embodiment, the base substring may be obtained by:
splitting the combined character string to obtain a single character string;
and taking the single character string as a basic substring.
Fig. 2 is a schematic diagram of a combined string matching process provided in this specification. In this example, the master string is "BACE", and the combined strings are "A ^ B", "C ^ D", "A ^ C", "E"; first, the reverse index information needs to be constructed:
step 11: splitting the combined character string to obtain a basic sub string;
splitting the 'A ^ B' to obtain single character strings 'A' and 'B';
splitting the 'C ^ D' to obtain a single character string 'C' and 'D';
splitting the 'A ^ C' to obtain a single character string 'A' and 'C';
splitting the 'E' to obtain a single character string 'E';
after deduplication, the remaining base substrings are "a", "B", "C", "D", and "E".
The individual character strings described in this specification do not refer to a character string of one character, but character strings separated by a symbol "^" such as a combined character string "abc ^ ac ^ db" split the resulting base substrings into "abc", "ac", and "db".
Step 12: establishing inverted index information of the combined character string and the basic substring based on the HashMap, which can be as follows:
A→A^B,A^C
B→A^B
C→A^C,C^D
D→C^D
E→E
wherein A → A ^ B, A ^ C represent the base substring A to map A ^ B, A ^ C; the others are similar.
Step 13: counting the number of the basic substrings in the combined string, and recording the number into the inverted index information, which can be as follows:
A→A^B(2),A^C(2)
B→A^B(2)
C→A^C(2),C^D(2)
D→C^D(2)
E→E(1)
wherein A → A ^ B (2), A ^ C (2) represents the base substring A maps A ^ B, A ^ C, and the combination string A ^ B has 2 base substrings, A ^ C has 2 base substrings; the others are similar.
In one embodiment, the pattern matching may be implemented using an AC automaton. That is, the performing pattern matching on the main character string based on the basic sub character string to obtain a matching character string may specifically include:
establishing a corresponding AC automaton aiming at the basic substring;
and carrying out pattern matching on the main character string based on the AC automaton of each basic character string to obtain a matched character string.
As shown in fig. 2, the main string is "BACE", and the base string has "a", "B", "C", "D", and "E"; and performing pattern matching on the main strings by the AC automata based on the 5 basic substrings respectively:
according to the AC automaton of the basic substring A, whether the character A exists in the main string BACE matching; because A exists in the main string, matching can be successful, and a matching character string 'A' can be obtained;
according to the AC automaton of the basic substring B, whether the character B exists in the main string BACE matching; because B exists in the main string, matching can be successful, and a matching character string 'B' can be obtained;
according to the AC automaton of the basic substring C, whether the character C exists in the main string BACE matching; because C exists in the main string, matching can be successful, and a matching character string 'C' can be obtained;
according to an AC automaton of the basic substring D, whether a character D exists in the main string BACE matching; because D does not exist in the main string, the matching can fail, and null is output, or the next pattern matching is directly entered;
according to the AC automaton of the basic substring E, whether the character E exists in the main string BACE matching; because E exists in the main string, matching can be successful, and a matching character string 'E' can be obtained;
in summary, matching strings "a", "B", "C", and "E" can be obtained.
In an embodiment, the determining a target combination string from the index information according to the matching string may specifically include:
searching the combined character string mapped by the matched character string from the index information;
counting the frequency of searching each combined character string;
and determining the combined character strings with the frequency consistent with the number of the basic sub character strings as target combined character strings.
As shown in fig. 2, after the matching character string is obtained, the combined character string mapped by the matching character string may be searched from the foregoing index information:
the combined string that matches the string "A" mapping includes "A ^ B" and "A ^ C";
the combined string that matches the string "B" mapping includes "A ^ B";
the combined string that matches the string "C" mapping includes "C ^ D" and "A ^ C";
the combined string that matches the string "E" mapping includes "E".
Further, the frequency of searching each combined character string is counted:
as can be seen, the combination string "A ^ B" is searched by the matching strings "A" and "B", so the frequency of the combination string "A ^ B" is 2;
the combined character string 'A ^ C' is searched by the matched character strings 'A' and 'C', so the frequency of the combined character string 'A ^ C' is 2;
the combined character string 'C ^ D' is searched by the matched character string 'C', so the frequency of the combined character string 'C ^ D' is 1;
the combination string "E" is searched for by the matching string "E", and thus, the frequency of the combination string "E" is 1.
Finally, the frequency of the combined string is compared with the number of basic substrings, which have been obtained in the preceding step 13, and can therefore be obtained directly from the index information:
the frequency of the combined character string 'A ^ B' is 2, and the number of the basic substrings is 2, so that the combined character string 'A ^ B' is consistent with the basic substrings;
the frequency of the combined character string 'A ^ C' is 2, the number of the basic substrings is 2, and the combined character string 'A ^ C' is consistent with the basic substrings;
the frequency of the combined character string 'C ^ D' is 1, the number of the basic substrings is 2, and the combined character string 'C ^ D' is inconsistent with the basic substrings;
the frequency of the combined character string E is 1, and the number of the basic substrings is 1, so that the combined character string E is consistent with the basic substrings;
in summary, the finally determined target combination strings can be "A ^ B", "A ^ C" and "E"; the target combination string may be output.
The efficiency of the above process is illustrated by the time complexity:
1. assume that the number of obtained combination strings is d, and the number of base substrings is y (where d < ═ y).
2. The number of matching strings obtained by the pattern matching in step 130 is set as x.
3. Assuming that the number of combined strings of the matching string map in step 140 is set to m, the time complexity of this step is 0(x), i.e., x times of searching.
4. The number of the mapped combined character strings after de-duplication is assumed to be n (wherein n < ═ d); the time complexity of this step is 0(m) ═ 0(x × T), where T is the average length of the index sequence, according to zipf law T < < d; thus, the temporal complexity of this step < <0(x × d);
5. the time complexity of step 140 is 0(n), i.e., n combined strings are compared for n frequencies and numbers after deduplication.
In summary, the total time complexity is 0(x) +0(m) +0(n) ═ 0(x + m + n).
Because O (x + m + n) < <0(x + x d + d) <0(x y); where 0(x y) is the aforementioned temporal complexity of the combinatorial matching using existing AC automata; therefore, after the embodiment provided by the specification is adopted, the time complexity is far smaller than that of a mode of only adopting an AC automaton, and the matching efficiency of the combined character strings is improved.
In the embodiments provided in this specification, an indexing technique (e.g., inverted index) is used to construct information of a base substring of a combined string; then, performing pattern matching on the basic substrings by using an AC (alternating current) automatic machine, and finally screening matched character strings based on index information so as to obtain target combined character strings existing in the main string; the efficiency of the matching of the combined character strings is effectively improved.
Corresponding to the embodiment of the matching method of the combined character string, the specification also provides an embodiment of a matching device of the combined character string. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a logical device, the device is formed by reading corresponding computer business program instructions in the nonvolatile memory into the memory for operation through the processor of the device in which the device is located. From a hardware aspect, as shown in fig. 3, the hardware structure diagram of the apparatus where the matching device for combining character strings is located in this specification is shown, except for the processor, the network interface, the memory, and the nonvolatile memory shown in fig. 3, the apparatus where the device is located in the embodiment may also include other hardware according to the actual matching function of the combining character strings, which is not described again.
Referring to fig. 4, a block diagram of an apparatus for matching a composite character string according to an embodiment of the present disclosure is provided, where the apparatus corresponds to the embodiment shown in fig. 1, and the apparatus includes:
an acquisition unit 210 that acquires a main character string and a combination character string;
an index establishing unit 220 for establishing index information of the basic sub-character string aiming at the combined character string; wherein the basic substring is obtained by splitting the combined string;
a pattern matching unit 230, performing pattern matching on the main character string based on the basic sub character string to obtain a matching character string;
a combination matching unit 240 for determining a target combination character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
Optionally, the index establishing unit 220 specifically includes:
establishing inverted index information of the combined character string and the basic substring based on the HashMap;
and counting the number of basic substrings in the combined string, and recording the number into the inverted index information.
Optionally, the pattern matching unit 230 specifically includes:
establishing a corresponding AC automaton aiming at the basic substring; and carrying out pattern matching on the main character string based on the AC automaton of each basic character string to obtain a matched character string.
Optionally, the combination matching unit 240 specifically includes:
searching the combined character strings mapped by the matched character strings from the index information, and counting the frequency of searching each combined character string; and determining the combined character strings with the frequency consistent with the number of the basic sub character strings as target combined character strings.
Optionally, the apparatus further comprises:
the basic sub-string obtaining unit splits the combined character string to obtain a single character string; and using the single character string as a base substring.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
Fig. 4 above describes the internal functional modules and the structural schematic of the combined character string matching apparatus, and the substantial execution subject may be an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a main character string and a combined character string;
constructing index information of the combined character string and a basic substring; wherein the basic substring is obtained by splitting the combined string;
performing mode matching on the main character string based on the basic sub character string to obtain a matched character string;
determining a target combined character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
Optionally, the constructing index information of the basic character for the combined character string specifically includes:
establishing inverted index information of the combined character string and the basic substring based on the HashMap;
and counting the number of basic substrings in the combined string, and recording the number into the inverted index information.
Optionally, the performing pattern matching on the main character string based on the basic sub character string to obtain a matching character string specifically includes:
establishing a corresponding AC automaton aiming at the basic substring;
and carrying out pattern matching on the main character string based on the AC automaton of each basic character string to obtain a matched character string.
Optionally, the determining a target combined character string from the index information according to the matching character string specifically includes:
searching the combined character string mapped by the matched character string from the index information;
counting the frequency of searching each combined character string;
and determining the combined character strings with the frequency consistent with the number of the basic sub character strings as target combined character strings.
Optionally, the method further includes:
splitting the combined character string to obtain a single character string;
and taking the single character string as a basic substring.
In the above embodiments of the electronic device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiment of the electronic device, since it is substantially similar to the embodiment of the method, the description is simple, and for the relevant points, reference may be made to part of the description of the embodiment of the method.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims (11)

1. A matching method of a combined character string, the method comprising:
acquiring a main character string and a combined character string;
constructing index information of the combined character string and a basic substring; wherein the basic substring is obtained by splitting the combined string;
performing mode matching on the main character string based on the basic sub character string to obtain a matched character string;
determining a target combined character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
2. The method according to claim 1, wherein the constructing index information of the basic character for the combined character string specifically includes:
establishing inverted index information of the combined character string and the basic substring based on the HashMap;
and counting the number of basic substrings in the combined string, and recording the number into the inverted index information.
3. The method according to claim 1, wherein the pattern matching of the main character string based on the basic sub character string to obtain a matching character string specifically includes:
establishing a corresponding AC automaton aiming at the basic substring;
and carrying out pattern matching on the main character string based on the AC automaton of each basic character string to obtain a matched character string.
4. The method according to claim 1, wherein the determining a target combination string from the index information according to the matching string specifically includes:
searching the combined character string mapped by the matched character string from the index information;
counting the frequency of searching each combined character string;
and determining the combined character strings with the frequency consistent with the number of the basic sub character strings as target combined character strings.
5. The method of claim 1, further comprising:
splitting the combined character string to obtain a single character string;
and taking the single character string as a basic substring.
6. A matching device for a combined character string, the device comprising:
an acquisition unit that acquires a main character string and a combination character string;
establishing an index unit, and establishing index information of the basic sub-character string aiming at the combined character string; wherein the basic substring is obtained by splitting the combined string;
the mode matching unit is used for carrying out mode matching on the main character string based on the basic sub character string to obtain a matched character string;
the combination matching unit is used for determining a target combination character string from the index information according to the matching character string; wherein the basic substring of the target combination string exists simultaneously with the main string.
7. The apparatus according to claim 6, wherein the index creating unit specifically includes:
establishing inverted index information of the combined character string and the basic substring based on the HashMap;
and counting the number of basic substrings in the combined string, and recording the number into the inverted index information.
8. The apparatus according to claim 6, wherein the pattern matching unit specifically includes:
establishing a corresponding AC automaton aiming at the basic substring; and carrying out pattern matching on the main character string based on the AC automaton of each basic character string to obtain a matched character string.
9. The apparatus according to claim 6, wherein the combination matching unit specifically includes:
searching the combined character strings mapped by the matched character strings from the index information, and counting the frequency of searching each combined character string; and determining the combined character strings with the frequency consistent with the number of the basic sub character strings as target combined character strings.
10. The apparatus of claim 6, the apparatus further comprising:
the basic sub-string obtaining unit splits the combined character string to obtain a single character string; and using the single character string as a base substring.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured as the method of any of the preceding claims 1-5.
CN201810954338.9A 2018-08-21 2018-08-21 Matching method and device of combined character strings and electronic equipment Active CN109241360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810954338.9A CN109241360B (en) 2018-08-21 2018-08-21 Matching method and device of combined character strings and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810954338.9A CN109241360B (en) 2018-08-21 2018-08-21 Matching method and device of combined character strings and electronic equipment

Publications (2)

Publication Number Publication Date
CN109241360A CN109241360A (en) 2019-01-18
CN109241360B true CN109241360B (en) 2021-08-20

Family

ID=65071245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810954338.9A Active CN109241360B (en) 2018-08-21 2018-08-21 Matching method and device of combined character strings and electronic equipment

Country Status (1)

Country Link
CN (1) CN109241360B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222286A (en) * 2019-05-21 2019-09-10 平安普惠企业管理有限公司 Information acquisition method, device, terminal and computer readable storage medium
CN110516118A (en) * 2019-08-13 2019-11-29 出门问问(武汉)信息科技有限公司 A kind of character string matching method, equipment and computer storage medium
CN111797285A (en) * 2020-06-30 2020-10-20 深圳壹账通智能科技有限公司 Character string fuzzy matching method, device, equipment and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271464A (en) * 2007-11-26 2008-09-24 北京九城网络软件有限公司 Search method of internet search engine
JP2008234204A (en) * 2007-03-19 2008-10-02 Ricoh Co Ltd Document retrieval device, method and program
KR20100072997A (en) * 2008-12-22 2010-07-01 한국전자통신연구원 System for string matching based on tokenization and method thereof
CN102750379A (en) * 2012-06-25 2012-10-24 华南理工大学 Fast character string matching method based on filtering type
CN104375992A (en) * 2013-08-12 2015-02-25 中国移动通信集团浙江有限公司 Address matching method and device
CN105243086A (en) * 2015-09-08 2016-01-13 北京北大千方科技有限公司 Vehicle information query method and device
CN105488068A (en) * 2014-09-19 2016-04-13 阿里巴巴集团控股有限公司 Methods and apparatuses for searching music and establishing index, and search result judgment method
CN107679202A (en) * 2017-09-30 2018-02-09 北京银通易汇科技有限公司 A kind of method and device that inverted index is set
CN108197315A (en) * 2018-02-01 2018-06-22 中控技术(西安)有限公司 A kind of method and apparatus for establishing participle index database

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008234204A (en) * 2007-03-19 2008-10-02 Ricoh Co Ltd Document retrieval device, method and program
CN101271464A (en) * 2007-11-26 2008-09-24 北京九城网络软件有限公司 Search method of internet search engine
KR20100072997A (en) * 2008-12-22 2010-07-01 한국전자통신연구원 System for string matching based on tokenization and method thereof
CN102750379A (en) * 2012-06-25 2012-10-24 华南理工大学 Fast character string matching method based on filtering type
CN104375992A (en) * 2013-08-12 2015-02-25 中国移动通信集团浙江有限公司 Address matching method and device
CN105488068A (en) * 2014-09-19 2016-04-13 阿里巴巴集团控股有限公司 Methods and apparatuses for searching music and establishing index, and search result judgment method
CN105243086A (en) * 2015-09-08 2016-01-13 北京北大千方科技有限公司 Vehicle information query method and device
CN107679202A (en) * 2017-09-30 2018-02-09 北京银通易汇科技有限公司 A kind of method and device that inverted index is set
CN108197315A (en) * 2018-02-01 2018-06-22 中控技术(西安)有限公司 A kind of method and apparatus for establishing participle index database

Also Published As

Publication number Publication date
CN109241360A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
US10210243B2 (en) Method and system for enhanced query term suggestion
CN110162695B (en) Information pushing method and equipment
JP5575902B2 (en) Information retrieval based on query semantic patterns
CN107590214B (en) Recommendation method and device for search keywords and electronic equipment
US10152478B2 (en) Apparatus, system and method for string disambiguation and entity ranking
CN109241360B (en) Matching method and device of combined character strings and electronic equipment
JP6932360B2 (en) Object search method, device and server
CN107679186B (en) Method and device for searching entity based on entity library
US20120084226A1 (en) Measuring or estimating user credibility
US10812500B2 (en) Method of cyberthreat detection by learning first-order rules on large-scale social media
WO2015154679A1 (en) Method and device for ranking search results of multiple search engines
CN105488176A (en) Data processing method and device
WO2016034062A1 (en) Information lookup method and device
CN111582967A (en) Content search method, device, equipment and storage medium
WO2017065891A1 (en) Automated join detection
CN113918807A (en) Data recommendation method and device, computing equipment and computer-readable storage medium
CN110659406A (en) Searching method and device
US11379669B2 (en) Identifying ambiguity in semantic resources
US20230066149A1 (en) Method and system for data mining
CN110959157B (en) Accelerating large-scale similarity computation
CN111597379B (en) Audio searching method and device, computer equipment and computer-readable storage medium
CN114579573B (en) Information retrieval method, information retrieval device, electronic equipment and storage medium
CN114861062B (en) Information filtering method and device
CN113420214B (en) Electronic transaction object recommendation method, device and equipment
CN110442656B (en) Method and device for determining common association object

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant
GR01 Patent grant