Nothing Special   »   [go: up one dir, main page]

Next Article in Journal
Bamboo Garden Trimming Problem: Priority Schedulings
Next Article in Special Issue
Applications of Non-Uniquely Decodable Codes to Privacy-Preserving High-Entropy Data Representation
Previous Article in Journal
An Improved ABC Algorithm and Its Application in Bearing Fault Diagnosis with EEMD
Previous Article in Special Issue
Efficient Algorithms for Computing the Inner Edit Distance of a Regular Language via Transducers
You seem to have javascript disabled. Please note that many of the page functionalities won't work as expected without javascript enabled.
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Permuted Pattern Matching Algorithms on Multi-Track Strings

Graduate School of Information Sciences, Tohoku University, Miyagi 980-8579, Japan
*
Author to whom correspondence should be addressed.
Algorithms 2019, 12(4), 73; https://doi.org/10.3390/a12040073
Submission received: 28 February 2019 / Revised: 2 April 2019 / Accepted: 3 April 2019 / Published: 8 April 2019
(This article belongs to the Special Issue String Matching and Its Applications)
Figure 1
<p>The multi-track AC-automaton <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">MTAC</mi> <mo>(</mo> <mi>D</mi> <mo>)</mo> </mrow> </semantics></math> for <math display="inline"><semantics> <mrow> <mi>D</mi> <mo>=</mo> <mo>{</mo> <msub> <mi mathvariant="double-struck">P</mi> <mn>1</mn> </msub> <mo>,</mo> <msub> <mi mathvariant="double-struck">P</mi> <mn>2</mn> </msub> <mo>,</mo> <msub> <mi mathvariant="double-struck">P</mi> <mn>3</mn> </msub> <mo>}</mo> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="double-struck">P</mi> <mn>1</mn> </msub> <mo>=</mo> <mfenced separators="" open="(" close=")"> <mi mathvariant="monospace">aaabb</mi> <mo>,</mo> <mi mathvariant="monospace">abaab</mi> <mo>,</mo> <mi mathvariant="monospace">bbaaa</mi> </mfenced> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="double-struck">P</mi> <mn>2</mn> </msub> <mo>=</mo> <mfenced separators="" open="(" close=")"> <mi mathvariant="monospace">abab</mi> <mo>,</mo> <mi mathvariant="monospace">abba</mi> <mo>,</mo> <mi mathvariant="monospace">bbab</mi> </mfenced> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <msub> <mi mathvariant="double-struck">P</mi> <mn>3</mn> </msub> <mo>=</mo> <mfenced separators="" open="(" close=")"> <mi mathvariant="monospace">aabbab</mi> <mo>,</mo> <mi mathvariant="monospace">bababb</mi> <mo>,</mo> <mi mathvariant="monospace">baaaab</mi> </mfenced> </mrow> </semantics></math>. The asterisk “*” is a special symbol that matches any symbol in <math display="inline"><semantics> <mo>Σ</mo> </semantics></math>.</p> ">
Figure 2
<p>The multi-track permuted matching automaton <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">MTPMA</mi> <mo>(</mo> <mi mathvariant="double-struck">P</mi> <mo>)</mo> </mrow> </semantics></math> for <math display="inline"><semantics> <mrow> <mi mathvariant="double-struck">P</mi> <mo>=</mo> <mfenced separators="" open="(" close=")"> <mi mathvariant="monospace">aaabb</mi> <mo>,</mo> <mi mathvariant="monospace">abaab</mi> <mo>,</mo> <mi mathvariant="monospace">bbaaa</mi> </mfenced> </mrow> </semantics></math>. The asterisk is a special symbol that matches with any symbols in <math display="inline"><semantics> <mo>Σ</mo> </semantics></math>.</p> ">
Figure 3
<p>(<b>a</b>) Track trie of <math display="inline"><semantics> <mrow> <mi mathvariant="double-struck">P</mi> <mo>=</mo> <mo>(</mo> <mi mathvariant="monospace">bbaba</mi> <mo>,</mo> <mi mathvariant="monospace">abbba</mi> <mo>,</mo> <mi mathvariant="monospace">aaabb</mi> <mo>)</mo> </mrow> </semantics></math>; (<b>b</b>) example of mismatch when the track trie cannot find the transition; (<b>c</b>) example of mismatch when the number of tracks is more than the weight of the node.</p> ">
Figure 4
<p>Running time of the multi-track KMP (MTKMP), multi-track AC-automaton (MTACA), multi-track permuted matching automaton (MTPMA) algorithms on permuted pattern matching with respect to (<b>a</b>) the text length, (<b>b</b>) the track count, (<b>c</b>) the pattern length, and (<b>d</b>) the alphabet size.</p> ">
Figure 5
<p>Running time of the multi-track Buyer–Moore (MTBM) and multi-track Horspool (MTH) algorithms on permuted pattern matching with respect to (<b>a</b>) the text length, (<b>b</b>) the track count, (<b>c</b>) the pattern length, and (<b>d</b>) the alphabet size.</p> ">
Figure 6
<p>Running time of the filtering algorithms (Knuth–Morris–Pratt (Filter-KMP), Boyer–Moore (Filter-BM), and Horspool (Filter-H)) for permuted pattern matching with respect to (<b>a</b>) the text length, (<b>b</b>) the track count, (<b>c</b>) the pattern length, and (<b>d</b>) the alphabet size.</p> ">
Figure 6 Cont.
<p>Running time of the filtering algorithms (Knuth–Morris–Pratt (Filter-KMP), Boyer–Moore (Filter-BM), and Horspool (Filter-H)) for permuted pattern matching with respect to (<b>a</b>) the text length, (<b>b</b>) the track count, (<b>c</b>) the pattern length, and (<b>d</b>) the alphabet size.</p> ">
Figure 7
<p>Comparison of the running time of the AC-automaton-based algorithm with the proposed algorithms for permuted pattern matching with respect to (<b>a</b>) the text length, (<b>b</b>) the track count, (<b>c</b>) the pattern length, and (<b>d</b>) the alphabet size.</p> ">
Versions Notes

Abstract

:
A multi-track string is a tuple of strings of the same length. Given the pattern and text of two multi-track strings, the permuted pattern matching problem is to find the occurrence positions of all permutations of the pattern in the text. In this paper, we propose several algorithms for permuted pattern matching. Our first algorithm, which is based on the Knuth–Morris–Pratt (KMP) algorithm, has a fast theoretical computing time with O ( m k ) as the preprocessing time and O ( n k log σ ) as the matching time, where n, m, k, σ , and occ denote the length of the text, the length of the pattern, the number of strings in the multi-track, the alphabet size, and the number of occurrences of the pattern, respectively. We then improve the KMP-based algorithm by using an automaton, which has a better experimental running time. The next proposed algorithms are based on the Boyer–Moore algorithm and the Horspool algorithm that try to perform pattern matching. These algorithms are the fastest experimental algorithms. Furthermore, we propose an extension of the AC-automaton algorithm that can solve dictionary matching on multi-tracks, which is a task to find multiple multi-track patterns in a multi-track text. Finally, we propose filtering algorithms that can perform permuted pattern matching quickly in practice.

1. Introduction

The pattern matching problem on strings is to find all occurrence positions of a pattern string in a text string. Pattern matching algorithms such as the Knuth–Morris–Pratt (KMP) algorithm [1], the Boyer–Moore algorithm [2], and the Horspool algorithm [3] perform pattern matching quickly by preprocessing the pattern. On the other hand, pattern matching can be accelerated by preprocessing the text into some indexing structure such as suffix trees [4], suffix arrays [5], or position heaps [6].
The permuted pattern matching problem, proposed by Katsura et al. [7,8], is a generalization of the pattern matching problem, where we compare tuples of strings. We call a tuple of strings of the same length a multi-track string. The permuted pattern matching problem is as follows: given two multi-track strings T = ( T 1 , T 2 , , T k ) and P = ( P 1 , P 2 , , P k ) where | T 1 | = = | T k | = n | P 1 | = = | P k | = m , to find all positions i such that ( T 1 [ i : i + m 1 ] , , T k [ i : i + m 1 ] ) is a permutation of P . The problem can be solved by constructing indexing structures from the text such as multi-track suffix trees [7], multi-track position heaps [8], and filtering multi-set trees [9], or by preprocessing the pattern such as the AC-automaton-based algorithm [7].
Multi-track strings can model various types of real data when sequences in the data have the same meaning. For example, data from a three-dimensional accelerometer in a smartphone can be considered a multi-track string. Consider two smartphones with the same position, and rotate one of them several times where each rotation is 90 degrees on the x, y, or z axis. If we perform the same motion on both smartphones, the generated data from three-dimensional accelerometer of one phone is a permutation of the other one. By using permuted pattern matching, we can find occurrences of a pattern on both smartphones without permuting the pattern.
In this paper, we propose several algorithms that solve permuted pattern matching quickly. These algorithms can be classified into three groups. The first group consists of algorithms that are based on the KMP algorithm [1]. This group includes the multi-track KMP, multi-track AC-automaton, and multi-track permuted matching automaton algorithms. The second group consists of algorithms that are based on the Boyer–Moore algorithm [2] and the Horspool algorithm [3]. This group includes the multi-track Boyer–Moore and multi-track Horspool algorithms. The third group consists of filtering algorithms. Moreover, we conduct experiments and show that our algorithms perform permuted pattern matching faster than existing algorithms. The worst case running times of the proposed algorithms and existing algorithms are summarized in Table 1, where d is the total length of the patterns and σ is the size of the alphabet.
Preliminary versions of this work appeared in [10,11].

2. Notation and Definition on Multi-Track Strings

Let Σ be a totally-ordered alphabet and σ = | Σ | be the size of the alphabet. An element of Σ * is called a string. For a string W Σ * , the length of W is denoted by | W | . The empty string, denoted by ε , is a string of length zero. For a string W Σ * of length n, W [ i ] denotes the i th symbol of W, and W [ i : j ] = W [ i ] W [ i + 1 ] W [ j ] denotes a substring of W that begins at i and ends at j for 1 i j n . For convenience, we abbreviate W [ 1 : i ] to W [ : i ] and W [ i : n ] to W [ i : ] ; these terms are known as the prefix and the suffix of W, respectively. Moreover, let W [ i : j ] = ε if i > j . The reverse string of W is denoted by W R = W [ n ] W [ n 1 ] W [ 2 ] W [ 1 ] . For two strings, X and Y, X Y denotes that X is lexicographically smaller than Y, and X Y denotes that either X = Y or X Y .
A multi-track symbol C = ( c 1 , c 2 , , c k ) is a k-tuple of symbols c i Σ . A multi-track string (or multi-track for short) W = ( W 1 , W 2 , , W k ) is a k-tuple of strings W i Σ * where | W 1 | = | W 2 | = = | W k | , and each W i is called the i th track of W . The length n of strings in W is called the length of W and denoted by | W | len , and the number k of tracks in W is called the track count of W and is denoted by | W | num . For a multi-track symbol C = ( c 1 , c 2 , , c k ) , we define C [ i ] = c i . For a multi-track W of track count k = | W | num , W [ i ] = ( W 1 [ i ] , W 2 [ i ] , , W k [ i ] ) denotes the i th multi-track symbol, and W [ i ] [ j ] denotes W j [ i ] for 1 i | W | len and 1 j | W | num . Moreover, W [ i : j ] = ( W 1 [ i : j ] , W 2 [ i : j ] , , W k [ i : j ] ) denotes a substring of W that begins at i and ends at j for 1 i j | W | len . Similarly to the notation for strings, W [ : i ] and W [ i : ] mean W [ 1 : i ] and W [ i : | W | len ] , which are called the prefix and the suffix of W , respectively.
Let r = ( r 1 , r 2 , , r k ) be a permutation of ( 1 , 2 , , k ) . For a multi-track W = ( W 1 , W 2 , , W k ) , W r = W r 1 , r 2 , , r k = ( W r 1 , , W r k ) is called a permuted multi-track of W . The sorted index SI ( W ) of a multi-track W is a permutation ( r 1 , , r k ) such that W r i W r i + 1 for any 1 i < k , where we assume r i < r i + 1 in the case W r i = W r i + 1 . The sorted multi-track sort ( W ) is defined as W SI ( W ) . The reverse of a multi-track is W = ( W 1 , , W k ) is W R = ( W 1 R , , W k R ) . The sorted index of the reverse multi-track, denoted by RI ( W ) , is a permutation ( r 1 , , r k ) such that w r i R w r i + 1 R for any 1 i < k .
Lemma 1
([7]). Let SI W be a two-dimensional array such that SI W [ i ] = SI ( W [ i : ] ) for 1 i n . SI W can be computed in O ( n k ) time ffline, where n = | P | len and k = | P | num .
Lemma 2.
Let RI W be a two-dimensional array such that RI W [ i ] = RI ( W [ i : ] ) for 1 i n . RI W can be computed in O ( n ( k + σ ) ) time nline, where n = | P | len and k = | P | num .
Proof. 
Clearly, RI W can be computed in O ( n ( k + σ ) ) by using a radix sort algorithm. □
For two multi-tracks X = ( X 1 , X 2 , , X k ) and Y = ( Y 1 , Y 2 , , Y k ) of the same length, we say that X permuted-matches Y if X = Y r for some permutation r , and it is denoted by X Y . □
Lemma 3
([7]). For two multi-tracks X and Y , X Y if and only if sort ( X ) = sort ( Y ) .
Throughout this paper, we assume that P is a pattern with | P | num = k and | P | len = m , and T is text with | T | num = k and | T | len = n m . The pattern matching problem on multi-tracks is defined as follows.
Definition 1
(Permuted pattern matching ([7]). Given a multi-track text T and a multi-track pattern P , compute all positions i that satisfy P T [ i : i + m 1 ] . We call such positions ccurrence position of P in T .
For example, given a text T = aabaaaaa , abaabbaa , baaababa and a pattern P = aba , baa , aaa , we can see that the pattern P matches T at two because T [ 2 : 4 ] = P . Moreover, the pattern P permuted-matches T at six, since P 3 , 2 , 1 = aaa , baa , aba = T [ 6 : 8 ] . Therefore, we should output two and six in this case.
We remark that Katsura et al. [7] defined a more general problem called the sub-permuted pattern matching, where we have | T | num | P | num , and our task is to find a partial permutation ( r 1 , , r | P | num ) of ( 1 , , | T | num ) and a position i for which P T r 1 , , r | P | num [ i : i + m 1 ] holds. However, in this paper, we only consider the case | T | num = | P | num , which was called full-permuted pattern matching in [7].

3. KMP-Based Permuted Pattern Matching Algorithms

In this section, we propose three algorithms for permuted pattern matching based on the KMP algorithm. In Section 3.1, we introduce our first algorithm, the multi-track KMP algorithm (MTKMP algorithm) which uses the border array of a multi-track pattern. We remark that this algorithm is a variant of the generalized KMP algorithm proposed by Matsuoka et al. [12]. They showed that the KMP algorithm can be applied to string classes that satisfy some equality and border properties. In this paper, we show some properties of multi-track strings so that the KMP algorithm can be applied to multi-track strings and give the exact computation time of the algorithm.
In Section 3.2, we extend it to the multi-track AC-automaton algorithm, which can perform dictionary matching on multi-tracks. Last, in Section 3.3, we introduce another extension of the MTKMP algorithm, the permuted matching automaton algorithm, which can perform permuted pattern matching faster than the MTKMP algorithm.

3.1. Multi-Track KMP Algorithm

Similar to the original KMP algorithms, the multi-track KMP algorithm uses multi-track border arrays to compute the shift amount when the algorithm finds a mismatch. First, we define borders for multi track strings.
Definition 2
(Borders of multi-tracks). A orde of a multi-track P is any multi-track that permuted-matches with a prefix and a suffix of P . Moreover, a border B of P is called rope if B Algorithms 12 00073 i015 P .
We can easily confirm the following lemma.
Lemma 4.
For any two borders B 1 and B 2 of P , if | B 1 | len < | B 2 | len , then B 1 is a border of B 2 .
Let T [ i : i + j ] P [ 1 : j + 1 ] for some i and j. If T [ i + x : i + x + m 1 ] P [ 1 : m ] for some x j , then T [ i : i + ( j x ) ] P [ 1 : ( j x ) + 1 ] and T [ i + x : i + j ] P [ 1 : ( j x ) + 1 ] hold, i.e., P [ 1 : ( j x ) + 1 ] is a proper border of P [ 1 : j + 1 ] . Therefore, if T [ i : i + j ] P [ 1 : j + 1 ] and T [ i : i + j ] Algorithms 12 00073 i015 P [ 1 : j + 1 ] , we can safely shift P by x, where ( j x + 1 ) is the length of the longest proper border of P [ 1 : j + 1 ] . We store the length of the longest proper border of P [ 1 : j + 1 ] for 1 j m as the border array of P .
Definition 3
(Multi-track border array). Given a multi-track P of | P | len = m , he border arra Border P of P is an array of length m + 1 , where the i th element of Border P is the length of the longest proper border of P [ 1 : i ] . Formally,
Border P [ i ] = max { j P [ 1 : j ] P [ i j + 1 : i ] , 0 j < i } .
For algorithmic purpose, we define Border P [ 0 ] = 1 . Algorithm 1 constructs the multi-track border array of an input P .
Lemma 5.
Given a multi-track P of | P | len = m and | P | num = k , Algorithm 1 constructs the border array of P in O ( m k ) time.
Proof. 
First, we show the correctness of the algorithm by induction. Assume we have computed the longest proper border of P [ : j ] and stored it in Border [ j ] for 1 j i . Let P [ : b ] be the longest proper border of P [ : i + 1 ] . Clearly, P [ : b 1 ] is a border of P [ : i ] . Using Lemma 4, we can find b 1 by finding max { j P [ : j ] is a proper border of P [ : i ] and P [ : j + 1 ] P [ i j + 1 : i + 1 ] } . Since Border [ j ] is the longest proper border of P [ : j ] , the algorithm can find b 1 by updating j Border [ j ] from j = i until P [ : Border [ j ] + 1 ] P [ i Border [ j ] + 1 : i + 1 ] holds. Therefore, Algorithm 1 computes the border array of P correctly.
Next, we show the computation time of the algorithm. Using Lemma 1, SI can be computed in O ( m k ) . Both while loops are called O ( m ) times at most, since i j i m + 1 , and the value of i always increases each time the outer loop is called, while the value of i j always increases each time the inner loop is called. Each comparison of P [ j ] SI ( P [ 1 : ] ) and P [ i ] SI ( P [ i j : ] ) consumes O ( k ) time; hence, Algorithm 1 runs in O ( m k ) time. □
Algorithm 2 shows the pseudocode of the MTKMPalgorithm. The MTKMP algorithm performs permuted pattern matching from left to right of the pattern and the text. The sorted pattern P sort and the array SI save the sorted indices of suffixes of text used to perform permuted-matching between the pattern and a substring of the text. If symbols on the text and the pattern are mismatched, we shift the matching position of the pattern using Border . If the pattern matches a substring of the text, then the MTKMP algorithm outputs the occurrence position at the text and shifts the pattern according to Border .
Algorithm 1: Multi-track border array construction algorithm.
Algorithms 12 00073 i001
Algorithm 2: The multi-track KMP algorithm.
Algorithms 12 00073 i002
Theorem 1.
Given a text T and a pattern P , the MTKMP algorithm computes all occurrence positions of P in T in O ( n k + m k ) time.
Proof. 
According to the above arguments, we can shift P by x, where ( j x + 1 ) is the length of the longest proper border of P [ 1 : j + 1 ] if T [ i : i + j ] P [ 1 : j + 1 ] and T [ i : i + j ] Algorithms 12 00073 i015 P [ 1 : j + 1 ] . Since Border [ i ] is the longest proper border of P [ : i ] , the algorithm finds the occurrence position of P in T correctly.
Using Lemma 1, SI can be computed in O ( n k ) . From Lemma 5, constructMTBorderArray P runs in O ( m k ) . Both while loops are called O ( n ) times at most, since i j i n + 1 , and the value of i always increases each time the outer loop is called, while the value of i j always increases each time the inner loop is called. Each comparison of P [ j ] SI ( P [ 1 : ] ) and T [ i ] SI ( T [ i j : ] ) consumes O ( k ) time. Hence, Algorithm 2 runs in O ( n k + m k ) time. □

3.2. Multi-Track AC-Automaton

In this subsection, we introduce a data structure called a multi-track AC-automaton that can perform dictionary matching on multi-tracks, namely permuted dictionary matching. Let D = { P 1 , P 2 , , P r } be a set of patterns of the same track count k, called a dictionary, and let d = i = 1 r m i be the total length of the patterns in D, where m i = | P i | len . The multi-track AC-automaton of D, denoted by MTAC ( D ) , consists of a set of states and three functions: goto, failure, and output functions. Figure 1 shows an example of MTAC ( D ) .
For a dictionary D, the states of MTAC ( D ) are S = { sort ( P i ) [ : j ] P i D , 0 j m i } . Each state corresponds to a prefix of sort ( P i ) for some P i D . Unlike the original AC-automaton, the multi-track AC-automaton uses a multi-track symbol, instead of a single symbol, to define the goto function. The goto, failure, and output functions of MTAC ( D ) are defined as follows.
Definition 4
(Goto function). We define the goto function δ of MTAC ( D ) by δ ( sort ( P i ) [ : j ] ) , C ) = sort ( P i ) [ : j + 1 ] iff C = sort ( P i ) [ j + 1 ] for a state sort ( P i ) [ : j ] and a multi-track symbol C .
Definition 5
(Failure function). The failure link of a state sort ( P i ) [ : j ] is defined as flink ( sort ( P i ) [ : j ] ) = sort ( P i [ l : j ] ) , where P i [ l : j ] is the longest proper suffix of P i [ : j ] such that P i [ l : j ] permuted-matches a prefix of some pattern in D.
Definition 6
(Output function).The output output ( P i [ : j ] ) of a state sort ( P i ) [ : j ] is the set of patterns that permuted-matches P i [ l : j ] for some 1 l j .
Next, we explain how to construct MTAC ( D ) . We use multi-track symbol tries to implement the goto function.
Definition 7
(Multi-track symbol trie). Let C = { C 1 , . . . , C z } be a set of multi-track symbols. The multi-track symbol trie of C is the trie for the set of strings { C 1 [ 1 ] C 1 [ 2 ] C 1 [ | C 1 | num ] , , C z [ 1 ] C z [ 2 ] C z [ | C z | num ] } .
For a state s S , we use the multi-track symbol trie of a set { C δ ( s , C ) NULL } and each leaf of multi-track symbol tries labeled by δ ( s , C ) . Since a child of a node in multi-track symbol tries can be searched in O ( log σ ) time, δ ( s , C ) can be computed in O ( k log σ ) time. The goto function δ can be constructed by using Algorithm 3.
Algorithm 3: Multi-track AC-automaton goto function construction algorithm.
Algorithms 12 00073 i003
Lemma 6.
Algorithm 3 constructs the goto function of the multi-track AC-automaton of a dictionary D = { P 1 , P 2 , , P r } in O ( d k log σ ) time.
Proof. 
The goto function δ ( s , C ) is executed O ( d ) times, and δ ( s , C ) can be computed in O ( k log σ ) time. □
After constructing the goto function, we can construct the failure function of MTAC ( D ) . Algorithm 4 shows a construction algorithm for the failure function of a multi-track AC-automaton. In order to simplify the construction algorithm, we use a special state that reads any multi-track symbol to get to the root state.
Algorithm 4: Multi-track AC-automaton failure function construction algorithm.
Algorithms 12 00073 i004
Lemma 7.
Algorithm 4 constructs the failure function of the multi-track AC-automaton of a dictionary D = { P 1 , P 2 , , P r } in O ( d k log σ ) time.
Proof. 
We can bound the running time of Algorithm 4 by counting the number of executions of δ ( s , C ) . For each pattern P i , let s i , j be a state such that s i , j = δ ( root , P i [ : j ] ) for 1 j m i . Let f i , j be the number of executions of flink when finding flink ( s i , j ) . The maximum value of f i , j is bounded by depth ( flink ( s i , j 1 ) ) + 1 . Because the depth of flink ( s i , j ) is, at most, depth ( flink ( s i , j 1 ) ) f i , j + 1 , we get f i , j depth ( flink ( s i , j 2 ) ) f i , j 1 + 2 recursively. By solving this formula, we get j = 1 m i f i , j 2 m i and i = 1 r j = 1 m i f i , j i = 1 r 2 m i = 2 d . Moreover, each δ ( s , C ) is executed in O ( k log σ ) time. □
Last, by using the goto and failure functions, Algorithm 4 constructs the output function of a multi-track AC-automaton.
Lemma 8.
Algorithm 5 constructs the output function of the multi-track AC-automaton of a dictionary D = { P 1 , P 2 , , P r } in O ( d k log σ ) time.
Proof. 
Updating the output function can be performed in O ( 1 ) time by using lists to save the output function and updating it by concatenating the list (see [13]). The proof is similar to the proofs of Lemmas 3 and 4. □
From Lemmas 6–8, we get the following theorem.
Theorem 2.
The multi-track AC-automaton of a dictionary D = { P 1 , P 2 , , P r } can be constructed in O ( d k log σ ) time.
Algorithm 5: Multi-track AC-automaton output function construction algorithm.
Algorithms 12 00073 i005
By using the multi-track AC-automaton of D, we can perform permuted dictionary matching on a text T , as shown in Algorithm 6. Let v be the current state of the multi-track AC-automaton. For each position i on T , Algorithm 6 uses the sorted index of T [ i depth ( v ) : ] to determine the permutation of T [ i ] that is used in the goto function.
Algorithm 6: Permuted dictionary matching algorithm by using multi-track AC-automaton.
Algorithms 12 00073 i006
Theorem 3.
Permuted dictionary matching on a multi-track text T can be performed in O ( n k log σ ) time by using the multi-track AC-automaton of D.
Proof. 
The running time of Algorithm 6 can be evaluated by counting the number of executions of δ ( s , C ) . First, for each i, δ ( s , C ) is executed at least once on the v transition. Next, δ ( s , C ) is executed to check whether the transition is NULL or not. In this case, the number of executions of δ ( s , C ) is the same as that of flink . The latter is, at most, n, because whenever δ ( s , C ) is executed, the depth of v is increased by one, and whenever flink is executed, the depth of v is decreased by at least one. Therefore, the number of executions of δ ( s , C ) is O ( n ) . □

3.3. Multi-Track Permuted Matching Automaton

In this subsection, we describe a data structure called a multi-track permuted matching automaton that can perform permuted pattern matching on a multi-track text T online, by preprocessing a multi-track pattern P . For a multi-track pattern P = ( P 1 , P 2 , . . . , P k ) , the multi-track permuted matching automaton of P , denoted by MTPMA ( P ) , consists of a set of states and two functions, goto and failure functions. In addition, each state of the multi-track permuted matching automaton has a weight in order to determine whether flink should be executed or not. Figure 2 shows an example of a multi-track permuted matching automaton.
Let S be the set of states of MTPMA ( P ) . Each state of MTPMA ( P ) corresponds to a prefix of P i , i.e., S = { P i [ : j ] 1 i k , 0 j m } . Each state s has a weight, which is the number of tracks containing s as a prefix. Moreover, a state s is called an accept state if s = P i for some i. The goto and failure functions of MTPMA ( P ) are defined as follows.
Definition 8
(Goto function).We define the goto function δ as δ ( P i [ : j ] , c ) = P i [ : j + 1 ] iff c = P i [ j + 1 ] for each state P i [ : j ] and symbol c.
Definition 9
(Failure function). Let S j = { P i [ : j ] 1 i k } be the set of states whose depth is j. The failure link of the state is flink ( P i [ : j ] ) = P x [ : y ] S y such that P x [ : y ] is a proper suffix of P i [ : j ] and P [ : y ] is the longest prefix of P that permuted-matches with a proper suffix of P [ : j ] . The latter condition is similar to that of the multi-track KMP algorithm.
Next, we explain how to construct MTPMA ( P ) . Algorithm 7 constructs the goto function of a multi-track permuted matching automaton. We add a weight on each state each time we visit the state when reading each track.
Lemma 9.
Algorithm 7 constructs the goto function of MTPMA ( P ) in O ( m k log σ ) time.
Proof. 
For each track, the number of executions of δ ( s , c ) is m, and there are k tracks in a pattern P . Moreover, δ ( s , c ) can be executed in O ( log σ ) time. □
After constructing the goto function, Algorithm 8 constructs the failure function of MTPMA ( P ) . For each state s, we use a multi-track border array to determine the depth of flink ( s ) .
Lemma 10.
Algorithm 8 constructs the failure function of P in O ( m k log σ ) time.
Proof. 
Similarly to the proof of Theorem 7, the failure and goto functions are executed O ( m k ) times. Moreover, the execution time of the failure function is O ( 1 ) , and that of the goto function is O ( log σ ) . □
From Lemmas 9 and 10 we get the following theorem.
Theorem 4.
The multi-track permuted matching automaton of a pattern P can be constructed in O ( m k log σ ) time.
Finally, by using MTPMA ( P ) , Algorithm 9 can perform permuted pattern matching on a multi-track text T . Algorithm 9 uses k pointers activeStates to point to the current states. Note that all activeStates always have the same depth. Algorithm 9 uses two conditions to determine whether it should execute the failure function or not. The first condition is when it cannot find the goto transition, and the second condition is when the number of state pointers in the state is more than the weight of the state. If any of the activeStates fail, then all of the activeStates execute the failure function, otherwise activeStates executes the goto function.
Theorem 5.
By using MTPMA ( P ) , Algorithm 9 performs permuted pattern matching on a multi-track string T in O ( n k log σ ) time.
Proof. 
Similarly to the proof of Theorem 3, the number of executions of the failure and goto function is O ( n k ) . Since the execution time of the failure function is O ( 1 ) and the goto function is O ( log σ ) , Algorithm 9 runs in O ( n k log σ ) time. □
Algorithm 7: Multi-track permuted matching automaton goto function construction algorithm.
Algorithms 12 00073 i007
Algorithm 8: Multi-track permuted matching automaton failure function construction algorithm.
Algorithms 12 00073 i008
Algorithm 9: Multi-track permuted matching automaton matching algorithm.
Algorithms 12 00073 i009

4. Multi-Track Boyer–Moore and Horspool Algorithms

In this section, we propose two permuted pattern matching algorithms that are based on the Boyer–Moore algorithm and the Horspool algorithm, which we call MT-BM and MT-H , respectively. The original Boyer–Moore algorithm uses two functions GoodSuf (good suffixes) and BadSym (bad symbols), and the Horspool algorithm only uses BadSym to determine how much the position of a substring to compare should be shifted when a mismatch is found between the input patten and the substring of the text. For multi-track strings, we use a permuted match to define GoodSuf instead of a complete match, and we use sorted multi-track symbols for BadSym instead of symbols. Formally, those functions are defined as follows on multi-tracks.
Definition 10
(Good suffixes function). For a multi-track P of length | P | len = m , let A [ i ] = { 0 < s < i P [ i s + 1 : m s ] P [ i + 1 : m ] , P [ i s : m s ] Algorithms 12 00073 i015 P [ i : m ] } and B [ i ] = { i s < m P [ 1 : m s ] P [ s + 1 : m ] } . The good suffixes function is defined as GoodSuf P [ m ] = 1 and GoodSuf P [ i ] = min A [ i ] B [ i ] { m } for 0 i < m .
Definition 11
(Bad symbols function). For a multi-track P of length | P | len = m and a multi-track symbol C , BadSym P ( C ) is the first occurrence position of sort ( C ) in P R [ 2 : ] . The function BadSym P ( C ) returns m if there is no occurrence of sort ( C ) in P R [ 2 : ] .
In the implementation, GoodSuf is computed by using an array, while BadSym can be computed by using a trie of the multi-track symbols. We perform permuted-match instead of exact matching when computing GoodSuf . We also use another array suf to compute GoodSuf .
Definition 12
(Suffixes). For a multi-track P of length | P | len = m , suf P [ i ] is the maximum value of l such that P [ i l + 1 : i ] P [ m l + 1 : m ] for 1 i m .
Algorithm 10 shows how to construct GoodSuf and BadSym . The array GoodSuf is computed by ComputeGoodSuf , which uses array suf computed by ComputeSuf . Note that we compute RI ( P [ : i ] ) at the beginning (Lines 2 and 26) of the algorithm and will not recompute them when we use the values later.
Algorithm 10: MT-BM and MT-H preprocessing functions.
Algorithms 12 00073 i010
Lemma 11.
The function ComputeSuf computes the array suf in O ( m ( k + σ ) ) time.
Proof. 
First, RI can be computed in O ( m ( k + σ ) ) time by using radix sort. The for loop is executed m 1 times, and the while loop in Line 11 is executed m times at most through the whole run, because k is always reduced in each loop. A comparison of two multi-track symbols of the pattern that was executed in each loop can be computed in O ( k ) time. □
Lemma 12.
The function ComputeGoodSuf computes GoodSuf in O ( m ) time.
Proof. 
All the for loops are executed m times at most. The while loop is executed m times at most through the whole execution of the algorithm, since j is always increased and does not exceed m. □
Lemma 13.
The function ComputeBadSym computes BadSym in O ( m ( k log σ + σ ) ) time.
Proof. 
RI can be computed in O ( m ( k + σ ) ) time by using radix sort. Each edge in the trie of BadSym can be accessed in O ( log σ ) time by using a binary search. Since the depth of the trie is, at most, k, each BadSym ( P [ i ] ) for 1 i m can be added and accessed in O ( k log σ ) time. □
By using both GoodSuf P and BadSym P , MT-BM outputs the positions of the text that are permuted-matched with the pattern. The matching algorithm of MT-BM is shown in Algorithm 11.
Algorithm 11: Multi-track Boyer–Moore algorithm.
Algorithms 12 00073 i011
Theorem 6.
Given a multi-track text T and a pattern P , MT-BM outputs all occurrence positions of P in T in O ( n k ( m + log σ ) + n ( k + σ ) ) time with O ( m ( k log σ + σ ) ) preprocessing time.
Proof. 
From Lemmas 11–13, Algorithm 11 needs O ( m ( k log σ + σ ) ) time for preprocessing. Next, RI can be computed in O ( n ( k + σ ) ) time by using radix sort. In the outer while loop starting at Line 7, the value of j is increased by at least one, so the loop is executed n m + 2 times at most. In each execution of the outer loop, the inner while loop is executed m times at most, where multi-track symbols of the pattern and the text can be compared in O ( k ) time. BadSym can be accessed in O ( k log σ ) time, and GoodSuf can be executed in O ( 1 ) time. □
Similarly to the original Horspool algorithm, the multi-track Horspool algorithm (MT-H) uses BadSym to shift the pattern.
Theorem 7.
Given a multi-track text T and a pattern P , MT-H outputs all occurrence positions of P in T in O ( n k ( m + log σ ) + n ( k + σ ) ) time with O ( m ( k log σ + σ ) ) preprocessing time.
Proof. 
Similar to the proof of Theorem 6. □

Boyer–Moore and Horspool Matching Algorithms with Track Trie

The two algorithms presented in the previous subsections determine if two multi-tracks permuted-match by sorting them. In this subsection, we present another idea for this task using a data structure called a track trie. The track trie TrackTrie ( P ) of a multi-track P stores all the reversed strings of the tracks of P , that is { P 1 R , P 2 R , , P k R } . Figure 3a shows the track trie of a multi-track pattern P = ( bbaba , abbba , aaabb ) .
Algorithm 12 is the construction algorithm of TrackTrie ( P ) . For a node s of TrackTrie ( P ) and a symbol c Σ , the goto function δ ( s , c ) returns the child of s that has an edge labeled c. We naturally extend it to the domain Σ * by δ ( s , ε ) = s and δ ( s , a w ) = δ ( δ ( s , a ) , w ) for any a Σ and w Σ * . We also associate a weight with each node to find mismatch on a text, as we explain later.
Algorithm 12: Track trie construction algorithm.
Algorithms 12 00073 i012
Theorem 8.
Algorithm 12 constructs TrackTrie ( P ) in O ( m k log σ ) time.
Proof. 
The function goto can be calculated in O ( log σ ) time by binary search. On each execution of the inner for loop (Line 6), Algorithm 12 executes goto to check the child nodes of activeNode . If there is no node with an edge labeled P [ j ] [ i ] , then a new node is constructed, which can be done in O ( 1 ) time. On the other hand, if there is a node with an edge labeled P [ j ] [ i ] , Algorithm 12 accesses the child node and then increases its weight by one. The total number of iterations of the inner loop is m k . □
For a given multi-track text T and a position i, Algorithm 13 finds a mismatch position in two cases: (1) when a track cannot find its goto destination, and (2) when the number of tracks that have the same string w is more than the weight of the node that represents the string δ ( r o o t , w ) . Those mismatch conditions are illustrated in Figure 3b,c, respectively. Figure 3b shows that the track trie cannot find a transition for the second symbol b of the third track. On the other hand, Figure 3c shows that T 2 [ 3 : ] has two “ bba ” on its track; however, the P [ 3 : ] has only one “ bba ” on its track, i.e., the node that represents “ bba ” has one on its weight.
Algorithm 13: Track trie matching algorithm.
Algorithms 12 00073 i013
Theorem 9.
Given a multi-track text T and a position j, Algorithm 13 finds a mismatch position in the pattern in O ( m k log σ ) time.
Proof. 
For each position i + j on the text, Algorithm 13 executes goto to check whether activeNodes [ l ] has a child node with an edge labeled T [ i + j ] [ l ] for 1 l k . If there is no child node with an edge labeled T [ i + j ] [ l ] , then Algorithm 13 considers it a mismatch and returns the mismatch position. On the other hand, if there is such a child node, Algorithm 13 updates activeNodes [ l ] to the child node and then checks whether the number of tracks of T [ i + j : ] that contain T [ i + j : i + m ] [ l ] as a prefix is more than the weight of the child node. If the number of tracks exceeds the weight, then Algorithm 13 treats it as mismatch and returns the mismatch position. The total number of iterations of the inner loop is, at most, m k . □
Although the worst case time complexity remains the same, by using track trie, both MT-BM and MT-H can match the pattern to the text faster, because we do not need to compute the reverse-sorted index of the text. First, we construct the track trie of the pattern by using constructTrackTrie ( P ) . Then, we replace Line 10 of Algorithm 11 by matchTrackTrie ( T , j ) to find a mismatch position.

5. Filtering Algorithm on a Multi-Track String

In this section, we propose filtering algorithms for permuted pattern matching. The filtering algorithm uses a function to transform a multi-track string into another sequence. Then, the algorithm implements a pattern matching algorithm on the transformed pattern and text to get candidate positions where the pattern may permuted-match. Finally, every candidate position is checked for whether the pattern is permuted-matched in each position or not.
The filtering algorithm uses a function ϕ that inputs a multi-track W and outputs a sequence of length | W | len . The function ϕ must have a false-positive property, that is, for two multi-tracks X and Y , if X Y , then ϕ ( X ) = ϕ ( Y ) . We can use any hash function such as the Karp–Rabin fingerprint [14] and locality-sensitive hashing [15]. We describe the filtering algorithm by using a simple function β that is defined as follows.
Definition 13
(Bucket function β). For W = ( W 1 , W 2 , , W k ) , β ( W ) = ( B 1 , B 2 , , B σ ) such that B j [ i ] is the number of the j th symbol in Z [ i ] .
For example, for a multi-track W = ( abab , bbac , aabb , cabb , abba ) , β ( W ) is as follows,
a b a b b b a c a a b b c a b b a b b a , β ( Z ) = 3 2 2 1 1 3 3 3 1 0 0 1 .
β ( W ) can be computed in O ( n ( k + σ ) ) time by counting all symbols in W .
Algorithm 14 shows an example of filtering algorithm for permuted pattern matching. This algorithm uses β as a hash function, and the KMP algorithm as a patten matching algorithm. First, the algorithm computes P = β ( P ) and T = β ( T ) and then constructs the border array Border P using the same procedure as the KMP algorithm. Next, the algorithm performs pattern matching on P and T . When a mismatch occurs, the pattern is shifted by using Border P . If P matches T [ i : j ] , then the algorithm checks whether the pattern P permuted-matches T [ i : j ] or not, and it outputs the position if true.
Algorithm 14: Filtering algorithm for permuted pattern matching.
Algorithms 12 00073 i014
Theorem 10.
Given a multi-track text T and a pattern P , Algorithm 14 computes all occurrence positions of P in T in O ( ( m + n ) ( σ + k ) + c m k ) time, where c is the number of candidates.
Proof. 
As described above, P = β ( P ) and T = β ( T ) can be computed in O ( m ( σ + k ) ) and O ( n ( σ + k ) ) , respectively. The border array Border P can be constructed in O ( m σ ) time, and pattern matching can be performed in O ( n σ ) time, since the comparison P [ i ] = P [ i ] needs O ( σ ) time. In addition, the algorithm takes O ( c m k ) time to check the candidates. □

6. Experiments

We performed some sets of experiments to evaluate the practical performance of the proposed algorithms. We measured the running time of our algorithms and the AC-automaton-based algorithm [7]. All experiments were performed on a computer with an Intel Xeon CPU E5-2609 8 core, 2.40 GHz, 256 GB memory, and a Debian Wheezy operating system. We set the parameter values as n = 100,000, m = 10 , k = 1000, and σ = 2 , and we changed one of the parameters in each experiment to see the relation between the parameters and the running time of the algorithms. We used randomly-generated texts and patterns. The source code of our algorithm is available at https://github.com/ushitora/permutedpatternmatching.
The first set of experiments compared the running times of KMP-based algorithms, namely the multi-track KMP algorithm, the multi-track AC-automaton algorithm, and the multi-track permuted matching automaton algorithm. The purpose of this experiment was to see the changes in the running time of the multi-track KMP algorithm when extended to the multi-track AC-automaton algorithm and the multi-track permuted matching automaton algorithm. The results of the experiment are shown in Figure 4. From the results, we can see that the multi-track AC-automaton algorithm was slower than the multi-track KMP algorithm. The reason for this is that the multi-track AC-automaton algorithm uses state transitions instead of comparing multi-track symbols directly. This time difference is not significant considering that the multi-track AC-automaton algorithm focuses on dictionary matching instead of single pattern matching. Next, we can see that the multi-track permuted matching automaton algorithm is faster than the multi-track KMP algorithm. The multi-track permuted matching automaton algorithm can perform matching faster because the multi-track permuted matching automaton algorithm does not need to sort the text.
The next experiment compared the running time of the multi-track Boyer–Moore algorithm and the multi-track Horspool algorithm. We also checked the effectiveness of track tries in reducing the matching times of both algorithms. Figure 5 shows the result of the experiment. While the multi-track Horspool was slightly faster than the multi-track Boyer–Moore algorithm, there was no significant difference between these algorithms. We can also see that track tries significantly reduced the matching times of both algorithms, since we do not need to sort the text when using track tries.
The third experiment set compared the running times of the filtering algorithms. We used three algorithms, the KMP algorithm, the Boyer–Moore algorithm, and the Horspool algorithm, as filtering algorithms. Figure 6 shows the running times of the filtering algorithms. There was no significant difference in running time between the filtering algorithms. We can see that the filtering algorithms were fast apart from when the pattern length was m = 1 . The algorithms became very slow, because there were many occurrence position candidates when m = 1 . The occurrence position candidates will be fewer if the pattern length is longer.
Lastly, we compared the proposed algorithms with the AC-automaton-based algorithm [7]. Figure 7 shows the running time of the AC-automaton-based algorithm [7] and the proposed algorithms. We chose the fastest algorithms among the groups. We can see that the proposed algorithms were faster than the AC-automaton-based algorithm overall. The multi-track Boyer–Moore algorithm with track trie was the fastest among the algorithms.

7. Conclusions

We proposed some algorithms for permuted pattern matching on multi-track strings. We primarily focused on the algorithms that preprocess the pattern before performing permuted pattern matching, instead of constructing indexing structures from the text. We showed that our proposed algorithms are faster than the AC-automaton-based algorithm. Moreover, we proposed an algorithm for dictionary matching on multi-track strings, which, to the best of our knowledge, is the first algorithm for this problem. Because of its considerable difficulty owing to a larger number of permutations, only a few algorithms for sub-permuted pattern matching have been proposed. Therefore, developing an algorithm for the sub-permuted pattern matching problem must be addressed in future work.

Author Contributions

Conceptualization, D.H., K.N., R.Y. and A.S.; methodology, D.H., Y.U., R.Y. and A.S.; software, D.H.; validation, D.H., R.Y. and A.S.; formal analysis, D.H., K.N., R.Y. and A.S.; investigation, D.H.; data curation, D.H.; writing—original draft preparation, D.H.; writing—review and editing, D.H., R.Y. and A.S.; visualization, D.H.; funding acquisition, D.H. and A.S.

Funding

This work was supported by JSPS KAKENHI Grant Numbers JP15H05706, JP24106010, JP25240003, and JP19K20208. This work was also supported by ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). Diptarama Hendrian was supported by a research grant from Tohoku University Division for International Advanced Research and Education.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Knuth, D.E.; Morris, J.H., Jr.; Pratt, V.R. Fast Pattern Matching in Strings. SIAM J. Comput. 1977, 6, 323–350. [Google Scholar] [CrossRef]
  2. Boyer, R.S.; Moore, J.S. A fast string searching algorithm. Commun. ACM 1977, 20, 762–772. [Google Scholar] [CrossRef]
  3. Horspool, R.N. Practical fast searching in strings. Softw. Pract. Exp. 1980, 10, 501–506. [Google Scholar] [CrossRef]
  4. Weiner, P. Linear pattern matching algorithms. In Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), Iowa City, IA, USA, 15–17 October 1973; pp. 1–11. [Google Scholar] [CrossRef]
  5. Manber, U.; Myers, G. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 1993, 22, 935–948. [Google Scholar] [CrossRef]
  6. Ehrenfeucht, A.; McConnell, R.M.; Osheim, N.; Woo, S.W. Position heaps: A simple and dynamic text indexing data structure. J. Discret. Algorithms 2011, 9, 100–121. [Google Scholar] [CrossRef]
  7. Katsura, T.; Narisawa, K.; Shinohara, A.; Bannai, H.; Inenaga, S. Permuted Pattern Matching on Multi-track Strings. In Proceedings of the SOFSEM 2013: 39th International Conference on Current Trends in Theory and Practice of Computer Science, Špindlerův Mlýn, Czech Republic, 26–31 January 2013; pp. 280–291. [Google Scholar]
  8. Katsura, T.; Otomo, Y.; Narisawa, K.; Shinohara, A. Position Heaps for Permuted Pattern Matching on Multi-Track Strings. In Proceedings of the Student Research Forum Papers and Posters at SOFSEM 2015, Pec pod Snezkou, Czech Republic, 24–29 January 2015; pp. 41–53. [Google Scholar]
  9. Narisawa, K.; Katsura, T.; Ota, H.; Shinohara, A. Filtering Multi-set Tree: Data Structure for Flexible Matching Using Multi-track Data. Interdiscip. Inf. Sci. 2015, 21, 37–47. [Google Scholar] [CrossRef]
  10. Diptarama, Y.U.; Narisawa, K.; Shinohara, A. KMP based pattern matching algorithms for multi-track strings. In Proceedings of the Student Research Forum Papers and Posters at SOFSEM 2016, Harrachov, Czech Republic, 23–28 January 2016; Volume 2016, pp. 100–107. [Google Scholar]
  11. Diptarama, Y.U.; Yoshinaka, R.; Shinohara, A. Fast Full Permuted Pattern Matching Algorithms on Multi-track Strings. In Proceedings of the Prague Stringology Conference 2016, Prague, Czech Republic, 29–31 August 2016; pp. 7–21. [Google Scholar]
  12. Matsuoka, Y.; Aoki, T.; Inenaga, S.; Bannai, H.; Takeda, M. Generalized pattern matching and periodicity under substring consistent equivalence relations. Theor. Comput. Sci. 2016, 656, 225–233. [Google Scholar] [CrossRef]
  13. Aho, A.V.; Corasick, M.J. Efficient string matching: An aid to bibliographic search. Commun. ACM 1975, 18, 333–340. [Google Scholar] [CrossRef]
  14. Karp, R.M.; Rabin, M.O. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 1987, 31, 249–260. [Google Scholar] [CrossRef]
  15. Indyk, P.; Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing—STOC ’98, Dallas, TX, USA, 24–26 May 1998; ACM Press: New York, NY, USA, 1998; pp. 604–613. [Google Scholar] [CrossRef]
Figure 1. The multi-track AC-automaton MTAC ( D ) for D = { P 1 , P 2 , P 3 } , where P 1 = aaabb , abaab , bbaaa , P 2 = abab , abba , bbab , and P 3 = aabbab , bababb , baaaab . The asterisk “*” is a special symbol that matches any symbol in Σ .
Figure 1. The multi-track AC-automaton MTAC ( D ) for D = { P 1 , P 2 , P 3 } , where P 1 = aaabb , abaab , bbaaa , P 2 = abab , abba , bbab , and P 3 = aabbab , bababb , baaaab . The asterisk “*” is a special symbol that matches any symbol in Σ .
Algorithms 12 00073 g001
Figure 2. The multi-track permuted matching automaton MTPMA ( P ) for P = aaabb , abaab , bbaaa . The asterisk is a special symbol that matches with any symbols in Σ .
Figure 2. The multi-track permuted matching automaton MTPMA ( P ) for P = aaabb , abaab , bbaaa . The asterisk is a special symbol that matches with any symbols in Σ .
Algorithms 12 00073 g002
Figure 3. (a) Track trie of P = ( bbaba , abbba , aaabb ) ; (b) example of mismatch when the track trie cannot find the transition; (c) example of mismatch when the number of tracks is more than the weight of the node.
Figure 3. (a) Track trie of P = ( bbaba , abbba , aaabb ) ; (b) example of mismatch when the track trie cannot find the transition; (c) example of mismatch when the number of tracks is more than the weight of the node.
Algorithms 12 00073 g003
Figure 4. Running time of the multi-track KMP (MTKMP), multi-track AC-automaton (MTACA), multi-track permuted matching automaton (MTPMA) algorithms on permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Figure 4. Running time of the multi-track KMP (MTKMP), multi-track AC-automaton (MTACA), multi-track permuted matching automaton (MTPMA) algorithms on permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Algorithms 12 00073 g004
Figure 5. Running time of the multi-track Buyer–Moore (MTBM) and multi-track Horspool (MTH) algorithms on permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Figure 5. Running time of the multi-track Buyer–Moore (MTBM) and multi-track Horspool (MTH) algorithms on permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Algorithms 12 00073 g005
Figure 6. Running time of the filtering algorithms (Knuth–Morris–Pratt (Filter-KMP), Boyer–Moore (Filter-BM), and Horspool (Filter-H)) for permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Figure 6. Running time of the filtering algorithms (Knuth–Morris–Pratt (Filter-KMP), Boyer–Moore (Filter-BM), and Horspool (Filter-H)) for permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Algorithms 12 00073 g006aAlgorithms 12 00073 g006b
Figure 7. Comparison of the running time of the AC-automaton-based algorithm with the proposed algorithms for permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Figure 7. Comparison of the running time of the AC-automaton-based algorithm with the proposed algorithms for permuted pattern matching with respect to (a) the text length, (b) the track count, (c) the pattern length, and (d) the alphabet size.
Algorithms 12 00073 g007
Table 1. Comparison of the algorithms for permuted pattern matching. Multi-track AC-automaton can find occurrences of multiple patterns.
Table 1. Comparison of the algorithms for permuted pattern matching. Multi-track AC-automaton can find occurrences of multiple patterns.
AlgorithmPreprocessing TimeMatching Time
AC-automaton-based [7] O ( m k log σ ) O ( n k log σ )
Multi-track KMP O ( m k ) O ( n k )
MTAC-automaton O ( d k log σ ) O ( n k log σ )
MT permuted matching automaton O ( m k log σ ) O ( n k log σ )
MT Boyer–Moore O ( m ( k log σ + σ ) ) O ( n k ( m + log σ ) + n ( k + σ ) )
MT Horspool O ( m ( k log σ + σ ) ) O ( n k ( m + log σ ) + n ( k + σ ) )

Share and Cite

MDPI and ACS Style

Hendrian, D.; Ueki, Y.; Narisawa, K.; Yoshinaka, R.; Shinohara, A. Permuted Pattern Matching Algorithms on Multi-Track Strings. Algorithms 2019, 12, 73. https://doi.org/10.3390/a12040073

AMA Style

Hendrian D, Ueki Y, Narisawa K, Yoshinaka R, Shinohara A. Permuted Pattern Matching Algorithms on Multi-Track Strings. Algorithms. 2019; 12(4):73. https://doi.org/10.3390/a12040073

Chicago/Turabian Style

Hendrian, Diptarama, Yohei Ueki, Kazuyuki Narisawa, Ryo Yoshinaka, and Ayumi Shinohara. 2019. "Permuted Pattern Matching Algorithms on Multi-Track Strings" Algorithms 12, no. 4: 73. https://doi.org/10.3390/a12040073

APA Style

Hendrian, D., Ueki, Y., Narisawa, K., Yoshinaka, R., & Shinohara, A. (2019). Permuted Pattern Matching Algorithms on Multi-Track Strings. Algorithms, 12(4), 73. https://doi.org/10.3390/a12040073

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop