Computer Science > Data Structures and Algorithms

arXiv:2301.09477 (cs)

[Submitted on 23 Jan 2023]

Title:Sliding Window String Indexing in Streams

Authors:Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, Tord Joakim Stordalen

View PDF

Abstract:Given a string $S$ over an alphabet $\Sigma$, the 'string indexing problem' is to preprocess $S$ to subsequently support efficient pattern matching queries, i.e., given a pattern string $P$ report all the occurrences of $P$ in $S$. In this paper we study the 'streaming sliding window string indexing problem'. Here the string $S$ arrives as a stream, one character at a time, and the goal is to maintain an index of the last $w$ characters, called the 'window', for a specified parameter $w$. At any point in time a pattern matching query for a pattern $P$ may arrive, also streamed one character at a time, and all occurrences of $P$ within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching.
Our main result is a simple $O(w)$ space data structure that uses $O(\log w)$ time with high probability to process each character from both the input string $S$ and the pattern string $P$. Reporting each occurrence from $P$ uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream. We also consider a delayed variant of the problem, where a query may be answered at any point within the next $\delta$ characters that arrive from either stream. We present an $O(w + \delta)$ space data structure for this problem that improves the above time bounds to $O(\log(w/\delta))$. In particular, for a delay of $\delta = \epsilon w$ we obtain an $O(w)$ space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.

Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2301.09477 [cs.DS]
	(or arXiv:2301.09477v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2301.09477

Submission history

From: Tord Stordalen [view email]
[v1] Mon, 23 Jan 2023 15:21:12 UTC (177 KB)

Computer Science > Data Structures and Algorithms

Title:Sliding Window String Indexing in Streams

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Data Structures and Algorithms

Title:Sliding Window String Indexing in Streams

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators