Abstract
The suffix array is a fundamental data structure to support string analysis efficiently. It took about 26 years for the sequential suffix array construction algorithm to achieve \(\mathcal {O}(n)\) time complexity and in-place sorting. In this paper, we develop the D-Limited Parallel Induce (DLPI) algorithm, the first \(\mathcal {O}(\frac{n}{p})\) time parallel suffix array construction algorithm. The basic idea of DLPI includes two aspects: dividing the \(\mathcal {O}(n)\) size problem into p reduced sub-problems with size \(\mathcal {O}(\frac{n}{p})\) so we can handle them on p processors in parallel; developing an efficient parallel induce sorting method to achieve correct order for all the reduced sub-problems. The complete algorithm description is given to show the implementation method of the proposed idea. The time and space complexity analysis and proof are also given to show the correctness and efficiency of the proposed algorithm. The proposed DLPI algorithm can handle large strings with scalable performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 55–69. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_5
Deo, M., Keely, S.: Parallel suffix array and least common prefix for the GPU. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 197–206 (2013). https://doi.org/10.1145/2442516.2442536
Flick, P., Aluru, S.: Parallel distributed memory construction of suffix and longest common prefix arrays. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–10 (2015)
Futamura, N., Aluru, S., Kurtz, S.: Parallel suffix sorting. Electrical Engineering and Computer Science - All Scholarship 64 (2001). https://surface.syr.edu/eecs/64
Helman, D.R., JáJá, J., Bader, D.A.: A new deterministic parallel sorting algorithm with an experimental evaluation. J. Exp. Algorithmics (JEA) 3, 4-es (1998). https://doi.org/10.1145/297096.297128
Homann, R., Fleer, D., Giegerich, R., Rehmsmeier, M.: mkESA: enhanced suffix array construction tool. Bioinformatics 25(8), 1084–1085 (2009). https://doi.org/10.1093/bioinformatics/btp112
JáJá, J.: An Introduction to Parallel Algorithms, vol. 10, p. 133889. Addison-Wesley, Reading (1992)
Kärkkäinen, J.: Fast BWT in small space by blockwise suffix sorting. Theoret. Comput. Sci. 387(3), 249–257 (2007). https://doi.org/10.1016/j.tcs.2007.07.018
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45061-0_73
Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM (JACM) 53(6), 918–936 (2006). https://doi.org/10.1145/1217856.1217858
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 186–199. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_14
Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. J. Discret. Algorithms 3(2–4), 143–156 (2005). https://doi.org/10.1016/j.jda.2004.08.002
Kulla, F., Sanders, P.: Scalable parallel suffix array construction. Parallel Comput. 33(9), 605–612 (2007)
Lao, B., Nong, G., Chan, W.H., Pan, Y.: Fast induced sorting suffixes on a multicore machine. J. Supercomput. 74(7), 3468–3485 (2018). https://doi.org/10.1007/s11227-018-2395-5
Lao, B., Nong, G., Chan, W.H., Xie, J.Y.: Fast in-place suffix sorting on a multicore computer. IEEE Trans. Comput. 67(12), 1737–1749 (2018). https://doi.org/10.1109/TC.2018.2842050
Larsson, N.J., Sadakane, K.: Faster suffix sorting. Theoret. Comput. Sci. 387(3), 258–272 (2007). https://doi.org/10.1016/j.tcs.2007.07.017
Li, Z., Li, J., Huo, H.: Optimal in-place suffix sorting. Inf. Comput. 285, 104818 (2022)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058
Maniscalco, M.A., Puglisi, S.J.: Faster lightweight suffix array construction. In: Proceedings of International Workshop on Combinatorial Algorithms (IWOCA), pp. 16–29 (2006)
Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40(1), 33–50 (2004). https://doi.org/10.1007/s00453-004-1094-1
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM (JACM) 23(2), 262–272 (1976). https://doi.org/10.1145/321941.321946
Nong, G.: Practical linear-time O(1)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 1–15 (2013). https://doi.org/10.1145/2493175.2493180
Osipov, V.: Parallel suffix array construction for shared memory architectures. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 379–384. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34109-0_40
Puglisi, S.J., Smyth, W.F., Turpin, A.H.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. (CSUR) 39(2), 4-es (2007). https://doi.org/10.1145/1242471.1242472
Shun, J.: Fast parallel computation of longest common prefixes. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 387–398. IEEE (2014). https://doi.org/10.1109/SC.2014.37
Shun, J., et al.: Brief announcement: the problem based benchmark suite. In: Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 68–70 (2012). https://doi.org/10.1145/2312005.2312018
Wang, L., Baxter, S., Owens, J.D.: Fast parallel skew and prefix-doubling suffix array construction on the GPU. Concurr. Comput. Pract. Exp. 28(12), 3466–3484 (2016). https://doi.org/10.1002/cpe.3867
Acknowledgement
This research was funded in part by NSF grant number CCF-2109988.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Du, Z., Zhang, S., Bader, D.A. (2023). Parallel Suffix Sorting for Large String Analytics. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_6
Download citation
DOI: https://doi.org/10.1007/978-3-031-30442-2_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30441-5
Online ISBN: 978-3-031-30442-2
eBook Packages: Computer ScienceComputer Science (R0)