Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2903150.2903172acmconferencesArticle/Chapter ViewAbstractPublication PagescfConference Proceedingsconference-collections
research-article
Public Access

Sequential pattern mining with the Micron automata processor

Published: 16 May 2016 Publication History

Abstract

Sequential pattern mining (SPM) is a widely used data mining technique for discovering common sequences of events in large databases. When compared with the simple set mining problem and string mining problem, the hierarchical structure of sequential pattern mining (due to the need to consider frequent subsets within each itemset, as well as order among itemsets) and the resulting large permutation space makes SPM extremely expensive on conventional processor architectures. We propose a hardware-accelerated solution of the SPM using Micron's Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). The Generalized Sequential Pattern (GSP) algorithm for SPM searching exposes massive parallelism, and is therefore well-suited for AP acceleration. We implement the multi-pass pruning strategy of the GSP via the AP's fast reconfigurability. A generalized automaton structure is proposed by flattening sequential patterns to simple strings to reduce compilation time and to minimize overhead of reconfiguration. Up to 90X and 29X speedups are achieved by the AP-accelerated GSP on six real-world datasets, when compared with the optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on multicore CPU by up to 452X and 49X speedups. The AP advantage grows further with larger datasets.

References

[1]
Micron Automata Processor website, 2015. http://www.micronautomata.com/documentation.
[2]
C. C. Aggarwal and J. Han, editors. Frequent Pattern Mining. Springer International Publishing, Cham, 2014.
[3]
R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. ICDE'95, pages 3--14. IEEE, 1995.
[4]
S. Cong, J. Han, J. Hoeflinger, and D. Padua. A sampling-based framework for parallel data mining. In Proc. PPoPP '05. ACM, 2005.
[5]
P. Dlugosch et al. An efficient and scalable semiconductor architecture for parallel automata processing. IEEE TPDS, 25(12):3088--3098, 2014.
[6]
W. Fang et al. Frequent itemset mining on graphics processors. In Proc. DaMoN '09, 2009.
[7]
P. Fournier-Viger et al. Spmf: A java open-source pattern mining library. Journal of Machine Learning Research, 15:3569--3573, 2014.
[8]
V. Guralnik and G. Karypis. Parallel tree-projection-based sequence mining algorithms. Parallel Comput., 30(4):443--472, Apr. 2004.
[9]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD '00. ACM, 2000.
[10]
K. Hryniów. Parallel pattern mining-application of gsp algorithm for graphics processing units. In ICCC '12, pages 233--236. IEEE, 2012.
[11]
H. Noyes. Micron automata processor architecture: Reconfigurable and massively parallel automata processing. In Proc. of Fifth International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2014. Keynote presentation.
[12]
J. Pei et al. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans. on Knowl. and Data Eng., 16(11):1424--1440, 2004.
[13]
I. Roy and S. Aluru. Discovering motifs in biological sequences using the micron automata processor. IEEE/ACM T COMPUT BI, 13(1):99--111, 2016.
[14]
T. Shintani and M. Kitsuregawa. Mining algorithms for sequential patterns in parallel: Hash based approach. In Proceedings of the Second Pacific--Asia Conference on Knowledge Discovery and Data mining, pages 283--294, 1998.
[15]
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. EDBT '96, 1996.
[16]
K. Wang, Y. Qi, J. Fox, M. Stan, and K. Skadron. Association rule mining with the micron automata processor. In Proc. IPDPS '15, 2015.
[17]
M. J. Zaki. Scalable algorithms for association mining. IEEE Trans. on Knowl. and Data Eng., 12(3):372--390, 2000.
[18]
M. J. Zaki. Parallel sequence mining on shared-memory machines. J. Parallel Distrib. Comput., 61(3):401--426, 2001.
[19]
M. J. Zaki. Spade: An efficient algorithm for mining frequent sequences. Mach. Learn., 42(1-2):31--60, 2001.
[20]
F. Zhang, Y. Zhang, and J. D. Bakos. Accelerating frequent itemset mining on graphics processing units. J. Supercomput., 66(1):94--117, 2013.
[21]
Y. Zu et al. GPU-based NFA implementation for memory efficient high speed regular expression matching. In Proc. PPoPP '12, pages 129--140. ACM, 2012.

Cited By

View all
  • (2024)Secco: Codesign for Resource Sharing in Regular-Expression Accelerators2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)10.1109/ASP-DAC58780.2024.10473854(219-224)Online publication date: 22-Jan-2024
  • (2023)Asynchronous Automata Processing on GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794537:1(1-27)Online publication date: 2-Mar-2023
  • (2022)DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata ProcessorACM Transactions on Architecture and Code Optimization10.1145/355697619:4(1-26)Online publication date: 7-Oct-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CF '16: Proceedings of the ACM International Conference on Computing Frontiers
May 2016
487 pages
ISBN:9781450341288
DOI:10.1145/2903150
  • General Chairs:
  • Gianluca Palermo,
  • John Feo,
  • Program Chairs:
  • Antonino Tumeo,
  • Hubertus Franke
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 May 2016

Permissions

Request permissions for this article.

Check for updates

Badges

  • Best Paper

Author Tags

  1. apriori
  2. automata processor
  3. sequential pattern mining

Qualifiers

  • Research-article

Funding Sources

Conference

CF'16
Sponsor:
CF'16: Computing Frontiers Conference
May 16 - 19, 2016
Como, Italy

Acceptance Rates

CF '16 Paper Acceptance Rate 30 of 94 submissions, 32%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)77
  • Downloads (Last 6 weeks)21
Reflects downloads up to 24 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Secco: Codesign for Resource Sharing in Regular-Expression Accelerators2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)10.1109/ASP-DAC58780.2024.10473854(219-224)Online publication date: 22-Jan-2024
  • (2023)Asynchronous Automata Processing on GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794537:1(1-27)Online publication date: 2-Mar-2023
  • (2022)DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata ProcessorACM Transactions on Architecture and Code Optimization10.1145/355697619:4(1-26)Online publication date: 7-Oct-2022
  • (2022)CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00011(25-37)Online publication date: Apr-2022
  • (2021)Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching AccelerationMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480934(311-323)Online publication date: 18-Oct-2021
  • (2020)Accelerating Legacy String Kernels via Bounded Automata LearningProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378503(235-249)Online publication date: 9-Mar-2020
  • (2020)Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA47549.2020.00017(86-98)Online publication date: Feb-2020
  • (2020)Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM48280.2020.00027(138-147)Online publication date: May-2020
  • (2019)Innovations in the Memory SystemSynthesis Lectures on Computer Architecture10.2200/S00933ED1V01Y201906CAC04814:2(1-151)Online publication date: 10-Sep-2019
  • (2019)Automata Processing in Reconfigurable ArchitecturesACM Transactions on Reconfigurable Technology and Systems10.1145/331457612:2(1-25)Online publication date: 17-May-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media