research-article

Frequent subtree mining on the automata processor: challenges and opportunities

Authors:

Elaheh Sadredini,

Kevin SkadronAuthors Info & Claims

ICS '17: Proceedings of the International Conference on Supercomputing

Article No.: 4, Pages 1 - 11

https://doi.org/10.1145/3079079.3079084

Published: 14 June 2017 Publication History

Abstract

Frequency counting of complex patterns such as subtrees is more challenging than for simple itemsets and sequences, as the number of possible candidate patterns in a tree is much higher than one-dimensional data structures, with dramatically higher processing times. In this paper, we propose a new and scalable solution for frequent subtree mining (FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture. We present a multi-stage pruning framework on the AP, called AP-FTM, to reduce the search space of FTM candidates. This achieves up to 353X speedup at the cost of a small reduction in accuracy, on four real-world and synthetic datasets, when compared with PatternMatcher, a practical and exact CPU solution. To provide a fully accurate and still scalable solution, we propose a hybrid method to combine AP-FTM with a CPU exact-matching approach, and achieve up to 262X speedup over PatternMatcher on a challenging database. We also develop a GPU algorithm for FTM, but show that the AP also outperforms this. The results on a synthetic database show the AP advantage grows further with larger datasets.

References

[1]

A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. 2011. Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media. Association for Computational Linguistics.

Digital Library

[2]

C. Bo, K. Wang, J. Fox, and K. Skadron. 2016. Entity Resolution Acceleration using the Automata Processor. In Proceedings of the IEEE International Conference on Big Data. IEEE.

[3]

Y. Chi and J. Kok. 2001. Frequent subtree mining-an overview. Fundamenta Informaticae 21.

Digital Library

[4]

P. Dlugosch, D. Brown, P. Glendenning, M. Leventhal, and H. Noyes. 2014. An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Transactions on Parallel and Distributed Systems (TPDS) 25, 12.

[5]

R. Iváncsy and I. Vajk. 2007. Automata Theory Approach for Solving Frequent Pattern Discovery Problems. International Journal of Computer, Electrical, Automation, Control and Information Engineering, World Academy of Science, Engineering and Technology 1, 8.

[6]

H. Tan, F. Hadzic, T. S Dillon, E. Chang, and L. Feng. 2008. Tree model guided candidate generation for mining frequent subtrees from XML documents. ACM Transactions on Knowledge Discovery from Data (TKDD).

Digital Library

[7]

S. Tatikonda and S. Parthasarathy. 2009. Mining tree-structured data on multicore systems. Very Large Data Base (VLDB), ACM.

Digital Library

[8]

S. Tatikonda, S. Parthasarathy, and T. Kurc. 2006. TRIPS and TIDES: new algorithms for tree mining. The Conference on Information and Knowledge Management (CIKM), ACM.

Digital Library

[9]

T. Tracy II, Y. Fu, I. Roy, E. Jonas, and P. Glendenning. 2016. Towards machine learning on the Automata Processor. In International Conference on High Performance Computing. Springer.

[10]

J. Wadden, V. Dang, N. Brunelle, T. Tracy II, D. Guo, E. Sadredini, K. Wang, C. Bo, G. Robins, M. Stan, and K. Skadron. 2016. ANMLzoo: a benchmark suite for exploring bottlenecks in Automata Processing engines and architectures. In in Workload Characterization (IISWC). IEEE.

[11]

C. Wang, M. Hong, J. Pei, H. Zhou, W. Wang, and B. Shi. 2004. Efficient pattern-growth methods for frequent tree pattern mining. In Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Springer.

[12]

K. Wang, Y. Qi, J. J Fox, M. R Stan, and K. Skadron. 2015. Association rule mining with the Micron Automata Processor. In IEEE International Parallel Distributed Processing Symposium (IPDPS).

Digital Library

[13]

K. Wang, E. Sadredini, and K. Skadron. Hierarchical Pattern Mining with the Micron Automata Processor. In International Journal of Parallel Programming (IJPP). 2017.

[14]

K. Wang, E. Sadredini, and K. Skadron. 2016. Sequential Pattern Mining with the Micron Automata Processor. In ACM International Conference on Computing Frontiers (CF).

Digital Library

[15]

M. J. Zaki. 2002. Efficiently mining frequent trees in a forest. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Digital Library

[16]

M. J. Zaki. 2005. Efficiently mining frequent trees in a forest: Algorithms and applications. IEEE Transactions on Knowledge and Data Engineering (TKDE) 17, 8.

Digital Library

Cited By

Ge TZhang TLiu HTsafrir DMUSUVATHI MGupta RAbu-Ghazaleh N(2024)ngAP: Non-blocking Large-scale Automata Processing on GPUsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624848(268-285)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3617232.3624848
Liu HPai SJog A(2023)Asynchronous Automata Processing on GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794537:1(1-27)Online publication date: 2-Mar-2023
https://dl.acm.org/doi/10.1145/3579453
Wang XGong LCao JLou WWang WWang CZhou XIenne PZhang Z(2023)hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGAProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573190(185-196)Online publication date: 12-Feb-2023
https://dl.acm.org/doi/10.1145/3543622.3573190
Show More Cited By

Index Terms

Frequent subtree mining on the automata processor: challenges and opportunities
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, single data
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures

Recommendations

Frequent Subtree Mining - An Overview
Advances in Mining Graphs, Trees and Sequences

Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for ...
Frequent Subtree Mining - An Overview
Advances in Mining Graphs, Trees and Sequences

Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for ...
Mining Frequent Embedded Subtree from Tree-Like Databases
ICICIS '11: Proceedings of the 2011 International Conference on Internet Computing and Information Services

Mining frequent sub tree from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ICS '17: Proceedings of the International Conference on Supercomputing

June 2017

300 pages

ISBN:9781450350204

DOI:10.1145/3079079

General Chairs:
William D. Gropp
University of Illinois at Urbana-Champaign, Illinois
,
Pete Beckman
Argonne National Laboratory/Northwestern University, Illinois
,
Program Chairs:
Zhiyuan Li
Purdue University, West Lafayette, Indiana
,
Francisco J. Cazorla
IIIA-CSIC and Barcelona Supercomputing Center, Barcelona, Spain

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ICS '17

Sponsor:

SIGARCH

ICS '17: 2017 International Conference on Supercomputing

June 14 - 16, 2017

Illinois, Chicago

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

16
Total Citations
View Citations
212
Total Downloads

Downloads (Last 12 months)7
Downloads (Last 6 weeks)1

Reflects downloads up to 09 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ge TZhang TLiu HTsafrir DMUSUVATHI MGupta RAbu-Ghazaleh N(2024)ngAP: Non-blocking Large-scale Automata Processing on GPUsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624848(268-285)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3617232.3624848
Liu HPai SJog A(2023)Asynchronous Automata Processing on GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794537:1(1-27)Online publication date: 2-Mar-2023
https://dl.acm.org/doi/10.1145/3579453
Wang XGong LCao JLou WWang WWang CZhou XIenne PZhang Z(2023)hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGAProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573190(185-196)Online publication date: 12-Feb-2023
https://dl.acm.org/doi/10.1145/3543622.3573190
Huang YChen ZLi DYang K(2022)CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00011(25-37)Online publication date: Apr-2022
https://doi.org/10.1109/HPCA53966.2022.00011
Sadredini ERahimi RImani MSkadron K(2021)Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching AccelerationMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480934(311-323)Online publication date: 18-Oct-2021
https://dl.acm.org/doi/10.1145/3466752.3480934
Liu HPai SJog ALarus JCeze LStrauss K(2020)Why GPUs are Slow at Executing NFAs and How to Make them FasterProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378471(251-265)Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378471
Sadredini ERahimi RLenjani MStan MSkadron KLarus JCeze LStrauss K(2020)FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing AcceleratorsProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378459(219-234)Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378459
Sadredini ERahimi RLenjani MStan MSkadron K(2020)Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA47549.2020.00017(86-98)Online publication date: Feb-2020
https://doi.org/10.1109/HPCA47549.2020.00017
Rahimi RSadredini EStan MSkadron K(2020)Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM48280.2020.00027(138-147)Online publication date: May-2020
https://doi.org/10.1109/FCCM48280.2020.00027
Sadredini ERahimi RVerma VStan MSkadron K(2019)eAPProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3352460.3358324(87-99)Online publication date: 12-Oct-2019
https://dl.acm.org/doi/10.1145/3352460.3358324
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents