Frequent subtree mining on the automata processor: challenges and opportunities
E Sadredini, R Rahimi, K Wang… - Proceedings of the …, 2017 - dl.acm.org
Proceedings of the International Conference on Supercomputing, 2017•dl.acm.org
Frequency counting of complex patterns such as subtrees is more challenging than for
simple itemsets and sequences, as the number of possible candidate patterns in a tree is
much higher than one-dimensional data structures, with dramatically higher processing
times. In this paper, we propose a new and scalable solution for frequent subtree mining
(FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture.
We present a multi-stage pruning framework on the AP, called AP-FTM, to reduce the search …
simple itemsets and sequences, as the number of possible candidate patterns in a tree is
much higher than one-dimensional data structures, with dramatically higher processing
times. In this paper, we propose a new and scalable solution for frequent subtree mining
(FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture.
We present a multi-stage pruning framework on the AP, called AP-FTM, to reduce the search …
Frequency counting of complex patterns such as subtrees is more challenging than for simple itemsets and sequences, as the number of possible candidate patterns in a tree is much higher than one-dimensional data structures, with dramatically higher processing times. In this paper, we propose a new and scalable solution for frequent subtree mining (FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture. We present a multi-stage pruning framework on the AP, called AP-FTM, to reduce the search space of FTM candidates. This achieves up to 353X speedup at the cost of a small reduction in accuracy, on four real-world and synthetic datasets, when compared with PatternMatcher, a practical and exact CPU solution. To provide a fully accurate and still scalable solution, we propose a hybrid method to combine AP-FTM with a CPU exact-matching approach, and achieve up to 262X speedup over PatternMatcher on a challenging database. We also develop a GPU algorithm for FTM, but show that the AP also outperforms this. The results on a synthetic database show the AP advantage grows further with larger datasets.
ACM Digital Library