Abstract
Mining frequent itemsets (FIs) in transaction databases is a very popular data mining task. It helps create meaningful and effective representations of customer transactions, a key step in transaction classification and clustering. To improve the quality of these representations, previous studies have adapted vector embedding methods to learn transaction embeddings from items and FIs. However, FIs are a simple pattern type that ignores important information about transactions, such as the purchase quantities of items and their unit profits. To address this issue, we propose to learn transaction embeddings from items and high-utility itemsets (HUIs), a more general pattern type. Since HUIs have been shown to be more appropriate than FIs for a wide range of applications, we hypothesize that transaction embeddings learned from HUIs will be more representative and meaningful. We introduce an unsupervised method, named Hui2Vec, that learns transaction embeddings by combining both singleton items and HUIs. On four datasets, we demonstrate that the embeddings produced by the proposed method are of higher quality than embeddings learned from items and FIs.
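To make the difference between FIs and HUIs concrete, the following minimal Python sketch uses a small hypothetical database of four transactions with purchase quantities and unit profits. Support counts the transactions containing an itemset, while utility sums quantity × unit profit over those transactions, which is the standard definition in the HUI mining literature. The brute-force `mine` function is for illustration only (dedicated algorithms such as EFIM prune the search space), and `transaction_tokens`, which lists a transaction's items plus the multi-item HUIs it contains, is an assumed way of combining items and patterns in the spirit of Hui2Vec, not the paper's actual input format. All data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical quantitative database: each transaction maps item -> purchase quantity.
TRANSACTIONS = [
    {"a": 1, "b": 5, "c": 1},
    {"b": 4, "c": 3, "d": 3},
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
]
# Hypothetical unit profit (external utility) of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def contains(t, itemset):
    """True if transaction t contains every item of the itemset."""
    return all(i in t for i in itemset)


def support(itemset):
    """Number of transactions containing the itemset (ignores quantities and profits)."""
    return sum(1 for t in TRANSACTIONS if contains(t, itemset))


def utility(itemset):
    """Sum, over transactions containing the itemset, of quantity * unit profit of its items."""
    return sum(
        sum(t[i] * UNIT_PROFIT[i] for i in itemset)
        for t in TRANSACTIONS
        if contains(t, itemset)
    )


def mine(measure, threshold, max_len=3):
    """Brute-force enumeration of itemsets whose measure reaches the threshold.
    Real miners (e.g. EFIM for HUIs) prune the search space instead of enumerating it."""
    items = sorted(UNIT_PROFIT)
    return {
        s: measure(s)
        for k in range(1, max_len + 1)
        for s in combinations(items, k)
        if measure(s) >= threshold
    }


def transaction_tokens(t, patterns):
    """Assumed input format: the transaction's items plus one token per multi-item
    pattern it contains, e.g. ('a', 'c') -> 'a_c'."""
    tokens = sorted(t)
    tokens += ["_".join(p) for p in patterns if len(p) > 1 and contains(t, p)]
    return tokens


fis = mine(support, 3)    # frequent itemsets, minimum support 3
huis = mine(utility, 20)  # high-utility itemsets, minimum utility 20
print(fis)   # {('a',): 3, ('c',): 4, ('a', 'c'): 3}
print(huis)  # {('a',): 20, ('a', 'c'): 28, ('b', 'c'): 22, ('a', 'c', 'e'): 22}
for t in TRANSACTIONS:
    print(transaction_tokens(t, huis))
```

In this toy example, item c occurs in every transaction but has a total utility of only 11, so it is frequent but not high-utility, whereas the pair (b, c) occurs in only two transactions yet is a HUI. Token lists like these could then be given to a paragraph-vector model (for example PV-DBOW, as in Le and Mikolov) to learn one vector per transaction; that training step is not shown here.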
References
1. Cheng, H., Yan, X., Han, J., Hsu, C.-W.: Discriminative frequent pattern analysis for effective classification. In: ICDE 2007, pp. 716–725 (2007)
2. Fournier-Viger, P., Lin, J.C.-W., Vo, B., Chi, T.T., Zhang, J., Le, H.B.: A survey of itemset mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 7(4), e1207 (2017)
3. He, Z., Gu, F., Zhao, C., Liu, X., Wu, J., Wang, J.: Conditional discriminative pattern mining: concepts and algorithms. Inf. Sci. 375, 1–15 (2017)
4. Kameya, Y., Sato, T.: RP-growth: top-k mining of relevant patterns with minimum support raising. In: SIAM International Conference on Data Mining 2012, pp. 816–827 (2012)
5. Nguyen, D., Nguyen, T.D., Luo, W., Venkatesh, S.: Trans2Vec: learning transaction embedding via items and frequent itemsets. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 361–372. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_29
6. Zida, S., Fournier-Viger, P., Lin, J.C.-W., Wu, C.-W., Tseng, V.S.: EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl. Inf. Syst. 51(2), 595–625 (2017)
7. Ahmed, C.F., Tanbeer, S.K., Jeong, B.S., Lee, Y.K.: Efficient tree structures for high-utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21(12), 1708–1721 (2009)
8. Fournier-Viger, P., Wu, C.-W., Tseng, V.S.: Novel concise representations of high utility itemsets using generator patterns. In: Luo, X., Yu, J.X., Li, Z. (eds.) ADMA 2014. LNCS (LNAI), vol. 8933, pp. 30–43. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14717-8_3
9. Krishnamoorthy, S.: Pruning strategies for mining high utility itemsets. Expert Syst. Appl. 42(5), 2371–2381 (2015)
10. Tseng, V.S., Shie, B.-E., Wu, C.-W., Yu, P.S.: Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. 25(8), 1772–1786 (2013)
11. Thilagu, M., Nadarajan, R.: Efficiently mining of effective web traversal patterns with average utility. In: Proceedings of the International Conference on Communication, Computing, and Security, pp. 444–451. CRC Press (2012)
12. Fournier-Viger, P., Lin, J.C.-W., Nkambou, R., Vo, B., Tseng, V.S. (eds.): High-Utility Pattern Mining. SBD, vol. 51. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04921-8
13. Liu, Y., Cheng, C., Tseng, V.S.: Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinform. 14, 230 (2013)
14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS 2013, pp. 3111–3119 (2013)
15. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML 2014, pp. 1188–1196 (2014)
16. Chen, M.: Efficient vector representation for documents through corruption. In: ICLR 2017 (2017)
17. Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.-W., Tseng, V.S.: SPMF: a Java open-source pattern mining library. J. Mach. Learn. Res. 15, 3389–3393 (2014)
18. Lan, G.C., Hong, T.P., Tseng, V.S.: An efficient projection-based indexing approach for mining high utility itemsets. Knowl. Inf. Syst. 38(1), 85–107 (2014)
19. Liu, J., Wang, K., Fung, B.: Direct discovery of high utility itemsets without candidate generation. In: Proceedings of the 12th IEEE International Conference on Data Mining, Brussels, Belgium, December 2012, pp. 984–989. IEEE (2012)
20. Liu, Y., Liao, W., Choudhary, A.: A two-phase algorithm for fast discovery of high utility itemsets. In: Ho, T.B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS (LNAI), vol. 3518, pp. 689–695. Springer, Heidelberg (2005). https://doi.org/10.1007/11430919_79
21. Song, W., Liu, Y., Li, J.: BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int. J. Data Warehous. Min. 10(1), 1–15 (2014)
22. Yun, U., Ryang, H., Ryu, K.H.: High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst. Appl. 41(8), 3861–3878 (2014)
23. Grohe, M.: Word2vec, Node2vec, Graph2vec, X2vec: towards a theory of vector embeddings of structured data. In: ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, pp. 1–16 (2020)
24. Luo, J., Xiao, S., Jiang, S.: Ripple2Vec: node embedding with ripple distance of structures. Data Sci. Eng. 7, 156–174 (2022)
25. Cao, S., Lu, W., Xu, Q.: GraRep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 891–900 (2015)
26. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)
27. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
28. Grover, A., Leskovec, J.: Node2Vec: scalable feature learning for networks. In: Krishnapuram, B.B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (eds.) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
29. Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., Jaiswal, S.: Graph2Vec: learning distributed representations of graphs. arXiv:1707.05005 [cs.AI] (2017)
30. Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., Zhang, C.: Adversarially regularized graph autoencoder for graph embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, pp. 2609–2615 (2018)
31. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)
32. Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: ACL 2015, pp. 1702–1712 (2015)
Acknowledgment
The authors would like to thank the authors of Trans2Vec [5] for providing their source code. This work is partially supported by NARD Intelligence.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Belghith, K., Fournier-Viger, P., Jawadi, J. (2022). Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets. In: Roy, P.P., Agarwal, A., Li, T., Krishna Reddy, P., Uday Kiran, R. (eds) Big Data Analytics. BDA 2022. Lecture Notes in Computer Science, vol 13773. Springer, Cham. https://doi.org/10.1007/978-3-031-24094-2_15
DOI: https://doi.org/10.1007/978-3-031-24094-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24093-5
Online ISBN: 978-3-031-24094-2
eBook Packages: Computer Science, Computer Science (R0)