Abstract
Customer clustering, account balance prediction, credit scoring, and detection of risky cash flows are among the problems that research in the banking sector has focused on. The explosion of big data calls for new approaches to these problems. This paper proposes a new solution that clusters customers based on the history of their account balances. The work implements time-series clustering algorithms in a big data environment, and clustering of streaming data was also tested with positive results. The resulting customer clusters can support marketing decisions, forecasting of customer deposits for the following month, and similar tasks.
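The abstract does not spell out which clustering algorithm is used. As an illustration only, the following Python sketch shows one common way to cluster balance histories as time series: z-normalize each customer's monthly balances, compare series with dynamic time warping (DTW), and group them with a simple k-medoids loop; a new series arriving on a stream is then assigned to its nearest medoid. The function names (`z_normalize`, `dtw`, `k_medoids`, `assign`), the farthest-first initialization, and the toy data are assumptions, and this single-machine code is a stand-in for, not a reproduction of, the authors' Spark implementation.

```python
"""Hypothetical sketch: DTW + k-medoids clustering of balance histories.

Assumptions (not taken from the paper): each customer is a list of monthly
balances, series are z-normalized, the distance is dynamic time warping (DTW),
and clustering is a plain k-medoids loop on a single machine.
"""
import math
from typing import List, Sequence, Tuple


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit variance; a flat series maps to zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series: List[List[float]], k: int, max_iter: int = 20) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per series) using DTW distances."""
    n = len(series)
    dist = [[dtw(series[i], series[j]) for j in range(n)] for i in range(n)]

    # Farthest-first initialization (an assumption): start from series 0 and
    # repeatedly add the series farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def nearest(meds: List[int]) -> List[int]:
        # Assignment step: each series joins the cluster of its closest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = nearest(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # Update step: pick the member with the smallest total distance.
                new_medoids.append(min(members, key=lambda p: sum(dist[p][o] for o in members)))
            else:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, nearest(medoids)


def assign(new_series: Sequence[float], medoid_series: List[List[float]]) -> int:
    """Streaming step: label one newly arrived series by its nearest medoid."""
    z = z_normalize(new_series)
    return min(range(len(medoid_series)), key=lambda c: dtw(z, medoid_series[c]))


if __name__ == "__main__":
    balances = [
        [100, 110, 120, 130, 140, 150],  # toy data: growing balance
        [200, 220, 240, 260, 280, 300],  # growing, larger scale
        [150, 140, 130, 120, 110, 100],  # shrinking balance
        [900, 800, 700, 600, 500, 400],  # shrinking, larger scale
    ]
    normalized = [z_normalize(s) for s in balances]
    medoids, labels = k_medoids(normalized, k=2)
    print("labels:", labels)
    print("new customer ->", assign([50, 60, 70, 80, 90, 100], [normalized[m] for m in medoids]))
```

Z-normalization makes the two growing customers identical in shape despite their different balance levels, so they share a cluster. The full pairwise DTW matrix costs O(n²) distance computations, which is one reason a distributed engine such as Apache Spark becomes attractive as the number of customers grows. Spark MLlib also provides a `StreamingKMeans` estimator for Euclidean streaming clustering, though the abstract does not state which method the authors used.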
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dat, D.Q., Hung, P.D. (2020). Clustering of Time-Series Balance History Data Streams Using Apache Spark. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2020. Lecture Notes in Computer Science, vol 12341. Springer, Cham. https://doi.org/10.1007/978-3-030-60816-3_13
DOI: https://doi.org/10.1007/978-3-030-60816-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60815-6
Online ISBN: 978-3-030-60816-3
eBook Packages: Computer Science, Computer Science (R0)