Clustering of Time-Series Balance History Data Streams Using Apache Spark

Do Quang Dat & Phan Duy Hung

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12341)

Included in the following conference series:

International Conference on Cooperative Design, Visualization and Engineering

Abstract

Clustering customers, predicting account balances, scoring credit, and detecting risky cash flows are problems that have been a focus of research in the banking sector. With the explosion of big data, these problems can be approached in new ways. This paper proposes a new solution that clusters customers based on the historical information of their balances. The work implements clustering algorithms for time series in a big data environment. In addition, clustering of streaming data was tested, with positive results. The customer clustering results can support marketing decisions, forecasting of customer deposits in the following month, and similar tasks.

Author information

Authors and Affiliations

FPT University, Hanoi, Vietnam
Do Quang Dat & Phan Duy Hung

Corresponding author

Correspondence to Phan Duy Hung.

Editor information

Editors and Affiliations

University of the Balearic Islands, Palma, Mallorca, Spain
Yuhua Luo

About this paper

Cite this paper

Dat, D.Q., Hung, P.D. (2020). Clustering of Time-Series Balance History Data Streams Using Apache Spark. In: Luo, Y. (ed.) Cooperative Design, Visualization, and Engineering. CDVE 2020. Lecture Notes in Computer Science, vol 12341. Springer, Cham. https://doi.org/10.1007/978-3-030-60816-3_13

DOI: https://doi.org/10.1007/978-3-030-60816-3_13
Published: 16 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60815-6
Online ISBN: 978-3-030-60816-3
eBook Packages: Computer Science, Computer Science (R0)
