Clustering of Time-Series Balance History Data Streams Using Apache Spark

  • Conference paper
Cooperative Design, Visualization, and Engineering (CDVE 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12341)

Abstract

Customer clustering, account balance prediction, credit scoring, and detection of risky cash flows are long-standing research problems in the banking sector. The growth of big data calls for new approaches to these problems. This paper proposes a solution that clusters customers based on their historical balance information. The work implements time-series clustering algorithms in a big data environment and also tests clustering of streaming data, with positive results. The resulting customer clusters can support marketing decisions and forecasts of customer deposits for the following month, among other uses.
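
The abstract does not name the clustering algorithm or the distance measure, so the following is only a minimal single-machine sketch of what clustering customers by balance history could look like. It assumes dynamic time warping (DTW) as the distance, k-medoids as the clustering method, and per-customer standardization of balances; the function names and the toy balances are hypothetical, and this is not the authors' Spark implementation.

```python
from math import inf


def normalize(series):
    """Standardize one balance history (zero mean, unit variance).

    Assumption: customers are compared by the shape of their balance
    history, not by its absolute level.
    """
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = var ** 0.5 or 1.0  # avoid division by zero for flat series
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series_list, k, max_iter=10):
    """Simple k-medoids: the first k series are the initial medoids."""
    medoids = list(range(k))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = [
            min(range(k), key=lambda c: dtw_distance(s, series_list[medoids[c]]))
            for s in series_list
        ]
        # Update step: the new medoid minimizes total distance to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep old medoid for an empty cluster
                continue
            best = min(
                members,
                key=lambda i: sum(
                    dtw_distance(series_list[i], series_list[j]) for j in members
                ),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical month-end balances for four customers (six months each).
balances = [
    [100, 120, 140, 160, 180, 200],  # steadily growing
    [200, 180, 160, 140, 120, 100],  # steadily declining
    [50, 55, 70, 80, 95, 110],       # growing, smaller account
    [900, 850, 700, 650, 600, 500],  # declining, larger account
]
normalized = [normalize(s) for s in balances]
labels, medoids = k_medoids(normalized, k=2)
print(labels)  # prints [0, 1, 0, 1]
```

In this toy run the two growing accounts share one cluster and the two declining accounts share the other, regardless of account size. The quadratic-cost DTW and the all-pairs medoid update would not scale to a full customer base without distribution or approximation.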

Author information

Corresponding author

Correspondence to Phan Duy Hung.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dat, D.Q., Hung, P.D. (2020). Clustering of Time-Series Balance History Data Streams Using Apache Spark. In: Luo, Y. (ed.) Cooperative Design, Visualization, and Engineering. CDVE 2020. Lecture Notes in Computer Science, vol 12341. Springer, Cham. https://doi.org/10.1007/978-3-030-60816-3_13

  • DOI: https://doi.org/10.1007/978-3-030-60816-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60815-6

  • Online ISBN: 978-3-030-60816-3

  • eBook Packages: Computer Science, Computer Science (R0)
