DOI: 10.1145/3550356.3561568
research-article

Dynamic data management for continuous retraining

Published: 09 November 2022

Abstract

Managing dynamic datasets that serve as training data for a Machine Learning (ML) model is often challenging, especially when the data is altered iteratively and existing ML models should remain applicable to it. For example, this applies when new data versions arise from a generated or aggregated extension of a dataset on which a model has already been trained. In this work, we investigate how a model-based approach to these training data concerns can be provided and how the complete process, including the resulting training and retraining of the ML model, can be integrated into it. To this end, we devise model-based concepts and an implementation that cope with the complexity of iterative data management and thereby enable the integration of continuous retraining routines. With Deep Learning techniques becoming technically feasible and advancing rapidly over the last decade, MLOps, which aims to establish DevOps practices tailored to ML projects, has gained crucial relevance. However, to the best of our knowledge, data management concepts for iteratively growing datasets with retraining capabilities embedded in a model-driven ML development methodology are unexplored. To fill this gap, this contribution provides such agile data management concepts and integrates them, together with continuous retraining, into the model-driven ML framework MontiAnna [18]. The new functionality is evaluated in the context of a research project in which ML is used for the optimal design of lattice structures for crash applications.
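
To make the idea of retraining on an iteratively growing dataset concrete, the following is a minimal Python sketch. It is not the MontiAnna implementation, which the abstract does not describe in detail. The names `fingerprint`, `DatasetRegistry`, and `retrain_if_needed` are hypothetical, and the sketch assumes that a data version can be identified by a content hash of JSON-serialisable samples.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


def fingerprint(samples: Sequence[dict]) -> str:
    """Content hash identifying one dataset version (assumption: samples are JSON-serialisable)."""
    payload = json.dumps(list(samples), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class DatasetRegistry:
    """Hypothetical registry: keeps every data version and remembers which one the model was trained on."""
    versions: Dict[str, List[dict]] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)
    trained_on: Optional[str] = None

    def register(self, samples: Sequence[dict]) -> str:
        """Store a dataset version; identical content is stored only once."""
        version = fingerprint(samples)
        if version not in self.versions:
            self.versions[version] = list(samples)
            self.history.append(version)
        return version

    def needs_retraining(self) -> bool:
        """True if the newest registered version differs from the one last trained on."""
        return bool(self.history) and self.history[-1] != self.trained_on

    def retrain_if_needed(self, train: Callable[[List[dict]], None]) -> bool:
        """Call the user-supplied training routine only when the data has changed."""
        if not self.needs_retraining():
            return False
        latest = self.history[-1]
        train(self.versions[latest])
        self.trained_on = latest
        return True


# Usage: the model is (re)trained only when a new data version arrives.
registry = DatasetRegistry()
training_runs: List[List[dict]] = []  # stands in for a real training routine

base = [{"x": 1, "y": 0}]
registry.register(base)
registry.retrain_if_needed(training_runs.append)  # initial training
registry.retrain_if_needed(training_runs.append)  # data unchanged, nothing happens

registry.register(base + [{"x": 2, "y": 1}])      # aggregated extension of the dataset
registry.retrain_if_needed(training_runs.append)  # retraining on the new version
```

Hashing the content rather than relying on file names or timestamps means that re-registering unchanged data does not trigger a retraining run, which is one simple way to decide when retraining is warranted.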

References

[1]
P. Agrawal et al. 2019. Data platform for machine learning. In Proceedings of the 2019 International Conference on Management of Data. 1803--1816.
[2]
Sridhar Alla and Suman Kalyan Adari. 2021. What is mlops? In Beginning MLOps with MLFlow. Springer, 79--124.
[3]
Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software Engineering for Machine Learning: A Case Study. In ICSE-SEIP'19. 291--300.
[4]
Abdallah Atouani, Jörg Christian Kirchhof, Evgeny Kusmenko, and Bernhard Rumpe. 2021. Artifact and Reference Models for Generative Machine Learning Frameworks and Build Systems. In GPCE'21. 55--68.
[5]
Amine Barrak, Ellis E. Eghan, and Bram Adams. 2021. On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects. In SANER'21. 422--433.
[6]
Marouane Birjali, Abderrahim Beni-Hssane, and Mohammed Erritali. 2017. Machine learning and semantic sentiment analysis based algorithms for suicide sentiment prediction in social networks. Procedia Computer Science 113 (2017), 65--72.
[7]
Matthias Boehm, Arun Kumar, and Jun Yang. 2019. Data management in machine learning systems. Synthesis Lectures on Data Management 11, 1 (2019), 1--173.
[8]
Carl Boettiger. 2018. Managing larger data on a github repository. Journal of Open Source Software 3, 29 (2018), 971.
[9]
Robert Culkin and Sanjiv R Das. 2017. Machine learning in finance: the case of deep learning for option pricing. Journal of Investment Management 15, 4 (2017), 92--100.
[10]
Mike Folk, Gerd Heber, Quincey Koziol, Elena Pourmal, and Dana Robinson. 2011. An overview of the HDF5 technology suite and its applications. In Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases. 36--47.
[11]
Nicola Gatto, Evgeny Kusmenko, and Bernhard Rumpe. 2019. Modeling Deep Reinforcement Learning Based Architectures for Cyber-Physical Systems. In Proceedings of MODELS 2019. Workshop MDE Intelligence (Munich), Loli Burgueño, Alexander Pretschner, Sebastian Voss, Michel Chaudron, Jörg Kienzle, Markus Völter, Sébastien Gérard, Mansooreh Zahedi, Erwan Bousse, Arend Rensink, Fiona Polack, Gregor Engels, and Gerti Kappel (Eds.). 196--202. http://www.se-rwth.de/publications/Modeling-Deep-Reinforcement-Learning-based-Architectures-for-Cyber-Physical-Systems.pdf
[12]
Lei Gu and Huan Li. 2013. Memory or time: Performance evaluation for iterative operation on hadoop and spark. In 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing. IEEE, 721--727.
[13]
Robert Ilijason. 2020. Getting Data into Databricks. In Beginning Apache Spark Using Azure Databricks. Springer, 51--73.
[14]
A. Jain et al. 2020. Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3561--3562.
[15]
Nils Kaminski, Evgeny Kusmenko, and Bernhard Rumpe. 2019. Modeling Dynamic Architectures of Self-Adaptive Cooperative Systems. The Journal of Object Technology 18, 2 (July 2019), 1--20. The 15th European Conference on Modelling Foundations and Applications.
[16]
Sandeep Koranne. 2011. Hierarchical data format 5: HDF5. In Handbook of open source tools. Springer, 191--200.
[17]
Holger Krahn, Bernhard Rumpe, and Steven Völkel. 2010. MontiCore: a Framework for Compositional Development of Domain Specific Languages. International Journal on Software Tools for Technology Transfer (STTT) 12, 5 (September 2010), 353--372.
[18]
Evgeny Kusmenko, Sebastian Nickels, Svetlana Pavlitskaya, Bernhard Rumpe, and Thomas Timmermanns. 2019. Modeling and Training of Neural Processing Systems. In 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems (MODELS). 283--293.
[19]
Evgeny Kusmenko, Sebastian Nickels, Svetlana Pavlitskaya, Bernhard Rumpe, and Thomas Timmermanns. 2019. Modeling and Training of Neural Processing Systems. In MODELS'19 (Munich). IEEE, 283--293.
[20]
Evgeny Kusmenko, Bernhard Rumpe, Sascha Schneiders, and Michael von Wenckstern. 2018. Highly-Optimizing and Multi-Target Compiler for Embedded System Models: C++ Compiler Toolchain for the Component and Connector Language EmbeddedMontiArc. In MODELS'18 (Copenhagen). ACM, 447--457.
[21]
Zhizhong Li and Derek Hoiem. 2017. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence 40, 12 (2017), 2935--2947.
[22]
Jie Lu, Anjin Liu, Fan Dong, Feng Gu, Joao Gama, and Guangquan Zhang. 2018. Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering 31, 12 (2018), 2346--2363.
[23]
Gunasekaran Manogaran and Daphne Lopez. 2017. A survey of big data architectures and machine learning algorithms in healthcare. International Journal of Biomedical Engineering and Technology 25, 2-4 (2017), 182--211.
[24]
Jyoti Nandimath, Ekata Banerjee, Ankur Patil, Pratima Kakade, Saumitra Vaidya, and Divyansh Chaturvedi. 2013. Big data analysis using Apache Hadoop. In 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI). IEEE, 700--703.
[25]
S. Pirmohammad and S Esmaeili Marzdashti. 2018. Crashworthiness optimization of combined straight-tapered tubes using genetic algorithm and neural networks. Thin-Walled Structures 127 (2018), 318--332.
[26]
Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, and Martin Zinkevich. 2017. Data management challenges in production machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data. 1723--1726.
[27]
Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, and Martin Zinkevich. 2018. Data lifecycle challenges in production machine learning: a survey. ACM SIGMOD Record 47, 2 (2018), 17--28.
[28]
Philipp Ruf, Manav Madan, Christoph Reich, and Djaffar Ould-Abdeslam. 2021. Demystifying MLOps and Presenting a Recipe for the Selection of Open-Source Tools. Applied Sciences 11, 19 (2021).
[29]
Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, et al. 2018. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull. 41, 4 (2018), 39--45.

Cited By

  • (2024) Model driven engineering for machine learning components. Information and Software Technology 169:C. DOI: 10.1016/j.infsof.2024.107423. Online publication date: 2-Jul-2024
  • (2024) Bridging MDE and AI: a systematic review of domain-specific languages and model-driven practices in AI software systems engineering. Software and Systems Modeling. DOI: 10.1007/s10270-024-01211-y. Online publication date: 28-Sep-2024

Information & Contributors

Information

Published In

MODELS '22: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings
October 2022
1003 pages
ISBN: 9781450394673
DOI: 10.1145/3550356
  • Conference Chairs:
  • Thomas Kühn,
  • Vasco Sousa
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • Univ. of Montreal: University of Montreal
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. artificial intelligence
  2. data management
  3. model-driven engineering
  4. retraining

Qualifiers

  • Research-article

Funding Sources

  • Federal Ministry for Economic Affairs and Climate Action

Conference

MODELS '22
Acceptance Rates

Overall Acceptance Rate 144 of 506 submissions, 28%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 87
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 18 Nov 2024
