Continuously Learning Prediction Models for Smart Domestic Hot Water Management
Figure 1. Schematic view of a smart domestic hot water management system.
Figure 2. Prediction model architecture.
Figure 3. First twenty-eight days of Dwelling A daily cumulative DHW consumption.
Figure 4. First twenty-eight days of Dwelling B daily cumulative DHW consumption.
Figure 5. First three weeks of the type 1 synthetic time series.
Figure 6. First three weeks of the type 2 synthetic time series.
Figure 7. Dwelling A MAE between multiple time horizon predictions and their associated ground truth for each task. The prediction time horizons are 0.5 h, 1 h, 2 h, 6 h, 12 h, 18 h and 24 h. The results shown are averaged over 5 runs.
Figure 8. Dwelling B MAE between multiple time horizon predictions and their associated ground truth for each task. The prediction time horizons are 0.5 h, 1 h, 2 h, 6 h, 12 h, 18 h and 24 h. The results shown are averaged over 5 runs.
Figure 9. Dwelling A standard deviation associated with the 5 runs whose mean is shown in Figure 7.
Figure 10. Dwelling B standard deviation associated with the 5 runs whose mean is shown in Figure 8.
Figure 11. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling A DHW consumption, with a model continuously trained with the Dream Net algorithm.
Figure 12. Residual plot (ground truth minus predicted value) of the last task of Dwelling A DHW consumption, with a model continuously trained with Dream Net on all tasks except the last one of Dwelling A.
Figure 13. Residual histogram (ground truth minus predicted value) of the last task of Dwelling A DHW consumption, with a model continuously trained with Dream Net on all tasks except the last one of Dwelling A.
Figure 14. Example of 24 h prediction versus ground truth for 11 days of d1_test, after training the model on d1_train.
Figure 15. Example of 24 h prediction versus ground truth for 11 days of d2_test, after training the model on d1_train.
Figure 16. Example of 24 h prediction versus ground truth for 11 days of d1_test, after first training the model on d1_train and then finetuning it on d2_train, with no particular continual learning strategy.
Figure 17. Example of 24 h prediction versus ground truth for 11 days of d2_test, after first training the model on d1_train and then finetuning it on d2_train, with no particular continual learning strategy.
Figure 18. Example of 24 h prediction versus ground truth for 11 days of d1_test, after training the model first on d1_train and then on d2_train using Dream Net.
Figure 19. Example of 24 h prediction versus ground truth for 11 days of d2_test, after training the model first on d1_train and then on d2_train using Dream Net.
Figure A1. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling A DHW consumption, with a model continuously trained with the DER algorithm.
Figure A2. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling A DHW consumption, with a model continuously trained with the ER algorithm.
Figure A3. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling A DHW consumption, with a model continuously trained with the finetuning setup.
Figure A4. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling A DHW consumption, with a model continuously trained with the offline setup.
Figure A5. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling B DHW consumption, with a model continuously trained with the DER algorithm.
Figure A6. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling B DHW consumption, with a model continuously trained with the ER algorithm.
Figure A7. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling B DHW consumption, with a model continuously trained with the Dream Net algorithm.
Figure A8. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling B DHW consumption, with a model continuously trained with the finetuning setup.
Figure A9. Example of 6 h prediction versus ground truth for 10 days in the last task of Dwelling B DHW consumption, with a model continuously trained with the offline setup.
Figure A10. Residual plots (ground truth value minus 6 h predicted value) of the last task of Dwelling A DHW consumption, with models continuously trained with DER on every task except the last one of Dwelling A.
Figure A11. Residual plots (ground truth value minus 6 h predicted value) of the last task of Dwelling A DHW consumption, with models continuously trained with ER on every task except the last one of Dwelling A.
Figure A12. Residual plots (ground truth value minus 6 h predicted value) of the last task of Dwelling A DHW consumption, with models continuously trained in the finetuning setting on every task except the last one of Dwelling A.
Figure A13. Residual plots (ground truth value minus 6 h predicted value) of the last task of Dwelling A DHW consumption, with models continuously trained in the offline setting on every task except the last one of Dwelling A.
Figure A14. Example of 24 h prediction versus ground truth for 11 days of d1_test, after the model has been initially trained on d1_train and then learned d2_train using ER.
Figure A15. Example of 24 h prediction versus ground truth for 11 days of d2_test, after the model has been initially trained on d1_train and then learned d2_train using ER.
Figure A16. Example of 24 h prediction versus ground truth for 11 days of d1_test, after the model has been initially trained on d1_train and then learned d2_train using DER.
Figure A17. Example of 24 h prediction versus ground truth for 11 days of d2_test, after the model has been initially trained on d1_train and then learned d2_train using DER.
Abstract
1. Introduction
- The first family is parameter isolation methods, which freeze the neurons associated with previous tasks and add new neurons for each new task [19].
- The second family is regularization-based methods, which minimize changes to the weights that carry the most information about previous tasks. For example, Elastic Weight Consolidation [20] uses the Fisher information to identify such weights.
- The third family is replay methods, in which previously seen examples, or pseudo-examples (i.e., examples drawn from the same distribution as previously seen examples), are learned alongside new examples in order to preserve knowledge from previous tasks. Replay methods have shown the most promising results [18,21]; a minimal sketch of the replay idea follows this list.
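As an illustration of the replay idea only, here is a minimal sketch assuming a PyTorch-style regression model and data loader; the buffer capacity, loss, and helper names are illustrative and do not reproduce the exact implementation used in this work.

```python
import random
import torch
import torch.nn.functional as F

def reservoir_update(buffer, item, seen, capacity=500):
    """Classic reservoir sampling: every example seen so far has the same
    probability of remaining in the bounded memory buffer."""
    seen += 1
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randrange(seen)
        if j < capacity:
            buffer[j] = item
    return seen

def train_task_with_replay(model, optimizer, task_loader, buffer, seen=0):
    """Learn one new task while rehearsing examples kept from earlier tasks."""
    model.train()
    for x, y in task_loader:
        new_x, new_y = x, y
        if buffer:
            # Mix a random draw of stored old-task examples into the mini-batch.
            old = random.sample(buffer, min(len(buffer), len(x)))
            x = torch.cat([x, torch.stack([xi for xi, _ in old])])
            y = torch.cat([y, torch.stack([yi for _, yi in old])])
        optimizer.zero_grad()
        F.mse_loss(model(x), y).backward()
        optimizer.step()
        # Only the genuinely new examples enter the buffer.
        for xi, yi in zip(new_x, new_y):
            seen = reservoir_update(buffer, (xi.detach(), yi.detach()), seen)
    return seen
```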
2. Materials and Methods
2.1. Related Works
2.2. Prediction Model Description
2.3. Datasets Description
2.4. Experiment Description—Real Dwelling Data
- Finetuning, in which the model is updated on each task without any special strategy to retain knowledge from previous tasks;
- One-month learning, in which the model is trained on the first task and never updated thereafter;
- Three-month learning, in which the model is trained on the first three tasks and never updated thereafter;
- Offline, in which the model is retrained from scratch using both the current task data and all previous task data. This baseline is the upper bound for any continual learning strategy because it uses all previously seen data. The sketch after this list contrasts these setups.
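To make the baselines concrete, the following sketch gives one plausible reading of these setups, with `tasks` an ordered list of monthly datasets and `make_model`, `train`, and `evaluate` standing in for the actual pipeline (hypothetical names); the exact per-task train/test split follows the protocol described in the paper, not this sketch.

```python
from typing import Callable, List, Sequence

Dataset = Sequence  # placeholder: one task (one month) of DHW consumption windows

def evaluate_strategy(tasks: List[Dataset],
                      make_model: Callable[[], object],
                      train: Callable[[object, Dataset], None],
                      evaluate: Callable[[object, Dataset], float],
                      strategy: str = "finetuning") -> List[float]:
    """Prediction error on each upcoming task under the different baseline setups."""
    scores = []
    if strategy == "finetuning":
        model = make_model()
        for t in range(len(tasks) - 1):
            train(model, tasks[t])                        # update on the new task only
            scores.append(evaluate(model, tasks[t + 1]))
    elif strategy == "offline":
        for t in range(len(tasks) - 1):
            model = make_model()                          # retrain from scratch on
            seen = [w for task in tasks[:t + 1] for w in task]  # all tasks seen so far
            train(model, seen)
            scores.append(evaluate(model, tasks[t + 1]))
    elif strategy in ("one_month", "three_month"):
        n = 1 if strategy == "one_month" else 3
        model = make_model()
        train(model, [w for task in tasks[:n] for w in task])  # trained once, never updated
        scores = [evaluate(model, tasks[t]) for t in range(n, len(tasks))]
    return scores
```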
2.5. Experiment Description—Synthetic Data
- We first train a prediction model on d1_train;
- We test this model on both d1_test and d2_test;
- We train the model on d2_train with the different learning strategies (Dream Net, Dark Experience Replay, Experience Replay, and Finetuning). Note that, without a continual learning algorithm, this step should lead to catastrophic forgetting on d1_test;
- We test the models obtained with each strategy on d1_test and d2_test to determine both whether the old data, with weeks of type 1, can still be accurately predicted and whether the new data, with weeks of type 2, can also be accurately predicted. A compact sketch of this protocol follows the list.
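A compact sketch of this two-task protocol, assuming `fit` applies whichever learning strategy is being compared and `mae` computes the mean absolute error on a test set (both hypothetical helper names):

```python
def two_task_protocol(make_model, fit, mae, d1_train, d1_test, d2_train, d2_test):
    """Train on type-1 weeks, then on type-2 weeks, and check whether
    type-1 predictions survive the second training phase."""
    model = make_model()
    fit(model, d1_train)                                             # step 1
    before = {"d1": mae(model, d1_test), "d2": mae(model, d2_test)}  # step 2
    fit(model, d2_train)                                             # step 3: finetuning, ER, DER or Dream Net
    after = {"d1": mae(model, d1_test), "d2": mae(model, d2_test)}   # step 4
    return before, after  # a large rise in after["d1"] signals catastrophic forgetting
```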
3. Results
3.1. Prediction Model Hyperparameters Setting
3.2. Results with Real DHW Consumption of Individual Dwellings
3.3. Results for the Synthetic Dataset
4. Discussion
5. Conclusions
- The implemented continual learning algorithms, Experience Replay, Dark Experience Replay and Dream Net, can tackle catastrophic forgetting for prediction models based on transformers.
- Successively learning the tasks from the two studied real dwellings shows the importance of updating the model while it is in use.
- No update strategy is consistently better than the others on the real DHW consumption data we have: even the offline setting, which uses all previously seen data to predict the next task, can be outperformed by other strategies. This suggests the counterintuitive fact that a high degree of model stability can, in some cases, reduce time series forecasting performance.
- Continual learning algorithms significantly improve prediction performance when abrupt changes occur, as demonstrated on the synthetic dataset, which implements two separate patterns learned sequentially.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix B
Appendix C
References
- Mainsant, M.; Solinas, M.; Reyboz, M.; Godin, C.; Mermillod, M. Dream Net: A Privacy Preserving Continual Learning Model for Face Emotion Recognition. In Proceedings of the 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), Nara, Japan, 28 September–1 October 2021.
- Hoang Vuong, P.; Tan Dat, T.; Khoi Mai, T.; Hoang Uyen, P.; The Bao, P. Stock-Price Forecasting Based on XGBoost and LSTM. Comput. Syst. Sci. Eng. 2022, 40, 237–246.
- Zhang, L.; Bian, W.; Qu, W.; Tuo, L.; Wang, Y. Time Series Forecast of Sales Volume Based on XGBoost. J. Phys. Conf. Ser. 2021, 1873, 012067.
- Liu, Z.; Wu, D.; Liu, Y.; Han, Z.; Lun, L.; Gao, J.; Jin, G.; Cao, G. Accuracy Analyses and Model Comparison of Machine Learning Adopted in Building Energy Consumption Prediction. Energy Explor. Exploit. 2019, 37, 1426–1451.
- Lomet, A.; Suard, F.; Chèze, D. Statistical Modeling for Real Domestic Hot Water Consumption Forecasting. Energy Procedia 2015, 70, 379–387.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
- Lipton, Z.C.; Berkowitz, J.; Elkan, C. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv 2015, arXiv:1506.00019.
- Koprinska, I.; Wu, D.; Wang, Z. Convolutional Neural Networks for Energy Time Series Forecasting. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Wang, K.; Li, K.; Zhou, L.; Hu, Y.; Cheng, Z.; Liu, J.; Chen, C. Multiple Convolutional Neural Networks for Multivariate Time Series Prediction. Neurocomputing 2019, 360, 107–119.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
- Ahmed, S.; Nielsen, I.E.; Tripathi, A.; Siddiqui, S.; Ramachandran, R.P.; Rasool, G. Transformers in Time-Series Analysis: A Tutorial. Circuits Syst. Signal Process. 2023, 42, 7433–7466.
- Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in Time Series: A Survey. arXiv 2023, arXiv:2202.07125.
- Compagnon, P.; Lomet, A.; Reyboz, M.; Mermillod, M. Domestic Hot Water Forecasting for Individual Housing with Deep Learning. In Machine Learning and Principles and Practice of Knowledge Discovery in Databases; Koprinska, I., Mignone, P., Guidotti, R., Jaroszewicz, S., Fröning, H., Gullo, F., Ferreira, P.M., Roqueiro, D., Ceddia, G., Nowaczyk, S., et al., Eds.; Communications in Computer and Information Science; Springer Nature: Cham, Switzerland, 2023; Volume 1753, pp. 223–235. ISBN 978-3-031-23632-7.
- French, R.M. Catastrophic Forgetting in Connectionist Networks. Trends Cogn. Sci. 1999, 3, 128–135.
- van de Ven, G.M.; Tolias, A.S. Three Scenarios for Continual Learning. arXiv 2019, arXiv:1904.07734.
- Zeno, C.; Golan, I.; Hoffer, E.; Soudry, D. Task Agnostic Continual Learning Using Online Variational Bayes. arXiv 2019, arXiv:1803.10123.
- De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; Tuytelaars, T. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3366–3385.
- Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive Neural Networks. arXiv 2022, arXiv:1606.04671.
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming Catastrophic Forgetting in Neural Networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
- Bagus, B.; Gepperth, A. An Investigation of Replay-Based Approaches for Continual Learning. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–9.
- Rolnick, D.; Ahuja, A.; Schwarz, J.; Lillicrap, T.P.; Wayne, G. Experience Replay for Continual Learning. arXiv 2019, arXiv:1811.11682.
- Buzzega, P.; Boschini, M.; Porrello, A.; Abati, D.; Calderara, S. Dark Experience for General Continual Learning: A Strong, Simple Baseline. arXiv 2020, arXiv:2004.07211.
- Wang, L.; Zhang, X.; Su, H.; Zhu, J. A Comprehensive Survey of Continual Learning: Theory, Method and Application. arXiv 2024, arXiv:2302.00487.
- Pham, Q.; Liu, C.; Sahoo, D.; Hoi, S.C.H. Learning Fast and Slow for Online Time Series Forecasting. arXiv 2022, arXiv:2202.11672.
- Mainsant, M.; Mermillod, M.; Godin, C.; Reyboz, M. A Study of the Dream Net Model Robustness across Continual Learning Scenarios. In Proceedings of the 2022 IEEE International Conference on Data Mining Workshops (ICDMW), Orlando, FL, USA, 28 November–1 December 2022; pp. 824–833.
- Li, K.-H. Reservoir-Sampling Algorithms of Time Complexity O(n(1 + log(N/n))). ACM Trans. Math. Softw. 1994, 20, 481–493.
- Jeeveswaran, K.; Bhat, P.; Zonooz, B.; Arani, E. BiRT: Bio-Inspired Replay in Vision Transformers for Continual Learning. arXiv 2023, arXiv:2305.04769.
| Symbol | Name and Description |
|---|---|
| | Number of DHW consumptions of day j of the week |
| | Amplitude of the i-th consumption of the j-th day of the week |
| | Standard deviation of the noise on the amplitude of the i-th consumption of the j-th day of the week |
| | Random increment for the amplitude of the i-th consumption of the j-th day of the week |
| | Slope of the i-th consumption of the j-th day of the week |
| | Centre of the i-th consumption of the j-th day of the week |
| | Standard deviation of the noise on the centre of the i-th consumption of the j-th day of the week |
| | Random increment for the centre of the i-th consumption of the j-th day of the week |
| | Sigmoid function |
| | Time index |
| Parameter | Value for Week of Type 1 | Value for Week of Type 2 |
|---|---|---|
| | 3 | 3 |
| | 2 L if …, 0 L otherwise | 2 L if …, 0 L otherwise |
| | 0.1 | 0.1 |
| | 10 | 10 |
| | 0.1 | 0.1 |
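Reading the two tables above, the synthetic series appears to be built from daily sums of sigmoid steps whose amplitudes and centres are perturbed by Gaussian noise. The sketch below illustrates that kind of construction; the base amplitudes, centres, and the exact form of the random increments are illustrative assumptions, not the paper's parameter values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def daily_profile(t, amplitudes, centres, slope=10.0):
    """Cumulative consumption over one day: each hot-water draw adds a smooth
    sigmoid step of height A_i centred at time c_i."""
    return sum(a * sigmoid(slope * (t - c)) for a, c in zip(amplitudes, centres))

def synthetic_weeks(n_weeks, base_amplitudes, base_centres,
                    sigma_a=0.1, sigma_c=0.1, steps_per_day=48, seed=0):
    """base_amplitudes / base_centres hold one list of values per day of the week."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, steps_per_day)   # normalised time of day
    days = []
    for _ in range(n_weeks):
        for amps, cents in zip(base_amplitudes, base_centres):
            a = np.asarray(amps) + rng.normal(0.0, sigma_a, len(amps))    # noisy amplitudes
            c = np.asarray(cents) + rng.normal(0.0, sigma_c, len(cents))  # noisy centres
            days.append(daily_profile(t, a, c))
    return np.concatenate(days)

# Illustrative call only: three draws per day, same base pattern every day.
weeks = synthetic_weeks(n_weeks=3,
                        base_amplitudes=[[2.0, 2.0, 2.0]] * 7,
                        base_centres=[[0.3, 0.5, 0.8]] * 7)
```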
| Hyperparameter | Tested Values | Optimal Value from Grid Search |
|---|---|---|
| Key dimension | [1, 2, 4, 8] | 8 |
| Number of heads | [1, 2, 4, 8] | 4 |
| Penultimate_ratio | [1, 2, 4, 8] | 8 |
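For context, here is a minimal sketch of how these three hyperparameters could enter a transformer-based forecaster. The architecture below is an illustrative stand-in, not the model described in Section 2.2; in particular, reading Penultimate_ratio as a scaling factor on the last hidden layer is an assumption.

```python
import torch
import torch.nn as nn

class ForecastTransformer(nn.Module):
    """Minimal transformer-style regressor parameterised by the grid-searched
    hyperparameters: key dimension, number of heads and penultimate ratio
    (assumed here to scale the size of the last hidden layer)."""
    def __init__(self, n_features, horizon, key_dim=8, n_heads=4, penultimate_ratio=8):
        super().__init__()
        d_model = key_dim * n_heads
        self.embed = nn.Linear(n_features, d_model)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model * penultimate_ratio),
            nn.ReLU(),
            nn.Linear(d_model * penultimate_ratio, horizon),
        )

    def forward(self, x):             # x: (batch, time, n_features)
        h = self.encoder(self.embed(x))
        return self.head(h[:, -1])    # forecast from the last time step
```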
| Dream Net Hyperparameter | Tested Values | Optimal Value from Grid Search |
|---|---|---|
| Epochs | [30, 50, 70] | 70 |
| Reinjection number | [2, 5, 7] | 7 |
| Real example batch size | [32, 64, 128] | 32 |
| Pseudo-example batch size | [128, 256, 512] | 128 |
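The "Reinjection number" row refers to how pseudo-examples are produced. Based on the cited Dream Net papers, one plausible sketch is the following, assuming an auto-associative part of the model whose output has the same shape as its input; the actual generation procedure may differ in detail.

```python
import torch

@torch.no_grad()
def generate_pseudo_examples(model, n_samples, input_shape, n_reinjections=7):
    """Pseudo-rehearsal sketch: start from random noise and repeatedly feed the
    model's own output back as its input, so the samples drift toward patterns
    the network has already learned. The resulting (input, target) pairs are
    replayed alongside real examples of the new task."""
    x = torch.rand(n_samples, *input_shape)    # random seed patterns
    for _ in range(n_reinjections):
        x = model(x)                           # reinject the output as the next input
    return x, model(x)
```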
| DER/ER Hyperparameter | Tested Values | Optimal Value for ER | Optimal Value for DER |
|---|---|---|---|
| Epochs | [30, 50, 70] | 70 | 70 |
| Buffer size | [500] | 500 | 500 |
| | [0, 0.5, 1] | 0 | 0.5 |
| | [0, 0.5, 1] | 0.5 | 0.5 |
| Batch size of current task examples | [50, 200] | 50 | 50 |
| Batch size of examples from the buffer | [50, 200] | 200 | 200 |
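The two unnamed rows are presumably the weights of the DER/DER++-style loss terms, i.e., a distillation term on stored model outputs and a plain replay term on stored targets, where setting the distillation weight to zero recovers standard ER. A hedged sketch, adapted to a regression setting:

```python
import torch.nn.functional as F

def der_style_loss(model, batch, buffer_batch, alpha=0.5, beta=0.5):
    """DER/DER++-style training loss for regression: current-task error, plus
    alpha times a distillation term matching the model outputs stored in the
    buffer, plus beta times a plain replay term on the buffered targets.
    With alpha = 0 this reduces to standard Experience Replay."""
    x, y = batch
    buf_x, buf_y, buf_out = buffer_batch      # buffered inputs, targets, past outputs
    loss = F.mse_loss(model(x), y)            # current task
    if alpha:
        loss = loss + alpha * F.mse_loss(model(buf_x), buf_out)  # match stored outputs
    if beta:
        loss = loss + beta * F.mse_loss(model(buf_x), buf_y)     # replay stored targets
    return loss
```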
Continual Learning Algorithm | Mean MAE on Dwelling A | MAE Standard Deviation on Dwelling A | Mean MAE on Dwelling B | MAE Standard Deviation on Dwelling B |
---|---|---|---|---|
Dream Net | 24.805 | 6.929 | 21.559 | 6.737 |
DER | 25.084 | 7.744 | 21.457 | 7.328 |
ER | 25.041 | 6.967 | 21.492 | 7.352 |
Finetuning | 24.423 | 6.822 | 21.351 | 6.212 |
Offline | 22.695 | 7.449 | 19.365 | 5.377 |
First month training (MAE on second month excluded) | 35.513 | 7.546 | 35.950 | 5.736 |
First three months training (MAE on second to fourth month excluded) | 26.376 | 9.249 | 29.458 | 6.617 |
| Dream Net Hyperparameter | Tested Values | Optimal Value from Grid Search |
|---|---|---|
| Epochs | [30, 50, 70] | 50 |
| Reinjection number | [2, 5, 7] | 7 |
| Real example batch size | [32, 64, 128] | 64 |
| Pseudo-example batch size | [128, 256, 512] | 512 |
| DER/ER Hyperparameter | Tested Values | Optimal Value for ER | Optimal Value for DER |
|---|---|---|---|
| Epochs | [30, 50, 70] | 50 | 70 |
| Buffer size | [500] | 500 | 500 |
| | [0, 0.5, 1] | 0 | 0.5 |
| | [0, 0.5, 1] | 0.5 | 1 |
| Batch size of current task examples | [50, 200] | 200 | 200 |
| Batch size of examples from the buffer | [50, 200] | 200 | 200 |
Learning Setup | MAE on d1_val | MAE on d1_val Standard Deviation | MAE on d2_val | MAE on d2_val Standard Deviation |
---|---|---|---|---|
d1_train only | 0.1517 | 0.007415 | 0.9064 | 0.01876 |
Joint | 0.1595 | 0.0028 | 0.1785 | 0.0044 |
Finetuning | 0.6796 | 0.0354 | 0.1344 | 0.0012 |
Dream Net | 0.4266 | 0.0622 | 0.4578 | 0.0288 |
DER | 0.1933 | 0.0052 | 0.2386 | 0.0039 |
ER | 0.2483 | 0.0092 | 0.2411 | 0.0054 |
Learning Setup | MAE on d1_test | MAE on d1_test Standard Deviation | MAE on d2_test | MAE on d2_test Standard Deviation |
---|---|---|---|---|
d1_train only | 0.1502 | 0.0064 | 0.9131 | 0.0482 |
Joint | 0.1413 | 0.0044 | 0.1579 | 0.0062 |
Finetuning | 0.6874 | 0.0102 | 0.1162 | 0.0037 |
Dream Net | 0.4050 | 0.0323 | 0.4071 | 0.0258 |
DER | 0.2117 | 0.0066 | 0.2299 | 0.0040 |
ER | 0.2415 | 0.00188 | 0.2034 | 0.0054 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).