Abstract
This paper introduces a novel approach for occupancy estimation in smart buildings. In particular, we focus on the challenging yet common situation where the training data are scarce and imbalanced (i.e. the classes are not approximately equally represented). Our model has two parts, namely predictive modelling, performed via the inverted Dirichlet mixture model (IDMM), and an over-sampling approach that we propose. The first part, whose main goal is to tackle the small training data problem, computes the predictive distribution of the IDMM by marginalizing over its parameters with respect to their posterior distributions, which are estimated by a Bayesian variational inference approach that we develop. The second part, based on over-sampling, complements the first by tackling the imbalanced domains problem. Extensive experiments and simulations involving synthetic data as well as real data extracted from smart building sensors show the merits of our statistical framework.
Data availability
Data could be made available on reasonable request.
Notes
It is noteworthy that when the training set becomes sufficiently large, the variance of the posterior distribution decreases and the predictive distribution, which can be viewed as an average over the model's parameters (Snelson and Ghahramani 2005), can be approximated by \(f({\textbf {x}}|{\hat{\theta }})\), where \({\hat{\theta }}\) is a point estimate (obtained via maximum a posteriori, expectation propagation or variational Bayes, for instance) (Gelman et al. 1996; Sinharay and Stern 2003).
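The effect described in this note can be illustrated on a toy conjugate model. The sketch below is not the IDMM: it assumes an exponential likelihood with a Gamma prior on the rate (chosen only because the predictive density is then available in closed form), and the names `posterior_params`, `predictive_density`, `plugin_density` and `max_gap` are hypothetical. It compares the full predictive density with the plug-in density evaluated at a MAP point estimate for growing training set sizes.

```python
import numpy as np

def posterior_params(x, a0=1.0, b0=1.0):
    """Gamma(a_n, b_n) posterior (shape, rate) of an exponential rate under a Gamma(a0, b0) prior."""
    x = np.asarray(x, dtype=float)
    return a0 + x.size, b0 + x.sum()

def predictive_density(x_new, a_n, b_n):
    """Posterior predictive density: the exponential likelihood averaged over the Gamma posterior."""
    x_new = np.asarray(x_new, dtype=float)
    log_p = np.log(a_n) + a_n * np.log(b_n) - (a_n + 1.0) * np.log(b_n + x_new)
    return np.exp(log_p)

def plugin_density(x_new, a_n, b_n):
    """Plug-in density f(x | theta_hat), with theta_hat the MAP estimate of the rate (needs a_n > 1)."""
    rate_hat = (a_n - 1.0) / b_n
    return rate_hat * np.exp(-rate_hat * np.asarray(x_new, dtype=float))

def max_gap(n, rng, true_rate=2.0):
    """Largest absolute gap between the two densities on a grid, for a training set of size n."""
    data = rng.exponential(scale=1.0 / true_rate, size=n)
    a_n, b_n = posterior_params(data)
    grid = np.linspace(0.0, 5.0, 501)
    diff = predictive_density(grid, a_n, b_n) - plugin_density(grid, a_n, b_n)
    return float(np.max(np.abs(diff)))

rng = np.random.default_rng(0)
gaps = {n: max_gap(n, rng) for n in (5, 50, 5000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}  max |predictive - plug-in| = {gap:.5f}")
```

In this toy setting the gap shrinks as the training set grows, which is why the full predictive distribution matters mainly when training data are scarce.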
Data generation techniques themselves can be categorized into two groups (Branco et al. 2016). The first group of approaches introduces perturbations (i.e. producing noisy replicates of existing data). The second group is based on interpolating existing data. Our approach belongs to the second group.
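The two groups can be contrasted with a minimal sketch. This is not the over-sampling method proposed in the paper: both functions below are hypothetical illustrations, the noise scale is an arbitrary assumption, and the interpolation picks random pairs of minority samples rather than nearest neighbours as SMOTE (Chawla et al. 2002) does.

```python
import numpy as np

def oversample_by_perturbation(X, n_new, noise_scale=0.05, rng=None):
    """First group: noisy replicates of existing samples (Gaussian noise is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    idx = rng.integers(0, len(X), size=n_new)
    # Noise standard deviation is a fraction of each feature's spread.
    noise = rng.normal(0.0, noise_scale * X.std(axis=0), size=(n_new, X.shape[1]))
    return X[idx] + noise

def oversample_by_interpolation(X, n_new, rng=None):
    """Second group: points on the segment joining two randomly chosen existing samples."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    t = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X[i] + t * (X[j] - X[i])

rng = np.random.default_rng(42)
X_minority = rng.gamma(shape=2.0, scale=1.0, size=(10, 3))  # toy positive minority-class data
X_noisy = oversample_by_perturbation(X_minority, 20, rng=rng)
X_interp = oversample_by_interpolation(X_minority, 20, rng=rng)
print(X_noisy.shape, X_interp.shape)
```

One practical difference: interpolated points are convex combinations of existing samples, so they stay positive whenever the original data are positive, whereas Gaussian perturbation can push values below zero.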
References
Ahmad J, Larijani H, Emmanuel R, Mannion M, Javed A (2021) Occupancy detection in non-residential buildings: a survey and novel privacy preserved occupancy monitoring solution. Appl Comput Inf 17(2):279–295
Alawneh L, Alsarhan T, Al-Zinati M, Al-Ayyoub M, Jararweh Y, Lu H (2021) Enhancing human activity recognition using deep learning and time series augmented data. J Ambient Intell Humaniz Comput 12(12):10565–10580
Amayri M, Arora A, Ploix S, Bandhyopadyay S, Ngo QD, Badarla VR (2016) Estimating occupancy in heterogeneous sensor environment. Energy Build 129:46–58
Amayri M, Ploix S, Bouguila N, Wurtz F (2019) Estimating occupancy using interactive learning with a sensor environment: real-time experiments. IEEE Access 7:53932–53944
Bdiri T, Bouguila N (2011) Learning inverted dirichlet mixtures for positive data clustering. In: Kuznetsov SO, Slezak D, Hepting DH, Mirkin BG (eds) Rough sets, fuzzy sets, data mining and granular computing - 13th international conference, RSFDGrC 2011, Moscow, Russia, June 25-27, 2011. Proceedings. Springer, Lecture Notes in Computer Science, Berlin, pp 265–272
Bdiri T, Bouguila N (2011a) Neural information processing - 18th international conference, ICONIP 2011, Shanghai, China, november 13-17, 2011, proceedings. In: Lu B, Zhang L, Kwok JT (eds) Neural Information Processing, vol 7063. Springer, Lecture Notes in Computer Science, Berlin, pp 71–78
Bdiri T, Bouguila N (2012) Positive vectors clustering using inverted dirichlet finite mixture models. Expert Syst Appl 39(2):1869–1882
Bdiri T, Bouguila N (2013) Bayesian learning of inverted dirichlet mixtures for SVM kernels generation. Neural Comput Appl 23(5):1443–1458
Benmansour A, Bouchachia A, Feham M (2016) Multioccupant activity recognition in pervasive smart home environments. ACM Comput Surv 48(3):34:1-34:36
Bentouati B, Khelifi A, Shaheen AM, El-Sehiemy RA (2021) An enhanced moth-swarm algorithm for efficient energy management based multi dimensions OPF problem. J Ambient Intell Humaniz Comput 12(10):9499–9519
Bishop CM (2006) Pattern recognition and machine learning. Information science and statistics. Springer, New York
Bjornstad JF (1990) Predictive likelihood: a review. Stat Sci 5(2):242–254
Bouguila N, Fan W (2020) Mixture models and applications. Springer, Berlin
Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49(2):1–50
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Int Res 16(1):321–357
D'Oca S, Hong T, Langevin J (2018) The human dimensions of energy use in buildings: a review. Renew Sustain Energy Rev 81:731–742
Diethe T, Twomey N, Flach PA (2016) Active transfer learning for activity recognition. In: 24th European Symposium on Artificial Neural Networks, ESANN
Djenouri D, Laidi R, Djenouri Y, Balasingham I (2019) Machine learning for smart building applications: review and taxonomy. ACM Comput Surv 52(2):1–36
Fan W, Bouguila N (2020) Spherical data clustering and feature selection through nonparametric Bayesian mixture models with von mises distributions. Eng Appl Artif Intell 94:103781
Fan W, Bouguila N, Du J, Liu X (2019) Axially symmetric data clustering through Dirichlet process mixture models of Watson distributions. IEEE Trans Neural Netw Learning Syst 30(6):1683–1694
Fan W, Yang L, Bouguila N (2021) Unsupervised grouped axial data modeling via hierarchical Bayesian nonparametric models with Watson distributions. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3128271
Gelman A, Li MX, Stern H (1996) Posterior predictive assessment of model fitness via realized discrepancies. Stat Sinica 6:733–807
Hao C, Chen D (2021) Software/hardware co-design for multi-modal multi-task learning in autonomous systems. In: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), IEEE, pp 1–5
Hossain HMS, Khan MAAH, Roy N (2017) Active learning enabled activity recognition. Pervasive Mob Comput 38:312–330 (IEEE International Conference on Pervasive Computing and Communications (PerCom) 2016)
Huang Q, Hao K (2020) Development of cnn-based visual recognition air conditioner for smart buildings. J Inf Technol Constr 25:361–373
Huang Q, Mao C (2017) Occupancy estimation in smart building using hybrid CO2/light wireless sensor network. J Appl Sci Arts 1(2):5
Huang Q, Ge Z, Lu C (2016) Occupancy estimation in smart buildings using audio-processing techniques. In: International conference on computing in civil and building engineering. Osaka, Japan, pp 1413–1420
Huang Q, Rodriguez K, Whetstone N, Habel S (2019) Rapid internet of things (iot) prototype for accurate people counting towards energy efficient buildings. J Inf Technol Constr 24:1–13
Jaouhari SE, Bouabdallah A, Corici AA (2021) Sdn-based security management of multiple wot smart spaces. J Ambient Intell Humaniz Comput 12(10):9081–9096
Li M, Zhou P, Liu Y, Wang H (2020a) Data-driven predictive probability density function control of fiber length stochastic distribution shaping in refining process. IEEE Trans Autom Sci Eng 17(2):633–645
Li T, Chien Y, Chou C, Liao C, Cheah W, Fu L, Chen CC, Chou C, Chen I (2020b) A fast and low-cost repetitive movement pattern indicator for massive dementia screening. IEEE Trans Autom Sci Eng 17(2):771–783
Ma Z, Leijon A (2011) Approximating the predictive distribution of the beta distribution with the local variational method. In: 2011 IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6
Ma Z, Leijon A, Tan ZH, Gao S (2014) Predictive distribution of the dirichlet mixture model by local variational inference. J Signal Process Syst 74(3):359–374
Manouchehri N, Dalhoumi O, Amayri M, Bouguila N (2020) Variational learning of a shifted scaled dirichlet model with component splitting approach. In: Third International Conference on Artificial Intelligence for Industries, AI4I 2020, Irvine, CA, USA, September 21-23, 2020, IEEE, pp 75–78
Menardi G, Torelli N (2014) Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 28(1):92–122
Naser A, Lotfi A, Zhong J (2020) Adaptive thermal sensor array placement for human segmentation and occupancy estimation. IEEE Sens J 21(2):1993–2002
Nasfi R, Amayri M, Bouguila N (2020) A novel approach for modeling positive vectors with inverted dirichlet-based hidden markov models. Knowl-Based Syst 192:105335
Nguyen TA, Aiello M (2013) Energy intelligent buildings based on user activity: a survey. Energy Build 56:244–257
Oldewurtel F, Sturzenegger D, Morari M (2013) Importance of occupancy information for building climate control. Appl Energy 101:521–532
Rabie AH, Saleh AI, Ali HA (2021) Smart electrical grids based on cloud, IOT, and big data technologies: state of the art. J Ambient Intell Humaniz Comput 12(10):9449–9480
Sefidpour A, Bouguila N (2012) Spatial color image segmentation based on finite non-gaussian mixture models. Expert Syst Appl 39(10):8993–9001
Shao W, Ge Z, Yao L, Song Z (2020) Bayesian nonlinear gaussian mixture regression and its application to virtual sensing for multimode industrial processes. IEEE Trans Autom Sci Eng 17(2):871–885
Siirtola P, Röning J (2021) Context-aware incremental learning-based method for personalized human activity recognition. J Ambient Intell Humaniz Comput 12(12):10499–10513
Sinharay S, Stern HS (2003) Posterior predictive model checking in hierarchical models. J Stat Plann Inference 111(1):209–221
Snelson E, Ghahramani Z (2005) Compact approximations to Bayesian predictive distributions. In: Proceedings of the 22nd International Conference on Machine Learning, Association for Computing Machinery, New York, NY, USA, pp 840–847
Tiao GG, Guttman I (1965) The inverted Dirichlet distribution with applications. J Am Stat Assoc 60(311):793–805
Tirdad P, Bouguila N, Ziou D (2015) Variational learning of finite inverted dirichlet mixture models and applications. In: Laalaoui Y, Bouguila N (eds) Artificial intelligence applications in information and communication technologies, studies in computational intelligence, vol 607. Springer, pp 119–145
Viard K, Fanti MP, Faraut G, Lesage JJ (2020) Human activity discovery and recognition using probabilistic finite-state automata. IEEE Trans Autom Sci Eng 17:2085–2096
Wang J, Zhao C (2020) A gaussian feature analytics-based dissim method for fine-grained non-gaussian process monitoring. IEEE Trans Autom Sci Eng 17:2175–2181
Yan Y, Luh PB, Pattipati KR (2020) Fault prognosis of key components in hvac air-handling systems at component and system levels. IEEE Trans Autom Sci Eng 17:2145–2153
Yang Y, Hu G, Spanos CJ (2020) Hvac energy cost optimization for a multizone building via a decentralized approach. IEEE Trans Autom Sci Eng 17:1950–1960
Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Third IEEE International Conference on Data Mining, pp 435–442
Zheng J, Lu C, Hao C, Chen D, Guo D (2020) Improving the generalization ability of deep neural networks for cross-domain visual recognition. IEEE Trans Cognit Dev Syst 13(3):607–620
Acknowledgements
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC), the “Nouveaux arrivants Université Grenoble Alpes Grenoble INP - UGA”/G-SCOP program and the National Natural Science Foundation of China (61876068). The authors would like to thank the associate editor and reviewers for their helpful comments.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of equation 11
The logarithm of the Multivariate-Inverse-Beta has been proved to be concave (Ma et al. 2014). Thus, the following inequality can be obtained by a first-order Taylor expansion
where \(\tilde{\alpha }_{d}, d=1,2,...,D+1\) is any point from the posterior distribution. Taking the exponential of both sides, we have
By substituting (24) into (7) and with some mathematical manipulations, we can obtain the following upper bound
For simplicity, let us denote
where \(d = 1,2,...,D+1\). Thus, the integral in Eq. 25 has the same form as the Gamma function and can be reduced to
Here, we require \({\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) > 0\) for any d, because the case \({\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) \le 0\) cannot be solved. Finally, the analytically tractable form of the finite upper bound of the predictive distribution is
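For reference, the two standard facts that this proof relies on can be written in generic form. This is only a sketch: \(f\), \(a\) and \(G\) below are placeholders and do not reproduce the exact expressions of the derivation. For a differentiable concave function \(f\), the tangent at any point \(\tilde{\varvec{\alpha }}\) is a global upper bound, which is what the first-order Taylor expansion provides:

\[
f(\varvec{\alpha }) \le f(\tilde{\varvec{\alpha }}) + \nabla f(\tilde{\varvec{\alpha }})^{\top }(\varvec{\alpha } - \tilde{\varvec{\alpha }}).
\]

The Gamma-type integral has the closed form

\[
\int _{0}^{\infty } \alpha ^{a-1} e^{-G\alpha }\, d\alpha = \frac{\Gamma (a)}{G^{a}}, \qquad a> 0,\; G > 0.
\]

The integral diverges when \(G \le 0\), which is consistent with the positivity requirement on \({\textbf {G}}(x_{d},\tilde{\varvec{\alpha }})\) stated above.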
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, J., Amayri, M., Najar, F. et al. Occupancy estimation in smart buildings using predictive modeling in imbalanced domains. J Ambient Intell Human Comput 14, 10917–10929 (2023). https://doi.org/10.1007/s12652-022-04359-x
DOI: https://doi.org/10.1007/s12652-022-04359-x